Troy Hunt has released information about yet another data leak affecting more than 8 million GitHub profiles. The security researcher, however, made sure to highlight that the leak isn’t a result of a vulnerability found on GitHub:
(…)This incident is not about any sort of security vulnerability on GitHub’s behalf, rather it relates to a trove of data from their site which was inappropriately scraped and then inadvertently exposed due to a vulnerability in another service. My data. Probably your data if you’re in the software industry. Millions of people’s data.
More specifically, technology recruitment site GeekedIn scraped 8 million GitHub profiles and left the data exposed in an unsecured MongoDB database. A backup of the database has been downloaded by at least one third party, and others may have downloaded it as well. Worse, researchers suspect that the data is up for sale online.
This is not the first security-related incident involving MongoDB. For example, a vulnerable MongoDB database was the cause of the data breach at dating site BeautifulPeople(.)com earlier this year. That incident exposed over a million users of the service.
As for the GitHub incident, Troy Hunt received the backup file, which also contained his own information. He immediately notified GitHub of what had happened. As always, Troy Hunt has provided a very detailed analysis of exactly what happened. If you’re on GitHub, make sure to read it.
What’s inside the backup file?
8.2 million unique email addresses, which belong to users of GitHub, Bitbucket (a web-based hosting service for projects), and very likely other services as well.
The records contain users’ names, usernames, email addresses, geolocation, lists of professional skills, and years of professional experience.
What did GeekedIn do?
The company scraped the information and created its own database. The scandalous thing is that GeekedIn offers this information to companies looking for developers. This “service”, however, is not free, which directly violates the conditions under which GitHub permits data scraping.
GitHub permits other companies to scrape its users’ data only if it is done for the same purpose for which the users gave their information to GitHub. This is what GitHub told Troy Hunt:
Using scraped information for a commercial purpose violates our privacy statement and we do not condone this kind of use.
It took some time and a lot of effort for Hunt to establish contact with GeekedIn. Eventually, they promised to secure the data.