Security researchers recently discovered a massive new data breach. On October 16, 2019, Data Viper security researchers Bob Diachenko and Vinny Troia discovered a wide-open Elasticsearch server which contained “an unprecedented 4 billion user accounts spanning more than 4 terabytes of data.”
The server was accessible without authentication, exposing the data of more than 1.2 billion unique individuals. That makes it one of the largest data leaks from a single source so far.
Information of 1.2 Billion Individuals Exposed
The Elasticsearch server that contained the exposed information was unprotected and accessible through a web browser at https://35.199.58.125:9200, the researchers said, adding that no password or any other form of authentication was in place to protect the data.
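For administrators who want to confirm that their own cluster is not exposed in the same way, the check can be sketched roughly as follows. This is a minimal illustration, not the researchers' tooling: the function names classify_status and probe are hypothetical, and the sketch assumes that a node with security enabled answers an unauthenticated request with HTTP 401 (or 403), while an open node answers with 200.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a rough exposure verdict (hypothetical helper)."""
    if status == 200:
        return "open"            # answered without asking for credentials
    if status in (401, 403):
        return "auth required"   # the server rejected the anonymous request
    return "unknown"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to a server you operate and classify the reply."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx replies; the status code is in err.code
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # connection refused, DNS failure, timeout, and similar network errors
        return "unreachable"
```

Calling probe("http://localhost:9200") against a test node you control would return "open" only if the node answers anonymous requests, which is the condition described in the report.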
Elasticsearch stores its information in indexes, each of which is similar to a database. Most of the data spanned four separate indexes, labeled “PDL” or “OXY”, with information on approximately 1 billion people per index. Each user record carried one of these two labels in a source field.
Data Viper’s analysis indicates that the data in the PDL indexes originated from People Data Labs, a data aggregator and enrichment company.
The researchers de-duplicated the nearly 3 billion People Data Labs (PDL) user records and discovered 1.2 billion unique people and 650 million unique email addresses. These numbers correspond to the statistics provided on the People Data Labs website.
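The report does not describe how the de-duplication was performed. As a rough illustration only, counting unique people and unique email addresses across overlapping records could look like the sketch below. The field names person_id and emails and the helper names are assumptions made for this example, not fields confirmed in the leaked data.

```python
def normalize_email(email: str) -> str:
    """Lowercase and trim an address so trivial variants count once."""
    return email.strip().lower()


def count_unique(records):
    """Return (unique people, unique emails) for an iterable of record dicts.

    Each record is assumed to carry a hypothetical 'person_id' key and an
    optional 'emails' list; real data would need a more careful identity key.
    """
    people = set()
    emails = set()
    for rec in records:
        people.add(rec["person_id"])
        for address in rec.get("emails", []):
            emails.add(normalize_email(address))
    return len(people), len(emails)


# Made-up sample: the first two records describe the same person.
sample = [
    {"person_id": "a1", "emails": ["Ann@Example.com"]},
    {"person_id": "a1", "emails": ["ann@example.com ", "ann@work.example"]},
    {"person_id": "b2", "emails": []},
]
unique_people, unique_emails = count_unique(sample)
```

With billions of records, in-memory sets like these would not scale; the sketch only shows why the unique counts come out far lower than the raw record count.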
“The data within the three different PDL indexes also varied slightly, some focusing on scraped LinkedIN information, email addresses and phone numbers, while other indexes provided information on individual social media profiles such as a person’s Facebook, Twitter, and Github URLs,” the report said.
Notably, the PDL application can be used to search more than 1.5 billion unique people as well as personal addresses, email addresses, LinkedIn URLs, and phone numbers.
The data discovered on the open Elasticsearch server was almost a complete match to the data returned by the People Data Labs API. The only difference was that the data returned by the PDL API also contained education histories, while none of the data downloaded from the server included education information. Everything else was exactly the same, including accounts with multiple email addresses and multiple phone numbers.
As for the data labeled with “OXY”, an analysis led the researchers to OxyData.io, which is also a data enrichment company. “OxyData’s website claims to have 4TB of user data (exactly the amount discovered), but only 380 million people profiles,” the report said. Most of this data was scraped from LinkedIn, including recruiter details. The researchers contacted OxyData and found out that the server didn’t even belong to them.
The company refused to give the researchers access to its API to test and compare profiles. However, it did send a copy of the researcher’s own record, which confirmed that the data was scraped from LinkedIn and matched the individual.
“Due to the sheer amount of personal information included, combined with the complexities of identifying the data owner, this has the potential to raise questions on the effectiveness of our current privacy and breach notification laws,” Data Viper concluded.