The Big Data Obsession: Share, Collect, Breach, Repeat

“If you torture the data long enough, it will confess.”
– Ronald Coase, Economist

Big data. Data collection. Data mining. Data aggregation. Data technology. Data privacy. Data breach. What do all these terms mean, and how are they related – to each other, and to us? Why should we care about their meaning? This article attempts to explain everything we could think of that connects you, the user, with data and the Web. It is only an attempt, because when it comes to big data, no explanation is big enough.

First Things First. What Is Big Data?

Big data is a relatively new term for something that has always been around. The term illustrates the exponential growth and availability of data – structured and unstructured. Some experts even say that big data is as important to modern businesses as the Internet itself. They are not wrong.

In 2001, industry analyst Doug Laney outlined a very coherent definition of big data, labeled the three Vs of big data: volume, velocity and variety.

  • Volume. Many factors contribute to the increase in data volume. Transaction-based data stored through the years. Unstructured data streaming in from social media. Increasing amounts of sensor and machine-to-machine data being collected.
  • Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.
  • Variety. Data today comes in all types of formats. Structured, numeric data in traditional databases. Information created from line-of-business applications. Unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.

Does all of this seem abstract, like something you can’t relate to at all? Think again. You are part of the process, or at least your digital presence is, because big (online) data is generated by everything and everyone connected to the Web. As a result, big data arrives from many sources, and deriving relevant value from it requires sufficient processing power and proper analytics capabilities. Data has become one of the most valuable units of exchange, perhaps more valuable than money. Business-wise, data is the new currency, and everybody wants some of it, or all of it (Google, Microsoft?).

This is how we get to data mining and data aggregation. Once you have collected all the data, what do you do with it?

The Difference between Data Collection, Data Mining and Data Aggregation

What’s Data Collection?

Data collection is exactly what the name suggests: the accumulation of information, typically via software (data collection tools). There are many different data collection techniques. If you follow SensorsTechForum regularly, you might have read a thing or two about the shady online data collection practices employed by third parties. The term covers different approaches and results, and its definition varies with the field you’re looking into.

However, being an online user, you should definitely be interested in all the ways online services acquire your personally identifiable information. Your PII is what makes you valuable. The more you, freely and willingly, share about yourself, the easier it is for businesses to “get” to you.

Here is a list of basic data collection techniques that many of your favorite services rely on to function:

  • Cookies
  • Active Web Content
  • JavaScript
  • Browser (HTTP) Header Fingerprinting
  • Browser Cache
  • Web Bugs (Tracking Pixels)
  • IP Address
  • MAC Address
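
To make one of the items above more concrete, here is a minimal sketch of HTTP header fingerprinting in Python. It is an illustration under stated assumptions, not how any particular service does it: the header selection, the function name `header_fingerprint`, and the sample header values are all hypothetical, and real-world fingerprinting typically combines many more signals.

```python
import hashlib

# Hypothetical selection of headers; real trackers may use many more signals.
FINGERPRINT_HEADERS = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")


def header_fingerprint(headers: dict) -> str:
    """Return a short, stable identifier derived from selected HTTP headers."""
    # Header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    parts = [normalised.get(name.lower(), "") for name in FINGERPRINT_HEADERS]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


# Made-up header sets for two visitors who differ only in language settings.
visitor_a = {
    "User-Agent": "ExampleBrowser/1.0 (ExampleOS)",
    "Accept": "text/html",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate",
}
visitor_b = dict(visitor_a, **{"Accept-Language": "de-DE,de;q=0.9"})

print(header_fingerprint(visitor_a))
print(header_fingerprint(visitor_b))
```

The point of the sketch is that no cookie is needed: the same browser configuration produces the same identifier on every visit, while a small difference in settings produces a different one.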

[Infographic: data collection]

What Is Data Mining?

Data mining, on the other hand, is a computational process, carried out with software, that helps you discover patterns in large data sets. It is as crucial to modern marketing and business development as financial investment. Many businesses invest in data mining to increase their profit and improve product positioning through sales forecasting. This is how you come to understand the behavior (and preferences) of your customers and improve your future approaches.

Data mining draws on artificial intelligence, machine learning, statistics, predictive analysis, and database systems. Thanks to data mining, you can find important patterns, and this knowledge, as mentioned above, can help you draw conclusions. Data will not mean anything to your business if you can’t derive value from it.
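
As a rough illustration of what "finding patterns" can mean, the sketch below counts which pairs of items appear together in purchase records. It is a deliberately simple stand-in for real data mining tools; the function name `frequent_pairs`, the `min_support` threshold, and the sample baskets are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least `min_support` baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) counts each pair once per basket, in a fixed order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up purchase records.
baskets = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["phone", "case"],
    ["laptop", "bag"],
    ["phone", "case", "charger"],
]

patterns = frequent_pairs(baskets, min_support=2)
for pair, n in sorted(patterns.items()):
    print(pair, n)
```

Even on five records, a pattern emerges (laptops sell with mice and bags, phones with cases), which is the kind of knowledge a business could feed into sales forecasting.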

What about Data Aggregation?

Data aggregation is the process of summarizing gathered data, mainly for analytical purposes. Why would you want to aggregate data? To gain more insight into specific groups of people (like your customers, current and potential) and be able to group them by age, profession, income, etc. Why is this process valuable for businesses? It improves personalization and helps keep your customers happy with the service you’re offering.
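
A minimal sketch of such grouping, assuming a handful of invented customer records: it places customers into ten-year age bands and summarizes each band. The field names, the `age_band` helper, and the band width are illustrative choices, not a standard.

```python
from collections import defaultdict
from statistics import mean

# Invented customer records, for illustration only.
customers = [
    {"age": 23, "profession": "student", "income": 12000},
    {"age": 27, "profession": "engineer", "income": 48000},
    {"age": 34, "profession": "teacher", "income": 36000},
    {"age": 38, "profession": "engineer", "income": 60000},
    {"age": 52, "profession": "manager", "income": 75000},
]


def age_band(age, width=10):
    """Map an age to a label such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def aggregate_by_age_band(records):
    """Summarize customer count and average income per age band."""
    incomes = defaultdict(list)
    for record in records:
        incomes[age_band(record["age"])].append(record["income"])
    return {
        band: {"customers": len(values), "average_income": mean(values)}
        for band, values in sorted(incomes.items())
    }


summary = aggregate_by_age_band(customers)
for band, stats in summary.items():
    print(band, stats)
```

Note that the summary no longer contains individual records, only group-level figures, which is what distinguishes aggregation from plain collection.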

If you pay close attention, or any attention at all, to privacy policies, you know exactly what we mean.

You’re a Google user, aren’t you? Are you acquainted with Google’s privacy policy?

Here is an excerpt:

Google's Privacy Policy
Information we collect
We collect information to provide better services to all of our users – from figuring out basic stuff like which language you speak, to more complex things like which ads you’ll find most useful, the people who matter most to you online, or which YouTube videos you might like.
We collect information in the following ways:
Information you give us. For example, many of our services require you to sign up for a Google Account. When you do, we’ll ask for personal information, like your name, email address, telephone number or credit card to store with your account. If you want to take full advantage of the sharing features we offer, we might also ask you to create a publicly visible Google Profile, which may include your name and photo.
Information we get from your use of our services. We collect information about the services that you use and how you use them, like when you watch a video on YouTube, visit a website that uses our advertising services, or view and interact with our ads and content. This information includes:
Device information
We collect device-specific information (such as your hardware model, operating system version, unique device identifiers, and mobile network information including phone number). Google may associate your device identifiers or phone number with your Google Account.
Log information
When you use our services or view content provided by Google, we automatically collect and store certain information in server logs. This includes:
  • details of how you used our service, such as your search queries.
  • telephony log information like your phone number, calling-party number, forwarding numbers, time and date of calls, duration of calls, SMS routing information and types of calls.
  • Internet protocol address.
  • device event information such as crashes, system activity, hardware settings, browser type, browser language, the date and time of your request and referral URL.
  • cookies that may uniquely identify your browser or your Google Account.
Location information
When you use Google services, we may collect and process information about your actual location. We use various technologies to determine location, including IP address, GPS, and other sensors that may, for example, provide Google with information on nearby devices, Wi-Fi access points and cell towers.
Unique application numbers
Certain services include a unique application number. This number and information about your installation (for example, the operating system type and application version number) may be sent to Google when you install or uninstall that service or when that service periodically contacts our servers, such as for automatic updates.
Local storage
We may collect and store information (including personal information) locally on your device using mechanisms such as browser web storage (including HTML 5) and application data caches.
Cookies and similar technologies
We and our partners use various technologies to collect and store information when you visit a Google service, and this may include using cookies or similar technologies to identify your browser or device. We also use these technologies to collect and store information when you interact with services we offer to our partners, such as advertising services or Google features that may appear on other sites. Our Google Analytics product helps businesses and site owners analyze the traffic to their websites and apps. When used in conjunction with our advertising services, such as those using the DoubleClick cookie, Google Analytics information is linked, by the Google Analytics customer or by Google, using Google technology, with information about visits to multiple sites.
Information we collect when you are signed in to Google, in addition to information we obtain about you from partners, may be associated with your Google Account. When information is associated with your Google Account, we treat it as personal information. For more information about how you can access, manage or delete information that is associated with your Google Account, visit the Transparency and choice section of this policy.

The Consequences of Big Data: Data Breaches

Where does the average PC user stand in all this big data mess? What happens to all this data when a major online service gets hacked?

The more you share about yourself, the more you automatically share about the people you know – your friends, and the friends of their friends… All this voluntary data sharing may just stab you in the back!

A highly personalized malicious campaign was started recently, aimed at LinkedIn users in Europe. The campaign’s payload was banking malware. Specific people received tailored malicious emails in different languages. The users’ credentials that were offered for sale on the black market after the mega LinkedIn breach from 2012 have apparently been put to use by cyber criminals. Perhaps this is just the beginning of a series of post-breach exploits.

Accounts can be leaked in other ways, too. Another fresh example concerns 32 million unique Twitter accounts. A hacker going by the name Tessa88, who is apparently involved with the recent mega breaches of LinkedIn, Tumblr, and Myspace, claims to have obtained a Twitter database consisting of millions of accounts.

The database has email addresses (in some cases two per user), usernames, and plain-text passwords. Tessa88 is selling it for 10 Bitcoins, or approximately $5,820. LeakedSource believes that the leak of accounts is not because of a data breach but due to malware. Tens of millions of people have become infected by malware, and the malware sent home every saved username and password from browsers like Chrome and Firefox from all websites, including Twitter.

However, it is not only individuals’ personal information that is susceptible to exploits. Nations are, too!

Rapid7, a security firm, has just released an extensive report (“National Exposure Index: Inferring Internet Security Posture by Country through Port Scanning”) focused on the nations most exposed to the risk of Internet-based attacks. Researchers found that wealthier and more developed countries are more endangered, mainly because of the high number of unsecured systems connected to the Internet.

How Can We Safeguard Our Data?

The Business Approach: Data Loss Prevention Software (DLP)

Businesses can safeguard their data by adopting data loss prevention software, which is designed to detect and prevent potential data breaches.

DLP software products rely on business rules to classify and safeguard confidential information so that unauthorized parties cannot share data in ways that compromise the organization. For example, if an employee tried to forward a business email outside the corporate domain or upload a corporate file to a consumer cloud storage service like Dropbox, the employee would be denied permission, as explained by TechTarget.
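
The two rules described above can be sketched in a few lines of Python. This is a toy model, not the behavior of any actual DLP product: the corporate domain `example.com`, the blocked host list, the function names, and the `is_confidential` flag (standing in for whatever classification a real product performs) are all assumptions for illustration.

```python
# Hypothetical policy settings; a real deployment would load these from configuration.
CORPORATE_DOMAIN = "example.com"
BLOCKED_UPLOAD_HOSTS = {"dropbox.com"}


def is_external_recipient(address):
    """True if the e-mail address lies outside the corporate domain."""
    domain = address.rsplit("@", 1)[-1].lower()
    return domain != CORPORATE_DOMAIN


def allow_email(recipients, is_confidential):
    """Deny confidential mail if any recipient is outside the corporate domain."""
    if not is_confidential:
        return True
    return not any(is_external_recipient(r) for r in recipients)


def allow_upload(host, is_confidential):
    """Deny confidential uploads to blocked consumer cloud storage hosts."""
    host = host.lower()
    blocked = any(host == b or host.endswith("." + b) for b in BLOCKED_UPLOAD_HOSTS)
    return not (is_confidential and blocked)


print(allow_email(["alice@example.com"], is_confidential=True))
print(allow_email(["bob@gmail.com"], is_confidential=True))
print(allow_upload("www.dropbox.com", is_confidential=True))
```

The hard part in practice is the classification step that this sketch takes for granted: deciding which messages and files count as confidential in the first place.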

The User Approach: Tips for Online Privacy

  • Do not reveal personal information recklessly to unknown, unidentified parties.
  • Turn on cookie notices in your Web browser, or use cookie management software.
  • Keep a clean e-mail address and employ anti-spam techniques. You may not want to use the same e-mail address for all of your online accounts, desktop and mobile.
  • Avoid sending personal e-mails to mailing lists. Separate your work computer from your personal one. Don’t keep sensitive information on your work machine.
  • Be a smart online surfer: don’t click on random links, and avoid suspicious content!
  • Do not, under any circumstances, reply to spammers.
  • Pay close attention to privacy policies, even those of the most legitimate services. Realize that everybody wants your personal information!
  • Remember that it’s up to you to decide what details you share about yourself. If a service or app seems too demanding, just don’t use it. There is almost certainly a better alternative.
  • Don’t underestimate the importance of encryption!

What Is Data Encryption?

As explained by Heimdal’s Andra Zaharia, encryption is a process that transforms accessible data or information into an unintelligible code that cannot be read or understood by normal means. The encryption process uses a key and an algorithm to turn the accessible data into an encoded piece of information. The cyber security author has also provided a list of 9 free encryption tools to consider.
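
To show the roles of the key and the algorithm, here is a toy sketch in Python using XOR with a random key as long as the message (the one-time pad idea). It is meant only to illustrate the concept; the helper names are made up, and for real data you should rely on an established encryption tool rather than code like this.

```python
import secrets


def generate_key(length):
    """Create a random key with the same length as the message."""
    return secrets.token_bytes(length)


def xor_bytes(data, key):
    """The 'algorithm': combine each data byte with the matching key byte."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = "meet me at noon".encode("utf-8")
key = generate_key(len(message))

ciphertext = xor_bytes(message, key)    # unintelligible without the key
recovered = xor_bytes(ciphertext, key)  # applying the same key reverses it

print(ciphertext.hex())
print(recovered.decode("utf-8"))
```

Without the key, the ciphertext reveals nothing readable; with it, the original message comes back exactly. A key in this scheme must never be reused, which is one reason practical tools use different algorithms.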

References


https://www.sas.com/en_ph/insights/big-data/what-is-big-data.html
https://www.import.io/post/data-mining-vs-data-collection/
https://searchsqlserver.techtarget.com/definition/data-aggregation
https://www.eff.org/wp/effs-top-12-ways-protect-your-online-privacy

Milena Dimitrova

An inspired writer and content manager who has been with SensorsTechForum since the project started. A professional with 10+ years of experience in creating engaging content. Focused on user privacy and malware development, she strongly believes in a world where cybersecurity plays a central role. If common sense makes no sense, she will be there to take notes. Those notes may later turn into articles! Follow Milena @Milenyim
