“If you torture the data long enough, it will confess.”
– Ronald Coase, Economist
Big data. Data collection. Data mining. Data aggregation. Data technology. Data privacy. Data breach. What do all those big data terms mean and how are they related – to each other, and to us? Why should we care about their meaning? This article is an attempt to explain everything we could think of that connects you, the user, with data and the Web. An attempt, because when it comes to big data no explanation is big enough.
First Things First. What Is Big Data?
Big data is a relatively new term for something that has always been around. The term illustrates the exponential growth and availability of data – structured and unstructured. Some experts even say that big data is as important to modern businesses as the Internet itself. They are not wrong.
In 2001, industry analyst Doug Laney outlined a very coherent definition of big data, labeled the three Vs of big data: volume, velocity and variety.
- Volume. Many factors contribute to the increase in data volume. Transaction-based data stored through the years. Unstructured data streaming in from social media. Increasing amounts of sensor and machine-to-machine data being collected.
- Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.
- Variety. Data today comes in all types of formats. Structured, numeric data in traditional databases. Information created from line-of-business applications. Unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.
Does all of this seem abstract to you? Like you can’t relate to the subject at all? Think again, because you are part of the process, or at least your digital presence is. Big (online) data is generated by everything… and everyone connected via the Web. As a result, big data arrives from various sources, and deriving relevant value from it requires ample processing power and proper analytics capabilities. Data is the new unit of exchange, and is perhaps more valuable than money. Business-wise, data is the new currency, and everybody wants some of it, or all of it (Google, Microsoft?).
This is how we get to data mining and data aggregation. Once you have collected all the data, what do you do with it?
The Difference between Data Collection, Data Mining and Data Aggregation
What’s Data Collection?
Data collection is precisely what the name states – the accumulation of information, typically via software (data collection tools). There are many different types of data collection techniques. If you follow SensorsTechForum regularly, you might have read a thing or two about the shady practices of online data collection employed by third parties. Data collection can involve different approaches and results, and depending on the field you’re looking into, you will get a different definition of the term.
However, being an online user, you should definitely be interested in all the ways online services acquire your personally identifiable information. Your PII is what makes you valuable. The more you, freely and willingly, share about yourself, the easier it is for businesses to “get” to you.
Here is a list of basic and mandatory data collection techniques, without which your favorite services would not be able to exist:
- Cookies
- Active Web Contents
- JavaScript
- Fingerprinting of Browser (HTTP) Header
- Browser Cache
- Web Bugs
- IP Address
- MAC Address
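To make the list above more concrete, here is a minimal sketch of how a handful of HTTP request headers and an IP address could be combined into a browser fingerprint. The function name `header_fingerprint`, the choice of headers and the sample values are assumptions made for illustration; real tracking services are likely to combine many more signals.

```python
import hashlib


def header_fingerprint(headers: dict[str, str], ip_address: str) -> str:
    """Derive a short, stable identifier from request headers and the client IP."""
    # Header names picked for illustration only (an assumption, not a standard set).
    signal_names = ["User-Agent", "Accept", "Accept-Language", "Accept-Encoding"]
    parts = [headers.get(name, "") for name in signal_names]
    parts.append(ip_address)
    # Hash the joined signals so the same browser yields the same identifier.
    joined = "|".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


# Hypothetical header values from two visits by the same browser.
first_visit = {
    "User-Agent": "ExampleBrowser/1.0",
    "Accept": "text/html",
    "Accept-Language": "en-US",
    "Accept-Encoding": "gzip",
}
second_visit = dict(first_visit)
# A different browser configuration: only the language preference changes.
other_browser = dict(first_visit, **{"Accept-Language": "de-DE"})

same_id = header_fingerprint(first_visit, "203.0.113.7")
repeat_id = header_fingerprint(second_visit, "203.0.113.7")
other_id = header_fingerprint(other_browser, "203.0.113.7")
print(same_id == repeat_id, same_id == other_id)
```

No cookie is involved here: the identifier is rebuilt from what the browser sends with every request, which is why clearing cookies alone does not prevent this kind of recognition.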
What Is Data Mining?
Data mining, on the other hand, requires a piece of software and a computational process that helps you discover patterns in large data sets. Data mining is as crucial to modern marketing and business development as financial investment is. Many businesses invest in data mining to increase their profit and improve product positioning through sales forecasting. This is how you come to grasp the behavior (and preferences) of your customers, and improve your future approaches.
Data mining draws on artificial intelligence, machine learning, statistics, predictive analysis, and database systems. Thanks to data mining, you can find important patterns, and this knowledge, as mentioned above, can help you draw conclusions. Data will not mean anything to your business if you can’t derive value from it.
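As a small illustration of what "finding patterns" can mean, the sketch below counts which pairs of products are bought together most often. This is only one simple pattern-discovery technique (pair counting, in the spirit of market-basket analysis); the function name `frequent_pairs`, the `min_support` threshold and the sample baskets are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket
        # and always in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: count for pair, count in pair_counts.items() if count >= min_support}


# Hypothetical purchase histories.
baskets = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["phone", "case"],
    ["laptop", "bag"],
    ["phone", "case", "charger"],
]

patterns = frequent_pairs(baskets, min_support=2)
for pair, count in sorted(patterns.items()):
    print(pair, count)
```

A shop could use a result like this to suggest a case to everyone who puts a phone in the cart.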
What about Data Aggregation?
Data aggregation is the process of summarizing gathered data, mainly for analytical purposes. Why would you want to aggregate data? To get more insight about specific groups of people (like your customers – current and potential) and be able to group them by age, profession, income, etc. Why is this process valuable for businesses? To improve personalization, and make your customers happy with the service you’re offering.
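The grouping described above can be sketched in a few lines. The example below sorts customers into ten-year age bands and summarizes each band; the field names (`age`, `income`), the band width and the sample records are assumptions for illustration, not a description of any particular service.

```python
from collections import defaultdict
from statistics import mean


def age_band(age):
    """Map an age to a ten-year band label such as '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def aggregate_by_age_band(customers):
    """Summarize customers per age band: head count and mean income."""
    incomes_by_band = defaultdict(list)
    for customer in customers:
        incomes_by_band[age_band(customer["age"])].append(customer["income"])
    return {
        band: {"customers": len(incomes), "mean_income": mean(incomes)}
        for band, incomes in sorted(incomes_by_band.items())
    }


# Hypothetical customer records.
customers = [
    {"age": 24, "income": 30000},
    {"age": 27, "income": 34000},
    {"age": 35, "income": 52000},
    {"age": 41, "income": 61000},
    {"age": 38, "income": 48000},
]

summary = aggregate_by_age_band(customers)
for band, stats in summary.items():
    print(band, stats)
```

Note that the summary no longer contains individual records, only group-level figures.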
If you pay any attention to privacy policies, you will have seen data aggregation for the sake of personalization described there.
You’re a Google user, aren’t you? Are you acquainted with Google’s privacy policy?
The Consequences of Big Data: Data Breaches
Where does the average PC user stand in all this big data mess? What happens to all this data when a major online service gets hacked?
The more you share about yourself, the more you automatically share about the people you know – your friends, and the friends of their friends… All this voluntary data sharing may just stab you in the back!
A highly personalized malicious campaign was started recently, aimed at LinkedIn users in Europe. The campaign’s payload was banking malware. Specific people received tailored malicious emails in different languages. User credentials offered for sale on the black market after the 2012 LinkedIn mega breach have apparently been put to use by cyber criminals. Perhaps this is just the beginning of a series of post-breach exploits.
Accounts can be leaked in other ways, too. Another fresh example concerns 32 million unique Twitter accounts. A hacker going by the name Tessa88, who is apparently involved with the recent mega breaches of LinkedIn, Tumblr and Myspace, is claiming to have obtained a Twitter database consisting of millions of accounts.
The database has email addresses (in some cases two per user), usernames, and plain-text passwords. Tessa88 is selling it for 10 Bitcoins, or approximately $5,820. LeakedSource believes that the leak of accounts is not because of a data breach but due to malware. Tens of millions of people have become infected by malware, and the malware sent home every saved username and password from browsers like Chrome and Firefox from all websites, including Twitter.
However, it is not only individuals’ personal information that is susceptible to exploits. Nations are, too!
Rapid7, a security firm, has just released a vast report (“National Exposure Index: Inferring Internet Security Posture by Country through Port Scanning”) focused on the nations most exposed to the risk of Internet-based attacks. Researchers found that wealthier and more developed countries are more endangered, mainly because of the high number of unsecured systems connected to the Internet.
How Can We Safeguard Our Data?
The Business Approach: Data Loss Prevention Software (DLP)
Businesses can safeguard their data by adopting data loss prevention software, which is designed to detect and prevent potential data breaches.
DLP software products rely on business rules to classify and safeguard confidential information so that unauthorized parties cannot share data in ways that compromise the organization. If an employee tried to forward a business email outside the corporate domain or upload a corporate file to a consumer cloud storage service like Dropbox, the employee would be denied permission, as explained by TechTarget.
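A business rule of the kind described above might look roughly like the following sketch. It is not how any specific DLP product works; the corporate domain, the keyword list and the names `OutboundEmail`, `is_confidential` and `is_allowed` are all hypothetical, chosen only to show the idea of classifying content and then blocking it at the company boundary.

```python
from dataclasses import dataclass

# Hypothetical policy settings, for illustration only.
CORPORATE_DOMAIN = "example.com"
CONFIDENTIAL_MARKERS = ("confidential", "internal only")


@dataclass
class OutboundEmail:
    sender: str
    recipient: str
    body: str


def is_confidential(text: str) -> bool:
    """Classify text as confidential if it contains any policy marker."""
    lowered = text.lower()
    return any(marker in lowered for marker in CONFIDENTIAL_MARKERS)


def is_allowed(email: OutboundEmail) -> bool:
    """Deny confidential content addressed outside the corporate domain."""
    recipient_domain = email.recipient.rsplit("@", 1)[-1].lower()
    leaves_company = recipient_domain != CORPORATE_DOMAIN
    return not (leaves_company and is_confidential(email.body))


internal = OutboundEmail("ann@example.com", "bob@example.com", "Confidential: Q3 plan")
external = OutboundEmail("ann@example.com", "bob@mail.test", "Confidential: Q3 plan")
harmless = OutboundEmail("ann@example.com", "bob@mail.test", "Lunch on Friday?")

for email in (internal, external, harmless):
    print(email.recipient, "allowed" if is_allowed(email) else "denied")
```

Keyword matching is the crudest form of classification; it is used here only to keep the example short.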
The User Approach: Tips for Online Privacy
- 1. Do not reveal personal information recklessly, to unknown, unidentified parties.
- 2. Turn on cookie notices in your Web browser, or use cookie management software.
- 3. Keep a clean e-mail address and employ anti-spam techniques. You may not want to use the same e-mail address for all of your online accounts, desktop and mobile.
- 4. Avoid sending personal e-mails to mailing lists. Separate your work computer from your personal one. Don’t keep sensitive information on your work machine.
- 5. Be a smart online surfer and don’t click on random links. And avoid suspicious content!
- 6. Do not, under any circumstances, reply to spammers.
- 7. Pay close attention to privacy policies, even those of the most legitimate services. Realize that everybody wants your personal information!
- 8. Remember that it’s up to you to decide what details you share about yourself. If a service or app seems too demanding, just don’t use it. There’s a better alternative, for sure.
- 9. Don’t underestimate the importance of encryption!
What Is Data Encryption?
As explained by Heimdal’s Andra Zaharia, encryption is a process that transforms accessible data or information into an unintelligible code that cannot be read or understood by normal means. The encryption process uses a key and an algorithm to turn the accessible data into an encoded piece of information. The cyber security author has also provided a list of 9 free encryption tools to consider.
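The "key plus algorithm" idea can be shown with a toy example. The sketch below uses a one-time pad (XOR with a random key as long as the message), chosen here only because it fits in a few lines of standard-library Python; it is not one of the tools mentioned above, the function names are made up for this example, and everyday use is better served by vetted encryption software.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Create a random key with one key byte per message byte."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with the key; applying it twice restores the original."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = "meet at noon".encode("utf-8")
key = generate_key(len(message))

ciphertext = xor_bytes(message, key)  # unintelligible without the key
recovered = xor_bytes(ciphertext, key).decode("utf-8")

print(ciphertext.hex())
print(recovered)
```

The algorithm (XOR) is public; only the key is secret. A one-time pad key must never be reused for a second message.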
References
https://www.sas.com/en_ph/insights/big-data/what-is-big-data.html
https://www.import.io/post/data-mining-vs-data-collection/
https://searchsqlserver.techtarget.com/definition/data-aggregation
https://www.eff.org/wp/effs-top-12-ways-protect-your-online-privacy