
Big data can serve us in useful ways, but, as we know quite well, the way it works also raises some issues. One of the most fundamental is the protection of personal information. We often say that we are well aware of the consequences of sharing content online, since all data are supposedly “stored” somewhere permanently, but are we really aware of what happens to all those bits of data? Sometimes we delete our browsing history, and we tend to be even more cautious on a public computer because we fear being hacked. However, most of us still do not realize that those bits of data can now be traced back to “re-identify” individuals.

I found an interesting article by Nate Anderson, titled “Anonymized data really isn’t—and here’s why not.” (Link: http://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/) The article covers a few cases in which anonymized data were traced back to re-identify the supposedly “anonymous” individuals. It is not only “creepy” that technology enables us to do such things; it is also a huge threat to privacy. Today, the category of “personal data” is hardly worth distinguishing, because almost any information can become “personal” when combined with enough other relevant bits of data. According to the article, ZIP code, birthdate, and sex are three pieces of information that, taken together, uniquely identify about 87% of Americans.

The tragedy is that there will be no way to “guarantee maximal usefulness and maximal privacy at the same time,” because sharing data indiscriminately and protecting privacy cannot both be achieved; one has to be compromised for the sake of the other. For now, people seem more thrilled than concerned about growing storage capacity and the usefulness and convenience of big data. This issue will only become more significant in the future, and it certainly does not look like one that can be resolved easily.
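
To make the re-identification point a bit more concrete, here is a minimal sketch in Python of why combining harmless-looking fields is the problem. The six records are made up for illustration and describe no real people, and the helper name `unique_fraction` is my own; nothing here comes from the article. The sketch only counts how many rows become one of a kind as more fields are combined.

```python
from collections import Counter

# Made-up records for illustration only; no real people or real data.
records = [
    {"zip": "02138", "birthdate": "1985-03-14", "sex": "F"},
    {"zip": "02138", "birthdate": "1985-03-14", "sex": "M"},
    {"zip": "02138", "birthdate": "1990-07-01", "sex": "F"},
    {"zip": "02139", "birthdate": "1985-03-14", "sex": "F"},
    {"zip": "02139", "birthdate": "1990-07-01", "sex": "M"},
    {"zip": "02139", "birthdate": "1990-07-01", "sex": "M"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose combination of `fields` appears exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows)


if __name__ == "__main__":
    # Each field set adds one more "harmless" attribute to the combination.
    for fields in (["sex"], ["zip", "sex"], ["zip", "birthdate", "sex"]):
        share = unique_fraction(records, fields)
        print(f"{' + '.join(fields)}: {share:.0%} of rows are unique")
```

In this toy table no one is unique by sex alone, but the share of one-of-a-kind rows grows as ZIP code and birthdate are added. A row that is unique on those fields can be matched against any other dataset that carries the same fields along with a name, which is the kind of linking the article describes.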