
Monthly Archives: April 2013

Over the course of the semester, it has come up a few times that our online identity is dissimilar to our real-life person. So when we talk about how invasive big data is: is it really invasive if our online persona isn't really who we are? Or is it the other way around? Is our RL person the fake persona, and our online person who we truly are, the self we don't let anyone see? Maybe this is why we are so afraid of the storage of Big Data: the Internet used to be a place to hide, where our actions had no repercussions, but with the expansion of Big Data (which is presented as something negative or positive depending on who you ask), this is no longer the case. What sites we visit, what we purchase, what movies we watch on Netflix are all stored now, in the form of Big Data. I have mixed feelings about Big Data. On the one hand, it is a little creepy when Facebook advertises companies in the sidebar that are tailored so closely to my likes; on the other hand, it is helpful. That's how I discovered one of my favorite clothing stores. Still, I feel unnerved that there's an online file of everything I've done online, from my days of posting on Neopets forums to late nights spent shopping.

Gitelman and Jackson's piece about the idea of "raw data" struck a significant chord with me this week. I have a variety of thoughts about the study of "big data" – it seems to reveal as much about our society's priorities and beliefs as it does about the specific topics it pertains to.

On one level, data, especially en masse, can reveal much about specific topics. As Gitelman and Jackson specify, of course, this data is always "cooked." Big data removes any possibility of drawing conclusions without data manipulation. Even small amounts of data must be manipulated and analyzed ("cooked") in order to be of any use, but never is this more relevant than with massive amounts of data, when the data must be handled by experts, and when writing code and managing computing power to extract, store, and analyze it is often as herculean a task as the actual research process (identifying an issue, forming a hypothesis, drawing conclusions, and so on).

On a perhaps more interesting level, however, big data reveals much about our society and its processes and values. Many theorists over time (from Ian Hacking to Helen Longino) have explained the ways in which science, considered in much of Western society to be the epitome of objectivity, is closely tied to and affected by social circumstances. In relation to data, specifically, Gitelman and Jackson take this idea even further. They write, “Objectivity is situated and historically specific; it comes from somewhere and is the result of ongoing changes to the conditions of inquiry, conditions that are at once material, social, and ethical” (4). In this context, analyses of science as a socially-affected process are part of a larger pattern of socio-historical “objectivity.” Gitelman and Jackson reach this conclusion through their analysis of data and the ways in which data cannot be “raw” because it is always collected, handled, and reflected upon within a specific cultural and historical context.

In an analytically important move, Gitelman and Jackson take this conclusion even further. By arguing that all knowledge is culturally and historically situated, they are able to expand the very idea of data to include traditionally "subjective" stores of information – literature, art, and other humanities-produced work. This may at first appear to be a somewhat circular logic: by examining the ways in which data is worked on and utilized, the authors are able to expand the very definition of data itself. It is, in my opinion, a very powerful move, because it brings within reach of social analysis ideas and practices considered objective – the sciences, for instance – as well as those areas in the middle of the subjective/objective continuum, such as photorealistic art and photography.

Other miscellaneous thoughts: the authors reference the idea that "numbers are objective" – this reminded me a lot of Bertrand Russell's work in the philosophy of mathematics in his Principia Mathematica, and the way even the most basic principles of mathematics cannot be objectively confirmed. I loved the way these authors use big data, a "hot topic" these days, to tie into theory of science and science/technology studies. It seems to me a strong theoretical move, one that analyzes and makes use of the hype around the topic rather than falling into it.

Are data necessarily objective? How could a simple equation such as 2+2=4 be anything outside the realm of objectivity? Placed in technological terms: can a byte of information exist outside the realm of human subjectivity? Do these examples of data hold universally true, independent of the skews of perception and thought?
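One way to make the worry concrete (my own illustration, not something from the readings): even 2+2=4 does not stand entirely on its own. In a textbook Peano-style setting, where addition is defined by $a+0=a$ and $a+S(b)=S(a+b)$, and the numerals by $2=S(1)$, $3=S(2)$, $4=S(3)$, the equation has to be derived step by step:

$$2+2 \;=\; 2+S(1) \;=\; S(2+1) \;=\; S(2+S(0)) \;=\; S(S(2+0)) \;=\; S(S(2)) \;=\; S(3) \;=\; 4.$$

The result is as secure as anything we have, but it is secure relative to chosen definitions and axioms – which is part of what Russell's Principia project made visible.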

If there is such skepticism surrounding the notion of concrete bytes, imagine the future of data storage and processing when the sheer scale of data reaches into the petabytes and beyond. Indeed, any instability in how data are interpreted will only be amplified when the bytes reach into the quintillions. Is there a limit to such an enormous growth rate of data? Will the "cooking" process occupy such vast resources that the very subjectivity/objectivity of data will be impossible to assess?

Perhaps a de-gridding of contemporary society is required if we are to continue to grow and prosper at such an expansive rate. All of the technological imperialism associated with contemporary data structures might very well lead to a future of unintelligibility. As technology progresses toward a singularity of big data, the prospect of getting "off the grid" seems more and more appealing. It's not that advancing data technology is inherently negative; the unfavorable outcome arises when there are simply too many bytes. Before taking the leap from terabytes to petabytes, a re-analysis of the objective stability of information is needed – before we are overwhelmed by the sheer mass of big data.

In Lisa Gitelman and Virginia Jackson's piece "Raw Data," big data is the topic of discussion. The article states: "At a certain level the collection and management of data may be said to presuppose interpretation. 'Data [do] not just exist,' Lev Manovich explains, they have to be 'generated.' Data need to be imagined as data to exist and function as such, and the imagination of data entails an interpretive base."

This goes back to our discussion about information online and whether this data, or the polls taken about our character and so on, can actually be taken seriously or seen as accurate knowledge. As the quote above shows, large amounts of data are taken and manipulated into an interpretive basis for what we think we know. This data is generated and shaped based on presumed ideas that can affect the big data or the pulled information.

This then turns to the question of what we can actually trust online. Is it the people we can't trust, the large data collectors, or the gathered information that is taken and displayed as genuine?

There are some useful ways big data can serve us, but as we know quite well, there are some issues behind the way it functions. One of the fundamental issues relates to the protection of personal information. We often say that we are well aware of the consequences of sharing content online, since all data are supposedly "stored" somewhere permanently – but are we really aware of what happens to all those bits of data? Sometimes we delete our browsing history, and we tend to be even more cautious when using a public computer because of our fear of hacking. However, we are still ignorant of the fact that bits of data can now be traced backwards to "re-identify" individuals. I found an interesting article by Nate Anderson, titled "Anonymized data really isn't—and here's why not." (Link: http://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/) The article covers a few cases where anonymized data were used to track back and re-identify the supposedly "anonymous" individuals. It is not only "creepy" that technology enables us to do such things; it is also a huge threat to privacy. Today, the term "personal data" is hardly worth distinguishing, because almost any information can become "personal" when combined with enough other relevant bits of data. Apparently, ZIP code, birthdate, and sex are the three bits of information that can uniquely identify about 87% of Americans. The tragedy is that there will be no way to "guarantee maximal usefulness and maximal privacy at the same time," because sharing data indiscriminately and protecting privacy cannot be achieved simultaneously; one has to be compromised to achieve the other. Currently, it seems that people are more thrilled than concerned about growing storage capacity and the usefulness and convenience of big data. This issue will become more and more significant in the future, and it certainly does not look like it can be resolved easily.
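To make that 87% figure less abstract, here is a minimal sketch of how a few "harmless" attributes combine into a nearly unique fingerprint. It is my own illustration, not something from Anderson's article; the DataFrame and its column names are invented for the example.

```python
# Minimal sketch (not from the readings): how a few "harmless" attributes
# combine into a nearly unique fingerprint. The table and column names
# are hypothetical, invented only for illustration.
import pandas as pd

people = pd.DataFrame({
    "name":      ["Alice", "Bob", "Carol", "Dan"],
    "zip":       ["14850", "14850", "10001", "10001"],
    "birthdate": ["1990-03-02", "1990-03-02", "1985-07-19", "1962-11-30"],
    "sex":       ["F", "M", "F", "M"],
})

quasi_identifiers = ["zip", "birthdate", "sex"]

# How many people share each (ZIP, birthdate, sex) combination?
group_sizes = people.groupby(quasi_identifiers).size()

# Combinations held by exactly one person: an "anonymized" dataset that keeps
# these three fields effectively publishes those people's identities, because
# anyone holding a voter roll or similar list can join the tables back together.
unique_combos = group_sizes[group_sizes == 1]
print(unique_combos)
```

In this toy table every combination is unique, which is exactly the point: keeping the quasi-identifiers around makes the "anonymization" reversible, and the intuition behind remedies like k-anonymity is to generalize or suppress those fields until no combination belongs to fewer than k people.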

Big Data

We live in the era of Big Data.

Google and Facebook are two of the most viewed entities on the Internet today. If not for them and their collection of our information, the world as we know it would be completely different. Big data consumes our lives, and we depend on it so much that we could not complete our daily tasks without it. Big data is the phone that you check the time on, the computer that I am currently writing this response on; it's even in the monthly doctor's checkup that you have to attend. As Lisa Gitelman and Virginia Jackson's article "Raw Data" puts it, "Our data isn't just telling us what's going on in the world, it's actually telling us where the world is going."

But imagine if we "spent a day 'off the grid'" (Raw Data, Gitelman and Jackson). You would have to leave your credit and debit cards, bus and train passes, school or work IDs, passport, cell phone, and computer at home. Do you think you could survive? How long do you think you could actually last without the information that big data has given you access to?

Markus Persson, aka Notch, is an internet hero. He created the bafflingly popular game Minecraft, in which people build astonishing things out of an infinite supply of cubes. In June of 2012, he posted a pair of surprising Tweets: surprising because I saw them the same day we, in class, brought back the question: what if the ‘windows’ of our computer screens are actually becoming mirrors?

Personal biases and eager-to-please algorithms mean that more than ever, our net activity is exposing us to things that we already know and like. But even more than that, increasingly, we define ourselves by the activities on our screens. What does it mean to be a "gamer"? Why does Persson – an individual whose fame and fortune have been made by offering the internet's denizens an infinite playground of creativity and ingenuity – find himself doubting the very world in which he has found so much success? Wark would say that we are all gamers, playing in our own reality gamespace. However, that doesn't really address the question of the people who take on gaming as an identity. Is gaming simply, as Wark and others seem to believe, a way of anesthetizing ourselves from the truth of the world?

Much of this course has focused on the “Delightful Creepiness” of the Internet and new media. However, I find that conclusion unsatisfying. Persson’s second Tweet is tongue-in-cheek, but I believe it can be interpreted as salient. It is in the quiet, dark moments of our life that we doubt ourselves. And in the words of Shakespeare, “Our doubts are traitors, and make us lose the good we oft might win, by fearing to attempt.” The Internet is a new frontier, as infinite as the endlessly self-generating map of a Minecraft server; like the universe itself, it expands as soon as you reach the border. To treat the Internet with trepidation and paranoia is to limit oneself from exploring its possibilities.

For my final blog post, I would like to look back at the Matrix and examine it in terms of the readings that we have done since we watched it in class.  One of the most appropriate writings is Manovich’s navigable space.  The Matrix is filled with the idea of navigating through a simulation.  In fact, that is one of the core themes of the film.  The entire world that most people know is a series of simulations that they can navigate so freely that they have no clue that they are in a simulated environment.  If the simulation was not this perfect, then humanity would realize that something was wrong and rebel.  When the protagonists enter the Matrix, they are able to enter the main simulation, but also travel between simulations.  This is the one remnant of the system that is more akin to browsing the internet.  Instead of being able to walk from one simulation to another, they tell their computer operator to switch them from one simulation to another.

The second idea that I would like to link to the Matrix is Keenan's idea of windows and exposure. There are a couple of places in the Matrix that really relate to this idea. The first is the Matrix itself. When the characters are in the code, the initial thought is that they are in their own world and are experiencing it without truly being at risk. However, it is later revealed that not only are they visible to the computer operator, but they are also visible to the malicious programs that are trying to hunt them down. The second instance of a change in perception is the scene where Morpheus explains the history of the Matrix. The description of the "real world" ends when the camera zooms out of the TV where the explanation has been happening. It turns out that, while you initially think of the explanation as looking at the world, you were actually looking through the window of a screen. Instead of the harsh exposure you feel as the machines swirl around you, you are in a sterile room, protected by the television that keeps the monsters separated from you.

“Raw Data” Is An Oxymoron talks about Big Data being applied to different disciplines:

“Every discipline and disciplinary institution has its own norms and standards for the imagination of data, just as every field has its accepted methodologies and its evolved structures of practice” (3).

As a data scientist working in data analytics for online education, I can relate to this statement. At work, I construct systems and algorithms that condense real-world data into standardized data objects, which interact with other objects in data structures. In the case of online education, I need to structure data about students answering questions online into discretized objects representing item responses.

While doing so, I need to make sure that what I'm building is consistent with learning theory. I read literature written by non-statisticians and build data models analogous to the models presented in that literature; because online education data analytics is such a new field, there is no norm to follow. How raw responses are transformed into structured data – how data is imagined – becomes extremely important to how we do statistical inference.
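As a rough sketch of what that "imagining" step can look like in practice – the event format, field names, and scoring choices below are hypothetical illustrations, not the actual pipeline described above – raw answer events might be condensed into one discrete item-response record per student and question:

```python
# Rough sketch of turning raw answer events into discrete item-response records.
# The event format and field names are hypothetical, not an actual system.
from dataclasses import dataclass

@dataclass
class ItemResponse:
    student_id: str
    item_id: str
    correct: bool   # raw answers collapsed to a binary score
    attempts: int   # how many tries preceded the scored response

def to_item_responses(raw_events):
    """Condense a stream of raw answer events into one record per (student, item).

    Every choice here (last attempt wins, correctness is binary, attempts are
    counted) is an interpretive decision baked into the "data" that downstream
    statistical models will treat as given.
    """
    latest = {}
    counts = {}
    for event in raw_events:  # each event: dict with student, item, answer, answer key
        key = (event["student_id"], event["item_id"])
        counts[key] = counts.get(key, 0) + 1
        latest[key] = ItemResponse(
            student_id=event["student_id"],
            item_id=event["item_id"],
            correct=(event["answer"] == event["answer_key"]),
            attempts=counts[key],
        )
    return list(latest.values())

# Example with two hypothetical events for the same student and question:
responses = to_item_responses([
    {"student_id": "s1", "item_id": "q1", "answer": "B", "answer_key": "C"},
    {"student_id": "s1", "item_id": "q1", "answer": "C", "answer_key": "C"},
])
# -> [ItemResponse(student_id='s1', item_id='q1', correct=True, attempts=2)]
```

The point of the sketch is that none of these conventions is dictated by the raw events themselves; they are exactly the kind of disciplinary "imagination of data" the quoted passage describes.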

When considering the phenomenon of Big Data, I think it is important to think about its practical implications for reality and for the Internet. Some would argue, proverbially, that globalization has in fact shrunk the world. Does the expansion of data output in the form of Big Data similarly shrink the Internet, in the sense that the amount of relevant and useful data becomes proportionally smaller in comparison to the vast surplus of data? In this sense, there seems to be a simultaneous expansion and contraction that parallels the apparent reality of the Small World. The rhetoric of networks, in which Big Data necessarily operates, enables this kind of contrived idea because networks represent connections and relationships between smaller nodes of data. Following the concept of the network, there are shortcomings in the function of Big Data, because quantifying relationships according to electronic information is in some sense dehumanizing. Is it ethical to aggregate information on the Internet in order to construct a profile of a user? As much as postmodernist ideologies emphasize the deconstruction of social constructs, aggregating data in the form of Big Data appears to solidify the categorization and organization of persons with the backing of algorithmic functions and objectivity. Especially in this alleged Digital Age, when empiricism directs most reason, does the prospect of data mining (mediating electronic qualitative and quantitative observation) offer a perfect kind of empiricism that is still ethically applicable to the human experience?