One thing that frustrates me about discussions of “big data” is that they often take for granted that big data solutions (especially to questions of consumer preference) are universally effective. To me, they often seem quite terrible: Greg Linden’s characterization of early Amazon recommendations as “going shopping with the village idiot” doesn’t feel that far from the present. For example, I recently bought some tupperware on Amazon. Now my Amazon homepage recommends that I buy 10 different kinds of tupperware, because apparently I am a tupperware-hoarding monster. Facebook’s personalized ads seem equally silly (either directly related to a page I “like”, or seemingly random). These often seem like over-engineered solutions, caught up in hype and the techno-utopian idea that “if we have enough data, the answer will become clear!” The Mayer-Schönberger and Cukier reading gets at this too: is it surprising that the frequency of previous incidents and cable age are the strongest indicators of which manholes might explode? So, my first question is: will big data make us dumb (or has it already)? If we worry more about the what than the why, and use big data as the proverbial hammer that makes all of our problems start to look like nails, what are we losing?

On a slightly different note, one thing that makes me so uncomfortable about big data is its constant use of classification. Machine learning — responsible for so many of the algorithms we think of as “big data” — is most often about categorization: the emergence of categories and clusters from a data set in “unsupervised” learning, or the fitting of data to a set of predefined and labeled categories in “supervised” learning. Of course, even in the unsupervised mode, these categories never simply “emerge” — they depend on the features we use to describe our data, which come with certain assumptions about what might be important about it. We can classify consumers into different groups based on their preferences, and this might help some corporation sell more products, but these categories all require a mode of thinking that normalizes some behaviors and marginalizes others. The “urban tribes” project from UCSD’s computer vision program is a great example of this trend (and very much an instance of the big-data-as-hammer logic): why not just categorize people based on their looks? The blatant potential for work like this to reinforce damaging stereotypes almost goes without saying, but hey, look what big data can do!

Fast Company’s response is pretty good.
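
To make the earlier point about features a bit more concrete, here is a minimal sketch of how “unsupervised” groups depend on what we choose to measure. Everything in it is made up for illustration: the four shoppers, their numbers, and the `two_groups` helper, which simply splits people at the largest gap in a single feature as a crude stand-in for a real clustering algorithm.

```python
# Hypothetical shoppers, invented for this example:
# (name, monthly spend in dollars, share of orders that are kitchenware)
shoppers = [
    ("ana", 20, 0.9),
    ("ben", 25, 0.1),
    ("cat", 200, 0.8),
    ("dan", 220, 0.2),
]


def two_groups(people, feature_index):
    """Split people into two groups at the largest gap in one feature.

    This is a toy stand-in for clustering, not a real library routine.
    """
    ranked = sorted(people, key=lambda p: p[feature_index])
    # Distance between each pair of neighbours along the chosen feature.
    gaps = [
        ranked[i + 1][feature_index] - ranked[i][feature_index]
        for i in range(len(ranked) - 1)
    ]
    # Cut right after the neighbour pair with the widest gap.
    cut = gaps.index(max(gaps)) + 1
    low = {p[0] for p in ranked[:cut]}
    high = {p[0] for p in ranked[cut:]}
    return low, high


# Same people, two different descriptions of them.
by_spend = two_groups(shoppers, 1)
by_kitchenware = two_groups(shoppers, 2)

print("grouped by spend:      ", by_spend)
print("grouped by kitchenware:", by_kitchenware)
```

Described by spend, ana and ben end up together; described by kitchenware, ana is grouped with cat instead. The data never changed, only the decision about which feature counts, and that decision is where the assumptions come in.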