On Tuesday, November 6, 2018 from 3 to 4 p.m., Ashley Champagne, Digital Humanities Librarian, will offer a workshop titled “Misconceptions of Data: Thinking Critically About Data.” Part of the Reading, Resisting, and Reimagining The Map series, the workshop will take place in the Digital Scholarship Lab at the Rockefeller Library. The workshops are free and open to the public.
Thinking Critically About Data
Despite our increasingly digital world, data sets on all kinds of topics are missing, limited, and misunderstood. Mimi Onuoha uses the term “missing data sets” for “the blank spots that exist in spaces that are otherwise data-saturated.” She documents a series of questions that have no answers. Questions like, “How many people have been excluded from public housing because of criminal records?” are impossible to answer because the relevant data is incomplete, unreliable, or missing. And even when data sets exist, they may not be publicly accessible.
The team behind the Torn Apart / Separados project ran into this lack of data when they asked where children were living after they were separated from their parents due to Donald Trump’s “zero tolerance” immigration policy in 2018. Public discourse surrounding the crisis focused on how Immigration and Customs Enforcement (ICE) officials held children at the United States/Mexico border. But the Torn Apart / Separados map, built on data that the team rapidly collected, analyzed, and published, tells a different story. ICE centers holding children separated from their parents are spread across the country: not just along the border, but in the middle of the United States and everywhere in between.
The Torn Apart / Separados team were thankfully able to collect the data they needed, but for certain research questions there is little quantitative data to gather. In such cases, qualitative data can illuminate areas of study where quantitative data is limited or impossible to gather. The population size of transgender individuals in the United States, for example, isn’t well known, partly because there isn’t a lot of data on gender identity. One way to learn something about questions that do not have clear answers is to collect qualitative data, like articles that include the word “transgender,” and explore that data through text mining. Text mining offers the researcher the ability to find patterns and themes in large corpora.
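To make the idea of text mining concrete, here is a minimal sketch in Python. It is only an illustration, not the method used in the workshop: the three sample sentences, the short stopword list, and the function names tokenize and term_frequencies are all made up for this example. It counts which words appear most often in the articles that mention a chosen keyword.

```python
import re
from collections import Counter

# Hypothetical stand-in corpus; a real one would hold full-text articles.
ARTICLES = [
    "Transgender students spoke about access to health care on campus.",
    "The city council debated health care funding and housing policy.",
    "Advocates say transgender residents face barriers to housing and health care.",
]

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "to", "and", "on", "about", "say", "a", "of", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(articles, keyword):
    """Count words in the articles that mention keyword, skipping stopwords."""
    counts = Counter()
    for article in articles:
        tokens = tokenize(article)
        if keyword in tokens:
            counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


if __name__ == "__main__":
    counts = term_frequencies(ARTICLES, "transgender")
    # Print the five most frequent words alongside their counts.
    for word, n in counts.most_common(5):
        print(word, n)
```

Simple word counts like these are only a starting point, but on a large corpus they can hint at the themes that tend to surround a term.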
One of the ways the Torn Apart / Separados team went about collecting the data was by using Application Programming Interfaces (APIs). At the Center for Digital Scholarship in the Brown University Library, we teach workshops on everything from data literacy to text analysis to thinking critically about data. On behalf of our center, I’m offering a workshop to explore how to use an API to collect full-text articles to create a dataset.
APIs often return only limited information, such as web URLs, keywords, titles, and sometimes other metadata. They will get researchers part of the way to collecting a qualitative dataset, but not the whole way there. From the initial API data, however, we can use web scraping software to gather full-text articles. There will always be missing data sets, but beginning to collect data to find answers to our questions is a good start.
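The two-step process described above, API metadata first and full text second, can be sketched in Python as follows. This is a hedged illustration rather than the workshop's actual code: the record fields ("title", "url"), the example.org addresses, and the names ParagraphExtractor, extract_text, and build_dataset are assumptions made for the example. The page download is passed in as a fetch function so that the sketch runs without a network connection.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._current = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._current.append(data)


def extract_text(html):
    """Return the paragraph text of an HTML string, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


def build_dataset(records, fetch):
    """Pair each API record's metadata with the full text found at its URL."""
    dataset = []
    for record in records:
        html = fetch(record["url"])
        dataset.append({
            "title": record["title"],
            "url": record["url"],
            "text": extract_text(html),
        })
    return dataset


if __name__ == "__main__":
    # Hypothetical API results: metadata only, no article body.
    records = [{"title": "Sample article", "url": "https://example.org/a1"}]
    # Stand-in for downloaded pages, so no network access is needed.
    pages = {
        "https://example.org/a1":
            "<html><body><nav>Menu</nav><p>First <b>paragraph</b>.</p>"
            "<p>Second paragraph.</p></body></html>",
    }
    for row in build_dataset(records, pages.get):
        print(row["title"])
        print(row["text"])
```

In practice, the fetch function would download each page, for example with urllib.request.urlopen from the Python standard library, and it is worth checking a site's terms of use before scraping it.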
Digital Humanities Librarian
Brown University Library
Date: Tuesday, November 6, 2018
Time: 3 – 4 p.m.
Location: Patrick Ma Digital Scholarship Lab, Rockefeller Library, 10 Prospect Street, Providence, RI