Statistical Significance: How Big Data is Changing the Way We See the World

Craig Fugate, director of the Federal Emergency Management Agency, once said of predicting disasters: “Disasters are like horseshoes, hand grenades and thermonuclear devices; you just need to be close – preferably more than less.”
Luckily, the information age has given organizations around the world a better way to understand, and even begin to predict, complex phenomena. This new approach combines traditional statistical analysis with the explosion of crowdsourced information now transmitted over phone networks and the internet: Twitter posts, Facebook status updates, blog posts, text messages, Amazon purchase histories, and more. A recent white paper by UN Global Pulse dubs this “Big Data.”

Although huge datasets have existed before, Big Data is unusual in being both current enough and broad enough for decision-makers to base policy and operational decisions on it as issues unfold. Used effectively, Big Data is an excellent tool: it helped save lives after the 2010 Haiti earthquake, it is being mined by the intelligence community to anticipate future trends, and it may even find its way into US counterinsurgency doctrine. However, like all human-sourced data, Big Data is notoriously fickle and prone to misinterpretation. With some NGOs, businesses, and governments working tirelessly to harness its power while others remain far more reluctant, it is worth considering both sides of this new venture.

What does Big Data do for us?

Access to millions of records collected from many sources can yield insights that would previously have been impossible. One early application of large-scale statistical analysis came during the war crimes trial of Slobodan Milosevic. There, analysis of large datasets was used to show that surges in Serbian troop movements, not NATO bombings or Albanian guerrilla attacks as Milosevic claimed, were the primary cause of Albanians fleeing their homes.

Since then, the information age has opened up far greater opportunities. Google has successfully correlated specific web search terms with outbreaks of illnesses such as dengue fever and the flu, often before the Centers for Disease Control and Prevention or other health agencies have officially confirmed an outbreak. The method counts how many people search for terms such as “dengue” on Google’s search engine; based on how many such searches occur in a specific area within a narrow span of time, Google can estimate the likelihood that an outbreak has occurred. In the aftermath of the Haiti earthquake, disaster relief NGOs implemented a nationwide SMS campaign that allowed anyone with a cell phone to report the location of injured or trapped people via a simple text message. The density and content of these messages were plotted on a map, updated in real time, and used to decide how best to deploy resources.
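To make the search-volume idea concrete, here is a minimal Python sketch of one simple way to flag an unusual jump in searches. It is not Google’s actual model, which the article does not describe; the function name `flag_spikes`, the four-week baseline window, the threshold multiplier `k`, and the weekly counts are all hypothetical. A week is flagged when its count exceeds the mean of the preceding weeks by more than `k` standard deviations.

```python
from statistics import mean, stdev


def flag_spikes(weekly_counts, window=4, k=2.0):
    """Hypothetical illustration: return indices of weeks whose search count
    exceeds the mean of the preceding `window` weeks by more than k standard
    deviations. `window` must be at least 2 so stdev() is defined."""
    flagged = []
    for i in range(window, len(weekly_counts)):
        baseline = weekly_counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # Floor sigma at 1.0 so a perfectly flat baseline does not flag
        # every tiny uptick.
        threshold = mu + k * max(sigma, 1.0)
        if weekly_counts[i] > threshold:
            flagged.append(i)
    return flagged


# Made-up weekly counts of searches for "dengue" in one region.
counts = [40, 42, 38, 41, 39, 43, 95, 120, 60]
print(flag_spikes(counts))  # prints [6, 7]
```

With these made-up counts, the sketch flags the two surge weeks (indices 6 and 7). Once a spike enters the baseline window it inflates the threshold for later weeks, so a rolling baseline this simple is easily distorted and should be read only as an illustration of the idea.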

However, some innate characteristics of Big Data require analysts to use caution when extrapolating from it to larger populations. For instance, data drawn from newer technologies tends to carry age and education biases, because younger and more educated people use these services disproportionately. Researchers in Haiti also found that data could be corrupted by poor reporting, for example when an individual’s perception of an event does not reflect what is actually happening, or when multiple individuals give conflicting reports of the same event. There are ways to mitigate these issues, such as requiring individuals to verify their reports with a picture or giving more weight to previously vetted, “trusted” sources, but these add to the overall cost of crowdsourcing and make it more difficult to draw inferences from the data.
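Giving more weight to trusted sources can be illustrated with a short sketch. It assumes each report is a hypothetical (location, status, source type) tuple and that vetted sources count three times as much as unverified ones; that weight is an arbitrary choice for illustration, not a figure from the Haiti response. At each location, the status with the largest total weight wins.

```python
from collections import defaultdict

# Hypothetical trust weights: vetted ("trusted") sources count for more.
TRUST_WEIGHT = {"trusted": 3.0, "unverified": 1.0}


def resolve_reports(reports):
    """Pick the status with the highest total weight at each location.

    `reports` is a list of (location, status, source_type) tuples.
    Unknown source types fall back to a weight of 1.0.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for location, status, source_type in reports:
        scores[location][status] += TRUST_WEIGHT.get(source_type, 1.0)
    # For each location, keep the status whose summed weight is largest.
    return {loc: max(statuses, key=statuses.get) for loc, statuses in scores.items()}


# Made-up, conflicting reports about two hypothetical streets.
reports = [
    ("Rue A", "collapsed", "unverified"),
    ("Rue A", "standing", "unverified"),
    ("Rue A", "collapsed", "trusted"),
    ("Rue B", "standing", "trusted"),
    ("Rue B", "collapsed", "unverified"),
    ("Rue B", "collapsed", "unverified"),
]
print(resolve_reports(reports))  # prints {'Rue A': 'collapsed', 'Rue B': 'standing'}
```

Here one trusted report at Rue B (weight 3) outweighs two unverified reports (weight 2). Choosing the weights is itself a judgment call, which is part of the added cost of vetting sources.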

Nor does the mere presence of data guarantee that researchers can draw the conclusions they want from it. For instance, Google faced analytical problems when it used web search histories to track influenza outbreaks for HealthMap. The search terms it originally used to identify flu victims (coughing, runny nose, sneezing, etc.) were symptoms of so many illnesses that Google could not accurately identify a flu case without also controlling for several other attributes.

Similarly, NGOs in Haiti ran into unexpected trouble when they tried to infer the locations of damaged buildings from the density of text messages or Twitter posts reporting people trapped inside. This information was skewed because affected people tended to wait until they reached a rescue shelter before posting on Twitter, so reports clustered around shelters rather than around damaged buildings. In fact, post-disaster analysis indicated that NGOs would have had more accurate information on the likely locations of damaged buildings simply by looking at maps of the affected towns.

Here to stay

Despite some stumbling blocks, Big Data and statistical analysis are playing an increasingly large role in policymaking. US intelligence analysts have noted that in some cases, Big Data and crowdsourced intelligence outperform traditionally gathered human intelligence. Both DARPA and IARPA (the research arms of the defense and intelligence communities, respectively) are actively exploring crowdsourcing as a way to solve policy problems and as an untapped source for understanding national and regional trends around the world. Even in Afghanistan, a rapidly expanding mobile network will allow previously isolated communities to connect with the world and generate Big Data of their own, an opportunity the US military is actively looking to leverage in its ongoing counterinsurgency campaign.

Big Data is the hot new thing in informatics, and its implications are still being explored. Concerns about privacy still loom large, as do the technical and legal challenges of acquiring, analyzing, and disseminating this information across sectors of government and industry. Hopefully, Big Data will become part of everyday life and, in time, open up new avenues for people to relate to one another.

About the author

Jason Kumar is a Master of Public Policy candidate at Georgetown University. Currently an intern at the Woodrow Wilson International Center for Scholars’ Science & Technology Innovation Program, he previously worked in supply-chain consulting and has a background in industrial engineering. He is interested in the effects of new technologies and tactics on national security policy.
