Image source: manrepeller

It has been over two weeks since the world witnessed the most dramatic presidential election in recent history. It has also been two weeks since most people collectively lost their trust in poll data, and in the numbers game in general. The polling charts, which for months before election day had predominantly predicted a win for Clinton and her party (with leads averaging between 1% and 7%), started plunging as counting concluded in some of the key swing states. So what happened on Nov 8 that caught out even the staunchest data experts and pundits? Has data started to grow a mind of its own by inheriting its masters’ cognitive biases?

Before we dive into comparisons, explanations and lessons, it is important to understand that the outcome of the Electoral College system the United States uses to elect its president depends on a variety of socio-economic parameters. Therefore, no amount of data mining and processing could serve up accurate projections on a platter. And to be fair to the pollsters, not all of them ruled out a Trump presidency, and some of their predictions did come true: Hillary Clinton led the President-elect by 1.7 million votes in the nationwide popular vote. Either way, the results will serve as a “Data Analytics 101” course not just for data scientists but also for other industries that look to “Big Data” to make their business decisions.
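
To see why a popular-vote lead need not translate into a win, here is a toy Python sketch of winner-take-all allocation, which almost every state uses (Maine and Nebraska split their electors by congressional district). The three states and all vote counts are invented for illustration; they are not real results.

```python
from typing import Dict, Tuple

# Hypothetical states: name -> (electoral votes, votes for A, votes for B).
# All numbers are invented to illustrate the mechanism, not real results.
HYPOTHETICAL_STATES: Dict[str, Tuple[int, int, int]] = {
    "State X": (10, 6_000_000, 3_000_000),  # A wins by a landslide
    "State Y": (8, 2_400_000, 2_500_000),   # B wins narrowly
    "State Z": (8, 2_400_000, 2_500_000),   # B wins narrowly
}


def tally(states: Dict[str, Tuple[int, int, int]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (popular vote totals, electoral vote totals) under winner-take-all."""
    popular = {"A": 0, "B": 0}
    electoral = {"A": 0, "B": 0}
    for electors, votes_a, votes_b in states.values():
        popular["A"] += votes_a
        popular["B"] += votes_b
        # Winner-take-all: the state's plurality winner takes every elector.
        # (Exact ties would go to B here; real tie rules are ignored.)
        winner = "A" if votes_a > votes_b else "B"
        electoral[winner] += electors
    return popular, electoral


popular, electoral = tally(HYPOTHETICAL_STATES)
print("popular:", popular)
print("electoral:", electoral)
```

In this made-up example, A wins the popular vote by 2.8 million yet loses the electoral count 10 to 16, because A's surplus votes in State X earn nothing extra.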

Polling Models

During the presidential election season in the US, polling data starts pouring in from the day representatives of both parties announce their candidacy. A trusted pollster typically builds its polling average for a given geography from 5 to 7 polls, each surveying around 1,000 respondents. For example, The Huffington Post’s pollster model, which gave Hillary a 98% chance of winning, is based on a Bayesian Kalman filter that runs 100,000 simulations on data from at least 5 polls every day, and it reports its results as a trend line rather than as a simple average of the polling data.
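
The article does not detail the HuffPost model, so what follows is only a toy Python sketch of the two ideas it names: a one-dimensional Kalman filter that tracks a hidden “true lead” as a random walk and updates it with each day’s polls, followed by Monte Carlo simulation that treats a positive lead as a win. The drift variance `q`, the per-poll variance `r` (about 3 points of error, roughly that of a 1,000-person poll), the prior, and the poll numbers are all assumptions.

```python
import random
from statistics import NormalDist

# Hypothetical daily polls of one candidate's lead, in percentage points.
HYPOTHETICAL_POLLS = [[4.0, 2.0, 6.0], [3.0, 5.0], [1.0, 4.0, 3.0]]


def kalman_local_level(daily_polls, q=0.25, r=9.0, x0=0.0, p0=100.0):
    """Filter a random-walk 'true lead' from noisy polls.

    q: assumed day-to-day drift variance of the true lead (points^2)
    r: assumed sampling variance of a single poll (points^2)
    x0, p0: vague prior, a lead of 0 with a very large variance
    Returns the filtered estimate and its variance after the last poll.
    """
    x, p = x0, p0
    for polls in daily_polls:
        p += q  # predict step: the true lead may drift overnight
        for y in polls:
            k = p / (p + r)   # Kalman gain: how much to trust the new poll
            x += k * (y - x)  # update step: move toward the observed lead
            p *= 1 - k        # the estimate's variance shrinks after each poll
    return x, p


def simulate_win_probability(x, p, n_sims=100_000, seed=42):
    """Monte Carlo: share of simulated leads drawn from N(x, p) that are positive."""
    rng = random.Random(seed)
    sd = p ** 0.5
    wins = sum(1 for _ in range(n_sims) if rng.gauss(x, sd) > 0)
    return wins / n_sims


estimate, variance = kalman_local_level(HYPOTHETICAL_POLLS)
win_prob = simulate_win_probability(estimate, variance)
analytic = 1 - NormalDist(estimate, variance ** 0.5).cdf(0)  # closed-form check
print(f"filtered lead {estimate:.2f} pts (sd {variance ** 0.5:.2f})")
print(f"simulated win probability {win_prob:.3f}, analytic {analytic:.3f}")
```

Because every poll is treated as an independent reading of the same truth, the variance shrinks with each poll and the simulated win probability climbs toward certainty; the sketch has no term for an error that all polls share.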

By contrast, and to the bemusement of conventional pollsters, Allan Lichtman, a political historian, had accurately predicted Trump’s presidency (and the results of elections for three decades) well ahead of election day with a model that is independent of survey data. According to Lichtman, “Polls are not predictions. They are snapshots and they are abused and misused as though they are predictions”. He designed a 13-key model built on factors that depend more on the country’s current political climate and on the season’s third-party candidacies than on everyday controversies or the policies put forth by the candidates.
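
Lichtman’s published rule is simple to state: each key is a true/false statement that favors the party holding the White House, and that party is predicted to lose when six or more keys are false. A minimal Python sketch of the decision rule follows; the key labels are paraphrased, and the True/False values are placeholders, not Lichtman’s own 2016 assessment.

```python
# Paraphrased labels for the 13 keys. True means the statement favors the
# party currently holding the White House. Values are placeholders only.
PLACEHOLDER_KEYS = {
    "gained_house_seats_in_midterms": False,  # party mandate
    "no_serious_primary_contest": True,
    "sitting_president_is_running": False,
    "no_significant_third_party": False,
    "strong_short_term_economy": True,
    "long_term_growth_at_least_recent_average": True,
    "major_national_policy_change": False,
    "no_sustained_social_unrest": True,
    "no_major_scandal": True,
    "no_major_foreign_military_failure": True,
    "major_foreign_military_success": False,
    "charismatic_incumbent_party_candidate": False,
    "uncharismatic_challenger": True,
}

FALSE_KEYS_TO_LOSE = 6  # threshold in Lichtman's system


def predict(keys: dict) -> str:
    """Return which party the keys favor: six or more false keys means a loss."""
    if len(keys) != 13:
        raise ValueError("the model uses exactly 13 keys")
    false_count = sum(1 for value in keys.values() if not value)
    return "challenging party" if false_count >= FALSE_KEYS_TO_LOSE else "incumbent party"


print(predict(PLACEHOLDER_KEYS))
```

With these placeholder values exactly six keys are false, so flipping any one of them to True drops the count to five and reverses the prediction, which shows how much weight each judgment call carries.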

Data Vs Analytics

This is not the first time in history that poll data has been misread, but studies are already underway to understand fully what went wrong in 2016. Dr. Pradeep Mutalik, a research scientist at Yale University, points the finger at traditional polling models for neglecting uncertainty beyond their margins of error, and at the lack of substantial empirical data with which to validate their probability models.
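
The margin of error that pollsters report covers sampling error only. Under the textbook assumption of a simple random sample, it comes from the binomial formula z × sqrt(p(1 − p)/n), and the margin on the lead between two candidates measured in the same sample is wider still. A short Python sketch using the 1,000-respondent poll size mentioned earlier (the 46% vs 44% split is hypothetical):

```python
from math import sqrt

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def share_moe(p: float, n: int, z: float = Z_95) -> float:
    """Margin of error for one candidate's share (simple random sample)."""
    return z * sqrt(p * (1 - p) / n)


def lead_moe(p1: float, p2: float, n: int, z: float = Z_95) -> float:
    """Margin of error for the lead p1 - p2 when both shares come from one sample.

    Var(p1_hat - p2_hat) = (p1 + p2 - (p1 - p2) ** 2) / n under multinomial sampling.
    """
    return z * sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


n = 1000
print(f"share margin: +/-{100 * share_moe(0.5, n):.1f} pts")
print(f"lead margin (46% vs 44%): +/-{100 * lead_moe(0.46, 0.44, n):.1f} pts")
```

The worst case p = 0.5 gives the familiar ±3.1 points, while the 2-point lead sits well inside its roughly ±5.9-point margin. None of this accounts for non-sampling errors such as who declines to answer.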

While data sampling models were designed to minimize noise and maximize the inclusivity of polls to sharpen their predictions, reality was beyond their reach this election season. The key contrast between the models that succeeded and those that failed is that the former read the pulse of the nation along with its data; they understood the difference between accurate data and actual analytics.
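
Minimizing noise is not the same as removing error. The Python sketch below, with invented numbers, shows why averaging many polls sharpens an estimate without correcting it: independent sampling noise shrinks roughly as 1/sqrt(k), while an error shared by every poll survives intact. The true lead, shared bias and noise level are all assumptions.

```python
import random
from statistics import mean, pstdev

# All three numbers are hypothetical, in percentage points.
TRUE_LEAD = 1.0    # the lead that actually exists in the electorate
SHARED_BIAS = 2.0  # an error common to every poll (e.g. who agrees to answer)
NOISE_SD = 3.0     # independent sampling noise of a single poll


def poll_average(k: int, rng: random.Random) -> float:
    """Average k simulated polls that share the same bias but have independent noise."""
    return mean(TRUE_LEAD + SHARED_BIAS + rng.gauss(0, NOISE_SD) for _ in range(k))


def error_profile(k: int, reps: int = 1000, seed: int = 7):
    """Return (mean error, spread of error) of a k-poll average over many repetitions."""
    rng = random.Random(seed)
    errors = [poll_average(k, rng) - TRUE_LEAD for _ in range(reps)]
    return mean(errors), pstdev(errors)


for k in (1, 5, 50, 200):
    avg_error, spread = error_profile(k)
    print(f"{k:>3} polls: mean error {avg_error:+.2f} pts, spread {spread:.2f} pts")
```

The spread falls roughly as NOISE_SD / sqrt(k), but the mean error stays near +2 points no matter how many polls are averaged.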

Big Data and Business Decisions

As the analyst community wraps its head around this data debacle, analysts have admitted that predictive analytics, especially in election forecasting, is still a “young science”. This admission sends a clear message to modern political observers and business decision-makers alike to rethink the way they approach big data and analytics. Some primary takeaways include:

  • Polls don’t define opinions: Several post-election analyses point to the Bradley effect (prospective voters withholding their honest answer out of social desirability bias, i.e. giving the answer they expect to be judged more favorably) as a reason for the sudden surge of Trump supporters at the polling booths. While this theory is yet to be verified, it illustrates the point that data without solid ground-level analysis is not invincible.
  • Harvest data to suit the climate: In surveys conducted predominantly through online forms and telephone calls, pollsters have admitted that voters responded differently depending on their location; in states that gave Trump a wide margin of victory, his supporters were more vocal about their support, while his supporters in swing states (especially women) were more comfortable responding online. This implies that decision-makers cannot rely on a single, unrepresentative channel to understand the pulse of their entire population.
  • Never underestimate the power of a dissatisfied population: Last but not least, this key takeaway has little to do with data but speaks volumes about the majority’s anger at everything mainstream. This anger has pushed a considerable part of the population to reject public opinion polls and, as a result, not take part in them. If there is one thing business owners should carry with them forever, it is to prioritize and listen to their angry customers more often and, most importantly, do something about it.
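
One standard way to avoid leaning on a single channel is post-stratification: compute support within each channel, then weight each channel by its assumed share of the target population rather than by how many people happened to respond through it. A minimal Python sketch; the channel split, the population shares and every count are hypothetical.

```python
# Hypothetical survey results by channel: channel -> (respondents, supporters).
SAMPLE = {
    "online": (300, 165),  # 55% support among online respondents
    "phone": (700, 294),   # 42% support among phone respondents
}

# Assumed share of the target population that each channel represents.
POPULATION_SHARE = {"online": 0.5, "phone": 0.5}


def raw_support(sample: dict) -> float:
    """Pool every respondent, so the over-sampled channel dominates."""
    respondents = sum(n for n, _ in sample.values())
    supporters = sum(s for _, s in sample.values())
    return supporters / respondents


def poststratified_support(sample: dict, population_share: dict) -> float:
    """Weight each channel's support rate by its assumed population share."""
    assert abs(sum(population_share.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(population_share[ch] * s / n for ch, (n, s) in sample.items())


print(f"raw: {raw_support(SAMPLE):.1%}")
print(f"weighted: {poststratified_support(SAMPLE, POPULATION_SHARE):.1%}")
```

Here the raw figure is 45.9% because phone respondents dominate the sample, while the weighted figure is 48.5%. Weighting only helps if the assumed population shares are right, and it cannot recover people who refuse to take part at all.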