Listen "US Election Special"
Episode Synopsis
What exciting data science problems emerge when you try to forecast an election? Many, it turns out!

We're very excited to turn our DataCafé lens on the current Presidential race in the US as an exemplar of statistical modelling right now. Typically state election polls are asking around 1000 people in a state of maybe 12 million people how they will vote (or even if they have voted already) and return a predictive result with an estimated polling error of about 4%.

In this episode, we look at polling as a data science activity and discuss how issues of sampling bias can have dramatic impacts on the outcome of a given poll. Elections are a fantastic use-case for Bayesian modelling where pollsters have to tackle questions like "What's the probability that a voter in Florida will vote for President Trump, given that they are white, over 60 and college educated?"

There are many such questions, as each electorate feature (gender, age, race, education, and so on) potentially adds another multiplicative factor to the size of demographic sample needed to get a meaningful result out of an election poll.

Finally, we even hazard a quick piece of psephological analysis ourselves and show how some naive Bayes techniques can at least get a foot in the door of these complex forecasting problems. (Caveat: correlation is still very important and can be a source of error if not treated appropriately!)

Further reading:
Article: Ensemble Learning to Improve Machine Learning Results (https://bit.ly/34MW3HO via statsbot.co)
Paper: Combining Forecasts: An Application to Elections (https://bit.ly/3efx5nm via researchgate.net)
Interactive map: Explore The Ways Trump Or Biden Could Win The Election (https://53eig.ht/2TIlAvh via fivethirtyeight.com)
Podcast: 538 Politics Podcast (https://53eig.ht/2HSkwCA via fivethirtyeight.com)
Update US polling map: Consensus Forecast Electoral Map (https://bit.ly/2HY1FWk via 270towin.com)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 30 October 2020

Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
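As a rough illustration of the polling numbers mentioned in the synopsis, the sketch below works through two pieces of arithmetic: the sampling margin of error for a poll of 1000 people, and how quickly a sample thins out once demographic features are crossed. It is not taken from the episode. The function names and the category counts per feature are assumptions chosen only for illustration, and the formula assumes simple random sampling, which real polls only approximate.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a
    proportion p estimated from n respondents (simple random sampling)."""
    return z * math.sqrt(p * (1 - p) / n)


def demographic_cells(levels):
    """Number of distinct demographic cells when every feature is crossed
    with every other, i.e. the product of the category counts."""
    return math.prod(levels.values())


# Hypothetical category counts per feature (assumption, not from the episode).
levels = {"gender": 2, "age": 4, "race": 5, "education": 3}

sample_size = 1000
moe = margin_of_error(sample_size)
cells = demographic_cells(levels)
per_cell = sample_size / cells

print(f"Margin of error for n={sample_size}: {moe:.3f}")
print(f"Demographic cells: {cells}")
print(f"Average respondents per cell: {per_cell:.1f}")
print(f"Margin of error within an average cell: {margin_of_error(per_cell):.3f}")
```

Under these assumptions the pure sampling margin for 1000 respondents is a little over 3 percentage points, in the same range as the roughly 4% quoted above, while each fully crossed demographic cell holds only a handful of respondents. Sampling bias and correlated errors, as discussed in the episode, add to this.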
More episodes of the podcast DataCafé
Science Communication with physicist Laurie Winkless, author of "Sticky" & "Science and the City"
02/06/2023
A Culture of Innovation
06/09/2022
Scaling the Internet
30/07/2022
[Bite] Documenting Data Science Projects
29/06/2022
[Bite] Version Control for Data Scientists
05/05/2022
[Bite] Wordle: Winning against the algorithm
14/03/2022
Series 2 Introduction
14/03/2022
[Bite] Why Data Science projects fail
21/06/2021