Could Trump be the next President of America?
There is a lot of statistical maths behind polling data to make it as accurate as possible, though poor sampling techniques can still lead to unexpected results. For example, in the 2015 UK general election the final polls put Labour and the Conservatives roughly level at around 34% each, yet the Conservatives went on to win about 37% of the vote to Labour’s roughly 30%. This was a huge political shock and led to a Conservative majority government when all the pollsters had been predicting a hung parliament. In the postmortem that followed, YouGov concluded that their sampling methods were at fault, leading to big errors in their predictions.
Trump versus Clinton
The graph above from Real Clear Politics shows the current hypothetical face-off between Clinton and Trump amongst American voters. Given that both are now clear favourites to win their respective party nominations, attention has started to turn to how they would fare against each other.
A great deal of statistics dealing with populations is based on the normal distribution. The normal distribution has the bell curve shape above, with the majority of the population bunched around the mean value and symmetrical tails at each end. For example, most men in the UK will be between 5 foot 8 and 6 foot, with symmetrical tails of men who are much taller and much shorter. For polling data, mathematicians usually use a sample of around 1000 people: this is large enough to give a good approximation to the normal distribution without being prohibitively expensive to conduct.
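To see the bell curve emerge from polling itself, here is a minimal simulation sketch. It assumes a population in which exactly 50% would answer "Yes", then runs 2000 imaginary polls of 1000 people each. The 50% share, the number of polls, the random seed and the function name `simulate_polls` are all illustrative choices of ours, not taken from any real poll.

```python
import random
import statistics

def simulate_polls(true_share, sample_size, polls, seed=1):
    """Return the number of 'Yes' answers in each of `polls` simulated polls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < true_share for _ in range(sample_size))
        for _ in range(polls)
    ]

# Assumed population: half would say "Yes"; each poll asks 1000 people.
counts = simulate_polls(true_share=0.5, sample_size=1000, polls=2000)

mean = statistics.mean(counts)
spread = statistics.pstdev(counts)
within_one = sum(abs(c - mean) <= spread for c in counts) / len(counts)

print(f"average 'Yes' count: {mean:.1f}")
print(f"standard deviation:  {spread:.1f}")
print(f"share of polls within one standard deviation: {within_one:.1%}")

# Crude text histogram: group the counts into bins of width 10.
bins = {}
for c in counts:
    start = c // 10 * 10
    bins[start] = bins.get(start, 0) + 1
for start in sorted(bins):
    print(f"{start}-{start + 9}: {'#' * (bins[start] // 20)}")
```

The histogram should bunch around 500 and tail off symmetrically on both sides, and roughly two thirds of the simulated polls should land within one standard deviation of the average, as the normal distribution predicts.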
A Polling Example
The following example is from the excellent introduction to this topic from the University of Arizona.
So, say we have sampled 1000 people, asking them a simple Yes/No/Don’t Know type question. Say, for example, we asked 1000 people if they would vote for Trump, Clinton or if they were undecided. In our poll 675 people say “Yes” to Trump, so what we want to know is the confidence interval for this prediction, i.e. how accurate it is likely to be. Here is where the normal distribution comes in. We use the following equations:
μ = n × p0
σ = (n × p0 × (1 − p0))^0.5
We have μ representing the mean and σ the standard deviation.
n = the number of people we asked which is 1000
p0 = our sample probability of “Yes” for Trump which is 0.675
Therefore μ = 1000 x 0.675 = 675
We can use the same values to calculate the standard deviation σ:
σ = (1000 × 0.675 × (1 − 0.675))^0.5
σ = 14.811
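The two values above can be checked with a few lines of Python. This is just a sketch of the arithmetic, using the same n and p0 as the example; the variable names are ours.

```python
import math

n = 1000    # number of people polled
p0 = 0.675  # sample proportion answering "Yes"

mu = n * p0                           # expected number of "Yes" answers
sigma = math.sqrt(n * p0 * (1 - p0))  # standard deviation of that count

print(f"mean = {mu:.0f}, standard deviation = {sigma:.3f}")
# prints: mean = 675, standard deviation = 14.811
```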
We now can use the following table:
This tells us that when we have a normal distribution, we can be 90% confident that the data will be within +/- 1.645 standard deviations of the mean.
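The 1.645 figure can be recovered from the standard normal distribution itself. Here is a minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`); the helper name `z_for_confidence` is our own.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Number of standard deviations either side of the mean that
    contains the central `confidence` share of a normal distribution."""
    tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    z = z_for_confidence(level)
    print(f"{level:.0%} confidence: +/- {z:.3f} standard deviations")
```

For 90% confidence this gives 1.645. Asking for more confidence widens the interval: 95% needs about 1.960 standard deviations and 99% about 2.576.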
So in our hypothetical poll we are 90% confident that the real number of people who will vote for Trump, out of every 1000, will be within +/- 1.645 standard deviations of our sample mean of 675.
This gives us the following:
upper bound estimate = 675 + 1.645(14.811) = 699.4
lower bound estimate = 675 – 1.645(14.811) = 650.6
We can then convert this back to a percentage and say that we can be 90% confident that between 65% and 70% of the population will vote for Trump. We therefore have a prediction of 67.5% with a margin of error of about ±2.5% (24.4 out of 1000, or 2.4%, before rounding). Many published polls quote a margin of error of between ±2.5% and ±3%, which is what a sample of around 1000 people gives: about ±2.6% at a 90% confidence level and about ±3.1% at 95%, taking the worst case of a 50/50 split.
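The margin of error can also be worked out directly as a percentage, which makes it easy to see how it changes with the sample size. The sketch below uses the same formula as the worked example, divided through by the sample size. The function name `margin_of_error`, the list of sample sizes and the choice of a 50/50 split as the worst case are our own assumptions for illustration.

```python
import math

def margin_of_error(share, sample_size, z):
    """Half-width of the confidence interval for a sample proportion."""
    return z * math.sqrt(share * (1 - share) / sample_size)

# The worked example: 67.5% "Yes" from 1000 people at 90% confidence.
example = margin_of_error(0.675, 1000, 1.645)
print(f"67.5% +/- {example:.1%}")

# Worst case (a 50/50 split) at 95% confidence (z = 1.96).
for size in (250, 500, 1000, 2000, 4000):
    print(f"n = {size}: +/- {margin_of_error(0.5, size, 1.96):.1%}")
```

Because the sample size sits under a square root, quadrupling the number of people polled only halves the margin of error. This is why pollsters rarely go far beyond 1000 people.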
Back to the real polling data on the Clinton versus Trump match-up. We can see that the current trend is a narrowing of the polls between the two candidates, with 47.3% for Clinton and 40.8% for Trump. This data is an amalgamation of a large number of polls, so it should be reasonably accurate. You can see some of the original data behind this:
This is a very detailed polling report from CNN and, as you can see above, they used a sample of 1000 adults in order to get a margin of error of around 3%. However, with around six months to go it’s very likely these polls will shift. Could we really have President Trump? Only time will tell.