
**Predicting the UK election using linear regression**

The above data is the latest opinion poll data from the Guardian. The UK will have (another) general election on June 8th. So can we use the current opinion poll data to predict the outcome?

**Longer term data trends**

Let’s start by looking at the longer term trend following the aftermath of the Brexit vote on June 23rd 2016. I’ll plot some points for Labour and the Conservatives and see what kind of linear regression we get. To keep things simple I’ve looked at randomly chosen poll data approximately every 2 weeks – assigning 0 to July 1st 2016, 1 to mid July, 2 to August 1st etc. This has then been plotted using the fantastic Desmos.

**Labour**

You can see that this is not a very good fit – it’s a very weak correlation. Nevertheless let’s see what we would get if we used this regression line to predict the outcome in June. With the x axis scale I’ve chosen, mid June 2017 equates to 23 on the x axis. Therefore we predict the percentage as

y = -0.130(23) + 30.2

y = 27%

Clearly this would be a disaster for Labour – but our model is not especially accurate so perhaps nothing to worry about just yet.

**Conservatives**

As with Labour we have a weak correlation – though this time we have a positive rather than negative correlation. If we use our regression model we get a prediction of:

y = 0.242(23) + 38.7

y = 44%

So, we are predicting a crushing victory for the Conservatives – but could we get some more accurate models to base this prediction on?

**Using moving averages**

The Guardian’s poll tracker at the top of the page uses moving averages to smooth out poll fluctuations between different polls and to arrive at an averaged poll figure. Using this provides a stronger correlation:

**Labour**

This model doesn’t take into account a (possible) late surge in support for Labour but does fit better than our last graph. Using the equation we get:

y = -0.0764(23) + 28.8

y = 27%

**Conservatives**

We can have more confidence in using this regression line to predict the election. Putting in the numbers we get:

y = 0.411(23) + 36.48

y = 46%
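As a quick check on the arithmetic, the four predictions can be reproduced in a few lines of Python. This is only a sketch: the slopes and intercepts are the ones quoted above, while the dictionary and function names are made up for illustration. It evaluates the fitted lines and does not refit them, since the underlying poll data is not reproduced here.

```python
# Regression lines quoted in the post, stored as (slope, intercept).
# The labels are illustrative names, not anything taken from Desmos.
MODELS = {
    "Labour (raw polls)": (-0.130, 30.2),
    "Conservatives (raw polls)": (0.242, 38.7),
    "Labour (moving average)": (-0.0764, 28.8),
    "Conservatives (moving average)": (0.411, 36.48),
}

# Mid June 2017 on the fortnightly x-axis scale used in the post.
ELECTION_X = 23


def predict(slope, intercept, x):
    """Evaluate the regression line y = slope * x + intercept."""
    return slope * x + intercept


predictions = {
    name: predict(slope, intercept, ELECTION_X)
    for name, (slope, intercept) in MODELS.items()
}

for name, share in predictions.items():
    print(f"{name}: {share:.1f}%")
```

Rounded to the nearest whole number, these come out at 27%, 44%, 27% and 46%, matching the figures above.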

**Conclusion**

Our more accurate models merely confirm what we found earlier – and indeed what all the pollsters are predicting – a massive win for the Conservatives. Even allowing for a late narrowing of the polls the Conservatives could be on target for winning by over 10 percentage points – which would result in a very large majority. Let’s see what happens!

**Modelling Radioactive decay**

We can model radioactive decay of atoms using the following equation:

**N(t) = N_{0} e^{-λt}**

Where:

**N_{0}**: is the initial quantity of the element

**λ**: is the radioactive decay constant

**t**: is time

**N(t)**: is the quantity of the element remaining after time t.

So, for Carbon-14 which has a half life of 5730 years (this means that after 5730 years exactly half of the initial amount of Carbon-14 atoms will have decayed) we can calculate the decay constant **λ**.

After 5730 years, N(5730) will be exactly half of N_{0}, therefore we can write the following:

**N(5730) = 0.5N_{0} = N_{0} e^{-5730λ}**

therefore:

**0.5 = e^{-5730λ}**

and if we take the natural log of both sides and rearrange we get:

**λ = ln(1/2) / (-5730)**

**λ ≈ 0.000121**

We can now use this to solve problems involving Carbon-14 (which is used in Carbon-dating techniques to find out how old things are).

eg. You find an old parchment and after measuring the Carbon-14 content you find that it is just 30% of what a new piece of paper would contain. How old is this paper?

We have

**N(t) = N_{0} e^{-0.000121t}**

**N(t)/N_{0} = e^{-0.000121t}**

**0.30 = e^{-0.000121t}**

**t = ln(0.30)/(-0.000121)**

**t = 9950 years old.**
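The same calculation can be sketched in Python. The function names here are illustrative, not from any library. The code computes the age twice: once with the rounded decay constant 0.000121 used above, and once with the unrounded value ln(2)/5730, to show how little the rounding matters.

```python
import math

HALF_LIFE_C14 = 5730  # years, as quoted in the post


def decay_constant(half_life):
    """Decay constant from the half life: lambda = ln(2) / half_life."""
    return math.log(2) / half_life


def age_from_fraction(fraction_remaining, lam):
    """Solve fraction = e^(-lam * t) for t."""
    return -math.log(fraction_remaining) / lam


lam_exact = decay_constant(HALF_LIFE_C14)

# Age of the parchment with 30% of its Carbon-14 remaining.
age_rounded = age_from_fraction(0.30, 0.000121)   # rounded constant from the post
age_exact = age_from_fraction(0.30, lam_exact)    # unrounded constant

print(f"lambda = {lam_exact:.6f}")
print(f"age with rounded lambda: {age_rounded:.0f} years")
print(f"age with exact lambda:   {age_exact:.0f} years")
```

The two ages differ by only a few years, far less than the uncertainty in a real Carbon-14 measurement.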

**Probability density functions**

We can also do some interesting maths by rearranging:

**N(t) = N_{0} e^{-λt}**

**N(t)/N_{0} = e^{-λt}**

and then plotting **N(t)/N_{0}** against time.

**N(t)/N_{0}** will have a range between 0 and 1 as when t = 0, **N(0) = N_{0}**, which gives **N(0)/N_{0} = 1**.

We can then manipulate this into the form of a probability density function – by finding the constant a which makes the area underneath the curve equal to 1.

**∫_{0}^{∞} a e^{-λt} dt = 1**

Solving this gives a = λ. Therefore the following integral:

**∫_{t1}^{t2} λe^{-λt} dt**

will give the fraction of atoms which will have decayed between times t1 and t2.

We could use this integral to work out the half life of Carbon-14 as follows:

**∫_{0}^{t} 0.000121 e^{-0.000121s} ds = 0.5**

Which if we solve gives us t = 5728.5 which is what we’d expect (given our earlier rounding of the decay constant).

We can also now work out the expected (mean) time that an atom will exist before it decays. To do this we use the following equation for finding E(x) of a probability density function:

**E(x) = ∫ x f(x) dx**

and if we substitute in our equation we get:

**E(t) = ∫_{0}^{∞} t λe^{-λt} dt**

Now, we can integrate this by parts:

**E(t) = [-t e^{-λt}]_{0}^{∞} + ∫_{0}^{∞} e^{-λt} dt = 0 + 1/λ = 1/λ**

So the expected (mean) life of an atom is given by 1/λ. In the case of Carbon-14, with a decay constant λ ≈ 0.000121, we have an expected life of a Carbon-14 atom as:

E(t) = 1 /0.000121

E(t) = 8264 years.
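These results can be checked numerically. The sketch below uses a simple midpoint rule rather than a library integrator, and replaces the infinite upper limit with 200,000 years. That cut-off is an assumption, chosen because the density is negligible beyond it. The helper names are illustrative.

```python
import math

LAM = 0.000121    # rounded decay constant for Carbon-14 from the post
UPPER = 200_000   # stand-in for infinity (an assumption; the tail beyond it is negligible)


def pdf(t, lam=LAM):
    """Probability density for the decay time of one atom: lam * e^(-lam * t)."""
    return lam * math.exp(-lam * t)


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


total_area = integrate(pdf, 0, UPPER)                   # should be close to 1
decayed_by_half_life = integrate(pdf, 0, 5728.5)        # should be close to 0.5
mean_life = integrate(lambda t: t * pdf(t), 0, UPPER)   # should be close to 1 / LAM

print(f"area under curve:           {total_area:.6f}")
print(f"fraction decayed by 5728.5: {decayed_by_half_life:.6f}")
print(f"mean life:                  {mean_life:.0f} years")
```

The mean comes out near 8264 years, well above the half life of roughly 5730 years, which is the right-skew effect described in the next paragraph.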

Now that may sound a little strange – after all the half life is 5730 years, which means that half of all atoms will have decayed after 5730 years. So why is the mean life so much higher? Well it’s because of the long right tail in the graph – we will have some atoms with very large lifespans – and this will therefore skew the mean to the right.

**Modeling Volcanoes – When will they erupt?**

A recent post by the excellent Maths Careers website looked at how we can model volcanic eruptions mathematically. This is an important branch of mathematics – which looks to assign risk to events and these methods are very important to statisticians and insurers. Given that large-scale volcanic eruptions have the potential to end modern civilisation, it’s also useful to know how likely the next large eruption is.

The Guardian has recently run a piece on the dangers that large volcanoes pose to humans. Iceland’s Eyjafjallajökull volcano which erupted in 2010 caused over 100,000 flights to be grounded and cost the global economy over $1 billion – and yet this was only a very minor eruption historically speaking. For example, the Tambora eruption in Indonesia (1815) was so big that the explosion could be heard over 2000km away, and the 200 million tonnes of sulphur that were emitted spread across the globe, lowering global temperatures by 2 degrees Celsius. This led to widespread famine as crops failed – and tens of thousands of deaths.

**Super volcanoes**

Even this destruction is insignificant when compared to the potential damage caused by a super volcano. These volcanoes, like that underneath Yellowstone Park in America, have the potential to wipe out millions in the initial explosion and to send enough sulphur and ash into the air to cause a “volcanic winter” of significantly lower global temperatures. The graphic above shows that the ash from a Yellowstone eruption could cover the ground of about half the USA. The resultant widespread disruption to global food supplies and travel would be devastating.

So, how can we predict the probability of a volcanic eruption? The easiest model to use, if we already have an estimated probability of eruption, is the Poisson distribution:

**Pr(X = k) = λ^{k} e^{-λ} / k!**

This formula calculates the probability that X equals a given value of k. λ is the mean of the distribution. If X represents the number of volcanic eruptions we have Pr(X ≥ 1) = 1 - Pr(X = 0). This gives us a formula for working out the probability of an eruption as 1 - e^{-λ}. For example, the Yellowstone super volcano erupts around every 600,000 years. Therefore if λ is the number of eruptions every year, we have λ = 1/600,000 ≈ 0.00000167 and 1 - e^{-λ} also ≈ 0.00000167. This gets more interesting if we then look at the probability over a range of years. We can do this by modifying the formula for probability as 1 - e^{-tλ} where t is the number of years for our range.

So the probability of a Yellowstone eruption in the next 1000 years is 1 - e^{-0.00167} ≈ 0.00167, and the probability in the next 10,000 years is 1 - e^{-0.0167} ≈ 0.0166. So we have approximately a 2% chance of this eruption in the next 10,000 years.

A far smaller volcano, like Katla in Iceland, has erupted 16 times in the past 1100 years – giving an average eruption every ≈ 70 years. This gives λ = 1/70 ≈ 0.014. So we can expect this to erupt in the next 10 years with probability 1 - e^{-0.14} ≈ 0.13. And in the next 30 years with probability 1 - e^{-0.42} ≈ 0.34.
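The eruption probabilities above all come from the same expression, 1 - e^{-tλ}, so they are easy to tabulate. The Python sketch below uses an illustrative function name and the rates quoted in the post. Katla's rate is taken as the rounded 0.014 per year so that the output lines up with the figures above.

```python
import math


def prob_at_least_one(rate_per_year, years):
    """P(X >= 1) for a Poisson model with mean rate_per_year * years."""
    return 1 - math.exp(-rate_per_year * years)


YELLOWSTONE_RATE = 1 / 600_000  # one eruption roughly every 600,000 years
KATLA_RATE = 0.014              # rounded value of 1/70 used in the post

yellowstone_1000 = prob_at_least_one(YELLOWSTONE_RATE, 1_000)
yellowstone_10000 = prob_at_least_one(YELLOWSTONE_RATE, 10_000)
katla_10 = prob_at_least_one(KATLA_RATE, 10)
katla_30 = prob_at_least_one(KATLA_RATE, 30)

print(f"Yellowstone, next 1,000 years:  {yellowstone_1000:.4%}")
print(f"Yellowstone, next 10,000 years: {yellowstone_10000:.2%}")
print(f"Katla, next 10 years:           {katla_10:.1%}")
print(f"Katla, next 30 years:           {katla_30:.1%}")
```

Note that this model assumes eruptions are independent events with a constant rate, which is a simplification for real volcanoes.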

The models for volcanic eruptions can get a lot more complicated – especially as we often don’t know the accurate data to give us an estimate for the λ. λ can be estimated using a technique called Maximum Likelihood Estimation – which you can read about here.

If you enjoyed this post you might also like:

Black Swans and Civilisation Collapse. How effective is maths at guiding government policies?

**Are you Psychic?**

There have been people claiming to have paranormal powers for thousands of years. However, scientifically we can say that as yet we still have no convincing proof that any paranormal abilities exist. We can show this using some mathematical tests – such as the binomial or normal distribution.

**ESP Test**

You can test your ESP powers on this site (our probabilities will be a little different than their ones). You have the chance to try and predict what card the computer has chosen. After repeating this trial 25 times you can find out if you possess psychic powers. As we are working with discrete data and have a fixed probability of guessing (0.2) then we can use a binomial distribution. Say I got 6 correct, do I have psychic powers?

We have the Binomial model B(25, 0.2), 25 trials and 0.2 probability of success. So we want to find the probability that I could achieve 6 **or more** by luck.

The probability of getting exactly 6 right is 0.16. Working out the probability of getting 6 or more correct would take a bit longer by hand (though could be simplified by doing 1 – P(X ≤ 5)). Doing this, or using a calculator, we find the probability is 0.38. Therefore we would expect someone to get 6 or more correct just by guessing 38% of the time.

So, using this model, when would we have evidence for potential ESP ability? Well, a minimum bar for our percentages would probably be 5%. So how many do you need to get correct before there is less than a 5% chance of that happening by guessing?

Using our calculator we can do trial and error to see that the probability of getting 9 or more correct by guessing is only 4.7%. So, someone getting 9 correct might be showing some signs of ESP. If we asked for a stricter threshold (such as 1%) we would want to see someone get 11 correct.
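The trial and error can be automated. Here is a minimal Python sketch using only the standard library. The function names are illustrative, and the model is the B(25, 0.2) distribution described above.

```python
from math import comb

N_TRIALS = 25   # number of cards guessed
P_GUESS = 0.2   # chance of guessing one card correctly


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def smallest_score(n, p, alpha):
    """Smallest k such that P(X >= k) is below the threshold alpha."""
    for k in range(n + 1):
        if binom_tail(k, n, p) < alpha:
            return k
    return None


print(f"P(X = 6)  = {binom_pmf(6, N_TRIALS, P_GUESS):.3f}")
print(f"P(X >= 6) = {binom_tail(6, N_TRIALS, P_GUESS):.3f}")
print(f"P(X >= 9) = {binom_tail(9, N_TRIALS, P_GUESS):.3f}")
print("score needed at 5%:", smallest_score(N_TRIALS, P_GUESS, 0.05))
print("score needed at 1%:", smallest_score(N_TRIALS, P_GUESS, 0.01))
```

The two thresholds come out as 9 and 11 correct guesses, agreeing with the calculator results above.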

Now, in the video above, one of the Numberphile mathematicians manages to toss 10 heads in a row. Again, we can ask ourselves if this is evidence of some extraordinary ability. We can calculate this probability as 0.5^{10} ≈ 0.001. This means that such an event would only happen about 0.1% of the time. But, we’re only seeing a very small part of the total video. Here’s the full version:

Suddenly the feat looks less mathematically impressive (though still an impressive endurance feat!)

You can also test your psychic abilities with this video here.

**Statistics to win penalty shoot-outs**

With the World Cup nearly upon us we can look forward to another heroic defeat on penalties by England. England are in fact the worst country of any of the major footballing nations at taking penalties, having won only 1 out of 7 shoot-outs at the Euros and World Cup. In fact of the 35 penalties taken in shoot-outs England have missed 12 – which is a miss rate of over 30%. Germany by comparison have won 5 out of 7 – and have a miss rate of only 15%.

With the stakes in penalty shoot-outs so high there have been a number of studies to look at optimum strategies for players.

**Shoot left when ahead**

One study published in Psychological Science looked at all the penalties taken in penalty shoot-outs in the World Cup since 1982. What they found was pretty incredible – goalkeepers have a subconscious bias for diving to the right when their team is behind.

As is clear from the graphic, this is not a small bias towards the right, but a very strong one. When their team is behind the goalkeeper apparently favours his (likely) strong side 71% of the time. The strikers’ shot meanwhile continues to be placed either left or right with roughly the same likelihood as in the other situations. So, this built in bias makes the goalkeeper much less likely to help his team recover from a losing position in a shoot-out.

**Shoot high**

Analysis by Prozone looking at the data from the World Cups and European Championships between 1998 and 2010 compiled the following graphics:

The first graphic above shows the part of the goal that scoring penalties were aimed at. With most strikers aiming bottom left and bottom right it’s no surprise to see that these were the most successful areas.

The second graphic which shows where penalties were saved shows a more complete picture – goalkeepers made nearly all their saves low down. A striker who has the skill and control to lift the ball high makes it very unlikely that the goalkeeper will save his shot.

The last graphic also shows the risk involved in shooting high. This data shows where all the missed penalties (which were off-target) were being aimed. Unsurprisingly strikers who were aiming down the middle of the goal managed to hit the target! Interestingly strikers aiming for the right corner (as the goalkeeper stands) were far more likely to drag their shot off target than those aiming for the left side. Perhaps this is to do with them being predominantly right footed and the angle of their shooting arc?

**Win the toss and go first**

The Prozone data also showed the importance of winning the coin toss – 75% of the teams who went first went on to win. Equally, missing the first penalty is disastrous to a team’s chances – they went on to lose 81% of the time. The statistics also show a huge psychological role as well. Players who needed to score to keep their teams in the competition only scored a miserable 14% of the time. It would be interesting to see how these statistics are replicated over a larger data set.

**Don’t dive**

A different study which looked at 286 penalties from both domestic leagues and international competitions found that goalkeepers are actually best advised to stay in the centre of the goal rather than diving to one side. This had quite a significant effect on their ability to save the penalties – increasing the likelihood from around 13% to 33%. So, why don’t more goalkeepers stay still? Well, again this might come down to psychology – a diving save looks more dramatic and showcases the goalkeeper’s skill more than standing stationary in the centre.

**So, why do England always lose on penalties?**

There are some interesting psychological studies which suggest that England suffer more than other teams because English players are inhibited by their high public status (in other words, there is more pressure on them to perform – and hence that pressure is harder to deal with). One such study noted that the best penalty takers are the ones who compose themselves prior to the penalty. England’s players start to run to the ball only 0.2 seconds after the referee has blown – making them much less composed than other teams.

However, I think you can read too much into psychology – the answer is probably simpler – that other teams beat England because they have technically better players. English footballing culture revolves much less around technical skill than elsewhere in Europe and South America – and when it comes to the penalty shoot-outs this has a dramatic effect.

As we can see from the statistics, players who are technically gifted enough to lift their shots into the top corners give the goalkeepers virtually no chance of saving them. England’s less technically gifted players have to rely on hitting it hard and low to the corner – which gives the goalkeeper a much higher percentage chance of saving them.

**Test yourself**

You can test your penalty taking skills with this online game from the Open University – choose which players are best suited to the pressure, decide what advice they need and aim your shot in the best position.

If you liked this post you might also like:

Championship Wages Predict League Position? A look at how statistics can predict where teams finish in the league.

Premier League Wages Predict League Positions? A similar analysis of Premier League teams.

**Which Times Tables do Students Find Difficult?**

There’s an excellent article on today’s Guardian Datablog looking at a computer based study (with 232 primary school students) on which times tables students find easiest and most difficult. Edited highlights (Guardian quotes in italics):

**Which multiplication did students get wrong most often?**

*The hardest multiplication was six times eight, which students got wrong 63% of the time (about two times out of three). This was closely followed by 8×6, then 11×12, 12×8 and 8×12.*

The graphic shows the questions that were answered correctly the greatest percentage of times as dark blue (eg 1×12 was answered 95% correctly). The colours then change through lighter shades of blue, then from lighter reds to darker reds. It’s interesting to see that the difficult multiplications cluster in the middle – perhaps due to how students anchor from either 5 or 10 – so numbers away from both these anchors are more difficult.

**Which times table multiplication did students take the longest time to answer?**

*Maybe unsurprisingly, 1×1 got answered the quickest (but perhaps illustrating the hazards of speed, pupils got it wrong about 10% of the time), at 2.4 seconds on average – while it was 12×9 which made them think for longest, at an average of 7.9 seconds apiece.*

It’s quite interesting to see that this data is somewhat different to the previous graph. You might have expected the most difficult multiplications to also take the longest time – however it looks as though some questions, whilst not intuitive can be worked out through mental methods (eg doing 12×9 by doing 12×10 then subtracting 12.)

**How did boys and girls differ?**

*On average, boys got 32% of answers wrong, and took 4.2 seconds to answer each question. Girls, by contrast, got substantially fewer wrong, at 22%, but took 4.6 seconds on average to answer.*

Another interesting statistic – boys were more reckless and less considered with their answers! The element of competition (ie. having to answer against a clock) may well have encouraged this attitude. It would be interesting to see the gender breakdown to see whether boys and girls have any differences in which multiplication they find difficult.

**Which times table was the hardest?**

As you might expect, overall the 12 times table was found most difficult – closely followed by 8. The numbers furthest away from 5 and 10 (7, 8, 12) are also the most difficult. Is this down to how students are taught to calculate their tables – or because the sequence patterns are less memorable?

This would be a really excellent investigation topic for IGCSE, IB Studies or IB SL. It is something that would be relatively easy to collect data on in a school setting and then can provide a wealth of data to analyse. The full data spreadsheet is also available to download on the Guardian page.

If you enjoyed this post you may also like:

Finger Ratio Predicts Maths Ability?– a maths investigation about finger ratio and mathematical skill.

Premier League Finances – Debt and Wages – an investigation into the finances of Premier League clubs.