You are currently browsing the category archive for the ‘statistics’ category.

Anscombe’s Quartet was devised by the statistician Francis Anscombe to illustrate how important it was to not just rely on statistical measures when analyzing data. To do this he created 4 data sets which would produce nearly identical statistical measures. The scatter graphs above generated by the Python code here.

**Statistical measures**

1) Mean of x values in each data set = 9.00

2) Standard deviation of x values in each data set = 3.32

3) Mean of y values in each data set = 7.50

4) Standard deviation of x values in each data set = 2.03

5) Pearson’s Correlation coefficient for each paired data set = 0.82

6) Linear regression line for each paired data set: y = 0.500x + 3.00

When looking at this data we would be forgiven for concluding that these data sets must be very similar – but really they are quite different.

**Data Set A:**

x = [10,8,13,9,11,14,6,4,12,7,5]

y = [8.04, 6.95,7.58,8.81,8.33, 9.96,7.24,4.26,10.84,4.82,5.68]

Data Set A does indeed fit a linear regression – and so this would be appropriate to use the line of best fit for predictive purposes.

**Data Set B:**

x = [10,8,13,9,11,14,6,4,12,7,5]

y = [9.14,8.14,8.74,8.77,9.26,8.1,6.13,3.1,9.13,7.26,4.74]

You could fit a linear regression to Data Set B – but this is clearly not the most appropriate regression line for this data. Some quadratic or higher power polynomial would be better for predicting data here.

**Data Set C:**

x = [10,8,13,9,11,14,6,4,12,7,5]

y = [7.46,6.77,12.74,7.11,7.81,8.84,6.08,5.39,8.15,6.42,5.73]

In Data set C we can see the effect of a single outlier – we have 11 points in pretty much a perfect linear correlation, and then a single outlier. For predictive purposes we would be best investigating this outlier (checking that it does conform to the mathematical definition of an outlier), and then potentially doing our regression with this removed.

**Data Set D:**

x = [8,8,8,8,8,8,8,19,8,8,8]

y = [6.58,5.76,7.71,8.84,8.47,7.04,5.25,12.50,5.56,7.91,6.89]

In Data set D we can also see the effect of a single outlier – we have 11 points in a vertical line, and then a single outlier. Clearly here again drawing a line of best fit for this data is not appropriate – unless we remove this outlier first.

**The moral of the story**

So – the moral here is always use graphical analysis alongside statistical measures. A very common mistake for IB students is to rely on Pearson’s Product coefficient without really looking at the scatter graph to decide whether a linear fit is appropriate. If you do this then you could end up with a very low mark in the E category as you will not show good understanding of what you are doing. So always plot a graph first!

**Martingale II and Currency Trading**

We can use computer coding to explore game strategies and also to help understand the underlying probability distribution functions. Let’s start with a simple game where we toss a coin 4 times, stake 1 counter each toss and always call heads. This would give us a binomial distribution with 4 trials and the probability of success fixed as 1/2.

**Tossing a coin 4 time [simple strategy]**

For example the only way of losing 4 counters is a 4 coin streak of T,T,T,T. The probability of this happening is 1/16. We can see from this distribution that the most likely outcome is 0 (i.e no profit and no loss). If we work out the expected value, E(X) by multiplying profit/loss by frequencies and summing the result we get E(X) = 0. Therefore this is a fair game (we expect to neither make a profit nor a loss).

**Tossing a coin 4 time [Martingale strategy]**

This is a more complicated strategy which goes as follows:

1) You stake 1 counter on heads.

b) if you lose you stake 2 counters on heads

c) if you lose you stake 4 counters on heads

d) if you lose you stake 8 counters on heads.

If you win, the your next stake is always to go back to staking 1 counter.

**For example ****for the sequence: H,H,T,T **

First you bet 1 counter on heads. You win 1 counter

Next you bet 1 counter on heads. You win 1 counter

Next you bet 1 counters on heads. You lose 1 counter

Next you bet 2 counters on heads. You lose 2 counters

[overall loss is 1 counter]

**For example for the sequence: T,T,T,H **

First you bet 1 counter on heads. You lose 1 counter

Next you bet 2 counter on heads. You lose 2 counters

Next you bet 4 counters on heads. You lose 4 counter

Next you bet 8 counters on heads. You win 8 counters

[overall profit is 1 counter]

This leads to the following probabilities:

Once again we will have E(X) = 0, but a very different distribution to the simple 4 coin toss. We can see we have an 11/16 chance of making a profit after 4 coins – but the small chance of catastrophic loss (15 counters) means that the overall expectation is still zero.

**Iterated Martingale:**

Here we can do a computer simulation. This is the scenario this time:

We start with 100 counters, we toss a coin for a maximum of 3 times. We then define a completed round as when we get to a shaded box. We then repeat this process through 999 rounds, and model what happens. Here I used a Python program to simulate a player using this strategy.

We can see that we have periods of linear growth followed by steep falls – which is a very familiar pattern across many investment types. We can see that the initial starting 100 counters was built up to around 120 at the peak, but was closer to just 40 when we finished the simulation.

Let’s do another simulation to see what happens this time:

Here we can see that the 2nd player was actually performing significantly worse after around 600 rounds, but this time ended up with a finishing total of around 130 counters.

**Changing the ****multiplier**

We can also see what happens when rather than doubling stakes on losses we follow some other multiple. For example we might choose to multiply our stake by 5. This leads to much greater volatility as we can see below:

**Multiplier x5**

Here we have 2 very different outcomes for 2 players using the same model. Player 1 (in blue) may believe they have found a sure-fire method of making huge profits, but player 2 (green) went bankrupt after around 600 rounds.

**Multiplier x1.11**

Here we can see that if the multiplier is close to 1 we have much less volatility (as you would expect because your maximum losses per round are much smaller).

We can run the simulation across 5000 rounds – and here we can see that we have big winning and losing streaks, but that over the long run the account value oscillates around the starting value of 100 counters.

**Forex charts**

We can see similar graphs when we look at forex (currency exchange) charts. For example:

In this graph (from here) we plot the exchange between US dollar and Thai Baht. We can see the same sort of graph movements – with run of gains and losses leading to a similar jagged shape. This is not surprising as forex trades can also be thought of in terms of 2 binary outcomes like tossing a coin, and indeed huge amounts of forex trading is done through computer programs, some of which do use the Martingale system as a basis.

**The effect of commission on the model**

So, to finish off we can modify our system slightly so that we try to replicate forex trading. We will follow the same model as before, but this time we have to pay a very small commission for every trade we make. This now gives us:

E(X) = -0.000175. (0.0001 counters commission per trade)

E(X) = -0.00035. (0.0002 counters commission per trade)

Even though E(X) is very slightly negative, it means that in the long run we would expect to lose money. With the 0.0002 counters commission we would expect to lose around 20 counters over 50,000 rounds. The simulation graph above was run with 0.0002 counters commission – and in this case it led to bankruptcy before 3000 rounds.

**Computer code**

The Python code above can be used to generate data which can then be copied into Desmos. The above code simulates 1 player playing 999 rounds, starting with 100 counters, with a multiplier of 5. If you know a little bit about coding you can try and play with this yourselves!

I’ve also just added a version of this code onto repl. You can run this code – and also generate the graph direct (click on the graph png after running). It creates some beautiful images like that shown above.

Essential resources for IB students:

Revision Village has been put together to help IB students with topic revision both for during the course and for the end of Year 12 school exams and Year 13 final exams. I would strongly recommend students use this as a resource during the course (not just for final revision in Y13!) There are specific resources for HL and SL students for both Analysis and Applications.

There is a comprehensive Questionbank takes you to a breakdown of each main subject area (e.g. Algebra, Calculus etc) and then provides a large bank of graded questions. What I like about this is that you are given a difficulty rating, as well as a mark scheme and also a worked video tutorial. Really useful!

The Practice Exams section takes you to a large number of ready made quizzes, exams and predicted papers. These all have worked solutions and allow you to focus on specific topics or start general revision. This also has some excellent challenging questions for those students aiming for 6s and 7s.

Each course also has a dedicated video tutorial section which provides 5-15 minute tutorial videos on every single syllabus part – handily sorted into topic categories.

2) Exploration Guides and Paper 3 Resources

I’ve put together four comprehensive pdf guides to help students prepare for their exploration coursework and Paper 3 investigations. The exploration guides talk through the marking criteria, common student mistakes, excellent ideas for explorations, technology advice, modeling methods and a variety of statistical techniques with detailed explanations. I’ve also made 17 full investigation questions which are also excellent starting points for explorations. The Exploration Guides can be downloaded here and the Paper 3 Questions can be downloaded here.

**Using Maths to model the spread of Coronavirus (COVID-19)**

This coronavirus is the latest virus to warrant global fears over a disease pandemic. Throughout history we have seen pandemic diseases such as the Black Death in Middle Ages Europe and the Spanish Flu at the beginning of the 20th century. More recently we have seen HIV responsible for millions of deaths. In the last few years there have been scares over bird flu and SARS – yet neither fully developed into a major global health problem. So, how contagious is COVID-19, and how can we use mathematics to predict its spread?

Modelling disease outbreaks with real accuracy is an incredibly important job for mathematicians and all countries employ medical statisticians for this job . Understanding how diseases spread and how fast they can spread through populations is essential to developing effective medical strategies to minimise deaths. If you want to save lives maybe you should become a mathematician rather than a doctor!

Currently scientists know relatively little about the new virus – but they do know that it’s the same coronavirus family as SARS and MERS which can both cause serious respiratory problems. Scientists are particularly interested in trying to discover how infectious the virus is, how long a person remains contagious, and whether people can be contagious before they show any symptoms.

**In the case of COVID-19 we have the following early estimated values: **[From a paper published by medical statisticians in the UK on January 24]

**R _{0}. between 3.6 and 4.** This is defined as how many people an infectious person will pass on their infection to in a totally susceptible population. The higher the R

_{0}. value the more quickly an infection will spread. By comparison seasonal flu has a R

_{0}. value around 2.8.

**Total number infected** by January 21: prediction interval 9,217–14,245. Of these an estimated 3,050–4,017 currently with the virus and the others recovered (or died). This is based on an estimation that only around 5% of cases have been diagnosed. By February 4th they predict 132,751–273,649 will be infected.

**Transmission rate β** estimated at 1.07. β represents the transmission rate per day – so on average an infected person will infect another 1.07 people a day.

**Infectious period** estimated at 3.6 days. We can therefore calculate μ (the per capita recovery rate) by μ = 1/(3.6). This tells us how quickly people will be removed from the population (either recovered and become immune or died)

**SIR Model**

The basic model is based on the SIR model. The SIR model looks at how much of the population is susceptible to infection (S), how many of these go on to become infectious (I), and how many of these are removed (R) from the population being considered (i.e they either recover and thus won’t catch the virus again, or die).

The Guardian datablog have an excellent graphic to show the contagiousness relative to deadliness of different diseases [click to enlarge, or follow the link]. We can see that seasonal flu has an R_{0}. value of around 2.8 and a fatality rate of around 0.1%, whereas measles has an R_{0}. value of around 15 and a fatality rate of around 0.3%. This means that measles is much more contagious than seasonal flu.

You can notice that we have nothing in the top right hand corner (very deadly and very contagious). This is just as well as that could be enough to seriously dent the human population. Most diseases we worry about fall into 2 categories – contagious and not very deadly or not very contagious and deadly.

The equations above represent a SIR (susceptible, infectious, removed) model which can be used to model the spread of diseases like flu.

dS/dt represents the rate of change of those who are susceptible to the illness with respect to time. dI/dt represents the rate of change of those who are infected with respect to time. dR/dt represents the rate of change of those who have been removed with respect to time (either recovered or died).

For example, if dI/dt is high then the number of people becoming infected is rapidly increasing. When dI/dt is zero then there is no change in the numbers of people becoming infected (number of infections remain steady). When dI/dt is negative then the numbers of people becoming infected is decreasing.

**Modelling for COVID-19**

N is the total population. Let’s take as the population of Wuhan as 11 million.

μ is the per capita recovery (Calculated by μ = 1/(duration of illness) ). We have μ = 1/3.6 = 5/18.

β the transmission rate as approximately 1.07

Therefore our 3 equations for rates of change become:

dS/dt = -1.07 S I /11,000,000

dI/dt = 1.07 S I /11,000,000 – 5/18 I

dR/dt = 5/18 I

Unfortunately these equations are very difficult to solve – but luckily we can use a computer program or spreadsheet to plot what happens. We need to assign starting values for S, I and R – the numbers of people susceptible, infectious and removed. With the following values for January 21: S = 11,000,000, I = 3500, R = 8200, β = 1.07, μ = 5/18, I designed the following Excel spreadsheet (instructions on what formula to use here):

This gives a prediction that around 3.9 million people infected within 2 weeks! We can see that the SIR model that we have used is quite simplistic (and significantly different to the expert prediction of around 200,000 infected).

So, we can try and make things more realistic by adding some real life considerations. The current value of β (the transmission rate) is 1.07, i.e an infected person will infect another 1.07 people each day. We can significantly reduce this if we expect that infected people are quarantined effectively so that they do not interact with other members of the public, and indeed if people who are not sick avoid going outside. So, if we take β as (say) 0.6 instead we get the following table:

Here we can see that this change to β has had a dramatic effect to our model. Now we are predicting around 129,000 infected after 14 days – which is much more in line with the estimate in the paper above.

As we are seeing exponential growth in the spread, small changes to the parameters will have very large effects. There are more sophisticated SIR models which can then be used to better understand the spread of a disease. Nevertheless we can see clearly from the spreadsheet the interplay between susceptible, infected and recovered which is the foundation for understanding the spread of viruses like COVID-19.

[Edited in March to use the newly designated name COVID-19]

Essential resources for IB students:

Revision Village has been put together to help IB students with topic revision both for during the course and for the end of Year 12 school exams and Year 13 final exams. I would strongly recommend students use this as a resource during the course (not just for final revision in Y13!) There are specific resources for HL and SL students for both Analysis and Applications.

There is a comprehensive Questionbank takes you to a breakdown of each main subject area (e.g. Algebra, Calculus etc) and then provides a large bank of graded questions. What I like about this is that you are given a difficulty rating, as well as a mark scheme and also a worked video tutorial. Really useful!

The Practice Exams section takes you to a large number of ready made quizzes, exams and predicted papers. These all have worked solutions and allow you to focus on specific topics or start general revision. This also has some excellent challenging questions for those students aiming for 6s and 7s.

Each course also has a dedicated video tutorial section which provides 5-15 minute tutorial videos on every single syllabus part – handily sorted into topic categories.

2) Exploration Guides and Paper 3 Resources

I’ve put together four comprehensive pdf guides to help students prepare for their exploration coursework and Paper 3 investigations. The exploration guides talk through the marking criteria, common student mistakes, excellent ideas for explorations, technology advice, modeling methods and a variety of statistical techniques with detailed explanations. I’ve also made 17 full investigation questions which are also excellent starting points for explorations. The Exploration Guides can be downloaded here and the Paper 3 Questions can be downloaded here.

**Simulating a Football Season**

This is a nice example of how statistics are used in modeling – similar techniques are used when gambling companies are creating odds or when computer game designers are making football manager games. We start with some statistics. The soccer stats site has the data we need from the 2018-19 season, and we will use this to predict the outcome of the 2019-20 season (assuming teams stay at a similar level, and that no-one was relegated in 2018-19).

**Attack and defense strength**

For each team we need to calculate:

- Home attack strength
- Away attack strength
- Home defense strength
- Away defense strength.

For example for Liverpool (LFC)

LFC Home attack strength = (LFC home goals in 2018-19 season)/(average home goals in 2018-19 season)

LFC Away attack strength = (LFC away goals in 2018-19 season)/(average away goals in 2018-19 season)

LFC Home defense strength = (LFC home goals conceded in 2018-19 season)/(average home goals conceded in 2018-19 season)

LFC Away defense strength = (LFC away goals conceded in 2018-19 season)/(average away goals conceded in 2018-19 season)

**Calculating lamda**

We can then use a Poisson model to work out some probabilities. First though we need to find our lamda value. To make life easier we can also use the fact that the lamda value for a Poisson gives the mean value – and use this to give an approximate answer.

So, for example if Liverpool are playing at home to Arsenal we work out Liverpool’s lamda value as:

LFC home lamda = league average home goals per game x LFC home attack strength x Arsenal away defense strength.

We would work out Arsenal’s away lamda as:

Arsenal away lamda = league average away goals per game x Arsenal away attack strength x Liverpool home defense strength.

Putting in some values gives a home lamda for Liverpool as 3.38 and an away lamda for Arsenal as 0.69. So we would expect Liverpool to win 3-1 (rounding to the nearest integer).

**Using Excel**

I then used an Excel spreadsheet to work out the home goals in each fixture in the league season (green column represents the home teams)

and then used the same method to work out the away goals in each fixture in the league (yellow column represents the away team)

I could then round these numbers to the nearest integer and fill in the scores for each match in the table:

Then I was able to work out the point totals to produce a predicted table:

Here we had both Liverpool and Manchester City on 104 points, but with Manchester City having a better goal difference, so winning the league again.

**Using a Poisson model.**

The poisson model allows us to calculate probabilities. The mode is:

P(k goals) = (e^{-λ}λ^{k})/k!

λ is the symbol lamda which we calculated before.

So, for example with Liverpool at home to Arsenal we calculate

Liverpool’s home lamda = league average home goals per game x LFC home attack strength x Arsenal away defense strength.

**Liverpool’s home lamda = 1.57 x 1.84 x 1.17 = 3.38**

Therefore

P(Liverpool score 0 goals) = (e^{-3.38}3.38^{0})/0! = 0.034

P(Liverpool score 1 goal) = (e^{-3.38}3.38^{1})/1! = 0.12

P(Liverpool score 2 goals) = (e^{-3.38}3.38^{2})/2! = 0.19

P(Liverpool score 3 goals) = (e^{-3.38}3.38^{3})/3! = 0.22

P(Liverpool score 4 goals) = (e^{-3.38}3.38^{1})/1! = 0.19

P(Liverpool score 5 goals) = (e^{-3.38}3.38^{5})/5! = 0.13 etc.

**Arsenal’s away lamda = 1.25 x 1.30 x 0.42 = 0.68**

P(Arsenal score 0 goals) = (e^{-0.68}0.68^{0})/0! = 0.51

P(Arsenal score 1 goal) = (e^{-0.68}0.68^{1})/1! = 0.34

P(Arsenal score 2 goals) = (e^{-0.68}0.68^{2})/2! = 0.12

P(Arsenal score 3 goals) = (e^{-0.68}0.68^{3})/3! = 0.03 etc.

**Probability that Arsenal win**

Arsenal can win if:

Liverpool score 0 goals and Arsenal score 1 or more

Liverpool score 1 goal and Arsenal score 2 or more

Liverpool score 2 goals and Arsenal score 3 or more etc.

i.e the approximate probability of Arsenal winning is:

0.034 x 0.49 + 0.12 x 0.15 + 0.19 x 0.03 = 0.04.

Using the same method we could work out the probability of a draw and a Liverpool win. This is the sort of method that bookmakers will use to calculate the probabilities that ensure they make a profit when offering odds.

Essential resources for IB students:

Revision Village has been put together to help IB students with topic revision both for during the course and for the end of Year 12 school exams and Year 13 final exams. I would strongly recommend students use this as a resource during the course (not just for final revision in Y13!) There are specific resources for HL and SL students for both Analysis and Applications.

There is a comprehensive Questionbank takes you to a breakdown of each main subject area (e.g. Algebra, Calculus etc) and then provides a large bank of graded questions. What I like about this is that you are given a difficulty rating, as well as a mark scheme and also a worked video tutorial. Really useful!

The Practice Exams section takes you to a large number of ready made quizzes, exams and predicted papers. These all have worked solutions and allow you to focus on specific topics or start general revision. This also has some excellent challenging questions for those students aiming for 6s and 7s.

Each course also has a dedicated video tutorial section which provides 5-15 minute tutorial videos on every single syllabus part – handily sorted into topic categories.

2) Exploration Guides and Paper 3 Resources

I’ve put together four comprehensive pdf guides to help students prepare for their exploration coursework and Paper 3 investigations. The exploration guides talk through the marking criteria, common student mistakes, excellent ideas for explorations, technology advice, modeling methods and a variety of statistical techniques with detailed explanations. I’ve also made 17 full investigation questions which are also excellent starting points for explorations. The Exploration Guides can be downloaded here and the Paper 3 Questions can be downloaded here.

**Quantum Mechanics – Statistical Universe**

Quantum mechanics is the name for the mathematics that can describe physical systems on extremely small scales. When we deal with the macroscopic – i.e scales that we experience in our everyday physical world, then Newtonian mechanics works just fine. However on the microscopic level of particles, Newtonian mechanics no longer works – hence the need for quantum mechanics.

Quantum mechanics is both very complicated and very weird – I’m going to try and give a very simplified (though not simple!) example of how *probabilities *are at the heart of quantum mechanics. Rather than speaking with certainty about the property of an object as we can in classical mechanics, we need to take about the probability that it holds such a property.

For example, one property of particles is *spin. *We can have create a particle with the property of either *up* spin or *down* spin. We can visualise this as an arrow pointing up or down:

We can then create an apparatus (say the slit below parallel to the z axis) which measures whether the particle is in either up state or down state. If the particle is in up spin then it will return a value of +1 and if it is in down spin then it will return a value of -1.

So far so normal. But here is where things get weird. If we then rotate the slit 90 degrees clockwise so that it is parallel to the x axis, we would expect from classical mechanics to get a reading of 0. i.e the “arrow” will not fit through the slit. However that is not what happens. Instead we will still get readings of -1 or +1. However if we run the experiment a large number of times we find that the mean average reading will indeed be 0!

What has happened is that the act of measuring the particle with the slit has changed the state of the particle. Say it was previously +1, i.e in *up* spin, by measuring it with the newly rotated slit we have forced the particle into a new state of either pointing right (*right* spin) or pointing left (*left* spin). Our rotated slit will then return a value of +1 if the particle is in right spin, and will return a value of -1 if the particle in in left spin.

In this case the probability that the apparatus will return a value of +1 is 50% and the probability that the apparatus will return a value of -1 is also 50%. Therefore when we run this experiment many times we get the average value of 0. Therefore classical mechanics is achieved as an probabilistic approximation of repeated particle interactions

We can look at a slightly more complicated example – say we don’t rotate the slit 90 degrees, but instead rotate it an arbitrary number of degrees from the z axis as pictured below:

Here the slit was initially parallel to the z axis in the x,y plane (i.e y=0), and has been rotated Θ degrees. So the question is what is the probability that our previously *up* spin particle will return a value of +1 when measured through this new slit?

The equations above give the probabilities of returning a +1 spin or a -1 spin depending on the angle of orientation. So in the case of a 90 degree orientation we have both P(+1) and P(-1) = 1/2 as we stated earlier. An orientation of 45 degrees would have P(+1) = 0.85 and P(-1) = 0.15. An orientation of 10 degrees would have P(+1) = 0.99 and P(-1) = 0.01.

The statistical average meanwhile is given by the above formula. If we rotate the slit by Θ degrees from the z axis in the x,z plane, then run the experiment many times, we will get a long term average of cosΘ. As we have seen before, when Θ = 90 this means we get an average value of 0. if Θ = 45 degrees we would get an average reading of √2/2.

This gives a very small snapshot into the ideas of quantum mechanics and the crucial role that probability plays in understanding quantum states. If you found that difficult, then don’t worry you’re in good company. As Richard Feynman the legendary physicist once said, “If you think you understand quantum mechanics, you don’t understand quantum mechanics.”

Essential resources for IB students:

2) Exploration Guides and Paper 3 Resources

**Predicting the UK election using linear regression**

The above data is the latest opinion poll data from the Guardian. The UK will have (another) general election on June 8th. So can we use the current opinion poll data to predict the outcome?

**Longer term data trends**

Let’s start by looking at the longer term trend following the aftermath of the Brexit vote on June 23rd 2016. I’ll plot some points for Labour and the Conservatives and see what kind of linear regression we get. To keep things simple I’ve looked at randomly chosen poll data approximately every 2 weeks – assigning 0 to July 1st 2016, 1 to mid July, 2 to August 1st etc. This has then been plotted using the fantastic Desmos.

**Labour**

You can see that this is not a very good fit – it’s a very weak correlation. Nevertheless let’s see what we would get if we used this regression line to predict the outcome in June. With the x axis scale I’ve chosen, mid June 2017 equates to 23 on the x axis. Therefore we predict the percentage as

y = -0.130(23) + 30.2

y = 27%

Clearly this would be a disaster for Labour – but our model is not especially accurate so perhaps nothing to worry about just yet.

**Conservatives**

As with Labour we have a weak correlation – though this time we have a positive rather than negative correlation. If we use our regression model we get a prediction of:

y = 0.242(23) + 38.7

y = 44%

So, we are predicting a crushing victory for the Conservatives – but could we get some more accurate models to base this prediction on?

**Using moving averages**

The Guardian’s poll tracker at the top of the page uses moving averages to smooth out poll fluctuations between different polls and to arrive at an averaged poll figure. Using this provides a stronger correlation:

**Labour**

This model doesn’t take into account a (possible) late surge in support for Labour but does fir better than our last graph. Using the equation we get:

y = -0.0764(23) + 28.8

y = 27%

**Conservatives**

We can have more confidence in using this regression line to predict the election. Putting in the numbers we get:

y = 0.411(23) + 36.48

y = 46%

**Conclusion**

Our more accurate models merely confirm what we found earlier – and indeed what all the pollsters are predicting – a massive win for the Conservatives. Even allowing for a late narrowing of the polls the Conservatives could be on target for winning by over 10% points – which would result in a very large majority. Let’s see what happens!

Essential resources for IB students:

2) Exploration Guides and Paper 3 Resources

**Modelling Radioactive decay**

We can model radioactive decay of atoms using the following equation:

**N(t) = N _{0} e^{-λt}**

Where:

**N _{0}**: is the initial quantity of the element

**λ**: is the radioactive decay constant

**t**: is time

**N(t)**: is the quantity of the element remaining after time t.

So, for Carbon-14 which has a half life of 5730 years (this means that after 5730 years exactly half of the initial amount of Carbon-14 atoms will have decayed) we can calculate the decay constant **λ. **

After 5730 years, N(5730) will be exactly half of N_{0}, therefore we can write the following:

**N(5730) = 0.5N _{0} = N_{0} e^{-λt}**

therefore:

**0.5 = e ^{-λt}**

and if we take the natural log of both sides and rearrange we get:

**λ = ln(1/2) / -5730**

**λ ≈0.000121**

We can now use this to solve problems involving Carbon-14 (which is used in Carbon-dating techniques to find out how old things are).

eg. You find an old parchment and after measuring the Carbon-14 content you find that it is just 30% of what a new piece of paper would contain. How old is this paper?

We have

**N(t) = N _{0} e^{-0.000121t}**

**N(t)/N _{0}** =

**e**

^{-0.000121t}**0.30** = **e ^{-0.000121t}**

**t = ln(0.30)/(-0.000121)**

**t = 9950 years old.**

**Probability density functions**

We can also do some interesting maths by rearranging:

**N(t) = N _{0} e^{-λt}**

**N(t)/N _{0}** =

**e**

^{-λt}and then plotting **N(t)/N _{0}** against time.

**N(t)/N _{0}** will have a range between 0 and 1 as when t = 0,

**N(0)**=

**N**which gives

_{0}**N(0)**/

**N(0)**= 1.

We can then manipulate this into the form of a probability density function – by finding the constant a which makes the area underneath the curve equal to 1.

solving this gives a = λ. Therefore the following integral:

will give the fraction of atoms which will have decayed between times t1 and t2.

We could use this integral to work out the half life of Carbon-14 as follows:

Which if we solve gives us t = 5728.5 which is what we’d expect (given our earlier rounding of the decay constant).

We can also now work out the expected (mean) time that an atom will exist before it decays. To do this we use the following equation for finding E(x) of a probability density function:

and if we substitute in our equation we get:

Now, we can integrate this by parts:

So the expected (mean) life of an atom is given by 1/λ. In the case of Carbon, with a decay constant λ ≈0.000121 we have an expected life of a Carbon-14 atom as:

E(t) = 1 /0.000121

E(t) = 8264 years.

Now that may sound a little strange – after all the half life is 5730 years, which means that half of all atoms will have decayed after 5730 years. So why is the mean life so much higher? Well it’s because of the long right tail in the graph – we will have some atoms with very large lifespans – and this will therefore skew the mean to the right.

**IB Revision**

If you’re already thinking about your coursework then it’s probably also time to start planning some revision, either for the end of Year 12 school exams or Year 13 final exams. There’s a really great website that I would strongly recommend students use – you choose your subject (HL/SL/Studies if your exam is in 2020 or Applications/Analysis if your exam is in 2021), and then have the following resources:

The Questionbank takes you to a breakdown of each main subject area (e.g. Algebra, Calculus etc) and each area then has a number of graded questions. What I like about this is that you are given a difficulty rating, as well as a mark scheme and also a worked video tutorial. Really useful!

The Practice Exams section takes you to ready made exams on each topic – again with worked solutions. This also has some harder exams for those students aiming for 6s and 7s and the Past IB Exams section takes you to full video worked solutions to every question on every past paper – and you can also get a prediction exam for the upcoming year.

I would really recommend everyone making use of this – there is a mixture of a lot of free content as well as premium content so have a look and see what you think.

**Modeling Volcanoes – When will they erupt?**

A recent post by the excellent Maths Careers website looked at how we can model volcanic eruptions mathematically. This is an important branch of mathematics – which looks to assign risk to events and these methods are very important to statisticians and insurers. Given that large-scale volcanic eruptions have the potential to end modern civilisation, it’s also useful to know how likely the next large eruption is.

The Guardian has recently run a piece on the dangers that large volcanoes pose to humans. Iceland’s Eyjafjallajökull volcano which erupted in 2010 caused over 100,000 flights to be grounded and cost the global economy over $1 billion – and yet this was only a very minor eruption historically speaking. For example, the Tombora eruption in Indonesia (1815) was so big that the explosion could be heard over 2000km away, and the 200 million tones of sulpher that were emitted spread across the globe, lowering global temperatures by 2 degrees Celsius. This led to widespread famine as crops failed – and tens of thousands of deaths.

**Super volcanoes**

Even this destruction is insignificant when compared to the potential damage caused by a super volcano. These volcanoes, like that underneath Yellowstone Park in America, have the potential to wipe-out millions in the initial explosion and and to send enough sulpher and ash into the air to cause a “volcanic winter” of significantly lower global temperatures. The graphic above shows that the ash from a Yellowstone eruption could cover the ground of about half the USA. The resultant widespread disruption to global food supplies and travel would be devastating.

So, how can we predict the probability of a volcanic eruption? The easiest model to use, if we already have an estimated probability of eruption is the Poisson distribution:

This formula calculates the probability that X equals a given value of k. λ is the mean of the distribution. If X represents the number of volcanic eruptions we have Pr(X ≥1) = 1 – Pr(x = 0). This gives us a formula for working out the probability of an eruption as 1 -e^{-λ}. For example, the Yellowstone super volcano erupts around every 600,000 years. Therefore if λ is the number of eruptions every year, we have λ = 1/600,000 ≈ 0.00000167 and 1 -e ^{-λ} also ≈ 0.00000167. This gets more interesting if we then look at the probability over a range of years. We can do this by modifying the formula for probability as 1 -e^{-tλ} where t is the number of years for our range.

So the probability of a Yellowstone eruption in the next 1000 years is 1 -e^{-0.00167} ≈ 0.00166, and the probability in the next 10,000 years is 1 -e^{-0.0167} ≈ 0.0164. So we have approximately a 2% chance of this eruption in the next 10,000 years.

A far smaller volcano, like Katla in Iceland has erupted 16 times in the past 1100 years – giving a average eruption every ≈ 70 years. This gives λ = 1/70 ≈ 0.014. So we can expect this to erupt in the next 10 years with probability 1 -e^{-0.14} ≈ 0.0139. And in the next 30 years with probability 1 -e^{-0.42} ≈ 0.34.

The models for volcanic eruptions can get a lot more complicated – especially as we often don’t know the accurate data to give us an estimate for the λ. λ can be estimated using a technique called Maximum Likelihood Estimation – which you can read about here.

If you enjoyed this post you might also like:

Black Swans and Civilisation Collapse. How effective is maths at guiding government policies?

**IB Revision**

If you’re already thinking about your coursework then it’s probably also time to start planning some revision, either for the end of Year 12 school exams or Year 13 final exams. There’s a really great website that I would strongly recommend students use – you choose your subject (HL/SL/Studies if your exam is in 2020 or Applications/Analysis if your exam is in 2021), and then have the following resources:

The Questionbank takes you to a breakdown of each main subject area (e.g. Algebra, Calculus etc) and each area then has a number of graded questions. What I like about this is that you are given a difficulty rating, as well as a mark scheme and also a worked video tutorial. Really useful!

The Practice Exams section takes you to ready made exams on each topic – again with worked solutions. This also has some harder exams for those students aiming for 6s and 7s and the Past IB Exams section takes you to full video worked solutions to every question on every past paper – and you can also get a prediction exam for the upcoming year.

I would really recommend everyone making use of this – there is a mixture of a lot of free content as well as premium content so have a look and see what you think.

**Are you Psychic?**

There have been people claiming to have paranormal powers for thousands of years. However, scientifically we can say that as yet we still have no convincing proof that any paranormal abilities exist. We can show this using some mathematical tests – such as the binomial or normal distribution.

**ESP Test **

You can test your ESP powers on this site (our probabilities will be a little different than their ones). You have the chance to try and predict what card the computer has chosen. After repeating this trial 25 times you can find out if you possess psychic powers. As we are working with discrete data and have a fixed probability of guessing (0.2) then we can use a binomial distribution. Say I got 6 correct, do I have psychic powers?

We have the Binomial model B(25, 0.2), 25 trials and 0.2 probability of success. So we want to find the probability that I could achieve 6 **or more** by luck.

The probability of getting exactly 6 right is 0.16. Working out the probability of getting 6 or more correct would take a bit longer by hand (though could be simplified by doing 1 – P(x ≤ 5). Doing this, or using a calculator we find the probability is 0.38. Therefore we would expect someone to get 6 or more correct just by guessing 38% of the time.

So, using this model, when would we have evidence for potential ESP ability? Well, a minimum bar for our percentages would probably be 5%. So how many do you need to get correct before there is less than a 5% of that happening by chance?

Using our calculator we can do trial and error to see that the probability of getting 9 or more correct by guessing is only 4.7%. So, someone getting 9 correct might be showing some signs of ESP. If we asked for a higher % threshold (such as 1%) we would want to see someone get 11 correct.

Now, in the video above, one of the Numberphile mathematicians manages to toss 10 heads in a row. Again, we can ask ourselves if this is evidence of some extraordinary ability. We can calculate this probability as 0.5^{10} = 0.001. This means that such an event would only happen 0.1% of the time. But, we’re only seeing a very small part of the total video. Here’s the full version:

Suddenly the feat looks less mathematically impressive (though still an impressive endurance feat!)

You can also test your psychic abilities with this video here.