If you are a teacher then please also visit my new site: intermathematics.com for over 2000+ pdf pages of resources for teaching IB maths!

Anscombe’s Quartet – the importance of graphs!

Anscombe’s Quartet was devised by the statistician Francis Anscombe to illustrate how important it is not to rely on statistical measures alone when analyzing data.  To do this he created 4 data sets which produce nearly identical statistical measures.  The scatter graphs above were generated by the Python code here.

Statistical measures

1) Mean of x values in each data set = 9.00
2) Standard deviation of x values in each data set  = 3.32
3) Mean of y values in each data set = 7.50
4) Standard deviation of y values in each data set  = 2.03
5) Pearson’s Correlation coefficient for each paired data set = 0.82
6) Linear regression line for each paired data set: y = 0.500x + 3.00

When looking at this data we would be forgiven for concluding that these data sets must be very similar – but really they are quite different.

Data Set A:

x = [10,8,13,9,11,14,6,4,12,7,5]

y = [8.04, 6.95,7.58,8.81,8.33, 9.96,7.24,4.26,10.84,4.82,5.68]

Data Set A does indeed fit a linear regression – so it would be appropriate to use the line of best fit for predictive purposes.

Data Set B:

x = [10,8,13,9,11,14,6,4,12,7,5]

y = [9.14,8.14,8.74,8.77,9.26,8.1,6.13,3.1,9.13,7.26,4.74]

You could fit a linear regression to Data Set B – but this is clearly not the most appropriate regression line for this data.  Some quadratic or higher power polynomial would be better for predicting data here.

Data Set C:

x = [10,8,13,9,11,14,6,4,12,7,5]

y = [7.46,6.77,12.74,7.11,7.81,8.84,6.08,5.39,8.15,6.42,5.73]

In Data Set C we can see the effect of a single outlier – we have 10 points in pretty much a perfect linear correlation, and then a single outlier.  For predictive purposes we would be best investigating this outlier (checking that it does conform to the mathematical definition of an outlier), and then potentially doing our regression with this removed.

Data Set D:

x = [8,8,8,8,8,8,8,19,8,8,8]

y = [6.58,5.76,7.71,8.84,8.47,7.04,5.25,12.50,5.56,7.91,6.89]

In Data Set D we can also see the effect of a single outlier – we have 10 points in a vertical line, and then a single outlier.  Clearly here again drawing a line of best fit for this data is not appropriate – unless we remove this outlier first.
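The statistical measures listed earlier can be checked directly from the four data sets.  The sketch below is one way to do this in plain Python; the function name `summary` and the dictionary layout are illustrative choices rather than the code used for the original graphs, and the standard deviations are sample standard deviations (dividing by n - 1), which is the convention that matches the quoted values.

```python
from math import sqrt

# Anscombe's four data sets, as listed in the post.
X_ABC = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATASETS = {
    "A": (X_ABC, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (X_ABC, [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]),
    "C": (X_ABC, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Return the summary measures for one paired data set."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "sd_x": sqrt(sxx / (n - 1)),   # sample standard deviation
        "mean_y": mean_y,
        "sd_y": sqrt(syy / (n - 1)),
        "r": sxy / sqrt(sxx * syy),    # Pearson's correlation coefficient
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (x, y) in DATASETS.items():
    s = summary(x, y)
    print(name, {key: round(value, 2) for key, value in s.items()})
```

All four rows of output agree to 2 decimal places, which is exactly why the scatter graphs are needed to tell the data sets apart.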

The moral of the story

So – the moral here is to always use graphical analysis alongside statistical measures.  A very common mistake for IB students is to rely on Pearson’s product-moment correlation coefficient without really looking at the scatter graph to decide whether a linear fit is appropriate.  If you do this then you could end up with a very low mark in the E category, as you will not show good understanding of what you are doing.  So always plot a graph first!

Essential Resources for IB Teachers

If you are a teacher then please also visit my new site.  This has been designed specifically for teachers of mathematics at international schools.  The content now includes over 2000 pages of pdf content for the entire SL and HL Analysis syllabus and also the SL Applications syllabus.  Some of the content includes:

1. Original pdf worksheets (with full worked solutions) designed to cover all the syllabus topics.  These make great homework sheets or in class worksheets – and are each designed to last between 40 minutes and 1 hour.
2. Original Paper 3 investigations (with full worked solutions) to develop investigative techniques and support both the exploration and the Paper 3 examination.
3. Over 150 pages of Coursework Guides to introduce students to the essentials behind getting an excellent mark on their exploration coursework.
4. A large number of enrichment activities such as treasure hunts, quizzes, investigations, Desmos explorations, Python coding and more – to engage IB learners in the course.

There is also a lot more.  I think this could save teachers 200+ hours of preparation time in delivering an IB maths course – so it should be well worth exploring!

Essential Resources for both IB teachers and IB students

I’ve put together a 168 page Super Exploration Guide to talk students and teachers through all aspects of producing an excellent coursework submission.  Students always make the same mistakes when doing their coursework – get the inside track from an IB moderator!  I have also made Paper 3 packs for HL Analysis and also Applications students to help prepare for their Paper 3 exams.  The Exploration Guides can be downloaded here and the Paper 3 Questions can be downloaded here.


Generating e through probability and hypercubes

This is a really beautiful solution to an interesting probability problem posed by fellow IB teacher Daniel Hwang, for which I’ve outlined a method for solving suggested by Ferenc Beleznay.  The problem is as follows:

On average, how many random real numbers from 0 to 1 (inclusive) are required for the sum to exceed 1?

1 number

Clearly if we choose only 1 number then we can’t exceed 1.

2 numbers

Here we imagine the 2 numbers we pick as x and y and therefore we can represent them as a coordinate pair.  The smallest pair is (0,0) and the largest pair is (1,1).  This means that the possible coordinates fit inside the unit square shown above.  We want to know for what coordinate pairs we have the inequality x + y > 1.  This can be rearranged to give y > 1-x.  The line y = 1-x is plotted and we can see that any coordinate points in the triangle BCD satisfy this inequality.  Therefore the probability of a random coordinate pair being in this triangle is 1/2.

3 numbers

This time we want to find the probability that we exceed 1 with our third number.  We can consider the numbers as x, y, z and therefore as 3D coordinates (x,y,z).  Since we are choosing a third number, we must already have x + y < 1.  We draw the line x + y = 1, which in 3D gives us a plane.  The volume in which our coordinate point must lie is the prism ABDEFG.

We now also add the constraint x+y+z >1.  This creates the plane as shown.  If our coordinate lies inside the pyramid ABDE then our coordinates will add to less than 1, outside this they will add to more than 1.

The volume of the pyramid ABDE = 1/3 (base area)(perpendicular height).

The volume of the prism ABDEFG =  (base area)(perpendicular height).

Given that they share the same perpendicular height and base area then precisely 1/3 of the available volume would give a coordinate point that adds to less than 1, and 2/3 of the available volume would give a coordinate point that adds to more than 1.

Therefore we have the following tree diagram:

Exceeds 1 with 2 numbers = 1/2

Does not exceed 1 with 2 numbers, exceeds 1 with 3 numbers = 1/2 x 2/3 = 1/3.

Does not exceed 1 with 2 numbers, does not exceed 1 with 3 numbers = 1/2 x 1/3 = 1/6.

4 numbers

If you have been following so far this is where things get interesting!  We can now imagine a 4 dimensional unit cube (image above from Wikipedia) and a 4D coordinate point (x,y,z,a).

Luckily all we care about is the ratio of the 4-D pyramid and the 4-D prism formed by our constraints x+y+z <1 and x+y+z+a >1.

We have the following formula to help:

The n-D volume of an n-D pyramid = 1/n (base)(perpendicular height).

Therefore:

The 4-D volume of a 4-D pyramid = 1/4 (base 3D volume)(perpendicular height).

The 4-D volume of the 4-D prism = (base 3D volume)(perpendicular height).

Given that the 2 shapes share the same base and perpendicular height, the hyper-pyramid occupies exactly 1/4 of the 4-D space of the hyper-prism.  So the probability of being inside this space is 1/4, and the probability of being outside it is 3/4.

We can now extend our tree diagram:

Does not exceed 1 with 2 numbers, does not exceed 1 with 3 numbers, exceeds with 4 numbers = 1/2 x 1/3 x 3/4 = 1/8

Does not exceed 1 with 2 numbers, does not exceed 1 with 3 numbers, does not exceed with 4 numbers = 1/2 x 1/3 x 1/4 = 1/24.

In general a hyper-pyramid in n dimensional space occupies exactly 1/n of the space of the hyper-prism – so we can now continue this tree diagram.

Expected value

We can make a table of probabilities to find how many numbers we expect to use in order to exceed one.

Which gives us the following expected value calculation:

Which we can rewrite as:

But we have:

Therefore this gives:

So on average we would need to pick e ≈ 2.718 numbers for the sum to exceed one! This is quite a remarkable result – e, one of the fundamental mathematical constants, has appeared as if by magic in a probability question utilizing hyper-dimensional shapes.
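The steps of the expected value calculation can be written out from the tree diagram probabilities.  In the sketch below N (a symbol introduced here for convenience) is the number of random numbers needed.  The probability of not having exceeded 1 after n numbers is 1/n!, as the values 1/2, 1/6 and 1/24 above suggest, so:

```latex
\begin{align*}
P(N = n) &= \frac{1}{(n-1)!} - \frac{1}{n!} = \frac{n-1}{n!}, \qquad n \ge 2 \\
E(N) &= \sum_{n=2}^{\infty} n \cdot \frac{n-1}{n!} = \sum_{n=2}^{\infty} \frac{1}{(n-2)!} \\
     &= \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \dots = e
\end{align*}
```

The first line reproduces the tree diagram values 1/2, 1/3 and 1/8 for n = 2, 3 and 4.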

Demonstrating this with Python

Running the Python code shown above will simulate doing this experiment.  The computer generates a “random” number, then another and carries on until the sum is greater than 1.  It then records how many numbers were required.  It then does this again 1 million times and finds the average from all the trials.

1 million simulations gives 2.7177797177797176.  When we compare this with the real answer for e, 2.7182818284590452353602874713527, we can see it has taken 1 million simulations to only be correct to 4sf.

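Even 5 million simulations only gives 2.7182589436517888, so whilst we can clearly see that we will eventually get e, it’s converging very slowly.  This is what we would expect from a simulation of this kind: the typical error in the average of n random trials only shrinks in proportion to 1/√n, so each extra decimal place of accuracy needs roughly 100 times as many trials.  The limitations of the random number generator (which is not truly random, and only chooses numbers to a maximum number of decimal places) matter far less than this.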
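A minimal sketch of the experiment described above is given here.  It is not the original program; the function names `numbers_needed` and `estimate_e` and the optional `seed` parameter are illustrative assumptions.

```python
import random


def numbers_needed(rng):
    """Draw random numbers in [0, 1) until the running sum exceeds 1.

    Returns how many numbers were drawn.
    """
    total, count = 0.0, 0
    while total <= 1:
        total += rng.random()
        count += 1
    return count


def estimate_e(trials=1_000_000, seed=None):
    """Average the number of draws needed over many independent trials."""
    rng = random.Random(seed)  # seed is optional, for repeatable runs
    return sum(numbers_needed(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(estimate_e())
```

Each run gives a slightly different estimate, so results will not match the figures quoted above digit for digit.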

I think this is a beautiful example of the unexpected nature of mathematics – we started out with a probability problem and ended up with e, via a detour into higher dimensional space!  We can also see the power of computers in doing these kinds of brute force calculations.


We can use computer coding to explore game strategies and also to help understand the underlying probability distribution functions.   Let’s start with a simple game where we toss a coin 4 times, stake 1 counter each toss and always call heads.  This would give us a binomial distribution with 4 trials and the probability of success fixed as 1/2.

Tossing a coin 4 times [simple strategy]

For example the only way of losing 4 counters is a 4 coin streak of T,T,T,T.  The probability of this happening is 1/16.  We can see from this distribution that the most likely outcome is 0 (i.e. no profit and no loss).  If we work out the expected value, E(X), by multiplying each profit/loss by its probability and summing the results we get E(X) = 0.  Therefore this is a fair game (we expect to make neither a profit nor a loss).

Tossing a coin 4 times [Martingale strategy]

This is a more complicated strategy which goes as follows:

a) You stake 1 counter on heads.
b) If you lose you stake 2 counters on heads.
c) If you lose again you stake 4 counters on heads.
d) If you lose again you stake 8 counters on heads.

If you win, then your next stake always goes back to 1 counter.

For example for the sequence: H,H,T,T

First you bet 1 counter on heads.  You win 1 counter
Next you bet 1 counter on heads.  You win 1 counter
Next you bet 1 counter on heads.  You lose 1 counter
Next you bet 2 counters on heads.  You lose 2 counters

[overall loss is 1 counter]

For example for the sequence: T,T,T,H

First you bet 1 counter on heads.  You lose 1 counter
Next you bet 2 counters on heads.  You lose 2 counters
Next you bet 4 counters on heads.  You lose 4 counters
Next you bet 8 counters on heads.  You win 8 counters

[overall profit is 1 counter]

This leads to the following probabilities:

Once again we will have E(X) = 0, but a very different distribution to the simple 4 coin toss.  We can see we have an 11/16 chance of making a profit after 4 coins – but the small chance of catastrophic loss (15 counters) means that the overall expectation is still zero.
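Both 4 coin distributions can be checked by listing all 16 equally likely sequences of heads and tails.  The sketch below does this with exact fractions; the function names are illustrative and this is not the code used to produce the original tables.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def simple_profit(tosses):
    """Stake 1 counter on heads every toss."""
    return sum(1 if t == "H" else -1 for t in tosses)


def martingale_profit(tosses):
    """Double the stake after each loss, return to 1 counter after a win."""
    stake, profit = 1, 0
    for t in tosses:
        if t == "H":
            profit += stake
            stake = 1
        else:
            profit -= stake
            stake *= 2
    return profit


def distribution(profit_fn, n_tosses=4):
    """Map each possible profit to its exact probability."""
    sequences = list(product("HT", repeat=n_tosses))
    counts = Counter(profit_fn(seq) for seq in sequences)
    return {p: Fraction(c, len(sequences)) for p, c in sorted(counts.items())}


def expected_value(dist):
    return sum(profit * prob for profit, prob in dist.items())


for name, fn in [("simple", simple_profit), ("martingale", martingale_profit)]:
    dist = distribution(fn)
    print(name, {p: str(q) for p, q in dist.items()}, "E(X) =", expected_value(dist))
```

For the Martingale strategy the probabilities attached to positive profits add up to 11/16, and the single 1/16 chance of losing 15 counters is what brings the expected value back to zero.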

Iterated Martingale:

Here we can do a computer simulation.  This is the scenario this time:

We start with 100 counters and toss a coin a maximum of 3 times.  We then define a completed round as when we get to a shaded box.  We then repeat this process through 999 rounds, and model what happens.  Here I used a Python program to simulate a player using this strategy.

We can see that we have periods of linear growth followed by steep falls – which is a very familiar pattern across many investment types.  We can see that the initial starting 100 counters was built up to around 120 at the peak, but was closer to just 40 when we finished the simulation.
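A simulation along these lines can be sketched as follows.  This is not the original program: it assumes that a round ends either at the first winning toss or after 3 losing tosses in a row, that play stops if the balance reaches zero or below, and the names `play_round` and `simulate` are illustrative.  The `multiplier` parameter controls how much the stake grows after each loss (2 for the standard Martingale).

```python
import random


def play_round(multiplier=2.0, max_tosses=3, rng=random):
    """Play one round and return the profit (positive) or loss (negative).

    Assumption: the round ends at the first win, or after max_tosses losses.
    """
    stake, profit = 1.0, 0.0
    for _ in range(max_tosses):
        if rng.random() < 0.5:      # heads: win the current stake
            return profit + stake
        profit -= stake             # tails: lose the stake and raise it
        stake *= multiplier
    return profit


def simulate(rounds=999, start=100.0, multiplier=2.0, seed=None):
    """Return the balance after each round, starting with the initial balance."""
    rng = random.Random(seed)
    balance = start
    history = [balance]
    for _ in range(rounds):
        balance += play_round(multiplier, rng=rng)
        history.append(balance)
        if balance <= 0:            # assumption: bankrupt players stop playing
            break
    return history


if __name__ == "__main__":
    history = simulate()
    print("final balance:", history[-1], "after", len(history) - 1, "rounds")
```

With a multiplier of 2 every round either gains 1 counter or loses 7, which gives the pattern of steady growth interrupted by steep falls.  The list returned by `simulate` can be pasted into a graphing tool to plot the balance over time.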

Let’s do another simulation to see what happens this time:

Here we can see that the 2nd player was actually performing significantly worse after around 600 rounds, but this time ended up with a finishing total of around 130 counters.

Changing the multiplier

We can also see what happens when rather than doubling stakes on losses we follow some other multiple.  For example we might choose to multiply our stake by 5.  This leads to much greater volatility as we can see below:

Multiplier x5

Here we have 2 very different outcomes for 2 players using the same model.  Player 1 (in blue) may believe they have found a sure-fire method of making huge profits, but player 2 (green) went bankrupt after around 600 rounds.

Multiplier x1.11

Here we can see that if the multiplier is close to 1 we have much less volatility (as you would expect because your maximum losses per round are much smaller).

We can run the simulation across 5000 rounds – and here we can see that we have big winning and losing streaks, but that over the long run the account value oscillates around the starting value of 100 counters.

Forex charts

We can see similar graphs when we look at forex (currency exchange) charts.  For example:

In this graph (from here) we plot the exchange rate between the US dollar and Thai Baht.  We can see the same sort of graph movements – with runs of gains and losses leading to a similar jagged shape.  This is not surprising as forex trades can also be thought of in terms of binary outcomes like tossing a coin, and indeed huge amounts of forex trading are done through computer programs, some of which do use the Martingale system as a basis.

The effect of commission on the model

So, to finish off we can modify our system slightly so that we try to replicate forex trading.  We will follow the same model as before, but this time we have to pay a very small commission for every trade we make.  This now gives us:

E(X) = -0.000175. (0.0001 counters commission per trade)

E(X) = -0.00035. (0.0002 counters commission per trade)

Even though E(X) is only very slightly negative, this means that in the long run we would expect to lose money.  With the 0.0002 counters commission we would expect to lose around 17.5 counters over 50,000 rounds (0.00035 x 50,000).  The simulation graph above was run with 0.0002 counters commission – and in this case it led to bankruptcy before 3000 rounds.
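The expected values quoted above can be reproduced exactly.  The sketch below assumes a flat commission is paid on every toss that is actually played, and that a round ends at the first win or after 3 losses; the function name `expected_round_value` is illustrative.

```python
from fractions import Fraction


def expected_round_value(commission, multiplier=2, max_tosses=3):
    """Exact expected profit of one round, with a flat commission per toss."""
    commission = Fraction(commission)
    expected = Fraction(0)
    stake, lost_so_far = Fraction(1), Fraction(0)
    for toss in range(1, max_tosses + 1):
        # Probability that the first win comes on this toss.
        p_win_here = Fraction(1, 2) ** toss
        expected += p_win_here * (stake - lost_so_far - toss * commission)
        lost_so_far += stake
        stake *= multiplier
    # Every toss in the round was lost.
    p_all_lost = Fraction(1, 2) ** max_tosses
    expected += p_all_lost * (-lost_so_far - max_tosses * commission)
    return expected


for c in ("0", "0.0001", "0.0002"):
    print(c, float(expected_round_value(c)))
```

The bets themselves are fair, so the whole of the negative expectation comes from the commission: on average 1 + 1/2 + 1/4 = 1.75 tosses are played per round, giving E(X) = -1.75 x commission.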

Computer code

The Python code above can be used to generate data which can then be copied into Desmos.  The above code simulates 1 player playing 999 rounds, starting with 100 counters, with a multiplier of 5.   If you know a little bit about coding you can try and play with this yourselves!

I’ve also just added a version of this code onto repl.  You can run this code – and also generate the graph direct (click on the graph png after running).  It creates some beautiful images like that shown above.


Using Maths to model the spread of Coronavirus (COVID-19)

This coronavirus is the latest virus to warrant global fears over a disease pandemic.  Throughout history we have seen pandemic diseases such as the Black Death in Middle Ages Europe and the Spanish Flu at the beginning of the 20th century. More recently we have seen HIV responsible for millions of deaths.  In the last few years there have been scares over bird flu and SARS – yet neither fully developed into a major global health problem.  So, how contagious is COVID-19, and how can we use mathematics to predict its spread?

Modelling disease outbreaks with real accuracy is an incredibly important job for mathematicians and all countries employ medical statisticians for this job.  Understanding how diseases spread and how fast they can spread through populations is essential to developing effective medical strategies to minimise deaths.  If you want to save lives maybe you should become a mathematician rather than a doctor!

Currently scientists know relatively little about the new virus – but they do know that it belongs to the same coronavirus family as SARS and MERS, which can both cause serious respiratory problems.  Scientists are particularly interested in trying to discover how infectious the virus is, how long a person remains contagious, and whether people can be contagious before they show any symptoms.

In the case of COVID-19 we have the following early estimated values: [From a paper published by medical statisticians in the UK on January 24]

R0 between 3.6 and 4.  This is defined as how many people an infectious person will pass on their infection to in a totally susceptible population.  The higher the R0 value the more quickly an infection will spread.  By comparison seasonal flu has an R0 value around 2.8.

Total number infected by January 21:  prediction interval 9,217–14,245.  Of these an estimated 3,050–4,017 currently with the virus and the others recovered (or died).  This is based on an estimation that only around 5% of cases have been diagnosed.  By February 4th they predict 132,751–273,649 will be infected.

Transmission rate β estimated at 1.07.  β represents the transmission rate per day – so on average an infected person will infect another 1.07 people a day.

Infectious period estimated at 3.6 days.  We can therefore calculate μ (the per capita recovery rate) by μ = 1/(3.6).  This tells us how quickly people will be removed from the population (either recovered and become immune, or died).

SIR Model

The basic model is based on the SIR model.  The SIR model looks at how much of the population is susceptible to infection (S), how many of these go on to become infectious (I), and how many of these are removed (R) from the population being considered (i.e they either recover and thus won’t catch the virus again, or die).

The Guardian datablog have an excellent graphic to show the contagiousness relative to deadliness of different diseases [click to enlarge, or follow the link].  We can see that seasonal flu has an R0 value of around 2.8 and a fatality rate of around 0.1%, whereas measles has an R0 value of around 15 and a fatality rate of around 0.3%.  This means that measles is much more contagious than seasonal flu.

You can notice that we have nothing in the top right hand corner (very deadly and very contagious). This is just as well as that could be enough to seriously dent the human population. Most diseases we worry about fall into 2 categories – contagious and not very deadly or not very contagious and deadly.

The equations above represent a SIR (susceptible, infectious, removed) model which can be used to model the spread of diseases like flu.

dS/dt represents the rate of change of those who are susceptible to the illness with respect to time.  dI/dt represents the rate of change of those who are infected with respect to time.  dR/dt represents the rate of change of those who have been removed with respect to time (either recovered or died).

For example, if dI/dt is high then the number of infected people is rapidly increasing.  When dI/dt is zero there is no change in the number of infected people (the number of infections remains steady).  When dI/dt is negative the number of infected people is decreasing.
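For reference, the general form of the three SIR equations, consistent with the numerical versions used in the COVID-19 section of this post (β is the transmission rate, μ the per capita recovery rate and N the total population), can be written as:

```latex
\begin{align*}
\frac{dS}{dt} &= -\frac{\beta S I}{N} \\
\frac{dI}{dt} &= \frac{\beta S I}{N} - \mu I \\
\frac{dR}{dt} &= \mu I
\end{align*}
```

The three right hand sides add up to zero, so the total S + I + R stays constant: people only move between the three groups.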

Modelling for COVID-19

N is the total population.  Let’s take the population of Wuhan as 11 million.

μ is the per capita recovery rate (calculated by μ = 1/(duration of illness)).  We have μ = 1/3.6 = 5/18.

β is the transmission rate, which we take as approximately 1.07.

Therefore our 3 equations for rates of change become:

dS/dt = -1.07 S I /11,000,000

dI/dt = 1.07 S I /11,000,000 – 5/18 I

dR/dt = 5/18 I

Unfortunately these equations are very difficult to solve – but luckily we can use a computer program  or spreadsheet to plot what happens.   We need to assign starting values for S, I and R – the numbers of people susceptible, infectious and removed.  With the following values for January 21: S = 11,000,000, I = 3500, R = 8200, β = 1.07, μ = 5/18, I designed the following Excel spreadsheet (instructions on what formula to use here):

This gives a prediction that around 3.9 million people will be infected within 2 weeks!  We can see that the SIR model that we have used is quite simplistic (and gives a significantly different result to the expert prediction of around 200,000 infected).

So, we can try and make things more realistic by adding some real life considerations.  The current value of β (the transmission rate) is 1.07, i.e an infected person will infect another 1.07 people each day.  We can significantly reduce this if we expect that infected people are quarantined effectively so that they do not interact with other members of the public, and indeed if people who are not sick avoid going outside.  So, if we take β as (say) 0.6 instead we get the following table:

Here we can see that this change to β has had a dramatic effect on our model.  Now we are predicting around 129,000 infected after 14 days – which is much more in line with the estimate in the paper above.
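The same calculation can be sketched in Python instead of a spreadsheet.  The version below is an assumption about how the table is built rather than a copy of it: it steps the three equations forward one day at a time (Euler's method with a step of 1 day) from the January 21 starting values, and the function name `simulate_sir` and its parameter names are illustrative.

```python
def simulate_sir(beta=1.07, mu=1 / 3.6, population=11_000_000,
                 s_start=11_000_000, i_start=3_500, r_start=8_200,
                 steps=13, dt=1.0):
    """Step the SIR equations forward with Euler's method.

    Returns a list of (day, S, I, R) rows, starting with day 0.
    """
    s, i, r = float(s_start), float(i_start), float(r_start)
    rows = [(0, s, i, r)]
    for day in range(1, steps + 1):
        new_infections = beta * s * i / population * dt   # -dS/dt * dt
        new_removals = mu * i * dt                        # dR/dt * dt
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        rows.append((day, s, i, r))
    return rows


for beta in (1.07, 0.6):
    day, s, i, r = simulate_sir(beta=beta)[-1]
    print(f"beta = {beta}: after {day} daily steps I is about {i:,.0f}")
```

With 13 daily steps (14 rows, counting the starting day) this should give figures close to those quoted above for both values of β.  A smaller `dt` with correspondingly more steps gives a smoother approximation, and somewhat different numbers.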

As we are seeing exponential growth in the spread, small changes to the parameters will have very large effects.  There are more sophisticated SIR models which can then be used to better understand the spread of a disease.  Nevertheless we can see clearly from the spreadsheet the interplay between susceptible, infected and recovered which is the foundation for understanding the spread of viruses like COVID-19.

[Edited in March to use the newly designated name COVID-19]

Simulating a Football Season

This is a nice example of how statistics are used in modeling – similar techniques are used when gambling companies are creating odds or when computer game designers are making football manager games.  We start with some statistics.  The soccer stats site has the data we need from the 2018-19 season, and we will use this to predict the outcome of the 2019-20 season (assuming teams stay at a similar level, and that no-one was relegated in 2018-19).

Attack and defense strength

For each team we need to calculate:

1. Home attack strength
2. Away attack strength
3. Home defense strength
4. Away defense strength.

For example for Liverpool (LFC)

LFC Home attack strength = (LFC home goals in 2018-19 season)/(average home goals in 2018-19 season)

LFC Away attack strength = (LFC away goals in 2018-19 season)/(average away goals in 2018-19 season)

LFC Home defense strength = (LFC home goals conceded in 2018-19 season)/(average home goals conceded in 2018-19 season)

LFC Away defense strength = (LFC away goals conceded in 2018-19 season)/(average away goals conceded in 2018-19 season)

Calculating lambda

We can then use a Poisson model to work out some probabilities.  First though we need to find our lambda value.  To make life easier we can also use the fact that the lambda value for a Poisson gives the mean value – and use this to give an approximate answer.

So, for example if Liverpool are playing at home to Arsenal we work out Liverpool’s lambda value as:

LFC home lambda = league average home goals per game x LFC home attack strength x Arsenal away defense strength.

We would work out Arsenal’s away lambda as:

Arsenal away lambda = league average away goals per game x Arsenal away attack strength x Liverpool home defense strength.

Putting in some values gives a home lambda for Liverpool as 3.38 and an away lambda for Arsenal as 0.68.  So we would expect Liverpool to win 3-1 (rounding to the nearest integer).
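As a quick check, the two lambda values can be computed from the figures quoted further on in this post (league averages of 1.57 home goals and 1.25 away goals per game, and the corresponding attack and defense strengths).  The function name `expected_goals` is an illustrative choice.

```python
def expected_goals(league_average, attack_strength, defense_strength):
    """Lambda for one team in one fixture: the mean number of goals expected."""
    return league_average * attack_strength * defense_strength


# Values quoted in the post for Liverpool (home) against Arsenal (away).
lfc_home_lambda = expected_goals(1.57, 1.84, 1.17)
arsenal_away_lambda = expected_goals(1.25, 1.30, 0.42)

print(round(lfc_home_lambda, 2), round(arsenal_away_lambda, 2))
print(f"Predicted score: {round(lfc_home_lambda)}-{round(arsenal_away_lambda)}")
```

Rounding each lambda to the nearest integer gives the predicted scoreline for the fixture.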

Using Excel

I then used an Excel spreadsheet to work out the home goals in each fixture in the league season (green column represents the home teams)

and then used the same method to work out the away goals in each fixture in the league (yellow column represents the away team)

I could then round these numbers to the nearest integer and fill in the scores for each match in the table:

Then I was able to work out the point totals to produce a predicted table:

Here we had both Liverpool and Manchester City on 104 points, but with Manchester City having a better goal difference, so winning the league again.

Using a Poisson model.

The Poisson model allows us to calculate probabilities.  The model is:

P(k goals) = $\frac{e^{-\lambda}\lambda^{k}}{k!}$

λ is the symbol lambda, which we calculated before.

So, for example with Liverpool at home to Arsenal we calculate

Liverpool’s home lambda = league average home goals per game x LFC home attack strength x Arsenal away defense strength.

Liverpool’s home lambda = 1.57 x 1.84 x 1.17 = 3.38

Therefore

P(Liverpool score 0 goals) = $\frac{e^{-3.38} \times 3.38^{0}}{0!}$ = 0.034

P(Liverpool score 1 goal) = $\frac{e^{-3.38} \times 3.38^{1}}{1!}$ = 0.12

P(Liverpool score 2 goals) = $\frac{e^{-3.38} \times 3.38^{2}}{2!}$ = 0.19

P(Liverpool score 3 goals) = $\frac{e^{-3.38} \times 3.38^{3}}{3!}$ = 0.22

P(Liverpool score 4 goals) = $\frac{e^{-3.38} \times 3.38^{4}}{4!}$ = 0.19

P(Liverpool score 5 goals) = $\frac{e^{-3.38} \times 3.38^{5}}{5!}$ = 0.13 etc.

Arsenal’s away lambda = 1.25 x 1.30 x 0.42 = 0.68

P(Arsenal score 0 goals) = $\frac{e^{-0.68} \times 0.68^{0}}{0!}$ = 0.51

P(Arsenal score 1 goal) = $\frac{e^{-0.68} \times 0.68^{1}}{1!}$ = 0.34

P(Arsenal score 2 goals) = $\frac{e^{-0.68} \times 0.68^{2}}{2!}$ = 0.12

P(Arsenal score 3 goals) = $\frac{e^{-0.68} \times 0.68^{3}}{3!}$ = 0.03 etc.

Probability that Arsenal win

Arsenal can win if:

Liverpool score 0 goals and Arsenal score 1 or more

Liverpool score 1 goal and Arsenal score 2 or more

Liverpool score 2 goals and Arsenal score 3 or more etc.

i.e the approximate probability of Arsenal winning is:

0.034 x 0.49 + 0.12 x 0.15 + 0.19 x 0.03 = 0.04.
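The same calculation can be extended to every scoreline with a short program.  The sketch below assumes, as the calculation above does, that the two teams' goal counts are independent, and it ignores scorelines above 15 goals for either team (their combined probability is negligible for these lambda values).  The function names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(exactly k goals) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def match_probabilities(home_lambda, away_lambda, max_goals=15):
    """Return (P(home win), P(draw), P(away win)), assuming independent scores."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_lambda) * poisson_pmf(a, away_lambda)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


home, draw, away = match_probabilities(3.38, 0.68)
print(f"Liverpool win {home:.3f}, draw {draw:.3f}, Arsenal win {away:.3f}")
```

The Arsenal win probability from this full sum is slightly larger than the three-term approximation above, because it also includes the less likely scorelines such as 3-4.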

Using the same method we could work out the probability of a draw and a Liverpool win.  This is the sort of method that bookmakers will use to calculate the probabilities that ensure they make a profit when offering odds.

Essential Resources for IB Teachers

If you are a teacher then please also visit my new site.  This has been designed specifically for teachers of mathematics at international schools.  The content now includes over 2000 pages of pdf content for the entire SL and HL Analysis syllabus and also the SL Applications syllabus.  Some of the content includes:

1. Original pdf worksheets (with full worked solutions) designed to cover all the syllabus topics.  These make great homework sheets or in class worksheets – and are each designed to last between 40 minutes and 1 hour.
2. Original Paper 3 investigations (with full worked solutions) to develop investigative techniques and support both the exploration and the Paper 3 examination.
3. Over 150 pages of Coursework Guides to introduce students to the essentials behind getting an excellent mark on their exploration coursework.
4. A large number of enrichment activities such as treasure hunts, quizzes, investigations, Desmos explorations, Python coding and more – to engage IB learners in the course.

There is also a lot more.  I think this could save teachers 200+ hours of preparation time in delivering an IB maths course – so it should be well worth exploring!

Essential Resources for both IB teachers and IB students

I’ve put together a 168 page Super Exploration Guide to talk students and teachers through all aspects of producing an excellent coursework submission.  Students always make the same mistakes when doing their coursework – get the inside track from an IB moderator!  I have also made Paper 3 packs for HL Analysis and also Applications students to help prepare for their Paper 3 exams.  The Exploration Guides can be downloaded here and the Paper 3 Questions can be downloaded here.

Quantum Mechanics – Statistical Universe

Quantum mechanics is the name for the mathematics that can describe physical systems on extremely small scales.  When we deal with the macroscopic – i.e scales that we experience in our everyday physical world, then Newtonian mechanics works just fine.  However on the microscopic level of particles, Newtonian mechanics no longer works – hence the need for quantum mechanics.

Quantum mechanics is both very complicated and very weird – I’m going to try and give a very simplified (though not simple!) example of how probabilities are at the heart of quantum mechanics.  Rather than speaking with certainty about the property of an object as we can in classical mechanics, we need to talk about the probability that it holds such a property.

For example, one property of particles is spin.  We can create a particle with the property of either up spin or down spin.  We can visualise this as an arrow pointing up or down:

We can then create an apparatus (say the slit below parallel to the z axis) which measures whether the particle is in either up state or down state.  If the particle is in up spin then it will return a value of +1 and if it is in down spin then it will return a value of -1.

So far so normal.  But here is where things get weird.  If we then rotate the slit 90 degrees clockwise so that it is parallel to the x axis, we would expect  from classical mechanics to get a reading of 0.  i.e the “arrow” will not fit through the slit.  However that is not what happens.  Instead we will still get readings of -1 or +1.  However if we run the experiment a large number of times we find that the mean average reading will indeed be 0!

What has happened is that the act of measuring the particle with the slit has changed the state of the particle.  Say it was previously +1, i.e. in up spin; by measuring it with the newly rotated slit we have forced the particle into a new state of either pointing right (right spin) or pointing left (left spin).  Our rotated slit will then return a value of +1 if the particle is in right spin, and will return a value of -1 if the particle is in left spin.

In this case the probability that the apparatus will return a value of +1 is 50% and the probability that the apparatus will return a value of -1 is also 50%.  Therefore when we run this experiment many times we get the average value of 0.  Therefore classical mechanics is achieved as a probabilistic approximation of repeated particle interactions.

We can look at a slightly more complicated example – say we don’t rotate the slit 90 degrees, but instead rotate it an arbitrary number of degrees from the z axis as pictured below:

Here the slit was initially parallel to the z axis in the x,z plane (i.e. y = 0), and has been rotated Θ degrees.  So the question is what is the probability that our previously up spin particle will return a value of +1 when measured through this new slit?

The equations above give the probabilities of returning a +1 spin or a -1 spin depending on the angle of orientation.  So in the case of a 90 degree orientation we have both P(+1) and P(-1) = 1/2 as we stated earlier.  An orientation of 45 degrees would have P(+1) = 0.85 and P(-1) = 0.15.  An orientation of 10 degrees would have P(+1) = 0.99 and P(-1) = 0.01.

The statistical average meanwhile is given by the above formula.  If we rotate the slit by Θ degrees from the z axis in the x,z plane, then run the experiment many times, we will get a long term average of cosΘ.  As we have seen before, when Θ = 90 this means we get an average value of 0.  If Θ = 45 degrees we would get an average reading of √2/2.
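
The formulas referred to in this section were images.  The sketch below assumes they are the standard results for a spin-1/2 particle, P(+1) = cos²(Θ/2) and P(-1) = sin²(Θ/2), which reproduce the values quoted for 90, 45 and 10 degrees.  The function names are only for illustration.

```python
from math import cos, sin, radians

def spin_probabilities(theta_degrees):
    """Return (P(+1), P(-1)) for a slit rotated theta degrees from the z axis.

    Assumes the standard spin-1/2 result: cos^2(theta/2) and sin^2(theta/2).
    """
    half_angle = radians(theta_degrees) / 2
    return cos(half_angle) ** 2, sin(half_angle) ** 2

def expected_reading(theta_degrees):
    """Long-run average reading: (+1) * P(+1) + (-1) * P(-1)."""
    p_plus, p_minus = spin_probabilities(theta_degrees)
    return p_plus - p_minus

for theta in (90, 45, 10):
    p_plus, p_minus = spin_probabilities(theta)
    print(f"theta = {theta:>2}: P(+1) = {p_plus:.2f}, P(-1) = {p_minus:.2f}, "
          f"average = {expected_reading(theta):.3f}")
```

The average works out to cos²(Θ/2) - sin²(Θ/2), which is the double angle identity for cosΘ, so the two descriptions agree.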

This gives a very small snapshot into the ideas of quantum mechanics and the crucial role that probability plays in understanding quantum states.  If you found that difficult, then don’t worry you’re in good company.  As Richard Feynman the legendary physicist once said, “If you think you understand quantum mechanics, you don’t understand quantum mechanics.”

Essential resources for IB students:

Revision Village has been put together to help IB students with topic revision both for during the course and for the end of Year 12 school exams and Year 13 final exams.  I would strongly recommend students use this as a resource during the course (not just for final revision in Y13!) There are specific resources for HL and SL students for both Analysis and Applications.

The comprehensive Questionbank takes you to a breakdown of each main subject area (e.g. Algebra, Calculus etc) and then provides a large bank of graded questions.  What I like about this is that you are given a difficulty rating, as well as a mark scheme and also a worked video tutorial.  Really useful!

The Practice Exams section takes you to a large number of ready made quizzes, exams and predicted papers.   These all have worked solutions and allow you to focus on specific topics or start general revision.  This also has some excellent challenging questions for those students aiming for 6s and 7s.

Predicting the UK election using linear regression

The above data is the latest opinion poll data from the Guardian.  The UK will have (another) general election on June 8th.  So can we use the current opinion poll data to predict the outcome?

Longer term data trends

Let’s start by looking at the longer term trend following the aftermath of the Brexit vote on June 23rd 2016.  I’ll plot some points for Labour and the Conservatives and see what kind of linear regression we get.  To keep things simple I’ve looked at randomly chosen poll data approximately every 2 weeks – assigning 0 to July 1st 2016, 1 to mid July, 2 to August 1st etc.  This has then been plotted using the fantastic Desmos.

Labour

You can see that this is not a very good fit – it’s a very weak correlation.  Nevertheless let’s see what we would get if we used this regression line to predict the outcome in June.  With the x axis scale I’ve chosen, mid June 2017 equates to 23 on the x axis.  Therefore we predict the percentage as

y = -0.130(23) + 30.2

y  = 27%

Clearly this would be a disaster for Labour – but our model is not especially accurate so perhaps nothing to worry about just yet.

Conservatives

As with Labour we have a weak correlation – though this time we have a positive rather than negative correlation.  If we use our regression model we get a prediction of:

y = 0.242(23) + 38.7

y = 44%

So, we are predicting a crushing victory for the Conservatives – but could we get some more accurate models to base this prediction on?

Using moving averages

The Guardian’s poll tracker at the top of the page uses moving averages to smooth out poll fluctuations between different polls and to arrive at an averaged poll figure.  Using this provides a stronger correlation:

Labour

This model doesn’t take into account a (possible) late surge in support for Labour but does fit better than our last graph.  Using the equation we get:

y = -0.0764(23) + 28.8

y = 27%

Conservatives

We can have more confidence in using this regression line to predict the election.  Putting in the numbers we get:

y = 0.411(23) + 36.48

y = 46%
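
The four predictions can be reproduced with a short script.  This is just a sketch that plugs x = 23 into the regression lines quoted in this post; the dictionary keys and the `predict` helper are my own naming.

```python
# (gradient, intercept) pairs taken from the regression lines in the post
regression_lines = {
    "Labour (individual polls)": (-0.130, 30.2),
    "Conservatives (individual polls)": (0.242, 38.7),
    "Labour (moving average)": (-0.0764, 28.8),
    "Conservatives (moving average)": (0.411, 36.48),
}

ELECTION_X = 23  # mid June 2017 on the chosen x axis scale

def predict(gradient, intercept, x):
    """Evaluate the regression line y = gradient * x + intercept."""
    return gradient * x + intercept

for label, (gradient, intercept) in regression_lines.items():
    share = predict(gradient, intercept, ELECTION_X)
    print(f"{label}: {share:.1f}%")
```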

Conclusion

Our more accurate models merely confirm what we found earlier – and indeed what all the pollsters are predicting – a massive win for the Conservatives.  Even allowing for a late narrowing of the polls the Conservatives could be on target for winning by over 10 percentage points – which would result in a very large majority.  Let’s see what happens!

We can model radioactive decay of atoms using the following equation:

$N(t) = N_0 e^{-\lambda t}$

Where:

$N_0$: is the initial quantity of the element

λ: is the radioactive decay constant

t: is time

N(t): is the quantity of the element remaining after time t.

So, for Carbon-14 which has a half life of 5730 years (this means that after 5730 years exactly half of the initial amount of Carbon-14 atoms will have decayed) we can calculate the decay constant λ.

After 5730 years, N(5730) will be exactly half of $N_0$, therefore we can write the following:

$N(5730) = 0.5N_0 = N_0 e^{-5730\lambda}$

therefore:

$0.5 = e^{-5730\lambda}$

and if we take the natural log of both sides and rearrange we get:

$\lambda = \frac{\ln(1/2)}{-5730}$

$\lambda \approx 0.000121$

We can now use this to solve problems involving Carbon-14 (which is used in Carbon-dating techniques to find out how old things are).

eg.  You find an old parchment and after measuring the Carbon-14 content you find that it is just 30% of what a new piece of paper would contain.  How old is this paper?

We have

$N(t) = N_0 e^{-0.000121t}$

$N(t)/N_0 = e^{-0.000121t}$

$0.30 = e^{-0.000121t}$

$t = \ln(0.30)/(-0.000121)$

t ≈ 9950 years old.
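
The same steps can be checked in Python.  This is a minimal sketch of the calculation above; the function name is hypothetical, and the small difference between the two printed ages comes only from rounding the decay constant to 0.000121.

```python
from math import log

HALF_LIFE_YEARS = 5730  # half life of Carbon-14

# Decay constant from 0.5 = e^(-lambda * half life)
decay_constant = log(2) / HALF_LIFE_YEARS

def age_from_fraction(fraction_remaining, lam=decay_constant):
    """Solve fraction = e^(-lam * t) for t, the age in years."""
    return -log(fraction_remaining) / lam

print(f"decay constant: {decay_constant:.6f}")
print(f"age, unrounded decay constant: {age_from_fraction(0.30):.0f} years")
print(f"age, decay constant 0.000121:  {age_from_fraction(0.30, 0.000121):.0f} years")
```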

Probability density functions

We can also do some interesting maths by rearranging:

$N(t) = N_0 e^{-\lambda t}$

$N(t)/N_0 = e^{-\lambda t}$

and then plotting $N(t)/N_0$ against time.

$N(t)/N_0$ will have a range between 0 and 1, as when t = 0 we have $N(0) = N_0$, which gives $N(0)/N_0 = 1$.

We can then manipulate this into the form of a probability density function – by finding the constant a which makes the area underneath the curve equal to 1.

solving this gives a = λ.  Therefore the following integral:

will give the fraction of atoms which will have decayed between times t1 and t2.

We could use this integral to work out the half life of Carbon-14 as follows:

Which if we solve gives us t = 5728.5 which is what we’d expect (given our earlier rounding of the decay constant).

We can also now work out the expected (mean) time that an atom will exist before it decays.  To do this we use the following equation for finding E(x) of a probability density function:

and if we substitute in our equation we get:

Now, we can integrate this by parts:

So the expected (mean) life of an atom is given by 1/λ.  In the case of Carbon, with a decay constant λ ≈0.000121 we have an expected life of a Carbon-14 atom as:

E(t) = 1 /0.000121

E(t) = 8264 years.

Now that may sound a little strange – after all the half life is 5730 years, which means that half of all atoms will have decayed after 5730 years.  So why is the mean life so much higher?  Well it’s because of the long right tail in the graph – we will have some atoms with very large lifespans – and this will therefore skew the mean to the right.
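
The integrals in this section were images.  As a sketch of the working, and assuming the probability density function has the form $f(t) = a e^{-\lambda t}$ for t ≥ 0, the steps described above can be written out as follows (T here is my own notation for the lifetime of a single atom):

```latex
\begin{align*}
\int_0^\infty a e^{-\lambda t}\,dt
  &= \left[-\frac{a}{\lambda} e^{-\lambda t}\right]_0^\infty
   = \frac{a}{\lambda} = 1
   \quad\Rightarrow\quad a = \lambda \\[4pt]
P(t_1 \le T \le t_2)
  &= \int_{t_1}^{t_2} \lambda e^{-\lambda t}\,dt
   = e^{-\lambda t_1} - e^{-\lambda t_2} \\[4pt]
\int_0^{t} \lambda e^{-\lambda s}\,ds
  &= 1 - e^{-\lambda t} = 0.5
   \quad\Rightarrow\quad t = \frac{\ln 2}{\lambda}
   = \frac{\ln 2}{0.000121} \approx 5728.5 \\[4pt]
E(T)
  &= \int_0^\infty t\,\lambda e^{-\lambda t}\,dt
   = \left[-t e^{-\lambda t}\right]_0^\infty + \int_0^\infty e^{-\lambda t}\,dt
   = 0 + \frac{1}{\lambda}
\end{align*}
```

The third line is the median lifetime (the half life) and the fourth is the mean lifetime, so the mean is larger than the median by a factor of 1/ln 2 ≈ 1.44, which matches 8264 against 5728.5.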

Modeling Volcanoes – When will they erupt?

A recent post by the excellent Maths Careers website looked at how we can model volcanic eruptions mathematically. This is an important branch of mathematics – which looks to assign risk to events and these methods are very important to statisticians and insurers. Given that large-scale volcanic eruptions have the potential to end modern civilisation, it’s also useful to know how likely the next large eruption is.

The Guardian has recently run a piece on the dangers that large volcanoes pose to humans.  Iceland’s Eyjafjallajökull volcano which erupted in 2010 caused over 100,000 flights to be grounded and cost the global economy over \$1 billion – and yet this was only a very minor eruption historically speaking.  For example, the Tambora eruption in Indonesia (1815) was so big that the explosion could be heard over 2000km away, and the 200 million tonnes of sulphur that were emitted spread across the globe, lowering global temperatures by 2 degrees Celsius.  This led to widespread famine as crops failed – and tens of thousands of deaths.

Super volcanoes

Even this destruction is insignificant when compared to the potential damage caused by a super volcano.  These volcanoes, like that underneath Yellowstone Park in America, have the potential to wipe out millions in the initial explosion and to send enough sulphur and ash into the air to cause a “volcanic winter” of significantly lower global temperatures.  The graphic above shows that the ash from a Yellowstone eruption could cover the ground of about half the USA. The resultant widespread disruption to global food supplies and travel would be devastating.

So, how can we predict the probability of a volcanic eruption?  The easiest model to use, if we already have an estimate of how often eruptions occur, is the Poisson distribution:

$P(X{=}k)= \frac{\lambda^k e^{-\lambda}}{k!},$

This formula calculates the probability that X equals a given value of k.  λ is the mean of the distribution.  If X represents the number of volcanic eruptions we have Pr(X ≥ 1) = 1 – Pr(X = 0).  This gives us a formula for working out the probability of an eruption as $1 - e^{-\lambda}$.  For example, the Yellowstone super volcano erupts around every 600,000 years.  Therefore if λ is the number of eruptions every year, we have λ = 1/600,000 ≈ 0.00000167 and $1 - e^{-\lambda}$ also ≈ 0.00000167. This gets more interesting if we then look at the probability over a range of years. We can do this by modifying the formula for probability as $1 - e^{-t\lambda}$ where t is the number of years for our range.

So the probability of a Yellowstone eruption in the next 1000 years is $1 - e^{-0.00167}$ ≈ 0.00167, and the probability in the next 10,000 years is $1 - e^{-0.0167}$ ≈ 0.0166. So we have approximately a 2% chance of this eruption in the next 10,000 years.

A far smaller volcano, like Katla in Iceland, has erupted 16 times in the past 1100 years – giving an average eruption every ≈ 70 years. This gives λ = 1/70 ≈ 0.014. So we can expect this to erupt in the next 10 years with probability $1 - e^{-0.14}$ ≈ 0.13. And in the next 30 years with probability $1 - e^{-0.42}$ ≈ 0.34.
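
A short Python sketch of this calculation, assuming eruptions follow a Poisson process with a constant yearly rate.  The function name is hypothetical, and `expm1` is used only to keep precision when the rate is very small.

```python
from math import expm1

def eruption_probability(rate_per_year, years):
    """P(at least one eruption within `years`) = 1 - e^(-rate * years)."""
    return -expm1(-rate_per_year * years)

yellowstone_rate = 1 / 600_000  # roughly one eruption every 600,000 years
katla_rate = 0.014              # the rounded value of 1/70 used in the post

cases = [
    ("Yellowstone", yellowstone_rate, (1_000, 10_000)),
    ("Katla", katla_rate, (10, 30)),
]
for name, rate, horizons in cases:
    for years in horizons:
        p = eruption_probability(rate, years)
        print(f"{name}, next {years:,} years: {p:.4f}")
```

Using the unrounded rate for Yellowstone shifts the last decimal place slightly compared with working from a rounded exponent.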

The models for volcanic eruptions can get a lot more complicated – especially as we often don’t have accurate data to give us an estimate for λ.  λ can be estimated using a technique called Maximum Likelihood Estimation – which you can read about here.

If you enjoyed this post you might also like:

Black Swans and Civilisation Collapse. How effective is maths at guiding government policies?

Are you Psychic?

There have been people claiming to have paranormal powers for thousands of years.  However, scientifically we can say that as yet we still have no convincing proof that any paranormal abilities exist.  We can show this using some mathematical tests – such as the binomial or normal distribution.

ESP Test

You can test your ESP powers on this site (our probabilities will be a little different than their ones).  You have the chance to try and predict what card the computer has chosen.  After repeating this trial 25 times you can find out if you possess psychic powers.  As we are working with discrete data and have a fixed probability of guessing (0.2) then we can use a binomial distribution.  Say I got 6 correct, do I have psychic powers?

We have the Binomial model B(25, 0.2), 25 trials and 0.2 probability of success.  So we want to find the probability that I could achieve 6 or more by luck.

The probability of getting exactly 6 right is 0.16.  Working out the probability of getting 6 or more correct would take a bit longer by hand (though it could be simplified by doing 1 – P(X ≤ 5)).  Doing this, or using a calculator, we find the probability is 0.38.  Therefore we would expect someone to get 6 or more correct just by guessing 38% of the time.

So, using this model, when would we have evidence for potential ESP ability?  Well, a minimum bar for our percentages would probably be 5%.  So how many do you need to get correct before there is less than a 5% chance of that happening by luck?

Using our calculator we can do trial and error to see that the probability of getting 9 or more correct by guessing is only 4.7%.  So, someone getting 9 correct might be showing some signs of ESP.  If we asked for a stricter threshold (such as 1%) we would want to see someone get 11 correct.
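
The trial and error can also be done with a few lines of Python rather than a calculator.  This is a sketch using only the standard library; the function names are my own, and `critical_score` simply searches for the smallest score whose probability under pure guessing falls below the chosen threshold.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

def critical_score(n, p, threshold):
    """Smallest k for which P(X >= k) is below the threshold, or None."""
    for k in range(n + 1):
        if prob_at_least(k, n, p) < threshold:
            return k
    return None

n_trials, p_guess = 25, 0.2
print(f"P(exactly 6) = {binomial_pmf(6, n_trials, p_guess):.2f}")
print(f"P(6 or more) = {prob_at_least(6, n_trials, p_guess):.2f}")
print(f"P(9 or more) = {prob_at_least(9, n_trials, p_guess):.3f}")
print("score needed at 5%:", critical_score(n_trials, p_guess, 0.05))
print("score needed at 1%:", critical_score(n_trials, p_guess, 0.01))
```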

Now, in the video above, one of the Numberphile mathematicians manages to toss 10 heads in a row.  Again, we can ask ourselves if this is evidence of some extraordinary ability.  We can calculate this probability as $0.5^{10} \approx 0.001$. This means that such an event would only happen 0.1% of the time. But, we’re only seeing a very small part of the total video. Here’s the full version:

Suddenly the feat looks less mathematically impressive (though still an impressive endurance feat!)

You can also test your psychic abilities with this video here.

All content on this site has been written by Andrew Chambers (MSc. Mathematics, IB Mathematics Examiner).
