**Prime Spirals – Patterns in Primes**

One of the fundamental goals of pure mathematicians is gaining a deeper understanding of the distribution of prime numbers – which is why the Riemann Hypothesis is one of the great unsolved problems in number theory and carries a $1 million prize for anyone who can solve it. Prime numbers are the building blocks of our number system and are essential to current encryption methods such as RSA. Hence finding patterns in the primes is one of the great mathematical pursuits.

**Polar coordinates**

The beautiful prime spiral was generated above on Desmos using polar coordinates. We can see a clear spiral pattern – so let’s see how to create this. Polar coordinates (r, θ) need a length (r) from the origin and an angle of anti-clockwise rotation (θ) measured from the positive x-axis. So for example in polar coordinates (2,2) means a length of 2 from the origin and a rotation of 2 radians. By considering trigonometry and the unit circle we can say that the polar coordinates (r, θ) are equivalent to the Cartesian coordinates (r cos θ, r sin θ).

**Plotting prime pairs**

So we plot the first few prime pairs:

Polar: (2,2). Cartesian: (2cos2, 2sin2).

Polar: (3,3). Cartesian: (3cos3, 3sin3).

Polar: (5,5). Cartesian: (5cos5, 5sin5).

In Desmos (making sure we are in radians) we input:

We can then change the Desmos graph view to polar (first click on the spanner on the right of the screen). This gives the first 3 points of our spirals. Note I have labeled the points as polar coordinates.

I then downloaded the first 1000 prime numbers from here. I copied this list of comma separated values and pasted it between the square brackets of M = [ ] in Desmos to create a list.

I can then plot every point in the list as a prime pair by doing the following:

We can then generate our prime spiral for the first 1000 prime pairs:

Just to see how powerful Desmos really is, I then downloaded all the prime numbers less than or equal to 100,000 from here. This time we see the following graph:

We can see that we lose the clear definition of the spiral – though there are still circular spirals with higher densities of primes than others. Also we can see that there are higher densities of the primes on some of the radial lines out from the origin – and other radial lines where no primes appear.
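The same prime pairs can be generated outside Desmos. The following is a minimal Python sketch, not the Desmos input used above: the function names `primes_up_to` and `prime_spiral_points` are my own, and the primes are produced with a simple sieve rather than downloaded from a list.

```python
import math

def primes_up_to(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # Cross out every multiple of n, starting from n squared.
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

def prime_spiral_points(limit):
    """Convert each polar prime pair (p, p) to Cartesian (p cos p, p sin p)."""
    return [(p * math.cos(p), p * math.sin(p)) for p in primes_up_to(limit)]

# Print the first three points, matching the prime pairs (2,2), (3,3), (5,5).
for p, (x, y) in zip(primes_up_to(5), prime_spiral_points(5)):
    print(f"polar ({p}, {p}) -> Cartesian ({x:.3f}, {y:.3f})")
```

The resulting list of (x, y) points can be passed to any plotting library as a scatter plot to reproduce the spiral.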

**Prime Number Theorem**

We can also use our Desmos result to investigate another (more fundamental) result about the distribution of prime numbers. The prime number theorem states that pi(N) ~ N / ln(N).

Here pi(N) is the number of prime numbers less than or equal to N. The little squiggle means that as N gets large pi(N) becomes better and better approximated by the function on the RHS.

For our purple “spiral” above we downloaded all the primes less than or equal to 100,000 – and Desmos tells us that there were 9,592 of them. So let’s see how close the prime number theorem gets us:

We can see that we are off by an error of around 9.46% – not too bad, though still a bit out. As we make N larger we will find that we get a better and better approximation.

Let’s look at what would happen if we took N as 1,000,000,000. From Wikipedia we can see that there are 50,847,534 primes less than or equal to 1,000,000,000. Therefore:

This time we are off by an error of only 5.10%. Have a look at the table of values in Wikipedia to find how large N has to be to be within 1% accuracy.
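The two percentage errors above can be checked with a few lines of Python. This is a sketch that assumes the usual statement of the theorem, pi(N) ~ N / ln(N); the prime counts are the ones quoted in the text, and the function names are illustrative.

```python
import math

def pnt_estimate(n):
    """Prime number theorem estimate for the number of primes <= n."""
    return n / math.log(n)  # math.log is the natural logarithm

def percentage_error(actual, estimate):
    """Relative error of the estimate, as a percentage of the true count."""
    return 100 * (actual - estimate) / actual

# True prime counts quoted in the text.
known_counts = {100_000: 9_592, 1_000_000_000: 50_847_534}

for n, actual in known_counts.items():
    estimate = pnt_estimate(n)
    error = percentage_error(actual, estimate)
    print(f"N = {n}: estimate {estimate:.0f}, actual {actual}, error {error:.2f}%")
```

Adding further (N, count) pairs from a table of known values to `known_counts` shows how slowly the error shrinks.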

So this is a nice introduction to looking for patterns in the primes – and a good chance to explore some of the nice graphical capabilities of Desmos. See if you can find any more patterns of your own!

**Getting a 7 in IB Maths Exploration Coursework**

I’ve teamed up with Udemy – the world’s leading provider of online courses – to create a comprehensive online guide to the exploration. It includes **9 tutorial videos** of essential information designed to ensure you get the best possible grade. You will also get a **60 page pdf Exploration Guide** (worth $7.50) for free.

The IB Maths coursework is worth 20% of the final grade – but many students score poorly on this, and sometimes because of poor advice. Gain the inside track on what makes a good coursework piece from an IB Maths Examiner as you learn all the skills necessary to produce something outstanding. The video tutorials will cover

1) The tools required to pick an excellent topic,

2) Looking at how to gain a deep understanding of the criteria points,

3) Non calculator technique to demonstrate thorough understanding,

4) Exploring top tips for making beautiful graphs and modeling using Desmos,

5) Comparing “good” versus “bad” examples of coursework.

6) Achieving a Level 7 – what you need to do to hit the top criteria levels.

There is more than 140 minutes of video tutorial content as well as a number of multiple choice quizzes to aid understanding. There are also a number of pdf downloads to support the lesson content – such as a criteria checklist, examples of topics to research in more detail, an initial submission sheet and also some data to use in Desmos graphing.

See the free preview here.

Anscombe’s Quartet was devised by the statistician Francis Anscombe to illustrate how important it is not to rely on statistical measures alone when analyzing data. To do this he created 4 data sets which produce nearly identical statistical measures. The scatter graphs above were generated by the Python code here.

**Statistical measures**

1) Mean of x values in each data set = 9.00

2) Standard deviation of x values in each data set = 3.32

3) Mean of y values in each data set = 7.50

4) Standard deviation of y values in each data set = 2.03

5) Pearson’s Correlation coefficient for each paired data set = 0.82

6) Linear regression line for each paired data set: y = 0.500x + 3.00

When looking at this data we would be forgiven for concluding that these data sets must be very similar – but really they are quite different.

**Data Set A:**

x = [10,8,13,9,11,14,6,4,12,7,5]

y = [8.04, 6.95,7.58,8.81,8.33, 9.96,7.24,4.26,10.84,4.82,5.68]

Data Set A does indeed fit a linear regression – so it would be appropriate to use the line of best fit for predictive purposes.

**Data Set B:**

x = [10,8,13,9,11,14,6,4,12,7,5]

y = [9.14,8.14,8.74,8.77,9.26,8.1,6.13,3.1,9.13,7.26,4.74]

You could fit a linear regression to Data Set B – but this is clearly not the most appropriate regression line for this data. Some quadratic or higher power polynomial would be better for predicting data here.

**Data Set C:**

x = [10,8,13,9,11,14,6,4,12,7,5]

y = [7.46,6.77,12.74,7.11,7.81,8.84,6.08,5.39,8.15,6.42,5.73]

In Data set C we can see the effect of a single outlier – we have 10 points in pretty much a perfect linear correlation, and then a single outlier. For predictive purposes we would be best investigating this outlier (checking that it does conform to the mathematical definition of an outlier), and then potentially doing our regression with this removed.

**Data Set D:**

x = [8,8,8,8,8,8,8,19,8,8,8]

y = [6.58,5.76,7.71,8.84,8.47,7.04,5.25,12.50,5.56,7.91,6.89]

In Data set D we can also see the effect of a single outlier – we have 10 points in a vertical line, and then a single outlier. Clearly here again drawing a line of best fit for this data is not appropriate – unless we remove this outlier first.
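The six statistical measures can be recomputed directly from the four data sets listed above. The sketch below uses only the Python standard library and sample (n − 1) standard deviations; the `summary` helper is an illustrative name, not the code linked in the post.

```python
import math

x_abc = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
datasets = {
    "A": (x_abc, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (x_abc, [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]),
    "C": (x_abc, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(x, y):
    """Means, sample standard deviations, Pearson's r and least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": math.sqrt(sxx / (n - 1)),
        "sd_y": math.sqrt(syy / (n - 1)),
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }

for name, (x, y) in datasets.items():
    stats = summary(x, y)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four rows print almost the same numbers, which is exactly why the scatter graphs are needed to tell the data sets apart.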

**The moral of the story**

So – the moral here is always use graphical analysis alongside statistical measures. A very common mistake for IB students is to rely on Pearson’s product-moment correlation coefficient without really looking at the scatter graph to decide whether a linear fit is appropriate. If you do this then you could end up with a very low mark in the E category as you will not show good understanding of what you are doing. So always plot a graph first!

**Hailstone Numbers**

Hailstone numbers are created by the following rules:

**if n is even:** divide by 2

**if n is odd:** multiply by 3 and add 1

We can then generate a sequence from any starting number. For example, starting with 10:

10, 5, 16, 8, 4, 2, 1, 4, 2, 1…

we can see that this sequence loops into an infinitely repeating 4,2,1 sequence. Trying another number, say 58:

58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1…

and we see the same loop of 4,2,1.

Hailstone numbers are so called because they fall, reach one (the ground) and then bounce up again. The proper mathematical name for this investigation is the Collatz conjecture, which was proposed in 1937 by the German mathematician Lothar Collatz.

One way to investigate this conjecture is to look at the length of time it takes a number to reach the number 1. Some numbers take longer than others. If we could find a number that didn’t reach 1 even in an infinite length of time then the Collatz conjecture would be false.

The following graphic from wikipedia shows how different numbers (x axis) take a different number of iterations (y axis) to reach 1. We can see that some numbers take much longer than others to reach one. Some numbers take over 250 iterations – but every number checked so far does eventually reach 1.

For example, the number 73 has the following pattern:

73, 220, 110, 55, 166, 83, 250, 125, 376, 188, 94, 47, 142, 71, 214, 107, 322, 161, 484, 242, 121, 364, 182, 91, 274, 137, 412, 206, 103, 310, 155, 466, 233, 700, 350, 175, 526, 263, 790, 395, 1186, 593, 1780, 890, 445, 1336, 668, 334, 167, 502, 251, 754, 377, 1132, 566, 283, 850, 425, 1276, 638, 319, 958, 479, 1438, 719, 2158, 1079, 3238, 1619, 4858, 2429, 7288, 3644, 1822, 911, 2734, 1367, 4102, 2051, 6154, 3077, 9232, 4616, 2308, 1154, 577, 1732, 866, 433, 1300, 650, 325, 976, 488, 244, 122, 61, 184, 92, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, 5, 16, 8, 4, 2, 1…

**No proof yet**

Investigating what it is about certain numbers that leads to long chains is one possible approach to solving the conjecture. This conjecture has been checked by computers for every starting number up to a staggering 5.8 x 10^{18}. That would suggest that the conjecture could be true – but doesn’t prove it is. Despite the problem looking deceptively simple, Paul Erdos, one of the great 20th century mathematicians, stated in the 1980s that “mathematics is not yet ready for such problems” – and it has remained unsolved over the past few decades. Maybe you could be the one to crack this problem!

**Exploring this problem with Python.**

We can plot this with Python – such that we also generate a nice graphical representation of these numbers. The graph above shows what happens to the number 500 when we follow this rule – we “bounce” up to close to 10,000 before falling back into the closed loop after around 100 iterations.

**Numbers with large iterations:**

871 takes 178 steps to reach 1:

77,031 takes 350 steps to reach 1:

9,780,657,630 takes 1132 steps to reach 1:

If you want to explore this code yourself, the following code has been written to run on repl.it. You can see the code yourself here, and I have also copied it below:
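The original repl.it listing is not reproduced here, so the following is a minimal sketch of how the hailstone rule could be coded in Python. The function names are my own; the graph plotting is left out so that the block only needs the standard library.

```python
def hailstone_sequence(n):
    """Return the hailstone sequence from n until it first reaches 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    while n != 1:
        # Even: divide by 2. Odd: multiply by 3 and add 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence

def steps_to_one(n):
    """Number of iterations needed for n to reach 1."""
    return len(hailstone_sequence(n)) - 1

print(hailstone_sequence(10))
for start in (871, 77031):
    print(start, "takes", steps_to_one(start), "steps to reach 1")
print("Highest value reached from 500:", max(hailstone_sequence(500)))
```

Plotting the list returned by `hailstone_sequence` against its index (for example with matplotlib) gives a "bouncing" graph of the kind described above.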

Have a play – and see what nice graphs you can draw!

**Chaos and strange Attractors: Henon’s map**

Henon’s map was created in the 1970s to explore chaotic systems. The general form is created by the iterative formula:

The classic case is when a = 1.4 and b = 0.3 i.e:

To see how points are generated, let’s choose a point near the origin. If we take (0,0) the next x coordinate is given by:

We would then continue this process over several thousand iterations. If we do this then we get the very strange graph at the top of the page – the points are attracted to a flow-like structure, which they then circulate around. The graph above was generated when we took our starting coordinate as (0.1,0.1). Let’s take a different starting point – this time (1.1, 1.1):

We can see that exactly the same structure appears. All coordinates close to the origin will get attracted to this strange attractor – except for a couple of fixed points near the origin which remain where they are. Let’s see why. First we can rewrite the iterative formula just in terms of x:

Next we use the fact that when we have a fixed point the x coordinate (and y coordinate) will not change. Therefore we can define the following:

This allows us to then make the following equation:

Which we can then solve using the quadratic formula:

Which also gives y:

So therefore at these 2 fixed points the coordinates do not get drawn to the strange attractor.
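The formulas in this post were images, so the exact form used is not visible here. The sketch below assumes one common way of writing Henon’s map, x_{n+1} = 1 − a·x_n² + b·y_n and y_{n+1} = x_n, chosen because its fixed points satisfy y = x, as the coordinates quoted in this post do. Another common form (x_{n+1} = 1 − a·x_n² + y_n, y_{n+1} = b·x_n) gives the same x values. All function names are illustrative.

```python
import math

A, B = 1.4, 0.3  # the classic parameter values

def henon_step(x, y, a=A, b=B):
    """One iteration of the assumed form: x' = 1 - a x^2 + b y, y' = x."""
    return 1 - a * x * x + b * y, x

def henon_orbit(x0, y0, steps, a=A, b=B):
    """List of points visited, starting from (x0, y0)."""
    points = [(x0, y0)]
    for _ in range(steps):
        points.append(henon_step(*points[-1], a, b))
    return points

def fixed_points(a=A, b=B):
    """Solve x = 1 - a x^2 + b x, i.e. a x^2 + (1 - b) x - 1 = 0."""
    root = math.sqrt((1 - b) ** 2 + 4 * a)
    xs = [(-(1 - b) + root) / (2 * a), (-(1 - b) - root) / (2 * a)]
    return [(x, x) for x in xs]  # y = x at a fixed point of this form

for point in fixed_points():
    print("fixed point:", point, "-> maps to", henon_step(*point))
```

Feeding the points from `henon_orbit(0.1, 0.1, 10000)` into a scatter plot should draw the strange attractor, while starting exactly at a fixed point leaves the point where it is (up to floating point rounding).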

Above we can see the not especially interesting graph of the repeated iterations when starting at this point!

But we can also see the chaotic behavior of this system by choosing a point very close to this fixed point. Let’s choose (0.631354477, 0.631354477) which is correct to 9 decimal places as an approximation for the fixed point.

We can see our familiar graph is back. This is an excellent example of chaotic behavior – a very small change in the initial conditions has created a completely different system.

This idea was suggested by the excellent Doing Maths With Python – which is well worth a read if you are interested in computer programming to solve mathematical problems.

**The Barnsley Fern: Mathematical Art**

This pattern of a fern pictured above was generated by a simple iterative program designed by mathematician Michael Barnsley. I downloaded the Python code from the excellent Tutorialspoint and then modified it slightly to run on repl.it. What we are seeing is the result of 40,000 individual points – each plotted according to a simple algorithm. The algorithm is as follows:

**Transformation 1:** (0.85 probability of occurrence)

x_{i+1} = 0.85x_{i} +0.04y_{i}

y_{i+1}= -0.04x_{i}+0.85y_{i}+1.6

**Transformation 2:** (0.07 probability of occurrence)

x_{i+1} = 0.2x_{i} -0.26y_{i}

y_{i+1}= 0.23x_{i}+0.22y_{i}+1.6

**Transformation 3:** (0.07 probability of occurrence)

x_{i+1} = -0.15x_{i} +0.28y_{i}

y_{i+1}= 0.26x_{i}+0.24y_{i}+0.44

**Transformation 4:** (0.01 probability of occurrence)

x_{i+1} = 0

y_{i+1}= 0.16y_{i}

So, I start with (0,0) and then use a random number generator to decide which transformation to use. I can run a generator from 1-100 and assign 1-85 for transformation 1, 86-92 to transformation 2, 93-99 for transformation 3 and 100 for transformation 4. Say I generate the number 36 – therefore I will apply transformation 1.

x_{i+1} = 0.85(0)+0.04(0)

y_{i+1}= -0.04(0)+0.85(0)+1.6

and my new coordinate is (0,1.6). I mark this on my graph.

I then repeat this process – say this time I generate the number 90. This tells me to do transformation 2. So:

x_{i+1} = 0.2(0) -0.26(1.6)

y_{i+1}= 0.23(0)+0.22(1.6)+1.6

and my new coordinate is (-0.416, 1.952). I mark this on my graph and carry on again. The graph above was generated with 40,000 iterations – let’s see how it develops over time:
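The steps above can be sketched in Python as follows. This is not the Tutorialspoint code mentioned earlier: the names `TRANSFORMS`, `apply_transform` and `barnsley_fern` are my own, and the coefficients are the standard published Barnsley fern values (with +0.28y in transformation 3). The random choice uses weights instead of a 1-100 number generator, which is equivalent.

```python
import random

# Each row: (a, b, c, d, e, f, probability) for
#   x' = a x + b y + e,   y' = c x + d y + f
TRANSFORMS = [
    (0.85, 0.04, -0.04, 0.85, 0.0, 1.6, 0.85),   # transformation 1
    (0.20, -0.26, 0.23, 0.22, 0.0, 1.6, 0.07),   # transformation 2
    (-0.15, 0.28, 0.26, 0.24, 0.0, 0.44, 0.07),  # transformation 3
    (0.0, 0.0, 0.0, 0.16, 0.0, 0.0, 0.01),       # transformation 4
]

def apply_transform(index, x, y):
    """Apply transformation number index + 1 to the point (x, y)."""
    a, b, c, d, e, f, _ = TRANSFORMS[index]
    return a * x + b * y + e, c * x + d * y + f

def barnsley_fern(n_points, seed=None):
    """Start at (0, 0) and apply randomly chosen transformations n_points times."""
    rng = random.Random(seed)
    weights = [row[6] for row in TRANSFORMS]
    x, y = 0.0, 0.0
    points = [(x, y)]
    for _ in range(n_points):
        index = rng.choices(range(len(TRANSFORMS)), weights=weights)[0]
        x, y = apply_transform(index, x, y)
        points.append((x, y))
    return points

# Reproduce the two worked steps: transformation 1, then transformation 2.
first = apply_transform(0, 0, 0)
second = apply_transform(1, *first)
print(first, second)
```

A scatter plot of `barnsley_fern(40000)` with very small markers should give a picture like the one at the top of the post.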

**1000 iterations:**

**10,000 iterations:**

**100,000 iterations:**

**500,000 iterations:**

If we want to understand what is happening here we can think of each transformation as responsible for a different part of our fern. Transformation 1 is most likely and therefore this fills in the smaller leaflets. Transformations 2 and 3 fill in the bottom left and right leaflet (respectively) and transformation 4 fills in the stem.

It’s quite amazing to think that a simple computer program can create what looks like art – or indeed that it can replicate what we see in nature so well. This fern is an example of a self-similar pattern – i.e. one which will look the same at different scales. You could zoom into a detailed picture and see the same patterns repeating. You might want to explore the idea of fractals in delving into this topic in more detail.

**Changing the iterations**

We can explore what happens when we change the iterations very slightly.

**Christmas tree**

**Crazy spiral**

**Modern art**

You can modify the code to run this here. Have a go!

Essential resources for IB students:

Revision Village has been put together to help IB students with topic revision both for during the course and for the end of Year 12 school exams and Year 13 final exams. I would strongly recommend students use this as a resource during the course (not just for final revision in Y13!) There are specific resources for HL and SL students for both Analysis and Applications.

The comprehensive Questionbank takes you to a breakdown of each main subject area (e.g. Algebra, Calculus etc) and then provides a large bank of graded questions. What I like about this is that you are given a difficulty rating, as well as a mark scheme and also a worked video tutorial. Really useful!

The Practice Exams section takes you to a large number of ready made quizzes, exams and predicted papers. These all have worked solutions and allow you to focus on specific topics or start general revision. This also has some excellent challenging questions for those students aiming for 6s and 7s.

Each course also has a dedicated video tutorial section which provides 5-15 minute tutorial videos on every single syllabus part – handily sorted into topic categories.

2) Exploration Guides and Paper 3 Resources

I’ve put together four comprehensive pdf guides to help students prepare for their exploration coursework and Paper 3 investigations. The exploration guides talk through the marking criteria, common student mistakes, excellent ideas for explorations, technology advice, modeling methods and a variety of statistical techniques with detailed explanations. I’ve also made 17 full investigation questions which are also excellent starting points for explorations. The Exploration Guides can be downloaded here and the Paper 3 Questions can be downloaded here.

**Galileo’s Inclined Planes**

*This post is based on the maths and ideas of Hahn’s Calculus in Context – which is probably the best mathematics book I’ve read in 20 years of studying and teaching mathematics. Highly recommended for both students and teachers!*

Hahn talks us through the mathematics, experiments and thought process of Galileo as he formulates his momentous theory that in free fall (ignoring air resistance) an object falling for *t* seconds will fall a distance of ct² where c is a constant. This is counter-intuitive as we would expect the mass of an object to be an important factor in how far an object falls (i.e. that a heavier object would fall faster). Galileo also helped to overturn Aristotle’s ideas on motion. Aristotle had argued that any object in motion would eventually stop; Galileo instead argued that with no friction a perfectly spherical ball once started rolling would roll forever. Galileo’s genius was to combine thought experiments and real data to arrive at results that defy “common sense” – to truly understand the universe humans had to first escape from our limited anthropocentric perspective, and mathematics provided an opportunity to do this.

**Inclined Planes**

Galileo conducted experiments on inclined planes where he placed balls at different heights and then measured their projectile motion when they left the ramp, briefly ran past the edge of a flat surface and then fell to the ground. We can see the set up of one ramp above. The ball starts at O, and we mark this height as h. At an arbitrary point P we can see that there are 2 forces acting on the ball: F, which is responsible for the ball rolling down the slope, and f, which is a friction force in the opposite direction. At point P we can also mark the downwards force mg acting on the ball. We can then use some basic rules of parallel lines to note that the angles in triangle PCD are equal to those in triangle AOB.

**Galileo’s times squared law of fall**

We have the following equation for the total force acting on the ball at point P:

We also have the following relationship from physics, where m is the mass and a(t) the acceleration:

This therefore gives:

Next we can use trigonometry on triangle PCD to get an equation for F:

Next we can use another equation from physics, which gives the frictional force on a perfectly spherical, homogeneous body rolling down a plane:

So this gives:

We can then integrate to get velocity (our constant of integration is 0 because the velocity is 0 when t = 0)

and integrate again to get the distance travelled of the ball (again our constant of integration is 0):

When Galileo was conducting his experiments he did not know *g*. Instead he noted that the relationship was of the form:

where c is a constant related to a specific incline. This is a famous result called the *times squared law of fall*. It shows that the distance travelled is independent of the mass and is instead related to the time of motion squared.

**Velocity also independent of the angle of incline**

Above we have shown that the distance travelled is independent of the mass – but in the equation it is still dependent on the angle of the incline. We can go further and then show that the velocity of the ball is also independent of the angle of incline, and is only dependent on the height at which the ball starts from.

If we denote by t_b the time when the ball reaches point A in our triangle we have:

This is equal to the distance AO, so we can use trigonometry to define:

This can then be rearranged to give:

This is the time taken to travel from O to A. We can then substitute this into the velocity equation we derived earlier to give the velocity at point A. This is:

This shows that the velocity of the ball at point A is only dependent on the height and not the angle of incline or mass. The logical extension of this is that if the angle of incline has no effect on the velocity, then this result would still hold as the angle of incline approaches and then reaches 90 degrees – i.e. when the ball is in free fall.
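A quick numerical check of this independence is sketched below. It rests on assumptions that are not visible in the text because the equations were images: the angle of incline is called `theta` here, and the friction force on a solid homogeneous sphere rolling without slipping is taken as (2/7)·m·g·sin(theta), the standard physics result, so that the acceleration is (5/7)·g·sin(theta).

```python
import math

G = 9.81  # acceleration due to gravity in m/s^2 (assumed value)

def acceleration(theta, g=G):
    """Acceleration down the slope for a rolling solid sphere: (5/7) g sin(theta)."""
    return 5 / 7 * g * math.sin(theta)

def time_to_bottom(h, theta, g=G):
    """Time to roll from rest along the slope, whose length is h / sin(theta)."""
    slope_length = h / math.sin(theta)
    # distance = (1/2) a t^2 when starting from rest
    return math.sqrt(2 * slope_length / acceleration(theta, g))

def velocity_at_bottom(h, theta, g=G):
    """Velocity at the foot of the slope, v = a t."""
    return acceleration(theta, g) * time_to_bottom(h, theta, g)

h = 1.0  # starting height in metres
for degrees in (10, 30, 60, 85):
    v = velocity_at_bottom(h, math.radians(degrees))
    print(f"angle {degrees:2d} degrees: v = {v:.4f} m/s")
```

Under these assumptions every angle prints the same velocity, sqrt(10·g·h/7), and no mass appears anywhere in the calculation.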

Galileo used a mixture of practical experiments on inclined planes, mathematical calculations and thought experiments to arrive at his truly radical conclusion – the sign of a real genius!

**Finding focus with Archimedes**

*This post is based on the maths and ideas of Hahn’s Calculus in Context – which is probably the best mathematics book I’ve read in 20 years of studying and teaching mathematics. Highly recommended for both students and teachers!*

Hard as it is to imagine now, for most of the history of mathematics there was no coordinate geometry system and therefore graphs were not drawn using algebraic equations but instead were constructed. The ancient Greeks such as Archimedes made detailed studies of conic sections (parabolas, ellipses and hyperbolas) using ideas of relationships in constructions. The nice thing about this method is that it makes clear the link between conic sections and their properties in reflecting light – a property which can then be utilized when making lenses. A parabolic telescope for example uses the property that all light collected through the scope will pass through a single focus point.

Let’s see how we can construct a parabola without any algebra – simply using the constructions of the Greeks. We start with a line and a focus point F not on the line. This now defines a **unique parabola**.

This unique parabola is defined as all the points A such that the distance from A to F is equal to the perpendicular distance from A to the line.

We can see above that point A must be on our parabola because the distance AB is the same as the distance AF.

We can also see that point C must be on our parabola because the length CD is the same as CF. Following this same method we could eventually construct every point on our parabola. This would finally create the following parabola:
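The construction can be imitated numerically. In the sketch below the line is taken to be the x-axis and the focus to be (0, p); both are illustrative choices, as is the function name. Setting the distance to the focus equal to the height above the line gives x² + (y − p)² = y², which rearranges to y = (x² + p²) / (2p).

```python
import math

def parabola_point(x, p):
    """Point above x that is equidistant from the focus (0, p) and the line y = 0."""
    y = (x * x + p * p) / (2 * p)
    return x, y

p = 1.0  # height of the focus above the line (illustrative value)
for x in (-3, -1, 0, 2, 4):
    px, py = parabola_point(x, p)
    to_focus = math.hypot(px, py - p)  # distance from the point to F
    to_line = py                       # perpendicular distance to the line
    print(f"({px}, {py}): distance to F = {to_focus:.4f}, to line = {to_line:.4f}")
```

Each printed pair of distances matches, so every generated point lies on the parabola defined by this focus and line.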

**Focus point of a parabolic mirror**

We can now see how this parabola construction gives us an intrinsic understanding of reflective properties. If we have a light source entering parallel to the perpendicular through the focus then we can use the fact that this light will pass through the focus to find the path the light traces before it is reflected out.

Newton made use of this property when designing his parabolic telescope. It’s interesting to note how a different method leads to a completely different appreciation of the properties of a curve.

**Finding the area under a quadratic curve without calculus**

Amazingly a method for finding the area under a quadratic curve was also discovered by the Greek scientist and mathematician Archimedes around 2200 years ago – and nearly 2000 years before calculus. Archimedes’ method was as follows.

Choose 2 points on the curve and join them to make one side of a triangle. Choose the 3rd point of the triangle as the point on the quadratic with the same gradient as the chord. This is best illustrated as below. Here I generated a parabola with focus at (0,1) and the x-axis as the line.

Here I chose points B and C, joined these with a line and then looked for the point on the parabola with the same gradient. This then gives a triangle with area 4. Archimedes then discovered that the area of the parabolic segment (i.e. the total area enclosed by the line BC and the parabola) is 4/3 the area of the triangle. This gives 4/3 of 4 which is 5 1/3. Once we have this we can find the area under the curve (i.e. the integral) using simple areas of geometric shapes.
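Archimedes’ rule can be compared with a direct integration in a few lines of Python. This sketch assumes the parabola with focus (0,1) and the x-axis as its line, which is y = (x² + 1) / 2, and takes B and C at x = −2 and x = 2. That choice is my own inference, made because it reproduces the triangle area of 4 quoted above. The function names are illustrative.

```python
def parabola(x):
    """Parabola with focus (0, 1) and the x-axis as its line: y = (x^2 + 1) / 2."""
    return (x * x + 1) / 2

def triangle_area(p, q, r):
    """Area of a triangle from its three vertices (shoelace formula)."""
    return abs(p[0] * (q[1] - r[1]) + q[0] * (r[1] - p[1]) + r[0] * (p[1] - q[1])) / 2

def archimedes_segment_area(x1, x2):
    """Area between the chord from x1 to x2 and the parabola: 4/3 of the triangle."""
    b, c = (x1, parabola(x1)), (x2, parabola(x2))
    chord_gradient = (c[1] - b[1]) / (x2 - x1)
    # The gradient of y = (x^2 + 1) / 2 at x is x, so the third vertex sits here.
    x3 = chord_gradient
    return 4 / 3 * triangle_area(b, c, (x3, parabola(x3)))

def integrated_segment_area(x1, x2):
    """Same area by Simpson's rule, which is exact for quadratic integrands."""
    def gap(x):
        chord = parabola(x1) + (parabola(x2) - parabola(x1)) * (x - x1) / (x2 - x1)
        return chord - parabola(x)
    middle = (x1 + x2) / 2
    return (x2 - x1) / 6 * (gap(x1) + 4 * gap(middle) + gap(x2))

print(archimedes_segment_area(-2, 2), integrated_segment_area(-2, 2))
print(archimedes_segment_area(-1, 3), integrated_segment_area(-1, 3))
```

The second line uses a chord that is not parallel to the x-axis, and the two methods still agree.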

**Using calculus**

We can check that Archimedes’ method does indeed work. We want to find the area enclosed by the 2 following equations:

This is given by:

It works! Now we can try a slightly more difficult example. This time I won’t choose 2 points parallel to the x-axis.

This time I find the gradient of the line joining B and C and then find the point on the parabola with the same gradient. This forms my 3rd point of the triangle. The area of this triangle is approximately 1.68. Therefore Archimedes’ method tells us the area enclosed between the line and the curve will be approximately 4/3 (1.68) = 2.24. Let’s check this with calculus:

Again we can see that this method works – our only error was in calculating an approximate area for the triangle rather than a more precise answer.

So, nearly 2000 years before the invention of calculus the ancient Greeks were already able to find areas bounded by lines and parabolic curves – and indeed Archimedes was already exploring the ideas of the limit of sums of areas upon which calculus is based.

**Finding the average distance between 2 points on a hypercube**

This is the natural extension from this previous post which looked at the average distance of 2 randomly chosen points in a square – this time let’s explore the average distance in n dimensions. I’m going to investigate what dimensional hypercube is required to have an average distance of more than one, and then also what happens to the average distance as n approaches infinity.

**Monte Carlo method**

The Monte Carlo method is a very powerful technique which utilizes computational power. Basically we use the fact that the average of a very large number of trials will serve as an approximation to an exact result. In this case I will run a Python program 10 million times – each time it will select 2 coordinate points and then work out the distance between them. It will then find the average of these 10 million trials. The code above generates 2 coordinates in 3 dimensional space inside a unit cube. We can modify this for n-dimensional space by remembering that Pythagoras still works in higher dimensions.
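The original code was shown as an image, so here is a minimal sketch of the same idea for any number of dimensions. The function name and the `seed` parameter are my own additions, and far fewer than 10 million trials are used so that it runs quickly in pure Python.

```python
import math
import random

def average_distance(dimensions, trials, seed=None):
    """Monte Carlo estimate of the mean distance between 2 random points
    in a unit hypercube with the given number of dimensions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        squared = 0.0
        for _ in range(dimensions):
            # Difference between the two points along one axis.
            difference = rng.random() - rng.random()
            squared += difference * difference
        total += math.sqrt(squared)  # Pythagoras in n dimensions
    return total / trials

for n in range(1, 9):
    print(n, round(average_distance(n, 100_000, seed=n), 3))
```

Increasing `trials` makes the estimates more stable, at the cost of a longer running time.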

**Results**

Running this code helps to generate the above results. This answers our first question – we need a 7 dimensional unit hypercube before the average distance between two randomly chosen points is greater than 1. We can also see that the difference between the average distances is reducing – but it’s not clear if this will approach a limit or if it will continue growing to infinity. So let’s do some more trials.

**Further trials**

This takes us up to a 22-dimensional hypercube. At this point it’s probably useful to plot a graph to see the trend.

**Reciprocal model**

This reciprocal model is of the form:

We can see that this is a pretty good fit (R squared 0.9994). If this model is accurate then this would suggest that the average distance approaches a limit as n approaches infinity.

**Polynomial model**

This polynomial model is of the form:

We can see that this is also a very good fit (R squared 0.9997). If this model is accurate then as b is greater than 0, this would suggest that the average distance approaches infinity as n approaches infinity.

**Reflection**

Quite annoyingly we have 2 models which both fit the data very accurately – but predict completely different results! Logically we could probably say that we would expect the average distance to approach infinity as n approaches infinity – and we could possibly also justify this by the fact that the polynomial model is a slightly better fit. Given the similarity between the 2 models it’s probably time to find out the actual result for this.

**Average n-dimensional distance bounds**

Not surprisingly the mathematics required to work this out is exceptionally difficult – and ends up with integrals which cannot in general be solved analytically and instead require numerical methods. The Monte Carlo method with very large numbers of trials is a reasonably good approach to approximating this answer. There is however a very useful lower and upper bound for the average distance in n dimensional space given by:

This shows immediately that the average distance will approach infinity as n grows large – as the lower bound will grow to infinity. Quite pleasingly we can see that the polynomial model we derived is similar to the lower bound. We can plot both upper and lower bound along with our polynomial model to see how these all compare. We have lower bound (green), polynomial model (black) and upper bound (green):

We can see that our polynomial model very closely follows the upper bound in our domain. As we extend the domain this polynomial approximation remains above the lower bound and tracks the upper bound before gradually growing less accurate. When n is 50 our model predicts a distance of 2.94, whereas the upper bound is 2.88. This is quite a nice result – we have used the Monte Carlo method to derive a polynomial approximation to the average distance in n-dimensional hypercubes and it both closely follows the upper bound over a reasonable domain and also is of a very similar form to the lower bound. We can use this lower bound to see that a hypercube of 36 dimensions (or more) would be guaranteed to have an average distance of more than 2.
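The bounds themselves are not reproduced in the text here. As an assumption, the sketch below uses the pair commonly quoted for hypercube line picking: a lower bound of √n/3 and an upper bound of √(n/6)·√((1 + 2√(1 − 3/(5n)))/3). These agree with the figures quoted above (a lower bound of 2 when n is 36 and an upper bound of about 2.88 when n is 50), but the function names and the choice of formulas are illustrative rather than taken from the original.

```python
import math


def lower_bound(n):
    # Assumed lower bound for the average distance: sqrt(n) / 3
    return math.sqrt(n) / 3


def upper_bound(n):
    # Assumed upper bound: sqrt(n/6) * sqrt((1 + 2*sqrt(1 - 3/(5n))) / 3)
    inner = 1 + 2 * math.sqrt(1 - 3 / (5 * n))
    return math.sqrt(n / 6) * math.sqrt(inner / 3)


if __name__ == "__main__":
    for n in (2, 7, 22, 36, 50):
        print(n, round(lower_bound(n), 3), round(upper_bound(n), 3))
```

Because the lower bound grows like the square root of n, it has no finite limit, which is the reason the average distance must also grow without bound.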

**Conclusion**

This was a nice example of the power of the Monte Carlo method in this kind of problem – we were able to use it quite successfully to get a polynomial approximation which turned out to be reasonably accurate. We could have significantly improved this accuracy by running 100 million (or 1 billion etc) trials each time – though this would have probably required a more powerful computer!

Essential resources for IB students:

2) Exploration Guides and Paper 3 Resources

**Find the average distance between 2 points on a square**

This is another excellent mathematical puzzle from the MindYourDecisions YouTube channel. I like to try these without looking at the answer – and then to see how far I get. This one is pretty difficult (and the actual solution exceptionally difficult!) The problem is to take a square and randomly choose 2 points somewhere inside. If you calculate the distance between the 2 points, and repeat this trial a number of times approaching infinity, what will the average distance be? Here is what I did.

**Simplify the situation: 1×1 square**

This is one of the most important strategies in tackling difficult maths problems. You simplify in order to gain an understanding of the underlying problem and possibly either develop strategies or notice patterns. So, I started with a unit square and only considered the vertices. We can then list all the possible lengths:

We can then find the average length by simply doing:

**2×2 square**

We can then follow the same method for a 2×2 square. This gives:

Which gives an average of:

**Back to a 1×1 square**

Now, we can imagine that we have a 1×1 square with dots at every 0.5. This is simply a scaled version of the 2×2 square, so we can divide our answer by 2 to give:

**3×3 square**

Following the same method we have:

This gives an average of:

If we imagine a 1×1 square with dots at every 1/3, this is simply a scaled version of the 3×3 square, so we can divide our answer by 3 to give:

We can then investigate what happens as we consider more and more dots inside our 1×1 square. When we have considered an infinite number then we will have our average distance – so we are looking for the limit as the number of dots approaches infinity. This suggests using a graph. First I calculated a few more terms in the sequence:
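The scaled lattice averages described above can be computed with a short sketch like the following. It assumes the average is taken over every pair of distinct dots, with each pair counted once; the original counting convention is not shown in the text, so treat this and the name `lattice_average` as illustrative.

```python
import math
from itertools import combinations


def lattice_average(k):
    """Mean distance between distinct dots of a k×k square, scaled to a 1×1 square."""
    points = [(x, y) for x in range(k + 1) for y in range(k + 1)]
    pairs = list(combinations(points, 2))
    total = sum(math.dist(p, q) for p, q in pairs)
    # Dividing by k rescales the k×k square down to a unit square.
    return total / len(pairs) / k


if __name__ == "__main__":
    for k in range(1, 11):
        print(k, round(lattice_average(k), 4))
```

For k = 1 this averages the 4 sides and 2 diagonals of the unit square, and the values decrease as more dots are added.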

Then I plotted this on Desmos. The points looked like they fit either an exponential or a reciprocal function – both of which have asymptotes, so I tried both. The reciprocal function fit with an R squared value of 1. This is a perfect fit so I will use that.

This was plotted using the regression line:

And we can find the equation of the horizontal asymptote by seeing what happens when x approaches infinity. This will give a/c. Using the values provided by Desmos’ regression I got 0.515004887. Because I have been using approximate answers throughout I’ll take this as 0.52 (2sf). **Therefore I predict that the average distance between 2 points in a 1×1 square will be approximately 0.52**. And more generally, the average distance in an n×n square will be 0.52n. This is a somewhat surprising result – it’s not obvious why it would be a little over half the distance from 0 to 1.

**Brute forcing using Python**

We can also write some quick code to approximate this answer using Python (this is a Monte Carlo method). I generate 4 random numbers to represent the 2 x-coordinates and 2 y-coordinates of 2 random points. I then work out the distance between them and repeat this 10 million times, then calculate the average distance. This gives:
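A minimal sketch of this approach is given here (not the original listing). The function name `square_average` and the default trial count are assumptions for illustration.

```python
import math
import random


def square_average(trials=1_000_000, seed=None):
    """Estimate the mean distance between 2 uniform random points in a unit square."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # 4 random numbers: the x- and y-coordinates of the 2 points.
        x1, y1, x2, y2 = (rng.random() for _ in range(4))
        total += math.hypot(x2 - x1, y2 - y1)
    return total / trials


if __name__ == "__main__":
    print(square_average(trials=100_000))
```

Passing a `seed` makes a run repeatable, while leaving it as `None` gives a fresh estimate each time.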

**Checking with the actual answer**

Now for the moment of truth – and we watch the video to find out how accurate this is. The correct answer is indeed 0.52 (2sf) – which is great – our method worked! The exact answer is given by (2 + √2 + 5 ln(1 + √2))/15, which is approximately 0.5214054.

Our graphical answer is not quite accurate enough to 3 sf – probably because we relied on rounded values to plot our regression line. Our Python method with 10 million trials was accurate to 4 sf. Just to keep my computer on its toes I also calculated this with 100 million trials. This gave 0.5214126210834646 (now accurate to 5 sf).

We can also find the percentage error when using our graphical method. This is only:
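As a quick check, the percentage error can be worked out in a few lines of Python. This sketch takes the exact value to be the standard closed form for the unit square, (2 + √2 + 5 ln(1 + √2))/15, and uses the regression asymptote quoted earlier; the variable names are illustrative.

```python
import math

# Closed-form mean distance between 2 random points in a unit square.
exact = (2 + math.sqrt(2) + 5 * math.log(1 + math.sqrt(2))) / 15

# Horizontal asymptote quoted from the Desmos regression.
graphical = 0.515004887

# Percentage error = |approximate - exact| / exact * 100
percentage_error = abs(graphical - exact) / exact * 100
print(f"exact = {exact:.7f}, percentage error = {percentage_error:.2f}%")
```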

Overall this is a decent result! If you are feeling *extremely* brave you might want to look at the video to see how to do this using calculus.

**Extension: The average distance between 2 points in a unit circle**

I modified the Python code slightly to now calculate the average distance between 2 points in a unit circle. This code is:

which returns an answer of 0.9054134561871364. I then looked up what the exact answer is. For the unit circle it is 128/(45 pi). This is approximately 0.9054147874. We can see that our computer method was accurate to 5 sf here. Again, the actual mathematical proof is extremely difficult.
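The modified listing is not reproduced here, so the following is only a sketch of one way to do it. It assumes rejection sampling: random points are drawn from the square that surrounds the circle, and any that fall outside the circle are discarded. The function names are hypothetical.

```python
import math
import random


def random_point_in_unit_circle(rng):
    # Rejection sampling: draw from the bounding square until the point
    # lies inside (or on) the circle of radius 1.
    while True:
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y


def circle_average(trials=1_000_000, seed=None):
    """Estimate the mean distance between 2 uniform random points in a unit circle."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x1, y1 = random_point_in_unit_circle(rng)
        x2, y2 = random_point_in_unit_circle(rng)
        total += math.hypot(x2 - x1, y2 - y1)
    return total / trials


if __name__ == "__main__":
    print(circle_average(trials=100_000), 128 / (45 * math.pi))
```

Rejection sampling keeps the points uniformly spread over the area. Picking a random radius and a random angle directly would bunch the points towards the centre unless the radius is adjusted.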

**Reflection**

This is a nice example of important skills and techniques useful in mathematics – simplification of a problem, noticing patterns, graphical methods, computational power and perseverance!
