You are currently browsing the category archive for the ‘computing’ category.

Screen Shot 2023-03-28 at 7.46.37 PM

GPT-4 vs ChatGPT. The beginning of an intelligence revolution?

The above graph (image source) is one of the most incredible bar charts you’ll ever see – this is measuring the capabilities of GPT4, Open AI’s new large language model with its previous iteration, ChatGPT.  As we can see, GPT4 is now able to score in the top 20% of takers across a staggering field of subjects.  This on its own is amazing – but the really incredible part is that the green sections represent improvements since ChatGPT – and that ChatGPT was only released 3 ½ months ago.

GPT4 is now able to successfully pass nearly all AP subjects, would pass the bar exam to qualify as a lawyer and is now even making headway on Olympiad style mathematics papers.   You can see that ChatGPT had already mastered many of the humanities subjects – and that now GPT4 has begun to master the sciences, maths, economics and law.

We can see an example of the mathematical improvements in GPT4 below from a recently released research paper.  Both AIs were asked a reasonably challenging integral problem:

Screen Shot 2023-03-28 at 7.38.12 PM

GPT4 response:

Screen Shot 2023-03-28 at 7.56.00 PM

Screen Shot 2023-03-28 at 7.56.16 PM

Screen Shot 2023-03-28 at 7.56.21 PM

GPT4 is correct – and excellently explained, whereas the ChatGPT response (viewable in the paper) was just completely wrong.  It’s not just that GPT4 is now able to do maths like this – after all, so can Wolfram Alpha, but that the large language model training method allows it do complicated maths as well as everything else.  The research paper this appears in is entitled “Sparks of Artificial General Intelligence” because this appears to be the beginning of the holy grail of AI research – a model which has intelligence across multiple domains – and as such begins to reach human levels of intelligence across multiple measures.

An intelligence explosion?

Nick Bostrom’s Superintelligence came out several years ago to discuss the ideas behind the development of intelligent systems and in it he argues that we can probably expect explosive growth – perhaps even over days or weeks – as a system reaches a critical level of intelligence and then drives its own further development.  Let’s look at the maths behind this.  We start by modelling the rate of growth of intelligence over time:

Screen Shot 2023-03-28 at 7.16.02 PM

Optimisation power is a measure of how much resource power is being allocated to improving the intelligence of the AI.  The resources driving this improvement are going to come the company working on the project (in this case Open AI), and also global research into AI in the nature of published peer review papers on neural networks etc.  However there is also the potential for the AI itself to work on the project to improve its own intelligence.  We can therefore say that the optimisation power is given by:

Screen Shot 2023-03-28 at 7.16.08 PM

Whilst the AI system is still undeveloped and unable to contribute meaningfully to its own intelligence improvements we will have:

Screen Shot 2023-03-28 at 7.16.12 PM

If we assume that the company provides a constant investment in optimising their AI, and similarly there is a constant investiment worldwise, then we can treat this as a constant:

Screen Shot 2023-03-28 at 7.16.18 PM

Responsiveness to optimisation describes how easily a system is able to be improved upon.  For example a system which is highly responsive can be easily improved upon with minimal resource power.  A system which shows very little improvements despite a large investment in resource power has low responsiveness.

If we also assume that responsiveness to optimization, R, remains constant over some timeframe then we can write:

Screen Shot 2023-03-28 at 7.16.24 PM

We can then integrate this by separating the variables:

Screen Shot 2023-03-28 at 7.16.27 PM

This means that the intelligence of the system grows in a linear fashion over time.

However when the AI system reaches a certain threshold of intelligence it will become the main resource driving its own intelligence improvements (and much larger than the contribution of the company or the world).  At this point we can say:

Screen Shot 2023-03-28 at 7.16.31 PM

In other words the optimization power is a function of the AI’s current level of intelligence.  This then creates a completely different growth trajectory:

Screen Shot 2023-03-28 at 7.16.36 PM

Which we again solve as follows:

Screen Shot 2023-03-28 at 7.16.41 PMWhich means that now we have the growth of the intelligence of the system exhibiting exponential growth.

What does this mean in practice in terms of AI development?

Screen Shot 2023-03-28 at 7.16.48 PM

We can see above an example of how we might expect such intelligence development to look.  The first section (red) is marked by linear growth over short periods.  As R or D is altered this may create lines with different gradient but growth is not explosive. 

At the point A the AI system gains sufficient intelligence to be the main driving force in its own future intelligence gains.  Note that this does not mean that it has to be above the level of human intelligence when this happens (though it may be) – simply that in the narrow task of improving intelligence it is now superior to the efforts of the company and the world researchers.

So, at point A the exponential growth phase begins (purple) – in this diagram taking the AI system explosively past human intelligence levels.  Then at some unspecified point in the future (B on the diagram), this exponential growth ends and the AI approaches the maximum capacity intelligence for the system.

So it is possible that there will be an intelligence explosion once an AI system gets close to human levels of intelligence – and based on current trends it looks like this is well within reach within the next 5 years.  So hold on tight – things could get very interesting!

Screen Shot 2022-12-24 at 1.31.32 PM

Creating a Neural Network: AI Machine Learning

A neural network is a type of machine learning algorithm modeled after the structure and function of the human brain. It is composed of a large number of interconnected “neurons,” which are organized into layers. These layers are responsible for processing and transforming the input data and passing it through to the output layer, where the final prediction or decision is made.

Image recognition

Screen Shot 2022-12-24 at 1.30.00 PM

Neural networks can be used to classify images of (say) cats and dogs by training a model on a large dataset of labeled images. The model is presented with an input image, and its job is to predict the correct label (e.g., “cat” or “dog”) for the image.

To train the model, the input images are passed through the network and the model makes predictions based on the patterns it has learned from the training data. If the prediction is incorrect, the model adjusts the weights of the connections between neurons in order to improve its accuracy on future predictions.

Our own model

Screen Shot 2022-12-24 at 1.32.42 PM

I want to create a very simple model to “recognise” faces.  I first start with a 5 by 5 grid, and define what I think is a perfect face.  This is shown above.  I can then convert this to numerical values by defining the white spaces as 0 and the black squares as 1.

Screen Shot 2022-12-24 at 1.32.48 PM

I can then represent this information as 5 column vectors:

Screen Shot 2022-12-24 at 1.33.00 PM

Building a weighting model

Screen Shot 2022-12-24 at 1.33.07 PM

Next I need to decide which squares would be acceptable for a face.  I’ve kept the black squares for the most desirable, and then added some grey shade for squares that could also be included. I can then convert this into numerical data by deciding on a weight that each square should receive:

Screen Shot 2022-12-24 at 1.33.12 PM

Here I am using 1 to represent a very desirable square, 0.5 for a somewhat desirable square and -1 for an undesirable square.  I can also represent this weighting model as 5 column vectors:

Screen Shot 2022-12-24 at 1.33.17 PM

Using the dot product

I can then find the sum of the dot products of the 5 x vectors with the 5 w vectors.  In formal notation this is given by:

Screen Shot 2022-12-24 at 1.33.30 PM

What this means is that I find the dot product of x_1 and w_1 and then add this to the dot product of x_2 and w_2 etc. For example with:

Screen Shot 2022-12-24 at 1.33.36 PM

This gives:

Screen Shot 2022-12-24 at 1.33.41 PM

Which is:

Screen Shot 2022-12-24 at 1.33.50 PM

Doing this for all 5 vectors gives:

Screen Shot 2022-12-24 at 1.33.53 PM

So my perfect face has a score of 5.  So I can therefore give an upper and lower bound what what would be considered a face.  Let’s say:

Screen Shot 2022-12-24 at 1.33.58 PM

Testing our model: A Face

Screen Shot 2022-12-24 at 1.34.12 PM

I want to see if the above image would be recognised as a face by our model.  This has the following:

Screen Shot 2022-12-24 at 1.34.21 PM

And when we calculate the sum of the dot products we get:

Screen Shot 2022-12-24 at 1.34.25 PM

Which would be recognised as a face.

Testing our model: Not a Face

Screen Shot 2022-12-24 at 1.34.32 PM

There are 2 to the power 25 different patterns that can be generated (over 33 million), so we would expect that the majority do not get recognised as a face.  I randomly generated a 25 length binary string and created the image above.  When we use our model it returns:

Screen Shot 2022-12-24 at 1.34.39 PM

Which would not be recognised as a face.

Using Python and modifying the design

Screen Shot 2022-12-24 at 1.35.35 PM

I decided to modify the weighting so that undesirable squares received -2, to make them less likely to appear.  I then changed the weighting so that I wanted a score between 4.5 and 5.5 inclusive.

Screen Shot 2022-12-24 at 1.59.06 PM

I then wrote some Python code that would randomly generate 200,000 images and then run this algorithm to check whether this was recognised as a face or not.

The results

Screen Shot 2022-12-24 at 2.11.22 PM

You can see some of the results above – whilst not perfect, they do have a feel of a face about them.  And my favourite is below:

Screen Shot 2022-12-24 at 1.35.58 PM

A nice cheeky grin!  You can see the power of this method – this was an extremely simple model and yet achieves some results very quickly.  With images of 1 million pixels and much more advanced weighting algorithms, modern AI systems can accurately identify and categorise a huge variety of images.

Screen Shot 2022-12-11 at 7.06.37 PM

Can Artificial Intelligence (Chat GPT) Get a 7 on an SL Maths paper?

ChatGPT is a large language model that was trained using machine learning techniques. One of the standout features of ChatGPT is its mathematical abilities. It can perform a variety of calculations and solve equations.  This advanced capability is made possible by the model’s vast amounts of training data and its ability to understand and manipulate complex mathematical concepts. 

I didn’t write that previous paragraph – I asked Chat GPT to do it for me. The current version of Chat GPT is truly stunning – so I thought I would see if it is able to get a 7 on an SL Mathematics paper.  I simply typed the questions into the interface to see what the result was. 

AI vs an SL Paper

Screen Shot 2022-12-11 at 7.08.33 PM

I chose a 2018 SL Paper.  Let’s look at a breakdown of its scores (The full pdf of AI’s answers and marks is available to download here ).

(1) A function question.   AI gets 5 out of 6.  It makes a mistake by not swapping x and y for the inverse function.

(2) A box and whisker plot question. AI gets 2 out of 6.  It doesn’t know the IB’s definition of outlier.

(3) Interpreting a graph.  AI gets 0 out of 6.  It needs some diagrammatic recognition to be able to do this.

(4) Functions and lines.  AI gets 4 out of 7.  Bafflingly it solves it solves 2+4 -c = 5 incorrectly.

(5) Integration and volume of revolution.  AI gets 1 out of 7.  The integral is incorrect (off by a factor of 1/2).  Doesn’t sub in the limits for the integral.

(6) Vectors from a diagram.  AI gets 0 out of 6.  It needs some diagrammatic recognition to be able to do this.

(7) Normals to curves.  AI gets 7 out of 7.

(8) Inflection points and concavity.  AI gets 12 out of 13.  It solves 6x+18 <0 incorrectly on the last line!

(9) Vectors in 3D.  AI gets 7 out of 16.  Solves cos(OBA) = 0 incorrectly and can’t find the area of a triangle based on vector information.

(10) Sequences and trig.  AI  gets 11 out of 15.  

Total:  49/90.  This is a Level 5.  [Level 6 is 54.  7 is 65+]. 

Considering that there were 2 full questions that had to be skipped this is pretty good.  It did make some very surprising basic mistakes – but overall was still able to achieve a solid IB Level 5, and it did this in about 5-10 minutes (the only slow part was entering the questions).  If this system was hooked up to text recognition and diagrammatic recognition and then fine-tuned for IB Maths I think this would be able to get a Level 7 very easily.

Engines like Wolfram Alpha are already exceptional at doing maths as long as questions are interpreted to isolate the maths required.  This seems to be a step change – with a system able to simply process all information as presented and then to interpret what maths is required by itself.  

So, what does this mean?  Well probably that no field of human thought is safe!  AI systems are now unbelievably impressive at graphics design, art, coding, essay writing and chat functions – so creative fields which previously considered too difficult for computers are now very much in play. 

Screen Shot 2022-11-25 at 8.12.26 AM

The mathematics behind blockchain, bitcoin and NFTs.

If you’ve ever wondered about the maths underpinning cryptocurrencies and NFTs, then here I’m going to try and work through the basic idea behind the Elliptic Curve Digital Signature Algorithm (ECDSA).  Once you understand this idea you can (in theory!) create your own digital currency or NFT – complete with a digital signature that allows anyone to check that it has been issued by you.

Say I create 100 MATHSCOINS which I sell.  This MATHSCOIN only has value if it can be digitally verified to be an original issued by me.  To do this I share some data publicly – this then allows anyone who wants to check via its digital signature that this is a genuine MATHSCOIN.  So let’s get into the maths!  (Header image generated from here).

Elliptical curves

I will start with an elliptical curve and a chosen prime mod (here we work in mod arithmetic which is the remainder when dividing by a given mod).  For this example I will be in mod 13 and my curve will be:

Screen Shot 2022-11-25 at 8.23.23 AM

First I will work out all the integer solutions to this equation.  For example (7,5) is a solution because:

Screen Shot 2022-11-25 at 8.23.27 AM

The full set of integer solutions is given by:

Screen Shot 2022-11-25 at 8.23.35 AM

Now we define addition of 2 non equal points (p_1, p_2) and (q_1, q_2) on the curve mod M by the following algorithm:

Screen Shot 2022-11-25 at 8.23.51 AM

And we define the addition of 2 equal points (p_1, p_2)  on the curve mod M by the following algorithm:

Screen Shot 2022-11-25 at 8.24.08 AM

So  in the case of (8,8) if we want to calculate (8,8) + (8,8) this gives:

Screen Shot 2022-11-25 at 8.24.24 AM

This is a little tedious to do, so we can use an online generator here to calculate the full addition table of all points on the curve:

Screen Shot 2022-11-25 at 8.12.26 AM

This shows that (say) (7,5) + (8,5) = (11,8) etc.

I can then chose a base point to find the order of this point (how many times it can be added to itself until it reaches the point at infinity).  For example with the base point (8,8):

Screen Shot 2022-11-25 at 8.24.38 AM

We can also see that the order of our starting point A(8,8)  is 7 because there are 7 coordinate points (including the point at infinity) in the group when we calculate A, 2A, 3A…

ECDSA:  Elliptic Curve Signatures

So I have chosen my curve mod M (say):

Screen Shot 2022-11-25 at 8.23.23 AM

And I choose a base point on that curve (p_1, p_2) (say):

Screen Shot 2022-11-25 at 8.39.59 AM

And I know the order of this base point is 7 (n=7).  (n has to be prime).  This gives the following:

Screen Shot 2022-11-25 at 9.15.25 AM

I now chose a private key k_1:

Screen Shot 2022-11-25 at 8.40.34 AM

Let’s say:

Screen Shot 2022-11-25 at 8.47.55 AM

This is super-secret key.  I never share this!  I use this key to generate the following point on the curve:

Screen Shot 2022-11-25 at 8.40.41 AM

I can see that 5(8,8) = (11,5) from my table when repeatedly adding (8,8) together.

Next I have some data z_1 which I want to give a digital signature to – this signature will show anyone who examines it that the data is authentic, has been issued by me and has not been tampered with.  Let’s say:

Screen Shot 2022-11-25 at 8.47.49 AM

I choose another integer k_2 such that:

Screen Shot 2022-11-25 at 8.40.54 AM

Let’s say:

Screen Shot 2022-11-25 at 8.49.52 AM

I am now ready to create my digital signature (s_1, s_2) by using the following algorithm:

Screen Shot 2022-11-25 at 8.41.32 AM

Note, dividing by 2 is the same as multiplying by 4 in mod 7 (as this is the multiplicative inverse).

I can then release this digital signature alongside my MATHSCOIN (represented by the data z_1 = 100).  Anyone can now check with me that this MATHSCOIN was really issued by me.

Testing a digital signature

So someone has bought a MATHSCOIN direct from me – and later on wants to sell to another buyer.  Clearly this new buyer needs to check whether this is a genuine MATHSCOIN.  So they have check the digital signature on the data.  To allow them to do this I can share all the following data (but crucially not my private key):

Screen Shot 2022-11-25 at 8.53.44 AM

This gives:

Screen Shot 2022-11-25 at 9.07.46 AM

To verify someone then needs to do the following:

Screen Shot 2022-11-25 at 8.54.23 AM

To verify that the data z_1 has a valid digital signature we need:

Screen Shot 2022-11-25 at 8.54.28 AM

So with the shared data we have:

Screen Shot 2022-11-25 at 8.54.33 AM

This verifies that the data had a valid digital signature – and that the MATHSCOIN is genuine!  This is basically the foundation of all digital assets Screen Shot 2022-11-25 at 8.54.13 AMwhich require some stamp of authenticity.

In real life the numbers chosen are extremely large – private keys will be 256 digits long and primes very large.  This makes it computationally impossible (at current speeds) to work out a private key based on public information, but still relatively easy to check a digital signature for validity.

I have also made some simple Python code which will provide a digital signature as well as check that one is valid.  You can play around with these codes here:

(1) Digital signature code,  (2) Checking a digital signature for validity

So time to create your own digital currency!

Screenshot 2565-04-11 at 16.26.00

Finding the average distance in a polygon

Over the previous couple of posts I’ve looked at the average distance in squares, rectangles and equilateral triangles.  The logical extension to this is to consider a regular polygon with sides 1.   Above is pictured a regular pentagon with sides 1 enclosed in a 2 by 2 square.  The points N and O represent 2 randomly chosen points which we find the distance between.  On average what is the distance between these randomly chosen points N and O?

Starting with a hexagon

It’s a little easier to start with a hexagon as we get some nicer coordinate points.  So, our first challenge is to find the coordinates of a regular hexagon with sides 1.  Luckily we can use the complex roots of unity to do this.  We start by finding the 6th roots of unity and then converting these to coordinates in an Argand diagram:

Screen Shot 2022-04-11 at 5.29.11 PM

This then allows us to plot the following:

Screenshot 2565-04-11 at 16.32.54

We can then work out the inequalities which define the inside of the hexagon when we generate points within the 2×2 square centred at (0,0).  This gives:

Screen Shot 2022-04-11 at 5.32.08 PM

We can then run the following code to find the average distance:

Screenshot 2565-04-11 at 16.36.25

This gives the following result:

Screenshot 2565-04-11 at 16.37.29

We can check this result as the exact value is:

Screenshot 2565-04-11 at 16.41.17

which is 0.8262589495.  So we can see we are accurate here to 3 sf.

Pentagon

For the pentagon we can find the coordinates by finding the 5th roots of unity:

Screen Shot 2022-04-11 at 5.34.33 PM

We then need to scale all coordinate points by a factor, because in a pentagon the distance from the centre to the points is not 1 (as is the case in roots of unity).  We can find the distance from the centre to the edge of a pentagon by the following trigonometry:

Screen Shot 2022-04-11 at 6.20.13 PM

So, when we scale all coordinate points by this factor we get:

Screenshot 2565-04-11 at 17.09.47

And we can then do the same method as before and run the following Python code:

Screenshot 2565-04-11 at 17.07.58

This gives:

Screenshot 2565-04-11 at 17.11.03

n-sided polygon

We can now consider an n-sided polygon with sides 1.  Let’s start with the values we’ve found for an equilateral triangle (0.364), a square (0.522), a pentagon (0.697) and a hexagon (0.826.

Screenshot 2565-04-11 at 17.14.32

When we plot these they appear to follow a linear relationship:

average distance = 0.14n

We can check that this is correct by considering the fact that an n sided polygon will approximate a circle when n gets large.  So an n sided polygon with sides length 1 can be approximated by a circle with circumference n.  This allows us to work out the radius.

Screen Shot 2022-04-11 at 5.39.09 PM

We can then substitute this into the equation for the average distance of 2 points in a circle.

Screen Shot 2022-04-11 at 5.39.47 PM

So we would expect the average distance between 2 points in a regular polygon of sides 1 to approach the equation (as n gets large):

average distance = 0.144101239n

And we’ve finished!  Everything cross-checks and works nicely.  We’ve been able to use a mixture of complex numbers, geometry, coding and trigonometry to achieve this result.

Screenshot 2565-04-08 at 20.12.01

Finding the average distance in an equilateral triangle

In the previous post I looked at the average distance between 2 points in a rectangle.  In this post I will investigate the average distance between 2 randomly chosen points in an equilateral triangle.

Drawing a sketch.

The first step is to start with an equilateral triangle with sides 1.  This is shown above.  I sketched this using Geogebra – and used some basic Pythagoras to work out the coordinates of point C.

Screenshot 2565-04-08 at 20.15.37

I can then draw a square of sides 1 around this triangle as shown above.  I’m then going to run a Python program to randomly select points and then work out the distance between them – but I need to make sure that the 2 points chosen are both inside this triangle.  For this I need to work out the equation of the line AC and CB.

Using basic coordinate geometry we can see that the line AC has equation y = √3x.  We want the inequality y < √3x so that we are on the correct side of this line.

The line BC has equation y = -√3x + √3.  Therefore the triangle must also satisfy the inequality y < -√3x + √3.

I can then run the following code on Python, with finds the average distance between points (a,c) and (b,d) both within the unit square but also subject to the 2 inequality constraints above.

Screenshot 2565-04-08 at 20.22.08

When this is run it performs 999,999 trials and then finds the average distance.  This returns the following value:

Screenshot 2565-04-08 at 20.25.29

So we can see that the average distance is just over a third of a unit.

Finding the average distance of an equilateral triangle of length n.

Screenshot 2565-04-08 at 20.29.59

We can then draw the sketch above to find the equation of lines AC and CB for an equilateral triangle with lengths n.  This leads to the following inequalities:

 y < √3x

 y < -√3x + √3n

So we can then modify the code as follows:

Screenshot 2565-04-08 at 20.41.51

This then returns the average distances for equilateral triangles of sizes 1 to 10.

Screenshot 2565-04-08 at 20.42.08

And when we plot this on Desmos we can see that there is a linear relationship:

Screenshot 2565-04-08 at 20.45.11

The regression line has gradient 0.36 (2sf) so we can hypothesise that for an equilateral triangle of size n, the average distance between 2 points is approximately 0.36n.

Checking the maths

I then checked the actual equation for the average distance between 2 points in an equilateral triangle of sides n:

Screen Shot 2022-04-08 at 8.54.42 PM

This gives us:

Screen Shot 2022-04-08 at 8.57.43 PM

So we can see that we were accurate to 2 significant figures.  So this is a nice mixture of geometry, graphing and computational power to provide a result which would be otherwise extremely difficult to calculate.

Screen Shot 2022-04-06 at 11.15.38 AM

What is the average distance between 2 points in a rectangle?

Say we have a rectangle, and choose any 2 random points within it.  We then could calculate the distance between the 2 points.  If we do this a large number of times, what would the average distance between the 2 points be?

Monte Carlo method

The Monte Carlo method is perfect for this – we can run the following code on Python:

Screen Shot 2022-04-06 at 11.33.05 AM

This code will find the average distance between 2 points in a 10 by 10 square.  It does this by generating 2 random coordinates, finding the distance between them and then repeating this process 999,999 times.  It then works out the average value.  If we do this it returns:

Screen Shot 2022-04-06 at 11.36.13 AM

This means that on average, the distance between 2 random points in a 10 by 10 square is about 5.21.

Generalising to rectangles

I can now see what happens when I fix one side of the rectangle and vary the other side.  The code below fixes one side of the rectangle at 1 unit, and then varies the other side in integer increments.  For each rectangle it then calculates the average distance.

Screen Shot 2022-04-06 at 11.38.05 AM

This then returns the first few values as:

Screen Shot 2022-04-06 at 11.40.04 AM

This shows that for a 1 by 1 square the average distance between two points is around 0.52 and for a 1 by 10 rectangle the average distance is around 3.36.

Plotting some Desmos graphs

Screen Shot 2022-04-05 at 8.32.49 PM

Because I have included the comma in the Python code I can now copy and paste this straight into Desmos.  The dotted green points show how the average distance of a 1 by x rectangle changes as x increases.  I then ran the same code to work out the average distance of a 10 by x rectangle (red), 20 by x rectangle (black), 30 by x rectangle (purple) and 100 by x rectangle (yellow).

We can see if we continue these points further that they all appear to approach the line y = 1/3 x (dotted green).  This is a little surprising – i.e when x gets large, then for any n by x rectangle (with n fixed), an increase in x by one will tend towards an increase in the average distance by 1/3.

Heavy duty maths!

There is actually an equation that fits these curves – and will give the average distance, a(X) between any 2 points in a rectangle with sides a and b (a≥b).  Here it is:

Screen Shot 2022-04-06 at 5.38.59 PM

I added this equation into Desmos, by changing the a to x, and then adding a slider for b.  So, when I set b=1 this generated the case when the side of a rectangle is fixed as 1 and the other side is increased:

Screen Shot 2022-04-06 at 8.00.28 PM

Plotting these equations on Desmos then gives the following:

Screen Shot 2022-04-06 at 5.46.51 PM

Pleasingly we can see that the points created by our Monte Carlo method fit pretty much exactly on the lines generated by these equations.  By looking at these lines at a larger scale we can see that they do all indeed appear to be approaching the line y = 1/3 x.

General equation for a square

We can now consider the special case when a=b (i.e a square).  This gives:

Screen Shot 2022-04-06 at 6.11.34 PM

Which we can simplify to give:

Screen Shot 2022-04-06 at 6.11.42 PM

We can see therefore that a square of side 1 (a=1) will have an average distance of 0.52 (2dp) and a square of side 10 (a=10) will have an average distance of 5.21  – which both agree with our earlier results.

Plotting Pi and Searching for Mona Lisa

This is a very nice video from Numberphile – where they use a string of numbers (pi) to write a quick Python Turtle code to create some nice graphical representations of pi.  I thought I’d quickly go through the steps required for people to do this by themselves.

Screen Shot 2022-02-04 at 12.14.47 PM

Firstly you can run the Turtle code on trinket.io.  If you type the above code this will take the decimal digits of pi one at a time and for each one move forward 10 steps and then turn by 36 degrees times by that digit.  So for example the 1 will lead to a right turn of 36 degrees and the 4 will lead to a right turn of 36 x 4 = 144 degrees.

Screen Shot 2022-02-04 at 12.20.25 PM

Next it would be nice to have more digits of pi to paste in rather than type.  So we can go to the onlinenumbertools website and generate as many digits of pi as we want.  Select them to be comma separated and also to not include the first digit 3.  You can then copy and paste this string in place of the 1,4,1 in the code above.

1000 digits of pi

Screen Shot 2022-02-04 at 12.08.50 PM

If we run this program after pasting the first 1000 digits of pi we get (after waiting a while!) the above picture. There are a number of questions that they then raise in the video – if this program was ran infinitely long would the whole screen eventually be black?  Would this create every possible image that can be created by 36 degree turns?  Would it be possible to send this picture (say to an alien civilization) and for the recipient to be able to reverse engineer the digits of pi?

2000 digits of pi

Screen Shot 2022-02-04 at 1.32.10 PM

If you increase the digits of pi to around 2000 you get the above picture.  The graph spends a large time in the central region before finally “escaping” to the left.  It then left my screen at the top.

3000 digits of pi

Screen Shot 2022-02-04 at 1.41.45 PM

We can see that the turtle “returned” from off the top of the screen and then joined back up with the central region.  This starts to look like a coastline – maybe the south of the UK!

Different bases: Base 3

Screen Shot 2022-02-04 at 12.32.36 PM

We can consider the digits of pi in base three – which means that they are all equivalent to 0,1,2.  This means that we can use these to specify either 0 degree, 120 degree or 240 degree turns.  We can change the code as shown above to achieve this.  Note the i%3 gives i mod 3.  For example if the digit is 8, then 8 mod 3 is 2 (the remainder when 8 is divided by 3) and so this would turn 120 x 2 = 240 degrees.

This then creates a pattern which looks similar to the Sierpinski triangle fractal design:

Screen Shot 2022-02-04 at 12.39.09 PM

Base 4

Using a similar method, we can create the following using a base 4 design:

Screen Shot 2022-02-04 at 12.49.03 PM

This creates what looks like a map layout.

Base 5:

Screen Shot 2022-02-04 at 1.13.58 PM

In base 5 the turtle quickly departed from my screen!  With turns of 72 we don’t see the tessellating shapes that we do with base 3 and 4.

Base 6:

Screen Shot 2022-02-04 at 1.26.12 PM

With a 60 degree turn we can now see a collection of equilateral triangles and hexagons.

You can explore with different numbers and different bases to see what patterns you can create!

Witness Numbers: Finding Primes

The Numberphile video above is an excellent introduction to primality tests – where we conduct a test to determine if a number is prime or not.  Finding and understanding about prime numbers is an integral part of number theory.  I’m going to go through some examples when we take the number 2 as our witness number.  We have a couple of tests that we conduct with 2 – and for all numbers less than 2047 if a number passes either test then we can guarantee that it is a prime number.

Miller-Rabin test using 2 as a witness number:

We choose an odd number, n >2. First we need to write it in the form:

Screen Shot 2021-12-22 at 3.41.49 PM

Then we have to conduct a maximum of 2 different tests:

Screen Shot 2021-12-22 at 4.08.17 PM

If either of the above are true then we have a prime number.

Testing whether n = 23 is prime.

First we need to write 23 in the following form:

Screen Shot 2021-12-22 at 3.46.40 PM

Next we need to check if the following is true:

Screen Shot 2021-12-22 at 3.46.57 PM

Remember that mod 23 simply means we look at the remainder when we divide by 23.  We can do this using Wolfram Alpha – but in this case let’s see how we could do this without a calculator:

Screen Shot 2021-12-22 at 3.47.02 PM

Therefore this passes the test – and we can say that it is prime.

Testing whether 1997 is prime

For 1997 we have:

Screen Shot 2021-12-22 at 3.51.44 PM

So we need to first test if the following is true:

Screen Shot 2021-12-22 at 3.51.49 PM

However using Wolfram Alpha we get:

Screen Shot 2021-12-22 at 3.51.56 PM

So this fails the first part of the test.

Trying the second part of the test, we need:

Screen Shot 2021-12-22 at 3.52.14 PM

We have already tested the case when r=0 (this gives the earlier result), so just need to look at what happens when r=1.  Again we use Wolfram Alpha to get:

Screen Shot 2021-12-22 at 3.52.24 PM

This passes the 2nd part of the test and so confirms that 1997 is prime.

What happens with 2047?

2047 is not prime as we can write it as 2 x 3 x 11 x 31.  However it is the first number for which the witness 2 gives a false positive (i.e we get a positive result even though it is not prime).  We write 2047 as:

Screen Shot 2021-12-22 at 3.59.14 PM

But we do indeed get:

Screen Shot 2021-12-22 at 3.59.19 PM

So we can call 2047 a pseudoprime – it passes this prime number test but is not actually prime.

Larger primes

For numbers larger than 2047 you can combine witnesses – for example if you use both 2 and 3 as your witness numbers (and regard a positive result as at least one of them returning a positive result) then this will find all primes for n < 1,373,653.

More interestingly for extremely large numbers you can use this test to provide a probability estimate for the likelihood that a number is prime.  Lots to explore here!

Screen Shot 2021-05-12 at 2.54.23 PM

Maths Games and Markov Chains

This post carries on from the previous one on Markov chains – be sure to read that first if this is a new topic.  The image above is of the Russian mathematician Andrey Markov [public domain picture from here] who was the first mathematician to work in this field (in the late 1800s).

Creating a maths gameScreen Shot 2021-05-13 at 6.20.00 AM

Above I have created a simple maths game – based on the simplified rules of Monopoly.  All players start at Go.  All players roll 2 dice and add the scores and move that number of squares forward (after square 9 you move to square 1 etc).  If you roll a double you get sent to Jail (square 9) – but are released on your next turn.  If you land on square 5 you immediately are transported to Go (and end your turn).  

The question is, if we play this game over the long run which famous mathematician will receive the most visits?  (In order we have Newton (2), Einstein (3), Euler (4), Gauss (6), Euclid (7), Riemann (8)).

Creating a Markov Chain

The first task is to work out the probabilities of landing on each square for someone who is starting on any square.  Using a sample space diagram you can work out the following:

p(move 2) = 0.  p(move 3) = 2/36.  p(move 4) = 2/36.  p(move 5) = 4/36.  p(move 6) = 4/36.  p(move 7) = 6/36.  p(move 8) = 4/36.  p(move 9) = 4/36.  p(move 10) = 2/36.  p(move 11) = 2/36. p(move 12) = 0.  p(double) = 6/36.

We can see that the only variation from the standard sample space is that we have separated the doubles – because they will move us to jail.  

Matrix representation

Screen Shot 2021-05-13 at 6.34.34 AM

I now need to create a 9×9 matrix which represents all the states of the game.  The first subscript denotes where a player starts and the second subscript denotes where a player finishes after 1 turn.  So for example m_13 denotes that a player will start on square 1 and finish on square 3, and p(m_13) is the probability that it will happen.  If we do some (rather tedious!) calculations we get the following matrix:

Screen Shot 2021-05-13 at 6.34.42 AM

We can note that the probability of ending up on square 5 is 0 because if you do land on it you are transported to Go.  Equally you can’t start on square 5 – so the probability of starting there and arriving somewhere else is also 0.  Also note that each row represents all possible states – so always adds up to 1.

Now all I need to do is raise this matrix to a large power.  If I raise this to the power 100 this will give me the long term probabilities of which square people will land on.  This will also give me a 9×9 matrix but I can focus on the first row which will tell me the long term probabilities of where people end up when starting at Go.  This gives:

Screen Shot 2021-05-13 at 6.34.51 AM

So I can see that the long term probabilities (to 3sf) show that Jail is going to be visited around 40% of the time, followed by Go (around 26.7% of the time).  And the mathematician that receives the most visits will be Euclid (around 16.8%).  We can logically see why this is true – if 40% of the time people are in Jail, then the next roll they are most likely to get a 7 which then takes them to this square.

Extensions

Clearly we can then refine our game – we could in theory use this to model the real game of Monopoly (though this would be quite complicated!)  The benefit of Markov chains is that it is able to reduce complex rules and systems into a simple long term probability – which is hugely useful for making long term predictions.  

Website Stats

  • 9,374,417 views

About

All content on this site has been written by Andrew Chambers (MSc. Mathematics, IB Mathematics Examiner).

New website for International teachers

I’ve just launched a brand new maths site for international schools – over 2000 pdf pages of resources to support IB teachers.  If you are an IB teacher this could save you 200+ hours of preparation time.

Explore here!

Free HL Paper 3 Questions

P3 investigation questions and fully typed mark scheme.  Packs for both Applications students and Analysis students.

Available to download here

IB Maths Super Exploration Guide

A Super Exploration Guide with 168 pages of essential advice from a current IB examiner to ensure you get great marks on your coursework.

Available to download here.

Recent Posts

Follow IB Maths Resources from Intermathematics on WordPress.com