**GPT-4 vs ChatGPT. The beginning of an intelligence revolution?**

The above graph (image source) is one of the most incredible bar charts you’ll ever see – it compares the capabilities of GPT-4, OpenAI’s new large language model, with those of its previous iteration, ChatGPT. As we can see, GPT-4 is now able to score in the top 20% of test takers across a staggering range of subjects. This on its own is amazing – but the really incredible part is that the green sections represent improvements since ChatGPT – and ChatGPT was only released **3 ½ months ago**.

GPT-4 can now successfully pass nearly all AP subjects, would pass the bar exam to qualify as a lawyer, and is even making headway on Olympiad-style mathematics papers. You can see that ChatGPT had already mastered many of the humanities subjects – and that GPT-4 has now begun to master the sciences, maths, economics and law.

We can see an example of the mathematical improvements in GPT-4 below, from a recently released research paper. Both AIs were asked a reasonably challenging integral problem:

**GPT-4 response:**

GPT-4 is correct – and excellently explained – whereas the ChatGPT response (viewable in the paper) was just completely wrong. It’s not just that GPT-4 is now able to do maths like this – after all, so can Wolfram Alpha – but that the large language model training method allows it to do complicated maths as well as everything else. The research paper this appears in is entitled “Sparks of Artificial General Intelligence”, because this appears to be the beginning of the holy grail of AI research – a model which has intelligence across multiple domains, and as such begins to reach human levels of intelligence across multiple measures.

**An intelligence explosion?**

Nick Bostrom’s Superintelligence came out several years ago to discuss the ideas behind the development of intelligent systems, and in it he argues that we can probably expect explosive growth – perhaps even over days or weeks – once a system reaches a critical level of intelligence and then drives its own further development. Let’s look at the maths behind this. We start by modelling the rate of growth of intelligence over time:

Optimisation power is a measure of how much resource power is being allocated to improving the intelligence of the AI. The resources driving this improvement will come from the company working on the project (in this case OpenAI), and also from global research into AI in the form of published peer-reviewed papers on neural networks etc. However, there is also the potential for the AI itself to work on improving its own intelligence. We can therefore say that the optimisation power is given by:

Whilst the AI system is still undeveloped and unable to contribute meaningfully to its own intelligence improvements we will have:

If we assume that the company provides a constant investment in optimising their AI, and similarly that there is a constant investment worldwide, then we can treat this as a constant:

Responsiveness to optimisation describes how easily a system is able to be improved upon. For example a system which is highly responsive can be easily improved upon with minimal resource power. A system which shows very little improvements despite a large investment in resource power has low responsiveness.

If we also assume that responsiveness to optimisation, *R*, remains constant over some timeframe then we can write:

We can then integrate this by separating the variables:

This means that the intelligence of the system grows in a linear fashion over time.

However, when the AI system reaches a certain threshold of intelligence it will become the main resource driving its own intelligence improvements (with a contribution much larger than that of the company or the world). At this point we can say:

In other words, the optimisation power is a function of the AI’s current level of intelligence. This then creates a completely different growth trajectory:

Which we again solve as follows:

This means that the intelligence of the system now exhibits exponential growth.
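The equations in this section appear as images in the original; reconstructed from the surrounding text (with *D* the constant external optimisation power and *R* the responsiveness to optimisation), the derivation runs:

```latex
\frac{dI}{dt} = R \cdot O, \qquad O = O_{\text{company}} + O_{\text{world}} + O_{\text{AI}}

\text{Early on } \left(O_{\text{AI}} \approx 0,\;\; O_{\text{company}} + O_{\text{world}} = D\right):\quad
\frac{dI}{dt} = RD \;\Rightarrow\; I = RDt + c \quad \text{(linear growth)}

\text{Later } \left(O \approx O_{\text{AI}} = I\right):\quad
\frac{dI}{dt} = RI \;\Rightarrow\; \int \frac{1}{I}\,dI = \int R\,dt \;\Rightarrow\; I = Ae^{Rt} \quad \text{(exponential growth)}
```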

**What does this mean in practice in terms of AI development?**

We can see above an example of how we might expect such intelligence development to look. The first section (red) is marked by linear growth over short periods. As R or D is altered this may create lines with different gradients, but growth is not explosive.

At point A the AI system gains sufficient intelligence to be the main driving force in its own future intelligence gains. Note that this does not mean that it has to be above the level of human intelligence when this happens (though it may be) – simply that in the narrow task of improving intelligence it is now superior to the efforts of the company and the world’s researchers.

So, at point A the exponential growth phase begins (purple) – in this diagram taking the AI system explosively past human intelligence levels. Then at some unspecified point in the future (B on the diagram), this exponential growth ends and the AI approaches the maximum capacity intelligence for the system.

So it is possible that there will be an intelligence explosion once an AI system gets close to human levels of intelligence – and based on current trends it looks like this is well within reach within the next 5 years. So hold on tight – things could get very interesting!

**The Perfect Rugby Kick**

This was inspired by the ever excellent Numberphile video which looked at this problem from the perspective of GeoGebra. I thought I would look at the algebra behind this.

In rugby, when a try is scored, an additional kick (a conversion kick) can be taken. This must be taken on the line perpendicular to the try line through the point where the try was scored, but can be as far back as required.

We can represent this in the diagram above. The line AB represents the rugby goals (5.6 metres across). For a try scored at point D, a rugby player can then take the kick anywhere along the line DC.

Let’s imagine a situation where a player has scored a try at point D, which is *a* metres from the rugby post at B. For this problem we want to find the distance *x* (for a given value of *a*) which maximises the value of θ. The larger the value of θ, the more of the rugby goal the player can aim at, so we take this to be the perfect angle.

**Making an equation:**

We can use the diagram to derive the following equation linking θ and *x:*
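The equation itself appears as an image in the original; from the geometry (posts 5.6 metres apart, the try scored *a* metres from post B, and *x* the perpendicular distance back from the try line), the angle subtended by the goal is the difference of two arctangents:

```latex
\theta = \arctan\left(\frac{a + 5.6}{x}\right) - \arctan\left(\frac{a}{x}\right)
```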

We can use Desmos to plot this graph for different values of *a:*

We can then find the maximum points from Desmos and record these:

This then allows us to find the exponential regression line of the coordinates of the maximum points:

This regression line is given by the equation:

This graph is shown below:

We can also plot the *x* values of the maximum points against *a* to give the following linear regression line:

This graph is shown below:

This means that if we know the value of *a* we can now very easily calculate the value of *x* which will provide the optimum angle.

**Bring in the calculus!**

We can then see how close our approximations are by doing some calculus:

We can find the x coordinate of the maximum point in terms of *a* by differentiating, setting equal to 0 and then solving. This gives:
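The closed-form result appears as an image in the original. Assuming the angle relation θ = arctan((a + 5.6)/x) − arctan(a/x), setting dθ/dx = 0 gives x = √(a(a + 5.6)); a quick numerical check of this claim:

```python
import math

GOAL_WIDTH = 5.6  # metres between the rugby posts

def angle(a: float, x: float) -> float:
    """Angle subtended by the goal for a try scored a metres from
    the near post, kicking x metres back from the try line."""
    return math.atan((a + GOAL_WIDTH) / x) - math.atan(a / x)

def best_x_numeric(a: float, step: float = 0.001) -> float:
    """Grid search for the x that maximises the angle."""
    xs = [step * i for i in range(1, int(50 / step))]
    return max(xs, key=lambda x: angle(a, x))

a = 10
print(best_x_numeric(a))                 # ≈ 12.49
print(math.sqrt(a * (a + GOAL_WIDTH)))   # √(a(a+5.6)) ≈ 12.49
```

The grid maximum and the calculus answer agree to the grid resolution.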

When we plot this (green) versus our earlier linear approximation we can see a very close fit:

And if we want to find an equation for the optimum theta in terms of *x* we can also achieve this as follows:

When we plot this (green) we can also see a good fit for the domain required:

**Conclusion**

A really nice investigation – it could be developed quite easily to score very highly as an HL IA investigation as it has a nice combination of modelling, trigonometry, calculus and generalised functions. We can see that our approximations are pretty accurate – and so we can say that a rugby player who scores a try *a* metres from the goal should take the resultant conversion kick about *a* + 2 metres perpendicular distance from the try line in order to maximise the angle to the goal.

**Creating a Neural Network: AI Machine Learning**

A neural network is a type of machine learning algorithm modeled after the structure and function of the human brain. It is composed of a large number of interconnected “neurons,” which are organized into layers. These layers are responsible for processing and transforming the input data and passing it through to the output layer, where the final prediction or decision is made.

**Image recognition**

Neural networks can be used to classify images of (say) cats and dogs by training a model on a large dataset of labeled images. The model is presented with an input image, and its job is to predict the correct label (e.g., “cat” or “dog”) for the image.

To train the model, the input images are passed through the network and the model makes predictions based on the patterns it has learned from the training data. If the prediction is incorrect, the model adjusts the weights of the connections between neurons in order to improve its accuracy on future predictions.

**Our own model**

I want to create a very simple model to “recognise” faces. I first start with a 5 by 5 grid, and define what I think is a perfect face. This is shown above. I can then convert this to numerical values by defining the white spaces as 0 and the black squares as 1.

I can then represent this information as 5 column vectors:

**Building a weighting model**

Next I need to decide which squares would be acceptable for a face. I’ve kept the black squares for the most desirable, and then added some grey shade for squares that could also be included. I can then convert this into numerical data by deciding on a weight that each square should receive:

Here I am using 1 to represent a very desirable square, 0.5 for a somewhat desirable square and -1 for an undesirable square. I can also represent this weighting model as 5 column vectors:

**Using the dot product**

I can then find the sum of the dot products of the 5 x vectors with the 5 w vectors. In formal notation this is given by:

What this means is that I find the dot product of x_1 and w_1 and then add this to the dot product of x_2 and w_2 etc. For example with:

This gives:

Which is:

Doing this for all 5 vectors gives:

So my perfect face has a score of 5. I can therefore give an upper and lower bound for what would be considered a face. Let’s say:
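The actual grids appear as images in the original, so the pattern and weights below are hypothetical stand-ins (chosen so that the “perfect face” scores 5, as in the post). The scoring step – the sum of the five column dot products, which is just an element-wise multiply-and-sum over the grid – might look like:

```python
# Hypothetical 5x5 "perfect face" (1 = black square, 0 = white square)
x = [
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]

# Hypothetical weights: 1 = very desirable, 0.5 = somewhat desirable, -1 = undesirable
w = [
    [-1,  1,  -1,  1,  -1],
    [-1, 0.5, -1, 0.5, -1],
    [-1, -1, 0.5, -1,  -1],
    [-1, 0.5, 0.5, 0.5, -1],
    [-1,  1,   1,  1,  -1],
]

def score(grid, weights):
    """Sum of the dot products of the 5 column vectors - equivalent to
    summing grid[i][j] * weights[i][j] over every cell."""
    return sum(g * wt
               for row_g, row_w in zip(grid, weights)
               for g, wt in zip(row_g, row_w))

print(score(x, w))   # 5.0
```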

**Testing our model: A Face**

I want to see if the above image would be recognised as a face by our model. This has the following:

And when we calculate the sum of the dot products we get:

Which would be recognised as a face.

**Testing our model: Not a Face**

There are 2 to the power 25 different patterns that can be generated (over 33 million), so we would expect that the majority do not get recognised as a face. I randomly generated a 25 length binary string and created the image above. When we use our model it returns:

Which would not be recognised as a face.

**Using Python and modifying the design**

I decided to modify the weighting so that undesirable squares received -2, to make them less likely to appear. I then changed the face criterion so that a score between 4.5 and 5.5 inclusive is required.

I then wrote some Python code that would randomly generate 200,000 images and then run this algorithm to check whether this was recognised as a face or not.
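The code itself isn’t shown in the original, so this is a minimal sketch of the approach described: random 5×5 binary grids scored against a weight grid (here a hypothetical one using 1, 0.5 and -2), keeping those that score between 4.5 and 5.5.

```python
import random

# Hypothetical weight grid: 1 = very desirable, 0.5 = somewhat desirable,
# -2 = undesirable (penalised more heavily, as described in the post)
W = [
    [-2,  1,  -2,  1,  -2],
    [-2, 0.5, -2, 0.5, -2],
    [-2, -2, 0.5, -2,  -2],
    [-2, 0.5, 0.5, 0.5, -2],
    [-2,  1,   1,  1,  -2],
]

def score(grid):
    """Element-wise multiply-and-sum of the grid against the weights."""
    return sum(g * w for rg, rw in zip(grid, W) for g, w in zip(rg, rw))

def random_grid():
    """A random 5x5 binary image (25-bit string reshaped to a grid)."""
    return [[random.randint(0, 1) for _ in range(5)] for _ in range(5)]

def search(trials):
    """Generate random grids and keep the ones recognised as faces."""
    return [g for g in (random_grid() for _ in range(trials))
            if 4.5 <= score(g) <= 5.5]

faces = search(20_000)   # a smaller run than the post's 200,000, for speed
print(len(faces))
```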

**The results**

You can see some of the results above – whilst not perfect, they do have a feel of a face about them. And my favourite is below:

A nice cheeky grin! You can see the power of this method – this was an extremely simple model and yet achieves some results very quickly. With images of 1 million pixels and much more advanced weighting algorithms, modern AI systems can accurately identify and categorise a huge variety of images.

**Can Artificial Intelligence (ChatGPT) Get a 7 on an SL Maths paper?**

ChatGPT is a large language model that was trained using machine learning techniques. One of the standout features of ChatGPT is its mathematical abilities. It can perform a variety of calculations and solve equations. This advanced capability is made possible by the model’s vast amounts of training data and its ability to understand and manipulate complex mathematical concepts.

I didn’t write that previous paragraph – I asked ChatGPT to do it for me. The current version of ChatGPT is truly stunning – so I thought I would see if it is able to get a 7 on an SL Mathematics paper. I simply typed the questions into the interface to see what the result was.

**AI vs an SL Paper**

I chose a 2018 SL Paper. Let’s look at a breakdown of its scores (the full pdf of the AI’s answers and marks is available to download here).

(1) A function question. AI gets 5 out of 6. It makes a mistake by not swapping x and y for the inverse function.

(2) A box and whisker plot question. AI gets 2 out of 6. It doesn’t know the IB’s definition of outlier.

(3) Interpreting a graph. AI gets 0 out of 6. It needs some diagrammatic recognition to be able to do this.

(4) Functions and lines. AI gets 4 out of 7. Bafflingly, it solves 2 + 4 − c = 5 incorrectly.

(5) Integration and volume of revolution. AI gets 1 out of 7. The integral is incorrect (off by a factor of 1/2), and it doesn’t substitute the limits into the integral.

(6) Vectors from a diagram. AI gets 0 out of 6. It needs some diagrammatic recognition to be able to do this.

(7) Normals to curves. AI gets 7 out of 7.

(8) Inflection points and concavity. AI gets 12 out of 13. It solves 6x+18 <0 incorrectly on the last line!

(9) Vectors in 3D. AI gets 7 out of 16. Solves cos(OBA) = 0 incorrectly and can’t find the area of a triangle based on vector information.

(10) Sequences and trig. AI gets 11 out of 15.

Total: 49/90. This is a Level 5. [Level 6 is 54+; Level 7 is 65+.]

Considering that there were 2 full questions that had to be skipped this is pretty good. It did make some very surprising basic mistakes – but overall was still able to achieve a solid IB Level 5, and it did this in about 5-10 minutes (the only slow part was entering the questions). If this system was hooked up to text recognition and diagrammatic recognition and then fine-tuned for IB Maths I think this would be able to get a Level 7 very easily.

Engines like Wolfram Alpha are already exceptional at doing maths as long as questions are interpreted to isolate the maths required. This seems to be a step change – with a system able to simply process all information as presented and then to interpret what maths is required by itself.

So, what does this mean? Well probably that no field of human thought is safe! AI systems are now unbelievably impressive at graphics design, art, coding, essay writing and chat functions – so creative fields which previously considered too difficult for computers are now very much in play.

(Header image generated from here).

**ECDSA: Elliptic Curve Signatures**

This is the second post on this topic – following on from the first post here. Read that first for more of the maths behind this! In this post I’ll look at this from a computational angle – and make a simple Python code to create and verify Elliptic Curve Signatures.

**Why Elliptic Curve Signatures?**

Say I create 100 MATHSCOINS which I sell. This MATHSCOIN only has value if it can be digitally verified to be an original issued by me. To do this I share some data publicly – this then allows anyone who wants to check via its digital signature that this is a genuine MATHSCOIN. Once you understand this idea you can (in theory!) create your own digital currency or NFT – complete with a digital signature that allows anyone to check that it has been issued by you.

**Python code**

This code will revolve around solutions mod M to the following elliptic curve:

We can run a quick Python code to find these solutions for a defined M:
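The code appears as an image in the original; a minimal version is sketched below, using y² ≡ x³ + 7 (mod M) as an illustrative choice of curve (the curve used in the worked example of the companion post):

```python
def curve_points(M, a=0, b=7):
    """All integer solutions (x, y) to y^2 = x^3 + a*x + b (mod M),
    found by brute force - fine for small M."""
    return [(x, y) for x in range(M) for y in range(M)
            if (y * y) % M == (x ** 3 + a * x + b) % M]

print(curve_points(13))   # [(7, 5), (7, 8), (8, 5), (8, 8), (11, 5), (11, 8)]
```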

This Python code then needs to use the algorithms for repeated addition of the base pair. It then needs to store all the coordinate pairs in two lists (one for the x coordinates and one for the y coordinates). These can then be used in the algorithm for creating the digital signature. Note that we need to define the mod of the curve (M), the starting base pair (a,b), the order of the base pair (n), our data to digitally sign (z1), our private key (k1) and a second key (k2).

**The full code for digital signatures**
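The full code appears as an image in the original. The sketch below follows the description above, assuming the toy curve y² ≡ x³ + 7 (mod M) and the standard ECDSA formulas (s1 = x-coordinate of k2·G mod n, s2 = k2⁻¹(z1 + s1·k1) mod n) – variable names mirror the post:

```python
# Toy ECDSA signer on the curve y^2 = x^3 + 7 (mod M).
# None represents the point at infinity.

def ec_add(P, Q, M):
    """Chord-and-tangent addition of two curve points mod M."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % M == 0:
        return None                                   # vertical line: infinity
    if P == Q:
        lam = (3 * x1 * x1) * pow(2 * y1, -1, M) % M  # tangent gradient
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, M) % M     # chord gradient
    x3 = (lam * lam - x1 - x2) % M
    return (x3, (lam * (x1 - x3) - y1) % M)

def ec_mul(k, P, M):
    """Repeated addition - fine for toy-sized numbers."""
    R = None
    for _ in range(k):
        R = ec_add(R, P, M)
    return R

def sign(M, base, n, z1, k1, k2):
    """Signature (s1, s2) on data z1 with private key k1 and ephemeral key k2."""
    s1 = ec_mul(k2, base, M)[0] % n              # x-coordinate of k2 * base
    s2 = (z1 + s1 * k1) * pow(k2, -1, n) % n     # k2^{-1}(z1 + s1*k1) mod n
    return (s1, s2)

# Toy values from the companion post: M = 13, base pair (8, 8) of order n = 7,
# data z1 = 100, private key k1 = 5, ephemeral key k2 = 2
print(ec_mul(5, (8, 8), 13))            # public key point: (11, 5)
print(sign(13, (8, 8), 7, 100, 5, 2))
```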

**Running this code**

I have put this code online at Replit.com here – so you can see how it works. It should look something like this:

**Checking a digital signature is genuine**

We might also want to work backwards to check whether a digital signature is correct. The following code will tell us this – as long as we specify all the required variables below. Note we need the digital signature (s1, s2) as well as (r1, r2) – which are worked out by the previous code.
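Again the original code is an image; a self-contained sketch of the verification step, under the same assumptions (toy curve y² ≡ x³ + 7 mod M, standard ECDSA check x(u₁G + u₂Q) ≡ s1 mod n with u₁ = z1·s2⁻¹ and u₂ = s1·s2⁻¹):

```python
# Verifying a toy ECDSA signature on the curve y^2 = x^3 + 7 (mod M).
def ec_add(P, Q, M):
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % M == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1) * pow(2 * y1, -1, M) % M
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, M) % M
    x3 = (lam * lam - x1 - x2) % M
    return (x3, (lam * (x1 - x3) - y1) % M)

def ec_mul(k, P, M):
    R = None
    for _ in range(k):
        R = ec_add(R, P, M)
    return R

def verify(M, base, n, pub, z1, s1, s2):
    """True if (s1, s2) is a valid signature on z1 for public key point pub."""
    w = pow(s2, -1, n)                      # s2^{-1} mod n
    u1, u2 = (z1 * w) % n, (s1 * w) % n
    point = ec_add(ec_mul(u1, base, M), ec_mul(u2, pub, M), M)
    return point is not None and point[0] % n == s1 % n

# Signature (4, 4) on z1 = 100 against public key (11, 5)
print(verify(13, (8, 8), 7, (11, 5), 100, 4, 4))   # True
print(verify(13, (8, 8), 7, (11, 5), 101, 4, 4))   # tampered data: False
```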

**Running this code**

You can run this code here – again on Replit.com. You should see something like this:

**Try it yourself!**

To create your own digital signatures you need to find a mod M and a base pair with order n, such that both M and n are prime. Remember you can use this site to find some starting base pairs mod M. Here are some to start off with:

(1)

M = 907. Base pair = (670,30). n = 967

(2)

M = 79. Base pair = (60, 10). n = 67

(3)

M = 97. Base pair = (85, 92). n = 79

(4)

M = 13. Base pair = (8,8). n = 7

Can you run the code to create a digital signature, and then run the verification code to check that it is indeed genuine?

**The mathematics behind blockchain, bitcoin and NFTs.**

If you’ve ever wondered about the maths underpinning cryptocurrencies and NFTs, then here I’m going to try and work through the basic idea behind the Elliptic Curve Digital Signature Algorithm (ECDSA). Once you understand this idea you can (in theory!) create your own digital currency or NFT – complete with a digital signature that allows anyone to check that it has been issued by you.

Say I create 100 MATHSCOINS which I sell. This MATHSCOIN only has value if it can be digitally verified to be an original issued by me. To do this I share some data publicly – this then allows anyone who wants to check via its digital signature that this is a genuine MATHSCOIN. So let’s get into the maths! (Header image generated from here).

**Elliptic curves**

I will start with an elliptic curve and a chosen prime modulus (here we work in modular arithmetic, taking the remainder when dividing by a given modulus). For this example I will work in mod 13 and my curve will be:

First I will work out all the integer solutions to this equation. For example (7,5) is a solution because:

The full set of integer solutions is given by:

Now we define addition of 2 distinct points (p_1, p_2) and (q_1, q_2) on the curve mod M by the following algorithm:

And we define the addition of 2 equal points (p_1, p_2) on the curve mod M by the following algorithm:

So in the case of (8,8) if we want to calculate (8,8) + (8,8) this gives:

This is a little tedious to do, so we can use an online generator here to calculate the full addition table of all points on the curve:

This shows that (say) (7,5) + (8,5) = (11,8) etc.
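The addition algorithms above appear as images; a short script implementing the standard chord-and-tangent formulas (with modular inverses via Python’s `pow(x, -1, M)`) reproduces the quoted entries and the order of (8, 8):

```python
# Point addition on y^2 = x^3 + 7 (mod M); None is the point at infinity
def ec_add(P, Q, M):
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % M == 0:
        return None                                   # vertical line: infinity
    if P == Q:
        lam = (3 * x1 * x1) * pow(2 * y1, -1, M) % M  # tangent gradient
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, M) % M     # chord gradient
    x3 = (lam * lam - x1 - x2) % M
    return (x3, (lam * (x1 - x3) - y1) % M)

print(ec_add((8, 8), (8, 8), 13))    # (11, 8)
print(ec_add((7, 5), (8, 5), 13))    # (11, 8)

# Order of (8, 8): keep adding it to itself until we reach the point at infinity
P, Q, order = (8, 8), None, 0
while True:
    Q = ec_add(Q, P, 13)
    order += 1
    if Q is None:
        break
print(order)                         # 7
```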

I can then choose a base point and find the order of this point (how many times it can be added to itself until it reaches the point at infinity). For example with the base point (8,8):

We can also see that the order of our starting point A(8,8) is 7 because there are 7 coordinate points (including the point at infinity) in the group when we calculate A, 2A, 3A…

**ECDSA: Elliptic Curve Signatures**

So I have chosen my curve mod M (say):

And I choose a base point on that curve (p_1, p_2) (say):

And I know the order of this base point is 7 (n=7). (n has to be prime). This gives the following:

I now choose a private key k_1:

Let’s say:

This is my super-secret key. I never share it! I use this key to generate the following point on the curve:

I can see that 5(8,8) = (11,5) from my table when repeatedly adding (8,8) together.

Next I have some data z_1 which I want to give a digital signature to – this signature will show anyone who examines it that the data is authentic, has been issued by me and has not been tampered with. Let’s say:

I choose another integer k_2 such that:

Let’s say:

I am now ready to create my digital signature (s_1, s_2) by using the following algorithm:

Note, dividing by 2 is the same as multiplying by 4 in mod 7 (as this is the multiplicative inverse).
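The algorithm itself appears as an image; assuming the standard ECDSA formulas s₁ = x(k₂·(8,8)) mod n and s₂ = k₂⁻¹(z₁ + s₁k₁) mod n, the numbers here work out as:

```latex
s_1 = x\big(2 \cdot (8,8)\big) \bmod 7 = 11 \bmod 7 = 4

s_2 = \frac{z_1 + s_1 k_1}{k_2} \bmod 7 = \frac{100 + 4 \cdot 5}{2} \bmod 7 = 4 \cdot 120 \bmod 7 = 4
```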

I can then release this digital signature alongside my MATHSCOIN (represented by the data z_1 = 100). Anyone can now check with me that this MATHSCOIN was really issued by me.

**Testing a digital signature**

So someone has bought a MATHSCOIN direct from me – and later on wants to sell it to another buyer. Clearly this new buyer needs to check whether this is a genuine MATHSCOIN, so they have to check the digital signature on the data. To allow them to do this I can share all the following data (but crucially not my private key):

This gives:

To verify someone then needs to do the following:

To verify that the data z_1 has a valid digital signature we need:

So with the shared data we have:
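The worked numbers appear as images; assuming the standard check (with s₂⁻¹ ≡ 2 mod 7, since 4 · 2 ≡ 1, and signature s₁ = s₂ = 4 on z₁ = 100 against the public point (11,5)), the verification runs:

```latex
u_1 = z_1 s_2^{-1} = 100 \cdot 2 \equiv 4 \pmod 7, \qquad
u_2 = s_1 s_2^{-1} = 4 \cdot 2 \equiv 1 \pmod 7

u_1(8,8) + u_2(11,5) = 4(8,8) + (11,5) = (7,8) + (11,5) = (11,8)

x\text{-coordinate: } 11 \bmod 7 = 4 = s_1 \;\checkmark
```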

This verifies that the data had a valid digital signature – and that the MATHSCOIN is genuine! This is basically the foundation of all digital assets which require some stamp of authenticity.

In real life the numbers chosen are extremely large – private keys will be 256 digits long and primes very large. This makes it computationally impossible (at current speeds) to work out a private key based on public information, but still relatively easy to check a digital signature for validity.

I have also made some simple Python code which will provide a digital signature as well as check that one is valid. You can play around with these codes here:

(1) Digital signature code, (2) Checking a digital signature for validity

So time to create your own digital currency!

**Finding planes with radar**

PlusMaths recently did a nice post about the link between ellipses and radar (here), which inspired me to do my own mini investigation on this topic. We will work in 2D (with planes on the ground) for ease of calculations! A transmitter sends out signals – and if any of these hit an object (such as a plane) they will be reflected and picked up by a receiver. This locates the object somewhere on the ellipse with the receiver and transmitter as its 2 foci. When we add a second receiver as shown above, then if both receivers pick up a signal we can narrow down the location of the object to the intersection of the 2 ellipses.

So, for this mini exploration I wanted to find the equations of 2 ellipses with a shared focus so that I could plot them on Desmos. I would then be able to find the intersection of the ellipses in simple cases where both ellipses’ major axes lie on the x axis.

**Defining ellipses**

For an ellipse centred at the origin shown above, with foci at c and -c we have:

where c is linked to a and b by the equation:
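The equations appear as images in the original; the standard forms are:

```latex
\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \qquad c^2 = a^2 - b^2
```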

**Rotating an ellipse**

Next, we can imagine a new ellipse in a coordinate system (u,v).

This coordinate system is created by rotating the x and y axis by an angle of theta radians anticlockwise about the origin. The following matrix transformation achieves this rotation:

This therefore gives:

and we can substitute this into our new coordinate system to give:
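The matrices appear as images; with the standard convention for rotating the axes by θ anticlockwise, the substitution would be:

```latex
\begin{pmatrix} u \\ v \end{pmatrix} =
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
\quad\Rightarrow\quad
\frac{(x\cos\theta + y\sin\theta)^2}{a^2} + \frac{(-x\sin\theta + y\cos\theta)^2}{b^2} = 1
```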

When we plot this we can therefore rotate our original ellipse by any given theta value:

We can use basic Pythagoras to see that the focus point c will become the point c1 shown above with coordinates:

By the same method we can see that the point c2 will have coordinates:

**Translation**

Next we want to translate this new ellipse so that it shares a focus point with our original green ellipse. To do this we need to translate the point c2 to the point c. This is given by the translation:

So we can therefore translate our ellipse:

Which becomes:

When we plot this we get:

This then gives the 2nd ellipse in blue which does indeed share a focus point at c:

**Finding points of intersection**

The coordinates of the points where the 2 ellipses intersect are given by the solution to:

This looks a bit difficult! So let’s solve an easier problem – the points of intersection when the theta value is 0 (i.e. when the ellipses both lie on the x axis). This simplifies things to give:

and we can find the y coordinates by substituting this into the original ellipse equation.

So the coordinates of intersection are given by:

So – in the above case we would be able to narrow down the location of the plane to 2 locations. With a 3rd ellipse we could pinpoint the location exactly.
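As a check on the θ = 0 case: substituting y² = b₁²(1 − x²/a₁²) from the first ellipse into a second ellipse centred at x = d on the x axis reduces the system to a quadratic in x. A quick numerical sketch (the specific ellipse values below are illustrative, not taken from the post):

```python
import math

def intersect_on_axis(a1, b1, a2, b2, d):
    """Intersections of x^2/a1^2 + y^2/b1^2 = 1 with
    (x-d)^2/a2^2 + y^2/b2^2 = 1, by substituting for y^2
    and solving the resulting quadratic A x^2 + B x + C = 0."""
    A = 1 / a2**2 - b1**2 / (a1**2 * b2**2)
    B = -2 * d / a2**2
    C = d**2 / a2**2 + b1**2 / b2**2 - 1
    disc = B * B - 4 * A * C
    if disc < 0:
        return []
    points = []
    for x in ((-B + math.sqrt(disc)) / (2 * A), (-B - math.sqrt(disc)) / (2 * A)):
        y2 = b1**2 * (1 - x**2 / a1**2)
        if y2 >= 0:                       # discard roots off the first ellipse
            points.append((x, math.sqrt(y2)))
            points.append((x, -math.sqrt(y2)))
    return points

# Example: ellipse with a=5, b=4 (foci at x = ±3) and an ellipse with
# a=3, b=sqrt(5) centred at (5, 0), so they share the focus at (3, 0)
print(intersect_on_axis(5, 4, 3, math.sqrt(5), 5))   # two symmetric points near x ≈ 4.21
```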

**Proving Pythagoras Like Einstein?**

There are many ways to prove Pythagoras’ theorem – Einstein reputedly used the sketch above to prove this using similar triangles. To keep in the spirit of discovery I also just took this diagram as a starting point and tried to prove this myself, (though Einstein’s version turns out to be a bit more elegant)!

**Step 1: Finding some links between triangles**

We can see that our large right angled triangle has sides *a, b, c* with angles alpha and beta. Hopefully it should also be clear that the two smaller right angled triangles will also have angles alpha and beta, and therefore all three triangles are similar. It should also be clear that the combined area of the 2 small triangles is the same as the area of the large triangle.

**Step 2: Drawing a sketch to make things clearer:**

It always helps to clarify the situation with some diagrams. So, let’s do that first.

**Step 3: Making some equations**

As the combined area of the 2 small triangles is the same as the area of the large triangle, this gives the following equation:

We can also make the following equation by considering that triangles 2 and 3 are similar:

We can now substitute our previous result for *x* into this new equation (remember our goal is to have an equation just in terms of *a, b, c*, so we want to eliminate *x* and *y* from our equations).

We can also make the following equation by considering that triangles 1 and 2 are similar:

And as before, our goal is to remove everything except a,b,c from these equations, so let’s make the substitution for y using our previous result:

And as if by magic, Pythagoras’ theorem appears! Remember that the original *a, b, c* related to any right angled triangle with hypotenuse *c*, so we have proved that this equation must always be true for right angled triangles.
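The equations in this post appear as images; one compact version of the similar-triangle argument (not necessarily the exact equations in the images, which also use the area relation) runs as follows, with *x* and *y* the two segments into which the altitude cuts the hypotenuse, so x + y = c:

```latex
\frac{x}{a} = \frac{a}{c} \;\Rightarrow\; a^2 = cx, \qquad
\frac{y}{b} = \frac{b}{c} \;\Rightarrow\; b^2 = cy

a^2 + b^2 = cx + cy = c(x + y) = c^2
```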

You can explore some other ways of proving Pythagoras here. Which is the most elegant?

These now have some great free resources for students to help them with the IB maths course – including full course notes, formula books, Paper 3s, Exploration guides and a great mind-map. Make sure to check these all out to get some excellent support for the IB maths course.

These now have over 25 worksheets, investigations, paper 3s, treasure hunts and more resources – both with question pdfs and markscheme pdfs. I’ve added a lot of enriching activities that would support explorations and paper 3 style problems and also put a selection of some excellent other resources from IB teachers too.

So be sure to check these both out!

**Finding the average distance in a polygon**

Over the previous couple of posts I’ve looked at the average distance in squares, rectangles and equilateral triangles. The logical extension to this is to consider a regular polygon with sides of length 1. Above is pictured a regular pentagon with sides 1 enclosed in a 2 by 2 square. The points N and O represent 2 randomly chosen points, and we find the distance between them. On average, what is the distance between these randomly chosen points N and O?

**Starting with a hexagon**

It’s a little easier to start with a hexagon as we get some nicer coordinate points. So, our first challenge is to find the coordinates of a regular hexagon with sides 1. Luckily we can use the complex roots of unity to do this. We start by finding the 6th roots of unity and then converting these to coordinates in an Argand diagram:
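The working appears as images in the original; the 6th roots of unity e^(2πik/6) give the vertices directly, since they lie on the unit circle spaced 60° apart (which, for a hexagon specifically, also makes each side length 1). A quick computation:

```python
import cmath

# 6th roots of unity: z = e^(2*pi*i*k/6) for k = 0..5
roots = [cmath.exp(2j * cmath.pi * k / 6) for k in range(6)]
coords = [(round(z.real, 6), round(z.imag, 6)) for z in roots]
print(coords)
# vertices: (1, 0), (0.5, ±√3/2), (-0.5, ±√3/2), (-1, 0)

# side length: the distance between consecutive vertices is 1
side = abs(roots[1] - roots[0])
print(side)   # ≈ 1.0
```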

This then allows us to plot the following:

We can then work out the inequalities which define the inside of the hexagon when we generate points within the 2×2 square centred at (0,0). This gives:

We can then run the following code to find the average distance:
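The code itself appears as an image in the original; a Monte Carlo sketch of the method described (rejection sampling from the 2×2 square using the hexagon inequalities |y| ≤ √3/2 and |y ± √3x| ≤ √3, then averaging pairwise distances) might look like:

```python
import math
import random

S3 = math.sqrt(3)

def in_hexagon(x, y):
    """Inside the regular hexagon with circumradius 1 (side 1), vertex at (1, 0)."""
    return abs(y) <= S3 / 2 and abs(y + S3 * x) <= S3 and abs(y - S3 * x) <= S3

def random_point():
    """Rejection-sample a uniform point inside the hexagon."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if in_hexagon(x, y):
            return x, y

def average_distance(trials=50_000):
    total = 0.0
    for _ in range(trials):
        (x1, y1), (x2, y2) = random_point(), random_point()
        total += math.hypot(x2 - x1, y2 - y1)
    return total / trials

random.seed(1)
print(average_distance())   # ≈ 0.826 with enough trials
```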

This gives the following result:

We can check this result as the exact value is:

which is 0.8262589495. So we can see we are accurate here to 3 sf.

**Pentagon**

For the pentagon we can find the coordinates by finding the 5th roots of unity:

We then need to scale all coordinate points by a factor, because in a pentagon with side 1 the distance from the centre to the vertices is not 1 (as it is for the roots of unity). We can find the distance from the centre to a vertex of the pentagon with the following trigonometry:
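The trigonometry appears as an image; for a regular pentagon with side 1, half a side subtends an angle of π/5 at the centre, so the scale factor (the circumradius) is:

```latex
R = \frac{1/2}{\sin(\pi/5)} = \frac{1}{2\sin(\pi/5)} \approx 0.8507
```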

So, when we scale all coordinate points by this factor we get:

And we can then do the same method as before and run the following Python code:

This gives:

**n-sided polygon**

We can now consider an n-sided polygon with sides of length 1. Let’s start with the values we’ve found for an equilateral triangle (0.364), a square (0.522), a pentagon (0.697) and a hexagon (0.826).

When we plot these they appear to follow a linear relationship:

average distance = 0.14n

We can check that this is correct by considering the fact that an n-sided polygon will approximate a circle when n gets large. So an n-sided polygon with side length 1 can be approximated by a circle with circumference n. This allows us to work out the radius.

We can then substitute this into the equation for the average distance of 2 points in a circle.

So we would expect the average distance between 2 points in a regular polygon of sides 1 to approach the equation (as n gets large):

average distance = 0.144101239n
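This constant follows from the known mean distance 128r/(45π) between two random points in a disc of radius r: with circumference n, we get r = n/(2π), so

```latex
\text{average distance} \approx \frac{128}{45\pi} \cdot \frac{n}{2\pi} = \frac{128}{90\pi^2}\, n \approx 0.1441012\, n
```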

And we’ve finished! Everything cross-checks and works nicely. We’ve been able to use a mixture of complex numbers, geometry, coding and trigonometry to achieve this result.
