Google Page Rank: Trillion dollar maths

In the early 1990s, search engines were text based – they would rank pages by how many times a keyword appeared. But this did not discriminate between useful and less useful pages. Larry Page and Sergey Brin used some maths to come up with an alternative method of ranking pages and built a trillion dollar company!

Let’s see the basis of Google’s page ranking by starting with the following 4 websites:

[Image: the 4 websites and the links between them]

We can see that Website 1 links to 3 other sites. Randomly clicking on these links would therefore give a 1/3 probability of landing on each of the other 3 sites. We can then draw arrows from Website 1 to each of the other websites and mark the probability as 1/3 on each of these paths. Website 2 links to 2 other sites – so randomly clicking on these links gives a 1/2 probability of landing on Website 3 or 4. Continuing this method creates the following graph:
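We can sketch this step in Python. Note that the text only tells us the outgoing links for Websites 1 and 2 (the rest are in the diagram), so the links for Websites 3 and 4 below are an assumption – chosen so that they are consistent with the steady-state figures quoted later in the article:

```python
# Outgoing links for each website. Sites 1 and 2 are given in the text;
# the links for sites 3 and 4 (3 -> 1, and 4 -> 1 and 3) are an
# assumption consistent with the article's later steady-state values.
links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}

# A random click on site i lands on each linked site with equal probability.
click_prob = {site: 1 / len(targets) for site, targets in links.items()}

for site, p in click_prob.items():
    print(f"Each link from Website {site} is followed with probability {p:.3f}")
```

So a click from Website 1 follows each of its three links with probability 1/3, and a click from Website 2 follows each of its two links with probability 1/2, exactly as in the graph above.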

[Image: graph of the 4 websites with the click probabilities marked on each arrow]

We then represent this information in a 4 by 4 matrix, where each column represents one of the websites. So the first column represents Website 1, and its entries give the probability of landing on Websites 1, 2, 3 and 4 respectively when starting at Website 1. The second column gives the probability of landing on Websites 1, 2, 3 and 4 respectively when starting at Website 2. We continue this to give the following:
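As a rough sketch, the matrix can be built with numpy. The columns for Websites 3 and 4 are an assumption (3 links to 1; 4 links to 1 and 3), since those links only appear in the missing image – but they are chosen to be consistent with the steady-state values quoted later:

```python
import numpy as np

# Column j holds the probabilities of moving from Website j+1 to each site.
# Columns for Websites 3 and 4 are assumed (3 -> 1; 4 -> 1 and 3).
M = np.array([
    [0,   0,   1, 1/2],  # probability of landing on Website 1
    [1/3, 0,   0, 0  ],  # probability of landing on Website 2
    [1/3, 1/2, 0, 1/2],  # probability of landing on Website 3
    [1/3, 1/2, 0, 0  ],  # probability of landing on Website 4
])

# Each column is a probability distribution, so every column sums to 1.
print(M.sum(axis=0))  # [1. 1. 1. 1.]
```

A matrix like this – where every column sums to 1 – is called a (column-stochastic) transition matrix.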

[Image: the 4 by 4 transition matrix]

The Google algorithm was then able to see how often each site was linked to, and also to give extra weight to links from sites that were themselves linked to often. In effect, the more links to a site, the "better" that site is considered to be, and the more value links from that site are given.

They could then "rank" sites by considering the steady state of the transition matrix, which shows the long-term average probability of arriving at each page when randomly clicking on links. To do this with a GDC we can simply raise the matrix above to a large power.

In this case (say):

[Image: the transition matrix raised to a large power on the GDC]

We notice when we do this that each column is the same to 3 s.f. – i.e. it doesn't matter where we start, the long-run probability of landing on each site is the same:

[Image: the resulting matrix, with all 4 columns identical]

This means that randomly clicking will lead to a 0.387 probability of ending up on Website 1, and so on. We can therefore rank them in order as Website 1, Website 3, Website 4, Website 2.
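The GDC calculation can be reproduced in Python. The matrix below uses the same assumed links for Websites 3 and 4 as before (3 links to 1; 4 links to 1 and 3) – with that assumption it gives exactly the 0.387 figure and the ranking quoted above:

```python
import numpy as np

# Transition matrix: columns for Websites 3 and 4 are assumed
# (3 -> 1; 4 -> 1 and 3), consistent with the article's figures.
M = np.array([
    [0,   0,   1, 1/2],
    [1/3, 0,   0, 0  ],
    [1/3, 1/2, 0, 1/2],
    [1/3, 1/2, 0, 0  ],
])

# Raising the matrix to a large power gives the steady state:
# every column converges to the same long-run distribution.
P = np.linalg.matrix_power(M, 50)
steady = P[:, 0]
print(np.round(steady, 3))  # Website 1's long-run probability is ~0.387

# Rank the sites by long-run probability, highest first.
ranking = np.argsort(-steady) + 1
print(ranking)  # [1 3 4 2]
```

In exact fractions the steady state here is (12/31, 4/31, 9/31, 6/31) ≈ (0.387, 0.129, 0.290, 0.194), which reproduces the ranking Website 1, 3, 4, 2.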

Another example:

Let’s look at the following 4 websites:

[Image: the 4 websites – BBC, CNN, Daily Mail, Guardian – and their links]

As before we can draw a graph mapping the connections:

[Image: graph of the connections between the 4 news sites]

If we then create the transition matrix and raise it to a high power, we can see the steady state:

[Image: the steady state matrix for the news sites]

This shows we would rank as follows:

Rank: Guardian, BBC, Daily Mail, CNN
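An equivalent way to find the steady state is to look for the eigenvector of the transition matrix with eigenvalue 1, rather than repeatedly multiplying. The link structure below is hypothetical (the actual links are only in the images above) – it is chosen purely to illustrate the method, and happens to give the same ranking as the article:

```python
import numpy as np

sites = ["BBC", "CNN", "Daily Mail", "Guardian"]

# Hypothetical links, for illustration only:
# BBC -> CNN, Guardian; CNN -> Guardian;
# Daily Mail -> BBC, Guardian; Guardian -> BBC, Daily Mail.
M = np.array([
    [0,   0, 1/2, 1/2],
    [1/2, 0, 0,   0  ],
    [0,   0, 0,   1/2],
    [1/2, 1, 1/2, 0  ],
])

# The steady state is the eigenvector with eigenvalue 1,
# normalised so its entries sum to 1.
vals, vecs = np.linalg.eig(M)
idx = np.argmin(np.abs(vals - 1))          # pick the eigenvalue closest to 1
v = np.real(vecs[:, idx])
steady = v / v.sum()

for p, name in sorted(zip(steady, sites), reverse=True):
    print(f"{name}: {p:.3f}")
```

With this hypothetical structure the steady state is (2/7, 1/7, 4/21, 8/21), giving the ranking Guardian, BBC, Daily Mail, CNN – the eigenvector method and the matrix-power method always agree, since raising the matrix to a large power converges to that same eigenvector.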

Now, clearly the Google algorithm is more sophisticated – and has been refined to take in lots of other factors – but at its heart this is the idea that revolutionised search: rewarding websites that other sites linked to (i.e. that had authority), and thus creating a measurement of quality. Mathematically quite simple – but a very lucrative idea!

IB teacher? Please visit my new site http://www.intermathematics.com ! Hundreds of IB worksheets, unit tests, mock exams, treasure hunt activities, paper 3 activities, coursework support and more. Take some time to explore!

Andrew Chambers: (Resources for IB teachers)

Are you a current IB student or IB teacher? Do you want to learn the tips and tricks to produce excellent Mathematics coursework?  Check out my new IA Course in the menu!

Andrew Chambers (Getting a 7 on IB Maths Coursework)
