Spotting fake data with Benford’s Law

In the digital age it has never been easier to fake data, so it has never been more important to have tools for detecting it. Benford’s Law is an extremely useful test because people who fake data tend to do so in a predictable way. It gives the probability that a number in certain data sets (many measurements, street addresses, stock prices etc.) begins with a given digit (its leading digit). Whilst we might expect each leading digit d to be equally likely to occur, in reality the probabilities follow this equation:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \dots, 9$$

So, for example, a leading digit of 1 (probability about 30.1%) is much more likely than a leading digit of 9 (about 4.6%):

[Image: probability of each leading digit from 1 to 9 under Benford’s Law]
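The probabilities for all nine digits can be reproduced with a few lines of Python. This is only a minimal sketch of the formula P(d) = log10(1 + 1/d); the function name `benford_probability` is chosen here for illustration and does not come from the original post.

```python
import math

def benford_probability(d: int) -> float:
    """Probability that the leading digit equals d (1 to 9) under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Probability for every possible leading digit.
probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"{d}: {p:.3f}")
```

The nine probabilities add up to 1, because the terms log10((d + 1)/d) telescope to log10(10).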

Testing some data

I wanted to test some data to see whether it did indeed follow Benford’s Law, so I downloaded an Excel file with 531 data points from the CDC website. This gave the moving 7-day average of Covid cases per 100,000 people for every day from 12th March 2020 to 3rd October 2021. I then used the Excel techniques shown above in the video to manipulate the data into a useful form. Once this had been done I could use Desmos to plot the data (as a dot plot and a left-aligned frequency histogram). You can see this data below:

[Image: Desmos dot plot and left-aligned frequency histogram of the leading digits, with the expected curve in red]
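The post does this step in Excel, but the same leading-digit tally could be sketched in Python. The helper names `leading_digit` and `leading_digit_counts` are hypothetical, and the sample values are made up purely to show the mechanics; they are not the CDC figures.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

def leading_digit(value: float) -> Optional[int]:
    """Return the first non-zero digit of a number, or None if it has none (e.g. 0)."""
    # Scan the text form of the number and skip zeros, the decimal point and the sign.
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

def leading_digit_counts(values: Iterable[float]) -> Dict[int, int]:
    """Count how often each digit 1 to 9 appears as a leading digit."""
    counts = Counter()
    for v in values:
        d = leading_digit(v)
        if d is not None:  # values with no leading digit (zero) are ignored
            counts[d] += 1
    return {d: counts.get(d, 0) for d in range(1, 10)}

# Made-up example values, NOT the CDC data used in the post.
sample = [0.52, 1.3, 12.7, 104.0, 0.0, 27.5, 19.9]
print(leading_digit_counts(sample))
```

Values equal to zero have no leading digit, so this sketch simply skips them.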

The red curve is the continuous (rather than discrete) curve obtained from the expected frequency of each digit. In Desmos I generated it with the following equation:

[Image: Desmos equation for the expected-frequency curve]
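The exact Desmos expression is not reproduced in the text, but the expected frequency of each digit is presumably the number of data points multiplied by the Benford probability. A minimal sketch under that assumption, using the 531 data points mentioned above (the function name `expected_counts` is illustrative):

```python
import math
from typing import Dict

def expected_counts(n: int) -> Dict[int, float]:
    """Expected number of values with each leading digit, for n data points."""
    # Assumption: expected frequency = n * log10(1 + 1/d).
    return {d: n * math.log10(1 + 1 / d) for d in range(1, 10)}

expected = expected_counts(531)

for d, count in expected.items():
    print(f"{d}: {count:.1f}")
```

Replacing the digit d with a continuous variable x in the same expression would give a smooth curve through these nine values.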

We can see that our data largely follows the expected curve, so we have no evidence to suggest faked data! We could conduct a chi-squared test to measure the goodness of fit of our data (this is also explained in the video).
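A chi-squared goodness-of-fit test against Benford’s Law could be sketched as follows. This is an illustration under stated assumptions rather than the method from the video: the function name `benford_chi_squared` is hypothetical, the observed counts are invented (a perfectly uniform spread of 531 values, not the CDC data), and the result is compared with 15.507, the standard 5% critical value for 8 degrees of freedom.

```python
import math
from typing import Dict

# 5% critical value of the chi-squared distribution with 8 degrees of freedom
# (9 digit categories minus 1).
CRITICAL_VALUE_5_PERCENT_DF8 = 15.507

def benford_chi_squared(observed: Dict[int, float]) -> float:
    """Chi-squared statistic comparing observed leading-digit counts with Benford's Law."""
    n = sum(observed.get(d, 0) for d in range(1, 10))
    if n == 0:
        raise ValueError("no observations")
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (observed.get(d, 0) - expected) ** 2 / expected
    return statistic

# Invented counts: every digit equally common (9 * 59 = 531 values).
uniform_counts = {d: 59 for d in range(1, 10)}
statistic = benford_chi_squared(uniform_counts)

if statistic > CRITICAL_VALUE_5_PERCENT_DF8:
    print("Significant departure from Benford's Law at the 5% level")
else:
    print("No significant departure from Benford's Law at the 5% level")
```

With equally common digits the statistic is far above the critical value, which is what we would expect from data that ignores Benford’s Law. A statistic below 15.507 would mean no significant evidence against it at the 5% level.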

Conclusion

This is a simple but effective method of testing for faked data. If data fails this test it doesn’t necessarily mean it was faked (e.g. the heights of men in cm will clearly have 1 as nearly every leading digit!), but most non-random real-life measurements do follow this rule. Try to find your own data (ideally a large data set) and test it for yourself.

Andrew Chambers: (Resources for IB teachers)
