
**Modelling tides: What is the effect of a full moon?**

Let’s have a look at the effect of the moon on the tides in Phuket. The Phuket tide table above shows the height of the tide (meters) on given days in March, with the hours along the top. So if we choose March 1st (full moon) we get the following graph:

**Phuket tide at full moon:**

If I use the standard sine regression on Desmos I get the following:

This doesn’t look like a very useful graph, but the R squared value is very close to one, so what’s gone wrong? Well, Desmos has done what we asked it to do: it has found a sine curve that goes through the points. It’s just that it has chosen a b value close to 120, which gives the curve a very small period. To prevent Desmos doing this, we need to fix the period first. Working in radians, we use the formula period = 2π/b. Looking at the original graph, we can see that the period is around 12 hours. Therefore we have:

period = 2π/b

12 = 2π/b

b = 2π/12 = π/6
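Why would a b value near 120 fit the points at all? One likely explanation assumes the tide readings are taken on whole hours. Sine repeats every 2π, so adding any whole multiple of 2π to b leaves the curve's value at every whole hour unchanged. Since π/6 + 19 × 2π ≈ 119.9, a curve with that b takes exactly the same values at each whole hour as the 12-hour curve, so it fits the data just as well, but it oscillates wildly in between. The sketch below checks this. The parameters a, c and d are made up for illustration and are not the Phuket fit.

```python
import math

# Assumption: readings are taken on whole hours, x = 0, 1, ..., 23.
hours = range(24)

b_true = math.pi / 6                  # a 12-hour period
b_alias = b_true + 2 * math.pi * 19   # about 119.9, close to the b of roughly 120 in the post

def sine_model(x, a, b, c, d):
    """Desmos-style sine model y = a*sin(b*x + c) + d."""
    return a * math.sin(b * x + c) + d

# Hypothetical parameters, chosen only to illustrate the effect.
a, c, d = 1.2, 0.5, 1.5

# At every whole hour the two curves agree (up to floating-point rounding)...
max_gap_on_hours = max(
    abs(sine_model(x, a, b_true, c, d) - sine_model(x, a, b_alias, c, d)) for x in hours
)
# ...but halfway between readings they can be far apart.
max_gap_between = max(
    abs(sine_model(x + 0.5, a, b_true, c, d) - sine_model(x + 0.5, a, b_alias, c, d))
    for x in hours
)

print(f"b_alias = {b_alias:.2f}")
print(f"largest gap at whole hours: {max_gap_on_hours:.2e}")
print(f"largest gap at half hours:  {max_gap_between:.2f}")
```

This is why fixing b before fitting matters. The data points alone cannot distinguish the 12-hour curve from its high-frequency aliases.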

Refitting with b fixed at π/6 gives something that looks a lot nicer:

Repeating the same process with the tide data for the day of the new moon gives:

**Phuket tide at new moon:**

**Analysis:**

Both graphs show a very close fit to the original data, though both undervalue the tide at 23:00. We can see that the full moon has indeed had an effect on the amplitude of the sine curves: the amplitude is 1.21 m for the full moon and only 1.03 m for the new moon.
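The amplitudes quoted above are the a values in the sine model y = a sin(bx + c) + d. Here is a minimal sketch of how such a fit could be reproduced outside Desmos, assuming that model with b held fixed at π/6. With b fixed, the model can be rewritten as A sin(bx) + B cos(bx) + d. That form is linear in A, B and d, so ordinary least squares (`numpy.linalg.lstsq`) solves it directly. The function name `fit_sine_fixed_period` and the hourly heights are hypothetical. The heights are generated from a known curve rather than taken from the Phuket tide table, and this is not necessarily how Desmos does it internally.

```python
import numpy as np

def fit_sine_fixed_period(x, y, b):
    """Least-squares fit of y = a*sin(b*x + c) + d with b held fixed.

    With b fixed, a*sin(b*x + c) = A*sin(b*x) + B*cos(b*x), where
    A = a*cos(c) and B = a*sin(c). The fit is therefore an ordinary
    linear least-squares problem in (A, B, d).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.sin(b * x), np.cos(b * x), np.ones_like(x)])
    (A, B, d), *_ = np.linalg.lstsq(design, y, rcond=None)
    a = np.hypot(A, B)       # amplitude, sqrt(A^2 + B^2)
    c = np.arctan2(B, A)     # phase shift
    residuals = y - (a * np.sin(b * x + c) + d)
    r_squared = 1 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
    return a, c, d, r_squared

# Hypothetical hourly heights generated from a known curve,
# NOT the real Phuket tide table.
hours = np.arange(24)
b = 2 * np.pi / 12   # period fixed at 12 hours
heights = 1.2 * np.sin(b * hours + 0.5) + 1.5

a, c, d, r2 = fit_sine_fixed_period(hours, heights, b)
print(f"amplitude a = {a:.2f} m, phase c = {c:.2f}, midline d = {d:.2f} m, R^2 = {r2:.3f}")
```

Because b is no longer a free parameter, the fit cannot drift to an aliased value such as b ≈ 120.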

**Further study:**

We could then see if this relationship holds throughout the year: is there a general formula to explain the moon’s effect on the amplitude? We could also see how we would have to modify the sine wave to capture the tidal height over an entire week or month. Can we capture it with a single equation (perhaps a damped sine wave?) or is it only possible as a piecewise function? We could also use some calculus to find the maximum and minimum points.
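For the calculus question, here is a short worked derivation. It assumes a fitted model of the form y = a sin(bx + c) + d with a > 0. Setting the derivative to zero picks out the high and low tides:

```latex
\begin{aligned}
y &= a\sin(bx + c) + d \\
\frac{dy}{dx} &= ab\cos(bx + c) = 0
  \;\Longrightarrow\; bx + c = \frac{\pi}{2} + n\pi, \quad n \in \mathbb{Z} \\
x_{\text{high}} &= \frac{\pi/2 - c + 2n\pi}{b}, \qquad y_{\max} = d + a \\
x_{\text{low}} &= \frac{3\pi/2 - c + 2n\pi}{b}, \qquad y_{\min} = d - a
\end{aligned}
```

With b = π/6, consecutive high tides are 2π/b = 12 hours apart. The tidal range, y_max - y_min = 2a, is about 2.42 m for the full-moon fit (a = 1.21 m) and 2.06 m for the new-moon fit (a = 1.03 m).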

There is a very nice PDF which goes into more detail on the maths behind modelling tides here. There we go: a nice, simple investigation which can be expanded in a number of directions.

**Predicting the UK election using linear regression**

The above data is the latest opinion poll data from the Guardian. The UK will have (another) general election on June 8th. So can we use the current opinion poll data to predict the outcome?

**Longer term data trends**

Let’s start by looking at the longer-term trend in the aftermath of the Brexit vote on June 23rd 2016. I’ll plot some points for Labour and the Conservatives and see what kind of linear regression we get. To keep things simple, I’ve looked at randomly chosen poll data approximately every 2 weeks, assigning 0 to July 1st 2016, 1 to mid July, 2 to August 1st, and so on. This has then been plotted using the fantastic Desmos.
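To make the x scale concrete, here is a small Python sketch that converts a date into the half-month scale described above. The post only fixes 0 = July 1st 2016, 1 = mid July and 2 = August 1st. The rule that a date on or after the 15th counts as "mid-month" is an assumption made for this sketch, and the function name `poll_x` is hypothetical.

```python
from datetime import date

START = date(2016, 7, 1)  # x = 0, as in the post

def poll_x(d: date) -> int:
    """Map a date to the half-month x scale used for the poll data.

    Assumption: each calendar month contributes 2 units (start of month
    and mid-month), and a date on or after the 15th counts as mid-month.
    """
    months = (d.year - START.year) * 12 + (d.month - START.month)
    return 2 * months + (1 if d.day >= 15 else 0)

print(poll_x(date(2016, 7, 1)))   # 0
print(poll_x(date(2016, 7, 15)))  # 1
print(poll_x(date(2016, 8, 1)))   # 2
print(poll_x(date(2017, 6, 15)))  # 23
```

On this scale election day, June 8th, falls between 22 (June 1st) and 23 (mid June). All the fitted slopes below are smaller than 0.5 in size, so using 23 rather than 22.5 shifts each prediction by less than a quarter of a percentage point.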

**Labour**

You can see that this is not a very good fit: the correlation is very weak. Nevertheless, let’s see what we would get if we used this regression line to predict the outcome in June. With the x-axis scale I’ve chosen, mid June 2017 corresponds to x = 23. Therefore we predict the percentage as

y = -0.130(23) + 30.2

y ≈ 27%

Clearly this would be a disaster for Labour – but our model is not especially accurate so perhaps nothing to worry about just yet.
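A linear regression like the ones Desmos fits here is a least-squares line. The correlation coefficient r measures how strong the linear relationship is: values near 0 mean a weak correlation, and values near ±1 mean a strong one. Below is a minimal sketch of both calculations in plain Python. The function name `least_squares` and the poll percentages are hypothetical, not the Guardian data.

```python
import math

def least_squares(xs, ys):
    """Return slope m, intercept c and Pearson's r for the line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx                    # slope minimising the squared residuals
    c = mean_y - m * mean_x          # the line passes through (mean_x, mean_y)
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient, between -1 and 1
    return m, c, r

# Hypothetical, noisy poll percentages (illustration only).
xs = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
ys = [30, 28, 31, 27, 30, 29, 26, 29, 28, 25, 28]

m, c, r = least_squares(xs, ys)
print(f"y = {m:.3f}x + {c:.1f}, r = {r:.2f}")
```

A small negative slope with r well away from -1 is the kind of weak negative correlation described above. The line still gives a prediction, but it should not be trusted much.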

**Conservatives**

As with Labour we have a weak correlation – though this time we have a positive rather than negative correlation. If we use our regression model we get a prediction of:

y = 0.242(23) + 38.7

y ≈ 44%

So, we are predicting a crushing victory for the Conservatives – but could we get some more accurate models to base this prediction on?

**Using moving averages**

The Guardian’s poll tracker at the top of the page uses moving averages to smooth out fluctuations between different polls and arrive at an averaged poll figure. Using this gives a stronger correlation.
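The post doesn't say exactly how the Guardian computes its averages, so here is an assumed simplest version: a trailing moving average, where each smoothed value is the mean of the latest `window` polls. The window size of 3 and the poll figures are made up for illustration.

```python
def moving_average(values, window):
    """Trailing simple moving average: the mean of each run of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical sequence of individual poll results for one party.
polls = [42, 46, 44, 47, 43, 48, 45, 49]
smoothed = moving_average(polls, window=3)
print(smoothed)
```

Averaging cancels out some of the poll-to-poll noise, which is why a regression on the smoothed series tends to show a stronger correlation. Bear in mind that neighbouring averaged values share polls, so they are not independent data points.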

**Labour**

This model doesn’t take into account a (possible) late surge in support for Labour, but it does fit better than our last graph. Using the equation we get:

y = -0.0764(23) + 28.8

y ≈ 27%

**Conservatives**

We can have more confidence in using this regression line to predict the election. Putting in the numbers we get:

y = 0.411(23) + 36.48

y ≈ 46%
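As a quick check on the arithmetic, the four regression lines above can be evaluated at x = 23 in one go. The slopes and intercepts are the ones reported in this post; only the Python layout is new.

```python
# Fitted lines as reported above: (slope, intercept) for y = slope*x + intercept.
models = {
    "Labour (sampled polls)": (-0.130, 30.2),
    "Conservatives (sampled polls)": (0.242, 38.7),
    "Labour (moving average)": (-0.0764, 28.8),
    "Conservatives (moving average)": (0.411, 36.48),
}

X_MID_JUNE = 23  # mid June 2017 on the post's x scale

predictions = {
    name: slope * X_MID_JUNE + intercept for name, (slope, intercept) in models.items()
}
for name, y in predictions.items():
    print(f"{name}: {y:.1f}%")

lead = predictions["Conservatives (moving average)"] - predictions["Labour (moving average)"]
print(f"Conservative lead (moving-average models): {lead:.1f} percentage points")
```

The moving-average models put the Conservatives about 19 percentage points ahead of Labour.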

**Conclusion**

Our more accurate models merely confirm what we found earlier, and indeed what all the pollsters are predicting: a massive win for the Conservatives. Even allowing for a late narrowing of the polls, the Conservatives could be on target to win by over 10 percentage points, which would result in a very large majority. Let’s see what happens!