Projecting league positions, part 2

This post turned out to be mostly about programming. As I mentioned in the last post, my idea was to make my predictive table positions/points model more accurate by adding more data to it. I wanted to use my own football-data parser to add shots data, including TSR, for all teams included in the model.
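For readers who have not met TSR before: it is the share of all shots in a team's games that the team itself took. A minimal sketch of the calculation follows; the function name and the neutral fallback for teams with no shot data are assumptions made for illustration, not code from the parser.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """Total Shots Ratio: shots for / (shots for + shots against)."""
    total = shots_for + shots_against
    if total == 0:
        # Assumption: with no shot data at all, treat the team as neutral.
        return 0.5
    return shots_for / total


# A team that took 15 shots and conceded 5 has a TSR of 0.75.
example_tsr = total_shots_ratio(15, 5)
```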

It turned out to be a bit harder than I first expected. The parser I have written is actually quite slow. It uses a lot of CPU cycles, and if you add a lot of leagues (in this case resulting in almost 17,000 games) it gets REALLY slow. If you are only interested in one team or one game that might not be a problem, but in my model the parsing has to be repeated over 100,000 times, and then it becomes a huge problem.

So first I made some modifications to the parser. It can now be initialized with just a few leagues, which makes it faster. It was still more or less impossible to run the calculations when I worked on this at home on my 8-core i7, though. Doing all calculations for one league took almost 20 minutes, so the entire calculation would run for 20 × 59 minutes, which is roughly 20 hours, or almost a whole day! So I had to rewrite some code and distribute the calculation across all 8 cores. The calculation time went down to 2 hours. This was still way too long for me, since I really wanted to run it with different deltas and different machine learning models to be able to compare their predictive strength.
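Spreading the per-league work over several cores can be sketched with Python's standard `multiprocessing.Pool`. This is only an illustration of the idea, not the code I actually ran: `evaluate_league` is a hypothetical stand-in for the expensive per-league calculation, and the league ids are made up.

```python
from multiprocessing import Pool


def evaluate_league(league_id: int) -> tuple[int, float]:
    """Hypothetical stand-in for the slow per-league calculation."""
    # Placeholder arithmetic; the real work would parse games and fit a model.
    total = sum((league_id * i) % 7 for i in range(1000))
    return league_id, total / 1000


def evaluate_all(league_ids, workers: int = 8) -> dict[int, float]:
    """Run evaluate_league for every league, one league per task."""
    if workers <= 1:
        # Serial fallback, handy for debugging and for comparing timings.
        return dict(map(evaluate_league, league_ids))
    with Pool(processes=workers) as pool:
        # pool.map splits the leagues across the worker processes and
        # returns the results in the same order as the input.
        return dict(pool.map(evaluate_league, league_ids))
```

In a script, the call `evaluate_all(range(59), workers=8)` should sit under an `if __name__ == "__main__":` guard so that worker processes do not re-run it on platforms that spawn rather than fork. Each league is independent of the others, which is what makes this kind of split straightforward.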

Luckily I have a 24-core Xeon HP Z820 at work, so I have been running the calculations on that machine during the evenings this week. These are the results from the model I will use.

I have added one more way of measuring how well the predictive model performs. I call it accuracy, and it measures how often my model has made correct predictions of the table positions. I measure it in three ways: the exact accuracy, the accuracy within 1 table position up or down, and the accuracy within 2 positions up or down.

So a ±1 accuracy of 0.7 means that in 70% of the cases, the table position the model implies (based on the predicted number of points the team will get) was within one place of the team's actual final position.
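The accuracy measure described above can be sketched as follows. This is a minimal illustration under my own assumptions: the function names, the example teams and points, and the alphabetical tie-break are all hypothetical, not the code behind the numbers below.

```python
def implied_positions(predicted_points: dict[str, float]) -> dict[str, int]:
    """Rank teams by predicted points, highest first (position 1 = top)."""
    # Assumption: ties on points are broken alphabetically by team name.
    ordered = sorted(predicted_points, key=lambda t: (-predicted_points[t], t))
    return {team: pos for pos, team in enumerate(ordered, start=1)}


def position_accuracy(predicted: dict[str, int],
                      actual: dict[str, int],
                      tolerance: int = 0) -> float:
    """Share of teams whose predicted position is within +-tolerance places."""
    if not predicted:
        raise ValueError("no predictions to score")
    hits = sum(abs(predicted[t] - actual[t]) <= tolerance for t in predicted)
    return hits / len(predicted)


# Made-up example: four teams, predicted points vs. actual final positions.
points = {"A": 70, "B": 65, "C": 50, "D": 40}
final = {"A": 2, "B": 1, "C": 3, "D": 4}
positions = implied_positions(points)
exact = position_accuracy(positions, final, tolerance=0)
within_one = position_accuracy(positions, final, tolerance=1)
```

In this made-up example two of the four teams are placed exactly right and the other two are one place off, so the exact accuracy is 0.5 and the ±1 accuracy is 1.0.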

Delta (rounds away from final round) 20:

Delta 20
************ ACCURACY +-0: 0.230703259005 ***************
************ ACCURACY +-1: 0.494854202401 ***************
************ ACCURACY +-2: 0.674957118353 ***************

Delta 15:

Delta 15
************ ACCURACY +-0: 0.257289879931 ***************
************ ACCURACY +-1: 0.559176672384 ***************
************ ACCURACY +-2: 0.723842195540 ***************

Delta 10:

Delta 10
************ ACCURACY +-0: 0.302744425386 ***************
************ ACCURACY +-1: 0.624356775300 ***************
************ ACCURACY +-2: 0.793310463122 ***************

Delta 5:

Delta 5
************ ACCURACY +-0: 0.424528301887 ***************
************ ACCURACY +-1: 0.763293310463 ***************
************ ACCURACY +-2: 0.897941680961 ***************

Well, the graphs still show a large spread in points. I guess it’s just not possible to predict every team's run upwards or downwards in the table with this model. Still, the ±2 accuracy of 67% already at delta 20 is interesting. I didn’t expect that. If you combine that with the Q1-Q3 range in the box plots as well as the min-max values, it is definitely possible, already after 18 rounds of e.g. the Premier League, to start including and excluding teams from the different “races” within the table.

Another conclusion I drew was that adding TSR, as well as other shot-related data, had a very small impact on the model. The r² increased by 0.1 at delta 10. I had expected somewhat more, especially after reading James Grayson's excellent post on the topic. Having read that, I just had to test my model by feeding it only shots data. This is what happened:

Only TSR

So there is, as several people besides James Grayson have stated in blog posts, a clear connection between shots for/against and the performance of teams. BUT there is an even better match if you also take into account the number of points the teams have taken, their league position, etc.

One last thing I just had to test was whether this model could in any way have predicted Leicester's last 10 games of last season. The answer: no! Actually, when I looked at the max value in the graph for 11 predicted points, it was 23, so that is probably the Leicester dot for last season one can see right there. Statistically, the probability of this happening is just a few percent. The rest of the predictions for that season seemed to be more or less spot on. The model even managed to see that Tottenham would overtake Liverpool by one point!

Within the coming days I’ll put my HP Z820 to some more work, and this time I will make my first real predictions! They will be for Allsvenskan and Superettan. I just have to find an intuitive way of visualizing them.

Delivered by Everysport

Published by

Ola Lidmark Eriksson

Football analyst/programmer
