With Östersunds FK starting their Europa League group stage campaign with two straight victories, I wanted to assess their chances of getting through their group.
As always in the media, a team's chances of winning or getting promoted tend to be judged more on gut feeling than on facts. I therefore wanted to look at what chances Östersund really has of getting through the group stage, based on what we know rather than what we think we know.
One of the first, obvious approaches to a fact-based assessment would be to go through all past group stages (384 groups since 2009) and count how many teams with 6 points after round 2 actually went through. A slow and tedious process, but still fact-based.
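For comparison, the counting approach boils down to a conditional frequency. The sketch below assumes a hypothetical GroupRecord type holding one team's points after round 2 and whether it finished in the top two (the Europa League qualification places); the record layout and the toy data in the usage note are illustrative only, not real results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupRecord:
    # Hypothetical record: one team's situation in one historical group
    points_after_round_2: int
    qualified: bool  # True if the team finished 1st or 2nd


def empirical_qualification_rate(records: List[GroupRecord], points: int = 6) -> Optional[float]:
    """Share of teams with `points` after round 2 that went through, or None if no such teams."""
    matching = [r for r in records if r.points_after_round_2 == points]
    if not matching:
        return None
    return sum(r.qualified for r in matching) / len(matching)
```

With made-up records such as `[GroupRecord(6, True), GroupRecord(6, True), GroupRecord(6, False), GroupRecord(3, True)]`, the function returns 2/3: only the three teams on 6 points are counted, and two of them qualified.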
I went with another model. Inspired by my league projection model, I chose an approach based on Machine Learning. With ML we can ask the computer: given what we know about a team at a certain point in the group stage, what is the probability that it will finish 1st, 2nd, 3rd or 4th?
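The post does not say which ML algorithm sits behind the model, so here is a minimal sketch of the idea using k-nearest neighbours as a stand-in technique (not necessarily how the model above is built). It assumes historical data as hypothetical `(round_no, points, goal_diff, final_position)` tuples and estimates the position probabilities as the share of the k most similar historical teams, at the same round, that finished in each position.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical row format: (round_no, points, goal_diff, final_position)
HistoryRow = Tuple[int, int, int, int]


def position_probabilities(history: List[HistoryRow], round_no: int,
                           points: int, goal_diff: int, k: int = 25) -> List[float]:
    """Estimate P(finish 1st..4th) from the k most similar historical teams at the same round."""
    same_round = [h for h in history if h[0] == round_no]
    if not same_round:
        raise ValueError("no historical data for this round")
    # Euclidean distance in (points, goal difference) space; sort is stable on ties
    same_round.sort(key=lambda h: math.hypot(h[1] - points, h[2] - goal_diff))
    nearest = same_round[:k]
    counts = Counter(h[3] for h in nearest)
    return [counts[pos] / len(nearest) for pos in (1, 2, 3, 4)]
```

Because the result is a probability for every position rather than a single guess, it naturally produces output in the same shape as the probability rows shown later in the post.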
I started off testing with really basic data (points, goal difference etc.) and looked at how accurately the model predicted the teams' final positions in their groups. At round 2 the accuracy was around 35%. Not great (a random pick among four positions would be right 25% of the time), but still fact-based, and since the model gives every team a probability for every position, I still found it useful.
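One plausible way to measure that accuracy, sketched here as an assumption rather than the exact evaluation used: take each team's most likely position (the largest probability) and count how often it matches the position the team actually finished in. Running it separately on predictions made at each round gives an accuracy-per-round curve.

```python
from typing import List, Sequence


def top1_accuracy(predictions: List[Sequence[float]], actual_positions: List[int]) -> float:
    """Share of teams whose most likely predicted position equals their final position.

    predictions: one [p1, p2, p3, p4] probability list per team.
    actual_positions: final positions (1-4), in the same order.
    """
    if len(predictions) != len(actual_positions) or not predictions:
        raise ValueError("need the same, non-zero number of predictions and outcomes")
    hits = 0
    for probs, actual in zip(predictions, actual_positions):
        predicted = list(probs).index(max(probs)) + 1  # positions are 1-based
        hits += predicted == actual
    return hits / len(predictions)
```

A model that always guessed at random would hover around 0.25 on this measure, which is the baseline the 35% figure should be compared against.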
Improving with ELO
Now, my old league projection model is based on much more data and is also a bit more accurate. It is of course easier to find data on entire leagues, and to find larger data sets. But I was still interested to see if I could find data sources that could improve this model. The only one I could find that might add value was the ELO rating of European teams from Club ELO. They also have an API, so adding every team's current ELO rating was really easy. With ELO added as a parameter, I re-evaluated the model and looked at its accuracy at every round of the group stage. This is the result:
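A minimal sketch of pulling ratings from Club ELO, assuming the API serves one CSV ranking snapshot per date at a URL of the form `http://api.clubelo.com/YYYY-MM-DD` with `Club` and `Elo` columns; both the URL pattern and the column names are assumptions to check against the Club ELO site. Parsing is kept separate from downloading so it can be tried without a network connection.

```python
import csv
import io
import urllib.request
from typing import Dict

# Assumed endpoint format: one CSV snapshot per date (verify on the Club ELO site)
CLUB_ELO_URL = "http://api.clubelo.com/{date}"


def parse_elo_csv(text: str) -> Dict[str, float]:
    """Map club name -> Elo rating from a CSV snapshot with 'Club' and 'Elo' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Club"]: float(row["Elo"]) for row in reader}


def fetch_elo(date: str) -> Dict[str, float]:
    """Download and parse the snapshot for a date such as '2017-09-28' (network call)."""
    with urllib.request.urlopen(CLUB_ELO_URL.format(date=date), timeout=10) as resp:
        return parse_elo_csv(resp.read().decode("utf-8"))
```

Each team's rating can then be appended as one more feature next to points and goal difference. Club names may be spelled differently than in other data sources, so some name matching is likely needed.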
No surprise here. The accuracy of the model increases very rapidly, and at round 5 it predicts a team's correct final position almost 80% of the time.
What surprised me a bit more was that adding a team's ELO rating didn't have any significant impact on the model. My conclusion is that all teams qualifying for the group stage are good teams, and that it is their current performance, and nothing else, that determines their chances of getting through the group. For me this makes sense. Any team winning 2 straight games in the EL group stage obviously has the strength to go through: they have already beaten enough competition to prove that.
I am well aware that my data set of 384 finished group stages is insufficient. But it's the best I can do!
At first my program only output boring text, e.g.:
[ 0.66289222 0.2781902 0.04733794 0.01157964]
where the columns are the probabilities of the team finishing 1st, 2nd, 3rd and 4th. So I thought about ways to visualize the output in an understandable way. The most intuitive one I could create was a heat map, where the probabilities, output as a (normalized) matrix, are color coded (darker means more likely), with teams on one axis and positions on the other.
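The heat map idea can be sketched without any plotting library by mapping each probability to a character that gets "darker" as the probability grows. This is an illustrative stand-in for the colored version (a plotting call such as matplotlib's `imshow` would give the real thing); the row normalization, so that each team's probabilities sum to 1, and the shade characters are assumptions.

```python
from typing import List, Sequence

SHADES = " .:-=+*#%@"  # light to dark: darker means more likely


def normalize_rows(matrix: List[Sequence[float]]) -> List[List[float]]:
    """Scale each team's row so its probabilities sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([p / total for p in row] if total else [0.0] * len(row))
    return out


def text_heatmap(teams: List[str], matrix: List[Sequence[float]]) -> List[str]:
    """Render teams (rows) x final positions (columns) as lines of shaded characters."""
    rows = normalize_rows(matrix)
    width = max(len(t) for t in teams)
    header = " " * width + "  " + " ".join(str(i + 1) for i in range(len(rows[0])))
    lines = [header]
    for team, row in zip(teams, rows):
        # Bucket each probability into one of len(SHADES) levels; p == 1.0 maps to the darkest
        cells = [SHADES[min(int(p * len(SHADES)), len(SHADES) - 1)] for p in row]
        lines.append(team.ljust(width) + "  " + " ".join(cells))
    return lines
```

Feeding in the example row above gives a dark cell under position 1, a lighter one under position 2 and blanks for positions 3 and 4. If the teams are listed in order of their most likely position, a group with clear favorites shows up as a diagonal from top left to bottom right.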
So without further ado, here are the current projections for the Europa League.
It is interesting to see how patterns are starting to form. Blurry heat maps belong to groups with an uncertain outcome, while those with a backslash-shaped diagonal already seem to have clear favorites. Milan, for instance, are now huge favorites, while group F seems very open…
Thank you for reading. Give me a shout on Twitter if you liked it and/or have any questions!