Some thoughts on Machine Learning and Football Analytics

So, not many blog posts here recently. That is simply an effect of me having too much to do at my day job. The good news, however, is that I will be able to do some analytics at work in the coming weeks. My cooperation with ÖFK, now promoted to Allsvenskan, has grown, and I’ve managed to convince my employer to start looking at a bigger project. It will include experts in design and UI/UX to visualize and analyze data and stats. It has huge potential and I’m really excited. I hope there will be more to come from this on the blog in the coming months!

The reason for writing a post today was not to share that news, however, but a direct question on Twitter from Tom Worville: what can Machine Learning do? Well, the simple answer is: a lot. I have earlier described how I created my own XG model with the help of Machine Learning, as well as a model for projecting league positions. The latter will be interesting to start evaluating in December, when I’ll be able to start predicting this year’s top leagues and comparing my projections to existing ones like Michael Caley’s.

The XG model I made really didn’t take long to build and seems to perform just as well as Michael Caley’s (his old model at least). Now, I’m not writing this to state whose model is best or to diminish anyone else’s work. I would just like to illustrate that there are other ways to solve problems and analyze data. I can actually see some clear advantages in using ML for making predictions and estimations in Football Analytics in general. The ones I see most clearly are:

  • Using an ML model gives an unbiased result.
    A computer will never care about names or reputation, only hard data. And parameters can be added to a model without you having to figure out for yourself how much they matter. The model will do it for you.
  • The model constantly learns.
    This is one of the key benefits as I see it. For every shot taken or pass made, a model that is fed the new data will look at the result and improve from it.
  • The more data the better.
    In my own XG model I have 40,000 shots saved. My 24-core HP Z820 has no problem making XG estimations on the shots, one by one. And when I test my model now, compared to when I first evaluated it on the blog, it has improved.
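
To make the idea of a model that learns from every shot concrete, here is a minimal sketch of an online logistic regression that is nudged a little after each observed shot. It is not the model described on this blog: the class name, the single distance feature, the learning rate and the synthetic shots are all assumptions made up for illustration.

```python
import math
import random


class OnlineShotModel:
    """Hypothetical online logistic regression, updated one shot at a time."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        # Probability of a goal given the shot's features.
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, is_goal):
        # One stochastic gradient step on the log loss for a single shot.
        error = self.predict(features) - is_goal
        self.bias -= self.learning_rate * error
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]


random.seed(0)
model = OnlineShotModel(n_features=1)

# Synthetic shots (assumption): distance in tens of metres, and a made-up
# "true" conversion curve in which closer shots score more often.
for _ in range(5000):
    distance = random.uniform(0.0, 3.0)
    p_goal = 1.0 / (1.0 + math.exp(-(0.5 - 1.5 * distance)))
    is_goal = 1 if random.random() < p_goal else 0
    model.update([distance], is_goal)

close_shot = model.predict([0.5])  # 5 metres out
far_shot = model.predict([2.5])    # 25 metres out
print(close_shot, far_shot)
```

Nobody tells the model how much distance matters. The weight is learned from the outcomes, and it keeps moving as new shots arrive.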

To exemplify the advantage of using ML instead of a static statistical model, I have three examples that I have thought a lot about and where I see a Machine Learning approach as an interesting alternative.

The first (image linked from CARTILAGE FREE CAPTAIN) is the shot matrix:

Any approach to estimating goal conversion using zones implies that a shot taken from the front part of a zone is treated the same as one taken from the back of the zone. An ML model handles such distinctions automatically, and you never have to think about them or analyze the zone probability yourself. That is something a computer will do better. And for every new shot taken, the estimation will get better.
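
The zone problem can be shown with a small sketch. The conversion curve below is invented for illustration (a logistic function of distance with made-up coefficients), and the 6 metre zone depth is also an assumption. The point is only that a zone lookup returns one value for the whole zone while a continuous model separates the two shots.

```python
import math


def continuous_xg(distance_m):
    """Hypothetical smooth conversion curve (made-up coefficients)."""
    return 1.0 / (1.0 + math.exp(-(0.5 - 0.15 * distance_m)))


def zone_xg(distance_m, zone_depth_m=6.0):
    """Zone estimate: every shot in a zone gets the value at the zone midpoint."""
    zone_index = int(distance_m // zone_depth_m)
    midpoint = (zone_index + 0.5) * zone_depth_m
    return continuous_xg(midpoint)


# Two shots in the same 6-12 metre zone: one at the front, one at the back.
front, back = 6.5, 11.5

print("zone:      ", zone_xg(front), zone_xg(back))
print("continuous:", continuous_xg(front), continuous_xg(back))
```

The zone version gives both shots the same number, while the continuous version rates the shot from the front of the zone higher.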

The next two examples come from Michael Caley’s new XG model. One thing that really struck me there was the inclusion of a variable for which league the estimation is made in, “League Effects”. Now, there is nothing wrong with including such a parameter; it obviously matters. But what happens when the quality of the leagues changes? When Serie A once again becomes Europe’s best league? Will the term -0.07 * SerieA remain the same? Probably not, and that leads me to the third example, where ML will be a true advantage. When the league strengths or other parameters change, an ML model will adapt without any human involvement.

Michael Caley’s new formula seems really advanced to me:

(-3.19 - 0.095 * distance + 3.18 * inverse_distance + 1.88 * relative_angle + 0.24 * inverse_angle - 2.09 * inverse_dist*angle + 0.45 * throughball_assist + 0.64 * throughball_2nd_assist + 0.31 * assist_across_face - 0.15 * cutback_assist + 2.18 * inverse_assist_distance + 0.12 * assist_angle + 0.23 * fast_break + 0.18 * counterattack + 0.09 * established_possession - 0.18 * following_corner + 1.2 * big_chance + 1.1 * following_error + 0.39 * following_dribble + 0.14 * dribble_distance + 0.37 * rebound + 0.03 * game_state + 0.07 * Bundesliga - 0.1 * EPL - 0.09 * LaLiga - 0.07 * SerieA)

For every game played and every season that goes by, the formula probably has to be re-evaluated, parameter by parameter. To me that seems extremely time consuming, and maybe unnecessary. What would happen if you just fed all the parameters to a Machine Learning algorithm instead? I think the answer would be that analysts could spend less time making models work and more time on collecting new data, figuring out what the models actually mean and how to use them to win football matches.
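
For readers who want to see what evaluating such a formula looks like, here is a rough sketch. The coefficients are copied from the formula quoted above. Everything else is an assumption: the post does not say how the linear score is turned into a probability, so the sketch assumes it is the log-odds of a goal and passes it through a logistic function, and it does not know the units or encodings of the features, so the caller supplies them. The function name is hypothetical.

```python
import math

# Coefficients copied from the formula quoted in the post.
INTERCEPT = -3.19
COEFFICIENTS = {
    "distance": -0.095, "inverse_distance": 3.18, "relative_angle": 1.88,
    "inverse_angle": 0.24, "inverse_dist_x_angle": -2.09,
    "throughball_assist": 0.45, "throughball_2nd_assist": 0.64,
    "assist_across_face": 0.31, "cutback_assist": -0.15,
    "inverse_assist_distance": 2.18, "assist_angle": 0.12,
    "fast_break": 0.23, "counterattack": 0.18,
    "established_possession": 0.09, "following_corner": -0.18,
    "big_chance": 1.2, "following_error": 1.1, "following_dribble": 0.39,
    "dribble_distance": 0.14, "rebound": 0.37, "game_state": 0.03,
    "Bundesliga": 0.07, "EPL": -0.1, "LaLiga": -0.09, "SerieA": -0.07,
}


def xg_from_formula(features):
    """Evaluate the quoted formula for one shot.

    `features` maps feature names to values; missing features count as 0.
    ASSUMPTION: the linear score is a log-odds value, so it is converted to
    a probability with the logistic function.
    """
    unknown = set(features) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown features: {sorted(unknown)}")
    linear_score = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_score))


baseline = xg_from_formula({})                       # intercept only
with_big_chance = xg_from_formula({"big_chance": 1})
serie_a = xg_from_formula({"SerieA": 1})
print(baseline, with_big_chance, serie_a)
```

Every number in the dictionary is a fixed constant. If the strength of a league shifts, someone has to re-estimate and edit the league coefficients by hand, which is exactly the maintenance burden discussed above.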

What I have been missing in order to explore the area of ML and Football Analytics further is access to more and better data. I have been stuck with open and semi-open data like Football-Data, Everysport, Betfair and Twitter. But now, to return to the start of the blog post and the cooperation with ÖFK, that may very well change…

Published by

Ola Lidmark Eriksson

Football analyst/programmer
