To quote coach Lars Lagerbäck – “Goals change games”.
At first, this post was just going to be a brief follow-up to the preview I posted last week before ÖFK – Sirius. The problem was that when I made the XG map after the game, it just didn’t seem to make sense.
UPDATED. I made an error, so the parts of this blog post claiming that the first goal is of monumental importance are not right. It has some importance, but not as much as I state here. Check this blog post out for the correct numbers. The last parts here about time intervals are correct, however.
Anyone who visited the arena, including me, would probably have said that ÖFK was unlucky, and personally I would have guessed that the XG would have been at least 2-1 in favor of ÖFK. But when I made the plot, it looked like this.
Now, everyone who saw the game would probably argue that ÖFK should have won, having outshot the opponents 19 to 9. That made me start looking for errors in my XG model. Something just had to be wrong! One obvious thing to check was whether the impact of game state caused this. So I ran the XG model without game state, and then this happened:
The game state is the current difference in goals between the team taking the shot and the opposing team. GS -2, for instance, means that the team is 2 goals down.
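As a minimal sketch of that definition (the function name and arguments are my own illustration, not taken from the model itself), the game state for a shot could be computed like this:

```python
def game_state(goals_for: int, goals_against: int) -> int:
    """Goal difference seen from the team taking the shot.

    Hypothetical helper: positive means the shooting team is leading,
    negative means it is trailing, 0 means the game is level.
    """
    return goals_for - goals_against


# A team that is 0-2 down shoots at game state -2; at 1-1 the state is 0.
print(game_state(0, 2))  # -2
print(game_state(1, 1))  # 0
```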
Fixed! Or was it? I started thinking, and I gave it one night’s sleep. Now, if I try to be objective: in the first half Sirius came really close to making it 0-2. And the fact is that despite all the finishes ÖFK had, the score ended 1-1, no more, no less. So could it be that my model simply describes what actually happens and is unbiased? After all, it is based on machine learning, and as I have shown earlier, the model works better overall when game state is included.
To check how my model performs, I put my 24-core HP Z820 to work, making the XG approximation for all the events I have in my database (34,000), one at a time. It took 9 hours, but once it was done I was able to plot the XG diff (the difference between expected goals and goals actually scored) for every game state individually. I wanted to see how the XG is approximated at every game state; in an ideal world it should perform the same no matter what the game state is. Just out of curiosity, I added the ratio between shots and goals for the different game states to the plot. The latter really surprised me when I made the plot. This is what happened:
The difference between XG and goals works as intended. In fact, in the plot it more or less sticks to the y-axis. So there is really nothing wrong with the model in this sense.
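For anyone who wants to repeat this check on their own data, here is a rough sketch of the aggregation behind such a plot. It is not my actual code: the tuple layout, the function name and the sign of the XG diff (expected minus scored) are assumptions made for the example, and the shots are made up.

```python
from collections import defaultdict


def summarise_by_game_state(shots):
    """Aggregate shots per game state.

    shots: iterable of (game_state, xg, is_goal) tuples (hypothetical layout).
    Returns {game_state: {"xg_diff": ..., "shots_per_goal": ...}}.
    """
    totals = defaultdict(lambda: {"shots": 0, "goals": 0, "xg": 0.0})
    for state, xg, is_goal in shots:
        bucket = totals[state]
        bucket["shots"] += 1
        bucket["goals"] += int(is_goal)
        bucket["xg"] += xg

    summary = {}
    for state, b in sorted(totals.items()):
        summary[state] = {
            # Assumed sign convention: expected goals minus goals scored.
            "xg_diff": b["xg"] - b["goals"],
            # No goals at this game state means the ratio is undefined; use inf.
            "shots_per_goal": b["shots"] / b["goals"] if b["goals"] else float("inf"),
        }
    return summary


# Made-up shots, only to show the shape of the output.
example = [(0, 0.10, False), (0, 0.30, True), (-1, 0.05, False), (1, 0.40, True)]
for state, row in summarise_by_game_state(example).items():
    print(state, row)
```

A well-calibrated model should give an XG diff close to zero at every game state, whatever the shots-per-goal ratio does.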
What became much more interesting when I looked at this plot is the ratio between shots taken and goals! The impact of a goal is just huge. I had expected some kind of impact: a team being 2 goals down probably won’t score as often per shot as a team leading by two, for several reasons. But I really couldn’t imagine that the impact would be this monumental. To be clear – the events I have in my db are only from the top three Swedish divisions over the past two years. It may differ in other leagues, and I would really appreciate it if someone with access to data for other leagues could verify my findings. The only other league I’ve found this done for is the Championship.
What does the plot say? To put it simply: when a game is level, one in every 17.8 shots is converted to a goal. When a team is one goal up, they score on almost 41% of their shots.
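The two figures above are given in different units (shots per goal and conversion percentage), so a small sketch of the conversion between them may help when comparing. The helper names are made up for this example, and the inputs are simply the numbers quoted in this post (which, as the update note says, were later corrected).

```python
def shots_per_goal_to_rate(shots_per_goal: float) -> float:
    """Turn 'one goal every N shots' into a conversion rate between 0 and 1."""
    return 1.0 / shots_per_goal


def rate_to_shots_per_goal(rate: float) -> float:
    """Turn a conversion rate between 0 and 1 into 'shots needed per goal'."""
    return 1.0 / rate


# Level game: one goal every 17.8 shots.
print(f"{shots_per_goal_to_rate(17.8):.1%}")  # 5.6%
# One goal up: 41% of shots converted.
print(f"{rate_to_shots_per_goal(0.41):.1f}")  # 2.4
```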
With these new statistics at hand, I have decided to stick with having game state as a parameter in my XG model. The XG map for ÖFK – Sirius is somewhat of an extreme. One could argue about how the map would have looked if ÖFK had scored the first goal or if neither team had scored.
I think one of the most interesting things about sticking to this model is that players with a high ratio between scored goals and expected goals definitely have a larger impact on game outcomes than if game state is excluded from the model. In this way, players playing for weaker teams may still stand out, since the goal expectancy for teams that are often trailing is lower than for teams that often have the lead.
Another use of a model of this kind would obviously be informed live betting. I may look into that in the future. Another interesting thing to look at is whether there are teams or players who are better at scoring when one goal down.
Still, and I’d love input on this: is my model too heavily influenced by game state? I don’t think so right now. I just think it adds a new dimension to it. One way to level out the impact would perhaps be to add each team’s TSR to all events, telling the model which team is the stronger one. In that case, at least the example in this post, ÖFK – Sirius, would probably have looked a bit different, since stronger teams have a higher tendency to convert shots to goals. I may try that at some point to see what it gives. But for now I will stick to this model and just state that ÖFK – Sirius had an XG of 0.7 – 2.
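For reference, here is a small sketch of how such a feature could be computed, assuming TSR means total shots ratio in its usual form: a team’s shots divided by all shots in its games. The function name is my own, and TSR is normally calculated over many games, so the single-match numbers below (the 19 to 9 shot count from ÖFK – Sirius) are only an illustration.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """Share of all shots taken by the team (assumed TSR definition)."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


# Illustration only: the shot count from ÖFK - Sirius, 19 to 9.
print(round(total_shots_ratio(19, 9), 3))  # 0.679
```

A value above 0.5 means the team takes more shots than it allows, which is the kind of signal that could tell the model which side is the stronger one.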
This season ÖFK have had a tendency to concede the first goal quite often and after that still create lots of chances and turn games around. This is an anomaly, and that is why the XG maps for ÖFK sometimes may look strange. The last time that happened was actually yesterday. ÖFK conceded a goal on the first shot against them, and after that I definitely think that the finishing quality of ÖFK worsened. I may have been influenced, since I’ve been working on this post for some days, but I think that ÖFK struggled for a while after conceding a goal. They got an equalizer on a free kick, 0.11 in XG, and after that they had control of the game. The map for yesterday’s game looks like this, game state included:
If ÖFK hadn’t conceded an unnecessary goal, we probably would have had a much better match between expected goals and goals scored. So what this really says is that the goal ÖFK conceded was the most unexpected event of this game.
While I was investigating the impact of game states, I also looked at the time intervals of a game in which most goals are scored in relation to the total shots within that interval. Can we see a higher expectancy of goals at certain time intervals of games? Indeed we can! When I visualized this, it turned out to look like a happy face. At the beginning of games and in the dying minutes of a game, the probability of a shot being converted to a goal is over 20%, while it is under 15% in the middle of games.
I also looked at the total number of goals scored within time intervals. It kills the myth that most goals are scored after 80 minutes!
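Both time-interval views (conversion rate and total goals per interval) can be produced from the same simple binning. The sketch below is an illustration rather than my actual code: the 15-minute width, the clamping of stoppage time into the last interval and the (minute, is_goal) layout are all assumptions, and the shots are made up.

```python
def conversion_by_interval(shots, width=15):
    """Count shots and goals per time interval.

    shots: iterable of (minute, is_goal) tuples (hypothetical layout).
    Returns {interval_start: (shots, goals, conversion_rate)}.
    """
    counts = {}
    for minute, is_goal in shots:
        # Assumption: stoppage time (minute 90+) is folded into the last interval.
        start = min(minute, 89) // width * width
        n, g = counts.get(start, (0, 0))
        counts[start] = (n + 1, g + int(is_goal))
    return {s: (n, g, g / n) for s, (n, g) in sorted(counts.items())}


# Made-up shots, only to show the shape of the output.
example_shots = [(3, True), (10, False), (50, False), (88, True), (93, False)]
for start, (n, g, rate) in conversion_by_interval(example_shots).items():
    print(f"{start:2d}-{start + 15:2d} min: {n} shots, {g} goals, {rate:.0%}")
```

Plotting the third value per interval gives the conversion curve, and plotting the second gives the total goals per interval.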
Still, I guess that the biggest finding I’ve made is that a team trailing by one goal scores on 4.9% of the shots taken, while a team leading by one goal converts 41% of the shots taken to goals.
The importance of scoring the first goal is just monumental (in the Swedish leagues that is)!