Rise of Data Analytics in Football: Expected Goals, Statistics, and Damned Lies

An article three years in the making: Jack Coles follows up on one of the best articles of 2013 with a detailed report on how the relationship between data analytics and football has progressed.

Editor’s Note: This article will be published in 3 separate posts. The 2013 article can be found here, and the first post of this part can be found here.

Expected goals is another area within the broad remit of football data analysis that has earned attention over the last few years. ‘xG’, ‘ExG’ or ‘ExpG’, to give a few of its many aliases, is by no means football’s answer to baseball’s ‘on-base percentage’ (no matter what Allen Barra thinks), but it’s a football metric in which many people have invested a lot of faith over the past year or so, and it seems the most advanced at this stage. Put simply, Expected Goals is how many goals a team could expect to score given where its shots were taken from. More sophisticated models also take into account whether the other team has had a man sent off, or whether the shot came from a corner or was a header, and so on. Michael Caley, an xG modeller, describes it as ‘a method for estimating the quality of chances that a football team creates or concedes in a match’.
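The core mechanic can be shown in a minimal Python sketch. It assumes a simple logistic model built on shot distance, with adjustments for the header, corner and man-advantage factors mentioned above. The `Shot` class, the function names and every coefficient are hypothetical and were picked by hand for illustration. Real xG models are fitted to large sets of historical shots and use many more features. The sketch does show the basic idea: each shot gets a probability of being scored, and a team’s xG is the sum of those probabilities.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    distance_m: float               # distance from the centre of goal, in metres
    is_header: bool = False
    from_corner: bool = False
    opponent_down_a_man: bool = False


# Hypothetical, hand-picked coefficients for illustration only; a real model
# would fit these to thousands of historical shots (e.g. logistic regression).
INTERCEPT = 1.0
COEF_DISTANCE = -0.15
COEF_HEADER = -0.8
COEF_CORNER = -0.3
COEF_MAN_ADVANTAGE = 0.2


def shot_xg(shot: Shot) -> float:
    """Probability (0 to 1) that this shot is scored under the toy logistic model."""
    z = (INTERCEPT
         + COEF_DISTANCE * shot.distance_m
         + COEF_HEADER * shot.is_header          # bools count as 0 or 1
         + COEF_CORNER * shot.from_corner
         + COEF_MAN_ADVANTAGE * shot.opponent_down_a_man)
    return 1.0 / (1.0 + math.exp(-z))


def team_xg(shots: list[Shot]) -> float:
    """A team's expected goals: the sum of its shots' scoring probabilities."""
    return sum(shot_xg(s) for s in shots)


# Usage: a close-range shot, a long-range effort and a header from a corner.
shots = [Shot(6.0), Shot(20.0), Shot(11.0, is_header=True, from_corner=True)]
print(f"Team xG: {team_xg(shots):.2f}")  # prints "Team xG: 0.79"
```

The logistic form keeps every shot’s value between 0 and 1. This lets a team’s total be read as a number of goals it ‘should’ have scored.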

Expected Goals has many fans and many passionate detractors, but, rightly, nobody is willing to claim it’s perfect. At the 2016 MIT Sloan Analytics Conference, Devin Pleuler, Manager of Analytics at Toronto FC, explained: ‘Expected Goals has become pretty standard. Anybody who is doing analytics at a club is using it in some capacity. I don’t actually love it as a metric, it’s got plenty of flaws… It’s a great framework… We use it, of course’. On the same panel, Blake Wooster, CEO of 21st Club Limited, a consultancy that football clubs use to help them implement data analytics, said: ‘Most Expected Goals models out there are more predictive of future success than goals and points … [Expected Goals] will predict where you’re going to finish more than the goals that you score. It’s quite a compelling message’.

It’s so popular that even Tim Sherwood saw fit to take a swipe at it with his friends in the broadsheets. In The Telegraph in 2016, he said: ‘There are far too few of the scouts. All that data analysis can be used for something but it can’t be used to pick your players. Some of the data is not about goals, or assists, it’s about “expected goals” when a player got himself in position to score, but didn’t. What a load of nonsense.’

Blake Wooster points out the irony of the sentiment: ‘Here’s a guy in his post-match interviews who’ll talk about, “We created the best chances in the game, but I just didn’t think we got the result that we deserved.” … That’s like the definition of expected goals. Obviously the ironic thing is, when he was at Aston Villa, bottom of the table, they were actually incredibly unlucky. They were performing better… the league table was lying. So he got fired at a time when expected goals was telling him actually, Tim, you’re doing okay’. Even after recoiling in initial shock that Tactics Tim hadn’t taken this scoop to his friend David Hytner at The Guardian, it’s easy to see that if Tim Sherwood, and Aston Villa, had taken Expected Goals more seriously, he might not have been fired at all.

As Sherwood alludes to, it’s fair to say that the take-up of data analysis in British football has mirrored its place in British society, with the words ‘Big Data’ relegated to the same scrap pile as ‘political correctness’. The prevailing attitude is one of suspicion. This climate is fostered by apparently popular journalists (isn’t it always?) such as Neil Ashton, who wrote in the Daily Mail in October 2015, for example, that: ‘[Michael] Edwards [Technical Director at Liverpool FC] can tap away at a laptop and within seconds tell you how many assists the 24-year-old Turkish left back Eren Albayrak has made… The increasing influence of analysts, young men who have no experience of scouting or recruiting players, has meant the end of the road for good football men. Instead a new breed sits in air-conditioned offices, cutting up videos from matches all over the world and burying their heads in the stats’. Seemingly, and unfortunately, quite a lot of people read Neil Ashton, and his words contribute to a tone surrounding data analysis which isn’t necessarily positive. Counting how many assists the 24-year-old Turkish left back Eren Albayrak has made would not be considered data analysis, I hasten to add.

When The Guardian suggested that statistics were nearly useless in a political election campaign, Tim Harford (in his excellent article ‘How politicians poisoned statistics’) remarked that it ‘was a dismaying read for anyone still wedded to the idea — apparently a quaint one — that gathering statistical information might help us understand and improve our world. But the … cynicism can hardly be a surprise. It is a natural response to the rise of “statistical bullshit” — the casual slinging around of numbers not because they are true, or false, but to sell a message.’ It’s this statistical bullshit, data used to sell a message, that people like Ashton and other data sceptics have a problem with. This would not be considered data analysis either, though.

In this spirit, statistics totally stripped of context regularly appear on football television programmes in the UK. Passing accuracy seems to be a repeat offender, but redundant graphics and charts appear all over regular football coverage. Heat maps, touches and average positions are all regularly paraded by pundits, but generally only to sell a message. There’s some promise, in that the rise of data analysis in football is at least being acknowledged, but sports broadcasters are totally missing the point about what data is useful for and how it should be applied.

Perhaps the grimmest assessment of the presence of data analysis in the English Premier League comes from football journalist Gabriele Marcotti, who revealed in 2016 that he’d ‘hooked up with two people I know who are extremely well connected and we sort of did like a minor survey of Premier League clubs’. The feedback from this minor survey was that only two managers in the entire division actually listened to their data analytics departments. ‘There’s almost an education gap’, Marcotti explained.

Indeed, Harry Kane told The Telegraph in 2015: ‘Defoe told me once, “if I miss a chance then the odds are now in my favour to score the next one because the chances of missing two in a row are less than missing one”. And that’s what I try and take. If I miss, then the next one is more in my favour to score. It’s about little things like that.’ This is the gambler’s fallacy: missing two in a row is indeed less likely than missing once, but a miss that has already happened does nothing to improve the odds of scoring the next, independent chance. Falling for it suggests that data analytics needs to be presented to people working in football in a more digestible format if it is to make real strides in the industry. The soccer analytics panel at the 2016 MIT Sloan Analytics Conference was unanimous on this point too.
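Kane’s reasoning can be checked with a few lines of Python. This sketch makes two simplifying assumptions: the two chances are independent, and they share the same conversion rate. The 20% rate used here is hypothetical. The code lists all four possible outcomes of two chances and shows two things. Missing twice in a row is less likely than missing once. However, the probability of scoring the second chance after missing the first is exactly the original conversion rate.

```python
from fractions import Fraction
from itertools import product


def two_shot_outcomes(p_score: Fraction) -> dict[tuple[bool, bool], Fraction]:
    """Probability of each (first scored?, second scored?) pair, assuming
    two independent chances with the same conversion rate."""
    p = {True: p_score, False: 1 - p_score}
    return {(a, b): p[a] * p[b] for a, b in product([True, False], repeat=2)}


def analyse(p_score: Fraction) -> dict[str, Fraction]:
    outcomes = two_shot_outcomes(p_score)
    p_miss_first = outcomes[(False, True)] + outcomes[(False, False)]
    return {
        "P(miss one)": 1 - p_score,
        "P(miss two in a row)": outcomes[(False, False)],
        # Conditional probability: P(miss, then score) / P(miss first)
        "P(score next | just missed)": outcomes[(False, True)] / p_miss_first,
    }


rate = Fraction(1, 5)  # hypothetical 20% conversion rate
for label, value in analyse(rate).items():
    print(f"{label}: {value}")
```

Running it prints 4/5, 16/25 and 1/5. Defoe is right that two misses in a row are rarer than one, but the earlier miss leaves the next chance’s odds exactly where they were.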

Conversely, some players are already very receptive. At the football panel of the 2015 MIT Sloan Analytics Conference, Michael Niemeyer, Head of Match Analysis at FC Bayern München, told a story about Arjen Robben personally asking him for his match output information at 3 o’clock in the morning.

Written by Jack Coles

Jack Coles likes watching any game of football, but has a particular fondness for Serie A and Pep Guardiola. He is also interested in the data analysis side of the sport, and how data is used in the game to study opponents, buy players and review performance. An aspiring coach, Jack also works in youth football.