“6 penalties the right way and I saved three, so basically the homework was very good”
Petr Čech, 19 May 2012.
Data analytics won the Champions League final in 2012. The above quote was taken from an interview just minutes after Chelsea beat Bayern Munich on penalties in the last game of the tournament that year. Petr Čech, Chelsea’s goalkeeper on the night, faced one penalty in extra time and five in the penalty shoot-out and dived the correct way for each one, an incredible feat. That is, of course, unless Čech knew where the penalties were going to go, which he did. As he confirms in the quote, Chelsea had done their homework. The data analysis made available to the Czech goalkeeper was vast. Čech had a two-hour DVD of every penalty Bayern Munich had taken since 2007, which was more than enough to calculate the statistical likelihood of where each opposing player would place his penalty. All this information was presented to the goalkeeper by Chelsea’s data department. When Chelsea won, Čech felt obliged to mention he had help.
This article has featured in A Football Report’s Best of Football Writing 2013 list
Penalties are a good place to start when tracing the emergence of data analysis within football over recent years. Scientists, economists and mathematicians love penalty shoot-outs because they are so directly applicable to game theory, and this has attracted much attention from number-crunchers outside the sport. People like Ignacio Palacios-Huerta, a Spanish economist, know more about penalty shoot-outs than anyone else in the world. Many national teams and club sides are starting to use data on penalties as a result. Palacios-Huerta was even asked to help prepare a dossier for the Dutch national team just days before the World Cup final in 2010 to help beat Spain. Being Basque, he jumped at the chance, and although the information wasn’t needed, these sorts of advantages, as Bayern Munich will tell you, make all the difference in football tournaments.
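To make the idea of a per-taker "statistical likelihood" concrete, here is a minimal sketch in Python. It is not Chelsea's or Palacios-Huerta's actual method: the records, the taker names (taker_a, taker_b) and the function names are all hypothetical, invented purely for illustration. It simply counts how often each taker has gone each way and suggests the most frequent direction.

```python
from collections import Counter

# Hypothetical records, invented for illustration only:
# (taker, direction of the kick as seen by the goalkeeper).
past_penalties = [
    ("taker_a", "left"), ("taker_a", "left"),
    ("taker_a", "right"), ("taker_a", "left"),
    ("taker_b", "centre"), ("taker_b", "right"), ("taker_b", "right"),
]


def direction_likelihoods(records, taker):
    """Return the observed share of kicks in each direction for one taker."""
    counts = Counter(direction for name, direction in records if name == taker)
    total = sum(counts.values())
    if total == 0:
        return {}  # no history for this taker
    return {direction: n / total for direction, n in counts.items()}


def best_dive(records, taker):
    """Suggest the direction the taker has chosen most often, or None."""
    probs = direction_likelihoods(records, taker)
    return max(probs, key=probs.get) if probs else None


if __name__ == "__main__":
    print(direction_likelihoods(past_penalties, "taker_a"))
    print(best_dive(past_penalties, "taker_b"))
```

A raw frequency count like this ignores the game-theory side of the problem: a taker who knows he is being studied can change his habits, which is exactly why economists find shoot-outs so interesting.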
Penalties included, there has been a phenomenal rise in data analytics in football over the last ten years. Some of it has to do, inevitably, with baseball. You cannot have an article about sports analytics without mentioning ‘Moneyball’, which seems to have accelerated the use of data within sport so effectively. For those who do not know, ‘Moneyball’ is the story of how the General Manager of the Oakland Athletics baseball team, Billy Beane, turned a very poor and unfashionable team into something greater than itself. By analysing statistics and player metrics using methods put forward by Bill James, among others, Beane and his staff found baseball players and ways of playing that were undervalued. They used this data in-game and when recruiting players, and went on one of the longest winning streaks in the history of baseball as a result. Effective data engineering might just have revolutionised the sport. Football has inevitably seen the value of this too and, slowly but surely, many people within football who were inspired by ‘Moneyball’ are looking at data to give them the edge that Chelsea found on 19 May 2012. Although it may not have the profound effect it had on baseball, it will certainly give football teams a competitive advantage. Such data analysis could also aid the large football betting industry that works behind the scenes. Although we cannot speak to the legalities involved, sports betting, and football betting in particular, has a wide reach and occupies millions of people, football fans or otherwise, all over the world. Data analytics can potentially be applied to work out the odds for any given match, provided there is enough data available. Could this make way for a lot more match-fixing? Maybe, maybe not. However, let’s choose to focus on the fact that teams can vastly improve their play, which is cause enough for rejoicing.
The path, however, has not always been clear. When John W. Henry purchased Liverpool Football Club in 2010, the environment was perfect for football to embrace data analysis wholesale and for the same ‘Moneyball’ processes to take place in the Premier League. Mr Henry had used the ‘Moneyball’ approach Beane espoused with the other sports team he owned, baseball’s Boston Red Sox, and in 2004 it saw them win their first World Series in 86 years. However, Liverpool failed miserably to translate this to football during Henry’s first few years, and it discouraged many other clubs from attempting the same method on the same scale. Many of us will remember Liverpool’s purchase of Andy Carroll for £35 million in January 2011. It was clearly an alarming transfer, but one driven entirely by the data set of Liverpool and their then Director of Football, Damien Comolli. According to Comolli (a friend of Beane’s and a fan of the Oakland A’s), Andy Carroll was supposed to combine with Jordan Henderson and Stewart Downing’s superior passing abilities in the final third to produce an avalanche of goals that would presumably propel Liverpool up the table and hopefully win them the league. Reinforcing this was the fact that Henderson and Downing were found at the top of many statistical lists as among the best passers in the league for the 2010-2011 season, and Comolli’s desire to bring them to the club reflected that. For the project to really bear fruit, he wanted them to play in the same team as Andy Carroll, who was the best converter of crosses in the league at the time. However, Comolli made a huge mistake. His problem was not that he consulted data, but that he consulted the wrong data altogether. He recognised that data can give you an edge, and he recognised the value of goals, but he did not recognise that crosses, statistically, were a terrible way to score those goals. Carroll, Downing and Henderson together were fantastically inefficient.
Comolli and Andy Carroll both lost their jobs at Anfield.
This exposed the problems with applying statistics to outfield action. Football was always going to have trouble utilising analytics, but it does appear to have recovered from the Comolli-induced wobble. Clubs are still largely run by individuals who believe they can rely on gut instinct rather than any form of data analysis, but the strength of good data is starting to erode this primitive view, as it did in baseball. Data is hugely prominent in the sport today, far more so than it was before ‘Moneyball’ was written, whether this is a coincidence or not. The only answer to Comolliesque bad data is more data, good data and better data, not no data at all.
And better data has followed. Stoke City, under Pulis, analysed throw-ins, Manchester City under Mancini analysed corners, and people like Palacios-Huerta are analysing free-kicks as well. We have more data for the optimum time substitutes should be introduced, emerging data on player conditioning, and Comolli can be thankful that many in-game scenarios are being studied too. And better data tells us extraordinary things. Corners for example, like crosses, are terrifically bad at producing goals.
As Sally and Anderson reveal in their book ‘The Numbers Game: Why Everything You Know About Football is Wrong’, ‘when we combine the odds of corners generating a shot on goal plus the odds that these shots will find the back of the net, our data show that the average corner is worth about 0.022 goals, or – more simply – that the average Premier League team scores a goal from a corner once every ten games’. This doesn’t really seem consistent with the masses of fans who celebrate a corner as if they’ve scored a last-minute winner. ‘89 per cent of shots on goal produced from corners are wasted’, Sally and Anderson add. And this data can be put to use. Real Salt Lake, an American MLS team (where data in sports is taken more seriously), are one team seemingly making use of short corners to exploit this, from the logical viewpoint that surrendering the ball for 0.022 of a goal is a bad idea. Short corners also give a numerical advantage in the penalty area, as Devin Pleuler, a data analyst with MLS, puts forward: ‘RSL, as well as a handful of other MLS clubs (and even the US national team on occasion), have commonly been setting up for corner kicks with two attacking players near the corner – regardless if they eventually attempt a short corner or not. Usually, the defending team is forced to send two defenders in response. When this happens, a very subtle numerical shift occurs in favour of the attacking team’. The resulting set-pieces may look strange, but the application of this information gives Real Salt Lake a competitive advantage.
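The arithmetic behind those quoted figures can be checked with a few lines of Python. This is a rough sketch that uses only the three numbers quoted above (0.022 goals per corner, 89 per cent of shots wasted, one goal every ten games). The shot rate and the corners-per-game figure it prints are implied by those numbers under the assumption that they multiply together cleanly; they are not values quoted from the book here.

```python
# Figures taken from the quotes above.
GOALS_PER_CORNER = 0.022         # value of the average corner, in goals
SHARE_OF_SHOTS_WASTED = 0.89     # shots from corners that do not score
GAMES_PER_CORNER_GOAL = 10       # "a goal from a corner once every ten games"


def conversion_rate(share_wasted):
    """Share of shots from corners that end up as goals."""
    return 1 - share_wasted


def implied_shot_rate(goals_per_corner, share_wasted):
    """Implied share of corners that produce a shot (an inference, not a quote)."""
    return goals_per_corner / conversion_rate(share_wasted)


def corners_per_goal(goals_per_corner):
    """Average number of corners needed to score one goal."""
    return 1 / goals_per_corner


def implied_corners_per_game(goals_per_corner, games_per_goal):
    """Implied corners a team takes per game (an inference, not a quote)."""
    return corners_per_goal(goals_per_corner) / games_per_goal


if __name__ == "__main__":
    shot_rate = implied_shot_rate(GOALS_PER_CORNER, SHARE_OF_SHOTS_WASTED)
    per_goal = corners_per_goal(GOALS_PER_CORNER)
    per_game = implied_corners_per_game(GOALS_PER_CORNER, GAMES_PER_CORNER_GOAL)
    print(f"Implied share of corners producing a shot: {shot_rate:.2f}")
    print(f"Corners needed per goal: {per_goal:.1f}")
    print(f"Implied corners per team per game: {per_game:.1f}")
```

In other words, roughly one corner in five produces a shot, roughly one shot in nine goes in, and a team needs about 45 corners to score once, which is why giving the ball away for 0.022 of a goal looks like a poor trade.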
Roberto Mancini, the former Manchester City manager, owes his only Premier League title, at least in part, to the use of data from corners too. Simon Kuper, one of the founders of the Soccernomics Consultancy Agency for football teams and co-author of ‘Why England Lose: And other curious phenomena explained’, details: ‘analysts finally persuaded the club’s then manager, Roberto Mancini, that the most dangerous corner kick is the inswinger, the ball that swings towards goal. Mancini had long argued (strictly from intuition) that outswingers were best. Eventually he capitulated and, in the 2011-2012 season, when City won the English title, they scored 15 goals from corners, the most in the Premier League. The decisive goal, Vincent Kompany’s header against Manchester United, came from an inswinging corner.’ This shows how good data, when practically applied, can give you an edge. Few championships have been as close as the 2011-2012 Premier League season to which Kuper refers, so those 15 goals were huge. Football teams are using this kind of information more and more.
Substitutes have also been covered in some detail by ex-player (and frequently benched) Bret Myers. Dr Myers, who is now an assistant professor at the Villanova School of Business in Pennsylvania, found the optimum times to introduce substitutes when a team is losing the match. Jared Diamond of the Wall Street Journal, who wrote a short piece on Myers, reported that he ‘concluded that if their team is behind, managers should make the first substitution prior to the 58th minute, the second substitution prior to the 73rd minute and the third prior to the 79th minute. Teams that follow these guidelines improve – score at least one goal – roughly 36% of the time. Teams that don’t follow the rule improve about 18.5% of the time. He noted 1,037 instances the rule could have been applied and found that managers abide by it a little less than half the time’.
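The rule as quoted is simple enough to write down as a check. The Python sketch below is an illustration, not Myers’ own model: the function names are hypothetical, it assumes the team is behind for the whole period in question, and it reads ‘prior to the 58th minute’ as a minute number strictly below 58.

```python
# Deadlines from the quote: first, second and third substitution.
SUB_DEADLINES = (58, 73, 79)

# Improvement rates from the quote, in per cent.
IMPROVE_IF_FOLLOWED = 36.0
IMPROVE_IF_IGNORED = 18.5


def follows_rule(sub_minutes):
    """Return True if a trailing team's three substitutions each beat the deadline.

    sub_minutes: minutes at which the substitutions were made, in any order.
    Fewer than three substitutions counts as not following the rule
    (an assumption of this sketch).
    """
    subs = sorted(sub_minutes)
    if len(subs) < len(SUB_DEADLINES):
        return False
    return all(minute < deadline for minute, deadline in zip(subs, SUB_DEADLINES))


def relative_improvement():
    """How many times more often rule-followers improve, using the quoted rates."""
    return IMPROVE_IF_FOLLOWED / IMPROVE_IF_IGNORED


if __name__ == "__main__":
    print(follows_rule([55, 70, 78]))   # all three changes made early enough
    print(follows_rule([60, 70, 78]))   # first change came too late
    print(f"{relative_improvement():.2f}x")
```

On the quoted figures, teams that follow the rule improve their position nearly twice as often as those that do not.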
It is often said that football is too fluid, too fast and too random for open play to be assessed. Whilst this is true on the surface, hence the better data sets for the set-pieces above, this view will eventually deteriorate, like the belief that statistics aren’t useful at all. Undeniably, data will show us more. Just as science has slowly chipped away at the supernatural, good data will slowly demolish the misconceptions about the world of football as well. The best advances seem to be coming from Germany, which can add being at the forefront of football analytics to the list of ways in which it is better than everybody else at the moment. Jürgen Klinsmann, who is also friends with Billy Beane (Beane has a lot of friends; finding a cheap way of beating almost everybody you play in sport will do that for you), consulted a data department in Cologne for information about penalties, free-kicks and open-play situations at the 2006 World Cup and throughout his tenure as Germany manager, a tradition which Joachim Löw has continued. As well as the information that the optimum distance between defenders in a back four is roughly 8 metres, and the statistically best way to dispossess Lionel Messi (a defender straight on him, and another stationed 1 yard behind him, apparently), everybody will remember the famous list produced by the Cologne analytics team for the penalty shoot-out between Germany and Argentina in the quarter-finals of Klinsmann’s one and only World Cup as Germany manager. Presumably the Cologne-based staff were hoping that, like Čech, the Germany goalkeeper Jens Lehmann would simply remember where the Argentinian takers liked to place their kicks. He couldn’t, however, and kept the list in his sock.
Up to now, football analytics hasn’t become widespread in football, for several reasons. Firstly, as previously stated, managers. They don’t like their status challenged. If they can’t rely on their observations and their instincts, then what use are they in the first place? This instinct, usually honed as a player, would start to come into question. Essentially, the game is run by people who don’t like data. The incident at Liverpool only made this worse. People like Klinsmann and Löw, and the Premiership’s new golden boys David Moyes and Roberto Martínez, who definitely use data when running their football clubs, therefore stand out for having the foresight and humility to bow to superior wisdom. Secondly, data is expensive. Institutions like Opta and ProZone have no trouble giving out some gems on Twitter, but their pools of mass data aren’t something they would be willing to part with so readily. Football has only just started to keep data (FIFA didn’t count assists until 1994), so what little there is is hoarded by a select few, driving the price up even more. This keeps it out of the hands of people who could easily make it show us something of value. Manchester City, however, who have a Sports Analytics department, did distribute an entire season’s worth of collected data for free, in the hope of inspiring young bloggers with a passion, or economics students without the right numbers, to take the information and find hidden value. This unfortunately hasn’t happened. The fact remains that if somebody in football finds a significant competitive edge through data, it will be a very valuable commodity indeed.
Football analytics is certainly growing, however, and as more clubs and nations use it, and as long as people keep looking at numbers, murky information will start to crystallise into something of value. The latest addition to this expanding field is the aforementioned ‘The Numbers Game: Why Everything You Know About Football is Wrong’, and it already represents a significant step in the direction of better data. The authors posit some very interesting ideas throughout the book, and tell us why Chelsea should have bought Darren Bent, why a clean sheet is more valuable than a goal, and why replacing your weakest player is so much better than getting another 30-goal-a-season man up front. This information, it seems, is finding more value in football, and new breakthroughs are emerging all the time. Hopefully, as Football Manager 2022 (FM2022) steps up its use of data analytics, this information will be put to use in a big way. It might also help those hunting for Football Manager 2022 wonderkids, as well as ardent fans, to make their way through the game.
No doubt about it, though: the last word should go to Bill James, the Godfather of baseball analytics, who wrote about statistical analysis whilst working as a night-watchman in a pork and beans factory, and the man who initiated the ‘Moneyball’ movement. Cited at the beginning of Sally and Anderson’s book, James’ statement is proving ever more interesting to people within football, year after year.
‘In sports, what is true is more powerful than what you believe, because what is true will give you an edge’.
Bill James