An insight into Data Analysis in Football: Interview with Dustin Bottger


Global Soccer Network's main goal is to revolutionize modern-day scouting in soccer. Influenced by "Moneyball", Michael Lewis' international bestseller, they considered how scouting in professional football could become more effective and created their own rating system, the GSN Index. We thank Dustin Bottger, CEO of Global Soccer Network, for his time and valuable insight.

Football Data Analysis


From a data analytics perspective, what difficulties do you face when gathering data?

The biggest problem in collecting data is achieving consistent quality across all relevant competitions. A system like ours only makes sense when we have the same amount of data for every player in our database. Since we have tie-ups with several data companies and football analysts, we have no problem gathering our data.

Give us more detail as to what the GSN Index is and how it works. What factors does it consider and what makes it unique?

We want to create the most comprehensive player evaluation system in the world of football. So we have developed the GSN index, which is based on four different pillars. Here is a short description of our 4 pillars:

  • Soccer Related Characteristics (SRC): The basis for the GSN Index is the evaluation and rating of more than 70 characteristics that are essential for players. Every player is evaluated by several scouts independently in order to achieve the greatest possible objectivity. The assessment covers technical, tactical, mental and physical characteristics, which are summarized in a rating between 0 and 100. The top score of 100 will, however, not be reached by any player. The total rating is calculated with a sophisticated system that weights characteristics by position. A central defender requires different capabilities than a forward, something our system takes into consideration automatically. All player assessments in our database are regularly updated through our worldwide scouting network.
  • Potential/Capability of development: One of the most important questions when buying a player is how he will develop in the future. Even today, huge sums are being paid for young players and young talents. It is, therefore, important to recognize a young player's future potential at a very early stage in order to sign him on economical terms. GSN has developed a system based on modified economic and financial algorithms. It takes into consideration various factors that influence a player's development, such as the quality of coaches and football education, learning ability and age, to name just a few. The starting value is 0.00 (= no further development in the future). The scale is open at the top: the higher the value, the higher the potential.
  • The +/- statistic: The +/- statistic is the third of the 4 pillars in the GSN Index. Based on their performance data, players receive plus scores for positive actions during the game (goals, assists, penalty saves, completed passes etc.). For negative actions, players receive minus scores (own goals, red cards, incomplete passes etc.). A value of 100 represents a balanced +/- statistic. If the value is higher than 100, the player has had more positive than negative actions. If, however, the value is lower than 100, the player has had more negative than positive actions. Of course, the statistic takes position-related differences into consideration. Offensive actions of a defender are rated differently from those of a forward.
  • Level of play: The level of play is the last but not least of the 4 pillars in the GSN Index. With our system it is possible to rate and analyze every match a player has played in his entire career. Every match is rated on a scale between 1 and 20. A World Cup final or UEFA Champions League final, for example, is a 20, while values close to 1 are mostly youth leagues or non-professional leagues. The system also takes into account other factors like the age of a player or his minutes on the pitch. For example, an 18-year-old who plays 90 minutes in a German Bundesliga match gets a higher rating than a 30-year-old with the same minutes on the pitch in the Bundesliga. The higher the level of play, the more positive the effect on the GSN Index.
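
As a rough illustration only, the first and third pillars could be sketched as follows. GSN's actual characteristics, weights and formulas are not public, so every name and number below (POSITION_WEIGHTS, EVENT_SCORES, src_rating, plus_minus_index) is a hypothetical assumption, and the +/- statistic is simplified to "100 plus the net event score".

```python
from typing import Dict, List

# Hypothetical position weights for three of the 70+ characteristics.
# The real GSN weighting is not published.
POSITION_WEIGHTS: Dict[str, Dict[str, float]] = {
    "central_defender": {"tackling": 3.0, "heading": 2.0, "finishing": 0.5},
    "forward": {"tackling": 0.5, "heading": 1.5, "finishing": 3.0},
}

# Hypothetical plus and minus scores per match event.
EVENT_SCORES: Dict[str, float] = {
    "goal": 5.0,
    "assist": 3.0,
    "complete_pass": 0.1,
    "own_goal": -5.0,
    "red_card": -4.0,
    "incomplete_pass": -0.1,
}


def src_rating(attributes: Dict[str, float], position: str) -> float:
    """Position-weighted mean of attribute ratings given on a 0-100 scale."""
    weights = POSITION_WEIGHTS[position]
    total_weight = sum(weights[name] for name in attributes)
    weighted_sum = sum(weights[name] * value for name, value in attributes.items())
    return weighted_sum / total_weight


def plus_minus_index(events: List[str]) -> float:
    """Assumed form of the +/- statistic: 100 is balanced, above 100 is net positive."""
    return 100.0 + sum(EVENT_SCORES[event] for event in events)


# The same scouted attributes rate differently depending on position.
attributes = {"tackling": 80.0, "heading": 70.0, "finishing": 40.0}
as_defender = src_rating(attributes, "central_defender")
as_forward = src_rating(attributes, "forward")
```

With these made-up weights, a player who tackles well but finishes poorly scores higher as a central defender than as a forward, which is the position-dependent behaviour described in the list above.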

We also use the Shapley value, which shows how much influence a player has during a match. The values produced by the 4 pillars together create the GSN Index.
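
For readers unfamiliar with it, the Shapley value comes from cooperative game theory: a player's value is his average marginal contribution over every order in which a group could be assembled. The sketch below computes it exactly for a toy three-player example. How GSN defines the value of a group of players is not stated, so lineup_value and the players "A", "B" and "C" are purely hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values: mean marginal contribution over all player orderings."""
    totals = {player: 0.0 for player in players}
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        for player in order:
            with_player = coalition | {player}
            # Marginal contribution of this player to the group built so far.
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    n_orders = factorial(len(players))
    return {player: total / n_orders for player, total in totals.items()}


def lineup_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical group value: individual output plus a bonus when A and C play together."""
    base = {"A": 1.0, "B": 0.5, "C": 0.0}
    total = sum(base[player] for player in coalition)
    if "A" in coalition and "C" in coalition:
        total += 0.6
    return total


values = shapley_values(["A", "B", "C"], lineup_value)
```

In this toy game the 0.6 bonus is split evenly between A and C, so C receives credit (about 0.3) even though he contributes nothing on his own, while A rises to about 1.3 and B stays at 0.5. The exact method enumerates every ordering, so it is only practical for small groups.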

As mentioned above, it is the most comprehensive player scouting database in the world of football. Our system makes it possible to compare players from different leagues, countries, competitions and age groups. It also makes it possible to identify over- and undervalued players, which is essential for decision makers inside the clubs to make the right transfers. We are able to rate over 340,000 players worldwide. We were also able to create a price-performance ratio, which shows exactly whether a player is worth his money. It is a real Moneyball approach for football.

While stats and numbers can be easily translated to performance for players such as forwards, are there any strides being made to help statistically judge defensive minded players better?

Everything that happens on the field is recorded and saved nowadays, including defensive actions like tackles, intercepted passes and aerial duels, to name just a few. It is also possible to see which player is responsible for goals against and similar big errors. We convert these actions into a goal-against value. Together with the data for completed passes, fouls, goals, assists and all the other offensive indicators, we are able to build a precise picture of how good defensive-minded players are.

Could you go a bit in depth about the parameters you record and evaluate while tracking a player and why specifically those parameters over any other obvious ones? Are there any parameters that you track that stand out?

We collect almost everything available for each player using every available source (Internet, newspapers, magazines, TV etc.): match reports, injury reports, statistics and our own scouting reports. It is a huge effort to get all these things down to one number (the GSN Index).

Match data is the most important for us. Our algorithm uses every event during a game to calculate the +/- statistic. In particular there are key performance indicators (goals and assists to name the most obvious ones) which carry a higher value than other indicators. The rest is a company secret.

While everyone is catching up on data analytics in football, the forerunners of this have made it clear that there is a sound distinction between stats and metrics. Could you give us some insight into that?

A metric is a quantity that tells us useful information about something that is changing. In our case, that is the GSN Index. A metric (usually) has a definition and a method of calculation, and we try to define and calculate it in a way that is unambiguous. It follows that metrics are values that can be used to observe a trend or to make a comparison, because we know they were calculated in the same way.

Statistics are just raw data items which are directly measured, and that will be used to calculate a metric.
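
A minimal sketch of that distinction, using pass completion as an example: the per-match counts are the raw statistics, and the percentage derived from them is the metric. The names MatchStats and pass_completion_pct are illustrative assumptions and not part of any GSN system.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MatchStats:
    """Raw, directly measured statistics for one match."""
    passes_attempted: int
    passes_completed: int


def pass_completion_pct(matches: Iterable[MatchStats]) -> float:
    """Metric: completed passes as a percentage of attempts, defined identically for every player."""
    attempted = 0
    completed = 0
    for match in matches:
        attempted += match.passes_attempted
        completed += match.passes_completed
    if attempted == 0:
        return 0.0  # Convention chosen here for players with no attempts.
    return 100.0 * completed / attempted


season = [MatchStats(40, 30), MatchStats(60, 54)]
completion = pass_completion_pct(season)
```

Because the metric has one fixed definition, it can be compared between players or tracked over a season, which raw counts alone do not allow.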

In your experience/knowledge, which clubs across the globe are best making use of data analysis?

In our opinion, FC Midtjylland (Denmark) and also Brentford FC (England). Matthew Benham and Rasmus Ankersen are the pioneers in data analysis and have completely revolutionized their scouting departments.

AZ Alkmaar (Netherlands) is being advised by Billy Beane, which is also a step in the right direction. I would also like to mention some clubs from Major League Soccer, like New England Revolution, Sporting Kansas City or Toronto FC. Their analysts have often worked in data-dominated sports like baseball, American football and ice hockey, and they introduce new views and intellectual approaches. Obviously, smaller clubs are trying to use data analysis to their advantage.

Is there any particular player out there who doesn’t quite get the recognition he deserves, but should be highly regarded as per the data analysis model used by you?

There are some players who are extremely underrated and others who are overrated. Pione Sisto from FC Midtjylland is a real "data monster". Based on our GSN Index he has already reached world-class level, but nobody outside of Denmark has really noticed that. There are many more such players we could mention.

We understand there’s been an inspiration from Moneyball to get this initiative going, but how difficult is it to implement regular data analysis in a continuous sport like football, as compared to a non-continuous sport like baseball?

I have heard and read this so often, and honestly I can't really understand it. Today everything that happens on the field is being analyzed and recorded. It is probably more difficult than in baseball, but if you concentrate on the key performance indicators and evaluate them correctly and in the right context, you will have no problem using data analysis in football on a large scale. Opponents of data analysis keep making such arguments because they only want to trust what they see with their own eyes.

What are the biggest challenges data analysis faces in being widely implemented in world football?

The biggest challenge is the mentality of the people in charge inside the clubs. Many of them fear that digitalisation would translate into a loss of jobs, which is completely false. We cannot do without the experience and the knowledge of these people. We at Global Soccer Network could not create our detailed GSN Index without our qualified and globally operating scouts.

Tools like ours should not replace managers, coaches, sporting directors or scouts; they should be used to support them and make their work more efficient. Our recommendation for the clubs is simple: use new technologies and new ways of thinking, and open your experienced staff to these progressive ideas.

Websites like Squawka.com and WhoScored.com provide freely available data for football enthusiasts to access. How different is the data provided by such websites compared to what football clubs use?

First of all, I would like to say that Squawka.com and WhoScored.com are fantastic websites. How their data differs from the data that clubs use cannot be answered in general, since every club emphasizes different data. Naturally, clubs have access to huge amounts of data which cover more leagues, countries and competitions. Squawka and WhoScored only cover the leagues and players which are relevant for their viewers. Clubs also use data from their medical departments and their training sessions, which is not accessible to everyone.

Though data analysis provides some insight into the sport using numbers, it still lacks in determining the physical and technical ability of an individual which the classic form of scouting still looks at. It might create a case where an individual’s physical/technical ability is disregarded purely due to a poor statistical number. Would you agree?

No! If you have experienced and knowledgeable scouts and a sophisticated value table it is possible to convert physical and technical abilities into number values. Every player in the GSN database is being scouted on a regular and independent basis by different scouts. Their reports will be converted into number values. This way we can ensure the best possible objectivity and do this on a regular basis. Through this, we have precise number values for diverse attributes.

Finally, the classic question, do stats lie?

Ultimately, stats can't give a 100 percent reflection of a player, but they can get very close to it. Applied to football, every additional percent of knowledge about players and teams contributes to more success.