*The sports data revolution* – Big data and the sports analytics arms race
Global football has entered an arms race in big data and modern analytics, even if most clubs do not know it yet.
I can see the future because it has been unfolding in North American sports leagues for more than 15 years. I grew up a huge baseball fan and remain one, and I lived through the evolution from Bill James to Moneyball to Statcast, and now to machine learning and advanced analytics.
There are a lot of football clubs who are partaking in the “xG revolution,” but that is largely comparable to the basic Moneyball transition. Yes, expected goals offer a better assessment of underlying performance, just as on-base and slugging percentages are superior to batting average. But how is this all likely to evolve in football?
My own background is eclectic: investment management using complex systems science, enterprise data management, advanced performance and attribution analytics, and the study of human cognition.
I do not know all that much about any one of them, but I understand why they are important and how to use them together.
This set of skills helps me understand how a club like Atalanta appears to be building its team around maximizing xG and xA (expected goals and expected assists), which has made it a very dangerous side in this season’s Champions League and successful at monetizing underpriced players in the transfer market.
For now, a club like Atalanta has a tremendous advantage, as if using a higher-caliber gun, but one need only look at MLB to see the future.
Far more powerful weaponry is on the horizon.
While still early in its evolution within football, the basic concept of valuing players via advanced metrics is common across world sports. However, as Atalanta has shown, it remains ripe for information arbitrage within football. Whether a club or franchise is optimizing its roster using a Wins Above Replacement model like that found in baseball, or xG and xA in football, this is just the “entry level” requirement for competing at the highest level.
Football is on the precipice of the full adoption of big data.
Clubs like the Red Bull group and Liverpool appear to be on the frontier. Well-resourced competitors are ramping up to catch them, but domain expertise is in short supply. UEFA-licensed coaches are a dime a dozen, while good data scientists willing to work for what is likely to be lower pay are hard to come by. And how many experts in the analytics space are going to work for a mid-table club outside the Big 5 leagues instead of a hedge fund in Frankfurt, Singapore, London, or New York?
Why is this cross-domain expertise important?
Having exposure across domains can introduce different perspectives into analytics and problem solving. For example, many analysts trained primarily in football may look at the gap in xG differential between Liverpool and Manchester City and deduce that Liverpool enjoyed some luck this past season. That may be true, but I believe there are other potential explanations. Think of including Dennis Rodman in a 1990s Bulls team, or a volatility hedge fund in one’s investment portfolio: sometimes the way to optimize for the long term is counterintuitive, and different from “highest xG.” Perhaps Liverpool has figured out that optimizing for lower volatility of xG creation throughout the season, in order to maximize points, is a better model than optimizing for maximum xG if that comes with higher variance. Manchester City can keep enjoying those 4+ xG blowouts, but perhaps they will suffer higher volatility in performance and collect fewer points?
The volatility and sequence of performance are important.
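The volatility argument can be made concrete with a small sketch. It assumes goals for each side follow independent Poisson distributions centered on that side's xG, which is a common simplification and not something any club has confirmed using. The two xG profiles, the opponent xG of 1.0, and every function name below are invented for illustration. The "volatile" team has the higher average xG, yet under these made-up numbers the "steady" team is expected to collect more points, because each extra unit of xG in a blowout adds less to the points tally than the same unit would in a tight match.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals are Poisson with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)


def expected_points(xg_for, xg_against, max_goals=15):
    """Expected league points (3 for a win, 1 for a draw) from a single match.

    Assumption: each side's goals are independent Poisson draws around its xG.
    Scorelines above max_goals are ignored; their probability is negligible here.
    """
    p_for = [poisson_pmf(k, xg_for) for k in range(max_goals + 1)]
    p_against = [poisson_pmf(k, xg_against) for k in range(max_goals + 1)]
    win = sum(p_for[i] * p_against[j]
              for i in range(max_goals + 1) for j in range(i))
    draw = sum(p_for[i] * p_against[i] for i in range(max_goals + 1))
    return 3 * win + draw


def season_points(xg_profile, xg_against=1.0, matches=38):
    """Expected points over a season in which the team's xG cycles through xg_profile."""
    per_match = sum(expected_points(x, xg_against) for x in xg_profile) / len(xg_profile)
    return matches * per_match


# Hypothetical profiles: the steady team creates 2.0 xG every match, while the
# volatile team alternates between 0.6 and 3.8 xG (a higher average of 2.2).
steady_profile = [2.0]
volatile_profile = [0.6, 3.8]

steady_points = season_points(steady_profile)
volatile_points = season_points(volatile_profile)

print(f"steady team:   {steady_points:.1f} expected points")
print(f"volatile team: {volatile_points:.1f} expected points")
```

This is only a toy model: it ignores the sequence of results, opponent quality, and any link between a team's attacking and defensive xG, all of which a real club model would need to address.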
The next generation of xG is already on its way, using big data harvested from advanced optical cameras, which digitize every action on the pitch. How hard a shot was struck, and at what angle, will be football’s equivalent of the exit velocity and launch angle of batted balls in baseball. The style of play of teams and players will be assessed using techniques such as clustering models. As more advanced xG models emerge, understanding their importance will give early adopters a competitive advantage.
Firms such as Zone7 are already providing machine learning to help clubs dramatically reduce injuries and related risks.
Many more use-cases for big data and analytics will follow.
However, the real and sustainable competitive advantage will accrue to those clubs that understand how to build cultures in which these various domains co-exist.
The clubs which understand that optimising for a 38-game league season may require a different model than knockout tournaments, or the transfer market, or player development, are likely to thrive in this new world.
Who will win the “Manhattan Project” of global football?
If you are at a club, what is your plan to compete in this arms race, and will ex-players with a coaching license suffice?
If you are a data or scouting platform, how are you planning for the next generation of big data and incorporating analytics models and peer benchmarking?
Data is going to be commoditized, so what is your value proposition?
Organizational behavior is simply a scaled manifestation of human behavior, and includes the same pitfalls combined with additional group dynamics. Extraordinary Popular Delusions and the Madness of Crowds, the 1841 book by the Scottish author Charles Mackay, remains a timeless treatise on the human collective. As the son of a Scotch-Irish immigrant to the US and a Celtic supporter by birth, I also see these issues through the green and white glasses of a supporter. I understand the paralysing risk aversion that is inherent under such relentless and extreme pressure.
Cycles exist for good reasons.
Few, if any, rivalries and operating environments are as intense and extreme as those in Glasgow.
The obsession with the competitive threat from the “other side of the city” is more than a century in the making, with the madness of crowds waiting on the doorstep, at the grocery store, in the pubs, and just walking down the street. As an interested party analyzing from the outside, I worry that I have seen signs that Rangers may already be embracing the new paradigm more than Celtic, as they try to stop a potentially historic 10th league title in a row.
There is the obvious relationship between Gerrard and Liverpool, and Rangers largely closed the gap in underlying performance metrics during the 2019-2020 season on about 70% of Celtic’s wage bill. A 13-point final margin in the league table is the kind of thing which can seed overconfidence when the underlying data suggests things were already much closer. Cycles exist for good reasons: organizations and people can grow complacent from success, because the perceived risks of change appear greater than the oncoming risks of the comfortable status quo.
But perception is often not reality. We are only human, after all.
The Glasgow rivalry is an extreme example of the entrenched challenges football clubs face as they look to modernize. It is not too late. This arms race is still in its relative infancy, and many of the big clubs will waste massive amounts of money because of a lack of cohesive cross-domain expertise and because of entrenched cultures and interests. Opportunity exists for ambitious small and mid-sized clubs to innovate and to monetize that innovation, both financially and on the pitch.
Success will require vision into the future.
This evolution is, however, accelerating as big data and associated technologies explode onto the scene. Trying to catch up to Atalanta or one’s cross-city rival will not be enough, as success will require vision into the future.
I just gave you a window into it, so I suggest you get started.