Bringing analytics to the football transfer window
The football season came to an end after an intensive year, and we saw Manchester City claim the Premier League title after a high-intensity title race against Liverpool. It is hard to believe that Liverpool's 97 points would have been enough to win 25 of the last 27 Premier League titles. Luckily for them, the Reds won the Champions League and redeemed themselves after losing last year's final.
Football is over on the pitch for the season, but the battle of the transfer season has just begun. Money shapes major football leagues all over the world, despite rare successes by lower-budget teams such as Leicester City in 2016. Teams across Europe are changing the outcome of their domestic leagues with massive transfer budgets, and the transfer season is when they shape their potential for the next season. Sports analytics is often used to analyse teams on the pitch, but it can be applied to the transfer season as well. So, now we have a chance to analyse the upcoming transfer season using mathematical optimisation and the capabilities of SAS Viya, and that is exactly what Sertalp Cay, Operations Research Specialist at SAS, does here.
Analytics for transfer season
In football, players can move between clubs during the transfer season. If they are out of contract, clubs can acquire them and sign a contract. Otherwise, the current contract needs to be terminated before any transfer. In this case, the purchasing team pays an amount called the transfer fee.
"How should we allocate our transfer budget to maximise the benefit we gain?" This is the ultimate question that every team needs to answer. (Teams often try to answer "Which player should we get to make our fans happy?", but no one truly knows what could make fans happy.) Maximising benefit under a limited resource is known as the Knapsack problem in combinatorial optimisation. Given a set of items with their values and weights, the Knapsack problem is to find the selection of items that maximises the total value while staying within a weight limit. We can ask a similar question here: given a set of players, their values and ratings, which players should a team transfer to maximise its total rating within a budget limit?
Even though writing a detailed mathematical model of the problem is challenging, I will show how a simple model can be written to benefit from the capabilities of optimisation. Before we dive any further, note that we are solving a simplified problem under the following assumptions to make things easier:
- We consider only the starting lineup to measure team ratings
- Teams can transfer any player as long as their current value is paid
- We only focus on acquiring players, not selling them
- Teams use the same formation for the next year
- Players can be played only at the positions they are listed in the data set
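To make the Knapsack analogy concrete, here is a minimal sketch of the classic 0/1 knapsack solved by dynamic programming. The rating gains and fees are made-up numbers and the function name `knapsack` is only illustrative. The transfer model also has position constraints, which a plain knapsack does not capture.

```python
def knapsack(values, weights, capacity):
    """Return (best_value, chosen_indices) for the 0/1 knapsack problem."""
    n = len(values)
    # best[k][w] = best value using the first k items within weight limit w
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        v, wt = values[k - 1], weights[k - 1]
        for w in range(capacity + 1):
            best[k][w] = best[k - 1][w]            # skip item k-1
            if wt <= w:                             # or take it if it fits
                best[k][w] = max(best[k][w], best[k - 1][w - wt] + v)
    # Walk back through the table to recover which items were taken
    chosen, w = [], capacity
    for k in range(n, 0, -1):
        if best[k][w] != best[k - 1][w]:
            chosen.append(k - 1)
            w -= weights[k - 1]
    return best[n][capacity], sorted(chosen)


# Hypothetical data: rating gain of each candidate and fee in million euros
gains = [9, 4, 6, 3]
fees = [17, 10, 30, 5]
best_gain, picked = knapsack(gains, fees, capacity=35)
print(best_gain, picked)
```

Here the "weight" is the transfer fee and the "weight limit" is the budget, so the expensive third candidate is left out in favour of three cheaper ones.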
Data
One of the most challenging stages of any analytical problem is to obtain clean data. At this point, we are lucky to have a great web resource: sofifa.com. SoFIFA has more data than we need for this problem. By using parallel web requests, we managed to create a database of 12,000 players sorted by their overall rating. The web scraper is available on GitHub and the data are available as a CSV file.
As an important side note, since these models are being run on data based on the football game FIFA, not on real player metrics, they are a better reflection of the players in the computer game, not the players in real life. However, these same concepts can be applied to real player data if you have access to it.
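The scraper itself is not reproduced here. As a rough illustration of what parallel requests can look like in Python, the sketch below uses a thread pool from the standard library. Note that `fetch_page` is a hypothetical stand-in that fabricates rows instead of contacting any website, so the page layout and field names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page_number):
    """Hypothetical stand-in for an HTTP request and HTML parsing step.

    A real scraper would download one listing page here; this version
    just fabricates three player rows per page.
    """
    first_id = page_number * 3
    return [{"id": first_id + k, "overall": 90 - first_id - k} for k in range(3)]


def scrape(pages, workers=4):
    # pool.map runs fetch_page concurrently and yields results in input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(fetch_page, pages))
    players = [row for batch in batches for row in batch]
    # Sort by overall rating, highest first
    return sorted(players, key=lambda p: p["overall"], reverse=True)


players = scrape(range(4))
print(len(players), players[0])
```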
Model
Our aim is to maximise the sum of player ratings in the starting lineup of teams. We will solve the problem separately for each team. For each position, we filter the list of players who have a better rating than what the team currently has. Then, the increase in the total rating is used to measure the performance of the transfer for the team.
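The filtering step described above can be sketched in a few lines. The data layout (`current`, `candidates`) and the helper name `eligible_pairs` are assumptions for illustration and are not taken from the original code.

```python
# Hypothetical data: current rating at each position of the team
current = {"GK": 85, "RB": 73, "CB": 80}

# Hypothetical candidates with the positions they are listed for
candidates = [
    {"name": "A", "positions": ["RB", "CB"], "overall": 82, "value": 17.0},
    {"name": "B", "positions": ["GK"], "overall": 84, "value": 9.0},
    {"name": "C", "positions": ["CB"], "overall": 88, "value": 40.0},
]


def eligible_pairs(candidates, current):
    """Keep (player, position) pairs where the player improves the position."""
    pairs = []
    for player in candidates:
        for pos in player["positions"]:
            if pos in current and player["overall"] > current[pos]:
                pairs.append((player["name"], pos))
    return pairs


ELIG = eligible_pairs(candidates, current)
print(ELIG)
```

Player B is dropped because a rating of 84 does not improve on the current goalkeeper, which keeps the set of pairs small before any optimisation is run.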
Let us define $P$ as the set of all players, $S$ as the set of team positions, and $E$ as the set of player-position pairs. The following parameters are used to define the problem:
- $\bar{R}_j$: Current rating of the player at position $j$
- $R_i$: Overall rating of player $i$
- $B$: Team budget
- $V_i$: Transfer value of player $i$
The main decision variable $t_{ij}$ is a binary variable that indicates whether player $i$ is transferred for position $j$. We also have an auxiliary variable $r_j$ to define the final rating for position $j$ in the formation.
The objective function can be written as the summation of the final ratings:
$$\text{maximise} \quad \sum_{j \in S} r_j$$
Our first constraint is the budget for the transfer. The total value of players transferred cannot exceed the team budget:
$$\sum_{(i,j) \in E} V_i \cdot t_{ij} \leq B$$
The next constraint defines the final rating for each position. This constraint accounts for transfer player ii replacing the current player at position jj:
$$r_j = \bar{R}_j + \sum_{i \in P : (i,j) \in E} (R_i - \bar{R}_j) \cdot t_{ij} \quad \forall j \in S$$
The following two constraints ensure that a player cannot be transferred for two different positions, and that at most one player is transferred for a given position:
$$\sum_{j \in S : (i,j) \in E} t_{ij} \leq 1 \quad \forall i \in P$$
$$\sum_{i \in P : (i,j) \in E} t_{ij} \leq 1 \quad \forall j \in S$$
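On a tiny instance, the formulation above can be checked by plain enumeration. The sketch below is not the solver used in this post. It simply tries every subset of eligible pairs, discards those that violate the budget or the two assignment constraints, and keeps the best total rating. All names and numbers are hypothetical.

```python
from itertools import combinations


def solve_by_enumeration(current, overall, value, elig, budget):
    """Brute-force the transfer model; only practical for very small inputs."""
    base_total = sum(current.values())
    best_total, best_plan = base_total, ()
    for size in range(1, len(elig) + 1):
        for plan in combinations(elig, size):
            players = [i for i, _ in plan]
            positions = [j for _, j in plan]
            # A player fills at most one position, a position gets one transfer
            if len(set(players)) < size or len(set(positions)) < size:
                continue
            # Budget constraint
            if sum(value[i] for i in players) > budget:
                continue
            # Final ratings: replace the current rating at each chosen position
            total = base_total + sum(overall[i] - current[j] for i, j in plan)
            if total > best_total:
                best_total, best_plan = total, plan
    return best_total, best_plan


current = {"RB": 73, "CB": 80, "GK": 85}
overall = {"A": 82, "C": 88, "D": 83}
value = {"A": 17, "C": 40, "D": 12}          # million euros
elig = [("A", "RB"), ("A", "CB"), ("C", "CB"), ("D", "RB")]

total, plan = solve_by_enumeration(current, overall, value, elig, budget=55)
print(total, plan)
```

Signing A at RB together with C at CB would cost 57 and break the budget, so the enumeration settles on C and D instead.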
Python Model
We model this problem using sasoptpy, an open-source Python interface of SAS Optimisation.
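import sasoptpy as so

# session, POSITIONS, PLAYERS, ELIG, value, overall, member and budget
# are assumed to be defined beforehand
m = so.Model(name='optimal_team', session=session)

# Final rating per position and a binary transfer decision per eligible pair
rating = m.add_variables(POSITIONS, name='rating')
transfer = m.add_variables(ELIG, name='transfer', vartype=so.BIN)

# Objective: maximise the total rating of the starting lineup
m.set_objective(
    so.quick_sum(rating[j] for j in POSITIONS), name='total_rating', sense=so.MAX)

# Budget constraint
m.add_constraint(
    so.quick_sum(transfer[i, j] * value[i] for (i, j) in ELIG) <= budget, name='budget_con')

# Final rating of each position after a possible transfer
m.add_constraints((
    rating[j] == overall[member[j]] + so.quick_sum(
        transfer[i, j] * (overall[i] - overall[member[j]]) for (i, j2) in ELIG if j == j2)
    for j in POSITIONS), name='transfer_con')

# At most one position per player, at most one transfer per position
m.add_constraints((
    so.quick_sum(transfer[i, j] for (i2, j) in ELIG if i == i2) <= 1 for i in PLAYERS), name='only_one_position')
m.add_constraints((
    so.quick_sum(transfer[i, j] for (i, j2) in ELIG if j == j2) <= 1 for j in POSITIONS), name='only_one_transfer')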
Notice that it is very easy to model this problem using the Python interface. Our open-source optimisation modelling package sasoptpy uses the runOptmodel action under the hood, as shown in examples in the documentation. If you are familiar with PROC OPTMODEL, you can write the SAS code and run it on SAS Viya directly.
Results
We have run the optimal transfer problem for the top six teams in the Premier League standings: Manchester City, Liverpool, Chelsea, Tottenham, Arsenal and Manchester United. The current team and budget information were obtained from SoFIFA at the time of execution. We filtered out all players older than 33, since a majority of players reach their peak before 33 and then steadily lose performance.
See the table below for a comparison between optimal transfers for each team. The positions of the transfers are given in the following figures below the table.
| Team | Old Rating | New Rating | Budget | Money Spent | Efficiency | Transfers |
|---|---|---|---|---|---|---|
| Manchester City | 944 | 972 | €170.0M | €170.0M | 0.164706 | Giorgio Chiellini, Thiago Emiliano da Silva, Jordi Alba Ramos, C. Ronaldo dos Santos Aveiro |
| Liverpool | 932 | 949 | €90.0M | €89.5M | 0.189944 | Łukasz Piszczek, Giorgio Chiellini, Sergio Busquets Burgos |
| Chelsea | 925 | 948 | €95.0M | €94.0M | 0.244681 | Samir Handanovič, Giorgio Chiellini, Thiago Emiliano da Silva, Marco Parolo |
| Tottenham Hotspur | 933 | 949 | €85.0M | €82.0M | 0.195122 | Filipe Luís Kasmirski, Marco Parolo, Sergio Busquets Burgos |
| Arsenal | 905 | 933 | €92.5M | €90.0M | 0.311111 | Lars Bender, Giorgio Chiellini, Filipe Luís Kasmirski, Fernando Luiz Rosa |
| Manchester United | 915 | 951 | €175.0M | €174.0M | 0.206897 | César Azpilicueta Tanco, Giorgio Chiellini, Thiago Emiliano da Silva, Filipe Luís Kasmirski, Luka Modrić |
As mentioned above, we do not consider the likelihood of the transfer itself. We consider what money could buy if teams are able to get players at their current valuation.
Manchester City can increase its total team rating by 28 points, from 944 to 972, if it spends all of its current transfer budget of €170M. It is not surprising that, with a rather limited budget of €90M, Liverpool can increase its total rating by only 17 points, whereas Manchester United's total team rating can increase by 36 points with its massive budget of €175M.
The efficiency column is calculated by dividing the change in total rating by the total money spent in million euros. We expect the efficiency of the transfer to be larger when a few players have significantly lower ratings than the rest of the team and can be replaced with rather cheap alternatives. Arsenal has the highest efficiency and can increase its total rating by 0.31 points per million euros by purchasing 4 players.
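The efficiency figures can be reproduced directly from the old rating, new rating and money spent reported in the table above. This is a small sketch, and the dictionary layout and the `efficiency` helper are just for illustration.

```python
# (old rating, new rating, money spent in million euros), copied from the table
results = {
    "Manchester City": (944, 972, 170.0),
    "Liverpool": (932, 949, 89.5),
    "Chelsea": (925, 948, 94.0),
    "Tottenham Hotspur": (933, 949, 82.0),
    "Arsenal": (905, 933, 90.0),
    "Manchester United": (915, 951, 174.0),
}


def efficiency(old, new, spent):
    """Rating points gained per million euros spent (0 if nothing is spent)."""
    return (new - old) / spent if spent else 0.0


eff = {team: efficiency(*row) for team, row in results.items()}
best_team = max(eff, key=eff.get)

for team, e in eff.items():
    print(f"{team}: {e:.6f}")
print("Most efficient:", best_team)
```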
The reason why Liverpool's total rating does not increase as much as Arsenal's, despite their similar transfer budgets, can be explained by the variation in player ratings. For Arsenal, the rating of the right back (RB) increases by 9 points (from 73 to 82) with a transfer worth €17M. Liverpool's lowest rating in the current team is 80. Player values tend to increase sharply as we increase the rating:
Therefore, it is clear why some teams have an advantage in the transfer season. For these teams, it is easy to improve the team by replacing the weakest player. Consider these two extremes: Manchester City has to spend €170M to improve its total rating by 28 points, whereas Arsenal increases its total rating by the same amount while spending only €90M.
Here's how the old and new lineups look for each team. New transfers are coloured red while existing players are in blue:
Budget limitations
Next, we look at how the budget affects the decisions. We vary Liverpool's transfer budget from €0 to €200M in increments of €10M to see how it changes the outcome.
| New Rating | Budget | Money Spent | Efficiency | Transfers |
|---|---|---|---|---|
| 932 | €0M | €0M | 0 | |
| 933 | €10M | €7M | 0.142857 | Łukasz Piszczek |
| 936 | €20M | €20M | 0.205128 | Łukasz Piszczek, João Miranda de Souza Filho |
| 939 | €30M | €24M | 0.291667 | Thiago Emiliano da Silva |
| 942 | €40M | €38M | 0.263158 | Łukasz Piszczek, Giorgio Chiellini |
| 943 | €50M | €48M | 0.229167 | Lars Bender, Giorgio Chiellini |
| 945 | €60M | €56M | 0.234234 | Kyle Walker, Giorgio Chiellini |
| 946 | €70M | €62M | 0.227642 | César Azpilicueta Tanco, Giorgio Chiellini |
| 947 | €80M | €76M | 0.197368 | Joshua Kimmich, Giorgio Chiellini |
| 949 | €90M | €90M | 0.189944 | Łukasz Piszczek, Giorgio Chiellini, Sergio Busquets Burgos |
| 950 | €100M | €98M | 0.183673 | Giorgio Chiellini, Luka Modrić |
| 952 | €110M | €107M | 0.186916 | Kyle Walker, Giorgio Chiellini, Sergio Busquets Burgos |
| 953 | €120M | €116M | 0.181818 | Kyle Walker, Giorgio Chiellini, David Josué Jiménez Silva |
| 955 | €130M | €129M | 0.178988 | César Azpilicueta Tanco, Giorgio Chiellini, Luka Modrić |
| 955 | €140M | €129M | 0.178988 | César Azpilicueta Tanco, Giorgio Chiellini, Luka Modrić |
| 957 | €150M | €149M | 0.167785 | César Azpilicueta Tanco, Giorgio Chiellini, Fernando Luiz Rosa, Luka Modrić |
| 958 | €160M | €160M | 0.163009 | César Azpilicueta Tanco, Giorgio Chiellini, Jordi Alba Ramos, David Josué Jiménez Silva |
| 959 | €170M | €167M | 0.162162 | César Azpilicueta Tanco, Giorgio Chiellini, Jordi Alba Ramos, Luka Modrić |
| 961 | €180M | €180M | 0.161111 | César Azpilicueta Tanco, Giorgio Chiellini, Luka Modrić, Sergio Busquets Burgos |
| 962 | €190M | €189M | 0.159151 | César Azpilicueta Tanco, Giorgio Chiellini, Luka Modrić, David Josué Jiménez Silva |
| 962 | €200M | €189M | 0.159151 | César Azpilicueta Tanco, Giorgio Chiellini, Luka Modrić, David Josué Jiménez Silva |
As seen below in detail, efficiency (total rating increase per million euros) decreases as we pay more money for a relatively lower change, as expected.
It seems Liverpool gets the best value for its money if the Reds sign Thiago Emiliano da Silva for the CB position. Notice that the efficiency converges to about 0.16 total rating points per million euros spent as we keep increasing the budget.
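A budget sweep of this kind is just a loop that re-solves the same problem with a different budget each time. The sketch below uses a tiny, made-up candidate list and a brute-force `best_plan` helper in place of the real optimisation model, so the numbers are illustrative only.

```python
from itertools import product

# Hypothetical options per position: (name, rating gain, fee in million euros).
# The first entry of each list means "keep the current player".
options = {
    "RB": [(None, 0, 0), ("P1", 1, 7), ("P2", 3, 30)],
    "CB": [(None, 0, 0), ("P3", 7, 24), ("P4", 9, 31)],
}


def best_plan(budget):
    """Pick one option per position; maximise gain, break ties by lower fee."""
    best = (0, 0, ())
    for combo in product(*options.values()):
        gain = sum(c[1] for c in combo)
        fee = sum(c[2] for c in combo)
        if fee <= budget and (gain, -fee) > (best[0], -best[1]):
            best = (gain, fee, tuple(c[0] for c in combo if c[0]))
    return best


sweep = []
for budget in range(0, 70, 10):
    gain, fee, names = best_plan(budget)
    eff = gain / fee if fee else 0.0
    sweep.append((budget, gain, fee, round(eff, 6), names))

for row in sweep:
    print(row)
```

Even in this toy setting the pattern from the table shows up: the plan changes in jumps as the budget crosses certain fees, and extra budget that cannot buy a better combination is simply left unspent.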
Increasing the potential
We have looked only at the current ratings of the players up to this point. The next problem we solve includes "potential" ratings of the new transfers. Naturally, young players have a significantly higher potential value compared to the old players. We need to replace the rating constraint as follows:
$$r_j = \bar{P}_j + \sum_{i \in P : (i,j) \in E} (P_i - \bar{P}_j) \cdot t_{ij} \quad \forall j \in S$$
where $P_i$ is the potential rating of player $i$, and $\bar{P}_j$ is the potential of the current player at position $j$ in the team.
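The effect of swapping the rating parameter can be seen on a single position. In the sketch below, `final_rating` mirrors the right-hand side of the constraint for one position with at most one incoming player. The two players and their scores are invented for illustration. The same score table is used for both the current player and the newcomer, so potential is compared with potential.

```python
def final_rating(score, incumbent, newcomer=None):
    """Rating of one position: base score plus the gain of a transfer, if any."""
    base = score[incumbent]
    if newcomer is None:          # no transfer for this position
        return base
    return base + (score[newcomer] - base)


# Hypothetical players: an experienced starter and a young prospect
overall = {"veteran": 84, "prospect": 79}
potential = {"veteran": 84, "prospect": 91}

by_overall = final_rating(overall, "veteran", "prospect")
by_potential = final_rating(potential, "veteran", "prospect")
print(by_overall, by_potential)
```

With current ratings the swap lowers the position's rating, while with potential ratings it raises it, which is why the optimal transfers change once potential is used.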
For players under 25 years old, the optimal solution is to replace Henderson and Matip with Melo and de Ligt for €36M and €44M, respectively. These changes increase the potential rating by 18 points:
Edit: An earlier version of the blog post compared potential ratings of new transfers to current ratings of the current team. After fixing the problem, results have changed slightly.
Edit #2: We have updated results after fixing a filtering issue with the CSV database.
Dream Team under 23
Based on reader suggestions, we had a look at the optimal squad under a €150M budget. Our objective is to maximise the potential rating and create a full team. I chose a 4-4-2 formation for illustration purposes. The optimal squad costs €148.3M and has a potential rating of 982:
| Pos | Player | Rating | Potential | Paid (€) |
|---|---|---|---|---|
| GK | Gianluigi Donnarumma | 83 | 94 | 33.5M |
| LB | Thilo Kehrer | 79 | 87 | 16.0M |
| LCB | William Saliba | 71 | 88 | 4.2M |
| RCB | Boubacar Kamara | 75 | 88 | 10.5M |
| RB | Trent Alexander-Arnold | 80 | 89 | 19.0M |
| LCM | Rodrigo Bentancur | 78 | 90 | 18.5M |
| CM | Ricard Puig Martí | 69 | 89 | 2.1M |
| RCM | Sandro Tonali | 73 | 90 | 7.5M |
| CAM | Phil Foden | 75 | 90 | 13.5M |
| LS | Christian Kouamé | 75 | 89 | 15.0M |
| RS | Ezequiel Barco | 73 | 88 | 8.5M |
| Total | | 831 | 982 | 148.3M |
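Building a full squad from scratch means picking exactly one player for every position within the budget. As a rough sketch of that idea, and not the model used for the table above, the code below solves a tiny instance with dynamic programming over the money spent. The candidate pool is invented, fees are expressed in tenths of a million euros to keep them as integers, and each candidate is assumed to be listed for a single position.

```python
def pick_squad(pool, budget):
    """Choose one candidate per position to maximise total potential.

    pool:   {position: [(name, potential, fee), ...]}, fee in integer units
    budget: spending limit in the same units
    Returns (total_potential, money_spent, names), or None if infeasible.
    """
    # states maps money spent so far to the best (potential, names) found
    states = {0: (0, [])}
    for position, candidates in pool.items():
        next_states = {}
        for spent, (pot, names) in states.items():
            for name, potential, fee in candidates:
                total_fee = spent + fee
                if total_fee > budget:
                    continue
                if total_fee not in next_states or next_states[total_fee][0] < pot + potential:
                    next_states[total_fee] = (pot + potential, names + [name])
        states = next_states
    if not states:
        return None
    # Highest potential wins; ties go to the cheaper squad
    spent, (pot, names) = max(states.items(), key=lambda kv: (kv[1][0], -kv[0]))
    return pot, spent, names


# Hypothetical pool; fees in tenths of a million euros (335 = 33.5M)
pool = {
    "GK": [("G1", 94, 335), ("G2", 88, 60)],
    "CB": [("C1", 88, 42), ("C2", 90, 200)],
    "ST": [("S1", 89, 150), ("S2", 86, 30)],
}

squad = pick_squad(pool, budget=500)
print(squad)
```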
This concludes our brief analysis of potential transfers for top Premier League teams using Python and SAS Viya. As usual, all the code for the problem is available on GitHub.