In the past, anyone who wanted to know which team would win the football World Cup had to consult a crystal-ball clairvoyant, read the future in tea leaves or hope that Paul the Octopus would reveal the result. Today, data science offers a more reliable alternative. As part of a team of statisticians, I helped train a machine learning algorithm to predict the most likely tournament scenario.
Probabilistic forecasts and loaded dice
The algorithm we designed works in two steps. First, sophisticated statistical models are combined with bookmaker analysis and transfer market data, to assess the strength of all teams and their players. Second, a machine learning algorithm determines the best way to combine these estimates with other team information.
This approach produces a probabilistic forecast for each possible match in the tournament. You can imagine it like a pair of loaded dice: instead of presenting the numbers 1 to 6 with identical probability, these dice assign different probabilities to the number of goals each team is likely to score.
For example, according to our forecast, Mexico’s die produces an average of 1.9 goals in the opening match of the 2026 World Cup, while that of its opponent, South Africa, produces only 0.7. However, this does not mean that Mexico will definitely win. A Mexican victory is simply the most likely outcome, with a probability of 65%. A draw is less likely (21%), while a South Africa victory is the least likely scenario (14%).
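To make the loaded-dice picture concrete, here is a minimal sketch that turns the two average goal counts into win, draw and loss probabilities. It assumes that each team's goals follow an independent Poisson distribution, which is a common simplification and not necessarily the exact distribution used in our model. The function names and the cap of 15 goals are illustrative choices.

```python
from math import exp, factorial


def poisson_pmf(goals, mean):
    """Probability of scoring exactly `goals` when the average is `mean`."""
    return exp(-mean) * mean**goals / factorial(goals)


def outcome_probabilities(mean_a, mean_b, max_goals=15):
    """Win/draw/loss probabilities for team A over all plausible scorelines.

    Assumption: the two goal counts are independent Poisson variables.
    Scorelines above `max_goals` are ignored because they are vanishingly rare.
    """
    win = draw = loss = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            p = poisson_pmf(goals_a, mean_a) * poisson_pmf(goals_b, mean_b)
            if goals_a > goals_b:
                win += p
            elif goals_a == goals_b:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Average goals quoted above: Mexico 1.9, South Africa 0.7.
win, draw, loss = outcome_probabilities(1.9, 0.7)
print(f"Mexico win {win:.1%}, draw {draw:.1%}, South Africa win {loss:.1%}")
```

Under this simplifying assumption the result is about 66%, 21% and 13%, close to the forecast quoted above. The small differences are expected, since the sketch is not the model itself.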
Spain, England, France, Germany, Portugal, Argentina?
By using a different pair of loaded dice for each fixture, it is possible to simulate the outcome of every World Cup match. We took into account the official tournament draw, as well as all FIFA rules, including extra time and penalty shootouts. We then ran 100,000 simulations of the whole tournament to determine the most likely scenario of the competition.
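The simulation idea can be sketched on a toy four-team knockout bracket. Everything specific here is a hypothetical stand-in: the team names, the `STRENGTH` values, the `BASE_GOALS` rate, the rule that extra time is played at one third of the 90-minute scoring rate, and the treatment of a penalty shootout as a coin flip. The real simulation follows the official draw and the full FIFA rules.

```python
import math
import random
from collections import Counter

# Hypothetical team strengths, NOT taken from the forecast.
STRENGTH = {"A": 1.6, "B": 1.2, "C": 1.0, "D": 0.7}
BASE_GOALS = 1.3  # assumed average goals per team over 90 minutes


def sample_goals(mean, rng):
    """Roll one loaded die: draw a Poisson goal count (Knuth's method)."""
    threshold = math.exp(-mean)
    goals, product = 0, rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals


def expected_goals(team, opponent):
    """Toy rule: stronger teams score more, especially against weak ones."""
    return BASE_GOALS * STRENGTH[team] / STRENGTH[opponent]


def play_knockout(a, b, rng):
    """Return the winner of one knockout match."""
    mean_a, mean_b = expected_goals(a, b), expected_goals(b, a)
    goals_a, goals_b = sample_goals(mean_a, rng), sample_goals(mean_b, rng)
    if goals_a == goals_b:
        # Extra time: 30 minutes, assumed to run at a third of the 90-minute rate.
        goals_a += sample_goals(mean_a / 3, rng)
        goals_b += sample_goals(mean_b / 3, rng)
    if goals_a == goals_b:
        # Penalty shootout, modelled as a fair coin flip (assumption).
        return a if rng.random() < 0.5 else b
    return a if goals_a > goals_b else b


def simulate_bracket(teams, rng):
    """Play round after round until a single team remains."""
    while len(teams) > 1:
        teams = [play_knockout(teams[i], teams[i + 1], rng)
                 for i in range(0, len(teams), 2)]
    return teams[0]


def title_probabilities(teams, runs=100_000, seed=0):
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)
    wins = Counter(simulate_bracket(teams, rng) for _ in range(runs))
    return {team: wins[team] / runs for team in teams}


# Semi-finals: A vs D and B vs C.
probs = title_probabilities(["A", "D", "B", "C"], runs=20_000)
print(probs)
```

Counting how often each team wins across many simulated tournaments is what turns match-level dice into title probabilities. The toy teams above have nothing to do with the actual forecast.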
Our simulations show that Spain is the favorite for the title, with a winning probability of 14.5%. It is closely followed by England and France, both at 12.4%, and then by Germany at 11.2%.
Due to the expansion of the tournament – this World Cup brings together 48 teams and has five knockout rounds – the gaps between the favorites remain relatively small. Portugal and Argentina also have solid chances of winning the trophy, with probabilities of 8.9% and 8.2% respectively.
The United States, for its part, has a good chance of reaching the round of 32: 78%. This is the highest probability in its group, which includes three other teams (Australia, Türkiye and Paraguay). In the knockout phase, however, where every match is decisive, the American team's chances of progressing diminish quite quickly. The probability of seeing this co-host lift the trophy in the final, played on July 19 at MetLife Stadium near New York, is only 1%.
Behind the scenes of the model
Our machine learning algorithm and the resulting simulations rely on a mix of data, expertise and statistical models. First, all international matches played over the last eight years serve as the basis for a retrospective estimate of each team's level. Second, a prospective estimate is derived from the odds offered by various international bookmakers, which reflect their expert assessment of the upcoming tournament.
Third, individual players are rated according to their contribution to goals scored for both club and national team. Finally, the current quality of players and their future potential are captured by their estimated market value. This data is available on the Transfermarkt website, which relies on a collective-intelligence approach to estimate market values that, by nature, cannot be observed directly.
These four variables are then combined with a wide range of other relevant indicators describing the current state of the different teams and the countries they represent. These include team-specific factors, such as FIFA ranking or the number of players who reached the semi-finals of the Champions League this year. We also incorporated country-specific socio-economic factors, such as gross domestic product (GDP) per capita.
To determine whether and to what extent these variables actually influence the results of a World Cup, we used a machine learning algorithm. Specifically, we used what is called a “random forest”, a model composed of a large number of decision trees, each trained on a slightly different subset of the data.
The algorithm was trained on all matches played in major international football competitions since the 2006 World Cup. It learns to relate the level of the teams, the market value of their players and other factors to the number of goals scored in World Cup matches. It is this information that allows us to “load the dice” used in our simulations.
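As an illustration of how a random forest can relate team information to goals, here is a minimal sketch using scikit-learn's `RandomForestRegressor` on synthetic data. This is not our actual code or data: the three features, the formula that generates the goals and the forest settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_matches = 500

# Synthetic, hypothetical features: differences between a team and its opponent.
ability_gap = rng.normal(size=n_matches)  # e.g. historical team level
value_gap = rng.normal(size=n_matches)    # e.g. squad market value
gdp_gap = rng.normal(size=n_matches)      # e.g. GDP per capita (no effect here)
X = np.column_stack([ability_gap, value_gap, gdp_gap])

# Synthetic goals: Poisson counts whose mean depends on the first two features.
true_mean = np.exp(0.2 + 0.4 * ability_gap + 0.2 * value_gap)
goals = rng.poisson(true_mean)

# Each of the 200 trees is fitted on a bootstrap sample of the matches.
forest = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=5, random_state=0
)
forest.fit(X, goals)

# Predicted average goals for a clearly stronger and a clearly weaker team.
strong = forest.predict([[1.5, 1.0, 0.0]])[0]
weak = forest.predict([[-1.5, -1.0, 0.0]])[0]
print(f"stronger team: {strong:.2f} goals, weaker team: {weak:.2f} goals")
print("feature importances:", forest.feature_importances_.round(2))
```

In a sketch like this, the predicted average can serve as the mean of a team's loaded die for a given match, and the feature importances give a rough indication of which variables the forest relies on.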
How reliable?
This is not the first time that our team has collaborated to predict the outcome of a World Cup. It consists of Andreas Groll, Rouven Michels and their colleagues from the Technical University of Dortmund in Germany, Lars Magnus Hvattum from Molde University College in Norway, Gunther Schauberger from the Technical University of Munich, and me.
During the 2019 Women’s World Cup, we correctly named the United States as the eventual winners. In the 2023 Women’s World Cup and the 2022 Men’s World Cup, the crowned teams – Spain and Argentina – were not our favorites, even if our model identified them as serious title contenders.
The main lesson is that a forecast is based on probabilities. Our program does not claim to predict the winner with absolute certainty. But perhaps it has a better chance of success than an eight-armed mollusc.
