In my previous post, I described an extension to Elo that could handle multiple players. The next restriction to overcome is that Elo assumes players of equal skill have an equal chance of winning.
In most games, players don’t actually have an equal chance of winning a single game. Chess overcomes this by having a match consist of several individual games, with players alternating who goes first. This is a good solution for two-player games but gets awkward for multiplayer games. It also doesn’t help if a multi-game match is undesirable for any other reason.
Elo is fundamentally designed to handle players of different skill levels and to produce a probability of each of them winning. The approach is therefore to give a player an adjusted rating, chosen so that Elo predicts the desired probability of winning.
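The formula to work out an Elo expected score is:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

where $R_A$ and $R_B$ are the ratings of players A and B, and $E_A$ is A’s expected score.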
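Here is a minimal Python sketch of the standard expected score formula. The function name `expected_score` is my own choice for illustration, and the sketch assumes the usual 400-point Elo scale.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B (400-point scale)."""
    # The exponent uses the opponent's rating minus the player's own rating.
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


print(expected_score(1500, 1500))  # equal ratings: 0.5
print(expected_score(1500, 1900))  # A is 400 points weaker: 1/11, about 0.091
```

The two players’ expected scores always sum to 1. In the second example, that means the stronger player’s expected score is 10/11, about 0.91.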
In our case we have an expected score (the win probability) and want to work out a rating difference, so we can just rearrange to get:

$$R_B - R_A = 400 \log_{10}\left(\frac{1}{E_A} - 1\right)$$
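For anyone who wants the intermediate steps, the rearrangement starts from the expected score formula and solves for the rating difference:

$$
\begin{aligned}
E_A &= \frac{1}{1 + 10^{(R_B - R_A)/400}} \\
1 + 10^{(R_B - R_A)/400} &= \frac{1}{E_A} \\
10^{(R_B - R_A)/400} &= \frac{1}{E_A} - 1 \\
R_B - R_A &= 400 \log_{10}\left(\frac{1}{E_A} - 1\right)
\end{aligned}
$$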
We then subtract this adjustment from the player’s actual rating to get their effective rating to use in the rest of the Elo calculations.
The reason we subtract it is that, for players A and B, the rating difference is calculated as $R_B - R_A$, which is the opposite way round to what we want.
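Here is a minimal Python sketch that combines the rearranged formula with the subtraction, assuming the standard 400-point scale. The function names are hypothetical and aren’t taken from the Tic-tac-toe Collection code. The final check confirms that the effective rating, played against an opponent with the same actual rating, gives back the chosen win probability.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for A against B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def rating_adjustment(win_probability: float) -> float:
    """Rating difference R_B - R_A that makes A's expected score equal win_probability."""
    return 400 * math.log10(1 / win_probability - 1)


def effective_rating(rating: float, win_probability: float) -> float:
    """Actual rating minus the adjustment (subtracted because the difference is R_B - R_A)."""
    return rating - rating_adjustment(win_probability)


# A 1500-rated player in a game they would only win 10% of the time against an equal opponent.
adjusted = effective_rating(1500, 0.1)
print(round(adjusted))                 # 1118
print(expected_score(adjusted, 1500))  # approximately 0.1
```

For a game the player wins 90% of the time, the adjustment has the same size but the opposite sign, so their effective rating goes up instead of down.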
Here are the rating adjustments for some sample win probabilities:
| Win probability | Rating adjustment |
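If you try to calculate the adjustment for a win probability of 1, you get $400 \log_{10}(0)$, which is undefined. The adjustment heads towards negative infinity, because no finite rating difference gives a player a certain win.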
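The same formula can generate adjustments for any set of sample probabilities. In this sketch the probabilities are arbitrary examples. Rejecting 0 and 1 with a `ValueError` is an assumption of the sketch; an implementation could equally clamp the probability or return infinity.

```python
import math


def rating_adjustment(win_probability: float) -> float:
    """400 * log10(1/p - 1), rejecting p outside the open interval (0, 1).

    At p = 1 the logarithm's argument is 0, and at p = 0 the division by p fails,
    so both ends would need an infinite adjustment. Raising here is an assumed
    policy for this sketch, not a prescribed one.
    """
    if not 0 < win_probability < 1:
        raise ValueError(f"win probability must be strictly between 0 and 1, got {win_probability}")
    return 400 * math.log10(1 / win_probability - 1)


# Illustrative sample probabilities, chosen for this sketch.
for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"{p:>4}: {rating_adjustment(p):+.0f}")
```

This prints adjustments of +382, +191, +0, -191 and -382. Replacing a probability p with 1 - p flips the sign of the adjustment.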
For example, if you beat someone of equal rating at a game in which you only have a 10% chance of winning, it’s the same as beating someone rated 382 points higher at a fair game.
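The 382 comes straight from the rearranged formula with $E_A = 0.1$:

$$R_B - R_A = 400 \log_{10}\left(\frac{1}{0.1} - 1\right) = 400 \log_{10} 9 \approx 400 \times 0.9542 \approx 381.7$$

This rounds to 382. Going the other way, a gap of exactly $400 \log_{10} 9$ points gives $E_A = 1/(1 + 9) = 0.1$.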
Putting it all together
After explaining the theory of how to extend Elo ratings to support all the variations in Tic-tac-toe Collection, it’s time to put it into practice.
While writing this I’ve also been working on an implementation.
As well as adding it to Tic-tac-toe Collection, I’m planning on creating some kind of site and/or app just to work out Elo ratings for various results (and hopefully even to maintain ratings for players in any kind of competition you might want to run).