Elo rating for ad hoc teams

All articles in the series Expanded Elo rating:

  1. Expanding the Elo rating system
  2. Elo rating for unfair games
  3. Elo rating for ad hoc teams

One final feature my expanded Elo rating needs (or at least the last I can think of) is the ability to deal with ad hoc teams.

By “ad hoc teams”, I mean teams made up of individually rated players who play together for a specific game, but don’t generally stay together as a team. (Established teams that always play together should be treated as “players” in their own right, with their own rating.)

This is not a common requirement, but the specific use case I had was an office ping pong table. Sometimes people would play singles and sometimes they would play doubles, but without any established teams.

Necessary features

Firstly, the two key ratings operations need to work:

  • Estimating the result of an unplayed game
  • Updating ratings after an actual result

And all the existing features should be supported:

  • Two or more teams
  • Unfair games
  • Ties

Additionally, it should support teams of arbitrary (and mixed) sizes, including teams of size one. This brings us to the first less obvious requirement: since this expands an existing system, it should be compatible with that system wherever the two overlap. So the following additional requirement makes sense:

  • Teams of one should give the same result as just using individuals

Simple solution

Just as with unfair games, where an adjusted rating is calculated first and then used in the rest of the algorithm, an adjusted rating should be calculated for each team. This would trivially allow all the existing features to just work.

The most obvious way to calculate such a rating would be a simple arithmetic mean of the ratings of all the players on the team. This would definitely support our key requirement, since the mean of a single rating is just that rating, but would it produce meaningful results?
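As a minimal sketch of the idea, here is the mean-based team rating in Python. The function names `team_rating` and `expected_score` are my own for illustration, and `expected_score` uses the standard two-sided Elo expectation (with the usual 400-point scale) rather than the expanded formula from the earlier articles in this series.

```python
from statistics import fmean


def team_rating(player_ratings):
    """Adjusted rating for an ad hoc team: the arithmetic mean of its members."""
    if not player_ratings:
        raise ValueError("a team needs at least one player")
    return fmean(player_ratings)


def expected_score(rating_a, rating_b):
    """Standard two-sided Elo expectation for side A against side B.

    Assumption: the conventional 400-point logistic scale, not the
    expanded multi-team formula.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Singles game, rated directly on the individuals.
singles = expected_score(1500, 1400)

# The same game expressed as two teams of one.
teams_of_one = expected_score(team_rating([1500]), team_rating([1400]))

# Mixed sizes: a doubles pair averaging 1500 against a single 1500 player.
mixed = expected_score(team_rating([1600, 1400]), team_rating([1500]))
```

Here `singles` and `teams_of_one` are identical, which is the compatibility requirement in action. The `mixed` case comes out as an even game, which also shows what the mean ignores: it gives no credit for simply having more players on a side.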

At this point I think simplicity has to win out over sophistication. The most general solution would allow the players on each team to be weighted (perhaps different roles in a team have different impacts on the result), but I think those situations are more likely to be handled with a per-team rating.
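For comparison, the weighted variant might look something like the sketch below. This is not something I am adopting; the function name `weighted_team_rating` and the example weights are purely hypothetical. It is only here to show that the simple mean is the special case where every player has the same weight.

```python
def weighted_team_rating(ratings, weights):
    """Weighted mean of player ratings (hypothetical generalisation).

    `weights` holds one non-negative number per player, for example to
    reflect how much a role influences the result.
    """
    if not ratings or len(ratings) != len(weights):
        raise ValueError("need one weight per player and at least one player")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(r * w for r, w in zip(ratings, weights)) / total


# Equal weights reduce to the plain arithmetic mean.
equal = weighted_team_rating([1600, 1400], [1, 1])

# Weighting the first player three times as heavily pulls the rating up.
skewed = weighted_team_rating([1600, 1400], [3, 1])

# A team of one still gets that player's own rating, whatever the weight.
solo = weighted_team_rating([1500], [2.5])
```

Because the weights are normalised by their sum, a team of one always reproduces the individual's rating, so the compatibility requirement would still hold.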