Knightmate. Win by mating the knight. (8x8, Cells: 64)

H. G. Muller wrote on Mon, Aug 16, 2010 05:29 PM UTC:

> Anyway, your rating evaluation builds on matches between computers, 
> which won't work because Zillions can never accumulate rating by 
> beating weaker opponents as there aren't any.

This is not true. Even against stronger opponents you will win a game now and then (e.g. against an opponent 280 Elo stronger you should score about 16%).
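As a rough check of that figure, here is a minimal sketch assuming the standard logistic Elo model (the post does not say which formula was used, and the function name `expected_score` is just for illustration):

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for a player whose
    rating is elo_diff points above the opponent's, under the standard
    logistic Elo model (an assumption; other curves give slightly
    different numbers)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# A player 280 Elo below the opponent:
print(round(expected_score(-280), 3))  # 0.166
```

So a 280-point underdog is still expected to collect roughly one point in six, which is enough to accumulate a measurable rating over many games.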

It is more reliable to also test against weaker opponents, of course. But one can create artificially weaker opponents by giving time odds. When I have only a few engines that play a variant, I make a tourney where each engine participates in several versions: one unhandicapped, the others with time odds of factors 3, 10, 30 and 100. (For the weakest engines I don't have to go that far.) Then you play the tourney and calculate the performance ratings (not incrementally, but in one fitting procedure, e.g. with EloStat or BayesElo). The unhandicapped strongest and the most handicapped engines might have (slightly) distorted ratings because of the one-sided testing, but you simply take a set that used a time control somewhere in the middle, so that all have opponents on either side.
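The time-odds setup described above could be sketched roughly as follows. This is only an illustration: the engine names are taken from this discussion, but the base time, the entrant naming scheme, and the choice of a full round robin (including an engine meeting its own handicapped versions) are assumptions, not something stated in the post.

```python
from itertools import combinations

# Time-odds factors from the post; factor 1 is the unhandicapped version.
ODDS_FACTORS = [1, 3, 10, 30, 100]


def handicapped_versions(engine, base_seconds, factors=ODDS_FACTORS):
    """Return (entrant_name, seconds_per_game) pairs, one per odds factor.
    The 'name/factor' naming scheme is hypothetical."""
    return [(f"{engine}/{f}", base_seconds / f) for f in factors]


def round_robin(engines, base_seconds):
    """Pair every entrant with every other entrant once (assumed format)."""
    entrants = [v for e in engines for v in handicapped_versions(e, base_seconds)]
    return list(combinations(entrants, 2))


# Two engines, hypothetical 300-second base time: 10 entrants, 45 pairings.
games = round_robin(["Fairy-Max", "ChessV"], 300)
print(len(games))  # 45
```

The results of all these games would then go into a single fitting run (e.g. EloStat or BayesElo), rather than being rated incrementally.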

But to get back to the main point: you do seem to admit that the current ChessV is stronger than Zillions. But I know for a fact that Fairy-Max is (somewhat) stronger than ChessV, and that Joker derivatives (and SMIRF, in Capablanca) are again some 400 Elo points above that. Unlike Zillions, all the other engines support a universal protocol and can be automatically played against each other, so that I have hundreds of games between them. So if you assign Zillions 2300, I am really curious what ratings you would assign to Fairy-Max, SMIRF and Joker. Note that in normal Chess, Joker is some 700 Elo behind the top engines, like Rybka and its clones. It seems you would quickly climb to unrealistically high values.