Including Piece Values on Rules Pages

H. G. Muller wrote on Tue, Mar 12 11:29 AM UTC in reply to Kevin Pacey from 12:26 AM:

Systematic errors can never be estimated. There is no limit to how inaccurate a method of measurement can be. The only recourse is to design the method as well as you can. But what you mention is a statistical error, not a systematic one. Of course the weaker side can win, in any match with a finite number of games, by a fluke. Conventional statistics tells you how large the probability for that is. The probability to be off by 2 standard deviations or more (in either direction) is about 5%. To be off by 3 or more, about 0.27%. It quickly tails off, but to halve the standard deviation you need 4 times as many games.
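As a rough check of the figures above, the two-sided tail probabilities can be computed from the normal distribution. This is only a sketch: it assumes match scores are approximately normally distributed, and the function names are made up for illustration.

```python
import math

def two_sided_tail(k: float) -> float:
    """Probability that a normally distributed result lies k or more
    standard deviations from its mean, in either direction."""
    return math.erfc(k / math.sqrt(2))

def standard_error(per_game_sd: float, games: int) -> float:
    """Standard deviation of the average score over `games` independent games."""
    return per_game_sd / math.sqrt(games)

print(f"{two_sided_tail(2):.4f}")  # 0.0455, i.e. about 5%
print(f"{two_sided_tail(3):.4f}")  # 0.0027, i.e. about 0.27%

# Four times as many games halves the standard error,
# whatever the per-game standard deviation is.
print(standard_error(1.0, 400) / standard_error(1.0, 1600))  # 2.0
```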

So it depends on how much weaker the weak side is. To demonstrate with only a one-in-a-million probability for a fluke that a Queen is stronger than a Pawn wouldn't require very many games. The 20-0 result that you would almost certainly get would only have a one-in-a-million probability if the Queen were not better, but equal. OTOH, to show that a certain material imbalance provides a 1% better result with 95% 'confidence' (i.e. only 5% chance it is a fluke), you will need 6400 games (taking 40% as the standard deviation of a single game result, 40%/sqrt(6400) = 40%/80 = 0.5%, so a 51% outcome is two standard deviations away from equality).
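The two numbers in this paragraph can be reproduced with a few lines. This is a sketch under simplifying assumptions: the 20-0 case ignores draws and treats each game as a fair coin flip between equal sides, and the 40% per-game standard deviation is taken from the formula above. The function names are illustrative only.

```python
import math

def fluke_probability(wins_in_a_row: int) -> float:
    """Chance of a clean sweep between equal sides, assuming no draws
    and a 50% win probability per game."""
    return 0.5 ** wins_in_a_row

def games_needed(excess: float, sigmas: float = 2.0, per_game_sd: float = 0.4) -> int:
    """Games required for a score excess (as a fraction, e.g. 0.01 for 1%)
    to sit `sigmas` standard deviations away from equality."""
    games = (sigmas * per_game_sd / excess) ** 2
    # Round first so floating-point noise does not push an exact result up by one.
    return math.ceil(round(games, 6))

print(fluke_probability(20))  # 9.5367431640625e-07, about one in a million
print(games_needed(0.01))     # 6400
```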

My aim is usually to determine piece values with a standard deviation of about 0.1 Pawn. Since Pawn odds typically cause a 65-70% result, 0.1 Pawn would result in a 1.5-2% excess score, and 400-700 games would achieve that (40%/sqrt(400) = 2%). I consider it questionable whether it makes sense to strive for more accurate values, because piece values in themselves are already averages over various material combinations, and the actual material that is present might affect them by more than 0.1 Pawn.
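The 400-700 range follows from the same arithmetic. The sketch below assumes the score excess scales linearly with the material advantage for small advantages (so 0.1 Pawn gives one tenth of the Pawn-odds excess) and reuses the 40% per-game standard deviation; the function name and parameters are hypothetical.

```python
def games_for_resolution(pawn_odds_score: float,
                         resolution_pawns: float = 0.1,
                         per_game_sd: float = 0.4) -> int:
    """Games needed so that one standard deviation of the match score
    corresponds to `resolution_pawns` of material (linear-scaling assumption)."""
    excess_per_pawn = pawn_odds_score - 0.5
    target_excess = excess_per_pawn * resolution_pawns
    return round((per_game_sd / target_excess) ** 2)

print(games_for_resolution(0.70))  # 400
print(games_for_resolution(0.65))  # 711
```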

I am not sure what you want to say in your first paragraph. You still argue as if there were an 'absolute truth' in piece values. But there isn't. The only thing that is absolute is the distance to checkmate. Piece values are only a heuristic used by fallible players who cannot calculate far enough ahead to see the checkmate. (Together with other heuristics for judging positional aspects.) If the checkmate is beyond your horizon you go for the material you think is strongest (i.e. gives the best prospects for winning), and hope for the best. If material gain is beyond the horizon you go for the position with the current material that you consider best. Above a certain level of play piece values become meaningless, and positions will be judged by other criteria than what material is present. And below that level they cannot be 'absolute truth', because it is not the ultimate level.

I never claimed that statistics of computer-generated games provide incontestable proof of piece values. But they provide evidence. If a program that human players rated around 2000 Elo have difficulty beating in orthodox Chess hardly does better with a Chancellor as Queen replacement than with an Archbishop (say 54%), it seems very unlikely that the Archbishop would be two Pawns less valuable, as that same engine would have very little trouble converting other uncontested 2-Pawn advantages (such as R vs N, or 2N+P vs R) to a 90% score. It would require pretty strong evidence to the contrary to dismiss that as irrelevant, plus an explanation for why the program systematically blundered that advantage away. But there doesn't seem to be any such evidence at all. That a high-rated player thinks it is different is not evidence, especially if the rating is only based on games where neither A nor C participates. That the average number of moves on an empty board of A is smaller than that of C is not evidence, as it was never proven that piece values only depend on average mobility. (And counter-examples on which everyone would agree can easily be given.) That A is a compound of pieces that are known to be weaker than the pieces C is a compound of is not evidence, as it was never proven that the value of a piece is equal to the sum of its components. (The Queen is an accepted counter-example.)

As to the draw margin: I usually took that as 1.5 Pawn, but that is very close to 4/3, and my only reason to pick it was that it is somewhere between 1 and 2 Pawns. An advantage of 1 Pawn is often not enough, 2 usually is. But 'decisive' is a relative notion. At lower levels games with a two-Pawn advantage can still be lost. GMs would probably not stand much chance against Stockfish even if they were allowed to start with a two-Pawn advantage. At high levels a Pawn advantage was reported by Kaufman to be equivalent to a 200-Elo rating advantage.
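To put the 200-Elo figure in terms of score percentage, one can use the standard logistic Elo expectation formula. That formula is an assumption brought in here for illustration, not something the measurements above depend on.

```python
def expected_score(elo_difference: float) -> float:
    """Expected score of the stronger side under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_difference / 400.0))

print(f"{expected_score(200):.3f}")  # 0.760
print(f"{expected_score(0):.3f}")    # 0.500
```

Under that model a 200-Elo edge corresponds to roughly a 76% expected score.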