Comments by HGMuller
Ha, finally my registration could be processed manually, as all automatic
procedures consistently failed. So this thread is now also open to me for
posting.
Let me start with some remarks on the ongoing discussion.
* I tried Reinhard's 4A vs 8N setup. In a 100-game match of 40/1' games
with Joker80, the Knights are crushed by the Archbishops 80-20. So
although in principle I agree with Reinhard that such extreme tests with
setups that make the environment for the pieces very alien compared to
normal Chess could be unreliable, I certainly would not take it for
granted that his claim that 8 Knights beat 4 Archbishops is actually true.
Possible reasons for the discrepancy could be:
1) Reinhard did not base his conclusion on enough games. In my experience,
using anything fewer than 100 games is equivalent to making the decision by
throwing dice. It often happens that the side leading with 60% after 30
games eventually ends up losing with 45%.
2) Smirf does not handle the Archbishop well, because it is programmed to
underestimate its value, and is prepared to trade it too easily for two
Knights to avoid or postpone a Pawn loss, while Joker80 just gives the
Pawn and saves its Archbishops until he can get 3 Knights for it.
3) The short time control restricts search depth so much that Joker80
cannot recognize some higher, unnatural strategy (one with no parallel in
normal Chess), where all Knights can be kept defending each other multiple
times because they all have identical moves, and so it judges the pieces
mainly on the tactical merits that would be relevant for normal Chess.
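To make the statistics of point 1 concrete, here is a small Python sketch of the standard error of a match score as a function of the number of games (the 30% draw fraction is just an assumed typical number):

```python
import math

def match_std_error(n_games, draw_fraction=0.3):
    # A game scores 1, 1/2 or 0; around a 50% mean score the per-game
    # variance is 0.25*(1 - draw_fraction), so the error in the match
    # percentage shrinks only as 1/sqrt(n_games).
    per_game_sd = math.sqrt(0.25 * (1.0 - draw_fraction))
    return per_game_sd / math.sqrt(n_games)

for n in (30, 100, 400):
    print(n, round(100 * match_std_error(n), 1))  # ~7.6%, 4.2%, 2.1%
```

A 60% score after 30 games is thus within two standard errors of a true 45% strength, which is why such a lead proves nothing.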
* The arguments Reinhard gives against more realistic 'asymmetrical
playtesting':
| Let me point to a repeatedly written detail: if a piece will be
| captured, then not only its average piece exchange value is taken
| from the material balance, but also its positional influence from
| the final detail evaluation. Thus it is impossible to create
| 'balanced' different armies by simply manipulating their pure material
| balance to become nearly equal - their positional influences probably
| would not be balanced as need be.
seem invalid. For one, all of us are good enough Chess players that we can
recognize for ourselves in the initial setup we use for playtesting if the
Archbishop or Knight or whatever piece is part of the imbalance is an
exceptionally strong or poor one, or just an average one. So we don't put
a white Knight on e5 defended by Pf4, while the black d- and f-pawn already
passed it, and we don't put it on a1 with white pawns on b3, c2 and black
pawns on b4, c3. In particular, I always test from opening positions,
where none of the pieces is on a particularly good square, but they can be
easily developed, as the opponent does not interdict access to any of the
good squares either. So after a few opening moves, the pieces get to
places that, almost by definition, are average for where you can get
them.
Secondly, when setting up the position, we get the evaluation of the
engine for that position telling us if the engine does consider one of the
sides highly favored positionally (by taking the difference between the
engine evaluation and the known material difference for the piece values
we know the engine is using). Although I would trust this less than my own
judgement, it can be used as additional confirmation.
Like Derek says, averaging over many positions (like I always do: all my
matches are played starting from 432 different CRC opening positions) will
tend to put every piece, on average, in an average position. If a
certain piece, like A, would always have a +200cP 'positional'
contribution, (e.g. calculated as its contribution to mobility) no matter
where you put it, then that contribution is not positional at all, but a
hidden part of the piece value. Positional contributions should average to
zero, when averaged over all plausible positions. Furthermore, in Chess
positional contributions are usually small compared to material ones, if
they do not have to do with King safety or advanced passers. And none of
the latter play a role in the opening positions I use.
* Symmetrical playtesting between engines with different piece-value sets
is known to be a notoriously unreliable method. Dozens of people have
reported trying it, often with quite advanced algorithms to step through
search space (e.g. genetic algorithms, or annealing). The result was
always the same: in the end (sometimes after months of testing) they
obtained piece values that, when pitted against the original hand-tuned
values, would consistently lose.
The reason is most likely that the method works in principle, but requires
too many games in practice. Derek mentioned before, that if two engines
value certain piece combinations differently, they often exchange them for
each other, creating a material imbalance, which then affects their winning
chances. Well, 'often' is not the same as 'always'. For very large errors,
like putting A below R, bad trades will indeed occur in nearly every game.
But a slight undervaluation of A can only lead to much more complicated
bad trades, as you have to get at least two pieces for the A. The
probability that this occurs is far smaller, and only 10-20% of the games
will see such a trade.
Now the problem is that the games in which the bad trades do NOT happen
will not be affected by the wrong piece value. So this subset of games
will have a 50-50 outcome, pushing the outcome of the total score average
towards 50%. If A vs R+N gives you a 60% winning chance (so a 10% excess),
if it is the only bad trade that happens (because you set A slightly under
8), and it happens in only 20% of the games, the total effect you would see
(and on which you would have to conclude the A value is suboptimal) would
be 52%. But the 80% of games that did not contribute to learning anything
about the A value, because in the end A was traded for A, still contribute
to the statistical noise! To recognize a 2% excess score instead of a 10%
excess score you need a 5 times lower statistical error. But statistical
errors only decrease as the SQUARE ROOT of the number of games. So to get
the error down a factor 5, you need 25 times as many games. You could not
conclude anything before you had 2500 games!
Symmetrical playtesting MIGHT work if you first discard all the games that
traded A for A (to eliminate the noise they produce, and they can't say
anything about the correctness of the A value), and make sure you have
about 100 games left. Otherwise, the result will be garbage.
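The scaling argument above can be put into a few lines of Python (the 30% draw fraction and the 2-sigma detection threshold are assumptions for illustration):

```python
import math

def games_needed(full_excess, trade_fraction, sigmas=2.0, draw_fraction=0.3):
    # Only a fraction of the games sees the bad trade; the rest score 50-50
    # and just add noise. The visible excess score is diluted accordingly,
    # and detecting it at the given significance requires a statistical
    # error of visible_excess / sigmas, which fixes the number of games.
    visible_excess = full_excess * trade_fraction
    per_game_var = 0.25 * (1.0 - draw_fraction)
    needed_se = visible_excess / sigmas
    return math.ceil(per_game_var / needed_se ** 2)

print(games_needed(0.10, 1.0))  # every game informative
print(games_needed(0.10, 0.2))  # only 20% see the trade: 25x as many games
```

With only one game in five informative, the required number of games goes up by the square of the dilution factor.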
Well, this is exactly the kind of games I played. Plus that I do not play from a single position, but shuffle the pieces on the back rank to get 432 different initial positions. This minimizes the risk that I put too much emphasis on a position that inadvertently contained hidden tactics, biasing the score. If there are such positions, sometimes one side should be favored, sometimes the other, and the effect will average out. If the possession of one piece as opposed to another (or a set of others) would systematically lead to more tactics in favor of that piece even from an opening position, I think that is a valid contribution to the piece value of such a piece.
Of course I did all games on a 10x8 board, as I wanted to have piece values for Capablanca Chess. If I were to do it on 8x8, I would use a setup like yours, but with Q next to K for both sides, to make the piece mix to which it is exposed even more natural. (Of course there always is a problem introducing A and C in 8x8 Chess: they don't fit naturally on the board, so you have to kick out some other pieces at their expense. But you don't have to kick out the same pieces all the time. It is perfectly valid to sometimes give both sides an A on d1/d8, sometimes a Q, sometimes a C, or sometimes Q+C at the expense of a Bishop.) The total mix of pieces in the game should ON AVERAGE be close to what it will be in real games, or you cannot be sure that results are meaningful.
I never went more extreme than giving one side two A and the other two C (or similarly AA vs QQ and CC vs QQ), by substituting A->C for one side of the Capablanca array, and C->A for the other. For the total list of combinations I tried, see: http://z13.invisionfree.com/Gothic_Chess_Forum/index.php?showtopic=389&st=1 (For clarity: the pieces mentioned in that list were in general the pieces I deleted from the opening array.)
For completeness, I listed the combinations that are relevant for comparison of the Q, A and C value here:

Q-BNN   (172+ 186-  75=) 48.4%
Q-BBN   (143+ 235-  54=) 39.4%
C-BNN   (130+ 231-  71=) 38.3%
C-BBN   ( 39+  86-  11=) 32.7%
A-BNN   (124+ 241-  67=) 36.5%
RR-Q    (174+ 194-  64=) 47.7%
RR-CP   (131+ 227-  74=) 38.9%
RR-AP   (166+ 199-  67=) 46.2%
RR-C    (188+ 170-  74=) 52.1%
RR-A    (197+ 162-  73=) 54.1%
QQ-CC   (131+  55-  30=) 67.6%
QQ-AA   (117+  60-  39=) 63.2%
QQ-CCP  (112+  72-  32=) 59.3%
QQ-AAP  (112+  78-  26=) 57.9%
CC-AA   (102+  89-  25=) 53.0%
Q-CP    (164+ 191-  77=) 46.9%
Q-AP    (191+ 186-  55=) 50.6%
Q-C     (215+ 161-  56=) 56.3%
Q-A     (219+ 138-  75=) 59.4%
C-A     (187+ 182-  63=) 50.6%
A-RN    (261+ 122-  49=) 66.1%
C-RN    (273+ 101-  58=) 69.9%
A-RNP   (247+ 121-  64=) 64.6%
C-RNP   (242+ 144-  46=) 61.3%

So it is not only that C and A have been tried against each other, alone or in pairs. They have also been tested against Q (alone or in pairs, with or without Pawn odds for the latter), BNN, RR and RN (with or without Pawn odds). On average, C does only slightly better than A, by 2-3%, where giving Pawn odds makes a difference of ~12%. The A-RNP result seems a statistical fluke, as it is almost the same as A-RN, while the extra Pawn obviously should help, and the A even does better there than C-RNP. Note the statistical error in 432 games is 2.2%, so that 32% of the results (so about eight) should be off by more than 2.2%, and 5% (1 or 2) should be off by more than 4.5%. And A-RNP is the most likely candidate for the latter.
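The percentages and the quoted 2.2% error follow directly from the (wins, losses, draws) triples; a small Python check:

```python
import math

def score_pct(wins, losses, draws):
    # Match score counts a draw as half a point; the standard error uses
    # the observed spread of the per-game results around the mean score.
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    var = (wins + 0.25 * draws) / games - score ** 2
    return round(100 * score, 1), round(100 * math.sqrt(var / games), 1)

print(score_pct(172, 186, 75))  # the Q-BNN line: about 48.4% +/- 2.2%
print(score_pct(247, 121, 64))  # the suspect A-RNP line
```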
Note that a Nash equilibrium in a symmetric zero-sum game must be the globally optimal strategy. If it weren't, the player scoring negatively could unilaterally change his strategy to the one his opponent applies, and by symmetry then raise his score to 0, showing that the earlier situation could not have been a Nash equilibrium.
Sorry my original long post got lost. As this is not a position where you can expect piece values to work, and my computers are actually engaged in useful work, why don't YOU set it up?
As piece values are only useful as strategic guidelines for quiet positions, they cannot be sensitive to who has the move. A position where it matters who has the move is by definition not quiet, as one ply later that characteristic will have essentially changed. So at the level of piece-value strategies, Chess is a perfectly symmetric game.
It seems to me that that is bad strategy. If you fail, you should keep trying until you succeed. Only when you succeed can you stop trying...
Sure, this is what people do and have done for ages. It is well known that the advantage of having the move is worth 1/6 of a Pawn, (corresponding in normal Chess to a white score of 53-54%) and that, by inference, wasting a full move is equivalent to 1/3 of a Pawn. But the point is that this does not alter the piece values. It just adds to them, like every positional advantage adds to them. In my test the advantage of having the lead move is neutralized by playing every position both with white to move and black to move.
To summarize the state of affairs, we now seem to have sets of piece values for Capablanca Chess by:

Hans Aberg (1)
Larry Kaufman (1)
Reinhard Scharnagl (2)
H.G. Muller (3)
Derek Nalls (4)

1) Educated guessing based on known 8x8 piece values and assumptions on synergy values of compound pieces
2) Based on board-averaged piece mobilities
3) Obtained as a best fit of computer-computer games with material imbalance
4) Based on mobilities and more complex arguments, fitted to experimental results ('playtesting')

I think we can safely dismiss method (1) as unreliable, as the (clearly stated) assumptions on which it is based were never tested in any way, and appear to be invalid. Methods (3) and (4) are now basically in agreement. Method (2) produces substantially different results for the Archbishop.

One problem I see with method (2) is that plain averaging over the board does not seem to be the relevant thing to do, and is even inconsistent in places: suppose we apply it to a piece that has no moves when standing in a corner; the corner squares would then suppress its average mobility. If, OTOH, the same piece were not allowed to move into the corner at all, the average would be taken over the part of the board that it can access (like for the Bishop), and would come out higher than for the piece that could go there but not leave (if there weren't too many moves to step into the corner). Yet the latter piece is clearly upward compatible, and thus must be worth more. The moral lesson is that a piece with very low mobility on certain squares does not lose as much value because of that as the averaging suggests, as in practice you will avoid putting the piece there. The SMIRF theory does not take that into account at all. Focusing on mobility only also makes you overlook disastrous handicaps a certain combination of moves can have.
A piece that has two forward diagonal moves and one forward orthogonal move (fFfW in Betza notation) has exactly the same mobility as one with forward diagonal and backward orthogonal moves (fFbW). But the former is restricted to a small (and ever smaller) part of the board, while the latter can reach every point from every other point. My guess is that the latter piece would be worth much more than the former, although in general forward moves are worth more than backward moves. (So fWbF should be worth less than fFbW.) But I have not tested any of this yet.

I am not sure how much of the agreement between (3) and (4) can be ascribed to the playtesting, and how much to the theoretical arguments: the playtesting methods and results are not extensively published and not open to verification, and it is not clear how well the theoretical arguments are able to PREdict piece values rather than POSTdict them. IMO it is not possible to make an all-encompassing theory with just 4 or 6 empirical piece values as input, as any elaborate theory will have many more than 6 adjustable parameters. So I think it is crucial to get accurate piece values for more different pieces.

One keystone piece could be the Lion. It can make all leaps to targets in a 5x5 square centered on it (and is thus a compound of Ferz, Wazir, Alfil, Dabbabah and Knight). This piece seems to be 1.25 Pawn stronger than a Queen (1075 on my scale). This reveals a very interesting approximate law for piece values of short-range leapers with N moves:

value = (30 + 5/8*N) * N

For N=8 this produces 280, and indeed the pieces I tested fall in the range 265 (Commoner) to 300 (Knight), with FA (Modern Elephant), WD (Modern Dabbabah) and FD in between. For N=16 we get 640, and I found WDN (Minister) = 625 and FAN (High Priestess) and FAWD (Sliding General) 650. And for the Lion, with N=24, the formula predicts 1080.
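The approximate law is easy to tabulate; a minimal sketch in Python:

```python
def leaper_value(n):
    # value = (30 + 5/8*N) * N, in centiPawns, for a short-range
    # leaper with n move targets (the approximate law from the text)
    return (30 + 5 / 8 * n) * n

for n in (8, 16, 24):
    print(n, leaper_value(n))  # 280, 640, 1080
```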
My interpretation is that adding moves to a piece does not only add the value of the move itself (as described by the second factor, N), but also increases the value of all pre-existing moves, by allowing the piece to manoeuvre in place better for aiming them at the enemy. I would therefore expect it is mainly the captures that contribute to the second factor, while the non-captures contribute to the first factor. The first refinement I want to make is to disable all Lion moves one at a time, as captures or as non-captures, to see how much each move contributes to the total strength. The simple counting (as expressed by the appearance of N in the formula) can then be replaced by a weighted counting, the weights expressing the relative importance of the moves. (So that forward captures might be given a much bigger weight than forward non-captures, or backward captures along a similar jump.) This will require a lot of high-precision testing, though.
Oh yes, I forgot about: [name removed] (5)

5) Based on safe checking

I am not sure that safe checking is of any relevance. Most games are not won by checkmating the opponent King in an equal-material position, but by annihilating the opponent's forces; so mainly by threatening Pawns and other pieces, not Kings. A problem is that safe checking seems to predict zero value for pieces like Ferz, Wazir and Commoner, while the latter is not that much weaker than the Knight. (And, averaged over all game stages, might even be stronger than a Knight.) This directly seems to falsify the method.

[The above has been edited to remove a name and/or site reference. It is the policy of cv.org to avoid mention of that particular name and site to remove any threat of lawsuits. Sorry to have to do that, but we must protect ourselves. -D. Howe]
Reinhard, why do you attach such importance to the 4A-9N position? I think that example is totally meaningless. If it proves anything, it is that you cannot get the value of 9 Knights by taking 9 times the Knight value. It proves _nothing_ about the Archbishop value. Chancellor and Queen would encounter exactly the same problems facing an army of 9 Knights.

The problem is that there is a positional bonus for identical pieces defending each other. This is well known (e.g. connected Rooks). Problem is that such pair interactions grow as the square of the number of pieces, and thus start to dominate the total evaluation if the number of identical pieces gets extremely high (as it never will in real games). Pieces like A, C and Q (or in particular the highest-valued pieces on the board) will not get such bonuses, as the bonus is associated with the safety of mutually defending each other, and with tactical security in case the piece is traded, because the recapture then replaces it by an identical one, preserving all the defensive moves it had. In the absence of equal or higher pieces, defending pieces is a useless exercise, as recapture will not offer compensation. If you are attacked, you will have to withdraw. So the mutual-defence bonus is also dependent on the piece makeup of the opponent, and is zero for Archbishops when the opponent only has Knights, and very high for Knights when the opponent has only Archbishops.

If you want to playtest material imbalances, the positional value of the position has to be as equal as possible. The 4A-9N position violates that requirement to an extreme extent. It thus cannot tell us anything about piece values. Just like deleting the white Queen and all 8 black Pawns cannot tell us anything about the value of Q vs P.
Well, Reinhard, there could be many explanations for the 'surprising' strength of an all-Knight army, and we could speculate forever on it. But it would only mean anything if we could actually find ways to test it. I think the mutual defence is a real effect, and I expect an army of all different 8-target leapers to do significantly worse than an army of all Knights, even though all 8-target leapers are almost equally strong. But it would have to be tested. Defending each other is useless for Archbishops (in the absence of opponent Q, C or A), as defending an Archbishop in the face of Knight attacks is of zero use. So the fact that they can do it is not worth anything. Nevertheless, the Archbishops do not do as badly as you want to make us believe, and I think they would still have a fighting chance against 9 Knights. So perhaps I will run these tests (on the Battle-of-the-Goths port, so that everyone can watch) if I have nothing better to do. But currently I have more important and urgent things to do on my Chess PC. I have a great idea for a search enhancement in Joker, and would like to implement and test it before ICT8.
I thought this piece (W+D+A+F+N) was called a Lion, but it seems I was misinformed. I playtested this piece in a Capablanca Chess environment, and it is not that excessively strong. It is about 1.25 pawn stronger than a Queen, 1075 on my scale (on 10x8 board).
Well, I got that from the beginning. But the problem is not that the A cannot be defended. It is strong and mobile enough to take care of itself. The problem is that the Knights cannot be threatened (by A), because they all defend each other, and can do so multiple times. So you can build a cluster of Knights that is totally unassailable. That would be much more difficult for a collection of all different pieces: such a collection will likely always have some weak spots, which the extremely agile Archbishops then seek out and attack with deadly precision.

But I don't see this as a fundamental problem of pitting different armies against each other. After an unequal trade, any Chess game becomes a game between different armies. But to define piece values that can be helpful to win games, it is only important to test positions that could occur in games, or at least are not fundamentally different in character from what you might encounter in games. And the 4A-9N position definitely does not qualify as such.

I think this is valid criticism of what Derek has done (testing super-pieces only against each other, without any lighter pieces being present), but it has no bearing on what I have done. I never went further than playing each side with two copies of the same super-piece, by replacing another super-piece (which was then absent in that army). This is slightly unnatural, but I don't expect it to lead to qualitatively different games, as the super-pieces are similar in value and mobility. And unlike super-pieces already share some moves, so like and unlike super-pieces can cooperate in very similar ways (e.g. forming batteries). It did not essentially change the distribution of piece values, as all lower pieces were present in normal numbers. I understand that Derek likes to magnify the effect by playing several copies of the piece under test, but perhaps using 8 or 9 is overdoing it.
To test a difference in piece value as large as 200cP, 3 copies should be more than enough. This can still be done in a reasonably realistic mix of pieces, e.g. replacing Q and C on one side by A, and on the other side Q and A by C, so that you play 3C vs 3A, and then give additional Knight odds to the Chancellors. This would predict about +3 for the Chancellors with the SMIRF piece values, and -2.25 according to my values. Both imbalances are large enough to cause 80-90% win percentages, so that just a few games should make it obvious which value is very wrong.
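As a sanity check on the arithmetic, the quoted imbalances follow from value sets like the ones below. These numbers are stand-ins chosen only to reproduce the +3 and -2.25 figures (a SMIRF-like C-A gap near 2 Pawns versus a much smaller gap); they are not the actual tables of either program:

```python
# Hypothetical piece values (in Pawns), chosen only to reproduce the
# imbalances quoted above; NOT the real SMIRF or Joker80 tables.
smirf_like = {'C': 9.00, 'A': 7.00, 'N': 3.00}   # assumes a C-A gap of 2
my_like    = {'C': 9.00, 'A': 8.75, 'N': 3.00}   # assumes a C-A gap of 0.25

def chancellor_imbalance(v):
    # 3 Chancellors vs 3 Archbishops, with the C side giving Knight odds
    return 3 * (v['C'] - v['A']) - v['N']

print(chancellor_imbalance(smirf_like))  # +3.0 for the Chancellors
print(chancellor_imbalance(my_like))     # -2.25
```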
Derek Nalls:
| Given enough years (working with only one server), this quantity of
| well-played games may eventually become adequate.

I never found any effect of the time control on the scores I measure for a given material imbalance. Within statistical error, the combinations I tried produced the same score at 40/15', 40/20', 40/30', 40/40', 40/1', 40/2' and 40/5'. Going to even longer TC is very expensive, and I did not consider it worth doing just to prove that it was a waste of time...

The way I see it, piece values are a quantitative measure for the amount of control that a piece contributes to steering the game tree in the direction of the desired evaluation. He who has more control can systematically force the PV in the direction of a better and better evaluation (for him). This is a strictly local property of the tree. The only advantage of deeper searches is that you average out this control (which fluctuates highly on a ply-by-ply basis) over more plies. But in playing the game, you average over all plies anyway.
| And by that this would create just the problem I have tried to
| demonstrate. The three Chancellors could impossibly be covered,
| thus disabling their potential to risk their own existence by
| entering squares already influenced by the opponent's side.

You make it sound like it is a disadvantage to have a stronger piece, because it cannot go to squares attacked by the weaker piece. To a certain extent this is true, if the difference in capabilities is not very large. Then you might be better off ignoring the difference in some cases, as respecting the difference would actually deteriorate the value of the stronger piece to the point where it was weaker than the weak piece. (For this reason I set the B and N value in my 1980 Chess program Usurpator to exactly the same value.) But if the difference between the pieces is large, then the fact that the stronger one can be interdicted by the weaker one is simply an integral part of its piece value.

And IMO this is not the reason the 4A-9N example is so biased. The problem there is that the pieces of one side are all worth more than TWICE those of the other. Rooks against Knights would not have the same problem, as they could still engage in R vs 2N trades, capturing a singly defended Knight in a normal exchange on a single square. But 3-vs-1 trades are almost impossible to enforce, and require very special tactics.

It is easy enough to verify by playtesting that playing CCC vs AAA (as substitutes for the normal super-pieces) will simply produce 3 times the score excess of playing a normal setup with a C deleted on one side and an A on the other. The A side will still have only a single A to harass every C. Most squares in enemy territory will be covered by R, B, N or P anyway, in addition to A, so the C could not go there anyway. And it is not true that anything defended by A would be immune to capture by C, as A+anything > C (and even 2A+anything > 2C). So defending by A will not exempt the opponent from defending as many times as there are attacks, using A as defenders. And if there was one other piece amongst the defenders, the C had no chance anyway. The effect you point out does not occur nearly as easily as you think.

And, as you can see, only 5 of my different armies had duplicated super-pieces. All the other armies were just what you would get if you traded the mentioned pieces, thus detecting whether such a trade would enhance or deteriorate your winning chances.
Reinhard, if I understand you correctly, what you basically want to introduce in the evaluation is terms of the type w_ij*N_i*N_j, where N_i is the number of pieces of type i of one side, N_j is the number of pieces of type j of the opponent, and w_ij is a tunable weight. So that, if type i = A and type j = N, a negative w_ij would describe a reduction of the value of each Archbishop by the presence of the enemy Knights, through the interdiction effect.

Such a term would for instance provide an incentive for the QA side to trade A in a QA vs ABNN situation, as his A is suppressed in value by the presence of the enemy N (and B), while the opponent's A would not be similarly suppressed by our Q. On the contrary, our Q value would be suppressed by the opponent's A as well, so trading A also benefits him there.

I guess it should be easy enough to measure if terms of this form have significant values, by playing Q-BNN imbalances in the presence of 0, 1 and 2 Archbishops, and deducing from the score whose Archbishops are worth more (i.e. add more winning probability). And similarly for 0, 1, 2 Chancellors each, or extra Queens. And then the same thing with a Q-RR imbalance, to measure the effect of Rooks on the value of A, C or Q. In fact, every second-order term can be measured this way. Not only for cross products between own and enemy pieces, but also cooperative effects between own pieces of equal or different type. With 7 piece types for each side (14 in total) there would be 14*13/2 = 91 terms of this type possible.
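A sketch of the bookkeeping for such second-order terms (the piece letters and the example weight are made up for illustration):

```python
from itertools import combinations

# 7 piece types per side (P, N, B, R, Q, A, C), 14 in total
types = [side + p for side in ('w', 'b') for p in 'PNBRQAC']
pairs = list(combinations(types, 2))
print(len(pairs))  # 14*13/2 = 91 possible second-order terms

def second_order_term(counts, weights):
    # sum of w_ij * N_i * N_j over all pairs; missing weights count as 0
    return sum(weights.get((i, j), 0.0) * counts.get(i, 0) * counts.get(j, 0)
               for i, j in pairs)

# e.g. a (made-up) negative weight for own Archbishop vs enemy Knights:
w = {('wA', 'bN'): -0.05}
print(second_order_term({'wA': 1, 'bN': 2}, w))
```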
Derek Nalls:
| The additional time I normally give to playtesting games to improve
| the move quality is partially wasted because I can only control the
| time per move instead of the number of plies completed using most
| chess variant programs.
Well, on Fairy-Max you won't have that problem, as it always finishes an
iteration once it decides to start it. But although Fairy-Max might be
stronger than most other variant-playing AIs you use, it is not stronger
than SMIRF, so using it for 10x8 CVs would still be a waste of time.
Joker80 tries to minimize the time wastage you point out by attempting
only to start iterations when it has time to finish them. It cannot always
accurately guess the required time, though, so unlike Fairy-Max it has
built in some emergency breaks. If they are triggered, you would have an
incomplete iteration. Basically, the mechanism works by no longer starting
new moves in the root, once it gets into 'overtime', if there already is a
move with a score similar to that of the previous iteration. In practice, these
unexpectedly long iterations mainly occur when the previously best move
runs into trouble that so far was just beyond the horizon. As the tree for
that move will then look completely different from before, it takes a long
time to search (no useful information in the hash), and the score will
have a huge drop. It then continues searching new moves even in overtime
in a desperate attempt to find one that avoids the disaster. Usually this
is time well spent: even if there is no guarantee it finds the best move
of the new iteration, if it aborts it early, it at least has found a move
that was significantly better than that found in the previous iteration.
Of course both Joker80 and Fairy-Max support the WinBoard 'sd' command,
allowing you to limit the depth to a certain number of plies, although I
never use that. I don't like to fix the ply depth, as it makes the engine
play like an idiot in the end-game.
| Can you explain to me in a way I can understand how and why
| you are able to successfully obtain valuable results using this
| method?
Well, to start with, Joker80 at 1 sec per move still reaches a depth of
8-9 ply in the middle-game, and would probably still beat most Humans at
that level. My experience is that, if I immediately see an obvious error,
it is usually because the engine makes a strategic mistake, not a tactical
one. And such strategic mistakes are awfully persistent, as they are a
result of faulty evaluation, not search. If it makes them at 8 ply, it is
very likely to make that same error at 20 ply, as even 20 ply is usually
not enough to bring the resolution of the strategic feature within the
horizon.
That being said, I really think that an important reason I can afford fast
games is a statistical one: by playing so many games I can be reasonably
sure that I get a representative number of gross errors in my sample, and
they more or less cancel each other out on the average. Suppose at a
certain level of play 2% of the games contains a gross error that turns a
totally won position into a loss. If I play 10 games, there is a 20% chance
that one game contains such an error (affecting my result by 10%), and only
~2% probability of two such errors (which then in half the cases would
cancel, but in the other cases would put the result off by 20%).
If, OTOH, I would play 1000 faster games, with an increased 'blunder
rate' of 5% because of the lower quality, I would expect 50 blunders. But
the probability that they were all made by the same side would be
negligible. In most cases the imbalance would be around sqrt(50) ~ 7. That
would impact the 1000-game result by only 0.7%. So virtually all results
would be off, but only by about 0.7%, so I don't care too much.
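The sqrt(50) estimate is just the width of a 50-step random walk; a quick simulation (the blunder counts are the assumed numbers from the text):

```python
import math
import random

random.seed(1)
blunders, n_games, trials = 50, 1000, 2000

def net_imbalance():
    # each blunder flips one game result, at random against either side
    return sum(random.choice((-1, 1)) for _ in range(blunders))

# typical (RMS) size of the net imbalance over many hypothetical matches
rms = math.sqrt(sum(net_imbalance() ** 2 for _ in range(trials)) / trials)
print(round(math.sqrt(blunders), 2))              # theoretical: 7.07
print(round(rms, 1))                              # simulated, close to that
print(round(100 * math.sqrt(blunders) / n_games, 2))  # match impact: 0.71%
```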
Another way of visualizing this would be to imagine the game state-space
as a 2-dimensional plane, with two evaluation terms determining the x- and
y-coordinate. Suppose these terms can both run from -5 to +5 (so the state
space is a square), and the game is won if we end in the unit circle (x^2 +
y^2 < 1), but that we don't know that. Now suppose we want to know how
large the probability of winning is if we start within the square with
corners (0,0) and (1,1) (say this is the possible range of the evaluation
terms when we possess a certain combination of pieces). This should be the
area of a quarter circle, PI/4, divided by the area of the square (1), so
PI/4 = 79%.
We try to determine this empirically by randomly picking points in the
square (by setting up the piece combination in some shuffled
configuration), and let the engines play the game. The engines know that
getting closer to or farther away from (0,0) is associated with changing the
game result, and are programmed to maximize or minimize this distance to
the origin. If they both play perfectly, they should by definition succeed
in doing this. They don't care about the 'polar angle' of the game
state, so the point representing the game state will make a random walk on
a circle around the origin. When the game ends, it will still be in the
same region (inside or outside the unit circle), and games starting in the
won region will all be won.
Now with imperfect play, the engines will not conserve the distance to the
origin, but their tug of war will sometimes change it in favor of one or
the other (i.e. towards the origin, or away from it). If the engines are
still equally strong, by definition on the average this distance will not
change. But its probability distribution will now spread out over a ring
with finite width during the game. This might lead to won positions close
to the boundary (the unit circle) now ending up outside it, in the lost
region. But if the ring of final game states is narrow (width << 1), there
will be a comparable number of initial game states that diffuse from within
the unit circle to the outside, as in the other direction.
In other words, the game score as a function of the initial evaluation
terms is no longer an absolute all or nothing, but the circle is radially
smeared out a little, making a smooth transition from 100% to 0% in a
narrow band centered on the original circle.
This will hardly affect the averaging, and in particular, making the ring
wider by decreasing playing accuracy will initially hardly have any
effect. Only when play gets so wildly inaccurate that the final positions
(where win/loss is determined) diverge so far from the initial point that
it could cross the entire circle, will you start to see effects on the
score. In the extreme case where the radial diffusion is so fast that you
could end up anywhere in the 10x10 square when the game finishes, the
result score will only be PI/100 = 3%.
So it all depends on how much the imperfections in the play spread out the
initial positions in the game-state space. If this is only small compared
to the measures of the won and lost areas, the result will be almost
independent of it.
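This whole toy model is easy to simulate (Python sketch of my own; the Gaussian radial noise is a stand-in for 'imperfect play', not anything measured from real engines). With no noise the score is the exact area ratio pi/4, with modest noise it barely moves, and with wild noise it collapses, just as argued above:

```python
import math, random

def match_score(noise, n_games=100000, seed=7):
    """Fraction of games won when each game starts at a random point in
    the square with corners (0,0) and (1,1), imperfect play diffuses the
    distance to the origin, and a win means ending inside the unit circle."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        r = math.hypot(rng.random(), rng.random())  # initial game state
        r += rng.gauss(0.0, noise)                  # radial 'tug of war' drift
        wins += abs(r) < 1.0
    return wins / n_games

print(f"perfect play: {match_score(0.0):.3f}  (pi/4 = {math.pi / 4:.3f})")
print(f"small errors: {match_score(0.1):.3f}")
print(f"wild errors:  {match_score(5.0):.3f}")
```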
I don't think that 'promoting to a captured piece only' is a simplification of the rules. 'Always promote to Queen' would be a simplification. This just adds a complex rule.
Well, I do not consider the stalemate rule essential to Chess, and there are many variants where stalemate = loss. You won't get rid of many draws, though, when you abolish it. To get rid of draws entirely, you could add some kind of tie-break to the game, like penalty shootouts in soccer: in a position where FIDE rules would declare a draw (50 moves, 3-fold repetition, insufficient material) you could trigger this tie-break. From the moment it is triggered, the opponent can do two moves in a row, then you can do three moves, then he four, etc. This would even work for King vs King, as in the end there will be no place you can hide without his King being able to capture you.
Rich Hutnik:
| Do you want a flipped rook to become a 'Jester' piece that can
| represent any other piece on the board?
Guess what the rook does now. It is that. This has never been a problem when I was playing OTB games. In most variants the choice of promotion piece is a rather academic one anyway, as in practice almost always the strongest piece is chosen. After the promotion, if the piece is then not captured, the game is over in 5 or 6 moves... Even in Capablanca Chess, where there are 3 nearly equivalent pieces available (Q, C and A), it took me months before I discovered that 'underpromotion' to C or A was not properly implemented in my engine Joker80. Although it was considering other promotions than Q in its search, the MakeMove routine at game level always promoted to Q, overruling the choice. This never changed the game result, and I only discovered it when Joker80 announced mate-in-1 on a promotion move, and then the game continued a few more moves before it actually was checkmate. People that want to play variants should have a wider choice of piece equipment anyway. An inverted Rook is a warning sign that whatever it is, it is not a Rook. But nothing is more annoying to a Chess player than having a Knight on the board that doesn't move like a Knight, but as a Camel or Zebra. The solution to that is easy enough: http://home.hccnet.nl/h.g.muller/ultima.html
To Derek: I am aware that the empirical Rook value I get is suspiciously low. OTOH, it is an OPENING value, and Rooks get their value in the game only late. Furthermore, this is only the BASE VALUE of the Rook; most pieces have a value that depends on the position on the board where a piece actually is, or where you can quickly get it (in an opening situation, where the opponent is not yet able to interdict your moves, because his pieces are in inactive places as well). But Rooks only increase their value on open files, and initially no open files are to be seen. In a practical game, by the time you get to trade 2 Rooks for a Queen, there usually are open files. So by that time, the value of the Q vs 2R trade will have gone up by two times the open-file bonus. You hardly have the possibility of trading them before there are open files. So it stands to reason that you might as well use the higher value during the entire game. In 8x8 Chess, the Larry Kaufman piece values include the rule that a Rook should be devalued by 1/8 Pawn for each Pawn on the board over five. In the case of 8 Pawns that is a really large penalty of 37.5cP for having no open files. If I add that to my opening value, the late middle-game / end-game value of the Rook gets to 512, which sounds a lot more reasonable. There are two different issues here: 1) The winning chances of a Q vs 2R material imbalance game 2) How to interpret that result as a piece value All I say above has no bearing on (1): if we both play a Q-2R match from the opening, it is a serious problem if we don't get the same result. But you have played only 2 games. Statistically, 2 games mean NOTHING. I don't even look at results before I have at least 100 games, because before that they are about as likely to be the reverse of what they will eventually be, as not.
The standard deviation of the result of a single Gothic Chess game is ~0.45 (it would be 0.5 point if there were no draws possible, and in Gothic Chess the draw percentage is low). This error goes down as the square root of the number of games. In the case of 2 games this is 45%/sqrt(2) = 32%. The Pawn-odds advantage is only 12%. So this standard error corresponds to 2.66 Pawns. That is 1.33 Pawns per Rook. So with this test you could not possibly see if my value is off by 25, 50 or 75. If you find a discrepancy, it is enormously more likely that the result of your 2-game match is off from the true win probability. Play 100 games, and the error in the observed score is reasonably certain (68% of the cases) to be below 4.5% ~ 1/3 Pawn, so 16 cP per Rook. Only then can you see with reasonable confidence if your observations differ from mine.
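The arithmetic above can be reproduced in a few lines (Python; `score_stderr` and `PAWN_ODDS` are just my names for the 0.45/sqrt(n) formula and the 12% Pawn-odds figure quoted in the text):

```python
import math

def score_stderr(n_games, sigma_per_game=0.45):
    """Standard error of the observed score fraction after n games."""
    return sigma_per_game / math.sqrt(n_games)

PAWN_ODDS = 0.12  # score advantage of Pawn odds, as quoted in the text

for n in (2, 100):
    se = score_stderr(n)
    pawns = se / PAWN_ODDS          # error expressed in Pawn units
    print(f"{n:3d} games: stderr {se:.3f} = {pawns:.2f} Pawns"
          f" = {100 * pawns / 2:.0f} cP per Rook")
```

For 2 games this reproduces the ~32% error (1.33 Pawns per Rook); for 100 games the error drops to 4.5%, a fraction of a Pawn.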
Note that you can also use WinBoard as a FEN editor. There are commands (with shortcut keys) to copy FENs from and to the clipboard. And there is an edit-position mode that allows you to conveniently drag and drop pieces over the board, and add new ones from a popup menu when right-clicking a square. http://home.hccnet.nl/h.g.muller/winboardF.html
Note that a WinBoard-compatible version of the variant-capable engine Dabbaba, by Jens Baek Nielsen, has just been released. One of the games it knows is Knightmate. You can currently watch it play a Knightmate match live against my own engine Fairy-Max, on my Chess-Live! webserver http://80.100.28.169/gothic/knightmate.html for the next one or two days. If anyone knows any other WinBoard engines that can play Knightmate, let me know; then I could hold a tournament.
Derek Nalls:
| They definitely mean something ... although exactly how much is not
| easily known or quantified (measured) mathematically.
Of course that is easily quantified. The entire mathematical field of statistics is designed to precisely quantify such things, through confidence levels and uncertainty intervals. The only thing you proved with reasonable confidence (say 95%) is that two Rooks are not 1.66 Pawns weaker than a Queen. So if Q=950, then R > 392. Well, no one claimed anything different. What we want to see is if Q-RR scores 50% (R=475) or 62% (R=525). That difference just can't be seen with two games. Play 100. There is no shortcut. Even perfect play doesn't help. We do have perfect play for all 6-men positions. Can you derive piece values from that, even end-game piece values???
| Statistically, when dealing with speed chess games populated
| exclusively with virtually random moves ... YES, I can understand and
| agree with you requiring a minimum of 100 games. However, what you
| are doing is at the opposite extreme from what I am doing via my
| playtesting method.
Where do you get this nonsense? This is approximately master-level play. Fact is that results from playing opening-type positions (with 35 pieces or more) are stochastic quantities at any level of play we are likely to see in the next few million years. And even if they weren't, so that you could answer the question 'who wins' through a 35-men tablebase, you would still have to make some average over all positions (weighted by relevance) with a certain material composition to extract piece values. And if you would do that by sampling, the result would again be a stochastic quantity. And if you would do it by exhaustive enumeration, you would have no idea which weights to use. And if you are sampling a stochastic quantity, the error will be AT LEAST as large as the statistical error. Errors from other sources could add to that.
But if you have two games, you will have at least 32% error in the result percentage. It doesn't matter if you play at an hour per move, a week per move, a year per move, 100 years per move. The error will remain >= 32%. So if you want to play 100 years per move, fine. But you will still need 100 games.
| Nonetheless, games played at 100 minutes per move (for example) have
| a much greater probability of correctly determining which player has
| a definite, significant advantage than games played at 10 seconds per
| move (for example).
Why do I get the suspicion that you are just making up this nonsense? Can you show me even one example where you have shown that a certain material advantage would be more than 3-sigma different for games at 100 min/move than for games at 1 sec/move? Show us the games, then. Be aware that this would require at least 100 games at each time control. That seems to make it a safe guess that you did not do that for 100 min/move. On the other hand, instead of just making things up, I have actually done such tests, not with 100 games per TC, but with 432, and for the faster ones even with 1728 games per TC. And there was no difference beyond the expected and unavoidable statistical fluctuations corresponding to those numbers of games, between playing 15 sec or 5 minutes. The advantage that a player has in terms of winning probability is the same at any TC I ever tried, and can thus equally reliably be determined with games of any duration (provided you have the same number of games). If you think it would be different for extremely long TC, show us statistically sound proof. I might comment on the rest of your long posting later, but have to go now...
This discussion is pointless. In dealing with a stochastic quantity, if your statistics are no good, your observations are no good, and any conclusions based on them utterly meaningless. Nothing of what you say here has any reality value, it is just your own fantasies. First you should have results, then it becomes possible to talk about what they mean. You have no result. Get statistically meaningful test results. If your method can't produce them, or you don't feel it important enough to make your method produce them, don't bother us with your cr*p instead. Two sets of piece values as different as day and knight, and the only thing you can come up with is that their comparison is 'inconclusive'. Are you sure that you could conclusively rule out that a Queen is worth 7, or a Rook 8, by your method of 'playtesting'? Talk about pathetic: even the two games you played are the same. Oh man, does your test setup s*ck! If you cannot even decide simple issues like this, what makes you think you have anything meaningful to say about piece values at all?
Once upon a time I had a friend in a country far, far away, who had obtained a coin from the bank. I was sure this coin was counterfeit, as it had a far larger probability of producing tails. I even PROVED it to him: I threw the coin twice, and both times tails came up. But do you think the fool believed me? No, he DIDN'T! He had the AUDACITY to claim there was nothing wrong with the coin, because he had tossed it a thousand times, and 523 times heads had come up! While it was clear to everyone that he was cheating: he threw the coin only 10 feet up into the air, on each try. While I brought my coin up to 30,000 feet in an airplane, before I threw it out of the window, BOTH times! And, mind you, both times it landed tails! And it was not just an ordinary plane, like a Boeing 747. No sir, it was a ROCKET plane! And still this foolish friend of mine insisted that his measly 10-feet throws made him more confident that the coin was OK than my IRONCLAD PROOF with the rocket plane. Ridiculous! Anyone knows that you can't test a coin by only tossing it 10 feet. If you do that, it might land on any side, rather than the side it always lands on. He might as well have flipped a coin! No wonder they sent him to this far, far away country: no one would want to live in the same country as such an idiot. He even went as far as to buy an ICECREAM for that coin, and even ENJOYED eating that! Scandalous! I can tell you, he ain't my friend anymore! Using coins that always land on one side as if it were real money. For more fairy tales and bed-time stories, read Derek's postings on piece values... :-) :-) :-)
Jianying Ji:
| Two suggestions for settling debates such as these. First distributed
| computing to provide as much data as possible. And Bayesian statistical
| methods to provide statistical bounds on results.
Agreed: one first needs to generate data. Without data, there isn't even a debate, and everything is just idle talk. What bounds would you expect from a two-game dataset? And what if these two games were actually the same? But the problem is that the proverbial fool can always ask more than anyone can answer. If, by recruiting all PCs in the world, we could generate 100,000 games at an hour per move, an hour per move will of course not be 'good enough'. It will at least have to be a week per move. Or, if that is possible, 100 years per move. And even 100 years per move are of course no good, because the computers will still not be able to search into the end-game, as they will search only 12 ply deeper than with 1 hour per move. So what's the point? Not only is this an end-of-the-rainbow-type endeavor, but even if you would get there, and generate the perfect data, where it is 100% sure and proven for each position what the outcome under perfect play is, what then? Because for simple end-games we are already in a position to reach perfect play, through retrograde analysis (tablebases). So why not start there, to show that such data is of any use whatsoever, in this case for generating end-game piece values? If you have the EGTB for KQKAN, and KAKBN, how would you extract a piece value for A from it?
Is this story meant to illustrate that you have no clue as to how to calculate statistical significance? Or perhaps that you don't know what it is at all? The observation of a single tails event rules out the null hypothesis that the lottery was fair (i.e. that the probability for this to happen was 0.000,000,01) with a confidence of 99.999,999%. Be careful, though, that this only describes the case where the winning android was somehow special or singled out in advance. If the other participants in the lottery were 100 million other cheating androids, it would not be remarkable in any way that one of them won. The null hypothesis that the lottery was fair predicted a 100% probability for that. But, unfortunately for you, it doesn't work for lotteries with only 2 tickets. Then you can rule out the null hypothesis that the lottery was fair (and hence the probability 0.5) with a confidence of only 50%. And 50% confidence means that in 50% of the cases your conclusion is correct, and in the other 50% of the cases not. In other words, a confidence level of 50% is a completely blind, uninformed random guess.
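The arithmetic behind these two confidence figures is one line (Python sketch; `confidence_reject_null` is my own hypothetical name for it):

```python
def confidence_reject_null(p_event, n_observed=1):
    """Confidence with which observing an event n times in a row rejects
    the null hypothesis that its per-trial probability is p_event."""
    return 1 - p_event ** n_observed

# rigged-lottery case: null hypothesis says p = 0.000,000,01
print(confidence_reject_null(1e-8))   # ~99.999,999% confidence
# two-ticket lottery: null hypothesis says p = 0.5
print(confidence_reject_null(0.5))    # 50% -- a blind guess
```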
Reinhard Scharnagl:
| I am still convinced that longer thinking times would have an
| influence on the quality of the resulting moves.
Yes, so what? Why do you think that is a relevant remark? The better moves won't help you at all if the opponent also does better moves. The result will be the same. And the rare cases where it is not cancel each other on average. So for the umpteenth time: NO ONE DENIES THAT LONGER THINKING TIME PRODUCES SOMEWHAT BETTER MOVES. THE ISSUE IS THAT IF BOTH SIDES PLAY WITH LONGER TC, THEIR WINNING PROBABILITIES WON'T CHANGE. And don't bother to tell us that you are also convinced that the winning probabilities will change, without showing us proof. Because no one is interested in unfounded opinions, not even if they are yours.
Reinhard, that is not relevant. It will happen on average as often for the other side. It is in the nature of Chess. Every game that is won, is won by an error that might not have been made on longer thinking, as the initial position is not a won position for either side. But most games are won by either side, and if they are allowed to think longer, most games are still won by either side. What is so hard to understand about the statement 'the win probability (score fraction, if you allow for draws) obtained from a given quiet, but complex (many pieces) position between equal opponents does not depend on time control' that it prompts people to come up with irrelevancies? Why do you think that saying anything at all that does not mention an observed probability would have any bearing on this statement whatsoever? I don't think the ever more hollow-sounding self-declared superiority of Derek needs much comment. He obviously doesn't know zilch about probability theory and statistics. Shouting that he does won't make it so, and won't fool anyone.
This discussion is too silly for words anyway. Because even if it were true that the winning probability for a given material imbalance would be different at 1 hour per move than it would be at 10 sec/move, it would merely mean that piece values are different for different quality players. And although that is unprecedented, that revelation in itself would not make the piece values at 1 hour per move of any use, as that is a time control that no one wants to play anyway. So the whole endeavor is doomed from the start: by testing at 1 hour per move, either you measure the same piece values as you would at 10 sec/move, and wasted 99.7% of your time, or you find different values, and then you have wrong values, which cannot be used at any time control you would actually want to play...
My small live tourney has led to a proliferation of WinBoard-compatible Knightmate engines. We now have: JokerKM ( http://home.hccnet.nl/h.g.muller/jokerKM.exe) CCCP-Knightmate ( http://www.marittima.pl/cccp) Fairy-Max ( http://home.hccnet.nl/h.g.muller/dwnldpage.html, do not forget to download the accompanying fmax.ini with game definitions!) Dabbaba ( http://homepages.tesco.net/henry.ablett/jims.html) JokerKM is the strongest, CCCP and Fairy-Max are both about 400 Elo points weaker. Dabbaba is a rebuild of an old DOS engine from the 90s, and is some 300 Elo points behind that.
Rich Hutnik:
| Anyone think this might be a sound approach?
Well, not me! Science is not a democracy. We don't interview people in the street to determine if a neutron is heavier than a proton, or what the 100th decimal of the number pi is. At best, you could use this method to determine the CV rating of the interviewed people. But even if a million people would think that piece A is worth more than piece B, and none the other way around, that doesn't make it so. The only thing that counts is if A makes you win more often than B would. If it doesn't, then it is of lower value. No matter what people say, or how many say it.
Note there are now many free computer programs that can play the 10x8 variants with the Capablanca piece set. Many use the WinBoard protocol to communicate their moves, so they can be made to play each other automatically under the WinBoard GUI. Pages with many links to downloadable engines can be found at http://home.hccnet.nl/h.g.muller/10x8.html and [at another site.] The results of a recent tournament of the WB-compatible engines at long time control (55 min + 5 sec/move), where each engine had to play each other engine 10 times, over 5 different opening setups (Carrera, Bird, Capablanca, and Embassy), led to the following ranking:

Rank Name              Elo    +    -  games score  oppo. draws
   1 Joker80 n        2432   96   83     70   80%   2110    0%
   2 TJchess10x8      2346   83   76     70   72%   2122    4%
   3 Smirf 1.73h      2304   80   75     70   68%   2128    4%
   4 Smirf Donation   2165   73   73     70   53%   2148    9%
   5 [other software]
   6 Fairy-Max 4.8 v  2027   72   77     70   34%   2168   11%
   7 BigLion80 4apr   1945   76   84     70   26%   2179    7%
   8 ArcBishop80 1.00 1822   86  103     70   15%   2197    4%

Except for Smirf 1.73h, all the engines are available for free download from their various sources. In addition, there exist several programs with incompatible interfaces, such as ChessV and Zillions of Games. Their level of play is not thoroughly tested, as the incompatibility of their interfaces makes it impossible to play them against each other without the assistance of a Human operator, which again makes it difficult to conduct the hundreds of games necessary for reliable rating determination. Compared to the ranking above, Zillions would rank at the very bottom. [The above has been edited to remove a name and site reference. It is the policy of cv.org to avoid mention of that particular name and site to remove any threat of lawsuits. Sorry to have to do that, but we must protect ourselves. - J. Joyce]
I am sorry to have put your site in jeopardy, I was not aware that giving a link to a site as a source of information could make you subject to a lawsuit. But why did you delete the reference to poor Michel's program? My own engines are mentioned on the unspeakable website as well, on the very page of which you deleted the link. I even gave permission to its owner to host them there for download, should I no longer want to host them myself. Does that mean I will in the future also not be allowed to mention any of my own engines here??? Would it at least be allowed to mention the performance rating of the [other software]? Anyway, people interested in the complete result of the WinBoard General 10x8 Championship 2008 can find it on my own website, on the page: http://home.hccnet.nl/h.g.muller/BotG08G/finalstanding.html
OK, fair enough. But 10x8 variants are rapidly growing more popular with engine programmers, and I intend to contribute to that process through organizing the 'Battle of the Goths' tournament series, and publishing rating lists. I might want to share important developments in that area here, so it would be useful to know which engines can be mentioned, and which not. Is the problem caused by the 'G-word', and should I avoid any reference to engines that contain the G-word as part of their name? So far there are only two of these, but there are likely to be many more in the future, as people tend to name their engines after the variant they are playing.
Would it be OK then, if I just circumscribe the [other software] in my tournament as 'a version of the well-known open-source program TSCP, adapted to play some 10x8 variants', and call it 'TSCP-derivative' for short? Or is it too risky to mention the name of popular Chess engines like TSCP even in their normal Chess version (or Capablanca version), once someone created a derivative of them that is capable of playing the unspeakable variant?
I am currently engaged in a massive test effort to understand such short-range leapers. It is slow going, though: there are many possible combinations of moves, especially if you drop the requirement for 8-fold symmetry. And I need at least 400 games to get an acceptable accuracy for the empirical piece value of a certain piece type. Even then, the statistical (random) error in the piece values is about 0.1 Pawn, if I test them in pairs (to double the effect of any value difference).
Your estimate seems reasonable, from what I have learned so far. 8-fold-symmetric SR compound leapers with N moves seem to have a value close to (30+5/8*N)*N, in centiPawn. That would evaluate to 640 for the Squirrel. And I expect the Squirrel to be one of the stronger such compounds, with this number of moves, because of the 'front' of 5 contiguous forward moves.
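For what it's worth, the quoted formula is easy to evaluate (Python; the assumption that the Squirrel has N = 16 moves, as the Knight + Dabbaba + Alfil compound, is mine):

```python
def leaper_value(n_moves):
    """Empirical base value (centiPawns) of an 8-fold-symmetric
    short-range compound leaper with n_moves moves: (30 + 5/8*N)*N."""
    return (30 + 5 / 8 * n_moves) * n_moves

print(leaper_value(8))   # an 8-move leaper: 280 cP
print(leaper_value(16))  # Squirrel (assumed N+D+A, 16 moves): 640 cP
```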
First about the potential bug:
I am afraid that I need more information to figure out what exactly was
the problem. This is not a plain move-generator bug; when I feed the game
to my version of Joker80 here (which is presumably the same as the one you
are using), it accepts the move without complaints. It would be
inconceivable anyway that a move-generator bug in such a common move would
not have manifested itself in the many hundreds of games I had it play
against other engines.
OTOH, Human vs. engine play is virtually untested. Did you at any point of
the game use 'undo' (through the WinBoard 'retract move')? It might be
that the undo is not correctly implemented, and I would not notice it in
engine-engine play. In fact it is very likely to be broken after setting up
a position, as I implemented it by resetting to the opening position and
replaying all moves from there. But this won't work after loading a FEN
(a feature I added only later). This is indeed something I should fix, but
the current work-around would be not to use 'undo'.
To make sure what happened, I would have to see the winboard.debug file
(which records all communication between engine and GUI, including a lot
of debug output from the engine itself). Unfortunately this file is not
made by default. You would have to start WinBoard with the command-line
option /debug, or press + + after starting WinBoard. And
then immediately rename the winboard.debug to something else if a bug
manifests itself, to prevent it from being overwritten when you run
WinBoard again.
Joker80 also makes a log file 'jokerlog.txt', but this also is
overwritten each time you re-run it. If you didn't run Joker80 since the
bug, it might help if you sent me that file. Otherwise, I am afraid that
there is little I can do at the moment; we would have to wait until the
problem occurs again, and examine the recorded debug information.
About the piece values:
I could make a Joker80 version that reads the piece base values from a
file 'joker.ini' at startup. Then you could change them to anything you
want to test, without the need to re-compile. Would that satisfy your
needs?
Note that currently Joker80 is not really able to play CRC, as it only
supports normal castling.
OK, I replaced the joker80.exe on my website by one with adjustable piece values. (If you run it from the command line, it should say version 1.1.14 (h).) I also tried to fix the bug in undo (which I discovered was disabled altogether in the previous version), and although it seemed to work, it might remain a weak spot. (I foresee problems if the game contained a promotion, for instance, as it might not remember the correct promotion piece on replay.) So try to avoid using the undo. I decided to make the piece values adjustable through a command-line option, rather than from a file, to avoid problems if you want to run two different sets of piece values (where you would then have to keep the files separate somehow). The way it works now is that for the engine name (that WinBoard asks for in the startup dialog, or that you can put in the winboard.ini file to appear among the selectable engines there), you should write: joker80.exe P85=300=350=475=875=900=950 The whole thing should be put between double quotes, so that WinBoard knows the P... is an option to the engine, and not to WinBoard. The numerical values are those of P, N, B, R, A, C and Q, respectively, in centiPawn. You can replace them by any value you like. If you don't give the P argument, it uses the default values. If you give a P argument with not enough values, the engine exits. Note that these are base values, for the positionally average piece. For N and B this would be on c3, in the presence (for B) of ~6 own Pawns, half of them on the color of the Bishop. A Bishop pair further gets a 40cP bonus. For the Rook it is the value for one in the absence of (half-)open files. The Pawn value will be heavily modified by positional effects (centralization, support by own Pawns, blocking by enemy Pawns), which on average will be positive. Note that you can play two different versions against each other automatically. The first engine plays white, in two-machines mode.
(You won't be able to recognize them from their name...)
One small refinement: If the command-line argument was used to modify the piece values, Joker80 will give its own name to WinBoard as 'Joker80.xp', instead of 'Joker80.np', so that it becomes less hard to figure out which engine was winning (e.g. from the PGN file). Note also that at very long time control you might want to enlarge the hash table; the default is 128MB, but if you invoke Joker80 as 'joker80.exe 22 P100=300=....' it will use 256MB (and with 23 instead of 22 it will use 512MB, etc.)
What characterizes Chess variants:
1) move one piece at a time to an empty cell, or
2) capture an enemy piece by moving into its cell
3) win by capture of royal piece
4) many different piece types
5) a large fraction of the pieces are pawns
6) pawns are weak pieces which move irreversibly, and promote to a stronger piece when advanced enough.
Some of these rules can be violated, but only if all other characteristics are very close to a very common variant.
Well, even FIDE Chess violates the defining characteristics, by the non-Chess-like moves of castling and e.p. capture. But, like I stated, violation of some of the rules does not immediately disqualify a game as a CV. Extinction Chess doesn't have a royal piece, but in all other respects it is identical to FIDE Chess. So it is clearly a CV. But I would not call checkers or draughts CVs. In the interpretation that the chips are pawns (they do promote...), the capture mode and piece variety is too different from common variants to qualify. I do not consider Ultima / Baroque a Chess variant. It does have piece variety, and even a royal piece, but the capture modes are too alien; only the King has a Chess-like capture, most pieces don't. I see no problem with Jacks and Witches. The majority of the pieces are normal Chess pieces. OK, so some Witch moves violate the one-at-a-time rule, like castling does. No problem, as even within this game this is an exception. IMO the array is not relevant as a distinctive trait of variants. You could call them sub-variants at best. Near Chess is simply FIDE Chess. The opening position of Near Chess occurs even in the game tree of FIDE Chess. In that respect FRC is more different from FIDE Chess than Near Chess is: there at least the opening position can be unreachable from the FIDE opening.
Well, to get an impression of what you can expect: in my first versions of Joker80 I still used the Larry Kaufman piece values of 8x8 Chess. So the Bishop was half a Pawn too low, nearly equal to the Knight (as with more than 5 Pawns, Kaufman has a Knight worth more than a lone Bishop, neutralizing a large part of the pair bonus). Now unlike a Rook, a Bishop is very easy to trade for a Knight, as both get into play early. Making the trade usually wrecks the opponent's pawn structure by creating a doubled Pawn, giving enough compensation to make it attractive. So in almost all games Joker played with two Knights against two Bishops after 12 moves or so. Fixing that did increase the playing strength by ~100 Elo points. So where the old version would score 50%, the improved version would score 57%. Now a similarly bad value for the Rook would manifest itself far less readily: the Rooks get into play late, there is no nearly equal piece for which a 1:1 trade changes sign, and you would need 1:3 trades (R vs B+2P) or 2:2 trades (R+P for N+N), which are much more difficult to set up. So I would expect that being half a Pawn off on the Rook value would only reduce your score by about 3%, rather than 7% as with the Bishop. After playing 100 games, the score differs by more than 3% from the true win probability more often than not. So you would need at least 400 games to show with minimal confidence that there was a difference. Beware that the results of the games are stochastic quantities. Replay the game at the same time control, and the game Joker80 plays will be different. And often the result will be different. This is true at 1 sec per move, but it is equally true at 1 year per move. The games that will be played are just a sample from the myriads of games Joker80 could play with non-zero probability.
And with fewer than 400 games, the difference between the actually measured score percentage and the probability you want to determine will in most cases be larger than the effect of the piece values, if they are not extremely wrong (e.g. setting Q < B).
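The statistics behind the "3% after 100 games" and "at least 400 games" figures above can be made concrete with a small sketch. The 32% draw ratio below is an illustrative assumption (not a measured Joker80 figure), and the 1.5-sigma threshold is just what "minimal confidence" is taken to mean here:

```python
import math

def score_stddev(n_games, draw_ratio=0.32):
    """Standard deviation of the measured score fraction after n_games.

    Each game scores 1, 1/2 or 0.  With win/loss chances split evenly
    around 50% and a given draw ratio, the per-game variance is
    (1 - draw_ratio) * 0.25, so the noise shrinks as 1/sqrt(n_games).
    """
    per_game_var = (1.0 - draw_ratio) * 0.25
    return math.sqrt(per_game_var / n_games)

def games_for_margin(margin, draw_ratio=0.32, sigmas=1.5):
    """Rough number of games before `sigmas` standard deviations of
    noise fit inside the effect size `margin` (e.g. 0.03 for 3%)."""
    per_game_var = (1.0 - draw_ratio) * 0.25
    return round(per_game_var * (sigmas / margin) ** 2)
```

Under these assumptions the score after 100 games wanders about 4% around the true probability, so a 3% effect drowns in the noise more often than not, while `games_for_margin(0.03)` lands near the 400-game figure quoted above.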
It looks OK to me. One caveat: the normalization (e.g. Pawn = 100) is not completely arbitrary, as the engine weighs material against positional terms, and doubling all piece values would effectively scale down the importance of passers and King Safety. In addition, the engine also uses some heavily rounded 'quick' piece values internally, where B=N=3, R=5, A=C=8 and Q=9, to make a rough guess whether certain branches stand any chance to recoup the material it gave earlier in the branch. So in certain situations, when it is behind 800 cP, it won't consider capturing a Rook, because it expects that to be worth about 500 cP, and thus falls 300 cP below the target. Such a large deficit would be beyond the safety margin for pruning the move. But if the piece values were scaled up such that the 800 merely represented being a Bishop behind, this obviously would be an unjustified pruning. The safety margin is large enough to allow some leeway here, but don't overdo it. It would be safest to keep the value of Q close to 950. I am indeed skeptical about the possibility of playing enough games to measure the difference you want to see in the total score percentage. But perhaps some sound conclusions could be drawn by not merely looking at the result, but at the actual games, and singling out the Q vs 2R trades. (Or actually any Rook versus other material trade before the end-game. Rooks capturing Pawns to prevent their promotion probably should not count, though.) These could then be used to separately extract the probability of such a trade for the two sets of piece values, and determine the winning probability for each of the piece values once such a trade has occurred. By filtering the raw data this way, we get rid of the stochastic noise produced by the (majority of) games where the event we want to determine the effect of did not occur.
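The pruning rule described above, using the rounded 'quick' values, can be sketched as follows. Only the quick values B=N=3, R=5, A=C=8, Q=9 and the worked example (800 cP behind, Rook capture pruned) come from the post; the 200 cP margin and the `likely_futile` helper are illustrative assumptions, not Joker80 source:

```python
# Rounded 'quick' piece values (in Pawn units) used for pruning guesses.
QUICK_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'A': 8, 'C': 8, 'Q': 9}

SAFETY_MARGIN = 200  # assumed slack in centiPawns before giving up a branch

def likely_futile(deficit_cp, victim):
    """True if capturing `victim` cannot plausibly recoup `deficit_cp`.

    deficit_cp: how far (in cP) this branch is behind the score target.
    The capture is considered futile when even the victim's quick value,
    plus the safety margin, still leaves the branch short of the target.
    """
    gain_cp = QUICK_VALUE[victim] * 100
    return gain_cp + SAFETY_MARGIN < deficit_cp
```

With these numbers, a Rook capture while 800 cP behind is pruned (500 + 200 < 800), which is exactly why inflating all piece values so that 800 cP means only a Bishop would make such prunings unjustified.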
Well, I share that concern. But note that the low Rook value was not only based on the result of Q-2R asymmetric testing. I also played R-BP and NN-RP, which ended unexpectedly badly for the Rook, and that sets the value of the Rook compared to that of the minor pieces. The value of the Queen was independently tested against that of the minor pieces by playing Q-BNN. The low difference between R and B does make sense to me now, as the wider board should upgrade the Bishop a lot more than the Rook. The Bishop gets extra forward moves, and forward moves are worth a lot more than lateral moves. I have seen that in testing cylindrical pieces (indicated by *), where the periodic boundary condition w.r.t. the side edges effectively simulates an infinitely wide board. In a context of normal Chess pieces, B* = B+P, while R* = R + 0.25P. OTOH, Q* = Q+2P. So it doesn't surprise me that on wider boards R loses ground compared to Q and B. I can think of several systematic errors that could lead to unrealistically poor performance of the Rook in asymmetric playtesting from an opening position. One is that Capablanca Chess is a very violent game, where the three super-pieces are often involved in inflicting an early checkmate (or nearly so, where the opponent has to sacrifice so much material to prevent the mate that he is lost anyway). The Rooks initially offer not much defense against that. But your chances for such an early victory would be strongly reduced if you were missing a super-piece. So perhaps two Rooks would do better against Q after A and C are traded. This explanation would do nothing to explain the poor Rook performance in R vs B, but perhaps it is B that is strong (it is also strong compared to N). The problem then would be not so much a low R value, but a high Q value, due to cooperativity between super-pieces.
So perhaps the observed scores should not be entirely interpreted as high base values for Q, C and A, but might be partly due to super-piece pair bonuses similar to that for the Bishop pair. Which I would then (mistakenly) include in the base value, as the other super-pieces are always present in my test positions. Another possible source of error is that the engine plays a strategy that is not well suited for playing 2R vs Q. Joker80's evaluation does not place a lot of importance on keeping all its pieces defended. In general this might be a winning strategy, giving the engine more freedom in using its pieces in daring attacks. But 2R vs Q might be a case where this backfires, and where you can only manifest the superiority of your Rook force by very careful and meticulous, nearly allergic defense of your troops, slowly but surely pushing them forward. This is not really the style of Joker's play. So it would be interesting to do the asymmetric playtesting for Q vs 2R also with other engines. But TJchess10x8 only became available long after I started my piece-value project, and TSCP-G does not allow setting up positions (although now I know a work-around for that: forcing initial moves with both Archbishops to capture all pieces to delete, and then retreating them before letting the engine play). And Smirf initially could not play automatically at all, and when I finally made a WB adapter for it so that it could, its fast games were more decided by timing issues than by play quality (many losses on time with scores like +12!). And Fairy-Max is really a bit too simplistic for this, not knowing the concept of a Bishop pair or passed pawns, besides being a slower searcher.
[I deleted this post, because I accidentally posted it in the wrong discussion.]
Is there any special reason you want to keep the Pawn value equal in all trial versions, rather than, say, the total value of the army, or the value of the Queen? Especially in the Scharnagl settings it makes almost every piece rather light compared to the quick guesses used for pruning. Note that there are so many positional modifiers on the value of a pawn (not only determined by its own position, but also by its relation to other friendly and enemy pawns) that I am not sure what the base value really means. Even if I say that it represents the value of a Pawn at g2, the evaluation points lost on deleting a pawn on g2 will depend on whether there are pawns on the e- and i-file, and how far they are advanced, and on the presence of pawns on the f- and h-file (which might become backward or isolated), and of course on whether losing the pawn would create a passer for the opponent. If I were you, I would normalize all models to Q=950, but then replace the pawn value everywhere by 85 (I think the standard value used in Joker is even 75). I don't think you could say then that you deviate from the model, as the models do not really specify which type of Pawn they use as a standard. My value refers to the g2 pawn in an opening setup. Perhaps Reinhard's value refers to an 'average' pawn, in a typical pawn chain occurring in the early middle game, or a Pawn on d4/e4 (which is the most likely to be traded). As to the B-pair: tricky question. The way you did it now would give the first Bishop to be traded the value the model prescribes, but would make the second much lighter. If you would subtract half the bonus, then on average they would be what the model prescribes. The value is indeed hard-wired in Joker, but if you really want, I could make it adjustable through an 8th parameter.
Well, I do not really play CVs myself, but I love to watch games played by my engines, especially blitz games. And from this I learned that Knightmate is a CV that definitely works. It is just different enough from FIDE Chess to make it interesting, but familiar enough that you immediately can grasp it. Great game! Similarly for the 10x8 Capablanca variants. They are very interesting because of the Archbishop, which tends to be very active.
'Have you tried the Modern Capablanca Random Chess variant with your engines?' No, my engines do not have FRC-type castling ability yet. It is still on my to-do list for Joker80, together with allowing it to play on 8x8 by filling up part of the board with impassable objects. (It already uses such objects to confine the pieces to 10x8, as its internal board is 32x12, so this is a minor change; it just has to adapt the positional center-points table to where the new corners are. And of course use a different type of castling.) The main objective would be to play in FRC competitions. The Modern CRC variant doesn't particularly appeal to me. The resulting games should be indistinguishable from normal CRC. The only difference is the opening array. The Bishop adjustment rule is also an opening thing. Opening theory never had much appeal to me; I consider it the dullest part of Chess. None of my engines ever had an opening book, even in variants like 8x8 FIDE, where extensive opening theory exists. The Bishop adjustment rule seems awkward from an aesthetic point of view, and half-hearted from a logical point of view: first you change the rules by allowing arrays with like Bishops, and then you largely subvert the effect of it by allowing the adjustment. As the disadvantage of having the Bishops on like colors was measured by me to be half a Pawn, not doing it would be very poor strategy. For exploring the possibilities like Bishops offer, it would be much cleaner to augment the Bishop with a single orthogonal backward step as non-capture only. Then people can actually use it without hesitation, as they can always undo the effect later. The extra move of such a 'Naughty Bishop' hardly has any tactical value in itself, as it is a non-capture, and directed backwards. It added only about 15 cP to the piece value. Introducing a piece with a different gait is much cleaner than adding a special, complicated rule.
The symmetric castling seems to add nothing; it looks just like a difference for the sake of being different. The same holds for the inversion symmetry instead of vertical-flip symmetry. This doesn't mean this would be a poor game to play, of course. But I think such irrelevant differences do make it a poor design as a CV.
'I cannot speak for Reinhard Scharnagl at all, though.' This is exactly the problem. 'Base value' for Pawns is a very ill-defined concept, as it is the smallest of all piece base values, while the positional terms regarding Pawns are usually the largest of all positional terms. And the whole issue of pawn-structure evaluation in Joker is so complex that I am not even sure if the average of positional terms (over all pawns and over a typical game) is positive or negative. Pawns get penalties for being doubled, or for having no Pawns next to or behind them on neighboring files. They get points for advancing, but they get penalties for creating squares that can no longer be defended by any Pawn. My guess is that in general the positional terms are slightly positive, even for non-passers not involved in King Safety. A statement like 'a Knight is worth exactly 3 Pawns' is only meaningful after exactly specifying which kind of pawn. If the Scharnagl model evaluates all non-passers exactly the same (except, perhaps, edge Pawns), then the question still arises how to most closely approximate that in Joker80, which doesn't. And simply setting the Joker80 base value equal to the single value of the Scharnagl model is very unlikely to do it. Good differentiation in Pawn evaluation is likely to impact playing strength much more than the relative value of Pawns and Pieces, as Pawns are traded for other Pawns (or such trades are declined by pushing the Pawn and locking the chains) much more often than they can be traded for Pieces.
'Do you think these piece values will work smoothly with Joker80 running under Winboard F yet remain true to all three models?' Yes, I think these values will not conflict in any way with any of the hard-wired value approximates that are used for pruning decisions. At least not to the point where it would lead to any observable effect on playing strength. (Prunings based on the piece values occur only close to the leaves, and engines are usually quite insensitive to how exactly you prune there.)
'Let me provide another challenge for people here regarding pawns. How much is a pawn that moves only one space forward (not initial 2) but starts on the third row instead of second worth in contrast to a normal chess pawn? How much is it worth alone, and then in a line of pawns that start on the third row?' But this is a totally normal FIDE Pawn... It would get a pretty large positional penalty if it was alone (isolated-pawn penalty). In a complete line of pawns on the 3rd rank it would be worth a lot more, as it would not be isolated, and not be backward. All in all it would be fairly similar to having a line of Pawns on second rank, as the bonus for pushing the Pawns forward 1 square is approximately cancelled by not having Pawn control anymore over any of the squares on the 3rd rank.
'Because of all this, I suggest evaluating entire configuration of pieces, rather than a single piece.' This is exactly what Chess engines do. But it is a subject that transcends piece values. Material evaluation is supposed to answer the question: 'what combination of pieces would you rather have, without knowing where they stand on the board'. Piece values are an attempt to approximate the material evaluation as a simple sum of the value of the individual pieces, making up the army. It turns out that material evaluation is by far the largest component of the total evaluation of a Chess position. And this material evaluation again can be closely approximated by a sum of piece values. The most well-known exception is the Bishop pair: having two Bishops is worth about half a Pawn more than double the value of a single Bishop. Other non-additive terms are those that make the Bishop and Rook value dependent on the number of Pawns present. To account for such effects some engines (e.g. Rybka) have tabulated the total value of all possible combinations of material (ignoring promotions) in a 'material table'. Such tables can then also account for the material component of the evaluation that gives the deviation from the sum of piece values due to cooperative effects between the various pieces. Useful as this may be, it remains true that piece values are by far the largest contribution to the total evaluation. The only positional terms that can compete with it are passed pawns (a Pawn on 7th rank is worth nearly 2.5 normal Pawns) and King Safety (having a completely exposed King in the middle game, when the opponent still has a Queen or similar super-piece, can be worth nearly a Rook).
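The idea above, that material evaluation is mostly a sum of piece values plus a few non-additive corrections like the Bishop pair, can be shown in a few lines. Only Q=950 and the roughly-half-a-Pawn pair bonus are taken from the surrounding posts; the other centiPawn values are illustrative assumptions, not Joker80's actual tables:

```python
# Assumed illustrative piece values in centiPawns (P, N, B, R,
# Archbishop, Chancellor, Queen); only Q=950 is quoted in the text.
PIECE_CP = {'P': 85, 'N': 300, 'B': 350, 'R': 475,
            'A': 875, 'C': 900, 'Q': 950}
BISHOP_PAIR_CP = 50  # 'about half a Pawn', per the text

def material(counts):
    """Material evaluation for one army.

    counts: dict mapping piece letter to number owned, e.g. {'B': 2, 'P': 8}.
    The total is the plain sum of piece values, plus the non-additive
    Bishop-pair correction when both Bishops are still on the board.
    """
    total = sum(PIECE_CP[p] * n for p, n in counts.items())
    if counts.get('B', 0) >= 2:
        total += BISHOP_PAIR_CP
    return total
```

A material table like Rybka's would, in effect, pre-tabulate `material` for every combination, which also lets it absorb corrections this additive sketch cannot express.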
Derek Nalls: | This might require very deep runs of moves with a completion time | of a few weeks to a few months per pair of games to achieve | conclusive results. It still escapes me what you hope to prove by playing at such an excessively long Time Control. If the result would be different from playing at a more 'normal' TC, like one or two hours per game (which IMO will not be the case), it would only mean that any conclusions you draw from them would be irrelevant for playing Chess at normal TC. Furthermore, playing 2 games will be like flipping a coin. The result, whatever it is, will not prove anything, as it would be different if you would repeat the test. Experiments that do not give a fixed outcome will tell you nothing, unless you conduct enough of them to get a good impression of the probability for each outcome to occur.
Derek: | Conclusions drawn from playing at normal time controls are | irrelevant compared to extremely-long time controls. First, that would only be true if the conclusions would actually depend on the TC. Which is a totally unproven conjecture on your part, and in fact contrary to any observation made at TCs where such observations can be made with any accuracy (because enough games can be played). This whole thing reminds me of my friend, who always claims that stones fall upward. When I then drop a stone to refute him, he just shrugs, and says it proves nothing because the stone is 'not big enough'. Very conveniently for him, the upward falling of stones can only be observed on stones that are too big for anyone to lift... But the main point is of course: if you draw a conclusion that is valid only at a TC that no one is interested in playing, what use would such a conclusion be? | The chance of getting the same flip (heads or tails) twice-in-a-row | is 1/4. Not impressive but a decent beginning. Add a couple or a | few or several consecutive same flips and it departs 'luck' by a | huge margin. Actually the chance for twice the same flip in a row is 1/2. Unless you are biased as to what the outcome of the flip should be (one-sided testing). And indeed, 10 identical flips in a row would be unlikely to occur by luck by a large margin. But that is rather academic, because you won't see 10 identical results in a row between the subtly different models. You will see results like 6-4 or 7-3, which will again be very likely to be a result of luck (as that is exactly what they are the result of, as you would realize after 10,000 games when the result is standing at 4,628-5,372). Calculate the number of games you need to typically get a result for a 53-47 advantage that could not just as easily have been obtained from a 50-50 chance with a little luck. You will be surprised...
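The "calculate it and you will be surprised" challenge above can be carried out exactly. A minimal sketch, where `games_to_separate` uses a simple win/loss (no draws) model and a two-standard-deviation threshold as assumptions:

```python
from math import ceil, comb

def prob_at_least(heads, flips, p=0.5):
    """Exact binomial tail: probability of `heads` or more in `flips`."""
    return sum(comb(flips, k) * p**k * (1 - p)**(flips - k)
               for k in range(heads, flips + 1))

def games_to_separate(p_true, sigmas=2.0):
    """Rough number of win/loss games before a true score of p_true
    sits `sigmas` standard deviations away from the 50-50 null result.
    Per-game standard deviation of the score is at most 0.5."""
    return ceil((sigmas * 0.5 / (p_true - 0.5)) ** 2)
```

A 6-4 'lead' from a fair coin has probability `prob_at_least(6, 10)`, about 0.38, so it happens in over a third of all 10-game samples even when the true chances are exactly 50-50. And separating a genuine 53-47 advantage from luck takes on the order of a thousand games under these assumptions.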
| I have wondered why the performance of computer chess programs is | unpredictable and varied even under identical controls. Despite | their extraordinary complexity, I think of computer hardware, | operating systems and applications (such as Joker80) as deterministic. In most engines there always is some residual indeterminism, due to timing jitter. There are critical decision points, where the engine decides if it should do one more iteration or not (or search one more move vs aborting the iteration). If it would take such decisions purely on internal data, like node count, it would play 100% reproducibly. But most engines use the system clock (to not forfeit on time if the machine is also running other tasks), and experience the timing jitter caused by other processes running, or rotational delays of the hard disk they have been using. In multi-threaded programs this is even worse, as the scheduling of the threads by the OS is unpredictable. Even the position where exactly the program is loaded in physical memory might have an effect. But in Joker the source of indeterminism is much less subtle: it is programmed explicitly. Joker uses the starting time of the game as the seed of a pseudo-random-number generator, and uses the random numbers generated with the latter as a small addition to the evaluation, in order to lift the degeneracy of exactly identical scores, and provide a bias for choosing the move that leads to the widest choice of equivalent positions later. The non-determinism is a boon, rather than a bust, as it allows you to play several games from an identical position, and still do a meaningful sampling of possible games, and of the decisions that lead to their results. If one position would always lead to the same game, with the same result (as would occur if you were playing a simple end-game with the aid of tablebases), it would not tell you anything about the relative strength of the armies.
It would only tell you that this particular position was won / drawn. But nothing about the millions of other positions with the same material on the board. And the value of the material is by definition an average over all these positions. So with deterministic play, you would be forced to sample the initial positions, rather than using the indeterminism of the engine to create a representative sample of positions before anything is decided. | In fact, to the extent that your remarks are true, they will | support my case if my playtesting is successful that the | unlikelihood of achieving the same outcome (i.e., wins or | losses for one player) is extreme. This sentence is too complicated for me to understand. 'Your case' is that 'the unlikelihood of achieving the same outcome is extreme'? If the unlikelihood is extreme, is that the same as the likelihood being extreme? Is the 'unlikelihood to be the same' the same as the 'likelihood to be different'? What does 'extreme' mean for a likelihood? Extremely low or extremely high? I wonder if anything is claimed here at all... I think you make a mistake by seeing me as a low-quality advocate. I only advocate the minimum quantity needed to not make the results inconclusive. Unfortunately, that is high, despite my best efforts to make it as low as possible through asymmetric playtesting and playing material imbalances in pairs (e.g. 2 Chancellors against two Archbishops, rather than one vs one). And that minimum quantity puts limits on the maximum quality that I can afford with my limited means. So it would be more accurate to describe me as a minimum-(significant)-quantity, maximum-(affordable)-quality advocate...
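The explicitly programmed randomization described above (game start time seeds a PRNG, whose output is added in small doses to the evaluation) could be sketched roughly as below. The 8 cP amplitude, the hashing scheme, and the per-position keying are all illustrative assumptions; Joker80's actual implementation is not public in this thread:

```python
import random
import time

class RandomizedEval:
    """Sketch of seeded evaluation randomization: one seed per game,
    one small repeatable bonus per position."""

    def __init__(self, game_start=None, amplitude_cp=8):
        # Seed from the game's starting time, as described in the post.
        seed = int(game_start if game_start is not None else time.time())
        rng = random.Random(seed)
        self.salt = rng.getrandbits(64)   # fixed for the whole game
        self.amplitude_cp = amplitude_cp  # assumed small bonus range

    def bonus(self, position_key):
        """Deterministic-within-a-game addition (0..amplitude_cp) to the
        evaluation of the position identified by `position_key`."""
        mixed = (position_key ^ self.salt) * 0x9E3779B97F4A7C15 & (2**64 - 1)
        return mixed % (self.amplitude_cp + 1)
```

Within one game the same position always draws the same bonus, so the search stays consistent; a new game gets a new seed, so replays from an identical position sample different games, which is exactly the property argued for above.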
'Do not you realize that forcing Joker80 to do otherwise must reduce its playing strength significantly from its maximum potential?' On the contrary, it makes it stronger. The explanation is that by adding a random value to the evaluation, branches with very many equal end leaves have a much larger probability to have the highest random bonus amongst them than a branch that leads to only a single end-leaf of that same score. The difference can be observed most dramatically when you evaluate all positions as zero. This makes all moves totally equivalent at any search depth. Such a program would always play the first legal move it finds, and would spend the whole game moving its Rook back and forth between a1 and b1, while the opponent is eating all its other pieces. OTOH, a program that evaluates every position as a completely random number starts to play quite reasonable chess, once the search reaches 8-10 ply. Because it is biased to seek out moves that lead to pre-horizon nodes that have the largest number of legal moves, which usually are the positions where the strongest pieces are still in its possession. It is always possible to make the random addition so small that it only decides between moves that would otherwise have exactly equal evaluation. But this is not optimal, as it would then prefer a move (in the root) that could lead (after 10 ply or so) to a position of score 53 (centiPawn), while all other choices later in the PV would lead to -250 or worse, over a move that could lead to 20 different positions (based on later move choices) all evaluating as 52 cP. But, as the scores were just approximations based on finite-depth search, two moves later, when it can look ahead further, all the end-leaf scores will change from what they were, because those nodes are no longer end-leaves. The 53 cP might now be 43 cP, because deeper search revealed it to disappoint by 10 cP.
But alas, there is no choice: the alternatives in this branch might have changed a little too, but now all range from -200 to -300. Not much help; we have to settle for the 43 cP... Had it taken the root move that keeps the option open to go to any of the 20 positions of 52 cP, it would now see that their scores on deeper search would have been spread out between 32 cP and 72 cP, and it could now go for the 72 cP. In other words, the investment of keeping its options open rather than greedily committing itself to going for an uncertain, only marginally better score, typically pays off. To properly weight the expected pay-back of keeping options that at the current search depth seem inferior, it must have an idea of the typical change of a score from one search depth to the next. And match the size of the random eval addition to that, to make sure that even slightly (but insignificantly) worse end-leaves still contribute to enhancing the probability that the branch will be chosen. Playing a game in the face of an approximate (and thus noisy) evaluation is all about contingency planning. As to the probability theory, you don't seem to be able to see the math because of the formulae... P(hh) = 0.5*0.5 = 0.25, and P(tt) = 0.5*0.5 = 0.25, so P(two equal flips) = 0.25 + 0.25 = 0.5.
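The 'keep your options open' argument above (one leaf at 53 cP versus 20 leaves at 52 cP) can be checked with a small Monte Carlo sketch. The uniform 0..10 cP bonus below is an assumed stand-in for the score changes between search depths, not Joker80's actual noise model:

```python
import random

def branch_wins_probability(n_leaves=20, wide_score=52, narrow_score=53,
                            noise=10, trials=20000, seed=1):
    """Fraction of trials in which a branch with n_leaves equal leaves
    at wide_score, each carrying an independent random bonus, outranks
    a single leaf at the slightly higher narrow_score."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        wide = wide_score + max(rng.uniform(0, noise)
                                for _ in range(n_leaves))
        narrow = narrow_score + rng.uniform(0, noise)
        if wide > narrow:
            wins += 1
    return wins / trials
```

Under these assumptions the 20-leaf branch wins the tie-break in the large majority of trials, despite its nominally lower score, which is the mechanism by which the random addition rewards contingency planning.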
Indeed, it is a stochastic way to simulate mobility evaluation. In the presence of other terms it should of course not be made so large that it dominates the total evaluation. Like explicit mobility terms should not dominate the evaluation. But its weight should not be set to zero either: properly weighted mobility might add more than 100 Elo to an engine. Joker has no explicit mobility in its evaluation, and relies entirely on this probabilistic mechanism to simulate it. The disadvantage is that, because of the probabilistic nature, it is not 100% guaranteed to always take the best decision. On rare occasions the single acceptable end leaf does draw a higher random bonus than one hundred slightly better positions in another branch. OTOH it is extremely cheap to implement, while explicit mobility is very expensive. As a result, I might gain an extra ply in search depth. And then it becomes superior to explicit mobility, as it only counts tactically sound moves, rather than just every move. So it is like safe mobility verified by a full Quiescence Search. In my assessment, the probabilistic mobility adds more strength to Joker than changing the Rook value by 50 cP would add or subtract. This can be easily verified by play-testing. It is possible to switch this evaluation term off. In fact, you have to switch it on, but WinBoard does this by default. To prevent it from being switched on, one should run WinBoard with the command-line option /firstInitString='new'. (The default setting is 'new\nrandom'. If Joker is running as second engine, you will of course have to use /secondInitString='new'.)
I would have thought that 'twice the same flip in a row' was pretty unambiguous, especially in combination with the remark about two-sided testing. But let's not quibble about the wording. The point was that for two-sided testing, if you suspect a coin to be loaded, but have no idea if it is loaded to produce tails or heads, the two flips tell you exactly nothing. They are either the same or different, and on an unbiased coin each would occur with equal probability. So the 'confidence' of any conclusion as to the fairness of the coin drawn from the two flips would be only 50%. I.e. no better than totally random: you might as well have guessed whether it was fair or not without flipping it at all. That would also have given you a 50% chance of guessing correctly.
Derek: 'I hope you can handle constructive advice.' It gives me a big laugh, that's for sure. Of course none of what you say is even remotely true. That is what happens if you jump to conclusions regarding complex matters you are not knowledgeable about, without even taking the trouble to verify your ideas. Of course I extensively TESTED how the playing strength of Joker80 (and all available other engines) varied as a function of time control. This was the purpose of several elaborate time-odds tournaments I conducted, where various versions of most engines participated that had to play their games in 36, 12, 4, 1:30, 0:40 or 0:24 min, where handicapped engines were meeting non-handicapped ones in a full round robin. (I.e. the handicaps were factors 3, 9, 24, 54 or 90, where only the strongest engines were handicapped up to the very maximum, and the weakest only participated in an unhandicapped version.) And of course Joker80 behaves similarly to any Shannon-type engine that is reasonably free of bugs: its playing strength measured in Elo monotonically increases in a logarithmic fashion, approximately following the formula rating = 100*ln(time). So Joker80 at 5 min/move crushes Joker80 at 1 sec per move, as you could have easily found out for yourself. So much for your nonsense about Joker80 failing to improve its move quality with time. For some discussion of one of the tournaments, see: http://www.talkchess.com/forum/viewtopic.php?t=19764&postdays=0&postorder=asc&topic_view=flat&start=34 At that time Fairy-Max still had a hash-table bug that made it hang (and subsequently forfeit on time), striking at a fixed rate per second, so that Fairy-Max started to forfeit more and more games at longer TC. Since then the bug has been identified and repaired, and now Fairy-Max also performs progressively better at longer TC. So nice try, but next time better save your breath for telling the surgeon how to do his job before he performs open-heart surgery on you.
Because he has no doubt much more to learn from you regarding cardiology than I have in the area of building Chess engines... Things are as they are, and can become known by observation and testing. Believing in misconceptions born out of ignorance is not really helpful. Or, more explicitly: if you think you know how to build better Chess engines than other people, by all means, do so. It will be fun to confront your ideas with reality. In the mean time I will continue to build them as I think best, (and know is best, through extensive testing), so you should have every chance to surpass them. Lacking that, you could at least _use_ the engines of others to check out if your theories of how they behave have any reality value. You don't have to depend on the time-odds tourneys and other tests I conduct. You might not even be aware of them, as the developers of Chess engines hardly ever publish the thousands of games they do for testing if their ideas work in practice.
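The rating = 100*ln(time) formula quoted above implies a fixed Elo gain per doubling of thinking time, and maps directly onto the handicap factors used in the time-odds tournaments. A small sketch of that arithmetic (the formula itself is from the post; treating it as exact for these factors is the only assumption):

```python
import math

def rating_gain(time_factor):
    """Elo difference predicted by rating = 100*ln(time) between two
    otherwise identical engines whose thinking times differ by
    `time_factor`."""
    return 100 * math.log(time_factor)

# Predicted Elo handicaps for the tournament's time-odds factors.
handicaps = {f: round(rating_gain(f)) for f in (3, 9, 24, 54, 90)}
```

Under this formula each doubling of time is worth about 69 Elo, and a factor-3 handicap costs about 110 Elo, so a factor-90 handicap amounts to roughly 450 Elo, a crushing difference, consistent with the tournament observations described above.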
Derek Nalls: | Nonetheless, completing games of CRC (where a long, close, | well-played game can require more than 80 moves per player) | in 0:24 minutes - 36 minutes does NOT qualify as long or even, | moderate time controls. In the case of your longest 36-minute games, | with an example total of 160 moves, that allows just 13.5 seconds per | move per player. In fact, that is an extremely short time by any | serious standards. In my experience most games on average take only 60 moves (perhaps because of the large strength difference of the players). As early moves are more important for the game result than late moves (even the best moves late in the game do not help you if your position is already lost), most engines use 2.5% of the remaining time for their next move (on average, depending on how the iterations end compared to the target time). That would be nearly 54 sec/move at 36 min/game in the decisive phase of the game. That is more than you thought, but admittedly still fast. Note, however, that I also played 60-min games in the General Championship (without time odds), and that Joker80 confirmed its lead over the competitors it had manifested at faster time controls. But I don't see the point: Joker80's strength increases with time as expected, in the range from 0.4 sec to 36 sec per move, in a regular and theoretically expected way. This is over the entire range where I tested the dependence of the scoring percentage of various material imbalances, which extended to only 15 sec/move, and found it to be independent of TC. So your 'explanation' for the latter phenomenon is just nonsense. The effect you mention is observed NOT to occur, and thus cannot explain anything that was observed to occur. Now if you want to conjecture that this will all miraculously become very different at longer TC, you are welcome to test it and show us convincing results. I am not going to waste my computer time on such a wild and expensive goose chase.
Because from the way I know the engines work, I know that they are 'scalable': their performance at 10 ply results from one ply being put in front of 9-ply search trees. And that extra ply will always help. If they have good 9-ply trees, they will have even better 10-ply trees. But you don't have to take my word for it. You have the engine, and if you believe that at 1 hour per move you will get the same win probability as at 1 sec/move, or that at 1 hour per move it won't beat 10 min/move, just play the games, and you will see for yourself. It would even be appreciated if you publish the games here or on your website. But, needless to say, one or two games won't convince anyone of anything.
| since I am not a computer chess programmer, I cannot possibly
| know what I am talking about when I dare criticize an important
| working of your Joker80 program
Well, you certainly make it appear that way. As, despite the elaborate explanation I gave of why programs derive extra strength from this technique, you still draw a conclusion that in practice was already shown to be 100% wrong earlier. And if you think you will run into the problem you imagine at enormously longer TC, well, very simple: don't use Joker80, but use some other engine. You are on your own there, as I am not specifically interested in extremely long TC. There is always a risk in using equipment outside the range of conditions for which it was designed and tested, and that risk is entirely yours. So better tread carefully, and make sure you rule out the perceived dangers by conscientious testing.
| You must decide upon and define the primary function of your
| Joker80 program.
I do not see the dilemma you sketch. The purpose is to play ON AVERAGE the best possible move. If you do that, you have the best chance to win the game. If I can achieve that through a non-deterministic algorithm better than through a deterministic one, I go for the non-deterministic method.
That it also diversifies play, and makes me less sensitive to prepared openings from the opponent, is a win/win situation. Not a compromise. As I explained, it is very easy to switch this feature off. But you should be prepared for significant loss of strength if you do that.
| I just cannot understand how any rational, intelligent man could
| believe that introducing chaos (i.e., randomness) is beneficial
| (instead of detrimental) to achieving a goal defined in terms of
| filtering-out disorder to pinpoint order.
It would be very educational, then, to get yourself acquainted with the current state of the art of Go programming, where Monte-Carlo techniques are the most successful paradigm to date...
| When you reduce the power of your algorithm in any way to
| filter-out inferior moves, you thereby reduce the average
| quality of the moves chosen and consequently, you reduce
| the playing strength of your program- esp. at long time controls.
Exactly. This is why I _enhance_ the power of my algorithm to filter out inferior moves. As the inferior moves have a smaller probability to draw a large positive random bonus than the better moves, they have a lower probability to be chosen, which enhances the average quality of the moves, and thus playing strength. At any time control. It is a pity this suppression of inferior moves is only probabilistic, and some inferior moves by sheer luck can still penetrate the filter. But I know of no deterministic way to achieve the same thing. So something is better than nothing, and I settle for the inferior moves only getting a lower chance to pass. Even if it is not a zero chance, it is still better than letting them pass unimpeded.
| In any event, the addition of the completely-unnecessary module of
| code used to create the randomization effect within Joker80 that
| you desire irrefutably makes your program larger, more complicated
| and slower. Can that be a good thing?
Everything you put into a Chess engine makes it larger and slower. Yet taking almost everything out only leaves you with a weak engine like micro-Max 1.6. The point is that putting code in can also make the engine smarter, improve its strategic understanding, reduce its branching ratio, etc.
So whether it is a good thing does not depend on whether it makes the engine larger, more complicated, or slower. It depends on whether the engine still fits in the available memory, and from there produces better moves in the same time. Which larger, more complicated and slower engines often do. As always, testing is the bottom line. Actually the 'module of code' consists of only 6 instructions, as I derive the pseudo-random number from the hashKey. But the point you are missing is this: I have theoretical understanding of how Chess engines work, and therefore am able to extrapolate their behavior with high confidence from what I observe under other conditions (i.e. at fast TC). Just like I don't have to travel to the Moon and back to know its distance from the Earth, because I understand geometry and triangulation. So if including a certain evaluation term gives me more accurate scores (and thus more reliable selection of the best move) from 8-ply search trees, I know that it can only give better moves from 18-ply search trees as well. For the latter is nothing but millions of 8-ply search trees grafted on the tips of a mathematically exact 10-ply minimax propagation of the scores from the 8-ply trees towards the root. Anyway, it is not of any interest to me to throw months of valuable CPU time at answering questions I already know the answer to.
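A hash-derived random bonus of the kind described might look like the following sketch; the function name and the mixing constant are illustrative assumptions, not Joker80's actual six instructions:

```python
def random_bonus(hash_key, max_bonus=8):
    """Deterministic 'random' evaluation bonus derived from the position's hash key.

    The same position always receives the same bonus (so hash-table probes stay
    consistent), while unrelated positions get effectively independent values.
    """
    mixed = (hash_key * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF  # 64-bit integer mixing
    return (mixed >> 58) % max_bonus  # a few high bits give a bonus in [0, max_bonus)

bonus = random_bonus(0x123456789ABCDEF)  # same key always yields the same bonus
```

Deriving the bonus from the hash key, rather than from a running random generator, is what keeps the search reproducible within one game tree.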
Derek:
| The moral of the story is that randomization of move selection
| reduces the growth in playing strength that normally occurs with
| time and plies completed.
This is not how it works. For one, you assume that at long TC there would be fewer moves to choose from, and that they would be farther apart in score. This is not the case. The average distribution of move scores in the root depends on the position, not on search depth. And in cases where the scores of the best and second-best move are far apart, the random component of the score propagating from the end-leaves to the root is limited to some maximum value, and thus could never cause the second-best move to be preferred over the best move. The mechanism can only have any effect on moves that would score nearly equal (within the range of the maximum addition) in absence of the randomization. For moves that are close enough in score to be affected, the random contribution in the end-leaves will be filtered by minimax while trickling down to the root in such a way that it is no longer a homogeneously distributed random contribution to the root score, but on average suppresses scores of moves leading to sub-trees where the opponent has a lot of playable options and we only a few, while on average increasing scores where we have many options and the opponent only a few. And the latter are exactly the moves that, in the long term, will lead you to positions of the highest score.
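This filtering effect is easy to demonstrate numerically: give two root moves the same true score, but let one lead to a position with many playable continuations and the other to one with few; the maximized random leaf bonuses then favor the mobile line on average. A toy simulation (all parameters arbitrary):

```python
import random

def noisy_leaf(true_score, max_bonus=8.0):
    # leaf evaluation plus a bounded random bonus, as described above
    return true_score + random.random() * max_bonus

def root_score(num_options, true_score=0.0):
    # one ply of minimax: we pick the best of our own options
    return max(noisy_leaf(true_score) for _ in range(num_options))

random.seed(1)
trials = 1000
wide = sum(root_score(20) for _ in range(trials)) / trials   # many playable replies
narrow = sum(root_score(2) for _ in range(trials)) / trials  # few playable replies
print(wide > narrow)  # True: the mobile line scores higher on average
```

The same argument runs in reverse at minimizing nodes, which is how the noise ends up penalizing lines where only the opponent has many options.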
No engine I know of prunes in the root, in any iteration. They might reduce the depth of poor moves compared to that of the best move by one ply, but they will be searched in every iteration except a very early one (where they were classified as poor), to a larger depth than they were ever searched before. So at any time their score can recover, and if it does, they are re-searched within the same iteration at the un-reduced depth. This is absolutely standard, and also how Joker80 works. Selective search, in engines that do it, is applied only very deep in the tree. Never close to the root.
It is a bit misleading to list Capablanca Random Chess (and its Modern variant) here with a fixed array. It would have been more logical to depict an empty board, with the pieces next to it... For completeness, it should at least have mentioned what the restrictions for setting up the pieces are. Note that most engines able to play Scharnagl's CRC are also capable of playing opening setups he explicitly excludes from being CRC, i.e. with undefended Pawns, with Bishops next to each other, or with Q and A on like colors. They in general consider this all the same variant, 'Capablanca Random Chess', as opening arrays in the program logic are not part of the variant definition, but are simply set by loading a FEN. CRC in some programs is considered a different variant from Capablanca, due to the different castling rules (like FRC is considered a distinct variant from normal FIDE Chess).
George Duke:
| However, the reality is if one is playing many CVs, precisely
| Number One, not any of the other 3, is far and away the most valuable
| and reliable tool, effectively building on experience. Time is also
| factor, and unless Player can adjust quickly, without extensive
| playtesting, and make ballpark estimates of values, all is lost on
| new enterprise. We recommend just this Method One, increasing
| facility at it, for serious CV play, and in turn the designer
| needs to try to keep the game somewhat out of reach for Computer.
Well, I guess that it depends on what your standards are. If you are satisfied with values that are sometimes off by 2 full Pawns (as the case of the Archbishop demonstrates to be possible), I guess method #1 will do fine for you. But, as 2 Pawns is almost a totally winning advantage, my standards are a bit higher than that. If I build an engine for a CV, I don't want it to strive for trades that are immediately losing.
Derek:
| Could you please give me example lines within the 'winboard.ini'
| file that would successfully do so? I need to make sure every
| character is correct.
Sorry for the late response; I was on holiday for the past two weeks. The best way to do it is probably to make the option dependent on the engine selection. That means you have to write it behind the engine name in the list of pre-selectable engines, like: /firstChessProgramNames={... 'C:/engines/joker/joker80.exe 23' /firstInitString='new\n' ... } And something similar for the second engine, using /secondInitString. The path name of the joker80 executable would of course have to be where you installed it on your computer; the argument '23' sets the hash-table size. You could add other arguments, e.g. for setting the piece values, there as well. Note that the executable name and all engine arguments are enclosed by the first set of quotes (which are double quotes, but these for some reason refuse to print in this forum), and everything after this first syntactical unit on the line is interpreted as WinBoard arguments that should be used with this engine when it gets selected. Note that string arguments are C-style strings, enclosed in double quotes, and making use of escape sequences like '\n' for newline. The default value for the init strings is 'new\nrandom\n'.
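With the double quotes restored (this forum strips them), the winboard.ini entries described above would look roughly like this; the installation path, the hash-size argument '23', and the use of a matching /secondChessProgramNames list are examples, not verified contents of any particular installation:

```
/firstChessProgramNames={
"C:/engines/joker/joker80.exe 23" /firstInitString="new\n"
}
/secondChessProgramNames={
"C:/engines/joker/joker80.exe 23" /secondInitString="new\n"
}
```

Everything after the quoted engine command on a line is applied as WinBoard options whenever that engine is selected.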
George Duke:
| Has initial array positioning already entered discussion for
| value determinations?
No, it hasn't, and I don't think it should, as this discussion is about Piece Values, and not about positional play. Piece values are by definition averages over all positions, and thus independent of the placement of pieces on the board. Note furthermore that the heuristic of evaluation is only useful for strategic characteristics of a position, i.e. characteristics that tend to be persistent, rather than volatile. Piece placement can be such a trait, but not always. In particular, in the opening phase, pieces are not locked in the places they start, but can find plenty of better places to migrate to, as the center of the board is still complete no-man's land. Therefore, in the opening phase, the concept of 'tempo' becomes important: if you waste too much time, the opponent gets the chance to conquer space, and prevent your pieces that were badly positioned in the array from properly developing. I did some asymmetric playtesting for positional values in normal Chess, swapping Knights and Bishops for one side, or Knights and Rooks. I was not able to detect any systematic advantage the engines might have been deriving from this. In my piece-value testing I eliminate positional influences by playing from positions that are as symmetric as possible given the material imbalance. And the effect of starting the pieces involved in the imbalance in different places is averaged out by playing from shuffled arrays, so that each piece is tried in many different locations.
Well, never mind. The symmetrical playtesting would not have given any conclusive results with anything less than 2000 games anyway. The asymmetrical playtesting sounds more interesting. I am not completely sure what Smirf bug you are talking about, but in the Battle of the Goths Championship it happened that Smirf played a totally random move when it could give mate in 3 (IIRC) according to both programs (Fairy-Max was the lucky opponent). This move blundered away the Queen with which Smirf was supposed to mate, after which Fairy-Max had no trouble winning with an Archbishop against some five Pawns. This seems to happen when Smirf has seen the mate, and stored the tree leading to it completely in its hash table. It is then no longer searching, and it reports score and depth zero, playing the stored moves (at least, that was the intention). I have never seen any such behavior when Smirf was reporting non-zero search depth, and in particular, the last non-zero-depth score before such an occurrence (a mate score) seemed to be correct. So I don't think there is much chance of an error when you believe the mate announcement and call the game. Of course you could also use Joker80 or TJchess10x8, which do not suffer from such problems.
| However, TJChess cannot handle my favorite CRC opening setup,
| Embassy Chess, without issuing false 'illegal move' warnings and
| stopping the game.
Remarkable. I played this opening setup too, in Battle of the Goths, and never noticed any problems with TJchess. It might have been another version, though. If you have somehow saved the game, be sure to send it to Tony, so he can fix the problem.
OK, I see the problem now. I forgot that the Embassy array is a mirrored one, with the King starting on e1, rather than f1. And that to avoid any problems with it in Battle of the Goths, I did not really play Embassy, but the fully equivalent mirrored Embassy. And with that one, none of the engines had problems, of course. Actually it seems that it is not TJchess that is in error here: e1b1 does seem a legal castling in Embassy. It is WinBoard_F which unjustly rejects the move. Most likely because of the FEN reader ignoring specified castling rights for which it does not find a King on f1 and a Rook in the indicated corner. The fact that you don't have this problem with Joker80 is because Joker80 is buggy. (Well, not really; it is merely outside its specs. Joker80 considers all castlings with a non-corner Rook and King not in the f-file as CRC castlings, which are only allowed in variant caparandom, but not in variants capablanca or *UNSPEAKABLE*. And Joker80 does not support caparandom yet.) So the fact that you don't see any problems with Joker80 is because it will never castle when you feed it the Embassy setup, so that WinBoard doesn't get the chance to reject the castling as illegal. And if the opponent castles, WinBoard would reject it as illegal, and not pass it on to Joker80. I guess the fundamental fix will have to wait until I implement variant caparandom in WinBoard; I think that both WinBoard and Joker80 are correct in identifying the Embassy opening position as not belonging to Capablanca Chess, but needing the CRC extension of castling. (Even if it is only a limited extension, as the Rooks are still in the corner.) And after I fix it in WinBoard, I still would have to equip Joker80 with CRC capability before you could use it to play the Embassy setup. It is not very high on my list of priorities, though, as I see little reason to play Embassy rather than mirrored Embassy.
I am still contemplating how to generalize the castling in Joker80. There are two issues there: how to communicate the move from and to the GUI, and how to indicate the existence of the rights. Currently WinBoard protocol has two mechanisms to set up a position: by loading the engine with a FEN, or (obsolete) through an edit command to enter a list of piece+square combinations. The latter mode does not support transmission of castling rights at all, and is only a legacy for backward compatibility with old engines. So for loading positions, we only have to provide a mechanism for indicating castling rights in a FEN.
The FRC-type notation only indicates the position of the Rook. The King does not need to be indicated in games where there is only one King, and the positioning of the Rook w.r.t. the King implies where both will end up. This means we would have to devise some other notation for cases where the King ends elsewhere. I am not sure if it would make sense to generalize so much as to allow castlings where the Rook does not end up next to, and on the other side of, the King. There is of course no limit to the craziness of moves that could be called a castling, but one would have to put a limit somewhere, to not fall victim to the 'maximum-flexibility, minimum-usefulness' principle. I would probably implement it like this: in the castling-rights field of the FEN, the letter indicating the file of the Rook that can castle (which does not necessarily have to be an orthodox Rook, as the FEN makes it obvious what piece is standing there) can be followed by a digit, indicating the number of squares the King ends up away from the corner. The final position of the Rook would be implied by this. Example: normal King-side castling rights could be indicated by H1. The 1 would be the default (on an 8x8 board), and could be omitted for upward compatibility with Shredder-FEN. In Capablanca Chess the opening would have castling rights A2J1a2j1, equivalent to AJaj (or KQkq). 
Symmetric castling rights like in Janus Chess would be indicated as A1J1a1j1, or A1Ja1j when deleting the redundant defaults. Multiple castling rights to the same side could exist next to each other: A2A1J2J1a2a1j2j1 would allow short as well as long castling in both directions. For transmitting the castling moves, one could use King captures own Rook. In games where the same Rook could be used for castlings with multiple King destinations, one could give the King step to its final destination instead. If this could also be a normal King move, one could append an r as 5th character to identify it as a castling, using the syntax that would otherwise be used for promotions. In PGN one could use similar strategies to indicate non-orthodox castlings, and use suffix =R on a King move to specify castling. I think this covers most cases encountered in practice. Problems occur only if there would be multiple castlings with the same Rook, and at the same time castlings with a Rook on the left would have the same King destination as those with a Rook on the right. Because the move notation cannot indicate at the same time which Rook to use and specify where the King should go. But this seems too outlandish to worry about. To cover cases where K and R do not end up next to each other, we could put a second digit in the FEN castling-rights specifier for the final position of the Rook w.r.t. the corner. (I.e. normal king-side castling = h12.) This obviously could lead to problems on very wide boards, that require multiple digits to specify distance to the corner. So perhaps it is better to separate King and Rook destination by a period (h1.2). Indicating the move would be a problem, as two destinations might have to be specified to unambiguously identify the move (e.g. if all castlings are allowed where the King steps any number of squares >=2 towards a Rook, and then the Rook can go to any square the King passed over.) One could just specify King and Rook final squares (i.e. 
O-O = g1f1), but in FRC there is no guarantee that this cannot be a normal move. In which case the 'r' could again be used as 5th character, to indicate castling. In PGN we could reserve a character used instead of the piece indicator for castlings, say 'O'. Conclusion: it is difficult to design a notation that would be general and universal; different games seem to need different ways to specify the moves and rights.
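As a sanity check of the castling-rights notation proposed above (Rook-file letter, optional digit for the King's distance from the corner, optional '.digits' for the Rook's destination), here is a hypothetical parser sketch; the field structure follows the proposal, but the code and its names are mine:

```python
import re

def parse_castling_field(field):
    """Parse the proposed FEN castling-rights field into a list of rights.

    Upper-case letters are White's rights, lower-case Black's. A missing
    King digit defaults to 1; a missing Rook part means 'implied by the King'.
    """
    rights = []
    for m in re.finditer(r'([A-Ja-j])(\d)?(?:\.(\d+))?', field):
        rook_file, king_dist, rook_dist = m.groups()
        rights.append({
            'side': 'white' if rook_file.isupper() else 'black',
            'rook_file': rook_file.lower(),
            'king_from_corner': int(king_dist) if king_dist else 1,
            'rook_from_corner': int(rook_dist) if rook_dist else None,
        })
    return rights

# The Capablanca opening rights from the text:
for right in parse_castling_field('A2J1a2j1'):
    print(right)
```

The extended 'h1.2' form parses with the same pattern, which suggests the period-separated variant composes cleanly with the short one.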
Well, one has to think ahead a little bit to keep the road to future extensions open, and not paint oneself into a corner. This is why I tackle a fairly large number of cases at once. I don't see the uniqueness of the FEN strings as a serious problem; if the logic behind the various systems would allow a certain castling to be described in multiple ways, one can supply an additional rule to specify which method should be used preferentially, e.g. if K or H could be used to unambiguously specify king-side castling, one should use K. In the FEN reader I would not even pay attention to that, and have it understand both, as this is usually easier. An important issue is how much effort one should put into maintaining a unified approach, in which both game state and played variant are unambiguously specified by the FEN. One might wonder if it is sensible to require, say, that a position from Janus Chess and a position from Capablanca Chess should be considered as different positions from the same variant, 'fullchess'. This puts a lot of extra burden on the FEN: for indicating game state, the castling rights have to indicate only which pieces moved. Wanting the FEN to specify the castling method, or other aspects of the rules (e.g. if Pawns can promote to Chancellors, or not), might just be asking for trouble. So perhaps I was overdoing it. It might be more useful to consider variants like Janus or Grotesque as distinct from Capablanca. KQkq could then be used to indicate castling rights in all three cases. Games with more than 2 Rooks could use the Shredder-FEN system without any problem, as long as there is only one King (so that all rights disappear once this King moves). Only in games with multiple Kings AND multiple Rooks would there be a problem. This only leaves move notation. In particular in variants where a castling to a particular side can be performed in more than one way, like in Grotesque. 
A very general way to solve this in PGN would be to provide a mechanism to specify moves that displace more than one piece, by joining the moves with an &. So an alternative to write h-side castling in Grotesque could be Ke1-i1&Rj1-h1 (or in short, Ki1&Rh1). In WinBoard protocol, the moves between engine and GUI are not transmitted in SAN, but simply as FROM and TO square appended to each other, with an optional 5th character to indicate promotion piece (e.g. e7e8q). Perhaps the best system there would be to encode variable castlings by using k or q as the 'promotion' character, to indicate if the K-side or Q-side Rook is to be used, and make the squares indicate the to-square of King and Rook, respectively. These notations would always be recognizable as not indicating promotions, as both the mentioned squares would be on the same rank.
'If the effort isn't too big' is a big if. Normal chess, Capablanca, FRC and CRC are similar enough not to cause too much trouble. Although I consider it already a nasty trait that some of the rules have to be implied by the board size, such as to which pieces a Pawn may promote. If a board width of 10 is taken to imply that Chancellor and Archbishop are allowed, I would consider the problems with Janus Chess or Chancellor Chess already pretty bad. To unify those with Capablanca/CRC would require different letters for their Pawns. In Janus Chess you would have to indicate the deviating castling mode amongst the rights. In the O-i-h system you would always be able to deduce whether the K-side or Q-side Rook is to be used from the ordering of the King and Rook destinations? I guess we could indeed consider it a defining property of a castling that it swaps the order of King and Rook. I am not aware of any exceptions to this; even in FRC/CRC the King is required to be between the two Rooks. So I guess your system is acceptable for the PGN, with the additional preference rule that if there is only one castling possible to the given side, it would be written as O-O or O-O-O. The way the move is transmitted between engine and GUI in WB protocol is a matter specific to the WinBoard GUI. And WinBoard does generate the list of allowed moves itself; there is no way in WB protocol to request it from the engine. As this type of castling with multiple King and Rook destinations is about as crazy as they get, anticipating this format would probably be enough to cover everything. Even normal castling requires the GUI to recognize castlings, and know which Rook to move and where. (This caused problems when I had Fairy-Max play Cylinder Chess, as a King crossing the side edge was considered a castling, and led to the displacement of a second piece on the display board!) 
In fact, with the assumption that the relative orientation of King and Rook destination squares implies which Rook has to be used (and even if there are several Rooks on that side, only the one nearest to the King could be involved in castling), there is no need to convey any information in the 5th character other than that it is a castling. So an O here would be quite convenient, as promotion pieces have to be lower case in WB protocol. For a really dumb interface (like my battle-of-the-Goth javascript viewer) it is necessary to fully specify from- and to-square of each piece that is moved separately. So there I transmit O-O as e1g1h1f1 and e4xf5 e.p. as e4f5e4f4.
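Decoding such a fully specified move string is then trivial; a sketch assuming single-character file and rank coordinates, with a function name of my own invention:

```python
def split_move_pairs(move):
    """Split a multi-piece move string into (from, to) square pairs.

    Every displaced piece contributes four characters: file, rank, file, rank.
    """
    if len(move) % 4 != 0:
        raise ValueError('expected a multiple of 4 characters')
    return [(move[i:i + 2], move[i + 2:i + 4]) for i in range(0, len(move), 4)]

print(split_move_pairs('e1g1h1f1'))  # [('e1', 'g1'), ('h1', 'f1')]
```

A viewer this dumb needs no rule knowledge at all: it just applies each (from, to) displacement in order.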
'I regard this to be a very big weakness, ...' It is not that bad, if you realize that there is a different mechanism in WB protocol to achieve the same thing: by sending a move to the engine, you implicitly ask the engine if this move is valid. If it isn't, the engine ignores it, and sends an 'Illegal move' message to the GUI. The GUI then undoes the move on the display, and relays the illegal-move message to the user, or engine opponent. As most engines are not considering the possibility that their moves might be illegal, the game then usually hangs, however. But the GUI doesn't have to know about the rules in this case. There is a big advantage of having the GUI understand the game, though, when you cannot rely 100% on the engines. Then the GUI itself can judge the legality of the moves sent to it, make the engine that does the illegal move forfeit (you don't want to give engines the power to make their opponent forfeit...), and forfeit engines that do make false illegal-move claims. And in WinBoard_F this functionality can be switched on optionally in many variants. The other variants can still be played, with legality checking off.
| if there is only ONE castling move possible at all, it will be
| matched by ANY string starting with 'O'.
Why would you want that? It still would not help to understand all games of variants that interchange the O-O and O-O-O notation. I am not sure why you would want to allow translation in PGN, but not in FEN. In WinBoard I added the external option (/pieceToCharTable) to alter the piece indicators. These would then be used consistently in both FEN and PGN, without any internal tags. The most common case where this is needed is for reading PGN files from a different language. The problem with the tags you propose is that they still would have to single out one language as 'standard', and other languages will never agree with that. So I think it is better to make that specification external. 
After experience with some less-known variants (e.g. Knightmate) I realized that this translation should be independently settable for the external interface (saving and loading FEN and PGN) and each engine. This to play engines using different 'standards' against each other. It might even be desirable to have a separate translation table for external reading and writing, so that the GUI could be used to convert PGN files in one language to another one. I don't think the J-system you use for Janus Chess is acceptable: it only works for opening positions. It should be possible to play from arbitrary positions specified by FEN. And you have no way to indicate that Pawns cannot promote to C in positions where there is no A/Janus on the board. The only logical solution would be to use the J for Janus Pawns, different from Capablanca Pawns by not being able to promote to C. But for Chancellor-Chess positions without a Chancellor, you would then need yet another letter for 'Chancellor Pawns'. And this is what I don't like at all. In Seirawan Chess you would need Capablanca Pawns on an 8x8 board, different from normal Pawns, so you would also need different indicators for Pawns in normal and Capablanca. Or use different letters for Capablanca Pawns on 8x8 and 10x8 boards, to preserve compatibility with existing 10x8 FENs. It quickly proliferates, and becomes very awkward... Much better to just consider them different variants.
I thought the Arabic Queen (i.e. in Shatranj) was called Ferz (= general), not Wazir (= Grand Vizier). In WinBoard_F I took the turban as the symbol for representing a Wazir. This is based on the observation that the standard symbols for the pieces of FIDE Chess are mostly head covers symbolizing the profession of the piece. (Animals and buildings are necessary exceptions to this.) And a turban seemed fitting for a Grand Vizier. For the Ferz WinBoard_F uses a Chinese mandarin cap. This is because the Mandarins / Ministers / Advisors of Xiangqi are basically Ferzes that are not allowed to leave the palace. WinBoard_F uses these Ferz and Wazir symbols also in Shogi, for Silver and Gold, respectively. The latter can be seen as Ferz and Wazir augmented with one or two forward moves. (In Shogi all pieces move just a bit differently, also the Knight and Pawn, which are represented by their standard symbol.)
| If 10x8 (10 wide, 8 deep) which is what I assume is the subject
| is dead, it is because you can't find the board anywhere.
This is not true at all. 10x8 boards and Chess sets are even sold commercially, and a lot of people play 10x8 variants. There are even internet servers dedicated to it. It is just that it is not allowed on this forum to mention where. This has more to do with being brain dead than with the game being dead, though. Where I live, virtually every Chess board has a 10x10 board on the back (for playing draughts). For instance, I have a very nice one where the squares are wood inlays of light and dark wood. If I want to play a 10x8 game, I can simply cover the two back ranks with a piece of cardboard or clip a small wooden plank over it.
I think this page does a very poor job in describing Falcon Chess compared to the compact description other CVs get on these pages. And this for the addition of only a single new piece, whose move rules could have been described (within the context of what can be supposed common background knowledge for visitors of these pages) in a single sentence: 'The Falcon is a lame (1,3)+(2,3) compound leaper, which follows any of the three shortest paths to its destination consisting of orthogonal and diagonal steps, and which can be blocked on any square it has to pass over to reach its destination.' That, plus possibly a diagram of the Falcon moves and a diagram of the array, should have been sufficient. As it is now, I could not even find the rules for promotion amongst the landslide of superfluous description. Note that my rating only applies to the page, not to the game. I haven't formed an opinion on that yet; it could be the greatest game in the World for all I know. I have a question, though: What exactly does the patent cover? As a layman in the field of law, I associate patents with material objects which I cannot manufacture and sell without a license. Rules for a Chess variant are not objects, though. 
So which of the following actions would be considered infringements of the Falcon patent, if performed without licensing:
1) I play a game of Falcon Chess at home
2) I publish on the internet the PGN of a Falcon Chess game I played at home
3) I write a computer program that plays Falcon Chess, and let it play in my home
4) I publish on the internet the games this program played
5) I conduct a Falcon Chess tournament with this engine in various incarnations as participant, and make it available for live viewing on the internet
6) I post my Falcon-Chess-capable engine for free download on my website
7) I post the source code of that engine for free download on my website
8) I sell the engine as an executable file
9) I sell a Staunton-style piece set with 10 Pawns, orthodox Chess men, and two additional, bird-like pieces
10) I sell a set of small wooden statues, looking like owls, falcons, elephants and lions, plus some Staunton-style pawns, plus a 10x8 board.
And more specifically: would it require a license to equip my engine Joker80 to play Falcon Chess (next to Janus, Capablanca and CRC) and post it on the internet for free download? If so, could such a license be granted, and what would be the conditions?
On a more chessic note: Why are you saying the Falcon does not have mating potential? I ran a tablebase for the Bison (a non-lame (1,3)+(2,3) compound leaper), and the KBiK ending on 8x8 is generally won (100.00% with wtm, 80% with btm including King captures, longest mate 27 moves). I think it should make no difference that the Falcon, unlike the Bison, is lame: to block any Falcon move, at least 2 obstacles are needed, and this is very unlikely to ever occur with only two other pieces (the Kings) on the board. In the mating sequence I looked at, the Bison is mainly shutting in the bare King from open space, the attacking King closing off another direction. I also cannot imagine that expanding the board size from 8x8 to 10x8 would make any difference. Usually it is the narrowest dimension that counts. So I really think King+Falcon vs King is a totally won end-game on 10x8, although I could not exactly say in how many moves.
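For concreteness, the (1,3)+(2,3) move pattern discussed above is easy to enumerate. Here is a minimal sketch of my own (the function name and interface are illustrative, not taken from any engine) that counts the target squares of such a compound leaper on a given board:

```python
# Sketch: enumerate the destinations of a (1,3)+(2,3) compound leaper (the
# Bison; also the Falcon, ignoring lameness). Names here are illustrative.

def leaper_targets(x, y, leaps, files=8, ranks=8):
    """All on-board destinations for a piece leaping by the given (dx, dy) shapes."""
    targets = set()
    for dx, dy in leaps:
        # each leap shape yields up to 8 sign/axis permutations
        for sx in (-1, 1):
            for sy in (-1, 1):
                for tx, ty in ((dx, dy), (dy, dx)):
                    nx, ny = x + sx * tx, y + sy * ty
                    if 0 <= nx < files and 0 <= ny < ranks:
                        targets.add((nx, ny))
    return targets

bison = [(1, 3), (2, 3)]
print(len(leaper_targets(4, 4, bison, files=10, ranks=8)))  # 16 in mid-board
```

A mid-board Bison thus attacks 16 squares; in a corner only 4 of them remain on the board, which is part of why driving the bare King to the edge works so well.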
Incredible! After four posts of extremely verbose and incoherent ranting you managed to address exactly zero of my questions / issues. So let me repeat the most important ones: 1) Am I allowed to include Falcon Chess as a variant that Joker80 can play, and offer it for free download? 2) To which pieces can a Pawn promote in this game? 3) Does, according to you, a single Falcon have mating potential against a bare King on a 10x8 board? And on 8x8? Note that the fact that this page is a copy of a patent application, which by necessity has to be elaborate, is in no way an excuse. No one forces you to publish the full patent application here. In fact patent applications are utterly unsuitable as contents on chessvariants.com. They are meant for lawyers.
You talk a lot, but you say very little. I have no idea what Game Courier is, and I see no reason why anything that should be said between us cannot be said here. If you see this CV page as an advertisement for your patented game, you would do well to declare your licensing policy here. That would be much more useful than describing the game in excruciating detail, and boasting how many variants the patent covers. The latter just scares people away from the variant. But you made it clear you don't want me to make an engine that plays your game. Well, so be it. There are plenty of other variants that are not patented. Even the patented UNSPEAKABLE variant does allow me to implement the game in an engine. But if you want to use your patent to prevent anyone from playing the game, that is up to you... I am not sure what better place there could be to discuss the KFaK end-game than here, or why the mating potential of a piece that (due to the patent) can only occur in this variant would be 'of lesser interest'. What do you think the CV pages are for, really? To talk about Chess, or to talk about patents?
| Who would waste time on Centaur(BN) and Champion(RN) anymore? No one
| is interested. Knight was not meant to be compounded but must always
| stand alone.
I am in total disagreement. The Archbishop (BN) is one of the most elegant and agile pieces ever designed. It is simply marvelous to see it in action, dazzling the opponent. To do justice to its play, this piece should be renamed 'Dancer'.
Since there is also a table of piece values on this page, I should point out that playtesting with almost any program shows that the Archbishop values given here are way too low: A+P typically beats Q, and A+A+P beats C+C, in any game phase. See the discussion on the page of the Aberg variant. Derek Nalls has in the meantime revised his piece values accordingly.
The logic of FRC castling is that the outcome of the castling in a shuffled variant will be the 'normal' location of K and R, i.e. the one they get by performing normal castling from an unshuffled variant. This could similarly be applied to shuffle variants of games with free castling. Just pick any of the final positions that the castling type with centralized King and corner Rooks could give.
The table was actually given in my first post in that thread. It might be buried very deep under what unfortunately followed. I am sorry about that; it arose from Derek taking it as a personal offense when I pointed out that the values he derived from elaborate theoretical arguments were no good in practice. So let me repeat the table:
P = 85
N = 300
B = 350 (B-pair bonus = 40)
R = 475
A = 875
C = 900
Q = 950
I usually normalize on the Q value, as Pawns come in many forms (doubled, isolated, backward, passed, edge), with extremely different values. So giving a value for the Pawn wouldn't mean a thing if you don't tell at the same time which kind of Pawn. All values above are opening values, where the Pawn is f2/f7 in the opening array. The values were empirically derived from playing 20,000 games starting from opening setups where selected pieces were deleted from the array to create a material imbalance. Rooks are known to be worth a lot more in end-games than in the early opening, so the Rook value might be higher than given here during most of the game.
| Just as Greg Strong was about to finish Falcon Chess for ChessV,
| it is fine to put Falcon in engine free of charge throughout years
| 2008, 2009 and 2010 to play, so long as strictly not commercial
| (unlike standards-degrading Zillions). Please inform what is going on,
| and put the patent #5690334 two or more times about the Rules or
| Board, since ultimately we would like to market Falcon material too.
OK, I will see what I can do. I will let you know as soon as I have made something, and send it to you privately, so that you can judge whether it meets your standards.
Sam Trenholme:
| I think the best way to come up with reasonable piece values is
| to have a computer program play itself hundreds or thousands of
| games of a given chess variant, and use genetic selection (evolution)
| to choose the version of the program with piece values that win the
| most games.
This has been tried many times before (in normal Chess, mainly), with an appalling lack of success. The reason is that even a very wrong evaluation of one of the pieces (say a program that values a Queen at 7.5 instead of a correct 9.5) still only leads to bad trades in a minority of the cases, like 10-20%. This is because it requires complex exchanges (like Q vs R+B) rather than simple 1:1 exchanges, and these simply do not present themselves very often in games. In the other 80-90% of games the Queens will be traded against each other, which will always be a neutral trade to each program, no matter how much they differ in Queen value. When only 10% of the games is affected by the piece-value difference, while the 90% with equal trades will have a 50-50 outcome, that latter set of games still produces statistical noise, which is added to the noise in the overall result score, while it dilutes the systematic bias caused by the different evaluation. If, after a wrong trade (say Q vs R+B) induced by the faulty Q value, the side with the Q left would have a 70% winning chance (20% above par), the total score would be only 52% (2% above par). To detect this score excess with the same relative statistical accuracy as the 20% excess would require 100x as many games. (So 10,000 instead of 100.) The situation could be slightly improved if one would SELECT games before analysis, throwing out all games with equal trades (Q vs Q). Then you eliminate the random noise produced by them from the result, and would only look at the sample with unequal trades (with 20% score excess). You would still need about 100 of those, but now you only have to play 1,000 games to acquire them.
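The back-of-envelope arithmetic behind this dilution argument can be checked in a few lines. This is a sketch of my own, assuming the score of a single game has a standard deviation of roughly 0.5 points (win/loss dominated) and that we want a ~2-sigma detection:

```python
# How many games are needed to resolve a given score excess above 50%?
# Assumption: per-game score standard deviation ~0.5, 2-sigma confidence.

def games_needed(excess, sigma_per_game=0.5, n_sigma=2.0):
    """Games required so that n_sigma standard errors fit inside the excess."""
    return (n_sigma * sigma_per_game / excess) ** 2

print(games_needed(0.20))  # ~25 games for a 20% excess
print(games_needed(0.02))  # ~2500 games for a 2% excess: a factor 100 more
```

The factor 100 between the diluted 2% excess and the undiluted 20% excess matches the ratio quoted above; the absolute numbers depend on the assumed per-game variance.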
Problem is that judging which games were affected by the piece value under study is a bit subjective, as Q vs Q trades do not always occur through QxQ, ...xQ combinations, but sometimes are part of a larger exchange with intermediate positions with material imbalance (not affecting the engine decision, as they were within the horizon of the engine search). This is why I adopted the methodology of forcing the material imbalance under study into the game from the very beginning. ('Asymmetric playtesting' in Derek's terminology.) All games I play are then relevant. Even if the engine I play with has a completely wrong idea of the piece values, the material advantage it has at the outset (say A vs B+N) will be needlessly traded away in only 10% of the cases. And if both engines share the misconception, that will be still lower, as the opponent would actually try to avoid such trades. So you will have only a slight suppression of the excess score, and very little noise added to it.
| I could do it myself, but I need a chess variant engine that I can
| set, from the command line, white's and black's values of the pieces
| independently, and then have the variant play itself a game of the
| chess variant.
I would only applaud this. In fact the engines you request do exist, and can be downloaded as free software from my website:
* Joker80 allows setting of the piece values by a command-line argument (a feature requested by Derek, as discussed below in this thread), but is limited to 10x8 variants with the Capablanca pieces.
* Fairy-Max allows implementation of (nearly) arbitrary fairy pieces, and setting of their values, through a configuration file (fmax.ini) that can be changed with a simple text editor like Notepad. (This is because the options here are too elaborate to fit on the command line.)
The format of the piece description is admittedly a bit cumbersome (that is, the description of the way it moves, especially if it is a complex move like that of a Crooked Bishop), but the fmax.ini that is provided for download includes many examples for the more common fairy pieces. And changing the piece value is absolutely trivial. Furthermore, I am always available to provide assistance.
Reinhard:
| P.S.: Thus it would be best to present a short and convincing argument.
If you don't consider the fact that 'the side having piece A beats the side having piece(s) B 90% of the time' a convincing argument to value A higher than B, I don't really see what could convince you. But the point really is that Derek ASKS you to provide such a version of SMIRF to help him conduct an experiment he thinks is interesting. So it should not really matter whether the piece values he requests are CORRECT or not, because this is exactly what he is trying to test. The question is whether you want to HELP him search for the truth, by providing him what he needs to conduct this search...
Greg Strong:
| The current state of ChessV?
Hi Greg! Good to see you back here! What would be very interesting to me is to have a version of ChessV that just plays as a console application rather than having its own graphical interface. Preferably using WinBoard protocol, of course, but I would be happy with anything, no matter how primitive. I wouldn't even mind if the graphical interface stays, as long as ChessV would also print the move it makes on its standard output, and read and accept a move from its standard input. If it could do those things, I would be able to write an adapter to run it under WinBoard against other engines. Would this be feasible?
| For one thing, it doesn't anticipate forced repetition draws in
| the appropriate way; even if it is winning by quite a margin,
| it won't break the repetition to save its advantage.
I can vouch from my experience with micro-Max that this is extremely important. It is almost impossible to quantitatively judge the performance of the engine if it can be tricked into rep draws, to the point where very clear improvements do not affect the score at all. In uMax I could fix 95% of the problem by recognizing returns to positions that already occurred before in the game history, and evaluating those at 0.00. That it cannot really plan (or avoid) forced repetitions that occur entirely in the tree is only a minor problem, as it does not occur too often that repetitions can be forced.
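The uMax fix described above amounts to only a few lines of logic. This is a hedged sketch, not the actual micro-Max code; `position_hash` and `static_eval` are stand-in names of mine:

```python
# Minimal sketch of the micro-Max style repetition fix: any node that returns
# to a position already played in the game is scored as a dead draw (0.00),
# so a winning engine steers away from repetitions and a losing one toward them.

game_history = set()          # hashes of all positions that occurred in the game

def evaluate_node(position_hash, static_eval):
    """Score a search node, treating returns to game history as draws."""
    if position_hash in game_history:
        return 0.0
    return static_eval
```

Note that this only catches repetitions of positions from the actual game history, not repetitions occurring entirely inside the search tree, which is exactly the 95%/5% split mentioned above.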
Some more empirical data for those who are working on ab-initio theories for calculating piece values: I determined the piece values of several fully symmetric elementary and compound leapers, with various numbers of target squares, in the context of a normal FIDE Chess set in which the extra pieces were embedded in pairs, on a 10x8 board. The number of target squares varied from 4 (Ferz, Wazir) to 24 (Lion), the length of the leap limited to 2 in one dimension. From this I noticed that the empirical values for pieces with the same number of target squares tend to cluster quite closely around certain values: 140, 285, 630 and 1140 centiPawn for pieces with 4, 8, 16 and 24 targets, respectively. These values can be fitted by the expression value = (30 + 5/8*N)*N, where N is the number of target squares (when unrestricted by board edges). Then I went on to test how the value of a piece that is nearly saturated with moves (so that taking away 1 or 2 hardly affects its overall manoeuvrability), namely the Lion, which in this context is a piece that reaches all targets in the 5x5 area on which it is centered, is affected by taking some moves away. In taking away moves, I preserved the left-right symmetry of the piece, so that moves not on a file were disabled in pairs. This left 14 distinct leap types, which I disabled one at a time. I then played a pair of the thus handicapped pieces against a pair of unimpeded Lions (plus the FIDE array present for both sides). The resulting excess scores in favor of the unimpeded Lions when disabling the various leaps were:
forward:
12.5% 15.1%  8.8% 15.1% 12.5%
11.0% 14.8%  5.9% 14.8% 11.0%
 6.8%  5.0%   -    5.0%  6.8%
 7.9%  7.8%  5.4%  7.8%  5.4%
backward:
 7.6%  9.1%  5.4%  9.1%  7.6%
So disabling both forward (2,2) leaps (fA in Betza notation) reduced the winning chances by 12.5%, etc. Pawn odds produces approximately 12% excess score, so the two fA leaps marginally contribute a value of 100 cP to the Lion.
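The quoted fit is easy to tabulate against the measured clusters. A small sketch (the fit reproduces the clusters to within roughly 5%; it is an empirical fit, not a theory):

```python
# The fit value = (30 + 5/8 * N) * N centiPawn for a symmetric short-range
# leaper with N (edge-unrestricted) target squares, vs. the measured clusters.

def leaper_value(n_targets):
    """Fitted opening value (centiPawn) as a function of target count."""
    return (30 + 5 / 8 * n_targets) * n_targets

empirical = {4: 140, 8: 285, 16: 630, 24: 1140}
for n, measured in empirical.items():
    print(n, leaper_value(n), measured)
```

The fitted values come out as 130, 280, 640 and 1080 cP for N = 4, 8, 16 and 24, so the deviation is largest (about 5%) for the Lion itself.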
Note the values were obtained from 1000-game matches, and thus have a statistical error of ~1.5% (12.5 cP). Also note that the numbers on the vertical symmetry axis have to be multiplied by at least a factor 2 for fair comparison with the other numbers, as in these tests only a single leap was disabled, as opposed to two in the others. As a general conclusion, we can see that forward moves are worth more (by about a factor 5/3) than sideways or backward moves. 'Narrow' leaps seem on average to be worth a little bit more than 'wide' leaps. I am not sure if the scores above can be taken at face value as indicators of the relative value of the particular leap in other pieces as well; it could be that there are some cooperative contributions here that are included in the measured marginal values, as all other leaps are always present. E.g. the forward narrow Knight leaps are worth most, but perhaps this is because they provide the piece with distant solo mating potential against a King on the back rank. Perhaps the observed piece values should be corrected for such global properties (of the entire target pattern) first, before ascribing the value to individual leaps. Note, however, that all the marginal scores add up to 123%, which is about 10.25 Pawns, not so far away from the empirical total value of the Lion. This suggests that cooperative effects can't be very large on average. Next I intend to figure out how much of the value of each leap is provided by its capture aspect, and how much by the non-capture aspect, by disabling these separately. For the distant leaps, I furthermore want to know how much the value changes if these are turned into lame leaps, blockable on a single intermediate square. Note that the Xiangqi Horse (Mao) drops a factor 2 in value compared to an orthodox Knight by being lame. I also want to investigate if the lameness is worse if the piece has no capture to the square on which the move could be blocked (a cooperative effect).
Why do you call this piece a Falcon, btw? A falcon is a flying creature, which makes it a very illogical name for a piece that can be blocked from reaching its destination by ground-based troops! Octopus would have been a more apt name, as the piece seems to have distinct tentacles that can slither through openings in the crowd, to attack what is at the other side. With a bit of imagination (considering neighboring (3,1) and (3,2) as one waving tentacle tip) there are even eight!
Reinhard: Why is it relevant what you like, for giving Derek what he wants? He would not ask for it unless HE liked it. You seem to deny other people what they want/need/like because it is different from what you like. Just add 2 Pawns to the value of any Archbishop. No matter what the rest of your evaluation is, that can't be that difficult? If you think the evaluation becomes totally nonsensical because of this, that is Derek's problem.
Sam Trenholme:
| What is your experience with how being colorbound affects the value
| of a short range leaper?
I never tried measuring heavily 'challenged' pieces like the Alfil or Dabbaba. So I can only speak for color-bound pieces that can still access 50% of the board, like Bishop, Ferz, Camel, FD. My experience is that, when I measure those in pairs of opposite color, their value hardly suffers. A pair of FDs was worth almost as much as a pair of Knights (580 vs 600). But in analogy to Bishops, the value of such a pair should be split into a base value and a pair bonus. A good way to measure the pair bonus seems to be playing the two color-bound pieces on the same color against a pair on different colors. At least for the Bishops this worked quite well, using Joker. Problem is that Fairy-Max is really a bit too simple to measure a subtle effect like this, as its evaluation does not include any pair bonuses. In micro-Max, for orthodox Chess, I simply make the Bishop worth more than a Knight, to bias it against B vs N trades. Although this makes it shy away from B vs N trades even with only a single Bishop, for no justifiable reason, this is not very harmful. Unfortunately, this trick does not make it avoid trading Bishops of unlike color against Bishops of like color. And when both engines see these as a perfectly equal trade, such trades become very likely, wasting the advantage of the pair. I guess I could fix this by programming the Bishops of either side as different pieces, and giving the Bishops of the side that has the pair a larger base value. (And similarly for other color-bound pieces.) I have not tried this yet. Note that one should also expect cross-type pair bonuses, e.g. an FD plus a Bishop are worth more if they are on unlike colors. I am also not sure how to calculate pair bonuses if there are more than 2 color-bound pieces on the board for each side. E.g. with 4 Bishops, two on white squares and two on black, do I have two pairs, or four pairs?
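A proper pair-bonus term, as opposed to the base-value trick, would look at square colors rather than piece counts. A minimal sketch of my own (not Fairy-Max or micro-Max code) of awarding the bonus only when a side covers both square colors:

```python
# Sketch: award the color-bound pair bonus only if a side's bishops (or other
# color-bound pieces) stand on BOTH square colors. Square color = (file+rank)%2.

def pair_bonus(bishop_squares, bonus=40):
    """Return bonus (centiPawn) if the given pieces cover both square colors."""
    colors = {(f + r) % 2 for f, r in bishop_squares}
    return bonus if len(colors) == 2 else 0

print(pair_bonus([(2, 0), (5, 0)]))  # c1 + f1: opposite colors -> 40
print(pair_bonus([(2, 0), (4, 2)]))  # both on the same color -> 0
```

This version sidesteps the like-color trading problem, since the bonus disappears as soon as both survivors stand on the same color; how to extend it to 3 or 4 color-bound pieces per side is exactly the open question raised above.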
I currently believe Betza's conjecture as a working hypothesis: that as long as you have one piece of every color-class, the total value of the set does not suffer from the color-boundness. But I haven't tested 8 Alfils per side, and I have no idea how much the value of the set decreases if you have only 4 left. There could be a term in the evaluation that is quadratic in the number of Alfils. All this can in principle be tested, but a piece with 4 targets, like the Ferz, is not worth much to begin with (~150 cP on 8x8). The Alfil is most likely not better, even in a dense pack. And pair-bonus effects are usually again a small fraction of the base value, and might be as low as 20 cP. It requires an enormous number of games to get such a small difference above the noise threshold.
The Bison definitely has mating potential on an 8x8 board. Denoting the Bison by Y: I have built a tablebase for the KYK end-game, and it is 100% won for white to move. (With black to move there are of course positions where the bare King captures an undefended Bison on the first move, and these are then drawn.) The longest mate against best defense takes 27 moves. I cannot build tablebases for other boards yet, but I adapted Joker80 so it would move Knights like Bisons. If I let it think a few min/move, it does find mate in 20 or so in all positions where the bare King is not too well centralized (and the white King is). As it is rather easy to drive the bare K out of the center with K+Y, this makes it likely that KYK is also won on 10x8. If I give the winning side a time-odds handicap of a factor 100 (40/60 vs 40/0:36), so that it searches only 9-12 ply where the defending King searches 22-28 ply, the bare King starting from w:Ke1,Yg1 b:Ke8 gets manoeuvred into a mated-in-31 position quite rapidly (without the K+Y side knowing it yet), after which it sees the mating net being tightened until the winning side finally gets a mate-in-12 within its horizon. I couldn't say anything about 12x12. Note that the Bison is equivalent to the patented Falcon in these games, as there is not enough material on the board to block the Falcon moves.
George Duke:
| Right, that paragraph could be improved, let's see. That was written
| in late 1996, when copyright mailed in USA, and not revised for the
| CVP 2000 article. If one King and Falcon stand on own back rank,
| and other King at its back rank, with no other pieces on board, no
| checkmate is possible with good play.
I did some more tests using a converted Joker80 engine, and it seems that on a 10x8 board this statement is plain wrong. Joker has no difficulty at all in checkmating a bare King with King + Falcon, even if they all start on their own back rank (or even if the bare King starts in the center). Even if I let the defending side search 100x longer, making it search ~10 ply deeper, so that it sees the mate coming long before the winning side does, and would avoid it if possible.
David Paulowich:
| Falcon Chess has the opposite problem: I have not seen anyone state
| that King and Falcon can force a lone King into a corner.
OK, so I am the first then. ;-) Even an engine with a comparatively shallow search has no problem driving a bare King into a corner with King + Falcon, as long as it knows that it is bad for a bare King to be closer to a corner. Even if the defending side enormously outsearches it. This applies to 8x8 boards (where there is ironclad proof through an end-game tablebase) as well as 10x8 (where it is based on time-odds playtesting). This page really needs thorough revision. Apart from poor presentation, some of the statements in it are just plain false, or at least very unlikely to be true...
Oh, and since there is no e-mail address in my profile on this discussion board, for people that want to contact me privately: I can be reached under user name h.g.muller, with provider hccnet.nl
I can confirm that this piece has mating potential on an 8x8 board. My tablebase builder says for the King + Carpenter vs King end-game that it is almost always won if the side with the Carpenter has the move. There are only 196 exceptions to this (out of ~250,000 positions), where the Carpenter is under diagonal attack in a corner and its King is too far away to protect it after it moves. The longest mate against optimal defense is 31 moves. There are 100 such positions, e.g. w:Ka1, Carpenter g7, b:Kf6. I am not sure what the interest of 12x12 boards is. Perhaps I should modify my EGTB generator to handle 16x16 boards or even 32x32 boards. The current version can do up to 5 men on 8x8, but with the same memory usage it could still handle 3 men on 32x32. And I can always limit it to a subset of the board.
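The memory claim in that last remark checks out exactly: a naive tablebase index has one slot per placement of each man, so the index space is (board size)^(number of men). A quick sketch:

```python
# Check: a naive EGTB index needs (files*ranks)**men slots (ignoring symmetry
# reductions), so 5 men on 8x8 and 3 men on 32x32 need the same index space.

def index_space(files, ranks, men):
    """Number of raw piece-placement indices for a naive EGTB."""
    return (files * ranks) ** men

print(index_space(8, 8, 5))    # 64**5  = 1073741824
print(index_space(32, 32, 3))  # 1024**3 = 1073741824
```

Both come to 2^30 entries, which is why the same memory budget covers either configuration.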
The Kangaroo is yet another Knight + short-range-leaper compound with mating potential: King + Kangaroo vs King is generally won on all boards up to 10x10 (on 12x12 it is usually a draw). On 8x8, only 192 positions (out of ~250,000) are not won with white to move: when the Kangaroo is on a corner square and attacked by the bare King diagonally, and its own King is too far away for the Kangaroo to leap into its safe haven. The longest mate against perfect defense on 8x8 is 35 moves. There are 260 such positions, e.g. w:Kb1, Kangaroo a2, b:Kb3 (white to move).
Now that there is talk about how to attract more attention for Chess Variants, perhaps the following is an idea as well. It could be implemented next to, and independently from, organizing matches with GMs. We could put some pages on this website with live broadcasting of automated games of a few selected CVs between computer programs, say at 10 or 5 min/game, so that people can watch and get an idea of how the game is played. To get an impression of what I am thinking of, see http://home.hccnet.nl/h.g.muller/goths.html . In my experience, people that say they are not interested in Chess variants change their opinion quite easily if they actually see the variants in action. Watching Chess-like blitz games has a hypnotic and addictive effect on people anyway; they can't help being curious about what will happen next. The demo above is just replaying a game I uploaded to the website at my provider's server, and there is no game going on at the moment, so the moves are not updated. If I posted the same page on my PC at home, where I have a game running, anyone clicking a link to the viewer page would get to see the game in progress being replayed at 1 move/sec, until it reaches the current position. From then on it would wait for the playing engines to append their moves to the file 'moves.txt'. The viewer periodically polls this page, and if there are new moves, it updates the display. The play can be fully automated, a new game starting as soon as the previous one finishes, between the same engines, or in a round-robin tournament of many engines. In the latter case people would be able to request the current standings and cross table of the tourney.
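The polling side of this scheme is simple enough to sketch. The following is an illustrative stand-in of my own (the real viewer is JavaScript in the browser; the file name 'moves.txt' and the one-move-per-line format are assumptions):

```python
# Sketch of the broadcast polling logic: re-read 'moves.txt' periodically and
# report only the moves that have not been shown yet.

def new_moves(path, shown_so_far):
    """Return the moves appended to the file since the last poll."""
    try:
        with open(path) as f:
            moves = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return []          # no game in progress yet
    return moves[shown_so_far:]

# A real viewer would call this every few seconds and animate the new moves.
```

Because the engines only ever append to the file, the viewer needs no synchronization beyond remembering how many moves it has already displayed.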
I have already run such tournaments for several 10x8 Capablanca sub-variants and for Knightmate, and currently am preparing one for 'Nightrider Chess' (a variant that is not even on these pages, but which some existing Chess engines do support, identical to FIDE Chess except that the Knights are replaced by Nightriders). So my idea would be to put a link in a prominent place on the chessvariants.com home page to a 'gallery of demo games'. This would lead to a page with some explanation of what people are going to see, and a bunch of links to computers of people willing to run the games, each a different CV. When people clicked such a link, they would get a game viewer page like the demo above, displayed in their browser. This javascript-driven page, and the file with moves to broadcast the game, would be fetched directly from the gaming PC. (An alternative would be to install the viewer pages on the chessvariants.com server, and have the computers that play the games upload a new moves.txt file each time a move is played. This would require some alteration of the software, though.) Good candidate CVs for live demo games would be: * 10x8 Capablanca variants * 10x8 Falcon Chess * Knightmate * Shatranj * Courier * Nightrider Chess What do you think of this idea?