Comments/Ratings for a Single Item

Piece Values [Subject Thread]
Peter Hatch wrote on Fri, Apr 12, 2002 02:53 AM EDT:
Various and sundry ideas about calculating the value of chess pieces.

First off, instead of picking a magic number as the chance of a square
being empty, it is quite interesting to calculate the value for every
board population between 32 pieces and 3 pieces.  Currently I'm then just
averaging all the numbers, and it gives me numbers slightly higher than
using 0.7 as the magic number (for Runners - Knights and other
single-step pieces are of course the same).  One advantage is that it
becomes easier to adjust to other starting setups - for Grand Chess I can
calculate everything between 40 pieces on the board and 3, and it should
work.  With a magic number I'd have to guess what the new value should be,
as it would probably be higher since the board starts emptier.  One
disadvantage is that I have no idea whether or not the numbers suck. :)
Interesting embellishments could be added - social and anti-social
characteristics could modify the values before they are averaged, and
graphs of the values would be interesting.  It would be interesting to
compare the official armies from Chess with Different Armies at the final
average and at each particular value.  It might be possible to do something
besides averaging, based on the shape of the graph - the simplest idea
would be: if a piece declines in power, subtract a little from its value
but ignore the ending part, assuming that it will be traded off before the
endgame.
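
In code, the averaging idea looks roughly like this (a sketch, not my
actual program; it assumes Betza-style crowded-board mobility, where a
ride of length d needs its d-1 intermediate squares empty, and takes
(64-k)/63 as the chance a square is empty when k pieces remain):

def rider_mobility(p_empty, dirs, size=8):
    # Betza-style crowded-board mobility: a ride of length d needs its
    # d-1 intermediate squares empty, so it counts with weight p^(d-1)
    total = 0.0
    for x in range(size):
        for y in range(size):
            for dx, dy in dirs:
                d, nx, ny = 1, x + dx, y + dy
                while 0 <= nx < size and 0 <= ny < size:
                    total += p_empty ** (d - 1)
                    d, nx, ny = d + 1, nx + dx, ny + dy
    return total / size ** 2  # average over all squares

ROOK = ((1, 0), (-1, 0), (0, 1), (0, -1))

# average over board populations from 32 pieces down to 3; for Grand
# Chess use range(3, 41), size=10 and (100 - k) / 99 instead
averaged = sum(rider_mobility((64 - k) / 63, ROOK) for k in range(3, 33)) / 30
print(averaged, rider_mobility(0.7, ROOK))  # averaged value vs. magic 0.7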

Secondly, I'm not sure what to do with the numbers, but it is interesting
to calculate the average number of moves it takes a piece to get from one
square to another, by putting the piece on each square in turn and then
calculating the number of moves it takes to get from there to every other
square.  So for example a Rook (regardless of its position on the board)
can get to 14 squares in 1 move, 49 squares in 2 moves, and 1 square in 0
moves (which I included for simplicity, but which should probably be left
out), so the average would be 1.75.  I've got some old numbers for this on
my computer which are probably accurate, but I no longer know how I got
them.  Here's a sampling:

Knight: 2.83
Bishop: 1.66 (can't get to half the squares)
Rook: 1.75
Queen: 1.61
King: 3.69
Wazir: 5.25
Ferz: 3.65 (can't get to half the squares)
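
For anyone who wants to recreate these, a rough sketch of the calculation
(breadth-first search from every square; the 0-move start square is
included, and colorbound pieces average only over the squares they can
reach):

from collections import deque

def avg_move_distance(moves, size=8, sliding=False):
    # BFS from each square; average the move counts to every square
    # the piece can reach (the 0-move start square is included)
    def steps(x, y):
        for dx, dy in moves:
            nx, ny = x + dx, y + dy
            while 0 <= nx < size and 0 <= ny < size:
                yield nx, ny
                if not sliding:
                    break
                nx, ny = nx + dx, ny + dy
    total = reached = 0
    for sx in range(size):
        for sy in range(size):
            dist, queue = {(sx, sy): 0}, deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                for nb in steps(x, y):
                    if nb not in dist:
                        dist[nb] = dist[(x, y)] + 1
                        queue.append(nb)
            total += sum(dist.values())
            reached += len(dist)
    return total / reached

ORTH = ((1, 0), (-1, 0), (0, 1), (0, -1))
DIAG = ((1, 1), (1, -1), (-1, 1), (-1, -1))
KNIGHT = ((1, 2), (2, 1), (-1, 2), (-2, 1),
          (1, -2), (2, -1), (-1, -2), (-2, -1))
print(avg_move_distance(ORTH, sliding=True))   # Rook: 1.75
print(avg_move_distance(KNIGHT))               # Knight: ~2.83
print(avg_move_distance(DIAG, sliding=True))   # Bishop: ~1.66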

This concept seems to be directly related to distance.  Perhaps some method
of weighting the squares could make it account for forwardness as well.

Finally, on the value of Kings.  They are generally considered to have
infinite value, as losing them costs you the game.  But what if you assume
that the standard way to lose is to run out of pieces, and that the King
has the special disadvantage that losing it loses you the game?  I first
assumed this would make its value fairly negative, but preliminary
testing in Zillions seems to indicate it is somewhere around zero.  If it
is zero, that would be very nifty, but I'll leave it to someone much better
than me at chess to figure out its true value.

gnohmon wrote on Fri, Apr 12, 2002 10:12 AM EDT:
'First off, instead of picking a magic number as the chance of a square
being empty, it is quite interesting to calculate the value for every
board population between 32 pieces and 3 pieces.  Currently I'm then just
averaging all the numbers,'

I've done that, too. The problem is, if the only reason you accept the
results is because they are similar to the results given by the 
magic number, then the results have no special validity, they mean
nothing more than the magic results. So why add the extra computational
burden?

If, on the other hand, you had a sound and convincing theory of why 
averaging the results was correct, that would be a different story.

'This concept seems to be directly related to distance.' Actually, I think
I'd call it 'speed'. I'm pretty sure that I've played with those numbers
but gave up because I couldn't figure out what to do with them.
Maybe you can; I encourage you to try.

Jianying Ji wrote on Mon, Apr 21, 2008 05:29 PM EDT:
Hear Hear, Joe Joyce,

I guess I will throw in my first two cents on the question of the piece
value quantum. I think the smallest meaningful difference on an 8x8 board
is about a third of a pawn, or about a tenth of a knight. The larger the
board, the smaller the quantum, I believe. Maybe by 12x16 the quantum may
be as large as a pawn, or more. The problem, as alluded to before in the
other thread, is how to empirically test such things.

Joe Joyce wrote on Tue, Apr 22, 2008 02:18 PM EDT:
Reinhard, this is the place for the discussion of piece values here at the
cv.org site. It was started quite a while ago, but has almost no entries. I
guess the discussion from a while back on the cvwiki would also be
relevant.
George, thank you! That thread was started by Mike Nelson on 3/21/04,
about 12,500 comments ago. It's worth reading.
Jianying Ji, your 'argument' comment from the Aberg page is quoted below:

'2008-04-18	Jianying Ji Verified as Jianying Ji	None	
Theoretical considerations ... must be tempered by empirical
experimentation. Below is my theoretical analysis of the C vs A situation.

First let's take the following values:

R: 4.5
B: 3
N: 3

Now the bishop is a slider, so it should have greater value than the
knight, but it is color bound, so it gets a penalty decreasing its value
by a third, which reduces it to that of the knight.

When Bishop is combined with Knight, the piece is no longer color bound so
the bishop component gets back to its full strength (4.5), which is
rookish. As a result Archbishop and Chancellor become similar in value.'
                 ***                 ***
I would argue that your conclusion on the values would be correct on an
infinite board, where the values of the bishop, rook, and queen have all
converged to infinity. [see cvwiki discussion] On an 8x8 board, the
unhindered rook moves 14, and the bishop between 7 and 13. This must act
to push the value back down. So, what counterbalances it? The RN gets
16-22 on an 8x8, and 18-24 on a 10x8. The BN gets 9 in the corner on
either size board, going to a maximum of 21. Can the 4 'forward' attacks
of the BN vs the RN's 3 and its ability to checkmate alone really overcome
the noticeable mobility disadvantage?

Joe Joyce wrote on Tue, Apr 22, 2008 02:30 PM EDT:
Reinhard, I'm posting your values from the wiki for the Minister [NDW]
and High Priestess [NAF]. [These values were calculated by the method he
gives a link to in his last post.] Thank you for the numbers. Would you
say that the values would remain the same or very similar on a 10x10 where
the other pieces increased or decreased in power?

 Values for Minister and High Priestess by SMIRF's method
Scharnagl 4 May 2007, 07:54 -0400

As far as I understood, those pieces are 'close' types. Thus by SMIRF's
method their third value element is always zero, because both first
elements are equal. It results (please verify this) in 8x8 values:
Minister 6+5/7, High Priestess 6+1/28, and in 10x10 values: Minister
6+44/45, High Priestess 6+19/45. Thus a Minister seems to be about 1/2
Pawn unit more valuable than a High Priestess.

[http://chessvariants.wikidot.com/forum/t-8835/piece-comparisons-by-contest]

David Paulowich wrote on Tue, Apr 22, 2008 04:09 PM EDT:
Piece   (S)   (m+M)  Double Average

Pawn     1.    ---    ------
Knight   3.    10     10.500
Bishop   3.    20     17.500
Rook     5.    28     28.000
Queen    9.    48     45.500
Guard    4.    11     13.125

The table above includes a 'Guard', moving like a nonroyal King. Joe Joyce is quite fond of it, and even I have been known to use this piece. The (S) column gives one popular set of standard piece values. The (m+M) column is based on a simple pencil and paper calculation, adding the minimum number of possible moves for the given piece (from a corner square) to the MAXIMUM number of possible moves (from a central square). The Knight, for example, has 2 moves minimum and 8 moves MAXIMUM, giving a total of 10 moves. Other people, with more determination, have precisely calculated a grand total of 336 possible moves from all 64 squares on the board, giving an average value of 5.250 possible moves. Dividing 336 by 32 puts 10.500 in the 'Double Average' column, which is surprisingly close to the previous column. From time to time, I play around with piece values on a cubic playing field with 216 cells, content to use an (m+M) column as my source of raw numbers.
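
For the curious, the last two columns are easy to recompute; a rough sketch (the same pencil and paper work done by machine):

def move_counts(moves, size=8, sliding=False):
    # number of possible moves from each square of an empty board
    counts = []
    for x in range(size):
        for y in range(size):
            n = 0
            for dx, dy in moves:
                nx, ny = x + dx, y + dy
                while 0 <= nx < size and 0 <= ny < size:
                    n += 1
                    if not sliding:
                        break
                    nx, ny = nx + dx, ny + dy
            counts.append(n)
    return counts

ORTH = ((1, 0), (-1, 0), (0, 1), (0, -1))
DIAG = ((1, 1), (1, -1), (-1, 1), (-1, -1))
KNIGHT = ((1, 2), (2, 1), (-1, 2), (-2, 1),
          (1, -2), (2, -1), (-1, -2), (-2, -1))
for name, m, s in (('Knight', KNIGHT, False), ('Bishop', DIAG, True),
                   ('Rook', ORTH, True), ('Queen', DIAG + ORTH, True),
                   ('Guard', DIAG + ORTH, False)):
    c = move_counts(m, sliding=s)
    print(name, min(c) + max(c), sum(c) / 32)  # (m+M), Double Average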

What, if any, sense can we make of these numbers? The last two columns measure piece mobility on an empty board, so they indicate the general strength of each piece in the endgame - which I have found the (S) column well suited to. Note that N + B = R in the Double Average column. No great mystery here: the Knight has 60% of the mobility of the Bishop, while the Rook has 160%. Holding the Bishop at 3 points, this column suggests 4.8 points for the Rook, not an unreasonable choice - some writers assign as little as 4.5 points to the Rook. But nobody values the Knight at 1.8 points! To arrive at the 'standard' values, one must make arbitrary changes in the raw numbers, forcing them towards a desired conclusion. 'Knight-moves' need to be counted as more valuable than the moves made by other pieces, perhaps by a 5:3 ratio. The penalty I am inclined to give the Bishop for being colorbound (therefore limited to half the board) needs to be cancelled out by a matching bonus for the fact that every Bishop move either attacks or retreats. The Rook, with its boring sideways moves, usually attacks only a single enemy piece - also it will have only a single line of retreat after capturing that piece. I love Rooks, but am forced to admit that they are superior to Bishops only because they have many more possible moves, on average. The 3D Rook moves up and down along one axis and sideways along two different axes, making it even more 'boring' than the 2D Rook. I am presently re-thinking the entire subject of piece values for 3D chess.

Here is an idea I had one day: recently Joe Joyce and I have been using the Elephant piece, which can move like a Ferz or an Alfil. Let the Grand Rook move like a Rook or an Elephant and let the Chancellor move like a Rook or a Knight. These two pieces, each adding eight shortrange moves to the Rook, should be nearly identical in value on most boards. But I consider a Grand Rook to be worth around half a Pawn less than a Queen on the 8x8 board - contradicting several statements by Ralph Betza (gnohmon) that the Chancellor and Queen are equal in value. This procedure is an art, not a science, and is even more difficult when working with different boards and new pieces. See my Rose Chess XII for a collection of interesting pieces, inspired by the writings of Ralph Betza, plus some theory of their values on a 12x12 board.


Reinhard Scharnagl wrote on Tue, Apr 22, 2008 04:10 PM EDT:
Well, I recalculated the values for both piece types using my last
published model (which probably is not perfect ;-) ):

High Priestess:
8x8: 6+1/28; 10x8: 6+5/36; 10x10: 6+19/45

Minister:
8x8: 6+5/7; 10x8: 6+3/4; 10x10: 6+44/45

Let me admit that it now seems more appealing to me to scale piece
values no longer to a Pawn normalised as 1, but instead to a Knight
normalised to 3. This remains neutral to the pieces' values relative to
each other, but it seems to create more comparable value series.

The High Priestess' strength is more vulnerable to a decreasing board
size. Values of both types tend to become equal at an unlimited board
size.

Jianying Ji wrote on Tue, Apr 22, 2008 05:40 PM EDT:
Reinhard,

   I quite agree, the knight is a great piece to normalize values to. I
often think the best way to evaluate pieces is to normalize with the
knight at 10 points, which agrees with the chess quantum being a little
less than a third of a pawn. Perhaps some new standard can be worked out
this way.

Joe Joyce wrote on Wed, Apr 23, 2008 02:39 AM EDT:
These are Aberg's values:
A  	 Archbishop  	 6.8
C 	 Chancellor 	 8.7
Q 	 Queen 	         9.0

These are Reinhard's recent values:
High Priestess:
8x8: 6+1/28; 10x8: 6+5/36; 10x10: 6+19/45
Minister:
8x8: 6+5/7; 10x8: 6+3/4; 10x10: 6+44/45
So, for 10x8:
The high priestess comes in at 6.1 vs the archbishop's 6.8 - about a 10%
difference.
The minister comes in at 6.8 vs the chancellor's 8.7, a difference of
over 25%.

Why is the high priestess so close to the archbishop's value, compared to
the minister being noticeably [about 30%] weaker than the chancellor? 

Why is the value of the high priestess and the minister so much closer
together than that of the archbishop and chancellor? This falls in line
with HG Muller's argument, though at the lower value, not the higher
value.  This should imply [at least] something about the 2 types of
pieces, the shortrange leapers vs the infinite sliders, no? But what?

I said I was better at asking than answering questions; these I find
interesting. Now, it's way past my bedtime; good night, all. Pleasant
dreams. ;-)

David Paulowich wrote on Fri, Apr 25, 2008 10:05 AM EDT:

H.G.Muller has written here 'It is funny that a pair of the F+D, which is the (color-bound) conjugate of the King, is worth nearly a Knight (when paired), while a non-royal King is worth significantly less than a Knight (nearly half a Pawn less). But of course a Ferz is also worth more than a Wazir, so maybe this is to be expected.'

Ralph Betza has written here 'Surprisingly enough, a Commoner (a piece that moves like a King but doesn't have to worry about check) is very weak in the opening, reasonably good in the middlegame, and wins outright against a Knight or Bishop in the endgame. (There are no Commoners in FIDE chess, but the value of the Commoner is some guide to the value of the King).'


Derek Nalls wrote on Sat, Apr 26, 2008 07:05 PM EDT:
Since ...

A.  The argumentative posts of Muller (mainly against Scharnagl & Aberg)
in advocacy of his model for relative piece values in CRC are
never-ending.

B.  My absence from this melee has not spared my curious mind the agony of
reading them at all.

... I hope I can help out by returning briefly just to point out the six
most serious, directly paradoxical and obvious problems with Muller's
model.

1.  The archbishop (102.94) is very nearly as valuable as the chancellor
(105.88)- 97.22%.

2.  The archbishop (102.94) is nearly as valuable as the queen (111.76)-
92.11%.

3.  One archbishop (102.94) is nearly as valuable as two rooks (2 x
55.88)- 92.11%.  In other words, one rook (55.88) is only a little more
than half as valuable as one archbishop (102.94)- 54.28%.

4.  Two rooks (2 x 55.88) have a value exactly equal to one queen
(111.76).

5.  One knight (35.29) plus one rook (55.88) are markedly less valuable
than one archbishop (102.94)- 88.57%.

6.  One bishop (45.88) plus one rook (55.88) are less valuable than one
archbishop (102.94)- 98.85%.
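
These six percentages follow directly from the quoted values; a quick
check in Python:

# Muller's CRC values as quoted above
N, B, R, A, C, Q = 35.29, 45.88, 55.88, 102.94, 105.88, 111.76
for label, ratio in (('A/C', A / C), ('A/Q', A / Q), ('A/2R', A / (2 * R)),
                     ('R/A', R / A), ('(N+R)/A', (N + R) / A),
                     ('(B+R)/A', (B + R) / A)):
    print(label, round(100 * ratio, 2), '%')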

None of these problems exist within the reputable models by Nalls,
Scharnagl, Kaufman, Trice or Aberg.  You must honestly address all of
these important concerns or realistically expect to be ignored.

Joe Joyce wrote on Sun, Apr 27, 2008 03:03 PM EDT:
Gentlemen, this is a fascinating topic, and has drawn the attention of a
large audience [for chess variants, anyhow ;-) ], and I'd hope to see
something concrete come out of it. Obviously, many of you gentlemen
participating in the conversation have made each other's acquaintance
before. And passions run high - I could say 'but this is [only] chess';
however, I, too, have had the rare word here or there over chess, so I
would be most hypocritical, besides subtly [snort! - 'only' is not
subtle] putting down what we all love and hate to hear others call
useless. 

What I and any number of others are hoping to get is an easy way to get
values for the rookalo we just invented. Assuming hope is futile, we look
for a reasonable way to get these values. Finally, we just pray that there
is any way at all to get them. So far, we don't have all that many probes
into the middle ground, much less the wilds of variant piece design. 

We use 3 methods to value pieces, more or less, I believe:
 The FIDE piece values are built up over centuries of experience, and
still not fully agreed-upon;
 The software engines [and to a certain extent, the hardware they run on]
that rely on the same brute-force approach that the FIDE values are based
on, but using algorithms instead of people to play the games;
 Personal estimates of some experts in the field, who use various and
multiple ways to determine values for unusual pieces. 

The theoretical calculations that go into each of these at some stage or
other are of interest here. Why? Because the results are different. That
the results are different is a good thing, because it causes questioning,
and a re-examination of assumptions and methods of implementation. 

The questions you should be asking and seriously trying to answer are why
the differences exist and what effects they have on the final outcomes.
Example: 2 software engines, A and B - A plays the archbishop-type piece
better than the chancellor-type piece because there are unexpected
couplings between the software and hardware that lead to that outcome, and
B is the opposite. Farfetched? Well, it boils down to 3 elements: theory,
implementation, execution. Or: what is the designer trying to do [and
why?], what does the code actually say, and how does the computer actually
run it? Instead of name-calling, determine where the roots of the
difference lie [because I expect several differences]; they must lie in
theory, implementation and/or execution. 

Why shouldn't humans and computers value pieces differently? They have
different styles of play. 

Please, tone down the rhetoric, and give with some numbers and methods.
Work together to see what is really going on. Or use each other's methods
to see if results are duplicated. Numbers and methods, gentlemen, not names
and mayhem. I have clipped some words or sentences from rare posts, when
they clearly violated the site's policies. Please note that sticking to
the topic, chess, is a site policy, and wandering off topic is
discouraged. 

Play the High Priestess and Minister on SMIRF or one of the other 10x8
engines that exist, and see what values come up. Play the Falcon, the
Scout, the Hawklet... and give us the numbers, please. If they don't
match, show us why.

Reinhard Scharnagl wrote on Sun, Apr 27, 2008 03:41 PM EDT:
J.J.: '... Play the High Priestess and Minister on SMIRF or ...'

SMIRF still is not able to use non-conventional piece types other than the Chancellor (Centaur) or Archbishop (Archangel). You have to use other fine programs. Nevertheless the SMIRF value theory is able to calculate estimated piece exchange values.

Currently I am about to learn the basics of how to write a more mature SMIRF and GUI for the Mac OS X operating system. Thus it will need a serious amount of time, and I hope not to lose motivation on this. I still have some difficulties understanding some details of Cocoa programming using Xcode, because there are only a few good books on that topic here in the German language. We will see whether this project will ever become ready.

Derek Nalls wrote on Tue, Apr 29, 2008 10:20 PM EDT:
A substantial revision and expansion has recently occurred.

universal calculation of piece values
http://www.symmetryperfect.com/shots/calc.pdf
66 pages

Only three games have relative piece values calculated using this complex
model:  FRC, CRC and Hex Chess SS (my own invention).  Furthermore, I only
confidently consider my figures somewhat reliable for two of these games, FRC (including Chess) and Capablanca Random Chess, because much work has been done by many talented individuals (hopefully, including myself) as well as computers to isolate reliable material values.  This dovetails into the reason that I do not take requests.  I have absolutely no assurance that the effort spent outside these two established testbeds is productive at all.  If it is important to you to know the material values for the pieces within your favorite chess variant (according to this model), then you must calculate them yourself.

Under the recent changes to this model, the material values for FRC pieces
and Hex Chess SS pieces remained exactly the same.  However, the material
values for a few CRC pieces changed significantly:

Capablanca Random Chess
material values for pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

pawn  10.00
knight  30.77
bishop  37.56 
rook  59.43
archbishop  93.95
chancellor  95.84
queen  103.05

Focused, intensive playtesting on my part has proven Muller to be correct
in his radical, new contention that the accurate material value of the
archbishop is extraordinarily, counter-intuitively high.  I think I have
successfully discovered a theoretical basis which is now explained within
my 66-page paper.

All of the problems (that I am presently aware of) within my set of CRC
material values have now been solved.  Some problems remain within
Muller's set.  I leave it to him whether or not to maturely discuss them.

Jianying Ji wrote on Wed, Apr 30, 2008 12:39 PM EDT:
Interesting response by Derek Nalls. It does appear that the archbishop
will be getting a hearing and reevaluation. This will certainly sharpen
things and advance our knowledge of this piece.

On piece values in general, I second Rich with the addition of Hans's
comment, that piece values are for:

1) Balancing armies when playing different armies.

2) Giving odds to weaker players (this is more easily done with
shogi-style variants; with chess-style variants the weaker player
receives a slightly stronger army)

3) To cancel out the first player advantage by giving the second player a
slight strengthening of maybe only one piece.

As for Joe Joyce's Minister and Priestess, my initial estimate was
queenish, but that is an overestimate, and is dependent on the range of
opponent pieces. One interesting feature that may impact value is that the
Minister is more color-changing than color-bound, while the Priestess is a
balance of both. This balance between color-changing and color-bound might
make a nice chess variant theme.

Another general consideration for evaluating piece and army strength is
approachability: how many opponent pieces, from how many squares, can
attack a piece without reciprocal threat.

George Duke wrote on Wed, Apr 30, 2008 03:18 PM EDT:
Another impact on values is the piece mix. Where there are many Pawns and
short-range pieces, Carrera's Centaur and Champion have more value. Where
those unoriginal BN and RN exist with Unicorn (B+NN) or Rococo
Queen-distance pieces, like Immobilizer, Advancer, Long Leaper, even
Swapper, BN and RN then have inherently less value. Put an Amazon (Q+N) in
there, with at least some Pawns for experimental similarity, and BN and RN
fall in value. Then too, change the Pawn-type and change the values. Put
stronger Rococo Cannon Pawns in any CV previously having regular F.I.D.E.
or Berolina Pawns, and any piece value of 5.0 or more, relative to Pawns
normalized to near 1.0, decreases -- on most board sizes. I wonder why
Ralph Betza made only one Comment in this 6-year-old thread. Maybe he
figured, why help out Computers too much? They had already ruined
500-year-old Mad Queen 64.

Joe Joyce wrote on Wed, Apr 30, 2008 07:16 PM EDT:
Yes, Ji, this is interesting - pity I didn't know all this before that
exchange... gentlemen, an interesting midpoint. I was going to note
that some of the Muller numbers are quite similar to others' numbers.
For example, the values of the minister and priestess fell between 6
and 7 by both HG and Reinhard's methods. Yet other numbers are quite
far apart, like the commoner values. This, of course, presents 2
problems, one to explain the differences, and the other to explain the
similarities. Derek, could you give us a verbal explanation of what you
did and found?

Reinhard, my apologies for some sloppy phraseology. You've posted your
theory for all to see. You have provided numbers both times we've
spoken on this. In fact, you have been kind enough to correct my
mistakes in using your theory as well as providing the 2 sets of numbers.
[I will have to find some time to upgrade the wiki on this. Excellent.]
Thank you; I could ask for very little more. [Heh, maybe a tutorial on
that 3rd factor; Graeme had to correct my mistakes too.] I wish you the
very best with your new endeavor.

Ji is right, the number of squares attacked may be a first
approximation, but the pattern of movement is a key modifier. I put
together a chart a while ago after discussing the concept of
approachability with David Paulowich. The numbers in the chart are
accurate; the notes following contain observations, ideas, statements
that may be less so. Fortunately, the numbers in themselves are rather
suggestive, one way to look at power and vulnerability. They present a
two-dimensional view of pieces, a sort of looking down from above view
in chart form.
http://chessvariants.wikidot.com/attack-fraction

The chart clearly could be expanded, should anyone be interested. [The
archbishop, chancellor, amazon should be added soon, for example; any
volunteers? :-) ] But can it be used for anything? Colorboundness, and
turns to get across board, both side to side and between opposite
corners, are factors that must have some effect. [Board size and
edge effect are 2 more, this time mutually interactive factors. How
much will they be explored? Working at constant board size sort of
moots that question.] What do your theories, gentlemen who are carrying
on or following this conversation, have to say about these things?

Please note this conversation is spread over 3 topics:
this Piece Values thread,
Aberg's Variant game comments
Grand Shatranj game comments

Rich Hutnik wrote on Wed, Apr 30, 2008 10:05 PM EDT:
I believe spaces attacked are a subset of spaces a piece can move onto.

Derek Nalls wrote on Wed, Apr 30, 2008 10:47 PM EDT:
As far as playtesting goes ...

Admittedly, my initial intention was just to amuse myself by 
disproving the consistency of Muller's unusually-high archbishop 
material value in relation to other piece values within his CRC set.
If indeed his archbishop material value had been as fictitious as it 
was radical, then this would have been readily achievable
using any high-quality chess variant program such as SMIRF.
No matter what test I threw at it, this never happened.

Previously, I have only used 'symmetrical playtesting'.
By this I mean that the material and positions of the pieces
of both players have been identical relative to one another.
This is effective when playing one entire set of CRC piece values
against another entire set as, for example, Reinhard Scharnagl & I
have done on numerous occasions.  The player that consistently 
wins all deep-ply (long time per move) games, alternately playing
white and black, can be safely concluded to be the player using 
the better of the two sets of CRC piece values since this single 
variable has been effectively isolated.  However, this playtesting
method cannot isolate which individual pieces within the set 
carry the most or least accurate material values.

In fact, I had no problem with Muller's set of CRC piece values
as a whole.  The order of the material values of all of the CRC 
pieces was, and is, correct.  However, I had a large problem with his
material value for the archbishop being nearly as high as for
the chancellor.  

To pinpoint an unreasonably-high material value for only one 
piece within a CRC set required 'asymmetrical playtesting'.  
By this I mean that the material and positions of the pieces 
of both players had to be different in an appropriate manner to
test the upper and lower limits of the material value for a certain 
piece (e.g., archbishop).  This was achieved by removing select
pieces from both players within the Embassy Chess setup so that 
BOTH players had a significant material advantage consistent
with different models (i.e., Scharnagl set vs. Muller set).  
This was possible strictly because of the sharp contrast between the 
'normal, average' and 'very high', respectively, material values 
for the archbishop assigned by Scharnagl and Muller.  The fact
that the SMIRF program implicitly uses the Scharnagl set to play
both players is a control variable - not a problem - since it
ensures equality in the playing strength with which both players
are handled.  The player using the Scharnagl set lost every game 
using SMIRF MS-173h-X ... regardless of time controls, 
white or black player choice and all variations in excluded pieces 
that I could devise.

I thought it was remotely possible that an intransigent, positional 
advantage for the Muller set somehow happened to exist within the 
modified Embassy Chess setup that was larger than its material 
disadvantage.  This type of catastrophe can be the curse of 
'asymmetrical playtesting'.  So, I experimented likewise using a 
few other CRC variants.  Same result!  The Scharnagl set lost every 
game.

I seriously doubt that all CRC variants (or at least, the games I tested)
are realistically likely to carry an intransigent, positional advantage 
for the Muller set.  If this is true, then the Muller set is provably
ideally suited to CRC notwithstanding - just for a different reason.

Finally, I reconsidered my position and revised my model.

Reinhard Scharnagl wrote on Thu, May 1, 2008 06:21 AM EDT:
Well Derek, I did not understand exactly what you have done. But it seems
to me that you exchanged or removed some different pieces from the
Capablanca piece set according to SMIRF's average exchange values.

Let me point to a repeatedly written detail: if a piece is captured,
then not only is its average piece exchange value taken from the material
balance, but also its positional influence from the final detail
evaluation. Thus it is impossible to create 'balanced' different armies
by simply manipulating their pure material balance to become nearly equal
- their positional influences probably would not be balanced as needed.

A basic design element of SMIRF's detail evaluation is that the
positional value of a square dominated by a piece (of minimal exchange
value) is related to 1/x of that piece's exchange value. Thus replacing
some bigger pieces by more smaller types, while keeping their combined
material balance, will tend to increase their related positional
influences.

You see that deriving conclusions from having different armies playing
each other is a very complicated story.
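
A toy illustration of that 1/x weighting (simplified: assume each
dominated square simply counts with weight 1/value; SMIRF's real detail
evaluation is more involved):

# same combined exchange value, very different summed square weights
queen = [9.0]
minors = [3.0, 3.0, 3.0]
positional_weight = lambda army: sum(1.0 / v for v in army)
print(positional_weight(queen))   # about 0.11
print(positional_weight(minors))  # 1.00: more small pieces, more influence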

Derek Nalls wrote on Fri, May 2, 2008 07:31 AM EDT:
For the reasons you describe (which I mostly agree with), I do not ever use
'asymmetrical playtesting' unless that method is unavoidable.  However,
you should know that I used many permutations of positions within my
'missing pieces' test games to try to average out positions that may
have pre-set a significant positional advantage for either player.  

Yes, the fact that SMIRF currently uses your (Scharnagl) material values
with a 'normal, average' material value for the archbishop instead of a
'very high' material value (as well as the interrelated positional value
given to the archbishop with SMIRF) means that both players will place
greater effort than I think is appropriate into avoiding being forced into
disadvantageous exchanges where they would trade their chancellor or queen
for the archbishop of the opponent.  Still, the order of your material
values for CRC pieces agrees with the Muller model (although an
archbishop-chancellor exchange is considered only slightly harmful to the
chancellor player under his model).  So, I think tests using SMIRF are
meaningful even if I disagree substantially with the material value for
one piece within your model (i.e., the archbishop).

Due to apprehension over boring my audience with irrelevant details, I did
not even mention within my previous post that I also invented a variety of
10 x 8 test games using the 10 x 8 editor available in SMIRF that were
unrelated to CRC.  

For example, one game consisted of 1 king & 10 pawns per player with 9
archbishops for one player and 8 chancellors or queens for another player.
 Under the Muller model, the player with the 9 archbishops had a
significant material advantage.  Under the Scharnagl model, the player
with the 8 chancellors or 8 queens had a significant material advantage. 
The player with the 9 archbishops won every game.

For example, one game consisted of 1 king & 20 pawns per player with 9
archbishops for one player and 8 chancellors or queens for another player.
 Under the Muller model, the player with the 9 archbishops had a
significant material advantage.  Under the Scharnagl model, the player
with the 8 chancellors or 8 queens had a significant material advantage. 
The player with the 9 archbishops won every game.

For example, one game consisted of 1 king & 10 pawns per player with 18
archbishops for one player and 16 chancellors or queens for another
player.  Under the Muller model, the player with the 18 archbishops had a
significant material advantage.  Under the Scharnagl model, the player
with the 16 chancellors or 16 queens had a significant material advantage.
 The player with the 18 archbishops won every game.

I have seen it demonstrated many times how resilient positionally the
archbishop is against the chancellor and/or the queen in virtually any
game you can create using SMIRF with a 10 x 8 board and a CRC piece set.
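
For reference, the material balances in those 9-archbishop tests under
the two models (Muller's CRC values as quoted earlier in this thread;
Scharnagl's 10x8 values as posted later):

muller = {'A': 102.94, 'C': 105.88, 'Q': 111.76}
scharnagl_10x8 = {'A': 7.0176, 'C': 9.1204, 'Q': 9.6005}
for name, v in (('Muller', muller), ('Scharnagl', scharnagl_10x8)):
    print(name, '9A-8C =', round(9 * v['A'] - 8 * v['C'], 2),
          '9A-8Q =', round(9 * v['A'] - 8 * v['Q'], 2))
# Muller: both positive (the archbishop side is ahead);
# Scharnagl: both negative (the chancellor/queen side is ahead)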

When Muller assures us that he is responsibly using statistical methods
similar to those employed by Larry Kaufman, a widely-respected
researcher of Chess piece values, I think we should take his word for it. 
Of course, I remain concerned about the reliability of his stats generated
via using fast time controls.  However, it has now been proven to me that
his method is at least sensitive enough to detect 'elephants' (i.e.,
large discrepancies in material values) such as exist between contrasting
CRC models for the archbishop even if it is not sensitive enough to detect
'mice' (i.e., small discrepancies in material values) so to speak.

Reinhard Scharnagl wrote on Fri, May 2, 2008 10:36 AM EDT:
The infeasibility of using different armies to calculate piece values

To Derek Nalls and H.G.M.:

Nearly everyone - so I think - will agree that inside a CRC piece set the value of an Archbishop is greater than the sum of the values of Knight and Bishop, and even greater than two Knight values. Nevertheless, if you have the following different armies playing against each other:

[FEN 'nnnn1knnnn/pppppppppp/10/10/10/10/PPPPPPPPPP/A1A2K1A1A w - - 0 1']

then you will get a big surprise, because those 'weaker' Knights are going to win.

There are a lot of new and unsolved problems when trying to calculate piece values inside of different armies, including the playability of a special piece type, e.g. regarding the chances to cover it with any other weaker one.

Derek Nalls wrote on Fri, May 2, 2008 11:37 AM EDT:
Yes, your test example yields a result totally inconsistent with
everyone's models for CRC piece values.  [I did not run any playtest
games of it since I trust you completely.]  Yes, your test example could
cause someone who placed too much trust in it to draw the wrong conclusion
about the material values of knights vs. archbishops.  The reason your test
example is unreliable (and we both agree it must be) is due to its 2:1 ratio of knights to archbishops.  The game is a victory for the knights player simply because he/she can overrun the archbishops player and force materially-disadvantageous exchanges despite the fact that 4 archbishops indisputably have a material value significantly greater than 8 knights.

In all three of my test examples from my previous post, the ratios of
archbishops to chancellors and archbishops to queens were only 9:8.  Note
the sharp contrast.  Although I agree that a 1:1 ratio is the ideal goal, it was impossible to achieve for the purposes of the tests.  I do not believe a slight disparity (1 piece) in the total number of test pieces per player is enough to make the test results highly unreliable.  [Yes, feel free to invalidate my test example with 18 archbishops vs. 16 chancellors and 18 archbishops vs. 16 queens since a 2 piece advantage existed.]  Although surely imperfect and slightly unreliable, I think the test results achieved thru 'asymmetrical playtesting' or 'games with different armies' can be instructive as long as the test conditions are not pushed to the extreme.  Your test example was extreme.  Two out of three of my test examples were not extreme.

Reinhard Scharnagl wrote on Fri, May 2, 2008 11:57 AM EDT:
Derek, my example must be extreme. Only then might light fall on the
obscure points.

My current interpretation of that strange behavior: it is part of a
piece's value that it is able to risk its own existence by entering
attacked squares. But that implies that it could be covered by a minor
piece. And covering is possible only if there is at least one enemy piece
of equal or higher value to enable a tolerable exchange. In your examples
and mine that is definitely not the case. 

My conclusion is that the most valued pieces will decrease in their
values if no such potentially acceptable exchange pieces exist. My
suggestion for a replacement value would be:

( big own piece value + big enemy piece value + 1 pawn unit ) / 2

This has to be applied to all those unbalanced big pieces. ( Just an idea
of mine ... )

P.S.: after rethinking the question of the value of such handicapped
big pieces (having no equal or bigger counterpart), I now propose:

( big own piece value + 2 * big enemy piece value ) / 3
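
Applied to the 4*Archbishop vs. 8*Knight array with my 8x8 values quoted
below (A = 6.7824, N = 3.0), the refined formula shows the size of the
effect:

def handicapped(own, biggest_enemy):
    # proposed replacement value for a big piece that has no
    # equal-or-larger enemy counterpart to be exchanged against
    return (own + 2 * biggest_enemy) / 3

A, N = 6.7824, 3.0
print(handicapped(A, N))             # about 4.26 per Archbishop
print(4 * handicapped(A, N), 8 * N)  # 17.04 vs. 24.0: the Knights lead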

Derek Nalls wrote on Fri, May 2, 2008 12:07 PM EDT:
Feel free to invalidate my other two test examples I (reluctantly)
mentioned as well.  

My reason is that having ranks nearly full of archbishops, chancellors or
queens in test games does not even resemble a proper CRC variant setup
with its variety and placement of pieces.  Therefore, those test results
cannot safely be concluded to have any bearing upon the material values of
pieces in any CRC variant.   

Your reason is well-expressed.

Derek Nalls wrote on Fri, May 2, 2008 12:35 PM EDT:
The feasibility of using identical armies to calculate piece values

It has been a long time since our sets of CRC piece values have played one
another (on my dual 2.4 GHz CPU server) using otherwise-identical versions
of SMIRF.  Obviously, the reason is that it has been a long time since
there existed a large disparity within our material values for any one of
the CRC pieces.  Recently, that has changed in the case of the
archbishop.

I already have the standard version of SMIRF MS-174b-O which uses
Scharnagl CRC piece values.  Would you be willing to compile a special
version of SMIRF MS-174b-O for me which uses Nalls CRC piece values?

Capablanca Random Chess
material values of pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

Back on safe ground using 'symmetrical playtesting', the results of who
wins the test games should be indicative of who is using a better set of
CRC piece values.

Reinhard Scharnagl wrote on Fri, May 2, 2008 12:46 PM EDT:
Derek, now it no longer is that easy, because now in SMIRF piece values
are only implemented in their static part. Their mobility part is covered
by the detail evaluation. The '-X' versions of SMIRF made a mixture of
those; the '-0' version is completely without mobility fractions. This is
a minor detail of my new approaches.

Nevertheless, if you separate those components, compiles are possible.

Derek Nalls wrote on Fri, May 2, 2008 01:38 PM EDT:
I understand.  I wondered what the 'X' & 'O' designations for recent
SMIRF versions meant.  Do you still possess an older version of SMIRF (of
satisfactory quality to you) that uses your current CRC material values?

Since there is approximately a 2-1/2 pawn difference between our models
in our material values for the archbishop, I predict that my playtesting
results would probably be worthwhile and decisive.

George Duke wrote on Fri, May 2, 2008 01:47 PM EDT:
Joe Joyce and J.J. are referring to Minister ( Knight + Dabbabah + Wazir )
and Priestess ( Knight + Alfil + Ferz ). Ralph Betza's Chess with
Different Armies has FAD ( Ferz + Alfil + Dabbabah ). That took a minute
to recall and find. I am quite sure (N+D+W) and (N+A+F) are not new and
appeared under different name(s) some time ago, and it would be less
misleading to use the earlier names. They did not originate with
uncreative A.B.Shatranj or other such recent work. When previous use(s)
are found, I will post them, as we have done with some other
''re-inventions.'' These pieces are unappealing, all three, because they
have an unnatural foreshortened Rook or Bishop dimension in their
triple-compounding. There is no compelling logic. They are pulled out of
a hat from hundreds of possibilities. Why not use pieces going one, two,
and three steps either Rook- or Bishop-wise? No reason. No CV set-up is
improved by limiting pieces to up-to-two or -three steps radially. That
is why Bishop and Rook themselves will always stand as perfection. Piece
Values are, however, inherently an interesting intellectual activity and
topic - not, in perspective, because of the utility of these particular
mediocre choices, ''Minister,'' ''Priestess,'' FAD (another Comment may
take up Amazon and the others as to their deficiencies), but because
facility at computing values can then be applied to better
piece-movement concepts, such as Rococo units. That is what makes these
threads on Piece Values worthwhile enough.

Reinhard Scharnagl wrote on Fri, May 2, 2008 01:48 PM EDT:
Well, Derek, I will use my own values for 8x8, if you have none new for
Q,A,C ...

I still have not published my current values (because they normally are
not used inside of SMIRF, and only the mobility parts have been modified),
so I will use those in the requested compiles:

N,B,R,A,C,Q for 8x8:
3.0000, 3.4119, 5.1515, 6.7824, 8.7032, 9.0001

N,B,R,A,C,Q for 10x8:
3.0556, 3.6305, 5.5709, 7.0176, 9.1204, 9.6005

Derek Nalls wrote on Fri, May 2, 2008 02:13 PM EDT:
Your revised material values for SMIRF look fine to me. I have written them down for safekeeping. Which version will you be compiling? Of course, I do not plan to playtest anyone's material values for pieces upon the 8 x 8 board- only material values for CRC pieces upon the 10 x 8 board.

Reinhard Scharnagl wrote on Fri, May 2, 2008 02:20 PM EDT:
Derek, you will receive versions compiled using complete piece values.

Reinhard Scharnagl wrote on Fri, May 2, 2008 04:32 PM EDT:
Different armies in action: 4*Archbishop vs. 8*Knight

The following game can be reviewed using the SMIRF donationware release
from: http://www.chessbox.de/Compu/schachsmirf_e.html

(but first replace the single quotes with double quotes before pasting)

[Event 'SmirfGUI: Different Armies Games']
[Site 'MAC-PC-RS']
[Date '2008.05.02']
[Time '18:30:40']
[Round '60 min + 30 sec']
[White '1st Smirf MS-174c-0']
[Black '2nd Smirf MS-174c-0']
[Result '0-1']
[Annotator 'RS']
[SetUp '1']
[FEN 'nnnn1knnnn/pppppppppp/10/10/10/10/PPPPPPPPPP/A1A2K1A1A w - - 0 1']

1. Aji3 Nd6 {(11.02) -1.791} 2. Aab3 Ne6 {(12.01=) -1.533} 3. c4 c5 {(12.01=)
-0.992} 4. d4 cxd4 {(13.00) -0.684} 5. c5 Ne4 {(12.01) -0.535} 6. Ac2 d5
{(11.39) +0.189} 7. f3 N4xc5 {(11.01=) +0.465} 8. Ag3 Nac7 {(11.01) +0.900} 9.
b4 Ncd7 {(11.01=) +1.475} 10. f4 g6 {(10.31) +1.750} 11. Ai5+ Ngh6 {(11.03+)
+1.920} 12. g4 j6 {(12.01=) +2.225} 13. Aie1 Nig7 {(11.01=) +2.363} 14. Ac1d3
f6 {(10.20) +2.506} 15. a4 N8f7 {(11.01=) +2.707} 16. Kg1 a5 {(11.01) +2.803}
17. bxa5 Nc6 {(11.15) +2.910} 18. Ab3 Nji6 {(11.01) +2.570} 19. j4 f5 {(12.03=)
+3.010} 20. gxf5 Ngxf5 {(11.01) +3.342} 21. a6 bxa6 {(11.01=) +3.998} 22. a5
Ne3 {(11.15) +4.156} 23. Aa4 Nb5 {(11.01=) +4.504} 24. Ab3 Nig7 {(11.03=)
+5.244} 25. Aih4 Nf6 {(11.02) +5.324} 26. Aef2 Nfh5 {(10.19) +6.395} 27. Ah3
Nhxf4 {(11.01) +6.172} 28. Adxf4 Nxf4 {(14.01) +5.979} 29. Axf4 g5 {(12.14)
+6.086} 30. Ahxg5 Nxg5 {(14.01=) +6.018} 31. Axg5 Kg8 {(14.11) +5.176} 32. Axe3
dxe3 {(16.01=) +5.117} 33. Axd5+ Kh8 {(16.01=) +5.117} 34. Axc6 Nhf5 {(14.18)
+5.127} 35. Ab4 Nc7 {(15.00) +4.803} 36. Ad3 Ki8 {(15.00) +4.838} 37. Kh1 Nd6
{(14.01) +4.891} 38. j5 Ngf5 {(14.01=) +5.189} 39. Ac5 Ndb5 {(14.01) +5.248}
40. Ad3 Nbd4 {(14.01) +5.365} 41. Ae4 Ncb5 {(16.02) +5.631} 42. Ki1 e6 {(15.23)
+5.932} 43. Ad3 h6 {(15.01) +5.250} 44. Ac4 h5 {(15.01=) +5.467} 45. i3 Kj7
{(15.12) +5.637} 46. Ad3 Nc3 {(15.09) +5.715} 47. Axa6 Ndxe2 {(15.00) +5.678}
48. Ad3 Ned4 {(14.01=) +6.117} 49. a6 Ncb5 {(14.01=) +6.602} 50. Kj1 e2
{(15.01=) +8.080} 51. Ae1 e5 {(15.01=) +11.59} 52. i4 e4 {(15.01=) +12.16} 53.
ixh5 Nf3 {(14.02) +12.56} 54. Af2 e3 {(15.22) +14.61} 55. Ad3 e1=Q+ {(16.02)
+16.00} 56. Axe1 Nxe1 {(17.01=) +23.09} 57. h6 ixh6 {(15.02=) +M~010} 58. h4
Nxh4 {(12.01=) +M~008} 59. a7 Nxa7 {(10.01=) +M~008} 60. Ki1 Neg2+ {(08.01=)
+M~007} 61. Kh2 e2 {(06.01=) +M~006} 62. Kh3 e1=Q {(04.01=) +M~005} 63. Kg4
Qe4+ {(02.01=) +M~004} 64. Kh3 Qf3+ {(02.01=) +M~003} 65. Kh2 Qi3+ {(02.01=)
+M~002} 66. Kh1 Qi1# {(02.00?) +M~001} 0-1

You will find that the handicap of being a big piece without any exchangeable counterpart dominates the character of the battle.

H. G. Muller wrote on Sat, May 3, 2008 02:59 AM EDT:
Ha, finally my registration could be processed manually, as all automatic procedures consistently failed. So this thread is now also open to me for posting. Let me start with some remarks on the ongoing discussion.

* I tried Reinhard's 4A vs 8N setup. In a 100-game match of 40/1' games with Joker80, the Knights are crushed by the Archbishops 80-20. So although in principle I agree with Reinhard that such extreme tests, with setups that make the environment for the pieces very alien compared to normal Chess, could be unreliable, I certainly would not take it for granted that his claim that 8 Knights beat 4 Archbishops is actually true. Possible reasons for the discrepancy could be:

1) Reinhard did not base his conclusion on enough games. In my experience using anything less than 100 games is equivalent to making the decision by throwing dice. It often happens that after 30 games the side that is leading by 60% will eventually lose by 45%.

2) Smirf does not handle the Archbishop well, because it is programmed to underestimate its value, and is prepared to trade it too easily for two Knights to avoid or postpone a Pawn loss, while Joker80 just gives the Pawn and saves its Archbishops until it can get 3 Knights for one.

3) The shorter time control restricts search depth such that Joker80 cannot recognize some higher, unnatural strategy (which has no parallel in normal Chess) where all Knights can be kept defending each other multiple times, because they all have identical moves; so it judges the pieces more on the tactical merits that would be relevant for normal Chess.

* The arguments Reinhard gives against more realistic 'asymmetrical playtesting':

| Let me point to a repeatedly written detail: if a piece is captured,
| then not only is its average piece exchange value taken from the
| material balance, but also its positional influence from the final
| detail evaluation. Thus it is impossible to create 'balanced'
| different armies by simply manipulating their pure material balance
| to become nearly equal - their positional influences probably would
| not be balanced as needed.

seem invalid. For one, all of us are good enough Chess players that we can recognize for ourselves, in the initial setup we use for playtesting, whether the Archbishop or Knight or whatever piece is part of the imbalance is an exceptionally strong or poor one, or just an average one. So we don't put a white Knight on e5 defended by Pf4 while the black d- and f-pawns have already passed it, and we don't put it on a1 with white pawns on b3, c2 and black pawns on b4, c3. In particular, I always test from opening positions, where none of the pieces is on a particularly good square, but they can be easily developed, as the opponent does not interdict access to any of the good squares either. So after a few opening moves, the pieces get to places that, almost by definition, are the average of where you can get them.

Secondly, when setting up the position, we get the evaluation of the engine for that position, telling us whether the engine considers one of the sides highly favored positionally (by taking the difference between the engine evaluation and the known material difference for the piece values we know the engine is using). Although I would trust this less than my own judgement, it can be used as additional confirmation.

Like Derek says, averaging over many positions (like I always do: all my matches are played starting from 432 different CRC opening positions) will tend to put every piece on average in an average position. If a certain piece, like A, would always have a +200cP 'positional' contribution (e.g. calculated as its contribution to mobility), no matter where you put it, then that contribution is not positional at all, but a hidden part of the piece value. Positional contributions should average to zero when averaged over all plausible positions. Furthermore, in Chess positional contributions are usually small compared to material ones, if they do not have to do with King safety or advanced passers. And none of the latter play a role in the opening positions I use.

* Symmetrical playtesting between engines with different piece-value sets is known to be a notoriously unreliable method. Dozens of people have reported trying it, often with quite advanced algorithms to step through search space (e.g. genetic algorithms, or annealing). The result was always the same: in the end (sometimes after months of testing) they obtained piece values that, when pitted against the original hand-tuned values, would consistently lose. The reason is most likely that the method works in principle, but requires too many games in practice.

Derek mentioned before that if two engines value certain piece combinations differently, they often exchange them for each other, creating a material imbalance, which then affects their winning chances. Well, 'often' is not the same as 'always'. For very large errors, like putting A < R, nearly every game would see such a bad trade; but a mere undervaluation of A can only lead to much more complicated bad trades, as you have to get at least two pieces for the A. The probability that this occurs is far smaller, and only 10-20% of the games will see such a trade. Now the problem is that the games in which the bad trades do NOT happen will not be affected by the wrong piece value. So this subset of games will have a 50-50 outcome, pushing the total score average towards 50%. If A vs R+N gives you a 60% winning chance (so a 10% excess), if it is the only bad trade that happens (because you set A slightly under 8), and it happens in only 20% of the cases, the total effect you would see (and on which you would have to conclude the A value is suboptimal) would be 52%. But the 80% of games that did not contribute to learning anything about the A value, because in the end A was traded for A, will contribute to the statistical noise! To recognize a 2% excess score instead of a 10% excess score you need a 5-times-smaller statistical error. But statistical errors only decrease as the SQUARE ROOT of the number of games. So to get it down by a factor 5, you need 25 times as many games. You could not conclude anything before you had 2500 games!

Symmetrical playtesting MIGHT work if you first discard all the games that traded A for A (to eliminate the noise they produce, as they can't say anything about the correctness of the A value), and make sure you have about 100 games left. Otherwise, the result will be garbage.
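
The 100-vs-2500 figure is just binomial statistics; a minimal check
(assuming each game is an independent win/loss trial, so the standard
error of the score fraction is 0.5/sqrt(n)):

import math

def games_needed(excess, sigmas=2.0):
    # games before an excess score (e.g. 0.02 for a 52% result) exceeds
    # `sigmas` standard errors of coin-flip noise, sigma = 0.5/sqrt(n)
    return math.ceil((sigmas * 0.5 / excess) ** 2)

print(games_needed(0.10))  # 100 games resolve a 10% excess
print(games_needed(0.02))  # 2500 games for a 2% excess: 25 times as many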

Reinhard Scharnagl wrote on Sat, May 3, 2008 05:18 AM EDT:
Well, H.G.M., if you believe in your value model, and your engine is using it, then this engine will avoid valid trades (as I regard them to be). If you really trust your model, you could easily add a black Knight and remove some white Pawns and still have a value-sum 'advantage' for the white Archbishops' team. So why do you not test these arrays?

The arrays as I have tested with SMIRF have had an advantage for White of 3.1296 in my model.
In your model (normalized to a Pawn = 1) the advantage has been about 12.944 (more than a Queen's value).

P.S.: Why not have some test games between SMIRF playing Black with 9 Knights and your program with 4 Archbishops, each side having 10 Pawns? In your value model it should be nearly impossible for Black to gain any victory at all.

P.P.S.: The game as proposed is no subject for Blitz, because it is decided by deep positional effects. So I used 60 min / game + 30 sec / move for the time frame, which is important.

H. G. Muller wrote on Sat, May 3, 2008 06:14 AM EDT:
Sorry my original long post got lost.

As this is not a position where you can expect piece values to work, and
my computers are actually engaged in useful work, why don't YOU set it
up?

Reinhard Scharnagl wrote on Sat, May 3, 2008 06:17 AM EDT:
Well, Harm, you know that I failed in using 10x8 WinBoard GUIs, so I
discontinued trying that.

H. G. Muller wrote on Sat, May 3, 2008 06:36 AM EDT:
It seems to me that that is bad strategy. If you fail you should keep
trying until you succeed. Only when you succeed you can stop trying...

Reinhard Scharnagl wrote on Sat, May 3, 2008 08:17 AM EDT:
You will find a (hopefully) up-to-date table of several piece value sets at:
http://www.10x8.net/Compu/schachveri1_e.html

Derek Nalls wrote on Sat, May 3, 2008 11:07 AM EDT:
I have adequate confidence in my latest material values to ask you to
publish them upon your web page (instead of my previous material values).

CRC
material values of pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

They are, in principle, similar to Muller's set for every piece except
that they run on a comparatively compressed scale.  Even though I have not
yet playtested them, I consider my tentative confidence rational (although
admittedly premature and risky) because I trust Muller's methods of
playtesting his own material values and I think my latest revisions to my
model are conceptually valid.

Reinhard Scharnagl wrote on Sat, May 3, 2008 11:57 AM EDT:
Derek, I have changed your values again within my piece value table. I hope you will report on some 9*N vs. 4*A games using your special SMIRF engine modified to your values. I am very convinced that the effect of reduced values for unbalanced big pieces exists. P.S.: Here is a hint to check out my marginally refined approach at this page:
http://www.10x8.net/Compu/schachansatz1_e.html

H. G. Muller wrote on Sat, May 3, 2008 12:15 PM EDT:
To summarize the state of affairs, we now seem to have sets of piece
values for Capablanca Chess by:

Hans Aberg (1)
Larry Kaufman (1)
Reinhard Scharnagl (2)
H.G. Muller (3)
Derek Nalls (4)

1) Educated guessing based on known 8x8 piece values and assumptions on
synergy values of compound pieces
2) Based on board-averaged piece mobilities
3) Obtained as best-fit of computer-computer games with material
imbalance
4) Based on mobilities and more complex arguments, fitted to experimental
results ('playtesting')

I think we can safely dismiss method (1) as unreliable, as the (clearly
stated) assumptions on which they are based were never tested in any way,
and appear to be invalid.
Methods (3) and (4) are now basically in agreement. 
Method (2) produces substantially different results for the Archbishop.

One problem I see with method (2) is that plain averaging over the board
does not seem to be the relevant thing to do, and even inconsitent at
places: suppose we apply it to a piece that has no moves when standing in
a corner, the corner squares would suppress the mobility. If otoh, the
same piece would not be allowed to move into the corner at all, the
average would be taken over the part of the board that it could access
(like for the Bishop), and would be higher than for the piece that could
go there, but not leave it (if there weren't too many moves to step into
the corner). While the latter is clearly upward compatible, and thus must
be worth more.

The moral lesson is that a piece that has very low mobility on certain
squares does not lose as much value because of that as the averaging
suggests, as in practice you will avoid putting the piece there. The SMIRF
theory does not take that into account at all.
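
The size of this effect is easy to probe numerically. A minimal Python sketch, comparing plain board-averaged mobility with an average that discounts the squares a player would avoid (the 50% discount on the worst quarter of the board is an arbitrary assumption, purely for illustration):

    # Compare plain board-averaged mobility with an average that discounts
    # the squares a player would avoid in practice. The 50% discount on the
    # worst quarter of the board is an arbitrary illustrative assumption.

    KNIGHT = [(1, 2), (2, 1), (2, -1), (1, -2),
              (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

    def mobility(sq, moves, size=8):
        x, y = sq
        return sum(0 <= x + dx < size and 0 <= y + dy < size
                   for dx, dy in moves)

    mob = sorted(mobility((x, y), KNIGHT)
                 for x in range(8) for y in range(8))

    plain = sum(mob) / len(mob)              # 5.25 for the Knight on 8x8
    n_bad = len(mob) // 4                    # worst quarter of the squares
    weights = [0.5] * n_bad + [1.0] * (len(mob) - n_bad)
    weighted = sum(m * w for m, w in zip(mob, weights)) / sum(weights)
    print(plain, weighted)     # 5.25 vs ~5.57: plain averaging undersells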

Focussing on mobility only also makes you overlook disastrous handicaps a
certain combination of moves can have. A piece that has two forward
diagonal moves and one forward orthogonal (fFfW in Betza notation) has
exactly the same mobility as that with forward diagonal and backward
orthogonal moves (fFbW). But the former is restricted to a small (and ever
smaller) part of the board, while the latter can reach every point from
every other point. My guess is that the latter piece would be worth much
more than the former, although in general forward moves are worth more
than backward moves. (So fWbF should be worth less than fFbW.) But I have
not tested any of this yet.
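
The restriction itself is easy to demonstrate by flood-filling an (assumed empty) 8x8 board with each move set; a sketch:

    # Flood-fill reachability for two pieces with identical mobility counts:
    # fFfW (all three moves forward) vs fFbW (forward diagonals, backward step).
    # Sketch assuming an empty 8x8 board.

    FFFW = [(-1, 1), (1, 1), (0, 1)]     # two forward diagonals + forward step
    FFBW = [(-1, 1), (1, 1), (0, -1)]    # two forward diagonals + backward step

    def reachable(start, moves, size=8):
        seen, todo = {start}, [start]
        while todo:
            x, y = todo.pop()
            for dx, dy in moves:
                nxt = (x + dx, y + dy)
                if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return len(seen)

    print(reachable((4, 0), FFFW))   # 48: a one-way cone, never back to a lower rank
    print(reachable((4, 0), FFBW))   # 64: every square reachable from every square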

I am not sure how much of the agreement between (3) and (4) can be
ascribed to the playtesting, and how much to the theoretical arguments:
the playtesting methods and results are not extensively published and not
open to verification, and it is not clear how well the theoretical
arguments are able to PREdict piece values rather than POSTdict them. IMO
it is not possible to make an all-encompassing theory with just 4 or 6
empirical piece values as input, as any elaborate theory will have many
more than 6 adjustable parameters.

So I think it is crucial to get accurate piece values for more different
pieces. One keystone piece could be the Lion. It can make all leaps
to targets in a 5x5 square centered on it (and is thus a compound of Ferz,
Wazir, Alfil, Dabbabah and Knight). This piece seems to be 1.25 Pawn
stronger than a Queen (1075 on my scale). This reveals a very interesting
approximate law for piece values of short-range leapers with N moves:

value = (30+5/8*N)*N

For N=8 this would produce 280, and indeed the pieces I tested fall in the
range 265 (Commoner) to 300 (Knight), with FA (Modern Elephant), WD (Modern
Dabbabah) and FD in between. For N=16 we get 640, and I found WDN
(Minister) = 625 and FAN (High Priestess) and FAWD (Sliding General) 650.
And for the Lion, with N=24, the formula predicts 1080.
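
For reference, tabulating this law against the tested values quoted above (a sketch; values in centiPawns on the scale used above):

    # Approximate law for short-range leapers with N target squares:
    #   value(N) = (30 + 5/8 * N) * N   (centiPawns, opening values)
    def leaper_value(n):
        return (30 + 0.625 * n) * n

    print(leaper_value(8))    # 280, vs tested 265 (Commoner) .. 300 (Knight)
    print(leaper_value(16))   # 640, vs tested 625 (Minister) .. 650 (High Priestess)
    print(leaper_value(24))   # 1080, vs ~1075 measured for the Lion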

My interpretation is that adding moves to a piece does not only add the
value of the move itself (as described by the second factor, N), but also
increases the value of all pre-existing moves, by allowing the piece to
better manoeuvre into place for aiming them at the enemy. I would therefore
expect it is mainly the captures that contribute to the second factor,
while the non-captures contribute to the first factor.

The first refinement I want to make is to disable all Lion moves one at a
time, as captures or as non-captures, to see how much that move
contributes to the total strength. The simple counting (as expressed by
the appearance of N in the formula) can then be replaced by a weighted
counting, the weights expressing the relative importance of the moves. (So
that forward captures might be given a much bigger weight than forward
non-captures, or backward captures along a similar jump.) This will
require a lot of high-precision testing, though.

H. G. Muller wrote on Sat, May 3, 2008 12:21 PM EDT:
Oh Yes, I forgot about:

[name removed] (5)

5) Based on safe checking

I am not sure that safe checking is of any relevance. Most games are not
won by checkmating the opponent King in an equal-material position, but
by
annihilating the opponent's forces. So mainly by threatening Pawns and
other Pieces, not Kings. A problem is that safe checking seems to predict
zero value for pieces like Ferz, Wazir and Commoner, while the latter is
not that much weaker than the Knight. (And, averaged over all game
stages,
might even be stronger than a Knight.) This directly seems to falsify the
method.

[The above has been edited to remove a name and/or site reference. It is
the policy of cv.org to avoid mention of that particular name and site to
remove any threat of lawsuits. Sorry to have to do that, but we must
protect ourselves. -D. Howe]

Reinhard Scharnagl wrote on Sat, May 3, 2008 12:35 PM EDT:
H.G.M wrote: ... Focussing on mobility only also makes you overlook disastrous handicaps a certain combination of moves can have. A piece that has two forward diagonal moves and one forward orthogonal (fFfW in Betza notation) has exactly the same mobility as that with forward diagonal and backward orthogonal moves (fFbW). But the former is restricted to a small (and ever smaller) part of the board, while the latter can reach every point from every other point. My guess is that the latter piece would be worth much more than the former, although in general forward moves are worth more than backward moves. (So fWbF should be worth less than fFbW.) But I have not tested any of this yet.

Before I try to think over this argument, remember that all (non-Pawn) pieces of the CRC piece set have non-oriented (symmetric) moves. Thus this argument cannot change anything in the value discussion of the CRC piece set, especially concerning the value of an Archbishop.

H. G. Muller wrote on Sat, May 3, 2008 12:46 PM EDT:
Reinhard, why do you attach such importance to the 4A-9N position? I think
that example is totally meaningless. If it proves anything, it is that you
cannot get the value of 9 Knights by taking 9 times the Knight value. It
proves _nothing_ about the Archbishop value. Chancellor and Queen would
encounter exactly the same problems facing an army of 9 Knights.

The problem is that there is a positional bonus for identical pieces
defending each other. This is well known (e.g. connected Rooks). The
trouble is that such pair interactions grow as the square of the number of
pieces, and thus start to dominate the total evaluation if the number of
identical pieces gets extremely high (as it never will in real games).

Pieces like A, C and Q (or in particular the highest-valued pieces on the
board) will not get such bonuses, as the bonus is associated with the
safety of mutually defending each other, and tactical security in case the
piece is traded, because the recapture then replaces it by an identical
one, preserving all defensive moves it had. In absence of equal or higher
pieces, defending pieces is a useless exercise, as recapture will not
offer compensation. If you are attacked, you will have to withdraw. So the
mutual-defence bonus is also dependent on the piece makeup of the opponent,
and is zero for Archbishops when the opponent only has Knights, and very
high for Knights when the opponent has only Archbishops.

If you want to playtest material imbalances, the positional value of the
position has to be as equal as possible. The 4A-9N position violates that
requirement to an extreme extent. It thus cannot tell us anything about
piece values. Just like deleting the white Queen and all 8 black Pawns
cannot tell us anything about the value of Q vs P.

Reinhard Scharnagl wrote on Sat, May 3, 2008 01:00 PM EDT:
H.G.M. wrote: ... It thus cannot tell us anything about piece values. Just like deleting the white Queen and all 8 black Pawns cannot tell us anything about the value of Q vs P.

I fully agree with that. My A vs. N example was never intended to calculate piece values. Instead it should shed light on some obscure details. The strange effect is not caused by the ability of the Knights to cover each other; this also holds for the Archbishops. It is caused by the absence of exchangeable counterparts for A of equal (or bigger) value.

My example should demonstrate the existence of new effects in games of different armies. And that implies that one should be careful when trying to calculate or verify piece values by series of matches between different armies. Effects such as the one demonstrated in my N vs. A example should be discussed, eliminated, or, if unavoidable, integrated into a formula. I suggested to reduce the values of such unbalanced big pieces somehow (I am not yet sure how exactly) in the equations you are using to find out special piece values. But without such purification attempts, misinterpretations cannot be avoided.

H. G. Muller wrote on Sat, May 3, 2008 01:18 PM EDT:
Well, Reinhard, there could be many explanations for the 'surprising'
strength of an all-Knight army, and we could speculate forever on it. But
it would only mean anything if we could actually find ways to test it. I
think the mutual defence is a real effect, and I expect an army of all
different 8-target leapers to do significantly worse than an army of all
Knights, even though all 8-target leapers are almost equally strong. But
it would have to be tested.

Defending each other is useless for Archbishops (in the absence of opponent
Q, C or A), as defending an Archbishop in the face of Knight attacks is of
zero use. So the fact that they can do it is not worth anything.

Nevertheless, the Archbishops do not do as badly as you want to make us
believe, and I think they would still have a fighting chance against 9
Knights. So perhaps I will run these tests (on the Battle-of-the-Goths
port, so that everyone can watch) if I have nothing better to do. But
currently I have more important and urgent things to do on my Chess PC: I
have a great idea for a search enhancement in Joker, and would like to
implement and test it before ICT8.

Derek Nalls wrote on Sat, May 3, 2008 01:20 PM EDT:
re:  Muller's assessment of 5 methods of deriving material values for CRC pieces

'I am not sure how much of the agreement between (3) and (4) can be
ascribed to the playtesting, and how much to the theoretical arguments
...'

As much playtesting as possible.  Unfortunately, that amount is deficient
by my standards (and yours).  I have tried to compensate for marginal
quantity with high quality via long time controls.  You use a converse
approach with opposite emphasis.  Given enough years (working with 
only one server), this quantity of well-played games may eventually 
become adequate.

' ... and it is not clear how well the theoretical arguments are able to
PREdict piece values rather than POSTdict them.'

You have pinpointed my greatest disappointment and frustration thus far
with my ongoing work.  To date, my theoretical model has not made 
any impressive predictions verified by playtesting.  To the contrary,
it has been revised, expanded and complicated many times upon 
discovery that it was grossly in error or out of conformity with reality.

Although the foundations of the theoretical model are built upon 
arithmetic and geometry to the greatest extent possible, with verifiable 
phenomena important to the material values of pieces used logically for 
refinements, mathematical modelling can be misused to postulate and 
describe in detail almost any imaginable non-existent phenomenon.  
Consider, for example, the Ptolemaic model of the solar system.

Reinhard Scharnagl wrote on Sat, May 3, 2008 01:24 PM EDT:
H.G.M. wrote: ... Defending each other is useless for Archbishops (in the absence of opponent Q, C or A), as defending an Archbishop in the face of Knight attacks is of zero use. So the fact that they can do it is not worth anything. ...

Now you have got it. The main reason is the absence of counterparts of equal (or bigger) value. That is what makes any effective covering impossible. And this is a burden within an (I confess, very extremely designed) game between different armies.

P.S.: any covering of A by P is also useless then ...

H. G. Muller wrote on Sat, May 3, 2008 02:32 PM EDT:
Well, I got that from the beginning. But the problem is not that the A
cannot be defended. It is strong and mobile enough to care for itself. The
problem is that the Knights cannot be threatened (by A), because they all
defend each other, and can do so multiple times. So you can build a
cluster of Knights that is totally unassailable. That would be much more
difficult for a collection of all different pieces. This will be likely to
have always some weak spots, which the extremely agile Archbishops then
seek out and attack that point with deadly precision.

But I don't see this as a fundamental problem of pitting different armies
against each other. After an unequal trade, any Chess game becomes a game
between different armies. But to define piece values that can be helpful
to win games, it is only important to test positions that could occur in
games, or at least are not fundamentally different in character from what
you might encounter in games. And the 4A-9N position definitely does not
qualify as such.

I think this is valid criticism against what Derek has done (testing
super-pieces only against each other, without any lighter pieces being
present), but it has no bearing on what I have done. I never went further
than playing each side with two copies of the same super-piece, replacing
another super-piece (which was then absent from that army). This is
slightly unnatural, but I don't expect it to lead to qualitatively
different games, as the super-pieces are similar in value and mobility.
And unlike super-pieces already share some moves, so like and unlike
super-pieces can cooperate in very similar ways (e.g. forming batteries).
It did not essentially change the distribution of piece values, as all
lower pieces were present in normal copy numbers.

I understand that Derek likes to magnify the effect by playing several
copies of the piece under test, but perhaps using 8 or 9 is overdoing it.
To test a difference in piece value as large as 200cP, 3 copies should be
more than enough. This can still be done in a reasonably realistic mix of
pieces, e.g. replacing Q and C on one side by A, and Q and A on the other
side by C, so that you play 3C vs 3A, and then give additional Knight
odds to the Chancellors. This would predict about +3 for the Chancellors
with the SMIRF piece values, and -2.25 according to my values. Both
imbalances are large enough to cause 80-90% win percentages, so that just
a few games should make it obvious which value is very wrong.

H. G. Muller wrote on Sat, May 3, 2008 02:42 PM EDT:
Derek Nalls:
| Given enough years (working with only one server), this quantity of 
| well-played games may eventually become adequate.

I never found any effect of the time control on the scores I measure for
some material imbalance. Within statistical error, the combinations I
tried produced the same score at 40/15', 40/20', 40/30', 40/40',
40/1', 40/2', 40/5'. Going to even longer TC is very expensive, and I
did not consider it worth doing just to prove that it was a waste of
time...

The way I see it, piece values are a quantitative measure of the amount
of control that a piece contributes to steering the game tree in the
direction of the desired evaluation. He who has more control can
systematically force the PV in the direction of better and better
evaluation (for him). This is a strictly local property of the tree. The
only advantage of deeper searches is that you average out this control
(which fluctuates highly on a ply-by-ply basis) over more ply. But in
playing the game, you average over all plies anyway.

Reinhard Scharnagl wrote on Sat, May 3, 2008 02:43 PM EDT:
H.G.M. wrote: ... After an unequal trade, andy Chess game becomes a game between different armies. ...

And thus I am convinced that I have to include this aspect in the detail evaluation function of SMIRF's successor.

... This can still be done in a reasonably realistic mix of pieces, e.g. replacing Q and C on one side by A, and on the other side by Q and A by C, so that you play 3C vs 3A, and then give additional Knight odds to the Chancellors. ...

And by that this would create just the problem I have tried to demonstrate: the three Chancellors could not possibly be covered, thus preventing them from risking their own existence by entering squares already influenced by the opponent's side.

Reinhard Scharnagl wrote on Sat, May 3, 2008 04:06 PM EDT:
H.G.M. wrote: ... Both imbalances are large enough to cause 80-90% win percentages, so that just a few games should make it obvious which value is very wrong.

Hard to see. You will wait for White to lose because of insufficient material, and I will await a loss for White because of the lonely-big-pieces disadvantage. It will then be the task to find out the true reasons for that.

I will try to create two arrays, where each side thinks it has the advantage.

H. G. Muller wrote on Sat, May 3, 2008 04:18 PM EDT:
| And by that this would create just the problem I have tried to 
| demonstrate. The three Chancellors could impossibly be covered, 
| thus disabling their potential to risk their own existence by 
| entering squares already influenced by the opponent's side.

You make it sound like it is a disadvantage to have a stronger piece,
because it cannot go on squares attacked by the weaker piece. To a certain
extent this is true, if the difference in capabilities is not very large.
Then you might be better off ignoring the difference in some cases, as
respecting the difference would actually deteriorate the value of the
stronger piece to the point where it was weaker than the weak piece. (For
this reason I set the B and N value in my 1980 Chess program Usurpator to
exactly the same value.) But if the difference between the pieces is
large, then the fact that the stronger one can be interdicted by the
weaker one is simply an integral part of its piece value.

And IMO this is not the reason the 4A-9N example is so biased. The problem
there is that the pieces of one side are all worth more than TWICE that of
the other. Rooks against Knights would not have the same problem, as they
could still engage in R vs 2N trades, capturing a singly defended Knight,
in a normal exchange on a single square. But 3 vs 1 trades are almost
impossible to enforce, and require very special tactics.

It is easy enough to verify by playtesting that playing CCC vs AAA (as
substitutes for the normal super-pieces) will simply produce 3 times the
score excess of playing a normal setup with a C deleted on one side and
an A on the other. The A side will still have only a single A to harass
every C. Most squares in enemy territory will be covered by R, B, N or P
anyway, in addition to A, so the C could not go there anyway. And it is
not true that anything defended by A would be immune to capture by C, as
A+anything > C (and even 2A+anything > 2C). So defending by A will not
exempt the opponent from defending as many times as there are attacks,
using A as defenders. And if there is one other piece amongst the
defenders, the C has no chance anyway. 

The effect you point out does not occur nearly as easily as you think.
And, as you can see, only 5 of my different armies had duplicated
super-pieces. All the other armies were just what you would get if you
traded the mentioned pieces, thus detecting whether such a trade would
enhance or deteriorate your winning chances.

H. G. Muller wrote on Sat, May 3, 2008 05:31 PM EDT:
Reinhard, if I understand you correctly, what you basically want to
introduce in the evaluation is terms of the type w_ij*N_i*N_j, where N_i
is the number of pieces of type i of one side, N_j is the number of pieces
of type j of the opponent, and w_ij is a tunable weight.

So that, if type i = A and type j = N, a negative w_ij would describe a
reduction of the value of each Archbishop by the presence of the enemy
Knights, through the interdiction effect. Such a term would for instance
provide an incentive for the QA side to trade A in a QA vs ABNN position,
as his A is suppressed in value by the presence of the enemy N (and B),
while the opponent's A would not be similarly suppressed by our Q. On the
contrary, our Q value would be suppressed by the opponent's A as well, so
trading A also benefits him there.

I guess it should be easy enough to measure if terms of this form have
significant values, by playing Q-BNN imbalances in the presence of 0, 1
and 2 Archbishops, and deducing from the score whose Archbishops are worth
more (i.e. add more winning probability). And similarly for 0, 1, 2
Chancellors each, or extra Queens. And then the same thing with a Q-RR
imbalance, to measure the effect of Rooks on the value of A, C or Q.

In fact, every second-order term can be measured this way. Not only for
cross products between own and enemy pieces, but also cooperative effects
between own pieces of equal or different type. With 7 piece types for each
side (14 in total) there would be 14*13/2 = 91 terms of this type possible.
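
To make the shape of such an evaluation term concrete, a sketch (the base values and the single weight below are hypothetical placeholders, to be fitted from exactly the kind of matches just described):

    # Second-order material evaluation: baseline values plus cross terms
    # w_ij * N_i * N_j between own and enemy piece counts. All numbers here
    # are hypothetical placeholders, to be fitted from test matches.

    BASE = {'Q': 950, 'C': 900, 'A': 875, 'R': 475, 'B': 350, 'N': 300, 'P': 85}
    W = {('A', 'N'): -2.0}    # e.g. enemy Knights interdicting our Archbishop

    def material_eval(own, enemy):
        """own/enemy: piece-type -> count, seen from the side to evaluate."""
        score = sum(BASE[t] * n for t, n in own.items())
        score -= sum(BASE[t] * n for t, n in enemy.items())
        for (i, j), w in W.items():
            score += w * own.get(i, 0) * enemy.get(j, 0)    # our i vs their j
            score -= w * enemy.get(i, 0) * own.get(j, 0)    # their i vs our j
        return score

    print(material_eval({'Q': 1, 'A': 1}, {'A': 1, 'B': 1, 'N': 2}))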

Derek Nalls wrote on Sun, May 4, 2008 02:38 AM EDT:
'I never found any effect of the time control on the scores I measure for
some material imbalance. Within statistical error, the combinations I
tried produced the same score at 40/15', 40/20', 40/30', 40/40',
40/1', 40/2', 40/5'. Going to even longer TC is very expensive, and I
did not consider it worth doing just to prove that it was a waste of
time...'
_________

The additional time I normally give to playtesting games to improve the
move quality is partially wasted because I can only control the time per
move instead of the number of plies completed using most chess variant
programs.  This usually results in the time expiring while the program is
working on an incomplete ply.  Then it prematurely spits out a move
representative of an incomplete tour of the moves available within that
ply, cut off at a random fraction of that ply.  Since there is always more
than one move (often several) under evaluation as the best possible move
[otherwise, the chosen move would have already been executed], this means
that any move on this 'list of top candidates' is equally likely to be
executed.

Here are two typical scenarios that should cover what usually happens:

A.  If the list of top candidates in an 11-ply search consists of 6 moves
where the list of top candidates in a 10-ply search consists of 7 moves,
then only 1 discovered-to-be-less-than-the-best move has been successfully
excluded and cannot be executed.  

Of course, an 11-ply search completion may typically require est. 8-10
times as much time as the search completions for all previous plies (1-ply
thru 10-ply) up until then added together.

OR

B.  If the list of top candidates in an 11-ply search consists of 7 moves
[Moreover, the exact same 7 moves.] just as the preceding 10-ply search, 
then there is no benefit at all in expending 8-10 times as much time.
______________________________________________________________

The reason I endure this brutal waiting game is not masochism but the fact
that the additional time has a tangible chance (although no guarantee) of
yielding a better move on every occasion.  Throughout the numerous moves
within a typical game, it can realistically be expected to yield better
moves on dozens of occasions.

We usually playtest for purposes at opposite extremes of the spectrum, 
yet I regard our efforts as complementary toward building a complete 
picture of the material values of pieces.

You use 'asymmetrical playtesting' with unequal armies on fast time 
controls, collect and analyze statistics ... to determine a range, with a
margin of error, for individual material piece values.

I remain amazed (although I believe you) that you actually obtain any 
meaningful results at all via games played so quickly that the AI players
do not have 'enough time to think', even though the games are so complex
that every computer (and person) needs time to think to play with minimal
competence.  Can you explain to me in a way I can understand how and why
you are able to successfully obtain valuable results using this method? 
The quality of your results was utterly surprising to me.  I apologize for
totally doubting you when you introduced your results and mentioned how you
obtained them.

I use 'symmetrical playtesting' with identical armies at very slow time
controls to obtain the best moves realistically possible from an
evaluation function, thereby giving me a winner that is (by some margin)
more likely than not deserving ... to determine which of two sets of
material piece values is probably (yet not certainly) better. 
Nonetheless, as more games are likewise played, if they present a
clear pattern, the results become more probably reliable, 
decisive and indicative of the true state of affairs.

The chances of flipping a coin once and it landing 'heads' are equal to
it landing 'tails'.  However, the chances of flipping a coin 7 times and
it landing 'heads' all 7 times in a row are 1/128.  Now, replace the
concepts 'heads' and 'tails' with 'victory' and 'defeat'.  I
presume you follow my point.

The results of only a modest number of well-played games can definitely
establish their significance beyond chance and to the satisfaction of 
reasonable probability for a rational human mind.  [Most of us, including
me, do not need any better than a 95%-99% success to become convinced that
there is a real correlation at work even though such is far short of an
absolute 100% mathematical proof.]
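
The arithmetic behind this coin-flip argument is a one-sided sign test; a minimal sketch:

    # Chance that an evenly matched opponent sweeps all n decisive games
    # purely by luck -- the 'coin flip' bound used above (draws ignored).
    def fluke_probability(n_games):
        return 0.5 ** n_games

    for n in (4, 6, 7):
        print(n, "straight wins: fluke probability", fluke_probability(n))
    # 4 -> 1/16, 6 -> 1/64, 7 -> 1/128

Note that such a sweep only bounds the probability that some advantage exists; by itself it says nothing about the size of that advantage.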

In my experience, I have found that using any less than 10 minutes per
move will cause at least one instance within a game when an AI player
makes a move that is obvious to me (and correctly assessed as truly being)
a poor move.  Whenever this occurs, it renders my playtesting results 
tainted and useless for my purposes.  Sometimes this occurs during a 
game played at 30 minutes per move.  However, this rarely occurs during 
a game played at 90 minutes per move.

For my purposes, it is critically important above all other considerations
that the winner of these time-consuming games be correctly determined 
'most of the time' since 'all of the time' is impossible to assure.
I must do everything within my power to get as far from 50% toward 100%
reliability in correctly determining the winner.  Hence, I am compelled to
play test games at nearly the longest survivable time per move to minimize
the chances that any move played during a game will be an obviously poor 
move that could have changed the destiny of the game thereby causing 
the player that should have won to become the loser, instead.  In fact, 
I feel as if I have no choice under the circumstances.

Reinhard Scharnagl wrote on Sun, May 4, 2008 03:09 AM EDT:
Harm, I think of a simpler formula, because it seems easier to find an
approximation than to weight a lot of parameters in the face of a lot of
other unhandled strange effects. Therefore my lower-dimensional approach
looks like: f(s := sum of the unbalanced big pieces' values, n := number
of unbalanced big pieces, v := value of the biggest opponent piece).

So I intend to calculate the presumed value reduction e.g. as:

(s - v*n)/constant

P.S.: maybe it will make sense to limit v from below by s/(2*n) to prevent a too-big reduction, e.g. when no big opponent piece is present at all.  

P.P.S.: There have been some more thoughts of mine on this question. Let w := sum of the n biggest opponent pieces, limited from below by s/2. Then the formula should be:

(s - w)/constant

P.P.P.S.: My experiments suggest that the constant is about 2.0.

P^4.S.: I have implemented this 'Elephantiasis Reduction' (as I will name it) in a new private SMIRF version, and it is working well. My constant is currently 8/5. I found that it is good to calculate one more piece than would be uncompensated, because that bottom piece pair could be of switched size and thus would reduce the reduction. Non-existing opponent pieces are replaced by a Knight's value within the calculation. I noticed a speed-up of SMIRF when searching for mating combinations (in normal play). I also noticed that SMIRF makes sacrifices that make penalties of the introduced kind vanish.
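
One possible transcription of the refined formula (a sketch only: the lower limit on w and the Knight fallback follow the post-scripts above as I read them, the 'one extra piece' refinement is omitted, and the example values are merely illustrative):

    # One possible transcription of the refined reduction (s - w)/constant.

    KNIGHT_VALUE = 300       # stand-in value for non-existing opponent pieces
    CONSTANT = 8 / 5         # current choice (earlier experiments: 2.0)

    def elephantiasis_reduction(own_big, opponent_values):
        """own_big: values of one side's unbalanced big pieces.
        opponent_values: values of all of the opponent's pieces."""
        s, n = sum(own_big), len(own_big)
        opp = sorted(opponent_values, reverse=True) + [KNIGHT_VALUE] * n
        w = max(sum(opp[:n]), s / 2)     # 'down limit' w by s/2 per the P.S.
        return max(0.0, s - w) / CONSTANT

    # e.g. the 4A-9N position: four Archbishops (875) facing nine Knights (300)
    print(elephantiasis_reduction([875] * 4, [300] * 9))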

H. G. Muller wrote on Sun, May 4, 2008 04:57 AM EDT:
Derek Nalls:
| The additional time I normally give to playtesting games to improve
| the move quality is partially wasted because I can only control the
| time per move instead of the number of plies completed using most
| chess variant programs.

Well, on Fairy-Max you won't have that problem, as it always finishes an
iteration once it decides to start it. But although Fairy-Max might be
stronger than most other variant-playing AIs you use, it is not stronger
than SMIRF, so using it for 10x8 CVs would still be a waste of time.

Joker80 tries to minimize the time wastage you point out by attempting
only to start iterations when it has time to finish them. It cannot always
accurately guess the required time, though, so unlike Fairy-Max it has
built in some emergency brakes. If they are triggered, you would have an
incomplete iteration. Basically, the mechanism works by stopping the
search of new moves in the root, once it gets into 'overtime', if there
already is a move with a similar score as on the previous iteration. In
practice these unexpectedly long iterations mainly occur when the
previously best move runs into trouble that so far was just beyond the
horizon. As the tree for that move will then look completely different
from before, it takes a long time to search (no useful information in the
hash), and the score will show a huge drop. It then continues searching
new moves even in overtime, in a desperate attempt to find one that avoids
the disaster. Usually this is time well spent: even if there is no
guarantee it finds the best move of the new iteration, if it aborts early
it at least has found a move that was significantly better than that found
in the previous iteration.

Of course both Joker80 and Fairy-Max support the WinBoard 'sd' command,
allowing you to limit the depth to a certain number of plies, although I
never use that. I don't like to fix the ply depth, as it makes the engine
play like an idiot in the end-game.

| Can you explain to me in a way I can understand how and why
| you are able to successfully obtain valuable results using this
| method?

Well, to start with, Joker80 at 1 sec per move still reaches a depth of
8-9 ply in the middle-game, and would probably still beat most Humans at
that level. My experience is that, if I immediately see an obvious error,
it is usually because the engine makes a strategic mistake, not a tactical
one. And such strategic mistakes are awfully persistent, as they are a
result of faulty evaluation, not search. If it makes them at 8 ply, it is
very likely to make that same error at 20 ply, as even 20 ply is usually
not enough to bring the resolution of the strategical feature within the
horizon.

That being said, I really think that an important reason I can afford fast
games is a statistical one: by playing so many games I can be reasonably
sure that I get a representative number of gross errors in my sample, and
they more or less cancel each other out on average. Suppose at a certain
level of play 2% of the games contain a gross error that turns a totally
won position into a loss. If I play 10 games, there is a 20% chance that
one game contains such an error (affecting my result by 10%), and only ~2%
probability of two such errors (which then in half the cases would cancel,
but in the other cases would put the result off by 20%). If, OTOH, I would
play 1000 faster games, with an increased 'blunder rate' of 5% because of
the lower quality, I would expect 50 blunders.
But the probability that they were all made by the same side would be
negligible. In most cases the imbalance would be around sqrt(50) ~ 7. That
would impact the 1000-game result by only 0.7%. So virtually all results
would be off, but only by about 0.7%, so I don't care too much.

Another way of visualizing this would be to imagine the game state-space
as a 2-dimensional plane, with two evaluation terms determining the x- and
y-coordinate. Suppose these terms can both run from -5 to +5 (so the state
space is a square), and the game is won if we end in the unit circle
(x^2 + y^2 < 1), but that we don't know that. Now suppose we want to know
how large the probability of winning is if we start within the square with
corners (0,0) and (1,1) (say this is the possible range of the evaluation
terms when we possess a certain combination of pieces). This should be the
area of a quarter circle, PI/4, divided by the area of the square (1), so
PI/4 = 79%. We try to determine this empirically by randomly picking
points in the square (by setting up the piece combination in some shuffled
configuration), and letting the engines play the game.

The engines know that getting closer to or farther away from (0,0) is
associated with changing the game result, and are programmed to maximize
or minimize this distance to the origin. If they both play perfectly, they
should by definition succeed in doing this. They don't care about the
'polar angle' of the game state, so the point representing the game state
will make a random walk on a circle around the origin. When the game ends,
it will still be in the same region (inside or outside the unit circle),
and games starting in the won region will all be won.

Now with imperfect play, the engines will not conserve the distance to the
origin, but their tug of war will sometimes change it in favor of one or
the other (i.e. towards the origin, or away from it). If the engines are
still equally strong, by definition this distance will on average not
change. But its probability distribution will now spread out over a ring
of finite width during the game. This might lead to won positions close to
the boundary (the unit circle) now ending up outside it, in the lost
region. But if the ring of final game states is narrow (width << 1), there
will be a comparable number of initial game states that diffuse from
within the unit circle to the outside as in the other direction. In other
words, the game score as a function of the initial evaluation terms is no
longer an absolute all or nothing, but the circle is radially smeared out
a little, making a smooth transition from 100% to 0% in a narrow band
centered on the original circle. This will hardly affect the averaging,
and in particular, making the ring wider by decreasing playing accuracy
will initially hardly have any effect. Only when play gets so wildly
inaccurate that the final positions (where win/loss is determined) diverge
so far from the initial point that they could cross the entire circle will
you start to see effects on the score. In the extreme case where the
radial diffusion is so fast that you could end up anywhere in the 10x10
square when the game finishes, the result score will only be PI/100 = 3%.

So it all depends on how much the imperfections in the play spread out the
initial positions in the game-state space. If this spread is small
compared to the measures of the won and lost areas, the result will be
almost independent of it.
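
The blunder-cancellation part of this argument can be checked with a toy simulation. A sketch with assumed numbers (62% true score, 5% blunder rate); only the scaling with the number of games matters:

    import random
    from statistics import pstdev

    # Toy model: each game goes to the 'deserving' side with the true
    # probability, unless a gross blunder (either side) flips the result.
    # The 62% true score and 5% blunder rate are assumed numbers.

    def match_score(n_games, true_score=0.62, blunder_rate=0.05):
        pts = 0
        for _ in range(n_games):
            win = random.random() < true_score
            if random.random() < blunder_rate:
                win = not win                 # a blunder reverses the outcome
            pts += win
        return pts / n_games

    random.seed(1)
    for n in (10, 100, 1000):
        scores = [match_score(n) for _ in range(200)]
        print(n, "games: score std. dev. ~", round(pstdev(scores), 3))
    # the spread shrinks like 1/sqrt(n), while the blunder rate shifts the
    # mean only slightly (from 0.620 to about 0.608 in this toy model)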

Derek Nalls wrote on Sun, May 11, 2008 06:05 PM EDT:
Before Scharnagl sent me three special versions of SMIRF MS-174c compiled
with the CRC material values of Scharnagl, Muller & Nalls, I began
playtesting something else that interested me using SMIRF MS-174b-O.

I am concerned that the material value of the rook (especially compared to
the queen) amongst CRC pieces in the Muller model is too low:

rook  55.88
queen  111.76

This means that 2 rooks exactly equal 1 queen in material value.

According to the Scharnagl model:

rook  55.71
queen  91.20

This means that 2 rooks have a material value (111.42) 22.17% greater than
1 queen.

According to the Nalls model:

rook  59.43
queen  103.05

This means that 2 rooks have a material value (118.86) 15.34% greater than
1 queen.

Essentially the Scharnagl & Nalls models are in agreement in predicting
victories in a CRC game for the player missing 1 queen yet possessing 2
rooks.  By contrast, the Muller model predicts draws (or appr. equal
number of victories and defeats) in a CRC game for either player.

I put this extraordinary claim to the test by playing 2 games at 10
minutes per move on an appropriately altered Embassy Chess setup with the
missing-1-queen player and the missing-2-rooks player each having a turn
at white and black.

The missing-2-rooks player lost both games and was always behind.  They
were not even long games at 40-60 moves.

Muller:

I think you need to moderately raise the material value of your rook in
CRC.  It is out of its proper relation with the other material values
within the set.

H. G. Muller wrote on Mon, May 12, 2008 01:57 AM EDT:
To Derek:

I am aware that the empirical Rook value I get is suspiciously low. OTOH,
it is an OPENING value, and Rooks gain their value only late in the game.
Furthermore, this is only the BASE VALUE of the Rook; most pieces have a
value that depends on the position on the board where they actually are,
or where you can quickly get them (in an opening situation, where the
opponent is not yet able to interdict your moves, because his pieces are
in inactive places as well). But Rooks only increase their value on open
files, and initially no open files are to be seen. In a practical game, by
the time you get to trade 2 Rooks for a Queen, there usually are open
files. So by that time, the value of the Q vs 2R trade will have gone up
by two times the open-file bonus. You hardly have the possibility of
trading them before there are open files. So it stands to reason that you
might as well use the higher value during the entire game.

In 8x8 Chess, the Larry Kaufman piece values include the rule that a Rook
should be devalued by 1/8 Pawn for each Pawn on the board over five. In
the case of 8 Pawns that is a really large penalty of 37.5cP for having no
open files. If I add that to my opening value, the late middle-game /
end-game value of the Rook gets to 512, which sounds a lot more
reasonable.

There are two different issues here:
1) The winning chances of a Q vs 2R material imbalance game
2) How to interpret that result as a piece value

All I say above has no bearing on (1): if we both play a Q-2R match from
the opening, it is a serious problem if we don't get the same result. But
you have played only 2 games. Statistically, 2 games mean NOTHING. I don't
even look at results before I have at least 100 games, because before that
they are about as likely to be the reverse of what they will eventually
be, as not. The standard deviation of the result of a single Gothic Chess
game is ~0.45 (it would be 0.5 point if no draws were possible, and in
Gothic Chess the draw percentage is low). This error goes down as the
square root of the number of games. In the case of 2 games this is
45%/sqrt(2) = 32%. The Pawn-odds advantage is only 12%. So this standard
error corresponds to 2.66 Pawns. That is 1.33 Pawns per Rook. So with this
test you could not possibly see if my value is off by 25, 50 or 75. If you
find a discrepancy, it is enormously more likely that the result of your
2-game match is off from the true win probability.

Play 100 games, and the error in the observed score is reasonably certain
(68% of the cases) to be below 4.5% ~ 1/3 Pawn, so 16 cP per Rook. Only
then can you see with reasonable confidence whether your observations
differ from mine.
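
For reference, the arithmetic behind these numbers is a one-line standard-error computation; a sketch using the 0.45 per-game standard deviation and the 12% Pawn-odds figure quoted above (the per-Rook conversion simply halves the Pawn figure, since two Rooks are involved):

    import math

    SIGMA_GAME = 0.45   # std. dev. of a single Gothic Chess game result
    PAWN_ODDS = 0.12    # score advantage worth one Pawn

    def score_error(n_games):
        # standard error of the mean score over n independent games
        return SIGMA_GAME / math.sqrt(n_games)

    for n in (2, 100):
        err = score_error(n)
        pawns = err / PAWN_ODDS
        print("%4d games: +/-%4.1f%% score, ~%.2f Pawns, ~%.0f cP per Rook"
              % (n, err * 100, pawns, pawns / 2 * 100))
    # 2 games  -> +/-31.8% score, ~2.65 Pawns, ~133 cP per Rook
    # 100 games -> +/-4.5% score, ~0.38 Pawns, ~19 cP per Rook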

Derek Nalls wrote on Mon, May 12, 2008 03:06 PM EDT:
'You hardly have the possibility of trading it before there are open
files. So it stands to reason that you might as well use the higher value
during the entire game.'

Well, I understand and accept your reasons for leaving your lower rook 
value in CRC as is.  It is interesting that you thoroughly understand and
accept the reasons of others for using a higher rook value in CRC as
well.  Ultimately, is not the higher rook value in CRC more practical and useful to the game by your own logic?
_____________________________

'... if we both play a Q-2R match from the opening, it is a serious
problem if we don't get the same result. But you have played only 2
games. Statistically, 2 games mean NOTHING.'

I never claimed or implied that only 2 games at 10 minutes per 
move mean everything or even mean a great deal (to satisfy probability
overwhelmingly).  However, they mean significantly more than nothing.  
I cannot accept your opinion, based upon a purely statistical viewpoint,
since it excludes another applicable mathematical viewpoint.  
They definitely mean something ... although exactly how much is not 
easily known or quantified (measured) mathematically.
__________________________________________________

'I don't even look at results before I have at least 100 games, because
before they are about as likely to be the reverse from what they will 
eventually be, as not.'

Statistically, when dealing with speed chess games populated 
exclusively with virtually random moves ... YES, I can understand and 
agree with you requiring a minimum of 100 games.  However, what you 
are doing is at the opposite extreme from what I am doing via my 
playtesting method.

Surely you would agree that IF I conducted only 2 games with perfect 
play for both players that those results would mean EVERYTHING.  
Unfortunately, with state-of-the-art computer hardware and chess variant 
programs (such as SMIRF), this is currently impossible and will remain 
impossible for centuries-millennia.  Nonetheless, games played at 100 
minutes per move (for example) have a much greater probability of 
correctly determining which player has a definite, significant advantage 
than games played at 10 seconds per move (for example).

Even though these 'deep games' are played at nowhere near 600 times
better quality than these 'shallow games', as one might naively expect
(due to a non-linear correlation), they are far from random events 
(to which statistical methods would then be fully applicable).  
Instead, they occupy a middle ground between perfect-play games and 
totally random games.  [In my studied opinion, the example 
'middle-ground games' are more similar and closer to perfect-play 
games than to totally random games.]  To date, much is unknown to
combinatorial game theory about the nature of these 'middle-ground 
games'.

Remember the analogy to coin flips that I gave you?  Well, in fact, 
the playtest games I usually run go far above and beyond such random 
events in their probable significance per event.

If the SMIRF program running at 90 minutes per move cast all of its 
moves randomly and without any intelligence at all (as a perfect 
woodpusher), only then would my 'coin flip' analogy be fully applicable.
Therefore, when I estimate that it would require 6 games (for example) 
for me to determine, IF a player with a given set of piece values loses 
EVERY game, that there is at least a 63/64 chance that the result is
meaningful (instead of random bad luck), I am being conservative in the
extreme.  The true figure is almost surely higher than a 63/64 chance.

By the way, if you doubt that SMIRF's level of play is intelligent and
non-random, then play a CRC variant of your choice against it at 90 
minutes per move.  After you lose repeatedly, you may not be able to 
credit yourself with being intelligent either (although you should) ... 
if you insist upon holding an impractically high standard to define the 
word.
______

'If you find a discrepancy, it is enormously more likely that the result
of your 2-game match is off from its true win probability.'

For a 2-game match ... I agree.  However, this may not be true for a 
4-game, 6-game or 8-game match, and surely is not true to the extremes 
you imagine.  Everything is critically dependent upon the specifications 
of the match.  The number of games played (of course), the playing 
strength or quality of the program used, the speed of the computer and 
the time or ply depth per move are the most important factors.
_________________________________________________________

'Play 100 games, and the error in the observed score is reasonable
certain (68% of the cases) to be below 4.5% ~1/3 Pawn, so 16 cP per Rook. Only then you can see with reasonable confidence if your observations differ from mine.'

It would require est. 20 years for me to generate 100 games of the 
quality (and time controls) I am accustomed to and somewhat satisfied 
with.  Unfortunately, it is not that important to me just to get you to
pay attention to the results for the benefit of only your piece-values
model.  As a practical concern to you, everyone else who is working to
refine quality piece-values models in FRC and CRC will likely have
surpassed your achievements by then IF you refuse to learn anything from
the results of others who use playtesting and mathematical-analysis
methods different from yours, yet valid and meaningful.

H. G. Muller wrote on Mon, May 12, 2008 06:12 PM EDT:
Derek Nalls:
| They definitely mean something ... although exactly how much is not 
| easily known or quantified (measured) mathematically.
Of course that is easily quantified. The entire mathematical field of
statistics is designed to precisely quantify such things, through
confidence levels and uncertainty intervals. The only thing you proved
with reasonable confidence (say 95%) is that two Rooks are not 1.66 Pawns
weaker than a Queen. So if Q=950, then R > 392. Well, no one claimed
anything different. What we want to see is whether Q-RR scores 50% (R=475)
or 62% (R=525). That difference just can't be seen with two games. Play
100. There is no shortcut. Even perfect play doesn't help. We do have
perfect play for all 6-men positions. Can you derive piece values from
that, even end-game piece values???

| Statistically, when dealing with speed chess games populated 
| exclusively with virtually random moves ... YES, I can understand and 
| agree with you requiring a minimum of 100 games.  However, what you 
| are doing is at the opposite extreme from what I am doing via my 
| playtesting method.
Where do you get this nonsense? This is approximately master-level play.
Fact is that results from playing opening-type positions (with 35 pieces
or more) are stochastic quantities at any level of play we are likely to
see in the next few million years. And even if they weren't, so that you
could answer the question 'who wins' through a 35-men tablebase, you would
still have to make some average over all positions (weighted by relevance)
with a certain material composition to extract piece values. And if you
would do that by sampling, the result would again be a stochastic
quantity. And if you would do it by exhaustive enumeration, you would have
no idea which weights to use.
And if you are sampling a stochastic quantity, the error will be AT LEAST
as large as the statistical error. Errors from other sources could add to
that. But if you have two games, you will have at least 32% error in the
result percentage. It doesn't matter if you play at an hour per move, a
week per move, a year per move, or 100 years per move. The error will
remain >= 32%. So if you want to play 100 years per move, fine. But you
will still need 100 games.

| Nonetheless, games played at 100 minutes per move (for example) have 
| a much greater probability of correctly determining which player has 
| a definite, significant advantage than games played at 10 seconds per 
| move (for example).
Why do I get the suspicion that you are just making up this nonsense? Can
you show me even one example where you have shown that a certain material
advantage scored more than 3 sigma differently in games at 100 min/move
than in games at 1 sec/move? Show us the games, then. Be aware that this
would require at least 100 games at each time control. That seems to make
it a safe guess that you did not do that at 100 min/move.
 On the other hand, instead of just making things up, I have actually
done such tests, not with 100 games per TC, but with 432, and for the
faster ones even with 1728 games per TC. And there was no difference
beyond the expected and unavoidable statistical fluctuations corresponding
to those numbers of games between playing 15 sec or 5 minutes. 
The advantage that a player has in terms of winning probability is the
same at any TC I ever tried, and can thus be determined equally reliably
with games of any duration (provided you have the same number of games).
If you think it would be different for extremely long TC, show us
statistically sound proof.

I might comment on the rest of your long posting later, but have to go
now...

Derek Nalls wrote on Mon, May 12, 2008 10:39 PM EDT:
'Of course, that is easily quantified. The entire mathematical field of
statistics is designed to precisely quantify such things, through
confidence levels and uncertainty intervals.'

No, it is not easily quantified.  Some things of numerical as well as
geometric importance that we try to understand or prove in the study of
chess variants are NOT covered or addressed by statistics.
I wish our field of interest were that simple (relatively speaking) and
approachable, but it is far more complicated and interdisciplinary.  
All you talk about is statistics.  Is this because statistics is all you
know well?
___________

'That difference just can't be seen with two games. Play 100.
There is no shortcut.'

I agree.  Not with only 2 games.  

However ...

With only 4 games, IF they were ALL victories or defeats for the player 
using a given piece-values model, I could tell you with confidence 
that there is at least a 15/16 chance the given piece-values model is 
stronger or weaker, respectively, than the piece-values model used by 
its opponent.  [Otherwise, the results are inconclusive and useless.]

Furthermore, based upon the average number of moves per game 
required for victory or defeat, compared to the established average 
number of moves in a long, close game, I could probably correctly 
estimate whether one model was a little or a lot stronger or weaker, 
respectively, than the other model.  Thus, I will not play 100 games, 
because there is no pressing, rational need to reduce the 'chance of 
random good or bad luck' to the ridiculously low value of 1 in 2^100.

Is there anything about the odds associated with 'flipping a coin'
that is beyond your ability to understand?  This is a fundamental 
mathematical concept applicable without reservation to symmetrical 
playtesting.  In any case, it is a legitimate 'shortcut' that I can and
will use freely.
________________

'Even perfect play doesn't help. We do have perfect play for all 6-men 
positions.'

I meant perfect play throughout an entire game of a CRC variant 
involving 40 pieces initially.  That is why I used the word 'impossible'
with reference to state-of-the-art computer technology.
_______________________________________________________

'This is approximately master-level play.'

Well, if you are getting master-level play from Joker80 with speed
chess games, then I am surely getting a superior level of play from 
SMIRF with much longer times and deeper plies per move.  You see,
I used the term 'virtually random moves' appropriately in a 
comparative context based upon my experience.
_____________________________________________

'Doesn't matter if you play at an hour per move, a week per move, 
a year per move, 100 year per move. The error will remain >=32%. 
So if you want to play 100 years per move, fine. But you will still
need 100 games.'

Of course it matters a lot.  If the program is well written, then the 
longer it runs per move, the more plies it completes per move
and, consequently, the better the moves it makes.  Hence,
the entire game played will progressively approach the ideal of 
perfect play ... even though this finite goal is impossible to attain.
Incisive, intelligent, resourceful moves must NOT be confused with 
or dismissed as purely random moves.  Although I could humbly limit 
myself to applying only statistical methods, I am totally justified,
in this case, in more aggressively using the 'probability of N coin 
flips ALL with the same result' as an incomplete minimum value, 
before even taking the playing strength of SMIRF at extremely long 
time controls into account to estimate a complete maximum value.
______________________________________________________________

'The advantage that a player has in terms of winning probability is the
same at any TC I ever tried, and can thus equally reliably be determined
with games of any duration.'

You are obviously lacking completely in the prerequisite patience and 
determination to have EVER consistently used time controls long enough 
to see any benefit whatsoever in doing so.  If you had ever done so, 
then you would realize (as everyone else who has done so realizes) 
that the quality of the moves improves, and even if the winning
probability has not changed much numerically in your experience, the
figure you obtain is more reliable.  

[I cannot prove to you that this 'invisible' benefit exists
statistically. Instead, it is an important concept that you need to
understand in its own terms.  This is essential to what most playtesters
do, with the notable exception of you.  If you want to understand what I
do and why, then you must come to grips with this reality.]

Derek Nalls wrote on Mon, May 12, 2008 11:38 PM EDT:
CRC piece values tournament
http://www.symmetryperfect.com/pass/

Just push the 'download now' button.

Game #1
Scharnagl vs. Muller
10 minutes per move
SMIRF MS-174c

Result: inconclusive.
Draw after 87 moves by black.
Perpetual check declared.

H. G. Muller wrote on Tue, May 13, 2008 03:17 AM EDT:
This discussion is pointless. In dealing with a stochastic quantity, if
your statistics are no good, your observations are no good, and any
conclusions based on them utterly meaningless. Nothing of what you say
here has any reality value; it is just your own fantasy. First you
should have results; then it becomes possible to talk about what they
mean. You have no result. Get statistically meaningful test results. If
your method can't produce them, or you don't feel it important enough to
make your method produce them, don't bother us with your cr*p instead.

Two sets of piece values as different as day and knight, and the only
thing you can come up with is that their comparison is 'inconclusive'.
Are you sure that you could conclusively rule out that a Queen is worth 7,
or a Rook 8, by your method of 'playtesting'? Talk about pathetic: even
the two games you played are the same. Oh man, does your test setup s*ck!
If you cannot even decide simple issues like this, what makes you think
you have anything meaningful to say about piece values at all?

H. G. Muller wrote on Tue, May 13, 2008 06:59 AM EDT:
Once upon a time I had a friend in a country far, far away, who had
obtained a coin from the bank. I was sure this coin was counterfeit, as it
had a far larger probability of producing tails. I even PROVED it to him: I
threw the coin twice, and both times tails came up. But do you think the
fool believed me? No, he DIDN'T! 

He had the AUDACITY to claim there was nothing wrong with the coin,
because he had tossed it a thousand times, and 523 times heads had come up!
While it was clear to everyone that he was cheating: he threw the coin only
10 feet up into the air, on each try. While I brought my coin up to 30,000
feet in an airplane, before I threw it out of the window, BOTH times! And,
mind you, both times it landed tails! And it was not just an ordinary
plane, like a Boeing 747. No sir, it was a ROCKET plane!

And still this foolish friend of mine insisted that his measly 10-feet
throws made him more confident that the coin was OK than my IRONCLAD PROOF
with the rocket plane. Ridiculous! Anyone knows that you can't test a
coin by only tossing it 10 feet. If you do that, it might land on any
side, rather than the side it always lands on. He might as well have
flipped a coin! No wonder they sent him to this far, far away country: no
one would want to live in the same country as such an idiot. He even went
as far as to buy an ICECREAM for that coin, and even ENJOYED eating it!
Scandalous! I can tell you, he ain't my friend anymore! Using coins that
always land on one side as if they were real money.

For more fairy tales and bed-time stories, read Derek's postings on piece
values...
:-) :-) :-)

Jianying Ji wrote on Tue, May 13, 2008 08:59 AM EDT:
Two suggestions for settling debates such as these: first, distributed
computing, to provide as much data as possible; and second, Bayesian
statistical methods, to provide statistical bounds on results.

H. G. Muller wrote on Tue, May 13, 2008 09:58 AM EDT:
Jianying Ji:
| Two suggestions for settling debates such as these: first, distributed
| computing, to provide as much data as possible; and second, Bayesian
| statistical methods, to provide statistical bounds on results.

Agreed: one first needs to generate data. Without data, there isn't even
a debate, and everything is just idle talk. What bounds would you expect
from a two-game dataset? And what if these two games were actually the
same?

But the problem is that the proverbial fool can always ask more than
anyone can answer. If, by recruiting all PCs in the world, we could
generate 100,000 games at an hour per move, an hour per move will of
course not be 'good enough'. It will at least have to be a week per
move. Or, if that is possible, 100 years per move.

And even 100 years per move are of course no good, because the computers
will still not be able to search into the end-game, as they will search
only 12 ply deeper than with 1 hour per move. So what's the point?

Not only is this an end-of-the-rainbow-type endeavor; even if you would get
there, and generate the perfect data, where it is 100% sure and proven for
each position what the outcome under perfect play is, what then? Because
for simple end-games we are already in a position to reach perfect play,
through retrograde analysis (tablebases).

So why not start there, to show that such data is of any use whatsoever,
in this case for generating end-game piece values? If you have the EGTB
for KQKAN, and KAKBN, how would you extract a piece value for A from it?

Derek Nalls wrote on Tue, May 13, 2008 11:08 AM EDT:
'This discussion is pointless.'

On this one occasion, I agree with you.

However, I cannot just let you get away with some of your most 
outrageous remarks to date.

So, unfortunately, this discussion is not yet over.
____________________________________________

'First you should have results, 
then it becomes possible to talk about what they mean. 
You have no result.'

Of course, I have a result!

The result is obviously the game itself as a win, loss or draw
for the purposes of comparing the playing strengths of two
players using different sets of CRC piece values.

The result is NOT statistical in nature.
Instead, the result is probabilistic in nature.

I have thoroughly explained this purpose and method to you.
I understand it.
Reinhard Scharnagl understands it.
You do not understand it.
I can accept that.
However, instead of admitting that you do not understand it,
you claim there is nothing to understand.
______________________________________

'Two sets of piece values as different as day and night, and the only
thing you can come up with is that their comparison is
'inconclusive'.'

Yes.  Draws make it impossible to determine which of two sets of
piece values is stronger or weaker.  However, by increasing the
time (and plies) per move, smaller differences in playing strength 
can sometimes be revealed with 'conclusive' results.

I will attempt the next pair of Scharnagl vs. Muller and Muller vs.
Scharnagl games at 30 minutes per move.  Knowing how much
you appreciate my efforts on your behalf motivates me.
___________________________________________________

'Talk about pathetic: even the two games you played are the same.'

Only one game was played.

The logs you saw were produced by the Scharnagl (standard) version
of SMIRF for the white player and the Muller (special) version of SMIRF
for the black player.  The game is handled in this manner to prevent 
time from expiring without computation occurring.
___________________________________________________

'... does your test setup s*ck!'

What, now you hate Embassy Chess too?
Take up this issue with Kevin Hill.

Jianying Ji wrote on Tue, May 13, 2008 11:28 AM EDT:
I really am completely lost, so I won't comment until I can see what the
debate is about.

Reinhard Scharnagl wrote on Tue, May 13, 2008 11:50 AM EDT:
H.G.M. wrote: '... he threw the coin only 10 feet up into the air, on each try. While I brought my coin up to 30,000 feet in an airplane ...'

Understanding your example as an argument against Derek Nalls' testing method, I wonder why your chess engines always think for the full given timeframe. It would be much more impressive if your engine always decided immediately. ;-)

I am still convinced that longer thinking times would have an influence on the quality of the resulting moves.

Derek Nalls wrote on Tue, May 13, 2008 12:18 PM EDT:
Since I had to endure one of your long bedtime stories (to be sure),
you are going to have to endure one of mine.  Yet unlike yours
[too incoherent to merit a reply], mine carries an important point:

Consider it a test of your common sense-

Here is a scenario ...

01.  It is the year 2500 AD.

02.  Androids exist.

03.  Androids cannot tell lies.

04.  Androids can cheat, though.

05.  Androids are extremely intelligent in technical matters.

06.  Your best friend is an android.

07.  It tells you that it won the lottery.

08.  You verify that it won the lottery.

09.  It tells you that it purchased only one lottery ticket.

10.  You verify that it purchased only one lottery ticket.

11.  The chance of winning the lottery with only one ticket is 1 out of
100 million.

12.  It tells you that it cheated to win the lottery by hacking into its
computer system immediately after the winning numbers were announced,
purchasing one winning ticket and back-dating the time of the purchase.
____________________________________________

You have only two choices as to what to believe happened-

A.  The android actually won the lottery by cheating.

OR

B.  The android actually won the lottery by good luck.
The android was mistaken in thinking it successfully cheated.
______________________________________________________

The chance of 'A' being true is 99,999,999 out of 100,000,000.
The chance of 'B' being true is 1 out of 100,000,000.
________________________________________________

I would place my bet upon 'A' being true
because I do not believe such unlikely coincidences
will actually occur.

You would place your bet upon 'B' being true
because you do not believe such unlikely coincidences
have any statistical significance whatsoever.
_________________________________________

I make this assessment of your judgment ability fairly because you think
it is a meaningless result if a player with one set of CRC piece values
wins against its opponent 10-times-in-a-row even as the chance of it being
'random good luck' is indisputably only 1 out of 1024.

By the way ...

base 2 to exponent 100 equals 1,267,650,600,228,229,401,496,703,205,376.

Can you see how ridiculous your demand of 100 games is?

H. G. Muller wrote on Tue, May 13, 2008 12:57 PM EDT:
Is this story meant to illustrate that you have no clue as to how to
calculate statistical significance? Or perhaps that you don't know what
it is at all?

The observation of a single tails event rules out the null hypothesis that
the lottery was fair (i.e. that the probability for this to happen was
0.000,000,01) with a confidence of 99.999,999%.

Be careful, though, that this only describes the case where the winning
android was somehow special or singled out in advance. If the other
participants in the lottery were 100 million other cheating androids, it
would not be remarkable in any way that one of them won. The null
hypothesis that the lottery was fair predicted a 100% probability for
that.

But, unfortunately for you, it doesn't work for lotteries with only 2
tickets. Then you can rule out the null hypothesis that the lottery was fair
(and hence the probability 0.5) with a confidence of 50%. And 50%
confidence means that in 50% of the cases your conclusion is correct, and
in the other 50% of the cases not. In other words, a confidence level of
50% is a completely blind, uninformed random guess.
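
For concreteness, a sketch of the two significance calculations implied
by the coin story (the scipy call and exact numbers are my restatement;
both follow from the fair-coin null hypothesis):

    from scipy.stats import binomtest

    # Two tails in two throws: p-value 0.5, i.e. no evidence at all.
    print(binomtest(2, 2, 0.5).pvalue)        # 0.5

    # 523 heads in 1000 throws: p-value appr. 0.155, still consistent
    # with a fair coin, and the rate is pinned near 0.52 +/- 0.03.
    print(binomtest(523, 1000, 0.5).pvalue)   # appr. 0.155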

H. G. Muller wrote on Tue, May 13, 2008 01:06 PM EDT:
Reinhard Scharnagl:
| I am still convinced, that longer thinking times would have an 
| influence on the quality of the resulting moves.

Yes, so what? Why do you think that is a relevant remark? The better moves
won't help you at all if the opponent also makes better moves. The result
will be the same. And the rare cases where it is not cancel each other
out on average.

So for the umpteenth time:
NO ONE DENIES THAT LONGER THINKING TIME PRODUCES SOMEWHAT BETTER MOVES.
THE ISSUE IS THAT IF BOTH SIDES PLAY WITH LONGER TC, THEIR WINNING
PROBABILITIES WON'T CHANGE.

And don't bother to tell us that you are also convinced that the
winning probabilities will change, without showing us proof. Because no
one is interested in unfounded opinions, not even if they are yours.

Derek Nalls wrote on Tue, May 13, 2008 01:27 PM EDT:
'Is this story meant to illustrate that you have no clue as to how to
calculate statistical significance?'

No.

This story is meant to illustrate that you have no clue as to how to
calculate probabilistic significance ... and it worked perfectly.
________________________________________________________

There you go again.  Missing the point entirely and ranting about
probabilities not being proper statistics.

Reinhard Scharnagl wrote on Tue, May 13, 2008 03:05 PM EDT:
To H.G.M.: why do you have to be that unfriendly? But to give you a strong
argument that longer thinking phases could change a game result, have a
look at:
[site removed],
where [a claim is made] that there would be a mate in 9. In
fact, SMIRF had been in a lost situation there. But watching a chess engine
calculate on that position, you could see that an initial heavy
disadvantage switches into a secure win. Having engines calculate with
short time frames would probably lead to another result. Here, increasing
thinking time indeed leads to a result switch.

[The above has been edited to remove a name and site reference. It is the
policy of cv.org to avoid mention of that particular name and site to
remove any threat of lawsuits. Sorry to have to do that, but we must
protect ourselves. -D. Howe]

H. G. Muller wrote on Tue, May 13, 2008 05:13 PM EDT:
Reinhard, that is not relevant. It will happen on average as often for
the other side. It is in the nature of Chess. Every game that is won is
won by an error that might not have been made on longer thinking, as the
initial position is not a won position for either side. But most games are
won by one side or the other, and if the engines are allowed to think
longer, most games are still won by one side or the other.

What is so hard to understand about the statement 'the win probability
(score fraction, if you allow for draws) obtained from a given quiet, but
complex (many pieces) position between equal opponents does not depend on
time control' that it prompts people to come up with irrelevancies? Why do
you think that saying anything at all that does not mention an observed
probability would have any bearing on this statement whatsoever?

I don't think the ever more hollow-sounding, self-declared superiority of
Derek needs much comment. He obviously doesn't know zilch about
probability theory and statistics. Shouting that he does won't make it
so, and won't fool anyone.

H. G. Muller wrote on Wed, May 14, 2008 03:09 AM EDT:
This discussion is too silly for words anyway. Because even if it were true that the winning probability for a given material imbalance would be different at 1 hour per move than it would be at 10 sec/move, it would merely mean that piece values are different for different quality players. And although that is unprecedented, that revelation in itself would not make the piece values at 1 hour per move of any use, as that is a time control that no one wants to play anyway.

So the whole endeavor is doomed from the start: by testing at 1 hour per move, either you measure the same piece values as you would at 10 sec/move, and have wasted 99.7% of your time, or you find different values, and then you have wrong values, which cannot be used at any time control you would actually want to play...

Rich Hutnik wrote on Wed, May 14, 2008 10:26 PM EDT:
Here is another approach I would suggest for gauging the strength of
pieces.  How about we pick 100 pieces and have people order them from
strongest to weakest?  Work out a scoring system for position, and then
at least get an idea of the order of strength.

Anyone think this might be a sound approach?

H. G. Muller wrote on Thu, May 15, 2008 12:22 PM EDT:
Rich Hutnik:
| Anyone think this might be a sound approach?

Well, not me! Science is not a democracy. We don't interview people in
the street to determine if a neutron is heavier than a proton, or what the 100th decimal of the number pi is.

At best, you could use this method to determine the CV rating of the
interviewed people. But even if a million people thought that piece A
is worth more than piece B, and none the other way around, that wouldn't
make it so. The only thing that counts is whether A makes you win more often
than B would. If it doesn't, then it is of lower value. No matter what people say, or how many say it.

Derek Nalls wrote on Mon, May 19, 2008 05:58 PM EDT:
To anyone who was interested ...

My playtesting efforts using SMIRF have been suspended indefinitely due to a serious checkmate bug which tainted the first game at 30 minutes per move between Scharnagl's and Muller's sets of CRC piece values.

Derek Nalls wrote on Mon, May 19, 2008 06:13 PM EDT:
Since Muller's Joker80 has recently established itself via 'The Battle Of
The (Unspeakables)' tournament as the best free CRC program in the world,
I checked it out.  I must report that setting up WinBoard F (also written
by Muller) to use it was straightforward with helpful documentation. 
Generally, I am finding the features of Joker80 to be versatile and
capable for any reasonable uses.

Derek Nalls wrote on Mon, May 19, 2008 06:28 PM EDT:
Muller:

I would like to conduct two focused playtests using Joker80 at very long
time controls (e.g., 30 minutes per move) to investigate these important questions-

1.  Is Muller's rook value within the CRC set too low?
2.  Is Scharnagl's archbishop value within the CRC set too low?

I would need you to compile special versions of Joker80 for me using
significantly different values for those CRC pieces as well as
Scharnagl's CRC piece set.  To isolate the target variable, these games would be Muller (standard values) vs. Muller (test values) and Scharnagl (standard values) vs. Scharnagl (test values) via symmetrical playtesting.  Anyway, we can discuss the details if you are interested or willing.  Please let me know.

Joe Joyce wrote on Mon, May 19, 2008 07:40 PM EDT:
This sounds like an interesting proposition.

Derek Nalls wrote on Mon, May 19, 2008 09:13 PM EDT:
Muller:

Please investigate this potentially serious bug I may have discovered
while testing Joker80 under Winboard F ...

Bugs, Bugs, Bugs!
http://www.symmetryperfect.com/pass

I am having a hard time with software today.

H. G. Muller wrote on Tue, May 20, 2008 02:39 AM EDT:
First, about the potential bug: I am afraid that I need more information to figure out what exactly was the problem. This is not a plain move-generator bug; when I feed the game to my version of Joker80 here (which is presumably the same as the one you are using), it accepts the move without complaints. It would be inconceivable anyway that a move-generator bug in such a common move would not have manifested itself in the many hundreds of games I had it play against other engines.

OTOH, human vs. engine play is virtually untested. Did you at any point of the game use 'undo' (through the WinBoard 'retract move')? It might be that the undo is not correctly implemented, and I would not notice it in engine-engine play. In fact, it is very likely to be broken after setting up a position, as I implemented it by resetting to the opening position and replaying all moves from there. But this won't work after loading a FEN (a feature I added only later). This is indeed something I should fix, but the current work-around would be not to use 'undo'.

To make sure what happened, I would have to see the winboard.debug file (which records all communication between engine and GUI, including a lot of debug output from the engine itself). Unfortunately, this file is not made by default. You would have to start WinBoard with the command-line option /debug, or press + + after starting WinBoard. And then immediately rename the winboard.debug to something else if a bug manifests itself, to prevent it from being overwritten when you run WinBoard again. Joker80 also makes a log file 'jokerlog.txt', but this also is overwritten each time you re-run it. If you haven't run Joker80 since the bug, it might help if you sent me that file. Otherwise, I am afraid that there is little I can do at the moment; we would have to wait until the problem occurs again, and examine the recorded debug information.

About the piece values: I could make a Joker80 version that reads the piece base values from a file 'joker.ini' at startup. Then you could change them to anything you want to test, without the need to re-compile. Would that satisfy your needs? Note that currently Joker80 is not really able to play CRC, as it only supports normal castling.
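
For example, from a Windows command prompt (the file name after 'ren' is
just an illustration):

    winboard.exe /debug
    (reproduce the problem, then immediately preserve the log:)
    ren winboard.debug winboard-bug.debug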

Derek Nalls wrote on Tue, May 20, 2008 03:16 AM EDT:
'Human vs. engine play is virtually untested. 
Did you at any point of the game use 'undo'
(through the WinBoard 'retract move')?'

Yes.
Many of us error-prone humans use it frequently.
________________________________________________

'This is indeed something I should fix but
the current work-around would be not to use 'undo'.'

Makes sense to me.
I can avoid using the 'retract move' command altogether.
________________________________________________________

'I could make a Joker80 version that reads the piece base values from a
file 'joker.ini' at startup. Then you could change them to anything you
want to test, without the need to re-compile. Would that satisfy your
needs?'

Yes, better than I ever imagined.
Thank you!

H. G. Muller wrote on Tue, May 20, 2008 07:06 AM EDT:
OK, I replaced the joker80.exe on my website by one with adjustable piece
values. (If you run it from the command line, it should say version 1.1.14
(h).) I also tried to fix the bug in undo (which I discovered was disabled
altogether in the previous version), and although it seemed to work, it
might remain a weak spot. (I foresee problems if the game contained a
promotion, for instance, as it might not remember the correct promotion
piece on replay.) So try to avoid using the undo.

I decided to make the piece values adjustable through a command-line
option, rather than from a file, to avoid problems if you want to run two
different sets of piece values (where you then would have to keep the
files separate somehow). The way it works now is that for the engine name
(that WinBoard asks in the startup dialog, or that you can put in the
winboard.ini file to appear in the selectable engines there), you should
write:

joker80.exe P85=300=350=475=875=900=950

The whole thing should be put between double quotes, so that WinBoard
knows the P... is an option to the engine, and not to WinBoard. The
numerical values are those of P, N, B, R, A, C and Q, respectively, in
centiPawn. You can replace them by any value you like. If you don't give
the P argument, it uses the default values. If you give a P argument with
not enough values, the engine exits.
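
A sketch (not Joker80's actual source; the function name is mine) of how
such an engine-name argument could be parsed into the seven centipawn
base values:

    PIECES = ("P", "N", "B", "R", "A", "C", "Q")

    def parse_p_option(arg):
        # "P85=300=350=475=875=900=950" -> values for P, N, B, R, A, C, Q
        if not arg.startswith("P"):
            raise ValueError("not a P option")
        values = [int(v) for v in arg[1:].split("=")]
        if len(values) != len(PIECES):
            raise ValueError("need exactly 7 values")  # the engine exits here
        return dict(zip(PIECES, values))

    print(parse_p_option("P85=300=350=475=875=900=950"))
    # {'P': 85, 'N': 300, 'B': 350, 'R': 475, 'A': 875, 'C': 900, 'Q': 950}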

Note that these are base values, for the positionally average piece. For N
and B this would be on c3, in the presence (for B) of ~ 6 own Pawns, half
of them on the color of the Bishop. A Bishop pair further gets a 40 cP bonus.
For the Rook it is the value for one in the absence of (half-)open files.
The Pawn value will be heavily modified by positional effects
(centralization, support by own Pawns, blocking by enemy Pawns), which on
the average will be positive.

Note that you can play two different versions against each other
automatically. The first engine plays white, in two-machines mode. (You
won't be able to recognize them from their name...)

H. G. Muller wrote on Tue, May 20, 2008 07:39 AM EDT:
One small refinement:

If the command-line argument was used to modify the piece values, Joker80
will give its own name to WinBoard as 'Joker80.xp', instead of
'Joker80.np', so that it becomes easier to figure out which engine
was winning (e.g. from the PGN file).

Note also that at very long time control you might want to enlarge the
hash table; default is 128MB, but if you invoke Joker80 as

'joker80.exe 22 P100=300=....'

it will use 256MB (and with 23 instead of 22 it will use 512MB, etc.)
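
The two data points given (22 -> 256MB, 23 -> 512MB) are consistent with
a table of 2^n entries at 64 bytes per entry; the entry size is an
assumption of mine, not something stated here:

    def hash_size_mb(n):
        # 2^n entries, assumed 64 bytes each, expressed in MB.
        return (1 << n) * 64 // (1 << 20)

    print(hash_size_mb(22), hash_size_mb(23))   # 256 512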

Derek Nalls wrote on Tue, May 20, 2008 12:48 PM EDT:
Everything is working fine.
Thank you!

I now have 12 instances of the Joker80 program running in various
sub-directories of Winboard F with the 'winboard.ini' file set to
conveniently initiate any desired standard or special material values for
the CRC models by Muller, Scharnagl and Nalls.

In the first test, I am going to attempt to find a playtesting time where
a distinct separation in playing strength occurs between the standard
Muller model wherein the rook is 1 pawn more valuable than the bishop and
a special Muller model wherein the rook is 2 pawns more valuable than the
bishop.  If I successfully find a playtesting time that is survivable by
humans, then we can hopefully establish a tentative probability as to
which CRC model plays decisively better after a few to several games.

At par 100 (for the pawn), the bishop is at 459 under both models with the
rook at 559 under the standard Muller model and 659 under the special
Muller model.

I want to playtest a special Muller model with a rook value 2.00 pawns higher than the bishop because the Nalls model has a rook value 2.19 pawns higher than the bishop and the Scharnagl model has a rook value 1.94 pawns higher than the bishop (for an average of 2.06 pawns).

Since I am attempting to test for such a small difference in the material value of only one type of piece (the rook), I have doubts that I will be able to obtain conclusive results.  In any case ... If I obtain conclusive results, then very long time controls will surely be required to produce them.

H. G. Muller wrote on Tue, May 20, 2008 02:43 PM EDT:
Well, to get an impression of what you can expect: in my first versions of
Joker80 I still used the Larry Kaufman piece values of 8x8 Chess. So the
Bishop was half a Pawn too low, nearly equal to the Knight (as with more
than 5 Pawns, Kaufman has a Knight worth more than a lone Bishop,
neutralizing a large part of the pair bonus). Now unlike a Rook, a Bishop is
very easy to trade for a Knight, as both get into play early. Making the
trade usually wrecks the opponent's pawn structure by creating a doubled
Pawn, giving enough compensation to make it attractive.

So in almost all games Joker played with two Knights against two Bishops
after 12 moves or so. Fixing that did increase the playing strength by
~100 Elo points. So where the old version would score 50%, the improved
version would score 57%.

Now a similarly bad value for the Rook would manifest itself much less
readily: the Rooks get into play late, there is no nearly equal piece
for which a 1:1 trade changes sign, and you would need 1:3 trades (R vs
B+2P) or 2:2 trades (R+P for N+N), which are much more difficult to set
up. So I would expect that being half a Pawn off on the Rook value would
only reduce your score by about 3%, rather than 7% as with the Bishop.
After playing 100 games, the score differs by more than 3% from the true
win probability more often than not. So you would need at least 400 games
to show with minimal confidence that there was a difference.
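
The arithmetic behind these sample sizes (my restatement of the above):
the score of one game has a standard deviation of at most 0.5, so the
standard error of the measured score fraction shrinks only as 1/sqrt(n):

    import math

    def score_stderr(games, sigma_per_game=0.5):
        # Standard error of the mean score after `games` games.
        return sigma_per_game / math.sqrt(games)

    print(score_stderr(100))   # 0.05:  a 3% effect is smaller than the noise
    print(score_stderr(400))   # 0.025: a 3% effect is barely resolvable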

Beware that the results of the games are stochastic quantities. Replay the
game at the same time control, and the game Joker80 plays will be
different. And often the result will be different. This is true at 1 sec
per move, but it is equally true at 1 year per move. The games that will
be played, are just a sample from the myriads of games Joker80 could play
with non-zero probability. And with fewer than 400 games, the difference
between the actually measured score percentage and the probability you
want to determine will be in most cases larger than the effect of the
piece values, if they are not extremely wrong (e.g. setting Q < B).

Derek Nalls wrote on Tue, May 20, 2008 05:05 PM EDT:
Of course, I would bet anything that there are no 1:1 exchanges supported
under the standard Muller CRC model that could cause material losses.  If
that were the case, yours would not be one of the three most credible CRC
models under close consideration.  In fact, even your excellent Joker80
program would play poorly if stuck with using faulty CRC piece values.

Obviously, the longer the exchange, the rarer its occurrence during
gameplay.  The predominance of simple 1:1 exchanges over even the least
complicated 1:2 or 2:1 exchanges in gameplay is large, although I do not
know the stats.

In fact, there is a certain 1:2 or 2:1 exchange I am hoping to see that is
likely to support my contention that the Muller rook value should be
higher: the 1 queen for 2 rooks or 2 rooks for 1 queen exchange.  Please
recall that under the standard Muller model, this is an equal exchange. 
However, under asymmetrical playtesting of comparable quality to and
similar to that I used to confirm the correctness of your higher
archbishop value, I played numerous CRC games at various moderate time
controls where the player without 1 queen (yet with 2 rooks) defeated the
player without 2 rooks (yet with 1 queen).  Ultimately, a key mechanism for conclusive results is that while the standard Muller model is neutral toward a 2 rook : 1 queen or 1 queen : 2 rook exchange, the special Muller model regards its 1 queen as significantly less valuable than 2 rooks of its opponent.  Consequently, this contrast in valuation could be played into ... and we would see who wins.

I am actually pleased that you are a realist who shares my pessimism in
this experiment.  In any case, low odds do not deter a best effort to
succeed.  The main difference between us is that you calculate your
pessimism by extreme statistical methods whereas I calculate my pessimism
by moderate probabilistic methods.  I remain hopeful that eventually I
will prove to you that the method Scharnagl & I developed is occasionally
productive.

Derek Nalls wrote on Tue, May 20, 2008 05:17 PM EDT:
Muller:

Please confirm that these are legal values for the 'winboard.ini' file.

/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22
P100=353=459=559=1029=1059=1118'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22
P100=353=459=659=1029=1059=1118'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22
P100=306=363=557=702=912=960'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22
P100=306=363=557=866=912=960'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22
P100=308=376=594=940=958=1031'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22
P100=308=376=594=940=958=1031'
'C:\winboard-F\TJchess\TJChess10x8'
}
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22
P100=353=459=559=1029=1059=1118'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22
P100=353=459=659=1029=1059=1118'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22
P100=306=363=557=702=912=960'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22
P100=306=363=557=866=912=960'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22
P100=308=376=594=940=958=1031'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22
P100=308=376=594=940=958=1031'
'C:\winboard-F\TJchess\TJChess10x8'
}

H. G. Muller wrote on Wed, May 21, 2008 08:48 AM EDT:
It looks OK to me.

One caveat: the normalization (e.g. Pawn = 100) is not completely
arbitrary, as the engine weights material against positional terms, and
doubling all piece values would effectively scale down the importance of
passers and King Safety.

In addition, the engine also uses some heavily rounded 'quick' piece
values internally, where B=N=3, R=5, A=C=8 and Q=9, to make a rough guess
if certain branches stand any chance to recoup the material it gave
earlier in the branch. So in certain situations, when it is behind 800
cP, it won't consider capturing a Rook, because it expects that to be
worth about 500 cP, and thus falls 300 cP below the target. Such a large
deficit would be beyond the safety margin for pruning the move. But if the
piece values were scaled up such that the 800 merely represented being a
Bishop behind, this obviously would be an unjustified pruning.

The safety margin is large enough to allow some leeway here, but don't
overdo it. It would be safest to keep the value of Q close to 950.
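
A schematic sketch of the pruning rule described above (not actual
Joker80 code; the margin and the Pawn's quick value are invented
placeholders):

    # Heavily rounded 'quick' values in centipawns, as listed above
    # (N=B=3, R=5, A=C=8, Q=9); P=100 is my assumption.
    QUICK_VALUE = {"P": 100, "N": 300, "B": 300, "R": 500,
                   "A": 800, "C": 800, "Q": 900}
    SAFETY_MARGIN = 200   # hypothetical; the real margin is not stated here

    def branch_may_recoup(deficit_cp, captured_piece):
        # Keep the capture only if its rough value, plus the safety
        # margin, can bring the material deficit back to the target.
        return QUICK_VALUE[captured_piece] + SAFETY_MARGIN >= deficit_cp

    print(branch_may_recoup(800, "R"))   # False: 500 + 200 < 800, so pruned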

I am indeed skeptical about the possibility of doing enough games to measure the
difference you want to see in the total score percentage. But perhaps some
sound conclusions could be drawn by not merely looking at the result, but
at the actual games, and singling out the Q vs 2R trades. (Or actually any
Rook versus other material trade before the end-game. Rooks capturing
Pawns to prevent their promotion probably should not count, though.) These
could then be used to separately extract the probability for such a
trade for the two sets of piece values, and determine the winning
probability for each of the piece values once such a trade would have
occurred. By filtering the raw data this way, we get rid of the stochastic
noise produced by the (majority of) games where the event we want to
determine the effect of would not have occurred.
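
A sketch of that filtering idea (the data layout is hypothetical): per
set of piece values, estimate how often the trade occurs and the score
of the side that entered it:

    def trade_statistics(games):
        # games: iterable of (values_id, traded, score) tuples, where
        # `traded` flags a Q-vs-2R trade before the end-game and `score`
        # is 1 / 0.5 / 0 for the side that gave up the Queen.
        stats = {}
        for values_id, traded, score in games:
            n, t, s = stats.get(values_id, (0, 0, 0.0))
            stats[values_id] = (n + 1, t + int(traded),
                                s + (score if traded else 0.0))
        return {vid: {"trade_rate": t / n,
                      "score_after_trade": s / t if t else None}
                for vid, (n, t, s) in stats.items()}

    print(trade_statistics([("M-st", True, 1), ("M-st", False, 0.5),
                            ("M-sp", True, 0)]))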

Derek Nalls wrote on Wed, May 21, 2008 12:53 PM EDT:
As I moved to renormalize all of the values used in Joker80 (written into
the 'winboard.ini' file) with the pawn at a par of 85 points, I looked
at my notes again.  They reminded me that your use of the 'bishop pair'
refinement (with a bonus of 40 points) implies that the material value of
the rook is either 1.00 pawns or 1.47 pawns greater than the material value
of the bishop in CRC, depending upon whether both bishops or only one
bishop, respectively, remain in the game.  At that point, I realized that
I would be attempting to playtest for a discrepancy that I know from
experience is just too small to detect even at very long time controls. 
So, this planned test has been cancelled.
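
The arithmetic behind the two figures, using the pawn-85 normalization
from the earlier list (R = 475, B = 350, pair bonus 40):

$$\frac{475 - (350 + 40)}{85} = 1.00 \qquad\qquad \frac{475 - 350}{85} \approx 1.47$$

with the first case applying while both bishops remain, and the second
once one of them is gone.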

I am not implying that this matter is unimportant, though.  I remain
concerned for the standard Muller model whenever it allows the exchange of
its 2 rooks for 1 queen belonging to its opponent.

H. G. Muller wrote on Wed, May 21, 2008 01:49 PM EDT:
Well, I share that concern. But note that the low Rook value was not only
based on the result of Q-2R asymmetric testing. I also played R-BP and
NN-RP, which ended unexpectedly badly for the Rook, and set the value of
the Rook compared to that of the minor pieces. While the value of the
Queen was independently tested against that of the minor pieces by playing
Q-BNN.

The low difference between R and B does make sense to me now, as the wider
board should upgrade the Bishop a lot more than the Rook. The Bishop gets
extra forward moves, and forward moves are worth a lot more than lateral
moves. I have seen that in testing cylindrical pieces, (indicated by *),
where the periodic boundary condition w.r.t. the side edges effectifely
simulates an infinitely wide board. In a context of normal Chess pieces,
B* = B+P, while R* = R + 0.25P. OTOH, Q* = Q+2P. So it doesn't surprise
me that on wider boards R loses compared to Q and B.

I can think of several systematic errors that lead to unrealistically poor
performance of the Rook in asymmetric playtesting from an opening position.
One is that Capablanca Chess is a very violent game, where the three
super-pieces are often involved in inflicting an early checkmate (or nearly
so, where the opponent has to sacrifice so much material to prevent the
mate that he is lost anyway). The Rooks initially offer not much defense
against that. But your chances for such an early victory would be strongly
reduced if you were missing a super-piece. So perhaps two Rooks would do
better against Q after A and C are traded. This explanation would do
nothing for explaining poor Rook performance in R vs B, but perhaps it is
B that is strong (it is also strong compared to N). The problem then would
be not so much low R value, but high Q value, due to cooperativity between
superpieces. So perhaps the observed scores should not be entirely
interpreted as high base values for Q, C and A, but might be partly due to
super-piece pair bonuses similar to that for the Bishop pair. Which I would
then (mistakenly) include in the base value, as the other super-pieces are
always present in my test positions.

Another possible source of error is that the engine plays a strategy that
is not well suited for playing 2R vs Q. Joker80's evaluation does not
place a lot of importance on keeping all its pieces defended. In general
this might be a winning strategy, giving the engine more freedom in using
its pieces in daring attacks. But 2R vs Q might be a case where this
backfires, and where you can only manifest the superiority of your Rook
force by very careful and meticulous, nearly allergic defense of your
troops, slowly but surely pushing them forward. This is not really the
style of Joker's play. So it would be interesting to do the asymmetric
playtesting for Q vs 2R also with other engines. But TJchess10x8 only
became available long after I started my piece value project; TSCP-G does
not allow setting up positions (although now I know a work-around for
that, forcing initial moves with both ArchBishops to capture all pieces to
delete, and then retreating them before letting the engine play). And Smirf
initially could not play automatically at all, and when I finally made a WB
adapter for it so that it could, fast games by it were more decided by
timing issues than by play quality (many losses on time with scores like
+12!). And Fairy-Max is really a bit too simplistic for this, not knowing
the concept of a Bishop pair or passed pawns, besides being a slower
searcher.

Derek Nalls wrote on Wed, May 21, 2008 03:02 PM EDT:
Muller:

Please have another look at this excerpt from my 'winboard.ini' file. 
There are standard and special versions of piece values by Muller,
Scharnagl & Nalls for the white and black players renormalized to pawn =
85 points.

The special version of the Muller model has a rook value exactly 85 points
or 1.00 pawn higher than the standard version.

The special version of the Scharnagl model has an archbishop value (736
points) at appr. 95% of the chancellor value (775 points), instead of 597
points at appr. 77% for the standard version.

The special version of the Nalls model is identical to the standard
version until some test is needed and planned.

Since I assume that the 'bishop pair bonus' is hardwired into Joker80,
40 points has been subtracted from the model-independent material values
of the bishop under all three models.  Is this correct?
_____________________________________________________

/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22
P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22
P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22
P85=260=269=474=597=775=816'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22
P85=260=269=474=736=775=816'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22
P85=262=279=505=799=815=876'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22
P85=262=279=505=799=815=876'
'C:\winboard-F\TJchess\TJChess10x8'
}
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22
P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22
P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22
P85=260=269=474=597=775=816'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22
P85=260=269=474=736=775=816'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22
P85=262=279=505=799=815=876'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22
P85=262=279=505=799=815=876'
'C:\winboard-F\TJchess\TJChess10x8'
}

H. G. Muller wrote on Wed, May 21, 2008 04:29 PM EDT:
Is there any special reason you want to keep the Pawn value equal in all
trial versions, rather than, say, the total value of the army, or the
value of the Queen? Especially in the Scharnagl settings it makes almost
every piece rather light compared to the quick guesses used for pruning.

Note that there are so many positional modifiers on the value of a pawn
(not only determined by its own position, but also by the relation to
other friendly and enemy pawns) that I am not sure what the base value
really means. Even if I say that it represents the value of a Pawn at g2,
the evaluation points lost on deleting a pawn on g2 will depend on if
there are pawns on e- and i-file, and how far they are advanced, and on
the presence of pawns on the f- and h-file (which mighht become backward
or isolated), and of course if losing the pawn would create a passer for
the opponent.

If I were you, I would normalize all models to Q=950, but then replace
the pawn value everywhere by 85 (I think the standard value used in Joker is
even 75). I don't think you could say then that you deviate from the
model, as the models do not really specify which type of Pawn they use as
a standard. My value refers to the g2 pawn in an opening setup. Perhaps
Reinhard's value refers to an 'average' pawn, in a typical pawn chain
occurring in the early middle game, or a Pawn on d4/e4 (which is the most
likely to be traded).

As to the B-pair: tricky question. The way you did it now would make the
first Bishop to be traded have the value the model prescribes, but would
make the second much lighter. If you would subtract half the bonus, then
on average they would be what the model prescribes. The value is
indeed hard-wired in Joker, but if you really want, I could make it
adjustable through an 8th parameter.

Derek Nalls wrote on Wed, May 21, 2008 08:13 PM EDT:
'If I were you, I would normalize all models to Q=950 but then replace
the pawn value everywhere by 85.'

Since this is what you (the developer of Joker80) recommend as optimum, 
this is what I will do.

Are you sure that replacing any pawn values different from 85 points
after renormalization to queen = 950 points still renders an accurate 
and complete representation, more or less, of the Scharnagl and Nalls 
models?

At a par of queen = 950 points, the pawn values in the Nalls and
Scharnagl models are no longer exactly represented: they work out to
appr. 92.19 and 98.95 points, respectively, versus the 85 points of
the Muller model.

Through it all ... If a perfect representation is not quite possible, 
I can accept that without reservation.
__________________________________

'I don't think you could say then that you deviate from the
model as the models do not really specify which type of Pawn they use as
a standard.'

Correctly calculating pawn values at the start of the game (much less 
throughout the game) requires finesse, as it is indeed a complex issue.
In fact, its excessive complexity is the reason my 66-page paper on
material values of pieces is silent on calculating pawn values
in FRC & CRC.  Instead, someone needs to read an entire book from an 
outside source about calculating the material values of the pieces in 
Chess to sufficiently understand it.

Personally, I am content with the test situation as long as Joker80 
handles all pawns under all three models initially valued at 85 points
as fairly and equally as is realistically possible.

I cannot speak for Reinhard Scharnagl at all, though.
________________________________________________

'The way you did it now would make the first Bishop to be traded 
have the value the model prescribes, but would make the second much 
lighter.  If you would subtract half the bonus, then on average they 
would be what the model prescribes.'

Now, I understand better.
It makes sense.
[I am glad I asked you.]

Yes, I will subtract 20 points (1/2 of the 'bishop pair bonus') from the
model-independent material values for the bishop under the 
Scharnagl & Nalls models.

Derek Nalls wrote on Wed, May 21, 2008 08:33 PM EDT:
Muller:

Here is my latest revision to my 'winboard.ini' file.
Are these piece values acceptable to you?
Do you think these piece values will work smoothly with Joker80 running
under Winboard F yet remain true to all three models?
______________________________________________________

/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22
P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22
P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22
P85=302=339=551=694=902=950'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22
P85=302=339=551=857=902=950'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22
P85=284=326=548=866=884=950'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22
P85=284=326=548=866=884=950'
'C:\winboard-F\TJchess\TJChess10x8'
}
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22
P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22
P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22
P85=302=339=551=694=902=950'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22
P85=302=339=551=857=902=950'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22
P85=284=326=548=866=884=950'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22
P85=284=326=548=866=884=950'
'C:\winboard-F\TJchess\TJChess10x8'
}
