Comments/Ratings for a Single Item
'First off, it is quite interesting to instead of picking a magic number as the chance of a square being empty, calculate the value for everything between 32 pieces on the board and 3 pieces on the board. Currently I'm then just averaging all the numbers,' I've done that, too. The problem is, if the only reason you accept the results is because they are similar to the results given by the magic number, then the results have no special validity, they mean nothing more than the magic results. So why add the extra computational burden? If, on the other hand, you had a sound and convincing theory of why averaging the results was correct, that would be a different story. 'This concept seems to be directly related to distance.' Actually, I think I'd call it 'speed'. I'm pretty sure that I've played with those numbers but gave up because I couldn't figure out what to do with them. Maybe you can; I encourage you to try.
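For what it's worth, the quoted idea is easy to try on paper. Here is a back-of-the-envelope Python sketch (entirely my own construction, not anyone's engine code) that computes a rook's expected mobility for every piece count from 32 down to 3, using the simple assumption that each non-mover square is empty with probability 1 - (n-1)/63, and then takes the flat average:

```python
# My own sketch: expected Rook mobility on 8x8, computed for every piece
# count from 32 down to 3 and then averaged, instead of picking one
# "magic" emptiness probability.
BOARD = 8

def expected_rook_mobility(p_empty):
    """Board-averaged expected Rook moves. A move to distance d along a
    ray needs the d-1 intervening squares empty; the target square itself
    may be occupied (that would simply be a capture)."""
    total = 0.0
    for x in range(BOARD):
        for y in range(BOARD):
            for ray in (x, BOARD - 1 - x, y, BOARD - 1 - y):  # ray lengths
                total += sum(p_empty ** (d - 1) for d in range(1, ray + 1))
    return total / (BOARD * BOARD)

def p_empty(n_pieces):
    # n-1 other pieces scattered evenly over the 63 non-mover squares
    # (an assumption, not an established convention)
    return 1 - (n_pieces - 1) / 63

fills = [expected_rook_mobility(p_empty(n)) for n in range(3, 33)]
print(round(expected_rook_mobility(p_empty(32)), 3))  # one "magic number" fill
print(round(sum(fills) / len(fills), 3))              # flat average over all fills
```

Whether the flat average over fills is the right weighting is exactly the open question raised above; a weighting by how long a game actually spends at each piece count would be one candidate theory.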
Hear, hear, Joe Joyce. I guess I will throw my first two cents in on the question of the piece-value quanta. I think the smallest meaningful difference on an 8x8 board is about a third of a pawn, or about a tenth of a knight. The larger the board, the larger the quanta, I believe. Maybe by 12x16, the quanta may be as large as a pawn, or more. The problem, as alluded to before in the other thread, is how to test such things empirically.
Reinhard, this is the place for the discussion of piece values here at the cv.org site. It was started quite a while ago, but has almost no entries. I guess the discussion from a while back on the cvwiki would also be relevant. George, thank you! That thread was started by Mike Nelson on 3/21/04, about 12,500 comments ago. It's worth reading. Jianying Ji, my 'argument' is with your comment below Aberg's: '2008-04-18 Jianying Ji Verified as Jianying Ji None Theoretical considerations ... must tempered by empirical experimentation. Below is my theoretical analysis of C vs A situation. First let's take the following values: R: 4.5 B: 3 N: 3 Now the bishop is a slider so should have greater value then knight, but it is color bound so it gets a penalty by decreasing its value by a third, which reduce it to that of the knight. When Bishop is combined with Knight, the piece is no longer color bound so the bishop component gets back to its full strength (4.5), which is rookish. As a result Archbishop and Chancellor become similar in value.' *** *** I would argue that your conclusion on the values would be correct on an infinite board, where the values of the bishop, rook, and queen have all converged to infinity. [see cvwiki discussion] On an 8x8 board, the unhindered rook moves 14, and the bishop between 7 and 13. This must act to push the value back down. So, what counterbalances it? The RN gets 16-22 moves on an 8x8 board, and 18-24 on a 10x8. The BN gets 9 in the corner on either size board, going to a maximum of 21. Can the 4 'forward' attacks of the BN vs the RN's 3, and its ability to checkmate alone, really overcome the noticeable mobility disadvantage?
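The mobility figures quoted here are straightforward to verify by counting. A small Python enumeration, assuming nothing beyond an empty 8x8 board:

```python
# Empty-board move counts on 8x8 for Rook, Bishop, Knight, and the
# compound pieces RN and BN, by direct enumeration.
SIZE = 8
STEPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def rook_moves(x, y):
    return (SIZE - 1) * 2  # full file plus full rank, from any square

def bishop_moves(x, y):
    # lengths of the four diagonal rays from (x, y)
    return (min(x, y) + min(x, SIZE - 1 - y) +
            min(SIZE - 1 - x, y) + min(SIZE - 1 - x, SIZE - 1 - y))

def knight_moves(x, y):
    return sum(1 for dx, dy in STEPS
               if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE)

squares = [(x, y) for x in range(SIZE) for y in range(SIZE)]
bishops = [bishop_moves(x, y) for x, y in squares]
rns = [rook_moves(x, y) + knight_moves(x, y) for x, y in squares]
bns = [bishop_moves(x, y) + knight_moves(x, y) for x, y in squares]
print('rook:', rook_moves(0, 0))              # 14 everywhere
print('bishop:', min(bishops), max(bishops))  # 7 to 13
print('RN:', min(rns), max(rns))              # 16 to 22
print('BN:', min(bns), max(bns))              # 9 to 21
```

Changing SIZE to 10 (and adjusting the rook formula for a 10x8 board) reproduces the 18-24 RN range mentioned above.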
Reinhard, I'm posting your values from the wiki for the Minister [NDW] and High Priestess [NAF]. [These values were calculated by the method he gives a link to in his last post.] Thank you for the numbers. Would you say that the values would remain the same or very similar on a 10x10 where the other pieces increased or decreased in power? Values for Minister and High Priestess by SMIRF's method Scharnagl 4 May 2007, 07:54 -0-400 As far as I understood those pieces are 'close' types. Thus by SMIRF's method their third value element is always zero because both first elements are equal. It results (please verify this) in 8x8 values: Minister 6+5/7, High Priestess 6+1/28, in 10x10 values: Minister 6+44/45, High Priestess 6+19/45. Thus a Minister seems to be about 1/2 Pawn unit more valued than a High Priestess. [http://chessvariants.wikidot.com/forum/t-8835/piece-comparisons-by-contest]
Piece    (S)  (m+M)  Double Average
Pawn     1.   ---    ------
Knight   3.   10     10.500
Bishop   3.   20     17.500
Rook     5.   28     28.000
Queen    9.   48     45.500
Guard    4.   11     13.125
The table above includes a 'Guard', moving like a nonroyal King. Joe Joyce is quite fond of it; even I have been known to use this piece. The (S) column gives one popular set of standard piece values. The (m+M) column is based on a simple pencil and paper calculation, adding the minimum number of possible moves for the given piece (from a corner square) to the MAXIMUM number of possible moves (from a central square). The Knight, for example, has 2 moves minimum and 8 moves MAXIMUM, giving a total of 10 moves. Other people, with more determination, have precisely calculated a grand total of 336 possible moves from all 64 squares on the board, giving an average value of 5.250 possible moves. Dividing 336 by 32 puts 10.500 in the 'Double Average' column, which is surprisingly close to the previous column. From time to time, I play around with piece values on a cubic playing field with 216 cells, content to use an (m+M) column as my source of raw numbers.
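The Knight figures are easy to check mechanically; this short Python count reproduces the 336 total, the 5.250 average, and the 10.500 'Double Average':

```python
# Counting every legal Knight move from each of the 64 squares.
STEPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
total = sum(1
            for x in range(8) for y in range(8)
            for dx, dy in STEPS
            if 0 <= x + dx < 8 and 0 <= y + dy < 8)
print(total)        # 336 possible moves in all
print(total / 64)   # 5.25, the average per square
print(total / 32)   # 10.5, the 'Double Average'
```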
What, if any, sense can we make of these numbers? The last two columns measure piece mobility on an empty board, so they indicate the general strength of each piece in the endgame - which I have found the (S) column well suited to. Note that N + B = R in the Double Average column. No great mystery here: the Knight has 60% of the mobility of the Bishop, while the Rook has 160%. Holding the Bishop at 3 points, this column suggests 4.8 points for the Rook, not an unreasonable choice - some writers assign as little as 4.5 points to the Rook. But nobody values the Knight at 1.8 points! To arrive at the 'standard' values, one must make arbitrary changes in the raw numbers, forcing them towards a desired conclusion. 'Knight-moves' need to be counted as more valuable than the moves made by other pieces, perhaps by a 5:3 ratio. The penalty I am inclined to give the Bishop for being colorbound (therefore limited to half the board) needs to be cancelled out by a matching bonus for the fact that every Bishop move either attacks or retreats. The Rook, with its boring sideways moves, usually attacks only a single enemy piece - also it will have only a single line of retreat after capturing that piece. I love Rooks, but am forced to admit that they are superior to Bishops only because they have many more possible moves, on average. The 3D Rook moves up and down along one axis and sideways along two different axes, making it even more 'boring' than the 2D Rook. I am presently re-thinking the entire subject of piece values for 3D chess.
Here is an idea I had one day: recently Joe Joyce and I have been using the Elephant piece, which can move like a Ferz or an Alfil. Let the Grand Rook move like a Rook or an Elephant and let the Chancellor move like a Rook or a Knight. These two pieces, each adding eight shortrange moves to the Rook, should be nearly identical in value on most boards. But I consider a Grand Rook to be worth around half a Pawn less than a Queen on the 8x8 board - contradicting several statements by Ralph Betza (gnohmon) that the Chancellor and Queen are equal in value. This procedure is an art, not a science, and is even more difficult when working with different boards and new pieces. See my Rose Chess XII for a collection of interesting pieces, inspired by the writings of Ralph Betza, plus some theory of their values on a 12x12 board.
Well, I recalculated the values for both piece types using my last published model (which probably is not perfect ;-) ): High Priestess: 8x8: 6+1/28; 10x8: 6+5/36; 10x10: 6+19/45 Minister: 8x8: 6+5/7; 10x8: 6+3/4; 10x10: 6+44/45 Let me admit that it now seems more instructive to me to scale piece values no longer to a Pawn normalised as 1, but instead to a Knight normalised to 3. This leaves the pieces' values relative to each other unchanged, but it seems to create more comparable value series. The High Priestess' strength is more vulnerable to a decreasing board size. The values of both types tend to become equal at an unlimited board size.
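Rescaling a value series to Knight = 3, as suggested, is a single multiplication and leaves every ratio intact. A trivial Python illustration (the input numbers are placeholders, not anyone's published values):

```python
# Rescaling a value series so the Knight reads 3; every ratio is preserved.
# These input numbers are illustrative placeholders only.
values = {'Pawn': 1.0, 'Knight': 3.1, 'Bishop': 3.3, 'Rook': 5.0, 'Queen': 9.4}
scale = 3.0 / values['Knight']
knight_scaled = {name: v * scale for name, v in values.items()}
print({name: round(v, 2) for name, v in knight_scaled.items()})
```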
Reinhard, I quite agree, the knight is a great piece to normalize values to. I often think the best way to evaluate pieces is to normalize with the knight at 10 points, which agrees with the chess quanta at a little less than a third of a pawn. Perhaps some new standard can be worked out this way.
These are Aberg's values: A Archbishop 6.8 C Chancellor 8.7 Q Queen 9.0 These are Reinhard's recent values: High Priestess: 8x8: 6+1/28; 10x8: 6+5/36; 10x10: 6+19/45 Minister: 8x8: 6+5/7; 10x8: 6+3/4; 10x10: 6+44/45 So, for 10x8: The high priestess comes in at 6.1 vs the archbishop's 6.8 - about a 10% difference. The minister comes in at 6.8 vs the chancellor's 8.7, a difference of over 25%. Why is the high priestess so close to the archbishop's value, compared to the minister being noticeably [over 25%] weaker than the chancellor? Why are the values of the high priestess and the minister so much closer together than those of the archbishop and chancellor? This falls in line with HG Muller's argument, though at the lower value, not the higher value. This should imply [at least] something about the 2 types of pieces, the shortrange leapers vs the infinite sliders, no? But what? I said I was better at asking than answering questions; these I find interesting. Now, it's way past my bedtime; good night, all. Pleasant dreams. ;-)
H.G.Muller has written here 'It is funny that a pair of the F+D, which is the (color-bound) conjugate of the King, is worth nearly a Knight (when paired), while a non-royal King is worth significantly less than a Knight (nearly half a Pawn less). But of course a Ferz is also worth more than a Wazir, zo maybe this is to be expected.'
Ralph Betza has written here 'Surprisingly enough, a Commoner (a piece that moves like a King but doesn't have to worry about check) is very weak in the opening, reasonably good in the middlegame, and wins outright against a Knight or Bishop in the endgame. (There are no Commoners in FIDE chess, but the value of the Commoner is some guide to the value of the King).'
Since ... A. The argumentative posts of Muller (mainly against Scharnagl & Aberg) in advocacy of his model for relative piece values in CRC are neverending. B. My absence from this melee has not spared my curious mind the agony of reading them at all. ... I hope I can help-out by returning briefly just to point-out the six most serious, directly-paradoxical and obvious problems with Muller's model. 1. The archbishop (102.94) is very nearly as valuable as the chancellor (105.88)- 97.22%. 2. The archbishop (102.94) is nearly as valuable as the queen (111.76)- 92.11%. 3. One archbishop (102.94) is nearly as valuable as two rooks (2 x 55.88)- 92.11%. In other words, one rook (55.88) is only a little more than half as valuable as one archbishop (102.94)- 54.28%. 4. Two rooks (2 x 55.88) have a value exactly equal to one queen (111.76). 5. One knight (35.29) plus one rook (55.88) are markedly less valuable than one archbishop (102.94)- 88.57%. 6. One bishop (45.88) plus one rook (55.88) are less valuable than one archbishop (102.94)- 98.85%. None of these problems exist within the reputable models by Nalls, Scharnagl, Kaufmann, Trice or Aberg. You must honestly address all of these important concerns or realistically expect to be ignored.
Gentlemen, this is a fascinating topic, and it has drawn the attention of a large audience [for chess variants, anyhow ;-) ], and I'd hope to see something concrete come out of it. Obviously, many of you gentlemen participating in the conversation have made each other's acquaintance before. And passions run high - I could say: 'but this is [only] chess', but I, too, have had the rare word here or there over chess, so I would be most hypocritical, besides subtly [snort! - 'only' is not subtle] putting down what we all love, and hate to hear others call useless. What I and any number of others are hoping for is an easy way to get values for the rookalo we just invented. Assuming hope is futile, we look for a reasonable way to get these values. Failing that, we just pray that there is any way at all to get them. So far, we don't have all that many probes into the middle ground, much less the wilds of variant piece design. We use roughly 3 methods to value pieces, I believe: the FIDE piece values, built up over centuries of experience and still not fully agreed upon; the software engines [and to a certain extent the hardware they run on], which rely on the same brute-force approach that the FIDE values are based on, but use algorithms instead of people to play the games; and the personal estimates of some experts in the field, who use various and multiple ways to determine values for unusual pieces. The theoretical calculations that go into each of these at some stage or other are of interest here. Why? Because the results are different. That the results are different is a good thing, because it causes questioning, and a re-examination of assumptions and methods of implementation. The questions you should be asking, and seriously trying to answer, are why the differences exist and what effects they have on the final outcomes.
Example: 2 software engines, A and B. A plays the archbishop-type piece better than the chancellor-type piece because there are unexpected couplings between the software and hardware that lead to that outcome, and B is the opposite. Farfetched? Well, it boils down to 3 elements: theory, implementation, execution. Or: what is the designer trying to do [and why?], what does the code actually say, and how does the computer actually run it? Instead of name-calling, determine where the roots of the difference lie [because I expect several differences]; they must lie in theory, implementation and/or execution. Why shouldn't humans and computers value pieces differently? They have different styles of play. Please, tone down the rhetoric, and give us some numbers and methods. Work together to see what is really going on. Or use each other's methods to see if results are duplicated. Numbers and methods, gentlemen, not names and mayhem. I have clipped some words or sentences from rare posts, when they clearly violated the site's policies. Please note that sticking to the topic, chess, is a site policy, and wandering off topic is discouraged. Play the High Priestess and Minister on SMIRF or one of the other 10x8 engines that exist, and see what values come up. Play the Falcon, the Scout, the Hawklet... and give us the numbers, please. If they don't match, show us why.
SMIRF is still not able to use non-conventional piece types other than the Chancellor (Centaur) or Archbishop (Archangel). You would have to use other fine programs for that. Nevertheless, the SMIRF value theory is able to calculate estimated piece exchange values.
Currently I am about to learn the basics of how to write a more mature SMIRF and GUI for the Mac OS X operating system. It will need a serious amount of time, and I hope not to lose motivation on this. I still have some difficulties understanding some details of Cocoa programming using Xcode, because there are only a few good books on that topic here in German. We will see if this project will ever become ready.
A substantial revision and expansion has recently occurred. universal calculation of piece values http://www.symmetryperfect.com/shots/calc.pdf 66 pages Only three games have relative piece values calculated using this complex model: FRC, CRC and Hex Chess SS (my own invention). Furthermore, I confidently consider my figures somewhat reliable for only two of these games, FRC (including Chess) and Capablanca Random Chess, because much work has been done by many talented individuals (hopefully, including myself) as well as computers to isolate reliable material values. This dovetails into the reason that I do not take requests. I have absolutely no assurance that effort spent outside these two established testbeds is productive at all. If it is important to you to know the material values for the pieces within your favorite chess variant (according to this model), then you must calculate them yourself. Under the recent changes to this model, the material values for FRC pieces and Hex Chess SS pieces remained exactly the same. However, the material values for a few CRC pieces changed significantly: Capablanca Random Chess material values for pieces http://www.symmetryperfect.com/shots/values-capa.pdf
pawn        10.00
knight      30.77
bishop      37.56
rook        59.43
archbishop  93.95
chancellor  95.84
queen      103.05
Focused, intensive playtesting on my part has proven Muller correct in his radical, new contention that the accurate material value of the archbishop is extraordinarily, counter-intuitively high. I think I have successfully discovered a theoretical basis, which is now explained within my 66-page paper. All of the problems (that I am presently aware of) within my set of CRC material values have now been solved. Some problems remain within Muller's set. I leave it to him whether or not to maturely discuss them.
Interesting response by Derek Nalls. It does appear that the archbishop will be getting a hearing and reevaluation. This will certainly sharpen things and advance our knowledge of this piece. On piece values in general, I second Rich with the addition of Hans's comment, that piece values are for: 1) Balancing armies when playing different armies. 2) Giving odds to weaker players (this is more easily done with shogi-style variants; with chess-style variants the weaker player receives a slightly stronger army). 3) Cancelling out the first-player advantage by giving the second player a slight strengthening of maybe only one piece. As for Joe Joyce's Minister and High Priestess, my initial estimate was queenish, but that is an overestimate, and it is dependent on the range of the opponent's pieces. One interesting feature that may impact value is that the Minister is more color-changing than color-bound, while the Priestess is a balance of both. This balance between color-changing and color-bound might make a nice chess variant theme. Another general consideration for evaluating piece and army strength is approachability: how many opponent pieces, from how many squares, can attack a piece without reciprocal threat.
Another impact on values is the piece mix. Where there are many Pawns and short-range pieces, Carrera's Centaur and Champion have more value. Where those unoriginal BN and RN exist alongside the Unicorn (B+NN) or Rococo's Queen-distance pieces, like the Immobilizer, Advancer, Long Leaper, even the Swapper, BN and RN have inherently less value. Put an Amazon (Q+N) in there, with at least some Pawns for experimental similarity, and BN and RN fall in value. Then too, change the Pawn type and you change the values. Put stronger Rococo Cannon Pawns in any CV previously having regular F.I.D.E. or Berolina Pawns, and any piece value of 5.0 or more, relative to Pawns normalized to near 1.0, decreases -- on most board sizes. I wonder why Ralph Betza made only one Comment in this 6-year-old thread. Maybe he figured, why help out Computers too much? They had already ruined 500-year-old Mad Queen 64.
exchange... gentlemen, an interesting midpoint. I was going to note
that some of the Muller numbers are quite similar to others' numbers.
For example, the values of the minister and priestess fell between 6
and 7 by both HG and Reinhard's methods. Yet other numbers are quite
far apart, like the commoner values. This, of course, presents 2
problems, one to explain the differences, and the other to explain the
similarities. Derek, could you give us a verbal explanation of what you
did and found?
Reinhard, my apologies for some sloppy phraseology. You've posted your
theory for all to see. You have provided numbers both times we've
spoken on this. In fact, you have been kind enough to correct my
mistakes in using your theory as well as providing the 2 sets of numbers.
[I will have to find some time to upgrade the wiki on this. Excellent.]
Thank you; I could ask for very little more. [Heh, maybe a tutorial on
that 3rd factor; Graeme had to correct my mistakes too.] I wish you the
very best with your new endeavor.
Ji is right, the number of squares attacked may be a first
approximation, but the pattern of movement is a key modifier. I put
together a chart a while ago after discussing the concept of
approachability with David Paulowich. The numbers in the chart are
accurate; the notes following contain observations, ideas, statements
that may be less so. Fortunately, the numbers in themselves are rather
suggestive, one way to look at power and vulnerability. They present a
two-dimensional view of pieces, a sort of looking down from above view
in chart form.
http://chessvariants.wikidot.com/attack-fraction
The chart clearly could be expanded, should anyone be interested. [The
archbishop, chancellor, amazon should be added soon, for example; any
volunteers? :-) ] But can it be used for anything? Colorboundness, and
turns to get across board, both side to side and between opposite
corners, are factors that must have some effect. [Board size and
edge effect are 2 more, this time mutually interactive factors. How
much will they be explored? Working at constant board size sort of
moots that question.] What do your theories, gentlemen who are carrying
on or following this conversation, have to say about these things?
Please note this conversation is spread over 3 topics:
this Piece Values thread,
Aberg's Variant game comments
Grand Shatranj game comments
I believe spaces attacked are a subset of spaces a piece can move onto.
As far as playtesting goes ... Admittedly, my initial intention was just to amuse myself by disproving the consistency of Muller's unusually-high archbishop material value in relation to other piece values within his CRC set. If indeed his archbishop material value had been as fictitious as it was radical, then this would have been readily-achievable using any high-quality chess variant program such as SMIRF. No matter what test I threw at it, this never happened. Previously, I have only used 'symmetrical playtesting'. By this I mean that the material and positions of the pieces of both players have been identical relative to one another. This is effective when playing one entire set of CRC piece values against another entire set as, for example, Reinhard Scharnagl & I have done on numerous occasions. The player that consistently wins all deep-ply (long time per move) games, alternatively playing white and black, can be safely concluded to be the player using the better of the two sets of CRC piece values since this single variable has been effectively isolated. However, this playtesting method cannot isolate which individual pieces within the set carry the most or least accurate material values. In fact, I had no problem with Muller's set of CRC piece values as a whole. The order of the material values of all of the CRC pieces was-is correct. However, I had a large problem with his material value for the archbishop being nearly as high as for the chancellor. To pinpoint an unreasonably-high material value for only one piece within a CRC set required 'asymmetrical playtesting'. By this I mean that the material and positions of the pieces of both players had to be different in an appropriate manner to test the upper and lower limits of the material value for a certain piece (e.g., archbishop). 
This was achieved by removing select pieces from both players within the Embassy Chess setup so that BOTH players had a significant material advantage consistent with different models (i.e., Scharnagl set vs. Muller set). This was possible strictly because of the sharp contrast between the 'normal, average' and 'very high', respectively, material values for the archbishop assigned by Scharnagl and Muller. The fact that the SMIRF program implicitly uses the Scharnagl set to play both players is a control variable- not a problem- since it ensures equality in the playing strength with which both players are handled. The player using the Scharnagl set lost every game using SMIRF MS-173h-X ... regardless of time controls, white or black player choice and all variations in excluded pieces that I could devise. I thought it was remotely possible that an intransigent, positional advantage for the Muller set somehow happened to exist within the modified Embassy Chess setup that was larger than its material disadvantage. This type of catastrophe can be the curse of 'asymmetrical playtesting'. So, I experimented likewise using a few other CRC variants. Same result! The Scharnagl set lost every game. I seriously doubt that all CRC variants (or at least, the games I tested) are realistically likely to carry an intransigent, positional advantage for the Muller set. If this is true, then the Muller set is provably, ideally suited to CRC, notwithstanding- just for a different reason. Finally, I reconsidered my position and revised my model.
Well Derek, I did not understand exactly what you have done. But it seems to me that you exchanged or removed some different pieces from the Capablanca piece set according to SMIRF's average exchange values. Let me point to a repeatedly written detail: if a piece is captured, then not only is its average piece exchange value taken from the material balance, but also its positional influence from the final detail evaluation. Thus it is impossible to create 'balanced' different armies simply by manipulating their pure material balance to become nearly equal - their positional influences probably would not be balanced as they need to be. A basic design element of SMIRF's detail evaluation is that the positional value of a square dominated by a piece (of minimal exchange value) is related as 1/x to its exchange value. Thus replacing some bigger pieces by more pieces of smaller types, keeping their combined material balance, will tend to increase their related positional influences. You see that deriving conclusions from having different armies play each other is a very complicated story.
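The 1/x idea can be illustrated with a toy function (my own sketch, not SMIRF's actual code): the weight of one side's control over a square is set by the cheapest piece bearing on it, so cheap attackers dominate squares far more strongly than expensive ones:

```python
# Toy version of the 1/x weighting described above (a sketch, not SMIRF
# code): a side's control of a square is weighted by the reciprocal of the
# exchange value of its cheapest piece bearing on that square.
def square_control(attacker_values):
    """attacker_values: exchange values of one side's pieces attacking a square."""
    if not attacker_values:
        return 0.0
    return 1.0 / min(attacker_values)

print(square_control([1.0]))        # a Pawn guards it: weight 1.0
print(square_control([3.0, 9.0]))   # Knight and Queen: the Knight sets the weight
print(square_control([9.0]))        # only a Queen: much weaker control
```

Under this weighting, trading a few big pieces for several small ones of the same total material raises the sum of these square weights, which matches the remark above about positional influences increasing.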
For the reasons you describe (which I mostly agree with), I do not ever use 'asymmetrical playtesting' unless that method is unavoidable. However, you should know that I used many permutations of positions within my 'missing pieces' test games to try to average-out positions that may have pre-set a significant positional advantage for either player. Yes, the fact that SMIRF currently uses your (Scharnagl) material values with a 'normal, average' material value for the archbishop instead of a 'very high' material value (as well as the interrelated positional value given to the archbishop with SMIRF) means that both players will place greater effort than I think is appropriate into avoiding being forced into disadvantageous exchanges where they would trade their chancellor or queen for the archbishop of the opponent. Still, the order of your material values for CRC pieces agrees with the Muller model (although an archbishop-chancellor exchange is considered only slightly harmful to the chancellor player under his model). So, I think tests using SMIRF are meaningful even if I disagree substantially with the material value for one piece within your model (i.e., the archbishop). Due to apprehension over boring my audience with irrelevant details, I did not even mention within my previous post that I also invented a variety of 10 x 8 test games using the 10 x 8 editor available in SMIRF that were unrelated to CRC. For example, one game consisted of 1 king & 10 pawns per player with 9 archbishops for one player and 8 chancellors or queens for another player. Under the Muller model, the player with the 9 archbishops had a significant material advantage. Under the Scharnagl model, the player with the 8 chancellors or 8 queens had a significant material advantage. The player with the 9 archbishops won every game. For example, one game consisted of 1 king & 20 pawns per player with 9 archbishops for one player and 8 chancellors or queens for another player. 
Under the Muller model, the player with the 9 archbishops had a significant material advantage. Under the Scharnagl model, the player with the 8 chancellors or 8 queens had a significant material advantage. The player with the 9 archbishops won every game. For example, one game consisted of 1 king & 10 pawns per player with 18 archbishops for one player and 16 chancellors or queens for another player. Under the Muller model, the player with the 18 archbishops had a significant material advantage. Under the Scharnagl model, the player with the 16 chancellors or 16 queens had a significant material advantage. The player with the 18 archbishops won every game. I have seen it demonstrated many times how positionally resilient the archbishop is against the chancellor and/or the queen in virtually any game you can create using SMIRF with a 10 x 8 board and a CRC piece set. When Muller assures us that he is responsibly using statistical methods similar to those employed by Larry Kaufman, a widely-respected researcher of Chess piece values, I think we should take his word for it. Of course, I remain concerned about the reliability of his stats generated via fast time controls. However, it has now been proven to me that his method is at least sensitive enough to detect 'elephants' (i.e., large discrepancies in material values) such as exist between contrasting CRC models for the archbishop, even if it is not sensitive enough to detect 'mice' (i.e., small discrepancies in material values), so to speak.
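Whether fast-game statistics can see 'elephants' but miss 'mice' is itself quantifiable, at least roughly. Under the crude assumption that each game is an independent coin flip, the score margin resolvable at about two standard deviations shrinks only with the square root of the number of games (this is my own back-of-envelope arithmetic, not anyone's actual published procedure):

```python
import math

# Back-of-envelope: how large a score margin an n-game match can resolve
# at roughly two standard deviations, assuming each game is an independent
# coin flip (the worst case, p = 0.5).
def detectable_margin(n_games, z=2.0):
    sigma = 0.5 / math.sqrt(n_games)  # standard error of the mean score
    return z * sigma

for n in (100, 1000, 10000):
    print(n, round(detectable_margin(n), 4))
```

So a 100-game match resolves only differences of roughly a tenth of a point per game, which is why large value discrepancies show up quickly while small ones need thousands of games.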
To Derek Nalls and H.G.M.:
Nearly everyone - so I think - will agree that inside a CRC piece set the value of an Archbishop is greater than the sum of the values of Knight and Bishop, and even greater than two Knight values. Nevertheless, if you have the following different armies playing against each other:
[FEN 'nnnn1knnnn/pppppppppp/10/10/10/10/PPPPPPPPPP/A1A2K1A1A w - - 0 1']
then you will get a big surprise, because those 'weaker' Knights are going to win.
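The paper imbalance in this position can be tallied straight from the FEN board field. In this Python sketch the Knight and Archbishop numbers are illustrative stand-ins for a 'normal' and a 'very high' Archbishop estimate, not exact published figures:

```python
# Tallying the armies straight from the FEN board field. The piece values
# fed in below are illustrative stand-ins ("normal" vs "very high"
# Archbishop), not exact published model values.
fen_board = 'nnnn1knnnn/pppppppppp/10/10/10/10/PPPPPPPPPP/A1A2K1A1A'
counts = {}
for ch in fen_board:
    if ch.isalpha():
        counts[ch] = counts.get(ch, 0) + 1
print(counts)  # 8 black Knights (n) against 4 white Archbishops (A)

knight = 3.0
for archbishop in (7.0, 9.0):
    paper_balance = counts['A'] * archbishop - counts['n'] * knight
    print(archbishop, paper_balance)  # positive either way: Archbishops lead on paper
```

On paper the Archbishop side is ahead under either estimate, which is what makes the Knights' victory so instructive.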
There are a lot of new and unsolved problems when trying to calculate piece values inside of different armies, including the playability of a special piece type, e.g. regarding the chances to cover it with any other weaker one.
Yes, your test example yields a result totally inconsistent with everyone's models for CRC piece values. [I did not run any playtest games of it since I trust you completely.] Yes, your test example could cause someone who placed too much trust in it to draw the wrong conclusion about the material values of knights vs. archbishops. The reason your test example is unreliable (and we both agree it must be) is due to its 2:1 ratio of knights to archbishops. The game is a victory for the knights player simply because he/she can overrun the archbishops player and force materially-disadvantageous exchanges despite the fact that 4 archbishops indisputably have a material value significantly greater than 8 knights. In all three of my test examples from my previous post, the ratios of archbishops to chancellors and archbishops to queens were only 9:8. Note the sharp contrast. Although I agree that a 1:1 ratio is the ideal goal, it was impossible to achieve for the purposes of the tests. I do not believe a slight disparity (1 piece) in the total number of test pieces per player is enough to make the test results highly unreliable. [Yes, feel free to invalidate my test example with 18 archbishops vs. 16 chancellors and 18 archbishops vs. 16 queens since a 2 piece advantage existed.] Although surely imperfect and slightly unreliable, I think the test results achieved thru 'asymmetrical playtesting' or 'games with different armies' can be instructive as long as the test conditions are not pushed to the extreme. Your test example was extreme. Two out of three of my test examples were not extreme.
Derek, my example must be extreme. Only then might light fall on the obscure points. My current interpretation of that strange behavior: it is part of a piece's value that it is able to risk its own existence by entering attacked squares. But that implies that it could be covered by a minor piece. And covering is possible only if there is at least one enemy piece of equal or higher value to enable a tolerable exchange. In your examples and mine, that is definitely not the case. My conclusion is that the most valued pieces will decrease in value if no such potential acceptable exchange pieces exist. My suggestion is that the replacement value would be: ( big own piece value + big enemy piece value + 1 pawn unit ) / 2 This has to be applied to all those unbalanced big pieces. ( Just an idea of mine ... ) P.S.: after rethinking the question of the value of such handicapped big pieces (having no equal or bigger counterpart), I now propose: ( big own piece value + 2 * big enemy piece value ) / 3
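The two proposed corrections transcribe directly into code; here is a minimal sketch (the function names are mine) for the replacement value of a 'big' piece that has no equal-or-bigger enemy counterpart:

```python
# First proposal above: average the own big piece with the biggest enemy
# piece, plus one pawn unit.
def replace_value_v1(own_big, enemy_big, pawn_unit=1.0):
    return (own_big + enemy_big + pawn_unit) / 2

# The revised (P.S.) proposal: weight the biggest enemy piece twice.
def replace_value_v2(own_big, enemy_big):
    return (own_big + 2 * enemy_big) / 3

# e.g. an Archbishop (say ~7) whose biggest opposing piece is a Knight (~3):
print(replace_value_v1(7.0, 3.0))            # 5.5
print(round(replace_value_v2(7.0, 3.0), 3))  # 4.333
```

Both versions pull the stranded big piece's value down toward that of the biggest piece it could actually be traded for, the second more aggressively than the first.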
Feel free to invalidate my other two test examples I (reluctantly) mentioned as well. My reason is that having ranks nearly full of archbishops, chancellors or queens in test games does not even resemble a proper CRC variant setup with its variety and placement of pieces. Therefore, those test results cannot safely be concluded to have any bearing upon the material values of pieces in any CRC variant. Your reason is well-expressed.
The feasibility of using identical armies to calculate piece values: It has been a long time since our sets of CRC piece values have played one another (on my dual 2.4 GHz CPU server) using otherwise-identical versions of SMIRF. Obviously, the reason is that for a long time no large disparity existed within our material values for any one of the CRC pieces. Recently, that has changed in the case of the archbishop. I already have the standard version of SMIRF MS-174b-O, which uses Scharnagl CRC piece values. Would you be willing to compile a special version of SMIRF MS-174b-O for me which uses Nalls CRC piece values? Capablanca Random Chess material values of pieces http://www.symmetryperfect.com/shots/values-capa.pdf Back on safe ground using 'symmetrical playtesting', the results of who wins the test games should indicate who is using the better set of CRC piece values.
Derek, it is no longer that easy, because in SMIRF piece values are now implemented only in their static part; their mobility part is covered by the detail evaluation. The '-X' versions of SMIRF use a mixture of the two; the '-0' version is completely without mobility fractions. This is a minor detail of my new approaches. Nevertheless, if you would separate those components, compiles are possible.
I understand. I had wondered what the 'X' and 'O' designations for recent SMIRF versions meant. Do you still possess an older version of SMIRF (of satisfactory quality to you) that uses your current CRC material values? Since there is approximately a 2-1/2 pawn difference between our models' material values for the archbishop, I predict that my playtesting results would probably be worthwhile and decisive.
Joe Joyce and J.J. are referring to Minister ( Knight + Dabbabah + Wazir ) and Priestess ( Knight + Alfil + Ferz ). Ralph Betza's Chess with Different Armies has FAD ( Ferz + Alfil + Dabbabah ). That took a minute to recall and find. I am quite sure (N+D+W) and (N+A+F) are not new and appeared under different name(s) some time ago, and it would be less misleading to use the earlier names. They did not originate with uncreative A.B.Shatranj or other such recent inventions. When previous use(s) are found, I will post them, as we have done with some other ''re-inventions.'' These pieces are unappealing, all three, because they have an unnatural, foreshortened Rook or Bishop dimension in their triple-compounding. There is no compelling logic to them. They are pulled out of a hat from hundreds of possibilities. Why not use pieces going one, two, and three steps either Rook- or Bishop-wise? No reason. No CV set-up is improved by limiting pieces to up-to-two or -three steps radially. That is why Bishop and Rook themselves will always stand as perfection. Piece values inherently are, however, an interesting intellectual activity and topic. Not, in perspective, because of the utility of these particular mediocre choices, ''Minister,'' ''Priestess,'' FAD. (Another Comment may take up Amazon and the others as to their deficiencies.) Rather, because facility at computing values can then be applied to better piece-movement concepts, such as Rococo units; for that, these are worthwhile enough threads on Piece Values.
Well, Derek, I will use my own values for 8x8 if you have no new ones for Q, A, C ... I still have not published my current values (because they normally are not used inside SMIRF, and only the mobility parts have been modified). I will use the following in the requested compiles:
N,B,R,A,C,Q for 8x8: 3.0000, 3.4119, 5.1515, 6.7824, 8.7032, 9.0001
N,B,R,A,C,Q for 10x8: 3.0556, 3.6305, 5.5709, 7.0176, 9.1204, 9.6005
Derek, you will receive versions compiled using complete piece values.
Different armies in action: 4*Archbishop vs. 8*Knight Following game could be reviewed using the SMIRF donationware release from: http://www.chessbox.de/Compu/schachsmirf_e.html (but first replace the single quotes by double quotes before pasting) [Event 'SmirfGUI: Different Armies Games'] [Site 'MAC-PC-RS'] [Date '2008.05.02'] [Time '18:30:40'] [Round '60 min + 30 sec'] [White '1st Smirf MS-174c-0'] [Black '2nd Smirf MS-174c-0'] [Result '0-1'] [Annotator 'RS'] [SetUp '1'] [FEN 'nnnn1knnnn/pppppppppp/10/10/10/10/PPPPPPPPPP/A1A2K1A1A w - - 0 1'] 1. Aji3 Nd6 {(11.02) -1.791} 2. Aab3 Ne6 {(12.01=) -1.533} 3. c4 c5 {(12.01=) -0.992} 4. d4 cxd4 {(13.00) -0.684} 5. c5 Ne4 {(12.01) -0.535} 6. Ac2 d5 {(11.39) +0.189} 7. f3 N4xc5 {(11.01=) +0.465} 8. Ag3 Nac7 {(11.01) +0.900} 9. b4 Ncd7 {(11.01=) +1.475} 10. f4 g6 {(10.31) +1.750} 11. Ai5+ Ngh6 {(11.03+) +1.920} 12. g4 j6 {(12.01=) +2.225} 13. Aie1 Nig7 {(11.01=) +2.363} 14. Ac1d3 f6 {(10.20) +2.506} 15. a4 N8f7 {(11.01=) +2.707} 16. Kg1 a5 {(11.01) +2.803} 17. bxa5 Nc6 {(11.15) +2.910} 18. Ab3 Nji6 {(11.01) +2.570} 19. j4 f5 {(12.03=) +3.010} 20. gxf5 Ngxf5 {(11.01) +3.342} 21. a6 bxa6 {(11.01=) +3.998} 22. a5 Ne3 {(11.15) +4.156} 23. Aa4 Nb5 {(11.01=) +4.504} 24. Ab3 Nig7 {(11.03=) +5.244} 25. Aih4 Nf6 {(11.02) +5.324} 26. Aef2 Nfh5 {(10.19) +6.395} 27. Ah3 Nhxf4 {(11.01) +6.172} 28. Adxf4 Nxf4 {(14.01) +5.979} 29. Axf4 g5 {(12.14) +6.086} 30. Ahxg5 Nxg5 {(14.01=) +6.018} 31. Axg5 Kg8 {(14.11) +5.176} 32. Axe3 dxe3 {(16.01=) +5.117} 33. Axd5+ Kh8 {(16.01=) +5.117} 34. Axc6 Nhf5 {(14.18) +5.127} 35. Ab4 Nc7 {(15.00) +4.803} 36. Ad3 Ki8 {(15.00) +4.838} 37. Kh1 Nd6 {(14.01) +4.891} 38. j5 Ngf5 {(14.01=) +5.189} 39. Ac5 Ndb5 {(14.01) +5.248} 40. Ad3 Nbd4 {(14.01) +5.365} 41. Ae4 Ncb5 {(16.02) +5.631} 42. Ki1 e6 {(15.23) +5.932} 43. Ad3 h6 {(15.01) +5.250} 44. Ac4 h5 {(15.01=) +5.467} 45. i3 Kj7 {(15.12) +5.637} 46. Ad3 Nc3 {(15.09) +5.715} 47. Axa6 Ndxe2 {(15.00) +5.678} 48. Ad3 Ned4 {(14.01=) +6.117} 49. 
a6 Ncb5 {(14.01=) +6.602} 50. Kj1 e2 {(15.01=) +8.080} 51. Ae1 e5 {(15.01=) +11.59} 52. i4 e4 {(15.01=) +12.16} 53. ixh5 Nf3 {(14.02) +12.56} 54. Af2 e3 {(15.22) +14.61} 55. Ad3 e1=Q+ {(16.02) +16.00} 56. Axe1 Nxe1 {(17.01=) +23.09} 57. h6 ixh6 {(15.02=) +M~010} 58. h4 Nxh4 {(12.01=) +M~008} 59. a7 Nxa7 {(10.01=) +M~008} 60. Ki1 Neg2+ {(08.01=) +M~007} 61. Kh2 e2 {(06.01=) +M~006} 62. Kh3 e1=Q {(04.01=) +M~005} 63. Kg4 Qe4+ {(02.01=) +M~004} 64. Kh3 Qf3+ {(02.01=) +M~003} 65. Kh2 Qi3+ {(02.01=) +M~002} 66. Kh1 Qi1# {(02.00?) +M~001} 0-1 You will find that the handicap of being a big piece without any exchangeable counterpart dominates the character of the battle.
The arrays I have tested with SMIRF give White an advantage of 3.1296 in my model.
In your model (normalized to a Pawn = 1) the advantage is about 12.944 (more than a Queen's value).
P.S.: Why not have some test games between SMIRF playing Black with 9 Knights and your program playing White with 4 Archbishops, each side having 10 Pawns? In your value model it should be nearly impossible for Black to gain any victory at all.
P.P.S.: The proposed game is not suitable for blitz, because it is decided by deep positional effects. So I used 60 min / game + 30 sec / move as the time frame, which is important.
Sorry my original long post got lost. As this is not a position where you can expect piece values to work, and my computers are actually engaged in useful work, why don't YOU set it up?
Well, Harm, you know that I failed in using 10x8 WinBoard GUIs, so I discontinued trying that.
It seems to me that that is bad strategy. If you fail you should keep trying until you succeed. Only when you succeed you can stop trying...
You will find a (hopefully) up-to-date table of several piece value sets at: http://www.10x8.net/Compu/schachveri1_e.html
I have adequate confidence in my latest material values to ask you to publish them upon your web page (instead of my previous material values). CRC material values of pieces http://www.symmetryperfect.com/shots/values-capa.pdf They are, in principle, similar to Muller's set for every piece except that they run on a comparatively compressed scale. Even though I have not yet playtested them, I consider my tentative confidence rational (although admittedly premature and risky) because I trust Muller's methods of playtesting his own material values and I think my latest revisions to my model are conceptually valid.
http://www.10x8.net/Compu/schachansatz1_e.html
To summarize the state of affairs, we now seem to have sets of piece values for Capablanca Chess by:
Hans Aberg (1)
Larry Kaufman (1)
Reinhard Scharnagl (2)
H.G. Muller (3)
Derek Nalls (4)
1) Educated guessing based on known 8x8 piece values and assumptions on synergy values of compound pieces
2) Based on board-averaged piece mobilities
3) Obtained as best fit of computer-computer games with material imbalance
4) Based on mobilities and more complex arguments, fitted to experimental results ('playtesting')
I think we can safely dismiss method (1) as unreliable, as the (clearly stated) assumptions on which it is based were never tested in any way, and appear to be invalid. Methods (3) and (4) are now basically in agreement. Method (2) produces substantially different results for the Archbishop. One problem I see with method (2) is that plain averaging over the board does not seem to be the relevant thing to do, and is even inconsistent in places: suppose we apply it to a piece that has no moves when standing in a corner; the corner squares would suppress its mobility. If, on the other hand, the same piece were not allowed to move into the corner at all, the average would be taken over the part of the board it could access (as for the Bishop), and would come out higher than for the piece that could enter the corner but not leave it (provided there weren't too many moves stepping into the corner), while the latter piece is clearly upward compatible and thus must be worth more. The moral lesson is that a piece with very low mobility on certain squares does not lose as much value as the averaging suggests, because in practice you will avoid putting the piece there. The SMIRF theory does not take that into account at all. Focusing on mobility only also makes you overlook disastrous handicaps a certain combination of moves can have.
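For readers unfamiliar with method (2), board-averaged mobility can be illustrated in a few lines. This is only my own minimal sketch for a Knight on an empty 8x8 board, not SMIRF's actual formula:

```python
# Board-averaged mobility of a Knight on an empty 8x8 board.
# (Illustrative sketch only; SMIRF's real computation is more elaborate.)
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def average_mobility(jumps, files=8, ranks=8):
    """Average number of on-board target squares, taken over all squares."""
    total = 0
    for x in range(files):
        for y in range(ranks):
            total += sum(1 for dx, dy in jumps
                         if 0 <= x + dx < files and 0 <= y + dy < ranks)
    return total / (files * ranks)

print(average_mobility(KNIGHT_JUMPS))   # 5.25 on 8x8
```

Muller's corner argument amounts to the observation that the low-mobility squares in this average (corners contribute only 2 moves) drag the mean down more than they hurt the piece in practice.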
A piece that has two forward diagonal moves and one forward orthogonal move (fFfW in Betza notation) has exactly the same mobility as one with forward diagonal and backward orthogonal moves (fFbW). But the former is restricted to a small (and ever smaller) part of the board, while the latter can reach every point from every other point. My guess is that the latter piece would be worth much more than the former, although in general forward moves are worth more than backward moves. (So fWbF should be worth less than fFbW.) But I have not tested any of this yet. I am not sure how much of the agreement between (3) and (4) can be ascribed to the playtesting, and how much to the theoretical arguments: the playtesting methods and results are not extensively published and not open to verification, and it is not clear how well the theoretical arguments are able to PREdict piece values rather than POSTdict them. IMO it is not possible to make an all-encompassing theory with just 4 or 6 empirical piece values as input, as any elaborate theory will have many more than 6 adjustable parameters. So I think it is crucial to get accurate piece values for more different pieces. One keystone piece could be the Lion. It can make all leaps to targets in a 5x5 square centered on it (and is thus a compound of Ferz, Wazir, Alfil, Dabbabah and Knight). This piece seems to be 1.25 Pawn stronger than a Queen (1075 on my scale). This reveals a very interesting approximate law for piece values of short-range leapers with N moves: value = (30 + 5/8*N) * N For N=8 this produces 280, and indeed the pieces I tested fall in the range 265 (Commoner) to 300 (Knight), with FA (Modern Elephant), WD (Modern Dabbabah) and FD in between. For N=16 we get 640, and I found WDN (Minister) = 625, and FAN (High Priestess) and FAWD (Sliding General) 650. And for the Lion, with N=24, the formula predicts 1080.
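Muller's approximate law can be tabulated directly; this small sketch (function name is mine) reproduces the three predictions he quotes:

```python
# Approximate law for short-range leapers quoted above:
#   value = (30 + 5/8 * N) * N, with N the number of leap targets,
# on a scale where a Pawn is roughly 100 centipawns.
def leaper_value(n_moves: int) -> float:
    """Predicted value (centipawns) for a leaper with n_moves targets."""
    return (30 + 5 / 8 * n_moves) * n_moves

for name, n in [("8-target (Knight class)", 8),
                ("16-target (Minister class)", 16),
                ("24-target (Lion)", 24)]:
    print(f"{name}: {leaper_value(n):.0f}")
# prints 280, 640 and 1080 respectively
```

These match the measured ranges he reports: 265-300 for 8-target leapers, 625-650 for 16-target ones, and about 1075 for the Lion.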
My interpretation is that adding moves to a piece does not only add the value of the move itself (as described by the second factor, N), but also increases the value of all pre-existing moves, by allowing the piece to better manoeuvre in place for aiming them at the enemy. I would therefore expect that it is mainly the captures that contribute to the second factor, while the non-captures contribute to the first factor. The first refinement I want to make is to disable all Lion moves one at a time, as captures or as non-captures, to see how much each move contributes to the total strength. The simple counting (as expressed by the appearance of N in the formula) can then be replaced by a weighted counting, the weights expressing the relative importance of the moves. (So that forward captures might be given a much bigger weight than forward non-captures, or than backward captures along a similar jump.) This will require a lot of high-precision testing, though.
Oh Yes, I forgot about: [name removed] (5) 5) Based on safe checking I am not sure that safe checking is of any relevance. Most games are not won by checkmating the opponent King in an equal-material position, but by annihilating the opponent's forces. So mainly by threatening Pawns and other Pieces, not Kings. A problem is that safe checking seems to predict zero value for pieces like Ferz, Wazir and Commoner, while the latter is not that much weaker than the Knight. (And, averaged over all game stages, might even be stronger than a Knight.) This directly seems to falsify the method. [The above has been edited to remove a name and/or site reference. It is the policy of cv.org to avoid mention of that particular name and site to remove any threat of lawsuits. Sorry to have to do that, but we must protect ourselves. -D. Howe]
Before I try to think this argument over, remember that all (non-Pawn) pieces of the CRC piece set have non-oriented gaits. Thus this argument cannot change anything in the value discussion of the CRC piece set, especially concerning the value of an Archbishop.
Reinhard, why do you attach such importance to the 4A-9N position? I think that example is totally meaningless. If it proves anything, it is that you cannot get the value of 9 Knights by taking 9 times the Knight value. It proves _nothing_ about the Archbishop value. Chancellor and Queen would encounter exactly the same problems facing an army of 9 Knights. The problem is that there is a positional bonus for identical pieces defending each other. This is well known (e.g. connected Rooks). The trouble is that such pair interactions grow as the square of the number of pieces, and thus start to dominate the total evaluation if the number of identical pieces gets extremely high (as it never will in real games). Pieces like A, C and Q (or in particular the highest-valued pieces on the board) will not get such bonuses, as the bonus is associated with the safety of mutually defending each other, and with tactical security in case the piece is traded, because the recapture then replaces it by an identical one, preserving all the defensive moves it had. In the absence of equal or higher pieces, defending pieces is a useless exercise, as recapture will not offer compensation; if you are attacked, you will have to withdraw. So the mutual-defence bonus also depends on the piece makeup of the opponent, and is zero for Archbishops when the opponent has only Knights, and very high for Knights when the opponent has only Archbishops. If you want to playtest material imbalances, the positional value of the position has to be as equal as possible. The 4A-9N position violates that requirement to an extreme extent. It thus cannot tell us anything about piece values, just as deleting the white Queen and all 8 black Pawns cannot tell us anything about the value of Q vs P.
I fully agree with that, because my A vs. N example was not intended to calculate piece values. Instead, it should put light on some obscure details. The strange effect is not caused by the ability of the Knights to cover each other; that also holds for the Archbishops. It is caused by the absence of exchangeable counterparts for A of equal (or bigger) value.
My example should demonstrate the existence of new effects in games of different armies. And that implies that one should be careful when trying to calculate or verify piece values through series of matches between different armies. Effects like the one demonstrated in my N vs. A example should be discussed, eliminated, or, if unavoidable, integrated into a formula. I suggested reducing the values of such unbalanced big pieces somehow (I am not yet sure exactly how) in the equations you are using to find out piece values. Without such purification attempts, misinterpretations cannot be avoided.
Well, Reinhard, there could be many explanations for the 'surprising' strength of an all-Knight army, and we could speculate forever on it. But it would only mean anything if we could actually find ways to test it. I think the mutual defence is a real effect, and I expect an army of all different 8-target leapers to do significantly worse than an army of all Knights, even though all 8-target leapers are almost equally strong. But it would have to be tested. Defending each other is useless for Archbishops (in the absence of opponent Q, C or A), as defending an Archbishop in the face of Knight attacks is of zero use; so the fact that they can do it is not worth anything. Nevertheless, the Archbishops do not do as badly as you want to make us believe, and I think they would still have a fighting chance against 9 Knights. So perhaps I will run these tests (on the Battle-of-the-Goths port, so that everyone can watch) if I have nothing better to do. But currently I have more important and urgent things to do on my Chess PC. I have a great idea for a search enhancement in Joker, and would like to implement and test it before ICT8.
re: Muller's assessment of 5 methods of deriving material values for CRC pieces 'I am not sure how much of the agreement between (3) and (4) can be ascribed to the playtesting, and how much to the theoretical arguments ...' As much playtesting as possible. Unfortunately, that amount is deficient by my standards (and yours). I have tried to compensate for marginal quantity with high quality via long time controls. You use a converse approach with the opposite emphasis. Given enough years (working with only one server), this quantity of well-played games may eventually become adequate. ' ... and it is not clear how well the theoretical arguments are able to PREdict piece values rather than POSTdict them.' You have pinpointed my greatest disappointment and frustration thus far with my ongoing work. To date, my theoretical model has not made any impressive predictions verified by playtesting. To the contrary, it has been revised, expanded and complicated many times upon discovery that it was grossly in error or out of conformity with reality. Although the foundations of the theoretical model are built upon arithmetic and geometry to the greatest extent possible, with verifiable phenomena important to the material values of pieces used logically for refinements, mathematical modelling can be misused to postulate and describe in detail almost any imaginable non-existent phenomenon. For example, the Ptolemaic model of the solar system.
Now you have got it. The main reason is the absence of counterparts of equal (or bigger) value; that is what makes any effective covering impossible. And this carries real weight within a (I confess, very extremely designed) game between different armies.
P.S.: covering an A even by a P is useless then ...
Well, I got that from the beginning. But the problem is not that the A cannot be defended; it is strong and mobile enough to take care of itself. The problem is that the Knights cannot be threatened (by A), because they all defend each other, and can do so multiple times. So you can build a cluster of Knights that is totally unassailable. That would be much more difficult for a collection of all different pieces, which is likely to always have some weak spots, which the extremely agile Archbishops then seek out and attack with deadly precision. But I don't see this as a fundamental problem of pitting different armies against each other. After an unequal trade, any Chess game becomes a game between different armies. But to define piece values that can be helpful for winning games, it is only important to test positions that could occur in games, or at least are not fundamentally different in character from what you might encounter in games, and the 4A-9N position definitely does not qualify as such. I think this is valid criticism of what Derek has done (testing super-pieces only against each other, without any lighter pieces being present), but it has no bearing on what I have done. I never went further than playing each side with two copies of the same super-piece, replacing another super-piece (which was then absent in that army). This is slightly unnatural, but I don't expect it to lead to qualitatively different games, as the super-pieces are similar in value and mobility. And unlike super-pieces already share some moves, so like and unlike super-pieces can cooperate in very similar ways (e.g. forming batteries). It did not essentially change the distribution of piece values, as all lower pieces were present in normal numbers. I understand that Derek likes to magnify the effect by playing several copies of the piece under test, but perhaps using 8 or 9 is overdoing it.
To test a difference in piece value as large as 200cP, 3 copies should be more than enough: This can still be done in a reasonably realistic mix of pieces, e.g. replacing Q and C on one side by A, and on the other side by Q and A by C, so that you play 3C vs 3A, and then give additional Knight odds to the Chancellors. This would predict about +3 for the Chancellors with the SMIRF piece values, and -2.25 according to my values. Both imbalances are large enough to cause 80-90% win percentages, so that just a few games should make it obvious which value is very wrong.
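Muller's claim that "just a few games should make it obvious which value is very wrong" can be sanity-checked with a binomial likelihood ratio. The 80-90% win rates map roughly onto the two competing predictions; the 85%/15% probabilities and the 8-out-of-10 sample below are illustrative assumptions of mine, not measured data:

```python
from math import comb

# Compare how well two hypothetical predictions explain a short match.
# Hypothesis A: the Chancellor side scores ~85% (SMIRF-like values);
# Hypothesis B: it scores ~15% (Muller-like values). Both illustrative.
def likelihood(p: float, wins: int, n: int) -> float:
    """Binomial probability of observing 'wins' wins in 'n' games."""
    return comb(n, wins) * p**wins * (1 - p)**(n - wins)

n, wins = 10, 8                     # e.g. the Chancellors score 8 out of 10
lr = likelihood(0.85, wins, n) / likelihood(0.15, wins, n)
print(f"likelihood ratio A:B = {lr:.0f}")
```

Even a 10-game sample yields an enormous likelihood ratio between the two hypotheses, which is why such a large predicted imbalance needs only a handful of games to resolve.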
Derek Nalls:
| Given enough years (working with only one server), this quantity of
| well-played games may eventually become adequate.
I never found any effect of the time control on the scores I measure for some material imbalance. Within statistical error, the combinations I tried produced the same score at 40/15', 40/20', 40/30', 40/40', 40/1', 40/2', 40/5'. Going to even longer TC is very expensive, and I did not consider it worth doing just to prove that it was a waste of time... The way I see it, piece values are a quantitative measure of the amount of control that a piece contributes to steering the game tree in the direction of the desired evaluation. He who has more control can systematically force the PV in the direction of a better and better evaluation (for him). This is a strictly local property of the tree. The only advantage of deeper searches is that you average out this control (which fluctuates highly on a ply-by-ply basis) over more plies. But in playing the game, you average over all plies anyway.
And thus I am convinced that I have to include this aspect in the detail evaluation function of SMIRF's successor.
... This can still be done in a reasonably realistic mix of pieces, e.g. replacing Q and C on one side by A, and on the other side by Q and A by C, so that you play 3C vs 3A, and then give additional Knight odds to the Chancellors. ...
And by that, this would create just the problem I have tried to demonstrate. The three Chancellors could not possibly be covered, thus disabling their potential to risk their own existence by entering squares already influenced by the opponent's side.
Hard to see. You will wait for White to lose because of insufficient material, and I will await a loss by White because of the lonely-big-pieces disadvantage. The task then will be to find out the true reasons for that.
I will try to create two arrays where each side thinks it has the advantage.
| And by that this would create just the problem I have tried to
| demonstrate. The three Chancellors could impossibly be covered,
| thus disabling their potential to risk their own existence by
| entering squares already influenced by the opponent's side.
You make it sound like it is a disadvantage to have a stronger piece, because it cannot go on squares attacked by the weaker piece. To a certain extent this is true, if the difference in capabilities is not very large. Then you might be better off ignoring the difference in some cases, as respecting it would actually deteriorate the value of the stronger piece to the point where it was weaker than the weak piece. (For this reason I set the B and N values in my 1980 Chess program Usurpator to exactly the same value.) But if the difference between the pieces is large, then the fact that the stronger one can be interdicted by the weaker one is simply an integral part of its piece value. And IMO this is not the reason the 4A-9N example is so biased. The problem there is that the pieces of one side are all worth more than TWICE those of the other. Rooks against Knights would not have the same problem, as they could still engage in R vs 2N trades, capturing a singly defended Knight in a normal exchange on a single square. But 3-for-1 trades are almost impossible to enforce, and require very special tactics. It is easy enough to verify by playtesting that playing CCC vs AAA (as substitutes for the normal super-pieces) will simply produce 3 times the score excess of playing a normal setup with a C deleted on one side and an A on the other. The A side will still have only a single A to harass every C. Most squares in enemy territory will be covered by R, B, N or P anyway, in addition to A, so the C could not go there anyway. And it is not true that anything defended by A would be immune to capture by C, as A + anything > C (and even 2A + anything > 2C).
So defending by A will not exempt the opponent from defending as many times as there are attacks, using A as defenders. And if there was one other piece amongst the defenders, the C had no chance anyway. The effect you point out does not occur nearly as easily as you think. And, as you can see, only 5 of my different armies had duplicated super-pieces. All the other armies were just what you would get if you traded the mentioned pieces, thus detecting whether such a trade would enhance or deteriorate your winning chances.
Reinhard, if I understand you correctly, what you basically want to introduce in the evaluation is terms of the type w_ij*N_i*N_j, where N_i is the number of pieces of type i of one side, N_j is the number of pieces of type j of the opponent, and w_ij is a tunable weight. So if type i = A and type j = N, a negative w_ij would describe a reduction of the value of each Archbishop by the presence of the enemy Knights, through the interdiction effect. Such a term would for instance provide an incentive for the QA side to trade A in a QA vs. ABNN situation, as his A is suppressed in value by the presence of the enemy N (and B), while the opponent's A is not similarly suppressed by our Q. On the contrary, our Q value is suppressed by the opponent's A as well, so trading A also benefits him there. I guess it should be easy enough to measure whether terms of this form have significant values, by playing Q vs. BNN imbalances in the presence of 0, 1 and 2 Archbishops, and deducing from the score whose Archbishops are worth more (i.e. add more winning probability). And similarly for 0, 1, 2 Chancellors each, or extra Queens. And then the same thing with a Q vs. RR imbalance, to measure the effect of Rooks on the value of A, C or Q. In fact, every second-order term can be measured this way: not only cross products between own and enemy pieces, but also cooperative effects between own pieces of equal or different type. With 7 piece types for each side (14 in total) there would be 14*13/2 = 91 terms of this type possible.
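The term structure described above can be sketched in a few lines. This is only my own illustration (the piece-type labels and the example weight are assumptions, not fitted values); it also confirms the 91-term count:

```python
from itertools import combinations

# Second-order material terms: one tunable weight w_ij per unordered
# pair of piece types, with own and enemy types treated as distinct.
PIECE_TYPES = ["P", "N", "B", "R", "A", "C", "Q"]     # 7 types per side
types = [f"own_{t}" for t in PIECE_TYPES] + [f"enemy_{t}" for t in PIECE_TYPES]

def second_order_eval(counts: dict, weights: dict) -> float:
    """Sum of w_ij * N_i * N_j over all unordered pairs of piece types."""
    return sum(weights.get((i, j), 0.0) * counts.get(i, 0) * counts.get(j, 0)
               for i, j in combinations(types, 2))

print(len(list(combinations(types, 2))))   # 14*13/2 = 91 possible terms

# Example: a negative weight models enemy Knights suppressing our Archbishop.
w = {("own_A", "enemy_N"): -0.1}
print(second_order_eval({"own_A": 1, "enemy_N": 2}, w))
```

Each of the 91 weights could then be fitted from match scores in the way Muller proposes (varying one piece count at a time around a fixed imbalance).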
'I never found any effect of the time control on the scores I measure for some material imbalance. Within statistical error, the combinations I tried produced the same score at 40/15', 40/20', 40/30', 40/40', 40/1', 40/2', 40/5'. Going to even longer TC is very expensive, and I did not consider it worth doing just to prove that it was a waste of time...' _________ The additional time I normally give to playtesting games to improve the move quality is partially wasted because I can only control the time per move, instead of the number of plies completed, in most chess variant programs. This usually results in the time expiring while the program is working on an incomplete ply. It then prematurely spits out a move representative of an incomplete tour of the moves available within that ply, at a random fraction of that ply. Since there is always more than one move (often a few to several) under evaluation as possibly the best move [otherwise, the chosen move would have already been executed], any move on this 'list of top candidates' is equally likely to be executed. Here are two typical scenarios that should cover what usually happens: A. If the list of top candidates in an 11-ply search consists of 6 moves where the list of top candidates in a 10-ply search consists of 7 moves, then only 1 discovered-to-be-less-than-best move has been successfully excluded and cannot be executed. Of course, an 11-ply search completion may typically require an estimated 8-10 times as much time as the search completions for all previous plies (1-ply through 10-ply) added together. OR B. If the list of top candidates in an 11-ply search consists of 7 moves [moreover, the exact same 7 moves] as the preceding 10-ply search, then there is no benefit at all in expending 8-10 times as much time.
______________________________________________________________ The reason I endure this brutal waiting game is not for the purely masochistic experience, but because the additional time has a tangible chance (although no guarantee) of yielding a better move on every occasion. Throughout the numerous moves within a typical game, it can realistically be expected to yield better moves on dozens of occasions. We usually playtest for purposes at opposite extremes of the spectrum, yet I regard our efforts as complementary toward building a complete picture of the material values of pieces. You use 'asymmetrical playtesting' with unequal armies at fast time controls, and collect and analyze statistics ... to determine a range, with a margin of error, for individual material piece values. I remain amazed (although I believe you) that you actually obtain meaningful results via games that are played so quickly that the AI players do not have 'enough time to think', in games so complex that every computer (and person) needs time to think to play with minimal competence. Can you explain to me, in a way I can understand, how and why you are able to successfully obtain valuable results using this method? The quality of your results was utterly surprising to me. I apologize for totally doubting you when you introduced your results and mentioned how you obtained them. I use 'symmetrical playtesting' with identical armies at very slow time controls to obtain the best moves realistically possible from an evaluation function, thereby giving me a winner (that is by some margin more likely than not deserving) ... to determine which of two sets of material piece values is probably (yet not certainly) better. Nonetheless, as more games are likewise played ... if they present a clear pattern, the results become more likely to be reliable, decisive and indicative of the true state of affairs.
The chances of flipping a coin once and it landing 'heads' are equal to it landing 'tails'. However, the chances of flipping a coin 7 times and it landing 'heads' all 7 times in a row are 1/128. Now, replace the concepts 'heads' and 'tails' with 'victory' and 'defeat'. I presume you follow my point. The results of only a modest number of well-played games can definitely establish their significance beyond chance and to the satisfaction of reasonable probability for a rational human mind. [Most of us, including me, do not need any better than 95%-99% certainty to become convinced that there is a real correlation at work, even though that is far short of an absolute 100% mathematical proof.] In my experience, I have found that using anything less than 10 minutes per move will cause at least one instance within a game when an AI player makes a move that is obvious to me (and correctly assessed) as truly being a poor move. Whenever this occurs, it renders my playtesting results tainted and useless for my purposes. Sometimes this occurs during a game played at 30 minutes per move. However, this rarely occurs during a game played at 90 minutes per move. For my purposes, it is critically important above all other considerations that the winner of these time-consuming games be correctly determined 'most of the time', since 'all of the time' is impossible to assure. I must do everything within my power to get as far from 50% and as close to 100% reliability in correctly determining the winner as possible. Hence, I am compelled to play test games at nearly the longest survivable time per move to minimize the chances that any move played during a game will be an obviously poor move that could have changed the destiny of the game, thereby causing the player that should have won to become the loser instead. In fact, I feel as if I have no choice under the circumstances.
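The coin-flip figure above is easy to verify. A minimal sketch (the function name is mine, not from the discussion):

```python
from fractions import Fraction

def streak_probability(n: int) -> Fraction:
    """Probability that a fair coin shows the same pre-specified
    face (e.g. 'heads', or 'victory') on all n independent flips."""
    return Fraction(1, 2 ** n)

print(streak_probability(7))   # 1/128, the figure quoted above
```

Note that this bounds chance only under the assumption that the games are independent and each side is equally likely to win, which is exactly the point of contention later in the thread.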
Harm, I have in mind a simpler formula, because it seems easier to find an approximation than to weight a lot of parameters against a lot of other unhandled strange effects. So my lower-dimensional approach looks like: f(s := sum of the unbalanced big pieces' values, n := number of unbalanced big pieces, v := value of the opponent's biggest piece). I intend to calculate the presumed value reduction e.g. as: (s - v*n)/constant P.S.: maybe it will make sense to limit v from below by s/(2*n) to prevent too big a reduction, e.g. when no big opponent piece is present at all. P.P.S.: I have had some more thoughts on this question. Let w := sum of the n biggest opponent pieces, limited from below by s/2. Then the formula should be: (s - w)/constant P.P.P.S.: My experiments suggest that the constant is about 2.0 P^4.S.: I have implemented this 'Elephantiasis reduction' (as I will name it) in a new private SMIRF version and it is working well. My constant is currently 8/5. I found that it is good to calculate one piece more than would go without value compensation, because that bottom piece pair could be of switched size and would thus reduce the reduction. Non-existing opponent pieces are replaced by a Knight's value within the calculation. I noticed a speed-up of SMIRF when searching for mating combinations (in normal play). I also noticed that SMIRF is making sacrifices when they make penalties of the introduced kind vanish.
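The final form of the formula can be sketched as follows. The constant 8/5 and the Knight filler value come from the post above; the function name, the piece-list representation, and the omission of the extra bottom-pair refinement are my own simplifications:

```python
def elephantiasis_reduction(own_big, opp_values, knight=3.0, constant=8/5):
    """Sketch of the 'Elephantiasis reduction' penalty (s - w)/constant
    for the side with a surplus of big pieces.

    own_big    -- values of this side's unbalanced big pieces
    opp_values -- values of the opponent's pieces
    """
    s = sum(own_big)      # s := sum of unbalanced big pieces' values
    n = len(own_big)      # n := number of unbalanced big pieces
    if n == 0:
        return 0.0
    opp = sorted(opp_values, reverse=True)[:n]
    opp += [knight] * (n - len(opp))   # replace missing pieces by a Knight
    w = max(sum(opp), s / 2)           # down-limit w by s/2 to cap the reduction
    return (s - w) / constant
```

For example, with own_big=[9.0] (a lone Queen-sized surplus) and no big opponent pieces, w = max(3.0, 4.5) = 4.5, so the reduction is (9 - 4.5)/(8/5) = 2.8125.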
Before Scharnagl sent me three special versions of SMIRF MS-174c compiled with the CRC material values of Scharnagl, Muller & Nalls, I began playtesting something else that interested me using SMIRF MS-174b-O. I am concerned that the material value of the rook (especially compared to the queen) amongst CRC pieces in the Muller model is too low: rook 55.88 queen 111.76 This means that 2 rooks exactly equal 1 queen in material value. According to the Scharnagl model: rook 55.71 queen 91.20 This means that 2 rooks have a material value (111.42) 22.17% greater than 1 queen. According to the Nalls model: rook 59.43 queen 103.05 This means that 2 rooks have a material value (118.86) 15.34% greater than 1 queen. Essentially the Scharnagl & Nalls models are in agreement in predicting victories in a CRC game for the player missing 1 queen yet possessing 2 rooks. By contrast, the Muller model predicts draws (or appr. equal number of victories and defeats) in a CRC game for either player. I put this extraordinary claim to the test by playing 2 games at 10 minutes per move on an appropriately altered Embassy Chess setup with the missing-1-queen player and the missing-2-rooks player each having a turn at white and black. The missing-2-rooks player lost both games and was always behind. They were not even long games at 40-60 moves. Muller: I think you need to moderately raise the material value of your rook in CRC. It is out of its proper relation with the other material values within the set.
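The percentage comparisons above follow directly from the quoted values; a quick check (the rook and queen numbers are as given in the comment, the script is mine):

```python
# (rook, queen) values as quoted above for the three CRC models
models = {
    "Muller":    (55.88, 111.76),
    "Scharnagl": (55.71,  91.20),
    "Nalls":     (59.43, 103.05),
}

for name, (rook, queen) in models.items():
    surplus = (2 * rook - queen) / queen * 100   # 2R value relative to Q
    print(f"{name}: 2R exceed Q by {surplus:+.2f}%")
# Muller: +0.00%, Scharnagl: +22.17%, Nalls: +15.34%
```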
To Derek: I am aware that the empirical Rook value I get is suspiciously low. OTOH, it is an OPENING value, and Rooks only acquire their value late in the game. Furthermore, this is only the BASE VALUE of the Rook; most pieces have a value that depends on the position on the board where the piece actually is, or where you can quickly get it (in an opening situation, where the opponent is not yet able to interdict your moves, because his pieces are in inactive places as well). But Rooks only increase their value on open files, and initially no open files are to be seen. In a practical game, by the time you get to trade 2 Rooks for a Queen, there usually are open files. So by that time, the value of the Q vs 2R trade will have gone up by two times the open-file bonus. You hardly have the possibility of trading them before there are open files. So it stands to reason that you might as well use the higher value during the entire game. In 8x8 Chess, the Larry Kaufman piece values include the rule that a Rook should be devalued by 1/8 Pawn for each Pawn on the board over five. In the case of 8 Pawns that is a really large penalty of 37.5 cP for having no open files. If I add that to my opening value, the late middle-game / end-game value of the Rook gets to 512, which sounds a lot more reasonable. There are two different issues here: 1) The winning chances of a Q vs 2R material-imbalance game 2) How to interpret that result as a piece value All I said above has no bearing on (1): if we both play a Q-2R match from the opening, it is a serious problem if we don't get the same result. But you have played only 2 games. Statistically, 2 games mean NOTHING. I don't even look at results before I have at least 100 games, because before that they are about as likely to be the reverse of what they will eventually be as not. 
The standard deviation of the result of a single Gothic Chess game is ~0.45 (it would be 0.5 point if no draws were possible, and in Gothic Chess the draw percentage is low). This error goes down as the square root of the number of games. In the case of 2 games this is 45%/sqrt(2) = 32%. The Pawn-odds advantage is only 12%. So this standard error corresponds to 2.66 Pawns. That is 1.33 Pawns per Rook. So with this test you could not possibly see if my value is off by 25, 50 or 75 cP. If you find a discrepancy, it is enormously more likely that the result of your 2-game match is off from the true win probability. Play 100 games, and the error in the observed score is reasonably certain (68% of the cases) to be below 4.5% ~1/3 Pawn, so 16 cP per Rook. Only then can you see with reasonable confidence whether your observations differ from mine.
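These error estimates can be reproduced in a few lines. The 0.45 per-game standard deviation and the 12% Pawn-odds figure are taken from the comment above; the function names are mine:

```python
import math

SIGMA_GAME = 0.45  # st.dev. of one game's score (draws keep it below 0.5)
PAWN_ODDS  = 0.12  # score advantage one extra Pawn is worth

def score_error(n_games: int) -> float:
    """Standard error of the observed score fraction after n games."""
    return SIGMA_GAME / math.sqrt(n_games)

def pawn_error(n_games: int) -> float:
    """The same standard error expressed in Pawn units."""
    return score_error(n_games) / PAWN_ODDS

print(f"  2 games: {score_error(2):.1%} of a point, ~{pawn_error(2):.2f} Pawns")
print(f"100 games: {score_error(100):.1%} of a point, ~{pawn_error(100):.2f} Pawns")
```

Up to rounding, this reproduces the ~32% / 2.66-Pawn figure for 2 games and the 4.5% / ~1/3-Pawn figure for 100 games.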
'You hardly have the possibility of trading it before there are open files. So it stands to reason that you might as well use the higher value during the entire game.' Well, I understand and accept your reasons for leaving your lower rook value in CRC as is. It is interesting that you thoroughly understand and accept the reasons of others for using a higher rook value in CRC as well. Ultimately, is not the higher rook value in CRC more practical and useful to the game by your own logic? _____________________________ '... if we both play a Q-2R match from the opening, it is a serious problem if we don't get the same result. But you have played only 2 games. Statistically, 2 games mean NOTHING.' I never falsely claimed or implied that only 2 games at 10 minutes per move mean everything or even mean a great deal (enough to satisfy probability overwhelmingly). However, they mean significantly more than nothing. I cannot accept your opinion, based upon a purely statistical viewpoint, since it comes at the exclusion of another applicable mathematical viewpoint. They definitely mean something ... although exactly how much is not easily known or quantified (measured) mathematically. __________________________________________________ 'I don't even look at results before I have at least 100 games, because before they are about as likely to be the reverse from what they will eventually be, as not.' Statistically, when dealing with speed chess games populated exclusively with virtually random moves ... YES, I can understand and agree with you requiring a minimum of 100 games. However, what you are doing is at the opposite extreme from what I am doing via my playtesting method. Surely you would agree that IF I conducted only 2 games with perfect play for both players, those results would mean EVERYTHING. Unfortunately, with state-of-the-art computer hardware and chess variant programs (such as SMIRF), this is currently impossible and will remain impossible for centuries-millennia. 
Nonetheless, games played at 100 minutes per move (for example) have a much greater probability of correctly determining which player has a definite, significant advantage than games played at 10 seconds per move (for example). Even though these 'deep games' play at nowhere near 600 times better quality than these 'shallow games', as one might naively expect (due to a non-linear correlation), they are far from random events (to which statistical methods would then be fully applicable). Instead, they occupy a middle ground between perfect-play games and totally random games. [In my studied opinion, the example 'middle-ground games' are more similar and closer to perfect-play games than to totally random games.] To date, much is unknown to combinatorial game theory about the nature of these 'middle-ground games'. Remember the analogy to coin flips that I gave you? Well, in fact, the playtest games I usually run go far above and beyond such random events in their probable significance per event. If the SMIRF program running at 90 minutes per move cast all of its moves randomly and without any intelligence at all (as a perfect woodpusher), only then would my 'coin flip' analogy be fully applicable. Therefore, when I estimate that it would require 6 games (for example) for me to determine, IF a player with a given set of piece values loses EVERY game, that there is only a 63/64 chance that the result is meaningful (instead of random bad luck), I am being conservative in the extreme. The true figure is almost surely higher than a 63/64 chance. By the way, if you doubt that SMIRF's level of play is intelligent and non-random, then play a CRC variant of your choice against it at 90 minutes per move. After you lose repeatedly, you may not be able to credit yourself with being intelligent either (although you should) ... if you insist upon holding to an impractically high standard in defining the word. 
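For what it is worth, the '63/64' figure above is a one-sided sign-test probability under the null hypothesis of two equally strong players and independent, decisive games. A sketch (my own framing, not from either poster) that also shows how quickly the evidence weakens once a single game goes the other way:

```python
from math import comb

def one_sided_p(wins: int, games: int) -> float:
    """P(at least `wins` wins in `games` decisive games) under the
    null hypothesis of equally strong players (a sign test)."""
    return sum(comb(games, k) for k in range(wins, games + 1)) / 2 ** games

print(one_sided_p(6, 6))   # 0.015625 = 1/64, the '63/64 chance' above
print(one_sided_p(5, 6))   # 0.109375: one loss already weakens the evidence
```

The arithmetic itself is uncontroversial; the dispute in this thread is over whether a clean sweep ever materializes (draws and mixed results yield no such bound) and whether games replayed from the same opening are really independent trials.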
______ 'If you find a discrepancy, it is enormously more likely that the result of your 2-game match is off from its true win probability.' For a 2-game match ... I agree. However, this may not be true for a 4-game, 6-game or 8-game match and surely is not true to the extremes you imagine. Everything is critically dependent upon the specifications of the match. The number of games played (of course), the playing strength or quality of the program used, the speed of the computer and the time or ply depth per move are the most important factors. _________________________________________________________ 'Play 100 games, and the error in the observed score is reasonably certain (68% of the cases) to be below 4.5% ~1/3 Pawn, so 16 cP per Rook. Only then can you see with reasonable confidence if your observations differ from mine.' It would require an estimated 20 years for me to generate 100 games with the quality (and time controls) I am accustomed to and somewhat satisfied with. Unfortunately, it is not that important to me just to get you to pay attention to the results for the benefit of only your piece values model. As a practical concern to you, everyone else who is working to refine quality piece values models in FRC and CRC will likely have surpassed your achievements by then IF you refuse to learn anything from the results of others who use different yet valid and meaningful methods of playtesting and mathematical analysis than yours.
Derek Nalls: | They definitely mean something ... although exactly how much is not | easily known or quantified (measured) mathematically. Of course that is easily quantified. The entire mathematical field of statistics is designed to precisely quantify such things, through confidence levels and uncertainty intervals. The only thing you proved with reasonable confidence (say 95%) is that two Rooks are not 1.66 Pawns weaker than a Queen. So if Q=950, then R > 392. Well, no one claimed anything different. What we want to see is if Q-RR scores 50% (R=475) or 62% (R=525). That difference just can't be seen with two games. Play 100. There is no shortcut. Even perfect play doesn't help. We do have perfect play for all 6-men positions. Can you derive piece values from that, even end-game piece values??? | Statistically, when dealing with speed chess games populated | exclusively with virtually random moves ... YES, I can understand and | agree with you requiring a minimum of 100 games. However, what you | are doing is at the opposite extreme from what I am doing via my | playtesting method. Where do you get this nonsense? This is approximately master-level play. Fact is that results from playing opening-type positions (with 35 pieces or more) are a stochastic quantity at any level of play we are likely to see in the next few million years. And even if they weren't, so that you could answer the question 'who wins' through a 35-men tablebase, you would still have to make some average over all positions (weighted by relevance) with a certain material composition to extract piece values. And if you would do that by sampling, the result would again be a stochastic quantity. And if you would do it by exhaustive enumeration, you would have no idea which weights to use. And if you are sampling a stochastic quantity, the error will be AT LEAST as large as the statistical error. Errors from other sources could add to that. 
But if you have two games, you will have at least 32% error in the result percentage. It doesn't matter if you play at an hour per move, a week per move, a year per move, 100 years per move. The error will remain >= 32%. So if you want to play 100 years per move, fine. But you will still need 100 games. | Nonetheless, games played at 100 minutes per move (for example) have | a much greater probability of correctly determining which player has | a definite, significant advantage than games played at 10 seconds per | move (for example). Why do I get the suspicion that you are just making up this nonsense? Can you show me even one example where you have shown that a certain material advantage would be more than 3-sigma different for games at 100 min/move than for games at 1 sec/move? Show us the games, then. Be aware that this would require at least 100 games at each time control. That seems to make it a safe guess that you did not do that for 100 min/move. On the other hand, instead of just making things up, I have actually done such tests, not with 100 games per TC, but with 432, and for the faster ones even with 1728 games per TC. And there was no difference, beyond the expected and unavoidable statistical fluctuations corresponding to those numbers of games, between playing 15 sec or 5 minutes. The advantage that a player has in terms of winning probability is the same at any TC I ever tried, and can thus be equally reliably determined with games of any duration (provided you have the same number of games). If you think it would be different for extremely long TC, show us statistically sound proof. I might comment on the rest of your long posting later, but have to go now...
'Of course, that is easily quantified. The entire mathematical field of statistics is designed to precisely quantify such things, through confidence levels and uncertainty intervals.' No, it is not easily quantified. Some things of numerical importance as well as geometric importance that we try to understand or prove in the study of chess variants are NOT covered or addressed by statistics. I wish our field of interest was that simple (relatively speaking) and approachable but it is far more complicated and interdisciplinary. All you talk about is statistics. Is this because statistics is all you know well? ___________ 'That difference just can't be seen with two games. Play 100. There is no shortcut.' I agree. Not with only 2 games. However ... With only 4 games, IF they were ALL victories or defeats for the player using a given piece values model, I could tell you with confidence that there is at least a 15/16 chance the given piece values model is stronger or weaker, respectively, than the piece values model used by its opponent. [Otherwise, the results are inconclusive and useless.] Furthermore, based upon the average number of moves per game required for victory or defeat compared to the established average number of moves in a long, close game, I could probably, correctly estimate whether one model was a little or a lot stronger or weaker, respectively, than the other model. Thus, I will not play 100 games because there is no pressing, rational need to reduce the 'chance of random good-bad luck' to the ridiculously-low value of 'the inverse of (base 2 to exponent 100)'. Is there anything about the odds associated with 'flipping a coin' that is beyond your ability to understand? This is a fundamental mathematical concept applicable without reservation to symmetrical playtesting. In any case, it is a legitimate 'shortcut' that I can and will use freely. ________________ 'Even perfect play doesn't help. We do have perfect play for all 6-men positions.' 
I meant perfect play throughout an entire game of a CRC variant involving 40 pieces initially. That is why I used the word 'impossible' with reference to state-of-the-art computer technology. _______________________________________________________ 'This is approximately master-level play.' Well, if you are getting master-level play from Joker80 with speed chess games, then I am surely getting a superior level of play from SMIRF with much longer times and deeper plies per move. You see, I used the term 'virtually random moves' appropriately in a comparative context based upon my experience. _____________________________________________ 'Doesn't matter if you play at an hour per move, a week per move, a year per move, 100 years per move. The error will remain >=32%. So if you want to play 100 years per move, fine. But you will still need 100 games.' Of course, it matters a lot. If the program is well-written, then the longer it runs per move, the more plies it completes per move and, consequently, the better the moves it makes. Hence, the entire game played will progressively approach the ideal of perfect play ... even though this goal is impossible to attain. Incisive, intelligent, resourceful moves must NOT be confused with or dismissed as purely random moves. Although I could humbly limit myself to applying only statistical methods, I am totally justified, in this case, in more aggressively using the 'probabilities associated with N coin flips ALL with the same result' as an incomplete, minimum value before even taking the playing strength of SMIRF at extremely long time controls into account to estimate a complete, maximum value. ______________________________________________________________ 'The advantage that a player has in terms of winning probability is the same at any TC I ever tried, and can thus equally reliably be determined with games of any duration.' 
You are obviously lacking completely in the prerequisite patience and determination to have EVER consistently used long enough time controls to see any benefit whatsoever in doing so. If you had ever done so, then you would realize (as everyone else who has done so realizes) that the quality of the moves improves and even if the winning probability has not changed much numerically in your experience, the figure you obtain is more reliable. [I cannot prove to you that this 'invisible' benefit exists statistically. Instead, it is an important concept that you need to understand in its own terms. This is essential to what most playtesters do, with the notable exception of you. If you want to understand what I do and why, then you must come to grips with this reality.]
CRC piece values tournament http://www.symmetryperfect.com/pass/ Just push the 'download now' button. Game #1 Scharnagl vs. Muller 10 minutes per move SMIRF MS-174c Result- inconclusive. Draw after 87 moves by black. Perpetual check declared.
This discussion is pointless. In dealing with a stochastic quantity, if your statistics are no good, your observations are no good, and any conclusions based on them utterly meaningless. Nothing of what you say here has any reality value; it is just your own fantasies. First you should have results; then it becomes possible to talk about what they mean. You have no result. Get statistically meaningful test results. If your method can't produce them, or you don't feel it important enough to make your method produce them, don't bother us with your cr*p instead. Two sets of piece values as different as day and knight, and the only thing you can come up with is that their comparison is 'inconclusive'. Are you sure that you could conclusively rule out that a Queen is worth 7, or a Rook 8, by your method of 'playtesting'? Talk about pathetic: even the two games you played are the same. Oh man, does your test setup s*ck! If you cannot even decide simple issues like this, what makes you think you have anything meaningful to say about piece values at all?
Once upon a time I had a friend in a country far, far away, who had obtained a coin from the bank. I was sure this coin was counterfeit, as it had a far larger probability of producing tails. I even PROVED it to him: I threw the coin twice, and both times tails came up. But do you think the fool believed me? No, he DIDN'T! He had the AUDACITY to claim there was nothing wrong with the coin, because he had tossed it a thousand times, and 523 times heads had come up! While it was clear to everyone that he was cheating: he threw the coin only 10 feet up into the air, on each try. While I brought my coin up to 30,000 feet in an airplane, before I threw it out of the window, BOTH times! And, mind you, both times it landed tails! And it was not just an ordinary plane, like a Boeing 747. No sir, it was a ROCKET plane! And still this foolish friend of mine insisted that his measly 10-foot throws made him more confident that the coin was OK than my IRONCLAD PROOF with the rocket plane. Ridiculous! Anyone knows that you can't test a coin by only tossing it 10 feet. If you do that, it might land on any side, rather than the side it always lands on. He might as well have flipped a coin! No wonder they sent him to this far, far away country: no one would want to live in the same country as such an idiot. He even went as far as to buy an ICECREAM for that coin, and even ENJOYED eating it! Scandalous! I can tell you, he ain't my friend anymore! Using coins that always land on one side as if they were real money. For more fairy tales and bed-time stories, read Derek's postings on piece values... :-) :-) :-)
Two suggestions for settling debates such as these: first, distributed computing to provide as much data as possible; and second, Bayesian statistical methods to provide statistical bounds on results.
Jianying Ji: | Two suggestions for settling debates such as these. First distributed | computing to provide as much data as possible. And Bayesian statistical | methods to provide statistical bounds on results. Agreed: one first needs to generate data. Without data, there isn't even a debate, and everything is just idle talk. What bounds would you expect from a two-game dataset? And what if these two games were actually the same? But the problem is that the proverbial fool can always ask more than anyone can answer. If, by recruiting all PCs in the world, we could generate 100,000 games at an hour per move, an hour per move will of course not be 'good enough'. It will at least have to be a week per move. Or, if that is possible, 100 years per move. And even 100 years per move are of course no good, because the computers will still not be able to search into the end-game, as they will search only 12 ply deeper than with 1 hour per move. So what's the point? Not only is this an end-of-the-rainbow-type endeavor; even if you would get there, and generate the perfect data, where it is 100% sure and proven for each position what the outcome under perfect play is, what then? Because for simple end-games we are already in a position to reach perfect play, through retrograde analysis (tablebases). So why not start there, to show that such data is of any use whatsoever, in this case for generating end-game piece values? If you have the EGTB for KQKAN, and KAKBN, how would you extract a piece value for A from it?
'This discussion is pointless.' On this one occasion, I agree with you. However, I cannot just let you get away with some of your most outrageous remarks to date. So, unfortunately, this discussion is not yet over. ____________________________________________ 'First you should have results, then it becomes possible to talk about what they mean. You have no result.' Of course, I have a result! The result is obviously the game itself as a win, loss or draw for the purposes of comparing the playing strengths of two players using different sets of CRC piece values. The result is NOT statistical in nature. Instead, the result is probabilistic in nature. I have thoroughly explained this purpose and method to you. I understand it. Reinhard Scharnagl understands it. You do not understand it. I can accept that. However, instead of admitting that you do not understand it, you claim there is nothing to understand. ______________________________________ 'Two sets of piece values as different as day and night, and the only thing you can come up with is that their comparison is 'inconclusive'.' Yes. Draws make it impossible to determine which of two sets of piece values is stronger or weaker. However, by increasing the time (and plies) per move, smaller differences in playing strength can sometimes be revealed with 'conclusive' results. I will attempt the next pair of Scharnagl vs. Muller and Muller vs. Scharnagl games at 30 minutes per move. Knowing how much you appreciate my efforts on your behalf motivates me. ___________________________________________________ 'Talk about pathetic: even the two games you played are the same.' Only one game was played. The logs you saw were produced by the Scharnagl (standard) version of SMIRF for the white player and the Muller (special) version of SMIRF for the black player. The game is handled in this manner to prevent time from being expired without computation occurring. ___________________________________________________ '... 
does your test setup s*ck!' What, now you hate Embassy Chess too? Take up this issue with Kevin Hill.
I really am completely lost, so I won't comment until I can see what the debate is about.
Understanding your example as an argument against Derek Nalls' testing method, I wonder why your chess engines always think for the full given timeframe. It would be much more impressive if your engine always decided immediately. ;-)
I am still convinced that longer thinking times would have an influence on the quality of the resulting moves.
Since I had to endure one of your long bedtime stories (to be sure), you are going to have to endure one of mine. Yet unlike yours [too incoherent to merit a reply], mine carries an important point: Consider it a test of your common sense- Here is a scenario ... 01. It is the year 2500 AD. 02. Androids exist. 03. Androids cannot tell lies. 04. Androids can cheat, though. 05. Androids are extremely intelligent in technical matters. 06. Your best friend is an android. 07. It tells you that it won the lottery. 08. You verify that it won the lottery. 09. It tells you that it purchased only one lottery ticket. 10. You verify that it purchased only one lottery ticket. 11. The chance of winning the lottery with only one ticket is 1 out of 100 million. 12. It tells you that it cheated to win the lottery by hacking into its computer system immediately after the winning numbers were announced, purchasing one winning ticket and back-dating the time of the purchase. ____________________________________________ You have only two choices as to what to believe happened- A. The android actually won the lottery by cheating. OR B. The android actually won the lottery by good luck. The android was mistaken in thinking it successfully cheated. ______________________________________________________ The chance of 'A' being true is 99,999,999 out of 100,000,000. The chance of 'B' being true is 1 out of 100,000,000. ________________________________________________ I would place my bet upon 'A' being true because I do not believe such unlikely coincidences will actually occur. You would place your bet upon 'B' being true because you do not believe such unlikely coincidences have any statistical significance whatsoever. 
_________________________________________ I make this assessment of your judgment ability fairly because you think it is a meaningless result if a player with one set of CRC piece values wins against its opponent 10 times in a row, even though the chance of it being 'random good luck' is indisputably only 1 out of 1024. By the way ... base 2 to exponent 100 equals 1,267,650,600,228,229,401,496,703,205,376. Can you see how ridiculous your demand of 100 games is?
Is this story meant to illustrate that you have no clue as to how to calculate statistical significance? Or perhaps that you don't know what it is at all? The observation of a single tails event rules out the null hypothesis that the lottery was fair (i.e. that the probability for this to happen was 0.000,000,01) with a confidence of 99.999,999%. Be careful, though: this only describes the case where the winning android was somehow special or singled out in advance. If the other participants in the lottery were 100 million other cheating androids, it would not be remarkable in any way that one of them won. The null hypothesis that the lottery was fair predicted a 100% probability for that. But, unfortunately for you, it doesn't work for lotteries with only 2 tickets. Then you can rule out the null hypothesis that the lottery was fair (and hence the probability 0.5) with a confidence of 50%. And 50% confidence means that in 50% of the cases your conclusion is correct, and in the other 50% of the cases not. In other words, a confidence level of 50% is a completely blind, uninformed random guess.
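The 523-heads-in-1000-tosses figure from the earlier fairy tale illustrates the same point. A standard normal-approximation binomial test (my own check, not part of the comment) finds that result entirely unremarkable:

```python
from math import erf, sqrt

def two_sided_p(successes: int, n: int, p: float = 0.5) -> float:
    """Two-sided p-value for a binomial test, normal approximation."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))      # binomial standard deviation
    z = abs(successes - mean) / sd  # how many sigmas from expectation
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

print(round(two_sided_p(523, 1000), 3))  # ~0.146: no evidence the coin is biased
```

So 523 heads out of 1000 is about 1.45 sigma from the fair-coin expectation of 500, nowhere near conventional significance thresholds, which is why the coin passes the friend's test.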
Reinhard Scharnagl: | I am still convinced that longer thinking times would have an | influence on the quality of the resulting moves. Yes, so what? Why do you think that is a relevant remark? The better moves won't help you at all if the opponent also makes better moves. The result will be the same. And the rare cases where it is not, on average, cancel each other out. So for the umpteenth time: NO ONE DENIES THAT LONGER THINKING TIME PRODUCES SOMEWHAT BETTER MOVES. THE ISSUE IS THAT IF BOTH SIDES PLAY WITH A LONGER TC, THEIR WINNING PROBABILITIES WON'T CHANGE. And don't bother to tell us that you are also convinced that the winning probabilities will change, without showing us proof. Because no one is interested in unfounded opinions, not even if they are yours.
'Is this story meant to illustrate that you have no clue as to how to calculate statistical significance?' No. This story is meant to illustrate that you have no clue as to how to calculate probabilistic significance ... and it worked perfectly. ________________________________________________________ There you go again. Missing the point entirely and ranting about probabilities not being proper statistics.
To H.G.M.: why do you have to be so unfriendly? But to give you a strong argument that longer thinking phases could change a game result: have a look at [site removed], where [a claim is made] that there would be a mate in 9. In fact, SMIRF was in a lost situation there. But watching a chess engine calculate on that position, you could see that an initial heavy disadvantage switches into a secure win. Having engines calculate with short time frames would probably lead to another result. Here, increasing thinking time indeed leads to a result switch. [The above has been edited to remove a name and site reference. It is the policy of cv.org to avoid mention of that particular name and site to remove any threat of lawsuits. Sorry to have to do that, but we must protect ourselves. -D. Howe]
Reinhard, that is not relevant. It will happen on average just as often for the other side. It is in the nature of Chess: every game that is won, is won by an error that might not have been made with longer thinking, as the initial position is not a won position for either side. But most games are won by one side or the other, and if the players are allowed to think longer, most games are still won by one side or the other. What is so hard to understand about the statement 'the win probability (score fraction, if you allow for draws) obtained from a given quiet, but complex (many pieces) position between equal opponents does not depend on time control' that it prompts people to come up with irrelevancies? Why do you think that saying anything at all that does not mention an observed probability would have any bearing on this statement whatsoever? I don't think the ever more hollow-sounding self-declared superiority of Derek needs much comment. He obviously knows zilch about probability theory and statistics. Shouting that he does won't make it so, and won't fool anyone.
This discussion is too silly for words anyway. Because even if it were true that the winning probability for a given material imbalance would be different at 1 hour per move than it would be at 10 sec/move, it would merely mean that piece values are different for different quality players. And although that is unprecedented, that revelation in itself would not make the piece values at 1 hour per move of any use, as that is a time control that no one wants to play anyway. So the whole endeavor is doomed from the start: by testing at 1 hour per move, either you measure the same piece values as you would at 10 sec/move, and wasted 99.7% of your time, or you find different values, and then you have wrong values, which cannot be used at any time control you would actually want to play...
Here is another approach I would suggest for the strength of pieces. How about we pick 100 pieces and have people order them from strongest to weakest? Work out a scoring system for position, and then at least get an idea of the order of strength. Anyone think this might be a sound approach?
Rich Hutnik: | Anyone think this might be a sound approach? Well, not me! Science is not a democracy. We don't interview people in the street to determine if a neutron is heavier than a proton, or what the 100th decimal of the number pi is. At best, you could use this method to determine the CV rating of the interviewed people. But even if a million people thought that piece A is worth more than piece B, and none the other way around, that would not make it so. The only thing that counts is whether A makes you win more often than B would. If it doesn't, then it is of lower value. No matter what people say, or how many say it.
To anyone who was interested ... My playtesting efforts using SMIRF have been suspended indefinitely due to a serious checkmate bug which tainted the first game at 30 minutes per move between Scharnagl's and Muller's sets of CRC piece values.
Since Muller's Joker80 has recently established itself via 'The Battle Of The (Unspeakables)' tournament as the best free CRC program in the world, I checked it out. I must report that setting up Winboard F (also written by Muller) to use it was straightforward, with helpful documentation. Generally, I am finding the features of Joker80 to be versatile and capable for any reasonable use.
Muller: I would like to conduct two focused playtests using Joker80 at very long time controls (e.g., 30 minutes per move) to investigate these important questions:

1. Is Muller's rook value within the CRC set too low?
2. Is Scharnagl's archbishop value within the CRC set too low?

I would need for you to compile special versions of Joker80 for me using significantly different values for those CRC pieces, as well as Scharnagl's CRC piece set. To isolate the target variable, these games would be Muller (standard values) vs. Muller (test values) and Scharnagl (standard values) vs. Scharnagl (test values) via symmetrical playtesting. Anyway, we can discuss the details if you are interested or willing. Please let me know.
Muller: Please investigate this potentially serious bug I may have discovered while testing Joker80 under Winboard F ... Bugs, Bugs, Bugs! http://www.symmetryperfect.com/pass I am having a hard time with software today.
'Human vs. engine play is virtually untested. Did you at any point of the game use 'undo' (through the WinBoard 'retract move')?' Yes. Many of us error-prone humans use it frequently. ________________________________________________ 'This is indeed something I should fix but the current work-around would be not to use 'undo'.' Makes sense to me. I can avoid using the 'retract move' command altogether. ________________________________________________________ 'I could make a Joker80 version that reads the piece base values from a file 'joker.ini' at startup. Then you could change them to anything you want to test, without the need to re-compile. Would that satisfy your needs?' Yes, better than I ever imagined. Thank you!
OK, I replaced the joker80.exe on my website by one with adjustable piece values. (If you run it from the command line, it should say version 1.1.14 (h).) I also tried to fix the bug in undo (which I discovered was disabled altogether in the previous version), and although it seemed to work, it might remain a weak spot. (I foresee problems if the game contained a promotion, for instance, as it might not remember the correct promotion piece on replay.) So try to avoid using the undo.

I decided to make the piece values adjustable through a command-line option, rather than from a file, to avoid problems if you want to run two different sets of piece values (where you would then have to keep the files separate somehow). The way it works now is that for the engine name (that WinBoard asks for in the startup dialog, or that you can put in the winboard.ini file to appear among the selectable engines there), you should write: joker80.exe P85=300=350=475=875=900=950 The whole thing should be put between double quotes, so that WinBoard knows the P... is an option to the engine, and not to WinBoard. The numerical values are those of P, N, B, R, A, C and Q, respectively, in centiPawn. You can replace them by any value you like. If you don't give the P argument, it uses the default values. If you give a P argument with not enough values, the engine exits.

Note that these are base values, for the positionally average piece. For N and B this would be on c3, in the presence (for B) of ~6 own Pawns, half of them on the color of the Bishop. A Bishop pair further gets a 40cP bonus. For the Rook it is the value for one in the absence of (half-)open files. The Pawn value will be heavily modified by positional effects (centralization, support by own Pawns, blocking by enemy Pawns), which on average will be positive.

Note that you can play two different versions against each other automatically. The first engine plays white, in two-machines mode.
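For illustration, the P-option string described above can be decoded into a piece-value table like this (a hypothetical sketch of the parsing, using the piece order P, N, B, R, A, C, Q in centipawns as stated; this is not Joker80's actual source code):

```python
PIECES = ("P", "N", "B", "R", "A", "C", "Q")  # order used by the option

def parse_p_option(arg):
    """Decode a string like 'P85=300=350=475=875=900=950'."""
    if not arg.startswith("P"):
        raise ValueError("not a P option")
    values = [int(v) for v in arg[1:].split("=")]
    if len(values) != len(PIECES):
        raise ValueError("wrong number of values")  # the real engine exits here
    return dict(zip(PIECES, values))

print(parse_p_option("P85=300=350=475=875=900=950"))
# {'P': 85, 'N': 300, 'B': 350, 'R': 475, 'A': 875, 'C': 900, 'Q': 950}
```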
(You won't be able to recognize them from their name...)
One small refinement: if the command-line argument was used to modify the piece values, Joker80 will give its own name to WinBoard as 'Joker80.xp', instead of 'Joker80.np', so that it becomes easier to figure out which engine was winning (e.g. from the PGN file). Note also that at very long time control you might want to enlarge the hash table; the default is 128MB, but if you invoke Joker80 as 'joker80.exe 22 P100=300=....' it will use 256MB (and with 23 instead of 22 it will use 512MB, etc.)
Everything is working fine. Thank you! I now have 12 instances of the Joker80 program running in various sub-directories of Winboard F with the 'winboard.ini' file set to conveniently initiate any desired standard or special material values for the CRC models by Muller, Scharnagl and Nalls.

In the first test, I am going to attempt to find a playtesting time at which a distinct separation in playing strength occurs between the standard Muller model, wherein the rook is 1 pawn more valuable than the bishop, and a special Muller model, wherein the rook is 2 pawns more valuable than the bishop. If I successfully find a playtesting time that is survivable by humans, then we can hopefully establish a tentative probability as to which CRC model plays decisively better after a few-to-several games. At par 100 (for the pawn), the bishop is at 459 under both models, with the rook at 559 under the standard Muller model and 659 under the special Muller model. I want to playtest a special Muller model with a rook value 2.00 pawns higher than the bishop because the Nalls model has a rook value 2.19 pawns higher than the bishop and the Scharnagl model has a rook value 1.94 pawns higher than the bishop (for an average of 2.06 pawns).

Since I am attempting to test for such a small difference in the material value of only one type of piece (the rook), I have doubts that I will be able to obtain conclusive results. In any case, if I obtain conclusive results, then very long time controls will surely be required to produce them.
Well, to give an impression of what you can expect: in my first versions of Joker80 I still used the Larry Kaufman piece values for 8x8 Chess. So the Bishop was half a Pawn too low, nearly equal to the Knight (as with more than 5 Pawns, Kaufman has a Knight worth more than a lone Bishop, neutralizing a large part of the pair bonus). Now unlike a Rook, a Bishop is very easy to trade for a Knight, as both get into play early. Making the trade usually wrecks the opponent's pawn structure by creating a doubled Pawn, giving enough compensation to make it attractive. So in almost all games Joker played with two Knights against two Bishops after 12 moves or so. Fixing that increased the playing strength by ~100 Elo points. So where the old version would score 50%, the improved version would score 57%.

Now a similarly bad value for the Rook would manifest itself much less readily: the Rooks get into play late, there is no nearly equal piece for which a 1:1 trade changes sign, and you would need 1:3 trades (R vs B+2P) or 2:2 trades (R+P for N+N), which are much more difficult to set up. So I would expect that being half a Pawn off on the Rook value would only reduce your score by about 3%, rather than 7% as with the Bishop. After playing 100 games, the measured score differs by more than 3% from the true win probability more often than not. So you would need at least 400 games to show with minimal confidence that there was a difference.

Beware that the results of the games are stochastic quantities. Replay a game at the same time control, and the game Joker80 plays will be different. And often the result will be different. This is true at 1 sec per move, but it is equally true at 1 year per move. The games that will be played are just a sample from the myriads of games Joker80 could play with non-zero probability.
And with fewer than 400 games, the difference between the actually measured score percentage and the probability you want to determine will in most cases be larger than the effect of the piece values, if they are not extremely wrong (e.g. setting Q < B).
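The sample sizes quoted here follow from the standard error of a measured score fraction. A sketch, assuming a per-game standard deviation of about 0.4 (a typical figure for chess results with draws; the figure is my assumption, not something measured in this thread):

```python
from math import sqrt

SIGMA = 0.4  # assumed per-game standard deviation of the score

def std_error(n_games, sigma=SIGMA):
    """Standard error of the measured score fraction after n_games."""
    return sigma / sqrt(n_games)

print(std_error(100))  # 0.04: a 3% effect is smaller than the noise
print(std_error(400))  # 0.02: a 3% effect is now 1.5 standard errors
```

The noise shrinks only with the square root of the number of games, which is why halving the uncertainty costs four times the games.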
Of course, I would bet anything that there are no 1:1 exchanges supported under the standard Muller CRC model that could cause material losses. If there were, yours would not be one of the three most credible CRC models under close consideration. In fact, even your excellent Joker80 program would play poorly if stuck with using faulty CRC piece values. Obviously, the longer the exchange, the rarer its occurrence during gameplay. Simple 1:1 exchanges predominate heavily over even the least complicated 1:2 or 2:1 exchanges in gameplay, although I do not know the stats.

In fact, there is a certain 1:2 or 2:1 exchange I am hoping to see that is likely to support my contention that the Muller rook value should be higher: the 1 queen for 2 rooks (or 2 rooks for 1 queen) exchange. Please recall that under the standard Muller model, this is an equal exchange. However, under asymmetrical playtesting comparable in quality and similar to that which I used to confirm the correctness of your higher archbishop value, I played numerous CRC games at various moderate time controls where the player without 1 queen (yet with 2 rooks) defeated the player without 2 rooks (yet with 1 queen). Ultimately, a key mechanism toward conclusive results is that while the standard Muller model is neutral toward a 2-rook : 1-queen exchange, the special Muller model regards its 1 queen as significantly less valuable than 2 rooks of its opponent. Consequently, this contrast in valuation could be played into ... and we would see who wins.

I am actually pleased that you are a realist who shares my pessimism about this experiment. In any case, low odds do not deter a best effort to succeed. The main difference between us is that you calculate your pessimism by extreme statistical methods whereas I calculate my pessimism by moderate probabilistic methods. I remain hopeful that eventually I will prove to you that the method Scharnagl & I developed is occasionally productive.
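The material arithmetic behind that hoped-for trade can be spelled out, using the par-100 base values quoted earlier in the thread (an illustrative sketch of the bookkeeping only):

```python
# Base values (centipawns, pawn = 100) from the two Muller models above.
standard = {"R": 559, "Q": 1118}  # standard model
special  = {"R": 659, "Q": 1118}  # special model with the heavier Rook

def two_rooks_minus_queen(values):
    """Positive means the model judges 2 Rooks stronger than 1 Queen."""
    return 2 * values["R"] - values["Q"]

print(two_rooks_minus_queen(standard))  # 0: the trade is judged neutral
print(two_rooks_minus_queen(special))   # 200: this model clings to its Rooks
```

So the standard model will accept the Q-for-2R trade freely, while the special model will steer away from giving up its rook pair, which is exactly the behavioral contrast the proposed test hopes to exploit.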
Muller: Please confirm that these are legal values for the 'winboard.ini' file.

/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22 P100=353=459=559=1029=1059=1118'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22 P100=353=459=659=1029=1059=1118'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22 P100=306=363=557=702=912=960'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22 P100=306=363=557=866=912=960'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22 P100=308=376=594=940=958=1031'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22 P100=308=376=594=940=958=1031'
'C:\winboard-F\TJchess\TJChess10x8'
}
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22 P100=353=459=559=1029=1059=1118'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22 P100=353=459=659=1029=1059=1118'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22 P100=306=363=557=702=912=960'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22 P100=306=363=557=866=912=960'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22 P100=308=376=594=940=958=1031'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22 P100=308=376=594=940=958=1031'
'C:\winboard-F\TJchess\TJChess10x8'
}
It looks OK to me. One caveat: the normalization (e.g. Pawn = 100) is not completely arbitrary, as the engine weighs material against positional terms, and doubling all piece values would effectively scale down the importance of passers and King safety. In addition, the engine also uses some heavily rounded 'quick' piece values internally, where B=N=3, R=5, A=C=8 and Q=9, to make a rough guess whether certain branches stand any chance of recouping the material it gave earlier in the branch. So in certain situations, when it is behind 800 cP, it won't consider capturing a Rook, because it expects that to be worth about 500 cP, and thus falls 300 cP below the target. Such a large deficit would be beyond the safety margin, so the move is pruned. But if the piece values were scaled up such that the 800 merely represented being a Bishop behind, this obviously would be an unjustified pruning. The safety margin is large enough to allow some leeway here, but don't overdo it. It would be safest to keep the value of Q close to 950.

I am indeed skeptical about the possibility of doing enough games to measure the difference you want to see in the total score percentage. But perhaps some sound conclusions could be drawn by not merely looking at the results, but at the actual games, singling out the Q vs 2R trades. (Or actually any Rook-versus-other-material trade before the end-game. Rooks capturing Pawns to prevent their promotion probably should not count, though.) These could then be used to separately extract the probability of such a trade for the two sets of piece values, and to determine the winning probability for each set of piece values once such a trade has occurred. By filtering the raw data this way, we get rid of the stochastic noise produced by the (majority of) games where the event whose effect we want to determine would not have occurred.
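The pruning hazard described here can be sketched as follows (illustrative only: the 'quick' values are the ones quoted above scaled to centipawns, but the margin of 200 and the function itself are my assumptions, not Joker80's actual code):

```python
# 'Quick' values from the post above, in centipawns (B=N=3, R=5, A=C=8, Q=9).
QUICK = {"P": 100, "N": 300, "B": 300, "R": 500, "A": 800, "C": 800, "Q": 900}
MARGIN = 200  # assumed safety margin in centipawns

def capture_considered(deficit_cp, captured_piece):
    """A capture is searched only if its quick value plus the margin
    could recoup the current material deficit."""
    return QUICK[captured_piece] + MARGIN >= deficit_cp

print(capture_considered(800, "R"))  # False: 500 + 200 < 800, branch pruned
print(capture_considered(600, "R"))  # True: within the safety margin
```

This shows why inflating all the tunable values is dangerous: the deficit grows with the scaled values while the hard-wired quick values stay put, so legitimate captures start failing the test.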
As I moved to renormalize all of the values used in Joker80 (written into the 'winboard.ini' file) with the pawn at a par of 85 points, I looked at my notes again. They reminded me that your use of the 'bishop pair' refinement (with a bonus of 40 points) implies that the material value of the rook is either 1.00 pawns or 1.47 pawns greater than the material value of the bishop in CRC, depending upon whether both bishops or only one bishop, respectively, remain in the game. At that point, I realized that I would be attempting to playtest for a discrepancy that I know from experience is just too small to detect even at very long time controls. So, this planned test has been cancelled. I am not implying that this matter is unimportant, though. I remain concerned for the standard Muller model whenever it allows the exchange of its 2 rooks for 1 queen belonging to its opponent.
Well, I share that concern. But note that the low Rook value was not only based on the results of Q vs 2R asymmetric testing. I also played R vs B+P and N+N vs R+P, which ended unexpectedly badly for the Rook, and this sets the value of the Rook relative to that of the minor pieces. The value of the Queen was independently tested against that of the minor pieces by playing Q vs B+N+N. The low difference between R and B does make sense to me now, as the wider board should upgrade the Bishop a lot more than the Rook. The Bishop gets extra forward moves, and forward moves are worth a lot more than lateral moves. I have seen that in testing cylindrical pieces (indicated by *), where the periodic boundary condition w.r.t. the side edges effectively simulates an infinitely wide board. In a context of normal Chess pieces, B* = B+P, while R* = R+0.25P. OTOH, Q* = Q+2P. So it doesn't surprise me that on wider boards R loses ground compared to Q and B.

I can think of several systematic errors that could lead to unrealistically poor performance of the Rook in asymmetric playtesting from an opening position. One is that Capablanca Chess is a very violent game, where the three super-pieces are often involved in inflicting an early checkmate (or nearly so, where the opponent has to sacrifice so much material to prevent the mate that he is lost anyway). The Rooks initially offer not much defense against that. But your chances for such an early victory would be strongly reduced if you were missing a super-piece. So perhaps two Rooks would do better against Q after A and C are traded. This explanation would do nothing to explain the poor performance of R vs B, but perhaps it is B that is strong (it is also strong compared to N). The problem then would be not so much a low R value, but a high Q value, due to cooperativity between super-pieces.
So perhaps the observed scores should not be entirely interpreted as high base values for Q, C and A, but might be partly due to super-piece pair bonuses similar to that for the Bishop pair. Which I would then (mistakenly) include in the base value, as the other super-pieces are always present in my test positions.

Another possible source of error is that the engine plays a strategy that is not well suited to playing 2R vs Q. Joker80's evaluation does not place a lot of importance on keeping all its pieces defended. In general this might be a winning strategy, giving the engine more freedom in using its pieces in daring attacks. But 2R vs Q might be a case where this backfires, and where you can only manifest the superiority of your Rook force by very careful and meticulous, nearly allergic defense of your troops, slowly but surely pushing them forward. This is not really the style of Joker's play. So it would be interesting to do the asymmetric playtesting for Q vs 2R with other engines as well. But TJchess10x8 only became available long after I started my piece-value project; TSCP-G does not allow setting up positions (although now I know a work-around for that: forcing initial moves with both Archbishops to capture all the pieces to be deleted, and then retreating them before letting the engine play). And Smirf initially could not play automatically at all, and when I finally made a WB adapter for it so that it could, fast games by it were more decided by timing issues than by play quality (many losses on time with scores like +12!). And Fairy-Max is really a bit too simplistic for this, not knowing the concept of a Bishop pair or passed pawns, besides being a slower searcher.
Muller: Please have another look at this excerpt from my 'winboard.ini' file. There are standard and special versions of piece values by Muller, Scharnagl & Nalls for the white and black players, renormalized to pawn = 85 points. The special version of the Muller model has a rook value exactly 85 points, or 1.00 pawn, higher than the standard version. The special version of the Scharnagl model has an archbishop value of 736 points, approx. 95% of the chancellor value (775 points), instead of 597 points at approx. 77% for the standard version. The special version of the Nalls model is identical to the standard version until some test is needed and planned. Since I assume that the 'bishop pair bonus' is hardwired into Joker80, 40 points has been subtracted from the model-independent material values of the bishop under all three models. Is this correct?

/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22 P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22 P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22 P85=260=269=474=597=775=816'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22 P85=260=269=474=736=775=816'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22 P85=262=279=505=799=815=876'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22 P85=262=279=505=799=815=876'
'C:\winboard-F\TJchess\TJChess10x8'
}
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22 P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22 P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22 P85=260=269=474=597=775=816'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22 P85=260=269=474=736=775=816'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22 P85=262=279=505=799=815=876'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22 P85=262=279=505=799=815=876'
'C:\winboard-F\TJchess\TJChess10x8'
}
Is there any special reason you want to keep the Pawn value equal in all trial versions, rather than, say, the total value of the army, or the value of the Queen? Especially in the Scharnagl settings it makes almost every piece rather light compared to the quick guesses used for pruning.

Note that there are so many positional modifiers on the value of a pawn (determined not only by its own position, but also by its relation to other friendly and enemy pawns) that I am not sure what the base value really means. Even if I say that it represents the value of a Pawn on g2, the evaluation points lost on deleting a pawn on g2 will depend on whether there are pawns on the e- and i-files, and how far they are advanced, and on the presence of pawns on the f- and h-files (which might become backward or isolated), and of course on whether losing the pawn would create a passer for the opponent. If I were you, I would normalize all models to Q=950, but then replace the pawn value everywhere by 85 (I think the standard value used in Joker is even 75). I don't think you could say then that you deviate from the model, as the models do not really specify which type of Pawn they use as a standard. My value refers to the g2 pawn in an opening setup. Perhaps Reinhard's value refers to an 'average' pawn, in a typical pawn chain occurring in the early middle game, or a Pawn on d4/e4 (which is the most likely to be traded).

As to the B-pair: tricky question. The way you did it now would make the first Bishop to be traded have the value the model prescribes, but would make the second much lighter. If you were to subtract half the bonus, then on average they would be what the model prescribes. The value is indeed hard-wired in Joker, but if you really want, I could make it adjustable through an 8th parameter.
'If I were you, I would normalize all models to Q=950 but then replace the pawn value everywhere by 85.' Since this is what you (the developer of Joker80) recommend as optimum, this is what I will do. Are you sure that replacing any pawn values different from 85 points after renormalization to queen = 950 points still renders an accurate and complete representation, more or less, of the Scharnagl and Nalls models? At a par of queen = 950 points, the pawn value in the Nalls model is not represented as being only 92.19% as high as that in the Muller model, and the pawn value in the Scharnagl model is not represented as being only 98.95% as high as that in the Muller model. Through it all, if a perfect representation is not quite possible, I can accept that without reservation.

__________________________________

'I don't think you could say then that you deviate from the model as the models do not really specify which type of Pawn they use as a standard.' Correctly calculating pawn values at the start of the game (much less throughout the game) requires finesse, as it is indeed a complex issue. In fact, its excessive complexity is the reason my 66-page paper on material values of pieces is silent on calculating pawn values in FRC & CRC. Instead, someone needs to read an entire book from an outside source about calculating the material values of the pieces in Chess to sufficiently understand it. Personally, I am content with the test situation as long as Joker80 handles all pawns under all three models, initially valued at 85 points, as fairly and equally as realistically possible. I cannot speak for Reinhard Scharnagl at all, though.

________________________________________________

'The way you did it now would make the first Bishop to be traded have the value the model prescribes, but would make the second much lighter. If you were to subtract half the bonus, then on average they would be what the model prescribes.' Now, I understand better. It makes sense.
[I am glad I asked you.] Yes, I will subtract 20 points (1/2 of the 'bishop pair bonus') from the model-independent material values for the bishop under the Scharnagl & Nalls models.
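The renormalization agreed on above can be sketched like this (my own illustrative helper, not part of either author's toolchain; the example model is hypothetical):

```python
def renormalize(model, q_target=950, pawn_override=85, pair_bonus=40):
    """Scale a model so Q = q_target, subtract half the hardwired
    Bishop-pair bonus from B, and overwrite the Pawn base value."""
    scale = q_target / model["Q"]
    out = {piece: round(value * scale) for piece, value in model.items()}
    out["B"] -= pair_bonus // 2
    out["P"] = pawn_override
    return out

# A hypothetical model that already happens to have Q = 950:
example = {"P": 100, "N": 300, "B": 360, "R": 500, "A": 880, "C": 900, "Q": 950}
print(renormalize(example))
# {'P': 85, 'N': 300, 'B': 340, 'R': 500, 'A': 880, 'C': 900, 'Q': 950}
```

Rounding after scaling means the recomputed values can differ by a point from hand-computed ones, which is harmless at this precision.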
Muller: Here is my latest revision to my 'winboard.ini' file. Are these piece values acceptable to you? Do you think these piece values will work smoothly with Joker80 running under Winboard F yet remain true to all three models?

/firstChessProgramNames={'C:\winboard-F\Joker80\w\M-st\w-M-st 22 P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\w\M-sp\w-M-sp 22 P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\w\S-st\w-S-st 22 P85=302=339=551=694=902=950'
'C:\winboard-F\Joker80\w\S-sp\w-S-sp 22 P85=302=339=551=857=902=950'
'C:\winboard-F\Joker80\w\N-st\w-N-st 22 P85=284=326=548=866=884=950'
'C:\winboard-F\Joker80\w\N-sp\w-N-sp 22 P85=284=326=548=866=884=950'
'C:\winboard-F\TJchess\TJChess10x8'
}
/secondChessProgramNames={'C:\winboard-F\Joker80\b\M-st\b-M-st 22 P85=300=350=475=875=900=950'
'C:\winboard-F\Joker80\b\M-sp\b-M-sp 22 P85=300=350=560=875=900=950'
'C:\winboard-F\Joker80\b\S-st\b-S-st 22 P85=302=339=551=694=902=950'
'C:\winboard-F\Joker80\b\S-sp\b-S-sp 22 P85=302=339=551=857=902=950'
'C:\winboard-F\Joker80\b\N-st\b-N-st 22 P85=284=326=548=866=884=950'
'C:\winboard-F\Joker80\b\N-sp\b-N-sp 22 P85=284=326=548=866=884=950'
'C:\winboard-F\TJchess\TJChess10x8'
}