Comments/Ratings for a Single Item

Chess with Different Armies. Betza's classic variant where white and black play with different sets of pieces. (Recognized!)

Greg Strong wrote on Mon, May 6, 2019 10:10 PM UTC:

Wow. Great work! Very interesting.

H. G. Muller wrote on Mon, May 6, 2019 03:51 PM UTC:

End-games: light pieces

The table below gives an overview of some 5-men CwDA end-games, based on the statistics of generated End-Game Tables. I don't have a generator that can handle pieces with only 2-fold symmetry, but a special build of FairyGen can handle 4-fold symmetry, so I did include the Fibnif as the only Nutters piece. CwDA armies consist of a super-piece worth 2.5-3 typical minors (such as Knights), and 3 pairs of 'light' pieces worth 1-1.5 minors. In FIDE the Rooks really stand out amongst the latter; in the other armies the pieces are closer in value, only 1 piece being of Knight strength, the other two lying somewhere in between Knight and Rook.

These pieces can be divided into majors and minors, depending on whether they are able to force checkmate on a bare King. All light pieces of the Clobberers are minors, all of the light Rookies are majors. The Nutters have one minor, FIDE has two. Of all these minors, the Knight is the only one that cannot checkmate as a pair; for the Clobberers the heterogeneous pair Bede + Fad cannot checkmate if they are on the same square shade. All other pairs of minors from the same army can force checkmate. Even all 'unnatural' pairs (which can in theory be obtained by promotion) can force checkmate, provided that pairs of color-bound pieces (Bede, Fad, Bishop) are on unlike shades.

The difference in strength between the light pieces is usually not enough to force a win in a 1-vs-1 situation. Somewhat exceptional are Rook vs WA (which would be a general win if it were not for the 50-move rule; as it is the win is cursed) and Rook vs Fibnif (where the result is unclear; a Fibnif is easily confined by a Rook, and in positions where it is separated from its King it can probably be chased to doom). Of course only the major pieces can hope for a win, in these situations.

Because of their closeness in value, I treated the light pieces as a single group, and generated all EGT of a natural pair versus a single one. Each army has 6 natural pairs, but for the Nutters I could only handle the pair of Fibnifs, so 19 pairs in total. I did not bother with a pair of Knights, as these cannot even win without opposition. I also did not bother with a pair of Rooks, as a pair of R4 could already beat any opponent. Each of the 17 remaining pairs was pitted against the 10 light pieces, 170 combinations in total. This gave the following result.

R = Rook
B = Bishop
N = kNight
D = BD      (beDe)
F = FAD     (Fad)
X = WA      (phoeniX)
S = R4      (Short rook)
H = HFD     (Half duck)
W = WD      (Woody rook)
N'= fhNbFbW (charging kNight)
R'= fsRbFbW (charging Rook)
I = FvN     (fIbnif)
K = non-royal King
Y = vRsN    (dragonflY)
O = BW      (dragon HOrse)

+  = general win
=  = general draw
~  = cursed general win
~? = half-cursed general win
+? = mostly won, but lots of fortress draws
?  = mixed win/draw
?~ = mixed, and about half the wins cursed
*  = already won without the second piece

       X   I   W   N   B   F   D   H   S   R   K   Y   O   N'  R'
XX     =   =   =   =   =   =   =   =   =   =   =   =   =   =   =
BN     =   =   =   =   =   =   =   =   =   =   =   =   =   =   =
FX     =   =   =   ~? ~/=  =   =   =   =   =   =   =   =   =   =
BB     +   =   =   ~?  =   =   =   =   =   =   =   ~   =   =   =
II     +   ~   =   +   =   =   =   =   =   =   =   =   =   =   =
DX     =   ~   =   +  +/= ~/=  =   =   =   =   =   =   =   =   =
YY     +   +   +   +   +   +   +?  +   =   =   +   =   =   +   =
WW     +   +   +   +   +   +   +   +   +   =   +   +   ?   +   ?
N'N'   +   +   +   +   +   +   +   +   +   ~?  +   +   ~?  +   +
N'I    +   +   +   +   +   +   +   ~   ~   =   +   +   =   +   =
RN     +   +   +   +   +   +   +   ~?  ~   =   +   +   ~   +   =
RB     +   +   +   +   +  +/+ +/+  ~   +   =   +   +   =   +   =
FF     +   +   +   +   +   +   +   +   +   =   ?   +   =   +   =
R'I    *   *   +   *   *   *   +   +   +   ~   +   +   +   +   ~? 
KY     +   +   +   +   +   +   +   +   +   =   +?  +?  ?   +   ?~
DF     +   +   +   +   +   +   +   +   +   +   +   +   =   +   ?
DD     +   +   +   +   +   +   +   +   +   +   +   +   =   +   +
KK     +   +   +   +   +   +   +   +   +   +   +   +   +   +   ?
HW     +   +   +   +   +   +   +   +   +   +   +   +   +   +   +
SW     +   +   +   +   +   +   +   +   +   +   +   +   +   +   +
HH     +   +   +   +   +   +   +   +   +   +   +   +   +   +   +
SH     +   +   +   +   +   +   +   +   +   +   +   +   +   +   +
SS     +   +   +   +   +   +   +   +   +   +   +   +   +   +   +
R'N'   *   *   +   *   *   *   +   +   +   +   +   +   +   +   +
R'R'   *   *   +   *   *   *   +   +   +   +   +   +   +   +   +
OY                                     +   +   +   +   +       +
OK                                     +               +       +

We see that the Bede and Fad, despite their individual lack of mating potential, form quite strong pairs. This is probably because they are able to drive an unprotected King to checkmate with checks, in a way reminiscent of the 'hand-over-hand' checking of a pair of Rooks. This makes it hard even for a Rook to harass the pieces, threatening to trade and destroy the mating potential, which is the usual way in which pairs of minors fail to win. So in first approximation a pair wins if both members have mating potential (so that trading any of them will not rescue the defender), or if they are Bede/Fad pairs of unlike color, while other pairs of minors draw against any opposition.

The case major + minor only occurred in FIDE here (R+B and R+N), as Clobberers have no majors, Rookies have no minors, and for Nutters I could not handle the majors. Because the major is relatively strong in FIDE, only a defending Rook can truly measure up to it; any other defender is so much weaker that adding even a 'standard minor' tips the balance. Against R4 or HFD, however, it takes too long to force the win, and the win is cursed in almost all, or about half, of the cases. For Rook + Bishop vs Bede or Fad it doesn't matter if the defender is on like or unlike shade w.r.t. the Bishop.

Of the pairs of minors Bede + WA stands out: it in general beats a Knight, and a Bishop when it is on the same shade as the Bede. The win they in general have against a Fad on the Bede shade, or a Fibnif, is almost always cursed. They cannot beat a WA (which is probably the weakest defender in such end-games), but beating an equal piece is always more difficult, as you cannot attack it without offering it an opportunity to trade. The Bishop pair and a pair of Fibnifs (like a lone Rook) can beat the WA. The pair of Fibnifs is surprisingly strong: it can also beat a Knight. For the Bishop pair it takes so long to beat a Knight that the win is cursed more often than not. A win of two Fibnifs against one is very cursed (it takes on average 90 moves), but in view of the remark above it is amazing that it can force such a win at all. That the Fad is just a bit weaker than the Bede is also demonstrated by the fact that the wins Bede + WA have against Bishop and Knight turn into cursed wins when the Bede is replaced by a Fad.

[Edit 14-5-2019] The Nutters and Dragons pieces were added to the table.

H. G. Muller wrote on Sun, May 5, 2019 12:50 PM UTC:

A pair of WA is a general win. The rule of thumb is that one of the minors must be able to move from c1 to a1 (or their symmetry equivalents) in three moves (more precisely, for divergent or asymmetric pieces an uncapture, a move and a capture). A WA can do that (c1-c2-c3-a1), and can thus inflict a corner mate (moving c2-c3) with its King on b3 after the other minor has driven the bare King with check from b1 to a1. Furthermore, edge mates can be forced when one minor can 'fork' a1 and c1 at the same time, and the other minor can move from c1 to b1 in three moves. But that doesn't work if the forking piece has to be on b3 (as a Knight would have to be), where it would collide with the King.
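The c1-to-a1 part of this rule of thumb can be checked mechanically for simple leapers. The sketch below is only one reading of it: it takes "in three moves" to mean exactly three leaps on an empty board, ignores the divergent/asymmetric refinement, and says nothing about whether the mate can actually be forced. The helper names (`leaper`, `squares_after`, `c1_to_a1_in_three`) are made up for this illustration.

```python
from itertools import product


def leaper(*patterns):
    """Expand (a, b) leap patterns into all symmetric (dx, dy) steps."""
    steps = set()
    for a, b in patterns:
        for sx, sy in product((1, -1), repeat=2):
            steps.add((a * sx, b * sy))
            steps.add((b * sx, a * sy))
    return steps


WA = leaper((1, 0), (2, 2))   # Wazir step plus Alfil jump
KNIGHT = leaper((1, 2))


def squares_after(steps, start, n):
    """Squares reachable from start in exactly n leaps on an empty 8x8 board."""
    squares = {start}
    for _ in range(n):
        squares = {
            (x + dx, y + dy)
            for x, y in squares
            for dx, dy in steps
            if 0 <= x + dx < 8 and 0 <= y + dy < 8
        }
    return squares


C1, A1 = (2, 0), (0, 0)       # (file, rank), both counted from 0


def c1_to_a1_in_three(steps):
    return A1 in squares_after(steps, C1, 3)


print(c1_to_a1_in_three(WA))      # True, e.g. c1-c2-c3-a1
print(c1_to_a1_in_three(KNIGHT))  # False
```

The Knight fails for a parity reason: every Knight leap changes square shade, and c1 and a1 have the same shade, so an odd number of leaps can never connect them.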

As to the level of ambition: perhaps I should start indeed a bit more simple. The general scheme is to discount a pawnless advantage by a factor 2 even if it still is a win (except for known easy wins such as KQK and KRK), to properly reflect the relative difficulty of the win. But known general draws should be discounted much more, e.g. by a factor 8 if there still is some hope, or even 16 or infinite if it is a truly dead draw. (A factor 16 would even shrink the KNNK advantage to much less than a Pawn, and when that would still make it the best option the alternatives will almost certainly offer no hope for a win either.)

That leaves room for discounting end-games with a single Pawn by a 4 times smaller factor, when the opponent can afford to sac a piece for that Pawn to leave the pawnless general draw. Such a sac typically increases the advantage from +1 to +3, but the relative factor 4 makes the latter +0.75, so the leading side would be biased against allowing the sac. The remaining discount factor (2 or 4) would still discourage converting to such end-games, e.g. by trading Pawns in KBNPPKBNP.
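As a worked check of this arithmetic, here is a minimal sketch that assumes a general-draw factor of 8 (the 'some hope' case). The function and constant names are invented for the illustration and do not come from any engine.

```python
def discounted(advantage, factor):
    """Scale a raw material advantage (in Pawns) by a drawishness factor."""
    return advantage / factor


DRAW_FACTOR = 8        # pawnless general draw with some hope left
SAC_RELIEF = 4         # single-Pawn endings get a 4 times smaller factor

single_pawn_factor = DRAW_FACTOR / SAC_RELIEF          # 2

before_sac = discounted(1.0, single_pawn_factor)       # +1 with one Pawn left
after_sac = discounted(3.0, DRAW_FACTOR)               # +3, but pawnless draw

print(before_sac, after_sac, after_sac / before_sac)   # 0.5 0.375 0.75
```

On the scale of the single-Pawn ending the position after the sac is worth 0.75 of the position before it, so the side that is ahead scores lower by allowing the sac and will try to avoid it. With a draw factor of 16 the remaining single-Pawn factor would be 4 instead of 2, and the ratio stays 0.75.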

This scheme would need a table that specifies which non-Pawn material should be considered a dead or a general draw. The simplest version of this would just list single minors vs nothing: KBK and KNK in FIDE. But having some 4-men endings in there (like non-mating pairs of minors, such as KNNK, or 'exchange'-type advantages like KRKN) would not be too demanding either. These entries would already extend their influence to KNNPKB and KRPKNN, through the sac-rule. The really tedious part would be to add 5-men end-games such as the 'minor ahead' situations KBNKN, KRBKR, KQBKQ,... But I already generated a lot of those tables; I will summarize those results in another comment.

It could also be good to discount cases like unlike Bishops with a difference of up to 2 Pawns by a factor 2, but at the moment I have no idea how to generalize that. (E.g. it seems that end-games with unlike Ferzes are not particularly drawish.)

Greg Strong wrote on Sat, May 4, 2019 04:54 PM UTC:

Hard to say; perhaps 2 weeks if I would give it priority.

Ok, thanks. I was just trying to get an idea, not asking to make it priority. It'll probably take me a couple of weeks to get Quadrox ready. I'm going to start with FIDEs vs. Clobberers because that will be easiest. No asymmetric pieces or range-limited sliders.

I'm glad you mentioned endgames - I was going to bring that up. At a minimum, I need to determine under which conditions the game should be terminated immediately because there isn't enough material for checkmate to be possible. (E.g., any number of BDs and FADs vs. a lone king if they are all on the same color.) But, yeah, like you I also want to identify those piece combinations where the game should not be terminated because mate is theoretically possible if the opponent walks into the corner but the evaluation function should return zero (e.g., king + fibnif vs. lone king.) Your Javascript checkmating app is really awesome and answers the question for single pieces. I'm glad you're going to work on determining the answer for multiple pieces. Can a king plus two WAs force checkmate? I doubt it but I don't really know and I have no experience with endgame database generation.

It sounds like you're being really ambitious though. Recognizing KBBPKBN as drawish is really advanced. Throw in all the fairy pieces from cwda and the number of permutations is out of sight...

H. G. Muller wrote on Thu, May 2, 2019 09:43 AM UTC:

Greg Strong:

Any guess when you think you'll have your new cwda engine ready for testing?

Hard to say; perhaps 2 weeks if I would give it priority. But the Tenjiku Shogi implementation in Jocly is still not finished, and already 2 months (of 4) have elapsed on its clock in the yearly Modern Tenjiku correspondence championship in which it is supposed to participate...

The main issue is that I want it to recognize drawishness through lack of mating potential, which would include strongly discounting the score in end-games like KBBPKBN, because of the almost undodgeable N-for-P sac leaving a KBBKB known general draw. This requires the knowledge of which Pawnless 5-men CwDA endings are general draws, which I must first acquire by generating EGT for those (with FairyGen). And there are rather many of those, especially if I want to keep open the possibility to test individual pieces out of their own context (i.e. dropping the requirement that the two pieces fighting on one side must belong to the same army, so that I can test, say, WD+R vs R to see if the (winning) advantage of a WD is preserved on adding equal pieces on each side). An additional complication is that the standard version of FairyGen counts on 8-fold symmetry, although I once made a compile that can handle 4-fold symmetric pieces. But even that would not be able to handle the Nutters pieces other than Fibnif.

Otherwise there are only minor issues; KingSlayer supported only 6 piece types (1-6, code 0 being reserved for empty squares), and I already added some initialization code to set their move tables to that of the various armies. I still want to allow use of code 7 as an extra piece type, which requires a small code change because originally I used the 7th entry in the array that counts the number of pieces that is present of each type to hold the 'game phase' (minors + 2*Rooks + Queens). So I must move that to a separate array. And I still have to fix a-side castling for the Clobberers.

Ben Reiniger wrote on Thu, Apr 11, 2019 09:19 PM UTC:

Neat!

Checkmating with the Dragon fly

Play with the Wyvern (The checkmating applet doesn't seem to like the jumping sideways rook component, putting the black king in check by that move.)

H. G. Muller wrote on Wed, Apr 10, 2019 03:22 PM UTC:

The Daring Dragons

I designed a new army, which in tests with Pair-o-Max scores about equal against FIDE. I named it the Daring Dragons.

promoChoice=WHLD
graphicsDir=../membergraphics/MSelven-chess/
whitePrefix=w
blackPrefix=b
graphicsType=png
symmetry=none
midX=4
midY=3
lightShade=#BBBBBB
startShade=#5555AA
useMarkers=1
pawn::::a2-h2,,a7-h7
Dragon Fly:F:sNvR:chancellor:b1,g1,,b8,g8
Dragoon:D:KivmN:man:c1,f1,,c8,f8
Dragon Horse:H:BW:crownedbishop:a1,h1,,a8,h8
Wyvern:W:vNsjRB:dragon:d1,,d8
king::::e1,,e8

An interesting feature that sets it apart from other armies is a piece with an unusual (meta-)color binding, the Dragon Fly: this is bound to even or odd board files, along which it moves like a Rook. It can switch between files through a sideway Knight jump. (It is in fact half a Chancellor.) It is worth slightly less than a Bishop, and can often force checkmate on a bare King. The other light piece is the Man / Commoner, but to facilitate its development (which would otherwise heavily compete with that of the Dragon Fly), it has some additional initial non-capture Knight jumps. It is called a Dragoon. (Dragoons are mounted infantry, using horses for mobility, but fighting on foot.) The Rook replacement is the Dragon Horse known from Shogi (moves as Bishop or one step orthogonally), worth slightly more than a Rook.

The super-piece (called Wyvern) is a somewhat weird construct; first I wanted it to be a Centaur (Knight-Man compound), but then the army proved too weak. Then I replaced the wide Knight moves of the Centaur by a sideway Rook slide, to also have the latter in the game. This makes it a compound of a Man and a 90-degree rotated Dragon Fly. But this was not really stronger than a Centaur; with either the army scored only 40% with black. A sideway Rook slide should be worth more than four Knight moves, but the Centaur already covered the first step of it, so it did not add enough. I also did not like its low speed in the vertical directions, which was unworthy of a super-piece. After some experimenting, a compound of a rotated Dragon Fly and a Bishop proved a little too strong (60% against FIDE), although not out of line with what the other CwDA armies do. A suitable way to weaken it to exactly match FIDE was to replace the sR slide by a ski-slide, skipping the first square on the ray (jumping any occupant if needed).

Ski-sliders are interesting anyway: on a near-empty board they are obviously inferior to the corresponding ordinary slider, as they lack the moves to the adjacent square. That the more distant moves cannot be blocked on that square is of no import if there is nothing around to block them. But on a crowded board, where slides almost always are blocked before they hit the board edge, the ski-slider will have the same number of moves as the normal slider, each target just being moved outward one step. This should make them nearly equivalent. So ski-slider strength will depend on game phase in a different way than that of the other pieces, relatively decreasing towards the end-game.

Greg Strong wrote on Sun, Mar 31, 2019 03:46 PM UTC:

Hi, H.G. It's good to hear from you and to hear that you are working on another engine to help test these things! I got distracted on other things and never got around to following up. I have far too many different projects that attract my attention - usually chess variant stuff, but sometimes other things as well. I found this programming language for writing interactive fiction (think Zork) where source code reads like English called Inform7. I would not have thought it possible for a real programming language to be a subset of English. Wild stuff. But yeah, anyway, I get sidetracked a lot :)

First, I did complete the FIDEs vs. Nutters test with the FIDEs given added incentive to move forward through the PST. This helped a tiny bit, but not much at all:

Nutty Knights: 261
Fabulous FIDEs: 84
draw: 55

My next thought is to reduce the value of the knight and bishop when facing off against the Nutters. This will give the FIDEs a strong desire to trade off and the Nutters will have to limit their options to prevent that. Once the minor pieces are traded off I think the FIDEs are fine. I don't believe a charging rook is better than a normal one, although a colonel may be a little better than a queen.

I have recently switched back to trying to get the next version of ChessV out. I have several new features that are mostly done that just need to be closed out. (Of course, I don't always finish a feature before starting on the next ...) The most significant of these is that I have added a stand-alone ChessV CECP engine so it can be run without the GUI. This code is all written but almost completely untested. I admit I've been procrastinating on that. In the whole scope of this project, there is nothing that is less appealing to me than trying to plan/code/debug for inter-process communication. The other side of the coin - ChessV's ability to host other XBoard engines - is not 100% bug-free either, although it is certainly good enough to be usable.

The material hash is something else I've added but am not making much use of yet. It is implemented as you describe, and will handle binding of any kind such as your even/odd file example. I think it was here I described the recursive algorithm I used to find all the different 'slices' of the board for any given piece. (I'm calling them slices rather than colors because colors becomes confusing when different pieces have different bindings - the knight in Alice Chess being a wacky example.) It will be interesting to see what scientific testing determines colorbinding bonuses/penalties should be for multiple color-bound pieces. Currently, ChessV starts discounting the value of pieces heavily starting with the second piece bound to a slice if there are no pieces on a complementary slice.

Regarding enabling CwDA for inter-engine play, yes, I am definitely interested in figuring out how we can do this. I am certainly of the opinion that both our GUIs and all our engines should be as inter-operable as possible. I will post some thoughts about this shortly. (I'll start a new thread for it.)

H. G. Muller wrote on Sun, Mar 31, 2019 12:32 PM UTC:

@Greg

Any progress on this? I am contemplating returning to piece-value measurements as well. Because I want to measure the more subtle effects, such as mating potential and pair bonuses, this will require a less coarse approach than Fairy-Max. I have started to extend the capabilities of my engine KingSlayer (originally released as 'Simple', until I was told that name was already taken), which I wrote a few years ago as a demo source code for orthodox Chess somewhat more advanced than TSCP, to also support fairy pieces. And in particular CwDA. So I changed the move generator to support limited-range sliding/riding on a per-move basis. (For Chess it was done on a per-piece-type basis, and the range could only be 1 or infinite.)

As that engine only supports 6 piece types per side (which, with a little bit of work, could be expanded to 7), I implemented this by initializing the tables with piece properties it uses during play from a larger table that contains descriptions of all supported piece types. (So far the 16 piece types of the 4 classical CwDA armies.) For a particular game it then just picks up to 4 of these in addition to the always participating P and K. Unlike Fairy-Max, this engine has a dedicated check test (rather than just trying a null move and waiting for a King capture), and this had to be extended too in order to handle the new moves. Basically it works by having a 15x15 'board' indexed by the relative distance, where for each step a bitmap indicates which piece type in principle could make such a step, where for sliding moves a contact threat is distinguished from a distant one (to easily see if you need to test for blocking). By making use of the fact that some pieces are compounds of others (like Q=R+B), and decomposing some pieces into 'primitives' to make even better use of that, the number of different primitives needed to support CwDA was 13, too large for the byte originally used for this purpose, but less than half a 32-bit integer, so that I can now even use separate bits for white and black attacks, eliminating the need to test the piece for being an enemy by other means. This type of check test would become more cumbersome with hoppers (where you don't only have direct and discovered checks, but also have to deal with 'activation' by interposition), and very awkward in the presence of bent sliders (like the Gryphon). So this engine will probably never support those kinds of moves. Divergent pieces would still be a realistic possibility, though.

Unlike Fairy-Max this engine does have an advanced Pawn-structure evaluation (e.g. passer recognition), which is directly usable in CwDA, as that uses the same Pawns. It did keep track of the number of pieces of each type that are still present, and used this to award a Bishop pair bonus (if there were two), or discount the static evaluation score when mating potential gets into jeopardy for lack of Pawns (i.e. with 1 Pawn or less). This will have to be substantially refined, though, as with multiple color-bound types cross bonuses are to be expected, and you cannot conclude from the piece counts alone whether you have a pair or not. Also drawish cases similar to 'unlike Bishops' cannot be recognized this way, which was already a weakness in regular Chess. So I plan to add a 'material hash', which uses a hash key that depends on the present material, but counts color-bound pieces of the same type but on different square shades as different. (This can be done through a Zobrist-like hashing scheme that doesn't assign a different key to a piece type for each board square, but just one for each 'meta-color' relevant for that type.) Which piece combinations will have mating potential will now depend on the army, and will thus require a more complex analysis, but if the results of that analysis are kept in a hash table, this will not impact engine speed.
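To make the idea of the relative-distance table concrete, here is a small Python sketch. It is not KingSlayer's code: the primitive set is cut down to six made-up primitives (the engine needs 13), the board is a plain dictionary, and all names are assumptions for the illustration. It keeps the three ingredients described above: a 15x15 table indexed by (dx, dy), separate contact and distant bits for sliders, and a white and a black half of the word so that friendly pieces drop out of the test automatically.

```python
C = 7                                   # offset (0, 0) sits at table[7][7]
WHITE, BLACK = 0, 16                    # bit shift of each color's half
HALF = (1 << 16) - 1

# Hypothetical primitives; contact and distant slider steps get separate bits.
ORTH_STEP, ORTH_FAR, DIAG_STEP, DIAG_FAR, KNIGHT, PAWN_CAP = (1 << i for i in range(6))
FAR_BOTH = (ORTH_FAR | DIAG_FAR) | (ORTH_FAR | DIAG_FAR) << BLACK

table = [[0] * 15 for _ in range(15)]


def mark(dx, dy, prim, colors=(WHITE, BLACK)):
    """Record that primitive prim of the given colors can bridge offset (dx, dy)."""
    for shift in colors:
        table[dy + C][dx + C] |= prim << shift


for sx, sy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
    mark(sx, sy, ORTH_STEP)
    for d in range(2, 8):
        mark(sx * d, sy * d, ORTH_FAR)
for sx, sy in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
    mark(sx, sy, DIAG_STEP)
    for d in range(2, 8):
        mark(sx * d, sy * d, DIAG_FAR)
for a, b in ((1, 2), (2, 1)):
    for sx in (1, -1):
        for sy in (1, -1):
            mark(a * sx, b * sy, KNIGHT)
for dx in (1, -1):                      # asymmetric: direction depends on color
    mark(dx, 1, PAWN_CAP, (WHITE,))
    mark(dx, -1, PAWN_CAP, (BLACK,))

PIECES = {                              # compounds are just ORed primitives
    "R": ORTH_STEP | ORTH_FAR,
    "B": DIAG_STEP | DIAG_FAR,
    "Q": ORTH_STEP | ORTH_FAR | DIAG_STEP | DIAG_FAR,
    "K": ORTH_STEP | DIAG_STEP,
    "N": KNIGHT,
    "P": PAWN_CAP,
}


def place(board, sq, piece, color):
    board[sq] = PIECES[piece] << color  # color is baked into the attribute word


def path_clear(board, frm, to):
    sx = (to[0] > frm[0]) - (to[0] < frm[0])
    sy = (to[1] > frm[1]) - (to[1] < frm[1])
    x, y = frm[0] + sx, frm[1] + sy
    while (x, y) != to:
        if (x, y) in board:
            return False
        x, y = x + sx, y + sy
    return True


def attacked(board, target, by_color):
    """True if any piece of by_color attacks target; other pieces AND to zero."""
    wanted = HALF << by_color
    for (x, y), attr in board.items():
        hit = table[target[1] - y + C][target[0] - x + C] & attr & wanted
        if hit and (not hit & FAR_BOTH or path_clear(board, (x, y), target)):
            return True
    return False
```

A short usage example: with `board = {}`, `place(board, (0, 0), "R", WHITE)` makes `attacked(board, (0, 7), WHITE)` true, and placing any piece on (0, 3) makes it false again, because the distant bit triggers the blocking test. Only hits on a distant bit pay for the ray walk; contact and leaper hits return immediately.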
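A minimal sketch of such a material key, under several assumptions of mine: keys are combined by 64-bit addition rather than XOR (so that two identical pieces on the same shade do not cancel), the piece letters are shorthand chosen for the example (B Bishop, D Bede, F Fad, Y a file-bound piece, N Knight, R Rook), and the random keys come from a fixed seed. None of the names are taken from an existing engine.

```python
import random

MASK64 = (1 << 64) - 1


def square_shade(sq):
    return (sq[0] + sq[1]) & 1      # ordinary color binding


def file_parity(sq):
    return sq[0] & 1                # odd/even file binding


def unbound(sq):
    return 0                        # piece can reach every square


# Which meta-color function applies to which piece letter (illustrative set).
META = {"B": square_shade, "D": square_shade, "F": square_shade,
        "Y": file_parity, "N": unbound, "R": unbound}

rng = random.Random(2019)           # fixed seed so the keys are reproducible
KEYS = {(side, piece, meta): rng.getrandbits(64)
        for side in "wb" for piece in META for meta in (0, 1)}


def material_key(pieces):
    """pieces: iterable of (side, piece letter, (file, rank)) tuples."""
    key = 0
    for side, piece, sq in pieces:
        key = (key + KEYS[side, piece, META[piece](sq)]) & MASK64
    return key


unlike = material_key([("w", "B", (2, 0)), ("w", "B", (5, 0))])   # c1 + f1
like = material_key([("w", "B", (2, 0)), ("w", "B", (4, 0))])     # c1 + e1
print(unlike != like)               # the two Bishop pairs hash differently
```

The key only changes when a piece is captured or promoted, so it can be updated incrementally and used to index a table of precomputed verdicts (pair bonus, lack of mating potential, unlike-Bishop style drawishness) at negligible cost during search.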

BTW, other types of (meta-)color binding can be interesting as well. E.g. odd/even file binding, such as for vRsD, which does have substantial mating potential. (Although a fortress draw is possible when the bare King cannot be cut off from the safe edge. A vRsDD would even be better in preventing that.)

You mentioned lack of standardization presenting a problem for having XBoard engines playing CwDA in ChessV. What would be needed here? Now that I am making KingSlayer into a CwDA engine, it might be good to have a closer look at the specific problems, and try to find ways to remove those. At the moment I have KingSlayer report in the CECP variants feature that it supports variant 'fairy', and gave it engine-defined combo options "White Army:" and "Black Army:" that can be used to select the flavors FIDE, Clobberers, Rookies or Nutters, and will determine what variant fairy means. But in addition to that I could allow setting of the default value of those options through arguments in the engine command (so that you would never have to bother setting the option).
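For readers unfamiliar with CECP, the sketch below prints what such feature lines could look like, using the protocol's `feature option="NAME -combo A /// B"` syntax with the default choice marked by an asterisk. The exact strings KingSlayer sends are not given in this post, so the option names (written here without the colon), the choice of defaults and the helper `combo_option` are assumptions for illustration only.

```python
ARMIES = ["FIDE", "Clobberers", "Rookies", "Nutters"]


def combo_option(name, choices, default):
    """Format a CECP combo option; the default choice is prefixed with '*'."""
    marked = ["*" + c if c == default else c for c in choices]
    return 'feature option="{} -combo {}"'.format(name, " /// ".join(marked))


feature_lines = [
    'feature variants="fairy"',
    combo_option("White Army", ARMIES, "FIDE"),        # default is assumed
    combo_option("Black Army", ARMIES, "Clobberers"),  # default is assumed
    "feature done=1",
]

for line in feature_lines:
    print(line)
```

A GUI that understands these options can then set the armies before the game starts; setting the defaults from the engine command line, as suggested above, would only change which choice carries the asterisk.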

Greg Strong wrote on Sat, Oct 13, 2018 09:34 PM UTC:

I have a bit of discomfort as the game did not have any lame leapers before but that borders on nothing. I'm more concerned how the change affects the balance against the two other armies. As this seems to me that will lead to a wave of interconnected changes that are probably not easy to pull through. Some sort of logical system of equations needs maintaining and I honestly doubt such an endeavour is even doable, little to say about feasible. This because you don't have many options for tuning while keeping the initial flavour on

This is a valid concern, but I'm hoping this does not become a problem. And a tiny bit of rock-paper-scissors effect is acceptable so long as things are balanced against the FIDEs. Obviously, the FIDEs are the one army that cannot be modified. For an example of a board game that has significant R-P-S effect but is still an awesome game, see tournament Star Fleet Battles. I should say this as I was probably not clear - I am NOT proposing making this change until testing of all combinations is complete, along with some testing of evaluation terms changes... This is just what I'm leaning towards given what we know so far.

It is a pity that test takes so long (a common problem in computer chess...)

Indeed it is, but I can scale up quite a bit. I actually have quite a few i5 and i7 PCs that can be pressed into service to do testing (6 or 7 of them.) The longest part, which is largely manual, is calculating out all the starting positions so I can feel very confident that my tests aren't playing the same games over and over. But when this is accomplished I can scale up testing quickly. I have just finished generating 20 positions of FF vs RR and am just starting on those with the colors reversed.

I suppose that the new ChessV is stronger than Fairy-Max? Have you ever measured by how much?

My current builds are definitely stronger than Fairy-Max, at least at the various 10x8 variants, but I have not done formal measurements. I intend to test that with my new "batch mode" capability also, but I've been focused on CwDA tests instead :) ChessV will control XBoard protocol engines for many games, but CwDA is not one of them because it would require more standards than presently exist. I should also mention that Fairy-Max is an absolute speed demon, in terms of nodes-per-second, compared to ChessV at approximately 4x the nodes. ChessV's strength comes from smarter search (using ideas stolen from Stockfish and other GPL engines - I take absolutely no credit for this) and better evaluation.

What TC are you using for these tests?

The 400-game sets use different time controls as one way to get more varied results. They also modify the new Variation setting from None (which is completely deterministic) to Small (for most games) to Medium (for a few games.) The fastest time controls I'm using are 25 sec + 2 sec/move. The longest are 5 minutes + 1 sec/move. Typically a 400-game set on one computer takes about 2 days. I will post a new (unofficial) version here shortly along with all my opening positions and batch mode control files so everyone can see exactly what I'm doing and run tests of their own.

Regarding the NN vs FF test with the FIDEs given more encouragement to advance through the PSTs, the test is half done. The 200 games where the NNs are white and the FFs are black are done. The Nutters won 136, the FIDEs won 36, with 28 draws. So it doesn't look like this is making the situation any better, although these are all games where the Nutters have the first move. Tomorrow we should know the final results.

Aurelian Florea wrote on Sat, Oct 13, 2018 07:19 AM UTC:

@HG&@Greg

We have discussed the matter of possible rock-paper-scissors effects with negative conclusions so maybe my idea involving musketeer chess gating was an overreaction, but maybe may be kept in the back of the mind if such problems arise. Good luck everybody :)!

H. G. Muller wrote on Sat, Oct 13, 2018 06:49 AM UTC:

It is clear that the change BD -> BnD should weaken the Clobberers against any opponent, which would be undesirable against opponents that already have the upper hand. But usually such opponents would also be stronger than FIDE, and would also have to be weakened. My earlier testing with Fairy-Max suggested that the performance of armies was reasonably 'transitive', in the sense that when A > B, and B > C, then A > C by an amount approximately equal to the sum of the first two. The only anomaly was that the Nutters under-performed against the Clobberers. I conjectured that this could again be a strategic issue, namely that the more forward-directed strategy and slow backwardness of the Nutters backfires when the army has pairs of pieces (or single pieces) that can easily checkmate a King.

It is a pity that test takes so long (a common problem in computer chess...). I suppose that the new ChessV is stronger than Fairy-Max? Have you ever measured by how much? What TC are you using for these tests? Have you tried how far you can push that, without significantly affecting the result? Large depth is only needed to bring the eventual tactical punishment of strategically bad moves within the horizon, so a more advanced evaluation (e.g. for Pawn structure and King Safety) should allow faster games without play becoming so unrealistic that it is no longer a representative sampling of the pieces' tactical abilities. People are nowadays tuning their engine's evaluation at ~0.25 sec/move (e.g. 10 sec + 0.1 sec/move). It would surely save a lot of time if that would work for piece-strength measurements too.

The danger is that playing at a lower level reduces all 'excess scores', even though the ratio of these scores stays constant (so that you get the same value in terms of centi-Pawn when you divide them by the Pawn-odds score). For a twice-lower Pawn-odds score you would need 4 times as many games to get the same resolution in centi-Pawns. So I suppose there will be an optimum there. Too high a quality of play is also not good, btw; you want the typical evaluation lost per move compared to perfect play to be so large that over the duration of a game it typically accumulates to a range wider than the draw interval (say [-150cP, +150cP]), so that small departures from equality in the initial imbalance already significantly sample the won/lost range.
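The factor 4 follows from the statistical error shrinking only with the square root of the number of games. A back-of-the-envelope sketch, assuming a per-game standard deviation of 0.4 for the result (a guess for a match with a fair share of draws) and a linear relation between score and centi-Pawns near equality; `games_needed` is a made-up helper, not part of any test tool:

```python
import math


def games_needed(pawn_odds_excess, resolution_cp, sigma=0.4):
    """Games needed so one standard error of the score equals resolution_cp.

    pawn_odds_excess: score above 50% produced by Pawn odds (0.15 for 65%).
    sigma: assumed standard deviation of a single game result.
    """
    score_per_cp = pawn_odds_excess / 100.0          # score shift per centi-Pawn
    return (sigma / (resolution_cp * score_per_cp)) ** 2


strong_play = games_needed(0.20, resolution_cp=10)   # Pawn odds score 70%
weak_play = games_needed(0.10, resolution_cp=10)     # Pawn odds score 60%

print(math.isclose(weak_play / strong_play, 4.0))    # True
```

With these assumed numbers the first case needs about 400 games and the second about 1600 for a 10 cP standard error, so halving the Pawn-odds excess quadruples the number of games, whatever value is taken for sigma.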

Aurelian Florea wrote on Sat, Oct 13, 2018 03:19 AM UTC:

@Greg,

First, I'm on the tip of my toes about your next trial with conditions adapted to HG's observation.

"Again, I don't think changing BD to BnD changes the flavor or removes any spice. Do you? "

I have a bit of discomfort, as the game did not have any lame leapers before, but that borders on nothing. I'm more concerned about how the change affects the balance against the two other armies, as it seems to me that this will lead to a wave of interconnected changes that are probably not easy to pull through. Some sort of logical system of equations needs maintaining, and I honestly doubt such an endeavour is even doable, let alone feasible. This is because you don't have many options for tuning while keeping the initial flavour.

But I'm very much for any CWDA game. It is just that a sequel to Betza's game should borrow his elements; otherwise it is another chess with different armies game. A better one, quite likely.

"On this we must disagree[about the game not needing rescuing]. Sure, it is playable. It is one of the most popular games on Game Courier so certainly people can play it and have fun. But if the armies are way out of balance, as it has become clear that they are, then it fails at its stated goal. If the game were played and studied even more as time goes on, people would learn exactly how to exploit the unbalance and the game would no longer be playable. "

The game is good enough at my level. It is probably good enough at any current human level (although this could be a stretch) but there is always the quest for even better (I am an engineer after all). And the endeavor of making another game sequel or not is great. I'd venture the idea we may need to make a distinction about it, but if we don't make it other future people will surely do, if it's the case, so much bothering could not be needed here either.

What I was actually insisting on was that maybe my musketeer technique is an easier goal to achieve without sacrificing any design principles (besides making the board more crowded, which is something I actually like, even if 36 pieces on an 8x8 tends to be too much even for me). But we can easily go on our merry way if this back and forth can't advance in a useful way, and maybe History will decide. Or not, as currently chess variants don't seem to catch on! The space of possible chess variants is so vast that there is more than enough room for all of us. I remember you actually agreeing to help, so that is cool. So it is a math debate actually: the way I like it :)!

Greg Strong wrote on Fri, Oct 12, 2018 11:45 PM UTC:

It occurred to me that the Nutters are unique amongst Betza's armies in their forward-backward asymmetry. I wonder if this could have an unexpected effect on the outcome of self-play games of engines with an evaluation that is not highly tuned. In a random mover Nutter pieces would tend to diffuse forward. Perhaps this makes the Nutters a bit more aggressive than the others, which would benefit them if the others are not aggressive enough. Perhaps the others would benefit from a piece-square table with a larger forward-gradient, while the Nutters automatically play like they have one.

Good observations as always. ChessV has a more sophisticated evaluation than Fairy-Max but it is certainly not "highly tuned." I can definitely re-run the FF vs. NN test with the forwardness component of the FIDEs' PST increased. I'll kick that off and see how much it affects the results. The test will take a few days to complete...

Greg Strong wrote on Fri, Oct 12, 2018 07:28 PM UTC:

And balance is the primary goal, but to me the flavor is what brings the spice :)!

Again, I don't think changing BD to BnD changes the flavor or removes any spice. Do you? You seem clearly opposed to this change, but I do not understand why.

I don't say CWDA needs rescuing; it is a good game.

On this we must disagree. Sure, it is playable. It is one of the most popular games on Game Courier so certainly people can play it and have fun. But if the armies are way out of balance, as it has become clear that they are, then it fails at its stated goal. If the game were played and studied even more as time goes on, people would learn exactly how to exploit the unbalance and the game would no longer be playable.

Aurelian Florea wrote on Fri, Oct 12, 2018 07:14 PM UTC:

@HG,

In CWDA, army tuning is most definitely a thing for any AI, especially in the context of flavor I was discussing with Greg earlier. In machine learning that should come rather easily, but unfortunately I have not got that far. In the end the army is just another variable (albeit one with some multidimensional properties). What I mean is that it should not be more difficult than any other design of such algorithms.

Aurelian Florea wrote on Fri, Oct 12, 2018 07:05 PM UTC:

@Greg,

We can very much leave Betza's game as is and invent an improved version ourselves. There is nothing wrong with that. And balance is the primary goal, but to me the flavor is what brings the spice :)!

" If you want to make such a game, I would encourage it and I would try to help if you wanted, but I don't see this as a valid approach to rescuing CwDA. "

I don't say CWDA needs rescuing; it is a good game. But I also see it as a good lesson we could use. The musketeer chess approach is meant to offer a way to balance the imbalances in a way specific to each match, because yes, it is about armies and not the individual pieces, but there is that old libertarian saying that society is made out of individuals, which I think goes well here. A pair of minors or a rooklike and a bishoplike piece would at least open more doors, which is hardly done otherwise, as far as I can see!

H. G. Muller wrote on Fri, Oct 12, 2018 06:18 PM UTC:

It occurred to me that the Nutters are unique amongst Betza's armies in their forward-backward asymmetry. I wonder if this could have an unexpected effect on the outcome of self-play games of engines with an evaluation that is not highly tuned. In a random mover Nutter pieces would tend to diffuse forward. Perhaps this makes the Nutters a bit more aggressive than the others, which would benefit them if the others are not aggressive enough. Perhaps the others would benefit from a piece-square table with a larger forward-gradient, while the Nutters automatically play like they have one.
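The forward diffusion of a random mover can be illustrated on an empty 8x8 board. The sketch below is only a toy model under stated assumptions: a single piece, no other pieces or captures, every legal move equally likely, and the Charging Rook taken as moving forward and sideways like a Rook and backward like a King. All function names are made up for the illustration. It iterates the random walk to its long-run distribution and reports the average rank (0 = own back rank, 7 = far rank).

```python
def rook_moves(f, r):
    """Squares an ordinary Rook reaches from (file f, rank r) on an empty board."""
    return ([(x, r) for x in range(8) if x != f] +
            [(f, y) for y in range(8) if y != r])


def charging_rook_moves(f, r):
    """Charging Rook: Rook slides sideways and forward, one King step backward."""
    moves = [(x, r) for x in range(8) if x != f]      # sideways slides
    moves += [(f, y) for y in range(r + 1, 8)]        # forward slides
    moves += [(f + dx, r - 1) for dx in (-1, 0, 1)    # single backward steps
              if r > 0 and 0 <= f + dx < 8]
    return moves


def mean_rank(move_fn, iterations=500):
    """Average rank of a uniformly random mover after many moves."""
    squares = [(f, r) for f in range(8) for r in range(8)]
    prob = {sq: 1.0 / 64 for sq in squares}           # start spread evenly
    for _ in range(iterations):
        nxt = dict.fromkeys(squares, 0.0)
        for sq, p in prob.items():
            targets = move_fn(*sq)
            for t in targets:                         # each move equally likely
                nxt[t] += p / len(targets)
        prob = nxt
    return sum(p * r for (_, r), p in prob.items())


rook_avg = mean_rank(rook_moves)               # symmetric piece: stays at 3.5
charging_avg = mean_rank(charging_rook_moves)  # asymmetric piece: drifts forward
```

A symmetric piece such as the Rook stays centered at rank 3.5, while the asymmetric piece ends up further forward on average, which is the effect a forward-gradient piece-square table would try to imitate for the other armies.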

On two occasions I noticed issues that could be related. In Fairy-Max white seems to play better than black, even when I average out the first-move advantage by having black start in half the games. This must be due to the direction the board is scanned during move generation; for white this typically first encounters the Pawns, for black the pieces. So if a Pawn move and a piece move have equal score, white would likely play the Pawn move, black the piece move. As Pawn moves are always forward, this makes white play more aggressively.
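A toy illustration of how scan order can break ties differently for the two sides is given below. It is not Fairy-Max's actual code: the scan direction (rank 8 down to rank 1), the strict comparison and all names are assumptions chosen to reproduce the behaviour described, and only Pawns and Knights are treated as having moves in the start position.

```python
# Start position, rank 8 first, so the scan meets black pieces, black Pawns,
# white Pawns and white pieces in that order (ASSUMED scan direction).
START = ["rnbqkbnr", "pppppppp", "........", "........",
         "........", "........", "PPPPPPPP", "RNBQKBNR"]

MOBILE = "pnPN"  # only Pawns and Knights can move in the start position


def movers_in_scan_order(rows, white):
    """Pieces of one side that have a move, in the order the scan meets them."""
    return [ch for row in rows for ch in row
            if ch in MOBILE and ch.isupper() == white]


def pick(candidates):
    """Return the first candidate with the best score (strict '>' keeps ties early)."""
    best_piece, best_score = None, float("-inf")
    for piece, score in candidates:
        if score > best_score:
            best_piece, best_score = piece, score
    return best_piece


# Pretend every available move has exactly the same score.
white_choice = pick([(p, 0) for p in movers_in_scan_order(START, True)])
black_choice = pick([(p, 0) for p in movers_in_scan_order(START, False)])
# white_choice is 'P' (a Pawn move), black_choice is 'n' (a piece move)
```

Randomizing the order of equal-scored moves, or mirroring the scan direction for black, would be two ways to remove such a bias in a test setup.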

The second case was when I was measuring the value of KNAD. I was not sure whether it would be good to give a bonus for centralizing such a valuable piece, so I did the measurement both with a neutral PST and a centralizing PST for the KNAD. In the latter case the KNAD came out about 1 Pawn more valuable! Normally misconceptions in the evaluation (such as the piece value) hardly affect the outcome of such measurements, as long as both players share the misconception. But not in this case. Without an incentive to centralize, the side with the KNAD too often left it unused, in a place where the profitable things it could do stayed beyond the horizon. So strategic errors only one side can make (because of the imbalance) can affect the outcome.

Greg Strong wrote on Fri, Oct 12, 2018 05:13 PM UTC:

There could be a solution, but first remember that the state space of the possible solutions is linked to the choosing of the pieces out of a small possible set, so there is a non-negligible chance that it plainly cannot succeed, as the demands are pretty tight.

I agree that absolutely perfect balance between all combinations of armies could be very difficult, but I also think it's not necessary. Even if they are not balanced enough for computer vs. computer matches to come out exactly even, so long as the goal is to make a game good for humans I think we absolutely can get sufficiently balanced armies.

My take from CWDA is not about balance but about something I'd call "dynamic balance", as each army seems to "mean" something.

It is unfortunate that Betza hasn't been heard from in nearly 15 years now and may not even be alive. But he has written a lot of content on this site about piece values, his struggles to determine them, and his goals for Chess with Different Armies, and his previous failed attempt at it. I believe we know enough from these writings to feel confident that an even balance between armies was THE primary goal and, if he were here, he would be continuing to work toward it.

Yes, each army does have a unique "flavor" that absolutely should be preserved to the maximum extent possible. But making the BD's leap a lame leap is a very, very tiny change that doesn't change the flavor at all, at least in my opinion. I can't really see an argument against this change unless one believes that it is Betza's game and only he can update it and, consequently, if he's dead we are stuck with it forever.

The fact is we have learned a lot since this game was made and Betza was unfortunately wrong about some things. The Archbishop is worth a lot more than he thought as just one example. If he had known what we know now, he would have made different decisions. There's a page here somewhere where he talks about the Short Rook and trying to decide what the range should be and how he used computer vs. computer test matches to help validate the decisions exactly as we are doing.

The Musketeer Chess approach is problematic. For one thing, you are talking about a radical change that makes a completely different game. You no longer have armies with themes that "mean" something, as you put it. And we have determined that the strength of an army depends heavily on the specific combination of pieces, not just the individual pieces. If you want to make such a game, I would encourage it and I would try to help if you wanted, but I don't see this as a valid approach to rescuing CwDA.

Aurelian Florea wrote on Fri, Oct 12, 2018 02:45 PM UTC:

I thought a bit about Greg's proposal of weakening the charging rook (and his earlier proposal of weakening the Bede). I personally see big flaws with such an approach, as the state space of the problem has at least 4 dimensions (16 if you consider playing white or black different things). There could be a solution, but first remember that the state space of the possible solutions is linked to the choosing of the pieces out of a small possible set, so there is a non-negligible chance that it plainly cannot succeed, as the demands are pretty tight. My proposal for getting out of the impasse is to combine CWDA with musketeer chess. But instead of offering many options, we may give a set of gating pieces for each of the 16 encounters (let's include FFvsFF here, as they could receive slightly different pieces in order to compensate for playing white). They can be just one piece of a general value of approximately 2 or 3 or 4 or 5, or maybe pairs of the same or different pieces. Pairs of pieces worth approximately 2.5 each seem quite interesting to me, as 2 of them are worth exactly a rook, and for one of them you may capture a regular minor and give up some positional advantage, or capture 2 pawns and earn some minor positional bonus.

For example, in the FFvsFF encounter, which in regular Betza is banned, I think white should be able to gate two ffbbNsD and black should be able to gate two ffNsDbbLbH. Maybe the second piece is actually worse, but at first glance more jumping retreats should be better, even if they are longer. They also add to versatility, especially in the endgame. Such pieces should be worth around 6.5/8 knights = 0.8125 knights = 0.8125*3.25 pawns = 2.640625 pawns ≈ 2.64 pawns, so pretty good.

This is another reason why Betza's implied (and indeed not stated) principle of armies with different styles should be preserved. The gating piece would probably be counter-style, though, in order to compensate for the imbalance of that particular matchup.

Aurelian Florea wrote on Fri, Oct 12, 2018 08:56 AM UTC:

Greg, to be honest, I'm not sure if we should plunge ourselves into piece change judgements. It is, most likely, more complex than just this experiment. Also the game needs to be fun. My take from CWDA is not about balance but about something I'd call "dynamic balance", as each army seems to "mean" something. I'm preparing a small experiment on this, also!... And maybe a more interesting rook could be along the lines of fsR4bWbB2

Greg Strong wrote on Thu, Oct 11, 2018 11:06 PM UTC:

I have some more results to report.

I've generated 20 balanced opening positions with the FIDEs vs. the Nutters and another 20 with the colors reversed and run the 400-game test. Here are the results of Nutters against the FIDEs:

Nutty Knights: 272
Fabulous FIDEs: 79
draw: 49

Holy crap!!! That is not at all what I expected. I don't really understand why the Nutters are so dominant, given that their total piece values seem to be about the same. Our piece values could certainly be wrong, of course. But I don't think they are that far off - at least in terms of what a piece is worth in general. In which case, it shows that the true value of a piece really, really depends on what else is on the board. I'm guessing they can develop very quickly and very flexibly and get an early advantage.

How to fix is a hard question. I've thought about this some and considered a few ideas. The one that "feels" best to me is limiting the range of the Charging Rooks to 4. Essentially, this means that instead of the Charging Rooks being regular Rooks that move backwards as a King, they become Short Rooks that move backwards as a King. I will test this, but I'm certainly open to other thoughts.

Speaking of fixes, I've re-run the FIDEs vs. Clobberers test with the suggested fix - change BD to BnD. Here are the results:

Colorbound Clobberers: 180
Fabulous FIDEs: 156
draw: 64

Much better, and probably sufficient for now. Given that we don't know what a lot of evaluation terms should be, the accuracy of these results is limited and this result is probably within the "margin of error" (acknowledging that I am not using that term in the same way that statisticians do.) With this change, I would consider this matchup balanced for all practical purposes.
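For reference, the two 400-game results above can be turned into a score fraction, a rough standard error and an Elo-style difference. This is only a sketch: the function name is made up, the logistic Elo formula is the usual rating convention rather than anything specified in this thread, and the standard error treats all games as independent, which is optimistic when the same 20 openings are replayed.

```python
import math


def match_stats(wins, losses, draws):
    """Score fraction, standard error of that fraction, and Elo-style difference."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - score) ** 2 +
           draws * (0.5 - score) ** 2 +
           losses * score ** 2) / n
    std_err = math.sqrt(var / n)                    # assumes independent games
    elo = -400 * math.log10(1 / score - 1)          # standard logistic model
    return score, std_err, elo


# Nutters vs FIDEs: 272 wins, 79 losses, 49 draws (from the Nutters' side).
nn_score, nn_err, nn_elo = match_stats(272, 79, 49)      # score = 0.74125
# Clobberers (with BnD) vs FIDEs: 180 wins, 156 losses, 64 draws.
cc_score, cc_err, cc_elo = match_stats(180, 156, 64)     # score = 0.53

# Distance from a 50% score in units of standard error.
nn_sigmas = (nn_score - 0.5) / nn_err
cc_sigmas = (cc_score - 0.5) / cc_err
```

Under these assumptions the Nutters result lies many standard errors away from 50%, while the 53% for the modified Clobberers is less than two standard errors from an even score.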

H.G., I saw your question about what the results would be in pawn odds games. I don't know but I'll work on running that test also.

H. G. Muller wrote on Fri, Oct 5, 2018 05:47 PM UTC:

Basically this is just a scaled version of the 3.25/3.25/5.00/9.50 values. Except that the Pawn was weakened by 5%.

But a Pawn is the most variable piece of all; it is really very ill-defined what an advantage of 1 Pawn means. Rook Pawns, Pawns on central files, doubled Pawns, passers, 7th-rank passers... These all have completely different values, with as much as a factor 5 between them. For this reason I always use the Queen as calibration standard.

Kevin Pacey wrote on Fri, Oct 5, 2018 05:37 PM UTC:

Below is a sub-wiki that quotes many valuations for the chess piece types; I'm wondering why Kaufman in a book of his published in 2011 apparently changed his valuations to make them nearly identical to what the Dutch world chess champion Euwe gave them (notably single N=B=3.5 and Q=10, though unlike Euwe he has R=5.25 instead of 5.5), which is about what I'd use (I'd put a N at e.g. 3.49, as if to be 'precise', and use Euwe's R=5.5):

https://en.wikipedia.org/wiki/Chess_piece_relative_value#Alternative_valuations

H. G. Muller wrote on Thu, Sep 27, 2018 09:53 PM UTC:

I think this is where Betza's 'leveling effect' comes in. You can use a piece in two ways: (1) avoid trades for a nearly equivalent opponent piece; (2) don't care about such trading. In the trade-avoiding strategy (1), the opponent's counterpart will interdict access to the squares it attacks, as going there would give him the opportunity to trade. This limits the use you can make of the piece, thus depressing its effective value. In general, stronger pieces lose value due to the presence of opponent weaker pieces that they have to avoid 1-for-1 trading with.

If the value was close to start with, the value depreciation caused by adopting a trade-avoiding strategy can be larger than the intrinsic difference. In that case you would be better off using strategy (2). But there the fate of the piece is to be traded, which makes them effectively equal in value, as any difference will evaporate with the trade. So pieces nearly equal in value will see their value pulled towards each other when they oppose each other, until it gets exactly the same. I think this is pretty much the case for a Knight and a lone Bishop on 8x8. If the intrinsic value of the Bishop was somehow increased compared to the Knight, initially you would not benefit from it. Because you would have to 'sacrifice' that extra intrinsic value by limiting the use of the Bishop by stricter trade-avoiding.

25 comments displayed
