Comments/Ratings for a Single Item

Chess with Different Armies. Betza's classic variant where white and black play with different sets of pieces. (Recognized!)

Aurelian Florea wrote on Sat, Oct 13, 2018 07:19 AM UTC:

@HG&@Greg

We have discussed the matter of possible rock-paper-scissors effects with negative conclusions, so maybe my idea involving Musketeer Chess gating was an overreaction, but it may be kept in the back of the mind if such problems arise. Good luck everybody :)!

H. G. Muller wrote on Sat, Oct 13, 2018 06:49 AM UTC:

It is clear that the change BD -> BnD should weaken the Clobberers against any opponent, which would be undesirable against opponents that already have the upper hand. But usually such opponents would also be stronger than FIDE, and would also have to be weakened. My earlier testing with Fairy-Max suggested that the performance of armies was reasonably 'transitive', in the sense that when A > B, and B > C, then A > C by an amount approximately equal to the sum of the first two. The only anomaly was that the Nutters under-performed against the Clobberers. I conjectured that this could again be a strategic issue, namely that the more forward-directed strategy and slow backwardness of the Nutters backfires when the army has pairs of pieces (or single pieces) that can easily checkmate a King.

It is a pity that the test takes so long (a common problem in computer chess...). I suppose that the new ChessV is stronger than Fairy-Max? Have you ever measured by how much? What TC are you using for these tests? Have you tried how far you can push that, without significantly affecting the result? Large depth is only needed to bring the eventual tactical punishment of strategically bad moves within the horizon, so a more advanced evaluation (e.g. for Pawn structure and King Safety) should allow faster games without play becoming so unrealistic that it is no longer a representative sampling of the pieces' tactical abilities. People are nowadays tuning their engine's evaluation at ~0.25 sec/move (e.g. 10 sec + 0.1 sec/move). It would surely save a lot of time if that would work for piece-strength measurements too.

The danger is that playing at a lower level reduces all 'excess scores', even though the ratio of these scores stays constant (so that you get the same value in terms of centi-Pawn when you divide them by the Pawn-odds score). For a twice-lower Pawn-odds score you would need 4 times as many games to get the same resolution in centi-Pawns. So I suppose there will be an optimum there. Too high a quality of play is also not good, btw; you want the typical evaluation lost per move compared to perfect play to be so large that over the duration of a game it typically accumulates to a range wider than the draw interval (say [-150cP, +150cP]), so that small departures from equality in the initial imbalance already significantly sample the won/lost range.
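The "4 times as many games" figure can be checked with a small calculation. This is only a sketch under assumptions that are not measured anywhere in this thread: the per-game result is taken to have a fixed standard deviation (`sigma_per_game`, a hypothetical value), the excess score is taken to scale linearly with the material advantage near equality, and the function name and sample scores are made up for illustration.

```python
import math

def games_needed(pawn_odds_excess, resolution_cp, sigma_per_game=0.4):
    """Estimate how many games are needed to resolve `resolution_cp` centi-Pawns.

    pawn_odds_excess: score above 50% obtained with Pawn odds (0.15 means 65%).
    sigma_per_game:   assumed standard deviation of a single game result
                      (hypothetical; in practice it depends on the draw rate).
    """
    # Excess score per centi-Pawn, assuming linearity near equality.
    excess_per_cp = pawn_odds_excess / 100.0
    # The standard error sigma / sqrt(N) must shrink to the excess score
    # that corresponds to the wanted resolution.
    target = excess_per_cp * resolution_cp
    return math.ceil((sigma_per_game / target) ** 2)

n_slow = games_needed(0.15, 25)    # higher level of play: Pawn odds scores 65%
n_fast = games_needed(0.075, 25)   # lower level of play: Pawn odds scores 57.5%
print(n_slow, n_fast, n_fast / n_slow)
```

Because the required number of games goes as the inverse square of the Pawn-odds excess, halving that excess quadruples the games, whatever value is assumed for `sigma_per_game`.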

Aurelian Florea wrote on Sat, Oct 13, 2018 03:19 AM UTC:

@Greg,

First, I'm on the tip of my toes about your next trial with conditions adapted to HG's observation.

"Again, I don't think changing BD to BnD changes the flavor or removes any spice. Do you? "

I have a bit of discomfort as the game did not have any lame leapers before, but that borders on nothing. I'm more concerned about how the change affects the balance against the two other armies, as it seems to me that this will lead to a wave of interconnected changes that are probably not easy to pull through. Some sort of logical system of equations needs maintaining and I honestly doubt such an endeavour is even doable, let alone feasible. This is because you don't have many options for tuning while keeping the initial flavour.

But I'm very much for any CWDA game. It is just that a sequel to Betza's game should borrow his elements, otherwise it is another chess with different armies game. A better one, quite likely.

"On this we must disagree[about the game not needing rescuing]. Sure, it is playable. It is one of the most popular games on Game Courier so certainly people can play it and have fun. But if the armies are way out of balance, as it has become clear that they are, then it fails at its stated goal. If the game were played and studied even more as time goes on, people would learn exactly how to exploit the unbalance and the game would no longer be playable. "

The game is good enough at my level. It is probably good enough at any current human level (although this could be a stretch) but there is always the quest for even better (I am an engineer after all). And the endeavor of making another game, sequel or not, is great. I'd venture that we may need to make a distinction about it, but if we don't make it, future people surely will if need be, so we need not bother much about it here either.

What I was actually insisting on was that maybe my musketeer technique is an easier goal to achieve without sacrificing any design principles (besides making the board more crowded, which is something I actually like, even if 36 pieces on an 8x8 tends to be too much even for me). But we can easily go on our merry way if this back and forth can't advance in a useful way, and maybe History will decide. Or not, as currently chess variants don't seem to catch on! The space of possible chess variants is so vast that there is more than enough room for all of us. I remember you actually agreeing to help, so that is cool. So it is a math debate actually: the way I like it :)!

Greg Strong wrote on Fri, Oct 12, 2018 11:45 PM UTC:

It occurred to me that the Nutters are unique amongst Betza's armies in their forward-backward asymmetry. I wonder if this could have an unexpected effect on the outcome of self-play games of engines with an evaluation that is not highly tuned. In a random mover Nutter pieces would tend to diffuse forward. Perhaps this makes the nutters a bit more aggressive than the others, which would benefit them if the others are not aggressive enough. Perhaps the others would benefit from a piece-square table with a larger forward-gradient, while the Nutters automatically play like they have one.

Good observations as always. ChessV has a more sophisticated evaluation than FairyMax but it is certainly not "highly tuned." I can definitely re-run the FF vs. NN test with the forwardness component of the FIDE's PST increased. I'll kick that off and see how much it affects the results. The test will take a few days to complete...

Greg Strong wrote on Fri, Oct 12, 2018 07:28 PM UTC:

And balance is the primary goal but to me the flavor is what brings the spice :)!

Again, I don't think changing BD to BnD changes the flavor or removes any spice. Do you? You seem clearly opposed to this change, but I do not understand why.

I don't say CWDA needs rescuing; it is a good game.

On this we must disagree. Sure, it is playable. It is one of the most popular games on Game Courier so certainly people can play it and have fun. But if the armies are way out of balance, as it has become clear that they are, then it fails at its stated goal. If the game were played and studied even more as time goes on, people would learn exactly how to exploit the unbalance and the game would no longer be playable.

Aurelian Florea wrote on Fri, Oct 12, 2018 07:14 PM UTC:

@HG,

In CWDA army tuning is most definitely a thing for any AI, especially in the context of flavor I was discussing with Greg earlier. In machine learning that should come rather easily, but unfortunately I have not got that far. In the end the army is just another variable (albeit one with multidimensional properties). What I mean is that it should not be more difficult than any other design of such algorithms.

Aurelian Florea wrote on Fri, Oct 12, 2018 07:05 PM UTC:

@Greg,

We can very much leave Betza's game as is and invent an improved version ourselves. There is nothing wrong with that. And balance is the primary goal but to me the flavor is what brings the spice :)!

" If you want to make such a game, I would encourage it and I would try to help if you wanted, but I don't see this as a valid approach to rescuing CwDA. "

I don't say CWDA needs rescuing; it is a good game. But I also see it as a good lesson we could use. The Musketeer Chess approach is meant to offer a way to correct the imbalances in a way specific to each match, because yes, it is about armies and not the individual pieces, but there is that old libertarian saying that society is made out of individuals, which I think goes well here. A pair of minors, or a rooklike and a bishoplike piece, would at least open more doors, which is hardly done otherwise, as far as I can see!

H. G. Muller wrote on Fri, Oct 12, 2018 06:18 PM UTC:

It occurred to me that the Nutters are unique amongst Betza's armies in their forward-backward asymmetry. I wonder if this could have an unexpected effect on the outcome of self-play games of engines with an evaluation that is not highly tuned. In a random mover Nutter pieces would tend to diffuse forward. Perhaps this makes the nutters a bit more aggressive than the others, which would benefit them if the others are not aggressive enough. Perhaps the others would benefit from a piece-square table with a larger forward-gradient, while the Nutters automatically play like they have one.
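The forward diffusion of a random mover can be illustrated with a toy calculation. This is a minimal sketch, assuming the Charging Knight moves as fhNrlbK (forward Knight leaps plus sideways and backward King steps), an empty unbounded board, and a uniformly random choice among the moves; the function name is hypothetical.

```python
from fractions import Fraction

# Move offsets as (file_change, rank_change), seen from the mover's side.
KNIGHT = [(1, 2), (-1, 2), (2, 1), (-2, 1), (1, -2), (-1, -2), (2, -1), (-2, -1)]

# Assumed Charging Knight (fhNrlbK): the four forward Knight leaps plus
# the sideways and backward King steps.
CHARGING_KNIGHT = [m for m in KNIGHT if m[1] > 0] + [
    (1, 0), (-1, 0), (0, -1), (1, -1), (-1, -1)
]

def mean_rank_drift(moves):
    """Average rank change when one of `moves` is picked uniformly at random."""
    return Fraction(sum(dy for _, dy in moves), len(moves))

print("Knight:", mean_rank_drift(KNIGHT))                    # symmetric: no drift
print("Charging Knight:", mean_rank_drift(CHARGING_KNIGHT))  # positive: forward
```

Board edges and blocking pieces are ignored here, so the numbers only show the direction of the effect, not its size in real games.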

On two occasions I noticed issues that could be related. In Fairy-Max white seems to play better than black, even when I average out the first-move advantage by having black start in half the games. This must be due to the direction the board is scanned during move generation; for white this typically first encounters the Pawns, for black the pieces. So if a Pawn move and a piece move have equal score, white would likely play the Pawn move, black the piece move. As Pawn moves are always forward, this makes white play more aggressively.
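How scan order can decide between equally scored moves is easy to show with a toy move picker. This is not Fairy-Max's actual code, only a sketch assuming a search that keeps the first move found with the best score; the move names, scores and scan orders are hypothetical.

```python
def pick_move(moves_in_scan_order):
    """Return the first move with the highest score (ties keep the earlier one)."""
    best = None
    for name, score in moves_in_scan_order:
        # The strict '>' means a later move never replaces an equal earlier one.
        if best is None or score > best[1]:
            best = (name, score)
    return best[0]

# Hypothetical position: a Pawn push and a piece move evaluate equally.
candidates = [("pawn push", 10), ("piece move", 10), ("other", 3)]

# Assumed scan orders: white meets its Pawns first, black its pieces first.
white_choice = pick_move(candidates)
black_choice = pick_move(list(reversed(candidates)))
print(white_choice, "|", black_choice)
```

With identical evaluations the two sides end up with different preferences purely because of the order in which their moves are generated.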

The second case was when I was measuring the value of KNAD. I was not sure whether it would be good to give a bonus for centralizing such a valuable piece, so I did the measurement both with a neutral PST and a centralizing PST for the KNAD. In the latter case the KNAD came out about 1 Pawn more valuable! Normally misconceptions in the evaluation (such as the piece value) hardly affect the outcome of such measurements, as long as both players share the misconception. But not in this case. Without an incentive to centralize, the side with the KNAD too often left it unused, in a place where the profitable things it could do stayed beyond the horizon. So strategic errors only one side can make (because of the imbalance) can affect the outcome.

Greg Strong wrote on Fri, Oct 12, 2018 05:13 PM UTC:

There could be a solution, but first remember that the state space of the possible solutions is linked to choosing the pieces out of a small possible set, so there is a non-negligible chance that it plainly cannot succeed, as the demands are pretty tight.

I agree that absolutely perfect balance between all combinations of armies could be very difficult, but I also think it's not necessary. Even if they are not balanced enough for computer vs. computer matches to come out exactly even, so long as the goal is to make a game good for humans I think we absolutely can get sufficiently balanced armies.

My take from CWDA is not about balance but about something I'd call "dynamic balance" as each army seems to "mean" something.

It is unfortunate that Betza hasn't been heard from in nearly 15 years now and may not even be alive. But he has written a lot of content on this site about piece values, his struggles to determine them, and his goals for Chess with Different Armies, and his previous failed attempt at it. I believe we know enough from these writings to feel confident that an even balance between armies was THE primary goal and, if he were here, he would be continuing to work toward it.

Yes, each army does have a unique "flavor" that absolutely should be preserved to the maximum extent possible. But making the BD's leap a lame leap is a very, very tiny change that doesn't change the flavor at all, at least in my opinion. I can't really see an argument against this change unless one believes that it is Betza's game and only he can update it and, consequently, if he's dead we are stuck with it forever.

The fact is we have learned a lot since this game was made and Betza was unfortunately wrong about some things. The Archbishop is worth a lot more than he thought as just one example. If he had known what we know now, he would have made different decisions. There's a page here somewhere where he talks about the Short Rook and trying to decide what the range should be and how he used computer vs. computer test matches to help validate the decisions exactly as we are doing.

The Musketeer Chess approach is problematic. For one thing, you are talking about a radical change that makes a completely different game. You no longer have armies with themes that "mean" something as you put it. And, we have determined that the strength of an army depends heavily on the specific combination of pieces, not just the individual pieces. If you want to make such a game, I would encourage it and I would try to help if you wanted, but I don't see this as a valid approach to rescuing CwDA.

Aurelian Florea wrote on Fri, Oct 12, 2018 02:45 PM UTC:

I thought a bit about Greg's proposal of weakening the charging rook (and his earlier proposal of weakening the Bede). I personally see big flaws with such an approach, as the state space of the problem has at least 4 dimensions (16 if you consider playing white or black different things). There could be a solution, but first remember that the state space of the possible solutions is linked to choosing the pieces out of a small possible set, so there is a non-negligible chance that it plainly cannot succeed, as the demands are pretty tight. My proposal for getting out of the impasse is to combine CWDA with Musketeer Chess. But instead of offering many options we may give a set of gating pieces for each of the 16 encounters (let's include FFvsFF here, as they could receive slightly different pieces in order to compensate for playing white). They can be just one piece of a general value of approximately 2 or 3 or 4 or 5, or maybe pairs of the same or different pieces. Pairs of pieces worth approximately 2.5 each seem quite interesting to me, as 2 of them are worth exactly a rook, and for one of them you may capture a regular minor and give up some positional advantage, or capture 2 pawns and earn some minor positional bonus.

For example, in the FFvsFF encounter, which is banned in regular Betza, I think white should be able to gate two ffbbNsD and black should be able to gate two ffNsDbbLbH. Maybe the second piece is actually worse, but at first glance more jumping retreats should be better, even if they are longer. They also add to versatility, especially in the endgame. Such pieces should be worth around 6.5/8 knights = 0.8125 knights = 0.8125*3.25 pawns = 2.640625 pawns ≈ 2.64 pawns, so pretty good.

Another reason is that Betza's implied (and indeed not stated) principle of armies with different styles should be preserved. The gating piece would probably be counter-style, though, in order to compensate for the imbalance of that particular matchup.

Aurelian Florea wrote on Fri, Oct 12, 2018 08:56 AM UTC:

Greg, to be honest, I'm not sure if we should plunge ourselves into piece change judgements. It is, most likely, more complex than just this experiment. Also the game needs to be fun. My take from CWDA is not about balance but about something I'd call "dynamic balance" as each army seems to "mean" something. I'm preparing a small experiment on this, also!... And maybe a more interesting rook could be along the lines of fsR4bWbB2

Greg Strong wrote on Thu, Oct 11, 2018 11:06 PM UTC:

I have some more results to report.

I've generated 20 balanced opening positions with the FIDEs vs. the Nutters and another 20 with the colors reversed and run the 400-game test. Here are the results of Nutters against the FIDEs:

Nutty Knights: 272
Fabulous FIDEs: 79
draw: 49

Holy crap!!! That is not at all what I expected. I don't really understand why the Nutters are so dominant, given that their total piece values seem to be about the same. Our piece values could certainly be wrong, of course. But I don't think they are that far off - at least in terms of what a piece is worth in general. In which case, it shows that the true value of a piece really, really depends on what else is on the board. I'm guessing they can develop very quickly and very flexibly and get an early advantage.
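To put the 272-79-49 result on a single scale, it can be converted to a score percentage and an approximate rating difference. The logistic Elo formula used below is a common convention and an assumption on my part; it is not something used elsewhere in this thread, and the function names are made up.

```python
import math

def match_score(wins, losses, draws):
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games

def elo_difference(score):
    """Rating difference implied by a score under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Nutty Knights against Fabulous FIDEs, from the 400-game test above.
nutters = match_score(272, 79, 49)
print(f"Nutters score {nutters:.1%}, about {elo_difference(nutters):+.0f} Elo")
```

A score of this size is far outside anything that small corrections to individual piece values would normally explain.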

How to fix is a hard question. I've thought about this some and considered a few ideas. The one that "feels" best to me is limiting the range of the Charging Rooks to 4. Essentially, this means that instead of the Charging Rooks being regular Rooks that move backwards as a King, they become Short Rooks that move backwards as a King. I will test this, but I'm certainly open to other thoughts.

Speaking of fixes, I've re-run the FIDEs vs. Clobberers test with the suggested fix - change BD to BnD. Here are the results:

Colorbound Clobberers: 180
Fabulous FIDEs: 156
draw: 64

Much better, and probably sufficient for now. Given that we don't know what a lot of evaluation terms should be, the accuracy of these results is limited and this result is probably within the "margin of error" (acknowledging that I am not using that term in the same way that statisticians do.) With this change, I would consider this matchup balanced for all practical purposes.
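For the purely statistical part of the "margin of error", the standard error of the match score can be estimated from the win/loss/draw counts. This sketch treats all 400 games as independent, which is an assumption (games from the same opening position may be correlated), and it says nothing about systematic errors from unknown evaluation terms. The function name is hypothetical.

```python
import math

def score_and_error(wins, losses, draws):
    """Mean score per game and its standard error, from win/loss/draw counts."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of a result that is 1 (win), 0.5 (draw) or 0 (loss).
    var = (wins * (1 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0 - mean) ** 2) / n
    return mean, math.sqrt(var / n)

# Colorbound Clobberers against Fabulous FIDEs after the BD -> BnD change.
mean, err = score_and_error(180, 156, 64)
print(f"score {mean:.1%} +/- {err:.1%} (one standard error)")
print("excess over 50% in standard errors:", round((mean - 0.5) / err, 2))
```

Under these assumptions the Clobberers' excess over 50% is a little more than one standard error, so this match alone cannot distinguish the new matchup from an even one.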

H.G., I saw your question about what the results would be in pawn odds games. I don't know but I'll work on running that test also.

H. G. Muller wrote on Fri, Oct 5, 2018 05:47 PM UTC:

Basically this is just a scaled version of the 3.25/3.25/5.00/9.50 values. Except that the Pawn was weakened by 5%.

But a Pawn is the most variable piece of all; it is really very ill-defined what an advantage of 1 Pawn means. Rook Pawns, Pawns on central files, doubled Pawns, passers, 7th-rank passers... These all have completely different values, with as much as a factor of 5 between them. For this reason I always use the Queen as calibration standard.

Kevin Pacey wrote on Fri, Oct 5, 2018 05:37 PM UTC:

Below is a sub-wiki that quotes many valuations for the chess piece types; I'm wondering why Kaufman in a book of his published in 2011 apparently changed his valuations to make them nearly identical to what the Dutch world chess champion Euwe gave them (notably single N=B=3.5 and Q=10, though unlike Euwe he has R=5.25 instead of 5.5), which is about what I'd use (I'd put a N at e.g. 3.49, as if to be 'precise', and use Euwe's R=5.5):

https://en.wikipedia.org/wiki/Chess_piece_relative_value#Alternative_valuations

H. G. Muller wrote on Thu, Sep 27, 2018 09:53 PM UTC:

I think this is where Betza's 'leveling effect' comes in. You can use a piece in two ways: (1) avoid trades for a nearly equivalent opponent piece; (2) don't care about such trading. In the trade-avoiding strategy (1), the opponent's counterpart will interdict access to the squares it attacks, as going there would give him the opportunity to trade. This limits the use you can make of the piece, thus depressing its effective value. In general, stronger pieces lose value due to the presence of opponent weaker pieces that they have to avoid 1-for-1 trading with.

If the value was close to start with, the value depreciation caused by adopting a trade-avoiding strategy can be larger than the intrinsic difference. In that case you would be better off using strategy (2). But there the fate of the piece is to be traded, which makes them effectively equal in value, as any difference will evaporate with the trade. So pieces nearly equal in value will see their value pulled towards each other when they oppose each other, until it gets exactly the same. I think this is pretty much the case for a Knight and a lone Bishop on 8x8. If the intrinsic value of the Bishop was somehow increased compared to the Knight, initially you would not benefit from it. Because you would have to 'sacrifice' that extra intrinsic value by limiting the use of the Bishop by stricter trade-avoiding.

Kevin Pacey wrote on Thu, Sep 27, 2018 07:30 PM UTC:

Regarding the 3 tempi are worth a pawn axiom in chess, I think this was long ago originally stated with the added condition that it was a rule of thumb that applied in particular for open positions. In closed positions, there is often no rush and a player is often able to afford the time to maneuver pieces to their best positions one at a time (I have neglected to mention the added condition of an open position in Notes sections of pages on chess variants I've invented, though, regarding my suggested hints on how to play them).

To be clearer, I'd only put the average difference between a N and B within the microscopic margin you suggested, in favour of the (single) B. However I have a soft spot for knights, though many chess players would more often than not just as soon not trade a B for a N as a Cadillac for a Chevrolet. I still remember the late world chess champion Tal looking away disappointedly when I traded away a B for a N against an older Grandmaster (GM) in the last round of an international event in Canada in 1988 (the fellow soon offered me a draw, as I still had the tiny edge of a slightly better pawn structure). An untimely and inappropriate recollection of a remark Dutch GM Timman made about Ns in a book of his was my undoing. I wouldn't be honest though if I didn't mention that I had had the bishop pair already in the game in question.

H. G. Muller wrote on Thu, Sep 27, 2018 06:18 PM UTC:

Well, the equality of a lone Bishop & Knight (both 3.25 Pawns) was what Larry Kaufman found from statistical analysis of a huge database of grandmaster games. If you are talking about deviations of the order of 0.05 Pawn, I doubt that this will be measurable, or even meaningful. Because piece values are by definition averages, and it serves no purpose to know the average much more precisely than the typical deviation. Total material balance also depends on how well pieces cooperate, or combat each other, as the case of 3 Queens vs 7 Knights dramatically shows. Kaufman himself already investigated how the B-N difference depends on the number of Pawns, and did indeed find a dependence, where it is better to have Knights if there are many Pawns, but better to have Bishops with very few Pawns. AFAIK he did not try to correlate it with the shade of the Pawns (probably because such a thing is not always easily defined, if the Pawn chains are not fully interlocked). The 'good Bishop' vs 'bad Bishop' probably has a much larger effect than average Bishop vs total number of Pawns.

Common wisdom has it that "3 tempi is a Pawn", which would equate a tempo to 33cP. That makes that for nearly equivalent pieces the actual difference will be mostly determined by where they are located (centralized vs on the edge), as moving them to improve their location is so costly it already defeats the purpose. The difference between having a Knight on e4 and having one on a1 would certainly be more than 0.05 Pawn.

I remember spending a lot of time on determining the value of limited range Rooks (R2 - R5), so the 400 cP I used with the Rookies is probably quite reliable. And you are right: your value for the Cardinal is definitely too low (all my tests point to A+P being slightly stronger than Q), and the unexpectedly large cooperativity bonus of the B and N move is most likely indeed due to this concentration of attacks. If I watch games the Cardinal turns out to be extremely adept at annihilating enemy Pawn chains, and I guess this is because it can attack a Pawn, the square it can be pushed to, and a Pawn it protects, all at the same time. I suspect that orthogonally adjacent move targets are an asset by themselves, in addition to the individual moves. The Cardinal's 'footprint' has 16 of those, Queen and Marshal only 8. This would also explain why a Rook is still worth more than a Bishop (500 vs 400) on a cylinder board, where the average number of moves is about the same. (On 8x8 one square less for the Bishop, but one square can be reached through two paths, which should partly compensate for that.)
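The 16 versus 8 counts can be reproduced with a short script. The counting rule below is my guess at what was counted, not something stated in the post: pairs of orthogonally adjacent target squares on an empty, unbounded board, skipping pairs that lie on the same sliding ray (otherwise every Rook ray would contribute pairs). All names are hypothetical.

```python
from itertools import combinations

ORTH = [(1, 0), (-1, 0), (0, 1), (0, -1)]
DIAG = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
KNIGHT = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def footprint(slide_dirs, leaps, reach=7):
    """Map each target square to an id of the ray (or leap) that reaches it."""
    squares = {}
    for d in slide_dirs:
        for k in range(1, reach + 1):
            squares[(d[0] * k, d[1] * k)] = ("ray", d)
    for leap in leaps:
        squares[leap] = ("leap", leap)
    return squares

def adjacent_target_pairs(squares):
    """Count orthogonally adjacent target pairs that are not on the same ray."""
    count = 0
    for a, b in combinations(squares, 2):
        touching = abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
        if touching and squares[a] != squares[b]:
            count += 1
    return count

cardinal = adjacent_target_pairs(footprint(DIAG, KNIGHT))   # Bishop + Knight
queen = adjacent_target_pairs(footprint(ORTH + DIAG, []))   # Rook + Bishop
marshal = adjacent_target_pairs(footprint(ORTH, KNIGHT))    # Rook + Knight
print(cardinal, queen, marshal)
```

With this rule each Knight target of the Cardinal touches two diagonal targets, while the Queen and Marshal only get such pairs next to the squares closest to the piece.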

It would be very interesting to do a more thorough investigation of pair bonuses. (Still on my ever-growing to-do list...) The only thing I tested so far is that two B-pairs seem exactly twice as strong as one pair. So it doesn't count as 2x2 pairs; the even Bishops are always worth 50cP more than the odd Bishops.

Kevin Pacey wrote on Thu, Sep 27, 2018 04:32 PM UTC:

Hi H.G.

I did note in my (unchanged) Edit sentence just above the values that I gave (in my second last post in this thread) that my memory had been rather off, though I could have been more explicit about that pretty much negating my earlier remarks in the text about the ranking of the armies as I recalled it. A natural aversion to my eating crow, I suppose. :)

The effectiveness with which an army works together is indeed not necessarily reflected by the material sum of its parts.

I was happy many of our values for pieces of the 4 armies seemed relatively close to each other, with some notable exceptions (perhaps especially the Cardinal, Colonel and Short Rook). I can see how I might have underestimated a Cardinal (on 8x8 at least) since my primitive formulas for valuations don't account for a Cardinal's great concentration of power within a 2 square radius around it, covering the same number of squares as a Chancellor or Queen would.

I still rate a knight as microscopically worse than a bishop on average, though I didn't bother to say so explicitly in my recent post on CWDA values. At least two chess grandmasters that have been in my area (besides some advanced chess books I read long ago) note that once a B is gone, it's harder to cover squares of its colour. I'd say that's since a knight takes at least two moves to cover a square of the same colour it wasn't already, whereas a B might often take only one move to do so. There is also that a B can sometimes trap a N against an edge of the board in an endgame. Of course, there are many other things to consider, but these things are what chessplayers have recently pointed out to me. Then there's my still not being 100% trusting of computer statistical studies/methodologies, but at times that comes down to vague doubts and my own intuition/studies as a chess player.

H. G. Muller wrote on Thu, Sep 27, 2018 11:09 AM UTC:

The values you added seem at odds with the text above it.

Note that the individual piece values Fairy-Max uses do not really add up to give the observed total strength of the army. In particular it seems to overestimate the Clobberers, and under-estimate Rookies and Nutters (which get about the same total as FIDE, which I will use as a reference). The values reported for color-bound pieces include half the pair bonus, as Fairy-Max does not explicitly keep track of such bonuses, and I always do the value determination of such pieces in pairs. This might be unrealistic for the Clobberers, as it will be difficult for them to conserve both pairs. And the color-bound pieces are significantly stronger than Bishop, so their pair bonus is probably larger too.

Another possible explanation could be that the Clobberers' pieces cooperate poorly, in the sense that they do not complement each other's weaknesses, but tend to have the same. E.g. the Clobberers have only one major piece. And 4 of the 6 minor pieces are quite valuable (about a Rook). So there is a fair chance that despite a large advantage in terms of piece value, they can often not win for lack of mating potential. That so many pieces are color bound makes it worse. You can end up with Fad + Bede on the same shade, an advantage numerically as large as 2 Rooks, but still a dead draw. And Fad or Bede + Pawn vs minor can be drawn by sacrificing the minor for the Pawn, or perhaps simply by having the King stand in the path of the Pawn on the other shade (so you could only win if you can somehow catch the minor to force that King to leave by zugzwang). If it would be a Bishop of opposite shade, it would obviously be impossible to harass it. And Fibnif or Woody Rook can always move to the other shade and safely stay there.

The Rookies might have an advantage that all their pieces have mating potential. On an individual piece mating potential doesn't seem very valuable, but when all your pieces have it, even a disadvantage as small as a Pawn might always be lost, for lack of drawing tricks. This could be worth something.

Kevin Pacey wrote on Thu, Sep 27, 2018 04:05 AM UTC:

I've added some content to my previous post in this thread, with an edit (notably my own estimates of the relative material values for each army).

H. G. Muller wrote on Mon, Sep 24, 2018 12:57 PM UTC:

Indeed, I addressed this inconsistency in a follow-up comment, at the time. Paper-Scissors-Rock situations are very uncommon with piece values; usually the empirically determined value of a piece is highly independent of what you play it against. (Except for extreme situations such as 3 Queens vs 7 Knights; it has to be a mix of pieces of different value.) I guess that this is why 'piece value' is a useful concept in the first place.

It could be that the Clobberers are composed such that they can better exploit the most important weakness of the Nutters, namely that they cannot quickly pull back. The Clobberers have only one major piece, but they have several combinations of two minor pieces that together can force checkmate (through repetitive checking) on an unprotected King. As Kings tend to stay on the back rank until the late end-game, it is rather tempting for a naive Nutters player to abandon its King while aggressively attacking (possibly gaining significant material), to discover that a counter-strike expedition of two pieces will inescapably kill its King. I don't think any of the other armies has the ability to inflict mate with such a small force. (In FIDE there is the pair of Rooks, but that already fails when there is a Pawn to shelter behind, while the FAD can jump.)

So it seems it is more important for the Nutters to have some strategic knowledge (which Fairy-Max utterly lacks), namely that it should always keep a 'sweeper' piece near the back rank to defend its King against sudden breakthroughs. That the opponent also doesn't know that this is a weakness, and won't intentionally lure the Nutter pieces forward (e.g. by forcing them to make a forward distant recapture) only partly compensates for this ignorance, as it will happen often enough that the Nutters just accidentally (unforced) move their pieces ahead. This is actually statistically likely, as the Nutter pieces in general have more forward than backward moves. So they tend to 'drift' forward when they would wander around aimlessly. In an engine with the required strategic knowledge the Nutters should do even better, though, and they were already one of the strongest. So if the method has a systematic error here, it is in the wrong direction.

The reason I was not so worried about 'disadvantaging' the Nutters by denying them the opportunity to promote to Queen / Marshall / Archbishop is that they already seem to have too strong an army despite this 'handicap'. Also, the disadvantage for the Clobberers that they cannot do better than Archbishop is not nearly as large as what Betza thought, as the Archbishop is unexpectedly strong. And also with this handicap, Clobberers seem stronger than FIDE. So I thought it entirely acceptable to limit promotion to each army's own super-piece.

If it would help, it would not be a bad idea to limit the Rookies' promotion choice to { Fibnif, Short Rook, Half Duck }, or perhaps even just to Fibnif. But such promotion limitations seem actually ineffective in altering the strength of an army.

Aurelian Florea wrote on Mon, Sep 24, 2018 11:52 AM UTC:

I wanted to point out something in HG's old results.

I see the CC and the NN are balanced against each other, but the CC does considerably worse against the RR (10%). So it seems some small but not insignificant rock-paper-scissors effect is taking place. This could be due to a possible need (which I mentioned a long time ago in a different context) for a concept of multidimensional piece values. But it is probably more than that, if any such thing is possible. The NN are a "pressure" army, as they have more forward moves, and the RR are usually slower, as they can't turn a corner that easily. This seems to reduce the weakness of the NN. On the other hand, CC has the strategic weakness that it can be impaired twice over by lacking the counterpart of a color-bound piece. RR can profit more easily from that because, owing to its slowness, it is a more strategic army itself. The NN don't have time for such debates. They need to "act", so they can't profit from it.

Such lines of reasoning are most likely useful, but I can't pinpoint why I'm a bit uncomfortable with the idea of studying it exactly here. Maybe it is the game dimensions.

H. G. Muller wrote on Mon, Sep 24, 2018 10:09 AM UTC:

I looked up my old comment in this topic ( https://www.chessvariants.com/index/listcomments.php?id=31222 ). There I report that the Rookies were actually strongest of all. So RR >~ NN > CC > FF. This was based on the scores in 400-game matches between each pair of armies with Fairy-Max. In a comment just before it Fergus had arrived at the same conclusion based on ChessV (but with far fewer games). Fairy-Max randomizes the first 4 opening moves of each player, which should be enough to avoid significant duplication of games. (I did not actually check for duplicate games.)
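For a sense of how much the number of games matters, here is a minimal sketch of the statistical error in a match score. It treats every game as an independent result worth 0, 0.5 or 1 point; the function name and the draw rates are illustrative assumptions, not figures from the tests above.

```python
import math

def score_standard_error(games, score=0.5, draw_rate=0.0):
    """Standard error of the average score per game over `games` games.

    Assumes independent games with results 0, 0.5 or 1.
    `score` is the expected average score, `draw_rate` the fraction of draws.
    Both defaults are illustrative assumptions.
    """
    win_rate = score - draw_rate / 2.0
    loss_rate = 1.0 - win_rate - draw_rate
    if min(win_rate, loss_rate, draw_rate) < 0:
        raise ValueError("score and draw_rate are inconsistent")
    # Variance of one game: E[x^2] - (E[x])^2, with x in {0, 0.5, 1}.
    variance = win_rate * 1.0 + draw_rate * 0.25 - score ** 2
    return math.sqrt(variance / games)

print(score_standard_error(400))                 # no draws
print(score_standard_error(400, draw_rate=0.3))  # assumed 30% draws
print(score_standard_error(100))                 # a shorter match
```

Under these assumptions a 400-game match without draws has a standard error of 2.5 percentage points, draws make it somewhat smaller, and a match with a quarter of the games has twice the error.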

My experience with this kind of materially imbalanced testing is that the result is not very sensitive to the piece values used by the engine. E.g. if you give one player an Archbishop instead of a Queen, and assign it a value of 900 (where Q=950), the side with the Queen will score about 62%. If you then repeat the test with A=1000, the player with the Queen will still score around 62%. The reason is likely that, as long as the values are different, 1-for-1 trading is not frequent, because there is always one player that thinks it is to his disadvantage, and will avoid it. And it does not matter much which player this is. The imbalance is therefore long-lived, and you measure the relative effectiveness of the imbalanced pieces for doing (or helping to do) damage to the common pieces. This is pretty much independent of how the computer values them, as they will mostly not be traded directly for other material (2-for-1 trades are also pretty rare).

So as long as both players share the misconception about the actual value, the programmed value doesn't seem to be very critical. Of course it should not be totally off; if you set the value of a Queen below that of a Pawn, it will indeed become worth as much as a Pawn, because it will be immediately traded for one. There is just no way the other player could shield all its Pawns from Queen attack before the Queen gets to see it can force a more profitable trade. If you assign reversed values to pieces that differ very much in power, the strong one will probably succeed in forcing a trade for the weak one, which it mistakenly considers profitable.

Before I did the test with complete armies, I did similar tests on all individual pieces in the armies to determine their value. E.g. use FIDE as context, and then replace the Rooks of one side by (say) HFD, to see whether HFD is better or worse than an orthodox Rook (and by how much). In such tests I always make sure they are self-consistent, i.e. performed with the programmed value equal to the value suggested by the eventual score. If my initial guesstimate of the value was wrong in this respect, I just repeat the test with the value suggested by the outcome of the flawed test, which then usually does not significantly alter the outcome. This should have made the individual piece values more or less OK, so that the play during the whole-army tests must have been realistic, and thus must have made the sampling of what the pieces can do representative.

I am a bit surprised about the low score imbalance you get for CC-FF. My old results table says +9% for this (meaning the match score averaged over both colors was 59% in favor of the Clobberers). You get only 52.6%. What is the Pawn-odds score for ChessV (i.e. when you use equal armies, except that one of the players gets f2 or f7 deleted)?
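The reason the Pawn-odds score matters is that it gives a yardstick for expressing a score excess in Pawns. A minimal sketch, assuming the excess over 50% scales linearly with the material advantage; the Pawn-odds score of 65% used below is a pure placeholder, not a measured value for ChessV or Fairy-Max.

```python
def advantage_in_pawns(score, pawn_odds_score):
    """Express a match score as an advantage in Pawn units.

    Assumes (as a simplification) that the excess over 50% is
    proportional to the material advantage.
    """
    if pawn_odds_score <= 0.5:
        raise ValueError("pawn_odds_score must be above 0.5")
    return (score - 0.5) / (pawn_odds_score - 0.5)

PLACEHOLDER_PAWN_ODDS = 0.65  # hypothetical; the real value must be measured

for label, score in [("Fairy-Max CC-FF", 0.59), ("ChessV CC-FF", 0.526)]:
    pawns = advantage_in_pawns(score, PLACEHOLDER_PAWN_ODDS)
    print(f"{label}: {pawns:.2f} Pawn")
```

Only the two CC-FF percentages come from the discussion; everything else is there to show the arithmetic. Two engines with different Pawn-odds scores could report different percentages for the same advantage in Pawns.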

These are the piece values Fairy-Max is using (Pawn = 100):

FIDE: Knight 325, Bishop 350, Rook 500, Queen 950

Clobberers: Waffle 320, Fad 480, Bede 530, Archbishop 875

Nutters: Fibnif 310, Charging Knight 400, Charging Rook 485, Colonel 935

Rookies: Woody Rook 310, Short Rook 400, Half Duck 480, Marshall 935
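As a rough cross-check, these values can be summed per army. The sketch below assumes the usual setup in which each army has two of each lesser piece and a single super-piece, and leaves out Pawns and King; the names `FAIRY_MAX_VALUES`, `SUPER_PIECES` and `army_total` are made up for illustration.

```python
# Piece values as listed above (Pawn = 100).
FAIRY_MAX_VALUES = {
    "FIDE": {"Knight": 325, "Bishop": 350, "Rook": 500, "Queen": 950},
    "Clobberers": {"Waffle": 320, "Fad": 480, "Bede": 530, "Archbishop": 875},
    "Nutters": {"Fibnif": 310, "Charging Knight": 400,
                "Charging Rook": 485, "Colonel": 935},
    "Rookies": {"Woody Rook": 310, "Short Rook": 400,
                "Half Duck": 480, "Marshall": 935},
}

# Assumption: one super-piece per army, two of every other piece.
SUPER_PIECES = {"Queen", "Archbishop", "Colonel", "Marshall"}

def army_total(values):
    """Sum the piece values of one army, without Pawns and King."""
    return sum(value * (1 if name in SUPER_PIECES else 2)
               for name, value in values.items())

totals = {army: army_total(values) for army, values in FAIRY_MAX_VALUES.items()}
for army, total in totals.items():
    print(f"{army}: {total}")
```

A plain sum like this ignores pair effects and all the strategic issues discussed here, so it need not reproduce the ranking found in the whole-army matches.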

Kevin Pacey wrote on Mon, Sep 24, 2018 06:20 AM UTC:

I'd rate the RR army and the FF army about the same, though the mobility of a Q may give an edge to the latter. The CC army would seem to be at least slightly better than either (based on material valuations) IMO, but noting, perhaps very significantly, that it seems it could often be really awkward to develop both of the two waffles (i.e. not just one) that a player has with any speed, especially with the Black pieces (the edge pawns being unprotected at the start doesn't help either, especially vs. a FIDE army, with its Q). The NN army seems clearly the best army of the 4 in theory to me (based on material valuations), except I've yet to play with it, rather than against it. Thus I find myself pretty much agreeing with H.G.'s assessment of the 4 armies' relative strengths, though perhaps for many differing reasons.

[Edit: It seems my memory of the relative strength of the 4 armies that I estimated long ago was rather off. In any case, here are my current tentative estimates of each army's relative strength (based on material valuations alone):]

FIDE:

Knight approx. 3.5, Bishop 3.5, Rook 5.5, Queen 10; Army (w/o Ps/K) 35

Clobberers:

Waffle 3.125, Fad 4.75, Bede 5.125, Cardinal 8; Army (w/o Ps/K) 34

Nutters:

Fibnif 3.125, Charging Knight 4, Charging Rook 4.9375, Colonel 8.9375; Army (w/o Ps/K) 33.06

Rookies:

Woody Rook 3.125, Short Rook 3.54, Half Duck 4.915, Marshall 10; Army (w/o Ps/K) 33.16

Aurelian Florea wrote on Sat, Sep 22, 2018 03:00 AM UTC:

I think the NN, despite their many weaknesses, have a wonderful middle game. That should probably always do it!

@Greg The Bede thingie seems a good idea to me, and it does sound more natural for a rook!

About the RR, I find the FDH quite awkward, and it should be around rook level. It probably is not (but the R4 definitely compensates for it).

Anyway, a very nice effort on Mr. Betza's part, in an era when computers were much weaker :)!

25 comments displayed
