Comments/Ratings for a Single Item

Game Courier Ratings. Calculates ratings for players from Game Courier logs. Experimental.
🕸📝Fergus Duniho wrote on Mon, Jan 9, 2006 07:30 PM UTC:
I have expanded the information in the 'Percent won' column to show the fraction of won games to played games used to calculate the percentage. This should give a good indication of how much to trust any given rating.

🕸📝Fergus Duniho wrote on Mon, Jan 9, 2006 08:47 PM UTC:
I made a couple of significant changes to the code I posted here earlier.

One is a bug fix. The uasort on $players should be a usort. A uasort
leaves array keys intact, but my subsequent use of the array assumed that
array keys had been changed.

The other change is to the generation of the second two sets of ratings.
Between the second and third sets, I changed the order of the $players
array, essentially twisting it inside out in a spiral, so that the order
is very different. I then calculated the last two sets of ratings in the
otherwise same zig-zagging and reverse zig-zagging order of the first two
sets. Here's the code I used to change the order:

$neworder = array();
$midpoint = (int)floor($pc/2);
for ($i = 0; $i < $pc; $i++) {
	// Spiral outward from the midpoint: $i = 0, 1, 2, 3, ... picks up
	// $players[$midpoint], [$midpoint-1], [$midpoint+1], [$midpoint-2], ...
	$neworder[$i] = ($i & 1)
		? $players[$midpoint - (int)ceil($i/2)]
		: $players[$midpoint + $i/2];
}
$players = $neworder;

I may change this in the future. I am thinking of evaluating all pairs in
a single order, based on which pairs have played the most games together.
This would start with the pairs that are most likely to give reliable
ratings and move on to pairs less likely to give reliable ratings. This
would help make the ratings of the latter more reliable when it finally
got to them, and it is probably the best order overall for reliable
ratings.
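
As a rough illustration of that idea (a hypothetical sketch only, not the script's actual code), the pairs could be collected and sorted by game count before any comparisons are made; $games_between and compare_pair() are made-up names here:

function by_games_desc($a, $b) {
	return $b['games'] - $a['games'];  // most games together first
}
$pairs = array();
foreach ($games_between as $key => $count) {
	$pairs[] = array('pair' => $key, 'games' => $count);
}
usort($pairs, 'by_games_desc');
foreach ($pairs as $p) {
	list($white, $black) = explode('|', $p['pair']);
	// compare_pair() stands in for the pairwise rating comparison.
	compare_pair($white, $black);
}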

Roberto Lavieri wrote on Mon, Jan 9, 2006 08:59 PM UTC:
The information should be expanded to include drawn games. For example, 10 wins + 6 draws + 4 losses could be shown as 13 points/20 games. A player with 20 draws in 20 games has an effective score of 50%, but is shown as 0/20, which distorts the information. I'm not entirely convinced of the soundness of the method, but at first view it seems reasonable, and I understand the intention behind some of the modifiers in the algorithm, though I'm not sure they are solid enough or the best possible. Experience and testing will show more, including the strong and weak points.

It is too soon for me to say much more, or to compare it with Elo, but when I have some time I'll try to go deeper into it. One point of serious discussion is whether a rated player should lose rating when defeated by a lower-rated player, and if the answer is YES, how large the loss should be. Elo is a relative measure; it works that way, and experience has shown it is nearly perfect in this respect, in the opinion of experts. The cause may be that, at high levels of play, luck is not extremely important, but for us, with many games for which no theory exists, luck is undoubtedly a factor: you can lose in the opening even if you are a moderately solid player in many other games.
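
For concreteness, folding draws in as half a point would look something like this in PHP (a minimal sketch with made-up variable names, not the site's code):

$points = $wins + 0.5 * $draws;                       // 10 + 0.5*6 = 13
$games  = $wins + $draws + $losses;                   // 10 + 6 + 4 = 20
$pct    = ($games > 0) ? 100 * $points / $games : 0;  // 65%
echo "$points/$games games ($pct%)";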

Tony Quintanilla wrote on Mon, Jan 9, 2006 10:20 PM UTC:Excellent ★★★★★
In order to focus on the more meaningful ratings, it would be interesting to be able to filter out 'provisional' ratings, i.e. those calculated from fewer than, say, 14 games, following Gary's remark.

🕸📝Fergus Duniho wrote on Tue, Jan 10, 2006 02:29 AM UTC:

Roberto,

You were right about it not accounting for draws. Although it was supposed to, it was checking for the wrong string. So it was missing all the draws. This has now been fixed.


🕸📝Fergus Duniho wrote on Tue, Jan 10, 2006 03:51 AM UTC:

Roberto writes:

One point of serious discussion is whether a rated player must lose rating when defeated by a less rated player, and if the response is accepted to be YES, how much must be the loss?

I'm not sure this question makes sense within the context of how GCR works. When GCR compares two opponents, it uses the total of their scores in a single comparison, and it does not compare them on a game-by-game basis. But let's suppose two players play only one game together, and the higher rated player loses to the lower rated player. In that case, the higher rated player will lose points. The loss of points is determined by the difference between their ratings, by the stability of the higher rated player's rating, and by the reliability of the score, which for only one game is its lowest at 25.


Antoine Fourrière wrote on Tue, Jan 10, 2006 08:14 AM UTC:
I think only recognized variants or variants which have made it into one or
two Game Courier Tournaments should be considered for an overall rating
anyway.
(A game may need some fixing in the rules or in their writing. There have
been recent ambiguities about Rococo or Switching Chess. More annoyingly,
my own ill-considered Pocket Polypiece Chess setup gave me the opening
advantage of one Pawn against George Duke.)

Roberto Lavieri wrote on Tue, Jan 10, 2006 10:57 AM UTC:
I agree with Antoine: basically, only recognized variants, or variants that have been played in tournaments, should be considered for rating purposes. The reasons are various, but the main one is that some variants can be considered rich enough, stable, balanced, deep, and good for game play, without a clear a priori advantage for one player or the other; others have a clear tendency toward draws, are extremely sensitive to openings, are not very related to Chess, are so large that the games become kilometric, are so small that the result is not relevant, or are too chaotic for rating the players after a game to make sense. But other games could also be considered, if there is consensus. In any case, I believe that NOT ALL games should have their results rated, for many reasons, depending on the game.

🕸📝Fergus Duniho wrote on Tue, Jan 10, 2006 04:18 PM UTC:

I'm not going to restrict which Chess variants can be rated. I will let players decide which games they want rated, and I will allow this script to work with any game. On the matter of multivariant ratings, which is what I think Antoine may mean by overall ratings, this script comes with various filters. None will distinguish recognized variants from others, but they do offer the option of looking at ratings for games played in specific tournaments. If you want to look at games played in any tournament, change the Tournament Filter to ?*. This will filter out null strings.
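
Assuming the filter uses shell-style wildcard matching (an assumption on my part; the matching code isn't shown in this thread), the effect of ?* would be something like the following, where $log['tournament'] is a made-up field name:

// '?' requires at least one character and '*' matches the rest,
// so an empty tournament field fails the test.
$show = fnmatch('?*', $log['tournament']);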


🕸📝Fergus Duniho wrote on Tue, Jan 10, 2006 06:47 PM UTC:
I think one of the main points of discussion here should be the merits of a bell-curved scale of probabilities, which Elo uses, versus a linear scale of expected outcomes, which GCR uses. So far, my main reasons for using the latter are convenience and insufficient understanding of the Elo method.

Gary Gifford wrote on Tue, Jan 10, 2006 09:40 PM UTC:
In regard to the 'bell-curved scale of probabilities,' I think we should
be seeing the bell-curve as a distribution of the number of players
(y-axis) with respect to playing strength (x-axis).  Thus giving us the
bell.  But perhaps we are referring to two different curves here.  In regard
to probability, I read that a 200 point rating difference implies that the
higher rated player should be winning 3 out of 4 games between the 2. I can
look up the source later.

I again mention the following website as it offers a relatively simple
method of rating calculation. I still believe that it may be of great
value in the CV Ratings project.  The site includes example calculations.

http://www.chess-express.com/explain.htm

🕸📝Fergus Duniho wrote on Tue, Jan 10, 2006 11:03 PM UTC:
Since you've repeated it, I'll mention that I don't think the CXR method will be of value to this project. It seems to be a very different approach than the one I am taking. If you think otherwise, I'll leave it to you to try to explain it to me.

🕸📝Fergus Duniho wrote on Tue, Jan 10, 2006 11:21 PM UTC:
I'm sure we are referring to two different bell curves. The Elo method
specifically tries to measure probabilities, and the probabilities fall
along a bell curve. The following page lists the probabilities associated
with various differences of Elo points:

http://www.ascotti.org/programming/chess/elo.htm

The two axes are (1) the difference between two Elo ratings and (2) the
probability that one player will defeat the other in a game.

In contrast, the GCR method correlates (1) the difference between GCR
ratings with (2) an expected total score on the games played between two
players. This relation is linear.
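
To make the contrast concrete, here is the standard formula behind the Elo expectation table alongside a purely hypothetical linear analogue; the real GCR constant is not given in this thread, so $span is made up:

function elo_expectation($diff) {
	// Standard Elo expected score for a rating difference $diff.
	return 1 / (1 + pow(10, -$diff / 400));
}
function linear_expectation($diff, $span = 1500) {
	// $span is a guessed constant; GCR's actual slope isn't stated here.
	return max(0, min(1, 0.5 + $diff / (2 * $span)));
}
// elo_expectation(200) is about 0.76 -- roughly 3 wins out of 4,
// matching the figure Gary quoted.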

Gary Gifford wrote on Wed, Jan 11, 2006 12:30 AM UTC:
The CxR concept is as follows... and to use it for CV seems easy.

New Rating = Your Old Rating PLUS (Score x 21) PLUS (Pre-Game Rating of
Opponent MINUS Your Pre-Game Rating) [divided by 25]

where Score is +1 for a WIN, 0 for a DRAW,  -1 for a LOSS

So, if I am 1800 and my opponent is 1900 and I win:

My New Rating would be:  1800 + (1 x 21) + (1900 - 1800)/25
My New Rating would be:  1800 + 21 + 4 =   1825

For the 1900 guy I played we'd see:
His New Rating = 1900 + (-1 x 21) + (1800 -1900) / 25
His New Rating = 1900 -21 -4 = 1875

The website I mentioned has different examples and includes unrated player
calculations.  But, even if we apply the CxR to the initial Ratings you
(Fergus) have calculated, this system will polish the values over time
and we will have numbers close to those seen in the USCF.

The winning probability is actually irrelevant when using this system.
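
Transcribed into code, the rule Gary describes is simply the following (a direct transcription of the formula as stated above; the CXR site's rounding and unrated-player rules are not included):

function cxr_new_rating($old, $opponent, $score) {
	// $score is +1 for a win, 0 for a draw, -1 for a loss.
	return $old + ($score * 21) + ($opponent - $old) / 25;
}
// cxr_new_rating(1800, 1900,  1) gives 1825
// cxr_new_rating(1900, 1800, -1) gives 1875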

Roberto Lavieri wrote on Wed, Jan 11, 2006 12:37 AM UTC:
I think GCR works very well under the assumption that a player's ability is not going to decline a lot quickly, say from one day to the next. But if that happens for any reason, the method has the same problems as Elo, and they can be more pronounced in GCR if the player has a high number of previously rated games and his rating is relatively solid and stable: the player's rating may decline only slowly and fail to reflect the real change in the player's ability. It is no secret that Kramnik has had some health problems, and they have been reflected in his play, but Elo only took this into account after many months and many games. GCR is perhaps even less sensitive to such cases.

🕸📝Fergus Duniho wrote on Wed, Jan 11, 2006 12:57 AM UTC:
I'm returning to a matter Roberto raised earlier. Suppose a 3000 rated
player plays one game against a 1500 rated player and loses. As the system
works right now, the 3000 rated player would lose 300 points. In contrast,
if two 1500 rated players played one game together, the loser would lose
100 points. Therefore, I think the formulas for reliability and stability
need to be changed. One idea is this:

reliability = pow((number of games played together),2)

stability = pow((abs(old_rating - 1500)/100 + 1), 2)

With these formulas, the 3000 rated player would lose about six points for
losing a single game to a 1500 rated player. Furthermore, both scores would
be equal for one game played between 1500 rated players, as they both are
now, and this would have the same effect. When two 1500 rated players
played a single game together, one would get 1600 and the other 1400.
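
Written out as code, the proposed formulas would be as follows (only the two formulas themselves; the surrounding adjustment code that consumes these values is not shown in the thread):

function reliability($games_played_together) {
	return pow($games_played_together, 2);
}
function stability($old_rating) {
	return pow(abs($old_rating - 1500) / 100 + 1, 2);
}
// stability(3000) = 256 and stability(1500) = 1; the large stability of
// the 3000 player is what damps his loss to about six points in the
// example above.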

Roberto Lavieri wrote on Wed, Jan 11, 2006 01:13 AM UTC:
Reliability and stability may be tuned, but the tuning should be based on ideals for the purpose. Those ideals are not so easy to establish quickly, and perhaps some probabilistic considerations may help. My first impression is that when a 3000 player loses to a 1500, the rating loss should be greater than when a 1500 loses to another 1500. I can't say in five seconds how large it should be in either case, but the difference between the first and second case should be relatively notorious.

🕸📝Fergus Duniho wrote on Wed, Jan 11, 2006 01:20 AM UTC:
Regarding Roberto's recent comments, let me mention that GCR does not work with the same paradigm as Elo. A player does not have a fixed GCR that eventually becomes too stable to change. Each time someone reloads this page, each player gets an initial rating of 1500, and his GCR is freshly recalculated on the basis of all available data. If he has previously held a very high rating but has started to do poorly, this may cause his rating to drop more before it is calculated nearly as high as it used to be. Furthermore, each GCR is based on all available data of everyone in the same playing network. Even if someone has previously attained a very high rating, it could drop as other players start to do better, even without him playing another game.

🕸📝Fergus Duniho wrote on Wed, Jan 11, 2006 01:27 AM UTC:
When two players play only one game together, GCR works with the assumption that the results are more due to chance than to skill. That is why I think a 3000 rated player who loses to a 1500 rated player in their only game together should lose fewer points than another 1500 rated player would. Furthermore, since 1500 is everyone's initial rating, whereas a 3000 rating must be earned, it is assumed that the 3000 rating is more reliable than the 1500 rating, and the main change in rating should be to bring the 1500 rated player up in rating. As things work right now, the 1500 rated player who beat a 3000 rated player would rise to a rating of 2250.

🕸📝Fergus Duniho wrote on Wed, Jan 11, 2006 03:29 AM UTC:
Gary, the CXR method works on a game-by-game basis, and it generally assumes that players won't be playing other players more than 400 points apart. GCR works on an opponent-by-opponent basis, and since it freshly calculates all ratings in a non-chronological order, it cannot make the assumption that players will usually be less than 400 points apart.

🕸📝Fergus Duniho wrote on Wed, Jan 11, 2006 03:45 AM UTC:
Roberto, I don't think you expressed yourself well with the word notorious. It means infamous, as in being famous for something questionable, unworthy, or downright awful, and it generally describes people. What did you mean to say?

Roberto Lavieri wrote on Wed, Jan 11, 2006 12:49 PM UTC:
Fergus, I meant 'clear, for well-argued reasons'. But returning to the point, I am now against drastic changes in a rating after a single game, even when a low-rated player beats a very high-rated one. There are many factors that can produce such a result, including a forfeit or a game abandoned midway for any reason, and one game can't be so decisive. I agree with tuning the modifiers, but I believe this is not a very easy task.

🕸📝Fergus Duniho wrote on Wed, Jan 11, 2006 03:54 PM UTC:
Michael,

The purpose of a rating system is to measure relative differences in
playing strength. I can't emphasize the word relative enough. The best
way to measure relative playing strength is a holistic method that
regularly takes into account all games in its database. One consequence of
this is that ratings may change even when someone stops playing games. This
makes the method more accurate. The Elo and CXR methods have not been
holistic, because a holistic method is not feasible on the scale these
systems are designed for. They have to settle for strictly sequential
changes. Because GCR works in a closed environment with complete access to
game logs, it does not have to settle for strictly sequential changes. It
has the luxury of making global assessments of relative playing strength
on the basis of how everyone is doing.

A separate issue you raised is that of a 3000 rated player losing fewer points
than a 1500 rated player. Since last night, I have rethought how to use
and calculate stability. Instead of basing stability on a player's
rating, I can keep track of how many games have so far factored into the
estimate of each player's rating. One thought is to just count the games
whose results have so far directly factored into a player's rating.
Another thought is to also keep track of each opponent's stability, keep
a running total of this, and divide it by the number of opponents a player
has so far been compared with. I'm thinking of adding these two figures
together, or maybe averaging them, to recalculate the stability score of
each player after each comparison. Thus, stability would be a factor of
how reliable an indicator a player's past games have been of his present
rating.
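
A rough sketch of that bookkeeping (with hypothetical field names; whether the two figures are added or averaged is left open above, so an average is used here arbitrarily):

function update_stability(&$player, $opponent_stability, $games_compared) {
	$player['games_counted']       += $games_compared;
	$player['opp_stability_total'] += $opponent_stability;
	$player['opponents_compared']  += 1;
	$avg_opp = $player['opp_stability_total'] / $player['opponents_compared'];
	$player['stability'] = ($player['games_counted'] + $avg_opp) / 2;
}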

That covers my new thoughts on recalculating stability. As for using it, I
am thinking of using both players' stability scores to weigh how much
ratings may change in each direction. I am still trying to work out the
details on this. The main change is that both stability scores would
affect the change in rating of both players being compared. In contrast,
the present method factors in only a player's own stability score in
recalculating his rating.

One consequence of this is that if a mid-range rated player defeats a
high-rated player, and the mid-range player has so far accumulated the
higher stability score, the change will be more towards his rating than
towards the high-rated player's rating. The overall effect will be to
adjust ratings toward the better established ratings, making all ratings
in general more accurate.

🕸📝Fergus Duniho wrote on Wed, Jan 11, 2006 03:59 PM UTC:

Roberto,

As you know, there is a language barrier between us. It sometimes gets in the way of understanding you, as it has with your misuse of the word notorious. I am simply asking for clarification on what you are trying to say.

Anyway, I am in agreement with the last points you have raised. A single game should not have a great effect on a player's score, and the formulas need more tweaking, but it's not an easy task.


Christine Bagley-Jones wrote on Wed, Jan 11, 2006 04:05 PM UTC:
Well, if you don't play games, Michael, your rating will drop :)
Looks like mine will be dropping too, he he. (I'm kind of a little shocked by that.)
Not that I really care, but I must be bored: doesn't that mean that if you have two players with a 'true' rating (from many rated games) of 1500, and one of them is inactive for a while, so his rating drops, then when these players meet it will be a game between two players where one is rated higher than the other, when in reality it should be a game between equals... wouldn't that distort the ratings after the outcome?
Another thing: a fair number of the games played here are more in the spirit of TESTING OUT A VARIANT than anything else. I agree with those who said that only 'tournament games' should be rated, unless people agree otherwise beforehand.
As for 1500 vs 3000, with the 1500 gaining 750 points for a win, surely that is too much. I agree that the 3000 player should not drop 'heavily'.
Finally (yawn), are we going to see people less willing to put up a challenge for fear of someone rated much lower accepting? Will this lead to 'behind the scenes' arranging of games? If a vote were taken, would more people want ratings than not?
Sorry for the length, just adding food for thought.
