lichess.org

Centipawn Loss Distribution

@jdannan said in #9:
> Is there a good tool for calculating centipawn loss on a move by move basis, and also for processing a bunch of games? lichess reports the average for an individual game of course but that's a bit limited.

I used python-chess to write a program that goes through the game, analyses each move and calculates the centipawn loss afterwards. I don't know whether a program that does this already exists.
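For reference, once you have an engine evaluation for every position (e.g. from python-chess's `engine.analyse`, taking scores from White's point of view), the per-move bookkeeping is just a few lines of plain Python. This is a minimal sketch of that calculation as I understand it, not the exact program used here; the sample numbers are made up.

```python
# Sketch: turn a list of engine evaluations into per-move centipawn losses.
# Assumes evals[i] is the evaluation (in centipawns, White's point of view)
# of the position before ply i, as python-chess gives via info["score"].white().

def centipawn_losses(evals):
    """evals: list of n+1 evaluations, one per position, White's POV.
    Returns n per-move losses, White and Black moves alternating."""
    losses = []
    for ply, (before, after) in enumerate(zip(evals, evals[1:])):
        if ply % 2 == 0:
            # White just moved: a drop in the evaluation is his loss.
            losses.append(max(0, before - after))
        else:
            # Black just moved: a rise in the evaluation is his loss.
            losses.append(max(0, after - before))
    return losses

# Start position plus three plies of (invented) evaluations:
print(centipawn_losses([20, 15, 60, 60]))  # → [5, 45, 0]
```

Capping each loss at zero means a move that improves your engine evaluation counts as a 0 CPL move, which matches how the "boring" spike at 0 arises in the histogram.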

@jdannan said in #10:
> One observation I can make about centipawn loss is that it varies substantially with the style of the player and/or nature of the individual game. I sometimes get very low values for a player of my modest ability, whereas a club mate with very similar (actually higher) rating usually generates much larger values. He takes more risks but this means he poses more questions.

In a previous post (chessenginelab.substack.com/p/evaluating-sharpness-using-lc0s-wdl) I looked at a way to quantify the sharpness of a position. In the situation that you described, the sharpness of the games of your club mate should be higher. I still want to do some testing to see if sharpness and centipawn loss/accuracy are correlated.
@dboing said in #5:
> you could have a log on the axis, if there is some data you would like to be seen. No?

I thought about using a log axis but since the only very high peak in the number of moves was at a CPL of 0, I decided to just omit that bar since these are anyway the "boring" moves.
I didn't do a log x-axis since centipawns are more of a linear measure, at least at lower values.

I'm not quite sure that I fully understand what you mean by the other parts of your comment, but I have thought about using some more internal factors of an engine to figure out how difficult a position is to play. For example, I thought a bit about using the depth at which an engine finds the best move as such an indicator. Needing only a low depth to find the move would mean that it's easier to find than a move which is only found at a higher depth.

I haven't done more testing in this direction since it's difficult to tell which things to include. I'll look into things like that in the future.
@jk_182 said in #12:
> I thought about using a log axis but since the only very high peak in the number of moves was at a CPL of 0, I decided to just omit that bar since these are anyway the "boring" moves.
> I didn't do a log x-axis since centipawns are more of a linear measure, at least at lower values.

A log axis does not have to be about recovering a distribution or a relation (that would be the case if you were trying to plot some Y-versus-X dependence "law", or a log-normal that you want to look as symmetric as a normal; a log-normal being multiplicative small factors made to look like additive small factors. Bad syntax but good keywords, I hope).

It can be just a log axis, used to compress the high values and relatively spread out the lower ones. I think it is locally linear around 1 (I forget the details; I just remember the data-spreading effect. You might need to shift something to make that work).

But if you tell me that nothing of interest is there, then yes, cutting off outliers might as well be done beforehand. Good that you mentioned it.

> ... I thought about using some more internal factors of an engine to figure out how difficult a position is to play.

Interesting direction, which I am bound to appreciate. It would require using the command line, and UCI options, if you wanted more information about the position content going into your set of measured end-points that is meant to represent a player (playing).

It might also require some imagination about the internals of the engine, which UCI does not impose but which the engine developers might be generous about exposing (in some analysis-oriented design, away from the engine-tournament design, but promising the same Elo in the end, just slower for us).

> For example, I thought a bit about using the depth at which an engine finds the best move as such an indicator. Needing only a low depth to find the move would mean that it's easier than a move which is only found at a higher depth.

That would have been an easily averted error. Yes, depth is available from UCI (engines report it without any extra generosity). But lichess does not show it to us: the PVs are cut off even with the local engine (16 plies), and even more so from the cloud (10 plies).

> I haven't done more testing in this direction since it's difficult to tell which things to include. I'll look into things like that in the future.

I think you could use either the leaf depth of the best PV that is the basis for the error (compared to the move played), or the shallower depth at which the engine first found out that it was an error. You might want to look at that shallower depth, or get the engine to spill the beans about its iterative-deepening picture; you might then get the story of the minimal partial-tree breadth needed to diagnose the error. I hope I did not make it confusing. In the command line, if your machine is slow enough, you can see some of that happening in the info output. However, you can't rely on that output; it would have to be given by the engine.

The sure way would be to batch UCI "go depth" commands. Although I did see, in the new SF source code, a keyword on top of the search: "trace". If that can make its way through UCI, I think it ought to put the iterative deepening under user control.
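To make the "depth of first discovery" idea more tangible: if you log the iterative-deepening info lines and pre-parse them into (depth, first PV move) pairs (that pair format is my assumption about how one would store the log, not something UCI hands you directly), a small helper can report the shallowest depth from which the engine settled on its final best move. A hedged sketch:

```python
# Sketch: given iterative-deepening results as (depth, best_move) pairs in
# search order, find the depth at which the engine last switched onto the
# move it finally chose. The input format is an assumed pre-parsed log.

def discovery_depth(id_history):
    """id_history: list of (depth, move) pairs, shallowest first.
    Returns the shallowest depth from which the final best move held."""
    final_move = id_history[-1][1]
    depth = id_history[-1][0]
    # Walk backwards while the engine was already on the final move.
    for d, move in reversed(id_history):
        if move != final_move:
            break
        depth = d
    return depth

history = [(1, "e2e4"), (2, "d2d4"), (3, "g1f3"), (4, "g1f3"), (5, "g1f3")]
print(discovery_depth(history))  # → 3
```

Batching "go depth 1", "go depth 2", ... as suggested above would produce exactly this kind of history, one bestmove per depth.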

Was I clear on that more tangible idea? Next post for my stuff.
Oh, and I was trying to link your other post to this one, since sharpness might interact, as one position characteristic among others, with the set of positions in each player's sampled pool of games, the data "corpus" from which you are extracting a player profile, or phenotype, or model of error. (It could be about other things than error, but that is how you started, so we might as well keep going in that direction until it needs revision, and that is how the chess community is ubiquitously tooled, so...)
I'm pretty sure this distribution can be directly converted into an Elo estimate. All you need to do is calculate it for a bunch of players with established ratings from their games and find out how the distribution parameters correlate with Elo. And yes, it's probably better to look at the winning-percentage drop than at centipawn loss.
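As a rough sketch of that calibration, assuming each player's distribution has already been reduced to a single statistic (here the mean win-percentage drop per move; the data points below are invented, not real measurements): fit a simple line against the established ratings and read off an estimate for a new player.

```python
# Sketch: calibrate a distribution statistic against known ratings with
# ordinary least squares, then use the fitted line as an Elo estimator.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# (mean win% drop per move, established rating) for hypothetical players:
data = [(4.0, 2200), (6.5, 1900), (9.0, 1600), (12.0, 1350)]
a, b = fit_line([d[0] for d in data], [d[1] for d in data])

# Estimated rating for a new player whose mean drop is 7.0% per move:
estimate = a * 7.0 + b
```

In practice one would want more than one summary statistic (the spread and tail of the distribution carry information too), but the one-parameter version shows the shape of the procedure.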
I think you might consider using more than one computed definition of sharpness, and giving them temporary qualifiers.
Then at some point you would be able to confront the fuzzier notion that people use with the positions that each prototype definition would bring up.

I think one might use SF PV profiles at a position as one way of defining some sharpness. That would cover a specific position being sharp. But I think there are extended definitions, about sustained sharpness, which one might have to keep tracking over a long segment of positions (all adjacent positions reachable by legal moves, in other words a contiguous game fragment).

At each successive position in such a segment, the best move would be very isolated from the many other moves. One would need to calculate tightly to large human depths.

From the mere leftmost-node scoring in a single search, there might be three things to consider: the legal branching number, and then the relative scores of all the candidate moves. If you can force SF to work at its maximal MultiPV profile you might get enough data there, though even 5 lines might be enough. The number of legal moves would be there so you can differentiate from a merely forcing segment if needed.
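A hedged prototype of that single-position measure, assuming the MultiPV scores have already been collected (e.g. after "setoption name MultiPV value 5") along with the legal-move count. The formula below, the gap between the top two candidates with forced positions zeroed out, is an illustrative choice, not an established metric:

```python
# Sketch: one prototype "sharpness" of a single position from MultiPV output.
# The scoring rule here is an assumption for illustration only.

def sharpness(multipv_scores, legal_moves):
    """multipv_scores: centipawn scores of the top candidates, best first.
    legal_moves: number of legal moves, used to discount forced positions.
    Returns the gap between the best and second-best candidate, or 0 if
    the position is forced or only one line is available."""
    if legal_moves <= 1 or len(multipv_scores) < 2:
        return 0
    return multipv_scores[0] - multipv_scores[1]

# Invented MultiPV 5 output where only one move keeps the advantage:
print(sharpness([120, -30, -45, -80, -110], legal_moves=34))  # → 150
```

One could equally well compare the best score against the mean of the rest, or count how many candidates stay within some tolerance of the best; that is exactly the kind of variation the "Sharpness_wdl versus Sharpness_cp" naming below is meant to keep apart.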

I may be missing some points in the above. When letting the idea machine loose, sometimes things get loose that we might have wanted to keep in check; a reasoning blind spot of sorts. So I enjoy being checked for those, if it is readable first.

Maybe your idea of the depth of actual leaf-reward discovery (the difference from other leaves at the same depth) might also play a role in what people refer to as sharp (a position, or an opening "territory"). That might already be included above, in that SF is at times looking far ahead when attributing a score to the current position.

On this last bit, I was thinking that maybe opening sharpness, whose consequences only become visible, say, in the endgame, or a lot deeper than one can calculate or has digested through experience, might be what makes some people call an early choice of move sharp or narrow.

I am just brainstorming, because I am not sure there is a well-defined version of sharpness in the community yet (not yours, which is a computer prototype for sure). I may be wrong about the community as well; this is an impression I have had for a while. If I am right, it might be more inclusive and inviting to let people bring in their own experience of the word, if there were room for that: by making Sharpness_wdl, or Sharpness_cp, or as many candidates as one can formulate with analytical chess tools, further dataset-based analysis could accommodate the various strains of the word as each person internalises it, everyone already having some notion of what it is (like agreeing with someone else often enough about this or that being sharp). One could then confront one's own on-board sense of it with what your prototype definitions suggest.

Otherwise, you might be reducing your room for manoeuvre. Or are you convinced that the definition in your last blog post is consensus?
I suppose this is one of the methods for anti-cheating. Everyone would have their own standard CPL distribution, and if someone suddenly deviates significantly from theirs, red flags are raised.
This is also somewhat available in lichess insights, accuracy per number of moves.
It might be easier to read your CPL graphs with some sort of smoothing / density estimate. You say that 30-130 is roughly constant, but to my eye there are clearly more moves around 50 than around 130. Smoothing would help assess this by eye.
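For what it's worth, a kernel-density smoothing of the per-move losses needs only the standard library: each observed loss contributes a Gaussian bump of width `bandwidth`. The bandwidth and the sample losses below are arbitrary illustrative choices:

```python
# Sketch: Gaussian kernel density estimate of a CPL distribution, so that
# "roughly constant between 30 and 130" can be judged on a smooth curve.
import math

def smoothed_density(losses, xs, bandwidth=10.0):
    """Evaluate the KDE of the loss sample at each point in xs."""
    norm = len(losses) * bandwidth * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((x - l) / bandwidth) ** 2) for l in losses)
            / norm
            for x in xs]

losses = [0, 5, 30, 45, 50, 52, 80, 130]      # made-up per-move losses
density = smoothed_density(losses, range(0, 150, 10))
```

Plotting `density` against the grid (with matplotlib, say) gives the smoothed curve; a larger bandwidth trades detail for stability, which matters with only a few hundred moves per player.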

Cool work regardless!