Re: Leptir
Posted: Mon Feb 26, 2024 9:18 pm
Test Big Leptir 260224 (Dimension 3072) vs Stockfish 16.1.
First of all: I previously tested Leptir 260224 (Dimension 2560) against Stockfish 16.1 under the same conditions, and Leptir won by +1.
I don't believe in testing with the UHO openings (Unbalanced Human Openings by Stefan Pohl). I once downloaded the database, and in every position one side had an advantage of about +1. So I ask myself: what is being tested here? In my opinion, only the tactical strength of an engine in positions that are already almost won. But chess is not just tactics (Vasik Rajlich wrote this sentence many years ago, when I was testing Rybka). Stockfish 16.1 now even uses a special small NNUE network that is trained on tactics. It is 20x smaller than the large network (and therefore significantly faster) and kicks in when positions are won. It is possible that only this small network is being tested with the UHO openings.
I prefer to stick with variations that occur in practice. For me these are engine games on the servers and correspondence chess. When it comes to correspondence chess, I am in contact with some top players and know some of the variations. I myself play on Lichess and PlayChess and have also played in the Infinity Engine Masters and Freestyle Chess.
What is tested with the UHO openings has nothing to do with any of this! Still, I believed that Stockfish 16.1 would be the best bullet engine. Most tests are carried out at 60s + 0.6s. And yes, the developers now also test with the UHO openings.
So I recently wanted to see how much better Stockfish 16.1 plays at bullet than "Big Leptir", which I made specifically for correspondence chess.
For this purpose I used 50 variations that I put together a long time ago; each is played twice, with colors swapped. In a few variations one side has a +1 advantage. In others I also did NOT choose the best replies. But most are playable.
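To make the match format concrete: 50 variations played with color swapping give 100 games. The following is only a minimal sketch of such a schedule; the function name and the placeholder opening names are hypothetical and are not taken from the Fritz GUI or from the opening set itself.

```python
def color_swap_schedule(openings, engine_a, engine_b):
    """Return (opening, white, black) tuples.

    Each opening is played twice, the second time with colors reversed,
    so neither engine benefits from always having the better side.
    """
    games = []
    for opening in openings:
        games.append((opening, engine_a, engine_b))
        games.append((opening, engine_b, engine_a))
    return games


# Hypothetical placeholder names standing in for the 50 variations.
openings = [f"line_{i:02d}" for i in range(1, 51)]
schedule = color_swap_schedule(openings, "Big Leptir 260224", "Stockfish 16.1")
print(len(schedule), "games")
```

If one side has an edge in a given variation, both engines get that edge once, which is why the unbalanced lines do not favor either engine over the whole match.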
Download all variations in CBH and PGN:
https://pixeldrain.com/u/oZvtuwtB
I carry out my tests under the Fritz GUI (currently Fritz 19). Ponder ON is important to me, because this is how we play on all servers. With ponder ON, the engine often takes more time for critical moves. Obvious moves, on the other hand, are sometimes played several in a row in 0s if they were pondered. The quality of the games is better!
If you play under the Fritz GUI (this also applies to PlayChess), the engine behaves differently with CTG books than with Polyglot books. CTG book moves are recognized as such, and the engine then moves more slowly. With a Polyglot book, all moves are always treated as first moves, and the engine moves quickly even after 50 book moves. That's why I use the slow mover on Lichess and set it to 150 in bullet games.
How much better did Stockfish 16.1 play? When I looked at the screen after a few hours, it looked like this:
https://up.picr.de/47154126cy.png
Win 10, Ryzen 3900X
Cores = 4 cores/engine (5000 kN/s)
GUI = Fritz 19
Hash = 128MB/Engine
Slow mover = 100
Ponder = ON
Tablebases = all 3-4-5-6-men Syzygy
Time control = 60s + 1s
Openings = EN-Select with color swap
Final result: Big Leptir 260224 won the 100-game match 52.5 : 47.5 (+11 =83 -6).
A disastrous defeat for Stockfish 16.1. In a similar earlier test, the match ended 50:50. So I expected Big Leptir 260224 to lose; the only question was by how much. And then everything turns out differently than you think!
Download all games in PGN:
https://pixeldrain.com/u/fZ7VPDFj
A word about Big Leptir 260224:
I offered Leptir for download on my homepage (including the source code). Unfortunately, some people didn't follow the rules, which is why I removed the download. For example, someone downloaded the source code and then modified it to present their own “Leptir dev” version.
Photo: https://up.picr.de/47062339sj.png
This is not my engine; it is a fake. Everyone can do whatever they want with the source code and call the engine whatever they want, just please don't fake mine! If there is no source code, then nothing can be changed or faked.
For everyone who criticizes me (as in the German CSS forum) for my "unimportant" changes (really?) to the source code, here are the differences between Big Leptir 260224 and Stockfish 16.1:
* Network Dimension 3072
* "Simple Eval" changed for small network (less impact)
* Corchess code implemented
* LMR (late move reductions) changed
* “Eval Hash” implemented, for better ultra-long analyses. Stockfish does not use an eval hash.
Eduard
https://solistachess.jimdosite.com/
Final result:
Ryzen 3900X, Blitz 1.0min + 1.0sec
1  Big Leptir 260224-avx2   +17   +11/=83/-6   52.50%   52.5/100
2  Stockfish 16.1-avx2      -17   +6/=83/-11   47.50%   47.5/100
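The +17 / -17 column in the table is consistent with the usual logistic Elo formula applied to a 52.5% score. The sketch below recomputes it from the win/draw/loss counts and adds a rough error margin. It assumes the standard logistic model and a normal approximation with independent games; how the Fritz GUI actually computes its column is not stated in the post, and the function names are illustrative only.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


def score_stats(wins, draws, losses):
    """Return (mean score per game, standard error of that mean)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance over the three outcomes 1, 0.5 and 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    return score, math.sqrt(var / n)


# Counts from the result table, seen from Big Leptir's side.
score, se = score_stats(wins=11, draws=83, losses=6)
low, high = score - 1.96 * se, score + 1.96 * se  # approx. 95% interval

print(f"score {score:.3f} -> {elo_diff(score):+.1f} Elo")
print(f"approx. 95% interval: {elo_diff(low):+.1f} to {elo_diff(high):+.1f} Elo")
```

With only 100 games the interval is wide compared with the 17-point gap, so a longer match would be needed to pin the difference down more precisely.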