Policy or Value? Loss Function and Playing Strength in AlphaZero-like Self-play
Last updated 15 March 2025

Recently, AlphaZero has achieved outstanding performance in playing Go, Chess, and Shogi. Players in AlphaZero consist of a combination of Monte Carlo Tree Search and a Deep Q-network that is trained using self-play. The unified Deep Q-network has a policy head and a value head. During training, AlphaZero minimizes the sum of the policy loss and the value loss. However, it is not clear whether, and under which circumstances, other formulations of the objective function are better. Therefore, in this paper, we perform experiments with combinations of these two optimization targets. Self-play is computationally intensive; by using small games, we are able to run multiple test cases. We use a lightweight open-source reimplementation of AlphaZero on two different games. We investigate optimizing the two targets independently, and also try different combinations (sum and product). Our results indicate that, at least for relatively simple games such as 6x6 Othello and Connect Four, optimizing the sum, as AlphaZero does, performs consistently worse than other objectives, in particular worse than optimizing only the value loss. Moreover, we find that care must be taken in computing the playing strength. Tournament Elo ratings differ from training Elo ratings: training Elo ratings, though cheap to compute and frequently reported, can be misleading and may lead to bias. It is currently not clear how these results transfer to more complex games, or whether there is a phase transition between our setting and the AlphaZero application to Go, where the sum is seemingly the better choice.
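The four objectives compared in the abstract (policy only, value only, sum, and product) can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes the loss terms usually associated with AlphaZero: a cross-entropy between the MCTS visit distribution and the policy-head output, and a squared error between the game outcome and the value-head output. The abstract does not spell out these formulas, any regularization term is omitted, and the function names and the `mode` parameter are hypothetical.

```python
import math


def policy_loss(target_pi, predicted_p, eps=1e-12):
    """Cross-entropy between a target move distribution and the policy output.

    Assumption: target_pi is the normalized MCTS visit distribution and
    predicted_p is the policy head's probability vector over the same moves.
    """
    return -sum(t * math.log(p + eps) for t, p in zip(target_pi, predicted_p))


def value_loss(target_z, predicted_v):
    """Squared error between the game outcome z and the value-head output v."""
    return (target_z - predicted_v) ** 2


def combined_loss(target_pi, predicted_p, target_z, predicted_v, mode="sum"):
    """Return one of the four objectives; `mode` is a hypothetical switch."""
    l_p = policy_loss(target_pi, predicted_p)
    l_v = value_loss(target_z, predicted_v)
    if mode == "policy":
        return l_p
    if mode == "value":
        return l_v
    if mode == "sum":  # the combination the abstract attributes to AlphaZero
        return l_p + l_v
    if mode == "product":
        return l_p * l_v
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    pi, p = [1.0, 0.0], [0.5, 0.5]  # toy two-move position
    z, v = 1.0, 0.5                 # outcome and value prediction
    for mode in ("policy", "value", "sum", "product"):
        print(mode, combined_loss(pi, p, z, v, mode))
```

In a real training loop these scalars would be computed per batch on tensors by an autodiff framework; plain Python floats are used here only to keep the arithmetic visible.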
