Add more eval examples + filtering examples by language + fix git concurrent usage (#28719)

Release Notes:

- N/A

---------

Co-authored-by: michael <michael@zed.dev>
Co-authored-by: agus <agus@zed.dev>
Thomas Mickley-Doyle 2025-04-14 17:05:46 -05:00 committed by GitHub
parent a8b1ef3531
commit d74f0735c2
76 changed files with 365 additions and 8 deletions

@@ -0,0 +1,3 @@
url = "https://github.com/YuhangSong/Arena-Baselines.git"
revision = "801ed8566110ddc4a6ada0cc70171c636d78dbb8"
language_extension = "py"
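
The `language_extension` field is what a language filter can key on. As a rough illustration only (not Zed's actual Rust eval harness), a filter over a directory of per-example configs like the one above might look like the sketch below; the directory layout, TOML assumption, and function names are all hypothetical:

```python
# Hypothetical filter over eval example configs; not Zed's real (Rust) harness.
import tomllib  # stdlib TOML parser, Python 3.11+
from pathlib import Path


def load_examples(examples_dir: Path) -> list[dict]:
    """Load every per-example TOML config found under examples_dir."""
    examples = []
    for config_path in sorted(examples_dir.glob("*/*.toml")):
        with config_path.open("rb") as f:
            config = tomllib.load(f)
        config["name"] = config_path.parent.name
        examples.append(config)
    return examples


def filter_by_language(examples: list[dict], extension: str) -> list[dict]:
    """Keep only examples whose language_extension matches, e.g. 'py'."""
    return [e for e in examples if e.get("language_extension") == extension]


if __name__ == "__main__":
    examples = load_examples(Path("examples"))
    python_only = filter_by_language(examples, "py")
    print(f"{len(python_only)} of {len(examples)} examples target Python")
```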

@@ -0,0 +1,12 @@
1. README.md Features Section Reorganization
The features section has been reorganized into two subsections ("Baselines" and "Games") with markdown tables added. The previous bullet points were replaced with more structured content including supported/benchmarked status indicators. A new "Visualization" section was added with TensorBoard and port forwarding instructions.
2. Content Relocation and File Restructuring
The Tennis game documentation and action space details were moved from README.md to a new games.md file. The README was cleaned up by removing commented-out content and consolidating documentation sections. YAML config files (Benchmark-2T1P-Discrete.yaml and Test-Pong.yaml) were modified to replace `selfplay_recent_prob` with `playing_policy_load_recent_prob` and adjust population size options.
3. train.py Refactoring
Significant changes to train.py including:
- Renamed `selfplay_recent_prob` parameter to `playing_policy_load_recent_prob`
- Simplified the nested grid search structure by removing unnecessary loops
- Improved policy loading logic with better checkpoint path handling (see the sketch after this list)
- Enhanced error handling and logging for policy saving/reloading
- Removed redundant code and improved code organization
- Added more descriptive console output during policy operations
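
The policy-loading and logging items above might look roughly like the following sketch; the directory layout, function names, and messages are assumptions rather than the actual train.py code:

```python
# Illustrative sketch of checkpoint-path handling and policy reloading with
# descriptive console output; names and layout are assumptions, not train.py.
import os
import pickle


def checkpoint_path(save_dir, policy_id, version):
    """Build a structured path such as <save_dir>/<policy_id>/version-<n>.pkl."""
    return os.path.join(save_dir, policy_id, f"version-{version}.pkl")


def reload_policy(policy, save_dir, policy_id, version):
    """Load pickled weights into a policy, reporting what happened."""
    path = checkpoint_path(save_dir, policy_id, version)
    if not os.path.exists(path):
        print(f"[reload] no checkpoint at {path}, keeping current weights")
        return False
    try:
        with open(path, "rb") as f:
            weights = pickle.load(f)
        policy.set_weights(weights)
        print(f"[reload] loaded {policy_id} version {version} from {path}")
        return True
    except (OSError, pickle.UnpicklingError) as err:
        print(f"[reload] failed to load {path}: {err}")
        return False
```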

@@ -0,0 +1,13 @@
I need to refactor the multi-agent configuration system in our Arena-Baselines repository. The current policy_assignment parameter (self_play, independent) is too coarse. I want to replace it with a more flexible set of parameters to better support advanced training schemes like population-based training (PBT) and sophisticated self-play with historical opponents.
Specifically, I will introduce four new configuration parameters (sketched in the config snippet after this list):
iterations_per_reload: Controls the frequency (in training iterations) at which policies are saved and potentially reloaded.
num_learning_policies: Explicitly defines how many agents use policies that are actively being trained (can be an integer or 'all').
selfplay_recent_prob: For non-learning agents (players), this determines the probability of loading the latest version of a learning policy versus loading a uniformly random historical version during reloads.
size_population: Specifies the number of distinct policy versions maintained for each learning agent, enabling PBT-style experiments.
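
For illustration, the four parameters might appear in an expanded experiment config roughly as follows; the values and grid_search choices are placeholders, not the settings in the repository's YAML files:

```python
# Hypothetical experiment config showing the four new parameters; the values
# and grid_search choices are placeholders, not the repository's settings.
from ray import tune

config = {
    "env": "Arena-Benchmark-2T1P-Discrete",
    # Save and (potentially) reload policies every 100 training iterations.
    "iterations_per_reload": 100,
    # How many agents train actively; 'all' makes every agent a learner.
    "num_learning_policies": tune.grid_search([1, "all"]),
    # For playing (non-learning) agents: probability of loading the latest
    # learning policy instead of a uniformly random historical version.
    "selfplay_recent_prob": tune.grid_search([0.8, 1.0]),
    # Distinct policy versions kept per learning agent (enables PBT-style runs).
    "size_population": tune.grid_search([1, 3]),
}
```
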
To implement this, I will significantly modify train.py. This includes updating the argument parser, changing how experiment configurations are expanded (especially with grid_search), and implementing a new callback function (on_train_result). This callback will handle the periodic saving (using pickle) of learning policies to structured directories and the reloading of all policies (learning and playing) according to the new parameters (iterations_per_reload, selfplay_recent_prob, size_population). Playing policies will use deterministic actions.
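
A minimal sketch of such a callback, assuming the older dict-based RLlib callbacks API (where on_train_result receives a dict holding the trainer and the latest result); the save directory, config keys, and reload rule are illustrative assumptions, not the actual implementation:

```python
# Sketch of a periodic save/reload callback using the older dict-based RLlib
# callbacks API; config keys, directory layout, and the reload rule are
# illustrative assumptions.
import os
import pickle
import random


def on_train_result(info):
    trainer = info["trainer"]
    iteration = info["result"]["training_iteration"]
    cfg = trainer.config
    if iteration % cfg["iterations_per_reload"] != 0:
        return
    version = iteration // cfg["iterations_per_reload"]

    def save_or_reload(policy, policy_id):
        if policy_id in cfg["learning_policy_ids"]:
            # Periodically persist learning policies to structured directories.
            path = os.path.join(cfg["save_dir"], policy_id, f"version-{version}.pkl")
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump(policy.get_weights(), f)
        else:
            # Playing policies reload either the latest version or a uniformly
            # random historical one, according to selfplay_recent_prob.
            if random.random() < cfg["selfplay_recent_prob"]:
                load_version = version
            else:
                load_version = random.randint(1, version)
            load_path = os.path.join(
                cfg["save_dir"], policy_id, f"version-{load_version}.pkl"
            )
            if os.path.exists(load_path):
                with open(load_path, "rb") as f:
                    policy.set_weights(pickle.load(f))

    # Visit every policy held by the local worker.
    trainer.workers.local_worker().foreach_policy(save_or_reload)
```

In that older API the callback would be registered through the trainer config, e.g. a "callbacks" entry mapping "on_train_result" to this function; again, an assumption about the wiring rather than a statement about this repository.
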
I'll also reorganize the codebase by renaming arena/rllib_env.py to arena/arena.py and creating a new arena/utils.py file to house utility functions (like configuration helpers, ID generators, DeterministicCategorical) and constants.
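
DeterministicCategorical could, for instance, be a distribution whose sample is always the mode of the underlying categorical; the class below is a guess at the intent, not the repository's actual code:

```python
# Illustrative guess at DeterministicCategorical: sampling always returns the
# argmax of the logits rather than a random draw from the softmax.
import numpy as np


class DeterministicCategorical:
    """Categorical-like action distribution whose sample() is always the mode."""

    def __init__(self, logits):
        self.logits = np.asarray(logits, dtype=np.float64)

    def sample(self):
        # Deterministic: pick the highest-logit action for each batch row.
        return np.argmax(self.logits, axis=-1)

    def logp(self, actions):
        # Log-probability of the given actions under the softmax distribution.
        shifted = self.logits - self.logits.max(axis=-1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
        idx = np.asarray(actions, dtype=np.int64)[..., None]
        return np.take_along_axis(log_probs, idx, axis=-1).squeeze(-1)
```
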
Finally, I will update the example configuration files (Benchmark-2T1P-Discrete.yaml, Test-Pong.yaml) to remove policy_assignment and demonstrate the usage of the new parameters, including within grid_search.