Add more eval examples + filtering examples by language + fix git concurrent usage (#28719)

Release Notes:

- N/A

---------

Co-authored-by: michael <michael@zed.dev>
Co-authored-by: agus <agus@zed.dev>
Thomas Mickley-Doyle 2025-04-14 17:05:46 -05:00 committed by GitHub
parent a8b1ef3531
commit d74f0735c2
76 changed files with 365 additions and 8 deletions

@@ -0,0 +1,3 @@
url = "https://github.com/YuhangSong/Arena-Baselines.git"
revision = "801ed8566110ddc4a6ada0cc70171c636d78dbb8"
language_extension = "py"
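
The `language_extension` field is what a language filter can key on. As a rough illustration only (not Zed's actual Rust eval harness), a filter over a directory of per-example configs like the one above might look like the sketch below; the directory layout, TOML assumption, and function names are all hypothetical:

```python
# Hypothetical filter over eval example configs; not Zed's real (Rust) harness.
import tomllib  # stdlib TOML parser, Python 3.11+
from pathlib import Path


def load_examples(examples_dir: Path) -> list[dict]:
    """Load every per-example TOML config found under examples_dir."""
    examples = []
    for config_path in sorted(examples_dir.glob("*/*.toml")):
        with config_path.open("rb") as f:
            config = tomllib.load(f)
        config["name"] = config_path.parent.name
        examples.append(config)
    return examples


def filter_by_language(examples: list[dict], extension: str) -> list[dict]:
    """Keep only examples whose language_extension matches, e.g. 'py'."""
    return [e for e in examples if e.get("language_extension") == extension]


if __name__ == "__main__":
    examples = load_examples(Path("examples"))
    python_only = filter_by_language(examples, "py")
    print(f"{len(python_only)} of {len(examples)} examples target Python")
```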

@@ -0,0 +1,12 @@
1. README.md Features Section Reorganization
The features section has been reorganized into two subsections ("Baselines" and "Games") with markdown tables added. The previous bullet points were replaced with more structured content including supported/benchmarked status indicators. A new "Visualization" section was added with TensorBoard and port forwarding instructions.
2. Content Relocation and File Restructuring
The Tennis game documentation and action space details were moved from README.md to a new games.md file. The README was cleaned up by removing commented-out content and consolidating documentation sections. YAML config files (Benchmark-2T1P-Discrete.yaml and Test-Pong.yaml) were modified to replace `selfplay_recent_prob` with `playing_policy_load_recent_prob` and adjust population size options.
3. train.py Refactoring
Significant changes to train.py including:
- Renamed `selfplay_recent_prob` parameter to `playing_policy_load_recent_prob`
- Simplified the nested grid search structure by removing unnecessary loops
- Improved policy loading logic with better checkpoint path handling (see the sketch after this list)
- Enhanced error handling and logging for policy saving/reloading
- Removed redundant code and improved code organization
- Added more descriptive console output during policy operations
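
The policy-loading and logging items above might look roughly like the following sketch; the directory layout, function names, and messages are assumptions rather than the actual train.py code:

```python
# Illustrative sketch of checkpoint-path handling and policy reloading with
# descriptive console output; names and layout are assumptions, not train.py.
import os
import pickle


def checkpoint_path(save_dir, policy_id, version):
    """Build a structured path such as <save_dir>/<policy_id>/version-<n>.pkl."""
    return os.path.join(save_dir, policy_id, f"version-{version}.pkl")


def reload_policy(policy, save_dir, policy_id, version):
    """Load pickled weights into a policy, reporting what happened."""
    path = checkpoint_path(save_dir, policy_id, version)
    if not os.path.exists(path):
        print(f"[reload] no checkpoint at {path}, keeping current weights")
        return False
    try:
        with open(path, "rb") as f:
            weights = pickle.load(f)
        policy.set_weights(weights)
        print(f"[reload] loaded {policy_id} version {version} from {path}")
        return True
    except (OSError, pickle.UnpicklingError) as err:
        print(f"[reload] failed to load {path}: {err}")
        return False
```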

@@ -0,0 +1,13 @@
I need to refactor the multi-agent configuration system in our Arena-Baselines repository. The current policy_assignment parameter (self_play, independent) is too coarse. I want to replace it with a more flexible set of parameters to better support advanced training schemes like population-based training (PBT) and sophisticated self-play with historical opponents.
Specifically, I will introduce four new configuration parameters (sketched in the config snippet after this list):
iterations_per_reload: Controls the frequency (in training iterations) at which policies are saved and potentially reloaded.
num_learning_policies: Explicitly defines how many agents use policies that are actively being trained (can be an integer or 'all').
selfplay_recent_prob: For non-learning agents (players), this determines the probability of loading the latest version of a learning policy versus loading a uniformly random historical version during reloads.
size_population: Specifies the number of distinct policy versions maintained for each learning agent, enabling PBT-style experiments.
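
For illustration, the four parameters might appear in an expanded experiment config roughly as follows; the values and grid_search choices are placeholders, not the settings in the repository's YAML files:

```python
# Hypothetical experiment config showing the four new parameters; the values
# and grid_search choices are placeholders, not the repository's settings.
from ray import tune

config = {
    "env": "Arena-Benchmark-2T1P-Discrete",
    # Save and (potentially) reload policies every 100 training iterations.
    "iterations_per_reload": 100,
    # How many agents train actively; 'all' makes every agent a learner.
    "num_learning_policies": tune.grid_search([1, "all"]),
    # For playing (non-learning) agents: probability of loading the latest
    # learning policy instead of a uniformly random historical version.
    "selfplay_recent_prob": tune.grid_search([0.8, 1.0]),
    # Distinct policy versions kept per learning agent (enables PBT-style runs).
    "size_population": tune.grid_search([1, 3]),
}
```
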
To implement this, I will significantly modify train.py. This includes updating the argument parser, changing how experiment configurations are expanded (especially with grid_search), and implementing a new callback function (on_train_result). This callback will handle the periodic saving (using pickle) of learning policies to structured directories and the reloading of all policies (learning and playing) according to the new parameters (iterations_per_reload, selfplay_recent_prob, size_population). Playing policies will use deterministic actions.
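
A minimal sketch of such a callback, assuming the older dict-based RLlib callbacks API (where on_train_result receives a dict holding the trainer and the latest result); the save directory, config keys, and reload rule are illustrative assumptions, not the actual implementation:

```python
# Sketch of a periodic save/reload callback using the older dict-based RLlib
# callbacks API; config keys, directory layout, and the reload rule are
# illustrative assumptions.
import os
import pickle
import random


def on_train_result(info):
    trainer = info["trainer"]
    iteration = info["result"]["training_iteration"]
    cfg = trainer.config
    if iteration % cfg["iterations_per_reload"] != 0:
        return
    version = iteration // cfg["iterations_per_reload"]

    def save_or_reload(policy, policy_id):
        if policy_id in cfg["learning_policy_ids"]:
            # Periodically persist learning policies to structured directories.
            path = os.path.join(cfg["save_dir"], policy_id, f"version-{version}.pkl")
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump(policy.get_weights(), f)
        else:
            # Playing policies reload either the latest version or a uniformly
            # random historical one, according to selfplay_recent_prob.
            if random.random() < cfg["selfplay_recent_prob"]:
                load_version = version
            else:
                load_version = random.randint(1, version)
            load_path = os.path.join(
                cfg["save_dir"], policy_id, f"version-{load_version}.pkl"
            )
            if os.path.exists(load_path):
                with open(load_path, "rb") as f:
                    policy.set_weights(pickle.load(f))

    # Visit every policy held by the local worker.
    trainer.workers.local_worker().foreach_policy(save_or_reload)
```

In that older API the callback would be registered through the trainer config, e.g. a "callbacks" entry mapping "on_train_result" to this function; again, an assumption about the wiring rather than a statement about this repository.
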
I'll also reorganize the codebase by renaming arena/rllib_env.py to arena/arena.py and creating a new arena/utils.py file to house utility functions (like configuration helpers, ID generators, DeterministicCategorical) and constants.
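
DeterministicCategorical could, for instance, be a distribution whose sample is always the mode of the underlying categorical; the class below is a guess at the intent, not the repository's actual code:

```python
# Illustrative guess at DeterministicCategorical: sampling always returns the
# argmax of the logits rather than a random draw from the softmax.
import numpy as np


class DeterministicCategorical:
    """Categorical-like action distribution whose sample() is always the mode."""

    def __init__(self, logits):
        self.logits = np.asarray(logits, dtype=np.float64)

    def sample(self):
        # Deterministic: pick the highest-logit action for each batch row.
        return np.argmax(self.logits, axis=-1)

    def logp(self, actions):
        # Log-probability of the given actions under the softmax distribution.
        shifted = self.logits - self.logits.max(axis=-1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
        idx = np.asarray(actions, dtype=np.int64)[..., None]
        return np.take_along_axis(log_probs, idx, axis=-1).squeeze(-1)
```
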
Finally, I will update the example configuration files (Benchmark-2T1P-Discrete.yaml, Test-Pong.yaml) to remove policy_assignment and demonstrate the usage of the new parameters, including within grid_search.