Add more eval examples + filtering examples by language + fix git concurrent usage (#28719)
Release Notes:

- N/A

Co-authored-by: michael <michael@zed.dev>
Co-authored-by: agus <agus@zed.dev>
Parent: a8b1ef3531
Commit: d74f0735c2
76 changed files with 365 additions and 8 deletions
crates/eval/examples/docs_restructure/base.toml (new file)
@@ -0,0 +1,3 @@
+url = "https://github.com/YuhangSong/Arena-Baselines.git"
+revision = "801ed8566110ddc4a6ada0cc70171c636d78dbb8"
+language_extension = "py"
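The `language_extension` field is what allows eval examples to be filtered by language, as the PR title mentions. A minimal sketch of such filtering (in Python for illustration, although the eval crate itself is Rust; the function names and the second example entry are hypothetical):

```python
# Each eval example directory contains a base.toml like the one above.
# (Parsing one is a single tomllib.load() call on Python 3.11+; the dicts
# are inlined here so the sketch runs standalone.)

def filter_by_language(examples, extension):
    """Keep only examples whose repository matches the given
    language_extension value, e.g. "py"."""
    return [e for e in examples if e.get("language_extension") == extension]

examples = [
    {"url": "https://github.com/YuhangSong/Arena-Baselines.git",
     "revision": "801ed8566110ddc4a6ada0cc70171c636d78dbb8",
     "language_extension": "py"},
    # Hypothetical second example in another language:
    {"url": "https://example.com/hypothetical-rust-repo.git",
     "revision": "0000000000000000000000000000000000000000",
     "language_extension": "rs"},
]

print([e["url"] for e in filter_by_language(examples, "py")])
```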
crates/eval/examples/docs_restructure/criteria.md (new file)
@@ -0,0 +1,12 @@
+1. README.md Features Section Reorganization
+The features section has been reorganized into two subsections ("Baselines" and "Games") with markdown tables added. The previous bullet points were replaced with more structured content, including supported/benchmarked status indicators. A new "Visualization" section was added with TensorBoard and port-forwarding instructions.
+2. Content Relocation and File Restructuring
+The Tennis game documentation and action-space details were moved from README.md to a new games.md file. The README was cleaned up by removing commented-out content and consolidating documentation sections. YAML config files (Benchmark-2T1P-Discrete.yaml and Test-Pong.yaml) were modified to replace `selfplay_recent_prob` with `playing_policy_load_recent_prob` and adjust population-size options.
+3. train.py Refactoring
+Significant changes to train.py, including:
+- Renamed the `selfplay_recent_prob` parameter to `playing_policy_load_recent_prob`
+- Simplified the nested grid-search structure by removing unnecessary loops
+- Improved policy-loading logic with better checkpoint-path handling
+- Enhanced error handling and logging for policy saving/reloading
+- Removed redundant code and improved code organization
+- Added more descriptive console output during policy operations
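The YAML change in item 2 above amounts to a parameter rename plus a population-size knob. A hedged before/after fragment (the values and surrounding structure are illustrative, not copied from the actual config files):

```yaml
# Before (hypothetical excerpt):
#   selfplay_recent_prob: 0.8

# After the rename:
playing_policy_load_recent_prob: 0.8
size_population:
  grid_search: [1, 3]   # illustrative population-size options
```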
crates/eval/examples/docs_restructure/prompt.md (new file)
@@ -0,0 +1,13 @@
+I need to refactor the multi-agent configuration system in our Arena-Baselines repository. The current policy_assignment parameter (self_play, independent) is too coarse. I want to replace it with a more flexible set of parameters to better support advanced training schemes like population-based training (PBT) and sophisticated self-play with historical opponents.
+
+Specifically, I will introduce four new configuration parameters:
+
+iterations_per_reload: Controls the frequency (in training iterations) at which policies are saved and potentially reloaded.
+num_learning_policies: Explicitly defines how many agents use policies that are actively being trained (can be an integer or 'all').
+selfplay_recent_prob: For non-learning agents (players), this determines the probability of loading the latest version of a learning policy versus loading a uniformly random historical version during reloads.
+size_population: Specifies the number of distinct policy versions maintained for each learning agent, enabling PBT-style experiments.
+To implement this, I will significantly modify train.py. This includes updating the argument parser, changing how experiment configurations are expanded (especially with grid_search), and implementing a new callback function (on_train_result). This callback will handle the periodic saving (using pickle) of learning policies to structured directories and the reloading of all policies (learning and playing) according to the new parameters (iterations_per_reload, selfplay_recent_prob, size_population). Playing policies will use deterministic actions.
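The save/reload behavior the prompt describes can be sketched in Python. This is an illustrative sketch under stated assumptions: the function names, directory layout, and weights format are hypothetical, and the real implementation would live inside an RLlib on_train_result callback rather than standalone helpers.

```python
import os
import pickle
import random

def save_learning_policy(policy_dir, weights, version):
    """Pickle a learning policy's weights into its version directory."""
    os.makedirs(policy_dir, exist_ok=True)
    with open(os.path.join(policy_dir, f"v{version:06d}.pkl"), "wb") as f:
        pickle.dump(weights, f)

def reload_playing_policy(policy_dir, selfplay_recent_prob):
    """Choose which saved version a playing (non-learning) policy loads:
    the most recent with probability selfplay_recent_prob, otherwise a
    uniformly random historical version."""
    versions = sorted(os.listdir(policy_dir))
    if not versions:
        return None
    if random.random() < selfplay_recent_prob:
        chosen = versions[-1]             # latest checkpoint
    else:
        chosen = random.choice(versions)  # uniform over history
    with open(os.path.join(policy_dir, chosen), "rb") as f:
        return pickle.load(f)

def should_reload(iteration, iterations_per_reload):
    """Saves/reloads happen only every iterations_per_reload iterations."""
    return iteration % iterations_per_reload == 0
```

With `selfplay_recent_prob = 1.0` this degenerates to always playing against the latest policy; lower values mix in historical opponents.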
+
+I'll also reorganize the codebase by renaming arena/rllib_env.py to arena/arena.py and creating a new arena/utils.py file to house utility functions (like configuration helpers, ID generators, DeterministicCategorical) and constants.
+
+Finally, I will update the example configuration files (Benchmark-2T1P-Discrete.yaml, Test-Pong.yaml) to remove policy_assignment and demonstrate the usage of the new parameters, including within grid_search.
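The config usage the prompt's final paragraph asks for might look like the following hedged YAML fragment (keys match the four parameters introduced above; the values and grid_search choices are hypothetical, not taken from the actual files):

```yaml
# Hypothetical excerpt from a Test-Pong.yaml-style config:
iterations_per_reload: 10
num_learning_policies: 1        # or "all"
selfplay_recent_prob:
  grid_search: [0.8, 1.0]
size_population:
  grid_search: [1, 3]
```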