
Now that we've established a proper eval in tree, this PR reboots our agent loop back to a set of minimal tools and simpler prompts. We should aim to get this branch feeling subjectively competitive with what's on main, then merge it and build from there.

Let's invest in our eval and use it to drive better performance of the agent loop. How you can help: pick an example, then make the outcome faster or better. It's fine to use your own subjective judgment, as our evaluation criteria likely need tuning as well at this point. Focus first on making the agent work better in your own subjective experience. Let's start with simple, practical improvements, then determine how we can craft our judgment criteria to lock those improvements in.

Release Notes:

- N/A

---------

Co-authored-by: Max <max@zed.dev>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Agus <agus@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Michael Sloan <mgsloan@gmail.com>
- The changes involve renaming the `TestData` class to `LabeledData` across multiple files. This includes updating the import statements in `__init__.py`, `cache.py`, `router.py`, `schema.py`, and `utils.py` to reflect this new class name. The `__all__` list in `__init__.py` is also updated to export `LabeledData` instead of `TestData`. This appears to be a conceptual renaming to better reflect the purpose of the data structure.
- The modifications update all function signatures and type hints that previously used `TestData` to now use `LabeledData`. This affects several functions in `cache.py` including `_generate_run_cache`, `_eval_cache`, and `_grid_search_opt_cache`, as well as functions in `router.py` like `_generate_run_router` and `_eval_router`. The utility functions in `utils.py` are also updated to work with `LabeledData` instead of `TestData`.
- The changes introduce a new `search_step` parameter in the router optimization logic within `router.py`, with a default value of 0.10. This parameter is passed through to the `_router_random_search` function and is used in the optimization process. The test file `test_threshold_optimizer.py` is updated to explicitly set this parameter to 0.5 when calling the optimize method, demonstrating how it can be configured for different search granularities during threshold optimization.
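
To illustrate how a step size can control search granularity, here is a minimal sketch. It is not the code from `router.py`: the summary does not say how `_router_random_search` consumes `search_step`, so this assumes the step sets the spacing of a candidate grid on [0, 1] from which thresholds are sampled. The names `candidate_thresholds`, `random_search`, `score`, `n_trials`, and `seed` are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def candidate_thresholds(search_step: float = 0.10) -> List[float]:
    """Return thresholds in [0, 1] spaced by search_step (assumed semantics)."""
    if not 0 < search_step <= 1:
        raise ValueError("search_step must be in (0, 1]")
    # Number of whole steps that fit in [0, 1]; epsilon guards float error.
    n_steps = int(1 / search_step + 1e-9)
    # Round to hide float noise such as 0.30000000000000004.
    return [round(i * search_step, 10) for i in range(n_steps + 1)]


def random_search(
    score: Callable[[float], float],
    search_step: float = 0.10,
    n_trials: int = 20,
    seed: int = 0,
) -> Tuple[float, float]:
    """Sample thresholds from the grid and keep the best-scoring one."""
    rng = random.Random(seed)
    grid = candidate_thresholds(search_step)
    best_threshold, best_score = grid[0], float("-inf")
    for _ in range(n_trials):
        threshold = rng.choice(grid)
        current = score(threshold)
        if current > best_score:
            best_threshold, best_score = threshold, current
    return best_threshold, best_score


# Hypothetical objective that peaks at 0.5, for demonstration only.
def demo_score(threshold: float) -> float:
    return -abs(threshold - 0.5)


coarse_grid = candidate_thresholds(0.5)   # three candidates
fine_grid = candidate_thresholds(0.10)    # eleven candidates
best_threshold, best_score = random_search(demo_score, search_step=0.5)
```

Under this assumption, a step of 0.5 leaves only three candidates, which keeps a test fast and predictable, while the default of 0.10 searches eleven.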