ZIm/crates/eval/examples/optimizer_schema_refactor/diff_criteria.md
Nathan Sobo bab28560ef
Systematically optimize agentic editing performance (#28961)
Now that we've established a proper eval in tree, this PR reboots our
agent loop back to a set of minimal tools and simpler prompts. We
should aim to get this branch feeling subjectively competitive with
what's on main, then merge it and build from there.

Let's invest in our eval and use it to drive better performance of the
agent loop. How you can help: Pick an example, and then make the outcome
faster or better. It's fine to even use your own subjective judgment, as
our evaluation criteria likely need tuning as well at this point. Focus
on making the agent work better in your own subjective experience first.
Let's focus on simple/practical improvements to make this thing work
better, then determine how we can craft our judgment criteria to lock
those improvements in.

Release Notes:

- N/A

---------

Co-authored-by: Max <max@zed.dev>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Agus <agus@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Michael Sloan <mgsloan@gmail.com>
2025-04-19 02:47:59 +00:00


  1. The changes involve renaming the TestData class to LabeledData across multiple files. This includes updating the import statements in __init__.py, cache.py, router.py, schema.py, and utils.py to reflect this new class name. The __all__ list in __init__.py is also updated to export LabeledData instead of TestData. This appears to be a conceptual renaming to better reflect the purpose of the data structure.
  2. The modifications update all function signatures and type hints that previously used TestData to now use LabeledData. This affects several functions in cache.py including _generate_run_cache, _eval_cache, and _grid_search_opt_cache, as well as functions in router.py like _generate_run_router and _eval_router. The utility functions in utils.py are also updated to work with LabeledData instead of TestData.
  3. The changes introduce a new search_step parameter in the router optimization logic within router.py, with a default value of 0.10. This parameter is passed through to the _router_random_search function and is used in the optimization process. The test file test_threshold_optimizer.py is updated to explicitly set this parameter to 0.5 when calling the optimize method, demonstrating how it can be configured for different search granularities during threshold optimization.
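
The criteria above can be illustrated with a minimal sketch. This is not the project's actual code: the fields of `LabeledData`, the `accuracy` helper, and the `random_search` function (including its `start`, `n_trials`, and `seed` parameters) are hypothetical stand-ins. Only the class name `LabeledData`, the parameter name `search_step`, its default of 0.10, and the test value of 0.5 come from the criteria.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledData:
    """Hypothetical stand-in for the renamed class (formerly TestData)."""
    scores: List[float]  # assumed: similarity scores in [0, 1]
    labels: List[bool]   # assumed: True when a score should pass the threshold


def accuracy(data: LabeledData, threshold: float) -> float:
    """Fraction of examples classified correctly at the given threshold."""
    correct = sum((s >= threshold) == y for s, y in zip(data.scores, data.labels))
    return correct / len(data.scores)


def random_search(
    data: LabeledData,
    start: float = 0.5,
    search_step: float = 0.10,  # default taken from the criteria
    n_trials: int = 50,
    seed: int = 0,
) -> float:
    """Illustrative random search: perturb the best threshold by at most search_step."""
    rng = random.Random(seed)
    best_threshold = start
    best_accuracy = accuracy(data, best_threshold)
    for _ in range(n_trials):
        candidate = best_threshold + rng.uniform(-search_step, search_step)
        candidate = min(1.0, max(0.0, candidate))  # keep the threshold in [0, 1]
        candidate_accuracy = accuracy(data, candidate)
        if candidate_accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, candidate_accuracy
    return best_threshold


data = LabeledData(scores=[0.1, 0.2, 0.8, 0.9], labels=[False, False, True, True])
# A larger step (0.5, as in the updated test) explores a wider neighborhood per trial.
threshold = random_search(data, start=0.95, search_step=0.5)
```

In this sketch a smaller `search_step` refines the threshold locally, while a larger one trades precision for broader exploration, which is one plausible reading of "search granularities" in the third criterion.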