# Tool Evals

A framework for evaluating and benchmarking agent panel generations.

## Overview

Tool Evals provides a headless environment for running assistant evaluations on code repositories. It automates the process of:
- Setting up test code and repositories
- Sending prompts to language models
- Allowing the assistant to use tools to modify code
- Collecting metrics on performance and tool usage
- Evaluating results against known good solutions
## How It Works

The system consists of several key components:
- Eval: Loads exercises from the zed-ace-framework repository, creates temporary repos, and executes evaluations
- HeadlessAssistant: Provides a headless environment for running the AI assistant
- Judge: Evaluates AI-generated solutions against reference implementations and assigns scores
- Templates: Defines evaluation frameworks for different tasks (Project Creation, Code Modification, Conversational Guidance)
## Setup Requirements

### Prerequisites
- Rust and Cargo
- Git
- Python (for report generation)
- Network access to clone repositories
- Appropriate API keys for language models and git services (Anthropic, GitHub, etc.)
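To sanity-check that the core tooling is available before running anything, something like the following works in most shells (minimum versions aren't pinned here, so treat this as a quick check rather than a hard requirement):

```sh
# Confirm the basic toolchain is on PATH
rustc --version
cargo --version
git --version
python3 --version   # or `python`, depending on your system
```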
### Environment Variables

Ensure you have the required API keys set, either from a dev run of Zed or via these environment variables:

- `ZED_ANTHROPIC_API_KEY` for Claude models
- `ZED_GITHUB_API_KEY` for the GitHub API (or similar)
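For a one-off run, the simplest approach is to export the keys in the shell session that will invoke the eval. The variable names below are the ones listed above; the placeholder values are obviously not real keys:

```sh
# Export API keys for the current shell session
export ZED_ANTHROPIC_API_KEY="<your-anthropic-api-key>"
export ZED_GITHUB_API_KEY="<your-github-token>"
```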
## Usage

### Running Evaluations

```sh
# Run all tests
cargo run -p assistant_eval -- --all

# Run only specific languages
cargo run -p assistant_eval -- --all --languages python,rust

# Limit concurrent evaluations
cargo run -p assistant_eval -- --all --concurrency 5

# Limit the number of exercises per language
cargo run -p assistant_eval -- --all --max-exercises-per-language 3
```
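The flags above appear to compose, so a quick smoke-test run that limits scope in every dimension might look like this (an assumption based on the examples above, not a separately documented mode):

```sh
# Small smoke-test run: Rust exercises only, low concurrency, one exercise per language
cargo run -p assistant_eval -- --all --languages rust --concurrency 2 --max-exercises-per-language 1
```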
## Evaluation Template Types

The system supports three types of evaluation templates:
- ProjectCreation: Tests the model's ability to create new implementations from scratch
- CodeModification: Tests the model's ability to modify existing code to meet new requirements
- ConversationalGuidance: Tests the model's ability to provide guidance without writing code
## Support Repo

The `zed-industries/zed-ace-framework` repository contains the analytics and reporting scripts.
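To generate reports, a reasonable starting point is to clone that repository and follow its own README; the URL below is simply the standard GitHub address for the repo named above:

```sh
# Fetch the analytics/reporting scripts (see that repo's README for usage)
git clone https://github.com/zed-industries/zed-ace-framework.git
```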