Add new action to run agent eval (#29158)
The old one wasn't linking, and https://github.com/zed-industries/zed/pull/29081 has a bunch of merge conflicts. Wanted to start simple/small. ## Todo * [x] Remove low-signal examples * [x] Make the eval run on a cron, on main, and on any PR with the `run-eval` label * [x] Noise in logs about failure to write settings ``` [2025-04-21T20:45:04Z ERROR settings] Failed to write settings to file "/home/runner/.config/zed/settings.json" Caused by: No such file or directory (os error 2) at path "/home/runner/.config/zed/.tmpLewFEs" ``` * [x] `Agentic loop stalled` (https://github.com/zed-industries/zed/actions/runs/14581044243/job/40897622894) * [x] Make sure that events are recorded in snowflake * [ ] Change judge criteria to be more explicit about meanings of scores Release Notes: - N/A --------- Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Agus Zubiaga <hi@aguz.me> Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com> Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
This commit is contained in:
parent
b14356d1d3
commit
458ffaa134
58 changed files with 291 additions and 385 deletions
77
.github/workflows/eval.yml
vendored
Normal file
77
.github/workflows/eval.yml
vendored
Normal file
|
@ -0,0 +1,77 @@
|
|||
name: Run Agent Eval
|
||||
|
||||
on:
|
||||
schedule:
|
||||
- cron: "0 * * * *"
|
||||
push:
|
||||
branches:
|
||||
- main
|
||||
- "v[0-9]+.[0-9]+.x"
|
||||
tags:
|
||||
- "v*"
|
||||
|
||||
pull_request:
|
||||
branches:
|
||||
- "**"
|
||||
types: [opened, synchronize, reopened, labeled]
|
||||
|
||||
workflow_dispatch:
|
||||
|
||||
concurrency:
|
||||
# Allow only one workflow per any non-`main` branch.
|
||||
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
CARGO_TERM_COLOR: always
|
||||
CARGO_INCREMENTAL: 0
|
||||
RUST_BACKTRACE: 1
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
ZED_CLIENT_CHECKSUM_SEED: ${{ secrets.ZED_CLIENT_CHECKSUM_SEED }}
|
||||
ZED_EVAL_TELEMETRY: 1
|
||||
|
||||
jobs:
|
||||
run_eval:
|
||||
timeout-minutes: 60
|
||||
name: Run Agent Eval
|
||||
if: >
|
||||
github.repository_owner == 'zed-industries' &&
|
||||
(github.event_name != 'pull_request' || contains(github.event.pull_request.labels.*.name, 'run-eval'))
|
||||
runs-on:
|
||||
- buildjet-16vcpu-ubuntu-2204
|
||||
steps:
|
||||
- name: Add Rust to the PATH
|
||||
run: echo "$HOME/.cargo/bin" >> $GITHUB_PATH
|
||||
|
||||
- name: Checkout repo
|
||||
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
|
||||
with:
|
||||
clean: false
|
||||
|
||||
- name: Cache dependencies
|
||||
uses: swatinem/rust-cache@9d47c6ad4b02e050fd481d890b2ea34778fd09d6 # v2
|
||||
with:
|
||||
save-if: ${{ github.ref == 'refs/heads/main' }}
|
||||
cache-provider: "buildjet"
|
||||
|
||||
- name: Install Linux dependencies
|
||||
run: ./script/linux
|
||||
|
||||
- name: Configure CI
|
||||
run: |
|
||||
mkdir -p ./../.cargo
|
||||
cp ./.cargo/ci-config.toml ./../.cargo/config.toml
|
||||
|
||||
- name: Compile eval
|
||||
run: cargo build --package=eval
|
||||
|
||||
- name: Run eval
|
||||
run: cargo run --package=eval
|
||||
|
||||
# Even the Linux runner is not stateful, in theory there is no need to do this cleanup.
|
||||
# But, to avoid potential issues in the future if we choose to use a stateful Linux runner and forget to add code
|
||||
# to clean up the config file, I’ve included the cleanup code here as a precaution.
|
||||
# While it’s not strictly necessary at this moment, I believe it’s better to err on the side of caution.
|
||||
- name: Clean CI config file
|
||||
if: always()
|
||||
run: rm -rf ./../.cargo
|
28
.github/workflows/run_agent_eval_daily.yml
vendored
28
.github/workflows/run_agent_eval_daily.yml
vendored
|
@ -1,28 +0,0 @@
|
|||
name: Run Eval Daily
|
||||
|
||||
on:
|
||||
schedule:
|
||||
- cron: "0 2 * * *"
|
||||
workflow_dispatch:
|
||||
|
||||
env:
|
||||
CARGO_TERM_COLOR: always
|
||||
CARGO_INCREMENTAL: 0
|
||||
RUST_BACKTRACE: 1
|
||||
|
||||
jobs:
|
||||
run_eval:
|
||||
name: Run Eval
|
||||
if: github.repository_owner == 'zed-industries'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout repo
|
||||
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
|
||||
with:
|
||||
clean: false
|
||||
|
||||
- name: Setup Rust
|
||||
uses: dtolnay/rust-toolchain@stable
|
||||
|
||||
- name: Run cargo eval
|
||||
run: cargo run -p eval
|
Loading…
Add table
Add a link
Reference in a new issue