This PR fixes some broken links that where found using
[lychee](https://github.com/lycheeverse/lychee/) as discussed today with
@JosephTLyons and @nathansobo at the RustNL hackathon. Using
[lychee-action](https://github.com/lycheeverse/lychee-action/) we can
scan for broken links daily to prevent issues in the future.
There are still 6 broken links that I didn't know how to fix myself.
See https://github.com/thomas-zahner/zed/actions/runs/15075808232 for
details.
## Missing images
```
Errors in ./docs/src/channels.md
[ERROR] file:///home/runner/work/zed/zed/docs/.gitbook/assets/channels-3.png | Cannot find file
[ERROR] file:///home/runner/work/zed/zed/docs/.gitbook/assets/channels-1.png | Cannot find file
[ERROR] file:///home/runner/work/zed/zed/docs/.gitbook/assets/channels-2.png | Cannot find file
```
These errors are showing up as missing images on
https://zed.dev/docs/channels
I tried to search the git history to see when or why they were deleted
but didn't find anything.
## ./crates/assistant_tools/src/edit_agent/evals/fixtures/zode/prompt.md
There are three errors in that file. I don't fully understand how these
issues were caused historically. Technically it would be possible to
ignore the files but of course if possible we should address the issues.
Release Notes:
- N/A
---------
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
Co-authored-by: Ben Kunkle <ben@zed.dev>
Replace hardcoded 0.10 threshold with configurable parameter and set
0.05 default for most tests, with 0.2 for from_pixels_constructor
eval that produces more mismatched tags.
Release Notes:
- N/A
We run the unit evals once a day in the middle of the night, and trigger
a Slack post if it fails.
Release Notes:
- N/A
---------
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
Adds a validation step to docs preprocessing so that actions referenced
in docs are checked against the list of all registered actions in GPUI.
In order for this to work properly, all of the crates that register
actions had to be importable by the `docs_preprocessor` crate and
actually used (see [this
comment](ec16e70336 (diff-2674caf14ae6d70752ea60c7061232393d84e7f61a52915ace089c30a797a1c3))
for why this is challenging).
In order to accomplish this I have moved the entry point of zed into a
separate stub file named `zed_main.rs` so that `main.rs` is importable
by the `docs_preprocessor` crate, this is kind of gross, but ensures
that all actions that are registered in the application are registered
when checking them in `docs_preprocessor`. An alternative solution
suggested by @mikayla-maki was to separate out all our `::init()`
functions into a lib entry point in the `zed` crate that can be imported
instead, however, this turned out to be a far bigger refactor and is in
my opinion better to do in a follow up PR with significant testing to
ensure no regressions in behavior occur.
Release Notes:
- N/A
Allow colons after issue links and for them to in ul.
Change ci references from [self-hosted, test] to more explicit
[self-hosted, macOS]
Release Notes:
- N/A
---------
Co-authored-by: Ben Kunkle <ben.kunkle@gmail.com>
Co-authored-by: Smit Barmase <heysmitbarmase@gmail.com>
This should result in some additional cache hits as I personally use
garnix.
Also added `-v` cachix arg to try to figure out why CI jobs aren't
pushing any paths. Right now they just show ["Pushing is
disabled."](https://github.com/zed-industries/zed/actions/runs/15293723678/job/43018512167#step:13:3)
but I'm not sure if that's due to the `pushFilter` or misconfigured
secrets.
Release Notes:
- N/A
Now that the nix build is working again, re-enable nightly builds and
refactor the workflow for re-use between nightly releases and CI jobs.
Release Notes:
- N/A
---------
Co-authored-by: Rahul Butani <rrbutani@users.noreply.github.com>
This PR updates the Zed LLM provider to fetch the available models from
the server instead of hard-coding them in the binary.
Release Notes:
- Updated the Zed provider to fetch the list of available language
models from the server.
This is based on having observed that there is a lot of variation
between runs on `n=1` and `n=3`.
* With `n=8` two runs on the same branch give answers that seem close
enough to be reasonably consistent.
* With higher concurrency, trying to run this many repetitions seems to
lead language servers to time out a lot, causing evals to fail.
Release Notes:
- N/A
This PR adds a no-op job for the "Run Agent Eval" workflow.
This aims to avoid marking the check as failed on a PR that does not
include the `run-eval` label.
Release Notes:
- N/A
### Problem
We want to start continuously tracking our progress on agent evals over
time. As part of this, we'd like the *score* to have a clear,
interpretable meaning. Right now, it's a number from 0 to 5, but it's
not clear what any particular number works. In addition, scores vary
widely from run to run, because the agent's output is deterministic. We
try to stabilize the score using a panel of judges, but the behavior of
the agent itself varies much more widely than the judges' scores for a
given run.
### Solution
* **explicit meanings of scores** - In this PR, we're prescribing the
diff and thread criteria files so that they *must* be unordered lists of
assertions. For both the thread and the diff, rather than providing an
abstract score, the judge's task is simply to count how many of these
assertions are satisfied. A percentage score can be derived from this
number, divided by the total number of assertions.
* **repetitions** - Rather than running each example once, and judging
it N times, we'll **run** the example N times. Right now, I'm just
judging the output once per run, because I believe that with these more
clear scoring criteria, the main source of non-determinism will be the
*agent's* behavior, not the judge's
### Questions
* **accounting for diagnostic errors** - Previously, the judge was asked
to incorporate diagnostics into their abstract scores. Now that the
"score" is determined directly from the criteria, the diagnostic will
not be captured in the score. How should the diagnostics be accounted
for in the eval? One thought is - let's simply count and report the
number of errors remaining after the agent finishes, as a separate field
of the run (along with diff score and thread score). We could consider
normalizing it using the total lines of added code (like errors per 100
lines of code added) in order to give it some semblance of stability
between examples.
* **repetitions** - How many repetitions should we run on CI? Each
repetition takes significant time, but I think running more than one
repetition will make the scores significantly less volatile.
### Todo
* [x] Fix `--concurrency` implementation so that only N tasks are
spawned
* [x] Support `--repetitions` efficiently (re-using the same worktree)
* [x] Restructure judge prompts to count passing criteria, not compute
abstract score
* [x] Report total number of diagnostics in some way
* [x] Format output nicely
Release Notes:
- N/A *or* Added/Fixed/Improved ...
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
The old one wasn't linking, and
https://github.com/zed-industries/zed/pull/29081 has a bunch of merge
conflicts. Wanted to start simple/small.
## Todo
* [x] Remove low-signal examples
* [x] Make the eval run on a cron, on main, and on any PR with the
`run-eval` label
* [x] Noise in logs about failure to write settings
```
[2025-04-21T20:45:04Z ERROR settings] Failed to write settings to file
"/home/runner/.config/zed/settings.json"
Caused by:
No such file or directory (os error 2) at path
"/home/runner/.config/zed/.tmpLewFEs"
```
* [x] `Agentic loop stalled`
(https://github.com/zed-industries/zed/actions/runs/14581044243/job/40897622894)
* [x] Make sure that events are recorded in snowflake
* [ ] Change judge criteria to be more explicit about meanings of scores
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
This PR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
| [actions/setup-node](https://redirect.github.com/actions/setup-node) |
action | digest | `cdca736` -> `49933ea` |
---
### Configuration
📅 **Schedule**: Branch creation - "after 3pm on Wednesday" in timezone
America/New_York, Automerge - At any time (no schedule defined).
🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.
♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.
---
- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box
---
Release Notes:
- N/A
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4yMzguMCIsInVwZGF0ZWRJblZlciI6IjM5LjIzOC4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6W119-->
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This PR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
| [actions/checkout](https://redirect.github.com/actions/checkout) |
action | pinDigest | -> `11bd719` |
---
### Configuration
📅 **Schedule**: Branch creation - "after 3pm on Wednesday" in timezone
America/New_York, Automerge - At any time (no schedule defined).
🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.
♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.
---
- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box
---
Release Notes:
- N/A
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4yMzguMCIsInVwZGF0ZWRJblZlciI6IjM5LjIzOC4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6W119-->
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Improve the logic in around release artifact bundling.
- Suppress a harmless "error: no such command: `about`" from
script/generate-licenes output
- Remove checks for main branch (which will never be true)
- Only run `Upload Artifacts to release` when not using `run-bundling`.
Prevents the creation of draft releases with just linux remote server binaries)
Release Notes:
- N/A
This PR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
|
[actions/dependency-review-action](https://redirect.github.com/actions/dependency-review-action)
| action | digest | `3b139cf` -> `67d4f4b` |
---
### Configuration
📅 **Schedule**: Branch creation - "after 3pm on Wednesday" in timezone
America/New_York, Automerge - At any time (no schedule defined).
🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.
♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.
---
- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box
---
Release Notes:
- N/A
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4yMzguMCIsInVwZGF0ZWRJblZlciI6IjM5LjIzOC4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6W119-->
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Make various improvements to our github issue templates.
- Adjust line lengths to not wrap in constrained new issue view (85
cols) not just full screen view (95 columns)
- Remove reference to drag/drop logs to upload (recently multiple issues
with dead upload links)
- Cleanup list view
Release Notes:
- N/A
This adds a nix CI job to build the flake in debug mode for
aarch64-darwin and x86-linux. For now this job will only run when the
`run-nix` label is added to a PR.
The CI job doesn't push to cachix for now, so every build is a clean
build.
I also added a condition to the garbage collection step so it only runs
when the nix store is >50GB.
Release Notes:
- N/A
This adds a "workspace-hack" crate, see
[mozilla's](https://hg.mozilla.org/mozilla-central/file/3a265fdc9f33e5946f0ca0a04af73acd7e6d1a39/build/workspace-hack/Cargo.toml#l7)
for a concise explanation of why this is useful. For us in practice this
means that if I were to run all the tests (`cargo nextest r
--workspace`) and then `cargo r`, all the deps from the previous cargo
command will be reused. Before this PR it would rebuild many deps due to
resolving different sets of features for them. For me this frequently
caused long rebuilds when things "should" already be cached.
To avoid manually maintaining our workspace-hack crate, we will use
[cargo hakari](https://docs.rs/cargo-hakari) to update the build files
when there's a necessary change. I've added a step to CI that checks
whether the workspace-hack crate is up to date, and instructs you to
re-run `script/update-workspace-hack` when it fails.
Finally, to make sure that people can still depend on crates in our
workspace without pulling in all the workspace deps, we use a `[patch]`
section following [hakari's
instructions](https://docs.rs/cargo-hakari/0.9.36/cargo_hakari/patch_directive/index.html)
One possible followup task would be making guppy use our
`rust-toolchain.toml` instead of having to duplicate that list in its
config, I opened an issue for that upstream: guppy-rs/guppy#481.
TODO:
- [x] Fix the extension test failure
- [x] Ensure the dev dependencies aren't being unified by Hakari into
the main dependencies
- [x] Ensure that the remote-server binary continues to not depend on
LibSSL
Release Notes:
- N/A
---------
Co-authored-by: Mikayla <mikayla@zed.dev>
Co-authored-by: Mikayla Maki <mikayla.c.maki@gmail.com>
- bump our livekit version to include a fix for a crane bug (TODO: add
link when an issue is filed on crane)
- switch to a clang stdenv for both linux and macos
- manually unify versions of our notify crate
- remove old linker flags which were only needed for livekit
- fix an issue where RUSTFLAGS shadowed the rustflags from cargo configs
Release Notes:
- N/A
Swift bindings BEGONE
Release Notes:
- Switched from using the Swift LiveKit bindings, to the Rust bindings,
fixing https://github.com/zed-industries/zed/issues/9396, a crash when
leaving a collaboration session, and making Zed easier to build.
---------
Co-authored-by: Conrad Irwin <conrad.irwin@gmail.com>
Co-authored-by: Michael Sloan <michael@zed.dev>