agent: Migrate thread storage to SQLite with zstd compression (#31741)

Previously, LMDB was used for storing threads, but it consumed excessive
disk space and was capped at 1GB.

This change migrates thread storage to an SQLite database. Thread JSON
objects are now compressed using zstd.

I considered training a custom zstd dictionary and storing it in a
separate table. However, the additional complexity outweighed the modest
space savings (up to 20%). I ended up using the default dictionary
stored with data.

Threads can be exported relatively easily from outside the application:

```
$ sqlite3 threads.db "SELECT hex(data) FROM threads LIMIT 5;" |
    xxd -r -p |
    zstd -d |
    fx
```

Benchmarks:
- Original heed database: 200MB
- Sqlite uncompressed: 51MB
- sqlite compressed (this PR): 4.0MB
- sqlite compressed with a trained dictionary: 3.8MB


Release Notes:

- Migrated thread storage to SQLite with compression
This commit is contained in:
Oleksiy Syvokon 2025-06-02 17:01:34 +03:00 committed by GitHub
parent 9a9e96ed5a
commit c874f1fa9d
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
3 changed files with 215 additions and 56 deletions

View file

@ -46,6 +46,7 @@ git.workspace = true
gpui.workspace = true
heed.workspace = true
html_to_markdown.workspace = true
indoc.workspace = true
http_client.workspace = true
indexed_docs.workspace = true
inventory.workspace = true
@ -78,6 +79,7 @@ serde_json.workspace = true
serde_json_lenient.workspace = true
settings.workspace = true
smol.workspace = true
sqlez.workspace = true
streaming_diff.workspace = true
telemetry.workspace = true
telemetry_events.workspace = true
@ -97,6 +99,7 @@ workspace-hack.workspace = true
workspace.workspace = true
zed_actions.workspace = true
zed_llm_client.workspace = true
zstd.workspace = true
[dev-dependencies]
buffer_diff = { workspace = true, features = ["test-support"] }