Reduce memory usage to represent buffers by up to 50% (#10321)

This should help with some of the memory problems reported in https://github.com/zed-industries/zed/issues/8436, especially the ones related to large files (see: https://github.com/zed-industries/zed/issues/8436#issuecomment2037442695), by **reducing the memory required to represent a buffer in Zed by ~50%.**

### How?

Zed's memory consumption is dominated by the in-memory representation of buffer contents.

At the lowest level, the buffer is represented as a [Rope](https://en.wikipedia.org/wiki/Rope_(data_structure)) and that's where most of the memory is used. The layers above (buffer, syntax map, fold map, display map, ...) use almost no memory compared to the Rope.

Zed's `Rope` data structure is itself implemented as [a `SumTree` of `Chunks`](8205c52d2b/crates/rope/src/rope.rs (L35-L38)).

An important constant at play here is `CHUNK_BASE`:

`CHUNK_BASE` is the maximum length of a single text `Chunk` in the `SumTree` underlying the `Rope`. In other words: it determines how many pieces a given buffer is split into.

By changing `CHUNK_BASE` we can adjust the granularity with which we index a given piece of text. The theoretical maximum is the length of the text, the theoretical minimum is 1. The sweet spot is somewhere in between, where memory use and the performance of write & read access are optimal.

We started with `16` as the `CHUNK_BASE`, but that wasn't the result of extensive benchmarks; it was more the first reasonable number that came to mind.

### What

This changes `CHUNK_BASE` from `16` to `64`. That reduces memory usage in exchange for a slight reduction in performance in certain benchmarks.

### Benchmarks

I added a benchmark suite for `Rope` to determine whether we'd regress in performance as `CHUNK_BASE` goes up. I went from `16` to `32` and then to `64`. While `32` increased performance and reduced memory usage, `64` showed a slight drop in performance in one benchmark, increases in the others, and substantial memory savings.
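To make the granularity trade-off behind `CHUNK_BASE` concrete, here is a minimal toy sketch. It is not Zed's `Rope` or `SumTree`: `split_into_chunks`, `estimated_bytes`, `max_chunk_len` and `overhead_per_chunk` are hypothetical names introduced for illustration, and the per-chunk overhead is an assumed parameter, not a measured value. It only shows that a larger chunk limit means fewer chunks, and therefore less per-chunk bookkeeping for the same text.

```rust
/// Toy model, not Zed's `Rope`: split `text` into pieces of at most
/// `max_chunk_len` bytes, never cutting a UTF-8 character in half.
fn split_into_chunks(text: &str, max_chunk_len: usize) -> Vec<&str> {
    // A UTF-8 character is at most 4 bytes, so every chunk can hold one.
    assert!(max_chunk_len >= 4, "a chunk must be able to hold any UTF-8 char");
    let mut chunks = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Start at the size limit and back up to the nearest char boundary.
        let mut end = rest.len().min(max_chunk_len);
        while !rest.is_char_boundary(end) {
            end -= 1;
        }
        let (chunk, tail) = rest.split_at(end);
        chunks.push(chunk);
        rest = tail;
    }
    chunks
}

/// Hypothetical cost model: every chunk carries `overhead_per_chunk` bytes of
/// bookkeeping (an assumed figure) on top of the text bytes themselves.
fn estimated_bytes(text: &str, max_chunk_len: usize, overhead_per_chunk: usize) -> usize {
    let chunk_count = split_into_chunks(text, max_chunk_len).len();
    text.len() + chunk_count * overhead_per_chunk
}
```

In this model, 1024 ASCII bytes split with a limit of 16 yield 64 chunks, while a limit of 64 yields 16 chunks, so any fixed per-chunk cost is paid a quarter as often. The real savings depend on the actual node and summary sizes in the `SumTree`, which this sketch does not attempt to reproduce.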
| `CHUNK_BASE` from `16` to `32` | `CHUNK_BASE` from `16` to `64` |
|-------------------|--------------------|
| ![chunk_base_16_to_32](https://github.com/zed-industries/zed/assets/1185253/fcf1f9c6-4f43-4e44-8ef5-29c1e5d8e2b9) | ![chunk_base_16_to_64](https://github.com/zed-industries/zed/assets/1185253/d82a0478-eeef-43d0-9240-e0aa9df8d946) |

### Real World Results

We tested this by loading a 138 MB `*.tex` file (parsed as plain text) into Zed and measuring the allocations in `Instruments.app`.

#### standard allocator

Before, with `CHUNK_BASE: 16`, the memory usage was ~827MB after loading the buffer.

| `CHUNK_BASE: 16` |
|---------------------|
| ![memory_consumption_chunk_base_16_std_alloc](https://github.com/zed-industries/zed/assets/1185253/c1e04c34-7d1a-49fa-bb3c-6ad10aec6e26) |

After, with `CHUNK_BASE: 64`, the memory usage was ~396MB after loading the buffer.

| `CHUNK_BASE: 64` |
|---------------------|
| ![memory_consumption_chunk_base_64_std_alloc](https://github.com/zed-industries/zed/assets/1185253/c728e134-1846-467f-b20f-114a582c7b5a) |

#### `mimalloc`

Zed uses `MiMalloc` by default, and that seems to be pretty aggressive when it comes to growing memory. Whereas the std allocator would go up to ~800MB, `MiMalloc` would jump straight to 1024MB.

I also can't get `MiMalloc` to work properly with `Instruments.app` (it always shows 15MB of memory usage), so I had to use these `Activity Monitor` screenshots:

| `CHUNK_BASE: 16` |
|---------------------|
| ![memory_consumption_chunk_base_16_mimalloc](https://github.com/zed-industries/zed/assets/1185253/1e6e05e9-80c2-4ec7-9b0e-8a6fa78836eb) |

| `CHUNK_BASE: 64` |
|---------------------|
| ![memory_consumption_chunk_base_64_mimalloc](https://github.com/zed-industries/zed/assets/1185253/8a47e982-a675-4db0-b690-d60f1ff9acc8) |

### Release Notes

Release Notes:

- Reduced memory usage for files by up to 50%.

---------

Co-authored-by: Antonio <antonio@zed.dev>
2024-04-09 18:07:53 +02:00
commit 0533923f91
parent b6857ca469
4 changed files with 286 additions and 1 deletions
--- a/crates/rope/Cargo.toml
+++ b/crates/rope/Cargo.toml
@@ -23,3 +23,8 @@ util.workspace = true
 gpui = { workspace = true, features = ["test-support"] }
 rand.workspace = true
 util = { workspace = true, features = ["test-support"] }
+criterion = { version = "0.4", features = ["html_reports"] }
+
+[[bench]]
+name = "rope_benchmark"
+harness = false
--- a/crates/rope/benches/rope_benchmark.rs
+++ b/crates/rope/benches/rope_benchmark.rs
@@ -0,0 +1,144 @@
+use std::ops::Range;
+
+use criterion::{criterion_group, criterion_main, BatchSize, BenchmarkId, Criterion, Throughput};
+use rand::prelude::*;
+use rand::rngs::StdRng;
+use rope::Rope;
+use util::RandomCharIter;
+
+fn generate_random_text(mut rng: StdRng, text_len: usize) -> String {
+    RandomCharIter::new(&mut rng).take(text_len).collect()
+}
+
+fn generate_random_rope(rng: StdRng, text_len: usize) -> Rope {
+    let text = generate_random_text(rng, text_len);
+    let mut rope = Rope::new();
+    rope.push(&text);
+    rope
+}
+
+fn generate_random_rope_ranges(mut rng: StdRng, rope: &Rope) -> Vec<Range<usize>> {
+    let range_max_len = 50;
+    let num_ranges = rope.len() / range_max_len;
+
+    let mut ranges = Vec::new();
+    let mut start = 0;
+    for _ in 0..num_ranges {
+        let range_start = rope.clip_offset(
+            rng.gen_range(start..=(start + range_max_len)),
+            sum_tree::Bias::Left,
+        );
+        let range_end = rope.clip_offset(
+            rng.gen_range(range_start..(range_start + range_max_len)),
+            sum_tree::Bias::Right,
+        );
+
+        let range = range_start..range_end;
+        if !range.is_empty() {
+            ranges.push(range);
+        }
+
+        start = range_end + 1;
+    }
+
+    ranges
+}
+
+fn rope_benchmarks(c: &mut Criterion) {
+    static SEED: u64 = 9999;
+    static KB: usize = 1024;
+
+    let rng = StdRng::seed_from_u64(SEED);
+    let sizes = [4 * KB, 64 * KB];
+
+    let mut group = c.benchmark_group("push");
+    for size in sizes.iter() {
+        group.throughput(Throughput::Bytes(*size as u64));
+        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
+            let text = generate_random_text(rng.clone(), *size);
+
+            b.iter(|| {
+                let mut rope = Rope::new();
+                for _ in 0..10 {
+                    rope.push(&text);
+                }
+            });
+        });
+    }
+    group.finish();
+
+    let mut group = c.benchmark_group("append");
+    for size in sizes.iter() {
+        group.throughput(Throughput::Bytes(*size as u64));
+        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
+            let mut random_ropes = Vec::new();
+            for _ in 0..5 {
+                random_ropes.push(generate_random_rope(rng.clone(), *size));
+            }
+
+            b.iter(|| {
+                let mut rope_b = Rope::new();
+                for rope in &random_ropes {
+                    rope_b.append(rope.clone())
+                }
+            });
+        });
+    }
+    group.finish();
+
+    let mut group = c.benchmark_group("slice");
+    for size in sizes.iter() {
+        group.throughput(Throughput::Bytes(*size as u64));
+        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
+            let rope = generate_random_rope(rng.clone(), *size);
+
+            b.iter_batched(
+                || generate_random_rope_ranges(rng.clone(), &rope),
+                |ranges| {
+                    for range in ranges.iter() {
+                        rope.slice(range.clone());
+                    }
+                },
+                BatchSize::SmallInput,
+            );
+        });
+    }
+    group.finish();
+
+    let mut group = c.benchmark_group("bytes_in_range");
+    for size in sizes.iter() {
+        group.throughput(Throughput::Bytes(*size as u64));
+        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
+            let rope = generate_random_rope(rng.clone(), *size);
+
+            b.iter_batched(
+                || generate_random_rope_ranges(rng.clone(), &rope),
+                |ranges| {
+                    for range in ranges.iter() {
+                        let bytes = rope.bytes_in_range(range.clone());
+                        assert!(bytes.into_iter().count() > 0);
+                    }
+                },
+                BatchSize::SmallInput,
+            );
+        });
+    }
+    group.finish();
+
+    let mut group = c.benchmark_group("chars");
+    for size in sizes.iter() {
+        group.throughput(Throughput::Bytes(*size as u64));
+        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
+            let rope = generate_random_rope(rng.clone(), *size);
+
+            b.iter_with_large_drop(|| {
+                let chars = rope.chars().count();
+                assert!(chars > 0);
+            });
+        });
+    }
+    group.finish();
+}
+
+criterion_group!(benches, rope_benchmarks);
+criterion_main!(benches);
--- a/crates/rope/src/rope.rs
+++ b/crates/rope/src/rope.rs
@@ -23,7 +23,7 @@ pub use unclipped::Unclipped;
 const CHUNK_BASE: usize = 6;

 #[cfg(not(test))]
-const CHUNK_BASE: usize = 16;
+const CHUNK_BASE: usize = 64;

 /// Type alias to [`HashMatrix`], an implementation of a homomorphic hash function. Two [`Rope`] instances
 /// containing the same text will produce the same fingerprint. This hash function is special in that