Improve fuzzy match performance and fix corner case that omits results (#22524)

* Removes `max_results` from the matcher interface, since limiting is
better handled by consumers once all results are known. The previous
implementation was quite inefficient: it used binary search to find
each insertion point and then did an insert, which copies the entire
suffix each time.

* There was a corner case where, if the binary search found a match
candidate with the same score, the new candidate was dropped. Now fixed.

* `util::extend_sorted`, used when merging results from worker threads,
also repeatedly uses binary search and insertion, copying the entire
suffix each time. A followup will remove it and its usages.

* Adds `util::truncate_to_bottom_n_sorted_by`, which uses quickselect +
sort to efficiently produce a sorted, count-limited result.

* Improves the interface of `Matcher::match_candidates` by passing the match
positions to the build function. This allows removal of the `Match`
trait. It also fixes a bug where the match type's own `Ord` wasn't being
used, which seems relevant to `PathMatch` when scores are equal.
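A minimal sketch of the interface shape described above, assuming a toy greedy subsequence matcher. `match_all`, `ToyMatch`, `fuzzy_positions` and the span-based score are hypothetical illustrations, not the `fuzzy` crate's actual API. The matcher takes no `max_results` and returns every match; the build closure receives the match positions directly, so no `Match` trait is needed; and the consumer ranks results with the result type's own `Ord`, so equal scores are kept and tie-broken by path instead of being dropped.

```rust
use std::cmp::Ordering;

/// Hypothetical result type built by the caller. Its own `Ord` decides the final
/// ranking: higher score first, then path, then positions (ties are kept, not dropped).
#[derive(Debug, PartialEq, Eq)]
struct ToyMatch {
    score: i64,
    positions: Vec<usize>,
    path: String,
}

impl Ord for ToyMatch {
    fn cmp(&self, other: &Self) -> Ordering {
        other
            .score
            .cmp(&self.score)
            .then_with(|| self.path.cmp(&other.path))
            .then_with(|| self.positions.cmp(&other.positions))
    }
}

impl PartialOrd for ToyMatch {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Toy greedy subsequence matcher: byte offsets of each query char, in order,
/// or `None` if the query is not a subsequence of the candidate.
fn fuzzy_positions(query: &str, candidate: &str) -> Option<Vec<usize>> {
    let mut chars = candidate.char_indices();
    let mut positions = Vec::new();
    for q in query.chars() {
        // Resume scanning after the previous hit; bail out if `q` never appears.
        let (ix, _) = chars.find(|&(_, c)| c == q)?;
        positions.push(ix);
    }
    Some(positions)
}

/// Toy score: tighter matches (smaller span between first and last hit) score higher.
fn score(positions: &[usize]) -> i64 {
    match (positions.first(), positions.last()) {
        (Some(&first), Some(&last)) => -((last - first) as i64),
        _ => 0,
    }
}

/// Hypothetical matcher with no `max_results`: returns every match, letting `build`
/// turn (candidate, score, positions) into whatever result type the caller wants.
fn match_all<R>(
    query: &str,
    candidates: &[&str],
    build: impl Fn(&str, i64, &[usize]) -> R,
) -> Vec<R> {
    candidates
        .iter()
        .filter_map(|&candidate| {
            let positions = fuzzy_positions(query, candidate)?;
            Some(build(candidate, score(&positions), positions.as_slice()))
        })
        .collect()
}
```

With the query `"ab"` over `["xab", "ab", "a_b", "zzz", "ba"]`, three candidates match; after `sort` and `truncate(2)` the consumer keeps `ab` and `xab`, which tie on score and are ordered by path. This sketch uses a plain `sort` + `truncate`; the quickselect approach of `truncate_to_bottom_n_sorted_by` is meant to be cheaper when the limit is much smaller than the number of matches.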

Release Notes:

- N/A
Michael Sloan 2024-12-31 13:56:23 -07:00 committed by GitHub
parent f912c545e7
commit 6ef5d8f748
4 changed files with 50 additions and 96 deletions


@@ -8,7 +8,6 @@ pub mod test;
use anyhow::{anyhow, Context as _, Result};
use futures::Future;
use itertools::Either;
use regex::Regex;
use std::sync::{LazyLock, OnceLock};
@@ -111,6 +110,27 @@ where
}
}
pub fn truncate_to_bottom_n_sorted_by<T, F>(items: &mut Vec<T>, limit: usize, compare: &F)
where
    F: Fn(&T, &T) -> Ordering,
{
    if limit == 0 {
        items.truncate(0);
    }
    if items.len() <= limit {
        // Nothing to drop, but the result must still be sorted. Returning here also covers
        // `items.len() == limit`, for which `select_nth_unstable_by(limit, ..)` would panic
        // because the index is out of bounds.
        items.sort_by(compare);
        return;
    }
    // When limit is near items.len() it may be more efficient to sort the whole list and
    // truncate, rather than always doing selection first as is done below. It's hard to analyze
    // where the threshold for this should be since the quickselect style algorithm used by
    // `select_nth_unstable_by` makes the prefix partially sorted, and so its work is not wasted -
    // the expected number of comparisons needed by `sort_by` is less than it is for some arbitrary
    // unsorted input.
    items.select_nth_unstable_by(limit, compare);
    items.truncate(limit);
    items.sort_by(compare);
}
#[cfg(unix)]
pub fn load_shell_from_passwd() -> Result<()> {
let buflen = match unsafe { libc::sysconf(libc::_SC_GETPW_R_SIZE_MAX) } {
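A standalone, runnable sketch of the quickselect + sort technique behind `truncate_to_bottom_n_sorted_by`, keeping the `limit` smallest items under `compare`. It is an illustration under the assumption that only `std` is available, not a copy of the upstream file. The early return for `items.len() <= limit` both sorts short inputs and avoids calling `select_nth_unstable_by` with an out-of-range index, which panics.

```rust
use std::cmp::Ordering;

/// Keeps the `limit` smallest items under `compare`, sorted ascending by `compare`.
pub fn truncate_to_bottom_n_sorted_by<T, F>(items: &mut Vec<T>, limit: usize, compare: &F)
where
    F: Fn(&T, &T) -> Ordering,
{
    if items.len() <= limit {
        // Nothing to discard (this includes an empty Vec with limit == 0); just sort.
        items.sort_by(compare);
        return;
    }
    // Quickselect: afterwards every item in items[..limit] compares <= items[limit].
    items.select_nth_unstable_by(limit, compare);
    // Drop everything from index `limit` on, then sort only the kept prefix.
    items.truncate(limit);
    items.sort_by(compare);
}
```

To keep the best-scoring matches rather than the smallest values, pass a descending comparator such as `|a: &i32, b: &i32| b.cmp(a)`. For any limit the result equals a full `sort` followed by `truncate`, but the full sort only runs over the `limit` items that survive selection.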