ZIm/crates/collab/src/user_backfiller.rs
Kirill Bulatov 0199eca289
Allow filling co-authors in the git panel's commit input (#23329)
https://github.com/user-attachments/assets/78db908e-cfe5-4803-b0dc-4f33bc457840


* starts extracting usernames from `users/` GitHub API responses and
passes them, along with e-mails, in the collab sessions as part of the
`User` data

* adjusts various prefill and seed test methods so that the new data can
be retrieved from GitHub properly

* if there's an active call where guests have write permissions and
e-mails, allows triggering the `FillCoAuthors` action in the context of
the git panel; it fills in `co-authored-by:` lines using e-mails and
names (or GitHub handles if the name is absent)

* the action tries not to duplicate such entries if any are already
present, and adds new ones below the rest of the commit input's text
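
The fill-and-deduplicate behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `CoAuthor` struct, its fields, and the `fill_co_authors` function are hypothetical names, and the sketch assumes that collaborators without an e-mail are skipped and that existing lines are compared case-insensitively.

```rust
use std::collections::HashSet;

/// Hypothetical view of a collaborator; not the actual Zed `User` type.
struct CoAuthor {
    name: Option<String>,
    github_login: String,
    email: Option<String>,
}

const CO_AUTHOR_PREFIX: &str = "co-authored-by:";

/// Appends `Co-authored-by:` lines below the existing message text,
/// skipping collaborators without an e-mail and lines already present.
fn fill_co_authors(message: &str, co_authors: &[CoAuthor]) -> String {
    // Existing co-author lines, lowercased for case-insensitive comparison.
    let mut existing: HashSet<String> = message
        .lines()
        .map(|line| line.trim().to_lowercase())
        .filter(|line| line.starts_with(CO_AUTHOR_PREFIX))
        .collect();

    let mut new_lines = Vec::new();
    for co_author in co_authors {
        // No e-mail: silently omit, as the concerns section notes.
        let Some(email) = co_author.email.as_deref() else {
            continue;
        };
        // Fall back to the GitHub handle when no display name is known.
        let name = co_author
            .name
            .as_deref()
            .unwrap_or(co_author.github_login.as_str());
        let line = format!("Co-authored-by: {name} <{email}>");
        if existing.insert(line.to_lowercase()) {
            new_lines.push(line);
        }
    }
    if new_lines.is_empty() {
        return message.to_string();
    }

    let mut result = message.trim_end().to_string();
    // Keep co-author lines in one contiguous block at the end.
    let ends_with_co_author = result
        .lines()
        .last()
        .map_or(false, |line| line.trim().to_lowercase().starts_with(CO_AUTHOR_PREFIX));
    if !result.is_empty() {
        result.push_str(if ends_with_co_author { "\n" } else { "\n\n" });
    }
    result.push_str(&new_lines.join("\n"));
    result
}
```

Calling the function a second time on its own output leaves the message unchanged, which is the "does not duplicate" property the list item describes.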

Concerns:

* users with write permissions and no e-mails will be silently omitted:
adding odd entries that try to indicate this, or raising pop-ups, is
very intrusive (maybe we can add `#`-prefixed comments?), and logging
seems pointless

* it's not clear whether the data prefill will run properly on the
existing users. This seems tolerable for now, as we appear to get
e-mails properly already, so in the worst case we'll see GitHub handles
instead of names. This can be prefilled better later.

* e-mails and names for a particular project may not be what the user
wants.
E.g. my `.gitconfig` has
```
[user]
    email = mail4score@gmail.com

# .....snip

[includeif "gitdir:**/work/zed/**/.git"]
    path = ~/.gitconfig.work
```

and that one has

```
[user]
    email = kirill@zed.dev
```

while my GitHub profile is configured so that `mail4score@gmail.com` is
the public commit e-mail.

So, when I'm a participant in a Zed session, the wrong e-mail will be
picked.
The problem is that it's impossible for a host to get a remote
collaborator's git metadata for a particular project, as it might not
even exist on disk for the client.

It seems that we might want to add some "project git URL <-> user name
and email" mapping in the settings(?).
The design of this is not very clear, so the PR concentrates on the
basics for now.
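
Purely as an illustration of what such a mapping could look like (the PR does not implement one, and every name below is hypothetical rather than an existing Zed setting), a lookup keyed by the project's git URL with a fallback to the profile identity might be sketched as:

```rust
use std::collections::HashMap;

/// Hypothetical name/e-mail pair to use in commit metadata.
#[derive(Clone, Debug, PartialEq)]
struct GitIdentity {
    name: String,
    email: String,
}

/// Hypothetical per-project overrides, keyed by the project's git remote URL.
struct IdentityOverrides {
    by_remote_url: HashMap<String, GitIdentity>,
}

impl IdentityOverrides {
    /// Returns the override for `remote_url`, or `fallback` (for example,
    /// the identity taken from the GitHub profile) when none is configured.
    fn resolve<'a>(&'a self, remote_url: &str, fallback: &'a GitIdentity) -> &'a GitIdentity {
        self.by_remote_url.get(remote_url).unwrap_or(fallback)
    }
}
```

This sketch uses exact URL matching only; how URLs would be normalized and where the mapping would live are among the open design questions mentioned above.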

When https://github.com/zed-industries/zed/pull/23308 lands, most of the
issues can be solved by collaborators manually, before committing.

Release Notes:

- N/A
2025-01-18 22:57:17 +02:00


use std::sync::Arc;

use anyhow::{anyhow, Context, Result};
use chrono::{DateTime, Utc};
use util::ResultExt;

use crate::db::Database;
use crate::executor::Executor;
use crate::{AppState, Config};

/// Spawns a detached background task that backfills GitHub user data.
/// Does nothing if no backfiller access token is configured.
pub fn spawn_user_backfiller(app_state: Arc<AppState>) {
    let Some(user_backfiller_github_access_token) =
        app_state.config.user_backfiller_github_access_token.clone()
    else {
        log::info!("no USER_BACKFILLER_GITHUB_ACCESS_TOKEN set; not spawning user backfiller");
        return;
    };

    let executor = app_state.executor.clone();
    executor.spawn_detached({
        let executor = executor.clone();

        async move {
            let user_backfiller = UserBackfiller::new(
                app_state.config.clone(),
                user_backfiller_github_access_token,
                app_state.db.clone(),
                executor,
            );

            log::info!("backfilling users");

            user_backfiller
                .backfill_github_user_created_at()
                .await
                .log_err();
        }
    });
}
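
/// Hourly request quota assumed for the GitHub API token.
const GITHUB_REQUESTS_PER_HOUR_LIMIT: usize = 5_000;

/// Pause between consecutive users.
///
/// Note: this expression evaluates to 1388 ms (5000 / 3600 * 1000, truncated),
/// which is longer than the 720 ms (3_600_000 / 5000) needed to stay under
/// the quota, so the pacing errs on the conservative side.
const SLEEP_DURATION_BETWEEN_USERS: std::time::Duration = std::time::Duration::from_millis(
    (GITHUB_REQUESTS_PER_HOUR_LIMIT as f64 / 60. / 60. * 1000.) as u64,
);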

struct UserBackfiller {
    config: Config,
    github_access_token: Arc<str>,
    db: Arc<Database>,
    http_client: reqwest::Client,
    executor: Executor,
}

impl UserBackfiller {
    fn new(
        config: Config,
        github_access_token: Arc<str>,
        db: Arc<Database>,
        executor: Executor,
    ) -> Self {
        Self {
            config,
            github_access_token,
            db,
            http_client: reqwest::Client::new(),
            executor,
        }
    }

    /// Fetches each user that lacks a GitHub `created_at` value and stores it.
    /// Fetch failures are logged and skipped; database errors abort the run.
    async fn backfill_github_user_created_at(&self) -> Result<()> {
        let initial_channel_id = self.config.auto_join_channel_id;

        let users_missing_github_user_created_at =
            self.db.get_users_missing_github_user_created_at().await?;

        for user in users_missing_github_user_created_at {
            match self
                .fetch_github_user(&format!(
                    "https://api.github.com/user/{}",
                    user.github_user_id
                ))
                .await
            {
                Ok(github_user) => {
                    self.db
                        .get_or_create_user_by_github_account(
                            &user.github_login,
                            github_user.id,
                            user.email_address.as_deref(),
                            user.name.as_deref(),
                            github_user.created_at,
                            initial_channel_id,
                        )
                        .await?;

                    log::info!("backfilled user: {}", user.github_login);
                }
                Err(err) => {
                    log::error!("failed to fetch GitHub user {}: {err}", user.github_login);
                }
            }

            // Pace requests so the hourly quota is not exhausted.
            self.executor.sleep(SLEEP_DURATION_BETWEEN_USERS).await;
        }

        Ok(())
    }

    /// Fetches a single GitHub user from `url`, sleeping until the rate limit
    /// resets if the response reports that no requests remain.
    async fn fetch_github_user(&self, url: &str) -> Result<GithubUser> {
        let response = self
            .http_client
            .get(url)
            .header(
                "authorization",
                format!("Bearer {}", self.github_access_token),
            )
            .header("user-agent", "zed")
            .send()
            .await
            .with_context(|| format!("failed to fetch '{url}'"))?;

        let rate_limit_remaining = response
            .headers()
            .get("x-ratelimit-remaining")
            .and_then(|value| value.to_str().ok())
            .and_then(|value| value.parse::<i32>().ok());

        // The reset header carries a UTC timestamp in epoch seconds.
        let rate_limit_reset = response
            .headers()
            .get("x-ratelimit-reset")
            .and_then(|value| value.to_str().ok())
            .and_then(|value| value.parse::<i64>().ok())
            .and_then(|value| DateTime::from_timestamp(value, 0));

        // The current request is not retried after the sleep; the wait only
        // protects the requests that follow.
        if rate_limit_remaining == Some(0) {
            if let Some(reset_at) = rate_limit_reset {
                let now = Utc::now();
                if reset_at > now {
                    let sleep_duration = reset_at - now;
                    log::info!(
                        "rate limit reached. Sleeping for {} seconds",
                        sleep_duration.num_seconds()
                    );
                    // `reset_at > now` makes the duration positive, so
                    // `to_std` cannot fail here.
                    self.executor.sleep(sleep_duration.to_std().unwrap()).await;
                }
            }
        }

        let response = match response.error_for_status() {
            Ok(response) => response,
            Err(err) => return Err(anyhow!("failed to fetch GitHub user: {err}")),
        };

        response
            .json()
            .await
            .with_context(|| format!("failed to deserialize GitHub user from '{url}'"))
    }
}

#[derive(serde::Deserialize)]
struct GithubUser {
    id: i32,
    created_at: DateTime<Utc>,
    name: Option<String>,
}