zed/crates/language_models/src/provider
Elijah McMorris 52fa7ababb
lmstudio: Fill max_tokens using the response from /models (#25606)
The info for `max_tokens` for the model is included in
`{api_url}/models`.

I don't think this needs a `.clamp` like `get_max_tokens` in
`crates/ollama/src/ollama.rs`, but it might.
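For illustration, a minimal sketch of what such a clamp could look like; the constant, its value, and the function name are assumptions, not taken from the Zed source:

```rust
/// Illustrative cap, mirroring the idea of the `.clamp` in
/// `get_max_tokens` in `crates/ollama/src/ollama.rs`; the value here
/// is an assumption, not taken from the Zed source.
const MAXIMUM_TOKENS: u64 = 16_384;

/// Clamp the `max_context_length` reported by `{api_url}/models` so an
/// unreasonable value can't flow into the rest of the provider.
fn clamped_max_tokens(max_context_length: u64) -> u64 {
    max_context_length.clamp(1, MAXIMUM_TOKENS)
}
```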

## Before:
Every model shows a 2k context window

![image](https://github.com/user-attachments/assets/676075c8-0ceb-44b1-ae27-72ed6a6d783c)

## After:

![image](https://github.com/user-attachments/assets/8291535b-976e-4601-b617-1a508bf44e12)

### Json from `{api_url}/models` with model not loaded
```json
{
  "id": "qwen2.5-coder-1.5b-instruct-mlx",
  "object": "model",
  "type": "llm",
  "publisher": "lmstudio-community",
  "arch": "qwen2",
  "compatibility_type": "mlx",
  "quantization": "4bit",
  "state": "not-loaded",
  "max_context_length": 32768
}
```
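
A sketch of how this entry could be deserialized with `serde`; the struct and field types are assumptions based on the JSON above, not the actual Zed types:

```rust
use serde::Deserialize;

/// One entry from `{api_url}/models`. Field names follow the JSON
/// samples in this PR; the struct itself is a sketch, not Zed's type.
/// Fields not listed here (e.g. "object", "arch") are ignored by serde.
#[derive(Debug, Deserialize)]
struct ModelEntry {
    id: String,
    /// "loaded" or "not-loaded".
    state: String,
    /// The model's maximum context length; present in both samples,
    /// kept optional here defensively.
    max_context_length: Option<u64>,
    /// Only present when the model is loaded (see the second sample below).
    loaded_context_length: Option<u64>,
}
```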

## Notes
The response from `{api_url}/models` seems to return the model's
`max_tokens`, not the currently configured context length, but showing
the model's `max_tokens` is still better than showing 2k for
everything.

`loaded_context_length` exists, but only if the model is already loaded
when Zed starts, which usually isn't the case.

Maybe `fetch_models` should be rerun when swapping LM Studio models.
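
A sketch of the fallback this implies, reusing the `ModelEntry` sketch above; the function name and fallback chain are illustrative, with 2048 standing in for the old hardcoded default:

```rust
/// Prefer the configured context when the model was loaded at startup,
/// then the model maximum from `/models`, then the old hardcoded
/// default. Names here are illustrative, not Zed's.
fn effective_max_tokens(entry: &ModelEntry) -> u64 {
    entry
        .loaded_context_length        // configured context, if loaded
        .or(entry.max_context_length) // model maximum from `/models`
        .unwrap_or(2048)              // old default as a last resort
}
```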

### Currently configured context
This isn't shown in `{api_url}/models`.

![image](https://github.com/user-attachments/assets/8511cb9d-914b-4065-9eba-c0b086ad253b)

### Json from `{api_url}/models` with model loaded
```json
{
  "id": "qwen2.5-coder-1.5b-instruct-mlx",
  "object": "model",
  "type": "llm",
  "publisher": "lmstudio-community",
  "arch": "qwen2",
  "compatibility_type": "mlx",
  "quantization": "4bit",
  "state": "loaded",
  "max_context_length": 32768,
  "loaded_context_length": 4096
}
```
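
A usage sketch parsing the loaded sample above, assuming `serde_json` is available along with the `ModelEntry` and `effective_max_tokens` sketches from earlier (extra JSON fields are trimmed since serde ignores them anyway):

```rust
// Parse the "loaded" sample; only the fields the sketch declares matter.
let json = r#"
{
    "id": "qwen2.5-coder-1.5b-instruct-mlx",
    "state": "loaded",
    "max_context_length": 32768,
    "loaded_context_length": 4096
}
"#;
let entry: ModelEntry = serde_json::from_str(json).unwrap();
assert_eq!(entry.loaded_context_length, Some(4096));
assert_eq!(effective_max_tokens(&entry), 4096); // configured context wins
```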

Release Notes:

- lmstudio: Fixed the `max_tokens` shown in the assistant panel

---------

Co-authored-by: Peter Tripp <peter@zed.dev>
2025-06-06 20:21:23 +00:00
| File | Latest commit | Date |
| --- | --- | --- |
| anthropic.rs | anthropic: Fix error when attaching multiple images (#32092) | 2025-06-05 16:29:49 +00:00 |
| bedrock.rs | bedrock: Fix cross-region inference (#30659) | 2025-06-03 15:46:35 +00:00 |
| cloud.rs | Add thinking budget for Gemini custom models (#31251) | 2025-06-03 13:40:20 +02:00 |
| copilot_chat.rs | Add UI for configuring the API Url directly (#32248) | 2025-06-06 18:05:40 +02:00 |
| deepseek.rs | Add tool support for DeepSeek (#30223) | 2025-06-03 10:59:36 +02:00 |
| google.rs | google: Add latest versions of Gemini 2.5 Pro and Flash Preview (#32183) | 2025-06-05 19:30:34 +00:00 |
| lmstudio.rs | lmstudio: Fill max_tokens using the response from /models (#25606) | 2025-06-06 20:21:23 +00:00 |
| mistral.rs | language_models: Fix Mistral tool->user message sequence handling (#31736) | 2025-06-06 12:35:22 +03:00 |
| ollama.rs | Remove unused load_model method from LanguageModelProvider (#32070) | 2025-06-04 14:07:01 +00:00 |
| open_ai.rs | Pass up intent with completion requests (#31710) | 2025-05-29 20:43:12 +00:00 |
| open_router.rs | Add support for OpenRouter as a language model provider (#29496) | 2025-06-03 15:59:46 +00:00 |