- Evals that returned an error (e.g., an LLM API format mismatch) were
silently skipped in the aggregated results. They are now counted as
failures (0% success score).
- Setting the `VERBOSE` environment variable to any non-empty value
disables string truncation.
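
For reviewers, here is a rough sketch of the two behaviors described above. It is not the code from this change: `EvalOutcome`, `success_rate`, `display`, and the 80-character `MAX_LEN` are hypothetical names and values chosen for illustration. Only the `VERBOSE` variable and the "errors count as 0%" rule come from this description.

```python
import os
from dataclasses import dataclass
from typing import Optional

MAX_LEN = 80  # hypothetical truncation limit, not taken from the change


@dataclass
class EvalOutcome:
    """Result of one eval run: a score in [0, 1], or an error message."""
    score: Optional[float] = None
    error: Optional[str] = None


def success_rate(outcomes: list[EvalOutcome]) -> float:
    """Mean score, where an errored eval contributes 0.0 instead of being skipped."""
    if not outcomes:
        return 0.0
    total = sum(
        o.score if o.error is None and o.score is not None else 0.0
        for o in outcomes
    )
    # Divide by ALL outcomes, including errored ones.
    return total / len(outcomes)


def display(text: str, limit: int = MAX_LEN) -> str:
    """Truncate text for output unless VERBOSE is set to a non-empty value."""
    if os.environ.get("VERBOSE"):  # unset or empty string -> falsy -> truncate
        return text
    if len(text) <= limit:
        return text
    return text[:limit] + "..."
```

With two passing evals and one that errored, `success_rate` in this sketch yields 2/3, whereas skipping the errored eval would have reported 100%.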

Release Notes:

- N/A