This is based on having observed that there is a lot of variation between runs on `n=1` and `n=3`.

* With `n=8`, two runs on the same branch give answers that seem close enough to be reasonably consistent.
* With higher concurrency, trying to run this many repetitions seems to lead language servers to time out a lot, causing evals to fail.

Release Notes:

- N/A

| Name |
|---|
| actions |
| ISSUE_TEMPLATE |
| workflows |
| cherry-pick-bot.yml |
| pull_request_template.md |