
Now that we've established a proper eval in tree, this PR reboots our agent loop back to a set of minimal tools and simpler prompts. We should aim to get this branch feeling subjectively competitive with what's on main, then merge it and build from there.

Let's invest in our eval and use it to drive better performance of the agent loop.

How you can help: Pick an example, and then make the outcome faster or better. It's fine to use your own subjective judgment, as our evaluation criteria likely need tuning as well at this point. Focus first on making the agent work better in your own subjective experience.

Let's focus on simple, practical improvements to make this thing work better, then determine how we can craft our judgment criteria to lock those improvements in.

Release Notes:

- N/A

---------

Co-authored-by: Max <max@zed.dev>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Agus <agus@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Michael Sloan <mgsloan@gmail.com>
- Introduce a new docker-compose.yml file in the integration tests directory for the monitoring daemon test suite. This file defines two services: a PostgreSQL database with test credentials exposed on port 5432, and a localstack S3 service exposed on port 4566. These services provide the necessary infrastructure for running the monitoring tests.
- Significantly modify test_monitoring.py: add new imports (boto3, Path, and docker_compose_cm), remove the dagster_aws tests import, and add new fixtures. The new fixtures handle docker-compose setup, provide hostnames for services, configure AWS environment variables with test credentials, and initialize an S3 bucket for testing. Together, these changes shift the tests from external AWS credentials to localstack for S3.
- Restructure the test file: move the aws_env fixture from the bottom of the file so that it sits with the other fixtures. The original implementation, which relied on get_aws_creds(), is replaced by one that uses localstack with hardcoded test credentials. The test_docker_monitoring_run_out_of_attempts function remains at the end of the file but now uses the new aws_env fixture.
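
As a rough illustration of the localstack-based setup described in the bullets above, the sketch below shows one way such helpers might look. It is not the code from the diff: the function names (`localstack_env`, `patched_environ`, `create_test_bucket`), the `S3_ENDPOINT_URL` key, the `test` credential values, the region, and the bucket name are all assumptions made for the example. Only port 4566 and the use of boto3 come from the description.

```python
import os
from contextlib import contextmanager

# Port on which the compose file exposes the localstack S3 service.
LOCALSTACK_PORT = 4566


def localstack_env(hostname="localhost"):
    """Build placeholder AWS settings that point at localstack.

    The credential values, region, and the S3_ENDPOINT_URL key are
    assumptions for this sketch, not values taken from the diff.
    """
    return {
        "AWS_ACCESS_KEY_ID": "test",
        "AWS_SECRET_ACCESS_KEY": "test",
        "AWS_DEFAULT_REGION": "us-east-1",
        "S3_ENDPOINT_URL": f"http://{hostname}:{LOCALSTACK_PORT}",
    }


@contextmanager
def patched_environ(values):
    """Temporarily apply `values` to os.environ and restore the old state on exit."""
    saved = {key: os.environ.get(key) for key in values}
    os.environ.update(values)
    try:
        yield values
    finally:
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


def create_test_bucket(env, bucket_name="test-bucket"):
    """Create a bucket in localstack.

    Requires boto3 and a running localstack container; the bucket name
    is a hypothetical default.
    """
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client(
        "s3",
        endpoint_url=env["S3_ENDPOINT_URL"],
        region_name=env["AWS_DEFAULT_REGION"],
        aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
    )
    client.create_bucket(Bucket=bucket_name)
    return bucket_name
```

In a pytest suite, a fixture could wrap `patched_environ(localstack_env(hostname))` around a `yield`, so that each test sees the test credentials and the environment is restored afterwards. Because the credentials are hardcoded placeholders, nothing here depends on real AWS access.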