# Common Examples

Examples below use POSIX line continuation (`\`). On PowerShell, replace `\` with a backtick (`` ` ``); in `cmd.exe`, use `^`.

## Basic Benchmark Run

```bash
python benchmark_agent.py \
  --input data/input/input.csv \
  --model gpt-4o-mini \
  --temperature 0.0 \
  --top_p 1.0 \
  --prompt_layout compact \
  --request_interval_ms 250 \
  --threads 4 \
  --system_prompt "You are a linguistic classifier." \
  --enable_cot \
  --few_shot_examples 5 \
  --calibration
```

## Multiple Input Files

```bash
python benchmark_agent.py \
  --input data/input/input.csv data/input/input_extra.csv \
  --model gpt-4o-mini \
  --output data/output/
```

When `--output` points to a directory, each input gets its own timestamped output file.

## Resume An Existing Output

```bash
python benchmark_agent.py \
  --input data/input/input.csv \
  --output data/output/task__openai__gpt4omini__2026-03-20-18-21.csv \
  --resume
```

The agent skips rows whose `ID` already exists in the output CSV and processes the remaining rows in input order.

## Re-run Only `unclassified` Predictions

```bash
python benchmark_agent.py \
  --input data/input/input.csv \
  --output data/output/task__openai__gpt4omini__2026-03-20-18-21.csv \
  --resume \
  --unclassified
```

To keep iterating until the set of unresolved rows stabilizes:

```bash
python benchmark_agent.py \
  --input data/input/input.csv \
  --output data/output/task__openai__gpt4omini__2026-03-20-18-21.csv \
  --resume \
  --repeat_unclassified
```

## Metrics-Only Recompute

Refresh only the aggregate agreement files from existing `*_metrics.json` artifacts:

```bash
python benchmark_agent.py --metrics_only
```

Recompute metrics for an existing output CSV:

```bash
python benchmark_agent.py \
  --metrics_only \
  --input data/output/task__openai__gpt4omini__2026-03-20-18-21.csv
```

Override truth labels from a separate file:

```bash
python benchmark_agent.py \
  --metrics_only \
  --input data/output/task__openai__gpt4omini__2026-03-20-18-21.csv \
  --labels data/new_truth_labels.csv
```

## Refresh The GUI Model Catalog

```bash
python benchmark_agent.py --update-models
```

To update only selected providers:

```bash
python benchmark_agent.py --update-models --models-providers openai requesty vertex
```

## Summarize Prompt-Log Errors

```bash
python benchmark_agent.py \
  --summarize-log-errors data/logs/task__openai__gpt4omini__2026-03-20-18-21.log
```

## Timeout Diagnosis

```bash
python benchmark_agent.py \
  --input data/input/input.csv \
  --model gpt-4o-mini \
  --timeout_probe
```
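
## How Resume Skips Rows (Illustrative Sketch)

The resume rule described under "Resume An Existing Output" (skip rows whose `ID` is already in the output CSV, continue in input order) can be pictured with a small Python sketch. This is not the code from `benchmark_agent.py`: the helper names `read_done_ids` and `pending_rows` are hypothetical, and the sample CSV contents and every column other than `ID` are invented for illustration.

```python
import csv
import io


def read_done_ids(output_csv_text):
    """Collect the IDs already present in an existing output CSV (hypothetical helper)."""
    return {row["ID"] for row in csv.DictReader(io.StringIO(output_csv_text))}


def pending_rows(input_rows, done_ids):
    """Yield input rows whose ID has no result yet, preserving input order."""
    for row in input_rows:
        if row["ID"] not in done_ids:
            yield row


# Invented sample data: rows 1 and 3 already have predictions.
input_csv = "ID,text\n1,a\n2,b\n3,c\n4,d\n"
output_csv = "ID,prediction\n1,x\n3,z\n"

done = read_done_ids(output_csv)
todo = list(pending_rows(csv.DictReader(io.StringIO(input_csv)), done))

# Only the rows missing from the output remain, in their original order.
print([row["ID"] for row in todo])
```

Keeping the completed IDs in a set makes each membership check constant time, so the filter stays a single pass over the input regardless of how many rows were already processed.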