# Getting Started

## Requirements

- Python 3.9+
- `openai`
- `matplotlib` for confusion heatmaps and optional calibration plots

Install dependencies:

```bash
python -m pip install openai matplotlib
```

## Repository Basics

- `benchmark_agent.py` is the main entry point.
- `example_input.csv` is the smallest reference dataset in the repo.
- `config_gui.html` builds CLI commands in the browser.
- `web/` contains the static metrics dashboard.

## Dataset Format

Input files must be semicolon-delimited CSVs.

| Column | Required | Description |
| --- | --- | --- |
| `ID` | Yes | Unique example identifier |
| `leftContext` | No | Text before the node token |
| `node` | Yes | Target word to classify |
| `rightContext` | No | Text after the node token |
| `info` | No | Supplemental instructions or metadata |
| `truth` | No | Gold label, unless labels are supplied separately via `--labels` |

When you use `--labels`, provide a separate CSV with `ID;truth` columns. Extra input columns are carried into the output CSV. Sample input and label files appear in the appendix at the end of this page.

## Credentials

By default the agent reads `OPENAI_API_KEY`. For non-default providers it can infer environment variable names from the provider slug, prefixing the uppercased slug to these suffixes:

- `_API_KEY` or `_ACCESS_TOKEN`
- `_BASE_URL`

Examples:

- `REQUESTY_API_KEY`
- `REQUESTY_BASE_URL`
- `VERTEX_ACCESS_TOKEN`
- `VERTEX_BASE_URL`

You can override discovery with `--api_key_var` and `--api_base_var`; the appendix shows both styles.

## First Run

```bash
python benchmark_agent.py \
  --input example_input.csv \
  --model gpt-4o-mini \
  --temperature 0.0 \
  --top_p 1.0 \
  --output data/output/
```

On Windows, use a backtick (`` ` ``) for line continuation in PowerShell, or `^` in cmd.exe, instead of `\`.

If `--output` is omitted, the agent writes:

- `data/output/______.csv`
- `data/metrics/__metrics.json`
- `data/logs/.log`

Set `DHAIBENCH_DATA_ROOT` if you want a different root for `data/`; the appendix shows an example.

## What To Read Next

- [Common Examples](examples.md) for recurring workflows
- [Providers and Authentication](providers.md) for Vertex, Gemini, Requesty, and custom endpoints
- [Outputs and Metrics](outputs-and-metrics.md) for artifact layout
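
## Appendix: Illustrative Examples

The snippets below are sketches, not files shipped with the repo; every ID, sentence, and label is made up to show the shape of the data. First, a minimal semicolon-delimited input file matching the Dataset Format table:

```csv
ID;leftContext;node;rightContext;info;truth
1;She sat on the;bank;of the river to rest;;RIVERBANK
2;He opened an account at the;bank;on Main Street;;FINANCIAL
```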
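
If you keep gold labels out of the input file, the separate file passed to `--labels` needs only the two columns named above (values again illustrative):

```csv
ID;truth
1;RIVERBANK
2;FINANCIAL
```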
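
Next, a sketch of the credential convention for a non-default provider, assuming a `requesty` slug; the key and URL values are placeholders, not real credentials or endpoints:

```bash
# Hypothetical values; substitute your provider's real key and base URL.
export REQUESTY_API_KEY="sk-placeholder"
export REQUESTY_BASE_URL="https://example.invalid/v1"

# Equivalent explicit override if you skip name discovery:
python benchmark_agent.py --input example_input.csv --model gpt-4o-mini \
  --api_key_var REQUESTY_API_KEY --api_base_var REQUESTY_BASE_URL
```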
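
Finally, a sketch of redirecting the default artifact root with `DHAIBENCH_DATA_ROOT`, assuming the variable points at the directory that will contain `data/` (the path is illustrative):

```bash
# Output, metrics, and log files land under /srv/dhaibench/data/
# instead of ./data/ in the working directory.
export DHAIBENCH_DATA_ROOT=/srv/dhaibench
python benchmark_agent.py --input example_input.csv --model gpt-4o-mini
```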