# Validators

## Why Validators Exist

Some tasks have a very large label space, such as lemmatization against a full lexicon. In those cases it is usually impractical to list every valid label in every prompt.

Instead, the agent can delegate post-checks to an external validator process that speaks NDJSON (newline-delimited JSON). The validator can:

- accept a prediction, optionally normalizing it
- request a retry and provide a smaller `allowed_labels` set
- provide a custom retry message that is appended to the rebuilt base prompt
- abort the run with a clear reason
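Below is a minimal sketch of how a validator might map one prediction to the outcomes listed above. Most field names here are assumptions for illustration. Only `retry.allowed_labels` is named elsewhere in this document. The request value `prediction`, the response keys `status`, `label`, `reason`, and `retry.message`, and the status strings are all hypothetical. Check the bundled validator for the real schema.

```python
from typing import Any, Dict, Set


def decide(prediction: str, lexicon: Set[str]) -> Dict[str, Any]:
    """Map one prediction to accept, retry, or abort (hypothetical response schema)."""
    if not lexicon:
        # Abort: without a lexicon no label can ever be validated.
        return {"status": "abort", "reason": "lexicon is empty or failed to load"}

    # Accept, normalizing whitespace and case before the lookup.
    normalized = prediction.strip().lower()
    if normalized in lexicon:
        return {"status": "accept", "label": normalized}

    # Retry with a smaller candidate set. As a toy heuristic, the candidates
    # are lexicon entries sharing the prediction's first letter.
    candidates = sorted(w for w in lexicon if normalized and w[:1] == normalized[:1])
    return {
        "status": "retry",
        "retry": {
            "allowed_labels": candidates,
            "message": f"'{prediction.strip()}' is not a known lemma. "
                       "Answer with one of the allowed labels.",
        },
    }
```

A real lemmatization validator would choose candidates by a distance measure rather than by first letter. The first-letter rule only keeps the example short.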

## Protocol

- The agent starts the validator once per run.
- The agent sends one JSON object per attempt over stdin.
- The validator must reply with exactly one JSON object, on a single line, on stdout.
- Validators must reserve stdout for protocol messages and write logs to stderr.
- Retries rebuild the original prompt from scratch, then append the validator retry message as an extra user message.
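The protocol rules above amount to a simple read-answer loop. The following is a minimal sketch of a validator's main loop under those rules. The request keys `prediction` and `attempt` and the response shape returned by `accept_all` are hypothetical.

```python
import json
import sys
from typing import Any, Callable, Dict, Optional, TextIO

Request = Dict[str, Any]
Response = Dict[str, Any]


def serve(
    handle: Callable[[Request], Response],
    stdin: Optional[TextIO] = None,
    stdout: Optional[TextIO] = None,
    stderr: Optional[TextIO] = None,
) -> None:
    """Answer each NDJSON request line with exactly one NDJSON response line.

    The agent starts the validator once per run, so this loop keeps serving
    attempts until the agent closes stdin (EOF).
    """
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # tolerate stray blank lines
        request = json.loads(line)
        response = handle(request)
        # json.dumps without indent emits a single line, as NDJSON requires.
        stdout.write(json.dumps(response, ensure_ascii=False) + "\n")
        stdout.flush()  # the agent waits for this reply, so never leave it buffered
        # Diagnostics go to stderr; stdout carries protocol messages only.
        print(f"validator: answered attempt {request.get('attempt')}", file=stderr)


def accept_all(request: Request) -> Response:
    """Trivial handler: accept every prediction unchanged (hypothetical schema)."""
    return {"status": "accept", "label": request.get("prediction")}


# A real validator script would end with:
#     if __name__ == "__main__":
#         serve(accept_all)
```

The stream parameters exist only so the loop can be exercised with in-memory buffers. In normal use they default to the process's real stdin, stdout, and stderr.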

If `--validator_cmd` points to a `.py` file, the agent runs it with the current Python interpreter.
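One plausible way to turn `--validator_cmd` and `--validator_args` into a process is sketched below. It assumes that the quoted argument string is split with shell-like rules, that non-`.py` commands are executed directly, and that validator stderr is inherited. The agent's actual launch code may differ, and both helper names are hypothetical.

```python
import shlex
import subprocess
import sys
from typing import List


def build_validator_argv(validator_cmd: str, validator_args: str = "") -> List[str]:
    """Turn --validator_cmd / --validator_args into an argv list (assumed logic)."""
    # --validator_args is one quoted string, so split it with shell-like rules.
    extra = shlex.split(validator_args)
    if validator_cmd.endswith(".py"):
        # Run .py validators with the same interpreter as the agent.
        return [sys.executable, validator_cmd, *extra]
    return [validator_cmd, *extra]


def start_validator(argv: List[str]) -> subprocess.Popen:
    """Start the validator once, with line-buffered text pipes for NDJSON."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=None,  # inherit: validator logs go straight to the agent's stderr
        text=True,
        bufsize=1,
    )
```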

In the bundled lemmatization validators, validator-side `--max_distance 0` disables the distance threshold. Returned candidates are still capped by the lexicon and any validator-side `--max_suggestions` limit.
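The interaction between the distance threshold and the suggestion cap can be sketched as follows. This document does not name the distance metric the bundled validators use, so classic Levenshtein edit distance is an assumption here. Using `None` to mean "no suggestion limit" is also an assumption.

```python
from typing import Iterable, List, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(prediction: str, lexicon: Iterable[str], max_distance: int,
            max_suggestions: Optional[int] = None) -> List[str]:
    """Rank lexicon entries by distance; max_distance == 0 means no threshold."""
    scored = sorted((levenshtein(prediction, w), w) for w in set(lexicon))
    if max_distance > 0:
        scored = [(d, w) for d, w in scored if d <= max_distance]
    labels = [w for _, w in scored]
    # Even without a threshold, the result is bounded by the lexicon itself
    # and by max_suggestions when it is set.
    return labels if max_suggestions is None else labels[:max_suggestions]
```

For example, `suggest("runs", ["run", "ran", "rain", "walk"], 0)` returns the entire lexicon ordered by distance. Adding `max_suggestions=2` keeps only `["run", "ran"]`.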

## Example: Lemmatization Validator

This repository includes `validators/lemmatization_validator.py`.

```bash
python benchmark_agent.py \
  --input data/input.csv \
  --model gpt-4o-mini \
  --validator_cmd validators/lemmatization_validator.py \
  --validator_args "--lexicon data/lemmata.txt --max_distance 2 --max_distance_per_retry 1 --max_suggestions 30"
```

## Using `info` Metadata

The dataset `info` column is forwarded to the validator unchanged.

The reference lemmatization validator can use it to restrict candidates by part of speech:

- put POS into `info`, for example `pos=NOUN` or `part-of-speech:VERB`
- provide a lexicon with an optional second POS column
- choose the correct separator with `--lexicon_field_sep`
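The steps above can be sketched as three small helpers. The following are assumptions, not the reference validator's actual behavior:

- the exact `info` syntax accepted by the regular expression;
- the convention that any `--lexicon_field_sep` value other than `tab` is used literally;
- the choice to keep untagged lemmas when filtering.

```python
import re
from typing import Dict, Iterable, List, Optional, Set

# Accepts e.g. "pos=NOUN" or "part-of-speech:VERB" anywhere in the info string.
_POS_RE = re.compile(r"\b(?:pos|part-of-speech)\s*[=:]\s*([A-Za-z]+)", re.IGNORECASE)


def parse_pos(info: Optional[str]) -> Optional[str]:
    """Extract an upper-cased POS tag from the info column, if present."""
    match = _POS_RE.search(info or "")
    return match.group(1).upper() if match else None


def load_lexicon(lines: Iterable[str], field_sep: str = "tab") -> Dict[str, Set[str]]:
    """Map each lemma to its POS tags; the second column is optional."""
    sep = "\t" if field_sep == "tab" else field_sep  # assumed separator naming
    entries: Dict[str, Set[str]] = {}
    for raw in lines:
        fields = [f.strip() for f in raw.rstrip("\n").split(sep)]
        if not fields[0]:
            continue  # skip blank lines
        tags = entries.setdefault(fields[0], set())
        if len(fields) > 1 and fields[1]:
            tags.add(fields[1].upper())
    return entries


def candidates_for_pos(entries: Dict[str, Set[str]], pos: Optional[str]) -> List[str]:
    """Keep lemmas tagged with pos; untagged lemmas are kept (an assumption)."""
    if pos is None:
        return sorted(entries)
    return sorted(lemma for lemma, tags in entries.items() if not tags or pos in tags)
```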

Example:

```bash
python benchmark_agent.py \
  --input data/input.csv \
  --model gpt-4o-mini \
  --validator_cmd validators/lemmatization_validator.py \
  --validator_args "--lexicon data/lemmata_with_pos.tsv --lexicon_field_sep tab --use_pos"
```

## Validator Flags

- `--validator_cmd`: enable validator-driven checking
- `--validator_args`: extra validator arguments as one quoted string
- `--validator_timeout`: timeout per validator roundtrip
- `--validator_prompt_max_candidates`: cap rendered retry candidates after any validator-side limit such as `--max_suggestions`
- `--validator_prompt_max_chars`: cap retry-instruction size
- `--validator_exhausted_policy`: choose the outcome when retries are exhausted
- `--validator_debug`: log raw NDJSON send/receive payloads at `DEBUG`
- validator-side `--max_distance_per_retry` (passed through `--validator_args`): raise the validator's distance threshold by this amount for each retry, starting with the second retry; the third overall attempt is therefore the first one with a higher threshold
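The `--max_distance_per_retry` schedule can be written as a small function of the 1-based attempt number. This sketch makes two assumptions. First, the increase is cumulative, one step per retry after the first. Second, `--max_distance 0` stays "no threshold" on every attempt.

```python
def distance_threshold(attempt: int, max_distance: int, max_distance_per_retry: int) -> int:
    """Threshold for a 1-based attempt: 1 = initial, 2 = first retry, 3 = second retry."""
    if max_distance == 0:
        return 0  # 0 disables the threshold; assumed to stay disabled on retries
    # The initial attempt and the first retry use the base threshold.
    bumps = max(0, attempt - 2)
    return max_distance + bumps * max_distance_per_retry
```

With `--max_distance 2 --max_distance_per_retry 1`, as in the earlier example command, attempts 1 through 4 get thresholds 2, 2, 3, and 4.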

Validator-side candidate limits such as `--max_suggestions` control how many labels the validator returns in `retry.allowed_labels`. The benchmark-side `--validator_prompt_max_candidates` then controls how many of those returned labels are actually rendered into the retry prompt.
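The two-stage cap can be sketched as follows. The validator has already trimmed its list before this step runs. The wording of the rendered instruction and the choice to truncate the whole string at the character cap are assumptions. The benchmark's real rendering may differ, and the function name is hypothetical.

```python
from typing import List


def render_retry_instruction(message: str, allowed_labels: List[str],
                             max_candidates: int, max_chars: int) -> str:
    """Apply the benchmark-side caps to labels the validator already capped."""
    # Stage 2 of 2: the validator already trimmed allowed_labels (e.g. with
    # --max_suggestions); --validator_prompt_max_candidates trims it again.
    shown = allowed_labels[:max_candidates]
    text = message
    if shown:
        text += "\nAllowed labels: " + ", ".join(shown)
    # --validator_prompt_max_chars bounds the whole retry instruction.
    return text[:max_chars]
```

For example, with `--max_suggestions 30` on the validator side and a benchmark-side candidate cap of 10, only the first 10 of the 30 returned labels reach the prompt.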

## Output Impact

When validation is enabled, the output CSV gains:

- `validatorStatus`
- `validatorReason`

Prompt-log entries also include validator metadata, with full request and response payloads in `full` prompt-log mode.
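To see how validation went across a run, you can tally the new columns with the standard library. This is a minimal sketch. The set of `validatorStatus` values is not documented here, so the code counts whatever values appear rather than assuming any.

```python
import csv
from collections import Counter
from typing import Dict, List, TextIO, Tuple


def summarize_validation(f: TextIO) -> Tuple[Counter, Dict[str, List[str]]]:
    """Count rows per validatorStatus and collect non-empty validatorReason values."""
    statuses: Counter = Counter()
    reasons: Dict[str, List[str]] = {}
    for row in csv.DictReader(f):
        status = row["validatorStatus"]
        statuses[status] += 1
        if row.get("validatorReason"):
            reasons.setdefault(status, []).append(row["validatorReason"])
    return statuses, reasons
```

Typical use is to open the output CSV with `newline=""` and pass the file object. The file path and encoding depend on how the benchmark was run.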