evaluate¶
The evaluate command evaluates a fine-tuned model on tool-calling tasks using either local transformers inference or Ollama.
Usage¶
Arguments¶
| Argument | Description |
|---|---|
MODEL_PATH |
Path to base model or fine-tuned model (local directory or HuggingFace Hub ID) |
DATASET_PATH |
Path to evaluation dataset (JSONL format) |
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
-o, --output |
PATH | None | Path to save evaluation results (JSON) |
--adapter-path |
PATH | None | Path to PEFT/LoRA adapter |
--batch-size |
INT | 1 | Batch size for evaluation |
--max-samples |
INT | All | Maximum number of samples to evaluate |
--temperature |
FLOAT | 0.7 | Sampling temperature |
--max-tokens |
INT | 2048 | Maximum tokens to generate |
--top-p |
FLOAT | 0.9 | Nucleus sampling top-p |
--backend |
CHOICE | transformers | Inference backend: transformers or ollama |
--device |
TEXT | Auto | Device to use (cuda, cpu, mps) - transformers only |
--no-save-predictions |
FLAG | False | Don't save individual predictions to output |
Examples¶
Quick Iteration
Use --max-samples 50 during development for faster feedback loops.