Home

Synthetic Training Data for Agents

Quick Start¶

pip install deepfabric
export OPENAI_API_KEY="your-key"

deepfabric generate \
  --topic-prompt "DevOps and Platform Engineering" \
  --generation-system-prompt "You are an expert in DevOps and Platform Engineering" \
  --mode graph \
  --depth 2 \
  --degree 2 \
  --provider openai \
  --model gpt-4o \
  --num-samples 2 \
  --batch-size 1 \
  --conversation-type chain_of_thought \
  --reasoning-style freetext \
  --output-save-as dataset.jsonl \

What DeepFabric Does¶

Generates topic hierarchies from a root prompt
Creates training samples for each topic
Outputs JSONL compatible with HuggingFace and training frameworks

Dataset Types¶

Type	Description	Use Case
Basic	Simple Q&A pairs	Instruction following
Reasoning	Chain-of-thought traces	Step-by-step problem solving
Agent	Tool-calling with real execution	Building agents

Key Features¶

Topic-driven generation ensures diverse, non-redundant samples. Each training example maps to a specific subtopic, avoiding the repetition common in naive generation.

Real tool execution via Spin. Agent datasets include actual tool results from isolated WebAssembly sandboxes, not simulated outputs.

Training integration with TRL, Unsloth, and HuggingFace. Use apply_chat_template to format for any model.

Built-in evaluation for testing fine-tuned models on tool-calling tasks with metrics for accuracy and correctness.

Documentation¶

Getting Started - Installation and first dataset
Dataset Generation - Types and configuration
Tools - Spin components and MCP integration
Training - Loading datasets and formatting
Evaluation - Testing fine-tuned models
CLI Reference - Command documentation