DeepFabric

Synthetic Training Data for Agents

Quick Start

pip install deepfabric
export OPENAI_API_KEY="your-key"

deepfabric generate \
  --topic-prompt "DevOps and Platform Engineering" \
  --generation-system-prompt "You are an expert in DevOps and Platform Engineering" \
  --mode graph \
  --depth 2 \
  --degree 2 \
  --provider openai \
  --model gpt-4o \
  --num-samples 2 \
  --batch-size 1 \
  --conversation-type chain_of_thought \
  --reasoning-style freetext \
  --output-save-as dataset.jsonl

What DeepFabric Does

  1. Generates topic hierarchies from a root prompt
  2. Creates training samples for each topic
  3. Outputs JSONL compatible with HuggingFace and training frameworks
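Because the output is JSONL (one JSON object per line), it can be inspected with the standard library alone before handing it to a training framework. The sketch below assumes every line is a JSON object; the helper names `read_jsonl` and `summarize` are illustrative and not part of DeepFabric, and no particular record schema is assumed.

```python
import json
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def summarize(path: str) -> tuple[int, list[str]]:
    """Return the record count and the sorted set of top-level keys seen.

    Assumes each record is a JSON object (dict); the actual field names
    depend on the dataset type and are not assumed here.
    """
    records = list(read_jsonl(path))
    keys = sorted({key for record in records for key in record})
    return len(records), keys
```

Calling `summarize("dataset.jsonl")` on the Quick Start output would report how many samples were written and which top-level fields they contain.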

Dataset Types

| Type | Description | Use Case |
| --- | --- | --- |
| Basic | Simple Q&A pairs | Instruction following |
| Reasoning | Chain-of-thought traces | Step-by-step problem solving |
| Agent | Tool-calling with real execution | Building agents |

Key Features

Topic-driven generation ensures diverse, non-redundant samples. Each training example maps to a specific subtopic, avoiding the repetition common in naive generation.
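To make the idea concrete, here is a minimal sketch of how a root prompt could fan out into distinct subtopics. It is a tree-shaped simplification, not DeepFabric's implementation: `TopicNode`, `build_tree`, and `placeholder_expand` are hypothetical names, the placeholder stands in for an LLM call, and the interpretation of depth and degree (levels below the root, children per node) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    name: str
    children: list["TopicNode"] = field(default_factory=list)


def placeholder_expand(topic: str, degree: int) -> list[str]:
    # Stand-in for an LLM call that would propose `degree` subtopics.
    return [f"{topic} / subtopic {i + 1}" for i in range(degree)]


def build_tree(root: str, depth: int, degree: int, expand=placeholder_expand) -> TopicNode:
    """Expand `root` into `degree` children per node, `depth` levels deep."""
    node = TopicNode(root)
    if depth > 0:
        node.children = [
            build_tree(name, depth - 1, degree, expand)
            for name in expand(node.name, degree)
        ]
    return node


def leaves(node: TopicNode) -> list[str]:
    """Collect leaf topic names; each leaf would seed one or more samples."""
    if not node.children:
        return [node.name]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Under these assumptions, a depth of 2 and a degree of 2 yield 2 × 2 = 4 distinct leaf topics, so each generated sample can be tied to a different subtopic rather than repeating the root prompt.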

Real tool execution via Spin. Agent datasets include actual tool results from isolated WebAssembly sandboxes, not simulated outputs.

Training integration with TRL, Unsloth, and HuggingFace. Use the tokenizer's apply_chat_template to format samples for the target model's chat format.
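As a rough sketch of that formatting step, the helper below passes a sample's conversation to a tokenizer's `apply_chat_template` method. The `messages` field name and the helper `to_training_text` are assumptions for illustration; check the actual record layout of your dataset before relying on them.

```python
def to_training_text(tokenizer, sample: dict) -> str:
    """Render one sample as a single training string.

    Assumes `sample["messages"]` is a list of {"role": ..., "content": ...}
    dicts (an assumed field name, not confirmed by the DeepFabric docs).
    """
    messages = sample["messages"]
    # tokenize=False asks for the formatted string rather than token ids.
    return tokenizer.apply_chat_template(messages, tokenize=False)


# Usage with Hugging Face transformers (requires the package and a model
# that ships a chat template; the model id is left for you to fill in):
#
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("<model-id>")
#   text = to_training_text(tokenizer, sample)
```

Taking the tokenizer as a parameter keeps the helper independent of any one model, so the same dataset can be rendered with different chat templates.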

Built-in evaluation for testing fine-tuned models on tool-calling tasks with metrics for accuracy and correctness.

Documentation