Training¶
DeepFabric datasets integrate directly with popular training frameworks. This section covers loading datasets, formatting with chat templates, and integrating with TRL and Unsloth.
Workflow¶
1. Generate dataset → deepfabric generate config.yaml
2. Upload to Hub → deepfabric upload dataset.jsonl --repo user/dataset
3. Load in training → load_dataset("user/dataset")
4. Format with template → tokenizer.apply_chat_template()
5. Train → SFTTrainer or Unsloth
Quick Example¶
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, SFTConfig

# Load the dataset from the Hub
dataset = load_dataset("your-username/my-dataset", split="train")

# Load the model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Render each message array into the model's chat format
def format_sample(example):
    text = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False,
    )
    return {"text": text}

formatted = dataset.map(format_sample)

# Train
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=formatted,
    args=SFTConfig(output_dir="./output"),
)
trainer.train()
```
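Before launching a run, it helps to print one formatted sample and confirm the chat template applied cleanly:

```python
# Spot-check the rendered text, special tokens and all
print(formatted[0]["text"])
```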
Key Concepts¶
Chat templates convert message arrays into model-specific formats. Each model family (Qwen, Llama, Mistral) has its own template.
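For example, the same two-turn conversation renders into different special-token formats under each family's template (the Llama repo is gated, so substitute any model you have access to):

```python
from transformers import AutoTokenizer

messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]

# Same conversation, two different chat templates
for name in ("Qwen/Qwen2.5-7B-Instruct", "meta-llama/Llama-3.1-8B-Instruct"):
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(f"--- {name} ---")
    print(tokenizer.apply_chat_template(messages, tokenize=False))
```

Qwen wraps turns in `<|im_start|>`/`<|im_end|>` markers while Llama 3.1 uses `<|start_header_id|>` blocks, which is why a dataset must be formatted with the tokenizer of the model it will train.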
Tool formatting differs by model. Some models expect tools in the system message, others in a separate parameter.
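Recent transformers releases let you pass tool definitions straight to `apply_chat_template`, and the model's template decides where they land. A minimal sketch, with a made-up `get_weather` tool:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# A hypothetical tool in the JSON-schema style transformers expects
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# The template injects the tool schema wherever this model expects it
text = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(text)
```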
Reasoning traces can be rendered into the training text so the model learns to produce them, or kept out of it and used as auxiliary data.
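A sketch of the first option, assuming each sample carries a reasoning string alongside its messages; both the field name and the `<think>` delimiters are assumptions, so match them to your dataset's actual schema:

```python
def format_with_reasoning(example, tokenizer):
    messages = list(example["messages"])
    # Assumption: a "reasoning" field holds the trace; fold it into the
    # final assistant turn so the model learns to emit it before answering
    if example.get("reasoning"):
        final = messages[-1]
        messages[-1] = {
            "role": "assistant",
            "content": f"<think>{example['reasoning']}</think>\n{final['content']}",
        }
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}
```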
Next Steps¶
- Loading Datasets - Hugging Face integration
- Chat Templates - Formatting for different models
- Training Frameworks - TRL and Unsloth patterns