Skip to content

Tools

DeepFabric uses Spin, a WebAssembly framework, to execute tools during dataset generation. Tools run in isolated sandboxes, producing authentic training data based on real execution results.

Why Real Execution Matters

Traditional synthetic data generators simulate tool outputs, which creates unrealistic training data. With Spin, tools execute against real state:

# Simulated (unrealistic)
Agent: read_file("config.json")
Result: {"setting": "value"}  # LLM hallucinated this

# Real execution (accurate)
Agent: read_file("config.json")
Result: FileNotFound  # Actual state
Agent: write_file("config.json", content)
Result: Written 42 bytes  # Real operation

Architecture

┌─────────────────┐     ┌─────────────────┐
│   DeepFabric    │────▶│   Spin Service  │
│   (Python)      │     │   (WASM Host)   │
└─────────────────┘     └────────┬────────┘
              ┌──────────────────┼──────────────────┐
              ▼                  ▼                  ▼
        ┌──────────┐      ┌──────────┐      ┌──────────┐
        │   VFS    │      │   Mock   │      │  GitHub  │
        │Component │      │Component │      │Component │
        └──────────┘      └──────────┘      └──────────┘

Components are WebAssembly modules that handle specific tool categories:

Component Purpose Tools
VFS Virtual filesystem read_file, write_file, list_files, delete_file
Mock Dynamic mock execution Any tool loaded via MCP
GitHub GitHub API (experimental) Issues, PRs, commits

Quick Start

# Install Spin (macOS)
brew install fermyon/tap/spin

# Build and run
cd tools-sdk
spin build
spin up

The service runs at http://localhost:3000.

Configure DeepFabric to use it:

generation:
  tools:
    spin_endpoint: "http://localhost:3000"

Session Isolation

Each dataset sample gets an isolated session. Files created during one sample don't affect others:

# Session A: Creates config.json
# Session B: config.json doesn't exist

# After sample generation, session is cleaned up

Next Steps