Skip to content

validate

The validate command performs comprehensive analysis of DeepFabric configuration files, identifying potential issues before expensive generation processes begin.

Save Time and Resources

Catch configuration problems, authentication issues, and parameter incompatibilities early in the development cycle.

Basic Usage

Validate a configuration file for common issues:

Basic validation
deepfabric validate config.yaml

The command analyzes your configuration structure, checks parameter values, and reports any problems with clear descriptions and suggested fixes.

Validation Categories

The validation process examines multiple aspects of your configuration:

Structural Validation : Ensures all required sections (topics, generation, output) are present and properly formatted.

Parameter Compatibility : Checks that parameter values are within acceptable ranges and compatible with each other.

Provider Authentication : Verifies that required environment variables are set for the specified model providers.

Logical Consistency : Examines relationships between configuration sections, ensuring file paths and dependencies are coherent.

Validation Output

Successful validation output
Configuration is valid

Configuration Summary:
  Topics: mode=tree, depth=3, degree=4, estimated_paths=64 (4^3)
  Output: num_samples=500, concurrency=5, checkpoint_interval=100
    → Cycles needed: 8 (500 samples ÷ 64 unique topics)
    → Final cycle: 52 topics (partial)
  Hugging Face: repo=username/dataset-name

Warnings:
  High temperature value (0.95) may produce inconsistent results
  No save_as path defined for topic tree

The summary shows cycle-based generation info, including how many times the generator will iterate through unique topics and whether the final cycle is partial.

Understanding Cycles

DeepFabric uses cycle-based generation where each unique topic is processed once per cycle. When num_samples exceeds the number of unique topics, multiple cycles are needed. The concurrency setting controls parallel LLM calls.

Error Reporting

Validation error output
Configuration validation failed:
  - topics section is required
  - generation section is required
  - output section is required

Each error includes sufficient detail to identify the problem location and suggested corrections.

Configuration Analysis

Beyond basic validation, the command provides insights into your configuration choices:

Configuration analysis output
Configuration Analysis:
  Unique topics: 64 (from tree with degree=4, depth=3)
  Requested samples: 500
  Cycles needed: 8 (each cycle processes all 64 topics)
  Final cycle: 52 topics (partial)
  Concurrency: 5 parallel LLM calls

This analysis helps you understand the generation model:

  • Unique topics: Deduplicated count from your topic tree/graph
  • Cycles: Number of complete passes through all topics
  • Concurrency: How many LLM calls run in parallel

Provider-Specific Validation

The validation process includes provider-specific checks based on your configuration:

Verifies model name formats and availability.

Checks Claude model specifications.

Attempts to verify local model availability.

Provider validation output
Provider Validation:
   OpenAI API key detected (OPENAI_API_KEY)
   Model gpt-4 is available
   Model gpt-4 has higher costs than gpt-4-turbo

Development Workflow Integration

Integrate validation into your development workflow to catch issues early:

Validate before generation
deepfabric validate config.yaml && deepfabric generate config.yaml

Best Practice

This pattern ensures configuration problems are identified before expensive generation processes begin.

Batch Validation

Validate multiple configurations simultaneously:

Batch validation
for config in configs/*.yaml; do
  echo "Validating $config"
  deepfabric validate "$config"
done

Common Issues

Missing Required Sections

Configurations lacking essential components like topics, generation, or output sections are flagged immediately.

Parameter Range Issues

Values outside reasonable ranges, such as negative depths or extremely high temperatures, are identified with suggested corrections.

Provider Mismatches

Inconsistencies between specified providers and model names are detected and reported with compatible alternatives.

File Path Problems

Invalid or potentially conflicting output paths are identified to prevent generation failures or accidental overwrites.

Validation Exit Codes

The validate command uses standard exit codes for scripting integration:

Exit Code Meaning
0 Configuration is valid and ready for generation
1 Configuration has errors that prevent generation
2 Configuration file not found or unreadable
Continuous Validation Strategy

Consider adding configuration validation to your version control hooks or CI pipeline. This practice catches configuration regressions and ensures all committed configurations are functional.