validate
The validate command performs comprehensive analysis of DeepFabric configuration files, identifying potential issues before expensive generation processes begin.
Save Time and Resources
Catch configuration problems, authentication issues, and parameter incompatibilities early in the development cycle.
Basic Usage
To validate a configuration file for common issues, pass its path to the command, for example `deepfabric validate config.yaml`.
The command analyzes your configuration structure, checks parameter values, and reports any problems with clear descriptions and suggested fixes.
Validation Categories
The validation process examines multiple aspects of your configuration:
- Structural Validation: Ensures all required sections (topics, generation, output) are present and properly formatted.
- Parameter Compatibility: Checks that parameter values are within acceptable ranges and compatible with each other.
- Provider Authentication: Verifies that required environment variables are set for the specified model providers.
- Logical Consistency: Examines relationships between configuration sections, ensuring file paths and dependencies are coherent.
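You can run a similar provider-authentication check in your own scripts before invoking DeepFabric. The sketch below is a hypothetical helper, not DeepFabric's own logic: the provider names and the variable names (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) are assumptions based on the vendors' usual SDK conventions, and the assumption that local `ollama` models need no key should be checked against your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a provider name to the API-key variable it is
# assumed to need. Names follow vendor SDK conventions, NOT DeepFabric's code.
required_key_for() {
  case "$1" in
    openai)    echo "OPENAI_API_KEY" ;;
    anthropic) echo "ANTHROPIC_API_KEY" ;;
    ollama)    echo "" ;;   # local models: no API key assumed
    *)         return 1 ;;  # unknown provider
  esac
}

# Returns 0 if the key is set (or none is needed), 1 if it is missing,
# and 2 if the provider is unknown.
check_provider_env() {
  local provider=$1 var
  if ! var=$(required_key_for "$provider"); then
    echo "unknown provider: $provider" >&2
    return 2
  fi
  # ${!var} is bash indirect expansion: the value of the variable named by $var.
  if [ -n "$var" ] && [ -z "${!var:-}" ]; then
    echo "missing $var for provider '$provider'" >&2
    return 1
  fi
  return 0
}
```

For example, `check_provider_env anthropic && deepfabric validate config.yaml` stops early with a clear message when the key is absent.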
Validation Output
Successful validation output:
```text
Configuration is valid
Configuration Summary:
Topics: mode=tree, depth=3, degree=4, estimated_paths=64 (4^3)
Output: num_samples=500, concurrency=5, checkpoint_interval=100
→ Cycles needed: 8 (500 samples ÷ 64 unique topics)
→ Final cycle: 52 topics (partial)
Hugging Face: repo=username/dataset-name
Warnings:
High temperature value (0.95) may produce inconsistent results
No save_as path defined for topic tree
```
The summary reports cycle-based generation details: how many times the generator iterates through the unique topics, and whether the final cycle is partial.
Understanding Cycles
DeepFabric uses cycle-based generation, where each unique topic is processed once per cycle. When num_samples exceeds the number of unique topics, multiple cycles are needed. The concurrency setting controls how many LLM calls run in parallel.
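The cycle figures in the example summary follow from simple arithmetic: a tree with depth 3 and degree 4 yields an estimated 4^3 = 64 paths, 500 samples need ceil(500 / 64) = 8 cycles, and the last cycle covers 500 − 7 × 64 = 52 topics. The helper below is a hypothetical sketch that reproduces that calculation. It assumes, as the summary's `estimated_paths` does, that every leaf path is unique; after deduplication the real topic count may be lower.

```shell
#!/usr/bin/env bash
# Hypothetical helper reproducing the summary's cycle arithmetic for tree mode.
estimate_cycles() {
  local depth=$1 degree=$2 num_samples=$3
  if (( degree <= 0 || num_samples <= 0 )); then
    echo "degree and num_samples must be positive" >&2
    return 1
  fi
  # Estimated unique topics for a tree: degree^depth (4^3 = 64 in the example).
  local paths=1 i
  for (( i = 0; i < depth; i++ )); do
    paths=$(( paths * degree ))
  done
  # Ceiling division: a partial pass over the topics still counts as a cycle.
  local cycles=$(( (num_samples + paths - 1) / paths ))
  # Topics used in the last cycle (equals paths when the division is exact).
  local final=$(( num_samples - (cycles - 1) * paths ))
  echo "paths=$paths cycles=$cycles final_cycle=$final"
}

estimate_cycles 3 4 500   # prints: paths=64 cycles=8 final_cycle=52
```

When num_samples is an exact multiple of the topic count, the final cycle is full rather than partial.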
Error Reporting
Each error includes enough detail to locate the problem, along with a suggested correction.
Configuration Analysis
Beyond basic validation, the command provides insights into your configuration choices.
This analysis helps you understand the generation model:
- Unique topics: Deduplicated count from your topic tree/graph
- Cycles: Number of complete passes through all topics
- Concurrency: How many LLM calls run in parallel
Provider-Specific Validation
The validation process includes provider-specific checks based on your configuration:
- Verifies model name formats and availability.
- For Claude models, checks the model specification.
- For local models, attempts to verify that the model is available.
Development Workflow Integration
Run `deepfabric validate` before every generation run so that issues surface early.
Best Practice
Validating first ensures configuration problems are identified before expensive generation processes begin.
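One way to enforce this ordering is a small wrapper that only starts generation when validation succeeds. This is a hypothetical sketch: the wrapper name is illustrative, and it assumes `deepfabric generate <config>` is the generation command in your installed version (confirm with `deepfabric --help`).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: generation runs only if validation exits with 0.
validate_then_generate() {
  local config=$1
  if deepfabric validate "$config"; then
    # Assumed command name; check your DeepFabric version's CLI help.
    deepfabric generate "$config"
  else
    echo "Skipping generation: $config failed validation" >&2
    return 1
  fi
}

# Usage: validate_then_generate config.yaml
```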
Batch Validation
Validate multiple configurations in one pass with a shell loop:
```shell
for config in configs/*.yaml; do
  echo "Validating $config"
  deepfabric validate "$config"
done
```
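A plain `for` loop exits with the status of its last command, so an earlier failing file can be masked by a later passing one. The sketch below is a hypothetical variant that counts failures and returns non-zero if any configuration fails, which suits CI jobs. The `configs/` default mirrors the loop above and is a placeholder for your own layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate every YAML file in a directory and fail
# overall if any single file fails.
validate_all() {
  local dir=${1:-configs} config failed=0
  for config in "$dir"/*.yaml; do
    [ -e "$config" ] || continue   # the glob matched no files
    echo "Validating $config"
    if ! deepfabric validate "$config"; then
      echo "FAILED: $config" >&2
      failed=$(( failed + 1 ))
    fi
  done
  echo "$failed configuration(s) failed validation"
  [ "$failed" -eq 0 ]   # function's exit status: 0 only if nothing failed
}
```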
Common Issues
Missing Required Sections
Configurations lacking essential components like topics, generation, or output sections are flagged immediately.
Parameter Range Issues
Values outside reasonable ranges, such as negative depths or extremely high temperatures, are identified with suggested corrections.
Provider Mismatches
Inconsistencies between specified providers and model names are detected and reported with compatible alternatives.
File Path Problems
Invalid or potentially conflicting output paths are identified to prevent generation failures or accidental overwrites.
Validation Exit Codes
The validate command uses standard exit codes for scripting integration:
| Exit Code | Meaning |
|---|---|
| 0 | Configuration is valid and ready for generation |
| 1 | Configuration has errors that prevent generation |
| 2 | Configuration file not found or unreadable |
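Scripts can branch on these codes to tell a broken configuration apart from a missing file. The following is a minimal sketch built only on the exit codes in the table above; the function name and messages are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the meaning of each documented exit code
# and pass the code through to the caller.
check_config() {
  local config=$1 rc=0
  deepfabric validate "$config" || rc=$?
  case "$rc" in
    0) echo "ready: $config" ;;
    1) echo "invalid: $config has errors that prevent generation" >&2 ;;
    2) echo "unreadable: $config was not found or could not be read" >&2 ;;
    *) echo "unexpected exit code $rc for $config" >&2 ;;
  esac
  return "$rc"
}
```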
Continuous Validation Strategy
Consider adding configuration validation to your version control hooks or CI pipeline. This practice catches configuration regressions and helps ensure that every committed configuration passes validation.
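As a starting point for a version control hook, the sketch below validates only the configuration files being committed. It is a hypothetical pre-commit hook: it assumes configurations live under `configs/` (as in the batch example) and reads paths from stdin so the core logic can be exercised without Git.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit hook body (e.g. saved as .git/hooks/pre-commit and
# made executable). Validates staged YAML files under configs/ only.
validate_paths() {
  local path status=0
  while IFS= read -r path; do
    case "$path" in
      configs/*.yaml|configs/*.yml)
        # </dev/null keeps the CLI from consuming the list of paths on stdin.
        deepfabric validate "$path" </dev/null || status=1 ;;
    esac
  done
  return "$status"
}

# In the hook itself, feed it the staged files and block the commit on failure:
#   git diff --cached --name-only --diff-filter=ACM | validate_paths || exit 1
```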