upload-hf¶
The upload-hf command publishes datasets to Hugging Face Hub.
Basic Usage¶
Upload a dataset file to Hugging Face Hub (the repository name in the example is a placeholder):
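# Repository name is a placeholder; substitute your own username/dataset-name
deepfabric upload-hf dataset.jsonl --repo username/my-dataset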
This command uploads the dataset file and creates a dataset card with automatically generated metadata.
Authentication Methods¶
The upload command supports multiple authentication approaches:
Environment variable, the most secure approach for production environments.
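A minimal sketch, assuming deepfabric honors the standard HF_TOKEN variable read by the huggingface_hub client:
# Export the token once; subsequent uploads authenticate without extra flags
export HF_TOKEN=hf_xxxxxxxxxxxx
deepfabric upload-hf dataset.jsonl --repo username/my-dataset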
Token passed directly in the command.
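A sketch only: the --hf-token option name is an assumption here, so confirm the exact flag with deepfabric upload-hf --help:
# --hf-token is assumed for illustration; tokens on the command line can leak via shell history
deepfabric upload-hf dataset.jsonl --repo username/my-dataset --hf-token hf_xxxxxxxxxxxx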
Repository Management¶
The upload-hf command handles repository creation and updates automatically:
New Repositories : Created automatically when uploading to non-existent repositories.
Existing Repositories : Receive updates to both dataset files and dataset card.
Repository Naming : Follows Hugging Face conventions: username/dataset-name or organization/dataset-name (see the examples below).
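Both namespace forms use the same invocation; the names here are placeholders:
# Personal namespace
deepfabric upload-hf dataset.jsonl --repo username/dataset-name
# Organization namespace
deepfabric upload-hf dataset.jsonl --repo organization/dataset-name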
Dataset Tagging¶
Make a dataset easier to discover by specifying one or more tags:
deepfabric upload-hf dataset.jsonl \
--repo username/educational-content \
--tags educational \
--tags programming \
--tags synthetic
Generated Documentation¶
Dataset Card
The upload process creates a basic dataset card if one doesn't already exist.
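On the Hub, the card is the repository's README.md, with dataset metadata carried in YAML front matter. For the tagging example above, a generated card might begin roughly like this (illustrative, not the exact output):
---
tags:
- educational
- programming
- synthetic
---
...followed by the generated card body.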
File Organization¶
The upload process organizes files according to Hugging Face Hub conventions:
repository-name/
├── README.md # Generated dataset card
├── dataset.jsonl # Your uploaded dataset
└── .gitattributes # LFS configuration for large files
Large Files
Large dataset files are automatically configured for Git LFS to ensure efficient storage and retrieval.
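LFS tracking is declared in the repository's .gitattributes; a typical entry for JSONL datasets looks like the line below, though the exact patterns written by the Hub may differ:
*.jsonl filter=lfs diff=lfs merge=lfs -text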
Batch Upload Operations¶
Upload multiple related datasets to maintain organized dataset collections:
# Upload training and validation sets
deepfabric upload-hf train_dataset.jsonl --repo myorg/comprehensive-dataset --tags training
deepfabric upload-hf val_dataset.jsonl --repo myorg/comprehensive-dataset --tags validation
This approach yields a single repository containing multiple related files, each uploaded with metadata appropriate to its role.