upload-hf

The upload-hf command publishes datasets to the Hugging Face Hub.

Basic Usage

Upload a dataset file to the Hugging Face Hub:

Basic upload
deepfabric upload-hf dataset.jsonl --repo username/dataset-name

This command uploads the dataset file and creates a dataset card with automatically generated metadata.
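
Once the command finishes, the dataset is immediately loadable from the Hub. As a quick sanity check, you can load it back with the datasets library (the repo id is a placeholder):

Verify the upload
from datasets import load_dataset

# Load the newly uploaded JSONL directly from the Hub.
ds = load_dataset("username/dataset-name", split="train")
print(ds[0])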

Authentication Methods

The upload command supports multiple authentication approaches:

Environment variable, the most secure approach for production environments:

export HF_TOKEN="your-huggingface-token"
deepfabric upload-hf dataset.jsonl --repo username/dataset-name

Command-line option, passing the token directly:

deepfabric upload-hf dataset.jsonl --repo username/dataset-name --token your-token

Cached credentials, which work automatically if you've previously authenticated with the Hugging Face CLI:

huggingface-cli login
deepfabric upload-hf dataset.jsonl --repo username/dataset-name
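
Whichever method you use, you can confirm which account the resolved token maps to with the underlying huggingface_hub client (this is the library, not a deepfabric command):

Check the active account
from huggingface_hub import HfApi

# Resolution order: explicit token argument > HF_TOKEN > cached login.
print(HfApi().whoami()["name"])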

Repository Management

The upload-hf command handles repository creation and updates automatically:

New Repositories: Created automatically when uploading to a repository that doesn't exist yet.

Existing Repositories: Receive updates to both the dataset file and the dataset card (see the sketch below).

Repository Naming: Follows Hugging Face conventions: username/dataset-name or organization/dataset-name.
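
The create-or-update behavior mirrors the underlying huggingface_hub calls; a rough sketch of the equivalent Hub operations (not deepfabric's actual internals):

Equivalent Hub operations
from huggingface_hub import HfApi

api = HfApi()  # token resolved from HF_TOKEN or cached login
repo_id = "username/dataset-name"

# exist_ok=True makes creation idempotent: the repo is created on the
# first run and left untouched afterwards.
api.create_repo(repo_id, repo_type="dataset", exist_ok=True)

# Uploading to the same path_in_repo again commits an updated file.
api.upload_file(
    path_or_fileobj="dataset.jsonl",
    path_in_repo="dataset.jsonl",
    repo_id=repo_id,
    repo_type="dataset",
)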

Dataset Tagging

Improve dataset discoverability by attaching tags:

Add tags
deepfabric upload-hf dataset.jsonl \
  --repo username/educational-content \
  --tags educational \
  --tags programming \
  --tags synthetic
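
On the Hub, tags live in the dataset card's YAML front matter. For the command above, the generated README.md metadata would look roughly like this (the exact card contents depend on the generator):

Tag metadata in README.md
---
tags:
- educational
- programming
- synthetic
---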

Generated Documentation

Dataset Card

The upload process creates a basic dataset card if one doesn't already exist.
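
Because the generated card only fills in when none exists, you can push a custom card ahead of the upload and it will be preserved. A sketch using huggingface_hub (repo id and card body are placeholders):

Pre-create a custom card
from huggingface_hub import DatasetCard, create_repo

repo_id = "username/dataset-name"  # placeholder
create_repo(repo_id, repo_type="dataset", exist_ok=True)

# A dataset card is just a README.md with YAML front matter.
DatasetCard("""---
tags:
- synthetic
license: mit
---

# dataset-name

Synthetic dataset generated with deepfabric.
""").push_to_hub(repo_id)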

File Organization

The upload process organizes files according to Hugging Face Hub conventions:

repository-name/
├── README.md          # Generated dataset card
├── dataset.jsonl      # Your uploaded dataset
└── .gitattributes     # LFS configuration for large files

Large Files

Large dataset files are automatically configured for Git LFS to ensure efficient storage and retrieval.
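
Hub repositories carry a .gitattributes file that routes matching file patterns through LFS; a rule covering JSONL files looks like this (illustrative, as the default patterns vary):

LFS rule in .gitattributes
*.jsonl filter=lfs diff=lfs merge=lfs -text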

Batch Upload Operations

Upload multiple related files to the same repository to keep dataset collections organized:

Multiple uploads
# Upload training and validation sets
deepfabric upload-hf train_dataset.jsonl --repo myorg/comprehensive-dataset --tags training
deepfabric upload-hf val_dataset.jsonl --repo myorg/comprehensive-dataset --tags validation

This approach builds a single dataset repository containing multiple related files, with metadata covering each component.
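
When there are many files, the huggingface_hub client can also push an entire directory in a single commit; a sketch assuming a hypothetical ./splits directory of JSONL files:

Batch upload via huggingface_hub
from huggingface_hub import HfApi

api = HfApi()
# Pushes every JSONL under ./splits to the repo in one commit.
api.upload_folder(
    folder_path="./splits",
    repo_id="myorg/comprehensive-dataset",
    repo_type="dataset",
    allow_patterns="*.jsonl",
)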