Hub documentation
File formats
Single Sign-On (SSO) Audit Logs Storage Regions Data Studio for Private datasets Resource Groups (Access Control) Advanced Compute Options Advanced Security Tokens Management Publisher Analytics Gating Group Collections Network Security Rate Limits Blog Articles
PRO Plan Repositories Getting Started with Repositories Repository Settings Storage Limits Storage Backend (Xet) Local Cache Pull Requests & Discussions Notifications Collections Webhooks GitHub Actions Notebooks Next Steps Licenses
Models The Model Hub Model Cards Eval Results Leaderboard Data Gated Models Uploading Models Downloading Models Integrated Libraries Model Widgets Model Inference Models Download Stats Model Release Checklist Local Apps Frequently Asked Questions Advanced Topics
Datasets Datasets Overview Dataset Cards Gated Datasets Uploading Datasets Uploading Datasets (for LLMs) Downloading Datasets Streaming Datasets Integrated Libraries
Spaces Argilla Daft Dask Datasets Data Designer Distilabel DuckDB Embedding Atlas fenic FiftyOne Lance Pandas Polars
Data Studio Datasets Download Stats Authentication for private and gated datasets Supported file formats Performing data transformations Performance optimizations
Spark WebDataset Spaces Overview Spaces GPU Upgrades Spaces ZeroGPU Spaces Dev Mode Spaces Disk Usage & Storage Spaces Custom Domain Spaces as MCP servers Spaces as Agent Tools Spaces as API Endpoints Gradio Spaces Streamlit Spaces Static HTML Spaces Docker Spaces Embed your Space Run Spaces with Docker Spaces Configuration Reference Sign-In with HF button Featured Spaces Spaces Changelog Advanced Topics
Storage Buckets new Jobs Jobs Overview Quickstart Pricing and Billing Manage Jobs Configuration Popular Images Examples & Tutorials Schedule Jobs Webhook Automation Reference
Agents Agents Overview Hugging Face CLI for AI Agents Hugging Face MCP Server Hugging Face Agent Skills Building agents with the HF SDK Local Agents with llama.cpp Agent Libraries
Other File formats
Polars supports the following file formats when reading from Hugging Face:
The examples below show the default settings only. Use the links above to view all available parameters in the API reference guide.
Parquet
Parquet is the preferred file format as it stores the schema with type information within the file. This avoids any ambiguity with parsing and speeds up reading. To read a Parquet file in Polars, use the read_parquet function:
pl.read_parquet("hf://datasets/roneneldan/TinyStories/data/train-00000-of-00004-2d5a1467fff1081b.parquet")CSV
The read_csv function can be used to read a CSV file:
pl.read_csv("hf://datasets/lhoestq/demo1/data/train.csv")JSON
Polars supports reading new line delimited JSON — also known as json lines — with the read_ndjson function:
pl.read_ndjson("hf://datasets/proj-persona/PersonaHub/persona.jsonl")