
Pashto Datasets
Pashto dataset directory

Datasets for Pashto speech, text, OCR, and translation work.

Start from the filtered catalog views, then open each dataset's original source to check its license, download instructions, and quality.

Dataset Paths

Use these entry points when you already know the kind of data you need.

Speech Recognition

Find speech corpora, audio-text pairs, and ASR evaluation resources.

ASR Speech
Open ASR datasets

Text to Speech

Find voice data and references useful for training or evaluating Pashto TTS systems.

TTS Voice
Open TTS datasets

Translation and Text

Find parallel corpora, text collections, dictionaries, and NLP datasets.

MT NLP
Open MT datasets

OCR and Documents

Find image, script, and document resources for Pashto OCR and text extraction.

OCR Documents
Open OCR datasets

Raw Index

Open the repository dataset index when you need the Markdown source used by maintainers.

Repository Markdown
Open dataset index

Add a Dataset

Read the contribution notes and catalog rules before adding a new resource.

Quality Metadata
Contribute