NVIDIA’s Granary launches as Europe’s most extensive open speech dataset

Written by Michael Anthony Bitoon

Published 18 Aug 2025

Fact checked by Sophia Feona Cantiller

Building speech technology for European languages may now take about half as much training data, after NVIDIA shared a massive audio collection with developers worldwide.

The tech giant released Granary on August 15. The dataset contains about one million hours of audio across 25 European languages: 650,000 hours for speech recognition and 350,000 hours for speech translation.
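As a quick check on those figures, the short Python sketch below adds up the two task totals and works out each task's share of the dataset. The hour counts come from the article; the variable and function names are illustrative only and are not part of any NVIDIA tooling.

```python
# Hours reported in the article for each task (not measured here).
GRANARY_HOURS = {
    "speech recognition": 650_000,
    "speech translation": 350_000,
}


def task_shares(hours_by_task):
    """Return each task's share of the total hours as a percentage."""
    total = sum(hours_by_task.values())
    return {task: 100 * hours / total for task, hours in hours_by_task.items()}


total_hours = sum(GRANARY_HOURS.values())
shares = task_shares(GRANARY_HOURS)

for task, share in shares.items():
    print(f"{task}: {share:.0f}% of {total_hours:,} hours")
```

On these numbers, roughly two thirds of the audio is for speech recognition and one third for translation.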

According to NVIDIA, models trained on Granary need roughly half as much training data as those trained on competing datasets to reach a target accuracy level. That efficiency could substantially reduce costs for companies building multilingual artificial intelligence (AI) applications.

“These tools will enable developers to more easily scale AI applications to support global users,” said Jonathan Cohen from NVIDIA’s blog team.

The dataset's 25 languages include nearly all of the European Union's 24 official languages, plus Russian and Ukrainian. Languages such as Croatian, Estonian, and Maltese previously had limited AI support because of data scarcity.

NVIDIA developed Granary with researchers from Carnegie Mellon University and Fondazione Bruno Kessler. The team processed unlabeled public audio using NVIDIA’s NeMo Speech Data Processor toolkit, avoiding expensive human annotation.

Two AI models accompany the dataset release. Canary-1b-v2 is a billion-parameter model built for complex transcription tasks. NVIDIA says it delivers quality comparable to systems three times larger while running inference up to ten times faster.

Parakeet-tdt-0.6b-v3 focuses on high-speed transcription. This 600-million-parameter model can process a 24-minute audio segment in a single pass while automatically detecting the input language.
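To illustrate what a 24-minute single-pass window means for longer recordings, here is a minimal Python sketch that plans how a recording might be divided before transcription. It assumes simple back-to-back windows with no overlap, which is an assumption made for this example and not a description of how NVIDIA's models or toolkits handle long audio. The function name `plan_segments` is hypothetical.

```python
import math

# Single-pass segment length cited in the article, in minutes.
MAX_SEGMENT_MINUTES = 24


def plan_segments(duration_minutes, max_segment=MAX_SEGMENT_MINUTES):
    """Split a recording into consecutive (start, end) windows, in minutes.

    Assumes non-overlapping windows; real pipelines may overlap or
    split on silence instead.
    """
    if duration_minutes <= 0:
        return []
    count = math.ceil(duration_minutes / max_segment)
    return [
        (i * max_segment, min((i + 1) * max_segment, duration_minutes))
        for i in range(count)
    ]


# A one-hour meeting needs three passes under this assumption.
print(plan_segments(60))  # [(0, 24), (24, 48), (48, 60)]
```

A recording of 24 minutes or less fits in one window, so it would need only a single pass.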

Both models provide accurate punctuation, capitalization, and word-level timestamps. They’re available under permissive licensing for commercial and research use.

The release addresses a critical gap in AI language support. Less than one percent of the world’s 7,000 languages currently have robust AI backing.

European businesses could benefit significantly from reduced development costs. Companies building customer service chatbots or translation services previously needed extensive datasets for each target language.

The efficiency gains extend beyond cost savings. Faster training times mean quicker deployment of multilingual AI services across European markets. Small companies with limited computing resources can now compete with larger rivals in developing language-specific AI tools.

NVIDIA will present its research paper at the Interspeech conference in the Netherlands from August 17-21. The dataset and models are now available on Hugging Face for immediate download.

As European regulators push for more inclusive AI systems, the dataset’s open-source nature will allow developers to customize models for specific regional dialects and use cases.

Future applications could include real-time translation devices, enhanced virtual assistants, and automated transcription services for European parliament proceedings and business meetings.