Speech Resource Finder

Almost 4 billion people speak languages with little or no speech technology support. This tool makes visible which languages have resources available and which communities are being left behind in the speech AI revolution. Built by CLEAR Global to support language inclusion and help close the digital language divide.

How to Use

  1. Select a language from the dropdown (type to search by name or ISO code)
  2. Toggle model deduplication if desired (enabled by default)
  3. Review results: commercial availability, models, and datasets
  4. Click a model or dataset name to open it on Hugging Face

Data Sources

Commercial Speech Services

Commercial service support is automatically pulled from the language support page of each service provider.

Open Source Resources

Language Resource Classification

The resource classification shown for each language is based on Joshi et al.'s 2020 study of linguistic diversity in NLP. The study categorized languages into six levels (0 to 5) based on their representation in language technology resources:

  • Level 5: The Winners - Languages with the most resources
  • Level 4: The Underdogs - Languages with moderate resources
  • Level 3: The Rising Stars - Languages with growing resources
  • Level 2: The Hopefuls - Languages with limited resources
  • Level 1: The Scraping-Bys - Languages with very few resources
  • Level 0: The Left-Behinds - Languages with almost no resources

Note: This classification comes from research published in 2020 and may not reflect the current state of resources for every language. Speech technology is evolving rapidly, and some languages have likely gained resources since the study was conducted.

Disclaimer

  • The language list contains only 487 languages and is taken from this GitHub repository.
  • This is not an exhaustive list of speech and language technology resources. There are other commercial voice technology providers and dataset/model resources that this app doesn't cover.
  • Data is fetched in real time and can change.
  • Model deduplication groups models that share the same base name but were uploaded by different users, and keeps only the most downloaded version of each in the list.
  • A maximum of 100 dataset and model entries from Hugging Face are shown.
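
The deduplication and 100-entry cap described above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the app's actual code: it assumes the "base name" is the part of a Hugging Face repository ID after the owner prefix, that names are compared exactly, and that results are sorted by downloads before the cap is applied. `ModelEntry`, `base_name`, and `deduplicate` are hypothetical names, and the repository IDs in the usage example are made up.

```python
from typing import Dict, List, NamedTuple


class ModelEntry(NamedTuple):
    """Hypothetical record for one model listing."""
    repo_id: str    # assumed "owner/name" style identifier
    downloads: int  # download count reported for the model


def base_name(repo_id: str) -> str:
    """Return the part after the owner prefix ('owner/name' -> 'name')."""
    return repo_id.split("/", 1)[-1]


def deduplicate(models: List[ModelEntry], limit: int = 100) -> List[ModelEntry]:
    """Keep the most downloaded model per base name, then cap the list."""
    best: Dict[str, ModelEntry] = {}
    for model in models:
        key = base_name(model.repo_id)
        # Replace the stored entry only if this one has more downloads.
        if key not in best or model.downloads > best[key].downloads:
            best[key] = model
    # Assumption: survivors are ordered by downloads, highest first.
    ranked = sorted(best.values(), key=lambda m: m.downloads, reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    sample = [
        ModelEntry("alice/whisper-small-sw", 120),
        ModelEntry("bob/whisper-small-sw", 950),
        ModelEntry("carol/mms-tts-swh", 40),
    ]
    for entry in deduplicate(sample):
        print(entry.repo_id, entry.downloads)
```

In this sketch the two uploads named `whisper-small-sw` collapse into the one with more downloads, which is why turning deduplication off can surface additional entries for the same model name.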

Feedback

We would love to hear your feedback and suggestions. Please write to us at tech@clearglobal.org.