HOUSING INTELLIGENCE

Equip IoT devices in home environments with automatic speech recognition, natural language processing, and other human-computer interaction technologies and services.
GigaSpeech
Category : ASR Corpus
Datasets Source : Tsinghua University
Language : English
Content : Various topics
Tags : English
Size : 435 GB
File Format : OPUS
License : TERMS OF ACCESS
CN-Celeb Speech Recognition Corpus
Category : ASR Corpus
Datasets Source : Tsinghua University
Language : Chinese
Content : free text
Tags : Mandarin Chinese
Size : 97 GB
File Format : WAV (PCM) TXT (UTF8)
License : Attribution-ShareAlike 4.0 International
Wu-Accented Mandarin Conversational Speech Corpus
Category : ASR Corpus
Datasets Source : MagicData
Language : zh-CN, Wu-accented Mandarin (Wu areas, China)
Content : themed conversations
Tags : Accented Mandarin
Size : 158.12 MB
File Format : WAV (PCM) TXT (UTF8)
License : Magic Data open-source license
Korean Conversational Speech Corpus
Category : ASR Corpus
Datasets Source : MagicData
Language : ko-KR, Korean (South Korea)
Content : themed conversations
Tags : Korean
Size : 421.95 MB
File Format : WAV (PCM) TXT (UTF8)
License : Magic Data open-source license
Russian Scripted Speech Corpus - Daily Use Sentence
Category : ASR Corpus
Datasets Source : MagicData
Language : ru-RU, Russian (Russia)
Content : daily use sentences
Tags : Russian
Size : 625.59 MB
File Format : WAV (PCM) TXT (UTF8)
License : Magic Data open-source license
Wuhan Dialect Scripted Speech Corpus - Daily Use Sentence
Category : ASR Corpus
Datasets Source : MagicData
Language : cmn-Wuhan, Mandarin Chinese (Wuhan, China)
Content : daily use sentences
Tags : Chinese Dialect
Size : 436.14 MB
File Format : WAV (PCM) TXT (UTF8)
License : Magic Data open-source license
English Speech Corpus from TED-LIUM
Category : ASR Corpus
Datasets Source : Not Described.
Language : English
Content : TED talks
Tags : English
Size : 21 GB
File Format : Not Described.
License : Creative Commons BY-NC-ND 3.0
Iban Speech Corpora for ASR
Category : ASR Corpus
Datasets Source : Iban
Language : Iban
Content : News
Tags : Iban
Size : 913 MB
File Format : Not Described.
License : Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0)
Transcribed speech data in Amharic, Swahili, and Wolof
Category : ASR Corpus
Datasets Source : Not Described.
Language : Amharic, Swahili, Wolof
Content : Not Described.
Tags : Not Described.
Size : Not Described.
File Format : Not Described.
License : Not Described.
Sinhalese multi-speaker TTS corpora
Category : TTS Corpus
Datasets Source : Sinhala TTS
Language : Sinhalese
Content : Not Described.
Tags : Sinhala
Size : 699 MB
File Format : TXT
License : Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)