Datasets
Chinese

CN-Celeb Speech Recognition Corpus
Category : ASR Corpus
Datasets Source : Tsinghua University
Language : Chinese
Content : free text
Tags : Mandarin Chinese
Size : 97 GB
File Format : WAV (PCM), TXT (UTF8)
License : Attribution-ShareAlike 4.0 International
Chinese Address POI Corpus
Category : NLP Corpus
Datasets Source : MagicData
Language : zh, Chinese
Content : POI, Chinese Addresses
Tags : Mandarin Chinese
Size : 1.81 KB
File Format : TXT (UTF8)
License : Magic Data open-source license
Wuhan Dialect Scripted Speech Corpus - Daily Use Sentence
Category : ASR Corpus
Datasets Source : MagicData
Language : cmn-Wuhan, Mandarin Chinese (Wuhan, China)
Content : daily use sentences
Tags : Chinese Dialect
Size : 436.14 MB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license
Chinese Speech Corpus from Tsinghua University
Category : Not Described.
Datasets Source : THCHS-30
Language : Chinese
Content : Not Described.
Tags : Mandarin Chinese
Size : 8.3 GB
File Format : Not Described.
License : Apache License v.2.0
Zhengzhou Dialect Scripted Speech Corpus - Daily Use Sentence
Category : ASR Corpus
Datasets Source : MagicData
Language : cmn-Zhengzhou, Mandarin Chinese (Zhengzhou, China)
Content : daily use sentences
Tags : Chinese Dialect
Size : 437 MB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license
Zhengzhou Dialect Conversational Speech Corpus
Category : ASR Corpus
Datasets Source : MagicData
Language : cmn-Zhengzhou, Mandarin Chinese (Zhengzhou, China)
Content : themed conversations
Tags : Chinese Dialect
Size : 308 MB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license
Mandarin Chinese Conversational Speech Corpus - Web Meeting
Category : ASR Corpus
Datasets Source : MagicData
Language : zh-CN, Mandarin Chinese (China)
Content : conversations (web meetings)
Tags : Mandarin Chinese
Size : 202 MB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license
Chinese-English Parallel Corpus - Finance
Category : NLP Corpus
Datasets Source : MagicData
Language : zh & en, Chinese and English
Content : Chinese-English parallel corpus on finance-related daily use sentences
Tags : English, Mandarin Chinese
Size : 8 KB
File Format : TXT (UTF8)
License : Magic Data open-source license
Mandarin Chinese Scripted Speech Corpus - Daily Use Sentence / Command and Query / SMS
Category : ASR Corpus
Datasets Source : MagicData
Language : zh-CN, Mandarin Chinese (China)
Content : daily use sentences, commands and queries, SMS
Tags : Mandarin Chinese
Size : 59 GB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license
Mandarin Chinese Scripted Speech Corpus - in-Vehicle Scene
Category : ASR Corpus
Datasets Source : MagicData
Language : zh-CN, Mandarin Chinese (China)
Content : commands and queries in vehicle-related scenes
Tags : Mandarin Chinese
Size : 3.09 GB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license