Datasets
Chinese

Zhengzhou Dialect Scripted Speech Corpus - Daily Use Sentence
Category : ASR Corpus
Datasets Source : MagicData
Language : cmn-Zhengzhou, Mandarin Chinese (Zhengzhou, China)
Content : daily use sentences
Tags : Chinese Dialect
Size : 437 MB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license
Zhengzhou Dialect Conversational Speech Corpus
Category : ASR Corpus
Datasets Source : MagicData
Language : cmn-Zhengzhou, Mandarin Chinese (Zhengzhou, China)
Content : themed conversations
Tags : Chinese Dialect
Size : 308 MB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license
Mandarin Chinese Conversational Speech Corpus - Web Meeting
Category : ASR Corpus
Datasets Source : MagicData
Language : zh-CN, Mandarin Chinese (China)
Content : conversations (web meetings)
Tags : Mandarin Chinese
Size : 202 MB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license
Chinese-English Parallel Corpus - Finance
Category : NLP Corpus
Datasets Source : MagicData
Language : zh & en, Chinese and English
Content : Chinese-English parallel corpus on finance-related daily use sentences
Tags : Mandarin Chinese, English
Size : 8 KB
File Format : TXT (UTF8)
License : Magic Data open-source license
Chinese Mandarin Speech Corpus from Aishell
Category : ASR Corpus
Datasets Source : Aishell
Language : Mandarin Chinese (China)
Content : daily use sentences
Tags : Mandarin Chinese
Size : 15 GB
File Format : WAV (PCM), TXT (UTF8)
License : Apache License v.2.0
Mandarin Chinese Scripted Speech Corpus - Daily Use Sentence / Command and Query / SMS
Category : ASR Corpus
Datasets Source : MagicData
Language : zh-CN, Mandarin Chinese (China)
Content : daily use sentences, commands and queries, SMS
Tags : Mandarin Chinese
Size : 59 GB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license
Mandarin Chinese Scripted Speech Corpus - in-Vehicle Scene
Category : ASR Corpus
Datasets Source : MagicData
Language : zh-CN, Mandarin Chinese (China)
Content : commands and queries in vehicle-related scenes
Tags : Mandarin Chinese
Size : 3.09 GB
File Format : WAV (PCM), TXT (UTF8)
License : Magic Data open-source license
Chinese Customer Service Scenario Text Corpus - Education
Category : NLP Corpus
Datasets Source : MagicData
Language : zh-CN, Mandarin Chinese (China)
Content : dialogical texts on education-related customer service
Tags : Mandarin Chinese
Size : 28 KB
File Format : TXT (UTF8)
License : Magic Data open-source license
Chinese Customer Service Scenario Text Corpus - Healthcare
Category : NLP Corpus
Datasets Source : MagicData
Language : zh-CN, Mandarin Chinese (China)
Content : dialogical texts on healthcare-related customer service
Tags : Mandarin Chinese
Size : 31 KB
File Format : TXT (UTF8)
License : Magic Data open-source license
Chinese Customer Service Scenario Text Corpus - Finance
Category : NLP Corpus
Datasets Source : MagicData
Language : zh-CN, Mandarin Chinese (China)
Content : dialogical texts on finance-related customer service
Tags : Mandarin Chinese
Size : 34 KB
File Format : TXT (UTF8)
License : Magic Data open-source license