Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia
Chinese dynastic histories form a large continuous linguistic space of
approximately 2000 years, from the 3rd century BCE to the 18th century CE. The
histories are documented in Classical (Literary) Chinese in a corpus of over 20
million characters, suitable for the computational analysis of historical
lexicon and semantic change. However, there is no freely available open-source
corpus of these histories, making Classical Chinese low-resource. This project
introduces a new open-source corpus of twenty-four dynastic histories covered
by a Creative Commons license. An original list of Classical Chinese
gender-specific terms was developed as a case study for analyzing the
historical linguistic use of male and female terms. The study demonstrates
considerable stability in the usage of these terms, with dominance of male
terms. Exploration of word meanings uses keyword analysis of focus corpora
created for gender-specific terms. This method yields meaningful semantic
representations that can be used for future studies of diachronic semantics.
Comment: 12th Conference on Language Resources and Evaluation (LREC 2020), 9
pages, 7 tables
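
The keyword analysis of focus corpora mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. It assumes each character is one token, builds a focus corpus from fixed-size context windows around the target terms, and ranks characters by Dunning's log-likelihood (G2), a common keyness score chosen here only for illustration. The function names, the window size, and the toy input string are all hypothetical.

```python
import math
from collections import Counter


def focus_corpus(text, terms, window=5):
    """Concatenate the characters within `window` positions of each term hit."""
    contexts = []
    for term in terms:
        start = text.find(term)
        while start != -1:
            end = start + len(term)
            left = text[max(0, start - window):start]
            right = text[end:end + window]
            contexts.append(left + right)
            start = text.find(term, start + 1)
    return "".join(contexts)


def log_likelihood(a, b, n_focus, n_ref):
    """Dunning's G2 for a token seen `a` times in focus and `b` in reference."""
    total = n_focus + n_ref
    expected_focus = n_focus * (a + b) / total
    expected_ref = n_ref * (a + b) / total
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_focus)
    if b > 0:
        g2 += b * math.log(b / expected_ref)
    return 2 * g2


def keywords(focus, reference, top=5):
    """Rank tokens that are relatively over-represented in the focus corpus."""
    f, r = Counter(focus), Counter(reference)
    n_f, n_r = sum(f.values()), sum(r.values())
    scored = []
    for token, a in f.items():
        b = r.get(token, 0)
        if a / n_f > b / n_r:  # keep only positive keywords
            scored.append((token, log_likelihood(a, b, n_f, n_r)))
    return sorted(scored, key=lambda pair: -pair[1])[:top]


if __name__ == "__main__":
    # Toy string for illustration only; it is not taken from the corpus.
    toy_text = "abXcdeeeeeeeabXcdeeeeeee"
    focus = focus_corpus(toy_text, ["X"], window=2)
    for token, score in keywords(focus, toy_text):
        print(token, round(score, 3))
```

Running the same ranking separately for male and female term lists would give two keyword profiles to compare. The choice of reference corpus (here the whole text) is an assumption and changes the scores.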