The stability of segmental properties across genre and corpus types in low-resource languages

Cohen Priva, Uriel; Strand, Emily; Yang, Shiying

Publication date: 1 January 2020
Publisher: ScholarWorks@UMass Amherst

Abstract

Are written corpora useful for phonological research? Word frequency lists for low-resource languages have become ubiquitous in recent years (Scannell, 2007). For many languages there is a direct correspondence between their written forms and their alphabets, but it is not clear whether written corpora can adequately represent language use. We use 15 low-resource languages and compare several information-theoretic properties across three corpus types. We show that despite differences in origin and genre, estimates in one corpus are highly correlated with estimates in other corpora.
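The kind of comparison the abstract describes can be illustrated with a minimal sketch. The abstract does not say which information-theoretic properties were measured or how the correlations were computed, so everything here is an assumption made for illustration: the property is unigram segment surprisal, each written character is treated as one segment, the correlation is Pearson's r, and the two word frequency lists are invented toy data rather than anything from the paper.

```python
import math
from collections import Counter


def segment_surprisal(freq_list):
    """Return unigram surprisal (-log2 p) for each segment in a frequency list.

    freq_list maps a written word to its token count.
    Assumption: one character corresponds to one segment.
    """
    counts = Counter()
    for word, freq in freq_list.items():
        for seg in word:
            counts[seg] += freq  # weight each segment by the word's token count
    total = sum(counts.values())
    return {seg: -math.log2(n / total) for seg, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def compare_corpora(freq_a, freq_b):
    """Correlate segment surprisal estimates over segments found in both corpora."""
    surprisal_a = segment_surprisal(freq_a)
    surprisal_b = segment_surprisal(freq_b)
    shared = sorted(set(surprisal_a) & set(surprisal_b))
    return pearson([surprisal_a[s] for s in shared],
                   [surprisal_b[s] for s in shared])


# Hypothetical toy frequency lists standing in for two corpus types.
corpus_a = {"ana": 10, "nat": 5, "tan": 2}
corpus_b = {"ana": 8, "tan": 4, "nata": 3}

r = compare_corpora(corpus_a, corpus_b)
print(f"correlation of segment surprisal across corpora: {r:.3f}")
```

A correlation near 1 in such a comparison would mean that segments that are rare in one corpus are also rare in the other, which is the sense in which estimates from one corpus could stand in for estimates from another.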
