Towards a sustainable handling of inter-linear-glossed text in language documentation

Abstract

Efforts in language documentation have increased over the past years. While the amount of digital data on the world's languages is growing, only a small proportion of it is sustainable, since data reuse is often hampered by idiosyncratic formats and a neglect of standards that could help to increase the comparability of linguistic data. The sustainability problem is clearly reflected in the current practice of handling inter-linear-glossed text, one of the crucial resources produced in language documentation. Although large collections of glossed texts have been produced so far, the current practice of data handling makes it very difficult to reuse the data. In order to address this problem, we propose a first framework for the computer-assisted, sustainable handling of inter-linear-glossed text resources. Building on recent standardization proposals for word lists and structural datasets, combined with state-of-the-art methods for automated sequence comparison in historical linguistics, we show how our workflow can be used to lift a collection of inter-linear-glossed texts in Qiang (an endangered language spoken in Sichuan, China), and how the lifted data can assist linguists in their research.
