Going beyond archiving: a collaborative tool for typological research

Abstract

The work described in this paper outlines design aspects of a collaborative tool for typological research. The tool allows multiple contributors to collate linguistic examples together with their analysis with regard to an open set of variation dimensions, both onomasiological and semasiological. The resulting knowledge base combines linguistically relevant categories of human conceptualisation (e.g. in-group categories such as ethnic or family groups) with their linguistic coding (e.g. in gender affixes or verbal agreement), all grounded in actual examples from diverse natural languages as its data-driven foundation. The system is based on Semantic Web technology and can therefore be queried flexibly, combining any variation dimensions within a single query (e.g. to answer questions such as which languages mark joint attention by verbal suffixing). We will focus on design aspects relating to sustainable data. How can sustainable data for such a project be delimited? It certainly encompasses commonly accepted aspects such as standards conformity, longevity, and accessibility, which we will address in the paper. In addition, and in particular, we will argue that user orientation and involvement are critical factors. Accordingly, the tool is designed so that it (i) does not require linguists to be trained extensively in using the system, (ii) allows linguists to use their standard methods of data entry (e.g. interlinear glossing), and (iii) immediately integrates contributors' own data with previously entered data and gives them access to the resulting analysis (i.e. querying) and its research potential. The paper is roughly structured as follows. We will describe the background and aims of the project and situate it in relation to other similar projects.
We will then concentrate on how sustainability is addressed, discussing several of its facets: data storage formats, user interface and workflow modelling, knowledge base design, and system features (in particular system output). We will also outline some problems that have arisen so far and close with an outlook on future development.

PARADISEC (Pacific And Regional Archive for Digital Sources in Endangered Cultures), Australian Partnership for Sustainable Repositories, Ethnographic E-Research Project and Sydney Object Repositories for Research and Teaching
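The abstract states that, because the knowledge base is built on Semantic Web technology, any variation dimensions can be combined within one query. The following is a minimal sketch of that idea, not the project's implementation: it replaces an RDF store and SPARQL with a plain Python list of triples and a naive pattern matcher. The predicate names (`inLanguage`, `hasMorpheme`, `expresses`, `codingStrategy`), the languages `LangA` and `LangB`, their morphemes and the gloss tag `JA` are all invented for illustration. The sketch shows (a) turning an interlinear-glossed example into triples and (b) answering "which languages mark joint attention by verbal suffixing?" as a conjunction of triple patterns.

```python
# Hypothetical vocabulary and data; NOT the project's actual schema.
Triple = tuple[str, str, str]


def gloss_to_triples(example_id: str, language: str,
                     words: str, glosses: str) -> list[Triple]:
    """Convert one interlinear-glossed example into triples.

    Words are separated by spaces and morphemes by '-', as in the
    Leipzig Glossing Rules; each morpheme is aligned with the gloss
    at the same position.
    """
    word_list, gloss_list = words.split(), glosses.split()
    if len(word_list) != len(gloss_list):
        raise ValueError("word and gloss lines have different lengths")
    triples = [(example_id, "inLanguage", language)]
    for w_i, (word, gloss) in enumerate(zip(word_list, gloss_list)):
        morphs, tags = word.split("-"), gloss.split("-")
        if len(morphs) != len(tags):
            raise ValueError(f"misaligned word {w_i}: {word!r} vs {gloss!r}")
        for m_i, (morph, tag) in enumerate(zip(morphs, tags)):
            node = f"{example_id}/w{w_i}/m{m_i}"
            triples += [(example_id, "hasMorpheme", node),
                        (node, "form", morph),
                        (node, "gloss", tag)]
    return triples


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns.

    Terms starting with '?' are variables; this is a naive
    nested-loop evaluation of a conjunctive (basic graph) pattern.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        # A variable must agree with any earlier binding; a constant must match exactly.
        if all(new.setdefault(p, v) == v if p.startswith("?") else p == v
               for p, v in zip(first, triple)):
            yield from match(rest, triples, new)


# Two invented examples from invented languages, entered as interlinear glosses.
kb = gloss_to_triples("ex1", "LangA", "tawa kiri-no", "dog see-JA")
kb += gloss_to_triples("ex2", "LangB", "mele-si pona", "JA-look tree")

# Contributors' analysis: which morpheme codes which category, and how.
kb += [("ex1/w1/m1", "expresses", "JointAttention"),
       ("ex1/w1/m1", "codingStrategy", "VerbalSuffix"),
       ("ex2/w0/m0", "expresses", "JointAttention"),
       ("ex2/w0/m0", "codingStrategy", "VerbalPrefix")]

# Which languages mark joint attention by verbal suffixing?
query = [("?ex", "inLanguage", "?lang"),
         ("?ex", "hasMorpheme", "?m"),
         ("?m", "expresses", "JointAttention"),
         ("?m", "codingStrategy", "VerbalSuffix")]
langs = sorted({b["?lang"] for b in match(query, kb)})
print(langs)  # ['LangA']
```

In an actual Semantic Web deployment the same conjunction would be written as a SPARQL basic graph pattern and evaluated by an RDF store. The point of the sketch is only that each additional variation dimension becomes one more triple pattern in the query, so new dimensions can be combined without changing the data model.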
