    Seven Dimensions of Portability for Language Documentation and Description

    The process of documenting and describing the world's languages is undergoing radical transformation with the rapid uptake of new digital technologies for capture, storage, annotation and dissemination. However, uncritical adoption of new tools and technologies is leading to resources that are difficult to reuse and which are less portable than the conventional printed resources they replace. We begin by reviewing current uses of software tools and digital technologies for language documentation and description. This sheds light on how digital language documentation and description are created and managed, leading to an analysis of seven portability problems under the following headings: content, format, discovery, access, citation, preservation and rights. After characterizing each problem we provide a series of value statements, and this provides the framework for a broad range of best practice recommendations.

    Digital archives: essential elements in the workflow for endangered languages documentation

    Thanks for not throwing that away: How archival data (unexpectedly) inform the linguistic and ethnographic record

    Witnessing the explosion in the amount of digital data over the past decade, many authors have concluded that not everything can be preserved and that we must instead develop strategies for prioritizing objects for digital preservation (Ooghe and Moreels 2009). Digital language archives have been at least partly immune to these arguments, owing both to the nature of the data they preserve and to their status as early adopters. From the outset language archives have worked closely with the documentary linguistics community to develop standards for data portability which greatly simplify preservation and access (Bird and Simons 2003). The products of modern language documentation are by design much easier to archive than, say, eBooks or video games. Moreover, digital language archives have generally had privileged access to large computing infrastructures, often through particular arrangements with cyber-infrastructure built for hard science data storage and analyses. As digital archiving comes of age and digital language archives are brought within the fold of larger digital preservation efforts, the pressure to prioritize preservation goals will increase. Before we decide to discard materials as superfluous, it is useful to consider some of the ways language archives are being used. In this paper I review some current uses of materials housed at the Alaska Native Language Archive (ANLA). Though designed exclusively as a repository of linguistic knowledge, ANLA is now increasingly recognized by its user community as a rich source of ethnographic information. Language documentation is for the most part a holistic effort, and though language documenters may not be specialists in topics such as botany, kinship, or geography, they are often the only ones to record this knowledge. Hence the value of language archives as repositories of traditional knowledge. Of course, ANLA is also a rich source of more traditional linguistic documentation. This is not surprising in cases where little or no published documentation exists. However, increasingly we are discovering important information which was excluded from published reference works, ostensibly because it was not thought to be important at the time. Archival documents have revealed errors and oversights in the published records for even the most well-documented Alaskan languages. While anecdotal, these experiences demonstrate the value of preserving all linguistic data, even in cases where good published documentation exists. Digital language archives must resist pressure from the wider library and archives community to prioritize preservation efforts and triage collections. Fortunately, digital language archives are already ahead of the curve, having developed inter-institutional frameworks which stress regional focus and avoid duplication of preservation efforts (Barwick 2004, AIMS Working Group 2012). On this tenth anniversary of PARADISEC it is encouraging to note the great progress which has been made in the development of digital ethnographic archives; however, we must also be prepared for a new era in which digital archiving is a quotidian effort and we face increasing pressure to discard materials.
    References
    AIMS Working Group. 2012. AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship.
    Barwick, Linda. 2004. Turning It All Upside Down . . . Imagining a distributed digital audiovisual archive. Literary and Linguistic Computing 19.253-63.
    Bird, Steven and Gary Simons. 2003. Seven dimensions of portability for language documentation and description. Language 79(3).557-82.
    Ooghe, Bart, Heritage Cell Waasland and Dries Moreels. 2009. Analysing selection for digitisation. D-Lib Magazine 15(9/10).1082-987

    The role of technology and teamwork in language documentation

    Compared with more traditional approaches to language description, language documentation has two important characteristics: explicit attention to technology that can make the work of documenting under-documented languages more efficient, and an emphasis on the need for teams of people with different skills and different kinds of expertise to work together in documentation and conservation efforts. Judging by their current contributions, these two aspects are closely related and mutually supportive. Online language repositories (archives) allow different user groups to access and work with the materials collected during language documentation. At the same time, these archives allow users to update the documentation by adding new data or by creating further annotations and analyses. Likewise, to obtain good documentation it is essential to make native speakers and speech communities active partners in documentation projects. This is even more essential for revitalization efforts. In this respect, smartphone-based technology can help broaden the possibilities for participation by native speakers and other interested parties.

    Multiliteracy, past and present, in the Karaim communities

    Open source software (OSS) is one of the emerging areas in software engineering and is gaining the interest of the software development community. OSS started as a movement, and for many years software developers contributed to it as a hobby (for non-commercial purposes). Now, OSS components are being reused in component-based software development (CBSD) for commercial purposes. More recently, software engineering researchers have envisioned the use of OSS in software product lines (SPL), thus bringing it into a new arena. As an emerging research area, it calls for an exploratory study of the dimensions of this phenomenon. Furthermore, there is a need to assess the reusability of OSS, which is the focal point of these disciplines (component-based software engineering (CBSE), SPL, and OSS). In this research, a mixed-method approach is employed, specifically a 'partially mixed sequential dominant study'. It involves both a qualitative phase (interviews) and quantitative phases (survey and experiment). Seven respondents were involved in the qualitative phase, the survey had a sample size of 396, and three experiments were conducted. The main contribution of this study is the result of exploring the phenomenon 'reuse of OSS in reuse intensive software development'. The findings include 7 categories and 39 dimensions. One of the dimensions, 'factors affecting reusability', was carried forward to the quantitative phase (survey and experiment). On the basis of the findings, a reusability attribute model was proposed at the class and package level. Variability is one of the newly identified attributes of reusability. A comprehensive theoretical analysis of variability implementation mechanisms is conducted to propose metrics for its assessment. The reusability attribute model is validated by statistical analysis of 103 classes and 77 packages. An evolutionary reusability analysis of two open source software systems was conducted, in which different versions of the software were analyzed for their reusability. The results show a positive correlation between variability and reusability at the package level and validate the other identified attributes. The results should be helpful for conducting further studies in this area.

    Designing interoperable museum information systems

    Museum collections are characterized by heterogeneity, since they usually host a plethora of objects of different categories, each of which requires different description policies and metadata standards. Moreover, the museum records, which keep the history and evolution of the hosted collections, require proactive curation in order to preserve this rich and diverse information. In this paper, the architecture of an innovative museum information system and its implementation details are presented. In particular, the requirements and the system architecture are presented along with the problems that were encountered. The main directions of the system design are (a) to increase interoperability levels and therefore assist proactive curation and (b) to enhance navigation through the use of handheld devices. The first direction is satisfied by the design of a rich metadata schema based on the CIDOC/CRM standard. The second direction is fulfilled by the implementation of a module which integrates the museum database with a subsystem that supports user navigation through the museum's floors and rooms. The module is exposed as a navigation functionality, which is accessed through handheld devices and peripherals, such as PDAs and RFID tags. The proposed system is functional and operates in the Solomos Museum, situated on Zakynthos island, Greece.

    A survey of current reproducibility practices in linguistics publications

    Poster: In order to move forward toward reproducible research in linguistics, we first need to know where we are now with regard to our practices for methodological clarity and data citation in publications. In this poster we share the results of a study of over 370 journal articles, dissertations, and grammars, which are taken as a sample of current practices in the field. The publications all come from a ten-year span. The journals were selected for broad coverage. Grammars included published grammars and dissertations written as grammars, with broad geographic coverage, both in terms of subject language and publisher or university. These publications are critiqued on the basis of transparency of data source, data collection methods, analysis, and storage. While we find examples of transparent reporting, most of the surveyed research does not include key metadata, methodological information, or citations that are resolvable to the data on which the analyses are based. This material is based upon work supported by the National Science Foundation under grant SMA-1447886.