25 research outputs found

    A sustainable archiving software solution for The Language Archive

    [Archive X] has been developing a language archiving solution for more than 15 years. The software is not only aimed at archiving and access but also integrates with a range of exploitation tools. This in-house solution was built from the ground up, since at the time no mature open source repository solutions were available. The situation today is rather different, with several widely used repository systems available, including open source solutions that are maintained by communities of developers. Since [Archive X] now needs to reduce the number of staff required for the maintenance of its archiving software, it was decided to develop a new system based on one of the widely used open source repository solutions such as Fedora Commons (1) or DSpace (2). In this paper we describe the process of selecting the most suitable open source repository solution as the basis for [Archive X]. This includes the specification of the functional and technical requirements and their prioritization, as well as the evaluation of a number of repository solutions. This evaluation also includes an assessment of the long-term perspective of those solutions. None of the existing repository solutions can provide the complete minimal functionality that [Archive X] requires from its archiving software. This means that additional components or modules need to be developed or adapted from the current software, regardless of the chosen repository solution. Still, we expect that using an existing extensible repository system as a basis will be less costly in the long run. Several language archives, in particular those that serve as centers (3) within the CLARIN consortium, have already implemented different repository systems based on either DSpace or Fedora Commons. Their experiences and recommendations are also taken into account for the evaluation of the various options.
The final decision on which repository system will form the basis of the new archiving software will be taken by the end of September 2014. The development of the new archiving software will then start soon after that and a production-ready version will need to be finished by October 2016 at the latest.
(1) http://fedorarepository.org/
(2) http://www.dspace.org/
(3) https://centerregistry-clarin.esc.rzg.mpg.de

    Vulnerability in Acquisition, Language Impairments in Dutch: Creating a VALID Data Archive

    Klatter, J.; van Hout, R.; van den Heuvel, H.; Fikkert, P.; Baker, A.E.; de Jong, J.; Wijnen, F.; Sanders, E.; Trilsbeek, P.
    The VALID Data Archive is an open multimedia data archive (under construction) with data from speakers suffering from language impairments. We report on a pilot project in the CLARIN-NL framework in which five data resources were curated. For all data sets concerned, written informed consent from the participants or their caretakers has been obtained. All materials were anonymized. The audio files were converted into wav (linear PCM) files and the transcriptions into CHAT or ELAN format. Research data that consisted of test, SPSS and Excel files were documented and converted into CSV files. All data sets obtained appropriate CMDI metadata files. A new CMDI metadata profile for this type of data resources was established and care was taken that ISOcat metadata categories were used to optimize interoperability.
After curation, all data are deposited at the Max Planck Institute for Psycholinguistics in Nijmegen, where persistent identifiers are linked to all resources. The content of the transcriptions in CHAT and plain text format can be searched with the TROVA search engine.

    Public access to research data in language documentation: Challenges and possible strategies

    The Open Access Movement promotes free and unfettered access to research publications and, increasingly, to the primary data which underlie those publications. As the field of documentary linguistics seeks to record and preserve culturally and linguistically relevant materials, the question of how openly accessible these materials should be becomes increasingly important. This paper aims to guide researchers and other stakeholders in finding an appropriate balance between accessibility and confidentiality of data, addressing community questions and legal, institutional, and intellectual issues that pose challenges to accessible data.

    Report from DoBeS training week


    DoBeS Training Course


    ELAN Audio Playback


    The ‘Language Archiving Technology’ solutions for sustainable data from digital research

    Since the late 1990s, the technical group at the Max Planck Institute for Psycholinguistics has worked on solutions for several of the questions addressed in this PARADISEC meeting, in particular how to guarantee the long-term availability of digital research data for future research. The support for the well-known DOBES (Documentation of Endangered Languages) programme has greatly inspired and advanced this work, and led to the ongoing development of a whole suite of tools for annotating, cataloguing and archiving multimedia data. At the core of the LAT tools is the IMDI metadata schema, now being integrated into a larger network of digital resources in the European CLARIN project. The multimedia annotator ELAN (with its web-based cousin ANNEX) is now well known, and not only among documentary linguists. Other tools, such as the lexical database tool LEXUS and the related knowledge-space builder VICOS, are not yet widely used. With further development and integration with other tools, they also have the potential to be useful for representing non-time-related linguistic data. We aim to present an overview of the solutions, both achieved and in development, for creating and exploiting sustainable digital data, in particular in the area of documenting languages and cultures, and their interfaces with other related developments.
    PARADISEC (Pacific And Regional Archive for Digital Sources in Endangered Cultures), Australian Partnership for Sustainable Repositories, Ethnographic E-Research Project and Sydney Object Repositories for Research and Teaching