
    Archiving and accessing language resources

    Languages are among the most complex systems that evolution has created. Many of these unique products of evolution are now disappearing at unprecedented speed: every two weeks one of the roughly 6,500 languages still spoken dies, and many others are undergoing extreme change under the pressure of globalization. Experts have recognized the need to document these languages and to preserve the cultural and linguistic treasures embedded in them for future generations. Linguistic theory, too, will need to consider the variation among the linguistic systems encoded in languages to improve our understanding of how human minds process language material, so access to all types of resources is increasingly crucial. Deeper insights into human language processing, and a higher degree of integration and interoperability between resources, will also improve our language processing technology. The DOBES programme focuses on the documentation and preservation of language material. The Max Planck Institute developed its Language Archiving Technology to help researchers create, archive and access language resources. The recently started CLARIN research infrastructure has as its main goals broad visibility and easy accessibility of language resources.

    Report on the 2015 NSF Workshop on Unified Annotation Tooling

    On March 30 & 31, 2015, an international group of twenty-three researchers with expertise in linguistic annotation convened in Sunny Isles Beach, Florida to discuss problems with, and potential solutions for, the state of linguistic annotation tooling. The participants comprised 14 researchers from the U.S. and 9 from outside the U.S., with 7 countries and 4 continents represented, and hailed from fields and specialties including computational linguistics, artificial intelligence, speech processing, multi-modal data processing, clinical & medical natural language processing, linguistics, documentary linguistics, sign-language linguistics, corpus linguistics, and the digital humanities. The motivating problem of the workshop was the balkanization of annotation tooling, namely, that even though linguistic annotation requires sophisticated tool support to efficiently generate high-quality data, the landscape of tools for the field is fractured, incompatible, inconsistent, and lacks key capabilities. The overall goal of the workshop was to chart the way forward, centering on five key questions: (1) What are the problems with the current tool landscape? (2) What are the possible benefits of solving some or all of these problems? (3) What capabilities are most needed? (4) How should we go about implementing these capabilities? And (5) how should we ensure the longevity and sustainability of the solution? I surveyed the participants before their arrival, which provided significant raw material for ideas, and the workshop discussion itself resulted in the identification of ten specific classes of problems and five sets of most-needed capabilities. Importantly, we identified annotation project managers in computational linguistics as the key recipients and users of any solution, thereby succinctly addressing questions about the scope and audience of potential solutions. We discussed the management and sustainability of potential solutions at length. The participants agreed on sixteen recommendations for future work. This technical report contains a detailed discussion of all these topics, a point-by-point review of the workshop discussion as it unfolded, detailed information on the participants and their expertise, and the summarized data from the surveys.

    Audiovisual Media Annotation Using Qualitative Data Analysis Software: A Comparative Analysis

    The variety of specialized tools designed to facilitate analysis of audio-visual (AV) media is useful not only to media scholars and oral historians but to other researchers as well. Both Qualitative Data Analysis Software (QDAS) packages and dedicated systems created for specific disciplines, such as linguistics, can be used for this purpose. Software proliferation challenges researchers to make informed choices about which package will be most useful for their project. This paper presents an information science perspective on the scholarly use of tools in qualitative research of audio-visual sources. It provides a baseline of affordances based on functionalities, with the goal of making the types of research tasks that they support more explicit (e.g., transcribing, segmenting, coding, linking, and commenting on data). We look closely at how these functionalities relate to each other, and at how system design influences research tasks.
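    To make those research tasks concrete, here is a minimal Python sketch (not taken from any of the packages compared in the paper; all names and data are illustrative) of a data structure that supports segmenting an AV recording, transcribing, coding, linking, and commenting:

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One time-bounded slice of an AV recording."""
    start_ms: int                                          # segmenting
    end_ms: int
    transcript: str = ""                                   # transcribing
    codes: list[str] = field(default_factory=list)         # coding
    comments: list[str] = field(default_factory=list)      # commenting
    links: list["Segment"] = field(default_factory=list)   # linking

# Hypothetical interview data for illustration only.
interview = [
    Segment(0, 4200, transcript="So how did you first hear the song?"),
    Segment(4200, 9600, transcript="From my grandmother, actually."),
]
interview[1].codes.append("family-transmission")
interview[1].comments.append("Compare with speaker B at 02:13.")
interview[1].links.append(interview[0])  # link the answer to its question
```

    Which of these affordances a tool exposes, and how they interact, is what the paper's comparison aims to make explicit.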

    Creating multimedia dictionaries of endangered languages using LEXUS

    This paper reports on the development of LEXUS, a flexible web-based lexicon tool. LEXUS is targeted at linguists involved in language documentation (of endangered languages). It allows the creation of lexica within the structure of the proposed ISO LMF standard and uses the proposed concept naming conventions from the ISO data categories, thus enabling interoperability, search and merging. LEXUS also offers the possibility to visualize language, since it provides functionality for adding audio, video and still images to the lexicon. With LEXUS it is possible to create semantic network knowledge bases using typed relations. The LEXUS tool is free to use. Index Terms: lexicon, web-based application, endangered languages, language documentation.
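    As a rough illustration of the LMF-style structure the abstract refers to (the class and field names below are hypothetical sketches, not the actual LEXUS or ISO LMF schema), a lexical entry with senses, media attachments and typed relations might be modelled like this:

```python
from dataclasses import dataclass, field

@dataclass
class Sense:
    gloss: str
    # Typed relations to other entries, keyed by relation type; these
    # are what make semantic-network knowledge bases possible.
    relations: dict[str, list[str]] = field(default_factory=dict)
    media: list[str] = field(default_factory=list)  # audio/video/images

@dataclass
class LexicalEntry:
    lemma: str
    part_of_speech: str  # an ISO data category would pin down its values
    senses: list[Sense] = field(default_factory=list)

# Hypothetical entry for illustration only.
entry = LexicalEntry(
    lemma="wakaranai",
    part_of_speech="verb",
    senses=[Sense(
        gloss="to not understand",
        relations={"antonym": ["wakaru"]},  # typed link to another lemma
        media=["wakaranai.wav"],            # hypothetical recording
    )],
)
```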

    New and future developments in EXMARaLDA

    We present some recent and planned future developments in EXMARaLDA, a system for creating, managing, analysing and publishing spoken language corpora. The new functionality concerns the areas of transcription and annotation, corpus management, query mechanisms, interoperability and corpus deployment. Future work is planned in the areas of automatic annotation, standardisation and workflow management.

    Transcribing and annotating spoken language with EXMARaLDA

    This paper describes EXMARaLDA, an XML-based framework for the construction, dissemination and analysis of corpora of spoken language transcriptions. Starting from a prototypical example of a “partitur” (musical score) transcription, the EXMARaLDA “single timeline, multiple tiers” data model and format is presented alongside the EXMARaLDA Partitur-Editor, a tool for inputting and visualizing such data. This is followed by a discussion of the interaction of EXMARaLDA with other frameworks and tools that work with similar data models. Finally, the paper presents an extension of the “single timeline, multiple tiers” data model and describes its application within the EXMARaLDA system.
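    The core idea of the “single timeline, multiple tiers” model is that all tiers refer to one shared, ordered set of timeline points, and every annotation event spans an interval between two of those points; overlap between speakers then falls out naturally, as in a musical score. A minimal Python sketch (illustrative names and data only, not the actual EXMARaLDA XML format):

```python
# Shared, ordered timeline points that every tier refers to.
timeline = ["T0", "T1", "T2", "T3"]

tiers = {
    # tier id -> list of events: (start point, end point, label)
    "SPK1-verbal":  [("T0", "T2", "so what do you think")],
    "SPK2-verbal":  [("T1", "T3", "well I am not sure")],  # overlaps SPK1
    "SPK1-gesture": [("T0", "T1", "points at map")],
}

def events_at(point: str) -> list[tuple[str, str]]:
    """Return (tier, label) for every event whose span covers `point`."""
    idx = timeline.index(point)
    return [
        (tier, label)
        for tier, events in tiers.items()
        for start, end, label in events
        if timeline.index(start) <= idx < timeline.index(end)
    ]

# At T1 both speakers are active, exactly what a score layout visualizes.
print(events_at("T1"))
```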

    Architectures For Dynamic Terrain And Dynamic Environments In Distributed Interactive Simulation

    This report examines several computer architectures that can support dynamic entity/environment interaction in the distributed interactive simulation paradigm.

    Diagnosing interoperability problems and debugging models by enhancing constraint satisfaction with case -based reasoning

    Modeling, Diagnosis, and Model Debugging are the three main areas presented in this dissertation to automate the process of interoperability testing of networking protocols. The dissertation proposes a framework that uses the Constraint Satisfaction Problem (CSP) paradigm to define a modeling language and problem-solving mechanism for interoperability testing, and uses Case-Based Reasoning (CBR) for debugging interoperability test cases. The dissertation makes three primary contributions: (1) Definition of a new modeling language using CSP and Object-Oriented Programming. This language is simple, declarative, and transparent, and provides a tool for testers to implement models of interoperability test cases. The dissertation introduces the notions of metavariables, metavalues and optional metavariables to improve the modeling language's capabilities. It proposes modeling test cases from the test suite specifications that are usually used when interoperability testing is performed manually by testers. Test suite specifications are written by organizations or individuals and break the testing down into modules of test cases, which makes the diagnosis of problems more meaningful to testers. (2) Diagnosis of interoperability problems using search supplemented by consistency inference methods in a CSP context to support explanations of the problem-solving behavior. These methods are adapted to the OO-based CSP context. Testers can then generate reports for individual test cases and for test groups from a test suite specification. (3) Detection and debugging of incompleteness and incorrectness in CSP models of interoperability test cases. This is done through the integration of two modes of reasoning, namely CBR and CSP. CBR manages cases that store information about updating models, as well as cases related to interoperability problems where diagnosis fails to generate a useful explanation. For the latter cases, CBR recalls similar useful explanations from previous cases.
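    As a rough illustration of the CSP framing (this sketch is not the dissertation's modeling language; all variable names, domains and constraints are hypothetical), an interoperability test case can be cast as variables for each device's protocol parameters, domains for the values each implementation supports, and constraints for what the test suite requires of a passing run:

```python
from itertools import product

# Variables and their domains: what each side can be configured to.
variables = {
    "dut_version":  {1, 2},      # device under test
    "peer_version": {2, 3},      # reference implementation
    "cipher":       {"A", "B"},
}

# Constraints: conditions an interoperable run must satisfy.
constraints = [
    # both sides must negotiate the same protocol version
    lambda a: a["dut_version"] == a["peer_version"],
    # in this test case, version 2 interoperates only with cipher "A"
    lambda a: a["dut_version"] != 2 or a["cipher"] == "A",
]

def solve():
    """Enumerate assignments; each solution is an interoperable setup."""
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(check(assignment) for check in constraints):
            yield assignment

# An empty result would diagnose an interoperability failure; explaining
# *which* constraints pruned everything is where the dissertation's
# consistency-inference and CBR machinery comes in.
print(list(solve()))  # [{'dut_version': 2, 'peer_version': 2, 'cipher': 'A'}]
```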