2,299 research outputs found

    Data formats for phonological corpora

    Get PDF
    The goal of the present chapter is to explore the possibility of providing the research (but also the industrial) community that commonly uses spoken corpora with a stable portfolio of well-documented standardised formats that allow a high re-use rate of annotated spoken resources and, as a consequence, better interoperability across tools used to produce or exploit such resources.Comment: Handbook of Corpus Phonology Oxford University Press (Ed.) (2012

    Enroller: an experiment in aggregating resources

    Get PDF
    This chapter describes a collaborative project between e-scientists and humanists working to create an online repository of linguistic data sets and tools. Corpora, dictionaries, and a thesaurus are brought together to enable a new method of research. It combines our most advanced knowledge in both computing and linguistic research techniques

    A new web interface to facilitate access to corpora: development of the ASLLRP data access interface

    Full text link
    A significant obstacle to broad utilization of corpora is the difficulty in gaining access to the specific subsets of data and annotations that may be relevant for particular types of research. With that in mind, we have developed a web-based Data Access Interface (DAI), to provide access to the expanding datasets of the American Sign Language Linguistic Research Project (ASLLRP). The DAI facilitates browsing the corpora, viewing videos and annotations, searching for phenomena of interest, and downloading selected materials from the website. The web interface, compared to providing videos and annotation files off-line, also greatly increases access by people that have no prior experience in working with linguistic annotation tools, and it opens the door to integrating the data with third-party applications on the desktop and in the mobile space. In this paper we give an overview of the available videos, annotations, and search functionality of the DAI, as well as plans for future enhancements. We also summarize best practices and key lessons learned that are crucial to the success of similar projects

    The Validation of Speech Corpora

    Get PDF
    1.2 Intended audience........................

    A Formal Framework for Linguistic Annotation

    Get PDF
    `Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, `named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.Comment: 49 page

    Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development

    Full text link
    Annotation graphs and annotation servers offer infrastructure to support the analysis of human language resources in the form of time-series data such as text, audio and video. This paper outlines areas of common need among empirical linguists and computational linguists. After reviewing examples of data and tools used or under development for each of several areas, it proposes a common framework for future tool development, data annotation and resource sharing based upon annotation graphs and servers.Comment: 8 pages, 6 figure

    Phonology and intonation

    Get PDF
    The encoding standards for phonology and intonation are designed to facilitate consistent annotation of the phonological and intonational aspects of information structure, in languages across a range ofprosodic types. The guidelines are designed with the aim that a nonspecialist in phonology can both implement and interpret the resulting annotation

    Production Methods

    Get PDF

    EXMARaLDA - Creating, Analysing and Sharing Spoken Language Corpora for Pragmatic Research

    Get PDF
    This paper presents EXMARaLDA, a system for the computer-assisted creation and analysis of spoken language corpora. The first part contains some general observations about technological and methodological requirements for doing corpus-based pragmatics. The second part explains the systems architecture and gives an overview of its most important software components a transcription editor, a corpus management tool and a corpus query tool. The last part presents some corpora which have been or are currently being compiled with the help of EXMARaLDA
    corecore