
    Many uses, many annotations for large speech corpora: Switchboard and TDT as case studies

    This paper discusses the challenges that arise when large speech corpora receive an ever-broadening range of diverse and distinct annotations. Two case studies of this process are presented: the Switchboard Corpus of telephone conversations and the TDT2 corpus of broadcast news. Switchboard has undergone two independent transcriptions and various types of additional annotation, all carried out as separate projects that were dispersed both geographically and chronologically. The TDT2 corpus has also received a variety of annotations, but all were directly created or managed by a core group. In both cases, issues arise involving the propagation of repairs, the consistency of references, and the ability to integrate annotations that have different formats and levels of detail. We describe a general framework whereby these issues can be addressed successfully. Comment: 7 pages, 2 figures

    Transparency in planning, warranting and interpreting research

    Diversity and distribution of polyphagan water beetles (Coleoptera) in the Lake St Lucia system, South Africa

    Water beetles belonging to the suborder Polyphaga vary greatly in larval and adult ecologies, and fulfil important functional roles in shallow-water ecosystems by processing plant material, scavenging and through predation. This study investigates the species richness and composition of aquatic polyphagan assemblages in and around the St Lucia estuarine lake (South Africa), within the iSimangaliso Wetland Park, a UNESCO World Heritage Site. A total of 32 sites were sampled over three consecutive collection trips between 2013 and 2015. The sites encompassed a broad range of aquatic habitats, being representative of the variety of freshwater and estuarine environments present on the St Lucia coastal plain. Thirty-seven polyphagan taxa were recorded during the dedicated surveys of this study, in addition to seven species-level records from historical collections. Most beetles recorded are relatively widespread Afrotropical species and only three are endemic to South Africa. Samples were dominated by members of the Hydrophilidae (27 taxa), one of which was new to science (Hydrobiomorpha perissinottoi Bilton, 2016). Despite the fauna being dominated by relatively widespread taxa, five represent new records for South Africa, highlighting the poor state of knowledge on water beetle distribution patterns in the region. Wetlands within the dense woodland characterising the False Bay region of St Lucia supported a distinct assemblage of polyphagan beetles, whilst sites occurring on the Eastern and Western Shores of Lake St Lucia were very similar in their beetle composition. In line with the Afrotropical region as a whole, the aquatic Polyphaga of St Lucia appear to be less diverse than the Hydradephaga, for which 68 species were recorded during the same period. However, the results of the present study, in conjunction with those for Hydradephaga, show that the iSimangaliso Wetland Park contains a high beetle diversity. 
    The ongoing and future ecological protection of not only the estuarine lake itself, but also the surrounding freshwater wetlands, is imperative and should be taken into consideration during future management planning for the park.

    ATLAS: A flexible and extensible architecture for linguistic annotation

    We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on "Annotation Graphs," a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic "signals," including both naturally occurring phenomena (as recorded in images, video, multi-modal interactions, etc.) and the derived resources that are increasingly important to the engineering of natural language processing systems (such as word lists, dictionaries, aligned bilingual corpora, etc.). We conclude with a review of the current efforts towards implementing key pieces of this architecture. Comment: 8 pages, 9 figures
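To make the idea of an annotation graph over an interval-indexed linear signal more concrete, here is a minimal sketch, assuming a simplified model in which nodes may carry an optional time offset and each arc between two nodes carries a tier name (`kind`) and a label. The classes `Node`, `Arc`, `AnnotationGraph` and the method `overlapping` are hypothetical illustrations, not the ATLAS API, and the interval query is a plain linear scan rather than the database storage and querying techniques the abstract refers to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Node:
    """A point in the signal; offset is a time anchor in seconds, or None if unanchored."""
    id: str
    offset: Optional[float] = None


@dataclass(frozen=True)
class Arc:
    """An annotation spanning the interval between two nodes (hypothetical field names)."""
    start: str
    end: str
    kind: str   # annotation tier, e.g. "word" or "speaker"
    label: str


@dataclass
class AnnotationGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, offset: Optional[float] = None) -> None:
        self.nodes[node_id] = Node(node_id, offset)

    def add_arc(self, start: str, end: str, kind: str, label: str) -> None:
        # Every arc must connect two nodes already in the graph.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("arc endpoints must be existing nodes")
        self.arcs.append(Arc(start, end, kind, label))

    def overlapping(self, kind: str, t0: float, t1: float) -> List[str]:
        """Labels of arcs of the given tier whose anchored interval overlaps [t0, t1)."""
        result = []
        for arc in self.arcs:
            if arc.kind != kind:
                continue
            s = self.nodes[arc.start].offset
            e = self.nodes[arc.end].offset
            if s is None or e is None:
                continue  # arcs without two time anchors cannot be placed on the timeline
            if s < t1 and e > t0:
                result.append(arc.label)
        return result
```

As a usage example, anchoring nodes `n0`, `n1`, `n2` at 0.0, 0.4 and 0.9 seconds, adding word arcs "hello" (`n0` to `n1`) and "world" (`n1` to `n2`) and a speaker arc "A" (`n0` to `n2`), the call `overlapping("word", 0.5, 0.8)` returns `["world"]`. Because word and speaker tiers share the same nodes, annotations of different types stay aligned to one another, which is the property that lets differently formatted corpora be mapped onto a single graph model.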