Many uses, many annotations for large speech corpora: Switchboard and TDT as case studies
This paper discusses the challenges that arise when large speech corpora
receive an ever-broadening range of diverse and distinct annotations. Two case
studies of this process are presented: the Switchboard Corpus of telephone
conversations and the TDT2 corpus of broadcast news. Switchboard has undergone
two independent transcriptions and various types of additional annotation, all
carried out as separate projects that were dispersed both geographically and
chronologically. The TDT2 corpus has also received a variety of annotations,
but all directly created or managed by a core group. In both cases, issues
arise involving the propagation of repairs, consistency of references, and the
ability to integrate annotations having different formats and levels of detail.
We describe a general framework whereby these issues can be addressed
successfully. Comment: 7 pages, 2 figures
Diversity and distribution of polyphagan water beetles (Coleoptera) in the Lake St Lucia system, South Africa
Water beetles belonging to the suborder Polyphaga vary greatly in larval and adult ecologies, and fulfil important functional roles in shallow-water ecosystems by processing plant material, scavenging and through predation. This study investigates the species richness and composition of aquatic polyphagan assemblages in and around the St Lucia estuarine lake (South Africa), within the iSimangaliso Wetland Park, a UNESCO World Heritage Site. A total of 32 sites were sampled over three consecutive collection trips between 2013 and 2015. The sites encompassed a broad range of aquatic habitats, being representative of the variety of freshwater and estuarine environments present on the St Lucia coastal plain. Thirty-seven polyphagan taxa were recorded during the dedicated surveys of this study, in addition to seven species-level records from historical collections. Most beetles recorded are relatively widespread Afrotropical species and only three are endemic to South Africa. Samples were dominated by members of the Hydrophilidae (27 taxa), one of which was new to science (Hydrobiomorpha perissinottoi Bilton, 2016). Despite the fauna being dominated by relatively widespread taxa, five represent new records for South Africa, highlighting the poor state of knowledge on water beetle distribution patterns in the region. Wetlands within the dense woodland characterising the False Bay region of St Lucia supported a distinct assemblage of polyphagan beetles, whilst sites occurring on the Eastern and Western Shores of Lake St Lucia were very similar in their beetle composition. In line with the Afrotropical region as a whole, the aquatic Polyphaga of St Lucia appear to be less diverse than the Hydradephaga, for which 68 species were recorded during the same period. However, the results of the present study, in conjunction with those for Hydradephaga, show that the iSimangaliso Wetland Park contains a high beetle diversity. 
The ongoing and future ecological protection of not only the estuarine lake itself, but also surrounding freshwater wetlands, is imperative and should be taken into consideration during future management planning for the park.
ATLAS: A flexible and extensible architecture for linguistic annotation
We describe a formal model for annotating linguistic artifacts, from which we
derive an application programming interface (API) to a suite of tools for
manipulating these annotations. The abstract logical model provides for a range
of storage formats and promotes the reuse of tools that interact through this
API. We focus first on "Annotation Graphs," a graph model for annotations on
linear signals (such as text and speech) indexed by intervals, for which
efficient database storage and querying techniques are applicable. We note how
a wide range of existing annotated corpora can be mapped to this annotation
graph model. This model is then generalized to encompass a wider variety of
linguistic "signals," including both naturally occurring phenomena (as
recorded in images, video, multi-modal interactions, etc.), as well as the
derived resources that are increasingly important to the engineering of natural
language processing systems (such as word lists, dictionaries, aligned
bilingual corpora, etc.). We conclude with a review of the current efforts
towards implementing key pieces of this architecture. Comment: 8 pages, 9 figures
Recovery as a social phenomenon : what is the role of the community in supporting and enabling recovery?
