ATLAS: A flexible and extensible architecture for linguistic annotation
We describe a formal model for annotating linguistic artifacts, from which we
derive an application programming interface (API) to a suite of tools for
manipulating these annotations. The abstract logical model provides for a range
of storage formats and promotes the reuse of tools that interact through this
API. We focus first on "Annotation Graphs," a graph model for annotations on
linear signals (such as text and speech) indexed by intervals, for which
efficient database storage and querying techniques are applicable. We note how
a wide range of existing annotated corpora can be mapped to this annotation
graph model. This model is then generalized to encompass a wider variety of
linguistic "signals," including both naturally occurring phenomena (as
recorded in images, video, multi-modal interactions, etc.) and the
derived resources that are increasingly important to the engineering of natural
language processing systems (such as word lists, dictionaries, aligned
bilingual corpora, etc.). We conclude with a review of the current efforts
towards implementing key pieces of this architecture.
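As a concrete illustration of the annotation-graph model (nodes anchored to offsets in a linear signal, labeled arcs carrying the annotations), here is a minimal Python sketch; the class and method names are illustrative, not the ATLAS API:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Node:
    node_id: str
    offset: Optional[float] = None  # anchor into the signal; None if unanchored

@dataclass(frozen=True)
class Arc:
    src: str    # node_id where the arc starts
    dst: str    # node_id where the arc ends
    layer: str  # annotation type, e.g. "word" or "phone"
    label: str  # annotation content, e.g. the transcribed word

class AnnotationGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.arcs: List[Arc] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_arc(self, arc: Arc) -> None:
        self.arcs.append(arc)

    def in_interval(self, start: float, end: float) -> List[Arc]:
        """Arcs whose anchored endpoints both fall within [start, end];
        this is the interval-indexed query that suits database storage."""
        result = []
        for a in self.arcs:
            s, e = self.nodes[a.src].offset, self.nodes[a.dst].offset
            if s is not None and e is not None and start <= s and e <= end:
                result.append(a)
        return result
```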
The CLEAR 2007 Evaluation
This paper is a summary of the 2007 CLEAR Evaluation on the Classification of Events, Activities, and Relationships, which took place in early 2007 and culminated in a two-day workshop held in May 2007. CLEAR is an international effort to evaluate systems for the perception of people, their activities, and interactions. In its second year, CLEAR has developed a following from the computer vision and speech communities, fostering a more multimodal perspective on research evaluation. This paper describes the evaluation tasks, including the metrics and databases used, and discusses the results achieved. The CLEAR 2007 tasks comprise person, face, and vehicle tracking, head pose estimation, as well as acoustic scene analysis. These include subtasks performed in the visual, acoustic, and audio-visual domains for meeting room and surveillance data.
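The tracking tasks in the CLEAR evaluations are commonly scored with the CLEAR MOT metrics. As a hedged illustration, here is a sketch of MOTA (Multiple Object Tracking Accuracy) computed from per-frame error counts; the signature is mine, not the official scoring tool's:

```python
def mota(misses, false_positives, mismatches, num_objects):
    """Multiple Object Tracking Accuracy over a sequence of frames.

    Each argument is a per-frame list of counts; num_objects is the number
    of ground-truth objects present in each frame.
    """
    errors = sum(misses) + sum(false_positives) + sum(mismatches)
    total = sum(num_objects)
    if total == 0:
        return 0.0
    # Accumulate all error types over the sequence, then normalize by the
    # total number of ground-truth objects (so MOTA can be negative).
    return 1.0 - errors / total
```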
The non-Verbal Structure of Patient Case Discussions in Multidisciplinary Medical Team Meetings
Meeting analysis has a long theoretical tradition in social psychology, with established practical ramifications in computer science, especially in computer-supported cooperative work. More recently, a good deal of research has focused on the issues of indexing and browsing multimedia records of meetings. Most research in this area, however, is still based on data collected in laboratories, under somewhat artificial conditions. This paper presents an analysis of the discourse structure and spontaneous interactions at real-life multidisciplinary medical team meetings held as part of the work routine in a major hospital. It is hypothesised that the conversational structure of these meetings, as indicated by the sequencing and duration of vocalisations, enables segmentation into individual patient case discussions. The task of segmenting audio-visual records of multidisciplinary medical team meetings is described as a topic segmentation task, and a method for automatic segmentation is proposed. An empirical evaluation based on hand-labelled data is presented which determines the optimal length of vocalisation sequences for segmentation, and establishes the competitiveness of the method with approaches based on more complex knowledge sources. The effectiveness of Bayesian classification as a segmentation method, and its applicability to meeting segmentation in other domains, are discussed.
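The segmentation method turns on classifying windows of vocalisation events. As a rough illustration of Bayesian classification in this role, here is a minimal categorical naive-Bayes sketch; the window encoding, labels, and smoothing are assumptions for illustration, not the paper's exact model:

```python
import math
from collections import Counter, defaultdict

# Windows are tuples of discrete features derived from vocalisation events
# (e.g. speaker-change flags, binned durations); labels are "boundary"
# (a new patient case discussion starts) or "within".

def train(windows, labels, alpha=1.0):
    priors = Counter(labels)
    counts = defaultdict(Counter)   # class -> Counter over (position, value)
    values = defaultdict(set)       # position -> observed feature values
    for w, y in zip(windows, labels):
        for i, f in enumerate(w):
            counts[y][(i, f)] += 1
            values[i].add(f)
    return priors, counts, values, alpha

def classify(window, model):
    priors, counts, values, alpha = model
    total = sum(priors.values())
    best, best_lp = None, float("-inf")
    for y, n in priors.items():
        lp = math.log(n / total)
        for i, f in enumerate(window):
            num = counts[y][(i, f)] + alpha             # Laplace smoothing
            den = n + alpha * len(values[i])
            lp += math.log(num / den)
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```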
The TREC-6 Spoken Document Retrieval Track
The Text REtrieval Conference (TREC) workshops provide a forum for different groups to compare retrieval systems on common retrieval tasks. The 1997 TREC workshop will feature a Spoken Document Retrieval task for the first time. This paper motivates the task and describes the measures to be used to evaluate the effectiveness of the retrieval methodologies.
1. The Text REtrieval Conference
The Text REtrieval Conference (TREC) series is cosponsored by the National Institute of Standards and Technology (NIST) and the Information Technology Office of the Defense Advanced Research Projects Agency (DARPA) as part of the TIPSTER Text Program. The series, which started in 1992, is designed to promote research in information retrieval by providing appropriate test collections, uniform scoring procedures, and a forum for organizations interested in comparing their results. Thirty-eight groups, including representatives from nine different countries, participated in TREC-5 in November 1996. TRE..
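The retrieval measures themselves are not spelled out above, but TREC tracks conventionally report ranked-retrieval measures such as non-interpolated (mean) average precision. A minimal sketch of how that is computed; the function names are mine:

```python
def average_precision(ranked_docs, relevant):
    """Non-interpolated average precision for a single query: precision is
    accumulated at each rank where a relevant document is retrieved."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_docs, relevant_set) pairs, one per query."""
    return sum(average_precision(d, r) for d, r in runs) / len(runs)
```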
CSR-IV HUB3
This set of CD-ROMs contains all of the speech data provided to sites participating in the DARPA CSR November 1995 HUB3 Multi-Microphone tests. The data consists of digitized waveforms collected with eight different microphones simultaneously from 40 subjects reading 15-sentence articles drawn from various North American business news publications. The data is partitioned into development-test and evaluation-test sets. The test sets were collected with different subjects, prompts and microphones. No training data was collected for this corpus since a substantial amount of NAB acoustic training data was already available. Index files have been included that specify the exact subset of the evaluation test recordings which were used in the November 1995 tests; a sketch of how such an index file might be used appears after the listing below. The software NIST used to process and score the output of the test systems is also included.
The data is organized as follows:
CD26-3: Development-Test Data-Location 1, Adaptation and NAB recordings, Subjects: 703-705, 707-70a, 70c, 70f, 70g
CD26-4: Development-Test Data-Location 2, NAB recordings, Subjects: 70k, 70m, 70o, 70q-70s, 70u-70w
CD26-5: Development-Test Data-Location 2, Adaptation recordings, Subjects: 70k, 70m-70o, 70q-70s, 70u-70w
CD26-3: Development-Test Data-NAB recordings, Subjects: 710-71j
As of September 2007, this publication has been condensed to fit on a single DVD. The data from each CD resides in its own directory, labeled with the above NIST labels.
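As an illustration of how the included index files might be applied to select the evaluation subset, here is a minimal sketch; the one-recording-ID-per-line format and the waveform suffix are assumptions, not the documented NIST index format:

```python
from pathlib import Path

def load_index(index_path):
    """Read one recording ID per line; blank lines are skipped."""
    return {line.strip()
            for line in Path(index_path).read_text().splitlines()
            if line.strip()}

def select_evaluation_set(waveform_dir, index_path, suffix=".wv1"):
    """Return only the waveform files named in the index file."""
    wanted = load_index(index_path)
    return sorted(p for p in Path(waveform_dir).rglob(f"*{suffix}")
                  if p.stem in wanted)
```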
Recent improvements to the ATLAS architecture
We examine the recent improvements that were made to the ATLAS (Architecture and Tools for Linguistic Analysis Systems) architecture. We first introduce the architecture and the historical context for this work. Next, we describe NIST's initial implementation of the framework before analyzing it. We then focus on three important improvements (relating to multi-dimensional signals, hierarchical structures, and validation) we have made to the architecture to make it more usable. We conclude by summarizing the major points covered and discussing plans for future work.
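As a rough illustration of the kind of constraint a validator for hierarchical annotation structures has to enforce (not the ATLAS validation mechanism itself), consider checking that every child annotation's interval nests inside its parent's:

```python
def validate_nesting(spans, parent_of):
    """spans: annotation id -> (start, end) in signal offsets.
    parent_of: child id -> parent id in the annotation hierarchy.
    Returns a list of violation messages; empty means the hierarchy nests."""
    errors = []
    for child, parent in parent_of.items():
        cs, ce = spans[child]
        ps, pe = spans[parent]
        if cs < ps or ce > pe:
            errors.append(
                f"{child} [{cs}, {ce}] not contained in {parent} [{ps}, {pe}]")
    return errors
```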
CSR-IV HUB4
This set of CD-ROMs contains all of the speech data provided to sites participating in the DARPA CSR November 1995 HUB4 (Radio) Broadcast News tests. The data consists of digitized waveforms of MarketPlace (tm) business news radio shows provided by KUSC through an agreement with the Linguistic Data Consortium and detailed transcriptions of those broadcasts. The software NIST used to process and score the output of the test systems is also included.
The data is organized as follows:
CD26-1: Training Data-Ten complete half-hour broadcasts with minimal-verified transcripts. The transcripts are time aligned with the waveforms at the story-boundary level.
CD26-2: Development-Test Data-Six complete half-hour broadcasts with verified transcripts. The transcripts are time aligned with the waveforms at the story- and turn-boundary level. Index files have been included which specify how the data may be partitioned into 2 test sets.
CD26-6: Evaluation-Test Data-Five complete half-hour broadcasts with verified/adjudicated transcripts. The transcripts are time aligned with the waveforms at the story-, turn- and music-boundary level. An index file has been included which specifies how the data was partitioned into the test set used in the CSR 1995 HUB4 tests.
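A minimal sketch of what a time-aligned transcript segment of this kind might look like in code; the field names and granularity labels are illustrative, not the corpus's file format:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    begin: float  # seconds from the start of the broadcast waveform
    end: float
    level: str    # alignment granularity: "story", "turn", or "music"
    text: str     # transcript for this span

def segments_within(segments: List[Segment], start: float, end: float) -> List[Segment]:
    """Segments whose spans fall entirely inside [start, end]."""
    return [s for s in segments if s.begin >= start and s.end <= end]
```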
The Rich Transcription 2006 Spring Meeting Recognition Evaluation
We present the design and results of the Spring 2006 (RT-06S) Rich Transcription Meeting Recognition Evaluation, the fourth in a series of community-wide evaluations of language technologies in the meeting domain. For 2006, we supported three evaluation tasks in two meeting sub-domains: the Speech-To-Text (STT) transcription task, and the "Who Spoke When" and "Speech Activity Detection" diarization tasks. The meetings were from the Conference Meeting and Lecture Meeting sub-domains. The lowest STT word error rate, with up to four simultaneous speakers, in the multiple distant microphone condition was 46.3% for the conference sub-domain and 53.4% for the lecture sub-domain. For the "Who Spoke When" task, the lowest diarization error rates for all speech were 35.8% and 24.0% for the conference and lecture sub-domains respectively. For the "Speech Activity Detection" task, the lowest diarization error rates were 4.3% and 8.0% for the conference and lecture sub-domains respectively.
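Both headline measures follow standard definitions: word error rate (WER) for STT and diarization error rate (DER) for the diarization tasks. A minimal sketch, with invented example counts:

```python
def word_error_rate(substitutions, deletions, insertions, reference_words):
    """Standard STT word error rate: (S + D + I) / N over the reference."""
    return (substitutions + deletions + insertions) / reference_words

def diarization_error_rate(missed, false_alarm, speaker_error, scored_time):
    """Standard diarization error rate; all arguments are durations in
    seconds: missed speech, false-alarm speech, speaker-confusion time,
    and total scored speech time."""
    return (missed + false_alarm + speaker_error) / scored_time

# Purely illustrative counts: 463 errors against 1000 reference words would
# yield the 46.3% figure quoted for the conference sub-domain.
print(word_error_rate(300, 100, 63, 1000))  # 0.463
```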