    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies could radically transform the way recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available, and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues a coherent research agenda is proposed.

    Document Level Semantic Context for Retrieving OOV Proper Names

    Recognition of Proper Names (PNs) in speech is important for content-based indexing and browsing of audio-video data. However, because of the diachronic nature of the data, many PNs are Out-Of-Vocabulary (OOV) words for the LVCSR systems used in these applications. By exploiting the semantic context of the audio, relevant OOV PNs can be retrieved, and the target PNs can then be recovered. To retrieve OOV PNs, we propose to represent their context with document-level semantic vectors, and we show that this approach can handle OOV PNs that are infrequent in the training data. We study different representations, including Random Projections, LSA, LDA, Skip-gram, CBOW and GloVe. A further evaluation of the recovery of target OOV PNs using a phonetic search shows that document-level semantic context is reliable for recovering OOV PNs.
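    The retrieval step can be illustrated with a minimal sketch under stated assumptions. Each training document is assumed to come with its list of proper names. Each PN is represented by the sum of the vectors of the documents that mention it. The PNs missing from the LVCSR vocabulary are then ranked by cosine similarity to the vector of the ASR transcript. Plain term-count vectors stand in for the Random Projection, LSA, LDA, Skip-gram, CBOW and GloVe representations studied in the paper. The function names, the aggregation choice and the toy data are hypothetical and do not reproduce the authors' implementation.

```python
import math
from collections import Counter


def doc_vector(tokens):
    # Assumption: raw term counts replace the paper's semantic vectors
    # (LSA, LDA, word embeddings, ...) to keep the sketch dependency-free.
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pn_context_vectors(corpus):
    """corpus: iterable of (tokens, proper_names) pairs.

    Each PN's context vector is the sum of the vectors of every
    document that mentions it (a hypothetical aggregation choice)."""
    contexts = {}
    for tokens, names in corpus:
        vec = doc_vector(tokens)
        for name in names:
            contexts.setdefault(name, Counter()).update(vec)
    return contexts


def retrieve_oov_pns(asr_tokens, contexts, asr_vocab, k=3):
    """Rank PNs absent from the LVCSR vocabulary by similarity to the transcript."""
    query = doc_vector(asr_tokens)
    scored = [(cosine(query, ctx), name)
              for name, ctx in contexts.items() if name not in asr_vocab]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Toy usage with invented data.
corpus = [
    ("election minister vote parliament".split(), ["Dupont"]),
    ("football match goal stadium".split(), ["Bernard"]),
    ("minister budget parliament vote".split(), ["Martin"]),
]
contexts = pn_context_vectors(corpus)
transcript = "the minister spoke in parliament before the vote".split()
print(retrieve_oov_pns(transcript, contexts, asr_vocab={"Dupont"}, k=1))  # ['Martin']
```

Replacing `doc_vector` with an LSA or embedding-based document vector would leave the aggregation and ranking logic unchanged. Only the notion of "context similarity" would become semantic rather than lexical.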

    On the voice-activated question answering

    Question answering (QA) is probably one of the most challenging tasks in the field of natural language processing. It requires search engines that can extract concise, precise fragments of text containing the answer to a question posed by the user. Adding voice interfaces to QA systems makes them more natural and more appealing to use. This paper provides a comprehensive description of current state-of-the-art voice-activated QA systems. Finally, it discusses the scenarios that will emerge from the introduction of speech recognition into QA.
    © 2006 IEEE. This work was supported in part by Research Projects TIN2009-13391-C04-03 and TIN2008-06856-C05-02. This paper was recommended by Associate Editor V. Marik.
    Rosso, P.; Hurtado Oliver, LF.; Segarra Soriano, E.; Sanchís Arnal, E. (2012). On the voice-activated question answering. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews. 42(1):75-85. https://doi.org/10.1109/TSMCC.2010.2089620

    Transformational tagging for topic tracking in natural language.

    Ip Chun Wah Timmy. Thesis (M.Phil.), Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 113-120). Abstracts in English and Chinese.

    Table of contents:
    Chapter 1: Introduction
      1.1 Topic Detection and Tracking
        1.1.1 What is a Topic?
        1.1.2 What is Topic Tracking?
      1.2 Research Contributions
        1.2.1 Named Entity Tagging
        1.2.2 Handling Unknown Words
        1.2.3 Named-Entity Approach in Topic Tracking
      1.3 Organization of Thesis
    Chapter 2: Background
      2.1 Previous Developments in Topic Tracking
        2.1.1 BBN's Tracking System
        2.1.2 CMU's Tracking System
        2.1.3 Dragon's Tracking System
        2.1.4 UPenn's Tracking System
      2.2 Topic Tracking in Chinese
      2.3 Part-of-Speech Tagging
        2.3.1 A Brief Overview of POS Tagging
        2.3.2 Transformation-based Error-Driven Learning
      2.4 Unknown Word Identification
        2.4.1 Rule-based approaches
        2.4.2 Statistical approaches
        2.4.3 Hybrid approaches
      2.5 Information Retrieval Models
        2.5.1 Vector-Space Model
        2.5.2 Probabilistic Model
      2.6 Chapter Summary
    Chapter 3: System Overview
      3.1 Segmenter
      3.2 TEL Tagger
      3.3 Unknown Words Identifier
      3.4 Topic Tracker
      3.5 Chapter Summary
    Chapter 4: Named Entity Tagging
      4.1 Experimental Data
      4.2 Transformational Tagging
        4.2.1 Notations
        4.2.2 Corpus Utilization
        4.2.3 Lexical Rules
        4.2.4 Contextual Rules
      4.3 Experiment and Result
        4.3.1 Lexical Tag Initialization
        4.3.2 Contribution of Lexical and Contextual Rules
        4.3.3 Performance on Unknown Words
        4.3.4 A Possible Benchmark
        4.3.5 Comparison between TEL Approach and the Stochastic Approach
      4.4 Chapter Summary
    Chapter 5: Handling Unknown Words in Topic Tracking
      5.1 Overview
      5.2 Person Names
        5.2.1 Forming possible named entities from OOV by grouping n-grams
        5.2.2 Overlapping
      5.3 Organization Names
      5.4 Location Names
      5.5 Dates and Times
      5.6 Chapter Summary
    Chapter 6: Topic Tracking in Chinese
      6.1 Introduction of Topic Tracking
      6.2 Experimental Data
      6.3 Evaluation Methodology
        6.3.1 Cost Function
        6.3.2 DET Curve
      6.4 The Named Entity Approach
        6.4.1 Designing the Named Entities Set for Topic Tracking
        6.4.2 Feature Selection
        6.4.3 Integrated with Vector-Space Model
      6.5 Experimental Results and Analysis
        6.5.1 Notations
        6.5.2 Stopword Elimination
        6.5.3 TEL Tagging
        6.5.4 Unknown Word Identifier
        6.5.5 Error Analysis
      6.6 Chapter Summary
    Chapter 7: Conclusions and Future Work
      7.1 Conclusions
      7.2 Future Work
    Bibliography
    Chapter A: The POS Tags
    Chapter B: Surnames and transliterated characters
    Chapter C: Stopword List for Person Name
    Chapter D: Organization suffixes
    Chapter E: Location suffixes
    Chapter F: Examples of Feature Table (Train set with condition D410)

    Proceedings of the ACM SIGIR Workshop "Searching Spontaneous Conversational Speech"

    How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News

    Out-Of-Vocabulary (OOV) words missed by Large Vocabulary Continuous Speech Recognition (LVCSR) systems can be recovered with the help of the topic and semantic context of the OOV words, captured from a diachronic text corpus. In this paper we investigate how the choice of documents for the diachronic text corpus affects the retrieval of OOV Proper Names (PNs) relevant to an audio document. We first present our diachronic French broadcast news datasets, which highlight the motivation for our study of OOV PNs. We then analyse the effect of using diachronic text data from different sources and over different time spans. From OOV PN retrieval experiments on French broadcast news videos, we conclude that a diachronic corpus with text from different sources leads to better retrieval performance than one relying on text from a single source or from a longer time span.
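    The abstract does not describe how the corpus is assembled. The following minimal sketch illustrates only the two variables it compares, source and time span. It assumes that each text carries a publication date, a source label and a set of already-extracted proper names, held in a hypothetical `TextDoc` record. The sketch selects the texts published within a window before the audio document, optionally from chosen sources, and lists the proper names that the LVCSR vocabulary lacks. It does not reproduce the paper's retrieval model.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TextDoc:
    # Hypothetical record for one document of the diachronic text corpus.
    published: date
    source: str              # e.g. the news website the text was collected from
    proper_names: frozenset  # assumed to be extracted beforehand


def select_corpus(docs, audio_date, span_days, sources=None):
    """Keep texts published in the `span_days` before the audio document,
    optionally restricted to a set of sources."""
    start = audio_date - timedelta(days=span_days)
    return [d for d in docs
            if start <= d.published <= audio_date
            and (sources is None or d.source in sources)]


def oov_pn_candidates(corpus, asr_vocab):
    """Proper names found in the selected texts but missing from the LVCSR vocabulary."""
    names = set()
    for d in corpus:
        names |= d.proper_names
    return names - set(asr_vocab)


# Toy usage with invented documents.
docs = [
    TextDoc(date(2015, 3, 1), "site_a", frozenset({"Dupont", "Martin"})),
    TextDoc(date(2015, 3, 5), "site_b", frozenset({"Bernard"})),
    TextDoc(date(2014, 1, 1), "site_a", frozenset({"Durand"})),
]
recent = select_corpus(docs, date(2015, 3, 10), span_days=30)
print(sorted(oov_pn_candidates(recent, {"Dupont"})))  # ['Bernard', 'Martin']
```

Passing several sources with a short span corresponds to the configuration the abstract reports as most effective. Widening `span_days` or restricting `sources` to a single site gives the alternatives it compares against.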