
    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Multimedia information technology and the annotation of video

    The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, the overload of data will cause a lack of annotation capacity; on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.

    Prototyping a Chatbot for Student Supervision in a Pre-registration Process

    Developing a chatbot becomes a challenging task when it is built from scratch and independent of any Software as a Service (SaaS). Inspired by the idea of freeing lecturers from the burden of answering the same questions repetitively during the pre-registration process, this research has succeeded in building a text-based chatbot system. Further, this research has proved that the combination of a keyword spotting technique for the language understanding component, a Finite-State Transducer (FST) for dialogue management, rule-based keyword matching for language generation, and the system-in-the-loop paradigm for system validation can produce an efficient chatbot. The chatbot's efficiency is high enough, as its Concept Efficiency (CE) score reaches 0.946. This shows that users do not need to repeat their utterances several times to be understood. The chatbot's performance in recognizing new concepts introduced by users is also more than satisfactory, as shown by its Query Density (QD) score of 0.80.
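    The combination the abstract describes can be sketched in a few lines: keyword spotting maps a free-form utterance to a semantic label, and an FST maps (state, label) pairs to (next state, response) pairs. The states, keyword lists, and responses below are illustrative assumptions, not the paper's actual data.

    ```python
    # FST transition table: (state, spotted_keyword) -> (next_state, response).
    # States and responses are hypothetical examples for a pre-registration dialog.
    TRANSITIONS = {
        ("start", "register"):    ("ask_course", "Which course would you like to pre-register for?"),
        ("ask_course", "course"): ("confirm", "Shall I record your pre-registration for that course?"),
        ("confirm", "yes"):       ("done", "Your pre-registration request has been noted."),
        ("confirm", "no"):        ("start", "Okay, what else can I help you with?"),
    }

    # Keyword spotting vocabulary: semantic label -> surface variants.
    KEYWORDS = {
        "register": ["register", "pre-registration", "enroll"],
        "course":   ["course", "class", "subject"],
        "yes":      ["yes", "sure", "ok"],
        "no":       ["no", "cancel"],
    }

    def spot_keyword(utterance):
        """Return the first semantic label whose variant occurs in the utterance."""
        text = utterance.lower()
        for label, variants in KEYWORDS.items():
            if any(v in text for v in variants):
                return label
        return None

    def step(state, utterance):
        """One dialogue turn: spot a keyword, follow the FST transition if defined."""
        label = spot_keyword(utterance)
        transition = TRANSITIONS.get((state, label))
        if transition is None:
            # Unrecognized input leaves the state unchanged (rule-based fallback).
            return state, "Sorry, could you rephrase that?"
        return transition

    state, reply = step("start", "I want to register for next term")
    ```

    A Concept Efficiency score near 1, as reported, would mean `spot_keyword` usually succeeds on the first attempt, so the fallback branch is rarely taken.
    
    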

    Speech and Speaker Recognition for Home Automation: Preliminary Results

    In voice-controlled multi-room smart homes, ASR and speaker identification systems face distant-speech conditions, which have a significant impact on performance. Regarding voice command recognition, this paper presents an approach which dynamically selects the best channel and adapts models to the environmental conditions. The method has been tested on data recorded with 11 elderly and visually impaired participants in a real smart home. The voice command recognition error rate was 3.2% in the off-line condition and 13.2% in the online condition. For speaker identification, performance was found to be very speaker-dependent. However, we show a high correlation between performance and training size. The main difficulty was the too-short utterance duration in comparison with state-of-the-art studies. Moreover, speaker identification performance depends on the size of the adaptation corpus, and thus users must record enough data before using the system.
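    The dynamic best-channel selection the abstract mentions can be illustrated with a simple energy-based heuristic: estimate a signal-to-noise ratio per microphone and keep the channel with the highest estimate. The percentile-based SNR proxy below is a common approximation and an assumption on our part, not necessarily the paper's exact criterion.

    ```python
    import numpy as np

    def frame_energies(signal, frame_len=400, hop=160):
        """Short-time log energy per frame (at 16 kHz: 25 ms frames, 10 ms hop)."""
        frames = [signal[i:i + frame_len]
                  for i in range(0, len(signal) - frame_len + 1, hop)]
        return np.array([10 * np.log10(np.sum(f ** 2) + 1e-10) for f in frames])

    def estimate_snr(signal):
        """Crude SNR proxy in dB: loud-frame energy minus quiet-frame energy."""
        e = frame_energies(signal)
        return np.percentile(e, 95) - np.percentile(e, 10)

    def select_best_channel(channels):
        """Pick the index of the channel with the highest estimated SNR."""
        return int(np.argmax([estimate_snr(c) for c in channels]))
    ```

    In a multi-room setting this would be re-run per utterance, so the microphone nearest the speaker is chosen turn by turn rather than fixed in advance.
    
    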

    Building multi-domain conversational systems from single domain resources

    Current advances in the development of mobile and smart devices have generated a growing demand for natural human-machine interaction and favored the intelligent assistant metaphor, in which a single interface gives access to a wide range of functionalities and services. Conversational systems constitute an important enabling technology in this paradigm. However, they are usually defined to interact in semantically restricted domains in which users are offered a limited number of options and functionalities. The design of multi-domain systems implies that a single conversational system is able to assist the user in a variety of tasks. In this paper we propose an architecture for the development of multi-domain conversational systems that allows: (1) integrating available multi- and single-domain speech recognition and understanding modules; (2) combining available systems in the different domains implied, so that it is not necessary to generate new expensive resources for the multi-domain system; and (3) achieving better domain recognition rates to select the appropriate interaction management strategies. We have evaluated our proposal combining three systems in different domains to show that the proposed architecture can satisfactorily deal with multi-domain dialogs. (C) 2017 Elsevier B.V. All rights reserved. Work partially supported by projects MINECO TEC2012-37832-C02-01 and CICYT TEC2011-28626-C02-02.
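    The central routing idea in such an architecture can be sketched as a lightweight domain classifier that dispatches each user turn to an existing single-domain system, reusing per-domain modules unchanged. The domains, vocabularies, and overlap-based scoring below are hypothetical illustrations, not the paper's actual components.

    ```python
    # Toy domain vocabularies; a real system would use a trained classifier
    # over the speech understanding output rather than bag-of-words overlap.
    DOMAIN_VOCAB = {
        "weather": {"weather", "rain", "temperature", "forecast"},
        "travel":  {"flight", "ticket", "train", "hotel"},
        "banking": {"account", "balance", "transfer", "card"},
    }

    def score_domains(utterance):
        """Score each domain by vocabulary overlap with the utterance."""
        words = set(utterance.lower().split())
        return {d: len(words & vocab) for d, vocab in DOMAIN_VOCAB.items()}

    def route(utterance, handlers, fallback="weather"):
        """Dispatch the turn to the best-scoring single-domain system.

        `handlers` maps a domain name to the existing single-domain system's
        entry point; the fallback domain handles turns no domain matches.
        """
        scores = score_domains(utterance)
        best = max(scores, key=scores.get)
        domain = best if scores[best] > 0 else fallback
        return domain, handlers[domain](utterance)
    ```

    Improving this domain recognition step, point (3) in the abstract, matters because a misrouted turn hands the utterance to a dialog manager with no strategy for it.
    
    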