4,430 research outputs found

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Design of a Controlled Language for Critical Infrastructures Protection

    Get PDF
    We describe a project for the construction of controlled language for critical infrastructures protection (CIP). This project originates from the need to coordinate and categorize the communications on CIP at the European level. These communications can be physically represented by official documents, reports on incidents, informal communications and plain e-mail. We explore the application of traditional library science tools for the construction of controlled languages in order to achieve our goal. Our starting point is an analogous work done during the sixties in the field of nuclear science known as the Euratom Thesaurus.JRC.G.6-Security technology assessmen

    Multilingual speech recognition for the elderly: The AALFred personal life assistant

    Get PDF
    The PaeLife project is a European industry-academia collaboration in the framework of the Ambient Assisted Living Joint Programme (AAL JP), with a goal of developing a multimodal, multilingual virtual personal life assistant to help senior citizens remain active and socially integrated. Speech is one of the key interaction modalities of AALFred, the Windows application developed in the project; the application can be controlled using speech input in four European languages: French, Hungarian, Polish and Portuguese. This paper briefly presents the personal life assistant and then focuses on the speech-related achievements of the project. These include the collection, transcription and annotation of large corpora of elderly speech, the development of automatic speech recognisers optimised for elderly speakers, a speech modality component that can easily be reused in other applications, and an automatic grammar translation service that allows for fast expansion of the automatic speech recognition functionality to new languages.info:eu-repo/semantics/publishedVersio

    ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications

    Full text link
    Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried between an air traffic controller (ATCO) and pilots via very-high frequency radio channels. In order to incorporate these novel technologies into ATC (low-resource domain), large-scale annotated datasets are required to develop the data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotations of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, signal-to-noise ratio estimate and English language detection score per sample. Both available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset from the original test set corpus, that we are offering for free at https://www.atco2.org/data. We expect the ATCO2 corpus will foster research on robust ASR and NLU not only in the field of ATC communications but also in the general research community.Comment: Manuscript under review; The code will be available at https://github.com/idiap/atco2-corpu

    Spoken dialogue systems: architectures and applications

    Get PDF
    171 p.Technology and technological devices have become habitual and omnipresent. Humans need to learn tocommunicate with all kind of devices. Until recently humans needed to learn how the devices expressthemselves to communicate with them. But in recent times the tendency has become to makecommunication with these devices in more intuitive ways. The ideal way to communicate with deviceswould be the natural way of communication between humans, the speech. Humans have long beeninvestigating and designing systems that use this type of communication, giving rise to the so-calledSpoken Dialogue Systems.In this context, the primary goal of the thesis is to show how these systems can be implemented.Additionally, the thesis serves as a review of the state-of-the-art regarding architectures and toolkits.Finally, the thesis is intended to serve future system developers as a guide for their construction. For that
    corecore