Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
Design of a Controlled Language for Critical Infrastructures Protection
We describe a project for the construction of a controlled language for critical infrastructures protection (CIP). This project originates
from the need to coordinate and categorize the communications on CIP at the European level. These communications can be physically
represented by official documents, reports on incidents, informal communications and plain e-mail. We explore the application of
traditional library science tools for the construction of controlled languages in order to achieve our goal. Our starting point is an
analogous work done during the sixties in the field of nuclear science known as the Euratom Thesaurus.
Multilingual speech recognition for the elderly: The AALFred personal life assistant
The PaeLife project is a European industry-academia collaboration in the framework of the Ambient Assisted Living Joint Programme (AAL JP), with a goal of developing a multimodal, multilingual virtual personal life assistant to help senior citizens remain active and socially integrated. Speech is one of the key interaction modalities of AALFred, the Windows application developed in the project; the application can be controlled using speech input in four European languages: French, Hungarian, Polish and Portuguese. This paper briefly presents the personal life assistant and then focuses on the speech-related achievements of the project. These include the collection, transcription and annotation of large corpora of elderly speech, the development of automatic speech recognisers optimised for elderly speakers, a speech modality component that can easily be reused in other applications, and an automatic grammar translation service that allows for fast expansion of the automatic speech recognition functionality to new languages.
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications
Personal assistants, automatic speech recognizers and dialogue understanding
systems are becoming more critical in our interconnected digital world. A clear
example is air traffic control (ATC) communications. ATC aims at guiding
aircraft and controlling the airspace in a safe and optimal manner. These
voice-based dialogues are carried between an air traffic controller (ATCO) and
pilots via very-high frequency radio channels. In order to incorporate these
novel technologies into ATC (low-resource domain), large-scale annotated
datasets are required to develop the data-driven AI systems. Two examples are
automatic speech recognition (ASR) and natural language understanding (NLU). In
this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering
research on the challenging ATC field, which has lagged behind due to lack of
annotated data. The ATCO2 corpus covers 1) data collection and pre-processing,
2) pseudo-annotations of speech data, and 3) extraction of ATC-related named
entities. The ATCO2 corpus is split into three subsets. 1) ATCO2-test-set
corpus contains 4 hours of ATC speech with manual transcripts and a subset with
gold annotations for named-entity recognition (callsign, command, value). 2)
The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched
with automatic transcripts from an in-domain speech recognizer, contextual
information, speaker turn information, signal-to-noise ratio estimate and
English language detection score per sample. Both are available for purchase
through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3)
The ATCO2-test-set-1h corpus is a one-hour subset from the original test set
corpus, that we are offering for free at https://www.atco2.org/data. We expect
the ATCO2 corpus will foster research on robust ASR and NLU not only in the
field of ATC communications but also in the general research community.
Comment: Manuscript under review; the code will be available at
https://github.com/idiap/atco2-corpu
Spoken dialogue systems: architectures and applications
171 p. Technology and technological devices have become habitual and omnipresent. Humans need to learn to communicate with all kinds of devices. Until recently, humans needed to learn how the devices express themselves to communicate with them. But in recent times the tendency has been to make communication with these devices more intuitive. The ideal way to communicate with devices would be the natural way of communication between humans: speech. Humans have long been investigating and designing systems that use this type of communication, giving rise to the so-called Spoken Dialogue Systems. In this context, the primary goal of the thesis is to show how these systems can be implemented. Additionally, the thesis serves as a review of the state-of-the-art regarding architectures and toolkits. Finally, the thesis is intended to serve future system developers as a guide for their construction. For that
- …