243 research outputs found

    Voice-processing technologies--their application in telecommunications.


    Overview of a media convergence centre (MC2)

    Organizational alliances are rapidly being formed as a means for effective cooperation with a common goal within a targeted value chain. The combination of such communication, coordination and cooperation leads to new organisational forms and scenarios within the Digital Ecosystem space that require technological support. Convergence refers to the move towards the use of a single, unified interaction medium and media. Such a solution enables telecommunications services that are concurrently coupled with enterprise and internet data. Due to the versatile nature of today's extended enterprise, a flexible, feature-rich, adaptive and widely accessible converged solution is required. This paper proposes a Media Convergence Centre (MC2) solution that allows users to participate in a converged multimedia collaboration network using a variety of interaction devices in an easy and convenient manner.

    Sharing Human-Generated Observations by Integrating HMI and the Semantic Sensor Web

    Current “Internet of Things” concepts point to a future where connected objects gather meaningful information about their environment and share it with other objects and people. In particular, objects embedding Human Machine Interaction (HMI), such as mobile devices and, increasingly, connected vehicles, home appliances, urban interactive infrastructures, etc., may not only be conceived as sources of sensor information, but, through interaction with their users, they can also produce highly valuable context-aware human-generated observations. We believe that the great promise offered by combining and sharing all of the different sources of information available can be realized through the integration of HMI and Semantic Sensor Web technologies. This paper presents a technological framework that harmonizes two of the most influential HMI and Sensor Web initiatives: the W3C’s Multimodal Architecture and Interfaces (MMI) and the Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE) with its semantic extension, respectively. Although the proposed framework is general enough to be applied in a variety of connected objects integrating HMI, a particular development is presented for a connected car scenario where drivers’ observations about the traffic or their environment are shared across the Semantic Sensor Web. For implementation and evaluation purposes, an on-board OSGi (Open Services Gateway Initiative) architecture was built, integrating several available HMI, Sensor Web and Semantic Web technologies. A technical performance test and a conceptual validation of the scenario with potential users are reported, with results suggesting the approach is sound.
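
    A minimal sketch of the kind of data flow the abstract describes, assuming a simplified JSON encoding loosely modelled on OGC Observations & Measurements fields: a driver's spoken report captured through an HMI dialog is wrapped as an observation and posted to a Sensor Web endpoint. The field names, the example.org endpoint and the helper functions are illustrative assumptions, not the actual MMI/SWE interfaces of the framework.

```python
# Hedged sketch: a driver's spoken report (captured through an HMI dialog) is
# wrapped as a simplified observation record loosely modelled on OGC O&M fields
# and shared with a Sensor Web endpoint. Field names, the endpoint URL and the
# helpers are illustrative assumptions, not the paper's actual framework.
import json
import urllib.request
from datetime import datetime, timezone

def build_observation(utterance_label, lat, lon, vehicle_id):
    """Wrap a human-generated observation in a minimal O&M-style structure."""
    return {
        "observedProperty": "roadCondition",          # what the driver reported on
        "result": utterance_label,                    # e.g. "traffic_jam", "icy_road"
        "phenomenonTime": datetime.now(timezone.utc).isoformat(),
        "featureOfInterest": {"type": "Point", "coordinates": [lon, lat]},
        "procedure": f"hmi-dialog:{vehicle_id}",      # the HMI acting as a "sensor"
    }

def publish(observation, endpoint="http://example.org/observations"):
    """POST the observation to a (hypothetical) Sensor Web endpoint."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(observation).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

# Example: the dialog manager grounds "traffic jam ahead" and shares it.
obs = build_observation("traffic_jam", lat=40.4168, lon=-3.7038, vehicle_id="car-42")
# publish(obs)  # not executed here: the endpoint above is only a placeholder
```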

    Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation

    End-to-end automatic speech recognition (E2E-ASR) has the potential to improve performance, but a specific issue that needs to be addressed is the difficulty it has in handling enharmonic words: named entities (NEs) with the same pronunciation and part of speech that are spelled differently. This often occurs with Japanese personal names that have the same pronunciation but different Kanji characters. Since such NE words tend to be important keywords, ASR easily loses user trust if it misrecognizes them. To solve this problem, this paper proposes a novel retraining-free customization method for E2E-ASRs based on a named-entity-aware E2E-ASR model and phoneme similarity estimation. Experimental results show that the proposed method improves the target NE character error rate by 35.7% on average relative to the conventional E2E-ASR model when selecting personal names as the target NE.
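
    A minimal sketch of the phoneme-similarity part of such retraining-free customization, under the assumption that a recognized name span can be mapped to a phoneme sequence and compared by edit distance against a user-registered name list; the toy lexicon, names and threshold are illustrative, not the paper's model.

```python
# Illustrative sketch: retraining-free correction of a recognized named entity
# by phoneme-level edit distance against a user-registered name list.
# The lexicon, names and threshold are toy assumptions, not the paper's model.

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[-1][-1]

def correct_named_entity(hyp_phonemes, user_dictionary, max_distance=1):
    """Replace a recognized NE with the closest-sounding registered name."""
    best_name, best_dist = None, max_distance + 1
    for name, phonemes in user_dictionary.items():
        d = edit_distance(hyp_phonemes, phonemes)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name  # None if no registered name is close enough

# Toy example: the user has registered two surnames; the ASR hypothesis for the
# name span dropped the final "u", and the closest registered name is chosen.
user_dictionary = {
    "佐藤": ["s", "a", "t", "o", "u"],       # Satou
    "斉藤": ["s", "a", "i", "t", "o", "u"],  # Saitou
}
print(correct_named_entity(["s", "a", "i", "t", "o"], user_dictionary))  # 斉藤
```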

    Speech-centric multimodal interaction for easy-to-access online services: A personal life assistant for the elderly

    The PaeLife project is a European industry-academia collaboration whose goal is to provide the elderly with easy access to online services that make their life easier and encourage their continued participation in society. To reach this goal, the project partners are developing a multimodal virtual personal life assistant (PLA) offering a wide range of services from weather information to social networking. This paper presents the multimodal architecture of the PLA, the services provided by the PLA, and the work done in the area of speech input and output modalities, which play a key role in the application.

    Speaker diarization and speech recognition in the semi-automatization of audio description: an exploratory study on future possibilities?

    This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and the experimental results are summarized.
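
    A rough sketch of the semi-automatic flow described above: diarize the soundtrack, keep the audio-description narrator's segments, run speech recognition on each, and hand the timed draft to a human reviser. The diarize and transcribe helpers below are hypothetical stand-ins for whatever diarization and ASR engines are actually used.

```python
# Rough sketch of the semi-automatic flow: diarization -> keep the AD narrator's
# segments -> ASR on each segment -> timed draft transcript for human revision.
# `diarize` and `transcribe` are hypothetical stand-ins for real engines.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float
    speaker: str   # diarization label, e.g. "SPK0"

def diarize(audio_path):
    """Hypothetical diarizer: returns a list of speaker-labelled Segments."""
    raise NotImplementedError

def transcribe(audio_path, start, end):
    """Hypothetical ASR call on one stretch of the audio file."""
    raise NotImplementedError

def draft_ad_transcript(audio_path, narrator_label):
    """Build a timed draft transcript containing only the AD narrator."""
    lines = []
    for seg in diarize(audio_path):
        if seg.speaker != narrator_label:
            continue  # skip film dialogue and effects; keep only the narrator
        text = transcribe(audio_path, seg.start, seg.end)
        lines.append(f"[{seg.start:08.2f}-{seg.end:08.2f}] {text}")
    return "\n".join(lines)  # handed to a human describer for revision
```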

    Automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu

    Automatic speech recognition (ASR) is potentially helpful for children who suffer from dyslexia. Highly phonetically similar reading errors made by dyslexic children affect the accuracy of ASR. This study therefore aims to evaluate whether acceptable ASR accuracy can be obtained using automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu (BM). Three objectives were set: first, to produce manual transcription and phonetic labelling; second, to construct automatic transcription and phonetic labelling using forced alignment; and third, to compare the accuracy obtained with automatic transcription and phonetic labelling against that obtained with manual transcription and phonetic labelling. To accomplish these goals, manual speech labelling and segmentation, forced alignment, and Hidden Markov Model (HMM) and Artificial Neural Network (ANN) training were used, and Word Error Rate (WER) and False Alarm Rate (FAR) were used to measure ASR accuracy. A total of 585 speech files were used for the manual transcription, forced alignment and training experiments. The ASR engine built with automatic transcription and phonetic labelling obtained an optimum accuracy of 76.04%, with a WER of 23.96% and a FAR of 17.9%. These results are very close to those of the ASR engine built with manual transcription, namely 76.26% accuracy, a WER of 23.97% and a FAR of 17.9%. In conclusion, the accuracy of automatic transcription and phonetic labelling is acceptable for use in helping dyslexic children learn through ASR in Bahasa Melayu (BM).
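
    As a quick illustration of the main score reported above, here is a small sketch of word error rate as word-level edit distance over the number of reference words; the example sentences are invented rather than taken from the thesis data, and the thesis's exact FAR formula is not given in the abstract, so it is not sketched.

```python
# Sketch of word error rate: word-level edit distance divided by the number of
# reference words. The example sentences are invented, not thesis data.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))              # distance row against empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of a reference word
                         cur[j - 1] + 1,          # insertion of a hypothesis word
                         prev[j - 1] + (r != h))  # substitution (or match)
        prev = cur
    return prev[len(hyp)] / max(len(ref), 1)

# One substitution and one deletion against a 4-word reference -> 50.00% WER.
print(f"{wer('saya suka membaca buku', 'saya suku membaca') * 100:.2f}%")
```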

    Improvements of Hungarian Hidden Markov Model-based text-to-speech synthesis

    Statistical parametric, especially Hidden Markov Model-based, text-to-speech (TTS) synthesis has received much attention recently. The quality of HMM-based speech synthesis approaches that of state-of-the-art unit selection systems, and the method possesses numerous favorable features, e.g. a small runtime footprint, speaker interpolation and speaker adaptation. This paper presents improvements to a Hungarian HMM-based speech synthesis system, including speaker-dependent and adaptive training, and speech synthesis with pulse-noise and mixed excitation. Listening tests and their evaluation are also described.
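
    A toy illustration of the pulse-noise (mixed) excitation idea mentioned above: for each frame, a periodic pulse train and white noise are blended according to a voicing weight before driving the spectral synthesis filter. The single voicing weight, frame length and sample rate here are simplifying assumptions; real mixed-excitation vocoders weight several frequency bands separately.

```python
# Toy single-band mixed excitation: per frame, blend a periodic pulse train
# (voiced component) with white noise (unvoiced component) by a voicing weight.
# Real mixed-excitation vocoders weight several frequency bands separately.
import numpy as np

def mixed_excitation(f0_per_frame, voicing_per_frame, frame_len=80, sr=16000):
    frames = []
    for f0, v in zip(f0_per_frame, voicing_per_frame):
        noise = np.random.randn(frame_len)
        pulses = np.zeros(frame_len)
        if f0 > 0:
            period = int(sr / f0)                  # samples between glottal pulses
            pulses[::period] = np.sqrt(period)     # roughly unit-power pulse train
        frames.append(v * pulses + (1.0 - v) * noise)
    return np.concatenate(frames)

# Example: ten voiced frames at 120 Hz followed by ten unvoiced frames.
excitation = mixed_excitation([120.0] * 10 + [0.0] * 10, [0.9] * 10 + [0.1] * 10)
```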

    Thousands of Voices for HMM-Based Speech Synthesis: Analysis and Application of TTS Systems Built on Various ASR Corpora

    In conventional speech synthesis, large amounts of phonetically balanced speech data recorded in highly controlled recording studio environments are typically required to build a voice. Although using such data is a straightforward solution for high quality synthesis, the number of voices available will always be limited, because recording costs are high. On the other hand, our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an "average voice model" plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on "non-TTS" corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper, we demonstrate the thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal (WSJ0, WSJ1, and WSJCAM0), Resource Management, Globalphone, and SPEECON databases. We also present the results of associated analysis based on perceptual evaluation, and discuss remaining issues.