Driving ROVER with Segment-based ASR Quality Estimation
ROVER is a widely used method to combine the output of multiple automatic speech recognition (ASR) systems. Though effective, the basic approach and its variants suffer from potential drawbacks: i) their results depend on the order in which the hypotheses are fed to the combination process, ii) when applied to combine long hypotheses, they disregard possible differences in transcription quality at the local level, and iii) they often rely on word confidence information. We address these issues by proposing a segment-based ROVER in which hypothesis ranking is obtained from a confidence-independent ASR quality estimation method. On English data from the IWSLT2012 and IWSLT2013 evaluation campaigns, our results significantly outperform standard ROVER and approximate two strong oracles.
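As a minimal sketch of segment-level ranking, assume a QE component has already produced a predicted WER for every system's hypothesis in every segment. The snippet orders each segment's hypotheses from best to worst before they are passed to the combination step, so the order can differ from one segment to the next. The function names, segment and system IDs, and scores are hypothetical illustration values, not results or code from the paper.

```python
from typing import Dict, List


def rank_segment_hypotheses(predicted_wer: Dict[str, float]) -> List[str]:
    """Order system IDs for one segment, lowest predicted WER first.

    Ties are broken by system ID so the feeding order is deterministic.
    """
    return sorted(predicted_wer, key=lambda sys_id: (predicted_wer[sys_id], sys_id))


def rank_all_segments(segments: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Compute an independent ranking for every segment (hypothetical data layout)."""
    return {seg_id: rank_segment_hypotheses(scores) for seg_id, scores in segments.items()}


if __name__ == "__main__":
    # Made-up QE predictions: the best system changes from segment to segment.
    qe_scores = {
        "seg1": {"sysA": 0.21, "sysB": 0.12, "sysC": 0.30},
        "seg2": {"sysA": 0.05, "sysB": 0.18, "sysC": 0.09},
    }
    print(rank_all_segments(qe_scores))
    # {'seg1': ['sysB', 'sysA', 'sysC'], 'seg2': ['sysA', 'sysC', 'sysB']}
```

Ranking per segment rather than per document lets a system that is weak overall still lead the combination wherever the QE model predicts it is locally the most accurate.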
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some ROVER extensions rely on critical information such as confidence scores and other ASR decoder features. This information, which is not always available, depends heavily on the decoding process and sometimes tends to overestimate the real quality of the recognized words. In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) for ranking the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses. We first introduce an effective set of features to compensate for the absence of ASR decoder information. Then, we apply QE techniques to perform accurate hypothesis ranking at segment level before starting the fusion process. The evaluation is carried out on two different tasks, in which we respectively combine hypotheses coming from independent ASR systems and from multi-microphone recordings. In both tasks, it is assumed that the ASR decoder information is not available. The proposed approach significantly outperforms standard ROVER and is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%.
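In standard ROVER the first hypothesis forms the initial alignment that later hypotheses are merged into, which is why input order matters. The sketch below is a deliberately simplified, backbone-based word voting scheme, not the full ROVER word transition network: it aligns every hypothesis to the top-ranked one with a Levenshtein alignment, drops words that are insertions relative to that backbone, and takes a majority vote per slot, breaking ties in favour of the higher-ranked hypothesis. It assumes the hypotheses arrive already ranked (for example by QE-predicted WER) and uses no confidence scores; all names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def align_to_backbone(backbone: List[str], hyp: List[str]) -> List[Optional[str]]:
    """Levenshtein-align hyp to backbone.

    Returns one entry per backbone word: the aligned hyp word, or None when the
    hyp deletes that word. Words inserted in hyp are dropped (a simplification).
    """
    n, m = len(backbone), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # dp[i][j]: distance backbone[:i] vs hyp[:j]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (backbone[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    aligned: List[Optional[str]] = [None] * n
    i, j = n, m
    while i > 0 and j > 0:  # backtrace, preferring match/substitution
        if dp[i][j] == dp[i - 1][j - 1] + (backbone[i - 1] != hyp[j - 1]):
            aligned[i - 1] = hyp[j - 1]
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1  # backbone word missing from hyp: slot stays None
        else:
            j -= 1  # extra word in hyp: ignored
    return aligned


def rover_vote(ranked_hyps: List[List[str]]) -> List[str]:
    """Majority vote per backbone slot; the first (best-ranked) hypothesis is the backbone."""
    if not ranked_hyps:
        return []
    backbone = ranked_hyps[0]
    columns: List[List[Optional[str]]] = [[w] for w in backbone]
    for hyp in ranked_hyps[1:]:
        for k, word in enumerate(align_to_backbone(backbone, hyp)):
            columns[k].append(word)
    output = []
    for col in columns:
        counts = Counter(col)
        # Highest count wins; on ties the word seen first (higher rank) wins.
        best = max(counts, key=lambda w: (counts[w], -col.index(w)))
        if best is not None:  # a winning None means the slot is deleted
            output.append(best)
    return output


if __name__ == "__main__":
    print(rover_vote([["turn", "on", "the", "light"],
                      ["turn", "the", "light"],
                      ["turn", "on", "light"]]))  # ['turn', 'on', 'the', 'light']
```

The tie-breaking rule makes the order dependence explicit: with two disagreeing one-word hypotheses, whichever is ranked first wins, so a good ranking directly changes the combined output.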
DNN adaptation by automatic quality estimation of ASR hypotheses
In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform the unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions in which manual transcriptions of the evaluation data are available, thus allowing us to compute the "true" sentence WER. In this scenario, we perform the adaptation with variable amounts of data, which are characterised by different levels of quality. Then, we move to realistic conditions in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores "predicted" by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance based on the proposed QE-driven adaptation method is significantly better than the strong and most recent CHiME-3 baseline.
Comment: Computer Speech & Language December 201
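The selection step described above can be sketched as a simple threshold on predicted sentence WER. This is a minimal illustration, assuming each automatically transcribed utterance already carries a QE prediction; the `Utterance` type, the `max_wer` threshold and the example values are hypothetical, and the paper's actual selection criterion and amounts of data may differ.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    utt_id: str
    hypothesis: str       # automatic transcription of an evaluation utterance
    predicted_wer: float  # sentence-level WER predicted by a QE model (assumed given)


def select_adaptation_data(utts: List[Utterance], max_wer: float) -> List[Utterance]:
    """Keep the "good quality" utterances (predicted WER <= max_wer), best first.

    In the oracle setting, the true sentence WER computed against manual
    transcriptions could be stored in predicted_wer instead.
    """
    kept = [u for u in utts if u.predicted_wer <= max_wer]
    return sorted(kept, key=lambda u: u.predicted_wer)


if __name__ == "__main__":
    pool = [
        Utterance("utt1", "hypothetical transcript one", 0.35),
        Utterance("utt2", "hypothetical transcript two", 0.08),
        Utterance("utt3", "hypothetical transcript three", 0.15),
    ]
    print([u.utt_id for u in select_adaptation_data(pool, max_wer=0.20)])  # ['utt2', 'utt3']
```

Varying `max_wer` trades the amount of adaptation data against its quality, which mirrors the oracle experiments with "variable amounts of data" at "different levels of quality".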
On Distant Speech Recognition for Home Automation
The official version of this draft is available at Springer via http://dx.doi.org/10.1007/978-3-319-16226-3_7
In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people living alone at home. This study is part of the Sweet-Home project, which aims at developing a new home automation system based on voice commands to improve the support and well-being of people with loss of autonomy. The goal of the study is the recognition of vocal orders, with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This French distant-speech corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and other approaches. This solution, which uses the two best SNR channels and a priori knowledge (voice commands and distress sentences), increased the recognition rate without introducing false alarms.
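The abstract does not say how channel SNR is estimated. Assuming each channel begins with a speech-free stretch of known length, a minimal sketch of choosing the two highest-SNR channels could look like the following. The estimator (mean-square power of the speech part, minus the noise power, divided by the noise power) and all function names are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean-square power of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def estimate_snr_db(signal: Sequence[float], noise_len: int) -> float:
    """Crude SNR estimate, ASSUMING signal[:noise_len] contains noise only."""
    if not 0 < noise_len < len(signal):
        raise ValueError("noise_len must leave both a noise and a speech part")
    noise_power = mean_power(signal[:noise_len])
    speech_plus_noise = mean_power(signal[noise_len:])
    speech_power = max(speech_plus_noise - noise_power, 1e-12)  # floor avoids log(0)
    return 10.0 * math.log10(speech_power / max(noise_power, 1e-12))


def best_channels(channels: Dict[str, Sequence[float]], noise_len: int, k: int = 2) -> List[str]:
    """Names of the k channels with the highest estimated SNR, best first."""
    snrs = {name: estimate_snr_db(sig, noise_len) for name, sig in channels.items()}
    return sorted(snrs, key=lambda name: snrs[name], reverse=True)[:k]


if __name__ == "__main__":
    noise = [0.01, -0.01] * 50  # synthetic noise-only lead-in shared by all channels
    mics = {
        "ch1": noise + [0.5, -0.5] * 50,    # close to the speaker
        "ch2": noise + [0.2, -0.2] * 50,
        "ch3": noise + [0.05, -0.05] * 50,  # far from the speaker
    }
    print(best_channels(mics, noise_len=100))  # ['ch1', 'ch2']
```

A real system would use a voice activity detector or a frame-based noise floor estimate rather than a fixed lead-in.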
Distant speech recognition for home automation: Preliminary experimental results in a smart home
This paper presents a study that is part of the Sweet-Home project, which aims at developing a new home automation system based on voice commands. The study focused on two tasks: distant speech recognition and sentence spotting (e.g., recognition of home automation orders). For the first task, different combinations of ASR systems, language models and acoustic models were tested. Fusion of ASR outputs by consensus and with a triggered language model (using a priori knowledge) was investigated. For the sentence spotting task, an algorithm based on the distance between the current ASR hypotheses and a predefined set of keyword patterns was introduced in order to retrieve the correct sentences in spite of ASR errors. The techniques were assessed on real daily-living data collected in a 4-room smart home that was fully equipped with standard tactile commands and with 7 wireless microphones set in the ceiling. Thanks to Driven Decoding Algorithm (DDA) techniques, a classical ASR system reached 7.9% WER, against 35% WER in the standard configuration and 15% with MLLR adaptation only. The best keyword pattern classification result obtained in distant speech conditions was 7.5% CER.
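The sentence spotting idea, matching a possibly erroneous ASR hypothesis to the nearest entry in a predefined set of patterns and rejecting it if nothing is close enough, can be sketched as below. The abstract does not specify the distance used, so this sketch swaps in Python's character-level `difflib.SequenceMatcher` similarity; the command patterns and the 0.8 threshold are hypothetical.

```python
import difflib
from typing import List, Optional, Tuple


def spot_command(hypothesis: str, patterns: List[str],
                 min_similarity: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (closest pattern, similarity in [0, 1]), or None if none is close enough."""
    best_pattern, best_score = None, 0.0
    for pattern in patterns:
        score = difflib.SequenceMatcher(None, hypothesis.lower(), pattern.lower()).ratio()
        if score > best_score:
            best_pattern, best_score = pattern, score
    if best_pattern is None or best_score < min_similarity:
        return None  # treated as "not a command", which limits false alarms
    return best_pattern, best_score


if __name__ == "__main__":
    commands = ["turn on the light", "close the blinds", "call for help"]  # hypothetical set
    print(spot_command("turn of the light", commands))  # closest: 'turn on the light'
    print(spot_command("what time is it", commands))    # None
```

The rejection threshold is what lets a spotter tolerate recognition errors on genuine commands while still ignoring ordinary conversation.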
Private Incentives to Innovate: Interplay of New Products and Brand-Name Reputation
This paper studies the introduction of new products (increase in product variety) in the automobile industry. The focus is on two sources of market power that may allow firms to earn higher profits (and thus recoup their investments): new products and brand-name reputation. The effects of new products on the private incentives to innovate are investigated on the basis of a dataset for the German car industry for 2003. The dataset is rather unique in that it contains detailed information on the technical characteristics of cars, prices and sales, as well as information on the introduction of new car models (including new variants and versions) into the German car market at a very disaggregate level. It is found that both a new model and brand-name reputation may allow innovative firms to gain some market power and recoup their investments. Competition is, however, not localized within a market segment or within the class of new or old models: products from different market segments, as well as new and old products, compete with each other (coexisting rather than eliminating each other) and do not constitute separate market niches. On the other hand, new (old) models are perceived to be closer substitutes than old (new) models. Consumer preferences towards brands and new products vary depending on their age.
Keywords: discrete choice models, automobile industry, new products, innovations, brand-name reputation
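The keywords mention discrete choice models. As a generic illustration only (the abstract does not give the paper's specification), the snippet computes market shares under a standard multinomial logit with an outside good whose mean utility is normalised to zero, s_j = exp(δ_j) / (1 + Σ_k exp(δ_k)). The product names and mean utilities are made up.

```python
import math
from typing import Dict


def logit_shares(utilities: Dict[str, float]) -> Dict[str, float]:
    """Multinomial logit shares for products with mean utilities delta_j.

    An outside good with utility 0 is added under the key "outside".
    The maximum utility is subtracted before exponentiating for numerical stability.
    """
    m = max([0.0, *utilities.values()])
    exp_u = {j: math.exp(u - m) for j, u in utilities.items()}
    denom = math.exp(-m) + sum(exp_u.values())
    shares = {j: e / denom for j, e in exp_u.items()}
    shares["outside"] = math.exp(-m) / denom
    return shares


if __name__ == "__main__":
    s = logit_shares({"new_model": 2.0, "old_model": 1.0, "rival_brand": -1.0})
    print({j: round(v, 3) for j, v in s.items()})
    print(s["new_model"] / s["old_model"])  # e**1, whatever other products exist
```

The last line illustrates the independence of irrelevant alternatives (IIA) property of plain logit: the share ratio of two products depends only on their own utilities. Findings such as new models being closer substitutes for each other cannot be captured by this model alone, which is why richer specifications such as nested logit are commonly used in this literature.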
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
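Since SCR combines ASR with IR, the simplest pipeline indexes the recognizer's 1-best transcripts as if they were text. The sketch below assumes a hypothetical transcript format (a list of (word, start time in seconds) pairs per recording), builds an inverted index, and answers conjunctive keyword queries with the earliest matching time, which can serve as a playback entry point. It treats transcripts as error-free, whereas in practice recognition errors directly limit what can be retrieved.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Transcript = List[Tuple[str, float]]  # (word, start time in seconds), hypothetical format
Index = Dict[str, List[Tuple[str, float]]]


def build_index(transcripts: Dict[str, Transcript]) -> Index:
    """Map each lower-cased word to the (recording ID, start time) of every occurrence."""
    index: Index = defaultdict(list)
    for doc_id, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((doc_id, start))
    return index


def search(index: Index, query: str) -> Dict[str, float]:
    """Recordings containing all query terms, with the earliest time any term occurs."""
    terms = query.lower().split()
    if not terms:
        return {}
    docs = None
    for term in terms:
        hit_docs = {doc for doc, _ in index.get(term, [])}
        docs = hit_docs if docs is None else docs & hit_docs
    return {
        doc: min(start for term in terms for d, start in index[term] if d == doc)
        for doc in sorted(docs)
    }


if __name__ == "__main__":
    asr_output = {
        "ep1": [("welcome", 0.0), ("to", 0.4), ("the", 0.5), ("show", 0.7)],
        "ep2": [("the", 1.0), ("weather", 1.3), ("show", 2.0)],
    }
    idx = build_index(asr_output)
    print(search(idx, "the show"))  # {'ep1': 0.5, 'ep2': 1.0}
```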
TranscRater: A Tool for Automatic Speech Recognition Quality Estimation
We present TranscRater, an open-source tool for automatic speech recognition (ASR) quality estimation (QE). The tool allows users to perform ASR evaluation without the need for reference transcripts and confidence information, on which current assessment protocols commonly rely. TranscRater includes: i) methods to extract a variety of quality indicators from (signal, transcription) pairs, and ii) machine learning algorithms that make it possible to build ASR QE models exploiting the extracted features. Confirming the positive results of previous evaluations, new experiments with TranscRater indicate its effectiveness in both WER prediction and transcription ranking tasks.
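QE models of this kind are trained to predict a sentence-level WER without access to references. For context, the reference-based quantity being predicted is WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-cost word alignment and N is the number of reference words. The following is a minimal sketch of that computation; it is not part of TranscRater's interface, and the function name is hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(S + D + I) / N via word-level Levenshtein distance, keeping two DP rows."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("WER is undefined for an empty reference")
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to hyp[:j]
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cur[j] = min(
                prev[j - 1] + (ref[i - 1] != hyp[j - 1]),  # match or substitution
                prev[j] + 1,                               # deletion
                cur[j - 1] + 1,                            # insertion
            )
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.333...
```

Sorting alternative transcriptions of the same audio by this true WER gives the oracle ranking that a QE-based transcription ranking tries to reproduce without references.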
- …