Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation
Over the past few years, speech recognition technology performance on tasks ranging from isolated digit recognition to conversational speech has dramatically improved. Performance on limited recognition tasks in noise-free environments is comparable to that achieved by human transcribers. This advancement in automatic speech recognition technology, along with an increase in the compute power of mobile devices, standardization of communication protocols, and the explosion in the popularity of mobile devices, has created an interest in flexible voice interfaces for mobile devices. However, speech recognition performance degrades dramatically in mobile environments, which are inherently noisy. In the recent past, a great amount of effort has been spent on the development of front ends based on advanced noise robust approaches. The primary objective of this thesis was to analyze the performance of two advanced front ends, referred to as the QIO and MFA front ends, on a speech recognition task based on the Wall Street Journal database. Though the advanced front ends are shown to achieve a significant improvement over an industry-standard baseline front end, this improvement is not operationally significant. Further, we show that the results of this evaluation were not significantly impacted by suboptimal recognition system parameter settings. Without any front end-specific tuning, the MFA front end outperforms the QIO front end by 9.6% relative. With tuning, the relative performance gap increases to 15.8%. Finally, we also show that mismatched microphone and additive noise evaluation conditions resulted in a significant degradation in performance for both front ends.
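To make the "relative" figures concrete, the following is a minimal sketch of how a relative improvement is computed. It assumes the metric is word error rate (WER), which the abstract does not state explicitly; the function name and the example numbers are hypothetical and are not results from the thesis.

```python
def relative_improvement(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction, in percent, of improved over baseline.

    Assumption: lower WER is better, and both values are in the same unit
    (for example, percent of words in error).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# Hypothetical numbers, for illustration only.
gain = relative_improvement(baseline_wer=20.0, improved_wer=18.0)
print(f"relative improvement: {gain:.1f}%")
```

A relative figure depends on the baseline: the same absolute reduction of two points is a much larger relative gain when the baseline error rate is low.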
Exploration and Optimization of Noise Reduction Algorithms for Speech Recognition in Embedded Devices
Environmental noise present in real-life applications substantially degrades the performance of speech recognition systems. An example is an in-car scenario where a speech recognition system has to support the man-machine interface. Several sources of noise, coming from the engine, wipers, wheels, etc., interact with speech. An open-window scenario poses a particular challenge, because traffic noise, park noise, etc., must also be taken into account. The main goal of this thesis is to improve the performance of a speech recognition system based on a state-of-the-art hidden Markov model (HMM) using noise reduction methods. The performance is measured with respect to word error rate and with the method of mutual information. The noise reduction methods are based on weighting rules. Least-squares weighting rules in the frequency domain have been developed to enable continuous development based on the existing system and to guarantee low complexity and a small footprint for applications in embedded devices. The weighting rule parameters are optimized as a multidimensional optimization task, using a Monte Carlo method followed by a compass search method. Root compression and cepstral smoothing methods have also been implemented to boost the recognition performance. The additional complexity and memory requirements of the proposed system are minimal. The performance of the proposed system was compared to the European Telecommunications Standards Institute (ETSI) standardized system. The proposed system outperforms the ETSI system by up to 8.6 % relative in word accuracy and achieves a relative increase of up to 35.1 % in word accuracy over the existing baseline system on the ETSI Aurora 3 German task. A relative increase of up to 18 % in word accuracy over the existing baseline system is also obtained from the proposed weighting rules on large vocabulary databases.
An entropy-based feature vector analysis method has also been developed to assess the quality of feature vectors. The entropy estimation is based on the histogram approach. The method has the advantage of objectively assessing the feature vector quality regardless of the acoustic modeling assumption used in the speech recognition system.
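The two-stage parameter optimization mentioned above (Monte Carlo followed by compass search) can be sketched generically as follows. This is an illustration of the general technique under stated assumptions, not the thesis's implementation: the function name, the bounds, the sample count, the step schedule, and the toy quadratic objective are all hypothetical. In the thesis setting the objective would presumably be a recognition error measure evaluated for a given set of weighting rule parameters.

```python
import random
from typing import Callable, List, Sequence, Tuple


def monte_carlo_then_compass(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_samples: int = 200,
    step: float = 0.25,
    tol: float = 1e-4,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize objective: random sampling first, then compass search."""
    rng = random.Random(seed)

    # Stage 1 (Monte Carlo): sample uniformly inside the bounds, keep the best.
    best: List[float] = []
    best_val = float("inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        val = objective(x)
        if val < best_val:
            best, best_val = x, val

    # Stage 2 (compass search): try +/- step along each coordinate axis,
    # accept any improving move, and halve the step when none improves.
    while step > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                cand = list(best)
                cand[i] = min(max(cand[i] + delta, lo), hi)
                val = objective(cand)
                if val < best_val:
                    best, best_val = cand, val
                    improved = True
        if not improved:
            step *= 0.5
    return best, best_val


def toy_objective(p: Sequence[float]) -> float:
    """Hypothetical stand-in for an error measure; minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


params, value = monte_carlo_then_compass(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(params, value)
```

The random stage gives a reasonable starting point without derivatives, and the compass stage refines it locally, which suits objectives such as word error rate that are not differentiable in the parameters.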
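A histogram-based entropy estimate of the kind mentioned above can be sketched as follows. The abstract does not give the estimator's details, so this is a minimal sketch under assumptions: it handles one feature dimension at a time, uses equal-width bins, and reports the entropy of the binned distribution in bits. The function name and the bin count are hypothetical.

```python
import math
from typing import List, Sequence


def histogram_entropy(values: Sequence[float], n_bins: int = 16) -> float:
    """Estimate the entropy (in bits) of one feature dimension.

    The values are binned into n_bins equal-width bins spanning their range,
    and the entropy of the resulting discrete distribution is returned.
    """
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information

    width = (hi - lo) / n_bins
    counts: List[int] = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1

    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


# Four evenly spread values in four bins give a uniform histogram: 2 bits.
print(histogram_entropy([0.0, 1.0, 2.0, 3.0], n_bins=4))
```

The estimate depends on the bin count and on the amount of data, so values are only comparable across feature sets when both are held fixed.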
A Cognitive TV White Space Access Framework
Given the current boom in applications and services for mobile devices, data traffic is rapidly expanding, with the consequence that increasing spectrum capacity is being mandated. Following the switchover from analogue to digital platforms, Television White Space (TVWS) affords a fertile opportunity to supplement existing licensed spectrum to ease this scarcity. There are, however, a number of obstacles to wide-scale TVWS adoption, including the accurate detection of primary users (PU), the hidden node problem and bandwidth availability for unlicensed secondary users (SU). Regulatory and industry bodies have sought to address some of these issues using a static database for spectrum access decisions, though this involves manual maintenance, and accuracy can be compromised by a lack of real-time information. While the new IEEE 802.11af wireless local area network (WLAN) standard attempts to resolve some SU access issues, there remain many challenges, such as the critical asymmetry between mobile and base station power resources.
This thesis presents a new cognitive TVWS access framework encompassing a real-time sensing paradigm for TVWS deployment that uses a spectrum-efficient scheme to uphold quality-of-service (QoS) for both PU and SU. A novel dynamic spectrum allocation (DSA) model has been formulated, allied with a resilient interference management system which exploits the unique way digital terrestrial TV channels are allocated in different geographical areas. A margin strategy has been framed to support efficient TVWS channel reuse, with an exclusion zone established to overcome the hidden node problem, while an innovative routing algorithm using cross-layer information both extends coverage capacity and maximises QoS provision by ensuring a more balanced resource allocation.
Critical evaluation of the new access framework confirms that significant QoS improvements for SU are achieved compared to existing TVWS techniques. Importantly, it embodies a generic, practical, resource-efficient solution for TVWS deployment, which is compliant with current PU regulatory requirements.