43 research outputs found
Speech, Speaker and Speaker's Gender Identification in Automatically Processed Broadcast Stream
This paper presents a set of techniques for classifying audio segments in a system for automatic transcription of broadcast programs. The task consists of deciding a) whether a segment is to be labeled as speech or non-speech and, in the former case, b) whether the talking person is one of the speakers in the database and, if not, c) which gender the speaker belongs to. The result of the classification is used to extend the information provided by the transcription system and also to enhance the performance of the speech recognition module. Like most state-of-the-art speaker recognition systems, the proposed one is based on Gaussian Mixture Models (GMM). As the number of database speakers can be large, we introduce a technique that speeds up the identification process significantly. Furthermore, we compare several approaches to the estimation of GMM parameters. Finally, we present the results achieved in the classification of 230 minutes of real broadcast data.
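The GMM-based decision described in this abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance components and plain maximum-likelihood scoring over all models; the abstract does not specify the speed-up technique, so none is shown here. The function names, model labels and numbers are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a frame sequence under one GMM.

    gmm is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total

def identify(frames, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))

# Toy 2-D models (illustrative numbers, not taken from the paper).
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best = identify(frames, models)
```

The same scoring routine could serve the speech/non-speech and gender decisions by swapping in the corresponding class models.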
Fast Keyword Spotting in Telephone Speech
In this paper, we present a system designed for detecting keywords in telephone speech. We focus not only on achieving high accuracy but also on very short processing time. The keyword spotting system can run in three modes: a) an off-line mode requiring less than 0.1 × RT, b) an on-line mode with minimal (2 s) latency, and c) a repeated spotting mode, in which pre-computed values allow for additional acceleration. Its performance is evaluated on recordings of Czech spontaneous telephone speech using rather large and complex keyword lists.
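The "0.1 × RT" figure refers to the real-time factor, i.e. processing time divided by audio duration. A minimal sketch of that arithmetic follows; the function name and the example durations are assumptions for illustration only.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Real-time (RT) factor: processing time relative to audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds

# Hypothetical example: 60 s of audio processed in 6 s.
rtf = real_time_factor(6.0, 60.0)
```

A value below 1 means the system runs faster than real time; the off-line mode described above would need `rtf` below 0.1.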
MAP Based Speaker Adaptation in Very Large Vocabulary Speech Recognition of Czech
The paper deals with the problem of efficient adaptation of speech recognition systems to individual users. The goal is to achieve better performance in specific applications where one known speaker is expected. In our approach we adopt the MAP (Maximum A Posteriori) method for this purpose. The MAP-based formulae for the adaptation of the HMM (Hidden Markov Model) parameters are described. Several alternative versions of this method have been implemented and experimentally verified in two areas, first in an isolated-word recognition (IWR) task and later also in a large vocabulary continuous speech recognition (LVCSR) system, both developed for the Czech language. The results show that the word error rate (WER) can be reduced by more than 20% for a speaker who provides tens of words (in the case of IWR) or tens of sentences (in the case of LVCSR) for the adaptation. Recently, we have used the described methods in the design of two practical applications: voice dictation to a PC and automatic transcription of radio and TV news.
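As an illustration of MAP adaptation, the sketch below applies the commonly used textbook update for a single Gaussian mean, which interpolates between the speaker-independent prior mean and the occupancy-weighted adaptation data. It is not necessarily the variant used in the paper; the prior weight `tau`, the function name and the data are assumptions.

```python
def map_adapt_mean(prior_mean, frames, occupancies, tau=10.0):
    """MAP update of one Gaussian mean.

    prior_mean  : speaker-independent mean vector
    frames      : adaptation feature vectors
    occupancies : per-frame occupation probabilities of this Gaussian
    tau         : prior weight (assumed hyperparameter); larger values
                  keep the result closer to the prior mean
    """
    occ = sum(occupancies)
    dim = len(prior_mean)
    weighted = [
        sum(g * x[d] for g, x in zip(occupancies, frames)) for d in range(dim)
    ]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + occ) for d in range(dim)]

# Hypothetical data: ten identical frames fully assigned to this Gaussian.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0]] * 10, [1.0] * 10, tau=10.0)
```

With no adaptation data the update returns the prior mean unchanged, and with more data the result moves toward the sample mean of the speaker's frames.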
Automatic Classifiers for Medical Data from Doppler Unit
Nowadays, hand-held ultrasonic Doppler units are often used for noninvasive screening of atherosclerosis in arteries of the lower limbs. The mean velocity of blood flow over time and the blood pressures are measured at several positions on each lower limb. This project presents software that is able to analyze such data and classify it in real time into selected diagnostic classes. It is also capable of flagging some errors encountered during measurement. At the Department of Functional Diagnostics in the Regional Hospital of Liberec, a database of several hundred signals was collected. In cooperation with a specialist, the signals were manually classified into four classes. Subsequently, selected signal features were extracted and used for training a distance classifier and a Bayesian classifier. Another set of signals was used for evaluating and optimizing the parameters of the classifiers. This paper compares the results of the software with those provided by a human expert. They agreed in 89 % of cases.
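A distance classifier of the kind mentioned here can be sketched as a nearest-class-mean rule. The abstract does not state which distance measure or features were used, so the Euclidean distance, the class labels and the feature values below are assumptions.

```python
import math

def class_means(samples):
    """Compute the mean feature vector of each class.

    samples: dict mapping class label -> list of feature vectors.
    """
    means = {}
    for label, vectors in samples.items():
        dim = len(vectors[0])
        means[label] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dim)
        ]
    return means

def classify(x, means):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(x, means[label]))

# Hypothetical two-feature training data for two of the diagnostic classes.
training = {
    "class_1": [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0]],
    "class_2": [[0.3, 0.4], [0.2, 0.5], [0.4, 0.3]],
}
means = class_means(training)
label = classify([0.9, 1.0], means)
```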
Very Fast Keyword Spotting System with Real Time Factor below 0.01
In the paper we present an architecture of a keyword spotting (KWS) system
that is based on modern neural networks, yields good performance on various
types of speech data and can run very fast. We focus mainly on the last aspect
and propose optimizations for all the steps required in a KWS design: signal
processing and likelihood computation, Viterbi decoding, spot candidate
detection and confidence calculation. We present time and memory efficient
modelling by bidirectional feedforward sequential memory networks (an
alternative to recurrent nets) using either standard triphones or so-called
quasi-monophones, and an entirely forward decoding of speech frames (with
minimal need for look back). Several variants of the proposed scheme are
evaluated on 3 large Czech datasets (broadcast, internet and telephone, 17
hours in total) and their performance is compared by Detection Error Tradeoff
(DET) diagrams and real-time (RT) factors. We demonstrate that the complete
system can run in a single pass with an RT factor close to 0.001 if all
optimizations (including a GPU for likelihood computation) are applied.
Comment: 11 pages, 3 figures
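The points of a Detection Error Tradeoff curve can be derived from confidence scores as sketched below. This assumes the simplest formulation, with the miss rate and false alarm rate computed as fractions at each threshold; keyword spotting evaluations often normalize false alarms per keyword and per hour instead. The function name and the scores are hypothetical.

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) for each threshold.

    A candidate is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for th in thresholds:
        miss = sum(s < th for s in target_scores) / len(target_scores)
        fa = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        points.append((th, miss, fa))
    return points

# Hypothetical confidence scores for true keyword hits and false candidates.
points = det_points([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```

Raising the threshold trades false alarms for misses, which is the tradeoff a DET diagram visualizes.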
A cross-lingual adaptation approach for rapid development of speech recognizers for learning disabled users
Building a voice-operated system for learning disabled users is a difficult task that requires a considerable amount of time and effort. Due to the wide spectrum of disabilities and their different related phonopathies, most approaches available are targeted to a specific pathology. This may improve their accuracy for some users, but makes them unsuitable for others. In this paper, we present a cross-lingual approach to adapt a general-purpose modular speech recognizer for learning disabled people. The main advantage of this approach is that it allows rapid and cost-effective development by taking the already built speech recognition engine and its modules, and utilizing existing resources for standard speech in different languages for the recognition of the users’ atypical voices. Although the recognizers built with the proposed technique obtain lower accuracy rates than those trained for specific pathologies, they can be used by a wide population and developed more rapidly, which makes it possible to design various types of speech-based applications accessible to learning disabled users. This research was supported by the project ‘Favoreciendo la vida autónoma de discapacitados intelectuales con problemas de comunicación oral mediante interfaces personalizados de reconocimiento automático del habla’, financed by the Centre of Initiatives for Development Cooperation (Centro de Iniciativas de Cooperación al Desarrollo, CICODE), University of Granada, Spain. This research was supported by the Student Grant Scheme 2014 (SGS) at the Technical University of Liberec.
SPEECH AND COMPUTER Principles of speech communication, tasks, methods and applications
The aim of these proceedings is to provide a detailed insight into computer speech processing. The publication was produced within the framework of the program "Support of the targeted research" at the AS CR, in the project "Assistance, information and communication services based on advanced voice technology".