257 research outputs found
Using a low-bit rate speech enhancement variable post-filter as a speech recognition system pre-filter to improve robustness to GSM speech
Includes bibliographical references.Performance of speech recognition systems degrades when they are used to recognize speech that has been transmitted through GS1 (Global System for Mobile Communications) voice communication channels (GSM speech). This degradation is mainly due to GSM speech coding and GSM channel noise on speech signals transmitted through the network. This poor recognition of GSM channel speech limits the use of speech recognition applications over GSM networks. If speech recognition technology is to be used unlimitedly over GSM networks recognition accuracy of GSM channel speech has to be improved. Different channel normalization techniques have been developed in an attempt to improve recognition accuracy of voice channel modified speech in general (not specifically for GSM channel speech). These techniques can be classified into three broad categories, namely, model modification, signal pre-processing and feature processing techniques. In this work, as a contribution toward improving the robustness of speech recognition systems to GSM speech, the use of a low-bit speech enhancement post-filter as a speech recognition system pre-filter is proposed. This filter is to be used in recognition systems in combination with channel normalization techniques
Towards the automated analysis of simple polyphonic music : a knowledge-based approach
PhDMusic understanding is a process closely related to the knowledge and experience
of the listener. The amount of knowledge required is relative to the
complexity of the task in hand.
This dissertation is concerned with the problem of automatically decomposing
musical signals into a score-like representation. It proposes that, as
with humans, an automatic system requires knowledge about the signal and
its expected behaviour to correctly analyse music.
The proposed system uses the blackboard architecture to combine the
use of knowledge with data provided by the bottom-up processing of the
signal's information. Methods are proposed for the estimation of pitches,
onset times and durations of notes in simple polyphonic music.
A method for onset detection is presented. It provides an alternative to
conventional energy-based algorithms by using phase information. Statistical
analysis is used to create a detection function that evaluates the expected
behaviour of the signal regarding onsets.
Two methods for multi-pitch estimation are introduced. The first concentrates
on the grouping of harmonic information in the frequency-domain.
Its performance and limitations emphasise the case for the use of high-level
knowledge.
This knowledge, in the form of the individual waveforms of a single
instrument, is used in the second proposed approach. The method is based
on a time-domain linear additive model and it presents an alternative to
common frequency-domain approaches.
Results are presented and discussed for all methods, showing that, if
reliably generated, the use of knowledge can significantly improve the quality
of the analysis.Joint Information Systems Committee (JISC) in the UK National Science Foundation (N.S.F.) in the United states. Fundacion Gran Mariscal Ayacucho in Venezuela
PHONOTACTIC AND ACOUSTIC LANGUAGE RECOGNITION
Práce pojednává o fonotaktickĂ©m a akustickĂ©m pĹ™Ăstupu pro automatickĂ© rozpoznávánĂ jazyka. Prvnà část práce pojednává o fonotaktickĂ©m pĹ™Ăstupu zaloĹľenĂ©m na vĂ˝skytu fonĂ©movĂ˝ch sekvenci v Ĺ™eÄŤi. NejdĹ™Ăve je prezentován popis vĂ˝voje fonĂ©movĂ©ho rozpoznávaÄŤe jako techniky pro pĹ™epis Ĺ™eÄŤi do sekvence smysluplnĂ˝ch symbolĹŻ. HlavnĂ dĹŻraz je kladen na dobrĂ© natrĂ©novánĂ fonĂ©movĂ©ho rozpoznávaÄŤe a kombinaci vĂ˝sledkĹŻ z nÄ›kolika fonĂ©movĂ˝ch rozpoznávaÄŤĹŻ trĂ©novanĂ˝ch na rĹŻznĂ˝ch jazycĂch (ParalelnĂ fonĂ©movĂ© rozpoznávánĂ následovanĂ© jazykovĂ˝mi modely (PPRLM)). Práce takĂ© pojednává o novĂ© technice anti-modely v PPRLM a studuje pouĹľitĂ fonĂ©movĂ˝ch grafĹŻ mĂsto nejlepšĂho pĹ™episu. Na závÄ›r práce jsou porovnány dva pĹ™Ăstupy modelovánĂ vĂ˝stupu fonĂ©movĂ©ho rozpoznávaÄŤe -- standardnĂ n-gramovĂ© jazykovĂ© modely a binárnĂ rozhodovacĂ stromy. HlavnĂ pĹ™Ănos v akustickĂ©m pĹ™Ăstupu je diskriminativnĂ modelovánĂ cĂlovĂ˝ch modelĹŻ jazykĹŻ a prvnĂ experimenty s kombinacĂ diskriminativnĂho trĂ©novánĂ a na pĹ™ĂznacĂch, kde byl odstranÄ›n vliv kanálu. Práce dále zkoumá rĹŻznĂ© druhy technik fĂşzi akustickĂ©ho a fonotaktickĂ©ho pĹ™Ăstupu. Všechny experimenty jsou provedeny na standardnĂch datech z NIST evaluaci konanĂ© v letech 2003, 2005 a 2007, takĹľe jsou pĹ™Ămo porovnatelnĂ© s vĂ˝sledky ostatnĂch skupin zabĂ˝vajĂcĂch se automatickĂ˝m rozpoznávánĂm jazyka. S fĂşzĂ uvedenĂ˝ch technik jsme posunuli state-of-the-art vĂ˝sledky a dosáhli vynikajĂcĂch vĂ˝sledkĹŻ ve dvou NIST evaluacĂch.This thesis deals with phonotactic and acoustic techniques for automatic language recognition (LRE). The first part of the thesis deals with the phonotactic language recognition based on co-occurrences of phone sequences in speech. A thorough study of phone recognition as tokenization technique for LRE is done, with focus on the amounts of training data for phone recognizer and on the combination of phone recognizers trained on several language (Parallel Phone Recognition followed by Language Model - PPRLM). The thesis also deals with novel technique of anti-models in PPRLM and investigates into using phone lattices instead of strings. The work on phonotactic approach is concluded by a comparison of classical n-gram modeling techniques and binary decision trees. The acoustic LRE was addressed too, with the main focus on discriminative techniques for training target language acoustic models and on initial (but successful) experiments with removing channel dependencies. We have also investigated into the fusion of phonotactic and acoustic approaches. All experiments were performed on standard data from NIST 2003, 2005 and 2007 evaluations so that the results are directly comparable to other laboratories in the LRE community. With the above mentioned techniques, the fused systems defined the state-of-the-art in the LRE field and reached excellent results in NIST evaluations.
Dynamic Protocol Reverse Engineering a Grammatical Inference Approach
Round trip engineering of software from source code and reverse engineering of software from binary files have both been extensively studied and the state-of-practice have documented tools and techniques. Forward engineering of protocols has also been extensively studied and there are firmly established techniques for generating correct protocols. While observation of protocol behavior for performance testing has been studied and techniques established, reverse engineering of protocol control flow from observations of protocol behavior has not received the same level of attention. State-of-practice in reverse engineering the control flow of computer network protocols is comprised of mostly ad hoc approaches. We examine state-of-practice tools and techniques used in three open source projects: Pidgin, Samba, and rdesktop . We examine techniques proposed by computational learning researchers for grammatical inference. We propose to extend the state-of-art by inferring protocol control flow using grammatical inference inspired techniques to reverse engineer automata representations from captured data flows. We present evidence that grammatical inference is applicable to the problem domain under consideration
Fluvial Processes in Motion: Measuring Bank Erosion and Suspended Sediment Flux using Advanced Geomatic Methods and Machine Learning
Excessive erosion and fine sediment delivery to river corridors and receiving waters degrade aquatic habitat, add to nutrient loading, and impact infrastructure. Understanding the sources and movement of sediment within watersheds is critical for assessing ecosystem health and developing management plans to protect natural and human systems. As our changing climate continues to cause shifts in hydrological regimes (e.g., increased precipitation and streamflow in the northeast U.S.), the development of tools to better understand sediment dynamics takes on even greater importance. In this research, advanced geomatics and machine learning are applied to improve the (1) monitoring of streambank erosion, (2) understanding of event sediment dynamics, and (3) prediction of sediment loading using meteorological data as inputs.
Streambank movement is an integral part of geomorphic changes along river corridors and also a significant source of fine sediment to receiving waters. Advances in unmanned aircraft systems (UAS) and photogrammetry provide opportunities for rapid and economical quantification of streambank erosion and deposition at variable scales. We assess the performance of UAS-based photogrammetry to capture streambank topography and quantify bank movement. UAS data were compared to terrestrial laser scanner (TLS) and GPS surveying from Vermont streambank sites that featured a variety of bank conditions and vegetation. Cross-sectional analysis of UAS and TLS data revealed that the UAS reliably captured the bank surface and was able to quantify the net change in bank area where movement occurred. Although it was necessary to consider overhanging bank profiles and vegetation, UAS-based photogrammetry showed significant promise for capturing bank topography and movement at fine resolutions in a flexible and efficient manner.
This study also used a new machine-learning tool to improve the analysis of sediment dynamics using three years of high-resolution suspended sediment data collected in the Mad River watershed. A restricted Boltzmann machine (RBM), a type of artificial neural network (ANN), was used to classify individual storm events based on the visual hysteresis patterns present in the suspended sediment-discharge data. The work expanded the classification scheme typically used for hysteresis analysis. The results provided insights into the connectivity and sources of sediment within the Mad River watershed and its tributaries. A recurrent counterpropagation network (rCPN) was also developed to predict suspended sediment discharge at ungauged locations using only local meteorological data as inputs. The rCPN captured the nonlinear relationships between meteorological data and suspended sediment discharge, and outperformed the traditional sediment rating curve approach. The combination of machine-learning tools for analyzing storm-event dynamics and estimating loading at ungauged locations in a river network provides a robust method for estimating sediment production from catchments that informs watershed management
Deep Learning Methods for Instrument Separation and Recognition
This thesis explores deep learning methods for timbral information processing in polyphonic music analysis. It encompasses two primary tasks: Music Source Separation (MSS) and Instrument Recognition, with focus on applying domain knowledge and utilising dense arrangements of skip-connections in the frameworks in order to reduce the number of trainable parameters and create more efficient models. Musically-motivated Convolutional Neural Network (CNN) architectures are introduced, emphasizing kernels with vertical, square, and horizontal shapes. This design choice allows for the extraction of essential harmonic and percussive features, which enhances the discrimination of different instruments. Notably, this methodology proves valuable for Harmonic-Percussive Source Separation (HPSS) and instrument recognition tasks. A significant challenge in MSS is generalising to new instrument types and music styles. To address this, a versatile framework for adversarial unsupervised domain adaptation for source separation is proposed, particularly beneficial when labeled data for specific instruments is unavailable. The curation of the Tap & Fiddle dataset is another contribution of the research, offering mixed and isolated stem recordings of traditional Scandinavian fiddle tunes, along with foot-tapping accompaniments, fostering research in source separation and metrical expression analysis within these musical styles. Since our perception of timbre is affected in different ways by transient and stationary parts of sound, the research investigates the potential of Transient Stationary-Noise Decomposition (TSND) as a preprocessing step for frame-level recognition. A method that performs TSND of spectrograms and feeds the decomposed spectrograms to a neural classifier is proposed. Furthermore, this thesis introduces a novel deep learning-based approach for pitch streaming, treating the task as a note-level instrument classification. Such an approach is modular, meaning that it can also successfully stream predicted note-events and not only labelled ground truth note-event information to corresponding instruments. Therefore, the proposed pitch streaming method enables third-party multi-pitch estimation algorithms to perform multi-instrument AMT
- …