257 research outputs found

    Using a low-bit rate speech enhancement variable post-filter as a speech recognition system pre-filter to improve robustness to GSM speech

    Get PDF
    Includes bibliographical references.Performance of speech recognition systems degrades when they are used to recognize speech that has been transmitted through GS1 (Global System for Mobile Communications) voice communication channels (GSM speech). This degradation is mainly due to GSM speech coding and GSM channel noise on speech signals transmitted through the network. This poor recognition of GSM channel speech limits the use of speech recognition applications over GSM networks. If speech recognition technology is to be used unlimitedly over GSM networks recognition accuracy of GSM channel speech has to be improved. Different channel normalization techniques have been developed in an attempt to improve recognition accuracy of voice channel modified speech in general (not specifically for GSM channel speech). These techniques can be classified into three broad categories, namely, model modification, signal pre-processing and feature processing techniques. In this work, as a contribution toward improving the robustness of speech recognition systems to GSM speech, the use of a low-bit speech enhancement post-filter as a speech recognition system pre-filter is proposed. This filter is to be used in recognition systems in combination with channel normalization techniques

    Towards the automated analysis of simple polyphonic music : a knowledge-based approach

    Get PDF
    PhDMusic understanding is a process closely related to the knowledge and experience of the listener. The amount of knowledge required is relative to the complexity of the task in hand. This dissertation is concerned with the problem of automatically decomposing musical signals into a score-like representation. It proposes that, as with humans, an automatic system requires knowledge about the signal and its expected behaviour to correctly analyse music. The proposed system uses the blackboard architecture to combine the use of knowledge with data provided by the bottom-up processing of the signal's information. Methods are proposed for the estimation of pitches, onset times and durations of notes in simple polyphonic music. A method for onset detection is presented. It provides an alternative to conventional energy-based algorithms by using phase information. Statistical analysis is used to create a detection function that evaluates the expected behaviour of the signal regarding onsets. Two methods for multi-pitch estimation are introduced. The first concentrates on the grouping of harmonic information in the frequency-domain. Its performance and limitations emphasise the case for the use of high-level knowledge. This knowledge, in the form of the individual waveforms of a single instrument, is used in the second proposed approach. The method is based on a time-domain linear additive model and it presents an alternative to common frequency-domain approaches. Results are presented and discussed for all methods, showing that, if reliably generated, the use of knowledge can significantly improve the quality of the analysis.Joint Information Systems Committee (JISC) in the UK National Science Foundation (N.S.F.) in the United states. Fundacion Gran Mariscal Ayacucho in Venezuela

    PHONOTACTIC AND ACOUSTIC LANGUAGE RECOGNITION

    Get PDF
    Práce pojednává o fonotaktickém a akustickém přístupu pro automatické rozpoznávání jazyka. První část práce pojednává o fonotaktickém přístupu založeném na výskytu fonémových sekvenci v řeči. Nejdříve je prezentován popis vývoje fonémového rozpoznávače jako techniky pro přepis řeči do sekvence smysluplných symbolů. Hlavní důraz je kladen na dobré natrénování fonémového rozpoznávače a kombinaci výsledků z několika fonémových rozpoznávačů trénovaných na různých jazycích (Paralelní fonémové rozpoznávání následované jazykovými modely (PPRLM)). Práce také pojednává o nové technice anti-modely v PPRLM a studuje použití fonémových grafů místo nejlepšího přepisu. Na závěr práce jsou porovnány dva přístupy modelování výstupu fonémového rozpoznávače -- standardní n-gramové jazykové modely a binární rozhodovací stromy. Hlavní přínos v akustickém přístupu je diskriminativní modelování cílových modelů jazyků a první experimenty s kombinací diskriminativního trénování a na příznacích, kde byl odstraněn vliv kanálu. Práce dále zkoumá různé druhy technik fúzi akustického a fonotaktického přístupu. Všechny experimenty jsou provedeny na standardních datech z NIST evaluaci konané v letech 2003, 2005 a 2007, takže jsou přímo porovnatelné s výsledky ostatních skupin zabývajících se automatickým rozpoznáváním jazyka. S fúzí uvedených technik jsme posunuli state-of-the-art výsledky a dosáhli vynikajících výsledků ve dvou NIST evaluacích.This thesis deals with phonotactic and acoustic techniques for automatic language recognition (LRE). The first part of the thesis deals with the phonotactic language recognition based on co-occurrences of phone sequences in speech. A thorough study of phone recognition as tokenization technique for LRE is done, with focus on the amounts of training data for phone recognizer and on the combination of phone recognizers trained on several language (Parallel Phone Recognition followed by Language Model - PPRLM). The thesis also deals with novel technique of anti-models in PPRLM and investigates into using phone lattices instead of strings. The work on phonotactic approach is concluded by a comparison of classical n-gram modeling techniques and binary decision trees. The acoustic LRE was addressed too, with the main focus on discriminative techniques for training target language acoustic models and on initial (but successful) experiments with removing channel dependencies. We have also investigated into the fusion of phonotactic and acoustic approaches. All experiments were performed on standard data from NIST 2003, 2005 and 2007 evaluations so that the results are directly comparable to other laboratories in the LRE community. With the above mentioned techniques, the fused systems defined the state-of-the-art in the LRE field and reached excellent results in NIST evaluations.

    Dynamic Protocol Reverse Engineering a Grammatical Inference Approach

    Get PDF
    Round trip engineering of software from source code and reverse engineering of software from binary files have both been extensively studied and the state-of-practice have documented tools and techniques. Forward engineering of protocols has also been extensively studied and there are firmly established techniques for generating correct protocols. While observation of protocol behavior for performance testing has been studied and techniques established, reverse engineering of protocol control flow from observations of protocol behavior has not received the same level of attention. State-of-practice in reverse engineering the control flow of computer network protocols is comprised of mostly ad hoc approaches. We examine state-of-practice tools and techniques used in three open source projects: Pidgin, Samba, and rdesktop . We examine techniques proposed by computational learning researchers for grammatical inference. We propose to extend the state-of-art by inferring protocol control flow using grammatical inference inspired techniques to reverse engineer automata representations from captured data flows. We present evidence that grammatical inference is applicable to the problem domain under consideration

    Fluvial Processes in Motion: Measuring Bank Erosion and Suspended Sediment Flux using Advanced Geomatic Methods and Machine Learning

    Get PDF
    Excessive erosion and fine sediment delivery to river corridors and receiving waters degrade aquatic habitat, add to nutrient loading, and impact infrastructure. Understanding the sources and movement of sediment within watersheds is critical for assessing ecosystem health and developing management plans to protect natural and human systems. As our changing climate continues to cause shifts in hydrological regimes (e.g., increased precipitation and streamflow in the northeast U.S.), the development of tools to better understand sediment dynamics takes on even greater importance. In this research, advanced geomatics and machine learning are applied to improve the (1) monitoring of streambank erosion, (2) understanding of event sediment dynamics, and (3) prediction of sediment loading using meteorological data as inputs. Streambank movement is an integral part of geomorphic changes along river corridors and also a significant source of fine sediment to receiving waters. Advances in unmanned aircraft systems (UAS) and photogrammetry provide opportunities for rapid and economical quantification of streambank erosion and deposition at variable scales. We assess the performance of UAS-based photogrammetry to capture streambank topography and quantify bank movement. UAS data were compared to terrestrial laser scanner (TLS) and GPS surveying from Vermont streambank sites that featured a variety of bank conditions and vegetation. Cross-sectional analysis of UAS and TLS data revealed that the UAS reliably captured the bank surface and was able to quantify the net change in bank area where movement occurred. Although it was necessary to consider overhanging bank profiles and vegetation, UAS-based photogrammetry showed significant promise for capturing bank topography and movement at fine resolutions in a flexible and efficient manner. This study also used a new machine-learning tool to improve the analysis of sediment dynamics using three years of high-resolution suspended sediment data collected in the Mad River watershed. A restricted Boltzmann machine (RBM), a type of artificial neural network (ANN), was used to classify individual storm events based on the visual hysteresis patterns present in the suspended sediment-discharge data. The work expanded the classification scheme typically used for hysteresis analysis. The results provided insights into the connectivity and sources of sediment within the Mad River watershed and its tributaries. A recurrent counterpropagation network (rCPN) was also developed to predict suspended sediment discharge at ungauged locations using only local meteorological data as inputs. The rCPN captured the nonlinear relationships between meteorological data and suspended sediment discharge, and outperformed the traditional sediment rating curve approach. The combination of machine-learning tools for analyzing storm-event dynamics and estimating loading at ungauged locations in a river network provides a robust method for estimating sediment production from catchments that informs watershed management

    Deep Learning Methods for Instrument Separation and Recognition

    Get PDF
    This thesis explores deep learning methods for timbral information processing in polyphonic music analysis. It encompasses two primary tasks: Music Source Separation (MSS) and Instrument Recognition, with focus on applying domain knowledge and utilising dense arrangements of skip-connections in the frameworks in order to reduce the number of trainable parameters and create more efficient models. Musically-motivated Convolutional Neural Network (CNN) architectures are introduced, emphasizing kernels with vertical, square, and horizontal shapes. This design choice allows for the extraction of essential harmonic and percussive features, which enhances the discrimination of different instruments. Notably, this methodology proves valuable for Harmonic-Percussive Source Separation (HPSS) and instrument recognition tasks. A significant challenge in MSS is generalising to new instrument types and music styles. To address this, a versatile framework for adversarial unsupervised domain adaptation for source separation is proposed, particularly beneficial when labeled data for specific instruments is unavailable. The curation of the Tap & Fiddle dataset is another contribution of the research, offering mixed and isolated stem recordings of traditional Scandinavian fiddle tunes, along with foot-tapping accompaniments, fostering research in source separation and metrical expression analysis within these musical styles. Since our perception of timbre is affected in different ways by transient and stationary parts of sound, the research investigates the potential of Transient Stationary-Noise Decomposition (TSND) as a preprocessing step for frame-level recognition. A method that performs TSND of spectrograms and feeds the decomposed spectrograms to a neural classifier is proposed. Furthermore, this thesis introduces a novel deep learning-based approach for pitch streaming, treating the task as a note-level instrument classification. Such an approach is modular, meaning that it can also successfully stream predicted note-events and not only labelled ground truth note-event information to corresponding instruments. Therefore, the proposed pitch streaming method enables third-party multi-pitch estimation algorithms to perform multi-instrument AMT
    • …
    corecore