48 research outputs found

    Evaluation of preprocessors for neural network speaker verification


    Audio Event Classification for Urban Soundscape Analysis

    The study of urban soundscapes has gained momentum in recent years as more people become concerned with the level of noise around them and its negative impact on comfort. Monitoring the sounds present in a sonic environment can be a laborious and time-consuming process if performed manually, so techniques for automated signal identification are gaining importance if soundscapes are to be objectively monitored. This thesis presents a novel approach to feature extraction for classifying urban audio events, adding to the library of techniques already established in the field. The research explores how techniques with their origins in the encoding of speech signals can be adapted to represent the complex everyday sounds around us and allow accurate classification. The analysis methods developed herein are based on the zero-crossing information contained within a signal. Time-Domain Signal Coding (TDSC), originally developed for the classification of bioacoustic signals, has the band-limited restrictions removed from its codebook to make it more generic. Classification using features extracted with the new codebook achieves accuracies of over 80% when combined with a Multilayer Perceptron classifier. Further advancements are made to the standard TDSC algorithm, drawing inspiration from wavelets, resulting in a novel dyadic representation of time-domain features. With these Multiscale TDSC (MTDSC) features, classification accuracies of 70% are achieved. Recommendations for further work focus on expanding the library of training data to improve the accuracy of the classification system; further research into classifier design is also suggested.
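    The core idea of TDSC, as the abstract describes it, is to derive features from the zero-crossing structure of the waveform rather than from its spectrum. A minimal sketch of that idea is shown below: the signal is split into epochs between successive zero crossings, each epoch is summarised by its duration and shape (here, the number of local extrema), and a histogram over quantised (duration, shape) pairs forms the feature vector. The uniform quantisation and the bin counts are illustrative assumptions, not the thesis's actual codebook.

    ```python
    import numpy as np

    def tdsc_features(signal, n_duration_bins=8, n_shape_bins=4):
        """Simplified sketch of zero-crossing-based (TDSC-style) features.

        Returns a normalised histogram over quantised (duration, shape)
        codes, one code per epoch between successive zero crossings.
        """
        x = np.asarray(signal, dtype=float)
        # Epoch boundaries: indices where the signal changes sign.
        crossings = np.where(np.diff(np.signbit(x)))[0]
        if len(crossings) < 2:
            return np.zeros(n_duration_bins * n_shape_bins)

        hist = np.zeros((n_duration_bins, n_shape_bins))
        max_dur = np.max(np.diff(crossings))  # longest epoch, for scaling
        for start, end in zip(crossings[:-1], crossings[1:]):
            epoch = x[start:end + 1]
            duration = end - start
            # Shape measure: number of local extrema inside the epoch.
            slope = np.diff(epoch)
            extrema = int(np.sum(np.diff(np.signbit(slope))))
            # Uniform quantisation of the (duration, shape) pair
            # (an illustrative stand-in for the TDSC codebook).
            d_bin = min(int(duration / max_dur * n_duration_bins),
                        n_duration_bins - 1)
            s_bin = min(extrema, n_shape_bins - 1)
            hist[d_bin, s_bin] += 1

        hist = hist.flatten()
        return hist / hist.sum()
    ```

    The resulting fixed-length vector could then be fed to a classifier such as the Multilayer Perceptron mentioned in the abstract.
    
    
    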

    Continuous speech phoneme recognition using neural networks and grammar correction.

    by Wai-Tat Fu. Thesis (M.Phil.)--Chinese University of Hong Kong, 1995. Includes bibliographical references (leaves 104-[109]).

    Chapter 1 --- Introduction --- p.1
      1.1 Problem of Speech Recognition --- p.1
      1.2 Why continuous speech recognition? --- p.5
      1.3 Current status of continuous speech recognition --- p.6
      1.4 Research Goal --- p.10
      1.5 Thesis outline --- p.10
    Chapter 2 --- Current Approaches to Continuous Speech Recognition --- p.12
      2.1 Basic Steps for Continuous Speech Recognition --- p.12
      2.2 The Hidden Markov Model Approach --- p.16
        2.2.1 Introduction --- p.16
        2.2.2 Segmentation and Pattern Matching --- p.18
        2.2.3 Word Formation and Syntactic Processing --- p.22
        2.2.4 Discussion --- p.23
      2.3 Neural Network Approach --- p.24
        2.3.1 Introduction --- p.24
        2.3.2 Segmentation and Pattern Matching --- p.25
        2.3.3 Discussion --- p.27
      2.4 MLP/HMM Hybrid Approach --- p.28
        2.4.1 Introduction --- p.28
        2.4.2 Architecture of Hybrid MLP/HMM Systems --- p.29
        2.4.3 Discussions --- p.30
      2.5 Syntactic Grammar --- p.30
        2.5.1 Introduction --- p.30
        2.5.2 Word Formation and Syntactic Processing --- p.31
        2.5.3 Discussion --- p.32
      2.6 Summary --- p.32
    Chapter 3 --- Neural Network as Pattern Classifier --- p.34
      3.1 Introduction --- p.34
      3.2 Training Algorithms and Topologies --- p.35
        3.2.1 Multilayer Perceptrons --- p.35
        3.2.2 Recurrent Neural Networks --- p.39
        3.2.3 Self-organizing Maps --- p.41
        3.2.4 Learning Vector Quantization --- p.43
      3.3 Experiments --- p.44
        3.3.1 The Data Set --- p.44
        3.3.2 Preprocessing of the Speech Data --- p.45
        3.3.3 The Pattern Classifiers --- p.50
      3.4 Results and Discussions --- p.53
    Chapter 4 --- High Level Context Information --- p.56
      4.1 Introduction --- p.56
      4.2 Hidden Markov Model Approach --- p.57
      4.3 The Dynamic Programming Approach --- p.59
      4.4 The Syntactic Grammar Approach --- p.60
    Chapter 5 --- Finite State Grammar Network --- p.62
      5.1 Introduction --- p.62
      5.2 The Grammar Compilation --- p.63
        5.2.1 Introduction --- p.63
        5.2.2 K-Tails Clustering Method --- p.66
        5.2.3 Inference of finite state grammar --- p.67
        5.2.4 Error Correcting Parsing --- p.69
      5.3 Experiment --- p.71
      5.4 Results and Discussions --- p.73
    Chapter 6 --- The Integrated System --- p.81
      6.1 Introduction --- p.81
      6.2 Postprocessing of Neural Network Output --- p.82
        6.2.1 Activation Threshold --- p.82
        6.2.2 Duration Threshold --- p.85
        6.2.3 Merging of Phoneme Boundaries --- p.88
      6.3 The Error Correcting Parser --- p.90
      6.4 Results and Discussions --- p.96
    Chapter 7 --- Conclusions --- p.101
    Bibliography --- p.10