
    Telephone speech enhancement for the hearing impaired

    This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2009. Cataloged from PDF version of thesis report. Includes bibliographical references (page 48). Bony Tasnim; C. Z. Murshed. B. Computer Science and Engineering

    Improvement of Text Dependent Speaker Identification System Using Neuro-Genetic Hybrid Algorithm in Office Environmental Conditions

    In this paper, an improved strategy for an automated text-dependent speaker identification system in noisy environments is proposed. The identification process combines a Neuro-Genetic hybrid algorithm with cepstral features. A Wiener filter is used to remove background noise from the source utterance. The speech utterances are processed with several pre-processing techniques: start-end point detection, pre-emphasis filtering, frame blocking, and windowing. RCC, MFCC, ΔMFCC, ΔΔMFCC, LPC, and LPCC are used as features; extracting features with different techniques serves to optimize identification performance. After feature extraction, the Neuro-Genetic hybrid algorithm is used for learning and identification. On the VALID speech database, the closed-set text-dependent speaker identification system achieved highest identification rates of 100% in the studio environment and 82.33% under office environmental conditions.
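
The pre-processing chain named above (pre-emphasis, frame blocking, windowing) and the delta features behind ΔMFCC and ΔΔMFCC can be sketched in plain Python as follows. This is a minimal sketch, not the authors' code: the pre-emphasis coefficient (0.97), the Hamming window, and the ±2-frame regression window for deltas are common defaults assumed here, and all function names are hypothetical.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha = 0.97 is an assumed default)."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def frame_blocking(signal, frame_len, hop):
    """Split the signal into overlapping frames; trailing samples that cannot fill a frame are dropped."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame):
    """Multiply a frame by a Hamming window to taper its edges."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]

def deltas(features, width=2):
    """Regression-based time derivative of a feature sequence (list of per-frame vectors).

    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), k = 1..width;
    indices beyond either end are clamped to the first or last frame.
    """
    T = len(features)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(features[t])):
            num = sum(k * (features[min(t + k, T - 1)][d] - features[max(t - k, 0)][d])
                      for k in range(1, width + 1))
            row.append(num / denom)
        out.append(row)
    return out
```

For 16 kHz audio, 25 ms frames with a 10 ms hop would be `frame_blocking(pre_emphasis(x), 400, 160)` followed by `hamming` on each frame. Given a per-frame MFCC matrix `mfcc`, `deltas(mfcc)` yields ΔMFCC and `deltas(deltas(mfcc))` yields ΔΔMFCC. The MFCC computation itself, Wiener filtering, and end-point detection are not shown.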

    A Retrospective Assessment of Fuzzy Logic Applications in Voice Communications and Speech Analytics

    Voice and speech communication is a major topic that simultaneously spans ‘communication’, ‘control’ (since the coding algorithms often involve control), and ‘computing’, from speech analysis and recognition to speech analytics and speech coding over communication channels. Although fuzzy logic was conceived specifically to deal with language and reasoning, it has so far seen limited use in this field. We discuss some of the main current applications from the perspective of the half century since the inception of fuzzy logic.

    Psychoacoustics Modelling and the Recognition of Silence in Recorded Speech

    Ph.D. Thesis. Over many years, a variety of computer models intended to capture the essential differences between silence and speech have been investigated; nevertheless, research into a different audio model may provide fresh insight. Inspired by the unsurpassed human ability to distinguish silence from speech under virtually any conditions, this work evaluated a dynamic psychoacoustics model within a two-stage binary speech/silence non-linear classification system. The model has a temporal resolution an order of magnitude finer than that of the typical Mel Frequency Cepstral Coefficients (MFCC) model, and it implements simultaneous masking around the most powerful harmonic in each of 24 Bark frequency bands. The first classification stage (deterministic) provided training data for the second stage (heuristic), which was implemented as a Deep Neural Network (DNN). It has been asserted in the literature, in the context of speech processing and DNNs, that performance improvements obtained on a ‘standard’ speech corpus do not always generalise. Accordingly, six new test cases were recorded; because this corpus implicitly included frequency normalisation, it was feasible to assess whether the solution generalised, and all of the test cases were successfully processed by each of the six trained DNNs. In other tests, the two-stage silence/speech classifier outperformed the silence/speech classifiers discussed in the Literature Review. Interestingly, the Split Sample Technique for neural network training did not always identify the optimal trained network; to correct this, an additional step in the training process was devised and tested.
Overall, the results conclusively demonstrate that combining the dynamic psychoacoustics model with the two-stage binary speech/silence non-linear classification system provides a viable alternative to existing methods of detecting silence in speech.
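
The per-band step of the psychoacoustics model can be illustrated by mapping FFT bins onto the Bark scale and keeping the strongest bin in each band. This is a hedged sketch: the Zwicker and Terhardt (1980) Bark approximation, the use of the strongest FFT bin as a stand-in for "the most powerful harmonic", and the function names are assumptions; the thesis's masking curves and DNN stage are not modelled.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker & Terhardt (1980) approximation of critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def strongest_bin_per_band(power, sample_rate, n_bands=24):
    """Return {band: (frequency_hz, power)} for the strongest bin in each Bark band.

    `power` is the one-sided power spectrum of one frame (n_fft // 2 + 1 bins),
    so bin k lies at k * sample_rate / n_fft with n_fft = 2 * (len(power) - 1).
    Bands that receive no bins (above the Nyquist frequency) are absent from the result.
    """
    n_fft = 2 * (len(power) - 1)
    peaks = {}
    for k, p in enumerate(power):
        f = k * sample_rate / n_fft
        band = int(hz_to_bark(f))  # 0-based band index; int() floors non-negative values
        if band >= n_bands:
            continue  # bins above the last band are ignored
        if band not in peaks or p > peaks[band][1]:
            peaks[band] = (f, p)
    return peaks
```

For example, with a 512-point FFT at 16 kHz, bin 32 (1000 Hz, about 8.5 Bark) and bin 33 (1031.25 Hz) both fall into band 8, and only the stronger of the two is kept. At that sampling rate the Nyquist frequency of 8 kHz sits at about 21.3 Bark, so bins fall only into bands 0 to 21.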

    Application of Pattern Recognition Techniques to the Classification of Full-Term and Preterm Infant Cry

    Objectives: Scientific and clinical advances in perinatology and neonatology have improved the chances of survival of preterm and very low birth weight neonates. Infant cry analysis is a suitable non-invasive complementary tool for assessing the neurologic state of infants, which is particularly important for preterm neonates. This article aims to exploit differences between full-term and preterm infant cries using robust automatic acoustic analysis and data mining techniques. Study design: Twenty-two acoustic parameters are estimated in more than 3000 cry units from the cry recordings of 28 full-term and 10 preterm newborns. Methods: Feature extraction is performed with BioVoice, a dedicated software tool developed at the Biomedical Engineering Lab, University of Firenze, Italy. Classification and pattern recognition are based on genetic algorithms for selecting the best attributes. Training compares four classifiers (Logistic Curve, Multilayer Perceptron, Support Vector Machine, and Random Forest) under three testing options: full training set, 10-fold cross-validation, and a 66% split. Results: The best feature set consists of 10 parameters that distinguish preterm from full-term newborns with about 87% accuracy. The best results are obtained with the Random Forest method (area under the receiver operating characteristic curve, 0.94). Conclusions: These 10 cry features might convey important additional information to assist the clinical specialist in the diagnosis and follow-up of possible delays or disorders in neurologic development due to premature birth in this extremely vulnerable patient population. The proposed approach is a first step toward an automatic infant cry recognition system for fast and proper identification of risk in preterm babies.
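
The three testing options and the ROC-area figure can be made concrete with a small, dependency-free sketch. It is an illustration under assumptions, not the study's pipeline: folds are shuffled with a fixed seed but not stratified, the 66% split is a shuffled percentage split, and the ROC area is computed with the rank (Mann-Whitney) formulation. The helper names are hypothetical, and the genetic-algorithm attribute selection and the classifiers themselves are not shown.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering every index once
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def percentage_split(n, train_fraction=0.66, seed=0):
    """Shuffle the indices and put the first train_fraction of them in the training set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]

def roc_auc(labels, scores):
    """ROC area = probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because the study pools more than 3000 cry units from 38 newborns, one design choice any such split has to make is whether to assign individual cry units or whole newborns to folds; splitting by newborn keeps recordings of the same infant from appearing in both training and test data.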

    Infant Cry Signal Processing, Analysis, and Classification with Artificial Neural Networks

    As a special type of speech and environmental sound, infant cry has been a growing research area over the past two decades, covering infant cry reason classification, pathological infant cry identification, and infant cry detection. In this dissertation, we build a new dataset, explore new feature extraction methods, and propose novel classification approaches to improve infant cry classification accuracy and to identify diseases by learning from infant cry signals. We propose a method that generates weighted prosodic features and combines them with acoustic features in a deep learning model to improve asphyxiated infant cry identification. The combined feature matrix captures the diversity of variations within infant cries, and the result outperforms all other related studies on asphyxiated infant cry classification. We propose a fast, non-invasive method that uses infant cry signals with convolutional neural network (CNN) based age classification to diagnose abnormal infant vocal tract development as early as 4 months of age. Experiments reveal the pattern and tendency of vocal tract changes and predict vocal tract abnormality by classifying the cry signals into a younger age category. We propose an approach that generates a hybrid feature set and uses prior knowledge in a multi-stage CNN model for robust infant sound classification. The dominant and auxiliary features within the set enlarge coverage while keeping good resolution for modeling the diversity of variations within infant sounds, and the experimental results show encouraging improvements on two relevant databases. We propose a graph convolutional network (GCN) approach with transfer learning for robust infant cry reason classification.
Non-fully connected graphs are built from the similarities among relevant nodes to capture the short-term and long-term effects of infant cry signals related to intra-class and inter-class messages. With as little as 20% labeled training data, our model outperforms a CNN model trained with 80% labeled training data in both supervised and semi-supervised settings. Lastly, we apply mel-spectrogram decomposition to infant cry classification and propose a fusion method to further improve infant cry classification performance.
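
The graph-building idea behind the GCN approach (connecting each node only to its most similar neighbours instead of to every node) and one graph-convolution step can be sketched as follows. Assumptions: cosine similarity between per-node feature vectors, a symmetric k-nearest-neighbour rule, and the Kipf and Welling normalised propagation D^-1/2 (A + I) D^-1/2 H. The trainable weight matrix, the nonlinearity, transfer learning, and the dissertation's actual similarity measure are omitted, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_adjacency(features, k):
    """Symmetric 0/1 adjacency linking each node to its k most cosine-similar nodes."""
    n = len(features)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(((cosine(features[i], features[j]), j) for j in range(n) if j != i),
                      reverse=True)
        for _, j in sims[:k]:
            A[i][j] = A[j][i] = 1.0  # keep the graph undirected
    return A

def gcn_propagate(A, H):
    """One propagation step D^-1/2 (A + I) D^-1/2 H (no trainable weights or activation)."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]  # self-loops
    deg = [sum(row) for row in A_hat]  # node degree including the self-loop
    out = []
    for i in range(n):
        out.append([sum(A_hat[i][j] * H[j][d] / math.sqrt(deg[i] * deg[j]) for j in range(n))
                    for d in range(len(H[0]))])
    return out
```

A full GCN layer would multiply the propagated features by a learned weight matrix and apply a nonlinearity; keeping k small is what makes the graph non-fully connected, so each node aggregates information only from similar nodes.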

    Application of Computational Intelligence in Cognitive Radio Network for Efficient Spectrum Utilization, and Speech Therapy

    … communication systems utilize all the available frequency bands as efficiently as possible in the time, frequency, and spatial domains. Society requires more high-capacity broadband wireless connectivity, demanding greater access to spectrum. Most licensed spectrum is grossly underutilized, while some spectrum (licensed and unlicensed) is overcrowded. The problems of spectrum scarcity and underutilization can be reduced by adopting a new wireless communication paradigm. Advanced Cognitive Radio (CR) networks, or Dynamic Adaptive Spectrum Sharing, are one way to optimize wireless communication technologies for high data rates while maintaining users' desired quality of service (QoS). Scanning a wideband spectrum algorithmically to find spectrum holes that can deliver acceptable QoS to users requires a lot of time and energy. Computational Intelligence (CI) techniques can be applied in these scenarios to predict the available spectrum holes and the expected RF power in the channels. This enables the CR to predictively avoid noisy channels among the idle channels, delivering optimum QoS with fewer radio resources. In this study, spectrum hole search using an artificial neural network (ANN) and traditional search methods was simulated. The RF power traffic of selected channels ranging from 50 MHz to 2.5 GHz was modelled using optimized ANN and support vector machine (SVM) regression models to predict real-world RF power. Prediction accuracy and generalization were improved by combining different prediction models with a weighted output into a single model. The meta-parameters of the prediction models were evolved using population-based differential evolution and swarm intelligence optimization algorithms. The success of a CR network depends largely on overall knowledge of spectrum utilization in the time, frequency, and spatial domains.
To identify underutilized bands that could serve as candidate bands for CRs, a spectrum occupancy survey based on long-duration RF measurements with an energy detector was conducted. The results show that the average spectrum utilization of the bands considered at the studied location is less than 30%. Although this research focuses on applying CI to CR, the skills and knowledge in CI acquired during the PhD research were also applied in some neighbouring areas related to the medical field. This includes the use of ANN and SVM for impaired speech segmentation, the first phase of a research project that aims to develop an artificial speech therapist for speech-impaired patients. Petroleum Technology Development Fund (PTDF) Scholarship Board, Nigeria
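
The occupancy figure from an energy-detector survey is essentially a duty cycle: the fraction of sweeps in which a channel's measured power exceeds a decision threshold. A minimal sketch, assuming power readings already in dBm and a fixed threshold; the noise-floor-plus-margin rule and its 6 dB default are a common but hypothetical choice not stated in the abstract, and the function names and sample values are illustrative only.

```python
import math

def duty_cycles(sweeps_dbm, threshold_dbm):
    """Fraction of sweeps in which each channel is declared occupied (power above threshold).

    sweeps_dbm[t][c] is the measured power of channel c in sweep t.
    """
    n_sweeps = len(sweeps_dbm)
    n_channels = len(sweeps_dbm[0])
    return [sum(1 for t in range(n_sweeps) if sweeps_dbm[t][c] > threshold_dbm) / n_sweeps
            for c in range(n_channels)]

def average_utilization(sweeps_dbm, threshold_dbm):
    """Mean duty cycle over all channels, i.e. the average spectrum utilization of the band."""
    cycles = duty_cycles(sweeps_dbm, threshold_dbm)
    return sum(cycles) / len(cycles)

def noise_floor_threshold(noise_dbm, margin_db=6.0):
    """Hypothetical threshold rule: mean noise power (averaged in linear mW) plus a margin in dB."""
    mean_mw = sum(10 ** (x / 10.0) for x in noise_dbm) / len(noise_dbm)
    return 10.0 * math.log10(mean_mw) + margin_db
```

Averaging noise in linear units before converting back to dBm avoids the bias of averaging decibel values directly. With four sweeps over two channels and a -90 dBm threshold, a channel exceeding the threshold in one sweep has a duty cycle of 0.25, and a band utilization below 0.30 corresponds to the "less than 30%" reported above.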

    Fusion of Audio and Visual Information for Implementing Improved Speech Recognition System

    Speech recognition is a very useful technology because it can support applications suited to users' various needs. This research attempts to improve the performance of a speech recognition system by combining visual features (lip movement) with audio features. The results were computed on utterances of numerals collected from both male and female participants. Discrete Cosine Transform (DCT) coefficients were used as visual features and Mel Frequency Cepstral Coefficients (MFCC) as audio features. Classification was then carried out with a Support Vector Machine (SVM). The recognition rates of the combined (fused) system were compared with those of two standalone systems (audio-only and visual-only).
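
The visual front end described above (DCT coefficients of the lip region) and its combination with audio features can be sketched as follows. Assumptions: the lip region of interest is already cropped to a small grayscale block, an orthonormal 2-D DCT-II is used, low-frequency coefficients are kept in zig-zag order, and fusion happens at the feature level by concatenation (the abstract does not say at which level the streams were fused). All names are hypothetical and the SVM is not shown.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of an M x N block given as a list of rows."""
    M, N = len(block), len(block[0])

    def alpha(k, n):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            s = 0.0
            for x in range(M):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * M))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * N)))
            out[u][v] = alpha(u, M) * alpha(v, N) * s
    return out

def zigzag(coeffs, n_keep):
    """First n_keep coefficients in JPEG-style zig-zag order (low to high spatial frequency)."""
    M, N = len(coeffs), len(coeffs[0])
    order = sorted(((u, v) for u in range(M) for v in range(N)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [coeffs[u][v] for u, v in order[:n_keep]]

def fuse_features(audio_features, visual_features):
    """Feature-level (early) fusion: concatenate the audio and visual vectors for one frame."""
    return list(audio_features) + list(visual_features)
```

A per-frame input to the classifier would then be `fuse_features(mfcc_frame, zigzag(dct2(lip_roi), n_keep))`, where `mfcc_frame`, `lip_roi`, and `n_keep` are placeholders. Decision-level fusion, which combines the scores of separate audio and visual classifiers, is the usual alternative.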

    Automatic Emotion Recognition from Mandarin Speech
