
    Speech and crosstalk detection in multichannel audio

    The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, speech from both the microphone's wearer (local speech) and from other participants (crosstalk) is received. The recorded audio can be broadly classified into four classes: local speech, crosstalk plus local speech, crosstalk alone, and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features was considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground-truth segmentation.
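    A toy sketch of the kurtosis feature (illustrative distributions, not the paper's corpus or its GMM/HMM classifiers): single-talker speech is modelled as heavy-tailed Laplace noise and overlapped talkers as Gaussian, and a simple threshold on per-frame excess kurtosis separates the two classes.

```python
import numpy as np

def excess_kurtosis(frame):
    """Excess kurtosis of a signal frame. Speech from a single nearby talker
    tends to be heavy-tailed (high kurtosis), while overlapped talkers sum
    towards a Gaussian (excess kurtosis near 0)."""
    x = frame - frame.mean()
    s2 = np.mean(x ** 2)
    return np.mean(x ** 4) / (s2 ** 2 + 1e-12) - 3.0

rng = np.random.default_rng(0)
local = rng.laplace(size=(100, 512))      # peaked, single-talker-like frames
overlap = rng.normal(size=(100, 512))     # many-talker mixture, near-Gaussian
k_local = np.array([excess_kurtosis(f) for f in local])
k_overlap = np.array([excess_kurtosis(f) for f in overlap])
# Laplace excess kurtosis is about 3, Gaussian about 0, so a threshold separates them
threshold = 1.5
acc = 0.5 * (np.mean(k_local > threshold) + np.mean(k_overlap <= threshold))
```

    In the paper's setting this feature is one input among several to a trained classifier rather than a hard threshold; the sketch only shows why it carries discriminative information.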

    Speaker specific feature based clustering and its applications in language independent forensic speaker recognition

    Forensic speaker recognition (FSR) is the process of determining whether the source of a questioned voice recording (trace) is a specific individual (suspected speaker). The role of the forensic expert is to provide, if possible, a quantitative measure of the strength of the voice evidence; using this information as an aid in their judgments and decisions is up to the judge and/or the jury. Most existing methods measure inter-utterance similarities directly from spectrum-based characteristics, so the resulting clusters may relate not to speakers but to different acoustic classes. This research addresses this deficiency by projecting language-independent utterances into a reference space equipped to cover the standard voice features underlying the entire utterance set. The resulting projection vectors naturally represent the language-independent, voice-like relationships among all the utterances and are therefore more robust against non-speaker interference. A clustering approach based on peak approximation is then proposed in order to maximize the similarities between language-independent utterances within each cluster. This method uses the K-medoids, Fuzzy C-means, Gustafson-Kessel and Gath-Geva algorithms to evaluate the cluster to which each utterance should be allocated, overcoming the disadvantage of traditional hierarchical clustering, in which the final outcome does not necessarily reach the optimum recognition efficiency. The recognition efficiencies of the K-medoids, Fuzzy C-means, Gustafson-Kessel and Gath-Geva clustering algorithms are 95.2%, 97.3%, 98.5% and 99.7%, and the EERs are 3.62%, 2.91%, 2.82% and 2.61%, respectively. The EER improvement of the Gath-Geva-based FSR system compared with Gustafson-Kessel and Fuzzy C-means is 8.04% and 11.49%, respectively.
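    A minimal Fuzzy C-means implementation, one of the four clustering algorithms compared above, can be sketched as follows (synthetic two-cluster data stand in for the thesis's projected utterance vectors):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy C-means: returns (centroids, membership matrix U)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)        # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m                           # fuzzified memberships
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        # Standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centroids, U

# Two well-separated synthetic "utterance" groups in a 2-D feature space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centroids, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)                    # hard assignment for evaluation
```

    The Gustafson-Kessel and Gath-Geva variants extend this update with per-cluster covariance (Mahalanobis-style) distances, which is what lets them fit non-spherical clusters.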

    Automatic analysis and classification of cardiac acoustic signals for long term monitoring

    Objective: Cardiovascular diseases are the leading cause of death worldwide, resulting in over 17.9 million deaths each year. Most of these diseases are preventable and treatable, but their progression and outcomes are significantly more positive with early-stage diagnosis and proper disease management. Among the approaches available to assist with the task of early-stage diagnosis and management of cardiac conditions, automatic analysis of auscultatory recordings is one of the most promising, since it is particularly suitable for ambulatory/wearable monitoring. Thus, proper investigation of abnormalities present in cardiac acoustic signals can provide vital clinical information to assist long-term monitoring. Cardiac acoustic signals, however, are very susceptible to noise and artifacts, and their characteristics vary widely with the recording conditions, which makes the analysis challenging. Additionally, there are challenges in the steps used for automatic analysis and classification of cardiac acoustic signals. Broadly, these steps are the segmentation, feature extraction and subsequent classification of recorded signals using selected features. This thesis presents approaches using novel features with the aim of assisting the automatic early-stage detection of cardiovascular diseases with improved performance, using cardiac acoustic signals collected in real-world conditions. Methods: Cardiac auscultatory recordings were studied to identify potential features to help in the classification of recordings from subjects with and without cardiac diseases. The diseases considered in this study for the identification of symptoms and characteristics are the valvular heart diseases due to stenosis and regurgitation, atrial fibrillation, and splitting of fundamental heart sounds leading to additional lub/dub sounds in the systole or diastole interval of a cardiac cycle.
    The localisation of cardiac sounds of interest was performed using adaptive wavelet-based filtering in combination with the Shannon energy envelope and prior information about fundamental heart sounds. This is a prerequisite step for the feature extraction and subsequent classification of recordings, leading to a more precise diagnosis. Localised segments of S1 and S2 sounds, and artifacts, were used to extract a set of perceptual and statistical features using the wavelet transform, homomorphic filtering, the Hilbert transform and mel-scale filtering, which were then used to train an ensemble classifier to interpret S1 and S2 sounds. Once sound peaks of interest were identified, features extracted from these peaks, together with the features used for the identification of S1 and S2 sounds, were used to develop an algorithm to classify recorded signals. Overall, 99 features were extracted and statistically analysed using neighborhood component analysis (NCA) to identify the features which showed the greatest ability in classifying recordings. Selected features were then used to train an ensemble classifier to classify abnormal recordings, and hyperparameters were optimized to evaluate the performance of the trained classifier. Thus, a machine learning-based approach for the automatic identification and classification of S1 and S2, and of normal and abnormal recordings, in real-world noisy recordings using a novel feature set is presented. The validity of the proposed algorithm was tested using acoustic signals recorded in real-world, non-controlled environments at four auscultation sites (aortic valve, tricuspid valve, mitral valve, and pulmonary valve), from subjects with and without cardiac diseases, together with recordings from three large public databases. The methodology was evaluated using classification accuracy (CA), sensitivity (SE), precision (P+), and F1 score.
    Results: This thesis proposes four different algorithms to automatically classify, from cardiac acoustic signals: fundamental heart sounds (S1 and S2); normal fundamental sounds versus abnormal additional lub/dub sounds; normal versus abnormal recordings; and recordings with heart valve disorders, namely mitral stenosis (MS), mitral regurgitation (MR), mitral valve prolapse (MVP), aortic stenosis (AS) and murmurs. The results obtained from these algorithms were as follows:
    • The algorithm to classify S1 and S2 sounds achieved an average SE of 91.59% and 89.78%, and F1 scores of 90.65% and 89.42%, in classifying S1 and S2, respectively. 87 features were extracted and statistically studied to identify the top 14 features which showed the best capabilities in classifying S1, S2, and artifacts. The analysis showed that the most relevant features were those extracted using the Maximal Overlap Discrete Wavelet Transform (MODWT) and the Hilbert transform.
    • The algorithm to classify normal fundamental heart sounds versus abnormal additional lub/dub sounds in the systole or diastole intervals of a cardiac cycle achieved an average SE of 89.15%, P+ of 89.71%, F1 of 89.41%, and CA of 95.11% using the test dataset from the PASCAL database. The top 10 features that achieved the highest weights in classifying these recordings were also identified.
    • Normal versus abnormal classification of recordings using the proposed algorithm achieved a mean CA of 94.172% and SE of 92.38% in classifying recordings from the different databases. Among the top 10 acoustic features identified, the deterministic energy of the sound peaks of interest and the instantaneous frequency extracted using the Hilbert-Huang transform achieved the highest weights.
    • The machine learning-based approach proposed to classify recordings of heart valve disorders (AS, MS, MR, and MVP) achieved an average CA of 98.26% and SE of 95.83%.
    99 acoustic features were extracted and their abilities to differentiate these abnormalities were examined using weights obtained from neighborhood component analysis (NCA). The top 10 features which showed the greatest abilities in classifying these abnormalities using recordings from the different databases were also identified. The achieved results demonstrate the ability of the algorithms to automatically identify and classify cardiac sounds. This work provides the basis for measurements of many useful clinical attributes of cardiac acoustic signals and can potentially help in monitoring overall cardiac health over longer durations. The work presented in this thesis is the first of its kind to validate the results using both normal and pathological cardiac acoustic signals, recorded for a long continuous duration of 5 minutes at four different auscultation sites in non-controlled real-world conditions.
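    The Shannon energy envelope used above to localise S1 and S2 can be sketched as follows (synthetic tone bursts stand in for heart sounds; frame length and burst parameters are illustrative, not the thesis's values):

```python
import numpy as np

def shannon_energy_envelope(x, frame=64):
    """Frame-averaged Shannon energy, E = -x^2 * log(x^2).
    It emphasises medium-intensity components (the lobes of S1/S2)
    while suppressing low-level noise and isolated high spikes."""
    x = x / (np.max(np.abs(x)) + 1e-12)           # normalise to [-1, 1]
    se = -(x ** 2) * np.log(x ** 2 + 1e-12)
    n = len(se) // frame
    return se[: n * frame].reshape(n, frame).mean(axis=1)

# Synthetic cardiac cycle: two tone bursts ("S1" and "S2") in mild noise
fs = 2000
t = np.arange(fs) / fs                            # one second of signal
sig = 0.02 * np.random.default_rng(2).normal(size=fs)
for centre in (0.2, 0.55):                        # burst positions in seconds
    sig += np.exp(-((t - centre) ** 2) / (2 * 0.01 ** 2)) * np.sin(2 * np.pi * 60 * t)

env = shannon_energy_envelope(sig)
half = len(env) // 2
s1_time = np.argmax(env[:half]) * 64 / fs         # strongest frame, first half
s2_time = (half + np.argmax(env[half:])) * 64 / fs
```

    In the thesis this envelope is combined with adaptive wavelet filtering and timing priors on the fundamental heart sounds; the sketch shows only the envelope-peaking idea.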

    Assessing hyper parameter optimization and speedup for convolutional neural networks

    The increased processing power of graphical processing units (GPUs) and the availability of large image datasets have fostered a renewed interest in extracting semantic information from images. Promising results for complex image categorization problems have been achieved using deep learning, with neural networks composed of many layers. Convolutional neural networks (CNNs) are one such architecture, offering further opportunities for image classification. Advances in CNNs enable the development of training models using large labelled image datasets, but the hyperparameters need to be specified, which is challenging and complex due to their large number. A substantial amount of computational power and processing time is required to determine the optimal hyperparameters that define a model yielding good results. This article provides a survey of hyperparameter search and optimization methods for CNN architectures.
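    One of the simplest search methods covered by such surveys, random search, can be sketched as follows. The search space and the stand-in objective are illustrative (a synthetic error surface replaces the expensive train-and-validate run, and the hyperparameter names are generic, not taken from the article):

```python
import math
import random

# Hypothetical search space for a small CNN (all names are illustrative)
space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),   # log-uniform draw
    "batch_size":    lambda: random.choice([32, 64, 128, 256]),
    "num_filters":   lambda: random.choice([16, 32, 64]),
    "dropout":       lambda: random.uniform(0.0, 0.5),
}

def sample_config():
    return {name: draw() for name, draw in space.items()}

def validation_error(cfg):
    # Stand-in for an expensive train-and-validate run: a synthetic smooth
    # surface whose optimum sits near learning_rate=1e-2, dropout=0.25.
    return (math.log10(cfg["learning_rate"]) + 2) ** 2 + (cfg["dropout"] - 0.25) ** 2

random.seed(0)
trials = [sample_config() for _ in range(200)]
best = min(trials, key=validation_error)
```

    Random search samples the joint space independently per trial, which is why it parallelises trivially; more sophisticated methods (Bayesian optimization, evolutionary search) reuse earlier trials to propose the next configuration.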

    Prediction of Cardiovascular Diseases by Integrating Electrocardiogram (ECG) and Phonocardiogram (PCG) Multi-Modal Features using Hidden Semi-Markov Model

    Because the health care field generates a large amount of data, we must employ modern methods to handle this data in order to deliver effective outcomes and make successful data-driven decisions. Heart diseases are the major cause of mortality worldwide, accounting for roughly one third of all fatalities. Cardiovascular disease can be detected through disturbances in cardiac signals, one source of which is phonocardiography. The aim of this project is to use machine learning to categorize cardiac illness based on electrocardiogram (ECG) and phonocardiogram (PCG) readings. The investigation began with signal preprocessing, which included cutting and normalizing the signal, followed by a continuous wavelet transformation using an analytic Morlet mother wavelet. The results of the decomposition are shown using a scalogram, and the outcomes are predicted using a hidden semi-Markov model (HSMM). In the first phase, we submit the dataset file and choose an algorithm to run on the chosen dataset. The accuracy of each selected method is then predicted, along with a graph, and a model is built for the best-performing one by training it on the dataset. In the following step, input for each cardiac parameter is provided, and the diseased state of the heart is predicted based on the model created; measures are then taken based on the patient's condition. When compared to current approaches, the proposed HSMM achieves 0.952 sensitivity, 0.92 specificity, 0.94 F-score, 0.91 accuracy, and 0.96 AUC.
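    The continuous wavelet transform with a complex Morlet mother wavelet, used above to produce scalograms, can be sketched in plain NumPy (a simplified stand-in for library implementations such as PyWavelets; the test signal and parameters are illustrative):

```python
import numpy as np

def morlet_cwt(x, fs, freqs, w0=6.0):
    """Continuous wavelet transform with a complex Morlet mother wavelet.
    Returns |coefficients| of shape (len(freqs), len(x)) -- a scalogram."""
    t = np.arange(-1.0, 1.0, 1.0 / fs)            # wavelet support in seconds
    out = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)                  # scale whose centre frequency is f
        psi = np.exp(1j * w0 * t / s) * np.exp(-(t ** 2) / (2 * s ** 2)) / np.sqrt(s)
        # Correlation with the wavelet == convolution with its reversed conjugate
        out[i] = np.abs(np.convolve(x, np.conj(psi[::-1]), mode="same"))
    return out

fs = 500
t = np.arange(0, 2, 1 / fs)
# 10 Hz component throughout, 40 Hz component only in the second half
sig = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 40 * t) * (t > 1)
freqs = np.arange(5, 60, 5)
scalogram = morlet_cwt(sig, fs, freqs)
```

    The scalogram localises each component in both time and frequency, which is what makes it a useful 2-D input representation for the downstream classifier.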

    Signal processing algorithms for digital hearing aids

    Hearing loss is a problem that severely affects speech communication and prevents most hearing-impaired people from leading a normal life. Although the vast majority of hearing loss cases could be corrected by using hearing aids, only a small fraction of the hearing-impaired people who could benefit from hearing aids purchase one. This limited use of hearing aids arises from a problem that, to date, has not been solved effectively and comfortably: the automatic adaptation of the hearing aid to the changing acoustic environment that surrounds its user. There are two approaches aiming to address it. On the one hand, the "manual" approach, in which the user has to identify the acoustic situation and choose the adequate amplification program, has been found to be very uncomfortable. The second approach is to include automatic program selection within the hearing aid. This latter approach is deemed very useful by most hearing aid users, even if its performance is not perfect. Although the necessity of such a sound classification system seems clear, its implementation is a very difficult matter. The development of an automatic sound classification system in a digital hearing aid is a challenging goal because of the inherent limitations of the Digital Signal Processor (DSP) the hearing aid is based on. The underlying reason is that most digital hearing aids have very strong constraints in terms of computational capacity, memory and battery, which seriously limit the implementation of advanced algorithms in them. With this in mind, this thesis focuses on the design and implementation of a prototype for a digital hearing aid able to automatically classify the acoustic environments hearing aid users face daily and to select the amplification program best adapted to each environment, aiming at enhancing the speech intelligibility perceived by the user.
    The most important contribution of this thesis is the implementation of such a prototype: a digital hearing aid that automatically classifies the acoustic environment surrounding its user and selects the most appropriate amplification program for that environment, aiming at enhancing the sound quality perceived by the user. The battery life of this hearing aid is 140 hours, very similar to that of hearing aids on the market, and, crucially, about 30% of the DSP resources remain available for implementing other algorithms.
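    Given the DSP constraints described above, environment classifiers typically rely on cheap frame-level features. A sketch of two such features, short-time log energy and zero-crossing rate (illustrative, not the thesis's actual feature set), shows how little computation is needed to separate tonal from broadband content:

```python
import numpy as np

def frame_features(x, frame=256):
    """Two cheap per-frame features that fit a low-power hearing-aid DSP:
    short-time log energy and zero-crossing rate (ZCR)."""
    n = len(x) // frame
    frames = x[: n * frame].reshape(n, frame)
    log_energy = np.log(np.mean(frames ** 2, axis=1) + 1e-12)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return log_energy, zcr

fs = 8000
rng = np.random.default_rng(3)
tonal = np.sin(2 * np.pi * 200 * np.arange(fs) / fs) * rng.random(fs)  # voiced-like
noise = rng.normal(size=fs)                                            # broadband
_, zcr_tonal = frame_features(tonal)
_, zcr_noise = frame_features(noise)
# Low ZCR suggests tonal/voiced content; high ZCR suggests broadband noise
```

    Both features are a handful of multiply-accumulate operations per sample, which is why features of this kind are common in battery- and memory-constrained classifiers.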

    A Soft Computing Based Approach for Multi-Accent Classification in IVR Systems

    A speaker's accent is the most important factor affecting the performance of Natural Language Call Routing (NLCR) systems because accents vary widely, even within the same country or community. This variation also occurs when non-native speakers start to learn a second language, the substitution of native language phonology being a common process. Such substitution leads to fuzziness between the phoneme boundaries and phoneme classes, which reduces out-of-class variations and increases the similarities between the different sets of phonemes. Thus, this fuzziness is the main cause of reduced NLCR system performance. The main requirement for commercial enterprises using an NLCR system is to have a robust NLCR system that provides call understanding and routing to appropriate destinations. The chief motivation for this present work is to develop an NLCR system that eliminates multilayered menus and employs a sophisticated speaker accent-based automated voice response system around the clock. Currently, NLCRs are not fully equipped with accent classification capability. Our main objective is to develop both speaker-independent and speaker-dependent accent classification systems that understand a caller's query, classify the caller's accent, and route the call to the acoustic model that has been thoroughly trained on a database of speech utterances recorded by such speakers. In the field of accent classification, the dominant approaches are the Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM). Of the two, GMM is the more widely implemented for accent classification. However, GMM performance depends on the initial partitions and number of Gaussian mixtures, both of which can reduce performance if poorly chosen. To overcome these shortcomings, we propose a speaker-independent accent classification system based on a distance metric learning approach and evolution strategy.
    This approach depends on side information from dissimilar pairs of accent groups to transfer data points to a new feature space where the Euclidean distances between similar and dissimilar points are at their minimum and maximum, respectively. Finally, a Non-dominated Sorting Evolution Strategy (NSES)-based k-means clustering algorithm is employed on the training data set processed by the distance metric learning approach. The main objectives of the NSES-based k-means approach are to find the cluster centroids as well as the optimal number of clusters for a GMM classifier. In the case of a speaker-dependent application, a new method is proposed based on fuzzy canonical correlation analysis to find appropriate Gaussian mixtures for a GMM-based accent classification system. In our proposed method, we implement a fuzzy clustering approach to minimize the within-group sum-of-square error and canonical correlation analysis to maximize the correlation between the speech feature vectors and cluster centroids. We conducted a number of experiments using the TIMIT database, the Speech Accent Archive, and foreign accent English databases for evaluating the performance of speaker-independent and speaker-dependent applications. Assessment of the applications and analysis shows that our proposed methodologies outperform the HMM, GMM, vector quantization GMM, and radial basis neural networks.
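    Choosing the number of clusters is the crux of the NSES-based k-means step above. As a simplified stand-in (ordinary restarted k-means plus silhouette-based selection of k, not the evolution strategy itself, on synthetic "speaker group" data):

```python
import numpy as np

def kmeans(X, k, n_iter=30, restarts=5, seed=0):
    """Plain k-means with several random restarts, keeping the lowest-SSE run."""
    rng = np.random.default_rng(seed)
    best_sse, best_labels = np.inf, None
    for _ in range(restarts):
        C = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iter):
            labels = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
            C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                          else C[j] for j in range(k)])
        sse = ((X - C[labels]) ** 2).sum()
        if sse < best_sse:
            best_sse, best_labels = sse, labels
    return best_labels

def mean_silhouette(X, labels):
    """Average silhouette width; higher means better-separated clusters."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    n, scores = len(X), []
    for i in range(n):
        own = (labels == labels[i]) & (np.arange(n) != i)
        if not own.any():
            scores.append(0.0)               # singleton cluster
            continue
        a = D[i, own].mean()                 # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Three well-separated synthetic "speaker groups" in a 2-D feature space
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(c, 0.4, (30, 2)) for c in ((0, 0), (5, 0), (0, 5))])
scores = {k: mean_silhouette(X, kmeans(X, k)) for k in (2, 3, 4, 5)}
best_k = max(scores, key=scores.get)
```

    The NSES approach replaces this exhaustive scan over k with a multi-objective evolutionary search, but the underlying question, which k yields the best-separated partition to seed the GMM, is the same.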

    Design and validation of a methodology for wind energy structures health monitoring

    The objective of Structural Health Monitoring (SHM) is the verification of the state, or health, of a structure in order to ensure its proper performance and save on maintenance costs. An SHM system combines a sensor network attached to the structure with continuous monitoring and specific algorithms. Several benefits derive from the implementation of SHM, among them: knowledge of the behavior of the structure under different loads and environmental changes, and knowledge of its current state in order to verify the integrity of the structure and determine whether it can work properly or needs to be maintained or replaced, thereby reducing maintenance costs. The damage detection paradigm can be tackled as a pattern recognition problem: a comparison between data collected from the undamaged structure and from the current structure, in order to determine whether there are any changes. Many techniques can handle this problem. In this work, accelerometer data are used to develop statistical, data-driven approaches for the detection of damage in structures. As the methodology is designed for an offshore wind turbine, only output data are used to detect damage; the excitation of the wind turbine is provided by the wind itself or by the sea waves, both of which are unknown and unpredictable.
    The damage detection strategy is not based on data comparison alone. A complete methodology for damage detection based on pattern recognition has been designed in this work. It handles structural data, selects the proper data for detecting damage, and takes into account the Environmental and Operational Conditions (EOC) under which the structure is operating. A damage detection methodology should only be relied upon if there is a way to verify that the sensors are working correctly, so it is very important to have a methodology that checks whether the sensors are healthy. In this work, a method to detect damaged sensors has also been implemented and embedded into the damage detection methodology.
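    A common data-driven pattern-recognition baseline for this kind of output-only damage detection is a PCA subspace fitted to features from the healthy structure, with reconstruction residuals flagging a change (an illustrative sketch on synthetic data, not this thesis's exact method):

```python
import numpy as np

def fit_healthy_subspace(X_healthy, n_comp=2):
    """Fit a PCA subspace to features recorded on the undamaged structure."""
    mu = X_healthy.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_healthy - mu, full_matrices=False)
    return mu, Vt[:n_comp]

def residual(X, mu, V):
    """Reconstruction error, i.e. distance from the healthy subspace.
    Large residuals flag a change in the structural response pattern."""
    Xc = X - mu
    return np.linalg.norm(Xc - Xc @ V.T @ V, axis=1)

rng = np.random.default_rng(4)
# Healthy features lie near a 2-D plane inside a 5-D feature space
W = rng.normal(size=(2, 5))
healthy = rng.normal(size=(200, 2)) @ W + 0.05 * rng.normal(size=(200, 5))
mu, V = fit_healthy_subspace(healthy)
thresh = np.percentile(residual(healthy, mu, V), 99)     # baseline threshold
# "Damaged" responses deviate from the healthy correlation pattern
damaged = healthy[:50] + rng.normal(0.0, 1.0, size=(50, 5))
alarm_rate = np.mean(residual(damaged, mu, V) > thresh)
```

    In practice the threshold must be set per EOC regime (wind speed, sea state), since environmental variation can otherwise masquerade as damage, which is precisely why the methodology above conditions on EOC before detection.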