16 research outputs found

    Infant Cry Signal Processing, Analysis, and Classification with Artificial Neural Networks

    As a special type of speech and environmental sound, infant cry has been a growing research area over the past two decades, covering infant cry reason classification, pathological infant cry identification, and infant cry detection. In this dissertation, we build a new dataset, explore new feature extraction methods, and propose novel classification approaches to improve infant cry classification accuracy and identify diseases by learning from infant cry signals. We propose a method that generates weighted prosodic features and combines them with acoustic features for a deep learning model to improve the performance of asphyxiated infant cry identification. The combined feature matrix captures the diversity of variations within infant cries, and the result outperforms all other related studies on asphyxiated baby crying classification. We propose a fast, non-invasive method that uses infant cry signals with convolutional neural network (CNN) based age classification to diagnose abnormal infant vocal tract development as early as 4 months of age. Experiments reveal the pattern and tendency of vocal tract changes and predict vocal tract abnormality when cry signals are classified into a younger age category. We propose an approach that generates a hybrid feature set and uses prior knowledge in a multi-stage CNN model for robust infant sound classification. The dominant and auxiliary features within the set enlarge the coverage while keeping a good resolution for modeling the diversity of variations within infant sound, and the experimental results give encouraging improvements on two related databases. We propose a graph convolutional network (GCN) approach with transfer learning for robust infant cry reason classification. Non-fully connected graphs based on the similarities among the relevant nodes are built to consider the short-term and long-term effects of infant cry signals related to intra-class and inter-class messages. With as little as 20% of the training data labeled, our model outperforms the CNN model trained with 80% labeled data in both supervised and semi-supervised settings. Lastly, we apply mel-spectrogram decomposition to infant cry classification and propose a fusion method to further improve classification performance.

    Advanced Data Analytics Methodologies for Anomaly Detection in Multivariate Time Series Vehicle Operating Data

    Early detection of faults in vehicle operating systems is a research domain of high significance for sustaining full control of those systems, since anomalous behaviors usually cause performance loss for a long time before they are detected as critical failures. In other words, operating systems exhibit degradation when failure begins to occur. Indeed, repeated occurrences of failures in system performance are not only signals of anomalous behavior but also show that taking maintenance actions to preserve system performance is vital. Keeping the systems at nominal performance over their lifetime at the lowest maintenance cost is extremely challenging, and it is important to be aware of imminent failure before it arises and to implement the best countermeasures to avoid extra losses. In this context, the timely detection of anomalies in the performance of the operating system is worthy of investigation. Early detection of imminent anomalous behaviors of the operating system is difficult without appropriate modeling, prediction, and analysis of the time series records of the system. Data-based technologies have laid a strong foundation for developing advanced methods for modeling and predicting time series data streams. In this research, we propose novel methodologies to predict the patterns of multivariate time series operational data of the vehicle and recognize second-wise unhealthy states. These approaches help with the early detection of abnormalities in the behavior of the vehicle based on multiple data channels whose second-wise records cover different functional working groups in the operating systems of the vehicle. Furthermore, a real case study data set is used to validate the accuracy of the proposed prediction and anomaly detection methodologies.

    Ranking to Learn and Learning to Rank: On the Role of Ranking in Pattern Recognition Applications

    The last decade has seen a revolution in the theory and application of machine learning and pattern recognition. Through these advancements, variable ranking has emerged as an active and growing research area, and it is now beginning to be applied to many new problems. The rationale is that many pattern recognition problems are by nature ranking problems. The main objective of a ranking algorithm is to sort objects according to some criteria so that the most relevant items appear early in the produced result list. Ranking methods can be analyzed from two different methodological perspectives: ranking to learn and learning to rank. The former studies methods and techniques for sorting objects to improve the accuracy of a machine learning model. Enhancing a model's performance can be challenging at times. For example, in pattern classification tasks, different data representations can complicate and hide the different explanatory factors of variation behind the data. In particular, hand-crafted features contain many cues that are either redundant or irrelevant, which reduces the overall accuracy of the classifier. In such cases, feature selection is used, which, by producing ranked lists of features, helps to filter out the unwanted information. Moreover, in real-time systems (e.g., visual trackers), ranking approaches are used as optimization procedures that improve the robustness of a system dealing with the high variability of image streams that change over time. Conversely, learning to rank is necessary in the construction of ranking models for information retrieval, biometric authentication, re-identification, and recommender systems. In this context, the ranking model's purpose is to sort objects according to their degrees of relevance, importance, or preference as defined in the specific application.
    Comment: European PhD Thesis. arXiv admin note: text overlap with arXiv:1601.06615, arXiv:1505.06821, arXiv:1704.02665 by other authors
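
The "ranking to learn" idea in the abstract above (score features, sort them, keep the top of the list) can be illustrated with a minimal sketch. This is not the thesis's method: the scoring rule (absolute Pearson correlation with the label), the function names `pearson` and `rank_features`, and the toy data are all assumptions chosen only for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_features(rows, labels):
    """Return feature indices sorted from highest to lowest relevance score."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        # Assumed relevance score: |correlation| between feature and label.
        scores.append(abs(pearson(column, labels)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Toy data (hypothetical): feature 0 tracks the label closely,
# feature 1 is constant (irrelevant), feature 2 is weakly related.
rows = [
    [0.1, 5.0, 0.3],
    [0.2, 5.0, 0.5],
    [0.9, 5.0, 0.4],
    [1.0, 5.0, 0.8],
]
labels = [0, 0, 1, 1]

ranking = rank_features(rows, labels)
top_k = ranking[:2]  # keep the two highest-ranked features, drop the rest
```

A univariate filter like this ignores redundancy between features, so it is only the simplest instance of ranking-based selection.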

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, are now widely used and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neuroscience, computational biology, bioinformatics, seismology, environmental protection, and engineering. I hope that readers will find this book useful and helpful for their own research.

    Image and Video Forensics

    Images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia content is generated in many different ways through the use of consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated instant distribution and sharing of digital images on digital social platforms, generating a great amount of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos with the use of deep learning techniques. In response to these threats, the multimedia forensics community has produced major research efforts regarding the identification of the source and the detection of manipulation. In all cases (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks) where images and videos serve as critical evidence, forensic technologies that help to determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book aims to collect a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics and tackle new and serious challenges in ensuring media authenticity.

    Adaptive multiple importance sampling for Gaussian processes and its application in social signal processing

    Social signal processing aims to automatically understand and interpret social signals (e.g. facial expressions and prosody) generated during human-human and human-machine interactions. Automatic interpretation of social signals involves two fundamentally important aspects: feature extraction and machine learning. So far, machine learning approaches applied to social signal processing have mainly focused on parametric approaches (e.g. linear regression) or non-parametric models such as the support vector machine (SVM). However, these approaches either fail to account for uncertainty arising from model misspecification or lack the interpretability needed to analyse scenarios in social signal processing. Consequently, they are less able to understand and interpret human behaviours effectively. Gaussian processes (GPs), which have gained popularity in data analysis, offer a solution to these limitations through their attractive properties: being non-parametric enables them to model data flexibly, and being probabilistic makes them capable of quantifying uncertainty. In addition, a proper parametrisation of the covariance function makes it possible to gain insights into the application under study. However, these appealing properties of GP models hinge on an accurate characterisation of the posterior distribution with respect to the covariance parameters. This is normally done by means of standard MCMC algorithms, which require repeated expensive calculations involving the marginal likelihood. Motivated by the desire to avoid the inefficiency of MCMC algorithms, which reject a considerable number of expensive proposals, this thesis develops an alternative inference framework based on adaptive multiple importance sampling (AMIS).
In particular, this thesis studies the application of AMIS for Gaussian processes in the case of a Gaussian likelihood, and proposes a novel pseudo-marginal-based AMIS (PM-AMIS) algorithm for non-Gaussian likelihoods, where the marginal likelihood is unbiasedly estimated. Experiments on benchmark data sets show that the proposed framework outperforms the MCMC-based inference of GP covariance parameters in a wide range of scenarios. The PM-AMIS classifier, based on Gaussian processes with a newly designed group-automatic relevance determination (G-ARD) kernel, has been applied to predict whether a Flickr user is perceived to be above the median or not with respect to each of the Big-Five personality traits. The results show that, apart from the high prediction accuracies achieved (up to 79% depending on the trait), the parameters of the G-ARD kernel allow the identification of the groups of features that better account for the classification outcome and provide indications about cultural effects through their weight differences. This demonstrates the value of the proposed non-parametric probabilistic framework for social signal processing. Feature extraction in signal processing is dominated by various methods based on the short-time Fourier transform (STFT). Recently, Hilbert spectral analysis (HSA), a new signal representation that is fundamentally different from STFT, has been proposed. This thesis is also the first attempt to investigate the extraction of features from HSA and its application in social signal processing. The experimental results reveal that, using features extracted from the Hilbert spectrum of voice data of female speakers, a prediction accuracy of up to 81% can be achieved when predicting their Big-Five personality traits, and hence show that HSA can work as an effective alternative to STFT for feature extraction in social signal processing.
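
To make the AMIS idea in the abstract above more concrete, here is a minimal one-dimensional sketch of generic adaptive multiple importance sampling: every draw is kept and re-weighted against the mixture of all proposals used so far, instead of being accepted or rejected as in MCMC. It is not the thesis's implementation. The target `log_target` is a hypothetical stand-in for the unnormalised log-posterior of a single covariance parameter, and the Gaussian proposal family, the moment-matching update, and all names and default values are assumptions.

```python
import math
import random

def log_target(theta):
    # Hypothetical unnormalised log-posterior (assumption): in a GP setting this
    # would be log marginal likelihood + log prior of a covariance parameter.
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def amis(n_iter=10, n_per_iter=500, mu=0.0, sigma=3.0, seed=0):
    rng = random.Random(seed)
    proposals = []    # (mu, sigma) of the proposal used at each iteration
    samples = []      # all draws from all iterations; none are discarded
    target_vals = []  # unnormalised target density at each draw
    for _ in range(n_iter):
        proposals.append((mu, sigma))
        new = [rng.gauss(mu, sigma) for _ in range(n_per_iter)]
        samples.extend(new)
        target_vals.extend(math.exp(log_target(x)) for x in new)
        # Deterministic-mixture weights: each sample so far is re-weighted
        # against the equal-weight mixture of every proposal used so far
        # (equal weights because each iteration draws the same number of samples).
        weights = []
        for x, p in zip(samples, target_vals):
            mix = sum(normal_pdf(x, m, s) for m, s in proposals) / len(proposals)
            weights.append(p / mix)
        total = sum(weights)
        # Adapt the proposal by matching the weighted mean and variance.
        mean = sum(w * x for w, x in zip(weights, samples)) / total
        var = sum(w * (x - mean) ** 2 for w, x in zip(weights, samples)) / total
        mu, sigma = mean, max(math.sqrt(var), 1e-3)
    return mean, math.sqrt(var)

# The toy target is proportional to a normal density with mean 1.0 and
# standard deviation 0.5, so the estimates should land near those values.
post_mean, post_sd = amis()
```

In a pseudo-marginal variant, the exact target evaluation would be replaced by an unbiased estimate of the marginal likelihood; that step is not shown here.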