5 research outputs found

    A Computation Efficient Voice Activity Detector for Low Signal-to-Noise Ratio in Hearing Aids

    This paper proposes a spectral entropy-based voice activity detection method that is computationally efficient enough for hearing aids. The method is highly accurate at low SNR levels because it uses the spectral entropy, which is more robust against changes in the noise power. Compared with traditional fast Fourier transform-based spectral entropy approaches, the proposed method calculates the spectral entropy from the outputs of a hearing aid filter-bank, which significantly reduces the computational complexity. The performance of the proposed method was evaluated and compared with two other computationally efficient methods. At negative SNR levels, the proposed method achieves an accuracy more than 5% higher than the power-based method, with a number of floating-point operations only about 1/100 of that of the statistical model-based method.
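
As a rough illustration of the idea in this abstract, the sketch below computes a normalized spectral entropy directly from per-band powers, such as a filter-bank might provide, and compares it to a threshold. It is not the paper's algorithm: the function names, the 0.85 threshold, and the handling of silent frames are assumptions made for the example.

```python
import math


def spectral_entropy(band_powers):
    """Normalized spectral entropy (0..1) of one frame's sub-band powers."""
    total = sum(band_powers)
    if total <= 0.0 or len(band_powers) < 2:
        # Assumption: treat an all-zero frame as maximally flat (noise-like).
        return 1.0
    # Turn band powers into a probability mass function over bands.
    probs = [p / total for p in band_powers]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Divide by the maximum possible entropy so the result lies in [0, 1].
    return entropy / math.log(len(band_powers))


def is_speech(band_powers, threshold=0.85):
    """Flag a frame as speech when its spectrum is peaky (low entropy).

    The threshold is a hypothetical value chosen for illustration only.
    """
    return spectral_entropy(band_powers) < threshold
```

Because the band powers are normalized by their sum, scaling every band by the same factor leaves the entropy unchanged, which is one way to see why the feature is insensitive to the overall noise level.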

    Detection of Stock Price Manipulation Using Kernel Based Principal Component Analysis and Multivariate Density Estimation

    Stock price manipulation uses illegitimate means to artificially influence the market prices of several stocks. It causes massive losses and undermines investors’ confidence and the integrity of the stock market. Several existing research works focus on detecting a specific manipulation scheme using supervised learning but lack the adaptive capability to capture different manipulative strategies, and they must assume model parameter values specific to the underlying manipulation scheme. In addition, supervised learning requires labelled data, which is difficult to acquire due to the confidential and proprietary nature of trading data. The proposed research establishes a detection model based on unsupervised learning using Kernel Principal Component Analysis (KPCA), exploiting the increased variance of selected latent features in higher dimensions. A proposed Multidimensional Kernel Density Estimation (MKDE) clustering is then applied to the selected components to identify abnormal patterns of manipulation in the data. This research has an advantage over existing methods: it avoids the ambiguity of assuming values for several parameters and reduces the high dimensionality obtained from conventional KPCA, thereby reducing computational complexity. The robustness of the detection model has also been evaluated when two or more manipulative activities occur within a short duration of each other, and when varying the window length of the dataset fed to the model. The results provide a comprehensive assessment of the model on multiple datasets and show a significant performance enhancement in terms of F-measure values, together with a significant reduction in false alarm rate (FAR).
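
The general pipeline described here (kernel PCA followed by density estimation on the retained components) can be sketched with NumPy alone. This is a simplified stand-in, not the paper's MKDE clustering or its component-selection rule: it keeps the top components by eigenvalue, scores each point with a plain Gaussian kernel density estimate, and flags the lowest-density points. The function names and the values of `gamma`, `bandwidth` and `quantile` are assumptions.

```python
import numpy as np


def kernel_pca(X, n_components=2, gamma=0.5):
    """Project the training points onto the top RBF kernel principal components."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    # Center the kernel matrix in feature space.
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    # Training-point projections are sqrt(lambda_k) * v_k.
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


def kde_log_density(Z, bandwidth=0.3):
    """Gaussian KDE log-density of each row of Z, estimated from Z itself."""
    n, d = Z.shape
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=2)
    log_k = -0.5 * d2 / bandwidth ** 2
    m = log_k.max(axis=1, keepdims=True)  # log-sum-exp for stability
    lse = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    return lse - np.log(n) - 0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)


def density_scores(X, n_components=2, gamma=0.5, bandwidth=0.3):
    """Log-density of each sample in the kernel PCA space (low = unusual)."""
    return kde_log_density(kernel_pca(X, n_components, gamma), bandwidth)


def flag_anomalies(X, quantile=0.05, n_components=2, gamma=0.5, bandwidth=0.3):
    """Boolean mask of samples whose density falls in the lowest quantile."""
    scores = density_scores(X, n_components, gamma, bandwidth)
    return scores <= np.quantile(scores, quantile)
```

In practice `X` would hold windowed features derived from trading data. No labels are needed, which matches the unsupervised setting the abstract argues for.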

    Deep learning-based automatic analysis of social interactions from wearable data for healthcare applications

    PhD Thesis
    Social interactions of people with Late Life Depression (LLD) could serve as an objective measure of social functioning, given the association between LLD and poor social functioning. The utilisation of wearable computing technologies is a relatively new approach within the healthcare and well-being application sectors. Recently, the design and development of wearable technologies and systems for health and well-being monitoring have attracted the attention of both the clinical and scientific communities, mainly because current clinical behaviour assessments are typically rather sporadic and are often administered in artificial settings. As a result, they do not provide a realistic impression of a patient’s condition and thus do not lead to sufficient diagnosis and care. Wearable behaviour monitors, however, have the potential for continuous, objective assessment of behaviour and wider social interactions, allowing naturalistic data to be captured without any constraints on the place of recording or the typical limitations of lab-setting research. Such data from naturalistic ambient environments would facilitate automated transmission and analysis, allowing for a more timely and accurate assessment of depressive symptoms. In response to this artificial-setting issue, this thesis focuses on the analysis and assessment of different aspects of social interactions in naturalistic environments using deep learning algorithms, which could lead to improvements in both diagnosis and treatment. The advantages of deep learning are that it needs no hand-crafted feature engineering, so raw data can be used with minimal pre-processing compared to classical machine learning approaches, and that it scales and generalises well. The main dataset used in this thesis was recorded by a wrist-worn device designed at Newcastle University. 
This device has multiple sensors, including a microphone, a tri-axial accelerometer, a light sensor and a proximity sensor. In this thesis, only the microphone and tri-axial accelerometer are used for the social interaction analysis. The other sensors are not used because they need more calibration by the user, who in this case is an elderly person with depression; hence, using them was not feasible in this scenario. Novel deep learning models are proposed to automatically analyse two aspects of social interactions: verbal interactions (acoustic communications) and physical activities (movement patterns). Verbal interaction analysis includes the total quantity of speech, who is talking to whom and when, and how much the wearer engaged in the conversations. The physical activity analysis includes activity recognition, the quantity of each activity, and sleep patterns. This thesis is composed of three main stages: two of them address the acoustic analysis, and the third describes the movement pattern analysis. The acoustic analysis starts with speech detection, in which each segment of the recording is categorised as speech or non-speech. This segment classification is achieved by a novel deep learning model that leverages bi-directional Long Short-Term Memory with gated activation units combined with Maxout Networks, as well as a combination of two optimisers. After speech segments have been detected in the audio data, the next stage determines how much the wearer engages in any conversation throughout these speech events. It does so by detecting the wearer of the device using a variant of the previous model that combines a convolutional autoencoder with bi-directional Long Short-Term Memory. Following this, the system detects the spoken parts of the main speaker (the wearer) and thereby detects conversational turn-taking, but only the turn-taking between the wearer and other speakers, not between every pair of speakers in the conversation. 
This stage did not take into account the semantics of the speech because of the ethical constraints on the main dataset (Depression dataset): it was not possible to listen to the data by any means or to have any information about its contents. Semantic analysis is therefore left for future work. Stage 3 involves the physical activity analysis, that is, inferring the elementary physical activities and movement patterns. These elementary patterns include sedentary actions, walking, mixed activities, cycling and using vehicles, as well as sleep patterns. The predictive model used is based on Random Forests and Hidden Markov Models. In all stages, the methods presented in this thesis have been compared to the state of the art in processing audio and accelerometer data, respectively, to thoroughly assess their contribution. Following these stages is a thorough analysis of the interplay between acoustic interaction, physical movement patterns and the key clinical variables of depression, based on the outcomes of the previous stages. The main reason for not using deep learning in this stage, unlike the previous stages, is that the main dataset (Depression dataset) did not have any annotations for the speech or the activity due to the ethical constraints mentioned above. Furthermore, the training dataset (Discussion dataset) did not have any annotations for the accelerometer data, since the data was recorded freely and no camera was attached to the device to make later annotation possible.
Newton-Mosharafa Fund and the mission sector and cultural affairs, Ministry of Higher Education in Egypt

    Robust Distributed Multi-Source Detection and Labeling in Wireless Acoustic Sensor Networks

    The growing demand for complex signal processing methods associated with low-energy large-scale wireless acoustic sensor networks (WASNs) urges the shift to a new information and communication technologies (ICT) paradigm. The emerging research perception aspires to an appealing wireless network communication where multiple heterogeneous devices with different interests can cooperate in various signal processing tasks (MDMT). Contributions in this doctoral thesis focus on distributed multi-source detection and labeling applied to audio enhancement scenarios, pursuing an MDMT-fashioned node-specific source-of-interest signal enhancement in WASNs. In fact, accurate detection and labeling is a prerequisite for the MDMT paradigm, in which nodes in the WASN effectively communicate their sources-of-interest so that multiple signal processing tasks can be enhanced via cooperation. First, a novel framework based on a dominant source model in distributed WASNs is introduced for resolving the activity detection of multiple speech sources in a reverberant and noisy environment. A preliminary rank-one multiplicative non-negative independent component analysis (M-NICA) for unique dominant energy source extraction given associated node clusters is presented. Partitional algorithms that minimize the within-cluster mean absolute deviation (MAD) and weighted MAD objectives are proposed to determine the cluster membership of the unmixed energies, and thus establish a source-specific voice activity recognition. A second study targets improving the energy signal separation to alleviate the multiple source activity discrimination task. Sparsity-inducing penalties are enforced on iterative rank-one singular value decomposition layers to extract sparse right rotations. Then, sparse non-negative blind energy separation is realized using multiplicative updates. 
Hence, the multiple source detection problem is converted into a sparse non-negative source energy decorrelation. Sparsity tunes the supposedly non-active energy signatures to exactly zero-valued energies so that it is easier to identify active energies and an activity detector can be constructed in a straightforward manner. In a centralized scenario, the activity decision is controlled by a fusion center that delivers the binary source activity detection for every participating energy source. This strategy gives precise detection results for small source numbers. With a growing number of interfering sources, the distributed detection approach is more promising. Conjointly, a robust distributed energy separation algorithm for multiple competing sources is proposed. A robust and regularized $t_{\nu}$ $M$-estimation of the covariance matrix of the mixed energies is employed. This approach yields a simple activity decision using only the robustly unmixed energy signatures of the sources in the WASN. The performance of the robust activity detector is validated with a distributed adaptive node-specific signal estimation method for speech enhancement. The latter enhances the quality and intelligibility of the signal while exploiting the accurately estimated multi-source voice decision patterns. In contrast to the original M-NICA for source separation, the extracted binary activity patterns with the robust energy separation significantly improve the node-specific signal estimation. Due to the increased computational complexity caused by the additional step of energy signal separation, a new approach to solving the detection question of multi-device multi-source networks is presented. Stability selection for iterative extraction of robust right singular vectors is considered. The sub-sampling selection technique provides transparency in properly choosing the regularization variable in the Lasso optimization problem. 
In this way, the strongest sparse right singular vectors using a robust $\ell_1$-norm and stability selection are the set of basis vectors that describe the input data efficiently. Active/non-active source classification is achieved based on a robust Mahalanobis classifier. For this, a robust $M$-estimator of the covariance matrix in the Mahalanobis distance is utilized. Extensive evaluation in centralized and distributed settings is performed to assess the effectiveness of the proposed approach. Thus, overcoming the computationally demanding source separation scheme is possible via exploiting robust stability selection for sparse multi-energy feature extraction. With respect to the labeling problem of various sources in a WASN, a robust approach is introduced that exploits the direction-of-arrival of the impinging source signals. A short-time Fourier transform-based subspace method estimates the angles of locally stationary wide band signals using a uniform linear array. The median of angles estimated at every frequency bin is utilized to obtain the overall angle for each participating source. The features, in this case, exploit the similarity across devices in the particular frequency bins that produce reliable direction-of-arrival estimates for each source. Reliability is defined with respect to the median across frequencies. All source-specific frequency bands that contribute to correct estimated angles are selected. A feature vector is formed for every source at each device by storing the frequency bin indices that lie within the upper and lower interval of the median absolute deviation scale of the estimated angle. Labeling is accomplished by a distributed clustering of the extracted angle-based feature vectors using consensus averaging.
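
The angle-based feature extraction at the end of this abstract can be illustrated with a short sketch. Given one direction-of-arrival estimate per frequency bin for a single source at a single device, it takes the median as the overall angle and keeps the indices of the bins whose estimates fall within one median absolute deviation of that median. The function name, the `scale` parameter and the use of degrees are assumptions; the subspace estimator and the consensus-based clustering are not shown.

```python
from statistics import median


def doa_feature(angles_per_bin, scale=1.0):
    """Return (overall_angle, reliable_bins) for one source at one device.

    angles_per_bin: per-frequency-bin DOA estimates in degrees.
    scale: hypothetical multiplier on the MAD that sets the interval width.
    """
    overall = median(angles_per_bin)
    # Median absolute deviation of the per-bin estimates around the median.
    mad = median([abs(a - overall) for a in angles_per_bin])
    lower, upper = overall - scale * mad, overall + scale * mad
    # Keep the indices of bins whose estimate lies inside the MAD interval.
    reliable_bins = [k for k, a in enumerate(angles_per_bin) if lower <= a <= upper]
    return overall, reliable_bins
```

For example, `doa_feature([30, 31, 29, 30, 75, 30, -10, 32])` gives a median of 30.0 and a MAD of 1.0, so the bins holding 75, -10 and 32 degrees are dropped and the feature is `[0, 1, 2, 3, 5]`.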