41 research outputs found
Sparse general non-negative matrix factorization based on left semi-tensor product
The dimension reduction of large scale high-dimensional data is a challenging task, especially the dimension reduction of face data and the accuracy increment of face recognition in the large scale face recognition system, which may cause large storage space and long recognition time. In order to further reduce the recognition time and the storage space in the large scale face recognition systems, on the basis of the general non-negative matrix factorization based on left semi-tensor (GNMFL) without dimension matching constraints proposed in our previous work, we propose a sparse GNMFL/L (SGNMFL/L) to decompose a large number of face data sets in the large scale face recognition systems, which makes the decomposed base matrix sparser and suppresses the decomposed coefficient matrix. Therefore, the dimension of the basis matrix and the coefficient matrix can be further reduced. Two sets of experiments are conducted to show the effectiveness of the proposed SGNMFL/L on two databases. The experiments are mainly designed to verify the effects of two hyper-parameters on the sparseness of basis matrix factorized by SGNMFL/L, compare the performance of the conventional NMF, sparse NMF (SNMF), GNMFL, and the proposed SGNMFL/L in terms of storage space and time efficiency, and compare their face recognition accuracies with different noises. Both the theoretical derivation and the experimental results show that the proposed SGNMF/L can effectively save the storage space and reduce the computation time while achieving high recognition accuracy and has strong robustness
Object Recognition
Vision-based object recognition tasks are very familiar in our everyday activities, such as driving our car in the correct lane. We do these tasks effortlessly in real-time. In the last decades, with the advancement of computer technology, researchers and application developers are trying to mimic the human's capability of visually recognising. Such capability will allow machine to free human from boring or dangerous jobs
Generative-Discriminative Low Rank Decomposition for Medical Imaging Applications
In this thesis, we propose a method that can be used to extract biomarkers from medical images toward early diagnosis of abnormalities. Surge of demand for biomarkers and availability of medical images in the recent years call for accurate, repeatable, and interpretable approaches for extracting meaningful imaging features. However, extracting such information from medical images is a challenging task because the number of pixels (voxels) in a typical image is in order of millions while even a large sample-size in medical image dataset does not usually exceed a few hundred. Nevertheless, depending on the nature of an abnormality, only a parsimonious subset of voxels is typically relevant to the disease; therefore various notions of sparsity are exploited in this thesis to improve the generalization performance of the prediction task.
We propose a novel discriminative dimensionality reduction method that yields good classification performance on various datasets without compromising the clinical interpretability of the results. This is achieved by combining the modelling strength of generative learning framework and the classification performance of discriminative learning paradigm. Clinical interpretability can be viewed as an additional measure of evaluation and is also helpful in designing methods that account for the clinical prior such as association of certain areas in a brain to a particular cognitive task or connectivity of some brain regions via neural fibres.
We formulate our method as a large-scale optimization problem to solve a constrained matrix factorization. Finding an optimal solution of the large-scale matrix factorization renders off-the-shelf solver computationally prohibitive; therefore, we designed an efficient algorithm based on the proximal method to address the computational bottle-neck of the optimization problem. Our formulation is readily extended for different scenarios such as cases where a large cohort of subjects has uncertain or no class labels (semi-supervised learning) or a case where each subject has a battery of imaging channels (multi-channel), \etc. We show that by using various notions of sparsity as feasible sets of the optimization problem, we can encode different forms of prior knowledge ranging from brain parcellation to brain connectivity
Intelligent Sensor Networks
In the last decade, wireless or wired sensor networks have attracted much attention. However, most designs target general sensor network issues including protocol stack (routing, MAC, etc.) and security issues. This book focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on their world-class research, the authors present the fundamentals of intelligent sensor networks. They cover sensing and sampling, distributed signal processing, and intelligent signal learning. In addition, they present cutting-edge research results from leading experts
The effects of hydration on the topographical and mechanical properties of corneocytes
It is well established that the biomechanical properties of the Stratum Corneum (SC) are influenced by both moisture-induced plasticization and the lipid content. This study employs Atomic Force Microscopy to investigate how hydration affects the surface topographical and elasto-viscoplastic characteristics of corneocytes from two anatomical sites. Volar forearm cells underwent swelling when immersed in water with a 50% increase in thickness and volume. Similarly, medial heel cells demonstrated significant swelling in volume, accompanied by increased cell area and reduced cell roughness. Furthermore, as the water activity was increased, they exhibited enhanced compliance, leading to a decreased Young's modulus, hardness, and relaxation times. Moreover, the swollen cells also displayed a greater tolerance to strain before experiencing permanent deformation. Despite the greater predominance of immature cornified envelopes in plantar skin, the comparable Young's modulus of medial heel and forearm corneocytes suggests that cell stiffness primarily relies on the keratin matrix rather than on the cornified envelope. The Young's moduli of the cells in distilled water are similar to those reported for the SC, which suggests that the corneodesmosomes and intercellular lamellae lipids junctions that connect the corneocytes are able to accommodate the mechanical deformations of the SC.</p
Recommended from our members
Single Channel auditory source separation with neural network
Although distinguishing different sounds in noisy environment is a relative easy task for human, source separation has long been extremely difficult in audio signal processing. The problem is challenging for three reasons: the large variety of sound type, the abundant mixing conditions and the unclear mechanism to distinguish sources, especially for similar sounds.
In recent years, the neural network based methods achieved impressive successes in various problems, including the speech enhancement, where the task is to separate the clean speech out of the noise mixture. However, the current deep learning based source separator does not perform well on real recorded noisy speech, and more importantly, is not applicable in a more general source separation scenario such as overlapped speech.
In this thesis, we firstly propose extensions for the current mask learning network, for the problem of speech enhancement, to fix the scale mismatch problem which is usually occurred in real recording audio. We solve this problem by combining two additional restoration layers in the existing mask learning network. We also proposed a residual learning architecture for the speech enhancement, further improving the network generalization under different recording conditions. We evaluate the proposed speech enhancement models on CHiME 3 data. Without retraining the acoustic model, the best bi-direction LSTM with residue connections yields 25.13% relative WER reduction on real data and 34.03% WER on simulated data.
Then we propose a novel neural network based model called “deep clustering” for more general source separation tasks. We train a deep network to assign contrastive embedding vectors to each time-frequency region of the spectrogram in order to implicitly predict the segmentation labels of the target spectrogram from the input mixtures. This yields a deep network-based analogue to spectral clustering, in that the embeddings form a low-rank pairwise affinity matrix that approximates the ideal affinity matrix, while enabling much faster performance. At test time, the clustering step “decodes” the segmentation implicit in the embeddings by optimizing K-means with respect to the unknown assignments. Experiments on single channel mixtures from multiple speakers show that a speaker-independent model trained on two-speaker and three speakers mixtures can improve signal quality for mixtures of held-out speakers by an average over 10dB.
We then propose an extension for deep clustering named “deep attractor” network that allows the system to perform efficient end-to-end training. In the proposed model, attractor points for each source are firstly created the acoustic signals which pull together the time-frequency bins corresponding to each source by finding the centroids of the sources in the embedding space, which are subsequently used to determine the similarity of each bin in the mixture to each source. The network is then trained to minimize the reconstruction error of each source by optimizing the embeddings. We showed that this frame work can achieve even better results.
Lastly, we introduce two applications of the proposed models, in singing voice separation and the smart hearing aid device. For the former, a multi-task architecture is proposed, which combines the deep clustering and the classification based network. And a new state of the art separation result was achieved, where the signal to noise ratio was improved by 11.1dB on music and 7.9dB on singing voice. In the application of smart hearing aid device, we combine the neural decoding with the separation network. The system firstly decodes the user’s attention, which is further used to guide the separator for the targeting source. Both objective study and subjective study show the proposed system can accurately decode the attention and significantly improve the user experience
New approaches for unsupervised transcriptomic data analysis based on Dictionary learning
The era of high-throughput data generation enables new access to biomolecular profiles and exploitation thereof. However, the analysis of such biomolecular data, for example, transcriptomic data, suffers from the so-called "curse of dimensionality". This occurs in the analysis of datasets with a significantly larger number of variables than data points. As a consequence, overfitting and unintentional learning of process-independent patterns can appear. This can lead to insignificant results in the application. A common way of counteracting this problem is the application of dimension reduction methods and subsequent analysis of the resulting low-dimensional representation that has a smaller number of variables.
In this thesis, two new methods for the analysis of transcriptomic datasets are introduced and evaluated. Our methods are based on the concepts of Dictionary learning, which is an unsupervised dimension reduction approach. Unlike many dimension reduction approaches that are widely applied for transcriptomic data analysis, Dictionary learning does not impose constraints on the components that are to be derived. This allows for great flexibility when adjusting the representation to the data. Further, Dictionary learning belongs to the class of sparse methods. The result of sparse methods is a model with few non-zero coefficients, which is often preferred for its simplicity and ease of interpretation. Sparse methods exploit the fact that the analysed datasets are highly structured. Indeed, a characteristic of transcriptomic data is particularly their structuredness, which appears due to the connection of genes and pathways, for example. Nonetheless, the application of Dictionary learning in medical data analysis is mainly restricted to image analysis. Another advantage of Dictionary learning is that it is an interpretable approach. Interpretability is a necessity in biomolecular data analysis to gain a holistic understanding of the investigated processes.
Our two new transcriptomic data analysis methods are each designed for one main task: (1) identification of subgroups for samples from mixed populations, and (2) temporal ordering of samples from dynamic datasets, also referred to as "pseudotime estimation". Both methods are evaluated on simulated and real-world data and compared to other methods that are widely applied in transcriptomic data analysis. Our methods convince through high performance and overall outperform the comparison methods
Acoustic event detection and localization using distributed microphone arrays
Automatic acoustic scene analysis is a complex task that involves several functionalities: detection (time), localization (space), separation, recognition, etc. This thesis focuses on both acoustic event detection (AED) and acoustic source localization (ASL), when several sources may be simultaneously present in a room. In particular, the experimentation work is carried out with a meeting-room scenario. Unlike previous works that either employed models of all possible sound combinations or additionally used video signals, in this thesis, the time overlapping sound problem is tackled by exploiting the signal diversity that results from the usage of multiple microphone array beamformers.
The core of this thesis work is a rather computationally efficient approach that consists of three processing stages. In the first, a set of (null) steering beamformers is used to carry out diverse partial signal separations, by using multiple arbitrarily located linear microphone arrays, each of them composed of a small number of microphones. In the second stage, each of the beamformer output goes through a classification step, which uses models for all the targeted sound classes (HMM-GMM, in the experiments). Then, in a third stage, the classifier scores, either being intra- or inter-array, are combined using a probabilistic criterion (like MAP) or a machine learning fusion technique (fuzzy integral (FI), in the experiments).
The above-mentioned processing scheme is applied in this thesis to a set of complexity-increasing problems, which are defined by the assumptions made regarding identities (plus time endpoints) and/or positions of sounds. In fact, the thesis report starts with the problem of unambiguously mapping the identities to the positions, continues with AED (positions assumed) and ASL (identities assumed), and ends with the integration of AED and ASL in a single system, which does not need any assumption about identities or positions.
The evaluation experiments are carried out in a meeting-room scenario, where two sources are temporally overlapped; one of them is always speech and the other is an acoustic event from a pre-defined set. Two different databases are used, one that is produced by merging signals actually recorded in the UPC¿s department smart-room, and the other consists of overlapping sound signals directly recorded in the same room and in a rather spontaneous way. From the experimental results with a single array, it can be observed that the proposed detection system performs better than either the model based system or a blind source separation based system. Moreover, the product rule based combination and the FI based fusion of the scores resulting from the multiple arrays improve the accuracies further. On the other hand, the posterior position assignment is performed with a very small error rate.
Regarding ASL and assuming an accurate AED system output, the 1-source localization performance of the proposed system is slightly better than that of the widely-used SRP-PHAT system, working in an event-based mode, and it even performs significantly better than the latter one in the more complex 2-source scenario. Finally, though the joint system suffers from a slight degradation in terms of classification accuracy with respect to the case where the source positions are known, it shows the advantage of carrying out the two tasks, recognition and localization, with a single system, and it allows the inclusion of information about the prior probabilities of the source positions. It is worth noticing also that, although the acoustic scenario used for experimentation is rather limited, the approach and its formalism were developed for a general case, where the number and identities of sources are not constrained
Efficient and Robust Methods for Audio and Video Signal Analysis
This thesis presents my research concerning audio and video signal processing and machine learning. Specifically, the topics of my research include computationally efficient classifier compounds, automatic speech recognition (ASR), music dereverberation, video cut point detection and video classification.Computational efficacy of information retrieval based on multiple measurement modalities has been considered in this thesis. Specifically, a cascade processing framework, including a training algorithm to set its parameters has been developed for combining multiple detectors or binary classifiers in computationally efficient way. The developed cascade processing framework has been applied on video information retrieval tasks of video cut point detection and video classification. The results in video classification, compared to others found in the literature, indicate that the developed framework is capable of both accurate and computationally efficient classification. The idea of cascade processing has been additionally adapted for the ASR task. A procedure for combining multiple speech state likelihood estimation methods within an ASR framework in cascaded manner has been developed. The results obtained clearly show that without impairing the transcription accuracy the computational load of ASR can be reduced using the cascaded speech state likelihood estimation process.Additionally, this thesis presents my work on noise robustness of ASR using a nonnegative matrix factorization (NMF) -based approach. Specifically, methods for transformation of sparse NMF-features into speech state likelihoods has been explored. The results reveal that learned transformations from NMF activations to speech state likelihoods provide better ASR transcription accuracy than dictionary label -based transformations. The results, compared to others in a noisy speech recognition -challenge show that NMF-based processing is an efficient strategy for noise robustness in ASR.The thesis also presents my work on audio signal enhancement, specifically, on removing the detrimental effect of reverberation from music audio. In the work, a linear prediction -based dereverberation algorithm, which has originally been developed for speech signal enhancement, was applied for music. The results obtained show that the algorithm performs well in conjunction with music signals and indicate that dynamic compression of music does not impair the dereverberation performance