133 research outputs found

    Sparse feature learning for image analysis in segmentation, classification, and disease diagnosis.

    The success of machine learning algorithms generally depends on intermediate data representations, called features, that disentangle the hidden factors of variation in data. Moreover, machine learning models are required to generalize, in order to reduce specificity or bias toward the training dataset. Unsupervised feature learning is useful for taking advantage of the large amounts of unlabeled data available to capture these variations. However, learned features are required to capture variational patterns in data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction with application to lung segmentation, interpretable deep models, and Alzheimer's disease classification. Nonnegative Matrix Factorization, Autoencoder and 3D Convolutional Autoencoder are used as architectures or models for unsupervised feature learning. They are investigated along with nonnegativity, sparsity and part-based representation constraints for generalized and transferable feature extraction.

    Extracting Lungs from CT Images using Fully Convolutional Networks

    Analysis of cancer and other pathological diseases, like the interstitial lung diseases (ILDs), is usually possible through Computed Tomography (CT) scans. To aid this, a preprocessing step of segmentation is performed to reduce the area to be analyzed, segmenting the lungs and removing unimportant regions. Generally, complex methods are developed to extract the lung region, often using hand-made feature extractors to enhance segmentation. With the popularity of deep learning techniques and their automated feature learning, we propose a lung segmentation approach using fully convolutional networks (FCNs) combined with fully connected conditional random fields (CRF), employed in many state-of-the-art segmentation works. Aiming to develop a generalized approach, the publicly available datasets from University Hospitals of Geneva (HUG) and the VESSEL12 challenge were studied, including many healthy and pathological CT scans for evaluation. Experiments were run on each dataset individually, with a model trained on one dataset applied to the other, and on a combination of both datasets. Dice scores of 98.67% ± 0.94% for the HUG-ILD dataset and 99.19% ± 0.37% for the VESSEL12 dataset were achieved, outperforming previous works on the former and obtaining results similar to the state of the art on the latter, showing the capability of deep learning approaches. Comment: Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 201
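
The Dice score reported above measures the overlap between a predicted mask and a reference mask. The abstract does not say how the authors computed it, so the following is only a minimal sketch, assuming binary masks and the standard definition 2|A ∩ B| / (|A| + |B|). The function name `dice_score` and the toy masks are illustrative, not taken from the paper.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    overlap = np.logical_and(pred, target).sum()
    return 2.0 * overlap / total

# Toy 2x2 masks (hypothetical data, not from HUG-ILD or VESSEL12).
predicted = [[1, 1], [0, 0]]
reference = [[1, 0], [0, 0]]
score = dice_score(predicted, reference)
```

A per-scan score like this would typically be averaged over a test set to obtain a mean and standard deviation of the kind quoted in the abstract.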

    Signal processing algorithms based on Non-negative Matrix Factorization applied to the separation, detection and classification of wheezes in single-channel respiratory audio signals

    Auscultation is the first clinical examination that a physician performs to evaluate the condition of the respiratory system, because it is a non-invasive, low-cost, easy-to-perform method that is safe for the patient. However, the diagnosis derived from auscultation remains subjective, since it depends on the ability, experience and training of each physician in listening to and interpreting respiratory audio signals. As a result, a high percentage of misdiagnoses occur, endangering the health of patients and increasing the costs borne by health centres. This Thesis proposes new methods based on Non-negative Matrix Factorization applied to the separation, detection and classification of wheezing sounds in order to provide the physician with a complementary source of information that helps to improve the reliability of the specialist's diagnosis. Thesis, Univ. Jaén. Departamento INGENIERÍA DE TELECOMUNICACIÓ

    A novel diffusion tensor imaging-based computer-aided diagnostic system for early diagnosis of autism.

    Autism spectrum disorders (ASDs) are a significant and growing public health concern. Currently, one in 68 children has been diagnosed with ASDs in the United States, and most children are diagnosed after the age of four, despite the fact that ASDs can be identified as early as age two. The ultimate goal of this thesis is to develop a computer-aided diagnosis (CAD) system for the accurate and early diagnosis of ASDs using diffusion tensor imaging (DTI). This CAD system consists of three main steps. First, the brain tissues are segmented based on three image descriptors: a visual appearance model that can model a high-dimensional feature space, a shape model that is adapted during the segmentation process using first- and second-order visual appearance features, and a spatially invariant second-order homogeneity descriptor. Second, discriminatory features are extracted from the segmented brains. Cortex shape variability is assessed using shape construction methods, and white matter integrity is further examined through connectivity analysis. Finally, the diagnostic capabilities of these extracted features are investigated. The accuracy of the presented CAD system has been tested on 25 infants with a high risk of developing ASDs. The preliminary diagnostic results are promising in distinguishing autistic from control subjects.

    Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

    The implicit objective of the biennial "international - Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday August 27th to Friday August 29th, 2014. The workshop was conveniently located in "The Arsenal" building, within walking distance of both hotels and the town center. iTWIST'14 gathered about 70 international participants and featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference. Comment: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist1

    Nonnegative matrix factorization for clustering

    This dissertation shows that nonnegative matrix factorization (NMF) can be extended to a general and efficient clustering method. Clustering is one of the fundamental tasks in machine learning. It is useful for unsupervised knowledge discovery in a variety of applications such as text mining and genomic analysis. NMF is a dimension reduction method that approximates a nonnegative matrix by the product of two lower rank nonnegative matrices, and has shown great promise as a clustering method when a data set is represented as a nonnegative data matrix. However, challenges in the widespread use of NMF as a clustering method lie in its correctness and efficiency: First, we need to know why and when NMF could detect the true clusters and guarantee to deliver good clustering quality; second, existing algorithms for computing NMF are expensive and often take longer time than other clustering methods. We show that the original NMF can be improved from both aspects in the context of clustering. Our new NMF-based clustering methods can achieve better clustering quality and run orders of magnitude faster than the original NMF and other clustering methods. Like other clustering methods, NMF places an implicit assumption on the cluster structure. Thus, the success of NMF as a clustering method depends on whether the representation of data in a vector space satisfies that assumption. Our approach to extending the original NMF to a general clustering method is to switch from the vector space representation of data points to a graph representation. The new formulation, called Symmetric NMF, takes a pairwise similarity matrix as an input and can be viewed as a graph clustering method. We evaluate this method on document clustering and image segmentation problems and find that it achieves better clustering accuracy. In addition, for the original NMF, it is difficult but important to choose the right number of clusters. 
We show that the widely used consensus NMF in genomic analysis for choosing the number of clusters has critical flaws and can produce misleading results. We propose a variation of the prediction strength measure arising from statistical inference to evaluate the stability of clusters and select the right number of clusters. Our measure shows promising performance in artificial simulation experiments. Large-scale applications bring substantial efficiency challenges to existing algorithms for computing NMF. An important example is topic modeling, where users want to uncover the major themes in a large text collection. Our strategy for accelerating NMF-based clustering is to design algorithms that better suit the computer architecture and exploit the computing power of parallel platforms such as graphics processing units (GPUs). A key observation is that recursively applying rank-2 NMF, which partitions a data set into two clusters, is much faster than applying the original NMF to obtain a flat clustering. We take advantage of a special property of rank-2 NMF and design an algorithm that runs faster than existing algorithms due to continuous memory access. Combined with a criterion to stop the recursion, our hierarchical clustering algorithm runs significantly faster and achieves even better clustering quality than existing methods. Another bottleneck of NMF algorithms, which is also a common bottleneck in many other machine learning applications, is multiplying a large sparse data matrix by a tall-and-skinny dense matrix. We use GPUs to accelerate this routine for sparse matrices with an irregular sparsity structure. Overall, our algorithm shows significant improvement over popular topic modeling methods such as latent Dirichlet allocation, and runs more than 100 times faster on data sets with millions of documents. Ph.D
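
The abstract above describes NMF as approximating a nonnegative matrix by the product of two lower-rank nonnegative matrices and using the result for clustering. As a rough illustration of that baseline idea only, here is a minimal sketch that assumes the classic multiplicative-update rule for the Frobenius-norm objective. It is not the Symmetric NMF, the rank-2 hierarchical algorithm, or the GPU routines that the dissertation proposes. The function names `nmf` and `cluster_labels` and the toy matrix are hypothetical.

```python
import numpy as np

def nmf(A, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate nonnegative A (m x n) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared Frobenius error.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors nonnegative and does not
        # increase the reconstruction error.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cluster_labels(H):
    """Assign each column (data point) to its largest coefficient in H."""
    return H.argmax(axis=0)

# Toy data: columns are data points; two groups of columns use disjoint rows.
A = np.array([[5.0, 4.0, 0.0, 0.0],
              [4.0, 5.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 4.0],
              [0.0, 0.0, 4.0, 3.0]])
W, H = nmf(A, rank=2)
labels = cluster_labels(H)
relative_error = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

Reading cluster membership from the largest entry in each column of H is one common convention. The choice of `rank` plays the role of the number of clusters, which the abstract notes is difficult but important to choose.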

    Computational methods to predict and enhance decision-making with biomedical data.

    The proposed research applies machine learning techniques to healthcare applications. The core idea is to use intelligent techniques to find automatic methods for analyzing healthcare data. Different classification and feature extraction techniques are applied to various clinical datasets. The datasets include: brain MR images, breathing curves from vessels around tumor cells over time, breathing curves extracted from patients with successful or rejected lung transplants, and lung cancer patients diagnosed in the US in 2004-2009, extracted from the SEER database. The novel idea for brain MR image segmentation is to develop a multi-scale technique to segment blood vessel tissues from similar tissues in the brain. By analyzing the vascularization of the cancer tissue over time and the behavior of vessels (arteries and veins over time), a new feature extraction technique was developed, and classification techniques were used to rank the vascularization of each tumor type. Lung transplantation is a critical surgery for which predicting the acceptance or rejection of the transplant would be very important. A review of classification techniques on the SEER database was developed to analyze the survival rates of lung cancer patients, and the best feature vector that can be used to predict the most similar patients is analyzed.

    Data Clustering And Visualization Through Matrix Factorization

    Clustering is traditionally an unsupervised task that finds natural groupings or clusters in multidimensional data based on perceived similarities among the patterns. The purpose of clustering is to extract useful information from unlabeled data. In order to present the knowledge obtained by clustering in a meaningful way, data visualization has become a popular and growing area of research. Visualization can provide a qualitative overview of large and complex data sets, which helps us gain the insight needed to truly understand the phenomena of interest in the data. The contribution of this dissertation is two-fold: Semi-Supervised Non-negative Matrix Factorization (SS-NMF) for data clustering/co-clustering and Exemplar-based data Visualization (EV) through matrix factorization. Compared to traditional data mining models, matrix-based methods are fast, easy to understand and implement, and especially suitable for solving large-scale challenging problems in text mining, image grouping, medical diagnosis, and bioinformatics. In this dissertation, we present two effective matrix-based solutions in the new directions of data clustering and visualization. First, in many practical learning domains, there is a large supply of unlabeled data but limited labeled data, and in most cases it might be expensive to generate large amounts of labeled data. Traditional clustering algorithms completely ignore these valuable labeled data and thus are inapplicable to these problems. Consequently, semi-supervised clustering, which can incorporate domain knowledge to guide a clustering algorithm, has become a topic of significant recent interest. Thus, we develop a Non-negative Matrix Factorization (NMF) based framework to incorporate prior knowledge into data clustering. 
Moreover, with the fast growth of the Internet and computational technologies in the past decade, many data mining applications have advanced swiftly from the simple clustering of one data type to the co-clustering of multiple data types, usually involving high heterogeneity. To this end, we extend SS-NMF to perform heterogeneous data co-clustering. From a theoretical perspective, SS-NMF for data clustering/co-clustering is mathematically rigorous. The convergence and correctness of our algorithms are proved. In addition, we discuss the relationship between SS-NMF and other well-known clustering and co-clustering models. Second, most current clustering models only provide the centroids (e.g., mathematical means of the clusters) without inferring representative exemplars from real data, and thus they are unable to summarize or visualize the raw data well. A new method, Exemplar-based Visualization (EV), is proposed to cluster and visualize extremely large-scale data. Capitalizing on recent advances in matrix approximation and factorization, EV provides a means to visualize large-scale data with high accuracy (in retaining neighbor relations), high efficiency (in computation), and high flexibility (through the use of exemplars). Empirically, we demonstrate the superior performance of our matrix-based data clustering and visualization models through extensive experiments performed on publicly available large-scale data sets.