The Incremental Multiresolution Matrix Factorization Algorithm
Multiresolution analysis and matrix factorization are foundational tools in
computer vision. In this work, we study the interface between these two
distinct topics and obtain techniques to uncover hierarchical block structure
in symmetric matrices -- an important aspect in the success of many vision
problems. Our new algorithm, the incremental multiresolution matrix
factorization, uncovers such structure one feature at a time, and hence scales
well to large matrices. We describe how this multiscale analysis goes much
farther than what a direct global factorization of the data can identify. We
evaluate the efficacy of the resulting factorizations for relative leveraging
within regression tasks using medical imaging data. We also use the
factorization on representations learned by popular deep networks, providing
evidence of their ability to infer semantic relationships even when they are
not explicitly trained to do so. We show that this algorithm can be used as an
exploratory tool to improve the network architecture, and within numerous other
settings in vision. Comment: Computer Vision and Pattern Recognition (CVPR) 2017, 10 pages
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Advances in Spectral Learning with Applications to Text Analysis and Brain Imaging
Spectral learning algorithms are becoming increasingly popular in data-rich domains, driven in part by recent advances in large scale randomized SVD, and in spectral estimation of Hidden Markov Models. Extensions of these methods lead to statistical estimation algorithms which are not only fast, scalable, and useful on real data sets, but are also provably correct.
Following this line of research, we make two contributions. First, we
propose a set of spectral algorithms for text analysis and natural
language processing. In particular, we propose fast and scalable
spectral algorithms for learning word embeddings -- low dimensional
real vectors (called Eigenwords) that capture the “meaning” of words from their context. Second, we show how similar spectral methods can be applied to analyzing brain images.
State-of-the-art approaches to learning word embeddings are slow to
train or lack theoretical grounding; we propose three spectral
algorithms that overcome these limitations. All three algorithms
harness the multi-view nature of text data, i.e., the left and right
context of each word, and share three characteristics:
1. They are fast to train and are scalable.
2. They have strong theoretical properties.
3. They can induce context-specific embeddings, i.e., different embeddings for “river bank” and “Bank of America”.
They also have lower sample complexity and hence higher statistical
power for rare words. We provide theory which establishes
relationships between these algorithms and optimality criteria for the
estimates they provide. We also perform thorough qualitative and
quantitative evaluation of Eigenwords and demonstrate their superior performance over state-of-the-art approaches.
Next, we turn to the task of using spectral learning methods for brain imaging data.
Methods like Sparse Principal Component Analysis (SPCA), Non-negative Matrix Factorization (NMF) and Independent Component Analysis (ICA) have been used to obtain state-of-the-art accuracies in a variety of problems in machine learning. However, their usage in brain imaging, though increasing, is limited because they are used as out-of-the-box techniques and are seldom tailored to the domain-specific constraints and knowledge pertaining to medical imaging. This makes the results difficult to interpret.
In order to address the above shortcomings, we propose
Eigenanatomy (EANAT), a general framework for sparse matrix factorization. Its goal is to statistically learn the boundaries of
and connections between brain regions by weighing both the data and prior neuroanatomical knowledge.
Although EANAT incorporates some neuroanatomical prior knowledge in the form of connectedness and smoothness constraints, it can still be difficult for clinicians to interpret the results in specific domains where network-specific hypotheses exist. We thus extend EANAT and present a novel framework for prior-constrained sparse decomposition of matrices derived from brain imaging data, called Prior Based Eigenanatomy (p-Eigen). We formulate our solution in terms of a prior-constrained ℓ1-penalized (sparse) principal component analysis. Experimental evaluation confirms that p-Eigen extracts biologically relevant, patient-specific functional parcels and that it significantly aids classification of Mild Cognitive Impairment when compared to state-of-the-art competing approaches.
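To make the idea of an ℓ1-penalized principal component concrete, here is a minimal sketch of one common heuristic: power iteration with a soft-thresholding step. It is not the p-Eigen method and includes no anatomical prior. The function names, the toy covariance matrix and the penalty value `lam` are all assumptions chosen for illustration.

```python
import math


def soft_threshold(v, lam):
    """Shrink each entry toward zero by lam; entries within lam of zero become 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]


def sparse_leading_component(cov, lam, iters=200):
    """Thresholded power iteration on a symmetric matrix given as a list of lists."""
    n = len(cov)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # Power step: multiply the current estimate by the covariance matrix.
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Sparsity step: the l1 penalty appears as soft-thresholding.
        w = soft_threshold(w, lam)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return w  # penalty too strong: every entry was shrunk to zero
        v = [x / norm for x in w]
    return v


# Hypothetical toy covariance: variables 0 and 1 are strongly coupled,
# variable 2 is only weakly related to them.
cov = [
    [2.0, 1.8, 0.1],
    [1.8, 2.0, 0.1],
    [0.1, 0.1, 0.5],
]
component = sparse_leading_component(cov, lam=0.3)
```

On this toy input the weakly coupled third variable receives a loading of exactly zero, which is the kind of interpretability that sparse decompositions aim for.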
Altered functional and structural brain network organization in autism.
Structural and functional underconnectivity have been reported for multiple brain regions, functional systems, and white matter tracts in individuals with autism spectrum disorders (ASD). Although recent developments in complex network analysis have established that the brain is a modular network exhibiting small-world properties, network level organization has not been carefully examined in ASD. Here we used resting-state functional MRI (n = 42 ASD, n = 37 typically developing; TD) to show that children and adolescents with ASD display reduced short and long-range connectivity within functional systems (i.e., reduced functional integration) and stronger connectivity between functional systems (i.e., reduced functional segregation), particularly in default and higher-order visual regions. Using graph theoretical methods, we show that pairwise group differences in functional connectivity are reflected in network level reductions in modularity and clustering (local efficiency), but shorter characteristic path lengths (higher global efficiency). Structural networks, generated from diffusion tensor MRI derived fiber tracts (n = 51 ASD, n = 43 TD), displayed lower levels of white matter integrity yet higher numbers of fibers. TD and ASD individuals exhibited similar levels of correlation between raw measures of structural and functional connectivity (n = 35 ASD, n = 35 TD). However, a principal component analysis combining structural and functional network properties revealed that the balance of local and global efficiency between structural and functional networks was reduced in ASD, positively correlated with age, and inversely correlated with ASD symptom severity. Overall, our findings suggest that modeling the brain as a complex network will be highly informative in unraveling the biological basis of ASD and other neuropsychiatric disorders.
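Two of the graph measures named in this abstract, clustering and characteristic path length, can be illustrated on a toy graph. The following is a minimal sketch and not the authors' analysis pipeline: it assumes an unweighted, undirected graph stored as an adjacency dictionary, and the graph `toy` and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Mean shortest-path length over ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in shortest_path_lengths(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
toy = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
mean_clustering = average_clustering(toy)
path_length = characteristic_path_length(toy)
```

In this toy graph the two triangles give high clustering, while the single bridging edge keeps the average path short. Adding more bridging edges would shorten paths further but blur the two modules, which is the trade-off between segregation and integration that the abstract describes.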
Random forests with random projections of the output space for high dimensional multi-label classification
We adapt the idea of random projections applied to the output space, so as to
enhance tree-based ensemble methods in the context of multi-label
classification. We show how learning time complexity can be reduced without
affecting computational complexity and accuracy of predictions. We also show
that random output space projections may be used in order to reach different
bias-variance tradeoffs, over a broad panel of benchmark problems, and that
this may lead to improved accuracy while significantly reducing the
computational burden of the learning stage.
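One way to see why projecting the output space can be cheap without distorting tree growth is that variance-based split scores depend on the outputs only through squared distances, which Gaussian random projections approximately preserve. The sketch below checks this on synthetic labels. It assumes the split criterion is a sum of per-output variances, and the sizes `n`, `d`, `m`, the label density and all function names are illustrative choices, not values from the paper.

```python
import random


def total_variance(rows):
    """Sum over output dimensions of the empirical variance of the rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return sum((r[j] - means[j]) ** 2 for r in rows for j in range(d)) / n


def gaussian_projection(d, m, rng):
    """m x d matrix with N(0, 1/m) entries: squared norms are preserved in expectation."""
    scale = (1.0 / m) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(m)]


def project(rows, proj):
    """Map each d-dimensional output row to m dimensions."""
    return [[sum(p[j] * r[j] for j in range(len(r))) for p in proj] for r in rows]


rng = random.Random(0)
n, d, m = 60, 200, 50  # samples, labels, projected dimension (all illustrative)

# Synthetic sparse binary label matrix: each label is active with probability 0.05.
Y = [[1.0 if rng.random() < 0.05 else 0.0 for _ in range(d)] for _ in range(n)]
Z = project(Y, gaussian_projection(d, m, rng))

# Ratio near 1 means the projected outputs carry almost the same total variance.
ratio = total_variance(Z) / total_variance(Y)
```

Here the variance is computed over 50 projected dimensions instead of 200 labels, so each score evaluation touches a quarter of the output columns while the value stays close to the original.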