Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Supervised cross-modal factor analysis for multiple modal data classification
In this paper we study the problem of learning from multiple modal data for the
purpose of document classification. In this problem, each document is composed
of two different modalities of data, i.e., an image and a text. Cross-modal factor
analysis (CFA) has been proposed to project the two modalities of data to
a shared data space, so that the classification of an image or a text can be
performed directly in this space. A disadvantage of CFA is that it ignores
the supervision information. In this paper, we improve CFA by incorporating the
supervision information to represent and classify both the image and text modalities
of documents. We project both image and text data to a shared data space by factor
analysis, and then train a class label predictor in the shared space to use the
class label information. The factor analysis parameters and the predictor
parameters are learned jointly by solving a single objective function. With
this objective function, we minimize both the distance between the projections of
the image and text of the same document and the classification error of the
projections, measured by the hinge loss function. The objective function is optimized
by an alternating optimization strategy in an iterative algorithm. Experiments on
two different multiple modal document data sets show the advantage of the
proposed algorithm over other CFA methods.
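The joint objective described in this abstract (an alignment term between the image and text projections plus a hinge-loss classification term) can be illustrated with a minimal sketch. This is not the authors' formulation: the linear projections `U_img` and `U_txt`, the binary labels in {-1, +1}, the trade-off weight `C`, the bias `b`, and the choice to apply the hinge loss to both modalities are all assumptions made for illustration, and any constraints on the projection matrices are omitted.

```python
import numpy as np

def supervised_cfa_objective(X_img, X_txt, y, U_img, U_txt, w, b=0.0, C=1.0):
    """Evaluate a hypothetical joint objective: alignment distance plus hinge loss.

    X_img: (n, d_img) image features, X_txt: (n, d_txt) text features,
    y: (n,) labels in {-1, +1}, U_img: (d_img, k), U_txt: (d_txt, k),
    w: (k,) linear predictor in the shared space. All names are illustrative.
    """
    # Project both modalities into the shared k-dimensional space.
    Z_img = X_img @ U_img
    Z_txt = X_txt @ U_txt

    # Alignment term: squared distance between the two projections
    # of the same document, summed over documents.
    align = np.sum((Z_img - Z_txt) ** 2)

    # Classification term: hinge loss of a linear predictor on a projection.
    def hinge(Z):
        return np.sum(np.maximum(0.0, 1.0 - y * (Z @ w + b)))

    # C (assumed) trades off alignment against classification error.
    return align + C * (hinge(Z_img) + hinge(Z_txt))
```

In an alternating scheme of the kind the abstract mentions, one would hold `w` fixed while updating the projections and then hold the projections fixed while updating `w`; the function above only evaluates the quantity being minimized.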
On Geometric Alignment in Low Doubling Dimension
In the real world, many problems can be formulated as the alignment between two
geometric patterns. Previously, a great deal of research has focused on the
alignment of 2D or 3D patterns, especially in the field of computer vision.
Recently, the alignment of geometric patterns in high dimensions has found several
novel applications and has attracted increasing attention. However, the
research is still rather limited in terms of algorithms. To the best of our
knowledge, most existing approaches for high dimensional alignment are just
simple extensions of their counterparts for the 2D and 3D cases, and often suffer
from issues such as high complexity. In this paper, we propose an
effective framework to compress high dimensional geometric patterns while
approximately preserving the alignment quality. As a consequence, existing
alignment approaches can be applied to the compressed geometric patterns, and thus
the time complexity is significantly reduced. Our idea is inspired by the
observation that high dimensional data often has a low intrinsic dimension. We
adopt the widely used notion of "doubling dimension" to measure the extent of our
compression and the resulting approximation. Finally, we test our method on
both random and real datasets. The experimental results reveal that running the
alignment algorithm on compressed patterns achieves quality similar to that
obtained on the original patterns, while the running times
(including the time spent on compression) are substantially lower.
An interactome-centered protein discovery approach reveals novel components involved in mitosome function and homeostasis in Giardia lamblia
Protozoan parasites of the genus Giardia are highly prevalent globally, and infect a wide range of vertebrate hosts including humans, with proliferation and pathology restricted to the small intestine. This narrow ecological specialization entailed extensive structural and functional adaptations during host-parasite co-evolution. An example is the streamlined mitosomal proteome with iron-sulphur protein maturation as the only biochemical pathway clearly associated with this organelle. Here, we applied techniques in microscopy and protein biochemistry to investigate the mitosomal membrane proteome in association to mitosome homeostasis. Live cell imaging revealed a highly immobilized array of 30–40 physically distinct mitosome organelles in trophozoites. We provide direct evidence for the single giardial dynamin-related protein as a contributor to mitosomal morphogenesis and homeostasis. To overcome inherent limitations that have hitherto severely hampered the characterization of these unique organelles we applied a novel interaction-based proteome discovery strategy using forward and reverse protein co-immunoprecipitation. This allowed generation of organelle proteome data strictly in a protein-protein interaction context. We built an initial Tom40-centered outer membrane interactome by co-immunoprecipitation experiments, identifying small GTPases, factors with dual mitosome and endoplasmic reticulum (ER) distribution, as well as novel matrix proteins. Through iterative expansion of this protein-protein interaction network, we were able to i) significantly extend this interaction-based mitosomal proteome to include other membrane-associated proteins with possible roles in mitosome morphogenesis and connection to other subcellular compartments, and ii) identify novel matrix proteins which may shed light on mitosome-associated metabolic functions other than Fe-S cluster biogenesis. 
Functional analysis also revealed conceptual conservation of protein translocation despite the massive divergence and reduction of protein import machinery in Giardia mitosomes.