    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    Supervised cross-modal factor analysis for multiple modal data classification

    In this paper, we study the problem of learning from multimodal data for document classification. In this problem, each document is composed of two different modalities of data, i.e., an image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two modalities into a shared data space, so that an image or a text can be classified directly in this space. A disadvantage of CFA is that it ignores supervision information. In this paper, we improve CFA by incorporating supervision information to represent and classify both the image and text modalities of documents. We project both image and text data into a shared data space by factor analysis, and then train a class label predictor in the shared space to exploit the class label information. The factor analysis parameters and the predictor parameters are learned jointly by solving a single objective function. With this objective function, we minimize both the distance between the projections of the image and the text of the same document and the classification error of the projections, measured by the hinge loss function. The objective function is optimized by an alternating optimization strategy in an iterative algorithm. Experiments on two multimodal document data sets show the advantage of the proposed algorithm over other CFA methods.
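
    The joint objective described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' formulation: it assumes binary labels in {-1, +1}, linear projections, a linear predictor, and an unconstrained sum of the two terms. The names `supervised_cfa_objective`, `U_img`, `U_txt`, `w`, `b`, and the trade-off weight `C` are hypothetical, and any constraints the paper places on the projection matrices are omitted.

```python
import numpy as np


def supervised_cfa_objective(X_img, X_txt, y, U_img, U_txt, w, b=0.0, C=1.0):
    """Evaluate an illustrative supervised-CFA-style objective.

    All names and the exact form are assumptions made for this sketch.
    X_img: (n, d_img) image features; X_txt: (n, d_txt) text features.
    y: (n,) labels in {-1, +1}.
    U_img: (d_img, k) and U_txt: (d_txt, k) project into the shared space.
    w: (k,) linear predictor weights; b: bias; C: trade-off weight.
    """
    # Project each modality into the shared k-dimensional space.
    Z_img = X_img @ U_img
    Z_txt = X_txt @ U_txt

    # Alignment term: squared distance between the image and text
    # projections of the same document, summed over documents.
    alignment = np.sum((Z_img - Z_txt) ** 2)

    # Classification term: hinge loss of the shared predictor applied
    # to the projection of each modality.
    hinge_img = np.maximum(0.0, 1.0 - y * (Z_img @ w + b))
    hinge_txt = np.maximum(0.0, 1.0 - y * (Z_txt @ w + b))

    return float(alignment + C * (hinge_img.sum() + hinge_txt.sum()))


# Toy usage with random data (shapes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
n, d_img, d_txt, k = 6, 5, 4, 2
value = supervised_cfa_objective(
    rng.normal(size=(n, d_img)),
    rng.normal(size=(n, d_txt)),
    rng.choice([-1.0, 1.0], size=n),
    rng.normal(size=(d_img, k)),
    rng.normal(size=(d_txt, k)),
    rng.normal(size=k),
)
print(value)
```

    An alternating scheme of the kind the abstract mentions would hold the predictor fixed while updating the projections, then hold the projections fixed while updating the predictor, and repeat until the objective stops decreasing.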

    On Geometric Alignment in Low Doubling Dimension

    In the real world, many problems can be formulated as the alignment of two geometric patterns. Previously, a great deal of research focused on the alignment of 2D or 3D patterns, especially in the field of computer vision. Recently, the alignment of geometric patterns in high dimensions has found several novel applications and has attracted increasing attention. However, the research is still rather limited in terms of algorithms. To the best of our knowledge, most existing approaches for high-dimensional alignment are simple extensions of their counterparts for the 2D and 3D cases, and often suffer from issues such as high complexity. In this paper, we propose an effective framework to compress high-dimensional geometric patterns while approximately preserving the alignment quality. As a consequence, existing alignment approaches can be applied to the compressed geometric patterns, and the time complexity is thus significantly reduced. Our idea is inspired by the observation that high-dimensional data often has a low intrinsic dimension. We adopt the widely used notion of "doubling dimension" to measure the extent of our compression and the resulting approximation. Finally, we test our method on both random and real datasets; the experimental results reveal that running the alignment algorithm on the compressed patterns achieves quality similar to that obtained on the original patterns, while the running times (including the time spent on compression) are substantially lower.

    An interactome-centered protein discovery approach reveals novel components involved in mitosome function and homeostasis in Giardia lamblia

    Protozoan parasites of the genus Giardia are highly prevalent globally, and infect a wide range of vertebrate hosts including humans, with proliferation and pathology restricted to the small intestine. This narrow ecological specialization entailed extensive structural and functional adaptations during host-parasite co-evolution. An example is the streamlined mitosomal proteome, with iron-sulphur protein maturation as the only biochemical pathway clearly associated with this organelle. Here, we applied techniques in microscopy and protein biochemistry to investigate the mitosomal membrane proteome in relation to mitosome homeostasis. Live cell imaging revealed a highly immobilized array of 30–40 physically distinct mitosome organelles in trophozoites. We provide direct evidence for the single giardial dynamin-related protein as a contributor to mitosomal morphogenesis and homeostasis. To overcome inherent limitations that have hitherto severely hampered the characterization of these unique organelles, we applied a novel interaction-based proteome discovery strategy using forward and reverse protein co-immunoprecipitation. This allowed generation of organelle proteome data strictly in a protein-protein interaction context. We built an initial Tom40-centered outer membrane interactome by co-immunoprecipitation experiments, identifying small GTPases, factors with dual mitosome and endoplasmic reticulum (ER) distribution, as well as novel matrix proteins. Through iterative expansion of this protein-protein interaction network, we were able to i) significantly extend this interaction-based mitosomal proteome to include other membrane-associated proteins with possible roles in mitosome morphogenesis and connection to other subcellular compartments, and ii) identify novel matrix proteins which may shed light on mitosome-associated metabolic functions other than Fe-S cluster biogenesis. Functional analysis also revealed conceptual conservation of protein translocation despite the massive divergence and reduction of protein import machinery in Giardia mitosomes.