
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., the genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
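
    To make the five challenges concrete, here is a minimal "early integration" sketch in Python: two synthetic omics blocks are concatenated, missing values are imputed, the concatenated matrix is reduced with PCA, and a class-weighted classifier addresses imbalance. The data, block sizes, and library choice (scikit-learn) are illustration-only assumptions and are not taken from the review.

        import numpy as np
        from sklearn.pipeline import Pipeline
        from sklearn.impute import SimpleImputer              # missing data
        from sklearn.preprocessing import StandardScaler      # heterogeneous scales
        from sklearn.decomposition import PCA                 # curse of dimensionality
        from sklearn.linear_model import LogisticRegression   # class_weight for imbalance
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n = 100
        X_expr = rng.normal(size=(n, 2000))     # hypothetical transcriptome block
        X_meth = rng.normal(size=(n, 5000))     # hypothetical methylation block
        y = rng.integers(0, 2, size=n)          # labels (imbalanced in real studies)

        X = np.hstack([X_expr, X_meth])         # naive "early integration" by concatenation
        X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing values

        clf = Pipeline([
            ("impute", SimpleImputer(strategy="mean")),
            ("scale", StandardScaler()),
            ("reduce", PCA(n_components=20)),
            ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
        ])
        print(cross_val_score(clf, X, y, cv=5).mean())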

    Unsupervised Bayesian linear unmixing of gene expression microarrays

    Background: This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high-dimensional assays such as gene expression microarrays. The basis for uBLU is a Bayesian model in which the data samples are represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. A distinguishing feature of the proposed method is that uBLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Furthermore, it also provides estimates of the number of factors. A Gibbs sampling strategy is adopted here to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters. Results: First, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non-negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Second, we illustrate the application of uBLU on a real, time-evolving gene expression dataset from a recent viral challenge study in which individuals were inoculated with influenza A/H3N2/Wisconsin. We show that the uBLU method significantly outperforms the other methods on the simulated and real datasets considered here. Conclusions: The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method when compared to other factor decomposition methods from the literature (PCA, NMF, BFRM, and GB-GMF). The uBLU method identifies an inflammatory component closely associated with clinical symptom scores collected during the study. Using a constrained model allows recovery of all the inflammatory genes in a single factor.
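
    The core constraints of uBLU, non-negative signatures and factor scores on the probability simplex, can be illustrated with a small synthetic example. The alternating non-negative least-squares update below is only a crude deterministic surrogate for the paper's Gibbs sampler (and it does not estimate the number of factors); the dimensions and noise level are arbitrary assumptions.

        import numpy as np
        from scipy.optimize import nnls

        rng = np.random.default_rng(1)
        G, N, R = 200, 60, 3                              # genes, samples, factors

        M_true = rng.gamma(2.0, 1.0, size=(G, R))         # non-negative gene signatures
        A_true = rng.dirichlet(np.ones(R), size=N).T      # factor scores: columns on the simplex
        X = M_true @ A_true + 0.05 * rng.normal(size=(G, N))

        M = rng.random((G, R))
        A = np.full((R, N), 1.0 / R)
        for _ in range(30):
            # factor scores: per-sample non-negative least squares, renormalized to the simplex
            for j in range(N):
                a, _ = nnls(M, X[:, j])
                A[:, j] = a / max(a.sum(), 1e-12)
            # signatures: per-gene non-negative least squares
            for g in range(G):
                m, _ = nnls(A.T, X[g, :])
                M[g, :] = m

        print("reconstruction RMSE:", np.sqrt(np.mean((X - M @ A) ** 2)))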

    Mixtures of Common Skew-t Factor Analyzers

    A mixture of common skew-t factor analyzers model is introduced for model-based clustering of high-dimensional data. By assuming common component factor loadings, this model allows clustering to be performed in the presence of a large number of mixture components or when the number of dimensions is too large to be well modelled by the mixtures of factor analyzers model or a variant thereof. Furthermore, assuming that the component densities follow a skew-t distribution allows robust clustering of skewed data. The alternating expectation-conditional maximization algorithm is employed for parameter estimation. We demonstrate excellent clustering performance when our model is applied to real and simulated data. This paper marks the first time that skewed common factors have been used.
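
    The main appeal of common loadings is parsimony. The rough parameter count below (identifiability corrections, and the skewness and degrees-of-freedom parameters of the skew-t components, are ignored) shows how the covariance structure of a common-loadings model grows far more slowly with the dimension p than either unrestricted mixtures or component-specific factor analyzers; the chosen values of p, q, and g are arbitrary.

        def covariance_params(p, q, g):
            """Rough covariance-structure parameter counts (constants ignored)."""
            full_gmm = g * p * (p + 1) // 2                  # unrestricted component covariances
            mfa = g * (p * q + p)                            # component-specific loadings + uniquenesses
            common = p * q + p + g * (q + q * (q + 1) // 2)  # shared loadings; component factor means/covariances
            return full_gmm, mfa, common

        for p in (50, 500, 5000):
            print(p, covariance_params(p, q=5, g=4))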

    RGS14(414)-mediated prevention of episodic memory loss: a study of the molecular mechanism

    A large proportion of the human population suffers memory impairments, caused either by normal aging or by diverse neurological and neurodegenerative diseases. Memory enhancers and other drugs tested so far against memory loss have failed to show therapeutic efficacy in clinical trials, and thus there is a need to find a remedy for this disorder. In the search for a cure for memory loss, our laboratory discovered a robust memory enhancer called RGS14(414). Treating the brain with its gene produces an enduring effect on memory that lasts for the lifetime of rats. The current thesis work was therefore designed to investigate whether RGS14(414) treatment can prevent memory loss and, furthermore, to explore the biological processes responsible for RGS-mediated memory enhancement. We found that RGS14(414) gene treatment prevented episodic memory loss in rodent models of normal aging and Alzheimer's disease. Memory loss was observed in normal rats at 18 months of age; however, when the rats were treated with the RGS14(414) gene at 3 months of age, this deficit was abrogated and their memory remained intact until the age of 22 months. The treatment produced a similar effect in a mouse model of Alzheimer's disease (AD mice). AD mice treated with the RGS14(414) gene at 2 months of age, a period when memory was still intact, not only showed prevention of the memory loss otherwise observed at 4 months of age but also maintained normal memory 6 months after the treatment. We posit that the long-lasting memory enhancement and prevention of memory loss mediated by RGS14(414) might be due to a permanent structural change caused by a surge in neuronal connections and enhanced neuronal remodeling, key processes for long-term memory formation. A neuronal arborization analysis of both pyramidal and non-pyramidal neurons in the brains of RGS14(414)-treated rats showed a robust rise in neurite outgrowth in both kinds of cells and an increase in the number of branches from the apical dendrites of pyramidal neurons, reaching almost three times that of control animals. To further understand the underlying mechanism by which RGS14(414) induces neuronal arborization, we investigated neurotrophic factors. We observed that RGS14 treatment induces a selective increase in BDNF. The role of BDNF in neuronal arborization, as well as its implication in learning and memory processes, is well described. In addition, our results showing a dynamic expression pattern of BDNF during ORM processing that overlapped with memory consolidation further support the implication of this neurotrophin in the formation of long-term memory in RGS-treated animals. Moreover, in expression profiling studies of RGS-treated animals, we demonstrated that the 14-3-3ζ protein displays a coherent relationship to RGS-mediated ORM enhancement. Recent studies have shown that the interaction of receptor for activated C kinase 1 (RACK1) with 14-3-3ζ is essential for its nuclear translocation, where the RACK1-14-3-3ζ complex binds at the promoter IV region of BDNF and promotes an increase in BDNF gene transcription. These observations suggest that 14-3-3ζ might regulate the elevated level of BDNF seen in RGS14(414) gene-treated animals. It therefore seems that the RGS-mediated surge in 14-3-3ζ causes the elevated BDNF synthesis needed for neuronal arborization and enhanced ORM. The prevention of memory loss might be mediated through a restoration of BDNF and 14-3-3ζ protein levels, which are significantly decreased in aging and Alzheimer's disease. Additionally, our results suggest that RGS14(414) treatment could be a viable strategy against episodic memory loss.

    Integrated smoothed location model and data reduction approaches for multi variables classification

    The Smoothed Location Model is a classification rule that handles mixtures of continuous and binary variables simultaneously. This rule discriminates between groups in a parametric form using the conditional distribution of the continuous variables given each pattern of the binary variables. To conduct a practical classification analysis, the objects must first be sorted into the cells of a multinomial table generated from the binary variables; the parameters in each cell are then estimated using the sorted objects. However, in many situations the estimated parameters are poor if the number of binary variables is large relative to the sample size. A large number of binary variables creates many empty multinomial cells, leading to a severe sparsity problem and, ultimately, exceedingly poor performance of the constructed rule. In the worst case, the rule cannot be constructed at all. To overcome these shortcomings, this study proposes new strategies to extract adequate variables that contribute to optimum performance of the rule. Combinations of two extraction techniques are introduced, namely 2PCA and PCA+MCA, with new cutpoints for the eigenvalue and total variance explained, to determine adequate extracted variables that lead to a minimum misclassification rate. The outcomes of these extraction techniques are used to construct smoothed location models, producing two new classification approaches called 2PCALM and 2DLM. Numerical evidence from simulation studies demonstrates no significant difference in misclassification rate between the extraction techniques for normal and non-normal data. Nevertheless, both proposed approaches are slightly affected by non-normal data and severely affected by highly overlapping groups. Investigations on some real data sets show that the two approaches are competitive with, and better than, other existing classification methods. The overall findings reveal that both proposed approaches can be considered improvements to the location model and alternatives to other classification methods, particularly in handling mixed variables with a large number of binary variables.
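
    For readers unfamiliar with the location model, the bare-bones (unsmoothed) baseline it improves on can be sketched as follows: objects are sorted into multinomial cells defined by their binary pattern, a Gaussian is fitted to the continuous variables within each (group, cell), and a new object is assigned to the group with the highest joint density. The smoothing of sparse cells and the 2PCA/PCA+MCA extraction steps proposed in the thesis are not shown here, and the data and dimensions are synthetic assumptions.

        import numpy as np
        from scipy.stats import multivariate_normal

        def fit(Xc, Xb, y):
            """Fit a Gaussian to the continuous variables in each (group, binary cell)."""
            params = {}
            for g in np.unique(y):
                for cell in {tuple(row) for row in Xb[y == g]}:
                    mask = (y == g) & (Xb == cell).all(axis=1)
                    Z = Xc[mask]
                    if len(Z) > 1:  # skip cells too sparse to estimate (no smoothing here)
                        cov = np.cov(Z.T) + 1e-6 * np.eye(Xc.shape[1])
                        params[(g, cell)] = (mask.mean(), Z.mean(0), cov)
            return params

        def predict(params, xc, xb):
            """Assign to the group maximizing cell probability times conditional density."""
            scores = {g: prior * multivariate_normal.pdf(xc, mu, cov)
                      for (g, cell), (prior, mu, cov) in params.items() if cell == tuple(xb)}
            return max(scores, key=scores.get) if scores else None

        rng = np.random.default_rng(2)
        n = 300
        y = rng.integers(0, 2, n)
        Xb = rng.integers(0, 2, (n, 2))            # two binary variables -> four cells
        Xc = rng.normal(size=(n, 3)) + y[:, None]  # continuous variables shifted by group

        params = fit(Xc, Xb, y)
        print(predict(params, Xc[0], Xb[0]), y[0])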

    A primer on correlation-based dimension reduction methods for multi-omics analysis

    The continuing advances of omic technologies mean that it is now feasible to measure the numerous features that collectively reflect the molecular properties of a sample. When multiple omic methods are used, statistical and computational approaches can exploit these large, connected profiles. Multi-omics is the integration of different omic data sources from the same biological sample. In this review, we focus on correlation-based dimension reduction approaches for single omic datasets, followed by methods for pairs of omic datasets, before detailing further techniques for three or more omic datasets. We also briefly describe network-based methods, which are applicable when three or more omic datasets are available and which complement correlation-oriented tools. To aid readers new to this area, all of these methods are linked to relevant R packages that implement them. Finally, we discuss scenarios of experimental design and present road maps that simplify the selection of appropriate analysis methods. This review will help researchers navigate the emerging methods for multi-omics, integrate diverse omic datasets appropriately, and embrace the opportunities of population multi-omics.
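
    As a minimal illustration of the pair-of-omics case, the sketch below runs classical canonical correlation analysis on two synthetic blocks that share a low-dimensional signal. scikit-learn's CCA is used purely for convenience; it is not one of the R packages the review recommends, and the data are simulated.

        import numpy as np
        from sklearn.cross_decomposition import CCA

        rng = np.random.default_rng(3)
        n = 200
        latent = rng.normal(size=(n, 2))                                  # shared biological signal
        X = latent @ rng.normal(size=(2, 50)) + rng.normal(size=(n, 50))  # e.g. transcriptome block
        Y = latent @ rng.normal(size=(2, 30)) + rng.normal(size=(n, 30))  # e.g. metabolome block

        cca = CCA(n_components=2)
        X_scores, Y_scores = cca.fit_transform(X, Y)
        for k in range(2):
            r = np.corrcoef(X_scores[:, k], Y_scores[:, k])[0, 1]
            print(f"canonical correlation {k + 1}: {r:.2f}")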