37,365 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Structured penalized regression for drug sensitivity prediction

    Full text link
    Large-scale {\it in vitro} drug sensitivity screens are an important tool in personalized oncology to predict the effectiveness of potential cancer drugs. The prediction of the sensitivity of cancer cell lines to a panel of drugs is a multivariate regression problem with high-dimensional heterogeneous multi-omics data as input data and with potentially strong correlations between the outcome variables which represent the sensitivity to the different drugs. We propose a joint penalized regression approach with structured penalty terms which allow us to utilize the correlation structure between drugs with group-lasso-type penalties and at the same time address the heterogeneity between omics data sources by introducing data-source-specific penalty factors to penalize different data sources differently. By combining integrative penalty factors (IPF) with tree-guided group lasso, we create the IPF-tree-lasso method. We present a unified framework to transform more general IPF-type methods to the original penalized method. Because the structured penalty terms have multiple parameters, we demonstrate how the interval-search Efficient Parameter Selection via Global Optimization (EPSGO) algorithm can be used to optimize multiple penalty parameters efficiently. Simulation studies show that IPF-tree-lasso can improve the prediction performance compared to other lasso-type methods, in particular for heterogenous data sources. Finally, we employ the new methods to analyse data from the Genomics of Drug Sensitivity in Cancer project.Comment: Zhao Z, Zucknick M (2020). Structured penalized regression for drug sensitivity prediction. Journal of the Royal Statistical Society, Series C. 19 pages, 6 figures and 2 table

    A Selective Review of Group Selection in High-Dimensional Models

    Full text link
    Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study.Comment: Published in at http://dx.doi.org/10.1214/12-STS392 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Aging concrete structures: a review of mechanics and concepts

    Get PDF
    The safe and cost-efficient management of our built infrastructure is a challenging task considering the expected service life of at least 50 years. In spite of time-dependent changes in material properties, deterioration processes and changing demand by society, the structures need to satisfy many technical requirements related to serviceability, durability, sustainability and bearing capacity. This review paper summarizes the challenges associated with the safe design and maintenance of aging concrete structures and gives an overview of some concepts and approaches that are being developed to address these challenges

    An update on statistical boosting in biomedicine

    Get PDF
    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine
    corecore