
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes

    Approximate dynamic programming has been used successfully in a large variety of domains, but it relies on a small set of provided approximation features to calculate solutions reliably. Large and rich sets of features can cause existing algorithms to overfit because of a limited number of samples. We address this shortcoming using L1 regularization in approximate linear programming. Because the proposed method can automatically select the appropriate richness of features, its performance does not degrade with an increasing number of features. These results rely on new and stronger sampling bounds for regularized approximate linear programs. We also propose a computationally efficient homotopy method. The empirical evaluation of the approach shows that the proposed method performs well on simple MDPs and standard benchmark problems. Comment: Technical report corresponding to the ICML 2010 submission of the same name.
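
    The effect this abstract relies on, L1 regularization driving the weights of unneeded features to exactly zero, can be illustrated outside the MDP setting with a plain lasso regression. This is not the paper's approximate-linear-programming formulation or its homotopy method, just a minimal coordinate-descent sketch of the soft-thresholding mechanism; the data and parameters are invented for the illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5*||y - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            # partial residual that excludes feature j's current contribution
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r
            # soft-thresholding: features with weak signal collapse to exactly 0
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                       # 10 features, 2 relevant
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.01 * rng.normal(size=100)
w = lasso_cd(X, y, lam=10.0)
# the eight irrelevant features receive weight exactly zero
```

Because irrelevant features end up at exactly zero rather than merely small, adding more of them does not perturb the fit, which is the qualitative behaviour the abstract claims for the regularized approximate linear program.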

    Debiased Bayesian inference for average treatment effects

    Bayesian approaches have become increasingly popular in causal inference problems due to their conceptual simplicity, excellent performance and in-built uncertainty quantification ('posterior credible sets'). We investigate Bayesian inference for average treatment effects from observational data, which is a challenging problem due to the missing counterfactuals and selection bias. Working in the standard potential outcomes framework, we propose a data-driven modification to an arbitrary (nonparametric) prior based on the propensity score that corrects for the first-order posterior bias, thereby improving performance. We illustrate our method for Gaussian process (GP) priors using (semi-)synthetic data. Our experiments demonstrate significant improvement in both estimation accuracy and uncertainty quantification compared to the unmodified GP, rendering our approach highly competitive with the state of the art. Comment: NeurIPS 201
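
    The correction above is built on the propensity score. A hedged illustration of why the propensity score matters for average treatment effects is plain inverse-propensity weighting on synthetic data; this is not the paper's prior-modification method, and every quantity below is made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)                       # confounder
e = 1.0 / (1.0 + np.exp(-x))                 # true propensity score P(T=1|x)
t = rng.random(n) < e                        # treatment assignment
y = 2.0 * t + x + rng.normal(size=n)         # outcome; true ATE = 2

# Naive difference in means is biased upward: treated units have larger x
naive = y[t].mean() - y[~t].mean()

# Inverse-propensity weighting removes the selection bias
ipw = np.mean(t * y / e - (~t) * y / (1.0 - e))
```

Here the naive contrast overstates the effect because treatment probability rises with the confounder, while the weighted estimator recovers the true value of 2; the paper's contribution is to build this kind of correction into the prior itself.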

    Partial mixture model for tight clustering of gene expression time-course

    Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, little work in the literature is dedicated to this area of research. On the other hand, there has been extensive use of maximum likelihood techniques for model parameter estimation. By contrast, the minimum distance estimator has been largely ignored. Results: In this paper we show the inherent robustness of the minimum distance estimator that makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, a partial mixture model that can naturally incorporate replicate information and allow scattered genes is formulated. We provide experimental results of simulated data fitting, where the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores top in Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms. Conclusion: For the first time, the partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm proves the suitability of the combination of the partial mixture model and the minimum distance estimator in this field. We show that tight clustering is not only capable of generating a more profound understanding of the dataset under study, in accordance with established biological knowledge, but also presents interesting new hypotheses during interpretation of clustering results. In particular, we provide biological evidence that scattered genes can be relevant and are interesting subjects for study, in contrast to prevailing opinion.
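
    The robustness the abstract attributes to the minimum distance estimator can be seen in a one-dimensional toy. Under an L2 distance between a normal model density and the data, the location estimate reduces (up to constants) to maximising the summed model density over the observations, which makes outlying "scattered" points nearly irrelevant. This grid-search sketch with known scale and invented data is far simpler than the paper's partial mixture model.

```python
import numpy as np

rng = np.random.default_rng(2)
# 90% of points from N(0, 1), 10% "scattered" outliers from N(10, 1)
x = np.concatenate([rng.normal(0.0, 1.0, 450), rng.normal(10.0, 1.0, 50)])

def l2e_mean(x, sigma=1.0):
    """L2 minimum-distance estimate of a normal location with known sigma:
    maximise the summed model density at the data, here by grid search."""
    grid = np.linspace(x.min(), x.max(), 2001)
    crit = [np.exp(-0.5 * ((x - m) / sigma) ** 2).sum() for m in grid]
    return grid[int(np.argmax(crit))]

mle = x.mean()        # maximum likelihood: dragged toward the outliers
mde = l2e_mean(x)     # minimum distance: essentially ignores them
```

The sample mean lands near 1.0 because the outliers pull it, while the minimum distance estimate stays near the main component at 0, which is the behaviour that lets scattered genes be excluded rather than absorbed.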

    On the hedonic modelling of land prices

    In this study hedonic modelling methods beyond the ordinary least squares estimator are investigated in explaining and predicting land prices in two submarkets (Espoo and Nurmijärvi) of the Finnish land markets. The first paper deals with the estimation of several parametric hedonic models, including dynamic responses, using a recursive estimation technique. The second paper examines the applicability of semiparametric structural time series methods to the optimal estimation of spatio-temporal movements of land prices. The third paper focuses on robust nonparametric estimation using a local polynomial modelling approach in explaining and predicting land prices. The fourth paper investigates flexible wavelet transforms in the estimation of long-run temporal land price movements (cycles and trends). The fifth and final paper uses a robust parametric estimator, the three-stage MM-estimator, to explicitly address the problem of outlying and influential data points. The key observation of this study is that there is much scope for methods beyond the ordinary least squares estimator in explaining and predicting land prices in local markets. This is especially true in the submarket of Espoo, where the unconventional methods of the study showed that significant improvements could be achieved in hedonic models' explanatory power and/or predictive validity when the methods of this research are used instead of the orthodox least squares estimator. In the Espoo case, structural time series models, local polynomial regression and robust MM-estimation all generated more precise results in terms of post-sample prediction power than the conventional least squares estimator. The empirical experimentation quite strongly indicated that the determination of land prices in the municipality of Nurmijärvi could best be explained by the use of unobserved component models.
The flexible local polynomial modelling and three-stage MM-estimation surprisingly added no value in terms of greater post-sample precision in the Nurmijärvi case.
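
    The role of the robust estimator in the final paper can be sketched with a simpler relative: M-estimation by iteratively reweighted least squares with Huber weights. A true MM-estimator adds a high-breakdown S-estimation stage and a redescending loss; this minimal version on invented data only shows why downweighting outlying observations protects the fitted hedonic coefficients.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 100)
y = 2.0 * x + rng.normal(0.0, 0.5, 100)   # true slope 2, intercept 0
y[:10] += 40.0                            # gross outliers at the low end

X = np.column_stack([np.ones_like(x), x])

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def huber_irls(X, y, c=1.345, n_iter=50):
    """M-estimation via iteratively reweighted least squares with Huber
    weights; a simpler relative of the three-stage MM-estimator."""
    b = ols(X, y)
    for _ in range(n_iter):
        r = y - X @ b
        s = np.median(np.abs(r)) / 0.6745 + 1e-12   # robust residual scale
        u = np.abs(r / s)
        w = np.where(u <= c, 1.0, c / u)            # Huber downweighting
        sw = np.sqrt(w)
        b = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return b

b_ols = ols(X, y)        # intercept and slope badly distorted by outliers
b_rob = huber_irls(X, y) # close to the true (0, 2)
```

Ten contaminated points are enough to wreck the least squares fit, while the reweighted fit stays near the true coefficients; this is the kind of protection the abstract reports for the Espoo price data.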

    Computational statistics using the Bayesian Inference Engine

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimised software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organise and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasises hybrid tempered MCMC schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialisation system that stores the full byte-level image of the running inference and previously characterised posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible, object-oriented framework that implements every aspect of Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU GPL. Comment: Resubmitted version.
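
    The posterior representation the abstract mentions rests on kernel density estimation. A one-dimensional Gaussian-kernel sketch, with synthetic draws standing in for MCMC output, conveys the idea; the BIE's metric-ball-tree machinery is what makes the same computation tractable in high dimensions, and nothing below comes from the BIE codebase.

```python
import numpy as np

rng = np.random.default_rng(4)
post = rng.normal(1.0, 0.5, size=5000)   # stand-in for MCMC posterior draws

def gaussian_kde(grid, samples, h):
    """Evaluate a Gaussian kernel density estimate of the samples on a grid."""
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z * z).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

grid = np.linspace(-1.0, 3.0, 401)
h = 1.06 * post.std() * len(post) ** (-1 / 5)   # Silverman's rule of thumb
dens = gaussian_kde(grid, post, h)
mode = grid[int(np.argmax(dens))]               # recovers the posterior mode
```

Once the draws are summarised as a smooth density, the posterior can be stored, reused as a prior in a later Bayesian update, or fed to marginal-likelihood calculations, which is exactly the reuse of expensive derived data the BIE is built around.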

    Bayesian Design in Clinical Trials

    In the last decade, the number of clinical trials using Bayesian methods has grown dramatically. Nowadays, regulatory authorities appear to be more receptive to Bayesian methods than ever. The Bayesian methodology is well suited to address the issues arising in the planning, analysis, and conduct of clinical trials. Due to their flexibility, Bayesian design methods based on the accrued data of ongoing trials have been recommended by both the US Food and Drug Administration and the European Medicines Agency for dose-response trials in early clinical development. A distinctive feature of the Bayesian approach is its ability to deal with external information, such as historical data, findings from previous studies and expert opinions, through prior elicitation. In fact, it provides a framework for embedding and handling the variability of auxiliary information within the planning and analysis of the study. A growing body of literature examines the use of historical data to augment newly collected data, especially in clinical trials where patients are difficult to recruit, which is the case for rare diseases, for example. Many works explore how this can be done properly, since using historical data has been recognized as less controversial than eliciting prior information from experts' opinions. In this book, applications of Bayesian design in the planning and analysis of clinical trials are introduced, along with methodological contributions to specific topics of Bayesian statistics. Finally, two reviews regarding the state of the art of the Bayesian approach in clinical trials are presented.
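
    The borrowing of historical data described above is often formalised with a power prior, which raises the historical likelihood to a discount exponent a0 in [0, 1]. For a normal mean with known variance and a flat initial prior this has a closed form, sketched below; the data and a0 values are invented, and a real trial would choose or estimate a0 with care.

```python
import numpy as np

def power_prior_posterior(y_new, y_hist, a0, sigma=1.0):
    """Posterior for a normal mean (known sigma, flat initial prior) under a
    power prior: the historical sample counts as a0 * n_hist observations."""
    n1, n0 = len(y_new), len(y_hist)
    prec = (n1 + a0 * n0) / sigma**2                               # precision
    mean = (n1 * np.mean(y_new) + a0 * n0 * np.mean(y_hist)) / (n1 + a0 * n0)
    return mean, 1.0 / np.sqrt(prec)                               # mean, sd

y_hist = np.array([1.0] * 50)   # historical trial, observed mean 1.0
y_new = np.array([2.0] * 10)    # small new trial, observed mean 2.0

m_full, _ = power_prior_posterior(y_new, y_hist, a0=1.0)  # pool everything
m_none, _ = power_prior_posterior(y_new, y_hist, a0=0.5)  # partial borrowing
```

Sliding a0 from 0 to 1 moves the posterior mean continuously from the new-data estimate toward the pooled estimate while shrinking the posterior spread, which is the trade-off between borrowing strength and protecting against prior-data conflict that the literature on historical controls studies.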