66,066 research outputs found

    Sequential Design for Ranking Response Surfaces

    Full text link
    We propose and analyze sequential design methods for the problem of ranking several response surfaces. Namely, given L≥2L \ge 2 response surfaces over a continuous input space X\cal X, the aim is to efficiently find the index of the minimal response across the entire X\cal X. The response surfaces are not known and have to be noisily sampled one-at-a-time. This setting is motivated by stochastic control applications and requires joint experimental design both in space and response-index dimensions. To generate sequential design heuristics we investigate stepwise uncertainty reduction approaches, as well as sampling based on posterior classification complexity. We also make connections between our continuous-input formulation and the discrete framework of pure regret in multi-armed bandits. To model the response surfaces we utilize kriging surrogates. Several numerical examples using both synthetic data and an epidemics control problem are provided to illustrate our approach and the efficacy of respective adaptive designs.Comment: 26 pages, 7 figures (updated several sections and figures

    Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps

    Full text link
    Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways. We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our "pathways group lasso with adaptive weights" (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets. In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small.Comment: 29 page

    How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition

    Full text link
    Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. This paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition based on our experience. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, they also provide more detailed analysis of individual sub-questions including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment.Comment: 36 page

    Bayesian analysis of ranking data with the constrained Extended Plackett-Luce model

    Get PDF
    Multistage ranking models, including the popular Plackett-Luce distribution (PL), rely on the assumption that the ranking process is performed sequentially, by assigning the positions from the top to the bottom one (forward order). A recent contribution to the ranking literature relaxed this assumption with the addition of the discrete-valued reference order parameter, yielding the novel Extended Plackett-Luce model (EPL). Inference on the EPL and its generalization into a finite mixture framework was originally addressed from the frequentist perspective. In this work, we propose the Bayesian estimation of the EPL with order constraints on the reference order parameter. The proposed restrictions reflect a meaningful rank assignment process. By combining the restrictions with the data augmentation strategy and the conjugacy of the Gamma prior distribution with the EPL, we facilitate the construction of a tuned joint Metropolis-Hastings algorithm within Gibbs sampling to simulate from the posterior distribution. The Bayesian approach allows to address more efficiently the inference on the additional discrete-valued parameter and the assessment of its estimation uncertainty. The usefulness of the proposal is illustrated with applications to simulated and real datasets.Comment: 20 pages, 4 figures, 4 tables. arXiv admin note: substantial text overlap with arXiv:1803.0288
    • …
    corecore