4 research outputs found

    How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition

    Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. This paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition based on our experience. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, they also provide more detailed analysis of individual sub-questions including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment.
    Comment: 36 pages

    Quantifying Similarity in Reliability Surfaces Using the Probability of Agreement

    When separate populations exhibit similar reliability as a function of multiple explanatory variables, combining them into a single population is tempting. This can simplify future predictions and reduce uncertainty associated with estimation. However, combining these populations may introduce bias if the underlying relationships are in fact different. The probability of agreement formally and intuitively quantifies the similarity of estimated reliability surfaces across a two-factor input space. An example from the reliability literature demonstrates the utility of the approach when deciding whether to combine two populations or to keep them as distinct. New graphical summaries provide strategies for visualizing the results.

    Comparing the Reliability of Related Populations With the Probability of Agreement

    Combining information from different populations to improve precision, simplify future predictions, or improve underlying understanding of relationships can be advantageous when considering the reliability of several related sets of systems. Using the probability of agreement to help quantify the similarities of populations can help to give a realistic assessment of whether the systems have reliabilities that are sufficiently similar for practical purposes to be treated as a homogeneous population. The new method is described and illustrated with an example involving two generations of a complex system, where the reliability is modeled using either a logistic or probit regression model. Note that supplementary materials including code, datasets, and added discussion are available online.