
    Using simulation studies to evaluate statistical methods

    Simulation studies are computer experiments that involve creating data by pseudorandom sampling. The key strength of simulation studies is the ability to understand the behaviour of statistical methods because some 'truth' (usually some parameter/s of interest) is known from the process of generating the data. This allows us to consider properties of methods, such as bias. While widely used, simulation studies are often poorly designed, analysed and reported. This tutorial outlines the rationale for using simulation studies and offers guidance for design, execution, analysis, reporting and presentation. In particular, this tutorial provides: a structured approach for planning and reporting simulation studies, which involves defining aims, data-generating mechanisms, estimands, methods and performance measures ('ADEMP'); coherent terminology for simulation studies; guidance on coding simulation studies; a critical discussion of key performance measures and their estimation; guidance on structuring tabular and graphical presentation of results; and new graphical presentations. With a view to describing recent practice, we review 100 articles taken from Volume 34 of Statistics in Medicine that included at least one simulation study and identify areas for improvement. (Comment: 31 pages, 9 figures (2 in appendix), 8 tables (1 in appendix).)
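    The ADEMP ingredients listed in the abstract can be made concrete with a toy Monte Carlo experiment (my own illustration, not code from the tutorial): the data-generating mechanism is standard-normal sampling, the estimand is the variance (true value 1), the method is the "divide by n" variance estimator, and the performance measure is bias.

```python
import random
import statistics

random.seed(42)

n_sim, n_obs, true_var = 2000, 10, 1.0  # repetitions, sample size, known 'truth'
estimates = []
for _ in range(n_sim):
    # Data-generating mechanism: n_obs draws from Normal(0, 1).
    sample = [random.gauss(0, 1) for _ in range(n_obs)]
    mean = sum(sample) / n_obs
    # Method under study: the biased "divide by n" variance estimator.
    estimates.append(sum((x - mean) ** 2 for x in sample) / n_obs)

# Performance measure: bias = mean(estimates) - true value.
# Theory gives E[estimate] = (n-1)/n * sigma^2, so bias should be near -0.1.
bias = statistics.mean(estimates) - true_var
print(f"estimated bias: {bias:.3f}")
```

    Because the truth is known by construction, the discrepancy between the average estimate and 1.0 directly estimates the method's bias, which is exactly the kind of property the tutorial says simulation studies let us measure.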

    Statistical Testing of Optimality Conditions in Multiresponse Simulation-based Optimization (Revision of 2005-81)

    This paper studies simulation-based optimization with multiple outputs. It assumes that the simulation model has one random objective function and must satisfy given constraints on the other random outputs. It presents a statistical procedure for testing whether a specific input combination (proposed by some optimization heuristic) satisfies the Karush-Kuhn-Tucker (KKT) first-order optimality conditions. The paper focuses on "expensive" simulations, which have small sample sizes. The paper applies the classic t test to check whether the specific input combination is feasible, and whether any constraints are binding; it applies bootstrapping (resampling) to test the estimated gradients in the KKT conditions. The new methodology is applied to three examples, which give encouraging empirical results. (Keywords: stopping rule; metaheuristics; response surface methodology; design of experiments.)
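    The resampling idea behind the gradient test can be sketched as follows. This is a generic percentile bootstrap on a small set of hypothetical gradient replications, not the paper's exact KKT procedure: it simply asks whether zero is a plausible value for the mean gradient given the sampling noise.

```python
import random

random.seed(0)

# Hypothetical replicated estimates of one gradient component at a
# candidate optimum (small sample, as in "expensive" simulation).
grad_samples = [0.8, 1.1, 0.6, 1.3, 0.9, 1.0, 0.7, 1.2]

# Percentile bootstrap: resample with replacement, record the mean each time.
n_boot = 5000
boot_means = []
for _ in range(n_boot):
    resample = [random.choice(grad_samples) for _ in grad_samples]
    boot_means.append(sum(resample) / len(resample))
boot_means.sort()

# 90% bootstrap interval for the mean gradient.
lo, hi = boot_means[int(0.05 * n_boot)], boot_means[int(0.95 * n_boot)]
# A first-order stationarity check fails if zero lies outside the interval.
zero_gradient_plausible = lo <= 0.0 <= hi
print(f"90% interval: [{lo:.2f}, {hi:.2f}], zero plausible: {zero_gradient_plausible}")
```

    Here all replications are clearly positive, so the interval excludes zero and the stationarity hypothesis would be rejected; the paper's actual test works with the full vector KKT conditions rather than a single component.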

    09181 Abstracts Collection -- Sampling-based Optimization in the Presence of Uncertainty

    This Dagstuhl seminar brought together researchers from statistical ranking and selection; experimental design and response-surface modeling; stochastic programming; approximate dynamic programming; optimal learning; and the design and analysis of computer experiments, with the goals of attaining a much better mutual understanding of the commonalities and differences among the various approaches to sampling-based optimization, and of taking first steps toward an overarching theory encompassing many of the topics above.

    The Development of Statistical Computing at Rothamsted

    An account is given of the development of statistical computing at Rothamsted. It is concerned mainly with the period from 1954 (when the first electronic computer was delivered) until 1985 (when this article was written). Initially, many specialised programs were written, but it was soon realised that, for efficiency, general-purpose programs—each unifying many statistical techniques—were required. The development of these programs was gradual and required corresponding developments in statistical theory. Now, the bulk of statistical work, not only for Rothamsted but also for the Agricultural and Food Research Service (AFRS) as a whole, is covered by a few programs, notably Genstat, which has an international market. Further developments of these programs are required to make them more accessible to scientists who are not well versed in statistics and to take advantage of technological advances.

    Origins of Modern Data Analysis Linked to the Beginnings and Early Development of Computer Science and Information Engineering

    The history of data analysis that is addressed here is underpinned by two themes: tabular data analysis, and the analysis of collected heterogeneous data. "Exploratory data analysis" is taken as the heuristic approach that begins with data and information and seeks underlying explanation for what is observed or measured. I also cover some of the evolving context of research and applications, including scholarly publishing, technology transfer and the economic relationship of the university to society. (Comment: 26 pages.)

    Developing a comprehensive framework for multimodal feature extraction

    Feature extraction is a critical component of many applied data science workflows. In recent years, rapid advances in artificial intelligence and machine learning have led to an explosion of feature extraction tools and services that allow data scientists to cheaply and effectively annotate their data along a vast array of dimensions, ranging from detecting faces in images to analyzing the sentiment expressed in coherent text. Unfortunately, the proliferation of powerful feature extraction services has been mirrored by a corresponding expansion in the number of distinct interfaces to those services. In a world where nearly every new service has its own API, documentation, and/or client library, data scientists who need to combine diverse features obtained from multiple sources are often forced to write and maintain ever more elaborate feature extraction pipelines. To address this challenge, we introduce a new open-source framework for comprehensive multimodal feature extraction. Pliers is an open-source Python package that supports standardized annotation of diverse data types (video, images, audio, and text), and is expressly designed with both ease-of-use and extensibility in mind. Users can apply a wide range of pre-existing feature extraction tools to their data in just a few lines of Python code, and can also easily add their own custom extractors by writing modular classes. A graph-based API enables rapid development of complex feature extraction pipelines that output results in a single, standardized format. We describe the package's architecture, detail its major advantages over previous feature extraction toolboxes, and use a sample application to a large functional MRI dataset to illustrate how pliers can significantly reduce the time and effort required to construct sophisticated feature extraction workflows while increasing code clarity and maintainability.
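    The design the abstract describes, a uniform extractor interface whose outputs land in one standardized format, can be caricatured with a few toy classes. The class and field names below are hypothetical illustrations of the pattern, not the actual pliers API.

```python
class Extractor:
    """Hypothetical base class: every tool exposes the same transform()."""
    name = "base"

    def transform(self, stim):
        raise NotImplementedError


class LengthExtractor(Extractor):
    name = "length"

    def transform(self, stim):
        # One standardized row per (stimulus, extractor) pair.
        return {"extractor": self.name, "stim": stim, "value": len(stim)}


class VowelCountExtractor(Extractor):
    name = "vowel_count"

    def transform(self, stim):
        return {"extractor": self.name, "stim": stim,
                "value": sum(ch in "aeiou" for ch in stim.lower())}


def run_pipeline(stims, extractors):
    # Because every extractor honours the same interface, heterogeneous
    # tools compose into one pipeline with a single output format.
    return [ext.transform(s) for s in stims for ext in extractors]


rows = run_pipeline(["hello", "data"], [LengthExtractor(), VowelCountExtractor()])
```

    The payoff of this shape is that adding a new feature source means writing one modular class rather than another bespoke pipeline, which is the extensibility claim the abstract makes for pliers.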

    GECKO: a complete large-scale gene expression analysis platform

    BACKGROUND: Gecko (Gene Expression: Computation and Knowledge Organization) is a complete, high-capacity centralized gene expression analysis system, developed in response to the needs of a distributed user community. RESULTS: Based on a client-server architecture, with a centralized repository of typically many tens of thousands of Affymetrix scans, Gecko includes automatic processing pipelines for uploading data from remote sites, a database, a computational engine implementing ~50 different analysis tools, and a client application. Among the available analysis tools are clustering methods, principal component analysis, supervised classification including feature selection and cross-validation, multi-factorial ANOVA, statistical contrast calculations, and various post-processing tools for extracting data at given error rates or significance levels. On account of its open architecture, Gecko also allows for the integration of new algorithms. The Gecko framework is very general: non-Affymetrix and non-gene-expression data can be analyzed as well. A unique feature of the Gecko architecture is the concept of the Analysis Tree (actually, a directed acyclic graph), in which all successive results in ongoing analyses are saved. This approach has proven invaluable in allowing a large (~100 users) and distributed community to share results, and to return repeatedly over a span of years to older and potentially very complex analyses of gene expression data. CONCLUSIONS: The Gecko system is being made publicly available as free software. In whole or in part, the Gecko framework should prove useful to users and system developers with a broad range of analysis needs.

    Detecting mistakes in engineering models: the effects of experimental design

    This paper presents the results of an experiment with human subjects investigating their ability to discover a mistake in a model used for engineering design. For the purpose of this study, a known mistake was intentionally placed into a model that was to be used by engineers in a design process. The treatment condition was the experimental design that the subjects were asked to use to explore the design alternatives available to them. The engineers in the study were asked to improve the performance of the engineering system and were not informed that there was a mistake intentionally placed in the model. Of the subjects who varied only one factor at a time, fourteen of the twenty-seven independently identified the mistake during debriefing after the design process. A much lower fraction, one out of twenty-seven engineers, independently identified the mistake during debriefing when they used a fractional factorial experimental design. Regression analysis shows that relevant domain knowledge improved the ability of subjects to discover mistakes in models, but experimental design had a larger effect than domain knowledge in this study. Analysis of videotapes provided additional confirmation, as the likelihood that subjects would appear surprised by data from a model differed significantly across the treatment conditions. This experiment suggests that the complexity of factor changes during the design process is a major consideration influencing the ability of engineers to critically assess models. (Charles Stark Draper Laboratory; SUTD-MIT International Design Centre.)