Using simulation studies to evaluate statistical methods
Simulation studies are computer experiments that involve creating data by
pseudorandom sampling. The key strength of simulation studies is the ability to
understand the behaviour of statistical methods because some 'truth' (usually
some parameter/s of interest) is known from the process of generating the data.
This allows us to consider properties of methods, such as bias. While widely
used, simulation studies are often poorly designed, analysed and reported. This
tutorial outlines the rationale for using simulation studies and offers
guidance for design, execution, analysis, reporting and presentation. In
particular, this tutorial provides: a structured approach for planning and
reporting simulation studies, which involves defining aims, data-generating
mechanisms, estimands, methods and performance measures ('ADEMP'); coherent
terminology for simulation studies; guidance on coding simulation studies; a
critical discussion of key performance measures and their estimation; guidance
on structuring tabular and graphical presentation of results; and new graphical
presentations. With a view to describing recent practice, we review 100
articles taken from Volume 34 of Statistics in Medicine that included at least
one simulation study and identify areas for improvement.
Comment: 31 pages, 9 figures (2 in appendix), 8 tables (1 in appendix)
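The ADEMP structure described above can be made concrete with a toy example. The following sketch is illustrative only, not taken from the tutorial: the data-generating mechanism (normal samples with known mean), estimand (the mean), method (the sample mean), and performance measure (bias with its Monte Carlo standard error) are all invented to show how a known 'truth' lets us estimate bias.

```python
import numpy as np

# A minimal simulation study following the ADEMP structure:
# Aim: assess bias of the sample mean as an estimator of a normal mean.
# Data-generating mechanism: N(mu, sigma^2) samples. Estimand: mu.
# Method: the sample mean. Performance measure: bias, with Monte Carlo SE.

rng = np.random.default_rng(2024)
mu, sigma, n, n_sim = 1.5, 2.0, 30, 5000  # the 'truth' is known by design

estimates = np.array([rng.normal(mu, sigma, n).mean() for _ in range(n_sim)])

bias = estimates.mean() - mu                    # average error across repetitions
mc_se = estimates.std(ddof=1) / np.sqrt(n_sim)  # Monte Carlo SE of the bias estimate
print(f"bias = {bias:.4f} (MC SE = {mc_se:.4f})")
```

Reporting the Monte Carlo SE alongside the bias, as the tutorial recommends, makes clear how much of the observed bias could be simulation noise.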
Statistical Testing of Optimality Conditions in Multiresponse Simulation-based Optimization (Revision of 2005-81)
This paper studies simulation-based optimization with multiple outputs. It assumes that the simulation model has one random objective function and must satisfy given constraints on the other random outputs. It presents a statistical procedure for testing whether a specific input combination (proposed by some optimization heuristic) satisfies the Karush-Kuhn-Tucker (KKT) first-order optimality conditions. The paper focuses on "expensive" simulations, which have small sample sizes. The paper applies the classic t test to check whether the specific input combination is feasible, and whether any constraints are binding; it applies bootstrapping (resampling) to test the estimated gradients in the KKT conditions. The new methodology is applied to three examples, which gives encouraging empirical results.
Keywords: stopping rule; metaheuristics; response surface methodology; design of experiments
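The classic t-test feasibility check mentioned above can be sketched as follows. This is a hypothetical illustration, not the paper's procedure: the replication data, constraint form (feasible means E[g(x)] &lt;= 0), and sample size are invented, and the paper's bootstrap test of the estimated KKT gradients is omitted.

```python
import math
import numpy as np

# Hypothetical feasibility check at a candidate input x: given a few
# replications of a random constraint output g(x), compute a one-sample
# t statistic for H0: E[g(x)] <= 0 (i.e., x is feasible). The small
# number of replications reflects the "expensive simulation" setting.

rng = np.random.default_rng(7)
g_reps = rng.normal(-0.2, 0.5, size=8)  # 8 replications (assumed data)

n = len(g_reps)
t_stat = g_reps.mean() / (g_reps.std(ddof=1) / math.sqrt(n))
# 1.895 is the one-sided 95% critical value of the t distribution, 7 df.
feasible = t_stat < 1.895
print(f"t = {t_stat:.3f}, constraint judged feasible: {feasible}")
```

A constraint whose t statistic is near zero would additionally be flagged as possibly binding, which is where the paper's bootstrap test of the gradients takes over.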
09181 Abstracts Collection -- Sampling-based Optimization in the Presence of Uncertainty
This Dagstuhl seminar brought together researchers from statistical ranking and selection; experimental design and response-surface modeling; stochastic programming; approximate dynamic programming; optimal learning; and the design and analysis of computer experiments, with the goal of attaining a much better mutual understanding of the commonalities and differences of the various approaches to sampling-based optimization, and of taking first steps toward an overarching theory encompassing many of the topics above.
The Development of Statistical Computing at Rothamsted
An account is given of the development of statistical computing at Rothamsted. It is concerned mainly with the period from 1954 (when the first electronic computer was delivered) until 1985 (when this article was written). Initially, many specialised programs were written, but it was soon realised that, for efficiency, general-purpose programs—each unifying many statistical techniques—were required. The development of these programs was gradual and required corresponding developments in statistical theory. Now, the bulk of statistical work, not only for Rothamsted but also for the Agricultural and Food Research Service (AFRS) as a whole, is covered by a few programs, notably Genstat, which has an international market. Further developments of these programs are required to make them more accessible to scientists who are not well versed in statistics and to take advantage of technological advances.
Origins of Modern Data Analysis Linked to the Beginnings and Early Development of Computer Science and Information Engineering
The history of data analysis that is addressed here is underpinned by two themes: tabular data analysis, and the analysis of collected heterogeneous data. "Exploratory data analysis" is taken as the heuristic approach that begins with data and information and seeks underlying explanation for what is observed or measured. I also cover some of the evolving context of research and applications, including scholarly publishing, technology transfer and the economic relationship of the university to society.
Comment: 26 pages
A multivariable screening procedure adaptable to electronic computers for the empirical exploration of response surfaces
M. S. Harrison, M. Wadswort
Developing a comprehensive framework for multimodal feature extraction
Feature extraction is a critical component of many applied data science
workflows. In recent years, rapid advances in artificial intelligence and
machine learning have led to an explosion of feature extraction tools and
services that allow data scientists to cheaply and effectively annotate their
data along a vast array of dimensions---ranging from detecting faces in images
to analyzing the sentiment expressed in coherent text. Unfortunately, the
proliferation of powerful feature extraction services has been mirrored by a
corresponding expansion in the number of distinct interfaces to feature
extraction services. In a world where nearly every new service has its own API,
documentation, and/or client library, data scientists who need to combine
diverse features obtained from multiple sources are often forced to write and
maintain ever more elaborate feature extraction pipelines. To address this
challenge, we introduce a new open-source framework for comprehensive
multimodal feature extraction. Pliers is an open-source Python package that
supports standardized annotation of diverse data types (video, images, audio,
and text), and is expressly designed with both ease-of-use and extensibility in
mind.
Users can apply a wide range of pre-existing feature extraction tools to their
data in just a few lines of Python code, and can also easily add their own
custom extractors by writing modular classes. A graph-based API enables rapid
development of complex feature extraction pipelines that output results in a
single, standardized format. We describe the package's architecture, detail its
major advantages over previous feature extraction toolboxes, and use a sample
application to a large functional MRI dataset to illustrate how pliers can
significantly reduce the time and effort required to construct sophisticated
feature extraction workflows while increasing code clarity and maintainability.
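The graph-based, modular-extractor design described above can be sketched in a few lines. The classes and merge step below are hypothetical stand-ins, not the actual pliers API; they only illustrate how independent extractors plug into one pipeline and emit results in a single standardized format.

```python
# Hypothetical sketch of a modular extraction pipeline (not the pliers API):
# each extractor is a small class with one extract() method, and a runner
# applies every node to a stimulus and merges results into uniform records.

class Extractor:
    """Base class: subclasses implement extract() for one feature."""
    def extract(self, stimulus):
        raise NotImplementedError

class WordCountExtractor(Extractor):
    def extract(self, stimulus):
        return {"feature": "word_count", "value": len(stimulus.split())}

class CharCountExtractor(Extractor):
    def extract(self, stimulus):
        return {"feature": "char_count", "value": len(stimulus)}

def run_pipeline(stimulus, extractors):
    """Apply every extractor and return results in one standardized format."""
    return [e.extract(stimulus) for e in extractors]

results = run_pipeline("feature extraction made simple",
                       [WordCountExtractor(), CharCountExtractor()])
print(results)
```

Adding a custom extractor amounts to writing one more small subclass, which is the extensibility property the abstract emphasizes.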
GECKO: a complete large-scale gene expression analysis platform
BACKGROUND: Gecko (Gene Expression: Computation and Knowledge Organization) is a complete, high-capacity centralized gene expression analysis system, developed in response to the needs of a distributed user community. RESULTS: Based on a client-server architecture, with a centralized repository of typically many tens of thousands of Affymetrix scans, Gecko includes automatic processing pipelines for uploading data from remote sites, a database, a computational engine implementing ~50 different analysis tools, and a client application. Among the available analysis tools are clustering methods, principal component analysis, supervised classification including feature selection and cross-validation, multi-factorial ANOVA, statistical contrast calculations, and various post-processing tools for extracting data at given error rates or significance levels. On account of its open architecture, Gecko also allows for the integration of new algorithms. The Gecko framework is very general: non-Affymetrix and non-gene-expression data can be analyzed as well. A unique feature of the Gecko architecture is the concept of the Analysis Tree (actually, a directed acyclic graph), in which all successive results in ongoing analyses are saved. This approach has proven invaluable in allowing a large (~100 users) and distributed community to share results, and to return repeatedly, over a span of years, to older and potentially very complex analyses of gene expression data. CONCLUSIONS: The Gecko system is being made publicly available as free software. In totality or in parts, the Gecko framework should prove useful to users and system developers with a broad range of analysis needs.
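The Analysis Tree idea above, a directed acyclic graph in which every intermediate result is saved with its provenance, can be sketched as follows. The node structure and method names are illustrative inventions, not Gecko's actual data model.

```python
# Illustrative sketch of an "Analysis Tree" (really a DAG) of saved results:
# each analysis step becomes a node pointing back at the results it consumed,
# so any old result can be revisited and new analyses branched off it.

class AnalysisNode:
    def __init__(self, name, result, parents=()):
        self.name = name
        self.result = result
        self.parents = list(parents)  # a node may combine several inputs

    def lineage(self):
        """Return the names of all ancestors (the provenance of this result)."""
        seen = []
        for p in self.parents:
            for name in p.lineage() + [p.name]:
                if name not in seen:
                    seen.append(name)
        return seen

raw = AnalysisNode("raw_scans", result="uploaded data")
norm = AnalysisNode("normalized", result="scaled data", parents=[raw])
anova = AnalysisNode("anova", result="F statistics", parents=[norm])
cluster = AnalysisNode("clustering", result="clusters", parents=[norm])

print(anova.lineage())  # provenance chain of the ANOVA result
```

Because both the ANOVA and the clustering branch off the same normalized node, either user can trace a shared provenance back to the raw scans, which is what makes long-lived, multi-user analyses reproducible.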
Detecting mistakes in engineering models: the effects of experimental design
This paper presents the results of an experiment with human subjects investigating their ability to discover a mistake in a model used for engineering design. For the purpose of this study, a known mistake was intentionally placed into a model that was to be used by engineers in a design process. The treatment condition was the experimental design that the subjects were asked to use to explore the design alternatives available to them. The engineers in the study were asked to improve the performance of the engineering system and were not informed that a mistake had been intentionally placed in the model. Of the subjects who varied only one factor at a time, fourteen of the twenty-seven independently identified the mistake during debriefing after the design process. A much lower fraction, one out of twenty-seven engineers, independently identified the mistake during debriefing when they used a fractional factorial experimental design. Regression analysis shows that relevant domain knowledge improved the ability of subjects to discover mistakes in models, but experimental design had a larger effect than domain knowledge in this study. Analysis of videotapes provided additional confirmation, as the likelihood of subjects appearing surprised by data from a model differed significantly across the treatment conditions. This experiment suggests that the complexity of factor changes during the design process is a major consideration influencing the ability of engineers to critically assess models.
Charles Stark Draper Laboratory; SUTD-MIT International Design Centr
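The two treatment conditions contrasted above can be sketched for a hypothetical four-factor problem with levels coded -1/+1. The factor count and the half-fraction generating relation (D = A*B*C, the standard 2^(4-1) construction) are assumptions for illustration, not the study's actual design.

```python
from itertools import product

# One-factor-at-a-time (OFAT) vs. a 2^(4-1) fractional factorial for a
# hypothetical 4-factor problem, levels coded -1 (low) and +1 (high).

k = 4
baseline = [-1] * k

# OFAT: the baseline run plus one run per factor with only that factor flipped.
ofat = [baseline] + [
    [+1 if j == i else -1 for j in range(k)] for i in range(k)
]

# Half fraction: full factorial in A, B, C with D generated as D = A*B*C,
# so every run changes several factors at once relative to its neighbours.
frac = [[a, b, c, a * b * c] for a, b, c in product([-1, +1], repeat=3)]

print(f"OFAT runs: {len(ofat)}, fractional factorial runs: {len(frac)}")
# In OFAT, consecutive runs differ in a single factor, so a surprising
# response is easy to attribute; in the fraction, runs differ in several
# factors at once, making an anomaly harder to pin on a model mistake.
```

The contrast in run structure, not run count, is the point: the paper's finding is that simple one-factor changes made the planted mistake far easier for engineers to notice.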