1,255 research outputs found

    The quantification of Simpson's paradox and other contributions to contingency table theory

    The analysis of contingency tables is a powerful statistical tool used in experiments with categorical variables. This study improves parts of the theory underlying the use of contingency tables. Specifically, the linkage disequilibrium parameter as a measure of two-way interactions applied to three-way tables makes it possible to quantify Simpson's paradox by a simple formula. With tests on three-way interactions, there is only one that determines whether the partial interactions of all variables agree or whether there is at least one variable whose partial interactions disagree. To date, there has been no test available that determines whether the partial interactions of a certain variable agree or disagree, and the presented work closes this gap. This work reveals the relation of the multiplicative and the additive measure of a three-way interaction. Another contribution addresses the question of which cells in a contingency table are fixed when the first- and second-order marginal totals are given. The proposed procedure not only detects fixed zero counts but also fixed positive counts. This impacts the determination of the degrees of freedom. Furthermore, limitations of methods that simulate contingency tables with given pairwise associations are addressed. Comment: 36 pages
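
    As a rough illustration of the idea in this abstract, the sketch below computes the linkage disequilibrium D = p11 - p1.·p.1 within each stratum of a three-way table and in the collapsed two-way table. It is not the paper's formula: the function name `ld`, the variable names, and the choice of the well-known kidney-stone counts are assumptions made for illustration. The final check uses only the standard law of total covariance, which splits the marginal D into a within-stratum and a between-stratum part.

```python
def ld(table):
    """Return D = p11 - p1. * p.1 for a 2x2 table of counts."""
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    return n11 / n - ((n11 + n12) / n) * ((n11 + n21) / n)

# Rows: treatment A, treatment B. Columns: success, failure.
# Counts are the textbook kidney-stone data, used purely as an illustration.
strata = {
    "small stones": [[81, 6], [234, 36]],
    "large stones": [[192, 71], [55, 25]],
}

# Collapse over the stratifying variable by adding the tables cell by cell.
collapsed = [[sum(t[i][j] for t in strata.values()) for j in range(2)]
             for i in range(2)]

d_within = {name: ld(t) for name, t in strata.items()}
d_marginal = ld(collapsed)

# Law of total covariance: marginal D = weighted within-stratum D
# plus the covariance of the stratum-specific row and column proportions.
n_total = sum(sum(map(sum, t)) for t in strata.values())
p_row = sum(collapsed[0]) / n_total
p_col = (collapsed[0][0] + collapsed[1][0]) / n_total
within = between = 0.0
for t in strata.values():
    n_k = sum(map(sum, t))
    w_k = n_k / n_total
    within += w_k * ld(t)
    between += w_k * (sum(t[0]) / n_k - p_row) * ((t[0][0] + t[1][0]) / n_k - p_col)

print(d_within)              # both within-stratum values are positive
print(d_marginal)            # the collapsed value is negative
print(within + between)      # reproduces d_marginal
```

    The sign reversal arises because the negative between-stratum term outweighs the positive within-stratum term, which is one simple way to see how large the paradox is in a given table.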

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments are discussed. Comment: 61 pages
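
    To show why data on the circle need their own methods, the following minimal sketch compares the ordinary arithmetic mean of three angles with their circular mean, obtained from the averaged unit vectors. The function name `circular_mean_deg` and the example angles are assumptions chosen for illustration and are not taken from the review.

```python
import math

def circular_mean_deg(angles_deg):
    """Circular mean direction (degrees in [0, 360)) and mean resultant length."""
    sines = [math.sin(math.radians(a)) for a in angles_deg]
    cosines = [math.cos(math.radians(a)) for a in angles_deg]
    s = sum(sines) / len(angles_deg)
    c = sum(cosines) / len(angles_deg)
    direction = math.degrees(math.atan2(s, c)) % 360.0
    resultant_length = math.hypot(s, c)  # 1 = fully concentrated, 0 = no preferred direction
    return direction, resultant_length

angles = [350.0, 10.0, 30.0]                   # three directions clustered around 10 degrees
arithmetic_mean = sum(angles) / len(angles)    # 130 degrees, far from every observation
mean_direction, r_bar = circular_mean_deg(angles)
print(arithmetic_mean, mean_direction, r_bar)
```

    The arithmetic mean lands on the opposite side of the cluster because it ignores that 350° and 10° are neighbours, whereas the circular mean stays at the centre of the three directions.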

    Design of Experiments for Screening

    The aim of this paper is to review methods of designing screening experiments, ranging from designs originally developed for physical experiments to those especially tailored to experiments on numerical models. The strengths and weaknesses of the various designs for screening variables in numerical models are discussed. First, classes of factorial designs for experiments to estimate main effects and interactions through a linear statistical model are described, specifically regular and nonregular fractional factorial designs, supersaturated designs and systematic fractional replicate designs. Generic issues of aliasing, bias and cancellation of factorial effects are discussed. Second, group screening experiments are considered including factorial group screening and sequential bifurcation. Third, random sampling plans are discussed including Latin hypercube sampling and sampling plans to estimate elementary effects. Fourth, a variety of modelling methods commonly employed with screening designs are briefly described. Finally, a novel study demonstrates six screening methods on two frequently-used exemplars, and their performances are compared.
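
    Of the sampling plans named in this abstract, Latin hypercube sampling is the easiest to sketch. The version below is a basic, unoptimised construction on the unit cube using only the standard library; the function name `latin_hypercube`, its parameters, and the fixed seed are assumptions for illustration, and none of the paper's designs or exemplars are reproduced.

```python
import random

def latin_hypercube(n_points, n_dims, seed=0):
    """Basic Latin hypercube sample on [0, 1)^n_dims.

    Each axis is cut into n_points equal intervals, and every interval
    is hit by exactly one point along every axis.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)  # independent random permutation per dimension
        # Place one point uniformly at random inside each interval.
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_points)]

sample = latin_hypercube(n_points=8, n_dims=3)
for point in sample:
    print(tuple(round(x, 3) for x in point))
```

    Compared with plain random sampling, this stratification guarantees that each input variable is explored over its whole range even with few runs, which is why it is attractive for screening expensive numerical models.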

    Math inside : surprising mathematics

    A New Take on John Maynard Smith's Concept of Protein Space for Understanding Molecular Evolution

    Much of the public lacks a proper understanding of Darwinian evolution, a problem that can be addressed with new learning and teaching approaches to be implemented both inside the classroom and in less formal settings. Few analogies have been as successful in communicating the basics of molecular evolution as John Maynard Smith’s protein space analogy (1970), in which he compared protein evolution to the transition between the terms WORD and GENE, changing one letter at a time to yield a different, meaningful word (in his example, the preferred path was WORD → WORE → GORE → GONE → GENE). Using freely available computer science tools (Google Books Ngram Viewer), we offer an update to Maynard Smith’s analogy and explain how it might be developed into an exploratory and pedagogical device for understanding the basics of molecular evolution and, more specifically, the adaptive landscape concept. We explain how the device works through several examples and provide resources that might facilitate its use in multiple settings, ranging from public engagement activities to formal instruction in evolution, population genetics, and computational biology.

    From Biology to Mathematical Models and Back: Teaching Modeling to Biology Students, and Biology to Math and Engineering Students

    We describe the development of a course to teach modeling and mathematical analysis skills to students of biology and to teach biology to students with strong backgrounds in mathematics, physics, or engineering. The two groups of students have different ways of learning material and often have strong negative feelings toward the area of knowledge that they find difficult. To give students a sense of mastery in each area, several complementary approaches are used in the course: 1) a “live” textbook that allows students to explore models and mathematical processes interactively; 2) benchmark problems providing key skills on which students make continuous progress; 3) assignment of students to teams of two throughout the semester; 4) regular one-on-one interactions with instructors throughout the semester; and 5) a term project in which students reconstruct, analyze, extend, and then write in detail about a recently published biological model. Based on student evaluations and comments, an attitude survey, and the quality of the students' term papers, the course has significantly increased the ability and willingness of biology students to use mathematical concepts and modeling tools to understand biological systems, and it has significantly enhanced engineering students' appreciation of biology.

    General methods for evolutionary quantitative genetic inference from generalized mixed models

    P.d.V. was supported by a doctoral studentship from the French Ministère de la Recherche et de l’Enseignement Supérieur. H.S. was supported by an Emmy Noether fellowship from the German Research Foundation (SCHI 1188/1-1). S.N. is supported by a Future Fellowship, Australia (FT130100268). M.M. is supported by a University Research Fellowship from the Royal Society (London). The collection of the Soay sheep data is supported by the National Trust for Scotland and QinetiQ, with funding from the Natural Environment Research Council, the Royal Society, and the Leverhulme Trust.
    Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioural traits, have inherently non-normal distributions. The generalised linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for non-normal traits. However, whereas GLMMs provide inference on a statistically-convenient latent scale, it is often desirable to express quantitative genetic parameters on the scale upon which traits are measured. The parameters of fitted GLMMs, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each of such quantities, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. We demonstrate that fixed effects have a strong impact on those parameters and show how to deal with this by averaging or integrating over fixed effects. The expressions require integration of quantities determined by the link function, over distributions of latent values. In general cases, the required integrals must be solved numerically, but efficient methods are available and we provide an implementation in an R package, QGglmm. We show that known formulae for quantities such as heritability of traits with Binomial and Poisson distributions are special cases of our expressions. Additionally, we show how fitted GLMMs can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation, and apply our approach to data from a wild pedigreed vertebrate population. Publisher PDF. Peer reviewed.
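
    The integration step described in this abstract can be illustrated for the simplest quantity, the population mean on the observed scale. The sketch below averages an inverse link function over a normal distribution of latent values with a plain midpoint rule, and checks the log-link case against the standard lognormal result exp(μ + σ²/2). It is a Python illustration only, not the QGglmm implementation or its interface; the function name `observed_scale_mean`, the integration range, and the parameter values are assumptions.

```python
import math

def observed_scale_mean(inverse_link, mu, var, n_steps=20000, width=10.0):
    """Approximate E[inverse_link(l)] for l ~ Normal(mu, var) with a midpoint rule."""
    sd = math.sqrt(var)
    lower = mu - width * sd
    step = 2.0 * width * sd / n_steps
    total = 0.0
    for i in range(n_steps):
        l = lower + (i + 0.5) * step
        density = math.exp(-0.5 * ((l - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += inverse_link(l) * density * step
    return total

mu, var = 0.5, 0.4  # hypothetical latent-scale intercept and total latent variance

# Log link (Poisson-type trait): the integral has a known closed form.
numeric_log = observed_scale_mean(math.exp, mu, var)
closed_form_log = math.exp(mu + var / 2.0)

# Logit link (binary trait): no closed form, so the numerical integral is the answer.
numeric_logit = observed_scale_mean(lambda l: 1.0 / (1.0 + math.exp(-l)), mu, var)

print(numeric_log, closed_form_log, numeric_logit)
```

    Note that the observed-scale mean is not simply the inverse link applied to the latent mean: the latent variance shifts it, which is why latent-scale estimates cannot be back-transformed term by term.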