1,255 research outputs found
The quantification of Simpson's paradox and other contributions to contingency table theory
The analysis of contingency tables is a powerful statistical tool used in
experiments with categorical variables. This study improves parts of the theory
underlying the use of contingency tables. Specifically, the linkage
disequilibrium parameter as a measure of two-way interactions applied to
three-way tables makes it possible to quantify Simpson's paradox by a simple
formula. With tests on three-way interactions, there is only one that
determines whether the partial interactions of all variables agree or whether
there is at least one variable whose partial interactions disagree. To date,
there has been no test available that determines whether the partial
interactions of a certain variable agree or disagree, and the presented work
closes this gap. This work reveals the relation of the multiplicative and the
additive measure of a three-way interaction. Another contribution addresses the
question of which cells in a contingency table are fixed when the first- and
second-order marginal totals are given. The proposed procedure not only detects
fixed zero counts but also fixed positive counts. This impacts the
determination of the degrees of freedom. Furthermore, limitations of methods
that simulate contingency tables with given pairwise associations are
addressed. Comment: 36 pages
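The abstract does not state the paper's formula, so the following is only a minimal sketch of the underlying idea: the linkage disequilibrium D = p11 - p1. * p.1 of a 2x2 table can be positive in every stratum of a three-way table and still be negative in the pooled table, which is Simpson's paradox. The function name and the counts are assumptions chosen for illustration, not taken from the paper.

```python
def linkage_disequilibrium(table):
    """Return D = p11 - p1. * p.1 for a 2x2 table of counts [[n11, n12], [n21, n22]]."""
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    p11 = n11 / n                 # joint proportion of (row 1, column 1)
    p_row1 = (n11 + n12) / n      # marginal proportion of row 1
    p_col1 = (n11 + n21) / n      # marginal proportion of column 1
    return p11 - p_row1 * p_col1


# Illustrative counts (an assumption of this sketch):
# rows = treatment A / B, columns = success / failure, one table per stratum.
stratum_1 = [[81, 6], [234, 36]]
stratum_2 = [[192, 71], [55, 25]]

# Pool the strata by adding the tables cell by cell.
pooled = [
    [stratum_1[i][j] + stratum_2[i][j] for j in range(2)]
    for i in range(2)
]

d_1 = linkage_disequilibrium(stratum_1)
d_2 = linkage_disequilibrium(stratum_2)
d_pooled = linkage_disequilibrium(pooled)

print(f"D in stratum 1: {d_1:+.5f}")
print(f"D in stratum 2: {d_2:+.5f}")
print(f"D pooled:       {d_pooled:+.5f}")
```

With these counts the two partial associations are positive while the pooled association is negative. The sketch only exhibits the sign reversal; it does not reproduce the quantification proposed in the paper.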
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed. Comment: 61 pages
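A small sketch of why Euclidean summaries can mislead for data on the unit circle: the arithmetic mean of angles near 0 degrees can point in an unrelated direction, whereas the mean direction obtained from the averaged unit vectors does not. The function name and the angles are assumptions chosen for illustration.

```python
import math


def circular_mean(angles_deg):
    """Mean direction in degrees, computed from the summed unit vectors."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


# Hypothetical directions clustered around 5 degrees, straddling 0/360.
angles = [355.0, 5.0, 15.0]

arithmetic = sum(angles) / len(angles)   # ignores the wrap-around at 360
directional = circular_mean(angles)      # respects the circular geometry

print(f"arithmetic mean: {arithmetic:.1f} degrees")
print(f"mean direction:  {directional:.1f} degrees")
```

The arithmetic mean lands at 125 degrees, far from every observation, while the mean direction stays inside the cluster.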
Design of Experiments for Screening
The aim of this paper is to review methods of designing screening
experiments, ranging from designs originally developed for physical experiments
to those especially tailored to experiments on numerical models. The strengths
and weaknesses of the various designs for screening variables in numerical
models are discussed. First, classes of factorial designs for experiments to
estimate main effects and interactions through a linear statistical model are
described, specifically regular and nonregular fractional factorial designs,
supersaturated designs and systematic fractional replicate designs. Generic
issues of aliasing, bias and cancellation of factorial effects are discussed.
Second, group screening experiments are considered including factorial group
screening and sequential bifurcation. Third, random sampling plans are
discussed including Latin hypercube sampling and sampling plans to estimate
elementary effects. Fourth, a variety of modelling methods commonly employed
with screening designs are briefly described. Finally, a novel study
demonstrates six screening methods on two frequently-used exemplars, and their
performances are compared.
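As one concrete instance of the random sampling plans mentioned above, Latin hypercube sampling can be sketched as follows. This is a generic textbook construction on the unit hypercube, not the procedure evaluated in the paper; the function name, the seed and the design size are assumptions.

```python
import math
import random


def latin_hypercube(n_runs, n_factors, seed=0):
    """Return n_runs points in [0, 1)^n_factors, one point per stratum in each factor."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        strata = list(range(n_runs))
        rng.shuffle(strata)  # random pairing of strata across factors
        # Draw one uniform point inside each of the n_runs equal-width strata.
        columns.append([(s + rng.random()) / n_runs for s in strata])
    # Transpose so that each row is one run of the experiment.
    return [list(row) for row in zip(*columns)]


design = latin_hypercube(n_runs=8, n_factors=3)
for run in design:
    print("  ".join(f"{x:.3f}" for x in run))
```

Each factor is sampled exactly once in each of its eight equal-width intervals, which is the defining property of the design.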
A New Take on John Maynard Smith's Concept of Protein Space for Understanding Molecular Evolution
Much of the public lacks a proper understanding of Darwinian evolution, a problem that can be addressed with new learning and teaching approaches to be implemented both inside the classroom and in less formal settings. Few analogies have been as successful in communicating the basics of molecular evolution as John Maynard Smith's protein space analogy (1970), in which he compared protein evolution to the transition between the terms WORD and GENE, changing one letter at a time to yield a different, meaningful word (in his example, the preferred path was WORD → WORE → GORE → GONE → GENE). Using freely available computer science tools (Google Books Ngram Viewer), we offer an update to Maynard Smith's analogy and explain how it might be developed into an exploratory and pedagogical device for understanding the basics of molecular evolution and, more specifically, the adaptive landscape concept. We explain how the device works through several examples and provide resources that might facilitate its use in multiple settings, ranging from public engagement activities to formal instruction in evolution, population genetics, and computational biology.
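The one-letter-at-a-time walk through word space can be sketched as a breadth-first search over a vocabulary of meaningful words. This is a minimal illustration, not the authors' device: it treats every word in the vocabulary as equally viable instead of weighting words by usage frequency, and the tiny vocabulary below is an assumption that contains only the words quoted in the abstract.

```python
from collections import deque
import string


def word_ladder(start, goal, vocabulary):
    """Shortest path from start to goal, changing one letter per step within vocabulary."""
    vocabulary = {word.upper() for word in vocabulary}
    start, goal = start.upper(), goal.upper()
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        word = path[-1]
        if word == goal:
            return path
        # Try every single-letter substitution at every position.
        for i in range(len(word)):
            for letter in string.ascii_uppercase:
                candidate = word[:i] + letter + word[i + 1:]
                if candidate in vocabulary and candidate not in seen:
                    seen.add(candidate)
                    queue.append(path + [candidate])
    return None  # no path of meaningful words exists


# Hypothetical vocabulary: just the words of Maynard Smith's example.
vocabulary = {"WORD", "WORE", "GORE", "GONE", "GENE"}
path = word_ladder("WORD", "GENE", vocabulary)
print(" -> ".join(path))
```

With a larger word list the search may return a different shortest path, and a fitness value per word would be needed to speak of a preferred one.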
From Biology to Mathematical Models and Back: Teaching Modeling to Biology Students, and Biology to Math and Engineering Students
We describe the development of a course to teach modeling and mathematical analysis skills to students of biology and to teach biology to students with strong backgrounds in mathematics, physics, or engineering. The two groups of students have different ways of learning material and often have strong negative feelings toward the area of knowledge that they find difficult. To give students a sense of mastery in each area, several complementary approaches are used in the course: 1) a "live" textbook that allows students to explore models and mathematical processes interactively; 2) benchmark problems providing key skills on which students make continuous progress; 3) assignment of students to teams of two throughout the semester; 4) regular one-on-one interactions with instructors throughout the semester; and 5) a term project in which students reconstruct, analyze, extend, and then write in detail about a recently published biological model. Based on student evaluations and comments, an attitude survey, and the quality of the students' term papers, the course has significantly increased the ability and willingness of biology students to use mathematical concepts and modeling tools to understand biological systems, and it has significantly enhanced engineering students' appreciation of biology.
Defining the Identity and Dynamics of Adult Gastric Isthmus Stem Cells.
The gastric corpus epithelium is the thickest part of the gastrointestinal tract and is rapidly turned over. Several markers have been proposed for gastric corpus stem cells in both isthmus and base regions. However, the identity of isthmus stem cells (IsthSCs) and the interaction between distinct stem cell populations is still under debate. Here, based on unbiased genetic labeling and biophysical modeling, we show that corpus glands are compartmentalized into two independent zones, with slow-cycling stem cells maintaining the base and actively cycling stem cells maintaining the pit-isthmus-neck region through a process of "punctuated" neutral drift dynamics. Independent lineage tracing based on Stmn1 and Ki67 expression confirmed that rapidly cycling IsthSCs maintain the pit-isthmus-neck region. Finally, single-cell RNA sequencing (RNA-seq) analysis is used to define the molecular identity and lineage relationship of a single, cycling, IsthSC population. These observations define the identity and functional behavior of IsthSCs.
Funding: Wellcome Trust; Royal Society
General methods for evolutionary quantitative genetic inference from generalized mixed models
P.d.V. was supported by a doctoral studentship from the French Ministère de la Recherche et de l'Enseignement Supérieur. H.S. was supported by an Emmy Noether fellowship from the German Research Foundation (SCHI 1188/1-1). S.N. is supported by a Future Fellowship, Australia (FT130100268). M.M. is supported by a University Research Fellowship from the Royal Society (London). The collection of the Soay sheep data is supported by the National Trust for Scotland and QinetiQ, with funding from the Natural Environment Research Council, the Royal Society, and the Leverhulme Trust.
Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioural traits, have inherently non-normal distributions. The generalised linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for non-normal traits. However, whereas GLMMs provide inference on a statistically convenient latent scale, it is often desirable to express quantitative genetic parameters on the scale upon which traits are measured. The parameters of fitted GLMMs, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each of these quantities, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. We demonstrate that fixed effects have a strong impact on those parameters and show how to deal with this by averaging or integrating over fixed effects. The expressions require integration of quantities determined by the link function, over distributions of latent values.
In general cases, the required integrals must be solved numerically, but efficient methods are available and we provide an implementation in an R package, QGGLMM. We show that known formulae for quantities such as heritability of traits with Binomial and Poisson distributions are special cases of our expressions. Additionally, we show how fitted GLMMs can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation, and apply our approach to data from a wild pedigreed vertebrate population. Publisher PDF. Peer reviewed.
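The integration step described above can be sketched for the simplest quantity, the population mean on the observed scale: the inverse link function is averaged over a normal distribution of latent values. This is a minimal illustration under stated assumptions, not the API of the R package; the function name, the trapezoid rule and the parameter values are all hypothetical. For a log link the integral has the closed form exp(mu + sigma2 / 2), the log-normal mean, which serves as a check.

```python
import math


def observed_mean(mu, sigma2, inv_link=math.exp, n_points=2001, width=10.0):
    """Average inv_link over a Normal(mu, sigma2) latent distribution (trapezoid rule)."""
    sd = math.sqrt(sigma2)
    lower, upper = mu - width * sd, mu + width * sd
    step = (upper - lower) / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        latent = lower + i * step
        density = math.exp(-0.5 * ((latent - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # trapezoid end-point weights
        total += weight * inv_link(latent) * density
    return total * step


# Hypothetical latent-scale intercept and total latent variance.
mu, sigma2 = 1.0, 0.5

numeric = observed_mean(mu, sigma2)        # numerical integral, log link
closed_form = math.exp(mu + sigma2 / 2)    # analytical log-normal mean

print(f"numerical:   {numeric:.6f}")
print(f"closed form: {closed_form:.6f}")
```

Passing a logistic function as `inv_link` gives the corresponding mean for a logit link, for which no closed form is available and the numerical route is the only option in this sketch.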