53 research outputs found

    Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse

    Fitting generalised linear models (GLMs) with more than one predictor has become the standard method of analysis in evolutionary and behavioural research. Often, GLMs are used for exploratory data analysis, where one starts with a complex full model including interaction terms and then simplifies by removing non-significant terms. While this approach can be useful, it is problematic if significant effects are interpreted as if they arose from a single a priori hypothesis test. This is because model selection involves cryptic multiple hypothesis testing, a fact that has only rarely been acknowledged or quantified. We show that the probability of finding at least one ‘significant’ effect is high, even if all null hypotheses are true (e.g. 40% when starting with four predictors and their two-way interactions). This probability is close to theoretical expectations when the sample size (N) is large relative to the number of predictors including interactions (k). In contrast, type I error rates strongly exceed even those expectations when model simplification is applied to models that are over-fitted before simplification (low N/k ratio). The increase in false-positive results arises primarily from an overestimation of effect sizes among significant predictors, leading to upward-biased effect sizes that often cannot be reproduced in follow-up studies (‘the winner's curse’). Despite having their own problems, full model tests and P value adjustments can be used as a guide to how frequently type I errors arise by sampling variation alone. We favour the presentation of full models, since they best reflect the range of predictors investigated and ensure a balanced representation also of non-significant results
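As a rough check on the 40% figure quoted in the abstract above, the following minimal sketch computes the chance of at least one false positive across a set of tests. It assumes every test is independent and run at a nominal significance level of 0.05; both are illustrative assumptions, and the function name `familywise_error_rate` is hypothetical rather than taken from the paper.

```python
from math import comb


def familywise_error_rate(n_predictors: int, alpha: float = 0.05) -> float:
    """Probability of at least one 'significant' term when all nulls are true.

    Counts the main effects plus all two-way interactions, and treats the
    tests as independent (an assumption made for this sketch).
    """
    n_tests = n_predictors + comb(n_predictors, 2)  # main effects + pairs
    return 1.0 - (1.0 - alpha) ** n_tests


# Four predictors give 4 main effects + 6 two-way interactions = 10 tests.
rate = familywise_error_rate(4)
print(f"{rate:.3f}")  # prints 0.401
```

Under these assumptions, four predictors and their six two-way interactions give ten tests and a rate of about 0.40, in line with the abstract's example.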

    The microRNA regulated SBP-box genes SPL9 and SPL15 control shoot maturation in Arabidopsis

    Throughout development the Arabidopsis shoot apical meristem successively undergoes several major phase transitions such as the juvenile-to-adult and floral transitions until, finally, it will produce flowers instead of leaves and shoots. Members of the Arabidopsis SBP-box gene family of transcription factors have been implicated in promoting the floral transition in dependence of miR156 and, accordingly, transgenics constitutively over-expressing this microRNA are delayed in flowering. To elaborate their roles in Arabidopsis shoot development, we analysed two of the 11 miR156 regulated Arabidopsis SBP-box genes, i.e. the likely paralogous genes SPL9 and SPL15. Single and double mutant phenotype analysis showed these genes to act redundantly in controlling the juvenile-to-adult phase transition. In addition, their loss-of-function results in a shortened plastochron during vegetative growth, altered inflorescence architecture and enhanced branching. In these aspects, the double mutant partly phenocopies constitutive MIR156b over-expressing transgenic plants and thus a major contribution to the phenotype of these transgenics as a result of the repression of SPL9 and SPL15 is strongly suggested

    Extrapolation for Time-Series and Cross-Sectional Data

    Extrapolation methods are reliable, objective, inexpensive, quick, and easily automated. As a result, they are widely used, especially for inventory and production forecasts, for operational planning for up to two years ahead, and for long-term forecasts in some situations, such as population forecasting. This paper provides principles for selecting and preparing data, making seasonal adjustments, extrapolating, assessing uncertainty, and identifying when to use extrapolation. The principles are based on received wisdom (i.e., experts’ commonly held opinions) and on empirical studies. Some of the more important principles are:
    • In selecting and preparing data, use all relevant data and adjust the data for important events that occurred in the past.
    • Make seasonal adjustments only when seasonal effects are expected and only if there is good evidence by which to measure them.
    • In extrapolating, use simple functional forms. Weight the most recent data heavily if there are small measurement errors, stable series, and short forecast horizons. Domain knowledge and forecasting expertise can help to select effective extrapolation procedures. When there is uncertainty, be conservative in forecasting trends. Update extrapolation models as new data are received.
    • To assess uncertainty, make empirical estimates to establish prediction intervals.
    • Use pure extrapolation when many forecasts are required, little is known about the situation, the situation is stable, and expert forecasts might be biased.
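Two of the principles in the abstract above, weighting recent data heavily and estimating prediction intervals empirically, can be illustrated with a minimal sketch. It uses simple exponential smoothing as one possible simple functional form; the abstract does not name a specific method, so this choice, the smoothing weight of 0.8, the function names, and the sample data are all assumptions made for illustration.

```python
def simple_exponential_smoothing(series, alpha=0.8):
    """Return the one-step-ahead forecast and the past one-step errors.

    A large alpha weights the most recent observations heavily.
    """
    level = series[0]
    errors = []
    for y in series[1:]:
        errors.append(y - level)                 # error of the forecast made before seeing y
        level = alpha * y + (1 - alpha) * level  # update the smoothed level
    return level, errors


def empirical_interval(forecast, errors, coverage=0.8):
    """Prediction interval built from the empirical spread of past errors."""
    ordered = sorted(errors)
    tail = (1 - coverage) / 2
    lo_idx = int(tail * (len(ordered) - 1))      # index of the lower empirical quantile
    hi_idx = (len(ordered) - 1) - lo_idx         # symmetric upper index
    return forecast + ordered[lo_idx], forecast + ordered[hi_idx]


# Hypothetical monthly demand figures, for illustration only.
demand = [102, 98, 105, 110, 107, 111, 115, 109, 117, 120]
forecast, errors = simple_exponential_smoothing(demand)
low, high = empirical_interval(forecast, errors)
print(f"forecast={forecast:.1f}, interval=({low:.1f}, {high:.1f})")
```

The interval here comes from in-sample one-step errors, which tends to understate uncertainty for longer horizons; holding out data for the error estimates would be a more cautious variant.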

    Costello syndrome: Clinical phenotype, genotype, and management guidelines

    Costello syndrome (CS) is a RASopathy caused by activating germline mutations in HRAS. Due to ubiquitous HRAS gene expression, CS affects multiple organ systems and individuals are predisposed to cancer. Individuals with CS may have distinctive craniofacial features, cardiac anomalies, growth and developmental delays, as well as dermatological, orthopedic, ocular, and neurological issues; however, considerable overlap with other RASopathies exists. Medical evaluation requires an understanding of the multifaceted phenotype. Subspecialists may have limited experience in caring for these individuals because of the rarity of CS. Furthermore, the phenotypic presentation may vary with the underlying genotype. These guidelines were developed by an interdisciplinary team of experts in order to encourage timely health care practices and provide medical management guidelines for the primary and specialty care provider, as well as for the families and affected individuals across their lifespan. These guidelines are based on expert opinion and do not represent evidence-based guidelines due to the lack of data for this rare condition

    The use of experimental data in simulation model validation

    The use of experimental data for the validation of deterministic dynamic simulation models based on sets of ordinary differential equations and algebraic equations is discussed. Comparisons of model and target system data are considered using graphical methods and quantitative measures in the time and frequency domains. System identification and parameter estimation methods are emphasized, especially in terms of identifiability analysis which can provide valuable information for experiment design. In general, experiments that are suitable for system identification are also appropriate for model validation. However, there is a dilemma since models are needed for this design process. The experiment design, data collection and analysis of model validation results is, inevitably, an iterative process and experiments designed for model validation can never be truly optimal. A model of the pulmonary gas exchange processes in humans is used to illustrate some issues of identifiability, experiment design and test input selection for model validation