    RooStats for Searches

    The RooStats toolkit, which is distributed with the ROOT software package, provides a large collection of software tools that implement statistical methods commonly used by the High Energy Physics community. The toolkit is based on RooFit, a high-level data analysis modeling package that implements various methods of statistical data analysis. RooStats enforces a clear mapping of statistical concepts to C++ classes and methods and emphasizes the ability to easily combine analyses within and across experiments. We present an overview of the RooStats toolkit, describe some of the methods used for hypothesis testing and estimation of confidence intervals, and finally discuss some of the latest developments. (Contributed to the PHYSTAT 2011 Workshop on Statistical Issues Related to Discovery Claims in Search Experiments and Unfolding.)
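
    To make the mapping of statistical concepts onto classes concrete, here is a minimal sketch of a profile-likelihood confidence interval computed with RooStats through ROOT's Python bindings (PyROOT). The Gaussian toy model, variable names, and ranges are illustrative assumptions, not taken from the paper.

```python
# Minimal RooStats sketch via PyROOT: a profile-likelihood confidence
# interval on the mean of a Gaussian toy model. Model, variable names,
# and ranges are illustrative assumptions.
import ROOT

w = ROOT.RooWorkspace("w")
# Build a Gaussian pdf g(x | mu, sigma) with the workspace factory syntax.
w.factory("Gaussian::g(x[-10,10], mu[0,-5,5], sigma[1])")

pdf = w.pdf("g")
x = w.var("x")
mu = w.var("mu")

# Generate a toy dataset of 500 events from the pdf.
data = pdf.generate(ROOT.RooArgSet(x), 500)

# ProfileLikelihoodCalculator maps the statistical concept
# "profile-likelihood interval" onto a single class.
plc = ROOT.RooStats.ProfileLikelihoodCalculator(data, pdf, ROOT.RooArgSet(mu))
plc.SetConfidenceLevel(0.95)
interval = plc.GetInterval()

print("95% CL interval on mu: [%.3f, %.3f]"
      % (interval.LowerLimit(mu), interval.UpperLimit(mu)))
```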

    Statistical analysis of global surface air temperature and sea level using cointegration methods

    Global sea levels are rising, which is widely understood as a consequence of thermal expansion and the melting of glaciers and land-based ice caps. Because physically-based models have been unable to simulate observed sea level trends, semi-empirical models have been applied as an alternative for projecting future sea levels. There are, however, potential pitfalls in this due to the trending nature of the time series. We apply a statistical method called cointegration analysis, which is capable of handling such peculiarities, to observed global sea level and surface air temperature. We find a relationship between sea level and temperature, with temperature causally depending on sea level, which can be understood as a consequence of the large heat capacity of the ocean. We further find that the warming episode in the 1940s is exceptional in the sense that sea level and warming deviate from the expected relationship. This suggests that this warming episode was driven mainly by internal dynamics of the ocean rather than by external radiative forcing. The present warming, in contrast, follows the expected relationship, suggesting that it is mainly due to radiative forcing. In a second step, we use the total radiative forcing as an explanatory variable, but unexpectedly find that the sea level does not depend on the forcing. We hypothesize that this is due to a long adjustment time scale of the ocean and show that the number of years of data needed to build statistical models that have the relationship expected from physics exceeds what is currently available by a factor of almost ten.
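
    A minimal sketch of the kind of cointegration test the paper applies, using the Engle-Granger two-step test from statsmodels; the synthetic stand-ins for the temperature and sea level series are assumptions for illustration only.

```python
# Sketch of an Engle-Granger cointegration test between two trending
# series, standing in for surface air temperature and sea level.
# The synthetic data below are illustrative, not the paper's data.
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(0)
n = 150  # roughly a century and a half of annual observations

# A common stochastic trend drives both series, so each is individually
# non-stationary but the pair is cointegrated.
trend = np.cumsum(rng.normal(size=n))
temperature = trend + rng.normal(scale=0.5, size=n)
sea_level = 2.0 * trend + rng.normal(scale=0.5, size=n)

# Null hypothesis: no cointegration. A small p-value suggests the two
# series share a long-run equilibrium relationship.
t_stat, p_value, crit_values = coint(sea_level, temperature)
print(f"t-statistic: {t_stat:.2f}, p-value: {p_value:.3f}")
```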

    An evaluation of the quality of statistical design and analysis of published medical research : results from a systematic survey of general orthopaedic journals

    Background: The application of statistics in reported research in trauma and orthopaedic surgery has become ever more important and complex. Despite the extensive use of statistical analysis, it is still a subject which is often not conceptually well understood, resulting in clear methodological flaws and inadequate reporting in many papers. Methods: A detailed statistical survey sampled 100 representative orthopaedic papers using a validated questionnaire that assessed the quality of the trial design and statistical analysis methods. Results: The survey found evidence of failings in study design, statistical methodology and presentation of the results. Overall, in 17% (95% confidence interval, 10–26%) of the studies investigated, the conclusions were not clearly justified by the results; in 39% (30–49%) of studies a different analysis should have been undertaken; and in 17% (10–26%) a different analysis could have made a difference to the overall conclusions. Conclusion: It is only through improved dialogue between statistician, clinician, reviewer and journal editor that the failings in design methodology and analysis highlighted by this survey can be addressed.
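
    The quoted intervals can be reproduced, approximately, with a standard binomial confidence interval for a proportion out of the 100 sampled papers. The sketch below uses statsmodels and assumes the 17/100 figure is a simple binomial count, which is how the survey's percentages read.

```python
# Sketch: 95% confidence interval for a proportion such as the 17/100
# papers whose conclusions were not clearly justified. Assumes a simple
# binomial count.
from statsmodels.stats.proportion import proportion_confint

count, nobs = 17, 100
# Exact (Clopper-Pearson) interval; other methods (e.g. Wilson) give
# similar bounds at this sample size.
low, high = proportion_confint(count, nobs, alpha=0.05, method="beta")
print(f"{count}/{nobs}: 95% CI {low:.1%} to {high:.1%}")  # ~10% to ~26%
```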

    Evolution of statistical analysis in empirical software engineering research: Current state and steps forward

    Software engineering research is evolving and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if, and to what degree, empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers; in the second phase of our method, we conducted a more extensive semi-automatic classification of 5,196 papers spanning the years 2001–2015. Results from both review steps were used to: i) identify and analyze the predominant practices in ESE (e.g., using the t-test or ANOVA), as well as relevant trends in the usage of specific statistical methods (e.g., nonparametric tests and effect size measures), and ii) develop a conceptual model for a statistical analysis workflow, with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard for reporting the practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and the practitioner's context. (Journal submission, 34 pages, 8 figures.)
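
    To illustrate the practices the review tracks (parametric versus nonparametric tests) and its point about practical significance, here is a sketch contrasting a t-test with a Mann-Whitney U test and reporting Cohen's d as an effect size. The data and the metric are synthetic assumptions.

```python
# Sketch: the statistical practices surveyed in ESE papers -- a t-test,
# a nonparametric alternative, and an effect size so that practical
# significance can be discussed alongside the p-value. Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical metric (e.g. defect-fix time) under two treatments.
a = rng.normal(loc=10.0, scale=2.0, size=40)
b = rng.normal(loc=11.5, scale=2.0, size=40)

t_stat, t_p = stats.ttest_ind(a, b)     # parametric test
u_stat, u_p = stats.mannwhitneyu(a, b)  # nonparametric alternative

# Cohen's d with a pooled standard deviation: a magnitude a practitioner
# can interpret, independent of sample size.
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (b.mean() - a.mean()) / pooled_sd

print(f"t-test p={t_p:.4f}, Mann-Whitney p={u_p:.4f}, d={cohens_d:.2f}")
```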

    Missing... presumed at random: cost-analysis of incomplete data

    When collecting patient-level resource use data for statistical analysis, for some patients and in some categories of resource use, the required count will not be observed. Although this problem must arise in most reported economic evaluations containing patient-level data, it is rare for authors to detail how the problem was overcome. Statistical packages may default to handling missing data through a so-called complete case analysis, while some recent cost-analyses have appeared to favour an available case approach. Both of these methods are problematic: complete case analysis is inefficient and is likely to be biased; available case analysis, by employing different numbers of observations for each resource use item, generates severe problems for standard statistical inference. Instead, we explore imputation methods for generating replacement values for missing data that will permit complete case analysis using the whole data set, and we illustrate these methods using two data sets that had incomplete resource use information.
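
    As a sketch of the contrast the paper draws, the example below compares a complete case mean cost with the mean after simple imputation, using scikit-learn's SimpleImputer on a toy cost table. The column names and values are illustrative assumptions, and a real cost-analysis would prefer multiple imputation so that the imputation uncertainty is propagated.

```python
# Sketch: complete case analysis vs simple mean imputation on a toy
# table of per-patient resource-use costs. Column names and values are
# illustrative; a real analysis would prefer multiple imputation.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

costs = pd.DataFrame({
    "inpatient":  [1200.0, np.nan, 800.0, 1500.0, np.nan],
    "outpatient": [300.0, 250.0, np.nan, 400.0, 350.0],
})

# Complete case analysis: drop any patient with a missing item.
# Inefficient, and biased unless data are missing completely at random.
complete = costs.dropna()
print("complete-case mean total:", complete.sum(axis=1).mean())

# Single mean imputation keeps every patient in the analysis
# (but understates variance; multiple imputation addresses that).
imputed = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(costs),
                       columns=costs.columns)
print("imputed mean total:", imputed.sum(axis=1).mean())
```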