
    The Detailed Forms of the LMC Cepheid PL and PLC Relations

    Possible deviations from linearity of the LMC Cepheid PL and PLC relations are investigated. Two datasets are studied, from the OGLE and MACHO projects respectively. A nonparametric test, based on linear regression residuals, suggests that neither PL relation is linear. If colour dependence is allowed for, the MACHO PL relation is found to deviate more significantly from linearity, while the OGLE PL relation is consistent with linearity. These findings are confirmed by fitting "Generalised Additive Models" (nonparametric regression functions) to the two datasets. Colour dependence is shown to be nonlinear in both datasets, distinctly so in the case of the MACHO Cepheids. It is also shown that there is an interaction between the period and colour functions in the MACHO data. Comment: 20 pages, 20 figures, MNRAS accepted
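
    The residual-based linearity check described above can be sketched in a few lines. This is not the authors' test, only a minimal stand-in under assumed inputs (arrays of log period and mean magnitude): fit a straight PL line, smooth the residuals along period, and calibrate the largest smoothed deviation by permutation.

    ```python
    import numpy as np

    def smoothed_residual_stat(logP, mag, window=21):
        """Fit a straight PL line, then take the largest absolute running-mean
        residual as a statistic sensitive to systematic departures from linearity."""
        slope, intercept = np.polyfit(logP, mag, 1)
        resid = mag - (slope * logP + intercept)
        order = np.argsort(logP)
        kernel = np.ones(window) / window
        smooth = np.convolve(resid[order], kernel, mode="valid")
        return np.max(np.abs(smooth))

    def linearity_permutation_test(logP, mag, n_perm=2000, seed=0):
        """Permutation p-value: reattach shuffled residuals to the fitted line so
        any genuine period-dependent structure is destroyed under the null."""
        rng = np.random.default_rng(seed)
        slope, intercept = np.polyfit(logP, mag, 1)
        fit = slope * logP + intercept
        resid = mag - fit
        observed = smoothed_residual_stat(logP, mag)
        exceed = sum(smoothed_residual_stat(logP, fit + rng.permutation(resid)) >= observed
                     for _ in range(n_perm))
        return observed, (exceed + 1) / (n_perm + 1)
    ```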

    A power comparison of generalized additive models and the spatial scan statistic in a case-control setting

    Background: A common and important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In the application of statistical methods, the problem has two parts: first, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM), which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. Results: This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power, though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing log-odds with distance from the source. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three cases. Conclusions: The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding those of the spatial scan statistic.
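
    A minimal sketch of this kind of permutation test, not the study's GAM/LOESS implementation: a k-nearest-neighbour local case proportion (a hypothetical stand-in for the bivariate smoother) is compared with the global case rate, and case/control labels are permuted over fixed residential locations to judge whether the spatial term is needed.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    def local_risk_deviance(coords, case, k=50):
        """Deviance gain of a k-nearest-neighbour local case proportion over the
        global case rate; a crude stand-in for a bivariate LOESS term in a GAM."""
        _, idx = cKDTree(coords).query(coords, k=k)
        local_p = np.clip(case[idx].mean(axis=1), 1e-6, 1 - 1e-6)
        global_p = case.mean()
        ll_local = np.sum(case * np.log(local_p) + (1 - case) * np.log(1 - local_p))
        ll_global = np.sum(case * np.log(global_p) + (1 - case) * np.log(1 - global_p))
        return 2 * (ll_local - ll_global)

    def spatial_permutation_test(coords, case, n_perm=999, seed=0):
        """Permute case/control labels over the fixed locations to build the null
        distribution of the deviance statistic."""
        rng = np.random.default_rng(seed)
        observed = local_risk_deviance(coords, case)
        null = [local_risk_deviance(coords, rng.permutation(case)) for _ in range(n_perm)]
        return observed, (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    ```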

    A force profile analysis comparison between functional data analysis, statistical parametric mapping and statistical non-parametric mapping in on-water single sculling

    Objectives: To examine whether the Functional Data Analysis (FDA), Statistical Parametric Mapping (SPM) and Statistical non-Parametric Mapping (SnPM) hypothesis testing techniques differ in their ability to draw inferences in the context of a single, simple experimental design. Design: The sample data are cross-sectional (a two-sample gender comparison), and differences between the statistical techniques were evaluated using a combination of descriptive and qualitative assessments. Methods: FDA, SPM and SnPM t-tests were applied to sample data of twenty highly skilled male and female rowers, rowing at 32 strokes per minute in a single scull boat. Statistical differences for gender were assessed by applying two t-tests (one for each side of the boat). Results: The t-statistic values were identical for all three methods (with the FDA t-statistic presented as an absolute measure). The critical t-statistics (tcrit) were very similar between the techniques, with SPM providing a marginally higher tcrit than the FDA and SnPM values (which were identical). All techniques were successful in identifying consistent sections of the force waveform where male and female rowers were shown to differ significantly (p < 0.05). Conclusions: This is the first study to show that FDA, SPM and SnPM t-tests provide consistent results when applied to sports biomechanics data. Although the results were similar, selection of one technique over another by applied researchers and practitioners should be based on the underlying parametric assumption of SPM, as well as contextual factors related to the type of waveform data to be analysed and the experimental research question of interest.
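
    The pointwise t-statistic and an SnPM-style permutation threshold can be illustrated as below, assuming each rower's force data have been time-normalised to a common number of points per stroke; this is a generic sketch, not the study's FDA/SPM/SnPM code.

    ```python
    import numpy as np

    def pointwise_t(group_a, group_b):
        """Two-sample t-statistic at every point of the normalised force waveform
        (rows = athletes, columns = points along the stroke)."""
        na, nb = len(group_a), len(group_b)
        mean_diff = group_a.mean(axis=0) - group_b.mean(axis=0)
        pooled_var = ((na - 1) * group_a.var(axis=0, ddof=1)
                      + (nb - 1) * group_b.var(axis=0, ddof=1)) / (na + nb - 2)
        return mean_diff / np.sqrt(pooled_var * (1 / na + 1 / nb))

    def permutation_threshold(group_a, group_b, n_perm=1000, alpha=0.05, seed=0):
        """SnPM-style critical value: reassign athletes to groups at random and
        record the maximum |t| across the waveform in each permutation."""
        rng = np.random.default_rng(seed)
        pooled, na = np.vstack([group_a, group_b]), len(group_a)
        max_t = []
        for _ in range(n_perm):
            perm = rng.permutation(len(pooled))
            max_t.append(np.max(np.abs(pointwise_t(pooled[perm[:na]], pooled[perm[na:]]))))
        return np.quantile(max_t, 1 - alpha)

    # Points where |pointwise_t(males, females)| exceeds the returned threshold are
    # the sections of the stroke flagged as significantly different.
    ```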

    A decomposition method for global evaluation of Shannon entropy and local estimations of algorithmic complexity

    We investigate the properties of a Block Decomposition Method (BDM), which extends the power of a Coding Theorem Method (CTM) that approximates local estimations of algorithmic complexity based on Solomonoff–Levin's theory of algorithmic probability, providing a closer connection to algorithmic complexity than previous attempts based on statistical regularities such as popular lossless compression schemes. The strategy behind BDM is to find small computer programs that produce the components of a larger, decomposed object. The set of short computer programs can then be artfully arranged in sequence so as to produce the original object. We show that the method provides efficient estimations of algorithmic complexity but that it performs like Shannon entropy when it loses accuracy. We estimate errors and study the behaviour of BDM for different boundary conditions, all of which are compared and assessed in detail. The measure may be adapted for use with multi-dimensional objects beyond strings, such as arrays and tensors. To test the measure we demonstrate the power of CTM on low algorithmic-randomness objects that are assigned maximal entropy (e.g., π) but whose numerical approximations are closer to the theoretical low algorithmic-randomness expectation. We also test the measure on larger objects, including dual, isomorphic and cospectral graphs, for which we know that algorithmic randomness is low. We also release implementations of the methods in most major programming languages (Wolfram Language/Mathematica, Matlab, R, Perl, Python, Pascal, C++, and Haskell), together with an online algorithmic complexity calculator. Funding: Swedish Research Council (Vetenskapsrådet).
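
    The block-decomposition idea can be sketched as follows, assuming a precomputed lookup table of CTM values (ctm_table is hypothetical here; the published method distributes such tables). The BDM value is the sum, over distinct blocks, of the block's CTM estimate plus log2 of its multiplicity; in this sketch any leftover shorter than a full block is simply dropped, which is one of the boundary conditions the abstract alludes to.

    ```python
    import math
    from collections import Counter

    def bdm(string, ctm_table, block_size=12):
        """Block Decomposition Method sketch: split the object into non-overlapping
        blocks, look up a CTM complexity estimate for each distinct block, and add
        log2 of the block's multiplicity."""
        blocks = [string[i:i + block_size]
                  for i in range(0, len(string) - block_size + 1, block_size)]
        return sum(ctm_table[block] + math.log2(n)
                   for block, n in Counter(blocks).items())
    ```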

    An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve a high prediction accuracy in such applications, and to provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing.
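
    The abstract points to freely available implementations in R; the sketch below shows a comparable workflow in Python with scikit-learn as an assumed stand-in: a random forest fitted to simulated high-dimensional data, followed by permutation-based variable importances that reflect both main effects and interactions.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance

    # Simulated "few subjects, many predictors" data: only the first two of the
    # 200 variables carry signal, partly through an interaction.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 200))
    y = (X[:, 0] + X[:, 1] + X[:, 0] * X[:, 1] + rng.normal(size=100) > 0).astype(int)

    forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

    # Permutation importance measures how much shuffling each variable degrades
    # predictions, so it captures contributions made via interactions as well.
    imp = permutation_importance(forest, X, y, n_repeats=20, random_state=0)
    top = np.argsort(imp.importances_mean)[::-1][:5]
    print("top predictors:", top, imp.importances_mean[top].round(3))
    ```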

    Spatial clustering of array CGH features in combination with hierarchical multiple testing

    We propose a new approach for clustering DNA features using array CGH data from multiple tumor samples. We distinguish data-collapsing, joining contiguous DNA clones or probes with extremely similar data into regions, from clustering, joining contiguous, correlated regions based on a maximum likelihood principle. The model-based clustering algorithm accounts for the apparent spatial patterns in the data. We evaluate the randomness of the clustering result by a cluster stability score in combination with cross-validation. Moreover, we argue that the clustering really captures spatial genomic dependency by showing that coincidental clustering of independent regions is very unlikely. Using the region and cluster information, we combine tests of regions and clusters for association with a clinical variable in a hierarchical multiple testing approach. This allows the significance of both regions and clusters to be interpreted while simultaneously controlling the Family-Wise Error Rate. We prove that, in the context of permutation tests and permutation-invariant clusters, it is permissible to perform clustering and testing on the same data set. Our procedures are illustrated on two cancer data sets.
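
    A much-simplified sketch of the two steps described above, data-collapsing followed by region testing, under assumed inputs (a probes-by-samples matrix and a binary clinical variable). It is not the authors' maximum-likelihood clustering or their hierarchical FWER procedure, only an illustration of collapsing near-identical contiguous probes into regions and permutation-testing each region's profile.

    ```python
    import numpy as np

    def collapse_probes(data, threshold=0.99):
        """Data-collapsing: join contiguous probes (rows of a probes x samples
        matrix) whose profiles are nearly identical into regions, returned as
        (start, end) index pairs."""
        regions, start = [], 0
        for i in range(1, data.shape[0]):
            if np.corrcoef(data[i - 1], data[i])[0, 1] < threshold:
                regions.append((start, i))
                start = i
        regions.append((start, data.shape[0]))
        return regions

    def region_association(data, regions, clinical, n_perm=999, seed=0):
        """Permutation p-value per region for association of the region's mean
        profile with a binary clinical variable."""
        rng = np.random.default_rng(seed)
        pvals = []
        for a, b in regions:
            profile = data[a:b].mean(axis=0)
            obs = abs(profile[clinical == 1].mean() - profile[clinical == 0].mean())
            null = [abs(profile[p == 1].mean() - profile[p == 0].mean())
                    for p in (rng.permutation(clinical) for _ in range(n_perm))]
            pvals.append((1 + sum(s >= obs for s in null)) / (n_perm + 1))
        return pvals
    ```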