    A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data

    Due to rapid technological advances, a wide range of different measurements can be obtained from a given biological sample including single nucleotide polymorphisms, copy number variation, gene expression levels, DNA methylation and proteomic profiles. Each of these distinct measurements provides the means to characterize a certain aspect of biological diversity, and a fundamental problem of broad interest concerns the discovery of shared patterns of variation across different data types. Such data types are heterogeneous in the sense that they represent measurements taken at very different scales or described by very different data structures. We propose a distance-based statistical test, the generalized RV (GRV) test, to assess whether there is a common and non-random pattern of variability between paired biological measurements obtained from the same random sample. The measurements enter the test through distance measures which can be chosen to capture particular aspects of the data. An approximate null distribution is proposed to compute p-values in closed form and without the need to perform costly Monte Carlo permutation procedures. Compared to the classical Mantel test for association between distance matrices, the GRV test has been found to be more powerful in a number of simulation settings. We also report on an application of the GRV test to detect biological pathways in which genetic variability is associated with variation in gene expression levels in ovarian cancer samples, and present results obtained from two independent cohorts.
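To make the idea of a distance-based association statistic concrete, the sketch below computes an RV-type coefficient from two distance matrices and a Monte Carlo permutation p-value, the costly baseline that the abstract says a closed-form null approximation avoids. This is an illustration under assumptions, not the authors' implementation: the double-centering (Gower) construction is one common way to build such a statistic, and the function names `double_center`, `rv_from_distances`, `permutation_pvalue` and `euclidean` are hypothetical.

```python
import math
import random

def double_center(dist):
    """Gower transform G = -1/2 * C D^2 C, with C the centering matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # dist is assumed symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]

def inner(a, b):
    """Frobenius inner product; equals trace(a @ b) for symmetric matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rv_from_distances(dist_x, dist_y):
    """RV-type coefficient between two distance matrices on the same samples."""
    gx, gy = double_center(dist_x), double_center(dist_y)
    return inner(gx, gy) / math.sqrt(inner(gx, gx) * inner(gy, gy))

def permutation_pvalue(dist_x, dist_y, n_perm=499, seed=0):
    """Permutation p-value: relabel the samples of dist_y and recompute."""
    rng = random.Random(seed)
    n = len(dist_x)
    observed = rv_from_distances(dist_x, dist_y)
    hits = 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        permuted = [[dist_y[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if rv_from_distances(dist_x, permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def euclidean(points):
    """Pairwise Euclidean distances; any other distance could be swapped in."""
    return [[math.dist(a, b) for b in points] for a in points]
```

Each permutation recomputes the statistic from scratch, so the cost grows with both the number of permutations and the square of the sample size, which is the motivation for an analytic null approximation.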

    Modeling heterogeneity in ranked responses by nonparametric maximum likelihood: How do Europeans get their scientific knowledge?

    This paper is motivated by a Eurobarometer survey on science knowledge. As part of the survey, respondents were asked to rank sources of science information in order of importance. The official statistical analysis of these data, however, failed to use the complete ranking information. We instead propose a method which treats ranked data as a set of paired comparisons, which places the problem in the standard framework of generalized linear models and also allows respondent covariates to be incorporated. An extension is proposed to allow for heterogeneity in the ranked responses. The resulting model uses a nonparametric formulation of the random effects structure, fitted using the EM algorithm. Each mass point is multivalued, with a parameter for each item. The resultant model is equivalent to a covariate latent class model, where the latent class profiles are provided by the mass point components and the covariates act on the class profiles. This provides an alternative interpretation of the fitted model. The approach is also suitable for paired comparison data.
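Treating a ranking as a set of paired comparisons can be sketched as follows. The item names and the +1/-1 row encoding are assumptions chosen for illustration (a Bradley-Terry-style layout); the abstract does not specify the exact design matrix used in the paper.

```python
from itertools import combinations

def ranking_to_pairs(ranking):
    """Expand a ranking (most to least important) into (winner, loser) pairs."""
    # combinations() preserves input order, so the first element of each
    # pair is always the item ranked higher.
    return list(combinations(ranking, 2))

def design_row(winner, loser, items):
    """Hypothetical GLM design row: +1 for the preferred item, -1 for the other."""
    return [1 if it == winner else -1 if it == loser else 0 for it in items]

items = ["tv", "press", "radio"]             # hypothetical information sources
pairs = ranking_to_pairs(["tv", "press", "radio"])
rows = [design_row(w, l, items) for w, l in pairs]
```

A ranking of J items yields J(J-1)/2 comparisons, so the full ranking information is retained rather than only the top choice.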

    Bayesian analysis of ranking data with the constrained Extended Plackett-Luce model

    Multistage ranking models, including the popular Plackett-Luce distribution (PL), rely on the assumption that the ranking process is performed sequentially, by assigning the positions from the top to the bottom one (forward order). A recent contribution to the ranking literature relaxed this assumption with the addition of the discrete-valued reference order parameter, yielding the novel Extended Plackett-Luce model (EPL). Inference on the EPL and its generalization into a finite mixture framework was originally addressed from the frequentist perspective. In this work, we propose the Bayesian estimation of the EPL with order constraints on the reference order parameter. The proposed restrictions reflect a meaningful rank assignment process. By combining the restrictions with the data augmentation strategy and the conjugacy of the Gamma prior distribution with the EPL, we facilitate the construction of a tuned joint Metropolis-Hastings algorithm within Gibbs sampling to simulate from the posterior distribution. The Bayesian approach permits more efficient inference on the additional discrete-valued parameter and assessment of its estimation uncertainty. The usefulness of the proposal is illustrated with applications to simulated and real datasets. Comment: 20 pages, 4 figures, 4 tables. arXiv admin note: substantial text overlap with arXiv:1803.0288
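The difference between the forward-order PL model and a model with a reference order can be illustrated with a short probability calculation. This is a sketch based on the standard stagewise form of the Plackett-Luce probability; the way the reference order is encoded here (a list of 1-based ranks in the order they are assigned) and the function names are assumptions, not taken from the paper.

```python
def pl_probability(sequence, support):
    """Plackett-Luce probability of choosing items in the given sequence."""
    remaining = sum(support[item] for item in sequence)
    prob = 1.0
    for item in sequence:
        # At each stage an item is chosen with probability proportional
        # to its support among the items not yet selected.
        prob *= support[item] / remaining
        remaining -= support[item]
    return prob

def epl_probability(ordering, support, reference_order):
    """ordering[r-1] is the item in rank r; reference_order lists the ranks
    (1-based) in the order in which they are assigned."""
    stagewise = [ordering[rank - 1] for rank in reference_order]
    return pl_probability(stagewise, support)

support = {"a": 3.0, "b": 2.0, "c": 1.0}   # hypothetical support parameters
forward = epl_probability(["a", "b", "c"], support, [1, 2, 3])
backward = epl_probability(["a", "b", "c"], support, [3, 2, 1])
```

With the forward reference order the calculation reduces to the ordinary PL probability; with the backward order the bottom position is filled first, so the same ordering receives a different probability.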

    A Grouping Genetic Algorithm for Joint Stratification and Sample Allocation Designs

    Predicting the cheapest sample size for the optimal stratification in multivariate survey design is a problem when the population frame is large. A solution exists that iteratively searches for the minimum sample size necessary to meet accuracy constraints in partitions of atomic strata created by the Cartesian product of auxiliary variables into larger strata. The optimal stratification can be found by testing all possible partitions. However, the number of possible partitions grows exponentially with the number of initial strata. There are alternative ways of modelling this problem; one of the most natural uses Genetic Algorithms (GA). These evolutionary algorithms use recombination, mutation and selection to search for optimal solutions. They often converge on optimal or near-optimal solutions more quickly than exact methods. We propose a new GA approach to this problem using grouping genetic operators instead of traditional operators. The results show a significant improvement in solution quality for similar computational effort, corresponding to large monetary savings. Comment: 22 page
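The growth in the number of candidate stratifications can be made concrete. Assuming every grouping of the atomic strata into larger strata is admissible (an assumption; the paper may restrict the search space), the number of partitions of n atomic strata is the Bell number B(n), which the sketch below computes with the Bell triangle.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)] using the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells

counts = bell_numbers(10)
```

Ten atomic strata already admit 115,975 partitions, which is why exhaustive testing becomes impractical and heuristic search is attractive.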

    MULTIPLE COMPARISONS WITH THE BEST: BAYESIAN PRECISION MEASURES OF EFFICIENCY RANKINGS

    A large literature exists on measuring the allocative and technical efficiency of a set of firms. A segment of this literature uses data envelopment analysis (DEA), creating relative efficiency rankings that are nonstochastic and thus cannot be evaluated according to the precision of the rankings. A parallel literature uses econometric techniques to estimate stochastic production frontiers or distance functions, providing at least the possibility of computing the precision of the resulting efficiency rankings. Recently, Horrace and Schmidt (2000) have applied sampling theoretic statistical techniques known as multiple comparisons with control (MCC) and multiple comparisons with the best (MCB) to the issue of measuring the precision of efficiency rankings. This paper offers a Bayesian multiple comparison alternative that we argue is simpler to implement, gives the researcher increased flexibility over the type of comparison made, and provides greater, and more intuitive, information content. We demonstrate this method on technical efficiency rankings of a set of U.S. electric generating firms derived within a distance function framework. Research Methods/Statistical Methods

    Sparse reduced-rank regression for imaging genetics studies: models and applications

    We present a novel statistical technique, the sparse reduced-rank regression (sRRR) model, a strategy for multivariate modelling of high-dimensional imaging responses and genetic predictors. By adopting penalisation techniques, the model is able to enforce sparsity in the regression coefficients, identifying subsets of genetic markers that best explain the variability observed in subsets of the phenotypes. To properly exploit the rich structure present in each of the imaging and genetics domains, we additionally propose the use of several structured penalties within the sRRR model. Using simulation procedures that accurately reflect realistic imaging genetics data, we present detailed evaluations of the sRRR method in comparison with the more traditional univariate linear modelling approach. In all settings considered, we show that sRRR possesses better power to detect the deleterious genetic variants. Moreover, using a simple genetic model, we demonstrate the potential benefits, in terms of statistical power, of carrying out voxel-wise searches as opposed to extracting averages over regions of interest in the brain. Since this entails the use of phenotypic vectors of enormous dimensionality, we suggest the use of a sparse classification model as a de-noising step, prior to the imaging genetics study. Finally, we present the application of a data re-sampling technique within the sRRR model for model selection. Using this approach we are able to rank the genetic markers in order of importance of association to the phenotypes, and similarly rank the phenotypes in order of importance to the genetic markers. To conclude, we illustrate the application of the proposed statistical models to three real imaging genetics datasets and highlight some potential associations.

    Evaluating probabilistic forecasts with scoringRules

    Probabilistic forecasts in the form of probability distributions over future events have become popular in several fields including meteorology, hydrology, economics, and demography. In typical applications, many alternative statistical models and data sources can be used to produce probabilistic forecasts. Hence, evaluating and selecting among competing methods is an important task. The scoringRules package for R provides functionality for comparative evaluation of probabilistic models based on proper scoring rules, covering a wide range of situations in applied work. This paper discusses implementation and usage details, presents case studies from meteorology and economics, and points to the relevant background literature.
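As an illustration of what a proper scoring rule computes, the sketch below evaluates two widely used scores for a Gaussian forecast: the continuous ranked probability score (CRPS), using its well-known closed form for the normal distribution, and the logarithmic score. It is written in plain Python for illustration and is not the scoringRules package's R interface; the function names are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def crps_normal(y, mean, sd):
    """Closed-form CRPS of a N(mean, sd^2) forecast for observation y."""
    z = (y - mean) / sd
    return sd * (z * (2.0 * normal_cdf(z) - 1.0)
                 + 2.0 * normal_pdf(z) - 1.0 / math.sqrt(math.pi))

def log_score_normal(y, mean, sd):
    """Negative log predictive density of a N(mean, sd^2) forecast at y."""
    z = (y - mean) / sd
    return 0.5 * math.log(2.0 * math.pi * sd * sd) + 0.5 * z * z

# Compare two hypothetical competing forecasts for the same observation.
observation = 0.3
score_a = crps_normal(observation, mean=0.0, sd=1.0)
score_b = crps_normal(observation, mean=2.0, sd=1.0)
```

Both scores are negatively oriented here, so the forecast with the lower average score over a test set would be preferred.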