
    On the correct interpretation of p values and the importance of random variables

    The p value is the probability, under the null hypothesis, of obtaining an experimental result at least as extreme as the one actually obtained. That probability plays a crucial role in frequentist statistical inference. But if we take the word ‘extreme’ to mean ‘improbable’, then we can show that this type of inference can be very problematic. In this paper, I argue that adopting that interpretation is a mistake. Under minimal assumptions about the alternative hypothesis, I explain why ‘extreme’ means ‘outside the most precise predicted range of experimental outcomes for a given upper bound probability of error’. In doing so, I rebut recent formulations of recurrent criticisms of the frequentist approach in statistics and underscore the importance of random variables.
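
To make the definition above concrete, here is a minimal sketch of a tail-area p value. The setting is an assumption and is not taken from the paper: a fair-coin null hypothesis, 10 flips, 9 observed heads, and "at least as extreme" read as "at least as many heads". The function name `binomial_upper_tail_p_value` is hypothetical.

```python
from math import comb


def binomial_upper_tail_p_value(n: int, k_observed: int, p_null: float = 0.5) -> float:
    """Return P(X >= k_observed) for X ~ Binomial(n, p_null).

    Assumption: 'extreme' is defined through the random variable
    'number of heads', so the extreme outcomes form the upper tail.
    """
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(k_observed, n + 1)
    )


# Illustrative data (hypothetical): 9 heads in 10 flips of a supposedly fair coin.
p_value = binomial_upper_tail_p_value(n=10, k_observed=9)
print(f"p value: {p_value:.4f}")
```

The tail here is defined by ordering outcomes along a random variable (the head count), not by ranking individual outcomes by how improbable they are, which is the distinction the abstract draws.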

    Random Forests: some methodological insights

    This paper examines, from an experimental perspective, random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims at confirming known but sparse advice for using random forests, and at proposing some complementary remarks for both standard problems and high-dimensional ones, in which the number of variables hugely exceeds the sample size. The main contribution of this paper is twofold: to provide some insights into the behavior of the variable importance index based on random forests and, in addition, to investigate two classical issues of variable selection. The first is to find important variables for interpretation; the second is more restrictive and seeks to design a good prediction model. The strategy involves a ranking of explanatory variables using the random forests score of importance and a stepwise ascending variable introduction strategy.
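
The stepwise ascending introduction of ranked variables can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the abstract gives no details, so the acceptance rule (keep a variable only if the estimated error decreases), the function `stepwise_ascending_selection`, and the toy error function are all hypothetical. In practice the ranking would come from random forests importance scores and the error from, for example, an out-of-bag estimate.

```python
from typing import Callable, Sequence


def stepwise_ascending_selection(
    ranked_variables: Sequence[str],
    estimate_error: Callable[[list[str]], float],
) -> tuple[list[str], float]:
    """Introduce variables in ranked order, keeping each one only if it
    lowers the estimated prediction error (assumed acceptance rule)."""
    selected: list[str] = []
    best_error = float("inf")
    for variable in ranked_variables:
        candidate = selected + [variable]
        error = estimate_error(candidate)
        if error < best_error:
            selected, best_error = candidate, error
    return selected, best_error


def toy_error(variables: list[str]) -> float:
    """Hypothetical stand-in for a model's error: x1 and x3 are informative,
    and every extra variable adds a small penalty."""
    informative = {"x1", "x3"}
    return 1.0 - 0.4 * len(informative & set(variables)) + 0.05 * len(variables)


# Hypothetical ranking, most important variable first.
ranking = ["x1", "x3", "x2", "x4"]
selected, error = stepwise_ascending_selection(ranking, toy_error)
print(selected, round(error, 2))
```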

    Modeling heterogeneity in ranked responses by nonparametric maximum likelihood: How do Europeans get their scientific knowledge?

    This paper is motivated by a Eurobarometer survey on science knowledge. As part of the survey, respondents were asked to rank sources of science information in order of importance. The official statistical analysis of these data, however, failed to use the complete ranking information. We instead propose a method that treats ranked data as a set of paired comparisons, which places the problem in the standard framework of generalized linear models and also allows respondent covariates to be incorporated. An extension is proposed to allow for heterogeneity in the ranked responses. The resulting model uses a nonparametric formulation of the random effects structure, fitted using the EM algorithm. Each mass point is multivalued, with a parameter for each item. The resultant model is equivalent to a covariate latent class model, where the latent class profiles are provided by the mass point components and the covariates act on the class profiles. This provides an alternative interpretation of the fitted model. The approach is also suitable for paired comparison data.
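
Treating one respondent's ranking as a set of paired comparisons can be sketched as below. This covers only the data transformation, not the generalized linear model or the EM fit. The function name `ranking_to_paired_comparisons` and the item labels are hypothetical, and the ranking is assumed to be complete and free of ties.

```python
from itertools import combinations


def ranking_to_paired_comparisons(ranking: list[str]) -> list[tuple[str, str]]:
    """Expand a ranking (most important item first) into (preferred, other) pairs.

    A complete ranking of n items yields n * (n - 1) / 2 comparisons.
    """
    # combinations() preserves input order, so the first element of each
    # pair is always the item ranked higher by the respondent.
    return list(combinations(ranking, 2))


# Hypothetical respondent ranking of information sources.
pairs = ranking_to_paired_comparisons(["television", "newspapers", "internet"])
for preferred, other in pairs:
    print(f"{preferred} preferred to {other}")
```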

    EPR Paradox, Locality and Completeness of Quantum Theory

    Quantum theory (QT) and new stochastic approaches give no deterministic prediction for a single measurement or for a single time series of events observed for a trapped ion, electron, or any other individual physical system. The predictions of QT, being probabilistic in character, apply to the statistical distribution of the results obtained in various experiments. A probability distribution is not an attribute of a die but a characteristic of a whole random experiment: "rolling a die". Statistical long-range correlations between two random variables X and Y are not a proof of any causal relation between these variables. Moreover, any probabilistic model used to describe a random experiment is consistent only with a specific protocol telling how the random experiment has to be performed. In this sense quantum theory is a statistical and contextual theory of phenomena. In this paper we discuss these important topics in some detail. We also discuss, in historical perspective, various prerequisites used in the proofs of the Bell and CHSH inequalities, concluding that the violation of these inequalities in spin polarization correlation experiments is neither a proof of the completeness of QT nor of its nonlocality. The question whether QT is predictably complete is still open and should be answered by a careful and unconventional analysis of the experimental data. It is sufficient to analyze the existing experimental data in more detail using various non-parametric purity tests and other specific statistical tools invented to study the fine structure of time series. A correct understanding of the statistical and contextual character of QT has far-reaching consequences for quantum information and quantum computing. Comment: 16 pages, 59 references; contribution to the conference QTRF-4 held in Vaxjo, Sweden, 11-16 June 2007. To be published in the Proceedings.

    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.