
    Statistical applications of the multivariate skew-normal distribution

    Azzalini & Dalla Valle (1996) have recently discussed the multivariate skew-normal distribution, which extends the class of normal distributions by the addition of a shape parameter. The first part of the present paper examines further probabilistic properties of the distribution, with special emphasis on aspects of statistical relevance. Inferential and other statistical issues are discussed in the following part, with applications to some multivariate statistics problems, illustrated by numerical examples. Finally, a further extension is described which introduces a skewing factor of an elliptical density.
    Comment: full-length version of the published paper, 32 pages, with 7 figures; uses psfrag
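
    For orientation, the shape parameter referred to above enters the d-dimensional skew-normal density of Azzalini & Dalla Valle (1996) in its standard form shown below, where \phi_d denotes the N_d(0, \Omega) density and \Phi the scalar standard normal distribution function; this is standard notation, not text reproduced from the paper.

```latex
% d-variate skew-normal density (Azzalini & Dalla Valle, 1996):
% \phi_d is the N_d(0,\Omega) density, \Phi the standard normal cdf,
% and \alpha \in \mathbb{R}^d the shape parameter.
f_d(z) = 2\,\phi_d(z;\Omega)\,\Phi\!\left(\alpha^{\top} z\right),
  \qquad z \in \mathbb{R}^d .
% Setting \alpha = 0 recovers the ordinary N_d(0,\Omega) distribution,
% which is the sense in which the normal class is extended.
```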

    A method for generating realistic correlation matrices

    Simulating sample correlation matrices is important in many areas of statistics. Approaches such as generating Gaussian data and finding their sample correlation matrix, or generating random uniform [-1, 1] deviates as pairwise correlations, both have drawbacks. We develop an algorithm for adding noise, in a highly controlled manner, to general correlation matrices. In many instances, our method yields results which are superior to those obtained by simply simulating Gaussian data. Moreover, we demonstrate how our general algorithm can be tailored to a number of different correlation models. Using our results in a few different applications, we show that simulating correlation matrices can help assess statistical methodology.
    Comment: Published at http://dx.doi.org/10.1214/13-AOAS638 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
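
    As a point of reference for the baseline named in the abstract, here is a minimal sketch of the "generate Gaussian data and take its sample correlation matrix" approach; the paper's own controlled noise-injection algorithm is not reproduced, and the function and parameter names are illustrative.

```python
import numpy as np

def sample_correlation_matrix(true_corr, n_samples, seed=None):
    """Draw Gaussian data with population correlation matrix `true_corr`
    and return its sample correlation matrix (the baseline approach)."""
    rng = np.random.default_rng(seed)
    d = true_corr.shape[0]
    data = rng.multivariate_normal(np.zeros(d), true_corr, size=n_samples)
    return np.corrcoef(data, rowvar=False)

# Example: finite-sample noise around a 5x5 equicorrelation matrix.
d, rho = 5, 0.4
target = rho * np.ones((d, d)) + (1 - rho) * np.eye(d)
noisy = sample_correlation_matrix(target, n_samples=50, seed=0)
```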

    Land use/land cover mapping (1:25000) of Taiwan, Republic of China by automated multispectral interpretation of LANDSAT imagery

    Three methods were tested for collecting the training sets needed to establish the spectral signatures of the land uses/land covers sought, owing to the difficulties of retrospectively collecting representative ground control data. Computer preprocessing techniques applied to the digital images to improve the final classification results were geometric corrections, spectral band or image ratioing, and statistical cleaning of the representative training sets. A minimal level of statistical verification was carried out by comparing the airphoto estimates with the classification results. The verification provided further support for the selection of MSS bands 5 and 7, and indicated that the maximum likelihood ratioing technique achieved classification results in closer agreement with the airphoto estimates than stepwise discriminant analysis did.
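
    As a rough, generic illustration of per-pixel maximum likelihood classification of multispectral data of the kind described (not the study's actual implementation; equal class priors are assumed and the class statistics are placeholders to be estimated from training sets):

```python
import numpy as np

def ml_classify(pixels, class_means, class_covs):
    """Assign each pixel (rows of band values, e.g. MSS bands 5 and 7)
    to the class with the highest Gaussian log-likelihood,
    assuming equal prior probabilities for all classes."""
    scores = []
    for mean, cov in zip(class_means, class_covs):
        diff = pixels - mean
        inv_cov = np.linalg.inv(cov)
        # Log-likelihood up to an additive constant shared by all classes.
        ll = -0.5 * np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        ll -= 0.5 * np.log(np.linalg.det(cov))
        scores.append(ll)
    return np.argmax(np.stack(scores, axis=1), axis=1)
```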

    Discriminant analysis of solar bright points and faculae I. Classification method and center-to-limb distribution

    While photospheric magnetic elements appear mainly as Bright Points (BPs) at the disk center and as faculae near the limb, high-resolution images reveal the coexistence of BPs and faculae over a range of heliocentric angles. This is not explained by a "hot wall" effect through vertical flux tubes, and suggests that the transition from BPs to faculae needs to be quantitatively investigated. To achieve this, we made the first recorded attempt to discriminate BPs and faculae, using a statistical classification approach based on Linear Discriminant Analysis (LDA). This paper gives a detailed description of our method and demonstrates its application to high-resolution images of active regions to retrieve a center-to-limb distribution of BPs and faculae. Bright "magnetic" features were detected at various disk positions by a segmentation algorithm using simultaneous G-band and continuum information. By using a selected sample of those features to represent BPs and faculae, suitable photometric parameters were identified in order to carry out LDA. We thus obtained a Center-to-Limb Variation (CLV) of the relative number of BPs and faculae, revealing the predominance of faculae at all disk positions except close to disk center (mu > 0.9). Although the present dataset suffers from limited statistics, our results are consistent with other observations of BPs and faculae at various disk positions. The retrieved CLV indicates that, at high resolution, faculae are an essential constituent of active regions all across the solar disk. We speculate that the faculae near disk center, as well as the BPs away from disk center, are associated with inclined fields.
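
    A minimal sketch of the LDA step, assuming photometric parameters have already been measured for a labelled sample of BPs and faculae; the feature values and the use of scikit-learn are illustrative, not the authors' pipeline.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative photometric parameters per detected bright feature,
# e.g. G-band and continuum contrasts (placeholder values).
X_train = np.array([[0.35, 0.05], [0.40, 0.02], [0.10, 0.20], [0.08, 0.25]])
y_train = np.array(["BP", "BP", "facula", "facula"])

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# Classify newly detected features; tallying the predicted labels per
# disk position (mu) would yield a center-to-limb distribution.
X_new = np.array([[0.30, 0.06], [0.12, 0.22]])
labels = lda.predict(X_new)
```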

    Reliable ABC model choice via random forests

    Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP model to a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computational efficiency of at least fifty-fold), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of dataset sizes and model complexities that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf, available on CRAN.
    Comment: 39 pages, 15 figures, 6 tables
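
    The reference implementation is the abcrf R package mentioned above; the following is only a hedged Python analogue of the first stage (model choice recast as classification of summary statistics), using a generic random forest, with `simulate_summaries` an assumed user-supplied simulator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def abc_rf_model_choice(simulate_summaries, n_models=2, n_ref=10_000, seed=0):
    """Build an ABC reference table by simulating summary statistics from
    each candidate model, then train a random forest classifier on it.
    `simulate_summaries(model_index, rng)` is an assumed user function
    returning one vector of summary statistics."""
    rng = np.random.default_rng(seed)
    model_idx = rng.integers(0, n_models, size=n_ref)   # uniform model prior
    table = np.array([simulate_summaries(m, rng) for m in model_idx])
    forest = RandomForestClassifier(n_estimators=500)
    forest.fit(table, model_idx)
    return forest

# forest.predict(obs_summaries.reshape(1, -1)) gives the selected (MAP) model;
# in the actual methodology its posterior probability is approximated in a
# second random-forest stage.
```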

    Correcting the optimally selected resampling-based error rate: A smooth analytical alternative to nested cross-validation

    High-dimensional binary classification tasks, e.g. the classification of microarray samples into normal and cancer tissues, usually involve a tuning parameter adjusting the complexity of the applied method to the examined data set. When only the performance of the best tuning parameter value is reported, over-optimistic prediction errors are published. The contribution of this paper is two-fold. Firstly, we develop a new method for tuning bias correction which can be motivated by decision-theoretic considerations. The method is based on the decomposition of the unconditional error rate involving the tuning procedure. Our corrected error estimator can be written as a weighted mean of the errors obtained using the different tuning parameter values, and it can be interpreted as a smooth version of nested cross-validation (NCV), the standard approach for avoiding tuning bias. In contrast to NCV, the weighting scheme of our method guarantees intuitive bounds for the corrected error. Secondly, we suggest using bias correction methods to also address the bias resulting from the optimal choice of the classification method among several competitors. This method selection bias is particularly relevant to prediction problems in high-dimensional data. In the absence of standards, it is common practice to try several methods successively, which can lead to an optimistic bias similar to the tuning bias. We demonstrate the performance of our method for both types of bias on microarray data sets and compare it to existing methods. This study confirms that our approach yields estimates competitive with NCV at a much lower computational price.
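
    A hedged sketch of the weighted-mean idea: the corrected estimate averages the resampling error rates obtained for the candidate tuning values, with weights that here simply reflect how often each value is selected as best across resampling iterations; the paper's exact decision-theoretic weighting scheme is not reproduced.

```python
import numpy as np

def weighted_corrected_error(error_rates, selection_counts):
    """Corrected error as a weighted mean over tuning parameter values.

    error_rates[k]      -- resampling-based error for tuning value k
    selection_counts[k] -- how often value k was chosen as best across
                           resampling iterations (illustrative weights)
    Because the weights are non-negative and sum to one, the result is
    bounded by min(error_rates) and max(error_rates).
    """
    weights = np.asarray(selection_counts, dtype=float)
    weights /= weights.sum()
    return float(np.dot(weights, error_rates))

# Example: three tuning values, the second selected most often.
print(weighted_corrected_error([0.18, 0.12, 0.15], [10, 70, 20]))
```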