16 research outputs found

    Sine-skewed toroidal distributions and their application in protein bioinformatics

    In the bioinformatics field, there has been a growing interest in modelling dihedral angles of amino acids by viewing them as data on the torus. In recent years this has motivated new proposals of distributions on the bivariate torus. The main drawback of most of these models is that the related densities are (pointwise) symmetric, despite the fact that the data usually present asymmetric patterns. This motivates the need to find a new way of constructing asymmetric toroidal distributions starting from a symmetric distribution. We tackle this problem in this paper by introducing the sine-skewed toroidal distributions. The general properties of the new models are derived. Based on the initial symmetric model, explicit expressions for the shape parameters are obtained, a simple algorithm for generating random numbers is provided, and asymptotic results for the maximum likelihood estimators are established. An important feature of our construction is that no normalizing constant needs to be calculated, leading to more flexible distributions without increasing the complexity of the models. The benefit of employing these new sine-skewed distributions is shown on the basis of protein data, where, in general, the new models outperform their symmetric antecedents.
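As a rough illustration of the random-number generation idea mentioned in the abstract, the sketch below assumes the usual sine-skewing form, in which a density that is symmetric about a location `mu` is multiplied by `1 + sum(lam_i * sin(theta_i - mu_i))` with `sum(|lam_i|) <= 1`. The base model (independent von Mises components) and the function name `sample_sine_skewed` are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def sample_sine_skewed(mu, lam, kappa, rng):
    """Draw one point on the torus from a sine-skewed density.

    Assumed base model (illustrative only): independent von Mises
    components with locations mu and concentrations kappa, which is
    pointwise symmetric about mu.
    """
    if sum(abs(l) for l in lam) > 1.0:
        raise ValueError("skewness parameters must satisfy sum(|lam_i|) <= 1")
    # Step 1: draw from the symmetric base density.
    y = [rng.vonmisesvariate(m, k) for m, k in zip(mu, kappa)]
    # Step 2: keep y with probability (1 + sum lam_i * sin(y_i - mu_i)) / 2,
    # otherwise reflect it through mu. No normalizing constant is needed.
    skew = sum(l * math.sin(t - m) for t, m, l in zip(y, mu, lam))
    if rng.random() <= (1.0 + skew) / 2.0:
        return y
    return [(2.0 * m - t) % (2.0 * math.pi) for t, m in zip(y, mu)]


rng = random.Random(0)
draws = [sample_sine_skewed((math.pi, math.pi), (0.9, 0.0), (1.0, 1.0), rng)
         for _ in range(1000)]
```

The reflection step works because the base density takes the same value at `y` and at `2 * mu - y`, so accepting or reflecting only redistributes mass between the two mirror points.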

    multimode: An R Package for Mode Assessment

    In several applied fields, multimodality assessment is a crucial task, either as a preliminary exploratory tool or for determining the suitability of certain distributions. The goal of this paper is to present the utilities of the R package multimode, which collects different exploratory and testing non-parametric approaches for determining the number of modes and their estimated location. Specifically, some graphical tools (SiZer map, mode tree or mode forest) are provided, allowing for the identification of mode patterns, based on the kernel density estimation. Several formal testing procedures for determining the number of modes are described in this paper and implemented in the multimode package, including methods based on the ideas of the critical bandwidth, the excess mass or a combination of both. This package also includes a function for estimating the mode locations and different classical data examples that have been considered in the mode testing literature.

    Herpesviruses Serology Distinguishes Different Subgroups of Patients From the United Kingdom Myalgic Encephalomyelitis/Chronic Fatigue Syndrome Biobank.

    The evidence of an association between Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and chronic herpesvirus infections remains inconclusive. Two reasons for the lack of consistent evidence are the large heterogeneity of the patient population, with different disease triggers, and the use of arbitrary cutoffs for defining seropositivity. In this work we re-analyzed previously published serological data related to 7 herpesvirus antigens. Patients with ME/CFS were subdivided into four subgroups related to the disease triggers: S0, 42 patients who did not know their disease trigger; S1, 43 patients who reported a non-infection trigger; S2, 93 patients who reported an infection trigger, but that infection was not confirmed by a lab test; and S3, 48 patients who reported an infection trigger and that infection was confirmed by a lab test. In accordance with a sensitivity analysis, the data were compared to those from 99 healthy controls, allowing the seropositivity cutoffs to vary within a wide range of possible values. We found a negative association between S1 and seropositivity to Epstein-Barr virus (VCA and EBNA1 antigens) and Varicella-Zoster virus using specific seropositivity cutoffs. However, this association was not significant when controlling for multiple testing. We also found that S3 had a lower seroprevalence to the human cytomegalovirus when compared to healthy controls for all cutoffs used for seropositivity and after adjusting for multiple testing using the Benjamini-Hochberg procedure. However, this association did not reach statistical significance when using the Benjamini-Yekutieli procedure. In summary, herpesvirus serology could distinguish subgroups of ME/CFS patients according to their disease trigger, but this finding may ultimately be affected by the problem of multiple testing.
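To make the contrast between the two multiple-testing corrections concrete, here is a minimal sketch of Benjamini-Hochberg and Benjamini-Yekutieli adjusted p-values. The function name `adjust_pvalues` and the example p-values are hypothetical; this is not the authors' analysis code and the numbers are not from the study.

```python
def adjust_pvalues(pvalues, method="bh"):
    """Return step-up adjusted p-values in the original order.

    method="bh": Benjamini-Hochberg.
    method="by": Benjamini-Yekutieli, which multiplies the BH adjustment
    by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if method == "bh":
        c = 1.0
    elif method == "by":
        c = sum(1.0 / k for k in range(1, m + 1))
    else:
        raise ValueError("method must be 'bh' or 'by'")
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m * c / rank)
        adjusted[i] = running_min
    return adjusted


example = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values
bh = adjust_pvalues(example, "bh")
by = adjust_pvalues(example, "by")
```

Because the Benjamini-Yekutieli adjustment is never smaller than the Benjamini-Hochberg one, a result can pass a significance threshold under the former correction and fail it under the latter, which is the pattern the abstract reports for S3.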

    Chronicle of an early demise, surname extinction in the fifteenth and the seventeenth centuries

    This is the Author’s Original Manuscript of an article published by Taylor & Francis in Historical Methods: A Journal of Quantitative and Interdisciplinary History in 2018, available online: http://www.tandfonline.com/10.1080/01615440.2018.1462747
    It has been amply demonstrated that individuals' reproductive capability is the key explanatory phenomenon for understanding onomastic disappearance during the early modern period. This article analyzes the evolution and consequences of surname extinction in a specific population: Catalonia in the sixteenth and seventeenth centuries. Two aspects are examined. First, the observed disappearance of surnames is estimated through historical data collected in the Llibres d'Esposalles (Marriage Books) from 1481 to 1600 at Barcelona Cathedral. Second, the estimated natural extinction of those surnames registered in 1481 is forecast by applying a statistical branching process.
    Research has been funded by Projects MTM2016-76969-P (Spanish State Research Agency, AEI) and MTM2013-41383-P (Spanish Ministry of Economy, Industry and Competitiveness), both co-funded by the European Regional Development Fund (ERDF), IAP network from Belgian Science Policy. Work of J. Ameijeiras-Alonso has been supported by the Ph.D. Grant BES-2014-071006 from the Spanish Ministry of Economy, Industry and Competitiveness.

    A fresh look at mean-shift based modal clustering

    Modal clustering is an unsupervised learning technique where cluster centers are identified as the local maxima of nonparametric probability density estimates. A natural algorithmic engine for the computation of these maxima is the mean shift procedure, which is essentially an iteratively computed chain of local means. We revisit this technique, focusing on its link to kernel density gradient estimation, and in doing so propose a novel approach to bandwidth selection based on the concept of a critical bandwidth. Furthermore, in the one-dimensional case, an inverse version of the mean shift is developed to provide a novel approach for the estimation of antimodes, which is then used to identify cluster boundaries. A simulation study is provided which assesses, in the univariate case, the classification accuracy of the mean-shift based clustering approach. Three (univariate and multivariate) examples from the fields of philately, engineering, and imaging illustrate how modal clusterings identified through mean-shift based methods relate directly and naturally to physical properties of the data-generating system. Solutions are proposed to deal computationally efficiently with large data sets.
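The "iteratively computed chain of local means" can be sketched for one-dimensional data as follows. This is a generic textbook-style mean shift with a Gaussian kernel and a fixed, user-supplied bandwidth; the function name `mean_shift_mode` and the toy data are hypothetical, and the sketch does not implement the paper's critical-bandwidth selection or its inverse mean shift for antimodes.

```python
import math


def mean_shift_mode(start, data, bandwidth, tol=1e-8, max_iter=1000):
    """Follow the mean shift iteration from `start` towards a nearby mode
    of the Gaussian kernel density estimate of `data`."""
    x = float(start)
    for _ in range(max_iter):
        # Gaussian kernel weights of every observation relative to x.
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("start is too far from the data for this bandwidth")
        # The next point in the chain is the locally weighted mean.
        new_x = sum(w * xi for w, xi in zip(weights, data)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Toy data with two well-separated groups (hypothetical values).
data = [-0.5, -0.2, 0.0, 0.2, 0.5, 9.5, 9.8, 10.0, 10.2, 10.5]
modes = [round(mean_shift_mode(xi, data, bandwidth=0.5), 3) for xi in data]
```

Observations whose chains end at the same mode are assigned to the same cluster, so the rounded values in `modes` act as cluster labels in this toy setting.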

    Hidden Markov random fields for the spatial segmentation of circular data

    The aim is to present a model for providing a spatial segmentation of circular data according to a finite number of latent classes employing a hidden Markov random field. Under this setting, the data are modelled by a finite mixture of parametric densities, whose parameters vary across space according to a latent Markov random field. As such, it can be viewed as an extension of a mixture model to the spatial setting. Motivated by wildfire data in the Iberian Peninsula, a model based on a mixture of Kato-Jones circular densities is suggested. This model takes into account special features of wildfire occurrence data such as multimodality, skewness and kurtosis. The parameters of the model vary across space according to a latent Potts model, modulated by geo-referenced covariates.

    On optimal tests for circular reflective symmetry about an unknown central direction

    Symmetry is one of the most fundamental of dividing hypotheses, its rejection, or not, heavily influencing subsequent modeling strategies. In this paper, the authors construct tests for circular reflective symmetry about an unknown central direction that are asymptotically valid within a semi-parametric class of distributions and maintain certain parametric local and asymptotic optimality properties. The asymptotic distributions of the test statistics under the null hypothesis and under local alternatives are established, and a pre-existing omnibus test is identified as a special case of the proposed construction. The finite-sample properties of the semi-parametric tests are compared with those of other testing approaches in a simulation experiment, and recommendations are made regarding testing for reflective symmetry in practice. Analyses of data on the directions of cracks in hip replacements illustrate the proposed methodology.