Sine-skewed toroidal distributions and their application in protein bioinformatics
In the bioinformatics field, there has been a growing interest in modelling
dihedral angles of amino acids by viewing them as data on the torus. This has
motivated, over the past years, new proposals of distributions on the bivariate
torus. The main drawback of most of these models is that the related densities
are (pointwise) symmetric, despite the fact that the data usually present
asymmetric patterns. This motivates the need to find a new way of constructing
asymmetric toroidal distributions starting from a symmetric distribution. We
tackle this problem in this paper by introducing the sine-skewed toroidal
distributions. The general properties of the new models are derived. Based on
the initial symmetric model, explicit expressions for the shape parameters are
obtained, a simple algorithm for generating random numbers is provided, and
asymptotic results for the maximum likelihood estimators are established. An
important feature of our construction is that no normalizing constant needs to
be calculated, leading to more flexible distributions without increasing the
complexity of the models. The benefit of employing these new sine-skewed
distributions is shown on the basis of protein data, where, in general, the new
models outperform their symmetric antecedents.
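The random number generation mentioned in the abstract above can be illustrated with a short sketch. The abstract does not spell out the density or the algorithm, so everything here is an assumption: the skewed density is taken to be the symmetric base density multiplied by (1 + sum_s lam[s] * sin(x[s] - mu[s])) with the absolute values of lam summing to at most 1, the base is taken to be a product of independent von Mises densities, and the function name sample_sine_skewed is hypothetical. Under those assumptions, sampling reduces to drawing from the base and reflecting the draw through mu with a suitable probability, so no normalizing constant is needed.

```python
import numpy as np


def sample_sine_skewed(n, mu, kappa, lam, rng=None):
    """Draw n points on the d-torus from an (assumed) sine-skewed density.

    Assumptions, not taken from the abstract: the symmetric base is a product
    of independent von Mises densities with locations mu and concentrations
    kappa, and the skewed density is
        base(x) * (1 + sum_s lam[s] * sin(x[s] - mu[s])),  sum_s |lam[s]| <= 1.
    """
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    lam = np.asarray(lam, dtype=float)
    if np.abs(lam).sum() > 1.0:
        raise ValueError("the absolute values of lam must sum to at most 1")

    # Step 1: sample from the symmetric base density.
    y = rng.vonmises(mu, kappa, size=(n, mu.size))

    # Step 2: keep y with probability (1 + sum_s lam[s] * sin(y[s] - mu[s])) / 2,
    # otherwise reflect it through mu (y -> 2 * mu - y).
    keep_prob = (1.0 + np.sin(y - mu) @ lam) / 2.0
    keep = rng.uniform(size=n) < keep_prob
    x = np.where(keep[:, None], y, 2.0 * mu - y)

    # Wrap every angle back to [-pi, pi).
    return (x + np.pi) % (2.0 * np.pi) - np.pi


if __name__ == "__main__":
    sample = sample_sine_skewed(5, mu=[0.0, 1.0], kappa=[2.0, 2.0], lam=[0.6, -0.3], rng=0)
    print(sample)
```

Because the base density is symmetric about mu, keeping or reflecting each draw yields exactly the skewed density stated in the docstring; setting lam to zero recovers the symmetric base.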
multimode: An R Package for Mode Assessment
In several applied fields, multimodality assessment is a crucial task, either as a preliminary exploratory tool or for determining the suitability of certain distributions. The goal of this paper is to present the utilities of the R package multimode, which collects different exploratory and testing non-parametric approaches for determining the number of modes and their estimated location. Specifically, some graphical tools (SiZer map, mode tree or mode forest) are provided, allowing for the identification of mode patterns, based on kernel density estimation. Several formal testing procedures for determining the number of modes are described in this paper and implemented in the multimode package, including methods based on the ideas of the critical bandwidth, the excess mass or a combination of both. This package also includes a function for estimating the mode locations and different classical data examples that have been considered in the mode testing literature.
Multimode: An R Package for Mode Assessment
In several applied fields, multimodality assessment is a crucial task, either as a
preliminary exploratory tool or for determining the suitability of certain
distributions. The goal of this paper is to present the utilities of the R
package multimode, which collects different exploratory and testing
nonparametric approaches for determining the number of modes and their
estimated location. Specifically, some graphical tools based on kernel density
estimation (SiZer map, mode tree or mode forest) are provided, allowing for the
identification of mode patterns. Several formal testing
procedures for determining the number of modes are described in this paper and
implemented in the multimode package, including methods based on the ideas of
the critical bandwidth, the excess mass or using a combination of both. This
package also includes a function for estimating the mode locations and
different classical data examples that have been considered in the mode testing
literature.
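The critical bandwidth idea mentioned in the abstract above can be sketched as follows. This is a plain NumPy illustration, not the multimode package's implementation (the package itself is written in R): it assumes a Gaussian kernel, counts modes on a finite grid, and finds by bisection the smallest bandwidth at which the kernel density estimate has at most k modes. The function names count_modes and critical_bandwidth are hypothetical.

```python
import numpy as np


def count_modes(data, h, grid_size=512):
    """Count local maxima of a Gaussian kernel density estimate on a grid."""
    data = np.asarray(data, dtype=float)
    grid = np.linspace(data.min() - 3.0 * h, data.max() + 3.0 * h, grid_size)
    # The unnormalised estimate is enough for locating local maxima.
    dens = np.exp(-0.5 * ((grid[:, None] - data[None, :]) / h) ** 2).sum(axis=1)
    interior = dens[1:-1]
    return int(np.sum((interior > dens[:-2]) & (interior > dens[2:])))


def critical_bandwidth(data, k=1, tol=1e-3):
    """Smallest bandwidth (up to tol) at which the estimate has at most k modes.

    Relies on the number of modes of a Gaussian kernel estimate not increasing
    as the bandwidth grows, which is what makes bisection applicable.
    """
    data = np.asarray(data, dtype=float)
    spread = np.ptp(data)
    lo, hi = 1e-3 * spread, spread
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_modes(data, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical bimodal sample, used only for illustration.
    data = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(2.0, 0.5, 200)])
    print(critical_bandwidth(data, k=1), critical_bandwidth(data, k=2))
```

A large critical bandwidth for k modes, relative to what resampling under a k-modal null would produce, is evidence of more than k modes; the calibration step of such a test is not shown here.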
Herpesviruses Serology Distinguishes Different Subgroups of Patients From the United Kingdom Myalgic Encephalomyelitis/Chronic Fatigue Syndrome Biobank.
The evidence of an association between Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and chronic herpesvirus infections remains inconclusive. Two reasons for the lack of consistent evidence are the large heterogeneity of the patient population, with different disease triggers, and the use of arbitrary cutoffs for defining seropositivity. In this work we re-analyzed previously published serological data related to 7 herpesvirus antigens. Patients with ME/CFS were subdivided into four subgroups related to the disease triggers: S0 (42 patients who did not know their disease trigger); S1 (43 patients who reported a non-infection trigger); S2 (93 patients who reported an infection trigger, but that infection was not confirmed by a lab test); and S3 (48 patients who reported an infection trigger and that infection was confirmed by a lab test). In accordance with a sensitivity analysis, the data were compared to those from 99 healthy controls, allowing the seropositivity cutoffs to vary within a wide range of possible values. We found a negative association between S1 and seropositivity to Epstein-Barr virus (VCA and EBNA1 antigens) and Varicella-Zoster virus using specific seropositivity cutoffs. However, this association was not significant when controlling for multiple testing. We also found that S3 had a lower seroprevalence to the human cytomegalovirus when compared to healthy controls for all cutoffs used for seropositivity and after adjusting for multiple testing using the Benjamini-Hochberg procedure. However, this association did not reach statistical significance when using the Benjamini-Yekutieli procedure. In summary, herpesvirus serology could distinguish subgroups of ME/CFS patients according to their disease trigger, but this finding could ultimately be affected by the problem of multiple testing.
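The contrast between the two multiple-testing corrections named in the abstract above can be made concrete with a small sketch. The p-values below are hypothetical and are not taken from the study; the function name adjust_pvalues is likewise an illustration. The Benjamini-Hochberg adjustment multiplies the i-th smallest of m p-values by m/i, while Benjamini-Yekutieli additionally multiplies by the harmonic sum 1 + 1/2 + ... + 1/m, which makes it more conservative.

```python
import numpy as np


def adjust_pvalues(pvalues, method="bh"):
    """Return Benjamini-Hochberg ("bh") or Benjamini-Yekutieli ("by") adjusted p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    factor = m / ranks
    if method == "by":
        # Extra harmonic-sum factor used by Benjamini-Yekutieli.
        factor = factor * np.sum(1.0 / ranks)
    elif method != "bh":
        raise ValueError("method must be 'bh' or 'by'")
    scaled = p[order] * factor
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    pvals = [0.01, 0.04, 0.03, 0.005]
    print(adjust_pvalues(pvals, "bh"))  # approximately [0.02, 0.04, 0.04, 0.02]
    print(adjust_pvalues(pvals, "by"))  # approximately [0.0417, 0.0833, 0.0833, 0.0417]
```

With these illustrative numbers, all four hypotheses fall below 0.05 after the Benjamini-Hochberg adjustment but only two do after Benjamini-Yekutieli, the same kind of disagreement the abstract reports.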
Chronicle of an early demise, surname extinction in the fifteenth and the seventeenth centuries
This is the Author’s Original Manuscript of an article published by Taylor & Francis in Historical Methods: A Journal of Quantitative and Interdisciplinary History on 2018, available online: http://www.tandfonline.com/10.1080/01615440.2018.1462747. It has been amply demonstrated that individuals' reproductive capability is the key explanatory phenomenon for understanding onomastic disappearance during the early modern period. This article analyzes the evolution and consequences of surname extinction in a specific population: Catalonia in the sixteenth and seventeenth centuries. In this article two aspects are examined. First, the observed disappearance of surnames is estimated through historical data collected in the Llibres d'Esposalles (Marriage Books) from 1481 to 1600 at Barcelona Cathedral. Second, the estimated natural extinction of those surnames registered in 1481 is forecast by applying a statistical branching process. Research has been funded by Projects MTM2016-76969-P (Spanish State Research Agency, AEI) and MTM2013-41383-P (Spanish Ministry of Economy, Industry and Competitiveness), both co-funded by the European Regional Development Fund (ERDF), IAP network from Belgian Science Policy. Work of J. Ameijeiras-Alonso has been supported by the Ph.D. Grant BES-2014-071006 from the Spanish Ministry of Economy, Industry and Competitiveness.
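As a rough illustration of how a branching process forecasts surname extinction, the sketch below uses a simple Galton-Watson process. The abstract does not say which branching model or offspring distribution the article uses, so both the model and the numbers are assumptions, and the function name extinction_by_generation is hypothetical. If G is the probability generating function of the number of surname-carrying children per carrier, the probability that a line started by one carrier is extinct by generation n is obtained by applying G to 0 a total of n times.

```python
import numpy as np


def extinction_by_generation(offspring_pmf, generations):
    """Probability that a single-ancestor line is extinct by each generation.

    offspring_pmf[k] is the (assumed) probability that a carrier of the surname
    has k children who pass it on. Uses q_n = G(q_{n-1}) with q_0 = 0, where
    G is the probability generating function of the offspring distribution.
    """
    pmf = np.asarray(offspring_pmf, dtype=float)
    if np.any(pmf < 0) or not np.isclose(pmf.sum(), 1.0):
        raise ValueError("offspring_pmf must be a probability distribution")
    coeffs = pmf[::-1]  # np.polyval expects the highest-degree coefficient first
    q = 0.0
    probs = []
    for _ in range(generations):
        q = float(np.polyval(coeffs, q))
        probs.append(q)
    return np.array(probs)


if __name__ == "__main__":
    # Hypothetical offspring distribution with mean 1.25 (supercritical case).
    probs = extinction_by_generation([0.25, 0.25, 0.5], generations=50)
    print(probs[:5])
    print(probs[-1])  # approaches 0.5, the smallest fixed point of G
```

Even when the mean number of surname-carrying children exceeds one, as in this illustrative distribution, a sizeable share of lines still dies out, which is why rare surnames can disappear without any external cause.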
A fresh look at mean-shift based modal clustering
Modal clustering is an unsupervised learning technique where cluster centers are identified as the local maxima of nonparametric probability density estimates. A natural algorithmic engine for the computation of these maxima is the mean shift procedure, which is essentially an iteratively computed chain of local means. We revisit this technique, focusing on its link to kernel density gradient estimation, and in doing so propose a novel approach to bandwidth selection based on the concept of a critical bandwidth. Furthermore, in the one-dimensional case, an inverse version of the mean shift is developed to provide a novel approach for the estimation of antimodes, which is then used to identify cluster boundaries. A simulation study is provided which assesses, in the univariate case, the classification accuracy of the mean-shift based clustering approach. Three (univariate and multivariate) examples from the fields of philately, engineering, and imaging illustrate how modal clusterings identified through mean-shift based methods relate directly and naturally to physical properties of the data-generating system. Solutions are proposed for handling large data sets in a computationally efficient way.
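The "iteratively computed chain of local means" described in the abstract above can be sketched for univariate data as follows. This is a minimal illustration assuming a Gaussian kernel and a fixed, user-supplied bandwidth h; it does not include the bandwidth selection or the inverse mean shift for antimodes proposed in the paper, and the function names mean_shift_1d and modal_clusters are hypothetical.

```python
import numpy as np


def mean_shift_1d(data, h, tol=1e-6, max_iter=500):
    """Move every point uphill on a Gaussian kernel density estimate.

    Each iteration replaces a point by the kernel-weighted mean of the data
    around it (the local mean), until no point moves by more than tol.
    """
    data = np.asarray(data, dtype=float)
    x = data.copy()
    for _ in range(max_iter):
        weights = np.exp(-0.5 * ((x[:, None] - data[None, :]) / h) ** 2)
        x_new = (weights @ data) / weights.sum(axis=1)
        shift = np.max(np.abs(x_new - x))
        x = x_new
        if shift < tol:
            break
    return x


def modal_clusters(data, h, merge_tol=1e-2):
    """Group points whose mean-shift end points coincide (up to merge_tol)."""
    end_points = mean_shift_1d(data, h)
    labels = np.empty(len(end_points), dtype=int)
    modes = []
    for i in np.argsort(end_points):
        if not modes or end_points[i] - modes[-1] > merge_tol:
            modes.append(end_points[i])  # a new mode starts a new cluster
        labels[i] = len(modes) - 1
    return np.array(modes), labels


if __name__ == "__main__":
    # Hypothetical two-group sample, used only for illustration.
    data = np.concatenate([np.linspace(-3.0, -1.0, 50), np.linspace(1.0, 3.0, 50)])
    modes, labels = modal_clusters(data, h=0.7)
    print(modes, np.bincount(labels))
```

The cluster of a point is simply the mode at which its chain of local means ends, so the number of clusters is governed by the bandwidth rather than fixed in advance.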
Hidden Markov random fields for the spatial segmentation of circular data
The aim is to present a model for providing a spatial segmentation of circular data according to a finite number of latent classes employing a hidden Markov random field. Under this setting, the data are modelled by a finite mixture of parametric densities, whose parameters vary across space according to a latent Markov random field. As such, it can be viewed as an extension of a mixture model to the spatial setting. Motivated by wildfire data in the Iberian Peninsula, a model based on a mixture of Kato-Jones circular densities is suggested. This model takes into account special features of wildfire occurrence data such as multimodality, skewness and kurtosis. The parameters of the model will vary across space according to a latent Potts model, modulated by geo-referenced covariates.
On optimal tests for circular reflective symmetry about an unknown central direction
Symmetry is one of the most fundamental of dividing hypotheses, its rejection, or not, heavily influencing subsequent modeling strategies. In this paper, the authors construct tests for circular reflective symmetry about an unknown central direction that are asymptotically valid within a semi-parametric class of distributions and maintain certain parametric local and asymptotic optimality properties. The asymptotic distributions of the test statistics under the null hypothesis and under local alternatives are established, and a pre-existing omnibus test is identified as a special case of the proposed construction. The finite-sample properties of the semi-parametric tests are compared with those of other testing approaches in a simulation experiment, and recommendations made regarding testing for reflective symmetry in practice. Analyses of data on the directions of cracks in hip replacements illustrate the proposed methodology.