
    Torus principal component analysis with applications to RNA structure

    Several cutting-edge applications need PCA methods for data on tori. We propose a novel torus-PCA method that adaptively favors low-dimensional representations while preventing overfitting by a new test; both features can be applied generally and address shortcomings in two previously proposed PCA methods. Unlike tangent space PCA, our torus-PCA features structure fidelity by honoring the cyclic topology of the data space and, unlike geodesic PCA, produces nonwinding, nondense descriptors. These features are achieved by deforming tori into spheres with self-gluing and then using a variant of the recently developed principal nested spheres analysis. This analysis involves a step of subsphere fitting, and we provide a new test to avoid overfitting. We validate our torus-PCA by application to an RNA benchmark data set. Further, using a larger RNA data set, torus-PCA recovers previously found structure, now globally at the one-dimensional representation, which is not accessible via tangent space PCA.

    Warped metrics for location-scale models

    This paper argues that a class of Riemannian metrics, called warped metrics, plays a fundamental role in statistical problems involving location-scale models. The paper reports three new results: i) the Rao-Fisher metric of any location-scale model is a warped metric, provided that this model satisfies a natural invariance condition, ii) the analytic expression of the sectional curvature of this metric, iii) the exact analytic solution of the geodesic equation of this metric. The paper applies these new results to several examples of interest, where it shows that warped metrics turn location-scale models into complete Riemannian manifolds of negative sectional curvature. This is a very suitable situation for developing algorithms which solve problems of classification and on-line estimation. Thus, by revealing the connection between warped metrics and location-scale models, the present paper paves the way to the introduction of new efficient statistical algorithms. Comment: preprint of a submission to the GSI 2017 conference.

    A New Unified Approach for the Simulation of a Wide Class of Directional Distributions

    The need for effective simulation methods for directional distributions has grown as they have become components in more sophisticated statistical models. A new acceptance-rejection method is proposed and investigated for the Bingham distribution on the sphere using the angular central Gaussian distribution as an envelope. It is shown that the proposed method has high efficiency and is also straightforward to use. Next, the simulation method is extended to the Fisher and Fisher-Bingham distributions on spheres and related manifolds. Together, these results provide a widely applicable and efficient methodology to simulate many of the standard models in directional data analysis. An R package, simdd, available in the online supplementary material, implements these simulation methods.
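
    The acceptance-rejection idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the simdd implementation: the Bingham density is taken proportional to exp(-x'Ax) with A diagonal, the envelope is an angular central Gaussian with Omega = I + 2A/b, and the tuning constant b is chosen from sum_i 1/(b + 2*lambda_i) = 1. The function names solve_b and sample_bingham are hypothetical.

```python
import math
import random


def solve_b(lambdas, tol=1e-12):
    """Bisection for b in (0, q] with sum_i 1 / (b + 2*lambda_i) = 1.

    Assumes lambdas are non-negative and the smallest one is 0.
    """
    q = len(lambdas)
    lo, hi = 1e-12, float(q)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The sum decreases in b, so a value above 1 means b is too small.
        if sum(1.0 / (mid + 2.0 * lam) for lam in lambdas) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_bingham(lambdas, rng):
    """Draw one unit vector with density proportional to exp(-sum_i lambda_i x_i^2)."""
    # Shifting all lambdas by a constant does not change the density on the sphere.
    shift = min(lambdas)
    lam = [l - shift for l in lambdas]
    q = len(lam)
    b = solve_b(lam)
    omega = [1.0 + 2.0 * l / b for l in lam]  # diagonal of the envelope matrix
    # log of the bound M(b) = exp(-(q - b)/2) * (q/b)^(q/2) on target/envelope
    log_m = -0.5 * (q - b) + 0.5 * q * math.log(q / b)
    while True:
        # Angular central Gaussian draw: normalise y ~ N(0, Omega^{-1}).
        y = [rng.gauss(0.0, 1.0 / math.sqrt(w)) for w in omega]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        xax = sum(l * v * v for l, v in zip(lam, x))
        xox = sum(w * v * v for w, v in zip(omega, x))
        # log of target / (M * envelope), both unnormalised
        log_ratio = -xax + 0.5 * q * math.log(xox) - log_m
        u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
        if math.log(u) <= log_ratio:
            return x
```

    For example, sample_bingham([0.0, 5.0, 5.0], random.Random(0)) returns a unit vector that tends to lie near the first coordinate axis, since that axis carries the smallest lambda.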

    A generative angular model of protein structure evolution

    Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof.

    Anatomically Constrained Video-CT Registration via the V-IMLOP Algorithm

    Functional endoscopic sinus surgery (FESS) is a surgical procedure used to treat acute cases of sinusitis and other sinus diseases. FESS is fast becoming the preferred choice of treatment due to its minimally invasive nature. However, due to the limited field of view of the endoscope, surgeons rely on navigation systems to guide them within the nasal cavity. State-of-the-art navigation systems report registration errors of over 1 mm, which is large compared to the size of the nasal airways. We present an anatomically constrained video-CT registration algorithm that incorporates multiple video features. Our algorithm is robust in the presence of outliers. We also test our algorithm on simulated and in-vivo data, and test its accuracy against degrading initializations. Comment: 8 pages, 4 figures, MICCA

    Bayesian protein sequence and structure alignment

    The structure of a protein is crucial in determining its functionality and is much more conserved than sequence during evolution. A key task in structural biology is to compare protein structures to determine evolutionary relationships, to estimate the function of newly discovered structures and to predict unknown structures. We propose a Bayesian method for protein structure alignment, with the prior on alignments based on functions which penalize ‘gaps’ in the aligned sequences. We show how a broad class of penalty functions fits into this framework, and how the resulting posterior distribution can be efficiently sampled. A commonly used gap penalty function is shown to be a special case, and we propose a new penalty function which alleviates an undesirable feature of the commonly used penalty. We illustrate our method on benchmark data sets and find that it competes well with popular tools from computational biology. Our method has the potential benefit of exploring multiple competing alignments and quantifying their merits probabilistically. The framework naturally enables further information such as amino acid sequence to be included and could be adapted to other situations such as flexible proteins or domain swaps.

    Partial distance correlation with methods for dissimilarities

    Partial distance correlation measures association between two random vectors with respect to a third random vector, analogous to, but more general than (linear) partial correlation. Distance correlation characterizes independence of random vectors in arbitrary dimension. Motivation for the definition is discussed. We introduce a Hilbert space of U-centered distance matrices in which squared distance covariance is the inner product. Simple computation of the sample partial distance correlation and definitions of the population coefficients are presented. Power of the test for zero partial distance correlation is compared with power of the partial correlation test and the partial Mantel test. © Springer International Publishing Switzerland 2016
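
    The construction described in this abstract can be sketched in a few lines. This is an illustrative sketch, assuming Euclidean distances, at least four observations, and the usual U-centering and inner-product scaling 1/(n(n-3)); the function names are hypothetical and not taken from any package.

```python
import math


def distance_matrix(xs):
    """Pairwise Euclidean distances; each sample is a sequence of floats."""
    return [[math.dist(a, b) for b in xs] for a in xs]


def u_center(d):
    """U-centered version of a symmetric distance matrix (zero diagonal), n >= 4."""
    n = len(d)
    row = [sum(r) for r in d]
    total = sum(row)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i][j] = (d[i][j] - row[i] / (n - 2) - row[j] / (n - 2)
                             + total / ((n - 1) * (n - 2)))
    return out


def inner(a, b):
    """Inner product of two U-centered matrices (unbiased squared dCov when a, b differ)."""
    n = len(a)
    s = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 3))


def bias_corrected_dcor(x, y):
    """Cosine of the angle between the U-centered distance matrices of x and y."""
    a = u_center(distance_matrix(x))
    b = u_center(distance_matrix(y))
    denom = math.sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom > 0 else 0.0


def partial_dcor(x, y, z):
    """Sample partial distance correlation of x and y with z removed."""
    rxy = bias_corrected_dcor(x, y)
    rxz = bias_corrected_dcor(x, z)
    ryz = bias_corrected_dcor(y, z)
    denom = math.sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))
    return (rxy - rxz * ryz) / denom if denom > 0 else 0.0
```

    Each argument is a list of observations, one coordinate sequence per observation, for example x = [[0.0], [1.0], [2.5], [4.0], [7.0]]. Because only distances enter the computation, distance_matrix could be replaced by any precomputed dissimilarity matrix.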

    Validating protein structure using kernel density estimates

    Measuring the quality of determined protein structures is a very important problem in bioinformatics. Kernel density estimation is a well-known nonparametric method which is often used for exploratory data analysis. Recent advances, which have extended previous linear methods to multi-dimensional circular data, give a sound basis for the analysis of conformational angles of protein backbones, which lie on the torus. By using an energy test, which is based on interpoint distances, we initially investigate the dependence of the angles on the amino acid type. Then, by computing tail probabilities which are based on amino-acid conditional density estimates, a method is proposed which permits inference on a test set of data. This can be used, for example, to validate protein structures, choose between possible protein predictions and highlight unusual residue angles.

    No alignment of cattle along geomagnetic field lines found

    This paper presents a study of the body orientation of domestic cattle on free pastures in several European states, based on Google satellite photographs. In sum, 232 herds with 3412 individuals were evaluated. Two independent groups participated in our study and came to the same conclusion that, in contradiction to the recent findings of other researchers, no alignment of the animals and of their herds along geomagnetic field lines could be found. Several possible reasons for this discrepancy should be taken into account: poor quality of Google satellite photographs, difficulties in determining the body axis, selection of herds or animals within herds, lack of blinding in the evaluation, possible subconscious bias, and, most importantly, high sensitivity of the calculated main directions of the Rayleigh vectors to some kind of bias or to some overlooked or ignored confounder. This factor could easily have led to an unsubstantiated positive conclusion about the existence of magnetoreception. Comment: Added electronic supplement with source data.
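
    For readers unfamiliar with the Rayleigh vector mentioned in this abstract, the sketch below shows one common way to compute it. It is an illustration, not the authors' procedure: treating body axes as axial data by doubling the angles is an assumption, the p-value uses only the first-order approximation exp(-n * R^2), and the function name mean_resultant is hypothetical.

```python
import math


def mean_resultant(angles_deg, axial=False):
    """Return (mean direction in degrees, mean resultant length R, approx. p-value).

    With axial=True the angles are doubled before averaging, so that an axis
    at 10 degrees and the same axis read as 190 degrees count as identical.
    """
    k = 2 if axial else 1
    n = len(angles_deg)
    c = sum(math.cos(math.radians(k * a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(k * a)) for a in angles_deg) / n
    r = math.hypot(c, s)  # length of the Rayleigh (mean resultant) vector
    direction = (math.degrees(math.atan2(s, c)) / k) % (360.0 / k)
    p_value = math.exp(-n * r * r)  # first-order approximation under uniformity
    return direction, r, p_value
```

    Calling mean_resultant([10, 190, 10, 190], axial=True) treats the four readings as one shared axis, whereas the same call with axial=False lets opposite headings cancel. The direction is poorly determined whenever R is small, which is the sensitivity the abstract points to.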