    Statistical significance of normalized global alignment

    The comparison of homologous proteins from different species is a first step toward function assignment and reconstruction of species evolution. Though local alignment is mostly used for this purpose, global alignment is important for constructing multiple alignments or phylogenetic trees. However, the statistical significance of global alignments is not completely clear: it either lacks a specific statistical model to describe alignments or depends on computationally expensive methods like Z-score. Recently we presented a normalized global alignment, defined as the best compromise between global alignment cost and length, and showed that this new technique led to better classification results than Z-score at a much lower computational cost. However, the statistical significance of the normalized global alignment must be analyzed before it can be considered a completely functional algorithm for protein alignment. Experiments with unrelated proteins extracted from the SCOP ASTRAL database showed that normalized global alignment scores can be fitted to a log-normal distribution. This fact, obtained without any theoretical support, can be used to derive the statistical significance of normalized global alignments. Results are summarized in a table with fitted parameters for different scoring schemes.
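The log-normal fit mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a fit to scores of unrelated pairs is used as the background model and that larger scores are the more significant ones; the abstract does not state which tail applies, so the upper tail here is an assumption. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev

def fit_lognormal(scores):
    """Fit a log-normal by taking the mean and sample standard deviation of ln(score)."""
    logs = [math.log(s) for s in scores]
    return mean(logs), stdev(logs)

def upper_tail_p(score, mu, sigma):
    """P(X >= score) for X ~ LogNormal(mu, sigma), via the normal survival function."""
    z = (math.log(score) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Made-up background scores for unrelated pairs (illustration only, not real data).
background = [0.8, 1.1, 1.3, 0.9, 1.6, 1.2, 1.0, 1.4]
mu, sigma = fit_lognormal(background)

# Probability that an unrelated pair scores at least 2.5 under the fitted model.
print(upper_tail_p(2.5, mu, sigma))
```

If smaller normalized values indicate better alignments in a given scoring scheme, the lower tail `1 - upper_tail_p(...)` would be the relevant quantity instead.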

    A new analysis of quasar polarisation alignments

    We propose a new method to analyse the alignment of optical polarisation vectors from quasars. This method leads to a definition of intrinsic preferred axes and to a determination of the probability $p^{\sigma}$ that the distribution of polarisation directions is random. This probability is found to be as low as 0.003% for one of the regions of redshift. Comment: 20 pages, 9 figures

    Polarization alignments of radio quasars in JVAS/CLASS surveys

    We test the hypothesis that the polarization vectors of flat-spectrum radio sources (FSRS) in the JVAS/CLASS 8.4-GHz surveys are randomly oriented on the sky. The sample with robust polarization measurements is made of 4155 objects and redshift information is known for 1531 of them. We performed two statistical analyses: one in two dimensions and the other in three dimensions when distance is available. We find significant large-scale alignments of polarization vectors for samples containing only quasars (QSOs) among the varieties of FSRSs. While these correlations prove difficult to explain either by a physical effect or by biases in the dataset, the fact that the QSOs which have significantly aligned polarization vectors are found in regions of the sky where optical polarization alignments were previously found is striking. Comment: 13 pages, 9 figures, submitted to MNRAS
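As a rough illustration of what testing for random orientation can look like, the sketch below applies a generic Rayleigh test to axial data (polarization angles defined modulo 180 degrees). This is not the statistic used by the authors: it ignores sky position and redshift entirely, and the function name is hypothetical. The p-value uses the standard large-sample approximation exp(-n R^2).

```python
import cmath
import math

def rayleigh_axial(angles_deg):
    """Rayleigh test for uniformity of axial angles (theta equivalent to theta + 180).

    Returns (R, p): the mean resultant length of the doubled angles and the
    large-sample approximate p-value exp(-n * R**2).
    """
    n = len(angles_deg)
    # Doubling the angle maps theta and theta + 180 degrees to the same point.
    total = sum(cmath.exp(2j * math.radians(a)) for a in angles_deg)
    r = abs(total) / n
    return r, math.exp(-n * r * r)

# Perfectly aligned angles give R close to 1 and a tiny p-value;
# evenly spread angles give R close to 0 and p close to 1.
print(rayleigh_axial([30.0] * 50))
print(rayleigh_axial([0.0, 45.0, 90.0, 135.0]))
```

A small p-value rejects random orientation for the sample as a whole; detecting alignments confined to particular regions of the sky requires a statistic that accounts for position.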

    Statistical Power, the Bispectrum and the Search for Non-Gaussianity in the CMB Anisotropy

    We use simulated maps of the cosmic microwave background anisotropy to quantify the ability of different statistical tests to discriminate between Gaussian and non-Gaussian models. Despite the central limit theorem on large angular scales, both the genus and extrema correlation are able to discriminate between Gaussian models and a semi-analytic texture model selected as a physically motivated non-Gaussian model. When run on the COBE 4-year CMB maps, both tests prefer the Gaussian model. Although the bispectrum has comparable statistical power when computed on the full sky, once a Galactic cut is imposed on the data the bispectrum loses the ability to discriminate between models. Off-diagonal elements of the bispectrum are comparable to the diagonal elements for the non-Gaussian texture model and must be included to obtain maximum statistical power. Comment: Accepted for publication in ApJ; 20 pages, 6 figures, uses AASTeX v5.

    Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

    Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI) which is, in principle, an objective and model-independent similarity measure. MI can be estimated by concatenating and zipping sequences, thereby yielding the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives for animal mitochondrial DNA estimates that are strikingly close to estimates obtained from the alignment-free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Because it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI-based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single-letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments. Comment: 19 pages + 16 pages of supplementary material
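The "concatenating and zipping" estimate mentioned in this abstract can be sketched with the usual definition of the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size. The sketch below uses `zlib` as the compressor, which is an assumption (the abstract does not name one), and random toy sequences rather than real mitochondrial DNA. It shows only the plain NCD, not the additive modification the authors propose.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy data: two independent random DNA-like sequences (illustration only).
rng = random.Random(0)
seq_a = "".join(rng.choices("ACGT", k=2000)).encode()
seq_b = "".join(rng.choices("ACGT", k=2000)).encode()

# A sequence compared with itself should score much lower than two unrelated ones.
print(ncd(seq_a, seq_a), ncd(seq_a, seq_b))
```

Because real compressors are imperfect, NCD values are only approximate, which matches the abstract's remark about uncontrolled errors in this kind of estimate.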

    Large Scale Cosmological Anomalies and Inhomogeneous Dark Energy

    A wide range of large-scale observations hint at possible modifications of the standard cosmological model, which is based on a homogeneous and isotropic universe with a small cosmological constant and matter. These observations, also known as "cosmic anomalies", include unexpected Cosmic Microwave Background perturbations on large angular scales, large dipolar peculiar velocity flows of galaxies ("bulk flows"), the measurement of inhomogeneous values of the fine structure constant on cosmological scales ("alpha dipole") and other effects. The presence of the observational anomalies could either be a large statistical fluctuation in the context of $\Lambda$CDM or it could indicate a non-trivial departure from the cosmological principle on Hubble scales. Such a departure is very much constrained by cosmological observations for matter. For dark energy, however, there are no significant observational constraints for Hubble-scale inhomogeneities. In this brief review I discuss some of the theoretical models that can naturally lead to inhomogeneous dark energy, their observational constraints and their potential to explain the large-scale cosmic anomalies. Comment: 42 pages, 15 figures, Invited Review published in 'Galaxies' at http://www.mdpi.com/2075-4434/2/1/2

    An optimized TOPS+ comparison method for enhanced TOPS models

    This article has been made available through the Brunel Open Access Publishing Fund. Background: Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization, and we call the resulting optimised method the advanced TOPS+ comparison method, i.e. advTOPS+. Results: We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model by considering loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first-class objects, and describing interactions between SSEs, and between SSEs and ligands, by incoming and outgoing arcs, annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method. Conclusions: Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] compared to our basic TOPS+ method, giving 90 percent accuracy for SCOP alpha+beta, a 6 percent increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software Availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.