40,071 research outputs found

    Testing the Accuracy of Eukaryotic Phylogenetic Profiles for Prediction of Biological Function

    Get PDF
    A phylogenetic profile captures the pattern of gene gain and loss throughout evolutionary time. Proteins that interact directly or indirectly within the cell to perform a biological function will often co-evolve, and this co-evolution should be well reflected within their phylogenetic profiles. Thus similar phylogenetic profiles are commonly used for grouping proteins into functional groups. However, it remains unclear how the size and content of the phylogenetic profile impacts the ability to predict function, particularly in Eukaryotes. Here we developed a straightforward approach to address this question by constructing a complete set of phylogenetic profiles for 31 fully sequenced Eukaryotes. Using Gene Ontology as our gold standard, we compared the accuracy of functional predictions made by a comprehensive array of permutations on the complete set of genomes. Our permutations showed that phylogenetic profiles containing between 25 and 31 Eukaryotic genomes performed equally well and significantly better than all other permuted genome sets, with one exception: we uncovered a core of group of 18 genomes that achieved statistically identical accuracy. This core group contained genomes from each branch of the eukaryotic phylogeny, but also contained several groups of closely related organisms, suggesting that a balance between phylogenetic breadth and depth may improve our ability to use Eukaryotic specific phylogenetic profiles for functional annotations

    MinMax-Profiles: A Unifying View of Common Intervals, Nested Common Intervals and Conserved Intervals of K Permutations

    Full text link
    Common intervals of K permutations over the same set of n elements were firstly investigated by T. Uno and M.Yagiura (Algorithmica, 26:290:309, 2000), who proposed an efficient algorithm to find common intervals when K=2. Several particular classes of intervals have been defined since then, e.g. conserved intervals and nested common intervals, with applications mainly in genome comparison. Each such class, including common intervals, led to the development of a specific algorithmic approach for K=2, and - except for nested common intervals - for its extension to an arbitrary K. In this paper, we propose a common and efficient algorithmic framework for finding different types of common intervals in a set P of K permutations, with arbitrary K. Our generic algorithm is based on a global representation of the information stored in P, called the MinMax-profile of P, and an efficient data structure, called an LR-stack, that we introduce here. We show that common intervals (and their subclasses of irreducible common intervals and same-sign common intervals), nested common intervals (and their subclass of maximal nested common intervals) as well as conserved intervals (and their subclass of irreducible conserved intervals) may be obtained by appropriately setting the parameters of our algorithm in each case. All the resulting algorithms run in O(Kn+N)-time and need O(n) additional space, where N is the number of solutions. The algorithms for nested common intervals and maximal nested common intervals are new for K>2, in the sense that no other algorithm has been given so far to solve the problem with the same complexity, or better. The other algorithms are as efficient as the best known algorithms.Comment: 25 pages, 2 figure

    A geometric interpretation of the permutation pp-value and its application in eQTL studies

    Get PDF
    Permutation pp-values have been widely used to assess the significance of linkage or association in genetic studies. However, the application in large-scale studies is hindered by a heavy computational burden. We propose a geometric interpretation of permutation pp-values, and based on this geometric interpretation, we develop an efficient permutation pp-value estimation method in the context of regression with binary predictors. An application to a study of gene expression quantitative trait loci (eQTL) shows that our method provides reliable estimates of permutation pp-values while requiring less than 5% of the computational time compared with direct permutations. In fact, our method takes a constant time to estimate permutation pp-values, no matter how small the pp-value. Our method enables a study of the relationship between nominal pp-values and permutation pp-values in a wide range, and provides a geometric perspective on the effective number of independent tests.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS298 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A permutation code preserving a double Eulerian bistatistic

    Full text link
    Visontai conjectured in 2013 that the joint distribution of ascent and distinct nonzero value numbers on the set of subexcedant sequences is the same as that of descent and inverse descent numbers on the set of permutations. This conjecture has been proved by Aas in 2014, and the generating function of the corresponding bistatistics is the double Eulerian polynomial. Among the techniques used by Aas are the M\"obius inversion formula and isomorphism of labeled rooted trees. In this paper we define a permutation code (that is, a bijection between permutations and subexcedant sequences) and show the more general result that two 55-tuples of set-valued statistics on the set of permutations and on the set of subexcedant sequences, respectively, are equidistributed. In particular, these results give a bijective proof of Visontai's conjecture

    Computing the Rank Profile Matrix

    Get PDF
    The row (resp. column) rank profile of a matrix describes the staircase shape of its row (resp. column) echelon form. In an ISSAC'13 paper, we proposed a recursive Gaussian elimination that can compute simultaneously the row and column rank profiles of a matrix as well as those of all of its leading sub-matrices, in the same time as state of the art Gaussian elimination algorithms. Here we first study the conditions making a Gaus-sian elimination algorithm reveal this information. Therefore, we propose the definition of a new matrix invariant, the rank profile matrix, summarizing all information on the row and column rank profiles of all the leading sub-matrices. We also explore the conditions for a Gaussian elimination algorithm to compute all or part of this invariant, through the corresponding PLUQ decomposition. As a consequence, we show that the classical iterative CUP decomposition algorithm can actually be adapted to compute the rank profile matrix. Used, in a Crout variant, as a base-case to our ISSAC'13 implementation, it delivers a significant improvement in efficiency. Second, the row (resp. column) echelon form of a matrix are usually computed via different dedicated triangular decompositions. We show here that, from some PLUQ decompositions, it is possible to recover the row and column echelon forms of a matrix and of any of its leading sub-matrices thanks to an elementary post-processing algorithm

    Cardinality of Rauzy classes

    Get PDF
    Rauzy classes define a partition of the set of irreducible (or indecomposable) permutations. They were defined by G. Rauzy as part of an induction algorithm for interval exchange transformations. In this article we prove an explicit formula for the cardinality of all Rauzy classes.Comment: 43 pages, 22 figure
    corecore