113 research outputs found

    Neighborhoods of trees in circular orderings

    In phylogenetics, a common strategy used to construct an evolutionary tree for a set of species X is to search in the space of all such trees for one that optimizes some given score function (such as the minimum evolution, parsimony or likelihood score). As this can be computationally intensive, it was recently proposed to restrict such searches to the set of all those trees that are compatible with some circular ordering of the set X. To inform the design of efficient algorithms to perform such searches, it is therefore of interest to find bounds for the number of trees compatible with a fixed ordering in the neighborhood of a tree that is determined by certain tree operations commonly used to search for trees: the nearest neighbor interchange (nni), the subtree prune and regraft (spr) and the tree bisection and reconnection (tbr) operations. We show that the size of such a neighborhood of a binary tree associated with the nni operation is independent of the tree’s topology, but that this is not the case for the spr and tbr operations. We also give tight upper and lower bounds for the size of the neighborhood of a binary tree for the spr and tbr operations and characterize those trees for which these bounds are attained.
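
For background only, and not as the result of the paper above: in the classical, unrestricted tree space, an unrooted binary tree T with n ≥ 3 leaves has n − 3 internal edges, and an nni across each internal edge produces two distinct trees. The unrestricted nni neighborhood therefore has a size that depends only on n and not on the shape of T. The symbol N_nni(T) is notation assumed here. The size of the restricted, ordering-compatible neighborhood studied in the abstract is not stated in this listing and is not reproduced.

```latex
% Classical (unrestricted) nni neighborhood of an unrooted binary tree T on n leaves.
% N_{\mathrm{nni}}(T) is assumed notation, not taken from the abstract.
\[
  \lvert N_{\mathrm{nni}}(T) \rvert
  \;=\; 2 \cdot \#\{\text{internal edges of } T\}
  \;=\; 2\,(n - 3), \qquad n \ge 3 .
\]
```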

    Applying unmixing to gene expression data for tumor phylogeny inference

    Background: While in principle a seemingly infinite variety of combinations of mutations could result in tumor development, in practice it appears that most human cancers fall into a relatively small number of "sub-types," each characterized by a roughly equivalent sequence of mutations by which it progresses in different patients. There is currently great interest in identifying the common sub-types and applying them to the development of diagnostics or therapeutics. Phylogenetic methods have shown great promise for inferring common patterns of tumor progression, but suffer from limits of the technologies available for assaying differences between and within tumors. One approach to tumor phylogenetics uses differences between single cells within tumors, gaining valuable information about intra-tumor heterogeneity but allowing only a few markers per cell. An alternative approach uses tissue-wide measures of whole tumors to provide a detailed picture of averaged tumor state but at the cost of losing information about intra-tumor heterogeneity.
    Results: The present work applies "unmixing" methods, which separate complex data sets into combinations of simpler components, to attempt to gain advantages of both tissue-wide and single-cell approaches to cancer phylogenetics. We develop an unmixing method to infer recurring cell states from microarray measurements of tumor populations and use the inferred mixtures of states in individual tumors to identify possible evolutionary relationships among tumor cells. Validation on simulated data shows the method can accurately separate small numbers of cell states and infer phylogenetic relationships among them. Application to a lung cancer dataset shows that the method can identify cell states corresponding to common lung tumor types and suggest possible evolutionary relationships among them that show good correspondence with our current understanding of lung tumor development.
    Conclusions: Unmixing methods provide a way to make use of both intra-tumor heterogeneity and large probe sets for tumor phylogeny inference, establishing a new avenue towards the construction of detailed, accurate portraits of common tumor sub-types and the mechanisms by which they develop. These reconstructions are likely to have future value in discovering and diagnosing novel cancer sub-types and in identifying targets for therapeutic development.

    On the accuracy of language trees

    Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages. From this perspective the reconstruction of language trees is an example of an inverse problem: starting from present, incomplete and often noisy information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it. Comment: 36 pages, 14 figures.
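
The abstract above compares inferred trees with an expert classification through distances between trees, but it does not say which standard distances it generalizes. As a minimal sketch, assuming the rooted Robinson-Foulds distance as one such standard definition, the code below counts the clades found in exactly one of two trees. The nested-tuple tree encoding, the function names `clades` and `rf_distance`, and the toy language trees are all hypothetical and chosen only for illustration; they are not the authors' data or method.

```python
def clades(tree):
    """Return (leaf set, set of non-trivial clades) for a nested-tuple rooted tree.

    Leaves are strings; internal nodes are tuples of subtrees (assumed encoding).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        # The clade of an internal node is the union of its children's leaf sets.
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    # The root clade is shared by every tree on the same leaves, so drop it.
    found.discard(all_leaves)
    return all_leaves, found


def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: number of clades present in only one tree."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    return len(clades_a ^ clades_b)


# Toy example (hypothetical trees, not taken from any database).
expert = ((("Italian", "Spanish"), "French"), ("German", "English"))
inferred = ((("Italian", "French"), "Spanish"), ("German", "English"))
print(rf_distance(expert, inferred))
```

In this toy case the two trees disagree only on which pair of Romance languages is grouped first, so each tree contributes one clade that the other lacks. A distance of zero means the two trees contain the same clades.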

    A Differentiation-Based Phylogeny of Cancer Subtypes

    Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. In this paper, we introduce a novel computational algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia, breast cancer and liposarcoma subtypes and then apply it to a broader group of sarcomas. The ranking of tumor subtypes that results from our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors.

    Scaling properties of protein family phylogenies

    One of the classical questions in evolutionary biology is how evolutionary processes are coupled at the gene and species level. With this motivation, we compare the topological properties (mainly the depth scaling, as a characterization of balance) of a large set of protein phylogenies with a set of species phylogenies. The comparative analysis shows that both sets of phylogenies share remarkably similar scaling behavior, suggesting the universality of branching rules and of the evolutionary processes that drive biological diversification from gene to species level. In order to explain such generality, we propose a simple model which allows us to estimate the proportion of evolvability/robustness needed to approximate the scaling behavior observed in the phylogenies, highlighting the relevance of the robustness of a biological system (species or protein) in the scaling properties of the phylogenetic trees. Thus, the rules that govern the incapability of a biological system to diversify are equally relevant both at the gene and at the species level. Comment: Replaced with final published version.

    A Mathematical Methodology for Determining the Temporal Order of Pathway Alterations Arising during Gliomagenesis

    Human cancer is caused by the accumulation of genetic alterations in cells. Of special importance are changes that occur early during malignant transformation because they may result in oncogene addiction and thus represent promising targets for therapeutic intervention. We have previously described a computational approach, called Retracing the Evolutionary Steps in Cancer (RESIC), to determine the temporal sequence of genetic alterations during tumorigenesis from cross-sectional genomic data of tumors at their fully transformed stage. Since alterations within a set of genes belonging to a particular signaling pathway may have similar or equivalent effects, we applied a pathway-based systems biology approach to the RESIC methodology. This method was used to determine whether alterations of specific pathways develop early or late during malignant transformation. When applied to primary glioblastoma (GBM) copy number data from The Cancer Genome Atlas (TCGA) project, RESIC identified a temporal order of pathway alterations consistent with the order of events in secondary GBMs. We then further subdivided the samples into the four main GBM subtypes and determined the relative contributions of each subtype to the overall results: we found that the overall ordering held for the proneural subtype but differed for mesenchymal samples. The temporal sequence of events could not be identified for neural and classical subtypes, possibly due to a limited number of samples. Moreover, for samples of the proneural subtype, we detected two distinct temporal sequences of events: (i) RAS pathway activation was followed by TP53 inactivation and finally PI3K2 activation, and (ii) RAS activation preceded only AKT activation. This extension of the RESIC methodology provides an evolutionary mathematical approach to identify the temporal sequence of pathway changes driving tumorigenesis and may be useful in guiding the understanding of signaling rearrangements in cancer development.

    Expression of Distal-less, dachshund, and optomotor blind in Neanthes arenaceodentata (Annelida, Nereididae) does not support homology of appendage-forming mechanisms across the Bilateria

    The similarity in the genetic regulation of arthropod and vertebrate appendage formation has been interpreted as the product of a plesiomorphic gene network that was primitively involved in bilaterian appendage development and co-opted to build appendages (in modern phyla) that are not historically related as structures. Data from lophotrochozoans are needed to clarify the pervasiveness of plesiomorphic appendage-forming mechanisms. We assayed the expression of three arthropod and vertebrate limb gene orthologs, Distal-less (Dll), dachshund (dac), and optomotor blind (omb), in direct-developing juveniles of the polychaete Neanthes arenaceodentata. Parapodial Dll expression marks premorphogenetic notopodia and neuropodia, becoming restricted to the bases of notopodial cirri and to ventral portions of neuropodia. In outgrowing cephalic appendages, Dll activity is primarily restricted to proximal domains. Dll expression is also prominent in the brain. dac expression occurs in the brain, nerve cord ganglia, a pair of pharyngeal ganglia, presumed interneurons linking a pair of segmental nerves, and in newly differentiating mesoderm. Domains of omb expression include the brain, nerve cord ganglia, one pair of anterior cirri, presumed precursors of dorsal musculature, and the same pharyngeal ganglia and presumed interneurons that express dac. Contrary to their roles in outgrowing arthropod and vertebrate appendages, Dll, dac, and omb lack comparable expression in Neanthes appendages, implying independent evolution of annelid appendage development. We infer that parapodia and arthropodia are not structurally or mechanistically homologous (but their primordia might be), that Dll’s ancestral bilaterian function was in sensory and central nervous system differentiation, and that locomotory appendages possibly evolved from sensory outgrowths.

    Combining Substrate Specificity Analysis with Support Vector Classifiers Reveals Feruloyl Esterase as a Phylogenetically Informative Protein Group

    Our understanding of how fungi evolved to develop a variety of ecological niches is limited but of fundamental biological importance. Specifically, the evolution of enzymes affects how well species can adapt to new environmental conditions. Feruloyl esterases (FAEs) are enzymes able to hydrolyze the ester bonds linking ferulic acid to plant cell wall polysaccharides. The diversity of substrate specificities found in the FAE family shows that this family is old enough to have experienced the emergence and loss of many activities. In this study we evaluate the relative activity of FAEs against a variety of model substrates as a novel predictive tool for Ascomycota taxonomic classification. Our approach consists of two analytical steps: (1) an initial unsupervised analysis to cluster the FAE substrate specificity data, which were generated by cultivation of 34 Ascomycota strains followed by an analysis of the produced enzyme cocktail against 10 substituted cinnamate and phenylalkanoate methyl esters, and (2) a second, supervised analysis for training a predictor built on these substrate activities. By applying both linear and non-linear models we were able to correctly predict the taxonomic Class (∼86% correct classification), Order (∼88% correct classification) and Family (∼88% correct classification) that the 34 Ascomycota belong to, using the activity profiles of the FAEs. The good correlation with the FAE substrate specificities that we have defined via our phylogenetic analysis not only suggests that FAEs are phylogenetically informative proteins but is also a considerable step towards improved FAE functional prediction.