
7 research outputs found

Drawing explicit phylogenetic networks and their integration into SplitsTree

Author: Huson Daniel H
Kloepper Tobias H
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: SplitsTree provides a framework for the calculation of phylogenetic trees and networks. It contains a wide variety of methods for the import/export, calculation and visualization of phylogenetic information. The software is developed in Java and implements a command-line tool as well as a graphical user interface. Results: In this article, we present solutions to two important problems in the field of phylogenetic networks. The first problem is the visualization of explicit phylogenetic networks. To solve this, we present a modified version of the equal angle algorithm that naturally integrates reticulations into the layout process and thus leads to an appealing visualization of these networks. The second problem is the availability of explicit phylogenetic network methods for the general user. To further advance the usage of explicit phylogenetic networks by biologists, we present an extension to the SplitsTree framework that integrates these networks. By addressing these two problems, SplitsTree is among the first programs that incorporate implicit and explicit network methods together with standard phylogenetic tree methods in a graphical user interface environment. Conclusion: In this article, we presented an extension of SplitsTree 4 that incorporates explicit phylogenetic networks. The extension provides a set of core classes to handle explicit phylogenetic networks and a visualization of these networks.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Algorithms For Haplotype Inference And Block Partitioning

Author: Vijaya Satya Ravi
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2006

The completion of the Human Genome Project in 2003 paved the way for studies to better understand and catalog variation in the human genome. The International HapMap Project was started in 2002 with the aim of identifying genetic variation in the human genome and studying the distribution of genetic variation across populations of individuals. The information collected by the HapMap project will enable researchers to associate genetic variations with phenotypic variations. Single Nucleotide Polymorphisms (SNPs) are loci in the genome where two individuals differ in a single base. It is estimated that there are approximately ten million SNPs in the human genome. These ten million SNPs are not completely independent of each other: blocks (contiguous regions) of neighboring SNPs on the same chromosome are inherited together. The pattern of SNPs on a block of the chromosome is called a haplotype. Each block might contain a large number of SNPs, but a small subset of these SNPs is sufficient to uniquely identify each haplotype in the block. The haplotype map, or HapMap, is a map of these haplotype blocks. Haplotypes, rather than individual SNP alleles, are expected to affect a disease phenotype. The human genome is diploid, meaning that in each cell there are two copies of each chromosome; that is, each individual has two haplotypes in any region of the chromosome. With current technology, the cost of empirically collecting haplotype data is prohibitively high. Therefore, unordered bi-allelic genotype data is collected experimentally instead. The genotype data gives the two alleles at each SNP locus in an individual, but does not give information about which allele is on which copy of the chromosome. This necessitates computational techniques for inferring haplotypes from genotype data. This computational problem is called the haplotype inference problem. Many statistical approaches have been developed for the haplotype inference problem. 
Some of these statistical methods have been shown to be reasonably accurate on real genotype data. However, these techniques are computationally very intensive. With the International HapMap Project collecting information from nearly 10 million SNPs, and with association studies involving thousands of individuals being undertaken, there is a need for more efficient methods for haplotype inference. This dissertation is an effort to develop efficient combinatorial algorithms for haplotype inference based on perfect phylogeny. The perfect phylogeny haplotyping (PPH) problem is to derive a set of haplotypes for a given set of genotypes with the condition that the haplotypes describe a perfect phylogeny. The perfect phylogeny approach to haplotype inference is applicable to the human genome due to the block structure of the human genome. An important contribution of this dissertation is an optimal O(nm) time algorithm for the PPH problem, where n is the number of genotypes and m is the number of SNPs involved. The complexity of the earlier algorithms for this problem was O(nm^2). The O(nm) complexity was achieved by applying some transformations to the input data and by making use of the FlexTree data structure, developed as part of this dissertation work, which represents all the possible PPH solutions for a given set of genotypes. Real genotype data does not always admit a perfect phylogeny, even within a block of the human genome. Therefore, it is necessary to extend the perfect phylogeny approach to accommodate deviations from perfect phylogeny. Deviations from perfect phylogeny might occur because of recombination events and repeated or back mutations (also referred to as homoplasy events). Another contribution of this dissertation is a set of fixed-parameter tractable algorithms for constructing near-perfect phylogenies with homoplasy events. 
For the problem of constructing a near-perfect phylogeny with q homoplasy events, the algorithm presented here takes O(nm^2+m^(n+m)) time. Empirical analysis on simulated data shows that this algorithm produces more accurate results than PHASE (a popular haplotype inference program), while being approximately 1000 times faster than PHASE. Another important problem when dealing with real genotype or haplotype data is the presence of missing entries. The Incomplete Perfect Phylogeny (IPP) problem is to construct a perfect phylogeny on a set of haplotypes with missing entries. The Incomplete Perfect Phylogeny Haplotyping (IPPH) problem is to construct a perfect phylogeny on a set of genotypes with missing entries. Both the IPP and IPPH problems have been shown to be NP-hard. The earlier approaches to both of these problems dealt with restricted versions of the problem, where the root is either available or can be trivially reconstructed from the data, or where certain assumptions were made about the data. We make some novel observations about these problems, and present efficient algorithms for unrestricted versions of these problems. The algorithms have worst-case exponential time complexity, but have been shown to be very fast on practical instances of the problem.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Algorithms, haplotypes and phylogenetic networks

Author: Iersel L.J.J. (Leo) van
Publication date: 01/01/2009

Preface. Before I started my PhD in computational biology in 2005, I had never even heard of this term. Now, almost four years later, I think I have some idea of what is meant by it. One of the goals of my PhD was to explore different topics within computational biology and to see where the biggest opportunities for discrete/combinatorial mathematicians could be found. Roughly speaking, in the first two years of my PhD I focussed mainly on problems related to haplotyping and genome rearrangements, and in the last two years on phylogenetic networks. I must say I really enjoyed learning so much about both mathematics and biology. It was especially amazing to learn how exact, theoretical mathematics can be used to solve complex, practical problems from biology. The topics I studied clearly show how extremely useful mathematics can be for biology. But I also learned that there are many more interesting topics in computational biology than the ones that I could study so far. The number of opportunities for discrete mathematicians is absolutely immense. I did not include my studies on genome rearrangements in this thesis, because my most interesting results [Hur07a; Hur07b] are not directly related to biology. This work is nevertheless interesting to mathematicians and I recommend that they read it. I can certainly conclude that in this field, too, there is a vast number of opportunities for mathematicians and that the topic of genome rearrangements provides numerous beautiful mathematical problems. I could never have written this thesis without a great amount of help from many different people. I want to thank my supervisors Leen Stougie and Judith Keijsper for guiding me, for helping me, for correcting my mistakes, for supplying ideas and for the enjoyable time I had while working with them. I also want to thank the Dutch BSIK/BRICKS project for funding my research and Gerhard Woeginger for giving me the opportunity to work in his group and being my second promotor. 
I want to thank Jens Stoye and Julia Zakotnik for the work we did together and for the great time I had in Bielefeld. I want to thank Ferry Hagen and Teun Boekhout for helping me to make my work relevant for "real" biology. I also want to thank John Tromp, Rudi Cilibrasi, Cor Hurkens and all others I worked with during my PhD. I want to thank Erik de Vink and Mike Steel for reading and commenting on my thesis. I want to thank my colleagues from the Combinatorial Optimisation group at the Technische Universiteit Eindhoven for the pleasant working conditions and the fun things we did besides work. I especially want to thank Matthias Mnich, not only a great colleague but also a good friend, for all his ideas, his humour and our good and fruitful cooperation. I also want to thank Steven Kelk. I must say that I was very lucky to work with Steven during my PhD. He introduced me to problems, had an enormous number of ideas, found the critical mistakes in my proofs and made my PhD a success both in terms of results and in terms of enjoying work. Finally, I want to thank Conno Hendriksen and Bas Heideveld for assisting me during my PhD defence and I want to thank them and all my other friends and family for helping me with everything in my life but research.

CWI's Institutional Repository

Pure OAI Repository


Topics in Signal Processing: applications in genomics and genetics

Author: Elmas Abdulkadir
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2016

The information in genomic or genetic data is influenced by various complex processes, and appropriate mathematical modeling is required for studying the underlying processes and the data. This dissertation focuses on the formulation of mathematical models for certain problems in genomics and genetics studies and on the development of algorithms that provide efficient solutions. A Bayesian approach to transcription factor (TF) motif discovery is examined, and extensions are proposed to deal with the many interdependent parameters of TF-DNA binding. The problem is described in statistical terms and a sequential Monte Carlo sampling method is employed for the estimation of unknown parameters. In particular, a class-based resampling approach is applied for the accurate estimation of a set of intrinsic properties of the DNA binding sites. Through statistical analysis of gene expression, a motif-based computational approach is developed for the inference of novel regulatory networks in a given bacterial genome. To deal with high false-discovery rates in genome-wide TF binding predictions, discriminative learning approaches are examined in the context of sequence classification, and a novel mathematical model is introduced to the family of kernel-based Support Vector Machine classifiers. Furthermore, the problem of haplotype phasing is examined based on genetic data obtained from cost-effective genotyping technologies. Based on the identification and augmentation of a small and relatively more informative genotype set, a sparse dictionary selection algorithm is developed to infer the haplotype pairs for the sampled population. In a related context, to detect redundant information in single nucleotide polymorphism (SNP) sites, the problem of representative (tag) SNP selection is introduced. 
An information-theoretic heuristic is designed for the accurate selection of tag SNPs that capture the genetic diversity in a large sample set from multiple populations. The method is based on a multi-locus mutual information measure, reflecting linkage disequilibrium, a biological principle in population genetics.

Columbia University Academic Commons

Combinatorial Methods for the Analysis of Related Genomic Sequences

Author: Bernardini G. (Giulia)
Publication date: 01/01/2020

CWI's Institutional Repository

Modelling the Seasonal Growth of the Brown Seaweed Fucus Vesiculosus in the Kiel Outdoor Benthocosms

Author: Eggert A.
Graiff A.
Karsten U.
Radtke H.
Wahl Martin
Publication venue: 'Informa UK Limited'
Publication date: 01/08/2017

Warming and acidification of the oceans as a consequence of increasing CO2 concentrations occur globally. In mesocosm experiments, the single and combined impact of elevated seawater temperature and pCO2 (1,100 ppm) on the brown alga Fucus vesiculosus together with its associated community (epiphytes and mesograzers) was studied in four consecutive experiments (from April 2013 to April 2014). Based on these experiments, a numerical box model simulating the seasonal growth of F. vesiculosus in the Kiel Outdoor Benthocosms (KOBs) was developed. Nitrogen and carbon cycling in the KOBs were considered and relevant physiological and ecological processes were implemented. To run simulations under present and global change scenarios (e.g. warming, ocean acidification), the model was forced with atmospheric and hydrographic data of the Kiel fjord. DIN and DIC concentrations in the water and Fucus growth as carbon and nitrogen increase were explicitly modelled. For instance, the following processes were implemented: (1) Storage of carbon and nitrogen assimilates by Fucus, leading to a temporal decoupling of assimilation and growth. (2) Shading effects of epiphytes. (3) Grazing by Idotea, Gammarus and Littorina on both Fucus and epiphytes, but with species-specific rates and preferences. At present, the model is a suitable scientific tool capable of integrating our knowledge about macroalgal processes, their growth and productivity in coastal areas. It further facilitates the communication of complex knowledge to lay persons. Ultimately, the development of a predictive model, which can be coupled to a high-resolution 3D model of the western Baltic Sea, is anticipated. This will allow observations on the consequences of global change for the wellbeing and distribution of F. vesiculosus in the western Baltic Sea. 
Understanding the responses of macroalgae and of the associated community is important because changing global temperatures and elevated CO2 may affect the ecological role of Fucus as primary producer, carbon sink, water purifier, and ecosystem engineer in the coastal ecosystem of the Baltic Sea.

OceanRep

PROTECTING OUR CROPS - APPROACHES FOR PLANT PARASITIC NEMATODE CONTROL

Author: Hasegawa Koichi
Palomares-Rius Juan Emilio
Shahid Siddique
Vicente Cláudia S. L.
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2021

No abstract

Repositório Científico da Universidade de Évora