    Inapproximability of maximal strip recovery

    In comparative genomics, the first step of sequence analysis is usually to decompose two or more genomes into syntenic blocks, which are segments of homologous chromosomes. For the reliable recovery of syntenic blocks, noise and ambiguities in the genomic maps need to be removed first. Maximal Strip Recovery (MSR) is an optimization problem proposed by Zheng, Zhu, and Sankoff for reliably recovering syntenic blocks from genomic maps in the midst of noise and ambiguities. Given $d$ genomic maps as sequences of gene markers, the objective of MSR-$d$ is to find $d$ subsequences, one subsequence of each genomic map, such that the total length of syntenic blocks in these subsequences is maximized. For any constant $d \ge 2$, a polynomial-time $2d$-approximation for MSR-$d$ was previously known. In this paper, we show that for any $d \ge 2$, MSR-$d$ is APX-hard, even for the most basic version of the problem in which all gene markers are distinct and appear in positive orientation in each genomic map. Moreover, we provide the first explicit lower bounds on approximating MSR-$d$ for all $d \ge 2$. In particular, we show that MSR-$d$ is NP-hard to approximate within $\Omega(d/\log d)$. From the other direction, we show that the previous $2d$-approximation for MSR-$d$ can be optimized into a polynomial-time algorithm even if $d$ is not a constant but is part of the input. We then extend our inapproximability results to several related problems, including CMSR-$d$, $\delta$-gap-MSR-$d$, and $\delta$-gap-CMSR-$d$.

    Comment: A preliminary version of this paper appeared in two parts in the Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC 2009) and the Proceedings of the 4th International Frontiers of Algorithmics Workshop (FAW 2010).
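    The MSR-$d$ objective can be made concrete with a brute-force solver for the basic variant described above (distinct markers, all in positive orientation). This is an illustrative sketch only, not the paper's method: it enumerates marker subsets, so it takes exponential time and has nothing to do with the polynomial-time $2d$-approximation. It assumes every map contains the same set of markers, and that a strip must hold at least two markers and appear consecutively, in the same order, in every retained subsequence. The function names `strips_of` and `msr_brute_force` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def strips_of(maps: Sequence[Sequence[int]]) -> List[List[int]]:
    """Split the first map into maximal runs whose consecutive markers are
    also adjacent, in the same order, in every other map."""
    if not maps[0]:
        return []
    # successor of each marker in each map
    succ = [{a: b for a, b in zip(m, m[1:])} for m in maps]
    runs: List[List[int]] = [[maps[0][0]]]
    for a, b in zip(maps[0], maps[0][1:]):
        if all(s.get(a) == b for s in succ):
            runs[-1].append(b)  # common adjacency: extend the current strip
        else:
            runs.append([b])    # breakpoint: start a new strip
    return runs


def msr_brute_force(maps: Sequence[Sequence[int]]) -> Tuple[int, List[List[int]]]:
    """Return (number of markers kept, strips) of an optimal MSR solution
    for tiny inputs with distinct, positively oriented markers."""
    markers = list(maps[0])
    for k in range(len(markers), 1, -1):  # try the largest subsets first
        for keep in combinations(markers, k):
            kept = set(keep)
            restricted = [[x for x in m if x in kept] for m in maps]
            runs = strips_of(restricted)
            if all(len(r) >= 2 for r in runs):  # every kept marker lies in a strip
                return k, runs
    return 0, []


if __name__ == "__main__":
    print(msr_brute_force([[1, 2, 3, 4, 5, 6], [5, 3, 4, 1, 2, 6]]))
    # (4, [[1, 2], [3, 4]])
```

    On this input no five markers can be covered by strips, so the optimum deletes markers 5 and 6 and keeps 1 to 4 as two strips that occur in different orders in the two maps.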

    Reconstructing the Genomic Architecture of Mammalian Ancestors Using Multispecies Comparative Maps

    Rapidly developing comparative gene maps in selected mammal species are providing an opportunity to reconstruct the genomic architecture of mammalian ancestors and to study the rearrangements that transformed this ancestral genome into existing mammalian genomes. Here, the recently developed Multiple Genome Rearrangement (MGR) algorithm is applied to human, mouse, cat and cattle comparative maps (with 311–470 shared markers) to impute the ancestral mammalian genome. Reconstructed ancestors consist of 70–100 conserved segments, shared across the genomes, that have been exchanged by rearrangement events along the ordinal lineages leading to modern species genomes. Genomic distances between species, dominated by inversions (reversals) and translocations, are presented in a first multispecies attempt to use ordered mapping data to reconstruct the evolutionary exchanges that preceded modern placental mammal genomes.
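    Genomic distance here means counting rearrangement events such as the inversions and translocations mentioned above. MGR itself searches for rearrangement scenarios across several genomes and is not reproduced here. As a simpler, swapped-in illustration, the sketch below computes the breakpoint distance between two genomes, assuming each genome is a single chromosome written as a signed permutation of markers 1..n (a negative sign marks reversed orientation). A breakpoint is an adjacency of one genome that is absent from the other; because one inversion alters at most two adjacencies, half the breakpoint count is a lower bound on the number of inversions. Function names are hypothetical.

```python
from typing import Sequence, Set, Tuple


def adjacencies(genome: Sequence[int]) -> Set[Tuple[int, int]]:
    """Adjacencies of a signed gene order framed by 0 and n + 1,
    recorded in both reading directions."""
    framed = [0, *genome, len(genome) + 1]
    adj: Set[Tuple[int, int]] = set()
    for a, b in zip(framed, framed[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read on the opposite strand
    return adj


def breakpoint_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count adjacencies of genome `a` that are not conserved in genome `b`."""
    framed = [0, *a, len(a) + 1]
    conserved = adjacencies(b)
    return sum((x, y) not in conserved for x, y in zip(framed, framed[1:]))


if __name__ == "__main__":
    # a single inversion of the segment (2, 3) creates two breakpoints
    print(breakpoint_distance([1, 2, 3, 4, 5], [1, -3, -2, 4, 5]))  # 2
```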

    Screening synteny blocks in pairwise genome comparisons through integer programming

    Background: It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirement of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or, more generally, "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs that are not of interest. For many downstream analyses, we need to eliminate weak, low-scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total score is maximized while respecting the duplication history of the genomes being compared. We call this "quota-based" screening of synteny blocks, as it aims to appropriately fill a quota of syntenic relationships within one genome or between two genomes having WGD events.

    Results: We have formulated synteny block screening as an optimization problem known as "Binary Integer Programming" (BIP), which is solved using existing linear programming solvers. The computer program QUOTA-ALIGN performs this task by creating a clear objective function that maximizes the compatible set of synteny blocks under given constraints on overlaps and depths (corresponding to the duplication history in the respective genomes). Such a procedure is useful for any pairwise synteny alignment, but is most useful in lineages affected by multiple WGDs, like plant or fish lineages. For example, there should be a 1:2 ploidy relationship between genome A and genome B if genome B had an independent WGD subsequent to the divergence of the two genomes. We show through simulations and real examples using plant genomes in the rosid superorder that quota-based screening can eliminate ambiguous synteny blocks and focus on specific genomic evolutionary events, like the divergence of lineages (in cross-species comparisons) and the most recent WGD (in self comparisons).

    Conclusions: The QUOTA-ALIGN algorithm screens a set of synteny blocks to retain only those compatible with a user-specified ploidy relationship between two genomes. These blocks, in turn, may be used for additional downstream analyses such as identifying true orthologous regions in interspecific comparisons. QUOTA-ALIGN makes two major contributions: 1) reducing the block screening task to a BIP problem, which is novel; and 2) providing an efficient software pipeline from all-against-all BLAST to the screened synteny blocks, with dot plot visualizations. Python code and full documentation are publicly available at http://github.com/tanghaibao/quota-alignment. QUOTA-ALIGN is also integrated as a major component of SynMap (http://genomevolution.com/CoGe/SynMap.pl), offering non-programmers easier access to thousands of genomes.
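    QUOTA-ALIGN formulates screening as a BIP and passes it to a linear programming solver. The sketch below illustrates the same kind of objective on toy input by exhaustive enumeration instead of a solver, under simplifying assumptions that are not taken from QUOTA-ALIGN: each block is one inclusive interval on a single chromosome in each genome, and the quota is modeled as the maximum number of selected blocks covering any one position (depth). `Block`, `max_depth`, `quota_screen` and the example scores are hypothetical.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    score: float
    a: Tuple[int, int]  # inclusive span on genome A (single chromosome)
    b: Tuple[int, int]  # inclusive span on genome B (single chromosome)


def max_depth(spans: List[Tuple[int, int]]) -> int:
    """Largest number of spans covering any single position (sweep line)."""
    events = [(s, 1) for s, _ in spans] + [(e + 1, -1) for _, e in spans]
    depth = best = 0
    for _, delta in sorted(events):  # at equal positions, closings (-1) come first
        depth += delta
        best = max(best, depth)
    return best


def quota_screen(blocks: List[Block], quota_a: int, quota_b: int) -> Tuple[float, List[int]]:
    """Exhaustively pick the highest-scoring set of blocks whose coverage
    depth stays within quota_a on genome A and quota_b on genome B."""
    best_score, best_set = 0.0, ()
    for k in range(1, len(blocks) + 1):
        for subset in combinations(range(len(blocks)), k):
            chosen = [blocks[i] for i in subset]
            if max_depth([blk.a for blk in chosen]) > quota_a:
                continue
            if max_depth([blk.b for blk in chosen]) > quota_b:
                continue
            score = sum(blk.score for blk in chosen)
            if score > best_score:
                best_score, best_set = score, subset
    return best_score, list(best_set)


if __name__ == "__main__":
    blocks = [
        Block(50, (0, 99), (0, 99)),         # first B copy of an A region
        Block(45, (0, 99), (500, 599)),      # second B copy of the same region
        Block(10, (0, 99), (1000, 1099)),    # weak hit, e.g. from an older WGD
        Block(30, (200, 299), (1000, 1099)),
    ]
    # 1:2 relationship: genome B had an extra WGD, so A may be covered twice
    print(quota_screen(blocks, quota_a=2, quota_b=1))  # (125, [0, 1, 3])
```

    The weak third block is rejected because keeping it would cover the same stretch of genome A three times, exceeding the 1:2 quota. A real solver is needed beyond a handful of blocks, since enumeration grows as $2^n$.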

    An Autotetraploid Linkage Map of Rose (Rosa hybrida) Validated Using the Strawberry (Fragaria vesca) Genome Sequence

    Polyploidy is a pivotal process in plant evolution, as it increases gene redundancy and morphological intricacy, but due to the complexity of polysomic inheritance, only a few genetic maps of autopolyploid organisms exist. A robust mapping framework is particularly important in polyploid crop species, rose included (2n = 4x = 28), where the objective is to study multiallelic interactions that control traits of value for plant breeding. From a cross between the peach-red, fragrant garden cultivar Fragrant Cloud (FC) and the yellow cut-rose cultivar Golden Gate (GG), we generated an autotetraploid GGFC mapping population of 132 individuals. For the map, we used 128 sequence-based markers, 141 AFLP, 86 SSR and three morphological markers. Seven linkage groups were resolved for FC (total 632 cM) and GG (616 cM); these were validated by markers that segregated in both parents as well as by the diploid integrated consensus map.

    SynFind: Compiling Syntenic Regions across Any Set of Genomes on Demand

    The identification of conserved syntenic regions enables discovery of predicted locations for orthologous and homeologous genes, even when no such gene is present. This capability means that synteny-based methods are far more effective than sequence-similarity-based methods in identifying true negatives, a necessity for studying gene loss and gene transposition. However, the identification of syntenic regions requires complex analyses, which must be repeated for pairwise comparisons between any two species. Therefore, as the number of published genomes increases, there is a growing demand for scalable, simple-to-use applications to perform comparative genomic analyses that cater to both gene family studies and genome-scale studies. We implemented SynFind, a web-based tool that addresses this need. Given one query genome, SynFind is capable of identifying conserved syntenic regions in any set of target genomes. SynFind is capable of reporting per-gene information, useful for researchers studying specific gene families, as well as genome-wide data sets of syntenic gene and predicted gene locations, critical for researchers focused on large-scale genomic analyses. Inference of syntenic homologs provides the basis for correlating functional changes around genes of interest between related organisms. Deployed on the CoGe online platform, SynFind is connected to the genomic data of over 15,000 organisms from all domains of life and supports multiple releases of the same organism. SynFind makes use of a powerful job execution framework that promises scalability and reproducibility. SynFind can be accessed at http://genomevolution.org/CoGe/SynFind.pl. A video tutorial of SynFind using Phytophthora as an example is available at http://www.youtube.com/watch?v=2Agczny9Nyc
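    The point that synteny can predict where an ortholog should lie, even when none is annotated, can be illustrated with a small sketch. It is not SynFind's algorithm. It assumes genes are identified by their index along a single query chromosome and a single target chromosome, that homology anchors (for example, from BLAST hits) are already computed as a mapping from query gene index to target gene indices, and that the syntenic region is the densest target window holding at least `min_anchors` hits from the query gene's neighbors. `syntenic_region`, `window` and `min_anchors` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def syntenic_region(
    query_gene: int,
    anchors: Dict[int, List[int]],
    window: int = 5,
    min_anchors: int = 3,
) -> Optional[Tuple[int, int]]:
    """Predict the target region syntenic to `query_gene` from the anchors
    of its flanking genes, even if `query_gene` has no anchor itself."""
    # target positions hit by the query gene's neighbours within +/- window
    hits = sorted(
        t
        for q in range(query_gene - window, query_gene + window + 1)
        if q != query_gene
        for t in anchors.get(q, [])
    )
    best: Optional[Tuple[int, int]] = None
    best_count = 0
    left = 0
    for right, pos in enumerate(hits):
        # shrink the sliding window until it spans at most 2 * window positions
        while pos - hits[left] > 2 * window:
            left += 1
        count = right - left + 1
        if count >= min_anchors and count > best_count:
            best, best_count = (hits[left], pos), count
    return best


if __name__ == "__main__":
    # query gene 5 has no homolog in the target; its neighbours do
    anchors = {2: [40], 3: [103], 4: [104], 6: [106], 7: [107]}
    print(syntenic_region(5, anchors))  # (103, 107)
```

    The stray hit at target position 40 is ignored because it is not clustered with the others, so the predicted location of the missing ortholog falls between target genes 103 and 107.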

    The water buffalo: evolutionary, clinical and molecular cytogenetics

    Although the buffalo population is about one-tenth that of cattle, buffaloes matter to a larger human population, especially in Eastern countries, which gives this species great economic importance. Two main species of buffalo are found in the world: the Asiatic (water) buffalo (Bubalus bubalis) and the African buffalo (Syncerus caffer). Each of these species has two subspecies that differ in diploid number but interbreed. The water buffalo, especially the river type (2n = 50), is the most important, and this paper summarizes the most important cytogenetic findings obtained to date in this species.

    Genome analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea

    Sclerotinia sclerotiorum and Botrytis cinerea are closely related necrotrophic plant pathogenic fungi notable for their wide host ranges and environmental persistence. These attributes have made these species models for understanding the complexity of necrotrophic, broad host-range pathogenicity. Despite their similarities, the two species differ in mating behaviour and the ability to produce asexual spores. We have sequenced the genomes of one strain of S. sclerotiorum and two strains of B. cinerea. The comparative analysis of these genomes relative to one another and to other sequenced fungal genomes is provided here. Their 38–39 Mb genomes include 11,860–14,270 predicted genes, which share 83% amino acid identity on average between the two species. We have mapped the S. sclerotiorum assembly to 16 chromosomes and found large-scale co-linearity with the B. cinerea genomes. Seven percent of the S. sclerotiorum genome comprises transposable elements compared t

    Emergence of novel cephalopod gene regulation and expression through large-scale genome reorganization

    © The Author(s), 2022. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Schmidbaur, H., Kawaguchi, A., Clarence, T., Fu, X., Hoang, O. P., Zimmermann, B., Ritschard, E. A., Weissenbacher, A., Foster, J. S., Nyholm, S., Bates, P. A., Albertin, C. B., Tanaka, E., & Simakov, O. Emergence of novel cephalopod gene regulation and expression through large-scale genome reorganization. Nature Communications, 13(1), (2022): 2172, https://doi.org/10.1038/s41467-022-29694-7.

    Coleoid cephalopods (squid, cuttlefish, octopus) have the largest nervous systems among invertebrates, which, together with many lineage-specific morphological traits, enable complex behaviors. The genomic basis underlying these innovations remains unknown. Using comparative and functional genomics in the model squid Euprymna scolopes, we reveal the unique genomic, topological, and regulatory organization of cephalopod genomes. We show that coleoid cephalopod genomes have been extensively restructured compared to other animals, leading to the emergence of hundreds of tightly linked and evolutionarily unique gene clusters (microsyntenies). Such novel microsyntenies correspond to topological compartments with a distinct regulatory structure and contribute to complex expression patterns. In particular, we identify a set of microsyntenies associated with cephalopod innovations (MACIs) broadly enriched in cephalopod nervous system expression. We posit that the emergence of MACIs was instrumental to cephalopod nervous system evolution and propose that microsyntenic profiling will be central to understanding cephalopod innovations.

    H.S., O.P.H., E.R., and O.S. were supported by the Austrian Science Fund (FWF) grant P30686-B29. O.S. was supported by the Whitman Center Early Career Fellowship (Frank R. Lillie Quasi-Endowment Fund, L. & A. Colwin Summer Research Fellowship, Bell Research Award in Tissue Engineering). H.S. was supported by the short-term grant abroad (KWA) of the University of Vienna. H.S. and O.S. were supported by the University of Chicago/Vienna Strategic Partnership Programme Mobility Grant. A.K. was supported by the JSPS Postdoctoral Fellowship for Overseas Researchers program from Japan. C.B.A. was supported by the Hibbitt Early Career Fellowship. Eggs and paralarvae of E. scolopes were generated in part with support from NASA Space Biology grant 80NSSC18K1465 awarded to J.S.F. S.V.N. was supported by the National Science Foundation IOS-1557914. This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC0001003), the UK Medical Research Council (FC001003), and the Wellcome Trust (FC001003).

    The Sorghum QTL Atlas: a powerful tool for trait dissection, comparative genomics and crop improvement

    Key message: We describe the development and application of the Sorghum QTL Atlas, a high-resolution, open-access research platform to facilitate candidate gene identification across three cereal species: sorghum, maize and rice.

    Abstract: The mechanisms governing the genetic control of many quantitative traits are only poorly understood and have yet to be fully exploited. Over the last two decades, more than a thousand QTL and GWAS studies have been published in the major cereal crops, including sorghum, maize and rice. A large body of information has been generated on the genetic basis of quantitative traits, their genomic location, allelic effects and epistatic interactions. However, such QTL information has not been widely applied by cereal improvement programs and genetic researchers worldwide. In part, this is due to the heterogeneous nature of QTL studies, which leads to variation in QTL reliability from study to study. Using approaches to adjust the QTL confidence interval (CI), this platform provides access to more up-to-date sorghum QTL information than any other available database, spanning 23 years of research since 1995. The QTL database provides information on the predicted gene models underlying the QTL CI, across all sorghum genome assembly gene sets and the maize and rice genome assemblies, and also provides information on the diversity of the underlying genes and on signatures of selection in sorghum. The resulting high-resolution, open-access research platform facilitates candidate gene identification across three cereal species: sorghum, maize and rice. Using a number of trait examples, we demonstrate the power and resolution of the resource to facilitate comparative genomics approaches that provide a bridge between genomics and applied breeding. © 2018, Springer-Verlag GmbH Germany, part of Springer Nature.
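    Listing the predicted gene models that fall under a QTL confidence interval comes down to an interval-overlap query. The sketch below returns every gene whose span overlaps the CI. It assumes the CI has already been adjusted and projected to physical coordinates (bp) on one chromosome; the `Gene` type, `genes_in_ci`, the gene names and the coordinates are hypothetical and not taken from the Atlas.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # bp, inclusive
    end: int    # bp, inclusive


def genes_in_ci(genes: List[Gene], chrom: str, ci_start: int, ci_end: int) -> List[str]:
    """Names of gene models on `chrom` that overlap the CI [ci_start, ci_end]."""
    on_chrom = sorted((g for g in genes if g.chrom == chrom), key=lambda g: g.start)
    starts = [g.start for g in on_chrom]
    # genes starting after ci_end cannot overlap, so cut them off by binary search
    candidates = on_chrom[: bisect_right(starts, ci_end)]
    return [g.name for g in candidates if g.end >= ci_start]


if __name__ == "__main__":
    genes = [
        Gene("geneA", "chr1", 1_000, 2_500),
        Gene("geneB", "chr1", 4_000, 6_000),
        Gene("geneC", "chr1", 9_000, 9_800),
        Gene("geneD", "chr2", 1_500, 3_000),
    ]
    print(genes_in_ci(genes, "chr1", 2_000, 5_000))  # ['geneA', 'geneB']
```

    Sorting by start and bisecting on the CI end discards genes that begin downstream of the interval; the remaining candidates only need their end checked against the CI start.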