
Reconstructing an ancestral genotype of two hexachlorocyclohexane-degrading Sphingobium species using metagenomic sequence data.

Authors: Gilbert Jack A; Khurana Jitendra P; Khurana Paramjit; Kumar Roshan; Lal Rup; Lax Simon; Negi Vivek; Sangwan Naseer; Verma Helianthous
Publication venue: eScholarship, University of California
Publication date: 01/02/2014

Over the last 60 years, the use of hexachlorocyclohexane (HCH) as a pesticide has resulted in the production of >4 million tons of HCH waste, which has been dumped in open sinks across the globe. Here, combining the genomes of two HCH-degrading genetic subspecies (Sphingobium japonicum UT26 and Sphingobium indicum B90A, isolated from two discrete geographical locations, Japan and India, respectively) with metagenomic data from an HCH dumpsite (∼450 mg HCH per g soil) enabled the reconstruction and validation of the last-common-ancestor (LCA) genotype. Mapping the LCA genotype (3128 genes) to the subspecies genomes demonstrated that >20% of the genes in each subspecies were absent in the LCA. These absent genes include two enzymes from the 'upper' HCH degradation pathway, suggesting that the ancestor was unable to degrade HCH isomers and that its descendants acquired lin genes by transposon-mediated lateral gene transfer. In addition, anthranilate degradation traits were found to be strain-specific (selectively retained only by UT26), whereas homogentisate degradation traits were environment-specific (absent in the LCA and subspecies, but prevalent in the metagenome). One draft secondary chromosome, two near-complete plasmids and eight complete lin transposons were assembled from the metagenomic DNA. Collectively, these results reinforce the elastic nature of the genus Sphingobium and describe the evolutionary acquisition mechanism of a xenobiotic degradation phenotype in response to environmental pollution. This study also demonstrates, for the first time, the use of metagenomic data in ancestral genotype reconstruction, highlighting its potential to provide significant insight into the development of such phenotypes.
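The headline comparison above (the share of each subspecies' genes that is absent from the LCA genotype) reduces to a set difference once genes in all genomes carry shared ortholog identifiers. The following is a minimal Python sketch, assuming that labeling step has already been done; the ortholog-group IDs and toy gene sets are hypothetical and are not the paper's data.

```python
def fraction_absent_from_ancestor(descendant_genes: set, ancestor_genes: set) -> float:
    """Fraction of a descendant's genes with no counterpart in the ancestral genotype.

    Assumes both sets use the same (hypothetical) ortholog-group IDs, so that
    "absent in the ancestor" is simply set membership.
    """
    if not descendant_genes:
        raise ValueError("descendant gene set is empty")
    absent = descendant_genes - ancestor_genes
    return len(absent) / len(descendant_genes)


# Toy data with hypothetical ortholog-group IDs (not the 3128-gene LCA).
lca = {"og1", "og2", "og3", "og4"}
ut26 = {"og1", "og2", "og3", "og4", "linA"}
b90a = {"og1", "og2", "og3", "linA", "linB"}

for name, genes in [("UT26", ut26), ("B90A", b90a)]:
    frac = fraction_absent_from_ancestor(genes, lca)
    print(f"{name}: {frac:.0%} of genes absent in LCA")
```

In a real analysis the hard part is the ortholog assignment that precedes this step; the set arithmetic itself is trivial.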

eScholarship - University of California

The amphioxus genome and the evolution of the chordate karyotype

Lancelets ('amphioxus') are the modern survivors of an ancient chordate lineage, with a fossil record dating back to the Cambrian period. Here we describe the structure and gene content of the highly polymorphic, approximately 520-megabase genome of the Florida lancelet Branchiostoma floridae, and analyse it in the context of chordate evolution. Whole-genome comparisons illuminate the murky relationships among the three chordate groups (tunicates, lancelets and vertebrates), and allow not only reconstruction of the gene complement of the last common chordate ancestor but also partial reconstruction of its genomic organization, as well as a description of two genome-wide duplications and subsequent reorganizations in the vertebrate lineage. These genome-scale events shaped the vertebrate genome and provided additional genetic variation for exploitation during vertebrate evolution.

Serveur académique lausannois

Caltech Authors

Sunderland University Institutional Repository

Oxford University Research Archive

National Taiwan University Repository

UNT Digital Library

University of St. Andrews - Pure

Murasaki: A Fast, Parallelizable Algorithm to Find Anchors from Multiple Genomes

Authors: Kris Popendorf; Tsuyoshi Hachiya; Yasunori Osana; Yasubumi Sakakibara
Publication venue: Public Library of Science
Publication date: 24/09/2010

BACKGROUND: With the number of available genome sequences increasing rapidly, the volume of sequence data required for multiple-genome analyses has become a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short, well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved with tools ranging from pairwise aligners such as BLASTZ to progressive aligners such as TBA, but the computational requirements for comparing multiple genomes quickly become a limiting factor as the number and size of genomes grow.
METHODOLOGY/PRINCIPAL FINDINGS: Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform a sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours of CPU time (42 minutes of wall-clock time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near-linear time, compared with the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy.
CONCLUSIONS/SIGNIFICANCE: Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with computational efficiency significantly greater than existing methods. Murasaki is available under GPL at http://murasaki.sourceforge.net
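Murasaki's anchors build on spaced seeds: a pattern such as `1101` marks which positions of a window must match (1) and which are ignored (0), so a mismatch at a 0 position does not break a hit. The Python sketch below illustrates only that idea, extracting spaced-seed keys from each sequence and keeping the keys found in every sequence. It does not reproduce Murasaki's adaptive hash-function generation, its parallel execution, or its anchor extension; the pattern and toy sequences are hypothetical.

```python
from collections import defaultdict


def spaced_seed_keys(seq: str, pattern: str) -> dict:
    """Map each spaced-seed key to the 0-based start positions where it occurs.

    Characters at '1' positions of the pattern form the key; '0' positions are
    wildcards, so differences there do not change the key.
    """
    care = [i for i, c in enumerate(pattern) if c == "1"]
    keys = defaultdict(list)
    for start in range(len(seq) - len(pattern) + 1):
        key = "".join(seq[start + i] for i in care)
        keys[key].append(start)
    return dict(keys)


def shared_seed_hits(seqs: list, pattern: str) -> dict:
    """Seed keys present in every sequence, with their positions per sequence."""
    per_seq = [spaced_seed_keys(s, pattern) for s in seqs]
    common = set(per_seq[0]).intersection(*per_seq[1:])
    return {k: [index[k] for index in per_seq] for k in sorted(common)}


# Toy sequences; "ACCT" in the second one still matches "ACGT" because the
# third pattern position is a wildcard.
genomes = ["ACGTACGT", "ACCTTTTT", "GGACGTAA"]
print(shared_seed_hits(genomes, "1101"))  # {'ACT': [[0, 4], [0], [2]]}
```

A plain dictionary keyed on the seed string stands in here for the hash tables that a tool at genome scale would need to engineer carefully; that engineering is where Murasaki's reported speedups come from.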

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement

Authors: Darling Aaron E.; Mau Bob; Perna Nicole T.
Publication date: 01/01/2009

Multiple genome alignment remains a challenging problem. Effects of recombination, including rearrangement, segmental duplication, gain, and loss, can create a mosaic pattern of homology even among closely related organisms. We describe a method to align two or more genomes that have undergone large-scale recombination, particularly genomes that have undergone substantial amounts of gene gain and loss (gene flux). The method uses a novel alignment objective score, referred to as a sum-of-pairs breakpoint score. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The progressive genome alignment algorithm demonstrates markedly improved accuracy over previous approaches in situations where genomes have undergone realistic amounts of genome rearrangement, gene gain, loss, and duplication. We apply the progressive genome alignment algorithm to a set of 23 completely sequenced genomes from the genera Escherichia, Shigella, and Salmonella. The 23 enterobacteria have an estimated 2.46 Mbp of genomic content conserved among all taxa and total unique content of 15.2 Mbp. We document substantial population-level variability among these organisms driven by homologous recombination, gene gain, and gene loss. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve
Comment: Revision dated June 19, 200
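The sum-of-pairs breakpoint score penalizes rearrangement breakpoints between every pair of genomes. Progressive Mauve's actual objective also weighs alignment scores against a breakpoint penalty; the Python sketch below shows only a simplified breakpoint-counting component, assuming each genome has already been reduced to a signed order of hypothetical locally collinear blocks (positive = forward strand, negative = reverse) over the same block set, and ignoring chromosome ends.

```python
from itertools import combinations


def adjacencies(order: list) -> set:
    """Signed adjacencies of a block order, recorded in both reading directions."""
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read on the opposite strand
    return adj


def breakpoints(order_a: list, order_b: list) -> int:
    """Number of adjacencies in order_a that are not conserved in order_b."""
    conserved = adjacencies(order_b)
    return sum(1 for a, b in zip(order_a, order_a[1:]) if (a, b) not in conserved)


def sum_of_pairs_breakpoints(orders: list) -> int:
    """Total breakpoints summed over all unordered pairs of genomes."""
    return sum(breakpoints(x, y) for x, y in combinations(orders, 2))


# Hypothetical block orders for three genomes sharing blocks 1-4.
genomes = [
    [1, 2, 3, 4],
    [1, -3, -2, 4],  # blocks 2-3 inverted relative to the first genome
    [1, 2, 4, 3],    # block 3 moved to the end
]
print(sum_of_pairs_breakpoints(genomes))  # 7  (2 + 2 + 3)
```

Summing over pairs, rather than comparing each genome to a single reference, is what makes the score symmetric in the input genomes.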

arXiv.org e-Print Archive

CiteSeerX

Revealing mammalian evolutionary relationships by comparative analysis of gene clusters

Authors: Benjamin Dickins; Cathy Riemer; Chih-Hao Hsu; Eric D. Green; Giltae Song; Hie Lim Kim; Louxin Zhang; NISC Comparative Sequencing Program; Ross C. Hardison; Webb Miller; Yu Zhang
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters because of their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events.
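Under the classical definitions stated in this abstract, two genes are orthologs when their lowest common ancestor in a gene tree is a speciation event, and paralogs when it is a duplication. The Python sketch below encodes only that textbook rule on a hypothetical gene tree whose internal nodes are already labeled with events; it does not implement the paper's X-orthology or N-orthology, which additionally account for gene conversion.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality and hashing, so nodes can go in sets
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" on internal nodes
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, *kids: Node) -> Node:
        """Attach child nodes and return self, to allow nested construction."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def path_to_root(node: Node) -> list[Node]:
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path


def lowest_common_ancestor(a: Node, b: Node) -> Node:
    ancestors_of_a = set(path_to_root(a))
    for candidate in path_to_root(b):
        if candidate in ancestors_of_a:
            return candidate
    raise ValueError("genes are not in the same tree")


def relationship(a: Node, b: Node) -> str:
    """Classify two distinct genes by the event at their lowest common ancestor."""
    if a is b:
        raise ValueError("need two distinct genes")
    event = lowest_common_ancestor(a, b).event
    if event == "speciation":
        return "orthologs"
    if event == "duplication":
        return "paralogs"
    raise ValueError(f"unlabeled event at ancestor: {event!r}")


# Hypothetical gene tree: one duplication (copies A and B), then a
# human/mouse speciation below each copy.
human_a, mouse_a = Node("human_A"), Node("mouse_A")
human_b, mouse_b = Node("human_B"), Node("mouse_B")
root = Node("dup", "duplication").add(
    Node("spec_A", "speciation").add(human_a, mouse_a),
    Node("spec_B", "speciation").add(human_b, mouse_b),
)

print(relationship(human_a, mouse_a))  # orthologs
print(relationship(human_a, mouse_b))  # paralogs
print(relationship(human_a, human_b))  # paralogs
```

Gene conversion breaks this rule because a converted segment's sequence no longer reflects the tree topology. That mismatch is the case the abstract's content-based N-orthology is meant to capture.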

Crossref

Nottingham Trent Institutional Repository (IRep)

PubMed Central

ScholarBank@NUS

Towards plant pangenomics

Authors: Batley Jacqueline; Edwards David; Golicz Agnieszka A.
Publication venue: Wiley
Publication date: 01/04/2016

As an increasing number of genome sequences become available for a wide range of species, there is a growing understanding that the genome of a single individual is insufficient to represent the gene diversity within a whole species. Many studies examine the sequence diversity within genes, and this allelic variation is an important source of phenotypic variation which can be selected for by man or nature. However, the significant gene presence/absence variation that has been observed within species, and the impact of this variation on traits, is only now being studied in detail. The sum of the genes for a species is termed the pangenome, and the determination and characterization of the pangenome is a requirement to understand variation within a species. In this review, we explore the current progress in pangenomics as well as methods and approaches for the characterization of pangenomes for a wide range of plant species.
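Given gene presence/absence calls per individual, the pangenome described above is the union of all observed genes. It is commonly split into core genes, shared by every individual, and dispensable (variable) genes, missing from at least one. The Python sketch below uses that common two-way split, which may not match the review's exact categories; the accessions and gene IDs are hypothetical.

```python
def partition_pangenome(presence: dict) -> dict:
    """Split the genes observed across individuals into core and dispensable sets.

    `presence` maps each individual (e.g. an accession) to the set of genes it
    carries. Core genes occur in every individual; dispensable genes occur in
    some but not all; the pangenome is their union.
    """
    if not presence:
        raise ValueError("need at least one individual")
    gene_sets = list(presence.values())
    pangenome = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return {"pangenome": pangenome, "core": core, "dispensable": pangenome - core}


# Hypothetical presence/absence data for three accessions of one species.
accessions = {
    "accession_1": {"g1", "g2", "g3"},
    "accession_2": {"g1", "g2", "g4"},
    "accession_3": {"g1", "g3", "g4", "g5"},
}
parts = partition_pangenome(accessions)
for label in ("pangenome", "core", "dispensable"):
    print(label, sorted(parts[label]))
```

The toy data also illustrates the review's point that no single individual is enough. Each accession here carries at most four of the five pangenome genes.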

University of Queensland eSpace

Integration of Alignment and Phylogeny in the Whole-Genome Era

Author: Sun Hongtao
Publication venue: Washington University Open Scholarship
Publication date: 15/05/2015

With the development of new sequencing techniques, whole genomes of many species have become available. This huge amount of data gives rise to new opportunities and challenges. These new sequences provide valuable information on relationships among species, e.g. genome recombination and conservation. One of the principal ways to investigate such information is multiple sequence alignment (MSA). Currently, a large amount of MSA data is available online, for example in the UCSC genome database, but how to use this information effectively to solve classical and new problems remains underexplored. In this thesis, we explored how to use this information in four problems: sequence orthology search, multiple alignment improvement, short read mapping, and genome rearrangement inference. For the first problem, we developed an EM algorithm to iteratively align a query with a multiple alignment database, using information from a phylogeny relating the query species to the species in the multiple alignment. We also infer the query's location in the phylogeny. We showed that by performing alignment and phylogeny inference together, we can improve the accuracy of both. For the second problem, we developed an optimization algorithm to iteratively refine multiple alignment quality. Experimental results showed that our algorithm is very stable in terms of the resulting alignments, and that it is more accurate than existing methods (MAFFT, Clustal Omega, and MAVID) on test data from three sets of species from the UCSC genome database. For the third problem, we developed a model, PhyMap, to align a read to a multiple alignment while allowing mismatches and indels. PhyMap computes local alignments of a query sequence against a fixed multiple-genome alignment of closely related species.
PhyMap uses a known phylogenetic tree on the species in the multiple alignment to improve the quality of its computed alignments while also estimating the placement of the query on this tree. Both theoretical computation and experimental results show that our model can distinguish orthologous from paralogous alignments better than other popular short read mapping tools (BWA, Bowtie and BLAST). For the fourth problem, we gave a simple genome recombination model that can express insertions, deletions, inversions, translocations and inverted translocations on aligned genome segments. We also developed an MCMC algorithm to infer the order of the query segments. We proved that using any Euclidean metric to measure the distance between two sequence orders in the tree optimization objective function leads to a degenerate solution, in which the inferred order is the order of one of the leaf nodes. We also gave a graph-based formulation of the problem that can represent the probability distribution of the order of the query sequences.
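The fourth problem rests on a recombination model whose events act on aligned genome segments. The thesis's own model and its MCMC inference are not reproduced here. The Python sketch below only shows how the five named operations can be expressed, assuming a genome is a list of signed integer segment IDs (negative meaning reverse strand); the function names are hypothetical.

```python
def delete(order: list, i: int, j: int) -> list:
    """Deletion: remove segments order[i:j]."""
    return order[:i] + order[j:]


def insert(order: list, pos: int, segments: list) -> list:
    """Insertion: place new segments before index pos."""
    return order[:pos] + list(segments) + order[pos:]


def invert(order: list, i: int, j: int) -> list:
    """Inversion: reverse order[i:j] and flip the strand sign of each segment."""
    return order[:i] + [-s for s in reversed(order[i:j])] + order[j:]


def translocate(order: list, i: int, j: int, pos: int, inverted: bool = False) -> list:
    """Translocation: move order[i:j] so it starts at index pos of the remaining
    segments; with inverted=True this is an inverted translocation."""
    block = order[i:j]
    if inverted:
        block = [-s for s in reversed(block)]
    rest = order[:i] + order[j:]
    return rest[:pos] + block + rest[pos:]


start = [1, 2, 3, 4, 5]
print(invert(start, 1, 3))                       # [1, -3, -2, 4, 5]
print(translocate(start, 0, 2, 3))               # [3, 4, 5, 1, 2]
print(translocate(start, 0, 2, 1, inverted=True))  # [3, -2, -1, 4, 5]
print(delete(start, 1, 2))                       # [1, 3, 4, 5]
print(insert(start, 2, [9]))                     # [1, 2, 9, 3, 4, 5]
```

Each function returns a new list rather than mutating its input. This suits a sampler that proposes a rearranged order and may reject it.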

Washington University St. Louis: Open Scholarship

The aspartic proteinase family of three Phytophthora species

Authors: ten Have A.; van Kan J.A.L.; Kay J.; Meijer H.J.G.
Publication date: 01/01/2011

Background - Phytophthora species are oomycete plant pathogens with such major social and economic impact that genome sequences have been determined for Phytophthora infestans, P. sojae and P. ramorum. Pepsin-like aspartic proteinases (APs) are produced in a wide variety of species (from bacteria to humans) and contain conserved motifs and landmark residues. APs fulfil critical roles in infectious organisms and their host cells. Annotation of Phytophthora APs would provide invaluable information for studies into their roles in the physiology of Phytophthora species and their interactions with their hosts.
Results - Genomes of Phytophthora infestans, P. sojae and P. ramorum contain 11-12 genes encoding APs. Nine of the original gene models in the P. infestans database and several in P. sojae and P. ramorum (three and four, respectively) were erroneous. Gene models were corrected on the basis of EST data, consistent positioning of introns between orthologues and conservation of hallmark motifs. Phylogenetic analysis resolved the Phytophthora APs into five clades. Of the 12 sub-families, several had an unconventional architecture, lacking either a signal peptide or a propart region. Remarkably, almost all APs are predicted to be membrane-bound.
Conclusions - One of the twelve Phytophthora APs is an unprecedented fusion protein with a putative G-protein coupled receptor as the C-terminal partner. The others appear to be related to well-documented enzymes from other species, including a vacuolar enzyme that is encoded in every fungal genome sequenced to date. Unexpectedly, however, the oomycetes were found to have both active and probably inactive forms of an AP similar to vertebrate BACE, the enzyme responsible for initiating the processing cascade that generates the Aβ peptide central to Alzheimer's disease. The oomycetes also encode enzymes similar to plasmepsin V, a membrane-bound AP that cleaves effector proteins of the malaria parasite Plasmodium falciparum during their translocation into the host red blood cell. Since the translocation of Phytophthora effector proteins is currently a topic of intense research activity, the identification in Phytophthora of potential functional homologues of plasmepsin V would appear worthy of investigation. Indeed, elucidation of the physiological roles of the APs identified here offers areas for future study. The significant revision of gene models and detailed annotation presented here should significantly facilitate experimental design.
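The "conserved motifs and landmark residues" mentioned above include the two catalytic aspartates of pepsin-like APs, each of which typically sits in a D-T/S-G motif. The Python sketch below is a deliberately crude motif scan using the regex `D[TS]G`. It is an illustrative assumption, not the authors' annotation pipeline, which combined EST data, intron positions and motif conservation, and it is not a curated profile such as a PROSITE pattern. The toy sequence is hypothetical and not a Phytophthora protein.

```python
import re

# Simplified catalytic-site motif: Asp, then Thr or Ser, then Gly.
# A real annotation would use curated profiles; this is a crude stand-in.
CATALYTIC_MOTIF = re.compile(r"D[TS]G")


def catalytic_motif_positions(protein: str) -> list:
    """1-based start positions of each D[TS]G match in a protein sequence."""
    return [m.start() + 1 for m in CATALYTIC_MOTIF.finditer(protein.upper())]


def has_two_catalytic_motifs(protein: str) -> bool:
    """Pepsin-like APs carry two catalytic aspartates, so expect two or more motifs."""
    return len(catalytic_motif_positions(protein)) >= 2


toy = "MKLAVFDTGSSNLWVPSAKGGIDSGTSLL"  # hypothetical sequence
print(catalytic_motif_positions(toy), has_two_catalytic_motifs(toy))  # [7, 23] True
```

A scan like this finds many spurious D[TS]G tripeptides in unrelated proteins, so in practice it could serve only as a sanity check on gene models already flagged by homology.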

Wageningen University & Research Publications

The draft genome of the parasitic nematode Trichinella spiralis

Authors: Makedonka Mitreva; Douglas P Jasmer; Dante S Zarlenga; Zhengyuan Wang; Sahar Abubucker; John Martin; Christina M Taylor; Yong Yin; Lucinda Fulton; Pat Minx; Shiaw-Pyng Yang; Wesley C Warren; Robert S Fulton; Veena Bhonagiri; Xu Zhang; Kym Hallsworth-Pepin; Sandra W Clifton; James P McCarter; Judith Appleton; Elaine R Mardis; Richard K Wilson
Publication venue: Nature America
Publication date: 01/01/2011

Genome evolution studies for the phylum Nematoda have been limited by focusing on comparisons involving Caenorhabditis elegans. We report a draft genome sequence of Trichinella spiralis, a food-borne zoonotic parasite, which is the most common cause of human trichinellosis. This parasitic nematode is an extant member of a clade that diverged early in the evolution of the phylum, enabling identification of archetypical genes and molecular signatures exclusive to nematodes. We sequenced the 64-Mb nuclear genome, which is estimated to contain 15,808 protein-coding genes, at ~35-fold coverage using whole-genome shotgun and hierarchical map-assisted sequencing. Comparative genome analyses support intrachromosomal rearrangements across the phylum, disproportionate numbers of protein family deaths over births in parasitic compared to a non-parasitic nematode, and a preponderance of gene-loss and -gain events in nematodes relative to Drosophila melanogaster. This genome sequence and the identified pan-phylum characteristics will contribute to genome evolution studies of Nematoda as well as strategies to combat global parasites of humans, food animals and crops.
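The reported figures can be turned into two back-of-envelope numbers. Using the standard definition of fold coverage, C = N·L/G for N reads of length L from a genome of size G (a general convention, not a formula given in the abstract), the raw sequence generated is roughly C·G. Dividing the gene count by genome size gives average gene density. The Python sketch below computes both; the results are approximate because both reported inputs are rounded.

```python
def total_bases(genome_size_bp: int, fold_coverage: float) -> float:
    """Raw sequence implied by fold coverage C = N * L / G, rearranged to N * L = C * G."""
    return fold_coverage * genome_size_bp


def genes_per_megabase(gene_count: int, genome_size_bp: int) -> float:
    """Average protein-coding gene density per megabase of genome."""
    return gene_count / (genome_size_bp / 1_000_000)


GENOME_BP = 64_000_000  # 64-Mb nuclear genome (rounded, as reported)
COVERAGE = 35           # ~35-fold coverage (approximate, as reported)
GENES = 15_808          # estimated protein-coding genes

print(f"~{total_bases(GENOME_BP, COVERAGE) / 1e9:.2f} Gb of raw sequence")  # ~2.24 Gb
print(f"~{genes_per_megabase(GENES, GENOME_BP):.0f} genes per Mb")          # ~247
```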

Publikationer från KTH

Crossref

UGD Academic Repository

Digitala Vetenskapliga Arkivet - Academic Archive On-line