83 research outputs found

Microbial co-habitation and lateral gene transfer: what transposases can tell us

Author: Hooper Sean D
Kyrpides Nikos C
Mavromatis Konstantinos
Publication venue: BioMed Central
Publication date: 01/01/2009

Interactions between microbial communities are revealed using a network of lateral gene transfer events

Crossref

Springer - Publisher Connector

PubMed Central

UNT Digital Library

Gene Context Analysis in the Integrated Microbial Genomes (IMG) Data Management System

Author: Ken Chu
Konstantinos Mavromatis
Mikael Rørdam Andersen
Natalia Ivanova
Nikos C. Kyrpides
Sean D. Hooper
Victor M. Markowitz
Publication venue: Public Library of Science
Publication date: 01/01/2009

Computational methods for determining the function of genes in newly sequenced genomes have traditionally been based on sequence similarity to genes whose function has been identified experimentally. Function prediction methods can be extended using gene context analysis approaches such as examining the conservation of chromosomal gene clusters, gene fusion events and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes often have similar gene context, and it relies on the identification of such events across a phylogenetically diverse collection of genomes. We have used the data management system of the Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods because it provides one of the largest available genome integrations. Visualization and search tools to facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at: http://img.jgi.doe.gov

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

UNT Digital Library
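
The co-occurrence profiles mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each gene is represented by the set of genomes in which it occurs and scores two genes by Jaccard similarity. The function name, the toy profiles and the choice of Jaccard similarity are illustrative assumptions, not the method implemented in IMG.

```python
def profile_similarity(genomes_a, genomes_b):
    """Jaccard similarity between two presence profiles (sets of genome IDs)."""
    a, b = set(genomes_a), set(genomes_b)
    union = a | b
    if not union:
        # Two empty profiles carry no co-occurrence signal.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical profiles: gene name -> genomes in which the gene is present.
profiles = {
    "geneA": {"g1", "g2", "g3", "g5"},
    "geneB": {"g1", "g2", "g3", "g6"},
    "geneC": {"g4", "g7"},
}

score_ab = profile_similarity(profiles["geneA"], profiles["geneB"])
score_ac = profile_similarity(profiles["geneA"], profiles["geneC"])
```

Genes whose profiles overlap strongly (geneA and geneB here) would be candidates for a functional link, while genes with disjoint profiles (geneA and geneC) would not.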

Systematic Association of Genes to Phenotypes by Genome and Literature Mining

Author: Andrade Miguel A
Bork Peer
Doerks Tobias
Hooper Sean D
Jensen Lars J
Kaczanowski Szymon
Korbel Jan O
Perez-Iratxeta Carolina
Publication venue: Public Library of Science
Publication date: 05/04/2005

One of the major challenges of functional genomics is to unravel the connection between genotype and phenotype. So far, no global analysis has attempted to explore those connections in the light of the large phenotypic variability seen in nature. Here, we use an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis. We first mine the MEDLINE literature database for terms that reflect phenotypic similarities of species. Subsequently, we predict the likely genomic determinants: genes specifically present in the respective genomes. In a global analysis involving 92 prokaryotic genomes, we retrieve 323 clusters containing a total of 2,700 significant gene–phenotype associations. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations; for example, a group of terms related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, we observe an enrichment of pathogenicity-related associations, suggesting that the approach reveals many novel genes likely to play a role in infectious diseases.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

FigShare

STRING: known and predicted protein–protein associations, integrated and transferred across organisms

Author: Bork Peer
Foglierini Mathilde
Hooper Sean D.
Huynen Martijn A.
Jensen Lars J.
Jouffre Nelly
Krupp Markus
Snel Berend
von Mering Christian
Publication venue: Oxford University Press
Publication date: 17/12/2004

A full description of a protein's function requires knowledge of all partner proteins with which it specifically associates. From a functional perspective, ‘association’ can mean direct physical binding, but can also mean indirect interaction such as participation in the same metabolic pathway or cellular process. Currently, information about protein association is scattered over a wide variety of resources and model organisms. STRING aims to simplify access to this information by providing a comprehensive, yet quality-controlled collection of protein–protein associations for a large number of organisms. The associations are derived from high-throughput experimental data, from the mining of databases and literature, and from predictions based on genomic context analysis. STRING integrates and ranks these associations by benchmarking them against a common reference set, and presents evidence in a consistent and intuitive web interface. Importantly, the associations are extended beyond the organism in which they were originally described, by automatic transfer to orthologous protein pairs in other organisms, where applicable. STRING currently holds 730 000 proteins in 180 fully sequenced organisms, and is available at http://string.embl.de/

Identification of tightly regulated groups of genes during Drosophila melanogaster embryogenesis

Author: Bork Peer
Boué Stephanie
Furlong Eileen EM
Ghanim Murad
Hooper Sean D
Jensen Lars J
Krause Roland
Mason Christopher E
White Kevin P
Publication venue: Nature Publishing Group
Publication date: 01/01/2007

Time-series analysis of whole-genome expression data during Drosophila melanogaster development indicates that up to 86% of its genes change their relative transcript level during embryogenesis. By applying conservative filtering criteria and requiring ‘sharp’ transcript changes, we identified 1534 maternal genes, 792 transient zygotic genes, and 1053 genes whose transcript levels increase during embryogenesis. Each of these three categories is dominated by groups of genes where all transcript levels increase and/or decrease at similar times, suggesting a common mode of regulation. For example, 34% of the transiently expressed genes fall into three groups, with increased transcript levels between 2.5–12, 11–20, and 15–20 h of development, respectively. We highlight common and distinctive functional features of these expression groups and identify a coupling between downregulation of transcript levels and targeted protein degradation. By mapping the groups to the protein network, we also predict and experimentally confirm new functional associations.

CiteSeerX

PubMed Central

Publications at Bielefeld University

MDC Repository

MPG.PuRe

The Complete Multipartite Genome Sequence of Cupriavidus necator JMP134, a Versatile Pollutant Degrader

Author: Alla Lapidus
Athanasios Lykidis
Bernardo González
Danilo Pérez-Pantoja
Iain J. Anderson
Konstantinos Mavromatis
Natalia N. Ivanova
Nikos C. Kyrpides
Niyaz Ahmed
Sean D. Hooper
Susan Lucas
Thomas Ledger
Publication venue: Public Library of Science
Publication date: 01/02/2010

BACKGROUND: Cupriavidus necator JMP134 is a Gram-negative beta-proteobacterium able to grow on a variety of aromatic and chloroaromatic compounds as its sole carbon and energy source. METHODOLOGY/PRINCIPAL FINDINGS: Its genome consists of four replicons (two chromosomes and two plasmids) containing a total of 6631 protein coding genes. Comparative analysis identified 1910 core genes common to the four genomes compared (C. necator JMP134, C. necator H16, C. metallidurans CH34, R. solanacearum GMI1000). Although secondary chromosomes found in the Cupriavidus, Ralstonia, and Burkholderia lineages are all derived from plasmids, analyses of the plasmid partition proteins located on those chromosomes indicate that different plasmids gave rise to the secondary chromosomes in each lineage. The C. necator JMP134 genome contains 300 genes putatively involved in the catabolism of aromatic compounds and encodes most of the central ring-cleavage pathways. This strain also shows additional metabolic capabilities towards alicyclic compounds and the potential for catabolism of almost all proteinogenic amino acids. This remarkable catabolic potential seems to be sustained by a high degree of genetic redundancy, most probably providing this catabolically versatile bacterium with the different levels of metabolic response and the alternative regulation necessary to cope with a challenging environment. From the comparison of Cupriavidus genomes, it is possible to state that a broad metabolic capability is a general trait of the genus Cupriavidus; however, specialization towards a particular nutritional niche (xenobiotics degradation, chemolithoautotrophy or symbiotic nitrogen fixation) seems to be shaped mostly by the acquisition of "specialized" plasmids. CONCLUSIONS/SIGNIFICANCE: The availability of the complete genome sequence for C. necator JMP134 provides the groundwork for further elucidation of the mechanisms and regulation of chloroaromatic compound biodegradation.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

UNT Digital Library

Integration of phenotypic metadata and protein similarity in Archaea using a spectral bipartitioning approach

Author: Amrita Pati
Daniel Dalevi
Iain J. Anderson
Konstantinos Mavromatis
Nikos C. Kyrpides
Sean D. Hooper
Publication venue: Oxford University Press
Publication date: 01/01/2009

To simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein families. The degree to which sequences are conserved not only differs for each protein family, but is also affected by the phylogenetic divergence of the source organisms. Clustering techniques that use similarity thresholds for protein families do not always allow for these variations and thus cannot be confidently used for applications such as automated annotation and phylogenetic profiling. In this work, we applied a spectral bipartitioning technique to all proteins from 53 archaeal genomes. Comparisons between different taxonomic levels allowed us to study the effects of phylogenetic distances on cluster structure. Likewise, by associating functional annotations and phenotypic metadata with each protein, we could compare our protein similarity clusters with both protein function and associated phenotype. Our clusters can be analyzed graphically and interactively online.

Crossref

PubMed Central

UNT Digital Library
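
As a rough illustration of the spectral bipartitioning named in the abstract above, the sketch below splits a small similarity graph in two by the sign of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian). It is a textbook formulation under stated assumptions, not the authors' implementation: the function name, the power-iteration solver and the toy similarity matrix are all hypothetical.

```python
def fiedler_bipartition(weights, iterations=500):
    """Split a connected weighted graph in two by the sign of its Fiedler vector.

    `weights` is a symmetric matrix (list of lists) of non-negative pairwise
    similarities with a zero diagonal.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    # Laplacian eigenvalues never exceed twice the largest degree, so
    # shift*I - L is positive semidefinite and power iteration applies.
    shift = 2.0 * max(degrees)

    def apply_shifted(v):
        # Computes (shift*I - L) v with L = D - W.
        return [
            (shift - degrees[i]) * v[i]
            + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]

    def deflate_and_normalise(v):
        # Remove the constant component (the trivial eigenvector), then rescale.
        mean = sum(v) / n
        centred = [x - mean for x in v]
        norm = sum(x * x for x in centred) ** 0.5
        return [x / norm for x in centred]

    v = deflate_and_normalise([float(i) for i in range(n)])
    for _ in range(iterations):
        v = deflate_and_normalise(apply_shifted(v))
    return [x >= 0 for x in v]


# Toy similarity matrix: proteins 0-2 and 3-5 form two tight groups.
similarity = [
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.85, 0.0, 0.05, 0.0],
    [0.8, 0.85, 0.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.0, 0.9, 0.8],
    [0.0, 0.05, 0.0, 0.9, 0.0, 0.95],
    [0.0, 0.0, 0.1, 0.8, 0.95, 0.0],
]
labels = fiedler_bipartition(similarity)
```

Applying the split recursively to each half would yield a hierarchy of clusters without a fixed similarity threshold, which is the motivation the abstract gives for this kind of approach.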

Estimating DNA coverage and abundance in metagenomes using a gamma approximation

Author: Amrita Pati
Daniel Dalevi
Konstantinos Mavromatis
Natalia N. Ivanova
Nikos C. Kyrpides
Sean D. Hooper
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability to three recent metagenomic datasets.

Crossref

PubMed Central

eScholarship - University of California

UNT Digital Library
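
The idea of estimating unsequenced bins from a fitted gamma model, described in the abstract above, can be sketched as follows. The sketch assumes that the number of reads covering a bin is Poisson with a rate drawn from a gamma distribution with the given shape and scale; under that assumption the probability that a bin receives no reads is (1 + scale)^(-shape). The Poisson assumption, the function names and the parameter values are illustrative and are not taken from the paper, and the step of fitting the gamma parameters to data is omitted.

```python
def unseen_fraction(shape, scale):
    """Probability that a bin receives zero reads under a Poisson-gamma mixture."""
    return (1.0 + scale) ** (-shape)


def estimate_unseen_bins(observed_bins, shape, scale):
    """Estimate how many bins were missed, given the number of bins observed.

    If p0 is the zero-read probability, the observed bins are a fraction
    (1 - p0) of all bins, so the missed bins number observed * p0 / (1 - p0).
    """
    p0 = unseen_fraction(shape, scale)
    return observed_bins * p0 / (1.0 - p0)


# Hypothetical fitted parameters: shape 2, scale 1, with 300 bins observed.
missed = estimate_unseen_bins(300, shape=2.0, scale=1.0)
```

With these hypothetical values a quarter of all bins would be expected to receive no reads, so 300 observed bins would imply about 100 missed ones.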

Structural Alterations from Multiple Displacement Amplification of a Human Genome Revealed by Mate-Pair Sequencing

Author: Christian Tellgren-Roth
Jonathan Mangion
Jörg D. Hoheisel
Liqun He
Magnus Rosenlund
Sean D. Hooper
Tobias Sjöblom
Xiang Jiao
Yutao Fu
Publication venue: Public Library of Science
Publication date: 01/01/2011

Comprehensive identification of the acquired mutations that cause common cancers will require genomic analyses of large sets of tumor samples. Typically, the tissue material available from tumor specimens is limited, which creates a demand for accurate template amplification. We therefore evaluated whether phi29-mediated whole genome amplification introduces false positive structural mutations by massive mate-pair sequencing of a normal human genome before and after such amplification. Multiple displacement amplification led to a decrease in clone coverage and an increase by two orders of magnitude in the prevalence of inversions, but did not increase the prevalence of translocations. While multiple displacement amplification may find uses in translocation analyses, it is likely that alternative amplification strategies need to be developed to meet the demands of cancer genomics.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

Publikationer från Uppsala Universitet

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line