82 research outputs found

Expert Assertions Through Community Annotation Jamborees

Author: Derek Harkins
Elisabet Caler
Granger Sutton
Hernan Lorenzi
Lauren Brinkac
Mathangi Thiagarajan
Ramana Madupu
Publication date: 24/04/2009

Although there is significant optimism that community involvement can drive genome curation, results to date are disappointing. The Human Genome and Saccharomyces Genome Databases both tried community annotation experiments and few community contributions were obtained. JCVI’s own early experiences with community curation were also largely unsuccessful. Although community curation tools were publicly available on JCVI web resources and much effort was made by JCVI personnel to advertise these resources, little curation was actually submitted. Starting in late 2007, JCVI’s model for community curation changed. Instead of simply providing curation tools on websites and advertising their utility at meetings and conferences, JCVI instituted a community curation jamboree model. 

Annotation jamborees are an excellent form of outreach to the community. JCVI’s experience conducting jamborees is highly successful, demonstrating that jamborees are effective tools for incorporating expert annotation data into existing genome submissions, updating existing annotation, tagging annotation with updated experimental references and providing the community with opportunities to become familiar with JCVI’s annotation procedures and curation tools. Jamborees provide a means to directly interact with the community and integrate their research expertise into genomic data sets. Jamboree participants are encouraged to provide their expert input by focusing on their genes and gene families of interest, particularly those with supporting experimental evidence. Through JCVI’s NIAID Bioinformatics Resource Center, Pathema (http://pathema.jcvi.org), JCVI hosted two annotation jamborees incorporating expert annotation into Entamoeba and Burkholderia genome projects. These jamborees resulted in curation of 1,565 functional assignments, 3,499 Gene Ontology terms, 129 gene structures, and 296 experimental references for 11 genome projects representative of the Pathema data set. Researchers who contributed to annotation at these jamborees are being submitted as contributing authors on annotation update submissions made to GenBank for those organisms. Additionally, the annotation associated with the submission is recognized as part of community curation efforts and collaboration, and all updates and contributions are reflected on the Pathema web resource.

The networking and personal communication that occurs throughout a jamboree facilitates a forum for research and data exchange, solicitation of user feedback and the establishment of new community collaborations. Although integrating and updating annotation data is important, it is our experience that the interactions that occur and collaborations that are formed are the most beneficial long-term results of jamboree efforts. Collaborations we established as a direct result of jamboree activity include continued community annotation, custom data analyses and general informatics support not otherwise solicited by the researcher. For the jamborees JCVI recently hosted, we established successful collaborations with four researchers who continued to provide curation from their own institute

Crossref

Nature Precedings

METAREP: JCVI metagenomics reports—an open source tool for high-performance comparative metagenomics

Author: Barbara A. Methé
David M. Tanenbaum
Douglas B. Rusch
Huson
Johannes Goll
Kelvin Li
Kristiansson
Markowitz
Mathangi Thiagarajan
Meyer
Shibu Yooseph
White
Publication venue: Oxford University Press

Summary: JCVI Metagenomics Reports (METAREP) is a Web 2.0 application designed to help scientists analyze and compare annotated metagenomics datasets. It utilizes Solr/Lucene, a high-performance scalable search engine, to quickly query large data collections. Furthermore, users can use its SQL-like query syntax to filter and refine datasets. METAREP provides graphical summaries for top taxonomic and functional classifications as well as a GO, NCBI Taxonomy and KEGG Pathway Browser. Users can compare absolute and relative counts of multiple datasets at various functional and taxonomic levels. Advanced comparative features comprise statistical tests as well as multidimensional scaling, heatmap and hierarchical clustering plots. Summaries can be exported as tab-delimited files and as publication-quality plots in PDF format. A data management layer allows collaborative data analysis and result sharing.

Crossref

PubMed Central

Reconstruction of a Bacterial Genome from DNA Cassettes

Author: Allen Andrew
Allen Lisa Zeigler
Dupont Christopher
Friedman Robert
Glass John
Sheahan Laura
Thiagarajan Mathangi
Venter J. Craig
Yooseph Shibu
Publication venue: Office of Scientific and Technical Information (OSTI)

This basic research program comprised two major areas: (1) acquisition and analysis of marine microbial metagenomic data and development of genomic analysis tools for broad, external community use; (2) development of a minimal bacterial genome. Our Marine Metagenomic Diversity effort generated and analyzed shotgun sequencing data from microbial communities sampled from over 250 sites around the world. About 40% of the 26 Gbp of sequence data has been made publicly available to date, with a complete release anticipated in six months. Our results, and those of others mining the deposited data, have revealed a vast diversity of genes coding for critical metabolic processes whose phylogenetic and geographic distributions will enable a deeper understanding of carbon and nutrient cycling, microbial ecology, and rapid-rate evolutionary processes such as horizontal gene transfer by viruses and plasmids. A global assembly of the generated dataset resulted in a massive set (5 Gbp) of genome fragments that provide context to the majority of the generated data that originated from uncultivated organisms. Our Synthetic Biology team has made significant progress towards the goal of synthesizing a minimal mycoplasma genome that will have all of the machinery for independent life. This project, once completed, will provide fundamentally new knowledge about requirements for microbial life and help to lay a basic research foundation for developing microbiological approaches to bioenergy.

UNT Digital Library

Foregut microbiome in development of esophageal adenocarcinoma

Author: Aaron Tenney
Carlos W. Nossa
Daniel Brami
Eoin L. Brodie
Erika Gerz
Fritz Francois
Gary L. Andersen
Indresh K. Singh
Karen E. Nelson
Les Foster
Liying Yang
Manolito Torralba
Mathangi Thiagarajan
Mengling Liu
Michael Poles
Monika Bihan
Morris Traube
Navjeet Singh
Pinak Shah
Shibu Yooseph
Stuart M. Brown
Sukhleen Bedi
Tamasha Parsons
Todd Z. DeSantis
William E. Oberdorf
Yu Chen
Yu-Hui Rogers
Zhiheng Pei
Publication date: 18/10/2010

Esophageal adenocarcinoma (EA), the type of cancer linked to heartburn due to gastroesophageal reflux disease (GERD), has increased sixfold in the past 30 years. This cannot currently be explained by the usual environmental or host genetic factors. EA is the end result of a sequence of GERD-related diseases, preceded by reflux esophagitis (RE) and Barrett’s esophagus (BE). Preliminary studies by Pei and colleagues at NYU on elderly male veterans identified two types of microbiotas in the esophagus. Patients who carry the type II microbiota are more than 15-fold more likely to have esophagitis and BE than those harboring the type I microbiota. In a small-scale study, we also found that 3 of 3 cases of EA harbored the type II biota. The findings have opened a new approach to understanding the recent surge in the incidence of EA.

Our long-term goal is to identify the cause of the GERD sequence. The hypothesis to be tested is that changes in the foregut microbiome are associated with EA and its precursors, RE and BE, in the GERD sequence. We will conduct a case-control study to demonstrate the microbiome-disease association at every stage of the GERD sequence, and to analyze the trend in microbiome changes along disease progression toward EA, through two specific aims. Aim 1 is to conduct a comprehensive population survey of the foregut microbiome and demonstrate its association with the GERD sequence. The spatial relationship between the esophageal microbiota and the upstream (mouth) and downstream (stomach) foregut microbiotas, as well as the temporal stability of the microbiome-disease association, will also be examined. Aim 2 is to define the distal esophageal metagenome and demonstrate its association with the GERD sequence. Detailed analyses will include pathway-disease and gene-disease associations. Archaea, fungi and viruses, if identified, will also be correlated with the diseases. A significant association between the foregut microbiome and the GERD sequence, if demonstrated, will be the first step toward eventually testing whether an abnormal microbiome is required for the development of the sequence of phenotypic changes toward EA. If EA and its precursors represent a microecological disease, treating the cause of GERD might become possible, for example, by normalizing the microbiota through use of antibiotics, probiotics, or prebiotics. Causative therapy of GERD could prevent its progression and reverse the current trend of increasing incidence of EA.

Crossref

Nature Precedings

Microbial Community Function and Biomarker Discovery in the Human Microbiome

Author: Abubucker Sahar
Cantarel Brandi L
Garrett Wendy S.
Gevers Dirk
Goll Johannes
Henrissat Bernard
Huttenhower Curtis
Izard Jacques Georges
Kelley Scott T
Methé Barbara
Mitreva Makedonka
Rodriguez-Mueller Beltran
Schloss Patrick D
Schubert Alyxandria M
Segata Nicola
Thiagarajan Mathangi
Waldron Levi D.
White Owen
Zucker Jeremy Daniel Hofeld
Publication venue: Springer Science and Business Media LLC
Publication date: 01/04/2013

Harvard University - DASH

Springer - Publisher Connector

Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure

Author: Cassidy-Hanley Donna M
Collins Kathleen
Couvillion Mary T
Coyne Robert S
Eisen Jonathan A
Garg Jyoti
Haas Brian J
Hamilton Eileen P
Jones Kristie M
Lee Suzanne R
Liu Yifan
Methé Barbara A
Orias Eduardo
Pearlman Ronald E
Smith Joshua J
Tallon Luke J
Thiagarajan Mathangi
Wiley Emily A
Wortman Jennifer R
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background Tetrahymena thermophila, a widely studied model for cellular and molecular biology, is a binucleated single-celled organism with a germline micronucleus (MIC) and somatic macronucleus (MAC). The recent draft MAC genome assembly revealed low sequence repetitiveness, a result of the epigenetic removal of invasive DNA elements found only in the MIC genome. Such low repetitiveness makes complete closure of the MAC genome a feasible goal, which to achieve would require standard closure methods as well as removal of minor MIC contamination of the MAC genome assembly. Highly accurate preliminary annotation of Tetrahymena's coding potential was hindered by the lack of both comparative genomic sequence information from close relatives and significant amounts of cDNA evidence, thus limiting the value of the genomic information and also leaving unanswered certain questions, such as the frequency of alternative splicing. Results We addressed the problem of MIC contamination using comparative genomic hybridization with purified MIC and MAC DNA probes against a whole genome oligonucleotide microarray, allowing the identification of 763 genome scaffolds likely to contain MIC-limited DNA sequences. We also employed standard genome closure methods to essentially finish over 60% of the MAC genome. For the improvement of annotation, we have sequenced and analyzed over 60,000 verified EST reads from a variety of cellular growth and development conditions. Using this EST evidence, a combination of automated and manual reannotation efforts led to updates that affect 16% of the current protein-coding gene models. By comparing EST abundance, many genes showing apparent differential expression between these conditions were identified. Rare instances of alternative splicing and uses of the non-standard amino acid selenocysteine were also identified.
Conclusion We report here significant progress in genome closure and reannotation of Tetrahymena thermophila. Our experience to date suggests that complete closure of the MAC genome is attainable. Using the new EST evidence, automated and manual curation has resulted in substantial improvements to the over 24,000 gene models, which will be valuable to researchers studying this model organism as well as for comparative genomics purposes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Frozen tissue coring and layered histological analysis improves cell type-specific proteogenomic characterization of pancreatic adenocarcinoma

Author: Bathe Oliver F.
Chen Lijun
Dou Yongchao
Hostetter Galen
Jewell Scott
Li Qing K.
Newton Chelsea
Omenn Gilbert S.
Robles Ana I.
Savage Sara R.
Thiagarajan Mathangi
Wang Yuefan
Zhang Bing
Zhang Hui
Publication date: 04/02/2024

Abstract Background Omics characterization of pancreatic adenocarcinoma tissue is complicated by the highly heterogeneous and mixed populations of cells. We evaluate the feasibility and potential benefit of using a coring method to enrich specific regions from bulk tissue and then perform proteogenomic analyses. Methods We used the Biopsy Trifecta Extraction (BioTExt) technique to isolate cores of epithelial-enriched and stroma-enriched tissue from pancreatic tumor and adjacent tissue blocks. Histology was assessed at multiple depths throughout each core. DNA sequencing, RNA sequencing, and proteomics were performed on the cored and bulk tissue samples. Supervised and unsupervised analyses were performed based on integrated molecular and histology data. Results Tissue cores had mixed cell composition at varying depths throughout. Average cell type percentages assessed by histology throughout the core were better associated with KRAS variant allele frequencies than standard histology assessment of the cut surface. Clustering based on serial histology data separated the cores into three groups with enrichment of neoplastic epithelium, stroma, and acinar cells, respectively. Using this classification, tumor overexpressed proteins identified in bulk tissue analysis were assigned into epithelial- or stroma-specific categories, which revealed novel epithelial-specific tumor overexpressed proteins. Conclusions Our study demonstrates the feasibility of multi-omics data generation from tissue cores, the necessity of interval H&E stains in serial histology sections, and the utility of coring to improve analysis over bulk tissue data

PRISM: University of Calgary Digital Repository

A Case Study for Large-Scale Human Microbiome Analysis Using JCVI’s Metagenomics Reports (METAREP)

Author: A Datta
A Helenius
Barbara A. Methé
BE Suzek
BH Hassan
CM Szymanski
Consortium Human Microbiome Jumpstart Reference Strains
Curtis Huttenhower
DA Fell
DB Rusch
DL Mager
DM Tanenbaum
E Cardenas
EK Costello
EM Glass
FE Dewhirst
J Arnau
J Goll
J Orvis
J Qin
JA Aas
JA Gilbert
JC Venter
JG Caporaso
JL Martínez
Johannes Goll
JR White
M Arumugam
M Ashburner
M Hess
M Kanehisa
M Leibig
Mathangi Thiagarajan
Michael Edward Zwick
PN Bertin
R Caspi
S Pepke
S Yooseph
Sahar Abubucker
Shibu Yooseph
SR Eddy
W Buckel
X Feng
Publication venue: Public Library of Science
Publication date: 01/01/2012

As metagenomic studies continue to increase in their number, sequence volume and complexity, the scalability of biological analysis frameworks has become a rate-limiting factor to meaningful data interpretation. To address this issue, we have developed JCVI Metagenomics Reports (METAREP) as an open source tool to query, browse, and compare extremely large volumes of metagenomic annotations. Here we present improvements to this software including the implementation of a dynamic weighting of taxonomic and functional annotation, support for distributed searches, advanced clustering routines, and integration of additional annotation input formats. The utility of these improvements to data interpretation is demonstrated through the application of multiple comparative analysis strategies to shotgun metagenomic data produced by the National Institutes of Health Roadmap for Biomedical Research Human Microbiome Project (HMP) (http://nihroadmap.nih.gov). Specifically, the scalability of the dynamic weighting feature is evaluated and established by its application to the analysis of over 400 million weighted gene annotations derived from 14 billion short reads as predicted by the HMP Unified Metabolic Analysis Network (HUMAnN) pipeline. Further, the capacity of METAREP to facilitate the identification and simultaneous comparison of taxonomic and functional annotations, including biological pathway and individual enzyme abundances from hundreds of community samples, is demonstrated by providing scenarios that describe how these data can be mined to answer biological questions related to the human microbiome. These strategies provide users with a reference of how to conduct similar large-scale metagenomic analyses using METAREP with their own sequence data, while in this study they reveal insights into the nature and extent of variation in taxonomic and functional profiles across body habitats and individuals.
Over one thousand HMP WGS datasets and the latest open source code are available at http://www.jcvi.org/hmp-metarep

CiteSeerX

Public Library of Science (PLOS)

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

Pathema: a clade-specific bioinformatics resource center for pathogen research

Pathema (http://pathema.jcvi.org) is one of the eight Bioinformatics Resource Centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) designed to serve as a core resource for the bio-defense and infectious disease research community. Pathema strives to support basic research and accelerate scientific progress for understanding, detecting, diagnosing and treating an established set of six target NIAID Category A–C pathogens: the Category A priority pathogens Bacillus anthracis and Clostridium botulinum, and the Category B priority pathogens Burkholderia mallei, Burkholderia pseudomallei, Clostridium perfringens and Entamoeba histolytica. Each target pathogen is represented in one of four distinct clade-specific Pathema web resources and underlying databases developed to target the specific data and analysis needs of each scientific community. All publicly available complete genome projects of phylogenetically related organisms are also represented, providing a comprehensive collection of organisms for comparative analyses. Pathema facilitates the scientific exploration of genomic and related data through its integration with web-based analysis tools, customized to obtain, display, and compute results relevant to ongoing pathogen research. Pathema serves the bio-defense and infectious disease research community by disseminating data resulting from pathogen genome sequencing projects and providing access to the results of inter-genomic comparisons for these organisms.

Crossref

PubMed Central

Influence of nutrients and currents on the genomic composition of microbes across an upwelling mosaic

Metagenomic data sets were generated from samples collected along a coastal to open ocean transect between Southern California Bight and California Current waters during a seasonal upwelling event, providing an opportunity to examine the impact of episodic pulses of cold nutrient-rich water into surface ocean microbial communities. The data set consists of ∼5.8 million predicted proteins across seven sites, from three different size classes: 0.1–0.8, 0.8–3.0 and 3.0–200.0 μm. Taxonomic and metabolic analyses suggest that sequences from the 0.1–0.8 μm size class correlated with their position along the upwelling mosaic. However, taxonomic profiles of bacteria from the larger size classes (0.8–200 μm) were less constrained by habitat and characterized by an increase in Cyanobacteria, Bacteroidetes, Flavobacteria and double-stranded DNA viral sequences. Functional annotation of transmembrane proteins indicates that sites comprised of organisms with small genomes have an enrichment of transporters with substrate specificities for amino acids, iron and cadmium, whereas organisms with larger genomes have a higher percentage of transporters for ammonium and potassium. Eukaryotic-type glutamine synthetase (GS) II proteins were identified and taxonomically classified as viral, most closely related to the GSII in Mimivirus, suggesting that marine Mimivirus-like particles may have played a role in the transfer of GSII gene functions. Additionally, a Planctomycete bloom was sampled from one upwelling site, providing a rare opportunity to assess the genomic composition of a marine Planctomycete population. The significant correlations observed between genomic properties, community structure and nutrient availability provide insights into habitat-driven dynamics among oligotrophic versus upwelled marine waters adjoining each other spatially.

Crossref

PubMed Central

Macquarie University ResearchOnline