
VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data

Author: Adkins Joshua N
Ansong Charles
Cannon William R
Jensen Jeffrey L
Kobold Markus A
McCue Lee Ann
Payne Samuel H
Peterson Elena S
Schrimpe-Rutledge Alexandra C
Walker Hyunjoo
Webb Samantha R
Webb-Robertson Bobbie-Jo M
Publication venue: BioMed Central
Publication date: 01/01/2012

Abstract Background The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have each demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools either focus on a single data type or are existing software tools retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool that assists scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates.
Results VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics data (probe or RNA-Seq) into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently within the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) that show how rapidly mis-annotations can be found and explored in VESPA using either proteomics data alone or proteomics in combination with transcriptomic data.
Conclusions VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches, and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and the core programming requirements for visualizing these large heterogeneous datasets in a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
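The linked searches described above look for genomic regions where experimental evidence disagrees with the current annotation. As a minimal illustration of that idea (not VESPA's own code, which is a Java application), the Python sketch below labels a genome-mapped peptide as lying inside an annotated gene, crossing a gene boundary, or falling where no gene is annotated on that strand. The `Gene` and `PeptideHit` records, the category names, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based genome coordinate, inclusive (hypothetical convention)
    end: int     # inclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class PeptideHit:
    peptide: str
    start: int   # genome coordinates the peptide maps to
    end: int
    strand: str


def classify(hit, genes):
    """Label a genome-mapped peptide relative to the current annotation."""
    same_strand = [g for g in genes if g.strand == hit.strand]
    if any(g.start <= hit.start and hit.end <= g.end for g in same_strand):
        return "supported"          # wholly inside an annotated gene
    if any(hit.start <= g.end and g.start <= hit.end for g in same_strand):
        return "crosses_boundary"   # overlaps a gene but extends past its edge
    return "unannotated"            # no annotated gene on this strand covers it


genes = [Gene("geneA", 100, 400, "+"), Gene("geneB", 600, 900, "-")]
hits = [
    PeptideHit("PEPTIDEA", 150, 173, "+"),
    PeptideHit("PEPTIDEB", 80, 103, "+"),
    PeptideHit("PEPTIDEC", 450, 473, "+"),
]
for h in hits:
    print(h.peptide, classify(h, genes))
```

A boundary-crossing peptide hints at a mis-called start or stop site, and an unannotated one hints at a missed coding region; either would be a candidate for the export or BLAST follow-up the abstract mentions. Reading frame is ignored here for brevity.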

Springer - Publisher Connector

Pairwise covariance adds little to secondary structure prediction but improves the prediction of non-canonical local structure

Author: Bobbie-Jo Webb-Robertson
C Bystroff
Christopher Bystroff
JA Hanley
KF Han
KT Simons
M Vingron
P Fariselli
Q Yi
U Gobel
U Hobohm
Y Fujitsuka
Y Zhang
Publication venue: BioMed Central
Publication date: 01/10/2008

Abstract Background Amino acid sequence probability distributions, or profiles, have been used successfully to predict secondary structure and local structure in proteins. Profile models assume the statistical independence of each position in the sequence, but the energetics of protein folding is better captured by a scoring function based on pairwise interactions, like a force field.
Results I-sites motifs are short sequence/structure motifs that populate the protein structure database due to energy-driven convergent evolution. Here we show that a pairwise covariant sequence model does not predict alpha helix or beta strand significantly better overall than a profile-based model, but it does improve the prediction of certain loop motifs. The finding is best explained by considering secondary structure profiles as multivariate, all-or-none models, which subsume covariant models. Pairwise covariance is nonetheless present and energetically rational. Examples of negative design are present, in which the covariances disfavor non-native structures.
Conclusion Measured pairwise covariances are shown to be statistically robust in cross-validation tests, as long as the amino acid alphabet is reduced to nine classes. An updated I-sites local structure motif library that provides sequence covariance information for all types of local structure in globular proteins, and a web server for local structure prediction, are available at http://www.bioinfo.rpi.edu/bystrc/hmmstr/server.php.
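The contrast between profile models and pairwise covariance can be made concrete with a toy alignment. The sketch below is an illustration only, not the I-sites/HMMSTR scoring: it scores fragments position by position under an independent profile, and separately measures the coupling between two columns with mutual information, one common measure of pairwise covariation. The single letters stand in for reduced-alphabet classes (the paper's nine-class grouping is not reproduced), and the pseudocount of 1 is an assumption.

```python
import math
from collections import Counter


def profile(seqs):
    """Per-position residue counts for equal-length aligned fragments."""
    length = len(seqs[0])
    return [Counter(s[i] for s in seqs) for i in range(length)]


def profile_log_likelihood(seq, prof, n_seqs, alphabet_size, pseudocount=1.0):
    """Log-probability of seq when every position is scored independently."""
    return sum(
        math.log((prof[i][a] + pseudocount) / (n_seqs + pseudocount * alphabet_size))
        for i, a in enumerate(seq)
    )


def mutual_information(seqs, i, j):
    """Mutual information (nats) between alignment columns i and j."""
    n = len(seqs)
    pi = Counter(s[i] for s in seqs)
    pj = Counter(s[j] for s in seqs)
    pij = Counter((s[i], s[j]) for s in seqs)
    return sum(
        (c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
        for (a, b), c in pij.items()
    )


coupled = ["AB", "AB", "CD", "CD"]  # column 0 fully determines column 1
prof = profile(coupled)
print(profile_log_likelihood("AB", prof, 4, 4))
print(profile_log_likelihood("AD", prof, 4, 4))  # same score as "AB", though "AD" never occurs
print(mutual_information(coupled, 0, 1))         # ln 2, about 0.693
```

Because the profile multiplies per-column probabilities, it cannot distinguish the observed pairing "AB" from the never-observed "AD"; the nonzero mutual information is exactly the pairwise signal a covariant model adds.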

Springer - Publisher Connector

Physicochemical property distributions for accurate and rapid pairwise protein homology detection

Author: A Ben-Hur
A Kumar
AG Murzin
AR Shah
B Liu
BJ Webb-Robertson
Bobbie-Jo M Webb-Robertson
C Leslie
Christopher S Oehmen
CS Leslie
H Rangwala
H Saigo
I Jung
I Melvin
J Weston
Kyle G Ratuiste
L Liao
NH Anderson
QW Dong
R Kuang
S Hochreiter
SF Altschul
T Damoulas
T Lingner
TF Smith
WS Noble
Y Hou
Y Yang
Y Yuan
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to be more accurate than traditional approaches for remote homology detection.
Results We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from 4-mers of the sequence and normalized against the null distribution of the property over all possible 4-mers. With this approach, transforming a protein into feature space carries little computational cost, and overall remote homology detection performance is comparable with current state-of-the-art methods. We demonstrate that the features can be used for pairwise remote homology detection with improved accuracy over sequence-based methods such as BLAST and over other feature-based methods of similar computational cost.
Conclusions A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection, in lieu of family-based homology detection, is important for applications such as large database searches and comparative genomics.
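The feature construction described in the Results can be sketched under stated assumptions: Kyte-Doolittle hydropathy serves as the single physicochemical property, a 4-mer's score is the sum of its residue values, scores fall into 12 equal-width bins, and each bin's observed frequency is divided by its frequency over all 20^4 = 160,000 possible 4-mers. The property scale, bin count, and ratio normalization are illustrative choices, not necessarily those used in the paper.

```python
import itertools

# Kyte-Doolittle hydropathy values: one example property scale (assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
K = 4                        # window length from the abstract (4-mers)
N_BINS = 12                  # hypothetical histogram resolution
LO = K * min(KD.values())    # lowest possible 4-mer score (-18.0)
HI = K * max(KD.values())    # highest possible 4-mer score (18.0)
WIDTH = (HI - LO) / N_BINS


def kmer_score(kmer):
    """Sum of the property values of the residues in one k-mer."""
    return sum(KD[aa] for aa in kmer)


def bin_index(score):
    """Equal-width bin for a score; the maximum score goes in the last bin."""
    return min(int((score - LO) / WIDTH), N_BINS - 1)


def null_distribution():
    """Fraction of all 20**K possible k-mers falling in each score bin."""
    counts = [0] * N_BINS
    for kmer in itertools.product(KD, repeat=K):
        counts[bin_index(kmer_score(kmer))] += 1
    total = len(KD) ** K
    return [c / total for c in counts]


NULL = null_distribution()  # computed once, reused for every sequence


def feature_vector(seq):
    """Observed bin frequencies of the sequence's 4-mers divided by NULL."""
    seq = seq.upper()
    counts = [0] * N_BINS
    n = 0
    for i in range(len(seq) - K + 1):
        window = seq[i:i + K]
        if all(aa in KD for aa in window):  # skip windows with non-standard residues
            counts[bin_index(kmer_score(window))] += 1
            n += 1
    if n == 0:
        raise ValueError("sequence has no 4-mer made only of standard residues")
    return [(c / n) / p if p > 0 else 0.0 for c, p in zip(counts, NULL)]


print(feature_vector("MKTAYIAKQRQISFVKSHFSRQ"))
```

The null distribution is enumerated once, so transforming each new sequence costs a single pass over its windows. Vectors built this way (one block per property, if several scales are used) could then be compared pairwise or supplied to an SVM.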

Springer - Publisher Connector

An Approach for Assessing the Signature Quality of Various Chemical Assays when Predicting the Culture Media Used to Grow Microorganisms

Author: Anderson Richard M.
Corley Courtney D.
Holmes Aimee E.
Kreuzer Helen W.
Sego Landon H.
Tardiff Mark F.
Unwin Stephen D.
Webb-Robertson Bobbie-Jo M.
Weimar Mark R.
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 01/02/2013

We demonstrate an approach for assessing the quality of a signature system designed to predict the culture medium used to grow a microorganism. The system comprised four chemical assays designed to identify various ingredients that could be used to produce the culture medium. The analytical measurements from any combination of these four assays can be used in a Bayesian network to predict, for each of eleven culture media, the probability that the microorganism was grown in it. We evaluated combinations of the signature system by removing one or more of the assays from the Bayesian network, and we measured and compared the quality of the resulting networks in terms of fidelity, cost, risk, and utility, a method we refer to as the Signature Quality Metric.
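The assay-removal experiment can be illustrated with a toy model. The sketch below assumes the simplest Bayesian network structure, in which the assays are conditionally independent given the culture medium (a naive Bayes model), and recomputes the posterior over media for every non-empty subset of four assays. The media, assay names, and all probabilities are hypothetical placeholders; the study's network covered eleven media, and its fidelity, cost, risk, and utility metrics are not modeled here.

```python
from itertools import combinations

ASSAYS = ("assay_A", "assay_B", "assay_C", "assay_D")  # hypothetical names
MEDIA = ("medium_1", "medium_2", "medium_3")           # toy set; the study used eleven
PRIOR = {m: 1 / len(MEDIA) for m in MEDIA}             # uniform prior (assumption)

# Hypothetical P(assay reports a positive result | medium).
LIKELIHOOD = {
    "assay_A": {"medium_1": 0.90, "medium_2": 0.20, "medium_3": 0.50},
    "assay_B": {"medium_1": 0.10, "medium_2": 0.80, "medium_3": 0.40},
    "assay_C": {"medium_1": 0.70, "medium_2": 0.70, "medium_3": 0.10},
    "assay_D": {"medium_1": 0.30, "medium_2": 0.60, "medium_3": 0.90},
}


def posterior(observations, assays):
    """P(medium | results of the assays kept in the network), naive Bayes."""
    unnormalized = {}
    for m in MEDIA:
        p = PRIOR[m]
        for a in assays:
            p_pos = LIKELIHOOD[a][m]
            p *= p_pos if observations[a] else 1.0 - p_pos
        unnormalized[m] = p
    total = sum(unnormalized.values())
    return {m: p / total for m, p in unnormalized.items()}


def assay_subsets(assays=ASSAYS):
    """Every non-empty subset of assays, largest first (15 for four assays)."""
    for r in range(len(assays), 0, -1):
        yield from combinations(assays, r)


if __name__ == "__main__":
    obs = {"assay_A": True, "assay_B": False, "assay_C": True, "assay_D": False}
    for subset in assay_subsets():
        post = posterior(obs, subset)
        best = max(post, key=post.get)
        print(subset, best, round(post[best], 3))
```

Comparing how the top posterior changes as assays are dropped is one ingredient of a fidelity-style comparison; a full assessment would also weigh cost and risk, as the abstract describes.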

UNT Digital Library

Improved quality control processing of peptide-centric LC-MS proteomics data

Author: Amy C. Sims
Anderson
Barnett
Bobbie-Jo M. Webb-Robertson
Bukhman
Caroni
Cho
Croux
Daly
Dixon
Filzmoser
Grubbs
Hawkins
Hoaglin
Jain
Jaitly
Joel G. Pounds
Jon M. Jacobs
Karpievitch
Katrina M. Waters
Kauffmann
Kemmeren
Lee
Li
MacCoss
Mahalanobis
Melissa M. Matzke
Metz
Monroe
Oberg
Piening
Ralph S. Baric
Rocke
Rudnick
Schulz-Trieglaff
Smith
Stead
Thomas O. Metz
Webb-Robertson
Wilson
Xia
Publication venue: Oxford University Press
Publication date: 01/01/2011

Motivation: In the analysis of differential peptide peak intensities (i.e., abundance measures), LC-MS analyses with poor-quality peptide abundance data can bias downstream statistical analyses, and hence the biological interpretation, of an otherwise high-quality dataset. Although considerable effort has gone into assuring the quality of peptide identification with respect to spectral processing, quality assessment of the subsequent peptide abundance data matrix has so far been limited to subjective visual inspection of run-by-run correlations or of individual peptide components. Identifying statistical outliers is a critical step in processing proteomics data, because many downstream statistical analyses [e.g., analysis of variance (ANOVA)] rely on accurate estimates of sample variance, and their results are influenced by extreme values.
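The excerpt above stops before describing the paper's procedure, so the sketch below shows only a simple, generic stand-in for run-level outlier screening, not the authors' method: each LC-MS run is summarized by its median Pearson correlation with the other runs (using only peptides observed in both), and runs whose summary falls far below the rest on a robust median/MAD scale are flagged. Log-scale abundances are assumed as input; the threshold of 3, the 1.4826 MAD scaling, and the hypothetical `min_scale` floor (which keeps the score finite when most runs agree almost perfectly) are assumptions.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation over peptides observed (not None) in both runs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 3:
        return math.nan
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def flag_outlier_runs(runs, threshold=3.0, min_scale=0.01):
    """Flag runs whose median correlation to the other runs is unusually low.

    runs: dict mapping run name -> list of log-abundances (None = missing),
    with peptides in the same order in every run.
    """
    med_corr = {}
    for name, values in runs.items():
        cors = [pearson(values, other) for o, other in runs.items() if o != name]
        cors = [c for c in cors if not math.isnan(c)]
        med_corr[name] = statistics.median(cors) if cors else math.nan
    scores = [v for v in med_corr.values() if not math.isnan(v)]
    center = statistics.median(scores)
    mad = statistics.median(abs(v - center) for v in scores)
    scale = max(1.4826 * mad, min_scale)  # robust spread, floored to avoid /0
    return [name for name, v in med_corr.items()
            if not math.isnan(v) and (v - center) / scale < -threshold]


base = [20.1, 22.3, 18.7, 25.0, 21.4, 19.9]  # hypothetical log-abundances
runs = {
    "run1": base,
    "run2": [x + 0.5 for x in base],
    "run3": [x * 1.1 for x in base],
    "run4": [x * 0.9 + 1.0 for x in base],
    "run5": list(reversed(base)),  # deliberately inconsistent run
}
print(flag_outlier_runs(runs))
```

Using medians at both stages means a single bad run cannot drag the reference level toward itself, which is the property that makes such a screen useful before variance-based tests like ANOVA.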

Carolina Digital Repository

MPLEx: a Robust and Universal Protocol for Single-Sample Integrative Proteomic, Metabolomic, and Lipidomic Analyses

Author: Amy C. Sims
Anil K. Shukla
Athena A. Schepmoes
Bobbie-Jo Webb-Robertson
Carrie D. Nicora
Ernesto S. Nakayasu
Jennifer E. Kyle
Jon M. Jacobs
Kristin E. Burnum-Johnson
Melissa M. Matzke
Nicholas Chia
Ralph S. Baric
Richard D. Smith
Rosalie K. Chu
Thomas O. Metz
Young-Mo Kim
Publication date: 01/01/2016

ABSTRACT Integrative multi-omics analyses can enable more effective investigation and a more complete understanding of complex biological systems. Despite recent advances in a range of omics analyses, multi-omic measurements of the same sample are still challenging, and current methods have not been well evaluated in terms of reproducibility and broad applicability. Here we adapted a solvent-based method, widely applied for extracting lipids and metabolites, to add proteomics to mass spectrometry-based multi-omics measurements. The metabolite, protein, and lipid extraction (MPLEx) protocol proved to be robust and applicable to a diverse set of sample types, including cell cultures, microbial communities, and tissues. To illustrate the utility of this protocol, an integrative multi-omics analysis was performed using a lung epithelial cell line infected with Middle East respiratory syndrome coronavirus; the analysis showed the impact of this virus on the host glycolytic pathway and also suggested a role for lipids during infection. The MPLEx method is a simple, fast, and robust protocol that can be applied for integrative multi-omic measurements from diverse sample types (e.g., environmental, in vitro, and clinical).
IMPORTANCE In systems biology studies, the integration of multiple omics measurements (i.e., genomics, transcriptomics, proteomics, metabolomics, and lipidomics) has been shown to provide a more complete and informative view of biological pathways. Thus, the prospect of extracting different types of molecules (e.g., DNAs, RNAs, proteins, and metabolites) and performing multiple omics measurements on single samples is very attractive, but such studies are challenging because extraction conditions differ according to the molecule type. Here, we adapted an organic solvent-based extraction method that demonstrated broad applicability and robustness, enabling comprehensive proteomics, metabolomics, and lipidomics analyses from the same sample.