VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data

Author: Adkins Joshua N
Ansong Charles
Cannon William R
Jensen Jeffrey L
Kobold Markus A
McCue Lee Ann
Payne Samuel H
Peterson Elena S
Schrimpe-Rutledge Alexandra C
Walker Hyunjoo
Webb Samantha R
Webb-Robertson Bobbie-Jo M
Publication venue: BioMed Central
Publication date: 01/01/2012

Abstract
Background: The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates.
Results: VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. Two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone or in combination with transcriptomic data.
Conclusions: VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches, and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
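The abstract does not spell out how VESPA's linked searches work. As a rough illustration only, one common signal of structural mis-annotation is a peptide whose genomic location is not covered by any annotated gene. The sketch below assumes peptides and genes are already expressed as genome coordinates; the `Feature` type and the `unexplained_peptides` function are hypothetical names, not part of VESPA.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A genomic interval (hypothetical record type for this sketch)."""
    start: int   # 1-based, inclusive genome coordinate
    end: int     # 1-based, inclusive genome coordinate
    strand: str  # "+" or "-"


def unexplained_peptides(peptides, genes):
    """Return peptides not fully contained in any annotated gene on the same strand.

    Such peptides hint at a missing gene or a wrongly placed gene boundary.
    Reading frame is deliberately ignored to keep the sketch short.
    """
    flagged = []
    for pep in peptides:
        covered = any(
            g.strand == pep.strand and g.start <= pep.start and pep.end <= g.end
            for g in genes
        )
        if not covered:
            flagged.append(pep)
    return flagged
```

A real tool would also check the reading frame and combine this evidence with transcript coverage; this sketch covers only the interval test.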

Springer - Publisher Connector

Physicochemical property distributions for accurate and rapid pairwise protein homology detection

Author: A Ben-Hur
A Kumar
AG Murzin
AR Shah
B Liu
BJ Webb-Robertson
Bobbie-Jo M Webb-Robertson
C Leslie
Christopher S Oehmen
CS Leslie
H Rangwala
H Saigo
I Jung
I Melvin
J Weston
Kyle G Ratuiste
L Liao
NH Anderson
QW Dong
R Kuang
S Hochreiter
SF Altschul
T Damoulas
T Lingner
TF Smith
WS Noble
Y Hou
Y Yang
Y Yuan
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection.
Results: We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from 4-mers of the sequence and normalized based on the null distribution of the property over all possible 4-mers. With this approach, there is little computational cost associated with the transformation of the protein into feature space, and overall performance in terms of remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for the task of pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost.
Conclusions: A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.
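The 4-mer feature construction described in this abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: the property is the Kyte-Doolittle hydropathy scale, a 4-mer's score is the mean property of its residues, the distribution is a 10-bin histogram, and normalization is a per-bin ratio of observed to null frequency. All function names are hypothetical.

```python
from itertools import product

# Kyte-Doolittle hydropathy scale (an illustrative choice of property).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
K = 4          # k-mer length, as in the abstract
N_BINS = 10    # histogram resolution (assumption)
LO, HI = min(KD.values()), max(KD.values())


def kmer_score(kmer):
    """Mean property value over the residues of one k-mer (assumption)."""
    return sum(KD[a] for a in kmer) / len(kmer)


def bin_index(score):
    """Map a score in [LO, HI] to a bin in 0..N_BINS-1."""
    frac = (score - LO) / (HI - LO)
    return max(0, min(int(frac * N_BINS), N_BINS - 1))


def histogram(scores):
    """Relative frequency of scores per bin."""
    counts = [0] * N_BINS
    total = 0
    for s in scores:
        counts[bin_index(s)] += 1
        total += 1
    if total == 0:
        raise ValueError("need at least one k-mer")
    return [c / total for c in counts]


def null_distribution():
    """Histogram of scores over all 20**K possible k-mers."""
    return histogram(kmer_score(kmer) for kmer in product(KD, repeat=K))


def feature_vector(seq, null):
    """Observed k-mer score histogram of seq, divided bin-wise by the null."""
    kmers = (seq[i:i + K] for i in range(len(seq) - K + 1))
    observed = histogram(kmer_score(kmer) for kmer in kmers)
    return [o / n if n > 0 else 0.0 for o, n in zip(observed, null)]
```

The null distribution depends only on the property scale, so it can be computed once and reused for every sequence, which is consistent with the low transformation cost the abstract reports. The resulting vectors would then be passed to an SVM; that step is omitted here.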

Springer - Publisher Connector

Leucine Biosynthesis Is Involved in Regulating High Lipid Accumulation in Yarrowia lipolytica

Author: Bobbie-Jo Webb-Robertson
Carrie D. Nicora
Eduard J. Kerkhoven
Jens Nielsen
Richard D. Smith
Samuel O. Purvine
Sang Yup Lee
Scott E. Baker
Siwei Wei
Thomas L. Fillmore
Thomas O. Metz
Young-Mo Kim
Publication venue: American Society for Microbiology
Publication date: 01/01/2017

The yeast Yarrowia lipolytica is a potent accumulator of lipids, and lipogenesis in this organism can be influenced by a variety of factors, such as genetics and environmental conditions. Using a multifactorial study, we elucidated the effects of both genetic and environmental factors on regulation of lipogenesis in Y. lipolytica and identified how two opposite regulatory states both result in lipid accumulation. This study involved comparison of a strain overexpressing diacylglycerol acyltransferase (DGA1) with a control strain grown under either nitrogen or carbon limitation conditions. A strong correlation was observed between the responses at the transcript and protein levels. Combination of DGA1 overexpression with nitrogen limitation resulted in a high level of lipid accumulation accompanied by downregulation of several amino acid biosynthetic pathways, including that of leucine in particular, and these changes were further correlated with a decrease in metabolic fluxes. This downregulation was supported by the measured decrease in the level of 2-isopropylmalate, an intermediate of leucine biosynthesis. Combining the multi-omics data with putative transcription factor binding motifs uncovered a contradictory role for TORC1 in controlling lipid accumulation, likely mediated through 2-isopropylmalate and a Leu3-like transcription factor.

Online Research Database In Technology

Chalmers Research

Chalmers Publication Library

Pairwise covariance adds little to secondary structure prediction but improves the prediction of non-canonical local structure

Author: Bobbie-Jo Webb-Robertson
C Bystroff
Christopher Bystroff
JA Hanley
KF Han
KT Simons
M Vingron
P Fariselli
Q Yi
U Gobel
U Hobohm
Y Fujitsuka
Y Zhang
Publication venue: BioMed Central
Publication date: 01/10/2008

Abstract
Background: Amino acid sequence probability distributions, or profiles, have been used successfully to predict secondary structure and local structure in proteins. Profile models assume the statistical independence of each position in the sequence, but the energetics of protein folding is better captured in a scoring function that is based on pairwise interactions, like a force field.
Results: I-sites motifs are short sequence/structure motifs that populate the protein structure database due to energy-driven convergent evolution. Here we show that a pairwise covariant sequence model does not predict alpha helix or beta strand significantly better overall than a profile-based model, but it does improve the prediction of certain loop motifs. The finding is best explained by considering secondary structure profiles as multivariant, all-or-none models, which subsume covariant models. Pairwise covariance is nonetheless present and energetically rational. Examples of negative design are present, where the covariances disfavor non-native structures.
Conclusion: Measured pairwise covariances are shown to be statistically robust in cross-validation tests, as long as the amino acid alphabet is reduced to nine classes. An updated I-sites local structure motif library that provides sequence covariance information for all types of local structure in globular proteins and a web server for local structure prediction are available at http://www.bioinfo.rpi.edu/bystrc/hmmstr/server.php.

Springer - Publisher Connector

An Approach for Assessing the Signature Quality of Various Chemical Assays when Predicting the Culture Media Used to Grow Microorganisms

Author: Anderson Richard M.
Corley Courtney D.
Holmes Aimee E.
Kreuzer Helen W.
Sego Landon H.
Tardiff Mark F.
Unwin Stephen D.
Webb-Robertson Bobbie-Jo M.
Weimar Mark R.
Publication venue: Office of Scientific and Technical Information (OSTI)
Publication date: 01/02/2013

We demonstrate an approach for assessing the quality of a signature system designed to predict the culture medium used to grow a microorganism. The system comprised four chemical assays designed to identify various ingredients that could be used to produce the culture medium. The analytical measurements resulting from any combination of these four assays can be used in a Bayesian network to predict the probabilities that the microorganism was grown using one of eleven culture media. We evaluated combinations of the signature system by removing one or more of the assays from the Bayes network. We measured and compared the quality of the various Bayes nets in terms of fidelity, cost, risk, and utility, a method we refer to as Signature Quality Metric.
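The idea of scoring every assay combination can be sketched as below. This is a simplified illustration, not the authors' network: it assumes the assays are conditionally independent given the medium (a naive Bayes structure, the simplest Bayesian network), and the assay names, function names, and table layout are all hypothetical.

```python
from itertools import combinations

# Placeholder names; the abstract does not name the four assays.
ASSAYS = ["assay_1", "assay_2", "assay_3", "assay_4"]


def assay_subsets(assays):
    """Every non-empty combination of assays (15 for four assays)."""
    return [
        combo
        for size in range(1, len(assays) + 1)
        for combo in combinations(assays, size)
    ]


def posterior(prior, likelihood, observed):
    """Posterior probability of each culture medium given assay outcomes.

    prior:      {medium: P(medium)}
    likelihood: {assay: {medium: {outcome: P(outcome | medium)}}}
    observed:   {assay: outcome}, restricted to the assays in use
    Assumes assays are conditionally independent given the medium.
    """
    unnormalized = {}
    for medium, p in prior.items():
        for assay, outcome in observed.items():
            p *= likelihood[assay][medium][outcome]
        unnormalized[medium] = p
    total = sum(unnormalized.values())
    return {medium: p / total for medium, p in unnormalized.items()}
```

Dropping an assay from the system corresponds to leaving it out of `observed`, so each subset returned by `assay_subsets` gives a different posterior whose fidelity can be compared against the cost of running those assays.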