135 research outputs found
Bayesian integration of isotope ratio for geographic sourcing of castor beans
Pre-print. Recent years have seen an increase in forensic interest in the poison ricin, which is extracted from the seeds of the Ricinus communis plant. Both light element (C, N, O, and H) and strontium (Sr) isotope ratios have previously been used to associate organic material with geographic regions of origin. We present a Bayesian integration methodology that can more accurately predict the region of origin for a castor bean than individual models developed independently for light element stable isotopes or Sr isotope ratios. Our results demonstrate a clear improvement in the ability to correctly classify regions based on the integrated model, with a class accuracy of 60.9 ± 2.1% versus 55.9 ± 2.1% and 40.2 ± 1.8% for the light element and Sr isotope ratios, respectively. In addition, we show graphically the strengths and weaknesses of each dataset with respect to class prediction and how the integration of these datasets strengthens the overall model.
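The abstract does not spell out how the two isotope models are integrated. As a minimal sketch only, one common way to combine two per-region posteriors is a product rule that assumes the light element and Sr measurements are conditionally independent given the region. The function name, the three region labels, and all probabilities below are hypothetical and are not taken from the paper.

```python
def integrate_posteriors(post_light, post_sr, prior):
    """Combine two per-region posteriors, assuming the two datasets are
    conditionally independent given the region (an assumption, not the
    paper's stated method): P(r | both) is proportional to
    P(r | light) * P(r | Sr) / P(r)."""
    unnormalized = {r: post_light[r] * post_sr[r] / prior[r] for r in prior}
    total = sum(unnormalized.values())
    return {r: v / total for r, v in unnormalized.items()}


# Hypothetical numbers for three made-up regions.
prior = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
post_light = {"A": 0.5, "B": 0.3, "C": 0.2}  # light element model output
post_sr = {"A": 0.4, "B": 0.5, "C": 0.1}     # Sr model output

combined = integrate_posteriors(post_light, post_sr, prior)
best = max(combined, key=combined.get)  # region with the highest combined posterior
```

Dividing by the prior keeps it from being counted twice, since each individual posterior already includes it.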
Computational Proteomics: High-throughput Analysis for Systems Biology
High-throughput (HTP) proteomics is a rapidly developing field that offers the global profiling of proteins from a biological system. The HTP technological advances are fueling a revolution in biology, enabling analyses at the scales of entire systems (e.g., whole cells, tumors, or environmental communities). However, simply identifying the proteins in a cell is insufficient for understanding the underlying complexity and operating mechanisms of the overall system. Systems-level investigations rely increasingly on computational analyses, especially in proteomics, which generates large-scale global data.
Plasma Biomarkers for Detecting Hodgkin's Lymphoma in HIV Patients
The lifespan of people with human immunodeficiency virus (HIV) infection has increased as a result of effective antiretroviral therapy, and the incidences of the AIDS-defining cancers, non-Hodgkin's lymphoma and Kaposi sarcoma, have declined. Even so, HIV-infected individuals are now at greater risk of other cancers, including Hodgkin's lymphoma (HL). To identify candidate biomarkers for the early detection of HL, we undertook an accurate mass and elution time tag proteomics analysis of individual plasma samples from HIV-infected patients without HL (controls; n = 14) and from HIV-infected patients with HL (n = 22). This analysis identified 60 proteins that were significantly altered (p < 0.05) and at least 1.5-fold different between the two groups. At least three of these proteins have previously been reported to be altered in the blood of HL patients who were not known to be HIV positive, suggesting that these markers may be broadly useful for detecting HL. Ingenuity Pathway Analysis software identified "inflammatory response" and "cancer" as the top two biological functions associated with these proteins. Overall, this study validated three plasma proteins as candidate biomarkers for detecting HL, and identified 57 novel candidate biomarkers that remain to be validated. The relationship of these novel candidate biomarkers with cancer and inflammation suggests that they are truly associated with HL and therefore may be useful for the early detection of this cancer in susceptible populations.
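The two selection criteria named above (p < 0.05 and at least a 1.5-fold difference) can be illustrated with a short filter. This is a sketch under stated assumptions: abundances are positive and on a linear scale, the p-values are taken as already computed (the abstract does not name the statistical test), and the protein names and numbers are invented for illustration.

```python
def select_candidates(proteins, p_cutoff=0.05, min_fold=1.5):
    """Return names of proteins passing both the p-value and fold-change cutoffs.

    Each entry is (name, mean_control, mean_hl, p_value); abundances are
    assumed to be positive, linear-scale values.
    """
    selected = []
    for name, mean_control, mean_hl, p_value in proteins:
        ratio = mean_hl / mean_control
        fold = max(ratio, 1 / ratio)  # symmetric: counts increases and decreases
        if p_value < p_cutoff and fold >= min_fold:
            selected.append(name)
    return selected


# Hypothetical measurements, not data from the study.
example = [
    ("P1", 100.0, 160.0, 0.01),  # 1.6-fold up, significant
    ("P2", 100.0, 120.0, 0.01),  # only 1.2-fold, fails the fold cutoff
    ("P3", 200.0, 100.0, 0.03),  # 2-fold down, significant
    ("P4", 100.0, 300.0, 0.20),  # 3-fold up, fails the p-value cutoff
]
candidates = select_candidates(example)  # ['P1', 'P3']
```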
VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data
Background: The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates. Results: VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is applied to two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored using either proteomics data alone or in combination with transcriptomic data. Conclusions: VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches, and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
Physicochemical property distributions for accurate and rapid pairwise protein homology detection
Background: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection. Results: We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from 4-mers of the sequence and normalized based on the null distribution of the property over all possible 4-mers. With this approach there is little computational cost associated with the transformation of the protein into feature space, and overall performance in terms of remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for the task of pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost. Conclusions: A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.
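The feature construction described above can be sketched roughly as follows. The abstract does not state which properties, scoring rule, binning, or normalization the authors used, so every one of those choices here is an assumption: the property is the Kyte-Doolittle hydropathy scale, a 4-mer's score is the sum of its residue values, scores are binned into 10 equal-width bins, and normalization is the ratio of observed to null bin frequency. The example sequence is made up.

```python
from itertools import product

# Kyte-Doolittle hydropathy values (one possible property; an assumption here).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
K = 4
N_BINS = 10  # arbitrary bin count chosen for illustration
LO, HI = K * min(KD.values()), K * max(KD.values())  # range of possible scores


def bin_index(score):
    """Map a 4-mer score to one of N_BINS equal-width bins over [LO, HI]."""
    frac = (score - LO) / (HI - LO)
    return min(int(frac * N_BINS), N_BINS - 1)


def histogram(scores):
    """Relative frequency of scores per bin (expects at least one score)."""
    counts = [0] * N_BINS
    n = 0
    for s in scores:
        counts[bin_index(s)] += 1
        n += 1
    return [c / n for c in counts]


def kmer_scores(seq):
    """Score every overlapping 4-mer as the sum of its residue values."""
    return [sum(KD[a] for a in seq[i:i + K]) for i in range(len(seq) - K + 1)]


# Null distribution: the same property scored over all 20**4 possible 4-mers.
NULL = histogram(sum(KD[a] for a in kmer) for kmer in product(KD, repeat=K))


def feature_vector(seq):
    """Observed bin frequencies divided by the null bin frequencies."""
    observed = histogram(kmer_scores(seq))
    return [o / n if n > 0 else 0.0 for o, n in zip(observed, NULL)]


features = feature_vector("MKVLAAGIVGLLLA")  # hypothetical sequence
```

The null distribution is computed once and reused for every sequence, so the per-sequence cost is a single pass over its 4-mers. A vector like `features` could then be passed to an SVM, which is outside the scope of this sketch.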
Leucine Biosynthesis Is Involved in Regulating High Lipid Accumulation in Yarrowia lipolytica
The yeast Yarrowia lipolytica is a potent accumulator of lipids, and lipogenesis in this organism can be influenced by a variety of factors, such as genetics and environmental conditions. Using a multifactorial study, we elucidated the effects of both genetic and environmental factors on regulation of lipogenesis in Y. lipolytica and identified how two opposite regulatory states both result in lipid accumulation. This study involved comparison of a strain overexpressing diacylglycerol acyltransferase (DGA1) with a control strain grown under either nitrogen or carbon limitation conditions. A strong correlation was observed between the responses on the transcript and protein levels. Combination of DGA1 overexpression with nitrogen limitation resulted in a high level of lipid accumulation accompanied by downregulation of several amino acid biosynthetic pathways, including that of leucine in particular, and these changes were further correlated with a decrease in metabolic fluxes. This downregulation was supported by the measured decrease in the level of 2-isopropylmalate, an intermediate of leucine biosynthesis. Combining the multi-omics data with putative transcription factor binding motifs uncovered a contradictory role for TORC1 in controlling lipid accumulation, likely mediated through 2-isopropylmalate and a Leu3-like transcription factor.
Pairwise covariance adds little to secondary structure prediction but improves the prediction of non-canonical local structure
Background: Amino acid sequence probability distributions, or profiles, have been used successfully to predict secondary structure and local structure in proteins. Profile models assume the statistical independence of each position in the sequence, but the energetics of protein folding is better captured in a scoring function that is based on pairwise interactions, like a force field. Results: I-sites motifs are short sequence/structure motifs that populate the protein structure database due to energy-driven convergent evolution. Here we show that a pairwise covariant sequence model does not predict alpha helix or beta strand significantly better overall than a profile-based model, but it does improve the prediction of certain loop motifs. The finding is best explained by considering secondary structure profiles as multivariant, all-or-none models, which subsume covariant models. Pairwise covariance is nonetheless present and energetically rational. Examples of negative design are present, where the covariances disfavor non-native structures. Conclusion: Measured pairwise covariances are shown to be statistically robust in cross-validation tests, as long as the amino acid alphabet is reduced to nine classes. An updated I-sites local structure motif library that provides sequence covariance information for all types of local structure in globular proteins and a web server for local structure prediction are available at http://www.bioinfo.rpi.edu/bystrc/hmmstr/server.php.
An Approach for Assessing the Signature Quality of Various Chemical Assays when Predicting the Culture Media Used to Grow Microorganisms
We demonstrate an approach for assessing the quality of a signature system designed to predict the culture medium used to grow a microorganism. The system comprised four chemical assays designed to identify various ingredients that could be used to produce the culture medium. The analytical measurements resulting from any combination of these four assays can be used in a Bayesian network to predict the probabilities that the microorganism was grown using one of eleven culture media. We evaluated combinations of the signature system by removing one or more of the assays from the Bayesian network. We measured and compared the quality of the resulting Bayesian networks in terms of fidelity, cost, risk, and utility, a method we refer to as the Signature Quality Metric.