
    Predicting protein linkages in bacteria: Which method is best depends on task

    Background: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools and this paper provides guidelines for when each method is appropriate by exploring different features of each method and potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations.
    Results: Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability versus using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster predicted completely 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene Cluster and Gene Neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction.
    Conclusion: A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated the dramatic differences on reported results. We used several benchmarks on which we have shown the comparative effectiveness of each prediction method, as well as provided guidelines as to which method is most appropriate for a given prediction task.

    Improving protein function prediction methods with integrated literature data

    Background: Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity.
    Results: We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably-sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10% which yield the best trade off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder, but co-occurrence data still proves beneficial.
    Conclusion: Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily-available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions in the interaction network that graph-theoretic function prediction algorithms can exploit.

    Biomedical Discovery Acceleration, with Applications to Craniofacial Development

    The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, is both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work

    Inactivation of Pmel Alters Melanosome Shape But Has Only a Subtle Effect on Visible Pigmentation

    PMEL is an amyloidogenic protein that appears to be exclusively expressed in pigment cells and forms intralumenal fibrils within early stage melanosomes upon which eumelanins deposit in later stages. PMEL is well conserved among vertebrates, and allelic variants in several species are associated with reduced levels of eumelanin in epidermal tissues. However, in most of these cases it is not clear whether the allelic variants reflect gain-of-function or loss-of-function, and no complete PMEL loss-of-function has been reported in a mammal. Here, we have created a mouse line in which the Pmel gene has been inactivated (Pmel−/−). These mice are fully viable, fertile, and display no obvious developmental defects. Melanosomes within Pmel−/− melanocytes are spherical in contrast to the oblong shape present in wild-type animals. This feature was documented in primary cultures of skin-derived melanocytes as well as in retinal pigment epithelium cells and in uveal melanocytes. Inactivation of Pmel has only a mild effect on the coat color phenotype in four different genetic backgrounds, with the clearest effect in mice also carrying the brown/Tyrp1 mutation. This phenotype, which is similar to that observed with the spontaneous silver mutation in mice, strongly suggests that other previously described alleles in vertebrates with more striking effects on pigmentation are dominant-negative mutations. Despite a mild effect on visible pigmentation, inactivation of Pmel led to a substantial reduction in eumelanin content in hair, which demonstrates that PMEL has a critical role for maintaining efficient epidermal pigmentation

    Targeting Huntington’s disease through histone deacetylases

    Huntington’s disease (HD) is a debilitating neurodegenerative condition that places a significant burden on both patients and healthcare costs. Despite extensive research, treatment options for patients with this condition remain limited. Aberrant post-translational modification (PTM) of proteins is emerging as an important element in the pathogenesis of HD. These PTMs include acetylation, phosphorylation, methylation, sumoylation and ubiquitination. Several families of proteins are involved with the regulation of these PTMs. In this review, I discuss the current evidence linking aberrant PTMs and/or aberrant regulation of the cellular machinery regulating these PTMs to HD pathogenesis. Finally, I discuss the evidence suggesting that pharmacologically targeting one of these protein families, the histone deacetylases, may be of potential therapeutic benefit in the treatment of HD.

    Aminoguanidine inhibits reactive oxygen species formation, lipid peroxidation and oxidant-induced apoptosis


    Understanding the social and economic factors affecting adverse events in an active theater of war: A neural network approach

    AHFE International Conference on Cross-Cultural Decision Making (CCDM) -- JUL 17-21, 2017 -- Los Angeles, CA. WOS: 000451449700020.
    This study focused on the application of artificial neural networks (ANNs) to model the effect of infrastructure development projects on terrorism security events in Afghanistan. The dataset includes adverse events and infrastructure aid activity in Afghanistan from 2001 to 2010. Several ANN models were generated and investigated for Afghanistan and its seven regions. In addition to a soft-computing approach, a multiple linear regression (MLR) analysis was also performed to evaluate whether or not the ANN approach showed superior predictive performance compared to a classical statistical approach. According to the performance comparison, the developed ANN model provided better prediction accuracy than the MLR approach. The results obtained from this analysis demonstrate that ANNs can predict the occurrence of adverse events according to economic infrastructure aid activity data.
    Office of Naval Research (ONR) [1052339]. The authors are grateful for the support of the Office of Naval Research (ONR) under Grant No. 1052339, Complex Systems Engineering for Rapid Computational Socio-Cultural Network Analysis, and the helpful guidance of ONR Program Management and the technical team.