76 research outputs found

    The topology of the bacterial co-conserved protein network and its implications for predicting protein function

    Background: Protein-protein interaction networks are most often generated from physical protein-protein interaction data. Co-conservation, also known as phylogenetic profiles, is an alternative source of information for generating protein interaction networks. Co-conservation methods generate interaction networks among proteins that are gained or lost together through evolution. Co-conservation is a particularly useful technique in compact bacterial genomes. Prior studies in yeast suggest that the topology of protein-protein interaction networks generated from physical interaction assays can offer important insight into protein function. Here, we hypothesize that in bacteria, the topology of protein interaction networks derived via co-conservation information could similarly improve methods for predicting protein function. Since the topology of bacterial co-conservation protein-protein interaction networks has not previously been studied in depth, we first perform such an analysis for co-conservation networks in E. coli K12. Next, we demonstrate one way in which network connectivity measures and global and local function distribution can be exploited to predict protein function for previously uncharacterized proteins. Results: Our results showed that, like most biological networks, our bacterial co-conserved protein-protein interaction networks had scale-free topologies. Our results indicated that some properties of the physical yeast interaction network hold in our bacterial co-conservation networks, such as high connectivity for essential proteins. However, the high connectivity among protein complexes in the yeast physical network was not seen in the co-conservation network, which uses all bacteria as the reference set. We found that the distribution of node connectivity varied by functional category and could be informative for function prediction. By integrating functional information from different annotation sources and using the network topology, we were able to infer function for uncharacterized proteins. Conclusion: Interaction networks based on co-conservation can contain information distinct from networks based on physical or other interaction types. Our study has shown co-conservation-based networks to exhibit a scale-free topology, as expected for biological networks. We also revealed ways that connectivity in our networks can be informative for the functional characterization of proteins.
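
The co-conservation idea in this abstract can be illustrated with a minimal sketch. It assumes each protein is described by a presence/absence profile across a set of reference genomes and that two proteins are linked when their profiles are similar enough. The Jaccard measure, the 0.75 threshold, the protein names, and the toy profiles are all hypothetical choices made for illustration; the abstract does not state which similarity measure or cutoff the authors used.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two presence/absence profiles (tuples of 0/1)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def co_conservation_edges(profiles, threshold=0.75):
    """Link every protein pair whose profiles are at least `threshold` similar."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if jaccard(profiles[p], profiles[q]) >= threshold
    ]


def degrees(profiles, edges):
    """Count how many edges touch each protein (node connectivity)."""
    deg = {p: 0 for p in profiles}
    for p, q in edges:
        deg[p] += 1
        deg[q] += 1
    return deg


# Hypothetical toy data: each column is one of six reference genomes,
# 1 = a homolog is present, 0 = absent.
profiles = {
    "protA": (1, 1, 0, 1, 0, 1),
    "protB": (1, 1, 0, 1, 0, 1),
    "protC": (1, 1, 0, 1, 1, 1),
    "protD": (0, 0, 1, 0, 1, 0),
}

edges = co_conservation_edges(profiles)
deg = degrees(profiles, edges)
print(edges)
print(deg)
```

Here protA, protB, and protC are gained and lost together and end up connected, while protD stays isolated. Node degrees computed this way are the kind of connectivity measure the abstract relates to functional category.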

    Improving protein function prediction methods with integrated literature data

    Background: Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence, and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity. Results: We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges, since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes, which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10%, which yield the best trade-off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder, but co-occurrence data still proves beneficial. Conclusion: Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions in the interaction network that graph-theoretic function prediction algorithms can exploit.

    Predicting protein linkages in bacteria: Which method is best depends on task

    Background: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools, and this paper provides guidelines for when each method is appropriate by exploring different features of each method and potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations. Results: Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories, while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability versus using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster completely predicted 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene cluster and Gene neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction. Conclusion: A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated dramatic differences in the reported results. We used several benchmarks on which we have shown the comparative effectiveness of each prediction method, and we provided guidelines as to which method is most appropriate for a given prediction task.
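
The contrast this abstract draws between an unweighted union, a reliability-weighted combination, and an intersection of linkage sets can be sketched as follows. The reliabilities, the predicted pairs, and the noisy-OR combination rule are hypothetical and chosen only for illustration; the abstract does not say how the authors weighted the methods.

```python
from math import prod

# Hypothetical per-method reliabilities (e.g. estimated against a benchmark).
reliability = {
    "gene_cluster": 0.9,
    "gene_neighbor": 0.7,
    "rosetta_stone": 0.5,
    "phylo_profile": 0.6,
}

# Hypothetical linkage predictions: each method proposes a set of protein pairs.
predictions = {
    "gene_cluster": {("a", "b"), ("b", "c")},
    "gene_neighbor": {("a", "b"), ("c", "d")},
    "rosetta_stone": {("c", "d")},
    "phylo_profile": {("a", "d")},
}


def unweighted_union(predictions):
    """Every pair proposed by at least one method, all treated equally."""
    return set().union(*predictions.values())


def weighted_scores(predictions, reliability):
    """Score each pair with a noisy-OR over the methods that support it."""
    scores = {}
    for pair in unweighted_union(predictions):
        supporting = [
            reliability[method]
            for method, pairs in predictions.items()
            if pair in pairs
        ]
        scores[pair] = 1 - prod(1 - r for r in supporting)
    return scores


union = unweighted_union(predictions)
scores = weighted_scores(predictions, reliability)
# Pairs supported by both Gene cluster and Gene neighbor (higher precision, lower coverage).
both = predictions["gene_cluster"] & predictions["gene_neighbor"]

for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

The union keeps all four pairs without ranking them, whereas the weighted scores place the pair supported by two reliable methods first. The intersection keeps only that pair, trading coverage for precision.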

    Sputum is a surrogate for bronchoalveolar lavage for monitoring Mycobacterium tuberculosis transcriptional profiles in TB patients

    Summary: Pathogen-targeted transcriptional profiling in human sputum may elucidate the physiologic state of Mycobacterium tuberculosis (M. tuberculosis) during infection and treatment. However, whether M. tuberculosis transcription in sputum recapitulates transcription in the lung is uncertain. We therefore compared M. tuberculosis transcription in human sputum and bronchoalveolar lavage (BAL) samples from 11 HIV-negative South African patients with pulmonary tuberculosis. We additionally compared these clinical samples with in vitro log phase aerobic growth and hypoxic non-replicating persistence (NRP-2). Of 2179 M. tuberculosis transcripts assayed in sputum and BAL via multiplex RT-PCR, 194 (8.9%) had a p-value <0.05, but none were significant after correction for multiple testing. Categorical enrichment analysis indicated that expression of the hypoxia-responsive DosR regulon was higher in BAL than in sputum. M. tuberculosis transcription in BAL and sputum was distinct from both aerobic growth and NRP-2, with a range of 396–1020 transcripts significantly differentially expressed after multiple testing correction. Collectively, our results indicate that M. tuberculosis transcription in sputum approximates M. tuberculosis transcription in the lung. Minor differences between M. tuberculosis transcription in BAL and sputum suggested lower oxygen concentrations or higher nitric oxide concentrations in BAL. M. tuberculosis-targeted transcriptional profiling of sputa may be a powerful tool for understanding M. tuberculosis pathogenesis and monitoring treatment responses in vivo.
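
The statement that 194 of 2179 transcripts had p < 0.05 yet none survived multiple-testing correction can be made concrete with a short sketch. With 2179 tests at a 0.05 threshold, roughly 109 nominal hits would be expected by chance alone if no transcript truly differed. The abstract does not name the correction method, so the Benjamini-Hochberg procedure below is an assumption, and the ten p-values are hypothetical toy data.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Figures taken from the abstract.
n_tests = 2179
alpha = 0.05
expected_by_chance = n_tests * alpha  # about 109 nominal hits under the null

# Hypothetical toy p-values, not data from the study.
pvals = [0.01, 0.02, 0.03, 0.04, 0.045, 0.2, 0.3, 0.5, 0.7, 0.9]
adjusted = benjamini_hochberg(pvals)

nominal_hits = sum(p < alpha for p in pvals)
corrected_hits = sum(q < alpha for q in adjusted)
print(expected_by_chance, nominal_hits, corrected_hits)
```

In the toy data, five of ten tests pass the nominal threshold but none remain below 0.05 after adjustment, which mirrors the pattern reported for the sputum versus BAL comparison.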

    Use of intervention mapping to adapt a health behavior change intervention for endometrial cancer survivors: The shape-up following cancer treatment program

    Background: About 80% of endometrial cancer survivors (ECS) are overweight or obese and have sedentary behaviors. Lifestyle behavior interventions are promising for improving dietary and physical activity behaviors, but the constructs associated with their effectiveness are often inadequately reported. The aim of this study was to systematically adapt an evidence-based behavior change program to improve healthy lifestyle behaviors in ECS. Methods: Following a review of the literature, focus groups and interviews were conducted with ECS (n = 16). An intervention mapping protocol was used for the program adaptation, which consisted of six steps: a needs assessment, formulation of matrices of change objectives, selection of theoretical methods and practical applications, program production, adoption and implementation planning, and evaluation planning. Social Cognitive Theory and Control Theory guided the adaptation of the intervention. Results: The process consisted of eight 90-min group sessions focusing on shaping outcome expectations, knowledge, self-efficacy, and goals about healthy eating and physical activity. The adapted performance objectives included establishment of regular eating, balanced diet, and portion sizes, reduction in sedentary behaviors, increase in lifestyle and organized activities, formulation of a discrepancy-reducing feedback loop for all above behaviors, and trigger management. Information on managing fatigue and bowel issues unique to ECS was added. Conclusions: Systematic intervention mapping provided a framework to design a cancer survivor-centered lifestyle intervention. ECS welcomed the intervention and provided essential feedback for its adaptation. The program has been evaluated through a randomized controlled trial.

    Biomedical Discovery Acceleration, with Applications to Craniofacial Development

    The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, are both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.

    The Peripheral Blood Transcriptome Identifies the Presence and Extent of Disease in Idiopathic Pulmonary Fibrosis

    Rationale: Peripheral blood biomarkers are needed to identify and determine the extent of idiopathic pulmonary fibrosis (IPF). Current physiologic and radiographic prognostic indicators diagnose IPF too late in the course of disease. We hypothesize that peripheral blood biomarkers will identify disease in its early stages, and facilitate monitoring for disease progression. Methods: Gene expression profiles of peripheral blood RNA from 130 IPF patients were collected on Agilent microarrays. Significance analysis of microarrays (SAM) with a false discovery rate (FDR) of 1% was utilized to identify genes that were differentially expressed in samples categorized based on percent predicted DLCO and FVC. Main Measurements and Results: At 1% FDR, 1428 genes were differentially expressed in mild IPF (DLCO >65%) compared to controls and 2790 transcripts were differentially expressed in severe IPF (DLCO >35%) compared to controls. When categorized by percent predicted DLCO, SAM demonstrated 13 differentially expressed transcripts between mild and severe IPF (< 5% FDR). These include CAMP, CEACAM6, CTSG, DEFA3 and A4, OLFM4, HLTF, PACSIN1, GABBR1, IGHM, and 3 unknown genes. Principal component analysis (PCA) was performed to determine outliers based on severity of disease, and demonstrated 1 mild case to be clinically misclassified as a severe case of IPF. No differentially expressed transcripts were identified between mild and severe IPF when categorized by percent predicted FVC. Conclusions: These results demonstrate that the peripheral blood transcriptome has the potential to distinguish normal individuals from patients with IPF, as well as extent of disease when samples were classified by percent predicted DLCO, but not FVC.

    Loci influencing blood pressure identified using a cardiovascular gene-centric array

    Blood pressure (BP) is a heritable determinant of risk for cardiovascular disease (CVD). To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP) and pulse pressure (PP), we genotyped 50 000 single-nucleotide polymorphisms (SNPs) that capture variation in 2100 candidate genes for cardiovascular phenotypes in 61 619 individuals of European ancestry from cohort studies in the USA and Europe. We identified novel associations between rs347591 and SBP (chromosome 3p25.3, in an intron of HRH1) and between rs2169137 and DBP (chromosome 1q32.1, in an intron of MDM4), and between rs2014408 and SBP (chromosome 11p15, in an intron of SOX6), previously reported to be associated with MAP. We also confirmed 10 previously known loci associated with SBP, DBP, MAP or PP (ADRB1, ATP2B1, SH2B3/ATXN2, CSK, CYP17A1, FURIN, HFE, LSP1, MTHFR, SOX6) at array-wide significance (P < 2.4 × 10⁻⁶). We then replicated these associations in an independent set of 65 886 individuals of European ancestry. The findings from expression QTL (eQTL) analysis showed associations of SNPs in the MDM4 region with MDM4 expression. We did not find any evidence of association of the two novel SNPs in MDM4 and HRH1 with sequelae of high BP including coronary artery disease (CAD), left ventricular hypertrophy (LVH) or stroke. In summary, we identified two novel loci associated with BP and confirmed multiple previously reported associations. Our findings extend our understanding of genes involved in BP regulation, some of which may eventually provide new targets for therapeutic intervention.