    ESG: Extended Similarity Group method for automated protein function prediction

    We present the Extended Similarity Group (ESG) method, which annotates query sequences with Gene Ontology (GO) terms and assigns each annotation a probability computed from iterative PSI-BLAST searches. Conventional sequence-homology-based function annotation methods, such as BLAST, retrieve function information only from top hits with significant scores (E-values). In contrast, the PFP method, which we presented previously, goes one step further in using a PSI-BLAST result: it considers very weak hits, even those with E-values as high as 100, and incorporates the functional association between GO terms (the FAM matrix), computed from term co-occurrence frequencies in the UniProt database. PFP's success is evidenced by its top rank in the function prediction category of the CASP7 competition. Our new approach, the ESG method, further improves the accuracy of PFP by essentially employing PFP in an iterative fashion. An advantage of ESG is that it is built on a rigorous statistical framework: unlike PFP, which assigns a weighted score to each GO term, ESG assigns a probability based on weights computed from the E-value of each hit sequence on the path between the original query sequence and the current hit sequence.
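    The idea of turning hit E-values into annotation probabilities can be illustrated with a minimal single-level sketch. It is not the published ESG formulation: the weight `-log10(E) + log10(100)`, the normalisation of weights across hits, the function name `go_term_probabilities`, and the example hits are all assumptions chosen for illustration, and the iterative step along the search path is left out.

```python
import math
from collections import defaultdict


def go_term_probabilities(hits, max_evalue=100.0):
    """Assign each GO term a value in [0, 1] from E-value-weighted hits.

    hits: list of (evalue, [go_terms]) pairs for one PSI-BLAST search.
    Assumed weighting (illustrative only): w = -log10(E) + log10(max_evalue),
    so a hit at the cutoff gets weight 0 and stronger hits get more weight.
    """
    offset = math.log10(max_evalue)
    weighted = []
    for evalue, terms in hits:
        if evalue > max_evalue:
            continue  # beyond the permissive cutoff, ignore the hit
        evalue = max(evalue, 1e-180)  # guard against reported E-values of 0
        weighted.append((-math.log10(evalue) + offset, terms))

    total = sum(weight for weight, _ in weighted)
    if total == 0:
        return {}

    # A term's probability is the summed normalised weight of hits carrying it.
    probabilities = defaultdict(float)
    for weight, terms in weighted:
        for term in set(terms):
            probabilities[term] += weight / total
    return dict(probabilities)


# Hypothetical hits: (E-value, GO terms annotated on the hit sequence).
example_hits = [
    (1e-10, ["GO:0003677"]),
    (1.0, ["GO:0003677", "GO:0005634"]),
    (100.0, ["GO:0016020"]),
]
example_probabilities = go_term_probabilities(example_hits)
```

    Because the weights are normalised over hits rather than over terms, each term's value lies between 0 and 1, but the values across terms need not sum to 1.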

    The Greenhouse Stakes of Globalization

    Functional enrichment analyses and construction of functional similarity networks with high confidence function prediction by PFP

    Background: A new paradigm of biological investigation takes advantage of technologies that produce large high-throughput datasets, including genome sequences, protein interactions, and gene expression. The ability of biologists to analyze and interpret such data relies on functional annotation of the included proteins, but even in highly characterized organisms many proteins lack the functional evidence necessary to infer their biological relevance. Results: Here we have applied high-confidence function predictions from our automated prediction system, PFP, to three genome sequences: Escherichia coli, Saccharomyces cerevisiae, and Plasmodium falciparum (malaria). PFP increases the number of annotated genes to over 90% for all of the genomes. Using this large coverage of function annotation, we introduce functional similarity networks, which represent the functional space of the proteomes. Four different functional similarity networks are constructed for each proteome: one for each single Gene Ontology (GO) category, i.e. Biological Process, Cellular Component, and Molecular Function, and another that considers overall similarity with the funSim score. The functional similarity networks are shown to have higher modularity than the protein-protein interaction network. Moreover, the funSim score network is distinct from the single GO-score networks in showing a higher clustering degree exponent value, and thus has a higher tendency to be hierarchical. In addition, examining function assignments to the protein-protein interaction network and to local regions of genomes identified numerous cases where subnetworks or local regions contain functionally coherent proteins. These results will help in interpreting protein interactions and gene orders in a genome. Several examples of both analyses are highlighted. Conclusion: The analyses demonstrate that applying high-confidence predictions from PFP can have a significant impact on researchers' ability to interpret the immense biological data being generated today. The newly introduced functional similarity networks of the three organisms show different network properties from the protein-protein interaction networks.
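    As a rough illustration of how a functional similarity network might be built and probed, the sketch below links proteins whose pairwise similarity score meets a threshold and computes a local clustering coefficient. The scores, the threshold, and the helper names `build_network` and `clustering_coefficient` are hypothetical; the abstract does not specify how its networks were thresholded or analysed.

```python
from itertools import combinations


def build_network(similarity, threshold):
    """Build an undirected adjacency map from pairwise similarity scores.

    similarity: dict mapping (protein_a, protein_b) to a score.
    An edge is added when the score is at least `threshold`.
    """
    adjacency = {}
    for (a, b), score in similarity.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if a != b and score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical similarity scores between four proteins.
example_scores = {
    ("A", "B"): 0.90,
    ("A", "C"): 0.80,
    ("B", "C"): 0.85,
    ("C", "D"): 0.70,
    ("A", "D"): 0.20,
}
example_network = build_network(example_scores, threshold=0.6)
```

    Plotting the clustering coefficient against node degree across a whole network is one common way to look for the hierarchical tendency the abstract mentions.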

    Bioinformatics resources for cancer research with an emphasis on gene function and structure prediction tools

    The immensely popular fields of cancer research and bioinformatics overlap in many areas, e.g. large data repositories that allow users to analyze data from many experiments (data handling, databases), pattern mining, microarray data analysis, and interpretation of proteomics data. Many newly available resources in these areas may be unfamiliar to cancer researchers who want to incorporate bioinformatics tools and analyses into their work, and to bioinformaticians looking for real data with which to develop and test algorithms. This review reveals the interdependence of cancer research and bioinformatics and highlights the most appropriate and useful resources available to cancer researchers. These include not only public databases but also general and specific bioinformatics tools that can be useful to the cancer researcher. The primary foci are function and structure prediction tools of protein genes. The result is a useful reference for cancer researchers and for bioinformaticians studying cancer.

    Development and Evaluation of Quality Metrics for Bioinformatics Analysis of Viral Insertion Site Data Generated Using High Throughput Sequencing

    Integration of viral vectors into a host genome is associated with insertional mutagenesis, and subjects in clinical gene therapy trials must be monitored for this adverse event. Several PCR-based methods, such as ligase-mediated (LM) PCR, linear-amplification-mediated (LAM) PCR, and non-restrictive (nr) LAM PCR, were developed to identify sites of vector integration. Coupling the power of next-generation sequencing technologies with various PCR approaches will provide comprehensive, genome-wide profiling of insertion sites and increase throughput. In this bioinformatics study, we aimed to develop and apply quality metrics to viral insertion data obtained using next-generation sequencing. We developed five simple metrics for assessing next-generation sequencing data from different PCR products and showed how the metrics can be used to objectively compare runs performed with the same methodology as well as data generated using different PCR techniques. The results will help researchers troubleshoot complex methodologies, understand the quality of sequencing data, and provide a starting point for standardizing vector insertion site data analysis.

    The Assessment of Fecal Volatile Organic Compounds in Healthy Infants: Electronic Nose Device Predicts Patient Demographics and Microbial Enterotype

    Background: The assessment of fecal volatile organic compounds (VOCs) has emerged as a noninvasive biomarker in many different pathologies. Before assessing whether VOCs can be used to diagnose intestinal diseases, including necrotizing enterocolitis (NEC), it is necessary to measure the impact of variable infant demographic factors on VOC signals. Materials and methods: Stool samples were collected from term infants at four hospitals in a large metropolitan area. Samples were heated, and fecal VOCs were assessed with the Cyranose 320 Electronic Nose. Twenty-eight sensors were combined into an overall smellprint and were also assessed individually. 16S rRNA gene sequencing was used to categorize infant microbiomes. Smellprints were correlated with feeding type (formula versus breastmilk), sex, hospital of birth, and microbial enterotype. Overall smellprints were assessed by PERMANOVA with Euclidean distances, and individual sensors from each smellprint were assessed by Mann-Whitney U-tests. P < 0.05 was considered significant. Results: Overall smellprints were significantly different according to diet. Individual sensors were significantly different according to sex and hospital of birth, but overall smellprints were not significantly different. Using a decision tree model, two individual sensors could reliably predict microbial enterotype. Conclusions: Assessment of fecal VOCs with an electronic nose is affected by several demographic characteristics of infants and can be used to predict microbiome composition. Further studies are needed to design appropriate algorithms that can predict NEC based on fecal VOC profiles.
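    The per-sensor comparison described above can be sketched with a small Mann-Whitney U implementation. This is an illustrative version only: it uses a normal approximation without tie or continuity correction (an assumption that is poor for very small samples, where an exact test is preferable), and the function name and sensor readings are hypothetical, not the study's data or code.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, two-sided p) for samples x and y.

    U counts pairs where a value from x exceeds a value from y (ties count 0.5).
    The p-value uses a plain normal approximation, with no tie or
    continuity correction.
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value


# Hypothetical readings from one sensor for two feeding groups.
formula_fed = [0.12, 0.15, 0.11, 0.18, 0.14, 0.16]
breast_fed = [0.21, 0.19, 0.25, 0.22, 0.20, 0.24]
u_statistic, p_value = mann_whitney_u(formula_fed, breast_fed)
is_significant = p_value < 0.05  # threshold stated in the abstract
```

    In practice a library routine such as `scipy.stats.mannwhitneyu` would usually be preferred, since it handles ties and small samples more carefully.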

    Mutation in erythroid specific transcription factor KLF1 causes Hereditary Spherocytosis in the Nan hemolytic anemia mouse model

    KLF1 regulates definitive erythropoiesis of red blood cells by facilitating transcription through high-affinity binding to CACCC elements within its erythroid-specific target genes, including those encoding erythrocyte membrane skeleton (EMS) proteins. Deficiencies of EMS proteins in humans lead to the hemolytic anemia Hereditary Spherocytosis (HS), which includes a subpopulation with no known genetic defect. Here we report that a mutation, E339D, in the second zinc finger domain of KLF1 is responsible for HS in the mouse model Nan. The causative nature of this mutation was verified with an allelic test cross between Nan/+ and heterozygous Klf1(+/-) knockout mice. Homology modeling predicted that Nan KLF1 binds CACCC elements more tightly, suggesting that Nan KLF1 is a competitive inhibitor of wild-type KLF1. This is the first association of a KLF1 mutation with a disease state in adult mammals, and it raises the possibility that KLF1 is another causative gene for HS in humans.

    The genetic consequences of dog breed formation: Accumulation of deleterious genetic variation and fixation of mutations associated with myxomatous mitral valve disease in cavalier King Charles spaniels

    Selective breeding for desirable traits in strictly controlled populations has generated an extraordinary diversity in canine morphology and behaviour, but has also led to loss of genetic variation and random entrapment of disease alleles. As a consequence, specific diseases are now prevalent in certain breeds, but whether recent breeding practice has led to an overall increase in genetic load remains unclear. Here we generate whole genome sequencing (WGS) data from 20 dogs per breed from eight breeds and document a ~10% rise in the number of derived alleles per genome at evolutionarily conserved sites in the heavily bottlenecked cavalier King Charles spaniel breed (cKCs) relative to most breeds studied here. Our finding represents the first clear indication of a relative increase in levels of deleterious genetic variation in a specific breed, arguing that recent breeding practices were probably associated with an accumulation of genetic load in dogs. We then use the WGS data to identify candidate risk alleles for the most common cause for veterinary care in cKCs: the heart disease myxomatous mitral valve disease (MMVD). We verify a potential link to MMVD for candidate variants near the heart-specific NEBL gene in a dachshund population and show that two of the NEBL candidate variants have regulatory potential in heart-derived cell lines and are associated with reduced NEBL isoform nebulette expression in papillary muscle (but not in mitral valve, nor in left ventricular wall). Alleles linked to reduced nebulette expression may hence predispose cKCs and other breeds to MMVD via loss of papillary muscle integrity.