38 research outputs found

    The what, where, how and why of gene ontology—a primer for bioinformaticians

    With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications.

    REVIGO Summarizes and Visualizes Long Lists of Gene Ontology Terms

    Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret.

    Toward community standards in the quest for orthologs

    The identification of orthologs (gene pairs descended from a common ancestor through speciation rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet the development and application of orthology inference methods are hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.

    Translational Selection Is Ubiquitous in Prokaryotes

    Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome—between 5% and 33%, depending on genome size—while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of codon-optimized genes, the trends of enrichment/depletion being conserved between Archaea and Bacteria. Prominent exceptions from these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxication of oxygen radicals and ammonia and to possible misannotations of asparaginyl–tRNA synthetases. 
    Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an “adaptome” by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea.

    Analysis of PKS genes in the Dictyostelium discoideum genome

    The social amoeba (cellular slime mould) Dictyostelium discoideum is an important model organism for studies of development and evolution. The recently completed genome sequence (6 chromosomes, 34 Mb) revealed around 12,500 genes. The organism is exceptionally rich in polyketide synthases (PKS), encoded by more than 40 recognizable genes spread across all six chromosomes. To date, no other organism is known to have as many PKS genes. Therefore, in this diploma thesis, an analysis of potential PKS genes using bioinformatics methods was started. Of the 45 PKS genes identified, 29 occur in 12 potential gene clusters (containing 2-5 genes each), which might correspond to the biosynthetic pathways of different natural products. The remaining sixteen genes occur individually. All genes show a domain structure similar to that of Type I iterative PKS genes, except that they contain two additional conserved amino acid stretches (450-1500 and 300-600 amino acids) of unknown function that show no homology to any known protein. No loading domains could be identified, which suggests a possible novel initiation mechanism, and no Type II genes were found. Detailed analysis of the two conserved amino acid stretches of unknown function showed the presence of methyltransferase and dehydratase domains. The annotation of all genes showed that they encode the β-ketoacylsynthase, acyltransferase, dehydratase, methyltransferase, enoylreductase, ketoreductase and acyl carrier protein domains expected in all PKSs. In addition, an analysis of the acyltransferase (AT) domains and a phylogenetic analysis of PKS gene evolution in the D. discoideum genome were carried out.

    Predictive accuracy of phylogenetic profiling when we control for the influence of the Open World Assumption.

    Two sets of experiments are denoted with colours: experiments when we include only the well-annotated proteins (purple) and experiments where we randomly remove 60% of the available annotations (red). Dashed and full lines connect the dots of the mean AUPRC scores for two sets of experiments: random sub-selection of genomes (full lines) and sub-selection to keep maximum diversity among the selected genomes (dashed lines). Each dot represents the mean AUPRC for the GO terms we use in annotating. The final point denotes the mean AUPRC score when we include all the available bacteria in the used OMA database release (1078 bacteria).