95 research outputs found

    The what, where, how and why of gene ontology—a primer for bioinformaticians

    With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications

    Ontologies in medicinal chemistry: current status and future challenges

    Recent years have seen a dramatic increase in the amount and availability of data in the diverse areas of medicinal chemistry, making it possible to achieve significant advances in fields such as the design, synthesis and biological evaluation of compounds. However, with this data explosion, the storage, management and analysis of available data to extract relevant information has become an even more complex task that poses challenging research problems for Artificial Intelligence (AI) scientists. Ontologies have emerged in AI as a key tool to formally represent and semantically organize aspects of the real world. Beyond glossaries or thesauri, ontologies facilitate communication between experts and allow the application of computational techniques to extract useful information from available data. In medicinal chemistry, multiple ontologies have been developed in recent years that contain knowledge about chemical compounds and the synthesis processes of pharmaceutical products. This article reviews the principal standards and ontologies in medicinal chemistry, analyzes their main applications and suggests future directions.
    Funding: Instituto de Salud Carlos III (FIS-PI10/02180); Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT0366); Galicia. Consellería de Cultura, Educación e Ordenación Universitaria (CN2012/217, CN2011/034, CN2012/21

    Extralinguistic arguments in 21st century language planning discourse: a “superdictionary” between language standardization from above and below

    Language standardization has historically been a critical area of inquiry in language policy and planning (LPP) research. Standardization is a political matter, which contributes to “more (and hierarchical) heterogeneity” rather than linguistic homogeneity (Gal 2006: 171). The paper empirically explores the extralinguistic arguments used by language professionals (planners, academics, educators) in mainstream media discourse. This public discourse was initiated by the launch of an Estonian “superdictionary” in 2019 (see Tavast et al. 2020) and its public reception. Using Critical Discourse Analysis (CDA) as a method, the paper also provides insight into the discursive construction of language as such and of (Standard) Estonian by different LPP actors. Above all, it aims to understand the issues of power and authority in language standardization. The discourse illustrates the paradigmatic change in standardization and lexicography: from including selected language samples to the acceptance of non-elite language variants and varieties. This change has polarized the stances of language professionals, yet both sides use similar discursive moves, e.g., references to the past and to future dangers, metaphors and other comparisons.
    Summary. Kadri Koreinik: Extralinguistic arguments in 21st century language planning discourse: a “superdictionary” between top-down and bottom-up language planning. Language standardization is regarded as one of the central objects of interest in language policy and planning research. Standardization, or norm-setting, is a political choice which, instead of creating linguistic homogeneity, produces heterogeneity that is hierarchical in nature: one either commands the standard language or does not (Gal 2006). The article examines the extralinguistic arguments that people professionally involved with language (language planners, linguists, educators) use in mainstream media to discuss standardization. The discourse under study stems from the so-called dictionary reform and its reception. Critical discourse analysis is used as the method of analysis, which makes it possible to examine in depth how meanings related to language, including Standard Estonian, are constructed in public. The analysis also helps to understand issues of power and domination in language standardization. The discourse reflects a paradigmatic change that values a usage- and corpus-based approach in lexicography as well as in language standardization. Although the change has split “language people” into two camps, the same devices are used in arguments for and against: references to authorities, to the endangerment of the language, metaphors and other comparisons.

    A FAIR approach to genomics

    The aim of this thesis was to increase our understanding of how genome information leads to function and phenotype. To address this question, I developed a semantic systems biology framework capable of extracting knowledge, biological concepts and emergent system properties from a vast array of publicly available genome information. In chapter 2, Empusa is described as an infrastructure that bridges the gap between the intended and actual content of a database. This infrastructure was used in chapters 3 and 4 to develop the framework. Chapter 3 describes the development of the Genome Biology Ontology Language (GBOL) and the GBOL stack of supporting tools, which enforce consistency within and between the GBOL definitions in the ontology (OWL) and the Shape Expressions (ShEx) language describing the graph structure. A practical implementation of a semantic systems biology framework for FAIR (de novo) genome annotation is provided in chapter 4. The semantic framework and genome annotation tool described in this chapter have been used throughout this thesis to consistently annotate, both structurally and functionally, and mine the microbial genomes used in chapters 5-10. In chapter 5, we introduced how the concept of protein domains and corresponding architectures can be used in comparative functional genomics to provide a fast, efficient and scalable alternative to sequence-based methods. This allowed us to effectively compare and identify functional variations between hundreds to thousands of genomes. In chapter 6, we used 432 available complete Pseudomonas genomes to study the relationship between domain essentiality and persistence. In this chapter the focus was mainly on domains involved in metabolic functions. The metabolic domain space was explored for domain essentiality and persistence through the integration of heterogeneous data sources, including six published metabolic models, a vast gene expression repository and transposon data.
In chapter 7, the correlation between the expected and observed genotypes was explored using 16S-rRNA phylogeny and protein domain class content as input. In this chapter it was shown that domain class content yields a higher resolution than 16S-rRNA when analysing evolutionary distances. Using protein domain classes, we were also able to identify signifying domains, which may have important roles in shaping a species. To demonstrate the use of semantic systems biology workflows in a biotechnological setting, we expanded the resource with more than 80,000 bacterial genomes. The genomic information of this resource was mined using a top-down approach to identify strains having the trait for 1,3-propanediol production. This resulted in the molecular identification of 49 new species. In addition, we experimentally verified that 4 species were capable of producing 1,3-propanediol. As discussed in chapter 10, the semantic systems biology workflows developed here were successfully applied in the discovery of key elements in symbiotic relationships, to improve functional genome annotation and in comparative genomics studies. Wet/dry-lab collaboration was often at the basis of the obtained results. The success of the collaboration between the wet and dry fields prompted me to develop an undergraduate course in which the concept of the “Moist” workflow was introduced (Chapter 9).

    Handling word formation in comparative linguistics

    Word formation plays a central role in human language. Yet computational approaches to historical linguistics often pay little attention to it. This means that the detailed findings of classical historical linguistics are often only used in qualitative studies, yet not in quantitative studies. Based on human- and machine-readable formats suggested by the CLDF-initiative, we propose a framework for the annotation of cross-linguistic etymological relations that allows for the differentiation between etymologies that involve only regular sound change and those that involve linear and non-linear processes of word formation. This paper introduces this approach by means of sample datasets and a small Python library to facilitate annotation

    The Gene Ontology Handbook

    bioinformatics; biotechnolog

    Emerging model species driven by transcriptomics

    This work is focused on 'emerging model species', i.e. question-driven model species which have sufficient molecular resources to investigate a specific phenomenon in molecular biology, developmental biology, molecular ecology and evolution or related molecular fields. This thesis shows how transcriptomic data can be generated, analyzed, and used to investigate such phenomena of interest even in species lacking a reference genome. The initial ButterflyBase resource has proven to be useful to researchers of species without a reference genome but is limited to the Lepidoptera and supports only the older Sanger sequencing technologies. Thanks to Next Generation Sequencing, transcriptome sequencing is more cost-effective, but the bottleneck of transcriptomic projects is now the bioinformatic analysis and data mining/dissemination. Therefore, this work continues by presenting novel approaches which effectively overcome this bottleneck. The est2assembly software produces deeply annotated reference transcriptomes stored in the Chado database. The Drupal Bioinformatic Server Framework and genes4all provide a species-neutral and innovative approach to building standardized online databases and associated web services. All public insect mRNA data were analyzed with est2assembly and genes4all to produce InsectaCentral. With InsectaCentral, a powerful resource is now available to assist molecular biology in any question-driven model insect species. The software presented here was developed according to specifications of the General Model Organism Database (GMOD) community. All software specifications are species-neutral and can be seamlessly deployed to assist any research community. Furthermore, a case studies chapter makes it apparent that the transcriptomic approach is more cost-effective than a genomic approach and that sequence-driven evolutionary biology will therefore benefit more quickly from it