282 research outputs found

    Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

    Background. Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results. We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only one third to one quarter the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions. We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist
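
    The F-score comparison in the conclusions can be checked from the reported precision and recall values. The abstract does not say which F-score variant was used, so the sketch below assumes the balanced F1 (harmonic mean of precision and recall); the function name `f_score` and the `beta` parameter are illustrative, not taken from the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the balanced F1 (an assumption here,
    since the abstract does not state which variant was reported)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision/recall after filtering and disambiguation, as quoted in the abstract.
chemspider_f1 = f_score(0.87, 0.19)
chemlist_f1 = f_score(0.67, 0.40)

print(f"ChemSpider F1: {chemspider_f1:.2f}")
print(f"Chemlist   F1: {chemlist_f1:.2f}")
```

    Under this assumption Chemlist comes out ahead on F1 despite its lower precision, which is consistent with conclusion (1): the much lower recall of the ChemSpider dictionary dominates the harmonic mean.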

    Rewriting and suppressing UMLS terms for improved biomedical term identification

    Background: Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact on the number of terms identified by the different rules on a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms together with a sample of 100 randomly selected terms were evaluated for every rule. Results: Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms and seven out of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we were able to identify 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without the rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, which is an increase of 2.8% in the number of terms and an increase of 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size. 7,397 terms were suppressed in the corpus. Conclusions: We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is to be used for biomedical term identification in MEDLINE. A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.

    Comparing regional organizations in global multilateral institutions: ASEAN, the EU and the UN

    Structural change brought about by the end of the Cold War and accelerated globalisation have transformed the global environment. A global governance complex is emerging, characterised by an ever-greater functional and regulatory role for multilateral organisations such as the United Nations (UN) and its associated agencies. The evolving global governance framework has created opportunities for regional organisations to participate as actors within the UN (and other multilateral institutions). This article compares the European Union (EU) and Association of Southeast Asian Nations (ASEAN) as actors within the UN network. It begins by extrapolating framework conditions for the emergence of EU and ASEAN actorness from the literature. The core argument of this article is that EU and ASEAN actorness is evolving in two succinct stages: Changes in the global environment create opportunities for the participation of regional organisations in global governance institutions, exposing representation and cohesion problems at the regional level. In response, ASEAN and the EU have initiated processes of institutional adaptation

    Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods: We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results: Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a similar toxicity pattern as the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions: Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect

    Regional actorness and interregional relations: ASEAN, the EU and Mercosur

    The European Union (EU) has a long tradition of interregional dialogue mechanisms with other regional organisations and is using these relations to project its own model of institutionalised actorness. This is partly motivated by the emerging actorness of the EU itself, which benefits from fostering capable regional counterparts in other parts of the world. This article advances the argument that actorness, which we conceptualise in terms of institutions, recognition and identity, is a relational concept, dependent on context and perception. Taking the Association of Southeast Asian Nations (ASEAN) and the Common Market of the South (Mercosur) and their relations with the EU as case studies, this article demonstrates that the actorness capabilities of all three organisations have been enhanced as a result of ASEAN-EU and Mercosur-EU relations. However, there are clear limits to the development of the three components of regional actorness and to the interregional relations themselves. These limits stem both from the type of interregionalism at play and from the different regional models the actors incorporate. While there is evidence of institutional enhancement in ASEAN and Mercosur, these formal changes have been grafted on top of firmly entrenched normative underpinnings. Within the regional organisations, interactions with the EU generate centrifugal forces concerning the model to pursue, thus limiting their institutional cohesion and capacity. In addition, group-to-group relations have reinforced ASEAN and Mercosur identities in contrast to the EU. The formation of such differences has narrowed the scope of EU interregionalism despite the initial success of improved regional actorness

    Literature-aided interpretation of gene expression data with the weighted global test

    Most methods for the interpretation of gene expression profiling experiments rely on the categorization of genes, as provided by the Gene Ontology (GO) and pathway databases. Due to the manual curation process, such databases are never up-to-date and tend to be limited in focus and coverage. Automated literature mining tools provide an attractive, alternative approach. We review how they can be employed for the interpretation of gene expression profiling experiments. We illustrate that their comprehensive scope aids the interpretation of data from domains poorly covered by GO or alternative databases, and allows for the linking of gene expression with diseases, drugs, tissues and other types of concepts. A framework for proper statistical evaluation of the associations between gene expression values and literature concepts was lacking and is now implemented in a weighted extension of global test. The weights are the literature association scores and reflect the importance of a gene for the concept of interest. In a direct comparison with classical GO-based gene sets, we show that use of literature-based associations results in the identification of much more specific GO categories. We demonstrate the possibilities for linking of gene expression data to patient survival in breast cancer and the action and metabolism of drugs. Coupling with online literature mining tools ensures transparency and allows further study of the identified associations. Literature mining tools are therefore powerful additions to the toolbox for the interpretation of high-throughput genomics data.

    Shaping the global communications milieu: the EU's influence on internet and telecommunications governance

    This article evaluates the European Union's (EU) influence in shaping the global governance for telecommunications and the Internet. Through analysing EU behaviour within an actorness framework, we demonstrate how the external opportunity structure and the EU's internal environment have impacted on its ability to exert and maximize its presence in order to meet its goals and aims in these two sub-sectors of global communications, which differ markedly in their evolution and development. Such a comparison of EU actorness, we argue, is revealing in terms of uncovering the underlying factors and conditions that allow the EU to influence two important and dynamic communications sub-sectors

    Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser in the chemistry domain, OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to a slightly better performance than others, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR

    Venezuela and ALBA: counter-hegemonic regionalism and higher education for all (original title: Venezuela e ALBA: regionalismo contra-hegemônico e ensino superior para todos)

    Starting from a neo-Gramscian theoretical framework critical of globalisation, this paper employs new regionalism theory and regulatory regionalism theory in its analysis and theorisation of the Bolivarian Alliance for the Peoples of Our America (ALBA) as a counter-hegemonic Latin American and Caribbean (LAC) regionalism. As (initially) the regionalisation of Venezuela's Bolivarian Revolution, ALBA is centred around the idea of a 21st Century Socialism that replaces the 'competitive advantage' with the 'cooperative advantage'. ALBA, as a set of multi-dimensional inter- and transnational processes, operates within and across a range of sectors and scales whilst the structural transformations are driven by the interplay of state and non-state actors. The Venezuelan government's Higher Education For All (HEFA) policy, which is being regionalised within an emergent ALBA education space, assumes a key role in the direct democratic and participatory democratic processes upon which a bottom-up construction of counter-hegemony depends. HEFA challenges the globalised neoliberal higher education agenda of commoditisation, privatisation and elitism, and claims free public education at all levels as a fundamental human right. Rather than producing enterprising subjects fashioned for global capitalism, HEFA seeks to form subjectivities along the moral values of solidarity and cooperation. This is illustrated with reference to an ethnographic case study of the Bolivarian University of Venezuela (UBV)

    A Practical Guide to Preprints: Accelerating Scholarly Communication

    This guide is a translation, adapted to the French context, of "A Practical Guide to Preprints: Accelerating Scholarly Communication", prepared and distributed by a team of Dutch researchers and librarians. It is intended for researchers who wish to deposit preprints in repositories even before their manuscript is accepted by a publisher, and addresses a number of their questions and concerns related to community review, publication in scientific and scholarly journals, evaluation and assessment, and the visibility of their work. The guide also includes explanations and advice on the use, understanding and interpretation of preprints for members of the public, who may find it useful as well.