657,103 research outputs found
Normalization And Matching Of Chemical Compound Names
We have developed ChemHits (http://sabio.h-its.org/chemHits/), an application which detects and matches synonymic names of chemical compounds. The tool is based on natural language processing (NLP) methods and applies rules to systematically normalize chemical compound names. Subsequently, matching of synonymous names is achieved by comparison of the normalized name forms. The tool is capable of normalizing a given name of a chemical compound and matching it against names in (bio-)chemical databases, like SABIO-RK, PubChem, ChEBI or KEGG, even when there is no exact name-to-name-match
Semantic spaces
Any natural language can be considered as a tool for producing large
databases (consisting of texts, written, or discursive). This tool for its
description in turn requires other large databases (dictionaries, grammars
etc.). Nowadays, the notion of database is associated with computer processing
and computer memory. However, a natural language resides also in human brains
and functions in human communication, from interpersonal to intergenerational
one. We discuss in this survey/research paper mathematical, in particular
geometric, constructions, which help to bridge these two worlds. In particular,
in this paper we consider the Vector Space Model of semantics based on
frequency matrices, as used in Natural Language Processing. We investigate
underlying geometries, formulated in terms of Grassmannians, projective spaces,
and flag varieties. We formulate the relation between vector space models and
semantic spaces based on semic axes in terms of projectability of subvarieties
in Grassmannians and projective spaces. We interpret Latent Semantics as a
geometric flow on Grassmannians. We also discuss how to formulate G\"ardenfors'
notion of "meeting of minds" in our geometric setting.Comment: 32 pages, TeX, 1 eps figur
brat: a Web-based Tool for NLP-Assisted Text Annotation
We introduce the brat rapid annotation tool (BRAT), an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology. BRAT has been developed for rich structured annotation for a variety of NLP tasks and aims to support manual curation efforts and increase annotator productivity using NLP techniques. We discuss several case studies of real-world annotation projects using pre-release versions of BRAT and present an evaluation of annotation assisted by semantic class disambiguation on a multicategory entity mention annotation task, showing a 15 % decrease in total annotation time. BRAT is available under an opensource license from
SupWSD: a flexible toolkit for supervised word sense disambiguation
In this demonstration we present SupWSD, a Java API for supervised Word Sense Disambiguation (WSD). This toolkit includes the implementation of a state-of-the-art supervised WSD system, together with a Natural Language Processing pipeline for preprocessing and feature extraction. Our aim is to provide an easy-to-use tool for the research community, designed to be modular, fast and scalable for training and testing on large datasets. The source code of SupWSD is available at http://github.com/SI3P/SupWSD
- …
