2,414 research outputs found
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are dened for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of dierent measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy
A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach
We design a new technique for the distributional semantic modeling with a
neural network-based approach to learn distributed term representations (or
term embeddings) - term vector space models as a result, inspired by the recent
ontology-related approach (using different types of contextual knowledge such
as syntactic knowledge, terminological knowledge, semantic knowledge, etc.) to
the identification of terms (term extraction) and relations between them
(relation extraction) called semantic pre-processing technology - SPT. Our
method relies on automatic term extraction from the natural language texts and
subsequent formation of the problem-oriented or application-oriented (also
deeply annotated) text corpora where the fundamental entity is the term
(includes non-compositional and compositional terms). This gives us an
opportunity to changeover from distributed word representations (or word
embeddings) to distributed term representations (or term embeddings). This
transition will allow to generate more accurate semantic maps of different
subject domains (also, of relations between input terms - it is useful to
explore clusters and oppositions, or to test your hypotheses about them). The
semantic map can be represented as a graph using Vec2graph - a Python library
for visualizing word embeddings (term embeddings in our case) as dynamic and
interactive graphs. The Vec2graph library coupled with term embeddings will not
only improve accuracy in solving standard NLP tasks, but also update the
conventional concept of automated ontology development. The main practical
result of our work is the development kit (set of toolkits represented as web
service APIs and web application), which provides all necessary routines for
the basic linguistic pre-processing and the semantic pre-processing of the
natural language texts in Ukrainian for future training of term vector space
models.Comment: In English, 9 pages, 2 figures. Not published yet. Prepared for
special issue (UkrPROG 2020 conference) of the scientific journal "Problems
in programming" (Founder: National Academy of Sciences of Ukraine, Institute
of Software Systems of NAS Ukraine
- …