1,246 research outputs found

Mapping WordNet Instances to Wikipedia

Author: John P. McCrae

Lexical resources differ from encyclopaedic resources: the two are distinct types of resource, covering general language and named entities respectively. However, many lexical resources, including Princeton WordNet (PWN), contain many proper nouns referring to named entities in the world, yet it is neither possible nor desirable for a lexical resource to cover all named entities that may reasonably occur in a text. In this paper, we propose that, instead of including synsets for instance concepts, PWN should provide links to Wikipedia articles describing the concept. To enable this, we have created a gold-quality mapping between all of the 7,742 instances in PWN and Wikipedia (where such a mapping is possible). As such, this resource aims to provide a gold standard for link discovery, while also allowing PWN to distinguish itself from other resources such as DBpedia or BabelNet. Moreover, this linking connects PWN to the Linguistic Linked Open Data cloud, thus creating a richer, more usable resource for natural language processing.

ZENODO

Synonym set extraction from the biomedical literature by lexical pattern discovery

Author: Collier Nigel
McCrae John
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Although there are a large number of thesauri for the biomedical domain, many of them lack coverage of terms and their variant forms. Automatic thesaurus construction based on patterns was first suggested by Hearst [1], but it is still not clear how to automatically construct such patterns for different semantic relations and domains. In particular, it is not certain which patterns are useful for capturing synonymy. The assumption of extant resources such as parsers is also a limiting factor for many languages, so it is desirable to find patterns that do not use syntactic analysis. Finally, to give a more consistent and applicable result, it is desirable to use these patterns to form synonym sets in a sound way. Results: We present a method that automatically generates regular expression patterns by expanding seed patterns in a heuristic search and then develops a feature vector based on the occurrence of term pairs in each developed pattern. This allows for a binary classification of term pairs as synonymous or non-synonymous. We then model this result as a probability graph to find synonym sets, which is equivalent to the well-studied problem of finding an optimal set cover. We achieved 73.2% precision and 29.7% recall with our method, outperforming hand-made resources such as MeSH and Wikipedia. Conclusion: We conclude that automatic methods can play a practical role in developing new thesauri or expanding existing ones, and that this can be done with only a small amount of training data and no need for resources such as parsers. We also conclude that accuracy can be improved by grouping terms into synonym sets.
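
The feature-vector step described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three patterns, the `{a}`/`{b}` slot notation, and the function name `pattern_features` are all hypothetical, and the heuristic pattern search, the classifier, and the set-cover step are omitted.

```python
import re

# Hypothetical hand-written patterns (assumption: the paper generates its
# patterns automatically). "{a}" and "{b}" are slots for the two terms.
PATTERNS = [
    r"{a} \(also known as {b}\)",
    r"{a}, also called {b}",
    r"{a} or {b}",
]


def pattern_features(term_a, term_b, sentences):
    """Return one 0/1 feature per pattern.

    A feature is 1 if the pattern, filled with the term pair, matches
    somewhere in at least one sentence, and 0 otherwise.
    """
    features = []
    for template in PATTERNS:
        regex = re.compile(
            template.format(a=re.escape(term_a), b=re.escape(term_b)),
            re.IGNORECASE,
        )
        features.append(int(any(regex.search(s) for s in sentences)))
    return features


sentences = [
    "Varicella (also known as chickenpox) is contagious.",
    "Take aspirin or ibuprofen.",
]
print(pattern_features("varicella", "chickenpox", sentences))
```

Vectors of this kind could then be passed to any binary classifier to label a term pair as synonymous or non-synonymous.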

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

In Flanders Fields the Poppies Grow

Author: McCrae John
Sousa John Philip, 1854-1932
Publication venue: DigitalCommons@UMaine
Publication date: 01/01/1918

University of Maine

Orthonormal Explicit Topic Analysis for Cross-lingual Document Matching

Author: Cimiano Philipp
Klinger Roman
McCrae John
Publication date: 01/01/2013

McCrae J, Cimiano P, Klinger R. Orthonormal Explicit Topic Analysis for Cross-lingual Document Matching. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013: 1732-1740.

Publications at Bielefeld University

Encoder-Attention-Based Automatic Term Recognition (EA-ATR)

Author: Manjunath Sampritha H.
McCrae John P.
Publication venue: OASIcs - OpenAccess Series in Informatics. 3rd Conference on Language, Data and Knowledge (LDK 2021)
Publication date: 01/01/2021

Automated Term Recognition (ATR) is the task of finding terminology in raw text. It involves designing and developing techniques for mining candidate terms from the text, filtering these candidates by scores calculated with methods such as frequency of occurrence, and then ranking the terms. Current approaches often rely on statistics and regular expressions over part-of-speech tags to identify terms, but this is error-prone. We propose a deep learning technique to improve the process of identifying a possible sequence of terms. We improve term recognition by using embeddings based on Bidirectional Encoder Representations from Transformers (BERT) to identify which sequences of words are terms. This model is trained on Wikipedia titles. We assume all Wikipedia titles to be the positive set, and random n-grams generated from the raw text to be a weak negative set. The model will be trained on the positive and negative sets using the Embed, Encode, Attend and Predict (EEAP) formulation with BERT as embeddings. The model will then be evaluated against different domain-specific corpora such as GENIA (annotated biological terms) and Krapivin (scientific papers from the computer science domain).
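
The weak negative set mentioned in this abstract (random n-grams drawn from raw text) can be illustrated with a minimal sketch. It is not the authors' code: the function name `sample_negative_ngrams`, the maximum n-gram length of 3, the fixed seed, and the lower-cased comparison against the positive set are all assumptions made for the example, and the BERT/EEAP model itself is not shown.

```python
import random


def sample_negative_ngrams(tokens, positives, k, max_n=3, seed=0):
    """Draw up to k distinct random n-grams (1..max_n tokens) from the text.

    N-grams whose lower-cased form appears in `positives` are skipped, so
    the result can serve as a weak negative set. (Assumption: `positives`
    holds lower-cased titles.)
    """
    if not tokens:
        return []
    rng = random.Random(seed)
    negatives = set()
    attempts = 0
    # Bound the number of attempts so the loop ends even when fewer than
    # k distinct negative n-grams exist.
    while len(negatives) < k and attempts < 100 * k:
        attempts += 1
        n = rng.randint(1, min(max_n, len(tokens)))
        start = rng.randrange(len(tokens) - n + 1)
        ngram = " ".join(tokens[start:start + n])
        if ngram.lower() not in positives:
            negatives.add(ngram)
    return sorted(negatives)


tokens = "the deep learning model uses neural network layers".split()
positives = {"deep learning", "neural network"}  # stand-ins for Wikipedia titles
print(sample_negative_ngrams(tokens, positives, k=5))
```

The negatives are only "weak" because a random n-gram may still be a genuine term that happens to be missing from the positive set.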

ZENODO

Dagstuhl Research Online Publication Server

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

An Introduction to the Five-Factor Model and Its Applications

Author: John Oliver P.
McCrae Robert R.
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/1992

The five-factor model of personality is a hierarchical organization of personality traits in terms of five basic dimensions: Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness to Experience. Research using both natural language adjectives and theoretically based personality questionnaires supports the comprehensiveness of the model and its applicability across observers and cultures. This article summarizes the history of the model and its supporting evidence; discusses conceptions of the nature of the factors; and outlines an agenda for theorizing about the origins and operation of the factors. We argue that the model should prove useful both for individual assessment and for the elucidation of a number of topics of interest to personality psychologists.