Using distributional similarity to organise biomedical terminology

Author: Dowdall James
Keller Bill
Schneider Gerold
Weeds Julie
Weir David
Publication venue: John Benjamins Publishing Company
Publication date: 01/01/2005

We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
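
As a rough illustration of the idea of distributional similarity in the abstract above, the sketch below compares terms by the dependency contexts they occur in. It is an assumption-laden toy, not the paper's pipeline: the observations are invented, the context labels are hypothetical stand-ins for parser output, and cosine is used only as one familiar measure (the abstract does not say which measures were evaluated).

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[feat] for feat, count in u.items() if feat in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical (term, dependency-context) pairs; a real system would
# read these off a parsed corpus rather than hard-code them.
observations = [
    ("IL-2", "subj_of:activate"), ("IL-2", "obj_of:express"), ("IL-2", "mod:human"),
    ("IL-4", "subj_of:activate"), ("IL-4", "obj_of:express"),
    ("T cell", "obj_of:activate"), ("T cell", "mod:human"),
]

# Build one sparse context-count vector per term.
vectors = {}
for term, context in observations:
    vectors.setdefault(term, Counter())[context] += 1


def nearest(term):
    """Return the other term whose context vector is most similar."""
    others = [t for t in vectors if t != term]
    return max(others, key=lambda t: cosine(vectors[term], vectors[t]))


print(nearest("IL-2"))
```

In this toy data the two interleukin terms share more contexts with each other than with "T cell", which is the kind of signal that could be compared against a gold-standard semantic type.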

ZORA

Learning to distinguish hypernyms and co-hyponyms

Author: Clarke Daoud
Keller Bill
Reffin Jeremy
Weeds Julie
Weir David
Publication venue: Dublin City University and Association for Computational Linguistics
Publication date: 01/08/2014

This work is concerned with distinguishing different semantic relations which exist between distributionally similar words. We compare a novel approach based on training a linear Support Vector Machine on pairs of feature vectors with state-of-the-art methods based on distributional similarity. We show that the new supervised approach does better even when there is minimal information about the target words in the training data, giving a 15% reduction in error rate over unsupervised approaches.

arXiv.org e-Print Archive

Aligning packed dependency trees: a theory of composition for distributional semantics

Author: Kober Thomas
Reffin Jeremy
Weeds Julie
Weir David
Publication venue: MIT Press - Journals
Publication date: 25/08/2016

We present a new framework for compositional distributional semantics in which the distributional contexts of lexemes are expressed in terms of anchored packed dependency trees. We show that these structures have the potential to capture the full sentential contexts of a lexeme and provide a uniform basis for the composition of distributional knowledge in a way that captures both mutual disambiguation and generalization.

arXiv.org e-Print Archive

Improving Semantic Composition with Offset Inference

Author: Kober Thomas
Reffin Jeremy
Weeds Julie
Weir David
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2017

Count-based distributional semantic models suffer from sparsity due to unobserved but plausible co-occurrences in any text collection. This problem is amplified for models like Anchored Packed Trees (APTs), that take the grammatical type of a co-occurrence into account. We therefore introduce a novel form of distributional inference that exploits the rich type structure in APTs and infers missing data by the same mechanism that is used for semantic composition. Comment: to appear at ACL 2017 (short papers).

arXiv.org e-Print Archive

Improving sparse word representations with distributional inference for semantic composition

Author: Kober Thomas
Reffin Jeremy
Weeds Julie
Weir David
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2016

Distributional models are derived from co-occurrences in a corpus, where only a small proportion of all possible plausible co-occurrences will be observed. This results in a very sparse vector space, requiring a mechanism for inferring missing knowledge. Most methods face this challenge in ways that render the resulting word representations uninterpretable, with the consequence that semantic composition becomes hard to model. In this paper we explore an alternative which involves explicitly inferring unobserved co-occurrences using the distributional neighbourhood. We show that distributional inference improves sparse word representations on several word similarity benchmarks and demonstrate that our model is competitive with the state-of-the-art for adjective-noun, noun-noun and verb-object compositions while being fully interpretable.
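
One simple reading of "inferring unobserved co-occurrences using the distributional neighbourhood" can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the word vectors are invented, the function name `infer` and its parameter `k` are hypothetical, and neighbours are chosen by cosine and merged by plain count addition because the abstract does not specify either step.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[feat] for feat, count in u.items() if feat in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def infer(target, vectors, k=1):
    """Return a copy of the target's vector enriched with the counts
    of its k nearest neighbours (hypothetical helper, assumed design)."""
    base = vectors[target]
    neighbours = sorted(
        (w for w in vectors if w != target),
        key=lambda w: cosine(base, vectors[w]),
        reverse=True,
    )[:k]
    enriched = Counter(base)  # copy, so the observed vector is left intact
    for w in neighbours:
        enriched.update(vectors[w])  # adds counts, including unseen features
    return enriched


# Invented toy co-occurrence counts.
vectors = {
    "cat": Counter({"purr": 2, "pet": 3}),
    "kitten": Counter({"purr": 1, "pet": 2, "tiny": 1}),
    "car": Counter({"drive": 4, "park": 1}),
}

enriched = infer("cat", vectors, k=1)
print(enriched["tiny"])
```

Every dimension of the enriched vector is still a named co-occurrence feature, which is the sense in which an explicit scheme of this kind stays interpretable.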

A critique of word similarity as a method for evaluating distributional semantic models

Author: Batchkarov Miroslav
Kober Thomas
Reffin Jeremy
Weeds Julie
Weir David
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2016

This paper aims to re-think the role of the word similarity task in distributional semantics research. We argue that while it is a valuable tool, it should be used with care because it provides only an approximate measure of the quality of a distributional model. Word similarity evaluations assume there exists a single notion of similarity that is independent of a particular application. Further, the small size and low inter-annotator agreement of existing data sets make it challenging to find significant differences between models.

Virtual Commons - Bridgewater State University

Social Media and Professionalism Through Technology

Author: Komack Julie
Weir Lori
Publication venue: Virtual Commons - Bridgewater State University
Publication date: 18/08/2011

This workshop will provide faculty and staff with a brief overview of how social media has changed the way we communicate, how best to use it to reach and engage our students in and out of the classroom, and how to build and keep a clean e-image.

Structure-aware sentence encoder in BERT-based siamese network

Author: Peng Qiwei
Weeds Julie
Weir David
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 06/08/2021

Recently, impressive performance on various natural language understanding tasks has been achieved by explicitly incorporating syntax and semantic information into pre-trained models, such as BERT and RoBERTa. However, this approach depends on problem-specific fine-tuning, and as widely noted, BERT-like models exhibit weak performance, and are inefficient, when applied to unsupervised similarity comparison tasks. Sentence-BERT (SBERT) has been proposed as a general-purpose sentence embedding method, suited to both similarity comparison and downstream tasks. In this work, we show that by incorporating structural information into SBERT, the resulting model outperforms SBERT and previous general sentence encoders on unsupervised semantic textual similarity (STS) datasets and transfer classification tasks.