13 research outputs found

Morph-fitting: Fine-tuning word vector spaces with simple language-specific rules

Author: Korhonen A
Mrkšic N
Reichart R
Séaghdha D
Vulic I
Young S
Publication venue: ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Publication date: 01/01/2017

Morphologically rich languages accentuate two properties of distributional vector space models: 1) the difficulty of inducing accurate representations for low-frequency word forms; and 2) insensitivity to distinct lexical relations that have similar distributional signatures. These effects are detrimental for language understanding systems, which may infer that inexpensive is a rephrasing for expensive or may not associate acquire with acquires. In this work, we propose a novel morph-fitting procedure which moves past the use of curated semantic lexicons for improving distributional vector spaces. Instead, our method injects morphological constraints generated using simple language-specific rules, pulling inflectional forms of the same word close together and pushing derivational antonyms far apart. In intrinsic evaluation over four languages, we show that our approach: 1) improves low-frequency word estimates; and 2) boosts the semantic quality of the entire word vector collection. Finally, we show that morph-fitted vectors yield large gains in the downstream task of dialogue state tracking, highlighting the importance of morphology for tackling long-tail phenomena in language understanding tasks.

arXiv.org e-Print Archive

Crossref

Apollo (Cambridge)

A Framework for Interpreting Bridging Anaphora

Author: C. Butnariu
D. Bean
D. Ó Séaghdha
I. Hendrickx
J. Levi
J.N. Levi
J.R. Hobbs
K. Fraurud
M. Lauer
M. Poesio
P. Downing
R. Girju
R. Vieira
S. Tratz
S.-N. Kim
S.N. Kim
T. Sanders
Publication venue: Springer
Publication date: 01/01/2013

In this paper we present a novel framework for resolving bridging anaphora. We argue that anaphora, particularly bridging anaphora, is used as a shortcut device similar to the use of compound nouns. Hence, the two natural language usage phenomena would have to be based on the same theoretical framework. We use an existing theory on compound nouns to test its validity for anaphora usages. To do this, we used human annotators to interpret indirect anaphora from naturally occurring discourses. The annotators were asked to classify the relations between anaphor-antecedent pairs into relation types that have been previously used to describe the relations between a modifier and the head noun of a compound noun. We obtained very encouraging results with an average Fleiss's κ value of 0.66 for inter-annotation agreement. The results were evaluated against other similar natural language interpretation annotation experiments and were found to compare well. In order to determine the prevalence of the proposed set of anaphora relations we did a detailed analysis of a subset of 20 newspaper articles. The results obtained from this also indicated that a majority (98%) of the relations could be described by the relations in the framework. The results from this analysis also showed the distribution of the relation types in the genre of newspaper article discourses.

Crossref

AUT Scholarly Commons

Text Mining for Literature Review and Knowledge Discovery in Cancer Risk Assessment and Research

Author: A Keselman
A Kolman
A Korhonen
Anna Korhonen
AR Feinstein
B Alex
C Boström
C Cortes
C Leslie
CC Chang
D Hattis
D McGregor
D Ó Séaghdha
Diarmuid Ó Séaghdha
DV Cicchetti
H Wang
Ilona Silins
J Cohen
J Lin
J Shawe-Taylor
Johan Högberg
K Bouker
K Morgan
KB Cohen
L Hunter
Lin Sun
M Hein
M Jackson
N Cristianini
N Karamanis
Neil R. Smalheiser
P Zweigenbaum
Products EFSA Panel on Plant Protection
R Frijters
R Jelier
R Judson
RB Altman
S Ananiadou
S Cohen
Science US National Academy of
T Byrt
T Joachims
TC Rindesch
TG Dietterich
Ulla Stenius
Y Guo
YW Chen
Publication venue: Public Library of Science
Publication date: 12/04/2012

Research in biomedical text mining is starting to produce technology which can make information in biomedical literature more accessible for bio-scientists. One of the current challenges is to integrate and refine this technology to support real-life scientific tasks in biomedicine, and to evaluate its usefulness in the context of such tasks. We describe CRAB – a fully integrated text mining tool designed to support chemical health risk assessment. This task is complex and time-consuming, requiring a thorough review of existing scientific data on a particular chemical. Covering human, animal, cellular and other mechanistic data from various fields of biomedicine, this is highly varied and therefore difficult to harvest from literature databases via manual means. Our tool automates the process by extracting relevant scientific data in published literature and classifying it according to multiple qualitative dimensions. Developed in close collaboration with risk assessors, the tool allows navigating the classified dataset in various ways and sharing the data with other users. We present a direct and user-based evaluation which shows that the technology integrated in the tool is highly accurate, and report a number of case studies which demonstrate how the tool can be used to support scientific discovery in cancer risk assessment and research. Our work demonstrates the usefulness of a text mining pipeline in facilitating complex research tasks in biomedicine. We discuss further development and application of our technology to other types of chemical risk assessment in the future.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Intelligent Assistant Language Understanding On Device

Author: Aas Cecilia
Abdelsalam Hisham
Belousova Irina
Bhargava Shruti
Cheng Jianpeng
Daland Robert
Del Vecchio Marco
Driesen Joris
Flego Federico
Guigue Tristan
Johannsen Anders
Lal Partha
Lu Jiarui
Moniz Joel Ruben Antony
Perkins Nathan
Piraviperumal Dhivya
Pulman Stephen
Sun David Q.
Séaghdha Diarmuid Ó
Torr John
Wacker Jay
Williams Jason D.
Yu Hong
Publication venue
Publication date: 07/08/2023

It has recently become feasible to run personal digital assistants on phones and other personal devices. In this paper we describe a design for a natural language understanding system that runs on device. In comparison to a server-based assistant, this system is more private, more reliable, faster, more expressive, and more accurate. We describe what led to key choices about architecture and technologies. For example, some approaches in the dialog systems literature are difficult to maintain over time in a deployment setting. We hope that sharing learnings from our practical experiences may help inform future work in the research community.

arXiv.org e-Print Archive

Modelling semantic transparency

Author: A. Ferraresi
A. Lenci
A. Pollatsek
B. J. Juhasz
B. Warren
C. L. Gagné
D. Sandra
D. Ó Séaghdha
G. Fanselow
G. Jarema
G. Libben
H. Ji
I. Plag
J. N. Levi
J. S. Bowers
K. Erk
M. J. Bell
M. Marelli
M. Sahlgren
Martin Schäfer
Melanie J. Bell
P. Downing
P. Maguire
P. Zwitserlood
R. B. Lees
R. El-Bialy
R. H. Baayen
R. Huddleston
S. C. Levinson
S. Frisson
S. Hoffmann
S. Monsell
S. Nakagawa
S. Reddy
T. L. Spalding
V. Kuperman
W. Marslen-Wilson
Z. Estes
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

We present models of semantic transparency in which the perceived transparency of English noun–noun compounds, and of their constituent words, is predicted on the basis of the expectedness of their semantic structure. We show that such compounds are perceived as more transparent when the first noun is more frequent, hence more expected, in the language generally; when the compound semantic relation is more frequent, hence more expected, in association with the first noun; and when the second noun is more productive, hence more expected, as the second element of a noun–noun compound. Taken together, our models of compound and constituent transparency lead us to two conclusions. Firstly, although compound transparency is a function of the transparencies of the constituents, the two constituents differ in the nature of their contribution. Secondly, since all the significant predictors in our models of compound transparency are also known predictors of processing speed, perceived transparency may itself be a reflex of ease of processing.

Crossref

Springer - Publisher Connector

Anglia Ruskin Research

SemEval-2013 task 4: Free paraphrases of noun compounds

Author: Hendrickx I.H.E.
Kozareva Z.
Nakov P
Szpakowicz S.
Veale T.
Ó Séaghdha D.
Publication venue: Atlanta, Georgia, USA : NAACL
Publication date: 01/01/2013

Contains fulltext: 122615.pdf (publisher's version) (Open Access)

Radboud Repository

Morph-fitting: Fine-tuning word vector spaces with simple language-specific rules

Author: Korhonen A
Mrkšic N
Reichart R
Séaghdha D
Vulic I
Young S
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2017

CUED - Cambridge University Engineering Department

Semantic interpretation of noun compounds using verbal and other paraphrases

Author: Baker C. F.
Baldwin T.
Butnariu C.
Fillmore C. J.
Girju R.
Goldberg A. E.
Grefenstette G.
Hendrickx I.
Kim S. N.
Koehn P.
Lin D.
Marti A. Hearst
Moldovan D.
Nakov P.
Nastase V.
Preslav I. Nakov
Shinyama Y.
Snover M.
Séaghdha D.
Séaghdha D. O.
Tratz S.
Warren B.
Zhang Y.
Zhou L.
Publication venue: Association for Computing Machinery (ACM)
Publication date

Crossref

SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals

Author: D. Ó Séaghdha
I. Hendrickx
L. Romano
M. Pennacchiotti
P. Nakov
S. N. Kim
S. Padó
S. Szpakowicz
Z. Kozareva
Publication venue
Publication date: 01/01/2009

We present a brief overview of the main challenges in the extraction of semantic relations from English text, and discuss the shortcomings of previous data sets and shared tasks. This leads us to introduce a new task, which will be part of SemEval-2010: multi-way classification of mutually exclusive semantic relations between pairs of common nominals. The task is designed to compare different approaches to the problem and to provide a standard testbed for future research, which can benefit many applications in Natural Language Processing.

CiteSeerX

Archivio della ricerca - Fondazione Bruno Kessler

Word sense and semantic relations in noun compounds

Author: Agirre E.
Banerjee S.
Butnariu C.
Girju R.
Hermjakob U.
Ide N.
Kim S. N.
Lapata M.
Mackinlay A.
Mihalcea R.
Moldovan D.
Nakov P.
Nastase V.
Palmer M.
Prager J.
Rosario B.
Su Nam Kim
Séaghdha D.
Timothy Baldwin
Tratz S.
Publication venue: Association for Computing Machinery (ACM)
Publication date

Crossref