Grasping the Finer Point: A Supervised Similarity Network for Metaphor Detection
The ubiquity of metaphor in our everyday communication makes it an important problem for natural language understanding.
Yet, the majority of metaphor processing systems to date rely on hand-engineered features, and there is still no consensus in the field as to which features are optimal for this task. In this paper, we present the first deep learning architecture designed to capture metaphorical composition. Our results demonstrate that it outperforms existing approaches on the metaphor identification task
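The abstract does not describe the network itself, but a minimal sketch can illustrate the general intuition behind similarity-based metaphor detection: a word used metaphorically tends to be semantically distant from its literal context. All vectors and the threshold below are toy values, not the paper's model.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy embeddings: "devour" is close to edible objects, far from "book".
vectors = {
    "devour": [0.9, 0.1, 0.2],
    "sandwich": [0.8, 0.2, 0.1],   # literal object of "devour"
    "book": [0.1, 0.9, 0.3],       # metaphorical object ("devour a book")
}

def metaphor_score(verb, obj, threshold=0.5):
    """Flag a verb-object pair as metaphorical if similarity is low."""
    sim = cosine(vectors[verb], vectors[obj])
    return sim, sim < threshold

print(metaphor_score("devour", "sandwich"))  # high similarity -> literal
print(metaphor_score("devour", "book"))      # low similarity  -> metaphorical
```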
Evaluation by association: A systematic study of quantitative word association evaluation
Recent work on evaluating representation learning architectures in NLP has established a need for evaluation protocols based on subconscious cognitive measures rather than manually tailored intrinsic similarity and relatedness tasks. In this work, we propose a novel evaluation framework that enables large-scale evaluation of such architectures in the free word association (WA) task, which is firmly grounded in cognitive theories of human semantic representation. This evaluation is facilitated by the existence of large manually constructed repositories of word association data. In this paper, we (1) present a detailed analysis of the new quantitative WA evaluation protocol, (2) suggest new evaluation metrics for the WA task inspired by its direct analogy with information retrieval problems, (3) evaluate various state-of-the-art representation models on this task, and (4) discuss the relationship between WA and prior evaluations of semantic representation with well-known similarity and relatedness evaluation sets. We have made the WA evaluation toolkit publicly available
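One way to read the "direct analogy with information retrieval" mentioned above: treat the human associates of a cue word as relevant documents and the model's nearest neighbours as a ranked retrieval list, then apply IR metrics such as precision@k. A hedged sketch with invented data (the paper's actual metrics and repositories may differ):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked neighbours that are human associates."""
    return sum(1 for w in ranked[:k] if w in relevant) / k

# Illustrative data, not from the paper's association repositories.
human_associates = {"cat": {"dog", "mouse", "fur", "pet"}}
model_neighbours = {"cat": ["dog", "kitten", "mouse", "car", "pet"]}

cue = "cat"
p3 = precision_at_k(model_neighbours[cue], human_associates[cue], 3)
print(f"P@3 for '{cue}': {p3:.2f}")  # dog and mouse hit, kitten misses
```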
Vision and Feature Norms: Improving automatic feature norm learning through cross-modal maps
Property norms have the potential to aid a wide range of semantic tasks, provided that they can be obtained for large numbers of concepts. Recent work has focused on text as the main source of information for automatic property extraction. In this paper we examine property norm prediction from visual, rather than textual, data, using cross-modal maps learnt between property norm and visual spaces. We also investigate the importance of having a complete feature norm dataset, for both training and testing. Finally, we evaluate how these datasets and cross-modal maps can be used in an image retrieval task. LB is supported by an EPSRC Doctoral Training Grant. DK is supported by EPSRC grant EP/I037512/1. SC is supported by ERC Starting Grant DisCoTex (306920) and EPSRC grant EP/I037512/1
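A common way to learn such a cross-modal map is a linear transformation fitted by least squares from the visual space into the property-norm space; the paper's exact model may differ, so the following is only an illustrative sketch on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 50 concepts, 8-d visual features, 5-d property-norm vectors.
V = rng.normal(size=(50, 8))                       # visual space
W_true = rng.normal(size=(8, 5))                   # hidden ground-truth map
P = V @ W_true + 0.01 * rng.normal(size=(50, 5))   # noisy property norms

# Fit the cross-modal map W minimising ||V @ W - P||^2.
W, *_ = np.linalg.lstsq(V, P, rcond=None)

# Predict a property-norm vector for an unseen visual representation.
v_new = rng.normal(size=(1, 8))
p_pred = v_new @ W
print("reconstruction error:", np.linalg.norm(V @ W - P))
```

With a complete norm dataset (every concept paired with both modalities, as the abstract emphasises), the map can be both trained and evaluated without missing rows.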
Comparing Data Sources and Architectures for Deep Visual Representation Learning in Semantics
Multi-modal distributional models learn grounded representations for improved performance in semantics. Deep visual representations, learned using convolutional neural networks, have been shown to achieve particularly high performance. In this study, we systematically compare deep visual representation learning techniques, experimenting with three well-known network architectures. In addition, we explore the various data sources that can be used for retrieving relevant images, showing that images from search engines perform as well as, or better than, those from manually crafted resources such as ImageNet. Furthermore, we explore the optimal number of images and the multi-lingual applicability of multi-modal semantics. We hope that these findings can serve as a guide for future research in the field. Anita Verõ is supported by the Nuance Foundation Grant: Learning Type-Driven Distributed Representations of Language. Stephen Clark is supported by the ERC Starting Grant: DisCoTex (306920)
Multi-Modal Representations for Improved Bilingual Lexicon Learning
Recent work has revealed the potential of using visual representations for bilingual lexicon learning (BLL). Such image-based BLL methods, however, still fall short of linguistic approaches. In this paper, we propose a simple yet effective multi-modal approach that learns bilingual semantic representations that fuse linguistic and visual input. These new bilingual multi-modal embeddings display significant performance gains in the BLL task for three language pairs on two benchmarking test sets, outperforming linguistic-only BLL models using three different types of state-of-the-art bilingual word embeddings, as well as visual-only BLL models. This work is supported by ERC Consolidator Grant LEXICAL (648909) and KU Leuven Grant PDMK/14/117. SC is supported by ERC Starting Grant DisCoTex (306920)
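A common baseline for fusing linguistic and visual input is weighted concatenation of L2-normalised vectors from each modality; the paper's fusion method may differ, so the sketch below only illustrates this standard middle-fusion recipe with made-up vectors.

```python
import math

def l2_normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def fuse(linguistic, visual, alpha=0.5):
    """Concatenate normalised modalities, weighting linguistic by alpha."""
    lin = [alpha * x for x in l2_normalise(linguistic)]
    vis = [(1 - alpha) * x for x in l2_normalise(visual)]
    return lin + vis

emb = fuse([3.0, 4.0], [1.0, 0.0, 0.0])
print(emb)  # [0.3, 0.4, 0.5, 0.0, 0.0]
```

Normalising each modality first stops the modality with larger raw magnitudes from dominating the fused representation.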
Visually Grounded and Textual Semantic Models Differentially Decode Brain Activity Associated with Concrete and Abstract Nouns
Important advances have recently been made using computational semantic models to decode brain activity patterns associated with concepts; however, this work has almost exclusively focused on concrete nouns. How well these models extend to decoding abstract nouns is largely unknown. We address this question by applying state-of-the-art computational models to decode functional Magnetic Resonance Imaging (fMRI) activity patterns, elicited by participants reading and imagining a diverse set of both concrete and abstract nouns. One of the models we use is linguistic, exploiting the recent word2vec skipgram approach trained on Wikipedia. The second is visually grounded, using deep convolutional neural networks trained on Google Images. Dual coding theory considers concrete concepts to be encoded in the brain both linguistically and visually, and abstract concepts only linguistically. Splitting the fMRI data according to human concreteness ratings, we indeed observe that both models significantly decode the most concrete nouns; however, accuracy is significantly greater using the text-based models for the most abstract nouns. More generally, this confirms that current computational models are sufficiently advanced to assist in investigating the representational structure of abstract concepts in the brain. Stephen Clark is supported by ERC Starting Grant DisCoTex (306920)
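Decoding accuracy in this literature is typically measured with a pairwise test: for two held-out concepts, decoding succeeds if matching each brain pattern to its own model vector gives higher total similarity than the swapped matching. The paper's exact pipeline may differ; this is a minimal sketch with toy vectors.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pairwise_correct(brain_a, brain_b, model_a, model_b):
    """True if the correct pairing beats the swapped pairing."""
    correct = cosine(brain_a, model_a) + cosine(brain_b, model_b)
    swapped = cosine(brain_a, model_b) + cosine(brain_b, model_a)
    return correct > swapped

# Toy brain patterns and semantic model vectors for two concepts.
print(pairwise_correct([1, 0, 0.1], [0, 1, 0.1],
                       [0.9, 0.1, 0], [0.1, 0.9, 0]))  # True
```

Averaging this binary outcome over all held-out pairs gives the reported decoding accuracy, with 50% as chance level.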
HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment
We introduce HyperLex — a dataset and evaluation resource that quantifies the extent of the semantic category membership relation, that is, the type-of relation (also known as the hyponymy–hypernymy or lexical entailment (LE) relation), between 2,616 concept pairs. Cognitive psychology research has established that typicality and category/class membership are computed in human semantic memory as a gradual rather than binary relation. Nevertheless, most NLP research and existing large-scale inventories of concept category membership (WordNet, DBPedia, etc.) treat category membership and LE as binary. To address this, we asked hundreds of native English speakers to indicate typicality and strength of category membership between a diverse range of concept pairs on a crowdsourcing platform. Our results confirm that category membership and LE are indeed more gradual than binary. We then compare these human judgments with the predictions of automatic systems, which reveals a huge gap between human performance and state-of-the-art LE, distributional, and representation learning models, and substantial differences between the models themselves. We discuss a pathway for improving semantic models to overcome this discrepancy, and indicate future application areas for improved graded LE systems. This work is supported by the ERC Consolidator Grant (no 648909)
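Graded-judgment resources like this are typically scored by Spearman rank correlation between model scores and human ratings over the concept pairs. A self-contained sketch with invented ratings (not HyperLex data):

```python
def rank(values):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rho = Pearson correlation of the two rank vectors."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

human = [5.8, 1.2, 4.9, 0.3, 3.5]   # illustrative graded LE ratings
model = [0.9, 0.2, 0.7, 0.1, 0.5]   # illustrative model entailment scores
print(f"Spearman rho: {spearman(human, model):.2f}")  # 1.00 (same ordering)
```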
Non-Compositional Term Dependence for Information Retrieval
Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be separate from the strength of their semantic dependence. E.g. "red tape" might be overall less frequent than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR
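The distinction the abstract draws can be sketched with a simple distributional compositionality score: compare a vector for the whole phrase against a composition (here, the average) of its constituents' vectors, and treat a low score as evidence of non-compositionality, independent of raw co-occurrence frequency. The vectors are toy values, not the unsupervised model of [21].

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

vectors = {
    "red": [0.9, 0.1, 0.0],
    "tape": [0.1, 0.9, 0.1],
    "red tape": [0.0, 0.1, 0.9],         # "bureaucracy": unlike its parts
    "measure": [0.2, 0.8, 0.2],
    "tape measure": [0.15, 0.85, 0.15],  # close to its parts
}

def compositionality(phrase):
    """Similarity of the phrase vector to the average of its parts."""
    parts = phrase.split()
    dim = len(vectors[phrase])
    composed = [sum(vectors[w][i] for w in parts) / len(parts)
                for i in range(dim)]
    return cosine(vectors[phrase], composed)

print(f"red tape:     {compositionality('red tape'):.2f}")      # low
print(f"tape measure: {compositionality('tape measure'):.2f}")  # high
```

A phrase scoring below some threshold would then be indexed and ranked as a single unit, regardless of how frequent it is in the corpus.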
Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval
We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. Contrary to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be applied to any unstructured text corpus. Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time
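The multi-hop loop described above can be sketched in a few lines: retrieve on the question alone, append the retrieved passage to the query, re-encode, and retrieve again, with no hyperlinks or entity markers needed. The "encoder" below is a toy bag-of-words stand-in for the learned dense encoders the paper uses.

```python
corpus = {
    "p1": "alice was born in paris",
    "p2": "paris is the capital of france",
    "p3": "bananas are yellow",
}

def encode(text):
    """Toy bag-of-words 'encoder' standing in for a dense neural encoder."""
    return set(text.split())

def retrieve(query_vec, exclude=()):
    """Return the passage with the highest overlap with the query encoding."""
    return max((p for p in corpus if p not in exclude),
               key=lambda p: len(query_vec & encode(corpus[p])))

question = "which country was alice born in"
hop1 = retrieve(encode(question))                    # first-hop evidence
hop2_query = encode(question + " " + corpus[hop1])   # question + passage
hop2 = retrieve(hop2_query, exclude={hop1})          # second-hop evidence
print(hop1, hop2)
```

The second hop only becomes answerable after conditioning on the first passage: "paris" enters the query representation and pulls in the passage linking Paris to France.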
Post-Translational Loss of Renal TRPV5 Calcium Channel Expression, Ca2+ Wasting, and Bone Loss in Experimental Colitis
Dysregulated Ca2+ homeostasis likely contributes to the etiology of IBD-associated loss of bone mineral density (BMD). Experimental colitis leads to decreased expression of Klotho, a protein which supports renal Ca2+ reabsorption by stabilizing the TRPV5 channel on the apical membrane of distal tubule epithelial cells