93 research outputs found
Exploring Semantic Textual Similarity
Measuring semantic similarity and relatedness between textual items (words, sentences, paragraphs or even documents) is a very important research area in Natural Language Processing (NLP). It has many practical applications in other NLP tasks, for instance Word Sense Disambiguation, Textual Entailment, Paraphrase Detection, Machine Translation and Summarization, as well as related tasks such as Information Retrieval and Question Answering. In this master's thesis we study different approaches to computing the semantic similarity between textual items. In the framework of the European PATHS project, we also evaluate a knowledge-based method on a dataset of cultural item descriptions. Additionally, we describe the work carried out for the Semantic Textual Similarity (STS) shared task of SemEval-2012. This work has involved supporting the creation of datasets for similarity tasks, as well as the organization of the task itself.
Exploiting domain information for Word Sense Disambiguation of medical documents
OBJECTIVE: Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods by using information about the topic of the document in which the ambiguous term appears. DESIGN: The authors proposed and implemented several methods to extract lists of key terms associated with Medical Subject Heading terms. These key terms are used to represent the document topic in a knowledge-based WSD system. They are applied both alone and in combination with local context. MEASUREMENTS: A standard measure of accuracy was calculated over the set of target words in the widely used National Library of Medicine WSD dataset. RESULTS AND DISCUSSION: The authors report a significant improvement when combining those key terms with local context, showing that domain information improves the results of a WSD system based on the Unified Medical Language System Metathesaurus alone. The best results were obtained using key terms extracted by relevance feedback and weighted by inverse document frequency.
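The inverse document frequency weighting mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names `idf_weights` and `weighted_key_terms`, and the plain `log(N / df)` formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Return {term: idf} with idf = log(N / df) over tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def weighted_key_terms(candidate_terms, idf, top_k=3):
    """Rank candidate key terms by IDF, rarest (highest weight) first."""
    scored = [(term, idf.get(term, 0.0)) for term in candidate_terms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical toy corpus; it does not come from the NLM WSD dataset.
corpus = [
    ["cold", "virus", "infection", "patient"],
    ["cold", "temperature", "exposure", "patient"],
    ["virus", "infection", "vaccine", "patient"],
]
idf = idf_weights(corpus)
ranked = weighted_key_terms(["patient", "virus", "vaccine"], idf)
```

A term that occurs in every document ("patient") receives weight zero, so it contributes nothing to the topic representation, while rarer terms dominate the ranking.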
Analyzing the Limitations of Cross-lingual Word Embedding Mappings
Recent research in cross-lingual word embeddings has almost exclusively
focused on offline methods, which independently train word embeddings in
different languages and map them to a shared space through linear
transformations. While several authors have questioned the underlying
isomorphism assumption, which states that word embeddings in different
languages have approximately the same structure, it is not clear whether this
is an inherent limitation of mapping approaches or a more general issue when
learning cross-lingual embeddings. So as to answer this question, we experiment
with parallel corpora, which allows us to compare offline mapping to an
extension of skip-gram that jointly learns both embedding spaces. We observe
that, under these ideal conditions, joint learning yields more isomorphic
embeddings, is less sensitive to hubness, and obtains stronger results in
bilingual lexicon induction. We thus conclude that current mapping methods do
have strong limitations, calling for further research to jointly learn
cross-lingual embeddings with a weaker cross-lingual signal. Comment: ACL 201
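To make the idea of offline mapping concrete, the sketch below fits a linear transformation from a seed dictionary and then induces a translation by nearest-neighbour search. It is a deliberately simplified illustration and not the method evaluated in the paper: embeddings are restricted to two dimensions so that the best rotation has a closed form, whereas practical systems work in hundreds of dimensions. All words, vectors, and function names are hypothetical.

```python
import math


def fit_rotation(source, target):
    """Angle of the rotation that best aligns 2-D source vectors to targets
    in the least-squares sense."""
    pairs = list(zip(source, target))
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in pairs)
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in pairs)
    return math.atan2(cross, dot)


def rotate(vec, angle):
    """Apply a 2-D rotation (a linear, orthogonal map) to a vector."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def cosine(u, v):
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))


def translate(word, src_emb, tgt_emb, angle):
    """Bilingual lexicon induction: map the source vector, then return the
    target word with the highest cosine similarity."""
    mapped = rotate(src_emb[word], angle)
    return max(tgt_emb, key=lambda w: cosine(mapped, tgt_emb[w]))


# Hypothetical, independently "trained" toy embeddings.
english = {"dog": (1.0, 0.0), "cat": (0.8, 0.6), "house": (0.0, 1.0)}
spanish = {"perro": (0.0, 1.0), "gato": (-0.6, 0.8), "casa": (-1.0, 0.0)}

# Seed dictionary used to fit the mapping; "cat" is held out.
seed = [("dog", "perro"), ("house", "casa")]
angle = fit_rotation([english[s] for s, _ in seed],
                     [spanish[t] for _, t in seed])
prediction = translate("cat", english, spanish, angle)
```

The toy spaces here are exactly isomorphic (one is a rotation of the other), which is the assumption the abstract questions. When the two spaces differ in structure, no single linear map can align every word pair.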
Semantic Services in FreeLing 2.1: WordNet and UKB
FreeLing is an open-source multilingual
language processing library providing a wide
range of language analyzers for several languages.
It offers text processing and language annotation facilities
to natural language processing application
developers, simplifying the task of building those
applications. FreeLing is customizable and extensible.
Developers can use the default linguistic resources
(dictionaries, lexicons, grammars, etc.) directly,
or extend them, adapt them to specific domains,
or even develop new ones for specific languages.
This paper presents the semantic services included
in FreeLing, which are based on WordNet and EuroWordNet
databases. The recent release of the
UKB program under a GPL license made it possible
to integrate a long-awaited word sense disambiguation
module into FreeLing. UKB provides state-of-the-art
all-words sense disambiguation for any language
with an available WordNet. Postprint (published version)
Improving search over Electronic Health Records using UMLS-based query expansion through random walks
OBJECTIVE: Most of the information in Electronic Health Records (EHRs) is represented in free textual form. Practitioners searching EHRs need to phrase their queries carefully, as the record might use synonyms or other related words. In this paper we show that an automatic query expansion method based on the Unified Medical Language System (UMLS) Metathesaurus improves the results of a robust baseline when searching EHRs. MATERIALS AND METHODS: The method uses a graph representation of the lexical units, concepts and relations in the UMLS Metathesaurus. It is based on random walks over the graph, which start on the query terms. Random walks are a well-studied discipline in both Web and Knowledge Base datasets. RESULTS: Our experiments over the TREC Medical Record track show improvements in both the 2011 and 2012 datasets over a strong baseline. DISCUSSION: Our analysis shows that the success of our method is due to the automatic expansion of the query with extra terms, even when they are not directly related in the UMLS Metathesaurus. The terms added in the expansion go beyond simple synonyms, and also add other kinds of topically related terms. CONCLUSIONS: Expansion of queries using related terms in the UMLS Metathesaurus beyond synonymy is an effective way to overcome the gap between query and document vocabularies when searching for patient cohorts.
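Random walks that start on the query terms can be sketched as a personalized PageRank computation by power iteration. This is an assumption about one common way to realize such walks, not the authors' code. The tiny graph stands in for the UMLS Metathesaurus and is entirely hypothetical, as are the function name, the damping factor of 0.85, and the iteration count.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Rank graph nodes by a random walk that restarts on the seed nodes.

    graph: {node: [neighbour, ...]} adjacency lists.
    seeds: query terms; those absent from the graph are ignored.
    """
    nodes = list(graph)
    seeds = [s for s in seeds if s in graph]
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                # Dangling node: send its mass back to the seeds.
                for n in nodes:
                    nxt[n] += damping * rank[node] * restart[n]
                continue
            share = damping * rank[node] / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        rank = nxt
    return rank


# Hypothetical concept graph; not taken from UMLS.
graph = {
    "heart attack": ["myocardial infarction", "chest pain"],
    "myocardial infarction": ["heart attack", "troponin"],
    "chest pain": ["heart attack"],
    "troponin": ["myocardial infarction"],
    "fracture": ["cast"],
    "cast": ["fracture"],
}
query = ["heart attack"]
rank = personalized_pagerank(graph, query)

# Expand the query with the two best-ranked terms that are not in it already.
candidates = sorted((n for n in rank if n not in query),
                    key=rank.get, reverse=True)
expansion = candidates[:2]
```

Because the walk restarts only on the query terms, nodes unreachable from them ("fracture", "cast") keep a score of zero, while terms a few hops away ("troponin") still receive some weight even though they are not directly linked to the query.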
Towards zero-shot cross-lingual named entity disambiguation
In Cross-lingual Named Entity Disambiguation (XNED) the task is to link Named Entity mentions in text in some native language to English entities in a knowledge graph. XNED systems usually require training data for each native language, limiting their application to low-resource languages with small amounts of training data. Prior work has proposed so-called zero-shot transfer systems which are only trained on English training data, but which required native prior probabilities of entities with respect to mentions. These had to be estimated from native training examples, limiting their practical interest. In this work we present a zero-shot XNED architecture where, instead of a single disambiguation model, we have a model for each possible mention string, thus eliminating the need for native prior probabilities. Our system improves over prior work on XNED datasets in Spanish and Chinese by 32 and 27 points, and matches the systems which do require native prior information. We experiment with different multilingual transfer strategies, showing that better results are obtained with a purpose-built multilingual pre-training method compared to state-of-the-art generic multilingual models such as XLM-R. We also discovered, surprisingly, that English is not necessarily the most effective zero-shot training language for XNED into English. For instance, Spanish is more effective when training a zero-shot XNED system that disambiguates Basque mentions with respect to an English knowledge graph.
This work has been partially funded by the Basque Government (IXA excellence research group (IT1343-19) and DeepText project), Project BigKnowledge (Ayudas Fundacion BBVA a equipos de investigacion cientifica 2018) and via the IARPA BETTER Program contract 2019-19051600006 (ODNI, IARPA activity).
Ander Barrena enjoys a post-doctoral grant ESPDOC18/101 from the UPV/EHU and also acknowledges the support of the NVIDIA Corporation with the donation of a Titan V GPU used for this research. The author thankfully acknowledges the computer resources at CTE-Power9 + V100 and technical support provided by Barcelona Supercomputing Center (RES-IM-2020-1-0020)
Youth interaction with television and online video content in the digital age
This article examines the relationship of university students with television and online video content. Convergence processes in many areas during the digital age have significantly changed both audiovisual content consumption patterns and the content on offer itself. In addition, Web 2.0 has made it possible for interaction to go beyond mere consumption. The purpose of this research study was to ascertain what kind of interaction takes place between young people and audiovisual content. The categories analyzed are watch, share and create, with a focus on students’ everyday life. A mixed-method approach was used across a sample of 475 students from Mondragon University. Our main finding is that, although young people have the resources necessary to interact with media, this condition is not sufficient to favor more active behaviors. Young people show different practices and attitudes depending on the individual, the content, and the context but, in general, their patterns of interaction with television and online video content have more in common with the mass communication paradigm than with the new communicative paradigm that arose in the Web 2.0 era.