Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
This paper proposes a novel approach for relation extraction from free text
which is trained to jointly use information from the text and from existing
knowledge. Our model is based on two scoring functions that operate by learning
low-dimensional embeddings of words and of entities and relationships from a
knowledge base. We empirically show on New York Times articles aligned with
Freebase relations that our approach is able to efficiently use the extra
information provided by a large subset of Freebase data (4M entities, 23k
relationships) to improve over existing methods that rely on text features
alone.
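To make the "two scoring functions" concrete, here is a minimal sketch. It assumes a text-side score that sums the word embeddings of a mention and takes a dot product with a relation embedding, a translation-style knowledge-base score (head + relation should land near tail), and a plain sum of the two when predicting a relation. The function names, the toy data layout and the additive combination are hypothetical choices for illustration, not the paper's exact formulation.

```python
from typing import Dict, List, Sequence

Vec = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def text_score(words: List[str], relation: str,
               word_emb: Dict[str, Vec], rel_text_emb: Dict[str, Vec]) -> float:
    # Assumption: the mention is the sum of its word embeddings,
    # scored against the relation's text-side embedding by dot product.
    mention = [0.0] * len(rel_text_emb[relation])
    for w in words:
        if w in word_emb:  # skip out-of-vocabulary words
            mention = [m + x for m, x in zip(mention, word_emb[w])]
    return dot(mention, rel_text_emb[relation])


def kb_score(head: str, relation: str, tail: str,
             ent_emb: Dict[str, Vec], rel_kb_emb: Dict[str, Vec]) -> float:
    # Assumption: translation-style score, highest (0) when head + relation == tail.
    diff = [h + r - t for h, r, t in
            zip(ent_emb[head], rel_kb_emb[relation], ent_emb[tail])]
    return -sum(d * d for d in diff)


def predict_relation(words, head, tail, relations,
                     word_emb, rel_text_emb, ent_emb, rel_kb_emb) -> str:
    # Hypothetical combination rule: add the two scores, keep the best relation.
    return max(relations, key=lambda r:
               text_score(words, r, word_emb, rel_text_emb)
               + kb_score(head, r, tail, ent_emb, rel_kb_emb))
```

The KB term lets a relation that is weakly supported by the sentence still win when the entity pair already fits it well in the knowledge base, which is the kind of extra information the abstract refers to.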
A Generative Model of Words and Relationships from Multiple Sources
Neural language models are a powerful tool to embed words into semantic
vector spaces. However, learning such models generally relies on the
availability of abundant and diverse training examples. In highly specialised
domains this requirement may not be met due to difficulties in obtaining a
large corpus, or the limited range of expression in average use. Such domains
may encode prior knowledge about entities in a knowledge base or ontology. We
propose a generative model which integrates evidence from diverse data sources,
enabling the sharing of semantic information. We achieve this by generalising
the concept of co-occurrence from distributional semantics to include other
relationships between entities or words, which we model as affine
transformations on the embedding space. We demonstrate the effectiveness of
this approach by outperforming recent models on a link prediction task and
demonstrating its ability to profit from partially or fully unobserved
training labels. We further demonstrate the usefulness of learning from
different data sources with overlapping vocabularies.
Comment: 8 pages, 5 figures; incorporated feedback from reviewers; to appear
in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence
201
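The idea of modelling relationships "as affine transformations on the embedding space" can be sketched as follows, assuming each relation r has a matrix A and offset b, and that the probability of an object word given a subject is a softmax over dot products between context vectors and the transformed subject vector. The function names and the softmax parameterisation are assumptions for illustration; the paper's generative model may differ in detail.

```python
import math
from typing import Dict, List

Vec = List[float]
Matrix = List[List[float]]


def affine(A: Matrix, b: Vec, v: Vec) -> Vec:
    # Apply the relation-specific affine map A v + b.
    return [sum(a * x for a, x in zip(row, v)) + bi for row, bi in zip(A, b)]


def object_distribution(subject: str, A: Matrix, b: Vec,
                        word_vec: Dict[str, Vec],
                        context_vec: Dict[str, Vec]) -> Dict[str, float]:
    # Score each candidate object against the transformed subject vector,
    # then normalise with a numerically stable softmax.
    t = affine(A, b, word_vec[subject])
    scores = {o: sum(c * x for c, x in zip(vec, t)) for o, vec in context_vec.items()}
    top = max(scores.values())
    exps = {o: math.exp(s - top) for o, s in scores.items()}
    z = sum(exps.values())
    return {o: e / z for o, e in exps.items()}
```

Under this assumed parameterisation, ordinary co-occurrence is the special case A = identity and b = 0, which is one way to read the claim that the model generalises co-occurrence to other relationships.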
StarSpace: Embed All The Things!
We present StarSpace, a general-purpose neural embedding model that can solve
a wide variety of problems: labeling tasks such as text classification, ranking
tasks such as information retrieval/web search, collaborative filtering-based
or content-based recommendation, embedding of multi-relational graphs, and
learning word, sentence or document level embeddings. In each case the model
works by embedding those entities comprised of discrete features and comparing
them against each other -- learning similarities dependent on the task.
Empirical results on a number of tasks show that StarSpace is highly
competitive with existing methods, whilst also being generally applicable to
new cases where those methods are not.
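A minimal sketch of "embedding those entities comprised of discrete features and comparing them against each other", assuming an entity's embedding is the sum of its feature embeddings, cosine similarity as the comparison, and a margin ranking (hinge) loss against sampled negatives. The helper names, the margin of 0.2 and the `__label__` feature strings are hypothetical; the real StarSpace tool exposes its own options for similarity and loss.

```python
import math
from typing import Dict, List, Sequence

Vec = List[float]


def embed(features: Sequence[str], feat_emb: Dict[str, Vec]) -> Vec:
    # An entity is a bag of discrete features; its embedding is their sum.
    dim = len(next(iter(feat_emb.values())))
    out = [0.0] * dim
    for f in features:
        out = [o + x for o, x in zip(out, feat_emb[f])]
    return out


def cosine(u: Vec, v: Vec) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def margin_ranking_loss(lhs, positive, negatives, feat_emb, margin=0.2) -> float:
    # Hinge loss: the positive pair should beat each sampled negative by `margin`.
    a = embed(lhs, feat_emb)
    s_pos = cosine(a, embed(positive, feat_emb))
    return sum(max(0.0, margin - s_pos + cosine(a, embed(n, feat_emb)))
               for n in negatives)
```

For text classification, for example, `lhs` would be a document's words and `positive` its label; the same loss applies unchanged when the two sides are a user's past items and the next item, which is what makes the approach task-agnostic.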
NASARI: a novel approach to a Semantically-Aware Representation of items
The semantic representation of individual word senses and concepts is of fundamental importance to several applications in Natural Language Processing. To date, concept modeling techniques have in the main based their representation either on lexicographic resources, such as WordNet, or on encyclopedic resources, such as Wikipedia. We propose a vector representation technique that combines the complementary knowledge of both these types of resource. Thanks to its use of explicit semantics combined with a novel cluster-based dimensionality reduction and an effective weighting scheme, our representation attains state-of-the-art performance on multiple datasets in two standard benchmarks: word similarity and sense clustering. We are releasing our vector representations at http://lcl.uniroma1.it/nasari/
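For the word similarity benchmark, sense-level vectors have to be turned into a word-level score. The sketch below assumes explicit (sparse) vectors stored as dimension-to-weight dictionaries and scores a word pair by its closest pair of senses. Cosine is used here only as a stand-in comparison measure; the measure NASARI itself uses may differ, and the function names are hypothetical.

```python
import math
from typing import Dict, List

SparseVec = Dict[str, float]  # dimension name -> weight


def sparse_cosine(u: SparseVec, v: SparseVec) -> float:
    num = sum(w * v[k] for k, w in u.items() if k in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if nu == 0.0 or nv == 0.0 else num / (nu * nv)


def word_similarity(senses1: List[SparseVec], senses2: List[SparseVec]) -> float:
    # Assumption: a word pair is as similar as its closest pair of senses.
    return max((sparse_cosine(a, b) for a in senses1 for b in senses2), default=0.0)
```

Taking the maximum over sense pairs means an ambiguous word such as "bank" can match "river" through its geographic sense without being penalised by its financial one.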
Knowledge Graph Embedding with Iterative Guidance from Soft Rules
Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of
current research. Combining such an embedding model with logic rules has
recently attracted increasing attention. Most previous attempts made a one-time
injection of logic rules, ignoring the interaction between embedding
learning and logical inference. They also focused only on hard rules, which
always hold with no exception and usually require extensive manual effort to
create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a
novel paradigm of KG embedding with iterative guidance from soft rules. RUGE
enables an embedding model to learn simultaneously from 1) labeled triples that
have been directly observed in a given KG, 2) unlabeled triples whose labels
are going to be predicted iteratively, and 3) soft rules with various
confidence levels extracted automatically from the KG. In the learning process,
RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and
integrates such newly labeled triples to update the embedding model. Through
this iterative procedure, knowledge embodied in logic rules may be better
transferred into the learned embeddings. We evaluate RUGE in link prediction on
Freebase and YAGO. Experimental results show that: 1) with rule knowledge
injected iteratively, RUGE achieves significant and consistent improvements
over state-of-the-art baselines; and 2) despite their uncertainties,
automatically extracted soft rules are highly beneficial to KG embedding, even
those with moderate confidence levels. The code and data used for this paper
can be obtained from https://github.com/iieir-km/RUGE.
Comment: To appear in AAAI 201
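The step "RUGE iteratively queries rules to obtain soft labels for unlabeled triples" can be illustrated with a simplified sketch. It assumes rules of the form premise => conclusion, truth values computed with the product t-norm, and a soft label equal to the model's current score plus a push of C times the rule confidence times the premise truth, clipped to [0, 1]. The function names, the default C and the restriction to single-premise rules are assumptions; consult the released code for the exact procedure.

```python
from typing import List, Tuple


def implication_truth(premise: float, conclusion: float) -> float:
    # Product t-norm truth value of the grounding (premise => conclusion).
    return premise * conclusion - premise + 1.0


def soft_label(model_score: float, groundings: List[Tuple[float, float]],
               C: float = 1.0) -> float:
    # groundings: (rule confidence, truth of the premise) for each rule
    # grounding whose conclusion is this unlabeled triple. The derivative of
    # implication_truth with respect to the conclusion equals the premise
    # truth, so each grounding raises the label by C * confidence * premise.
    s = model_score + C * sum(conf * prem for conf, prem in groundings)
    return min(1.0, max(0.0, s))  # clip to a valid truth value in [0, 1]
```

In an iterative loop, each round would recompute `model_score` from the current embeddings, refresh the soft labels, and update the embeddings on the observed triples plus the soft-labeled ones; a rule with moderate confidence still contributes, only with a smaller push.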
Exploring Embeddings for Measuring Text Relatedness: Unveiling Sentiments and Relationships in Online Comments
After the COVID-19 pandemic caused internet usage to grow by 70%, there has
been an increased number of people all across the world using social media.
Applications like Twitter, Meta Threads, YouTube, and Reddit have become
increasingly pervasive, leaving almost no digital space where public opinion is
not expressed. This paper investigates sentiment and semantic relationships
among comments across various social media platforms, as well as discusses the
importance of shared opinions across these different media platforms, using
word embeddings to analyze components in sentences and documents. This allows
researchers, politicians, and business representatives to trace a path of
shared sentiment among users across the world. This research paper presents
multiple approaches that measure the relatedness of text extracted from user
comments on these popular online platforms. By leveraging embeddings, which
capture semantic relationships between words and help analyze sentiments across
the web, we can uncover connections regarding public opinion as a whole. The
study utilizes pre-existing datasets from YouTube, Reddit, Twitter, and more.
We made use of popular natural language processing models like Bidirectional
Encoder Representations from Transformers (BERT) to analyze sentiments and
explore relationships between comment embeddings. Additionally, we aim to
utilize clustering and KL divergence to find semantic relationships within
these comment embeddings across various social media platforms. Our analysis
will enable a deeper understanding of the interconnectedness of online comments
and will investigate the notion of the internet functioning as a large
interconnected brain.
Comment: 6 pages, 5 figures, 3 tables, accepted to the Second International
Conference on Informatics (ICI-2023)
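One way to combine clustering with KL divergence across platforms is sketched below. It assumes comment embeddings have already been clustered (for instance with k-means) into k shared clusters, turns each platform's cluster labels into a smoothed distribution, and compares platforms with KL(P || Q). The function names and the add-alpha smoothing are assumptions, not necessarily the paper's pipeline.

```python
import math
from collections import Counter
from typing import List, Sequence


def cluster_distribution(labels: Sequence[int], k: int, alpha: float = 1.0) -> List[float]:
    # Share of a platform's comments in each of k clusters, with add-alpha
    # smoothing so that no cluster has zero probability.
    counts = Counter(labels)
    total = len(labels) + alpha * k
    return [(counts.get(c, 0) + alpha) / total for c in range(k)]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(P || Q) in nats; asymmetric, and zero when P == Q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

Smoothing matters here: without it, a cluster that appears on one platform but not another would make the divergence infinite. Because KL is asymmetric, comparing two platforms in both directions (or using a symmetrised variant) may be preferable.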
Automatic Synonym Discovery with Knowledge Bases
Recognizing entity synonyms from text has become a crucial task in many
entity-leveraging applications. However, discovering entity synonyms from
domain-specific text corpora (e.g., news articles, scientific papers) is rather
challenging. Current systems take an entity name string as input to find out
other names that are synonymous, ignoring the fact that a name string can
often refer to multiple entities (e.g., "apple" could refer to both Apple
Inc. and the fruit apple). Moreover, most existing methods require training data
manually created by domain experts to construct supervised-learning systems. In
this paper, we study the problem of automatic synonym discovery with knowledge
bases, that is, identifying synonyms for knowledge base entities in a given
domain-specific corpus. The manually-curated synonyms for each entity stored in
a knowledge base not only form a set of name strings to disambiguate the
meaning for each other, but also can serve as "distant" supervision to help
determine important features for the task. We propose a novel framework, called
DPE, to integrate two kinds of mutually-complementing signals for synonym
discovery, i.e., distributional features based on corpus-level statistics and
textual patterns based on local contexts. In particular, DPE jointly optimizes
the two kinds of signals in conjunction with distant supervision, so that they
can mutually enhance each other in the training stage. At the inference stage,
both signals will be utilized to discover synonyms for the given entities.
Experimental results prove the effectiveness of the proposed framework.
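The "distant" supervision described above can be sketched as follows. It assumes the knowledge base maps entity IDs to sets of curated names, treats names of the same entity as positive pairs and names of different entities as negative pairs, and then ranks candidate strings for an entity with a naive weighted sum of a distributional score and a pattern score. The function names and the weighting are hypothetical; DPE itself optimizes the two signals jointly rather than fusing fixed scores after the fact.

```python
import itertools
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def distant_pairs(kb_synonyms: Dict[str, Set[str]]) -> Tuple[Set[Pair], Set[Pair]]:
    # Names listed for the same entity are positive pairs; names of two
    # different entities are negative pairs (unless also positive).
    pos: Set[Pair] = set()
    for names in kb_synonyms.values():
        pos.update(itertools.combinations(sorted(names), 2))
    neg: Set[Pair] = set()
    for e1, e2 in itertools.combinations(sorted(kb_synonyms), 2):
        for a in kb_synonyms[e1]:
            for b in kb_synonyms[e2]:
                if a != b:
                    neg.add(tuple(sorted((a, b))))
    return pos, neg - pos


def rank_candidates(candidates: List[str], dist_score: Dict[str, float],
                    pattern_score: Dict[str, float], weight: float = 0.5) -> List[str]:
    # Naive late fusion of the two signals (hypothetical; DPE trains them jointly).
    return sorted(candidates, reverse=True,
                  key=lambda c: weight * dist_score.get(c, 0.0)
                  + (1.0 - weight) * pattern_score.get(c, 0.0))
```

With the "apple" example from the abstract, the knowledge base separates "Apple"/"Apple Inc." from "apple"/"Malus domestica", so the surface-similar pair ("Apple", "apple") becomes a negative training example rather than a false synonym.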