Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Path
Relation classification is an important research arena in the field of
natural language processing (NLP). In this paper, we present SDP-LSTM, a novel
neural network to classify the relation of two entities in a sentence. Our
neural architecture leverages the shortest dependency path (SDP) between two
entities; multichannel recurrent neural networks, with long short term memory
(LSTM) units, pick up heterogeneous information along the SDP. Our proposed
model has several distinct features: (1) The shortest dependency paths retain
most relevant information (to relation classification), while eliminating
irrelevant words in the sentence. (2) The multichannel LSTM networks allow
effective information integration from heterogeneous sources over the
dependency paths. (3) A customized dropout strategy regularizes the neural
network to alleviate overfitting. We test our model on the SemEval 2010
relation classification task, and achieve an F1-score of 83.7%, higher than
competing methods in the literature.
Comment: EMNLP '1
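The shortest dependency path mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' code: the function name `shortest_dependency_path`, the example sentence, and its hand-written (head, dependent) edges are all assumptions made for illustration, and the sketch assumes each token appears only once in the sentence. It treats the dependency tree as an undirected graph and runs a breadth-first search between the two entity tokens.

```python
from collections import deque

def shortest_dependency_path(edges, source, target):
    """Return the list of tokens on the shortest path from source to target,
    treating the dependency tree as an undirected graph. Returns None if
    the two tokens are not connected."""
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)

    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical, hand-written parse (an assumption, not parser output) of:
# "A trillion gallons of water have been poured into an empty region"
edges = [
    ("poured", "gallons"), ("gallons", "A"), ("gallons", "trillion"),
    ("gallons", "of"), ("of", "water"), ("poured", "have"),
    ("poured", "been"), ("poured", "into"), ("into", "region"),
    ("region", "an"), ("region", "empty"),
]

path = shortest_dependency_path(edges, "water", "region")
print(path)
```

Words off the path, such as "trillion" or "empty" in this toy parse, are dropped, which is the sense in which the path discards words irrelevant to the relation between the two entities.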
Multi-lingual Opinion Mining on YouTube
To successfully apply opinion mining (OM) to the large amounts of user-generated content produced every day, we need robust models that can handle noisy input well yet can easily be adapted to a new domain or language. We here focus on opinion mining for YouTube by (i) modeling classifiers that predict the type of a comment and its polarity, while distinguishing whether the polarity is directed towards the product or the video; (ii) proposing a robust shallow syntactic structure (STRUCT) that adapts well when tested across domains; and (iii) evaluating the effectiveness of the proposed structure on two languages, English and Italian. We rely on tree kernels to automatically extract and learn features with better generalization power than traditionally used bag-of-words models. Our extensive empirical evaluation shows that (i) STRUCT outperforms the bag-of-words model within the same domain (up to 2.6% and 3% absolute improvement for Italian and English, respectively); (ii) it is particularly useful when tested across domains (more than 4% absolute improvement for both languages), especially when little training data is available (up to 10% absolute improvement); and (iii) the proposed structure is also effective in a lower-resource language scenario, where only less accurate linguistic processing tools are available.
Relation Adversarial Network for Low Resource Knowledge Graph Completion
Knowledge Graph Completion (KGC) has been proposed to improve Knowledge
Graphs by filling in missing connections via link prediction or relation
extraction. One of the main difficulties for KGC is a low resource problem.
Previous approaches assume sufficient training triples to learn versatile
vectors for entities and relations, or a satisfactory number of labeled
sentences to train a competent relation extraction model. However, low resource
relations are very common in KGs, and those newly added relations often do not
have many known samples for training. In this work, we aim at predicting new
facts under a challenging setting where only limited training instances are
available. We propose a general framework called Weighted Relation Adversarial
Network, which utilizes an adversarial procedure to help adapt
knowledge/features learned from high resource relations to different but
related low resource relations. Specifically, the framework takes advantage of
a relation discriminator to distinguish between samples from different
relations, and help learn relation-invariant features more transferable from
source relations to target relations. Experimental results show that the
proposed approach outperforms previous methods in low-resource settings
for both link prediction and relation extraction.
Comment: WWW202
BEKG: A Built Environment Knowledge Graph
Practices in the built environment have become more digitalized with the
rapid development of modern design and construction technologies. However, the
requirement of practitioners or scholars to gather complicated professional
knowledge in the built environment has not been satisfied yet. In this paper,
more than 80,000 paper abstracts in the built environment field were obtained
to build a knowledge graph, a knowledge base storing entities and their
connective relations in a graph-structured data model. To ensure the retrieval
accuracy of the entities and relations in the knowledge graph, two
well-annotated datasets have been created, containing 2,000 instances and 1,450
instances each in 29 relations for the named entity recognition task and
relation extraction task respectively. These two tasks were solved by two
BERT-based models trained on the proposed dataset. Both models attained an
accuracy above 85% on these two tasks. More than 200,000 high-quality relations
and entities were extracted by applying these models to all of the abstracts.
Finally, this knowledge graph is presented through a self-developed visualization
system to reveal relations between various entities in the domain. Both the
source code and the annotated dataset can be found here:
https://github.com/HKUST-KnowComp/BEKG