Identifying high-impact sub-structures for convolution kernels in document-level sentiment classification
Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of the syntactic structure used. It is therefore an important challenge to automatically identify high-impact sub-structures relevant to a given task. In this paper we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of sub-structures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees, guided by a polarity lexicon, yield a 1.45-point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus.
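The lexicon-guided idea above can be sketched in a few lines. This is a toy illustration, not the paper's exact method: the tree representation, the pruning rule, and the `POLARITY` lexicon are all hypothetical. It keeps only the minimal sub-structure of a parse tree that spans words found in a polarity lexicon, which could then be fed to a tree kernel instead of the full tree.

```python
# Toy sketch of lexicon-guided sub-structure extraction (hypothetical
# details, not the paper's actual algorithm). Trees are (label, children)
# tuples with children as a tuple of subtrees.

POLARITY = {"great", "awful"}  # toy polarity lexicon (assumption)

def prune(tree):
    """Keep a node iff its subtree contains a polarity-lexicon word."""
    label, children = tree
    kept = [p for c in children if (p := prune(c)) is not None]
    if kept or label.lower() in POLARITY:
        return (label, tuple(kept))
    return None

# "the plot was great" as a toy constituency tree
t = ("S", (("NP", (("the", ()), ("plot", ()))),
           ("VP", (("was", ()), ("great", ())))))
print(prune(t))  # -> ('S', (('VP', (('great', ()),)),))
```

The pruned tree retains only the spine from the root down to the polarity word, which is one way to read "minimal sub-structures guided by a polarity lexicon".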
Convolution Kernels for Subjectivity Detection
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011).
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), 254-261.
© 2011 The editors and contributors.
Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/16955
Root-Weighted Tree Automata and their Applications to Tree Kernels
In this paper, we define a new kind of weighted tree automata where the weights are only supported by final states. We show that these automata are sequentializable and we study their closures under classical regular and algebraic operations. We then use these automata to compute the subtree kernel of two finite tree languages in an efficient way. Finally, we present some perspectives involving root-weighted tree automata.
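For context, the quantity the automata compute efficiently is the classic subtree kernel: the number of pairs of identical complete subtrees shared by two trees. A minimal brute-force sketch (not the paper's automaton-based construction; the tuple tree representation is an assumption):

```python
# Naive subtree kernel: K(t1, t2) = number of pairs of identical complete
# subtrees. Trees are (label, children) tuples; each subtree is reduced to
# a canonical hashable form and counted.
from collections import Counter

def collect(tree, bag):
    """Add the canonical form of every complete subtree of `tree` to `bag`."""
    label, children = tree
    canon = (label, tuple(collect(c, bag) for c in children))
    bag[canon] += 1
    return canon

def subtree_kernel(t1, t2):
    b1, b2 = Counter(), Counter()
    collect(t1, b1)
    collect(t2, b2)
    return sum(b1[s] * b2[s] for s in b1 if s in b2)

# Example: S(NP(a), VP(b)) vs S(NP(a), VP(c)) share the subtrees a and NP(a)
t1 = ("S", (("NP", (("a", ()),)), ("VP", (("b", ()),))))
t2 = ("S", (("NP", (("a", ()),)), ("VP", (("c", ()),))))
print(subtree_kernel(t1, t2))  # -> 2
```

This brute-force version is quadratic in the number of subtrees; the point of the weighted-automata construction is precisely to compute the same kernel over whole (possibly large or infinite) tree languages more efficiently.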
Handling Tree-Structured Values in RapidMiner
Attribute value types play an important role in almost every data-mining task. Most learners, for instance, are restricted to particular value types and can only be used after special forms of preprocessing. RapidMiner most commonly distinguishes between nominal and numerical values, which are well known to every RapidMiner user. Although these cover a large fraction of the attribute types present in today's data-mining tasks, nominal and numerical attribute values are not sufficient for every type of feature. In this work we focus on attribute values containing a tree structure. We present their handling and, in particular, the possibilities of using tree-structured data for modelling. Additionally, we introduce particular tasks which offer tree-structured data and might benefit from using those structures for modelling. All methods presented in this paper are contained in the Information Extraction Plugin for RapidMiner.
Exploiting Entity BIO Tag Embeddings and Multi-task Learning for Relation Extraction with Imbalanced Data
In practical scenarios, relation extraction needs to first identify entity pairs that have a relation and then assign a correct relation class. However, the number of non-relation entity pairs in context (negative instances) usually far exceeds that of the others (positive instances), which negatively affects a model's performance. To mitigate this problem, we propose a multi-task architecture which jointly trains a model to perform relation identification with cross-entropy loss and relation classification with ranking loss. Meanwhile, we observe that a sentence may have multiple entities and relation mentions, and the patterns in which the entities appear in a sentence may contain useful semantic information that can be utilized to distinguish between positive and negative instances. Thus we further incorporate the embeddings of character-wise/word-wise BIO tags from the named entity recognition task into character/word embeddings to enrich the input representation. Experiment results show that our proposed approach can significantly improve the performance of a baseline model, with a more than 10% absolute increase in F1-score, and outperform the state-of-the-art models on the ACE 2005 Chinese and English corpora. Moreover, BIO tag embeddings are particularly effective and can be used to improve other models as well.
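The input-enrichment idea is easy to sketch. The following is a minimal illustration, not the paper's model: the vocabularies, embedding dimensions, and the `enrich` helper are all hypothetical. Each token's word embedding is concatenated with an embedding of its BIO tag from NER, so downstream layers see where entities begin and end.

```python
# Sketch of BIO-tag-enriched input representations (hypothetical vocab/dims).
import numpy as np

rng = np.random.default_rng(0)
word_vocab = {"John": 0, "works": 1, "at": 2, "Google": 3}
bio_vocab = {"O": 0, "B-PER": 1, "I-PER": 2, "B-ORG": 3, "I-ORG": 4}

word_emb = rng.normal(size=(len(word_vocab), 50))  # 50-d word embeddings
bio_emb = rng.normal(size=(len(bio_vocab), 10))    # 10-d BIO tag embeddings

def enrich(tokens, tags):
    """Concatenate word and BIO-tag embeddings per token: shape (n, 60)."""
    w = word_emb[[word_vocab[t] for t in tokens]]
    b = bio_emb[[bio_vocab[t] for t in tags]]
    return np.concatenate([w, b], axis=1)

x = enrich(["John", "works", "at", "Google"], ["B-PER", "O", "O", "B-ORG"])
print(x.shape)  # -> (4, 60)
```

In the paper this enrichment is applied at both the character and word level; the sketch shows the word-level case only.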