304 research outputs found

Identifying high-impact sub-structures for convolution kernels in document-level sentiment classification

Author: Foster Jennifer
He Yifan
Liu Qun
Shouxun Lin
Tu Zhaopeng
van Genabith Josef
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 11/07/2012

Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of syntactic structure used. It is therefore an important challenge to automatically identify high-impact sub-structures relevant to a given task. In this paper we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of sub-structures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees, guided by a polarity lexicon, yield a 1.45-point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus.

Irish Universities

DCU Online Research Access Service

Structured lexical similarity via convolution Kernels on dependency trees

Author: Basili R
Croce D
Moschitti A
Publication venue: Association for computational linguistics
Publication date: 01/01/2011

A central topic in natural language processing is the design of lexical and syntactic features suitable for the target application. In this paper, we study convolution dependency tree kernels for automatic engineering of syntactic and semantic patterns exploiting lexical similarities. We define efficient and powerful kernels for measuring the similarity between dependency structures whose lexical nodes have partly or completely different surface forms. The experiments with such kernels for question classification show unprecedented results, e.g. a 41% error reduction over the former state of the art. Additionally, semantic role classification confirms the benefit of semantic smoothing for dependency kernels.

CiteSeerX

ART

Tree similarity measurement for classifying questions by syntactic structures

Author: A Moschitti
B Croft
C Elzinga
D Croce
J Shawe-Taylor
K Zhang
M Mittendorfer
Z Lin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/07/2016

Queen's University Belfast Research Portal

Crossref

Ulster University's Research Portal

Convolution Kernels for Subjectivity Detection

Author: Klakow Dietrich
Wiegand Michael
Publication date: 10/05/2011

Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 254-261. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

DSpace at Tartu University Library

Root-Weighted Tree Automata and their Applications to Tree Kernels

Author: Mignot Ludovic
Ouali-Sebti Nadia
Ziadi Djelloul
Publication date: 01/01/2015

In this paper, we define a new kind of weighted tree automata in which weights are carried only by final states. We show that these automata are sequentializable and we study their closure under classical regular and algebraic operations. We then use these automata to compute the subtree kernel of two finite tree languages efficiently. Finally, we present some perspectives on root-weighted tree automata.
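For orientation, the subtree kernel mentioned in this abstract can be illustrated on two individual trees. The sketch below is a naive counting version, not the automata-based construction the paper describes, and it assumes the common definition in which the kernel counts pairs of identical complete subtrees. The nested-tuple tree encoding and the names `subtree_counts` and `subtree_kernel` are hypothetical, chosen only for this example.

```python
from collections import Counter

# Assumed encoding: a tree is a nested tuple (label, child, child, ...);
# a leaf is a one-element tuple such as ("N",).

def subtree_counts(tree, counts=None):
    """Count every complete subtree of `tree` (a node plus all its descendants)."""
    if counts is None:
        counts = Counter()
    counts[tree] += 1              # the subtree rooted at this node
    for child in tree[1:]:         # recurse into the children
        subtree_counts(child, counts)
    return counts

def subtree_kernel(t1, t2):
    """Number of node pairs (one from each tree) rooting identical subtrees."""
    c1 = subtree_counts(t1)
    c2 = subtree_counts(t2)
    # A Counter returns 0 for missing keys, so unmatched subtrees add nothing.
    return sum(n * c2[s] for s, n in c1.items())

# Toy trees, invented for illustration only.
np_ = ("NP", ("D",), ("N",))
t1 = ("S", np_, ("VP", ("V",)))
t2 = ("S", np_, ("VP", ("V",), np_))

print(subtree_kernel(t1, t2))  # shared subtrees: NP, D, N (twice each in t2) and V
```

This direct version hashes every subtree of both inputs, so it only handles two explicit trees; computing the kernel over whole tree languages is the part the paper addresses with automata.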

arXiv.org e-Print Archive

HAL - Normandie Université

A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature

The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein–protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed: convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmarking of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. Nevertheless, our study shows that three kernels are clearly superior to the other methods.
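The gap between AUC and F-score noted in this abstract can be reproduced on toy data. The sketch below is only an illustration: the labels, scores and thresholds are invented and are not taken from the study, and the helper names `auc` and `f1` are hypothetical. It shows a classifier whose ranking is perfect (AUC of 1.0) but whose F-score collapses when the decision threshold is not tuned to the data.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1(labels, scores, threshold):
    """F-score of the positive class when predicting 1 for score >= threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0                      # no true positives: F-score is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented toy data: positives are ranked above every negative,
# but all scores fall below a default threshold of 0.5.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.40, 0.35, 0.30, 0.20, 0.10, 0.05]

print(auc(labels, scores))        # ranking quality, threshold-free
print(f1(labels, scores, 0.5))    # untuned threshold
print(f1(labels, scores, 0.32))   # threshold tuned to this data
```

AUC depends only on how examples are ordered, whereas the F-score also depends on where the decision threshold sits, which is one reason the two metrics can disagree when no corpus-specific tuning is applied.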

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Survey on Kernel-Based Relation Extraction

Author: Choi Sung-Pil
Jung Hanmin
Lee Seungwoo
Song Sa-Kwang
Publication venue: 'IntechOpen'
Publication date: 21/11/2012

IntechOpen

Linguistic feature analysis for protein interaction extraction

Author: A Airola
A Moschitti
A Yakushiji
B Schölkopf
C Cortes
C Giuliano
C Nedellec
CC Chang
Chris Cornelis
D Haussler
H Lodhi
J Ding
J Xiao
JH Eom
K Fundel
M Collins
Martine De Cock
MF Porter
R Bunescu
R Saetre
RC Bunescu
S Katrenko
S Kim
S Pyysalo
S Van Landeghem
T Fayruzov
Timur Fayruzov
Veronique Hoste
Y Saeys
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The rapid growth in the number of publicly available reports on biomedical experimental results has recently caused a boost in text mining approaches for protein interaction extraction. Most approaches rely implicitly or explicitly on linguistic, i.e., lexical and syntactic, data extracted from text. However, only a few attempts have been made to evaluate the contribution of the different feature types. In this work, we contribute to this evaluation by studying the relative importance of deep syntactic features, i.e., grammatical relations, shallow syntactic features (part-of-speech information) and lexical features. For this purpose, we use a recently proposed approach that uses support vector machines with structured kernels. Results: Our results reveal that the contribution of the different feature types varies for the different data sets on which the experiments were conducted. The smaller the training corpus compared to the test data, the more important the role of grammatical relations becomes. Moreover, classifiers based on deep syntactic information prove to be more robust on heterogeneous texts where no or only limited common vocabulary is shared. Conclusion: Our findings suggest that grammatical relations play an important role in the interaction extraction task. Moreover, the net advantage of adding lexical and shallow syntactic features is small relative to the number of added features. This implies that efficient classifiers can be built by using only a small fraction of the features that are typically being used in recent approaches.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Ghent University Academic Bibliography

PubMed Central