15 research outputs found

    Identifying high-impact sub-structures for convolution kernels in document-level sentiment classification

    Get PDF
    Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of the syntactic structures used. It is therefore an important challenge to automatically identify high-impact sub-structures relevant to a given task. In this paper we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of sub-structures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees, guided by a polarity lexicon, yield a 1.45-point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus.
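    The lexicon-guided extraction step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual method: the tree encoding (`(label, children)` tuples), the toy lexicon, and the function name `minimal_fragments` are all assumptions introduced here for illustration.

    ```python
    # Toy polarity lexicon (an assumption; the paper uses a real sentiment lexicon).
    POLARITY_LEXICON = {"great", "awful", "boring"}

    def minimal_fragments(tree, lexicon=POLARITY_LEXICON):
        """Collect minimal head-modifier fragments around polarity words.

        A tree node is a (label, children) tuple. A fragment is kept when
        the head itself, or one of its direct modifiers, is a polarity word.
        """
        label, children = tree
        frags = []
        child_labels = [c[0] for c in children]
        if label in lexicon or any(l in lexicon for l in child_labels):
            frags.append((label, tuple(child_labels)))
        for ch in children:
            frags.extend(minimal_fragments(ch, lexicon))
        return frags
    ```

    On a toy dependency tree such as `("movie", (("great", ()), ("the", ())))`, this keeps the head-modifier fragment around "great" and discards sentiment-neutral parts of the tree.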

    Distributional lexical semantics: toward uniform representation paradigms for advanced acquisition and processing tasks

    Get PDF
    The distributional hypothesis states that words with similar distributional properties have similar semantic properties (Harris 1968). This perspective on word semantics was discussed early on in linguistics (Firth 1957; Harris 1968), and then successfully applied to Information Retrieval (Salton, Wong and Yang 1975). In Information Retrieval, distributional notions (e.g. document frequency and word co-occurrence counts) have proved a key factor of success, as opposed to early logic-based approaches to relevance modeling (van Rijsbergen 1986; Chiaramella and Chevallet 1992; van Rijsbergen and Lalmas 1996).

    Cross-language frame semantics transfer in bilingual corpora

    Get PDF
    Abstract. Recent work on the transfer of semantic information across languages has been applied to the development of resources annotated with Frame information for different non-English European languages. These works are based on the assumption that parallel corpora annotated for English can be used to transfer the semantic information to the other target languages. In this paper, a robust method based on a statistical machine translation step augmented with simple rule-based post-processing is presented. It alleviates problems related to preprocessing errors and the complex optimization required by syntax-dependent models of the cross-lingual mapping. Different alignment strategies are investigated here against the Europarl corpus. Results suggest that the quality of the derived annotations is surprisingly good and well suited for training semantic role labeling systems.

    Because Syntax does Matter: Improving Predicate-Argument Structures Parsing Using Syntactic Features

    Get PDF
    Parsing full-fledged predicate-argument structures in a deep syntax framework requires graphs to be predicted. Using the DeepBank (Flickinger et al., 2012) and the Predicate-Argument Structure treebank (Miyao and Tsujii, 2005) as a test field, we show how transition-based parsers, extended to handle connected graphs, benefit from the use of topologically different syntactic features such as dependencies, tree fragments, spines or syntactic paths, which bring much-needed context to the parsing models, improving notably over long-distance dependencies and elided coordinate structures. By confirming this positive impact on an accurate 2nd-order graph-based parser (Martins and Almeida, 2014), we establish a new state-of-the-art on these data sets.

    Structured lexical similarity via convolution kernels on dependency trees

    Get PDF
    A central topic in natural language processing is the design of lexical and syntactic features suitable for the target application. In this paper, we study convolution dependency tree kernels for automatic engineering of syntactic and semantic patterns exploiting lexical similarities. We define efficient and powerful kernels for measuring the similarity between dependency structures whose lexical nodes have partly or completely different surface forms. Experiments with such kernels for question classification show unprecedented results, e.g. a 41% error reduction over the former state of the art. Additionally, semantic role classification confirms the benefit of semantic smoothing for dependency kernels.
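    The convolution tree kernels studied above count shared substructures between two trees. A minimal Collins-Duffy-style sketch of that idea is below; the tuple-based tree encoding, the helper names (`tree_kernel`, `_nodes`, `_c`), and the simplified matching condition are assumptions for illustration, not the paper's implementation (which additionally smooths over lexical similarity).

    ```python
    def tree_kernel(t1, t2, lam=0.5):
        """Sum decayed counts of common subtrees over all node pairs.

        Trees are (label, (child, ...)) tuples; lam is the decay factor
        that down-weights larger matching fragments.
        """
        return sum(_c(a, b, lam) for a in _nodes(t1) for b in _nodes(t2))

    def _nodes(t):
        # Flatten the tree into a list of its nodes (pre-order).
        label, children = t
        out = [t]
        for ch in children:
            out.extend(_nodes(ch))
        return out

    def _c(n1, n2, lam):
        # Nodes match only when labels and child-label sequences agree
        # (the "same production" condition of the original kernel).
        if n1[0] != n2[0]:
            return 0.0
        c1, c2 = n1[1], n2[1]
        if [c[0] for c in c1] != [c[0] for c in c2]:
            return 0.0
        if not c1:          # matching leaves
            return lam
        prod = lam
        for a, b in zip(c1, c2):
            prod *= 1.0 + _c(a, b, lam)
        return prod
    ```

    Comparing a tree with itself yields the sum of its (decayed) subtree counts, while trees differing in a single lexical node lose every fragment that node participates in; lexical smoothing, as proposed in the paper, replaces the hard label-equality test with a graded similarity.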

    Enhanced discriminative models with tree kernels and unsupervised training for entity detection

    Get PDF
    This work explores two approaches to improving the discriminative models that are commonly used nowadays for entity detection: tree kernels and unsupervised training. Feature-rich classifiers have been widely adopted by the Natural Language Processing (NLP) community because of their powerful modeling capacity and their support for correlated features, which allows separating the expert task of designing features from the core learning method. The first proposed approach consists in leveraging fast and efficient linear models with unsupervised training, thanks to a recently proposed approximation of the classifier risk, an appealing method that provably converges towards the minimum risk without any labeled corpus. In the second proposed approach, tree kernels are used with support vector machines to exploit dependency structures for entity detection, which relieves designers of the burden of carefully designing rich syntactic features by hand. We study both approaches on the same task and corpus and show that they offer interesting alternatives to supervised learning for entity recognition.

    Discriminative Reranking for Spoken Language Understanding

    Full text link

    Kernel engineering on parse trees

    Get PDF
    Ph.D. (Doctor of Philosophy)