Pre-train, Interact, Fine-tune: A Novel Interaction Representation for Text Classification
Text representation can aid machines in understanding text. Previous work on
text representation often focuses on the so-called forward implication, i.e.,
preceding words are taken as the context of later words for creating
representations, thus ignoring the fact that the semantics of a text segment is
a product of the mutual implication of words in the text: later words
contribute to the meaning of preceding words. We introduce the concept of
interaction and propose a two-perspective interaction representation that
encapsulates a local and a global interaction representation. Here, a local
interaction representation is one that interacts among words with
parent-child relationships in the syntactic tree, and a global interaction
representation is one that interacts among all the words in a sentence. We
combine the two interaction representations to develop a Hybrid Interaction
Representation (HIR).
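As a rough illustration (not the paper's actual model), the two interaction views can be sketched with plain dot-product attention: the global view lets every word attend to every other word, while the local view masks attention down to parent-child pairs from a dependency parse. The `edges` input, the additive mask, and the convex-combination fusion with weight `alpha` are all assumptions made for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_interaction(H):
    """Global view: every word attends to every word in the sentence."""
    scores = H @ H.T / np.sqrt(H.shape[1])
    return softmax(scores) @ H

def local_interaction(H, edges):
    """Local view: each word interacts only with its parent/children
    on the syntactic tree (plus itself)."""
    n, d = H.shape
    mask = np.full((n, n), -np.inf)
    np.fill_diagonal(mask, 0.0)           # a word always sees itself
    for parent, child in edges:
        mask[parent, child] = 0.0
        mask[child, parent] = 0.0
    scores = H @ H.T / np.sqrt(d) + mask  # -inf blocks non-tree pairs
    return softmax(scores) @ H

def hybrid_interaction(H, edges, alpha=0.5):
    """Hypothetical fusion of the two views by convex combination."""
    return alpha * local_interaction(H, edges) + (1 - alpha) * global_interaction(H)

# toy sentence of 4 words with 8-dimensional embeddings
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))
edges = [(1, 0), (1, 2), (2, 3)]  # parent-child pairs from a dependency parse
print(hybrid_interaction(H, edges).shape)  # (4, 8)
```

The local mask keeps the representation syntax-aware, while the global term preserves long-range word-to-word interactions; the actual HIR combination in the paper may differ.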
Inspired by existing feature-based and fine-tuning-based pretrain-finetuning
approaches to language models, we integrate the advantages of feature-based and
fine-tuning-based methods to propose the Pre-train, Interact, Fine-tune (PIF)
architecture.
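The PIF idea can be caricatured in a few lines: keep the pretrained encoder's output frozen as features (feature-based), insert an interaction layer, and update only the interaction and classifier weights (fine-tuning-based). Everything below, including the class name, the single-matrix interaction layer, and mean pooling, is a simplified assumption for illustration, not the authors' architecture.

```python
import numpy as np

class PIFClassifier:
    """Pre-train, Interact, Fine-tune sketch: frozen pretrained features,
    a trainable pairwise interaction layer, and a task-specific head."""

    def __init__(self, dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # only these two matrices would be updated during fine-tuning
        self.W_int = rng.standard_normal((dim, dim)) * 0.1
        self.W_out = rng.standard_normal((dim, n_classes)) * 0.1

    def forward(self, H):
        """H: (n_words, dim) embeddings from a frozen pretrained encoder."""
        scores = H @ self.W_int @ H.T               # pairwise word interactions
        scores -= scores.max(axis=-1, keepdims=True)
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)
        pooled = (attn @ H).mean(axis=0)            # sentence representation
        return pooled @ self.W_out                  # class logits

model = PIFClassifier(dim=8, n_classes=3)
H = np.random.default_rng(1).standard_normal((5, 8))
print(model.forward(H).shape)  # (3,)
```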
We evaluate our proposed models on five widely-used datasets for text
classification tasks. Our ensemble method outperforms state-of-the-art
baselines with improvements ranging from 2.03% to 3.15% in terms of error rate.
In addition, we find that the improvement of PIF over most state-of-the-art
methods is not affected by increasing text length.

Comment: 32 pages, 5 figures
Robust Parsing for Ungrammatical Sentences
Natural Language Processing (NLP) is a research area that specializes in studying computational approaches to human language. However, not all natural language sentences are grammatically correct. Sentences that are ungrammatical, awkward, or too casual/colloquial appear in a variety of NLP applications, from product reviews and social media analysis to intelligent language tutors and multilingual processing. In this thesis, we focus on parsing, because it is an essential component of many NLP applications. We investigate in what ways the performance of statistical parsers degrades when dealing with ungrammatical sentences. We also hypothesize that breaking up parse trees at problematic parts prevents NLP applications from degrading due to incorrect syntactic analysis.
A parser is robust if it can overlook problems such as grammar mistakes and produce a parse tree that closely resembles the correct analysis of the intended sentence. We develop a robustness evaluation metric and conduct a series of experiments to compare the performance of state-of-the-art parsers on ungrammatical sentences. The evaluation results show that ungrammatical sentences present challenges for statistical parsers, because the well-formed syntactic trees they produce may not be appropriate for ungrammatical sentences. We also define a new framework for reviewing the parses of ungrammatical sentences and extracting the coherent parts whose syntactic analyses make sense. We call this task parse tree fragmentation. The experimental results suggest that the proposed fragmentation framework is a promising way to handle syntactically unusual sentences.
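One plausible reading of parse tree fragmentation, sketched here under assumptions of our own (per-edge confidence scores and a cut threshold, neither of which the abstract specifies), is: cut dependency edges whose confidence is too low, then return the resulting connected components as coherent fragments.

```python
from collections import defaultdict

def fragment_parse(edges, scores, threshold=0.5):
    """Hypothetical fragmentation: drop dependency edges whose confidence
    falls below `threshold`, then return the connected components of the
    remaining graph as separate fragments.

    edges:  list of (head, dependent) word-index pairs
    scores: confidence per edge, parallel to `edges`
    """
    adj = defaultdict(set)
    nodes = set()
    for (h, d), s in zip(edges, scores):
        nodes.update((h, d))
        if s >= threshold:        # keep only trusted attachments
            adj[h].add(d)
            adj[d].add(h)
    # collect fragments (connected components) via depth-first search
    fragments, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            comp.append(n)
            stack.extend(adj[n] - seen)
        fragments.append(sorted(comp))
    return fragments

# toy five-word sentence with one dubious attachment (edge 3 -> 4)
edges  = [(1, 0), (2, 1), (2, 3), (3, 4)]
scores = [0.9, 0.8, 0.7, 0.2]
print(fragment_parse(edges, scores))  # [[0, 1, 2, 3], [4]]
```

The design choice here is that a fragment is useful exactly when all of its internal attachments are trustworthy; the thesis's actual criteria for where to break the tree may be richer than a single score threshold.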