4 research outputs found

    An Empirical Comparison of Parsing Methods for Stanford Dependencies

    Stanford typed dependencies are a widely desired representation of natural language sentences, but parsing is one of the major computational bottlenecks in text analysis systems. In light of the evolving definition of the Stanford dependencies and developments in statistical dependency parsing algorithms, this paper revisits the question of Cer et al. (2010): what is the tradeoff between accuracy and speed in obtaining Stanford dependencies in particular? We also explore the effects of input representations on this tradeoff: part-of-speech tags, the novel use of an alternative dependency representation as input, and distributional representations of words. We find that direct dependency parsing is a more viable solution than it was found to be in the past.

    An accompanying software release can be found at: http://www.ark.cs.cmu.edu/TBSD
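
    The speed half of that tradeoff comes down to throughput measurement. The following is a minimal sketch of timing a parser in tokens per second; toy_parser and the sentences are hypothetical stand-ins for the systems and corpora actually compared in the paper, not their implementations.

```python
import time

def toy_parser(tokens):
    # Hypothetical stand-in parser: attach every token to its left
    # neighbour (the first token to the root). A real dependency or
    # phrase-structure parser would be swapped in here.
    return list(range(len(tokens)))

# Made-up corpus, repeated to get a stable timing estimate.
sentences = [["A", "toy", "sentence", "."]] * 10_000

start = time.perf_counter()
for toks in sentences:
    toy_parser(toks)
elapsed = time.perf_counter() - start

n_tokens = sum(len(s) for s in sentences)
print(f"{n_tokens / elapsed:,.0f} tokens/sec")
```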

    Transforming Dependencies into Phrase Structures

    We present a new algorithm for transforming dependency parse trees into phrase-structure parse trees. We cast the problem as structured prediction and learn a statistical model. Our algorithm is faster than traditional phrase-structure parsing and achieves 90.4% English parsing accuracy and 82.4% Chinese parsing accuracy, near the state of the art on both benchmarks.
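
    To make the transformation concrete, the sketch below shows the naive "flat" conversion that learned models of this kind are typically compared against: each head word projects a single bracket over the yield of its subtree. This is a baseline illustration, not the paper's structured-prediction algorithm; the X nonterminal and the example tree are made up.

```python
def project(node, words, children):
    """Project one flat bracket over the yield of node's subtree."""
    kids = children.get(node, [])
    if not kids:
        return words[node]
    inner = " ".join(
        words[p] if p == node else project(p, words, children)
        for p in sorted(kids + [node])  # surface order
    )
    return f"(X {inner})"

words = ["We", "parse", "tweets"]  # 0-indexed tokens
heads = [1, -1, 1]                 # head index per token; -1 marks the root

children = {}
for i, h in enumerate(heads):
    if h >= 0:
        children.setdefault(h, []).append(i)

print(project(heads.index(-1), words, children))  # (X We parse tweets)
```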

    Dependency Parsing for Weibo: An Efficient Probabilistic Logic Programming Approach

    Dependency parsing is a core task in NLP, and it is widely used by many applications such as information extraction, question answering, and machine translation. In the era of social media, a big challenge is that parsers trained on traditional newswire corpora typically suffer from the domain mismatch issue, and thus perform poorly on social media data. We present a new GFL/FUDG-annotated Chinese treebank with more than 18K tokens from Sina Weibo (the Chinese equivalent of Twitter). We formulate the dependency parsing problem as many small and parallelizable arc prediction tasks: for each task, we use a programmable probabilistic first-order logic to infer the dependency arc of a token in the sentence. In experiments, we show that the proposed model outperforms an off-the-shelf Stanford Chinese parser, as well as a strong MaltParser baseline that is trained on the same in-domain data.
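
    A minimal sketch of that decomposition: each token's head is predicted as an independent task, so the tasks parallelize trivially. The arc scorer here is a made-up stand-in for the paper's probabilistic first-order logic inference, and root handling and tree well-formedness constraints are omitted.

```python
from concurrent.futures import ThreadPoolExecutor

def arc_score(head, dep):
    # Hypothetical scorer that favours nearby heads; the paper instead
    # infers each arc with programmable probabilistic first-order logic.
    return -abs(head - dep)

def predict_head(dep, n):
    # One small task: choose the best head for token `dep`.
    return max((h for h in range(n) if h != dep),
               key=lambda h: arc_score(h, dep))

tokens = ["我们", "分析", "微博"]  # "We analyse Weibo"
with ThreadPoolExecutor() as pool:
    heads = list(pool.map(lambda d: predict_head(d, len(tokens)),
                          range(len(tokens))))
print(heads)  # one predicted head per token: [1, 0, 1]
```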

    A Dependency Parser for Tweets

    We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set, and measure the benefit of our contributions.

    Our dataset and parser can be found at http://www.ark.cs.cmu.edu/TweetNLP.
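
    The headline figure is unlabeled attachment accuracy (UAS): the fraction of tokens whose predicted head matches the gold head. A small sketch of the computation, on made-up trees:

```python
def uas(pred_heads, gold_heads):
    """Fraction of tokens whose predicted head matches the gold head."""
    correct = sum(p == g for p, g in zip(pred_heads, gold_heads))
    return correct / len(gold_heads)

gold = [2, 0, 2, 2]  # gold head per token (0 = root); hypothetical tree
pred = [2, 0, 2, 3]  # hypothetical parser output
print(f"UAS = {uas(pred, gold):.1%}")  # UAS = 75.0%
```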