
85 research outputs found

Fast Rhetorical Structure Theory Discourse Parsing

Authors: Heilman Michael; Sagae Kenji
Publication date: 01/01/2015

In recent years, there has been a variety of research on discourse parsing, particularly RST discourse parsing. Most of the recent work on RST parsing has focused on implementing new types of features or learning algorithms in order to improve accuracy, with relatively little focus on efficiency, robustness, or practical use. Also, most implementations are not widely available. Here, we describe an RST segmentation and parsing system that adapts models and feature sets from various previous work, as described below. Its accuracy is near state-of-the-art, and it was developed to be fast, robust, and practical. For example, it can process short documents such as news articles or essays in less than a second.

arXiv.org e-Print Archive

eScholarship - University of California


Dependency Length Minimization and Lexical Frequency in Prepositional Phrase Ordering in English

Authors: Liu Zoey; Sagae Kenji
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2018

Previous research has shown cross-linguistically that the human language parser prefers constituent orders that minimize the distance between syntactic heads and their dependents, but the interaction between dependency length minimization (DLM) and other factors governing linear word ordering is still unknown. We examine the effects of DLM, lexical frequency, and the traditional rule of Manner before Place before Time (MPT) on the ordering of prepositional phrase (PP) adjuncts in English, using corpora in different language genres annotated with syntactic structure. While MPT and DLM were consistently predictive of PP ordering in our analysis, lexical frequency information was sensitive to language genre.

ScholarWorks@UMass Amherst

Incremental interpretation and prediction of utterance meaning for interactive dialogue

Authors: DeVault David; Sagae Kenji; Traum David
Publication venue: University of Illinois at Chicago Library
Publication date: 03/05/2011

We present techniques for the incremental interpretation and prediction of utterance meaning in dialogue systems. These techniques open possibilities for systems to initiate responsive overlap behaviors during user speech, such as interrupting, acknowledging, or completing a user's utterance while it is still in progress. In an implemented system, we show that relatively high accuracy can be achieved in understanding spontaneous utterances before they are completed. Further, we present a method for determining when a system has reached a point of maximal understanding of an ongoing user utterance, and show that this determination can be made with high precision. Finally, we discuss a prototype implementation that shows how systems can use these abilities to strategically initiate system completions of user utterances. More broadly, this framework facilitates the implementation of a range of overlap behaviors that are common in human dialogue but have been largely absent in dialogue systems.

University of Illinois at Chicago: Journals@UIC

Dialogue & Discourse (E-Journal - Universität Bielefeld)

Evaluating contributions of natural language parsers to protein–protein interaction extraction

Authors: Matsuzaki Takuya; Miyao Yusuke; Sagae Kenji; Sætre Rune; Tsujii Jun'ichi
Publication venue: Oxford University Press
Publication date: 01/02/2009

Motivation: While text mining technologies for biomedical research have gained popularity as a way to take advantage of the explosive growth of information in text form in biomedical papers, selecting appropriate natural language processing (NLP) tools is still difficult for researchers who are not familiar with recent advances in NLP. This article provides a comparative evaluation of several state-of-the-art natural language parsers, focusing on the task of extracting protein–protein interaction (PPI) from biomedical papers. We measure how each parser, and its output representation, contributes to accuracy improvement when the parser is used as a component in a PPI system.

PubMed Central

eScholarship - University of California

Towards Understanding What Code Language Models Learned

Authors: Ahmed Toufique; Devanbu Prem; Huang Chengxuan; Sagae Kenji; Wang Cathy; Yu Dian
Publication date: 27/02/2024

Pre-trained language models are effective in a variety of natural language tasks, but it has been argued that their capabilities fall short of fully learning meaning or understanding language. To understand the extent to which language models can learn some form of meaning, we investigate their ability to capture semantics of code beyond superficial frequency and co-occurrence. In contrast to previous research on probing models for linguistic features, we study pre-trained models in a setting that allows for objective and straightforward evaluation of a model's ability to learn semantics. In this paper, we examine whether such models capture the semantics of code, which is precisely and formally defined. Through experiments involving the manipulation of code fragments, we show that pre-trained models of code learn a robust representation of the computational semantics of code that goes beyond superficial features of form alone.

arXiv.org e-Print Archive

Tracking the Evolution of Written Language Competence in L2 Spanish Learners

Authors: Alessio Miaschi; Claudia Sánchez-Gutiérrez; Dominique Brunato; Felice Dell'Orletta; Giulia Venturi; Kenji Sagae; Sam Davidson
Publication date: 01/01/2020

In this paper, we present an NLP-based approach for tracking the evolution of written language competence in L2 Spanish learners using a wide range of linguistic features automatically extracted from students' written productions. Beyond reporting classification results for different scenarios, we explore the connection between the most predictive features and the teaching curriculum, finding that our set of linguistic features often reflects the explicit instruction that students receive during each course.

Crossref

Open Access Repository