A Data-Oriented Model of Literary Language

Author: Bod Rens
van Cranenburgh Andreas
Publication date: 01/01/2017

We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0% of the variation in literary ratings. Comment: To be published in EACL 2017, 11 pages
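The "variation explained" figure in this abstract reads like a coefficient of determination (R²). That reading is an assumption, since the listing does not say how the number was computed. A minimal sketch of R² on made-up ratings (the function name and the numbers are hypothetical, not taken from the paper):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the ratings around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical human ratings and model predictions (illustration only).
ratings = [2.0, 3.5, 5.0, 6.5]
predicted = [2.5, 3.0, 5.5, 6.0]
score = r_squared(ratings, predicted)
print(f"R^2 = {score:.3f}")
```

A value of 0.76 under this definition would mean the predictions account for 76% of the variance of the ratings around their mean.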

arXiv.org e-Print Archive

Proceedings - University of Groningen

UvA-DARE

Dissertations of the University of Groningen

Mental distress detection and triage in forum posts: the LT3 CLPsych 2016 shared task system

Author: Desmet Bart
Hoste Veronique
Jacobs Gilles
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2016

This paper describes the contribution of LT3 for the CLPsych 2016 Shared Task on automatic triage of mental health forum posts. Our systems use multiclass Support Vector Machines (SVM), cascaded binary SVMs and ensembles with a rich feature set. The best systems obtain macro-averaged F-scores of 40% on the full task and 80% on the green versus alarming distinction. Multiclass SVMs with all features score best in terms of F-score, whereas feature filtering with bi-normal separation and classifier ensembling are found to improve recall of alarming posts.
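For readers unfamiliar with the metric, a macro-averaged F-score computes one F1 per class and takes their unweighted mean, so rare classes count as much as frequent ones. The sketch below is a plain macro-F1 over all labels seen in the data. The label names and toy predictions are hypothetical, and the shared task's official scoring may differ (for example in which classes it averages over).

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical triage labels for four posts (illustration only).
gold = ["green", "green", "amber", "red"]
pred = ["green", "amber", "amber", "red"]
result = macro_f1(gold, pred)
print(f"macro-F1 = {result:.3f}")
```

Here the per-class F1 values are 2/3 (amber), 2/3 (green) and 1 (red), so the macro average is 7/9.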

On-the-fly Table Generation

Author: Nguyen Thanh Tam
Sekhavat Yoones A.
Yahya Mohamed
Yin Pengcheng
Zwicklbauer Stefan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/05/2018

Many information needs revolve around entities, which would be better answered by summarizing results in a tabular format, rather than presenting them as a ranked list. Unlike previous work, which is limited to retrieving existing tables, we aim to answer queries by automatically compiling a table in response to a query. We introduce and address the task of on-the-fly table generation: given a query, generate a relational table that contains relevant entities (as rows) along with their key properties (as columns). This problem is decomposed into three specific subtasks: (i) core column entity ranking, (ii) schema determination, and (iii) value lookup. We employ a feature-based approach for entity ranking and schema determination, combining deep semantic features with task-specific signals. We further show that these two subtasks are not independent of each other and can assist each other in an iterative manner. For value lookup, we combine information from existing tables and a knowledge base. Using two sets of entity-oriented queries, we evaluate our approach both on the component level and on the end-to-end table generation task. Comment: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval
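The three-subtask decomposition in this abstract can be illustrated with a toy pipeline. This is only a sketch under strong assumptions: simple word overlap and property counts stand in for the paper's feature-based models, the iterative interaction between ranking and schema determination is omitted, and the knowledge base, function names and values are all hypothetical.

```python
def overlap(query, text):
    """Number of distinct lowercase tokens shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def generate_table(query, kb, top_rows=2, top_cols=2):
    # (i) Core column entity ranking: score each entity against the query.
    ranked = sorted(kb, key=lambda e: overlap(query, kb[e]["description"]),
                    reverse=True)
    entities = ranked[:top_rows]
    # (ii) Schema determination: keep the properties most common
    # among the selected entities (ties broken alphabetically).
    counts = {}
    for entity in entities:
        for prop in kb[entity]["properties"]:
            counts[prop] = counts.get(prop, 0) + 1
    schema = sorted(counts, key=lambda p: (-counts[p], p))[:top_cols]
    # (iii) Value lookup: fill each cell from the knowledge base.
    rows = [[e] + [kb[e]["properties"].get(p) for p in schema]
            for e in entities]
    return ["entity"] + schema, rows


# Hypothetical toy knowledge base with illustrative values.
toy_kb = {
    "Paris": {"description": "capital city of France in Europe",
              "properties": {"country": "France", "population": 2100000}},
    "Berlin": {"description": "capital city of Germany in Europe",
               "properties": {"country": "Germany", "population": 3600000}},
    "Amazon": {"description": "river in South America",
               "properties": {"length_km": 6400}},
}
header, rows = generate_table("capital city Europe", toy_kb)
print(header)
for row in rows:
    print(row)
```

Entities become rows and the selected properties become columns, matching the table layout the abstract describes.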

arXiv.org e-Print Archive

Modeling Global Syntactic Variation in English Using Dialect Classification

Author: Dunn Jonathan
Publication date: 11/04/2019

This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.

arXiv.org e-Print Archive