A Data-Oriented Model of Literary Language
We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0% of the variation in literary ratings. Comment: To be published in EACL 2017, 11 pages
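The claim that the model "explains 76.0% of the variation in literary ratings" is conventionally read as a coefficient of determination (R²). The minimal Python sketch below, assuming whitespace tokenization and R² as the metric, shows two generic ingredients named in the abstract: word-bigram counts of the kind a bigram baseline would pass to a regressor, and R² itself. The helper names `bigram_counts` and `r_squared` are hypothetical; neither the regression model nor the tree-fragment features are reproduced here.

```python
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def r_squared(gold: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (share of variance explained)."""
    mean = sum(gold) / len(gold)
    ss_tot = sum((y - mean) ** 2 for y in gold)  # total variation around the mean rating
    ss_res = sum((y - p) ** 2 for y, p in zip(gold, predicted))  # variation left unexplained
    return 1.0 - ss_res / ss_tot
```

For example, `bigram_counts("the old man and the old sea")` counts `("the", "old")` twice, and a model that always predicts the mean rating scores an R² of 0.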
Mental distress detection and triage in forum posts: the LT3 CLPsych 2016 shared task system
This paper describes the contribution of LT3 for the CLPsych 2016 Shared Task on automatic triage of mental health forum posts. Our systems use multiclass Support Vector Machines (SVM), cascaded binary SVMs and ensembles with a rich feature set. The best systems obtain macro-averaged F-scores of 40% on the full task and 80% on the green versus alarming distinction. Multiclass SVMs with all features score best in terms of F-score, whereas feature filtering with bi-normal separation and classifier ensembling are found to improve recall of alarming posts.
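Two quantities mentioned above have standard definitions that a short sketch can make concrete: the macro-averaged F-score (the unweighted mean of per-class F1, so a rare class counts as much as a frequent one) and bi-normal separation (BNS), usually defined as |F⁻¹(tpr) − F⁻¹(fpr)| with F⁻¹ the inverse standard normal CDF. The Python below assumes those textbook definitions; the function names, the clipping constant `eps`, and the example labels are illustrative assumptions, not details of the LT3 system.

```python
from statistics import NormalDist


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def bns(tp: int, pos: int, fp: int, neg: int, eps: float = 0.0005) -> float:
    """Bi-normal separation of one feature.

    tp: positive documents containing the feature, pos: all positive documents (> 0),
    fp: negative documents containing the feature, neg: all negative documents (> 0).
    Rates are clipped to [eps, 1 - eps] because inv_cdf is undefined at 0 and 1.
    """
    inv = NormalDist().inv_cdf
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv(tpr) - inv(fpr))
```

A feature present in 90 of 100 alarming posts but only 10 of 100 others gets `bns(90, 100, 10, 100)` ≈ 2.56, while a feature with equal rates in both groups scores 0; filtering then keeps the highest-scoring features.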
On-the-fly Table Generation
Many information needs revolve around entities, which would be better answered by summarizing results in a tabular format, rather than presenting them as a ranked list. Unlike previous work, which is limited to retrieving existing tables, we aim to answer queries by automatically compiling a table in response to a query. We introduce and address the task of on-the-fly table generation: given a query, generate a relational table that contains relevant entities (as rows) along with their key properties (as columns). This problem is decomposed into three specific subtasks: (i) core column entity ranking, (ii) schema determination, and (iii) value lookup. We employ a feature-based approach for entity ranking and schema determination, combining deep semantic features with task-specific signals. We further show that these two subtasks are not independent of each other and can assist each other in an iterative manner. For value lookup, we combine information from existing tables and a knowledge base. Using two sets of entity-oriented queries, we evaluate our approach both on the component level and on the end-to-end table generation task. Comment: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval
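The three-way decomposition and the iterative interplay between entity ranking and schema determination can be illustrated with a toy loop. This Python sketch is an illustration under stated assumptions, not the paper's feature-based method: the knowledge base `KB`, the table cells `TABLE_CELLS`, and the overlap-count scores are hypothetical stand-ins for the deep semantic features and task-specific signals described above. Ranking uses the current schema, the schema is recomputed from the top-ranked entities, and the loop stops once both are stable; value lookup then prefers existing table cells and falls back to the knowledge base.

```python
from typing import Optional

# Hypothetical toy knowledge base: entity -> (description terms, property values).
KB: dict[str, tuple[set[str], dict[str, str]]] = {
    "Oslo": ({"capital", "city", "norway"}, {"country": "Norway", "type": "capital"}),
    "Stockholm": ({"capital", "city", "sweden"}, {"country": "Sweden", "type": "capital"}),
    "Bergen": ({"city", "norway", "port"}, {"country": "Norway"}),
    "Volvo": ({"car", "company", "sweden"}, {"industry": "automotive"}),
}
# Hypothetical cells harvested from existing tables: (entity, property) -> value.
TABLE_CELLS: dict[tuple[str, str], str] = {("Oslo", "country"): "Norway"}


def rank_entities(query: set[str], schema: list[str], k: int) -> list[str]:
    """Toy core-column ranking: query-term overlap plus coverage of the current schema."""
    scored = []
    for name, (terms, props) in KB.items():
        score = len(query & terms) + len(set(schema) & set(props))
        if score > 0:
            scored.append((score, name))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by name
    return [name for _, name in scored[:k]]


def determine_schema(entities: list[str], k: int) -> list[str]:
    """Toy schema determination: properties shared by the most top-ranked entities."""
    counts: dict[str, int] = {}
    for name in entities:
        for prop in KB[name][1]:
            counts[prop] = counts.get(prop, 0) + 1
    return sorted(counts, key=lambda p: (-counts[p], p))[:k]


def lookup(entity: str, prop: str) -> Optional[str]:
    """Value lookup: prefer an existing table cell, fall back to the knowledge base."""
    if (entity, prop) in TABLE_CELLS:
        return TABLE_CELLS[(entity, prop)]
    return KB[entity][1].get(prop)


def generate_table(query: set[str], k_rows: int = 3, k_cols: int = 2, max_iter: int = 5):
    """Alternate entity ranking and schema determination until neither changes."""
    entities: list[str] = []
    schema: list[str] = []
    for _ in range(max_iter):
        new_entities = rank_entities(query, schema, k_rows)
        new_schema = determine_schema(new_entities, k_cols)
        if new_entities == entities and new_schema == schema:
            break  # converged: each subtask no longer changes the other's input
        entities, schema = new_entities, new_schema
    rows = [[e] + [lookup(e, p) for p in schema] for e in entities]
    return schema, rows
```

With the query `{"capital", "city"}`, the loop converges on its second pass to the columns `country` and `type` with rows for Oslo, Stockholm, and Bergen; Bergen's `type` cell stays `None` because neither toy source contains it.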
Modeling Global Syntactic Variation in English Using Dialect Classification
This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.
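One generic way to measure robustness across registers, as contribution (iii) describes, is to train a classifier on one corpus and test it on the other. The sketch below does this with a multinomial Naive Bayes classifier over syntactic-feature counts; the classifier choice, the feature IDs `f1` to `f6`, and the toy corpora are assumptions for illustration only, not the paper's models, features, or data.

```python
import math
from collections import Counter, defaultdict

# A document is (national variety label, list of syntactic feature IDs).
Doc = tuple[str, list[str]]


def train_nb(docs: list[Doc], alpha: float = 1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing."""
    n = len(docs)
    priors = {c: math.log(k / n) for c, k in Counter(c for c, _ in docs).items()}
    counts: dict[str, Counter] = defaultdict(Counter)
    for label, feats in docs:
        counts[label].update(feats)
    vocab = {f for _, feats in docs for f in feats}
    return priors, counts, vocab, alpha


def predict_nb(model, feats: list[str]) -> str:
    """Return the variety with the highest log-posterior; unseen features are skipped."""
    priors, counts, vocab, alpha = model

    def log_post(label: str) -> float:
        denom = sum(counts[label].values()) + alpha * len(vocab)
        return priors[label] + sum(
            math.log((counts[label][f] + alpha) / denom) for f in feats if f in vocab
        )

    return max(priors, key=log_post)


def accuracy(model, docs: list[Doc]) -> float:
    return sum(predict_nb(model, feats) == label for label, feats in docs) / len(docs)


if __name__ == "__main__":
    # Hypothetical toy corpora standing in for web and social media registers.
    web = [("UK", ["f1", "f1", "f2"]), ("UK", ["f1", "f3"]),
           ("US", ["f4", "f4", "f2"]), ("US", ["f4", "f5"])]
    social = [("UK", ["f1", "f6"]), ("US", ["f4", "f6"])]
    model = train_nb(web)  # train on one register...
    print(accuracy(model, social))  # ...and evaluate on the other
```

Comparing this cross-register accuracy with accuracy on held-out documents from the training register gives a rough signal of how well the features transfer; a large drop would suggest register-specific variation.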