1,059 research outputs found
Listening between the Lines: Learning Personal Attributes from Conversations
Open-domain dialogue agents must be able to converse about many topics while
incorporating knowledge about the user into the conversation. In this work we
address the acquisition of such knowledge, for personalization in downstream
Web applications, by extracting personal attributes from conversations. This
problem is more challenging than the established task of information extraction
from scientific publications or Wikipedia articles, because dialogues often
give merely implicit cues about the speaker. We propose methods for inferring
personal attributes, such as profession, age or family status, from
conversations using deep learning. Specifically, we propose several Hidden
Attribute Models, which are neural networks leveraging attention mechanisms and
embeddings. Our methods are trained on a per-predicate basis to output rankings
of object values for a given subject-predicate combination (e.g., ranking the
doctor and nurse professions high when speakers talk about patients, emergency
rooms, etc). Experiments with various conversational texts including Reddit
discussions, movie scripts and a collection of crowdsourced personal dialogues
demonstrate the viability of our methods and their superior performance
compared to state-of-the-art baselines.Comment: published in WWW'1
Architectures of Meaning, A Systematic Corpus Analysis of NLP Systems
This paper proposes a novel statistical corpus analysis framework targeted
towards the interpretation of Natural Language Processing (NLP) architectural
patterns at scale. The proposed approach combines saturation-based lexicon
construction, statistical corpus analysis methods and graph collocations to
induce a synthesis representation of NLP architectural patterns from corpora.
The framework is validated in the full corpus of Semeval tasks and demonstrated
coherent architectural patterns which can be used to answer architectural
questions on a data-driven fashion, providing a systematic mechanism to
interpret a largely dynamic and exponentially growing field.Comment: 20 pages, 6 figures, 9 supplementary figures, Lexicon.txt in the
appendi
Author Profiling using SVMs and Word Embedding Averages — Notebook for PAN at CLEF 2016
In this paper, we describe one of the approaches of the participation of Universidade de Évora. Our approach is similar to usual methods where text is preprocessed, features are extracted, and then used in SVMs with cross validation. The main difference is that features used come from averages of word embeddings, specifically word2vec vectors. Using PAN 2016 dataset, we were able to achieve 44.8% and 68.2% for English age and gender classification respectively. We were also able to achieve 51.3% and 67.1% accuracy for Spanish age and gender classification. Finally, we report 71.9% accuracy for Dutch age classification.Erasmus Mundus EMMA-WEST projec
Improved Neural Relation Detection for Knowledge Base Question Answering
Relation detection is a core component for many NLP applications including
Knowledge Base Question Answering (KBQA). In this paper, we propose a
hierarchical recurrent neural network enhanced by residual learning that
detects KB relations given an input question. Our method uses deep residual
bidirectional LSTMs to compare questions and relation names via different
hierarchies of abstraction. Additionally, we propose a simple KBQA system that
integrates entity linking and our proposed relation detector to enable one
enhance another. Experimental results evidence that our approach achieves not
only outstanding relation detection performance, but more importantly, it helps
our KBQA system to achieve state-of-the-art accuracy for both single-relation
(SimpleQuestions) and multi-relation (WebQSP) QA benchmarks.Comment: Accepted by ACL 2017 (updated for camera-ready
Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign
We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects. The campaign was organized as part of the fifth edition of the VarDial workshop, collocated with COLING’2018. This year, the campaign included five shared tasks, including two task re-runs – Arabic Dialect Identification (ADI) and German Dialect Identification (GDI) –, and three new tasks – Morphosyntactic Tagging of Tweets (MTT), Discriminating between Dutch and Flemish in Subtitles (DFS), and Indo-Aryan Language Identification (ILI). A total of 24 teams submitted runs across the five shared tasks, and contributed 22 system description papers, which were included in the VarDial workshop proceedings and are referred to in this report.Non peer reviewe
- …