1,059 research outputs found

Listening between the Lines: Learning Personal Attributes from Conversations

Authors: Mirza Paramita; Tigunova Anna; Weikum Gerhard; Yates Andrew
Publication date: 01/01/2019

Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give only implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc.). Experiments with various conversational texts, including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues, demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines. Comment: published in WWW'1

arXiv.org e-Print Archive

MPG.PuRe

Architectures of Meaning, A Systematic Corpus Analysis of NLP Systems

Authors: Florea Malina; Freitas Andre; Landers Donal; Wysocki Oskar
Publication date: 16/07/2021

This paper proposes a novel statistical corpus analysis framework for interpreting Natural Language Processing (NLP) architectural patterns at scale. The proposed approach combines saturation-based lexicon construction, statistical corpus analysis methods and graph collocations to induce a synthesis representation of NLP architectural patterns from corpora. The framework is validated on the full corpus of Semeval tasks and demonstrates coherent architectural patterns that can be used to answer architectural questions in a data-driven fashion, providing a systematic mechanism to interpret a highly dynamic and exponentially growing field. Comment: 20 pages, 6 figures, 9 supplementary figures, Lexicon.txt in the appendi

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Author Profiling using SVMs and Word Embedding Averages — Notebook for PAN at CLEF 2016

Authors: Bayot Roy; Gonçalves Teresa
Publication venue: CEUR
Publication date: 01/09/2016

In this paper, we describe one of the approaches used in the participation of Universidade de Évora. Our approach follows the usual pipeline: text is preprocessed, features are extracted, and the features are then used in SVMs with cross-validation. The main difference is that the features come from averages of word embeddings, specifically word2vec vectors. Using the PAN 2016 dataset, we were able to achieve 44.8% and 68.2% for English age and gender classification respectively. We were also able to achieve 51.3% and 67.1% accuracy for Spanish age and gender classification. Finally, we report 71.9% accuracy for Dutch age classification. Erasmus Mundus EMMA-WEST projec

Repositório Científico da Universidade de Évora
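
The feature step described in the Bayot and Gonçalves abstract above, averaging word embeddings to get one fixed-length vector per text, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional `TOY_EMBEDDINGS` table is a hypothetical stand-in for pretrained word2vec vectors, the whitespace tokenizer and the `average_embedding` helper are invented here, and the SVM training and cross-validation step is omitted.

```python
# Hypothetical toy vectors standing in for pretrained word2vec embeddings.
TOY_EMBEDDINGS = {
    "good": [1.0, 0.0, 2.0],
    "morning": [3.0, 2.0, 0.0],
    "everyone": [2.0, 4.0, 1.0],
}


def average_embedding(text, embeddings, dim=3):
    """Return the component-wise mean of the vectors of in-vocabulary tokens."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known words: fall back to a zero vector of the embedding size.
        return [0.0] * dim
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


features = average_embedding("Good morning everyone", TOY_EMBEDDINGS)
print(features)  # [2.0, 2.0, 1.0]
```

The resulting vector would then serve as the input row for a classifier; every text yields a vector of the same length regardless of how many words it contains.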

Improved Neural Relation Detection for Knowledge Base Question Answering

Authors: Hasan Kazi Saidul; Santos Cicero dos; Xiang Bing; Yin Wenpeng; Yu Mo; Zhou Bowen
Publication date: 01/01/2017

Relation detection is a core component of many NLP applications, including Knowledge Base Question Answering (KBQA). In this paper, we propose a hierarchical recurrent neural network enhanced by residual learning that detects KB relations given an input question. Our method uses deep residual bidirectional LSTMs to compare questions and relation names via different hierarchies of abstraction. Additionally, we propose a simple KBQA system that integrates entity linking and our proposed relation detector so that each enhances the other. Experimental results show that our approach not only achieves outstanding relation detection performance but, more importantly, helps our KBQA system achieve state-of-the-art accuracy on both single-relation (SimpleQuestions) and multi-relation (WebQSP) QA benchmarks. Comment: Accepted by ACL 2017 (updated for camera-ready

arXiv.org e-Print Archive

Crossref

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign

Authors: Ali Ahmed; Glass James; Grondelaers Stefan; Jain Mayank; Kumar Ritesh; Lahiri Bornini; Ljubešić Nikola; Malmasi Shervin; Nakov Preslav; Oostdijk Nelleke; Samardžić Tanja; Scherrer Yves; Shon Suwon; Speelman Dirk; Tiedemann Jörg; van den Bosch Antal; van der Lee Chris; Zampieri Marcos
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2018

We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects. The campaign was organized as part of the fifth edition of the VarDial workshop, co-located with COLING’2018. This year, the campaign comprised five shared tasks: two task re-runs, Arabic Dialect Identification (ADI) and German Dialect Identification (GDI), and three new tasks, Morphosyntactic Tagging of Tweets (MTT), Discriminating between Dutch and Flemish in Subtitles (DFS), and Indo-Aryan Language Identification (ILI). A total of 24 teams submitted runs across the five shared tasks and contributed 22 system description papers, which were included in the VarDial workshop proceedings and are referred to in this report. Non peer reviewe

Radboud Repository

Helsingin yliopiston digitaalinen arkisto

Tilburg University Repository