16 research outputs found
Natural Language Processing: Emerging Neural Approaches and Applications
This Special Issue highlights the most recent research being carried out in the NLP field to discuss relative open issues, with a particular focus on both emerging approaches for language learning, understanding, production, and grounding interactively or autonomously from data in cognitive and neural systems, as well as on their potential or real applications in different domains
Low-Resource Unsupervised NMT:Diagnosing the Problem and Providing a Linguistically Motivated Solution
Unsupervised Machine Translation hasbeen advancing our ability to translatewithout parallel data, but state-of-the-artmethods assume an abundance of mono-lingual data. This paper investigates thescenario where monolingual data is lim-ited as well, finding that current unsuper-vised methods suffer in performance un-der this stricter setting. We find that theperformance loss originates from the poorquality of the pretrained monolingual em-beddings, and we propose using linguis-tic information in the embedding train-ing scheme. To support this, we look attwo linguistic features that may help im-prove alignment quality: dependency in-formation and sub-word information. Us-ing dependency-based embeddings resultsin a complementary word representationwhich offers a boost in performance ofaround 1.5 BLEU points compared to stan-dardWORD2VECwhen monolingual datais limited to 1 million sentences per lan-guage. We also find that the inclusion ofsub-word information is crucial to improv-ing the quality of the embedding
Recommended from our members
Data-Driven Solutions to Bottlenecks in Natural Language Generation
Concept-to-text generation suffers from what can be called generation bottlenecks - aspects of the generated text which should change for different subject domains, and which are usually hard to obtain or require manual work. Some examples are domain-specific content, a type system, a dictionary, discourse style and lexical style. These bottlenecks have stifled attempts to create generation systems that are generic, or at least apply to a wide range of domains in non-trivial applications.
This thesis is comprised of two parts. In the first, we propose data-driven solutions that automate obtaining the information and models required to solve some of these bottlenecks. Specifically, we present an approach to mining domain-specific paraphrasal templates from a simple text corpus; an approach to extracting a domain-specific taxonomic thesaurus from Wikipedia; and a novel document planning model which determines both ordering and discourse relations, and which can be extracted from a domain corpus. We evaluate each solution individually and independently from its ultimate use in generation, and show significant improvements in each.
In the second part of the thesis, we describe a framework for creating generation systems that rely on these solutions, as well as on hybrid concept-to-text and text-to-text generation, and which can be automatically adapted to any domain using only a domain-specific corpus. We illustrate the breadth of applications that this framework applies to with three examples: biography generation and company description generation, which we use to evaluate the framework itself and the contribution of our solutions; and justification of machine learning predictions, a novel application which we evaluate in a task-based study to show its importance to users
Extracting personal information from conversations
Personal knowledge is a versatile resource that is valuable for a wide range of downstream applications. Background facts about users can allow chatbot assistants to produce more topical and empathic replies. In the context of recommendation and retrieval models, personal facts can be used to customize the ranking results for individual users. A Personal Knowledge Base, populated with personal facts, such as demographic information, interests and interpersonal relationships, is a unique endpoint for storing and querying personal knowledge. Such knowledge bases are easily interpretable and can provide users with full control over their own personal knowledge, including revising stored facts and managing access by downstream services for personalization purposes. To alleviate users from extensive manual effort to build such personal knowledge base, we can leverage automated extraction methods applied to the textual content of the users, such as dialogue transcripts or social media posts. Mainstream extraction methods specialize on well-structured data, such as biographical texts or encyclopedic articles, which are rare for most people. In turn, conversational data is abundant but challenging to process and requires specialized methods for extraction of personal facts. In this dissertation we address the acquisition of personal knowledge from conversational data. We propose several novel deep learning models for inferring speakersâ personal attributes: âą Demographic attributes, age, gender, profession and family status, are inferred by HAMs - hierarchical neural classifiers with attention mechanism. Trained HAMs can be transferred between different types of conversational data and provide interpretable predictions. âą Long-tailed personal attributes, hobby and profession, are predicted with CHARM - a zero-shot learning model, overcoming the lack of labeled training samples for rare attribute values. By linking conversational utterances to external sources, CHARM is able to predict attribute values which it never saw during training. âą Interpersonal relationships are inferred with PRIDE - a hierarchical transformer-based model. To accurately predict fine-grained relationships, PRIDE leverages personal traits of the speakers and the style of conversational utterances. Experiments with various conversational texts, including Reddit discussions and movie scripts, demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.Personengebundene Fakten sind eine vielseitig nutzbare Quelle fĂŒr die verschiedensten Anwendungen. Hintergrundfakten ĂŒber Nutzer können es Chatbot-Assistenten ermöglichen, relevantere und persönlichere Antworten zu geben. Im Kontext von Empfehlungs- und Retrievalmodellen können personengebundene Fakten dazu verwendet werden, die Ranking-Ergebnisse fĂŒr Nutzer individuell anzupassen. Eine Personengebundene Wissensdatenbank, gefĂŒllt mit persönlichen Daten wie demografischen Angaben, Interessen und Beziehungen, kann eine universelle Schnittstelle fĂŒr die Speicherung und Abfrage solcher Fakten sein. Wissensdatenbanken sind leicht zu interpretieren und bieten dem Nutzer die vollstĂ€ndige Kontrolle ĂŒber seine personenbezogenen Fakten, einschlieĂlich der Ăberarbeitung und der Verwaltung des Zugriffs durch nachgelagerte Dienste, etwa fĂŒr Personalisierungszwecke. Um den Nutzern den aufwĂ€ndigen manuellen Aufbau einer solchen persönlichen Wissensdatenbank zu ersparen, können automatisierte Extraktionsmethoden auf den textuellen Inhalten der Nutzer â wie z.B. Konversationen oder BeitrĂ€ge in sozialen Medien â angewendet werden. Die ĂŒblichen Extraktionsmethoden sind auf strukturierte Daten wie biografische Texte oder enzyklopĂ€dische Artikel spezialisiert, die bei den meisten Menschen keine Rolle spielen. In dieser Dissertation beschĂ€ftigen wir uns mit der Gewinnung von persönlichem Wissen aus Dialogdaten und schlagen mehrere neuartige Deep-Learning-Modelle zur Ableitung persönlicher Attribute von Sprechern vor: âą Demographische Attribute wie Alter, Geschlecht, Beruf und Familienstand werden durch HAMs - Hierarchische Neuronale Klassifikatoren mit Attention-Mechanismus - abgeleitet. Trainierte HAMs können zwischen verschiedenen Arten von GesprĂ€chsdaten ĂŒbertragen werden und liefern interpretierbare Vorhersagen âą Vielseitige persönliche Attribute wie Hobbys oder Beruf werden mit CHARM ermittelt - einem Zero-Shot-Lernmodell, das den Mangel an markierten Trainingsbeispielen fĂŒr seltene Attributwerte ĂŒberwindet. Durch die VerknĂŒpfung von GesprĂ€chsĂ€uĂerungen mit externen Quellen ist CHARM in der Lage, Attributwerte zu ermitteln, die es beim Training nie gesehen hat âą Zwischenmenschliche Beziehungen werden mit PRIDE, einem hierarchischen transformerbasierten Modell, abgeleitet. Um prĂ€zise Beziehungen vorhersagen zu können, nutzt PRIDE persönliche Eigenschaften der Sprecher und den Stil von KonversationsĂ€uĂerungen Experimente mit verschiedenen Konversationstexten, inklusive Reddit-Diskussionen und Filmskripten, demonstrieren die Praxistauglichkeit unserer Methoden und ihre hervorragende Leistung im Vergleich zum aktuellen Stand der Technik
Recent Advances in Social Data and Artificial Intelligence 2019
The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts pf, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
Peer reviewe