Search CORE

15,475 research outputs found

Template Mining for Information Extraction from Digital Documents

Author: Chowdhury Gobinda G.
Publication venue: Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign
Publication date: 01/01/1999

published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository

Crowdsourcing for Reminiscence Chatbot Design

Author: Baez Marcos
Casati Fabio
Daniel Florian
Kopanitsa Georgy
Nikitina Svetlana
Publication date: 01/01/2018

In this work-in-progress paper we discuss the challenges in identifying effective and scalable crowd-based strategies for designing content, conversation logic, and meaningful metrics for a reminiscence chatbot targeted at older adults. We formalize the problem and outline the main research questions that drive the research agenda in chatbot design for reminiscence and for relational agents for older adults in general.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands

Author: Alvarez-Melis David
Banarescu Laura
Chen David L
Chu Shumo
Ganitkevitch Juri
Kate Rohit J
Kingma Diederik P
Pasupat Panupong
Quirk Chris
Shetty Jitesh
Steedman Mark
Trakhtenbrot Boris A.
Wang Yushi
Wong Yuk Wah
Xu Xiaojun
Zelle John M
Zettlemoyer Luke S
Publication venue: Association for Computing Machinery (ACM)
Publication date: 18/04/2019

To understand diverse natural language commands, virtual assistants today are trained with numerous labor-intensive, manually annotated sentences. This paper presents a methodology and the Genie toolkit that can handle new compound commands with significantly less manual effort. We advocate formalizing the capability of virtual assistants with a Virtual Assistant Programming Language (VAPL) and using a neural semantic parser to translate natural language into VAPL code. Genie needs only a small realistic set of input sentences for validating the neural model. Developers write templates to synthesize data; Genie uses crowdsourced paraphrases and data augmentation, along with the synthesized data, to train a semantic parser. We also propose design principles that make VAPL languages amenable to natural language translation. We apply these principles to revise ThingTalk, the language used by the Almond virtual assistant. We use Genie to build the first semantic parser that can support compound virtual assistant commands with unquoted free-form parameters. Genie achieves 62% accuracy on realistic user inputs. We demonstrate Genie's generality by showing a 19% and 31% improvement over the previous state of the art on a music skill, aggregate functions, and access control. Comment: To appear in PLDI 201

arXiv.org e-Print Archive

Crossref

Knowledge Rich Natural Language Queries over Structured Biological Databases

Author: Chu W. W.
Goldsmith E. J.
InterProlog
Kossmann D.
Lawrence C.
Maio C. D.
Mir S.
Mou X.
Nandi A.
Novik L.
Safran M.
Swofford D. L.
Publication date: 30/03/2017

Increasingly, keyword, natural language and NoSQL queries are being used for information retrieval from traditional as well as non-traditional databases such as web, document, image, GIS, legal, and health databases. While their popularity is undeniable for obvious reasons, their engineering is far from simple. For the most part, semantics- and intent-preserving mapping of a well-understood natural language query expressed over a structured database schema to a structured query language is still a difficult task, and research to tame the complexity is intense. In this paper, we propose a multi-level knowledge-based middleware to facilitate such mappings that separate the conceptual level from the physical level. We augment these multi-level abstractions with a concept reasoner and a query strategy engine to dynamically link arbitrary natural language querying to well-defined structured queries. We demonstrate the feasibility of our approach by presenting a Datalog-based prototype system, called BioSmart, that can compute responses to arbitrary natural language queries over arbitrary databases once a syntactic classification of the natural language query is made.

arXiv.org e-Print Archive

Crossref

Two-phased knowledge formalisation for hydrometallurgical gold ore process recommendation and validation

Author: Rintala Lotta
Roth-Berghofer Thomas
Sauer Christian
Publication venue: Springer Science and Business Media LLC
Publication date: 11/04/2014

This paper describes an approach to externalising and formalising expert knowledge involved in the design and evaluation of hydrometallurgical process chains for gold ore treatment. The objective was to create a case-based reasoning application for recommending and validating a treatment process of gold ores. We describe a twofold approach. First, formalising human expert knowledge about gold mining situations enables the retrieval of similar mining contexts and respective process chains, based on prospection data gathered from a potential gold mining site. Second, empirical knowledge on hydrometallurgical treatments is formalised. This enabled us to evaluate and, where needed, redesign the process chain that was recommended by the first aspect of our approach. The main problems with formalisation of knowledge in the domain of gold ore refinement are the diversity and the number of parameters used in literature and by experts to describe a mining context. We demonstrate how similarity knowledge was used to formalise literature knowledge. The evaluation of data gathered from experiments with an initial prototype workflow recommender, Auric Adviser, provides promising results.

UWL Repository

VTT Research System

Generating Natural Language from Linked Data: Unsupervised template extraction

Author: Duma Daniel
Klein Ewan
Publication date: 01/01/2013

We propose an architecture for generating natural language from Linked Data that automatically learns sentence templates and statistical document planning from parallel RDF datasets and text. We have built a proof-of-concept system (LOD-DEF) trained on un-annotated text from the Simple English Wikipedia and RDF triples from DBpedia, focusing exclusively on factual, non-temporal information. The goal of the system is to generate short descriptions, equivalent to Wikipedia stubs, of entities found in Linked Datasets. We have evaluated the LOD-DEF system against a simple generate-from-triples baseline and human-generated output. In evaluation by humans, LOD-DEF significantly outperforms the baseline on two of three measures: non-redundancy and structure and coherence.

CiteSeerX

Edinburgh Research Explorer

Design and evaluation of acceleration strategies for speeding up the development of dialog applications

Author: Agah
Bohus
Chung
D’Haro
Javier Ferreiros
José Manuel Pardo
Jung
Luis Fernando D’Haro
McTear
Pargellis
Ricardo de Córdoba
Rubén San-Segundo
Tsai
Wang
Wolters
Publication venue: Elsevier BV
Publication date: 01/01/2011

In this paper, we describe a complete development platform that features different innovative acceleration strategies, not included in any other current platform, that simplify and speed up the definition of the different elements required to design a spoken dialog service. The proposed accelerations are mainly based on using the information from the backend database schema and contents, as well as cumulative information produced throughout the different steps in the design. Thanks to these accelerations, the interaction between the designer and the platform is improved, and in most cases the design is reduced to simple confirmations of the “proposals” that the platform dynamically provides at each step. In addition, the platform provides several other accelerations such as configurable templates that can be used to define the different tasks in the service or the dialogs to obtain or show information to the user, automatic proposals for the best way to request slot contents from the user (i.e. using mixed-initiative forms or directed forms), an assistant that offers the set of most probable actions required to complete the definition of the different tasks in the application, or another assistant for solving specific modality details such as confirmations of user answers or how to present the lists of retrieved results to users after querying the backend database. Additionally, the platform also allows the creation of speech grammars and prompts, database access functions, and the possibility of using mixed-initiative and over-answering dialogs. In the paper we also describe in detail each assistant in the platform, emphasizing the different kinds of methodologies followed to facilitate the design process in each one. Finally, we describe the results obtained in both a subjective and an objective evaluation with different designers that confirm the viability, usefulness, and functionality of the proposed accelerations. Thanks to the accelerations, the design time is reduced by more than 56% and the number of keystrokes by 84%.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Generative Design in Minecraft (GDMC), Settlement Generation Competition

Author: Aluru Krishna
Cardona-Rivera Rogelio Enrique
Colton Simon
Friberger M Gustafsson
Kelly George
Liapis Antonios
McCormack Jon
Rossignol Jim
Shaker Noor
Smith Anthony J
Publication venue: Association for Computing Machinery (ACM)
Publication date: 30/07/2018

This paper introduces the settlement generation competition for Minecraft, the first part of the Generative Design in Minecraft challenge. The settlement generation competition is about creating Artificial Intelligence (AI) agents that can produce functional, aesthetically appealing and believable settlements adapted to a given Minecraft map, ideally at a level that can compete with human-created designs. The aim of the competition is to advance procedural content generation (PCG) for games, especially in overcoming the challenges of adaptive and holistic PCG. The paper introduces the technical details of the challenge, but mostly focuses on what challenges this competition provides and why they are scientifically relevant. Comment: 10 pages, 5 figures, Part of the Foundations of Digital Games 2018 proceedings, as part of the workshop on Procedural Content Generation

arXiv.org e-Print Archive

Crossref