Learning household task knowledge from WikiHow descriptions
Commonsense procedural knowledge is important for AI agents and robots that
operate in a human environment. While previous attempts at constructing
procedural knowledge are mostly rule- and template-based, recent advances in
deep learning provide the possibility of acquiring such knowledge directly from
natural language sources. As a first step in this direction, we propose a model
to learn embeddings for tasks, as well as the individual steps that need to be
taken to solve them, based on WikiHow articles. We learn these embeddings such
that they are predictive of both step relevance and step ordering. We also
experiment with the use of integer programming for inferring consistent global
step orderings from noisy pairwise predictions.
Comment: IJCAI 2019 Workshop on Semantic Deep Learning
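To picture the last point, the sketch below shows how integer programming can turn noisy pairwise before/after scores into one globally consistent ordering, using the standard linear-ordering formulation solved with PuLP. This is only an illustration: the step names and pairwise probabilities are invented, and it is not the paper's implementation.

```python
# Sketch: infer a consistent global step ordering from noisy pairwise scores
# via a linear-ordering ILP. Hypothetical scores; not the paper's exact model.
import itertools
import pulp

steps = ["boil water", "add pasta", "drain pasta", "add sauce"]
n = len(steps)
# p[i, j]: model's (made-up) probability that step i comes before step j.
p = {(i, j): 0.5 for i in range(n) for j in range(n) if i != j}
p.update({(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.7, (0, 3): 0.85})

prob = pulp.LpProblem("step_ordering", pulp.LpMaximize)
# x[i, j] = 1 iff step i is placed before step j in the final ordering.
x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
     for i in range(n) for j in range(n) if i != j}

# Maximize agreement with the pairwise predictions.
prob += pulp.lpSum(p[i, j] * x[i, j] for (i, j) in x)

for i, j in itertools.combinations(range(n), 2):
    prob += x[i, j] + x[j, i] == 1            # exactly one direction per pair
for i, j, k in itertools.permutations(range(n), 3):
    prob += x[i, j] + x[j, k] - x[i, k] <= 1  # transitivity: no cycles allowed

prob.solve(pulp.PULP_CBC_CMD(msg=False))
# A step's position equals the number of steps the solver placed before it.
order = sorted(range(n),
               key=lambda i: sum(pulp.value(x[j, i]) for j in range(n) if j != i))
print([steps[i] for i in order])
```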
A Dataset for Tracking Entities in Open Domain Procedural Text
We present the first dataset for tracking state changes in procedural text
from arbitrary domains by using an unrestricted (open) vocabulary. For example,
in a text describing fog removal using potatoes, a car window may transition
between being foggy, sticky, opaque, and clear. Previous formulations of this
task provide the text and entities involved, and ask how those entities change
for just a small, pre-defined set of attributes (e.g., location), limiting
their fidelity. Our solution is a new task formulation where given just a
procedural text as input, the task is to generate a set of state change
tuples (entity, attribute, before-state, after-state) for each step, where the
entity, attribute, and state values must be predicted from an open vocabulary.
Using crowdsourcing, we create OPENPI, a high-quality (91.5% coverage as
judged by humans and completely vetted), and large-scale dataset comprising
29,928 state changes over 4,050 sentences from 810 procedural real-world
paragraphs from WikiHow.com. A current state-of-the-art generation model on
this task achieves 16.1% F1 based on a BLEU metric, leaving ample room for novel
model architectures.
Comment: To appear in EMNLP 2020
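To make the task format concrete, here is a small sketch of the kind of state-change record the dataset asks a model to produce, together with a toy precision/recall computation that greedily matches predicted tuples to gold tuples by token overlap. The overlap function is a rough stand-in for the BLEU-based matching mentioned above, and the field names and example tuples are invented for illustration.

```python
# Sketch of OpenPI-style (entity, attribute, before, after) state-change tuples
# and a toy F1 computed by greedy matching. Token overlap is a simple stand-in
# for the official BLEU-based similarity, used here only for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class StateChange:
    entity: str
    attribute: str
    before: str
    after: str

    def words(self):
        return f"{self.entity} {self.attribute} {self.before} {self.after}".split()

def overlap(a: StateChange, b: StateChange) -> float:
    wa, wb = set(a.words()), set(b.words())
    return len(wa & wb) / max(len(wa | wb), 1)

def f1(pred: list, gold: list, thresh: float = 0.5) -> float:
    matched, used = 0, set()
    for p in pred:  # greedily match each prediction to its best unused gold tuple
        best = max(((overlap(p, g), i) for i, g in enumerate(gold) if i not in used),
                   default=(0.0, None))
        if best[0] >= thresh:
            matched += 1
            used.add(best[1])
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [StateChange("car window", "clarity", "foggy", "clear")]
pred = [StateChange("window", "clarity", "opaque", "clear")]
print(f1(pred, gold))
```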
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish
Understanding procedural natural language (e.g., step-by-step instructions)
is a crucial step to execution and planning. However, while there are ample
corpora and downstream tasks available in English, the field lacks such
resources for most languages. To address this gap, we conduct a case study on
Turkish procedural texts. We first expand the number of tutorials in Turkish
wikiHow from 2,000 to 52,000 using automated translation tools, where the
translation quality and fidelity to the original meaning are validated by a team
of experts on a random sample. Then, we generate several downstream tasks on the
corpus, such as linking actions, goal inference, and summarization. To tackle
these tasks, we implement strong baseline models via fine-tuning large
language-specific models such as TR-BART and BERTurk, as well as multilingual
models such as mBART, mT5, and XLM. We find that language-specific models
consistently outperform their multilingual counterparts by a significant margin
across most procedural language understanding (PLU) tasks. We release our
corpus, downstream tasks, and baseline models at
https://github.com/GGLAB-KU/turkish-plu.
Comment: 9 pages
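As a rough sketch of what such a fine-tuned baseline looks like in practice, the snippet below fine-tunes a multilingual seq2seq model on a summarization pair using the usual Hugging Face pattern. The model name, example data, and hyperparameters are placeholders, not the paper's exact setup.

```python
# Sketch: fine-tuning a multilingual seq2seq model on Turkish summarization.
# Model name, example data, and hyperparameters are placeholders.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/mt5-small"          # stand-in for the mT5/mBART/TR-BART baselines
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# One invented (document, summary) pair standing in for a wikiHow tutorial.
doc = "Bir lastiği değiştirmek için önce aracı düz bir zemine park edin ..."
summary = "Aracı güvenli bir yere park edip lastiği değiştirin."

batch = tokenizer(doc, text_target=summary, truncation=True,
                  max_length=512, return_tensors="pt")

model.train()
outputs = model(**batch)                 # seq2seq loss against the target summary
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

# After training, generate a summary for a new tutorial.
model.eval()
ids = tokenizer(doc, return_tensors="pt").input_ids
print(tokenizer.decode(model.generate(ids, max_new_tokens=48)[0],
                       skip_special_tokens=True))
```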
Multimedia Generative Script Learning for Task Planning
Goal-oriented generative script learning aims to generate subsequent steps to
reach a particular goal, an essential task for assisting robots or humans
in performing stereotypical activities. An important aspect of this process is
the ability to capture historical states visually, which provides detailed
information that is not covered by text and will guide subsequent steps.
Therefore, we propose a new task, Multimedia Generative Script Learning, to
generate subsequent steps by tracking historical states in both text and vision
modalities, as well as presenting the first benchmark containing 5,652 tasks
and 79,089 multimedia steps. This task is challenging in three aspects: the
multimedia challenge of capturing the visual states in images, the induction
challenge of performing unseen tasks, and the diversity challenge of covering
different information in individual steps. We propose to encode visual state
changes through a selective multimedia encoder to address the multimedia
challenge, transfer knowledge from previously observed tasks using a
retrieval-augmented decoder to overcome the induction challenge, and further
present distinct information at each step by optimizing a diversity-oriented
contrastive learning objective. We define metrics to evaluate both generation
and inductive quality. Experiment results demonstrate that our approach
significantly outperforms strong baselines.
Comment: 21 pages, accepted to Findings of the Association for Computational
Linguistics: ACL 2023. Code and resources at
https://github.com/EagleW/Multimedia-Generative-Script-Learning
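The diversity-oriented contrastive objective mentioned above can be pictured with a standard InfoNCE-style loss over step embeddings. The sketch below is a generic illustration of that family of objectives, not the authors' exact formulation, and all tensor shapes and inputs are invented.

```python
# Sketch: an InfoNCE-style contrastive loss that pushes each predicted step
# embedding toward its gold next step and away from the other steps in the
# batch. Generic illustration only, not the paper's exact objective.
import torch
import torch.nn.functional as F

def contrastive_step_loss(pred: torch.Tensor, gold: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """pred, gold: (batch, dim) embeddings; row i of gold is the positive for row i of pred."""
    pred = F.normalize(pred, dim=-1)
    gold = F.normalize(gold, dim=-1)
    logits = pred @ gold.t() / temperature   # (batch, batch) cosine similarities
    targets = torch.arange(pred.size(0))     # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings standing in for decoder step representations.
pred = torch.randn(8, 256)
gold = torch.randn(8, 256)
print(contrastive_step_loss(pred, gold).item())
```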
Context-Independent Task Knowledge for Neurosymbolic Reasoning in Cognitive Robotics
One of the main goals of current artificial intelligence and robotics research is the creation of an artificial assistant with flexible, human-like behavior that can accomplish everyday tasks. Much of this flexibility, at multiple levels of cognition, is enabled by what is context-independent task knowledge to the human. In this scope, the author analyzes how to acquire, represent, and disambiguate symbolic knowledge capturing context-independent task knowledge abstracted from multiple instances: this thesis elaborates the problems incurred, the implementation constraints, current state-of-the-art practices, and ultimately the solutions newly introduced in this scope. The author specifically discusses the acquisition of context-independent task knowledge from large amounts of human-written text and its reusability in the robotics domain; the acquisition of knowledge about human musculoskeletal dependencies constraining motion, which allows a better higher-level representation of observed trajectories; and the means of verbalizing partial contextual and instruction knowledge, which increases both the possibilities for interaction with the human and contextual adaptation. All of the aforementioned points are supported by evaluation in heterogeneous setups, giving a view of how to make optimal use of statistical and symbolic methods (i.e., neurosymbolic reasoning) in cognitive robotics. This work has been performed to enable context-adaptable artificial assistants by bringing together knowledge on what is usually regarded as context-independent task knowledge.