VirtualHome: Simulating Household Activities via Programs
In this paper, we are interested in modeling complex activities that occur in
a typical household. We propose to use programs, i.e., sequences of atomic
actions and interactions, as a high level representation of complex tasks.
Programs are interesting because they provide a non-ambiguous representation of
a task, and allow agents to execute them. However, nowadays, there is no
database providing this type of information. Towards this goal, we first
crowd-source programs for a variety of activities that happen in people's
homes, via a game-like interface used for teaching kids how to code. Using the
collected dataset, we show how we can learn to extract programs directly from
natural language descriptions or from videos. We then implement the most common
atomic (inter)actions in the Unity3D game engine, and use our programs to
"drive" an artificial agent to execute tasks in a simulated household
environment. Our VirtualHome simulator allows us to create a large activity
video dataset with rich ground-truth, enabling training and testing of video
understanding models. We further showcase examples of our agent performing
tasks in our VirtualHome based on language descriptions.
Comment: CVPR 2018 (Oral)
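To make the idea of a program as a sequence of atomic (inter)actions concrete, here is a minimal sketch. It is not the VirtualHome program format or API; the `Step` class, the `execute` function, and the action and object names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One atomic (inter)action; `target` is None for actions without an object."""
    action: str
    target: Optional[str] = None


def execute(program: List[Step]) -> List[str]:
    """Walk through the steps in order and return a plain-text trace.

    A real simulator would animate an agent here; this sketch only records
    the order in which the steps would be carried out.
    """
    trace = []
    for index, step in enumerate(program, start=1):
        if step.target is None:
            trace.append(f"{index}: {step.action}")
        else:
            trace.append(f"{index}: {step.action} {step.target}")
    return trace


# Hypothetical program for an activity such as "watch TV".
watch_tv = [
    Step("walk", "living_room"),
    Step("switch_on", "television"),
    Step("sit", "sofa"),
    Step("watch", "television"),
]

for line in execute(watch_tv):
    print(line)
```

Because each step is explicit and ordered, the same list can be read by a person, checked by a program, or handed to an agent, which is the sense in which the abstract calls programs a non-ambiguous representation.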
Arguing satisfaction of security requirements
This chapter presents a process for security requirements elicitation and analysis,
based around the construction of a satisfaction argument for the security of a
system. The process starts with the enumeration of security goals based on assets
in the system, then uses these goals to derive security requirements in the form of
constraints. Next, a satisfaction argument for the system is constructed, using a
problem-centered representation, a formal proof to analyze properties that can be
demonstrated, and structured informal argumentation of the assumptions exposed
during construction of the argument. Constructing the satisfaction argument can
expose missing and inconsistent assumptions about system context and behavior
that affect security, and a completed argument provides assurances that a system
can respect its security requirements.
Emergent Capabilities for Collaborative Teams in the Evolving Web Environment
This paper reports on our investigation of the latest advances for the Social Web, Web 2.0 and the Linked Data Web. These advances are discussed in terms of the latest capabilities that are available (or being made available) on the Web at the time of writing this paper. Such capabilities can be of significant benefit to teams, especially those comprised of multinational, geographically-dispersed team members. The specific context of coalition members in a rapidly formed diverse military context such as disaster relief or humanitarian aid is considered, where close working between non-government organisations and non-military teams will help to achieve results as quickly and efficiently as possible. The heterogeneity one finds in such teams, coupled with a lack of dedicated private network infrastructure, poses a number of challenges for collaboration, and the current paper represents an attempt to assess whether nascent Web-based capabilities can support such teams in terms of both their collaborative activities and their access to (and sharing of) information resources.
Memory Networks
We describe a new class of learning models called memory networks. Memory
networks reason with inference components combined with a long-term memory
component; they learn how to use these jointly. The long-term memory can be
read and written to, with the goal of using it for prediction. We investigate
these models in the context of question answering (QA) where the long-term
memory effectively acts as a (dynamic) knowledge base, and the output is a
textual response. We evaluate them on a large-scale QA task, and a smaller, but
more complex, toy task generated from a simulated world. In the latter, we show
the reasoning power of such models by chaining multiple supporting sentences to
answer questions that require understanding the intension of verbs.
The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings
The five-dollar model is a lightweight text-to-image generative architecture
that generates low dimensional images from an encoded text prompt. This model
can successfully generate accurate and aesthetically pleasing content in low
dimensional domains, with limited amounts of training data. Despite the small
size of both the model and datasets, the generated images are still able to
maintain the encoded semantic meaning of the textual prompt. We apply this
model to three small datasets: pixel art video game maps, video game sprite
images, and down-scaled emoji images and apply novel augmentation strategies to
improve the performance of our model on these limited datasets. We evaluate our
model's performance using cosine similarity score between text-image pairs
generated by the CLIP ViT-B/32 model.
Comment: to be published in AIIDE 202
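The cosine similarity score mentioned in the abstract above can be illustrated with a short sketch. The function below is a generic implementation over plain lists of floats; it does not load CLIP, and the example vectors are made-up stand-ins for text and image embeddings, not values from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up embeddings standing in for a text prompt and a generated image.
text_embedding = [0.2, 0.7, 0.1]
image_embedding = [0.25, 0.6, 0.05]

print(cosine_similarity(text_embedding, image_embedding))
```

The score ranges from -1 to 1, with values closer to 1 indicating that the two embeddings point in nearly the same direction.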