Interpreting and Executing Recipes with a Cooking Robot
The creation of a robot chef represents a grand challenge for the field of robotics. Cooking is one of the most important activities that take place in the home, and a robotic chef capable of following arbitrary recipes would have many applications in both household and industrial environments. The kitchen environment is a semi-structured proving ground for algorithms in robotics. It provides many computational challenges, such as accurately perceiving ingredients in cluttered environments, manipulating objects, and engaging in complex activities such as mixing and chopping. Keywords:
Reward Function; Statistical Machine Translation; Human Partner; Motion Primitive; Primitive Action
VirtualHome: Simulating Household Activities via Programs
In this paper, we are interested in modeling complex activities that occur in
a typical household. We propose to use programs, i.e., sequences of atomic
actions and interactions, as a high level representation of complex tasks.
Programs are interesting because they provide a non-ambiguous representation of
a task, and allow agents to execute them. However, nowadays, there is no
database providing this type of information. Towards this goal, we first
crowd-source programs for a variety of activities that happen in people's
homes, via a game-like interface used for teaching kids how to code. Using the
collected dataset, we show how we can learn to extract programs directly from
natural language descriptions or from videos. We then implement the most common
atomic (inter)actions in the Unity3D game engine, and use our programs to
"drive" an artificial agent to execute tasks in a simulated household
environment. Our VirtualHome simulator allows us to create a large activity
video dataset with rich ground-truth, enabling training and testing of video
understanding models. We further showcase examples of our agent performing
tasks in our VirtualHome based on language descriptions. (CVPR 2018, Oral)
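The program representation described above can be sketched in a few lines. This is a toy illustration under assumptions: the Step class, the action names, and the rendering are hypothetical stand-ins, not VirtualHome's actual action set or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    """One atomic (inter)action with its object arguments (illustrative)."""
    action: str
    args: tuple

def render(program):
    """Render each atomic step as a readable instruction string."""
    return [f"[{s.action}] " + " ".join(s.args) for s in program]

# "Watch TV" expressed as an unambiguous, executable sequence of atomic steps.
watch_tv = [
    Step("walk_to", ("living_room",)),
    Step("grab", ("remote_control",)),
    Step("switch_on", ("television",)),
    Step("sit_on", ("sofa",)),
]

for line in render(watch_tv):
    print(line)
```

Because each step is a discrete, typed action rather than free text, an agent can execute the list directly, which is the non-ambiguity the paper highlights.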
Improving Robotic Cooking using Batch Bayesian Optimization
With advances in the field of robotic manipulation,
sensing and machine learning, robotic chefs are expected to
become prevalent in our kitchens and restaurants. Robotic chefs
are envisioned to replicate human skills in order to reduce
the burden of the cooking process. However, the potential
of robots to enhance the dining experience remains largely
unrecognised. This work introduces the concept of food quality
optimization and its challenges with an automated omelette
cooking robotic system. The design and control of the robotic
system that uses general kitchen tools is presented first. Next, we
investigate new optimization strategies for improving subjective
food quality ratings, a problem that is challenging because of the
qualitative nature of the objective and the tightly constrained
number of function evaluations available. Our results show that
through appropriate design of the optimization routine using
Batch Bayesian Optimization, improvements in the subjective
evaluation of food quality can be achieved reliably, with very
few trials and with the ability to run trials in bulk. This study
paves the way towards a broader vision of personalized food
for taste and nutrition, and transferable recipes.
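The batch strategy described above can be sketched as follows. This is a minimal illustration under assumptions: a single 1-D cooking parameter, a NumPy Gaussian-process surrogate with an upper-confidence-bound acquisition, and a synthetic "taste" function standing in for human ratings; it is not the paper's implementation.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel over 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-3):
    """GP posterior mean and standard deviation at the query points."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    mu = Ks.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)  # k(x, x) = 1 for RBF
    return mu, np.sqrt(np.clip(var, 1e-9, None))

def propose_batch(x_train, y_train, candidates, q=3, kappa=2.0):
    """Rank candidates by upper confidence bound; return the top q as one batch."""
    mu, sd = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argsort(-(mu + kappa * sd))[:q]]

rng = np.random.default_rng(0)
taste = lambda x: np.exp(-(x - 0.6) ** 2 / 0.05)   # hidden "taste" landscape
x = rng.uniform(0, 1, 4)                            # four initial trial settings
y = taste(x)
for _ in range(5):                                  # five batches of three trials each
    batch = propose_batch(x, y, rng.uniform(0, 1, 100))
    x, y = np.r_[x, batch], np.r_[y, taste(batch)]
print(f"best setting: {x[np.argmax(y)]:.2f}")
```

Batching matters here because each function evaluation means cooking and tasting an omelette: proposing several settings at once lets trials run in bulk. Note this sketch fills the batch greedily by UCB score; practical batch BO methods add a diversity mechanism (e.g. local penalization) so the q points do not cluster.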
An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols
We describe an effort to annotate a corpus of natural language instructions
consisting of 622 wet lab protocols to facilitate automatic or semi-automatic
conversion of protocols into a machine-readable format and benefit biological
research. Experimental results demonstrate the utility of our corpus for
developing machine learning approaches to shallow semantic parsing of
instructional texts. We make our annotated Wet Lab Protocol Corpus available to
the research community.
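Shallow semantic parsing of an instruction, as targeted by the corpus above, can be illustrated with a toy example. The action-frame schema and the pattern-based parser below are hypothetical assumptions for illustration, not the corpus's actual annotation scheme or any trained model.

```python
import re

def parse_step(sentence: str) -> dict:
    """Very rough pattern-based parse of one protocol step into an action frame
    (verb, amount, reagent, device). Illustrative only; real systems learn this."""
    m = re.match(r"(\w+) (\d+ ?\w+) of (.+?) (?:into|to) the (\w+)", sentence)
    if not m:
        return {}
    action, amount, reagent, device = m.groups()
    return {"action": action.lower(), "amount": amount,
            "reagent": reagent, "device": device}

print(parse_step("Add 50 ml of buffer into the tube"))
```

An annotated corpus lets such frames be predicted by machine learning rather than brittle patterns, which is the utility the experiments above demonstrate.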
English recipe flow graph corpus
We present an annotated corpus of English cooking recipe procedures, and describe and evaluate computational methods for learning these annotations. The corpus consists of 300 recipes written by members of the public, which we have annotated with domain-specific linguistic and semantic structure. Each recipe is annotated with (1) 'recipe named entities' (r-NEs) specific to the recipe domain, and (2) a flow graph representing in detail the sequencing of steps, and interactions between cooking tools, food ingredients and the products of intermediate steps. For these two kinds of annotations, inter-annotator agreement ranges from 82.3 to 90.5 F1, indicating that our annotation scheme is appropriate and consistent. We experiment with producing these annotations automatically. For r-NE tagging we train a deep neural network NER tool; to compute flow graphs we train a dependency-style parsing procedure which we apply to the entire sequence of r-NEs in a recipe. In evaluations, our systems achieve 71.1 to 87.5 F1, demonstrating that our annotation scheme is learnable.
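The two annotation layers described above can be sketched with a toy recipe fragment. The BIO-style tag names and relation labels here are illustrative assumptions, not the corpus's actual r-NE inventory or arc label set.

```python
tokens   = ["Chop", "the", "onion", "and", "fry", "it", "in", "oil"]
rne_tags = ["B-Action", "O", "B-Food", "O", "B-Action", "O", "O", "B-Food"]

# The r-NEs (non-O tokens) are the nodes of the flow graph.
entities = [(i, tok) for i, (tok, tag) in enumerate(zip(tokens, rne_tags))
            if tag != "O"]

# Flow-graph arcs as (dependent index, head index, relation), dependency-style:
# the onion is the target of "Chop", and the chopped product flows into "fry".
arcs = [
    (2, 0, "targ"),   # onion -> Chop
    (0, 4, "flow"),   # Chop's product -> fry
    (7, 4, "other"),  # oil -> fry
]

for dep, head, rel in arcs:
    print(f"{tokens[dep]} --{rel}--> {tokens[head]}")
```

Treating arcs as head/dependent pairs over the r-NE sequence is what lets a dependency-style parser, as described above, predict the flow graph.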
The Multimodal and Modular AI Chef: Complex Recipe Generation from Imagery
The AI community has embraced multi-sensory, or multi-modal, approaches to
push this generation of AI models toward more human-like
understanding. Combining language and imagery is a familiar method for
specific tasks like image captioning or generation from descriptions. This
paper compares these monolithic approaches to a lightweight and specialized
method based on employing image models to label objects, then serially
submitting this resulting object list to a large language model (LLM). This use
of multiple Application Programming Interfaces (APIs) enables better than 95%
mean average precision for correct object lists, which serve as input to the
latest OpenAI text generator (GPT-4). To demonstrate the API as a modular
alternative, we solve the problem of a user taking a picture of ingredients
available in a refrigerator, then generating novel recipe cards tailored to
complex constraints on cost, preparation time, dietary restrictions, portion
sizes, and multiple meal plans. The research concludes that monolithic
multimodal models currently lack the coherent memory to maintain context and
format for this task and that until recently, the language models like GPT-2/3
struggled to format similar problems without degenerating into repetitive or
nonsensical combinations of ingredients. For the first time, an AI chef or
cook seems not only possible but offers some enhanced capabilities to augment
human recipe libraries in pragmatic ways. The work generates a 100-page recipe
book featuring the thirty top ingredients using over 2000 refrigerator images
as initializing lists.
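The modular pipeline described above (image model produces an object list, which is serialized into an LLM prompt) can be sketched as follows. The detection function is a stub returning canned output and the prompt format is a hypothetical example; the paper's actual vision APIs and GPT-4 calls are not reproduced here.

```python
def detect_ingredients(image_path: str) -> list:
    """Stub for the image-labeling stage (an object-detection API in the paper).
    Returns a canned example object list for illustration."""
    return ["eggs", "spinach", "feta", "tomatoes"]

def build_prompt(ingredients, budget_usd=10, max_minutes=30, diet="vegetarian"):
    """Serialize the detected object list plus user constraints into an LLM prompt."""
    return (
        "Create a recipe card using only: " + ", ".join(sorted(ingredients)) +
        f". Constraints: under ${budget_usd}, ready in {max_minutes} minutes, "
        f"{diet}."
    )

prompt = build_prompt(detect_ingredients("fridge.jpg"))
print(prompt)
```

Keeping the two stages behind separate APIs is the modularity the paper argues for: the object list is an explicit, checkable intermediate (where the reported 95% mean average precision is measured) before any text generation happens.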