LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images
We propose an automated algorithm to stress-test a trained visual model by
generating language-guided counterfactual test images (LANCE). Our method
leverages recent progress in large language modeling and text-based image
editing to augment an IID test set with a suite of diverse, realistic, and
challenging test images without altering model weights. We benchmark the
performance of a diverse set of pretrained models on our generated data and
observe significant and consistent performance drops. We further analyze model sensitivity across different types of edits and demonstrate LANCE's utility in surfacing previously unknown class-level model biases in ImageNet.
Project webpage: https://virajprabhu.github.io/lance-web
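The stress-testing loop described in the abstract can be sketched in a few lines. Everything below is a hypothetical stand-in, not the actual LANCE implementation: `perturb` stands in for the LLM caption editor, `edit` for the text-based image editor, and `model` for the pretrained classifier under test.

```python
from typing import Callable, List, Tuple

def stress_test(
    test_set: List[Tuple[str, str]],          # (caption, true_label) pairs
    perturb_caption: Callable[[str], str],    # stand-in for the LLM caption editor
    edit_image: Callable[[str], str],         # stand-in for text-based image editing
    classify: Callable[[str], str],           # stand-in for the pretrained model
) -> float:
    """Return accuracy on language-guided counterfactual versions of the test set."""
    correct = 0
    for caption, label in test_set:
        # Generate a counterfactual by perturbing the caption, then "editing" the image.
        edited = edit_image(perturb_caption(caption))
        if classify(edited) == label:
            correct += 1
    return correct / len(test_set)

# Toy stand-ins: swap one visual attribute; the "model" is brittle to the word
# "blue", illustrating the kind of sensitivity the pipeline is meant to surface.
perturb = lambda c: c.replace("blue", "red")
edit = lambda c: c  # identity; the real pipeline renders an edited image
model = lambda c: "bird" if "blue" in c else "unknown"

original = [("a blue bird on a branch", "bird")]
acc_before = sum(model(c) == y for c, y in original) / len(original)
acc_after = stress_test(original, perturb, edit, model)
print(acc_before, acc_after)  # 1.0 0.0
```

The key property, matching the abstract, is that model weights are never touched: only the test distribution is augmented, and the accuracy gap localizes the sensitivity.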
GOAT: GO to Any Thing
In deployment scenarios such as homes and warehouses, mobile robots are
expected to autonomously navigate for extended periods, seamlessly executing
tasks articulated in terms that are intuitively understandable by human
operators. We present GO To Any Thing (GOAT), a universal navigation system
capable of tackling these requirements with three key features: a) Multimodal:
it can tackle goals specified via category labels, target images, and language
descriptions, b) Lifelong: it benefits from its past experience in the same
environment, and c) Platform Agnostic: it can be quickly deployed on robots
with different embodiments. GOAT is made possible through a modular system
design and a continually augmented instance-aware semantic memory that keeps
track of the appearance of objects from different viewpoints in addition to
category-level semantics. This enables GOAT to distinguish between different
instances of the same category to enable navigation to targets specified by
images and language descriptions. In experimental comparisons spanning over 90
hours in 9 different homes consisting of 675 goals selected across 200+
different object instances, we find GOAT achieves an overall success rate of
83%, surpassing previous methods and ablations by 32% (absolute improvement).
GOAT improves with experience in the environment, from a 60% success rate at the first goal to a 90% success rate after exploration. In addition, we demonstrate that GOAT can readily be applied to downstream tasks such as pick and place and social navigation.
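The instance-aware semantic memory described above can be sketched as a small data structure. This is a toy illustration, not GOAT's actual implementation: real matching uses image and language features, while here descriptions are strings and "matching" is keyword overlap.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Instance:
    category: str
    viewpoints: List[str] = field(default_factory=list)    # stand-in for per-view image features
    descriptions: List[str] = field(default_factory=list)  # language associated with this instance

class SemanticMemory:
    """Toy instance-aware memory: tracks object instances across observations."""

    def __init__(self) -> None:
        self.instances: Dict[str, Instance] = {}

    def observe(self, instance_id: str, category: str, viewpoint: str, description: str) -> None:
        # Continually augment memory: new views of a known instance accumulate.
        inst = self.instances.setdefault(instance_id, Instance(category))
        inst.viewpoints.append(viewpoint)
        inst.descriptions.append(description)

    def find_by_category(self, category: str) -> List[str]:
        # Category-level goal: any instance of the class matches.
        return [iid for iid, inst in self.instances.items() if inst.category == category]

    def find_by_description(self, query: str) -> List[str]:
        # Instance-level goal: only instances whose stored language matches.
        return [iid for iid, inst in self.instances.items()
                if any(query in d for d in inst.descriptions)]

mem = SemanticMemory()
mem.observe("chair_0", "chair", "view_a", "red chair by the window")
mem.observe("chair_1", "chair", "view_b", "office chair near the desk")
print(mem.find_by_category("chair"))      # ['chair_0', 'chair_1']
print(mem.find_by_description("window"))  # ['chair_0']
```

The point of the instance level is visible in the last call: a category query returns both chairs, while a language query disambiguates to one, which is what lets GOAT navigate to targets specified by images or descriptions rather than just class labels.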
Housekeep: Tidying Virtual Households using Commonsense Reasoning
We introduce Housekeep, a benchmark to evaluate commonsense reasoning in the
home for embodied AI. In Housekeep, an embodied agent must tidy a house by
rearranging misplaced objects without explicit instructions specifying which
objects need to be rearranged. Instead, the agent must learn from and is
evaluated against human preferences of which objects belong where in a tidy
house. Specifically, we collect a dataset of where humans typically place objects in tidy and untidy houses, comprising 1799 objects, 268 object categories, 585 placements, and 105 rooms. Next, we propose a modular baseline
approach for Housekeep that integrates planning, exploration, and navigation.
It leverages a fine-tuned large language model (LLM) trained on an internet
text corpus for effective planning. We show that our baseline agent generalizes
to rearranging unseen objects in unknown environments. See our webpage for more
details: https://yashkant.github.io/housekeep
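The core decision the agent faces, identifying misplaced objects and choosing receptacles from human preferences, can be sketched as follows. The preference table and threshold here are hypothetical stand-ins for Housekeep's collected human data and for the LLM-based planner.

```python
from typing import Dict, List, Tuple

# Hypothetical preference scores: (object, receptacle) -> fraction of annotators
# who judged the placement correct. Stand-in for Housekeep's human preference data.
PREFERENCES: Dict[Tuple[str, str], float] = {
    ("mug", "kitchen_cabinet"): 0.90,
    ("mug", "bathroom_sink"): 0.10,
    ("toothbrush", "bathroom_sink"): 0.85,
    ("toothbrush", "kitchen_cabinet"): 0.05,
}

def is_misplaced(obj: str, current: str, threshold: float = 0.5) -> bool:
    """Flag an object as misplaced if its current placement scores below the threshold."""
    return PREFERENCES.get((obj, current), 0.0) < threshold

def rank_receptacles(obj: str, receptacles: List[str]) -> List[str]:
    """Order candidate receptacles by human placement preference, best first."""
    return sorted(receptacles, key=lambda r: PREFERENCES.get((obj, r), 0.0), reverse=True)

print(is_misplaced("mug", "bathroom_sink"))                           # True
print(rank_receptacles("mug", ["bathroom_sink", "kitchen_cabinet"]))  # ['kitchen_cabinet', 'bathroom_sink']
```

No explicit instruction names the misplaced object or its destination; both are inferred from the preference data, which is what the benchmark evaluates against. The actual baseline replaces the lookup table with a fine-tuned LLM to generalize to unseen objects.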
HomeRobot: An Open Source Software Stack for Mobile Manipulation Research
Reproducibility in robotics research requires capable, shared hardware platforms that can be used for a wide variety of research. We've seen the power of such shared platforms in machine learning research more broadly, where there is constant iteration on shared AI frameworks like PyTorch. To make rapid progress in robotics in the same way, we propose that we need: (1) shared real-world platforms which allow different teams to test and compare methods at low cost; (2) challenging simulations that reflect real-world environments and can especially drive perception and planning research; and (3) low-cost platforms with enough software to get started addressing all of these problems. To this end, we propose HomeRobot, a mobile manipulator software stack with an associated simulation benchmark, initially based on the low-cost, human-safe Hello Robot Stretch.