Grounding Language Attributes to Objects using Bayesian Eigenobjects
We develop a system to disambiguate object instances within the same class
based on simple physical descriptions. The system takes as input a natural
language phrase and a depth image containing a segmented object and predicts
how similar the observed object is to the object described by the phrase. Our
system is designed to learn from only a small amount of human-labeled language
data and generalize to viewpoints not represented in the language-annotated
depth image training set. By decoupling 3D shape representation from language
representation, this method is able to ground language to novel objects using a
small amount of language-annotated depth data and a larger corpus of unlabeled
3D object meshes, even when these objects are partially observed from unusual
viewpoints. The resulting system disambiguates between novel objects, observed
via depth images, using natural language descriptions. Our method also enables
viewpoint transfer: although its human-annotated training data consists only of
depth images captured from frontal viewpoints, our system successfully predicts
object attributes from rear views, despite having no such depth images in its
training set. Finally, we demonstrate our approach on a Baxter robot,
enabling it to pick specific objects based on human-provided natural language
descriptions.
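
As a rough illustration of the decoupling described in the abstract, the
following is a minimal sketch: a PCA-style linear shape basis learned from a
large corpus of unlabeled voxelized meshes, plus a bilinear scorer fit to the
small language-labeled set. All function names here are hypothetical, the
phrase embeddings are assumed to come from some pretrained text encoder, and
the partial-observation shape-completion step of Bayesian Eigenobjects is
omitted; this is not the authors' implementation.

```python
import numpy as np

def fit_eigenobject_basis(voxel_grids, k=32):
    """Learn a low-dimensional linear shape basis from a large corpus of
    unlabeled, voxelized 3D meshes (PCA via SVD). Hypothetical sketch."""
    X = np.stack([v.ravel() for v in voxel_grids]).astype(float)  # (n, d)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]  # rows of Vt[:k] span the shape subspace

def shape_coefficients(voxels, mean, basis):
    """Project one object's voxel grid onto the shape subspace, giving a
    compact shape vector that does not depend on the observation viewpoint."""
    return basis @ (voxels.ravel().astype(float) - mean)

def train_grounding(shape_vecs, phrase_vecs, labels, ridge=1e-3):
    """Fit a bilinear scorer score(s, p) = s^T W p on the few
    language-labeled pairs, via ridge-regularized least squares.
    `labels` holds similarity targets (e.g., 1 = phrase matches object)."""
    # Feature for each pair: flattened outer product of shape and phrase vectors.
    F = np.stack([np.outer(s, p).ravel()
                  for s, p in zip(shape_vecs, phrase_vecs)])
    w = np.linalg.solve(F.T @ F + ridge * np.eye(F.shape[1]), F.T @ labels)
    return w.reshape(shape_vecs[0].size, phrase_vecs[0].size)

def similarity(shape_vec, phrase_vec, W):
    """Predict how well the phrase describes the observed object."""
    return float(shape_vec @ W @ phrase_vec)
```

Because the scorer in this sketch operates on shape coefficients rather than
on raw depth images, language grounded this way could in principle rate an
object seen from the rear once its partial observation is projected into the
shape subspace, which is consistent with the viewpoint-transfer behavior the
abstract reports.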