Metric Learning for Generalizing Spatial Relations to New Objects
Human-centered environments are rich with a wide variety of spatial relations
between everyday objects. For autonomous robots to operate effectively in such
environments, they should be able to reason about these relations and
generalize them to objects with different shapes and sizes. For example, having
learned to place a toy inside a basket, a robot should be able to generalize
this concept using a spoon and a cup. This requires a robot to have the
flexibility to learn arbitrary relations in a lifelong manner, making it
challenging for an expert to pre-program it with sufficient knowledge to do so
beforehand. In this paper, we address the problem of learning spatial relations
by introducing a novel method from the perspective of distance metric learning.
Our approach enables a robot to reason about the similarity between pairwise
spatial relations, thereby enabling it to use its previous knowledge when
presented with a new relation to imitate. We show how this makes it possible to
learn arbitrary spatial relations from non-expert users using a small number of
examples and in an interactive manner. Our extensive evaluation with real-world
data demonstrates the effectiveness of our method in reasoning about a
continuous spectrum of spatial relations and generalizing them to new objects.
Comment: Accepted at the 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems. The new Freiburg Spatial Relations Dataset and a demo
video of our approach running on the PR-2 robot are available at our project
website: http://spatialrelations.cs.uni-freiburg.d
Answer Set Programming Modulo `Space-Time'
We present ASP Modulo `Space-Time', a declarative representational and
computational framework to perform commonsense reasoning about regions with
both spatial and temporal components. Supported are capabilities for mixed
qualitative-quantitative reasoning, consistency checking, and inferring
compositions of space-time relations; these capabilities combine to support
applications in a range of AI areas where the processing and interpretation
of spatio-temporal data are crucial. The framework and resulting system
constitute the only general KR-based method for declaratively reasoning about
the dynamics of `space-time' regions as first-class objects. We present an
empirical evaluation (with scalability and robustness results), and include
diverse application examples involving interpretation and control tasks.
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Recently, large language models (LLMs) have made significant advancements in
natural language understanding and generation. However, their potential in
computer vision remains largely unexplored. In this paper, we introduce a new,
exploratory approach that enables LLMs to process images using the Scalable
Vector Graphics (SVG) format. By leveraging the XML-based textual descriptions
of SVG representations instead of raster images, we aim to bridge the gap
between the visual and textual modalities, allowing LLMs to directly understand
and manipulate images without the need for parameterized visual components. Our
method facilitates simple image classification, generation, and in-context
learning using only LLM capabilities. We demonstrate the promise of our
approach across discriminative and generative tasks, highlighting its (i)
robustness against distribution shift, (ii) substantial improvements achieved
by tapping into the in-context learning abilities of LLMs, and (iii) image
understanding and generation capabilities with human guidance. Our code, data,
and models can be found at https://github.com/mu-cai/svg-llm