32 research outputs found
Knives are picked before slices are cut: Recognition through activity sequence analysis
In this paper, we introduce a model to classify cooking activities using their visual and temporal coherence information. We fuse multiple feature descriptors for fine-grained activity recognition as we would need every single detail to catch even subtle differences between classes with low inter-class variance. Considering the observation that daily activities such as cooking are likely to be performed in sequential patterns of activities, we also model temporal coherence of activities. By combining both aspects, we show that we can improve the overall accuracy of cooking recognition tasks. © Copyright 2013 ACM
Design and Control of Compliant Tensegrity Robots Through Simulation and Hardware Validation
To better understand the role of tensegrity structures in biological systems and their application to robotics, the Dynamic Tensegrity Robotics Lab at NASA Ames Research Center has developed and validated two different software environments for the analysis, simulation, and design of tensegrity robots. These tools, along with new control methodologies and the modular hardware components developed to validate them, are presented as a system for the design of actuated tensegrity structures. As evidenced by their appearance in many biological systems, tensegrity ("tensile-integrity") structures have unique physical properties which make them ideal for interaction with uncertain environments. Yet these characteristics, such as variable structural compliance and global multi-path load distribution through the tension network, make design and control of bio-inspired tensegrity robots extremely challenging. This work presents the progress made in using these two tools to tackle the design and control challenges. The results of this analysis include multiple novel control approaches for mobility and terrain interaction of spherical tensegrity structures. The current hardware prototype of a six-bar tensegrity, code-named ReCTeR, is presented in the context of this validation.
Conceptfusion: A flexible scene classification framework
We introduce ConceptFusion, a method that aims for high accuracy in categorizing a large number of scenes while keeping the model relatively simple and efficient for scalability. The proposed method combines the advantages of both low-level representations and high-level semantic categories, and eliminates the distinctions between different levels through the definition of concepts. The proposed framework encodes the perspectives brought by different concepts by considering them in concept groups that are ensembled for the final decision. Experiments carried out on benchmark datasets show the effectiveness of incorporating concepts at different levels with different perspectives. © Springer International Publishing Switzerland 2015
AVIS: Autonomous Visual Information Seeking with Large Language Model Agent
In this paper, we propose an autonomous information seeking visual question
answering framework, AVIS. Our method leverages a Large Language Model (LLM) to
dynamically strategize the utilization of external tools and to investigate
their outputs, thereby acquiring the indispensable knowledge needed to provide
answers to the posed questions. Responding to visual questions that necessitate
external knowledge, such as "What event is commemorated by the building
depicted in this image?", is a complex task. This task presents a combinatorial
search space that demands a sequence of actions, including invoking APIs,
analyzing their responses, and making informed decisions. We conduct a user
study to collect a variety of instances of human decision-making when faced
with this task. This data is then used to design a system comprised of three
components: an LLM-powered planner that dynamically determines which tool to
use next, an LLM-powered reasoner that analyzes and extracts key information
from the tool outputs, and a working memory component that retains the acquired
information throughout the process. The collected user behavior serves as a
guide for our system in two key ways. First, we create a transition graph by
analyzing the sequence of decisions made by users. This graph delineates
distinct states and confines the set of actions available at each state.
Second, we use examples of user decision-making to provide our LLM-powered
planner and reasoner with relevant contextual instances, enhancing their
capacity to make informed decisions. We show that AVIS achieves
state-of-the-art results on knowledge-intensive visual question answering
benchmarks such as Infoseek and OK-VQA.
Comment: Published on NeurIPS 202
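The planner, reasoner, working memory, and transition graph described in this abstract can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the state and tool names in `TRANSITIONS`, the `run_episode` function, and the stub planner, reasoner, and tools are all hypothetical stand-ins for the LLM-powered components and external APIs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transition graph: each state lists the actions allowed next.
# The abstract says such a graph is derived from user decisions; these
# particular states and tool names are invented for illustration.
TRANSITIONS: Dict[str, List[str]] = {
    "start": ["caption_image", "detect_objects"],
    "caption_image": ["web_search", "answer"],
    "detect_objects": ["web_search", "answer"],
    "web_search": ["web_search", "answer"],
}


def run_episode(
    question: str,
    planner: Callable[[str, List[str], List[str]], str],
    reasoner: Callable[[str, str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[List[str], List[str]]:
    """Run the plan -> call tool -> reason -> remember loop."""
    state = "start"
    memory: List[str] = []  # working memory of extracted facts
    trace: List[str] = []   # actions taken, in order
    for _ in range(max_steps):
        allowed = TRANSITIONS[state]
        action = planner(question, memory, allowed)
        if action not in allowed:
            raise ValueError(f"action {action!r} not allowed in state {state!r}")
        trace.append(action)
        if action == "answer":
            break
        tool_output = tools[action](question)
        memory.append(reasoner(question, tool_output))
        state = action
    return trace, memory


# Stubs standing in for the LLM-powered planner and reasoner (assumptions).
def stub_planner(question: str, memory: List[str], allowed: List[str]) -> str:
    # Answer as soon as any fact is in memory; otherwise take the first option.
    return "answer" if memory and "answer" in allowed else allowed[0]


def stub_reasoner(question: str, tool_output: str) -> str:
    return f"fact from {tool_output}"


# Each stub tool simply returns its own name.
STUB_TOOLS = {
    name: (lambda q, n=name: n)
    for name in ("caption_image", "detect_objects", "web_search")
}

trace, memory = run_episode(
    "What event is commemorated by the building?",
    stub_planner, stub_reasoner, STUB_TOOLS,
)
```

The point of the graph is that the planner only ever chooses among the actions permitted in the current state, which shrinks the combinatorial search space the abstract mentions.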
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
In this paper, we propose an end-to-end Retrieval-Augmented Visual Language
Model (REVEAL) that learns to encode world knowledge into a large-scale memory,
and to retrieve from it to answer knowledge-intensive queries. REVEAL consists
of four key components: the memory, the encoder, the retriever and the
generator. The large-scale memory encodes various sources of multimodal world
knowledge (e.g. image-text pairs, question answering pairs, knowledge graph
triplets, etc.) via a unified encoder. The retriever finds the most relevant
knowledge entries in the memory, and the generator fuses the retrieved
knowledge with the input query to produce the output. A key novelty in our
approach is that the memory, encoder, retriever and generator are all
pre-trained end-to-end on a massive amount of data. Furthermore, our approach
can use a diverse set of multimodal knowledge sources, which is shown to result
in significant gains. We show that REVEAL achieves state-of-the-art results on
visual question answering and image captioning.
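The retrieval step (finding the most relevant knowledge entries in the memory) can be illustrated with a top-k lookup over stored vectors. This is only a sketch: scoring by inner product, the `top_k` function, and the toy two-dimensional vectors are assumptions for illustration, and the abstract does not specify how the learned retriever scores entries.

```python
from typing import Dict, List


def top_k(query: List[float], memory: Dict[str, List[float]], k: int) -> List[str]:
    """Return the keys of the k memory entries with the largest inner product."""
    scored = [
        (sum(q * m for q, m in zip(query, vec)), key)
        for key, vec in memory.items()
    ]
    # Sort by score only, highest first; ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in scored[:k]]


# Toy memory: keys name hypothetical knowledge entries.
toy_memory = {
    "image_text_pair": [1.0, 0.0],
    "qa_pair": [0.0, 1.0],
    "kg_triplet": [0.7, 0.7],
}
retrieved = top_k([1.0, 0.2], toy_memory, k=2)
```

In the described system the retrieved entries would then be fused with the input query by the generator; that step is not modeled here.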
Compact Deep Aggregation for Set Retrieval
The objective of this work is to learn a compact embedding of a set of
descriptors that is suitable for efficient retrieval and ranking, whilst
maintaining discriminability of the individual descriptors. We focus on a
specific example of this general problem -- that of retrieving images
containing multiple faces from a large scale dataset of images. Here the set
consists of the face descriptors in each image, and given a query for multiple
identities, the goal is then to retrieve, in order, images which contain all
the identities, all but one, etc.
To this end, we make the following contributions: first, we propose a CNN
architecture -- SetNet -- to achieve the objective: it learns face
descriptors and their aggregation over a set to produce a compact fixed length
descriptor designed for set retrieval, and the score of an image is a count of
the number of identities that match the query; second, we show that this
compact descriptor has minimal loss of discriminability up to two faces per
image, and degrades slowly after that -- far exceeding a number of baselines;
third, we explore the speed vs. retrieval quality trade-off for set retrieval
using this compact descriptor; and, finally, we collect and annotate a large
dataset of images containing varying numbers of celebrities, which we use for
evaluation and which is publicly released.
Comment: 20 page
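The scoring idea, where an image's score is the number of query identities it contains, can be shown with a toy example. The sketch below assumes sum aggregation and one-hot identity descriptors, chosen so that the inner product of two set descriptors equals the match count exactly; the learned SetNet descriptor is compact and would only approximate this. The names `aggregate`, `score`, and the image labels are hypothetical.

```python
from typing import Dict, List

Vector = List[int]


def aggregate(descriptors: List[Vector]) -> Vector:
    """Collapse a set of descriptors into one fixed-length vector by summation."""
    return [sum(column) for column in zip(*descriptors)]


def score(query_set: Vector, image_set: Vector) -> int:
    """Inner product of two set descriptors."""
    return sum(q * x for q, x in zip(query_set, image_set))


# Toy one-hot descriptors for three identities (an idealization).
id1, id2, id3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]

images: Dict[str, Vector] = {
    "A": aggregate([id1, id2]),  # contains both query identities
    "B": aggregate([id1, id3]),  # contains one of them
    "C": aggregate([id3]),       # contains neither
}
query = aggregate([id1, id2])

scores = {name: score(query, vec) for name, vec in images.items()}
ranking = sorted(scores, key=lambda name: scores[name], reverse=True)
```

Images are ranked so that those containing all queried identities come first, then all but one, and so on. Because each image is represented by a single fixed-length vector, ranking needs one inner product per image regardless of how many faces it contains.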
Computational Identification and Analysis of the Key Biosorbent Characteristics for the Biosorption Process of Reactive Black 5 onto Fungal Biomass
The performances of nine biosorbents derived from dead fungal biomass were investigated for their ability to remove Reactive Black 5 from aqueous solution. The biosorption data for removal of Reactive Black 5 were readily modeled using the Langmuir adsorption isotherm. Kinetic analysis based on both pseudo-second-order and Weber-Morris models indicated intraparticle diffusion was the rate-limiting step for biosorption of Reactive Black 5 onto the biosorbents. Sorption capacities of the biosorbents were not correlated with the initial biosorption rates. Sensitivity analysis of the factors affecting biosorption examined by an artificial neural network model showed that pH was the most important parameter, explaining 22%, followed by nitrogen content of biosorbents (16%), initial dye concentration (15%) and carbon content of biosorbents (10%). The biosorption capacities were not proportional to surface areas of the sorbents, but were instead influenced by their chemical element composition. The main functional groups contributing to dye sorption were amine, carboxylic, and alcohol moieties. The data further suggest that differences in carbon and nitrogen contents of biosorbents may be used as a selection index for identifying effective biosorbents from dead fungal biomass.
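For reference, the three models named in this abstract are commonly written as follows. These are the standard textbook forms, not equations quoted from the paper, and the symbols are assumptions introduced here: $q_e$ and $q_t$ are the amounts of dye sorbed per unit mass of biosorbent at equilibrium and at time $t$, $C_e$ is the equilibrium dye concentration, $q_{\max}$ and $K_L$ are the Langmuir capacity and affinity constants, $k_2$ is the pseudo-second-order rate constant, $k_{id}$ is the intraparticle diffusion rate constant, and $C$ is the intercept.

```latex
\begin{align}
q_e &= \frac{q_{\max} K_L C_e}{1 + K_L C_e}
  && \text{(Langmuir isotherm)} \\
\frac{t}{q_t} &= \frac{1}{k_2 q_e^{2}} + \frac{t}{q_e}
  && \text{(pseudo-second-order, linearized)} \\
q_t &= k_{id}\, t^{1/2} + C
  && \text{(Weber-Morris intraparticle diffusion)}
\end{align}
```

In the Weber-Morris form, a plot of $q_t$ against $t^{1/2}$ that is linear is conventionally read as evidence that intraparticle diffusion contributes to the rate, which is the kind of kinetic analysis the abstract refers to.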