32 research outputs found

    Knives are picked before slices are cut: Recognition through activity sequence analysis

    In this paper, we introduce a model to classify cooking activities using their visual and temporal coherence information. We fuse multiple feature descriptors for fine-grained activity recognition as we would need every single detail to catch even subtle differences between classes with low inter-class variance. Considering the observation that daily activities such as cooking are likely to be performed in sequential patterns of activities, we also model temporal coherence of activities. By combining both aspects, we show that we can improve the overall accuracy of cooking recognition tasks. © Copyright 2013 ACM
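
The idea of combining per-segment visual evidence with sequential patterns of activities can be illustrated with a small sketch. This is not the authors' model: it swaps in a standard Viterbi decoder, and the activity labels, classifier scores, and transition probabilities below are all hypothetical values chosen for illustration.

```python
import math

def viterbi(scores, transition, labels):
    """Decode the most likely activity sequence.

    scores:     list of dicts, one per video segment, label -> classifier probability
    transition: dict mapping (previous_label, current_label) -> probability
    All probabilities are assumed to be strictly positive.
    """
    best = {l: math.log(scores[0][l]) for l in labels}
    back = []
    for obs in scores[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            # Pick the predecessor that maximises the running log-probability.
            prev = max(labels, key=lambda p: best[p] + math.log(transition[(p, cur)]))
            new_best[cur] = best[prev] + math.log(transition[(prev, cur)]) + math.log(obs[cur])
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda l: best[l])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical example: the second segment is visually ambiguous.
LABELS = ["take_knife", "cut"]
TRANSITION = {
    ("take_knife", "take_knife"): 0.2, ("take_knife", "cut"): 0.8,
    ("cut", "take_knife"): 0.1, ("cut", "cut"): 0.9,
}
SCORES = [
    {"take_knife": 0.9, "cut": 0.1},
    {"take_knife": 0.55, "cut": 0.45},
]
decoded = viterbi(SCORES, TRANSITION, LABELS)
```

In this toy case a per-segment argmax would label both segments `take_knife`, whereas the sequence prior favours `cut` after `take_knife`, which is the kind of correction temporal coherence is meant to provide.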

    Design and Control of Compliant Tensegrity Robots Through Simulation and Hardware Validation

    To better understand the role of tensegrity structures in biological systems and their application to robotics, the Dynamic Tensegrity Robotics Lab at NASA Ames Research Center has developed and validated two different software environments for the analysis, simulation, and design of tensegrity robots. These tools, along with new control methodologies and the modular hardware components developed to validate them, are presented as a system for the design of actuated tensegrity structures. As evidenced by their appearance in many biological systems, tensegrity ("tensile-integrity") structures have unique physical properties which make them ideal for interaction with uncertain environments. Yet these characteristics, such as variable structural compliance and global multi-path load distribution through the tension network, make design and control of bio-inspired tensegrity robots extremely challenging. This work presents the progress made in using these two tools to tackle the design and control challenges. The results of this analysis include multiple novel control approaches for mobility and terrain interaction of spherical tensegrity structures. The current hardware prototype of a six-bar tensegrity, code-named ReCTeR, is presented in the context of this validation.

    ConceptFusion: A flexible scene classification framework

    We introduce ConceptFusion, a method that aims for high accuracy in categorizing a large number of scenes while keeping the model relatively simple and efficient for scalability. The proposed method combines the advantages of both low-level representations and high-level semantic categories, and eliminates the distinctions between different levels through the definition of concepts. The proposed framework encodes the perspectives brought by different concepts by organizing them into concept groups that are ensembled for the final decision. Experiments carried out on benchmark datasets show the effectiveness of incorporating concepts at different levels with different perspectives. © Springer International Publishing Switzerland 2015

    AVIS: Autonomous Visual Information Seeking with Large Language Model Agent

    In this paper, we propose an autonomous information seeking visual question answering framework, AVIS. Our method leverages a Large Language Model (LLM) to dynamically strategize the utilization of external tools and to investigate their outputs, thereby acquiring the indispensable knowledge needed to provide answers to the posed questions. Responding to visual questions that necessitate external knowledge, such as "What event is commemorated by the building depicted in this image?", is a complex task. This task presents a combinatorial search space that demands a sequence of actions, including invoking APIs, analyzing their responses, and making informed decisions. We conduct a user study to collect a variety of instances of human decision-making when faced with this task. This data is then used to design a system composed of three components: an LLM-powered planner that dynamically determines which tool to use next, an LLM-powered reasoner that analyzes and extracts key information from the tool outputs, and a working memory component that retains the acquired information throughout the process. The collected user behavior serves as a guide for our system in two key ways. First, we create a transition graph by analyzing the sequence of decisions made by users. This graph delineates distinct states and confines the set of actions available at each state. Second, we use examples of user decision-making to provide our LLM-powered planner and reasoner with relevant contextual instances, enhancing their capacity to make informed decisions. We show that AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks such as Infoseek and OK-VQA. Comment: Published on NeurIPS 202
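
The role of the transition graph, which confines the actions available at each state, can be sketched as follows. This is a minimal illustration under stated assumptions, not the AVIS implementation: the state names, tool names, and the `choose` callback (a stand-in for the LLM-powered planner) are hypothetical, and the real graph is derived from user-study data.

```python
# Hypothetical transition graph: state -> actions allowed from that state.
TRANSITIONS = {
    "start": ["object_detection", "image_search"],
    "object_detection": ["image_search", "web_search"],
    "image_search": ["web_search", "answer"],
    "web_search": ["web_search", "answer"],
}

def allowed_actions(state, tried):
    """Actions the graph permits from `state`, minus those already tried there."""
    return [a for a in TRANSITIONS.get(state, []) if (state, a) not in tried]

def run_episode(choose, max_steps=10):
    """Run one planning episode.

    choose(state, options, memory) stands in for the planner; it must return
    one element of `options`. `memory` plays the role of the working memory.
    """
    state, tried, memory = "start", set(), []
    for _ in range(max_steps):
        options = allowed_actions(state, tried)
        if not options:
            break
        action = choose(state, options, memory)
        tried.add((state, action))
        memory.append(action)
        if action == "answer":
            break
        state = action  # in this toy graph, each tool name doubles as the next state
    return memory

# Two toy planners: always take the last or the first permitted option.
greedy_last = run_episode(lambda state, options, memory: options[-1])
greedy_first = run_episode(lambda state, options, memory: options[0])
```

The planner never sees an action outside `options`, so the graph shrinks the search space regardless of how the choice itself is made.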

    REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

    In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model (REVEAL) that learns to encode world knowledge into a large-scale memory, and to retrieve from it to answer knowledge-intensive queries. REVEAL consists of four key components: the memory, the encoder, the retriever and the generator. The large-scale memory encodes various sources of multimodal world knowledge (e.g. image-text pairs, question answering pairs, knowledge graph triplets, etc.) via a unified encoder. The retriever finds the most relevant knowledge entries in the memory, and the generator fuses the retrieved knowledge with the input query to produce the output. A key novelty in our approach is that the memory, encoder, retriever and generator are all pre-trained end-to-end on a massive amount of data. Furthermore, our approach can use a diverse set of multimodal knowledge sources, which is shown to result in significant gains. We show that REVEAL achieves state-of-the-art results on visual question answering and image captioning.

    Compact Deep Aggregation for Set Retrieval

    The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking, whilst maintaining discriminability of the individual descriptors. We focus on a specific example of this general problem: that of retrieving images containing multiple faces from a large scale dataset of images. Here the set consists of the face descriptors in each image, and given a query for multiple identities, the goal is then to retrieve, in order, images which contain all the identities, all but one, etc. To this end, we make the following contributions: first, we propose a CNN architecture, SetNet, to achieve the objective: it learns face descriptors and their aggregation over a set to produce a compact fixed length descriptor designed for set retrieval, and the score of an image is a count of the number of identities that match the query; second, we show that this compact descriptor has minimal loss of discriminability up to two faces per image, and degrades slowly after that, far exceeding a number of baselines; third, we explore the speed vs. retrieval quality trade-off for set retrieval using this compact descriptor; and, finally, we collect and annotate a large dataset of images containing a varying number of celebrities, which we use for evaluation and which is publicly released. Comment: 20 page
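
The scoring rule described above, where an image's score counts the query identities it contains, can be illustrated with a toy example. This is only a sketch: SetNet learns its aggregation, whereas the code below assumes plain sum aggregation over idealised one-hot face descriptors, and the image names and descriptors are hypothetical.

```python
def aggregate(descriptors):
    """Sum per-face descriptors into one fixed-length set descriptor."""
    dim = len(descriptors[0])
    return [sum(d[i] for d in descriptors) for i in range(dim)]

def score(query_descriptors, set_descriptor):
    """Sum of dot products between each query identity and the set descriptor.

    With orthonormal descriptors this equals the number of query identities
    present in the image.
    """
    return sum(
        sum(q_i * s_i for q_i, s_i in zip(q, set_descriptor))
        for q in query_descriptors
    )

def rank(query_descriptors, image_sets):
    """Order images by descending score, breaking ties by name."""
    scored = [
        (score(query_descriptors, aggregate(faces)), name)
        for name, faces in image_sets.items()
    ]
    return [name for s, name in sorted(scored, key=lambda t: (-t[0], t[1]))]

# Idealised descriptors for three hypothetical identities.
A, B, C = [1, 0, 0], [0, 1, 0], [0, 0, 1]
IMAGES = {"img1": [A, B], "img2": [A, C], "img3": [C]}
ranking = rank([A, B], IMAGES)
```

Here the image containing both queried identities comes first, then the image with one of them, then the image with none. Real descriptors are not orthogonal, which is why discriminability degrades as more faces are aggregated.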

    Computational Identification and Analysis of the Key Biosorbent Characteristics for the Biosorption Process of Reactive Black 5 onto Fungal Biomass

    The performance of nine biosorbents derived from dead fungal biomass was investigated for removal of Reactive Black 5 from aqueous solution. The biosorption data for removal of Reactive Black 5 were readily modeled using the Langmuir adsorption isotherm. Kinetic analysis based on both pseudo-second-order and Weber-Morris models indicated that intraparticle diffusion was the rate-limiting step for biosorption of Reactive Black 5 onto the biosorbents. Sorption capacities of the biosorbents were not correlated with the initial biosorption rates. Sensitivity analysis of the factors affecting biosorption, examined by an artificial neural network model, showed that pH was the most important parameter, explaining 22%, followed by nitrogen content of biosorbents (16%), initial dye concentration (15%) and carbon content of biosorbents (10%). The biosorption capacities were not proportional to surface areas of the sorbents, but were instead influenced by their chemical element composition. The main functional groups contributing to dye sorption were amine, carboxylic, and alcohol moieties. The data further suggest that differences in carbon and nitrogen contents of biosorbents may be used as a selection index for identifying effective biosorbents from dead fungal biomass.
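
For readers unfamiliar with the models named above, the following sketch writes out the standard textbook forms of the Langmuir isotherm, the pseudo-second-order rate law, and the Weber-Morris intraparticle diffusion model, plus a linearised Langmuir fit. The parameter values are synthetic and chosen only for illustration; none of them come from the paper.

```python
import math

def langmuir(c_e, q_max, k_l):
    """Langmuir isotherm: q_e = q_max * K_L * C_e / (1 + K_L * C_e)."""
    return q_max * k_l * c_e / (1.0 + k_l * c_e)

def pseudo_second_order(t, q_e, k2):
    """Pseudo-second-order kinetics: q_t = k2 * q_e^2 * t / (1 + k2 * q_e * t)."""
    return k2 * q_e ** 2 * t / (1.0 + k2 * q_e * t)

def weber_morris(t, k_id, intercept):
    """Weber-Morris intraparticle diffusion: q_t = k_id * sqrt(t) + C."""
    return k_id * math.sqrt(t) + intercept

def fit_langmuir(c_e_values, q_e_values):
    """Fit q_max and K_L from the linearised form C_e/q_e = C_e/q_max + 1/(K_L*q_max)."""
    xs = list(c_e_values)
    ys = [c / q for c, q in zip(c_e_values, q_e_values)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, slope / intercept  # (q_max, K_L)

# Synthetic, noise-free data generated from assumed parameters (not from the paper).
C_E = [5.0, 10.0, 20.0, 50.0, 100.0]
Q_E = [langmuir(c, q_max=100.0, k_l=0.05) for c in C_E]
q_max_fit, k_l_fit = fit_langmuir(C_E, Q_E)
```

On noise-free data the linearised fit recovers the generating parameters; with real measurements a nonlinear least-squares fit is often preferred because the linearisation distorts the error structure.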