
    Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning

    Deep neural networks require a large amount of labeled training data during supervised learning. However, collecting and labeling that much data is infeasible in many cases. In this paper, we introduce a source-target selective joint fine-tuning scheme for improving the performance of deep learning tasks with insufficient training data. In this scheme, a target learning task with insufficient training data is carried out simultaneously with a source learning task with abundant training data. However, the source learning task does not use all existing training data. Our core idea is to identify and use a subset of training images from the original source learning task whose low-level characteristics are similar to those from the target learning task, and jointly fine-tune shared convolutional layers for both tasks. Specifically, we compute descriptors from linear or nonlinear filter bank responses on training images from both tasks, and use these descriptors to search for a desired subset of training samples for the source learning task. Experiments demonstrate that our selective joint fine-tuning scheme achieves state-of-the-art performance on multiple visual classification tasks with insufficient training data for deep learning. These tasks include Caltech 256, MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. Compared with fine-tuning without a source domain, the proposed method improves classification accuracy by 2% to 10% using a single model. Comment: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
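The subset search described in this abstract can be illustrated with a minimal sketch. It assumes each image has already been reduced to a fixed-length descriptor, and it uses plain Euclidean distance with a k-nearest-neighbour rule; the distance measure, the value of k, the toy descriptors, and the name `select_source_subset` are all assumptions made for illustration, not details taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_source_subset(target_desc, source_desc, k):
    """Return sorted indices of source samples that are among the
    k nearest neighbours of at least one target sample.

    Hypothetical helper: the distance and the k-NN rule are assumptions.
    """
    selected = set()
    for t in target_desc:
        # Rank every source sample by its distance to this target sample.
        ranked = sorted(
            range(len(source_desc)),
            key=lambda i: euclidean(t, source_desc[i]),
        )
        selected.update(ranked[:k])
    return sorted(selected)


# Toy two-dimensional descriptors (made up for the example).
target = [[0.9, 0.1], [0.8, 0.2]]
source = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.3], [0.1, 0.9]]

subset = select_source_subset(target, source, k=2)
print(subset)
```

In this toy setting only the source samples that resemble the target descriptors are kept; in a joint fine-tuning setup, that subset would then be trained alongside the target data through the shared layers.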

    Hierarchical Attention Network for Action Segmentation

    The temporal segmentation of events is an essential task and a precursor to the automatic recognition of human actions in video. Several attempts have been made to capture frame-level salient aspects through attention, but they cannot effectively map the temporal relationships between frames because they capture only a limited span of temporal dependencies. To this end, we propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time, thus improving overall segmentation performance. The proposed hierarchical recurrent attention framework analyses the input video at multiple temporal scales to form embeddings at the frame level and the segment level, and performs fine-grained action segmentation. The result is a simple, lightweight, yet highly effective architecture for segmenting continuous video streams, with multiple application domains. We evaluate our system on multiple challenging public benchmark datasets, including the MERL Shopping, 50 Salads, and Georgia Tech Egocentric datasets, and achieve state-of-the-art performance. The evaluated datasets encompass numerous video capture settings, including static overhead camera views and dynamic, egocentric head-mounted camera views, demonstrating the direct applicability of the proposed framework in a variety of settings. Comment: Published in Pattern Recognition Letters

    Semantic Mechanical Search with Large Vision and Language Models

    Moving objects to find a fully-occluded target object, known as mechanical search, is a challenging problem in robotics. As objects are often organized semantically, we conjecture that semantic information about object relationships can facilitate mechanical search and reduce search time. Large pretrained vision and language models (VLMs and LLMs) have shown promise in generalizing to uncommon objects and previously unseen real-world environments. In this work, we propose a novel framework called Semantic Mechanical Search (SMS). SMS conducts scene understanding and generates a semantic occupancy distribution explicitly using LLMs. Compared to methods that rely on visual similarities offered by CLIP embeddings, SMS leverages the deep reasoning capabilities of LLMs. Unlike prior work that uses VLMs and LLMs as end-to-end planners, which may not integrate well with specialized geometric planners, SMS can serve as a plug-in semantic module for downstream manipulation or navigation policies. For mechanical search in closed-world settings such as shelves, we compare with a geometric-based planner and show that SMS improves mechanical search performance by 24% across the pharmacy, kitchen, and office domains in simulation and 47.1% in physical experiments. For open-world real environments, SMS can produce better semantic distributions compared to CLIP-based methods, with the potential to be integrated with downstream navigation policies to improve object navigation tasks. Code, data, videos, and the appendix are available: https://sites.google.com/view/semantic-mechanical-searc
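As a rough illustration of what a semantic occupancy distribution could look like, the sketch below turns per-location relevance scores into a probability distribution over where the target might be. The scores are invented placeholders standing in for whatever an LLM would return, and the softmax normalisation, the `temperature` parameter, and the function name `occupancy_distribution` are assumptions for illustration, not the SMS implementation.

```python
import math


def occupancy_distribution(scores, temperature=1.0):
    """Normalise location scores into a probability distribution.

    Hypothetical helper: softmax is an assumed normalisation choice.
    """
    # Subtract the maximum score for numerical stability.
    top = max(scores.values())
    weights = {
        loc: math.exp((s - top) / temperature) for loc, s in scores.items()
    }
    total = sum(weights.values())
    return {loc: w / total for loc, w in weights.items()}


# Placeholder relevance scores for a target object (made up for the example).
scores = {"pharmacy shelf": 2.0, "kitchen shelf": 0.5, "office shelf": 0.1}

dist = occupancy_distribution(scores)
best = max(dist, key=dist.get)
print(best, dist)
```

A downstream manipulation or navigation policy could consume such a distribution as a prior over search locations, which matches the plug-in role the abstract describes.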

    Memory-based preferential choice in large option spaces

    Whether adding songs to a playlist or groceries to a shopping basket, everyday decisions often require us to choose among a very large set of options. Laboratory studies of preferential choice have made considerable progress in describing how people navigate fixed sets of options. Yet questions remain about how well this generalises to more complex, everyday choices. In this thesis, I ask how people navigate large option spaces, focusing particularly on how long-term memory supports decisions. In the first project, I explore how large option spaces are structured in the mind. A topic model trained on the purchasing patterns of consumers uncovered an intuitive set of themes that centred primarily around goals (e.g., tomatoes go well in a salad), suggesting that representations are geared to support action. In the second project, I explore how such representations are queried during memory-based decisions, where options must be retrieved from memory. Using a large dataset of over 100,000 online grocery shops, results revealed that consumers query multiple systems of associative memory when determining what to choose next. Attending to certain knowledge sources, as estimated by a cognitive model, predicted important retrieval errors, such as the propensity to forget or add unwanted products. In the final project, I ask how preferences could be learned and represented in large option spaces, where most options are untried. A cognitive model of sequential decision making is proposed, which learns preferences over choice attributes, allowing preferences to generalise to unseen options by virtue of their similarity to previous choices. This model explains the reduced exploration behaviour observed in the supermarket and preferential choices in more controlled laboratory settings.
    Overall, this suggests that consumers depend on associative systems in long-term memory when navigating large spaces of options, enabling inferences about the conceptual properties and subjective value of novel options.