
    Behaviourally meaningful representations from normalisation and context-guided denoising

    Many existing independent component analysis algorithms include a preprocessing stage where the inputs are sphered. This amounts to normalising the data such that all correlations between the variables are removed. In this work, I show that sphering allows very weak contextual modulation to steer the development of meaningful features. Context-biased competition has been proposed as a model of covert attention, and I propose that sphering-like normalisation also allows weaker top-down bias to guide attention.
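
    The sphering step described in this abstract can be illustrated with a minimal sketch. It assumes PCA-style whitening computed with NumPy; the function name `sphere`, the `eps` stabiliser, and the toy mixing matrix `A` are illustrative choices, not taken from the paper, and other sphering transforms (for example ZCA) would serve equally well.

```python
import numpy as np

def sphere(X, eps=1e-12):
    """Sphere (whiten) data of shape (n_samples, n_variables).

    Assumption: PCA whitening; the abstract does not say which
    sphering transform the author uses.
    """
    Xc = X - X.mean(axis=0)                  # centre each variable
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)      # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric eigendecomposition
    W = eigvecs / np.sqrt(eigvals + eps)     # scale each eigenvector column
    return Xc @ W                            # decorrelated, unit-variance data

# Toy data: three correlated variables built from a hypothetical mixing matrix.
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, -0.3, 0.7]])
X = rng.normal(size=(2000, 3)) @ A.T

Z = sphere(X)
cov_Z = np.cov(Z, rowvar=False)              # close to the identity matrix
```

    After sphering, the covariance of `Z` is approximately the identity, so no pairwise correlations remain and every direction has unit variance.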

    Sim2Real View Invariant Visual Servoing by Recurrent Control

    Humans are remarkably proficient at controlling their limbs and tools from a wide range of viewpoints and angles, even in the presence of optical distortions. In robotics, this ability is referred to as visual servoing: moving a tool or end-point to a desired location using primarily visual feedback. In this paper, we study how viewpoint-invariant visual servoing skills can be learned automatically in a robotic manipulation scenario. To this end, we train a deep recurrent controller that can automatically determine which actions move the end-point of a robotic arm to a desired object. The problem that must be solved by this controller is fundamentally ambiguous: under severe variation in viewpoint, it may be impossible to determine the actions in a single feedforward operation. Instead, our visual servoing system must use its memory of past movements to understand how the actions affect the robot motion from the current viewpoint, correcting mistakes and gradually moving closer to the target. This ability is in stark contrast to most visual servoing methods, which either assume known dynamics or require a calibration phase. We show how we can learn this recurrent controller using simulated data and a reinforcement learning objective. We then describe how the resulting model can be transferred to a real-world robot by disentangling perception from control and only adapting the visual layers. The adapted model can servo to previously unseen objects from novel viewpoints on a real-world Kuka IIWA robotic arm. For supplementary videos, see: https://fsadeghi.github.io/Sim2RealViewInvariantServo

    Transfer Learning via Contextual Invariants for One-to-Many Cross-Domain Recommendation

    The rapid proliferation of new users and items on the social web has aggravated the gray-sheep user/long-tail item challenge in recommender systems. Historically, cross-domain co-clustering methods have successfully leveraged shared users and items across dense and sparse domains to improve inference quality. However, they rely on shared rating data and cannot scale to multiple sparse target domains (i.e., the one-to-many transfer setting). This, combined with the increasing adoption of neural recommender architectures, motivates us to develop scalable neural layer-transfer approaches for cross-domain learning. Our key intuition is to guide neural collaborative filtering with domain-invariant components shared across the dense and sparse domains, improving the user and item representations learned in the sparse domains. We leverage contextual invariances across domains to develop these shared modules, and demonstrate that with user-item interaction context, we can learn-to-learn informative representation spaces even with sparse interaction data. We show the effectiveness and scalability of our approach on two public datasets and a massive transaction dataset from Visa, a global payments technology company (19% Item Recall, 3x faster vs. training separate models for each domain). Our approach is applicable to both implicit and explicit feedback settings. Comment: SIGIR 202

    Adversarial Bipartite Graph Learning for Video Domain Adaptation

    Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area due to the significant spatial and temporal shifts across the source (i.e. training) and target (i.e. test) domains. As such, recent works on visual domain adaptation, which leverage adversarial learning to unify the source and target video representations and strengthen feature transferability, are not highly effective on videos. To overcome this limitation, we learn a domain-agnostic video classifier instead of learning domain-invariant representations, and propose an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions with the network topology of a bipartite graph. Specifically, the source and target frames are sampled as heterogeneous vertices, while the edges connecting the two types of nodes measure the affinity between them. Through message passing, each vertex aggregates the features from its heterogeneous neighbors, forcing the features coming from the same class to be mixed evenly. Explicitly exposing the video classifier to such cross-domain representations at the training and test stages makes our model less biased to the labeled source data, which in turn results in better generalization on the target domain. To further enhance the model capacity and test the robustness of the proposed architecture on difficult transfer tasks, we extend our model to work in a semi-supervised setting using an additional video-level bipartite graph. Extensive experiments conducted on four benchmarks demonstrate the effectiveness of the proposed approach over the SOTA methods on the task of video recognition. Comment: Proceedings of the 28th ACM International Conference on Multimedia (MM '20)
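
    The bipartite message passing described in this abstract can be sketched roughly as follows. This is not the ABG framework itself: the dot-product affinity, the softmax normalisation, the residual update, and the names `softmax` and `bipartite_message_pass` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def softmax(a, axis):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def bipartite_message_pass(src, tgt):
    """One round of message passing on a source-target bipartite graph.

    src: (n_src, d) source-frame features; tgt: (n_tgt, d) target-frame features.
    Assumption: edge weights are softmax-normalised dot-product affinities.
    """
    affinity = src @ tgt.T                        # (n_src, n_tgt) edge scores
    w_src = softmax(affinity, axis=1)             # each source node weights targets
    w_tgt = softmax(affinity.T, axis=1)           # each target node weights sources
    src_new = src + w_src @ tgt                   # aggregate from target neighbours
    tgt_new = tgt + w_tgt @ src                   # aggregate from source neighbours
    return src_new, tgt_new, w_src, w_tgt

# Hypothetical toy features: 4 source frames, 5 target frames, 8 dimensions.
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 8))
tgt = rng.normal(size=(5, 8))
src_new, tgt_new, w_src, w_tgt = bipartite_message_pass(src, tgt)
```

    Each node only receives messages from nodes of the other domain, which is what makes the graph bipartite; the updated features therefore mix source and target information.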