
    ActiveRMAP: Radiance Field for Active Mapping And Planning

    Full text link
    A high-quality 3D reconstruction of a scene from a collection of 2D images can be achieved through offline/online mapping methods. In this paper, we explore active mapping from the perspective of implicit representations, which have recently produced compelling results in a variety of applications. One of the most popular implicit representations, the Neural Radiance Field (NeRF), first demonstrated photorealistic rendering results using multi-layer perceptrons, with promising offline 3D reconstruction as a by-product of the radiance field. More recently, researchers have also applied this implicit representation to online reconstruction and localization (i.e. implicit SLAM systems). However, studies on using implicit representations for active vision tasks are still very limited. In this paper, we are particularly interested in applying the neural radiance field to active mapping and planning problems, which are closely coupled tasks in an active system. We present, for the first time, an RGB-only active vision framework using a radiance field representation for active 3D reconstruction and planning in an online manner. Specifically, we formulate this joint task as an iterative dual-stage optimization problem, in which we alternately optimize the radiance field representation and the path plan. Experimental results suggest that the proposed method achieves competitive results compared to other offline methods and outperforms active reconstruction methods using NeRFs. Comment: Under review.
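
    The abstract describes an iterative dual-stage loop that alternates between fitting the radiance field and planning the next view. Below is a minimal, hedged sketch of that loop structure only; the class names, the stub model and the distance-based uncertainty heuristic are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class RadianceFieldStub:
    """Stand-in for a NeRF-style model; only tracks which poses it has seen."""
    def __init__(self):
        self.observed = []

    def fit(self, images, poses, iters=100):
        # A real system would run `iters` gradient steps on an MLP here.
        self.observed = [np.asarray(p) for p in poses]

    def uncertainty(self, pose):
        # Toy heuristic: a candidate view is more informative the farther it
        # lies from every pose already observed.
        if not self.observed:
            return 1.0
        return float(min(np.linalg.norm(np.asarray(pose) - p) for p in self.observed))

def plan_next_view(field, candidate_poses):
    """Greedy planner: pick the candidate view with the highest predicted gain."""
    return max(candidate_poses, key=field.uncertainty)

def active_mapping(capture_image, initial_pose, candidate_poses, rounds=5):
    field = RadianceFieldStub()
    poses = [initial_pose]
    images = [capture_image(initial_pose)]
    for _ in range(rounds):
        field.fit(images, poses)                      # stage 1: mapping
        nxt = plan_next_view(field, candidate_poses)  # stage 2: planning
        poses.append(nxt)
        images.append(capture_image(nxt))
    return field, poses

if __name__ == "__main__":
    candidates = [np.array([x, 0.0, 0.0]) for x in np.linspace(-1.0, 1.0, 9)]
    _, visited = active_mapping(lambda p: None, np.zeros(3), candidates, rounds=3)
    print([list(np.round(p, 2)) for p in visited])
```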

    An active stereo vision-based learning approach for robotic tracking, fixating and grasping control

    Full text link
    In this paper, an active stereo vision-based learning approach is proposed for a robot to track, fixate and grasp an object in unknown environments. First, the functional mapping relationships between the joint angles of the active stereo vision system and the spatial representations of the object are derived and expressed in a three-dimensional workspace frame. Next, self-adaptive resonance theory-based neural networks and feedforward neural networks are used to learn the mapping relationships in a self-organized way. Then, the approach is verified by simulation using the models of an active stereo vision system mounted on the end-effector of a robot. Finally, the simulation results confirm the effectiveness of the proposed approach.
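
    The core mapping the abstract describes is from head joint angles to a 3D object representation. The sketch below is an illustrative stand-in only: it substitutes a scikit-learn MLPRegressor for the paper's ART-based and feedforward networks and invents a toy forward kinematics model to generate training pairs.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def forward_model(angles, baseline=0.1):
    """Toy kinematics: map (pan, tilt, vergence) to an XYZ fixation point."""
    pan, tilt, verg = angles
    depth = baseline / np.tan(np.maximum(verg, 1e-3))
    x = depth * np.sin(pan)
    y = depth * np.sin(tilt)
    z = depth * np.cos(pan) * np.cos(tilt)
    return np.array([x, y, z])

rng = np.random.default_rng(0)
joint_angles = rng.uniform([-0.6, -0.4, 0.05], [0.6, 0.4, 0.5], size=(2000, 3))
targets = np.array([forward_model(a) for a in joint_angles])

# Feedforward network standing in for the learned joint-angle -> 3D mapping.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(joint_angles, targets)

test = np.array([[0.2, -0.1, 0.3]])
print("predicted fixation point:", net.predict(test)[0])
print("kinematic ground truth:  ", forward_model(test[0]))
```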

    Active Vision during Action Execution, Observation and Imagery: Evidence for Shared Motor Representations

    Get PDF
    The concept of shared motor representations between action execution and various covert conditions has been demonstrated through a number of psychophysiological modalities over the past two decades. Rarely, however, have researchers considered the congruence of physical, imaginary and observed movement markers in a single paradigm, and never in a design where eye movement metrics are the markers. In this study, participants were required to perform a forward reach-and-point Fitts' task on a digitizing tablet whilst wearing an eye movement system. Gaze metrics were used to compare behavioural congruence between action execution, action observation, and guided and unguided movement imagery conditions. The data showed that participants attended to the same task-related visual cues across conditions, but the strategy differed. Specifically, the number of fixations was significantly different between action execution and all covert conditions. In addition, fixation duration was congruent between action execution and action observation only, and both conditions displayed an indirect Fitts' Law effect. We therefore extend the understanding of the common motor representation by demonstrating, for the first time, common spatial eye movement metrics across simulation conditions and some specific temporal congruence for action execution and action observation. Our findings suggest that action observation may be an effective technique in supporting motor processes. The use of video as an adjunct to physical techniques may be beneficial in supporting motor planning in both performance and clinical rehabilitation environments.
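
    For readers unfamiliar with the "Fitts' Law effect" mentioned above, the snippet below shows the standard Shannon formulation of Fitts' Law for reach-and-point targets. The coefficients a and b are purely illustrative values, not estimates from this study.

```python
import math

def index_of_difficulty(distance_mm, width_mm):
    """ID = log2(D/W + 1): harder targets are farther away and/or smaller."""
    return math.log2(distance_mm / width_mm + 1)

def predicted_movement_time(distance_mm, width_mm, a=0.10, b=0.15):
    """Movement time in seconds; a and b are illustrative coefficients."""
    return a + b * index_of_difficulty(distance_mm, width_mm)

for d, w in [(100, 20), (200, 20), (200, 10)]:
    print(f"D={d}mm W={w}mm  ID={index_of_difficulty(d, w):.2f} bits  "
          f"MT~{predicted_movement_time(d, w):.2f}s")
```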

    Cross-dimensional Weighting for Aggregated Deep Convolutional Features

    Full text link
    We propose a simple and straightforward way of creating powerful image representations via cross-dimensional weighting and aggregation of deep convolutional neural network layer outputs. We first present a generalized framework that encompasses a broad family of approaches and includes cross-dimensional pooling and weighting steps. We then propose specific non-parametric schemes for both spatial- and channel-wise weighting that boost the effect of highly active spatial responses and at the same time regulate burstiness effects. We experiment on different public datasets for image search and show that our approach outperforms the current state-of-the-art for approaches based on pre-trained networks. We also provide an easy-to-use, open source implementation that reproduces our results. Comment: Accepted for publication at the 4th Workshop on Web-scale Vision and Social Media (VSM), ECCV 2016.
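
    A simplified sketch of the kind of cross-dimensional weighting the abstract describes, for a convolutional feature tensor of shape (C, H, W): spatial weights boost highly active locations and IDF-like channel weights damp bursty channels. The exact normalisations below are assumptions for illustration and may differ from the paper's scheme.

```python
import numpy as np

def aggregate(X, eps=1e-8):
    """X: conv activations of shape (C, H, W) -> global descriptor of shape (C,)."""
    # Spatial weights: per-location total activation, L2-normalised.
    S = X.sum(axis=0)
    S = S / (np.linalg.norm(S) + eps)

    # Channel weights: log inverse of each channel's non-zero response rate.
    Q = (X > 0).mean(axis=(1, 2)) + eps
    w = np.log(Q.sum() / Q)

    # Weighted sum-pooling, then L2-normalise the final descriptor.
    feats = (X * S[None, :, :]).sum(axis=(1, 2)) * w
    return feats / (np.linalg.norm(feats) + eps)

X = np.random.rand(512, 14, 14)   # e.g. a VGG pool5-style activation map
descriptor = aggregate(X)
print(descriptor.shape)           # (512,)
```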

    Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression

    Full text link
    We present techniques for improving performance-driven facial animation, emotion recognition, and facial key-point or landmark prediction using learned identity-invariant representations. Established approaches to these problems can work well if sufficient examples and labels for a particular identity are available and factors of variation are highly controlled. However, labeled examples of facial expressions, emotions and key-points for new individuals are difficult and costly to obtain. In this paper we improve the ability of these techniques to generalize to new and unseen individuals by explicitly modeling previously seen variations related to identity and expression. We use a weakly-supervised approach in which identity labels are used to learn the factors of variation linked to identity separately from factors related to expression. We show how probabilistic modeling of these sources of variation allows one to learn identity-invariant representations for expressions, which can then be used to identity-normalize various procedures for facial expression analysis and animation control. We also show how to extend the widely used techniques of active appearance models and constrained local models by replacing the underlying point distribution models, which are typically constructed using principal component analysis, with identity-expression factorized representations. We present a wide variety of experiments in which we consistently improve performance on emotion recognition, markerless performance-driven facial animation and facial key-point tracking. Comment: to appear in Image and Vision Computing Journal (IMAVIS).
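
    To make the point-distribution-model replacement concrete, here is a minimal sketch of an identity-expression factorised shape model, s = mean + B_id q_id + B_exp q_exp, in place of a single PCA basis. The naive per-factor PCA and the synthetic landmark data below are illustrative stand-ins for the paper's probabilistic, weakly supervised factorisation.

```python
import numpy as np

def pca_basis(X, k):
    """Top-k principal directions of the rows of X, returned as columns."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

# shapes[i] = flattened 2-D landmarks; identity_ids[i] = which person they belong to.
rng = np.random.default_rng(0)
n_people, n_expr, n_landmarks = 10, 8, 68
identity_ids = np.repeat(np.arange(n_people), n_expr)
shapes = rng.normal(size=(n_people * n_expr, 2 * n_landmarks))

mean_shape = shapes.mean(axis=0)
# Identity factor from per-person mean shapes; expression factor from residuals.
person_means = np.stack([shapes[identity_ids == p].mean(axis=0) for p in range(n_people)])
B_id = pca_basis(person_means, k=5)
B_exp = pca_basis(shapes - person_means[identity_ids], k=5)

def reconstruct(q_id, q_exp):
    """Generate a shape from separate identity and expression coefficients."""
    return mean_shape + B_id @ q_id + B_exp @ q_exp

print(reconstruct(np.zeros(5), np.zeros(5)).shape)  # (136,)
```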

    Language with Vision: a Study on Grounded Word and Sentence Embeddings

    Full text link
    Grounding language in vision is an active field of research that seeks to construct cognitively plausible word and sentence representations by incorporating perceptual knowledge from vision into text-based representations. Despite many attempts at language grounding, achieving an optimal equilibrium between textual representations of language and our embodied experiences remains an open problem. Some common concerns are the following. Is visual grounding advantageous for abstract words, or is its effectiveness restricted to concrete words? What is the optimal way of bridging the gap between text and vision? To what extent is perceptual knowledge from images advantageous for acquiring high-quality embeddings? Leveraging current advances in machine learning and natural language processing, the present study addresses these questions by proposing a simple yet very effective computational grounding model for pre-trained word embeddings. Our model effectively balances the interplay between language and vision by aligning textual embeddings with visual information while simultaneously preserving the distributional statistics that characterize word usage in text corpora. By applying a learned alignment, we are able to indirectly ground unseen words, including abstract words. A series of evaluations on a range of behavioural datasets shows that visual grounding is beneficial not only for concrete words but also for abstract words, lending support to the indirect theory of abstract concepts. Moreover, our approach offers advantages for contextualized embeddings, such as those generated by BERT, but only when trained on corpora of modest, cognitively plausible sizes. Code and grounded embeddings for English are available at https://github.com/Hazel1994/Visually_Grounded_Word_Embeddings_2
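
    The released code at the linked repository is the authoritative implementation; as a rough, hedged illustration of the idea of "aligning with vision while preserving distributional statistics", the sketch below learns a linear map that pulls text embeddings toward paired visual vectors, balanced by a term that keeps them close to the originals, and then applies that map to words with no images. All names, the loss, and the toy data are assumptions.

```python
import numpy as np

def fit_grounding_map(T_seen, V, alpha=1.0):
    """Minimise sum_i ||M t_i - v_i||^2 + alpha * ||M t_i - t_i||^2 over M.

    T_seen: (n, d) text embeddings with paired images; V: (n, d) visual vectors.
    The per-word optimum is the convex blend (v + alpha*t)/(1+alpha), so M is a
    least-squares fit of T_seen onto those blended targets.
    """
    target = (V + alpha * T_seen) / (1 + alpha)
    X, *_ = np.linalg.lstsq(T_seen, target, rcond=None)  # solves T_seen @ X ~ target
    return X.T  # transpose so the map acts on column vectors: grounded = M @ t

rng = np.random.default_rng(0)
d, n = 50, 500
T_seen = rng.normal(size=(n, d))             # words that have visual data
V = T_seen + 0.3 * rng.normal(size=(n, d))   # toy "visual" counterparts
M = fit_grounding_map(T_seen, V, alpha=1.0)

t_unseen = rng.normal(size=d)                # e.g. an abstract word with no images
grounded = M @ t_unseen                      # indirectly grounded embedding
print(grounded.shape)
```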

    fMRI Evidence for Modality-Specific Processing of Conceptual Knowledge on Six Modalities

    Get PDF
    Traditional theories assume that amodal representations, such as feature lists and semantic networks, represent conceptual knowledge about the world. According to this view, the sensory, motor, and introspective states that arise during perception and action are irrelevant to representing knowledge. Instead, the conceptual system lies outside modality-specific systems and operates according to different principles. Increasingly, however, researchers report that modality-specific systems become active during purely conceptual tasks, suggesting that these systems play central roles in representing knowledge (for a review, see Martin, 2001, Handbook of Functional Neuroimaging of Cognition). In particular, researchers report that the visual system becomes active while processing visual properties, and that the motor system becomes active while processing action properties. The present study corroborates and extends these findings. During fMRI, subjects verified whether or not properties could potentially be true of concepts (e.g., BLENDER-loud). Subjects received only linguistic stimuli, and nothing was said about using imagery. Highly related false properties were used on false trials to block word association strategies (e.g., BUFFALO-winged). To assess the full extent of the modality-specific hypothesis, properties were verified on each of six modalities. Examples include GEMSTONE-glittering (vision), BLENDER-loud (audition), FAUCET-turned (motor), MARBLE-cool (touch), CUCUMBER-bland (taste), and SOAP-perfumed (smell). Neural activity during property verification was compared to a lexical decision baseline. For all six sets of the modality-specific properties, significant activation was observed in the respective neural system. Finding modality-specific processing across six modalities contributes to the growing conclusion that knowledge is grounded in modality-specific systems of the brain.

    The real-time learning mechanism of the Scientific Research Associates Advanced Robotic System (SRAARS)

    Get PDF
    The Scientific Research Associates Advanced Robotic System (SRAARS) is an intelligent robotic system with autonomous learning capability in geometric reasoning. The system is equipped with one global intelligence center (GIC) and eight local intelligence centers (LICs). It mainly controls sixteen links with fourteen active joints, which constitute two articulated arms, an extensible lower body, a vision system with two CCD cameras, and a mobile base. The on-board knowledge-based system supports the learning controller with model representations of both the robot and the working environment. Through consecutive verifying and planning procedures, hypothesis-and-test routines, and a learning-by-analogy paradigm, the system autonomously builds up its own understanding of the relationship between itself (i.e., the robot) and the focused environment for the purposes of collision avoidance, motion analysis and object manipulation. The intelligence of SRAARS offers a valuable technical advantage for implementing robotic systems for space exploration and space station operations.
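
    As a purely illustrative sketch of the hierarchical structure described (one global intelligence center dispatching sub-goals to local centers that own subsets of the joints), consider the following. The class names, joint assignments and dispatch rule are hypothetical, not the report's implementation.

```python
class LocalIntelligenceCenter:
    """A LIC that owns a subset of the robot's joints."""
    def __init__(self, name, joints):
        self.name, self.joints = name, joints

    def execute(self, subgoal):
        return f"{self.name}: moving joints {self.joints} toward '{subgoal}'"

class GlobalIntelligenceCenter:
    """The GIC decomposes a task and dispatches sub-goals to its LICs."""
    def __init__(self, lics):
        self.lics = lics

    def plan(self, task):
        # Toy decomposition: hand every LIC the same high-level goal; a real
        # planner would consult the knowledge base and hypothesis-and-test.
        return [lic.execute(task) for lic in self.lics]

arms = [LocalIntelligenceCenter(f"arm_{i}_LIC", list(range(7 * i, 7 * i + 7)))
        for i in range(2)]
gic = GlobalIntelligenceCenter(arms)
for report in gic.plan("grasp target object"):
    print(report)
```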

    Object segregation and local gist vision using low-level geometry

    Get PDF
    Multi-scale representations of lines, edges and keypoints on the basis of simple, complex, and end-stopped cells can be used for object categorisation and recognition. These representations are complemented by saliency maps of colour, texture, disparity and motion information, which also serve to model extremely fast gist vision in parallel with object segregation. We present a low-level geometry model based on a single type of self-adjusting grouping cell, with a circular array of dendrites connected to edge cells located at several angles. Different angles between active edge cells allow the grouping cell to detect geometric primitives like corners, bars and blobs. Such primitives forming different configurations can then be grouped to identify more complex geometry, like object shapes, without much additional effort. The speed of the model permits it to be used for fast gist vision, assuming that edge cells respond to transients in colour, texture, disparity and motion. The big advantage of combining this information at a low level is that local (object) gist can be extracted first, i.e., which types of objects are about where in a scene, after which global (scene) gist can be processed at a semantic level.
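
    A minimal sketch of the grouping-cell idea described above: the cell samples edge-cell responses at evenly spaced angles on a circular dendritic array and labels the local pattern from the angular configuration of active edges. The threshold and classification rules are simplified guesses for illustration, not the paper's model.

```python
import numpy as np

def classify_grouping_cell(edge_activity, threshold=0.5):
    """edge_activity: edge-cell responses at evenly spaced angles in [0, 2*pi)."""
    angles = np.linspace(0, 2 * np.pi, len(edge_activity), endpoint=False)
    active = angles[np.asarray(edge_activity) > threshold]
    if len(active) == 0:
        return "empty"
    if len(active) >= len(edge_activity) - 1:
        return "blob"            # edges active in nearly every direction
    if len(active) == 2:
        gap = abs(active[1] - active[0])
        gap = min(gap, 2 * np.pi - gap)
        if abs(gap - np.pi) < 0.2:
            return "bar"         # two roughly opposite edge directions
        return "corner"          # two directions meeting at an angle
    return "complex"

print(classify_grouping_cell([0.9, 0, 0, 0, 0.8, 0, 0, 0]))  # bar (angles 0 and pi)
print(classify_grouping_cell([0.9, 0, 0.8, 0, 0, 0, 0, 0]))  # corner (0 and pi/2)
```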