ActiveRMAP: Radiance Field for Active Mapping And Planning
A high-quality 3D reconstruction of a scene from a collection of 2D images
can be achieved through offline/online mapping methods. In this paper, we
explore active mapping from the perspective of implicit representations, which
have recently produced compelling results in a variety of applications. One of
the most popular implicit representations, the Neural Radiance Field (NeRF),
first demonstrated photorealistic rendering using multi-layer perceptrons, with
promising offline 3D reconstruction as a by-product of the radiance field.
More recently, researchers have also applied this implicit representation to
online reconstruction and localization (i.e. implicit SLAM systems). However,
the use of implicit representations for active vision tasks remains largely
unexplored. In this paper, we are particularly interested in applying the neural
radiance field for active mapping and planning problems, which are closely
coupled tasks in an active system. We present, for the first time, an RGB-only
active vision framework using radiance field representation for active 3D
reconstruction and planning in an online manner. Specifically, we formulate
this joint task as an iterative dual-stage optimization problem, in which we
alternately optimize the radiance field representation and path planning.
Experimental results suggest that the proposed method achieves competitive
results compared to other offline methods and outperforms active reconstruction
methods using NeRFs. Comment: Under review.
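The dual-stage alternation described in this abstract can be sketched as a simple loop. All names below (train_step, plan_next_view, capture) are hypothetical stand-ins for the paper's components, not its actual interfaces:

```python
def active_mapping_loop(train_step, plan_next_view, capture, init_pose,
                        n_iters=5, map_steps=50):
    """Iterative dual-stage optimisation (sketch): alternate between
    fitting the radiance field on the RGB views gathered so far and
    planning the next viewpoint. All callables are illustrative."""
    poses = [init_pose]
    images = [capture(init_pose)]
    for _ in range(n_iters):
        # Stage 1: optimise the radiance field representation.
        for _ in range(map_steps):
            train_step(images, poses)
        # Stage 2: plan and execute a path to the next view.
        next_pose = plan_next_view(poses)
        poses.append(next_pose)
        images.append(capture(next_pose))
    return poses
```

In this sketch the planner would score candidate poses by how much they are expected to reduce reconstruction uncertainty, which is where the coupling between mapping and planning enters.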
An active stereo vision-based learning approach for robotic tracking, fixating and grasping control
In this paper, an active stereo vision-based learning approach is proposed for a robot to track, fixate and grasp an object in unknown environments. First, the functional mapping relationships between the joint angles of the active stereo vision system and the spatial representations of the object are derived and expressed in a three-dimensional workspace frame. Next, the self-adaptive resonance theory-based neural networks and the feedforward neural networks are used to learn the mapping relationships in a self-organized way. Then, the approach is verified by simulation using the models of an active stereo vision system which is installed in the end-effector of a robot. Finally, the simulation results confirm the effectiveness of the present approach.
Active Vision during Action Execution, Observation and Imagery: Evidence for Shared Motor Representations
The concept of shared motor representations between action execution and various covert conditions has been demonstrated through a number of psychophysiological modalities over the past two decades. Rarely, however, have
researchers considered the congruence of physical, imaginary and observed movement markers in a single paradigm and never in a design where eye movement metrics are the markers. In this study, participants were required to perform a forward reach and point Fitts’ Task on a digitizing tablet whilst wearing an eye movement system. Gaze metrics were used to compare behaviour congruence between action execution, action observation, and guided and unguided movement imagery conditions. The data showed that participants attended the same task-related visual cues between conditions but the strategy was different. Specifically, the number of fixations was significantly different between action execution and all covert conditions. In addition, fixation duration was congruent between action execution and action observation only, and
both conditions displayed an indirect Fitts’ Law effect. We therefore extend the understanding of the common motor representation by demonstrating, for the first time, common spatial eye movement metrics across simulation conditions
and some specific temporal congruence for action execution and action observation. Our findings suggest that action
observation may be an effective technique in supporting motor processes. The use of video as an adjunct to physical
techniques may be beneficial in supporting motor planning in both performance and clinical rehabilitation environments
Cross-dimensional Weighting for Aggregated Deep Convolutional Features
We propose a simple and straightforward way of creating powerful image
representations via cross-dimensional weighting and aggregation of deep
convolutional neural network layer outputs. We first present a generalized
framework that encompasses a broad family of approaches and includes
cross-dimensional pooling and weighting steps. We then propose specific
non-parametric schemes for both spatial- and channel-wise weighting that boost
the effect of highly active spatial responses and at the same time regulate
burstiness effects. We experiment on different public datasets for image search
and show that our approach outperforms the current state-of-the-art for
approaches based on pre-trained networks. We also provide an easy-to-use, open
source implementation that reproduces our results. Comment: Accepted for
publication at the 4th Workshop on Web-scale Vision and Social Media (VSM),
ECCV 201
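The spatial- and channel-wise weighting this abstract describes can be illustrated with a short numpy sketch. The specific normalisations below are illustrative choices in the spirit of the abstract (per-location activation mass for spatial weights, an IDF-style boost for sparse channels), not the paper's exact formulas:

```python
import numpy as np

def crow_aggregate(feats, eps=1e-8):
    """Aggregate a CNN feature map of shape (C, H, W) into one image
    descriptor via cross-dimensional weighting (simplified sketch)."""
    # Spatial weights: per-location activation mass, L2-normalised,
    # boosting highly active spatial responses.
    s = feats.sum(axis=0)                      # (H, W)
    s = s / (np.linalg.norm(s) + eps)
    # Channel weights: IDF-style factor that up-weights sparse channels,
    # regulating burstiness of densely firing ones.
    q = (feats > 0).mean(axis=(1, 2))          # fraction of active locations
    w = np.log((q.sum() + eps) / (q + eps))
    # Spatially weighted sum-pooling, channel weighting, L2 normalisation.
    desc = (feats * s[None, :, :]).sum(axis=(1, 2)) * w
    return desc / (np.linalg.norm(desc) + eps)
```

The resulting descriptor can be compared with a dot product for image search, which is why the final L2 normalisation matters.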
Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression
We present techniques for improving performance driven facial animation,
emotion recognition, and facial key-point or landmark prediction using learned
identity invariant representations. Established approaches to these problems
can work well if sufficient examples and labels for a particular identity are
available and factors of variation are highly controlled. However, labeled
examples of facial expressions, emotions and key-points for new individuals are
difficult and costly to obtain. In this paper we improve the ability of
techniques to generalize to new and unseen individuals by explicitly modeling
previously seen variations related to identity and expression. We use a
weakly-supervised approach in which identity labels are used to learn the
different factors of variation linked to identity separately from factors
related to expression. We show how probabilistic modeling of these sources of
variation allows one to learn identity-invariant representations for
expressions which can then be used to identity-normalize various procedures for
facial expression analysis and animation control. We also show how to extend
the widely used techniques of active appearance models and constrained local
models through replacing the underlying point distribution models which are
typically constructed using principal component analysis with
identity-expression factorized representations. We present a wide variety of
experiments in which we consistently improve performance on emotion
recognition, markerless performance-driven facial animation and facial
key-point tracking. Comment: to appear in the Image and Vision Computing Journal (IMAVIS).
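The replacement of a single PCA point distribution model with an identity-expression factorised one can be sketched in a few lines. The basis matrices and coefficient names below are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def factorized_shape(mean, B_id, B_expr, p_id, p_expr):
    """Point distribution model with separate identity and expression
    deformation bases, in place of one PCA basis (illustrative sketch).
    mean: (2N,) stacked landmark coordinates; B_id: (2N, k_id);
    B_expr: (2N, k_expr); p_id, p_expr: coefficient vectors."""
    return mean + B_id @ p_id + B_expr @ p_expr
```

Because identity and expression coefficients are separate, an expression can be transferred or identity-normalised by holding one coefficient vector fixed while varying the other.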
Language with Vision: a Study on Grounded Word and Sentence Embeddings
Grounding language in vision is an active field of research seeking to
construct cognitively plausible word and sentence representations by
incorporating perceptual knowledge from vision into text-based representations.
Despite many attempts at language grounding, achieving an optimal balance
between textual representations of language and our embodied experiences
remains an open problem. Common questions include the following: Is visual
grounding advantageous for abstract words, or is its effectiveness restricted
to concrete words? What is the optimal way of bridging the gap between text and
vision? To what extent is perceptual knowledge from images advantageous for
acquiring high-quality embeddings? Leveraging the current advances in machine
learning and natural language processing, the present study addresses these
questions by proposing a simple yet very effective computational grounding
model for pre-trained word embeddings. Our model effectively balances the
interplay between language and vision by aligning textual embeddings with
visual information while simultaneously preserving the distributional
statistics that characterize word usage in text corpora. By applying a learned
alignment, we are able to indirectly ground unseen words including abstract
words. A series of evaluations on a range of behavioural datasets shows that
visual grounding is beneficial not only for concrete words but also for
abstract words, lending support to the indirect theory of abstract concepts.
Moreover, our approach offers advantages for contextualized embeddings, such as
those generated by BERT, but only when trained on corpora of modest,
cognitively plausible sizes. Code and grounded embeddings for English are
available at https://github.com/Hazel1994/Visually_Grounded_Word_Embeddings_2
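One minimal way to realise such a learned alignment is a ridge-regression map from the text-embedding space to the visual space, applied to any word, seen or unseen. The function names and the concatenation-based blending below are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

def learn_alignment(T_seen, V_seen, lam=1e-6):
    """Ridge-regression map from text space to visual space (sketch).
    T_seen: (n, d_t) textual embeddings of words that have images;
    V_seen: (n, d_v) their visual feature vectors."""
    d_t = T_seen.shape[1]
    A = T_seen.T @ T_seen + lam * np.eye(d_t)
    return np.linalg.solve(A, T_seen.T @ V_seen)   # (d_t, d_v)

def ground(T, M, alpha=0.5):
    """Grounded embeddings: keep the distributional (textual) part and
    append the predicted visual projection. Unseen and abstract words
    are grounded indirectly through the learned map M."""
    V_pred = T @ M
    return np.concatenate([(1 - alpha) * T, alpha * V_pred], axis=1)
```

Keeping the original textual block in the output is one way to preserve the distributional statistics the abstract emphasises, while the appended projection carries the perceptual signal.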
fMRI Evidence for Modality-Specific Processing of Conceptual Knowledge on Six Modalities
Traditional theories assume that amodal representations, such as feature lists and semantic
networks, represent conceptual knowledge about the world. According to this view, the
sensory, motor, and introspective states that arise during perception and action are irrelevant
to representing knowledge. Instead the conceptual system lies outside modality-specific
systems and operates according to different principles. Increasingly, however, researchers
report that modality-specific systems become active during purely conceptual tasks,
suggesting that these systems play central roles in representing knowledge (for a review, see
Martin, 2001, Handbook of Functional Neuroimaging of Cognition). In particular,
researchers report that the visual system becomes active while processing visual properties,
and that the motor system becomes active while processing action properties. The present
study corroborates and extends these findings. During fMRI, subjects verified whether or not
properties could potentially be true of concepts (e.g., BLENDER-loud). Subjects received
only linguistic stimuli, and nothing was said about using imagery. Highly related false
properties were used on false trials to block word-association strategies (e.g., BUFFALO-winged).
To assess the full extent of the modality-specific hypothesis, properties were
verified on each of six modalities. Examples include GEMSTONE-glittering (vision),
BLENDER-loud (audition), FAUCET-turned (motor), MARBLE-cool (touch),
CUCUMBER-bland (taste), and SOAP-perfumed (smell). Neural activity during property
verification was compared to a lexical decision baseline. For all six sets of the
modality-specific properties, significant activation was observed in the respective neural system.
Finding modality-specific processing across six modalities contributes to the growing
conclusion that knowledge is grounded in modality-specific systems of the brain.
The real-time learning mechanism of the Scientific Research Associates Advanced Robotic System (SRAARS)
The Scientific Research Associates Advanced Robotic System (SRAARS) is an intelligent robotic system with autonomous learning capability in geometric reasoning. The system is equipped with one global intelligence center (GIC) and eight local intelligence centers (LICs). It controls sixteen links with fourteen active joints, which constitute two articulated arms, an extensible lower body, a vision system with two CCD cameras, and a mobile base. The on-board knowledge-based system supports the learning controller with model representations of both the robot and the working environment. Through consecutive verification and planning procedures, hypothesis-and-test routines, and a learning-by-analogy paradigm, the system autonomously builds up its own understanding of the relationship between itself (i.e., the robot) and the focused environment for the purposes of collision avoidance, motion analysis, and object manipulation. The intelligence of SRAARS offers a valuable technical advantage for implementing robotic systems for space exploration and space station operations.
Object segregation and local gist vision using low-level geometry
Multi-scale representations of lines, edges and keypoints on the basis of simple, complex, and end-stopped cells can be used for object categorisation and recognition. These representations are complemented by saliency maps of colour, texture, disparity and motion information, which also serve to model extremely fast gist vision in parallel with object segregation. We present a low-level geometry model based on a single type of self-adjusting grouping cell, with a circular array of dendrites connected to edge cells located at several angles. Different angles between active edge cells allow the grouping cell to detect geometric primitives such as corners, bars and blobs. Such primitives, in different configurations, can then be grouped to identify more complex geometry, such as object shapes, without much additional effort. The speed of the model permits it to be used for fast gist vision, assuming that edge cells respond to transients in colour, texture, disparity and motion. The big advantage of combining this information at a low level is that local (object) gist can be extracted first, i.e., which types of objects are roughly where in a scene, after which global (scene) gist can be processed at a semantic level.
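The grouping-cell idea, classifying a local primitive from the set of simultaneously active edge orientations on a circular dendrite array, can be illustrated with a toy classifier. The angle tolerance and cardinality thresholds below are illustrative choices, not values from the model:

```python
def classify_primitive(active_angles, tol=20.0):
    """Toy grouping cell: classify a local geometric primitive from the
    orientations (degrees) of currently active edge cells on a circular
    dendrite array. Thresholds are illustrative."""
    angles = sorted(a % 360 for a in active_angles)
    if len(angles) >= 6:
        return "blob"        # edges at many orientations: closed contour
    if len(angles) == 2:
        diff = abs(angles[1] - angles[0])
        diff = min(diff, 360 - diff)     # wrap-around angular distance
        if abs(diff - 180) <= tol:
            return "bar"     # roughly opposite edges: elongated bar
        return "corner"      # two edges at an intermediate angle
    return "unknown"
```

A real grouping cell would, in addition, adjust its dendritic field size so that primitives at different scales map onto the same classification.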