Learned Monocular Depth Priors in Visual-Inertial Initialization
Visual-inertial odometry (VIO) is the pose estimation backbone for most AR/VR
and autonomous robotic systems today, in both academia and industry. However,
these systems are highly sensitive to the initialization of key parameters such
as sensor biases, gravity direction, and metric scale. In practical scenarios
where high-parallax or variable acceleration assumptions are rarely met (e.g.
hovering aerial robot, smartphone AR user not gesticulating with phone),
classical visual-inertial initialization formulations often become
ill-conditioned and/or fail to meaningfully converge. In this paper we target
visual-inertial initialization specifically for these low-excitation scenarios
critical to in-the-wild usage. We propose to circumvent the limitations of
classical visual-inertial structure-from-motion (SfM) initialization by
incorporating a new learning-based measurement as a higher-level input. We
leverage learned monocular depth images (mono-depth) to constrain the relative
depth of features, and upgrade the mono-depth to metric scale by jointly
optimizing for its scale and shift. Our experiments show a significant
improvement in problem conditioning compared to a classical formulation for
visual-inertial initialization, and demonstrate significant accuracy and
robustness improvements relative to the state-of-the-art on public benchmarks,
particularly under motion-restricted scenarios. We further integrate our
improved initialization method into an existing odometry system to
illustrate its impact on the resulting tracking trajectories.
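The scale-and-shift upgrade described above can be illustrated with a minimal sketch. The paper jointly optimizes scale and shift inside the initialization problem; here, purely for illustration, the affine relation d_metric ≈ s·d_mono + t is fit in isolation by least squares over a set of features with known metric depth. The function name and inputs are assumptions, not the authors' implementation.

```python
import numpy as np

def fit_scale_shift(mono_depth, metric_depth):
    """Least-squares affine fit: metric_depth ~ s * mono_depth + t.

    Illustrative only -- sketches the scale/shift upgrade of learned
    mono-depth to metric scale, outside any joint optimization.
    """
    # Design matrix [d_mono, 1] so the solution is (s, t).
    A = np.stack([mono_depth, np.ones_like(mono_depth)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, metric_depth, rcond=None)
    return s, t
```

Once s and t are estimated, every mono-depth value can be mapped to metric scale, which is what lets the learned depth constrain the relative depth of features during initialization.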
Inductive learning spatial attention
This paper investigates the automatic induction of spatial attention
from the visual observation of objects manipulated
on a table top. In this work, space is represented in terms of
a novel observer-object relative reference system, named Local
Cardinal System, defined upon the local neighbourhood
of objects on the table. We present results of applying the
proposed methodology to five distinct scenarios involving
the construction of spatial patterns of coloured blocks.
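The abstract only names the Local Cardinal System without defining it. As a loose illustration of an observer-object relative reference frame (the four-sector binning, the function name, and all parameters here are assumptions, not the paper's definition), one can bin a neighbouring object's direction, measured relative to the observer-to-reference axis, into cardinal sectors:

```python
import math

def local_cardinal(ref_xy, obj_xy, observer_xy):
    """Bin the direction from a reference object to a neighbour into one of
    four sectors, relative to the observer->reference axis.

    Hypothetical sketch of an observer-object relative frame, not the
    paper's Local Cardinal System.
    """
    # Orientation of the observer->reference axis.
    axis = math.atan2(ref_xy[1] - observer_xy[1], ref_xy[0] - observer_xy[0])
    # Direction reference->object, expressed in that axis frame.
    ang = math.atan2(obj_xy[1] - ref_xy[1], obj_xy[0] - ref_xy[0]) - axis
    ang = (ang + math.pi) % (2 * math.pi) - math.pi  # wrap to (-pi, pi]
    sectors = ["front", "left", "back", "right"]
    idx = int(((ang + math.pi / 4) % (2 * math.pi)) // (math.pi / 2)) % 4
    return sectors[idx]
```

A relation vocabulary of this kind is the sort of representation over which spatial attention patterns could be induced from observed block manipulations.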
Linear vs. Nonlinear Feature Combination for Saliency Computation: A Comparison with Human Vision
Modelling Visual Search with the Selective Attention for Identification Model (VS-SAIM): A Novel Explanation for Visual Search Asymmetries
In earlier work, we developed the Selective Attention for Identification Model (SAIM [16]). SAIM models the human ability to perform translation-invariant object identification in multiple-object scenes. SAIM suggests that central to this ability is an interaction between parallel competitive processes in a selection stage and an object identification stage. In this paper, we applied the model to visual search experiments involving simple lines and letters. We present successful simulation results for asymmetric and symmetric searches and for the influence of background line orientations. Search asymmetry refers to changes in search performance when the roles of the target item and non-target item (distractor) are swapped. In line with other models of visual search, the results suggest that a large part of the empirical evidence can be explained by competitive processes in the brain, which are modulated by the similarity between target and distractor. The simulations also suggest that another important factor is the feature properties of the distractors. Finally, the simulations indicate that search asymmetries can be the outcome of interactions between top-down (knowledge about search items) and bottom-up (features of search items) processing. This interaction in VS-SAIM is dominated by a novel mechanism, the knowledge-based on-centre-off-surround receptive field. This receptive field is reminiscent of classical receptive fields, but its exact shape is modulated by both top-down and bottom-up processes. The paper discusses supporting evidence for the existence of this novel concept.
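VS-SAIM's receptive field is knowledge-modulated, which is not reproduced here; but the plain on-centre-off-surround profile it is reminiscent of can be sketched as a difference of Gaussians, purely to illustrate the shape the abstract refers to (sizes and sigmas are arbitrary choices, not model parameters):

```python
import numpy as np

def dog_kernel(size=9, sigma_c=1.0, sigma_s=2.5):
    """Difference-of-Gaussians kernel: excitatory centre, inhibitory surround.

    Illustrative classical profile only; in VS-SAIM the shape is further
    modulated by top-down knowledge, which this sketch omits.
    """
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    # Normalized Gaussians: narrow centre minus broad surround.
    centre = np.exp(-r2 / (2 * sigma_c**2)) / (2 * np.pi * sigma_c**2)
    surround = np.exp(-r2 / (2 * sigma_s**2)) / (2 * np.pi * sigma_s**2)
    return centre - surround
```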
The time course of exogenous and endogenous control of covert attention
Studies of eye movements and manual response have established that rapid overt selection is largely exogenously driven toward salient stimuli, whereas slower selection is largely endogenously driven to relevant objects. We use the N2pc, an event-related potential index of covert attention, to demonstrate that this time course reflects an underlying pattern in the deployment of covert attention. We find that shifts of attention that occur soon after the onset of a visual search array are directed toward salient, task-irrelevant visual stimuli and are associated with slow responses to the target. In contrast, slower shifts are target-directed and are associated with fast responses. The time course of exogenous and endogenous control provides a framework in which some inconsistent results in the capture literature might be reconciled; capture may occur when attention is rapidly deployed.
Investigation of the nanostructure and wear properties of physical vapor deposited CrCuN nanocomposite coatings
We get the algorithms of our ground truths: Designing referential databases in digital image processing.
This article documents the practical efforts of a group of scientists designing an image-processing algorithm for saliency detection. By following the actors of this computer science project, the article shows that the problems often considered to be the starting points of computational models are in fact provisional results of time-consuming, collective and highly material processes that engage habits, desires, skills and values. In the project being studied, problematization processes lead to the constitution of referential databases called 'ground truths' that enable both the effective shaping of algorithms and the evaluation of their performances. Working as important common touchstones for research communities in image processing, the ground truths are inherited from prior problematization processes and may be imparted to subsequent ones. The ethnographic results of this study suggest two complementary analytical perspectives on algorithms: (1) an 'axiomatic' perspective that understands algorithms as sets of instructions designed to solve given problems computationally in the best possible way, and (2) a 'problem-oriented' perspective that understands algorithms as sets of instructions designed to computationally retrieve outputs designed and designated during specific problematization processes. If the axiomatic perspective on algorithms puts the emphasis on the numerical transformations of inputs into outputs, the problem-oriented perspective puts the emphasis on the definition of both inputs and outputs.
Mechanical and high pressure tribological properties of nanocrystalline Ti(N,C) and amorphous C:H nanocomposite coatings
Influence of Low-Level Stimulus Features, Task Dependent Factors, and Spatial Biases on Overt Visual Attention
Visual attention is thought to be driven by the interplay between low-level visual features and task dependent information content of local image regions, as well as by spatial viewing biases. Though dependent on experimental paradigms and model assumptions, this idea has given rise to varying claims that either bottom-up or top-down mechanisms dominate visual attention. To contribute toward a resolution of this discussion, here we quantify the influence of these factors and their relative importance in a set of classification tasks. Our stimuli consist of individual image patches (bubbles). For each bubble we derive three measures: a measure of salience based on low-level stimulus features, a measure of salience based on the task dependent information content derived from our subjects' classification responses and a measure of salience based on spatial viewing biases. Furthermore, we measure the empirical salience of each bubble based on our subjects' measured eye gazes thus characterizing the overt visual attention each bubble receives. A multivariate linear model relates the three salience measures to overt visual attention. It reveals that all three salience measures contribute significantly. The effect of spatial viewing biases is highest and rather constant in different tasks. The contribution of task dependent information is a close runner-up. Specifically, in a standardized task of judging facial expressions it scores highly. The contribution of low-level features is, on average, somewhat lower. However, in a prototypical search task, without an available template, it makes a strong contribution on par with the two other measures. Finally, the contributions of the three factors are only slightly redundant, and the semi-partial correlation coefficients are only slightly lower than the coefficients for full correlations. 
These data provide evidence that all three measures make significant and independent contributions and that none can be neglected in a model of human overt visual attention.
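The semi-partial correlations mentioned above can be computed by correlating the criterion (empirical salience) with the residual of one predictor after regressing it on the other predictors. A minimal sketch follows; the variable names and the plain least-squares machinery are assumptions, not the authors' analysis code.

```python
import numpy as np

def semi_partial_corr(y, X, i):
    """Semi-partial correlation of y with column i of X.

    Correlates y with the part of X[:, i] not explained (linearly)
    by the remaining predictors -- the quantity used to judge how
    redundant the three salience measures are.
    """
    others = np.delete(X, i, axis=1)
    # Regress predictor i on the other predictors (plus intercept).
    A = np.column_stack([others, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
    resid = X[:, i] - A @ coef
    return np.corrcoef(y, resid)[0, 1]
```

With three weakly correlated predictors, these values stay close to the full correlations, which is exactly the "only slightly redundant" pattern the abstract reports.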