Level 2 perspective-taking distinguishes automatic and non-automatic belief-tracking
Little is known about whether human beings' automatic mindreading is computationally restricted to processing a limited kind of content, and what exactly the nature of that signature limit might be. We developed a novel object-detection paradigm to test adults' automatic processing in a Level 1 perspective-taking (L1PT) context (where an agent's belief, but not his visuospatial perspective, is relevantly different) and in a Level 2 perspective-taking (L2PT) context (where both the agent's belief and visuospatial perspective are relevantly different). Experiment 1 uncovered that adults' reaction times in the L1PT task were helpfully speeded by a bystander's irrelevant belief when tracking two homogeneous objects, but not in the L2PT task when tracking a single heterogeneous object. The limitation is especially striking given that the heterogeneous nature of the single object was fully revealed to participants as well as to the bystander. The results were replicated in two further experiments, which confirmed that the selective modulation of adults' reaction times was maintained when tracking the location of a single object (Experiment 2) and when attention checks were removed (Experiment 3). Our findings suggest that automatic mindreading draws upon a distinctively minimalist model of the mental that underspecifies representation of differences in perspective relative to an agent's position in space.
Allowing content-based functionalities in segmentation-based coding schemes
This paper deals with the use of the segmentation tools and principles presented in [10] and [13] for allowing content-based functionalities. In this framework, means for supervised selection of objects in the scene are proposed. In addition, a technique for object tracking in the context of segmentation-based video coding is presented. The technique is independent of the type of segmentation approach used in the coding scheme. The algorithm relies on a double partition of the image that yields spatially homogeneous regions. This double partition permits obtaining the position and shape of the previous object in the current image while computing the projected partition. In order to demonstrate the potential of this algorithm, it is applied in a specific coding scheme so that content-based functionalities, such as selective coding, are allowed.
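The abstract leaves the projection step abstract. As a rough illustration of the general idea only (not the paper's actual algorithm, and with every name invented for illustration), one can mark each spatially homogeneous region of the current partition as belonging to the tracked object when the majority of its pixels overlap the object's mask from the previous frame:

```python
from collections import defaultdict

def project_object(prev_mask, curr_partition):
    """Mark each region of the current partition as object if the majority
    of its pixels fall inside the previous frame's object mask."""
    overlap = defaultdict(int)  # region label -> pixels inside previous mask
    size = defaultdict(int)     # region label -> total pixel count
    for prev_row, part_row in zip(prev_mask, curr_partition):
        for inside, label in zip(prev_row, part_row):
            size[label] += 1
            if inside:
                overlap[label] += 1
    keep = {label for label in size if overlap[label] / size[label] > 0.5}
    return [[label in keep for label in row] for row in curr_partition]
```

Because the rule operates on whole homogeneous regions rather than pixels, the recovered mask automatically snaps to region boundaries, which is the appeal of a partition-based tracker.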
Self-Selective Correlation Ship Tracking Method for Smart Ocean System
In recent years, with the development of the marine industry, the navigation
environment has become more complicated. Artificial intelligence
technologies, such as computer vision, can recognize, track and count
sailing ships to ensure maritime security and facilitate management
for the Smart Ocean System. Aiming at the scaling problem and boundary effect
problem of traditional correlation filtering methods, we propose a
self-selective correlation filtering method based on box regression (BRCF). The
proposed method mainly includes: 1) a self-selective model with a
negative-sample mining method, which effectively reduces the boundary effect
while strengthening the classification ability of the classifier; 2) a bounding
box regression method combined with a key-point matching method for scale
prediction, leading to a fast and efficient calculation. The experimental
results show that the proposed method can effectively deal with the problems of
ship size changes and background interference. The success rates and precisions
were higher than Discriminative Scale Space Tracking (DSST) by over 8
percentage points on the marine traffic dataset of our laboratory. In terms of
processing speed, the proposed method is faster than DSST by nearly 22 Frames
Per Second (FPS).
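The abstract does not spell out the BRCF method itself, but the correlation-filtering idea it builds on can be sketched. Below is a minimal MOSSE-style filter learned in closed form in the Fourier domain, offered only as a sketch of the underlying technique (the function names, parameters, and the delta-shaped desired response are assumptions, not the paper's design):

```python
import numpy as np

def train_filter(template, desired, lam=1e-2):
    """Ridge regression in the Fourier domain (MOSSE-style): find H so that
    ifft(H * fft(template)) approximates the desired response map."""
    F = np.fft.fft2(template)
    G = np.fft.fft2(desired)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def respond(H, patch):
    """Correlation response map; its peak estimates the target position."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))

rng = np.random.default_rng(0)
template = rng.random((32, 32))
desired = np.zeros((32, 32))
desired[0, 0] = 1.0  # want a sharp response peak at the origin
H = train_filter(template, desired)
# A circular shift of the template moves the response peak by the same amount
shifted = np.roll(template, (3, 5), axis=(0, 1))
peak = np.unravel_index(np.argmax(respond(H, shifted)), (32, 32))  # (3, 5)
```

The boundary effect the abstract mentions stems from exactly this construction: the FFT treats the patch as circularly periodic, so shifted samples wrap around, which is what negative-sample mining schemes try to compensate for.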
Ambient Sound Provides Supervision for Visual Learning
The sound of crashing waves, the roar of fast-moving cars -- sound conveys
important information about the objects in our surroundings. In this work, we
show that ambient sounds can be used as a supervisory signal for learning
visual models. To demonstrate this, we train a convolutional neural network to
predict a statistical summary of the sound associated with a video frame. We
show that, through this process, the network learns a representation that
conveys information about objects and scenes. We evaluate this representation
on several recognition tasks, finding that its performance is comparable to
that of other state-of-the-art unsupervised learning methods. Finally, we show
through visualizations that the network learns units that are selective to
objects that are often associated with characteristic sounds. Comment: ECCV 201
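As a hedged illustration of what a "statistical summary of the sound" could look like as a regression target (the paper's actual features differ; every name and parameter below is invented), one could pool short-time spectral energies into coarse frequency bands and summarize them over time:

```python
import numpy as np

def sound_summary(waveform, n_fft=512, hop=256, n_bands=8):
    """Compress a mono waveform into a fixed-length vector: per-band mean
    and standard deviation of short-time spectral energy over time."""
    window = np.hanning(n_fft)
    frames = [waveform[i:i + n_fft] * window
              for i in range(0, len(waveform) - n_fft, hop)]
    mag = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft//2 + 1)
    # Pool FFT bins into coarse frequency bands
    bands = np.array_split(mag, n_bands, axis=1)
    energy = np.stack([b.mean(axis=1) for b in bands], axis=1)
    # Time-pooled statistics: a fixed-length target a CNN could regress to
    return np.concatenate([energy.mean(axis=0), energy.std(axis=0)])
```

A fixed-length vector like this can serve as the prediction target for a frame-level convolutional network, turning unlabeled video-plus-audio into a supervised learning problem.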
Towards binocular active vision in a robot head system
This paper presents the first results of an investigation and pilot study into an active, binocular vision system that combines binocular vergence, object recognition and attention control in a unified framework. The prototype developed is capable of identifying, targeting, verging on and recognizing objects in a highly cluttered scene without the need for calibration or other knowledge of the camera geometry. This is achieved by implementing all image analysis in a symbolic space without creating explicit pixel-space maps. The system structure is based on the "searchlight metaphor" of biological systems. We present results of a first pilot investigation that yield a maximum vergence error of 6.4 pixels, while seven of nine known objects were recognized in a highly cluttered environment. Finally, a "stepping stone" visual search strategy was demonstrated, taking a total of 40 saccades to find two known objects in the workspace, neither of which appeared simultaneously within the Field of View resulting from any individual saccade.