The Conditional Lucas & Kanade Algorithm
The Lucas & Kanade (LK) algorithm is the method of choice for efficient dense
image and object alignment. The approach is efficient as it attempts to model
the connection between appearance and geometric displacement through a linear
relationship that assumes independence across pixel coordinates. A drawback of
the approach, however, is its generative nature. Specifically, its performance
is tightly coupled with how well the linear model can synthesize appearance
from geometric displacement, even though the alignment task itself is
associated with the inverse problem. In this paper, we present a new approach,
referred to as the Conditional LK algorithm, which: (i) directly learns linear
models that predict geometric displacement as a function of appearance, and
(ii) employs a novel strategy for ensuring that the generative pixel
independence assumption can still be taken advantage of. We demonstrate that
our approach exhibits superior performance to classical generative forms of the
LK algorithm. Furthermore, we demonstrate performance comparable to
state-of-the-art methods such as the Supervised Descent Method with
substantially fewer training examples, as well as the unique ability to "swap"
geometric warp functions without having to retrain from scratch. Finally, from
a theoretical perspective, our approach hints at possible redundancies that
exist in current state-of-the-art methods for alignment that could be leveraged
in vision systems of the future.
Comment: 17 pages, 11 figures
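The core idea, regressing geometric displacement directly from appearance one pixel at a time, can be sketched in a toy 1-D setting. Everything below (the template signal, the per-pixel scalar regressors, the energy-weighted pooling) is a hypothetical simplification for illustration, not the authors' implementation:

```python
import math, random

def template(x):
    # smooth 1-D "image": sum of two sinusoids
    return math.sin(0.3 * x) + 0.5 * math.sin(0.11 * x + 1.0)

N = 64
t0 = [template(x) for x in range(N)]

def appearance_diff(d):
    # difference between the template displaced by d and the template itself
    return [template(x - d) - t0[x] for x in range(N)]

# training set: small random displacements and their appearance differences
random.seed(0)
train_d = [(random.random() - 0.5) * 4.0 for _ in range(200)]
train_diff = [appearance_diff(d) for d in train_d]

# fit one scalar regressor per pixel by 1-D least squares (the pixel
# independence assumption), keeping each pixel's signal energy as a weight
w, energy = [], []
for j in range(N):
    num = sum(d * diff[j] for d, diff in zip(train_d, train_diff))
    den = sum(diff[j] ** 2 for diff in train_diff) + 1e-9
    w.append(num / den)
    energy.append(den)

def predict_displacement(d_true):
    diff = appearance_diff(d_true)
    # energy-weighted average of the per-pixel displacement estimates
    est = sum(e * wj * diff[j] for j, (e, wj) in enumerate(zip(energy, w)))
    return est / sum(energy)

d_hat = predict_displacement(1.0)   # should recover a displacement near 1.0
```

Because each regressor is fit independently, adding pixels or re-training for a different warp only touches local 1-D fits, which is the flavor of efficiency the conditional formulation aims at.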
Synchronization in Complex Systems Following the Decision Based Queuing Process: The Rhythmic Applause as a Test Case
Living communities can be considered complex systems, and are thus fertile
ground for studies of their statistics and dynamics. In this study we
revisit the case of the rhythmic applause by utilizing the model proposed by
Vázquez et al. [A. Vázquez et al., Phys. Rev. E 73, 036127 (2006)], augmented
with two contradictory driving forces, namely Individuality and
Companionship. To that end, after performing
computer simulations with a large number of oscillators we propose an
explanation of the following open questions: (a) why synchronization occurs
suddenly, and (b) why synchronization is observed only when the clapping period
stands in a particular relation to the mean self period of the spectators and is
lost after a time. Moreover, based on the model, a
weak preferential attachment principle is proposed which can produce complex
networks obeying a power law in the distribution of the number of edges per
node with an exponent greater than 3.
Comment: 16 pages, 5 figures
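The interplay of the two driving forces can be illustrated with a generic Kuramoto-style mean-field simulation (not the decision-based queuing model the paper actually uses): the spread of natural frequencies plays the role of Individuality, the coupling strength that of Companionship, and the order parameter measures the degree of synchronization:

```python
import cmath, math, random

random.seed(1)
N = 100       # number of oscillators ("spectators")
K = 2.0       # coupling strength (Companionship)
dt = 0.05
# natural frequencies spread around a common mean (Individuality)
omega = [1.0 + random.gauss(0.0, 0.1) for _ in range(N)]
theta = [random.uniform(0.0, 2.0 * math.pi) for _ in range(N)]

def order_parameter(phases):
    # 1 means perfect phase alignment, near 0 means incoherence
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))

r0 = order_parameter(theta)        # incoherent initial state
for _ in range(400):               # integrate the mean-field dynamics
    z = sum(cmath.exp(1j * p) for p in theta) / N
    r, psi = abs(z), cmath.phase(z)
    theta = [p + dt * (omega[i] + K * r * math.sin(psi - p))
             for i, p in enumerate(theta)]
r1 = order_parameter(theta)        # strongly synchronized final state
```

With coupling well above the critical value, the order parameter jumps from near zero to near one, which mirrors the sudden onset of synchronization the abstract asks about.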
Phase-based video motion processing
We introduce a technique to manipulate small movements in videos based on an analysis of motion in complex-valued image pyramids. Phase variations of the coefficients of a complex-valued steerable pyramid over time correspond to motion, and can be temporally processed and amplified to reveal imperceptible motions, or attenuated to remove distracting changes. This processing does not involve the computation of optical flow, and in comparison to the previous Eulerian Video Magnification method it supports larger amplification factors and is significantly less sensitive to noise. These improved capabilities broaden the set of applications for motion processing in videos. We demonstrate the advantages of this approach on synthetic and natural video sequences, and explore applications in scientific analysis, visualization and video enhancement.
Sponsors: Shell Research; United States Defense Advanced Research Projects Agency (Soldier Centric Imaging via Computational Cameras); National Science Foundation (U.S.) (CGV-1111415); Cognex Corporation; Microsoft Research (PhD Fellowship); American Society for Engineering Education (National Defense Science and Engineering Graduate Fellowship)
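The key observation, that the phase of a complex band-pass coefficient moves in proportion to small translations and can be amplified without ever computing optical flow, can be sketched on a single 1-D sinusoid. The lone Fourier coefficient below stands in for one sub-band of a complex steerable pyramid; the signal, shift, and amplification factor are arbitrary choices for illustration:

```python
import cmath, math

N = 128
omega = 2.0 * math.pi * 4.0 / N    # spatial frequency: 4 cycles per window
alpha = 10.0                       # phase amplification factor

def frame(delta):
    # a sinusoidal image row translated by a sub-pixel amount delta
    return [math.cos(omega * (x - delta)) for x in range(N)]

def coeff(sig):
    # complex band-pass coefficient at frequency omega (one "pyramid band")
    return sum(s * cmath.exp(-1j * omega * x) for x, s in enumerate(sig))

c_ref = coeff(frame(0.0))          # reference frame
c = coeff(frame(0.05))             # frame with an imperceptible 0.05 px shift
dphi = cmath.phase(c / c_ref)      # phase difference, proportional to motion
# amplify the phase deviation while keeping the amplitude
c_amp = abs(c) * cmath.exp(1j * (cmath.phase(c_ref) + alpha * dphi))
# motion implied by the amplified phase: 10x the original shift
d_amp = -cmath.phase(c_amp / c_ref) / omega
```

The amplified coefficient corresponds to a 0.5 px shift, ten times the original 0.05 px motion, exactly the phase-based magnification idea applied per sub-band in the full method.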
A point process framework for modeling electrical stimulation of the auditory nerve
Model-based studies of auditory nerve responses to electrical stimulation can
provide insight into the functioning of cochlear implants. Ideally, these
studies can identify limitations in sound processing strategies and lead to
improved methods for providing sound information to cochlear implant users. To
accomplish this, models must accurately describe auditory nerve spiking while
avoiding excessive complexity that would preclude large-scale simulations of
populations of auditory nerve fibers and obscure insight into the mechanisms
that influence neural encoding of sound information. In this spirit, we develop
a point process model of the auditory nerve that provides a compact and
accurate description of neural responses to electric stimulation. Inspired by
the framework of generalized linear models, the proposed model consists of a
cascade of linear and nonlinear stages. We show how each of these stages can be
associated with biophysical mechanisms and related to models of neuronal
dynamics. Moreover, we derive a semi-analytical procedure that uniquely
determines each parameter in the model on the basis of fundamental statistics
from recordings of single fiber responses to electric stimulation, including
threshold, relative spread, jitter, and chronaxie. The model also accounts for
refractory and summation effects that influence the responses of auditory nerve
fibers to high pulse rate stimulation. Throughout, we compare model predictions
to published physiological data and explain differences in auditory nerve
responses to high and low pulse rate stimulation. We close by performing an
ideal observer analysis of simulated spike trains in response to sinusoidally
amplitude modulated stimuli and find that carrier pulse rate does not affect
modulation detection thresholds.
Comment: 1 title page, 27 manuscript pages, 14 figures, 1 table, 1 appendix
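A minimal sketch of such a linear-nonlinear cascade with refractoriness follows, assuming arbitrary illustrative parameter values (sigmoid gain, refractory time constants, pulse rate) rather than the paper's values derived from threshold, relative spread, jitter, and chronaxie:

```python
import math, random

random.seed(7)
dt = 1e-4        # 0.1 ms time bins
t_abs = 3e-4     # assumed absolute refractory period, in seconds

def conditional_rate(drive, t_since):
    # linear stage -> sigmoidal nonlinearity -> relative-refractory gain
    p = 1.0 / (1.0 + math.exp(-(8.0 * drive - 4.0)))
    recovery = 1.0 - math.exp(-(t_since - t_abs) / 5e-4)
    return 2000.0 * p * max(recovery, 0.0)    # spikes per second

spikes, t_since = [], 1.0
for n in range(20000):                        # 2 s of simulated time
    drive = 1.0 if (n % 200) < 5 else 0.0     # 50 pulses/s electric pulse train
    lam = 0.0 if t_since < t_abs else conditional_rate(drive, t_since)
    if random.random() < lam * dt:            # Bernoulli approx. of Poisson
        spikes.append(n * dt)
        t_since = 0.0
    else:
        t_since += dt
```

Each stage maps onto a mechanism named in the abstract: the linear drive and sigmoid give the stimulus-dependent firing probability, and the recovery term enforces refractoriness, so no two simulated spikes fall closer than the absolute refractory period.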
Culture shapes how we look at faces
Background: Face processing, amongst many basic visual skills, is thought to be invariant across all humans. From as early as 1965, studies of eye movements have consistently revealed a systematic triangular sequence of fixations over the eyes and the mouth, suggesting that faces elicit a universal, biologically-determined information extraction pattern. Methodology/Principal Findings: Here we monitored the eye movements of Western Caucasian and East Asian observers while they learned, recognized, and categorized by race Western Caucasian and East Asian faces. Western Caucasian observers reproduced a scattered triangular pattern of fixations for faces of both races and across tasks. Contrary to intuition, East Asian observers focused more on the central region of the face. Conclusions/Significance: These results demonstrate that face processing can no longer be considered as arising from a universal series of perceptual events. The strategy employed to extract visual information from faces differs across cultures.
Local biases drive, but do not determine, the perception of illusory trajectories
When a dot moves horizontally across a set of tilted lines of alternating orientations, the dot appears to be moving up and down along its trajectory. This perceptual phenomenon, known as the slalom illusion, reveals a mismatch between the veridical motion signals and the subjective percept of the motion trajectory, which has not been comprehensively explained. In the present study, we investigated the empirical boundaries of the slalom illusion using psychophysical methods. The phenomenon was found to occur both under conditions of smooth pursuit eye movements and constant fixation, and to be consistently amplified by intermittently occluding the dot trajectory. When the motion direction of the dot was not constant, however, the stimulus display did not elicit the expected illusory percept. These findings confirm that a local bias towards perpendicularity at the intersection points between the dot trajectory and the tilted lines causes the illusion, but also highlight that higher-level cortical processes are involved in interpreting and amplifying the biased local motion signals into a global illusion of trajectory perception.
Longer fixation duration while viewing face images
The spatio-temporal properties of saccadic eye movements can be influenced by the cognitive demand and the characteristics of the observed scene. Probably due to its crucial role in social communication, it is argued that face perception may involve different cognitive processes compared with non-face object or scene perception. In this study, we investigated whether and how face and natural scene images can influence the patterns of visuomotor activity. We recorded monkeys’ saccadic eye movements as they freely viewed monkey face and natural scene images. The face and natural scene images attracted a similar number of fixations, but viewing of faces was accompanied by longer fixations compared with natural scenes. These longer fixations were dependent on the context of facial features. The duration of fixations directed at facial contours decreased when the face images were scrambled, and increased at the later stage of normal face viewing. The results suggest that face and natural scene images can generate different patterns of visuomotor activity. The extra fixation duration on faces may be correlated with the detailed analysis of facial features.
Optical and near-infrared observations of the GRB020405 afterglow
(Abridged) We report on observations of the optical and NIR afterglow of
GRB020405. Ground-based optical observations started about 1 day after the GRB
and spanned a period of ~10 days; archival HST data extended the coverage up to
70 days after the GRB. We report the first detection of the afterglow in NIR
bands. The detection of emission lines in the optical spectrum indicates that
the GRB is located at z = 0.691. Absorptions are also detected at z = 0.691 and
at z = 0.472. The latter system is likely caused by clouds in a galaxy located
2 arcsec southwest of the GRB host. Hence, for the first time, the galaxy
responsible for an intervening absorption system in the spectrum of a GRB
afterglow is identified. Optical and NIR photometry indicates that the decay in
all bands follows a single power law of index alpha = 1.54. The late-epoch VLT
and HST points lie above the extrapolation of this power law, so that a plateau
is apparent in the VRIJ light curves at 10-20 days after the GRB. The light
curves at epochs later than day ~20 after the GRB are consistent with a
power-law decay with index alphaprime = 1.85. We suggest that this deviation
can be modeled with a SN having the same temporal profile as SN2002ap, but 1.3
mag brighter at peak, and located at the GRB redshift. Alternatively, a shock
re-energization may be responsible for the rebrightening. A polarimetric R-band
measurement shows that the afterglow is polarized, with P = 1.5 % and theta =
172 degrees. Optical-NIR spectral flux distributions show a change of slope
across the J band which we interpret as due to the presence of nu_c. The
analysis of the multiwavelength spectrum within the fireball model suggests
that a population of relativistic electrons produces the optical-NIR emission
via synchrotron in an adiabatically expanding blastwave, and the X-rays via IC.
Comment: 17 pages, 10 figures, 4 tables, accepted for publication in A&A, main journal
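The two-slope decay described above can be captured by a smoothly broken power law (a standard Beuermann-type parameterization). The decay indices below are those quoted in the abstract, while the break epoch and break smoothness are assumed purely for illustration:

```python
import math

alpha1, alpha2 = 1.54, 1.85   # early and late decay indices from the fits
t_break = 20.0                # days; assumed transition epoch for illustration
s = 5.0                       # assumed smoothness of the break

def flux(t, f0=1.0):
    # smoothly broken power law:
    # F(t) ~ t^-alpha1 for t << t_break, t^-alpha2 for t >> t_break
    return f0 * ((t / t_break) ** (s * alpha1)
                 + (t / t_break) ** (s * alpha2)) ** (-1.0 / s)

def mag_change(t1, t2):
    # magnitude change between two epochs: dm = -2.5 log10(F(t2) / F(t1))
    return -2.5 * math.log10(flux(t2) / flux(t1))
```

Fitting such a form to the VRIJ light curves and subtracting it is how an excess like the claimed SN bump would be isolated; far from the break, the local logarithmic slope reduces to the quoted indices.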
Action Recognition with a Bio--Inspired Feedforward Motion Processing Model: The Richness of Center-Surround Interactions
Here we show that reproducing the functional properties of MT cells with various center-surround interactions enriches motion representation and improves action recognition performance. To do so, we propose a simplified bio-inspired model of the motion pathway in primates: it is a feedforward model restricted to the V1-MT cortical layers, cortical cells cover the visual space with a foveated structure, and, more importantly, we reproduce some of the richness of the center-surround interactions of MT cells. Interestingly, as observed in neurophysiology, our MT cells not only behave like simple velocity detectors but also respond to several kinds of motion contrasts. Results show that this diversity of motion representation at the MT level is a major advantage for an action recognition task. Defining motion maps as our feature vectors, we used a standard classification method on the Weizmann database and obtained an average recognition rate of 98.9%, which is superior to the recent results of Jhuang et al. (2007). These promising results encourage us to further develop bio-inspired models incorporating other brain mechanisms and cortical layers in order to deal with more complex videos.
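One way to picture a center-surround MT unit is as a detector of motion contrast: preferred-direction motion in the receptive-field centre, antagonized by the same motion in the surround. The sketch below is a hypothetical toy version (a discrete flow field, Chebyshev-distance rings, a single preferred direction), not the model's actual MT stage:

```python
def mt_response(flow, x, y, pref=(1.0, 0.0), r_c=1, r_s=3, k=1.0):
    """Center-surround MT unit: preferred-direction motion in the centre
    is antagonized by same-direction motion in the surround ring."""
    def mean_proj(radius, exclude=0):
        # mean projection of the flow onto the preferred direction,
        # over cells at Chebyshev distance in (exclude, radius]
        vals = [vx * pref[0] + vy * pref[1]
                for (i, j), (vx, vy) in flow.items()
                if exclude < max(abs(i - x), abs(j - y)) <= radius]
        return sum(vals) / len(vals) if vals else 0.0
    center = mean_proj(r_c)
    surround = mean_proj(r_s, exclude=r_c)
    return max(0.0, center - k * surround)   # half-rectified response

# uniform translation: the surround cancels the centre -> weak response
uniform = {(i, j): (1.0, 0.0) for i in range(9) for j in range(9)}
# motion contrast: only a small central patch moves -> strong response
patch = {(i, j): ((1.0, 0.0) if max(abs(i - 4), abs(j - 4)) <= 1
                  else (0.0, 0.0))
         for i in range(9) for j in range(9)}
```

A population of such units responds differently to uniform motion and to motion discontinuities, which is the extra information the classification stage exploits.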
Natural images from the birthplace of the human eye
Here we introduce a database of calibrated natural images publicly available
through an easy-to-use web interface. Using a Nikon D70 digital SLR camera, we
acquired about 5000 six-megapixel images of the Okavango Delta of Botswana, a
tropical savanna habitat similar to where the human eye is thought to have
evolved. Some sequences of images were captured unsystematically while
following a baboon troop, while others were designed to vary a single parameter
such as aperture, object distance, time of day or position on the horizon.
Images are available in the raw RGB format and in grayscale. Images are also
available in units relevant to the physiology of human cone photoreceptors,
where pixel values represent the expected number of photoisomerizations per
second for cones sensitive to long (L), medium (M) and short (S) wavelengths.
This database is distributed under a Creative Commons Attribution-Noncommercial
Unported license to facilitate research in computer vision, psychophysics of
perception, and visual neuroscience.
Comment: Submitted to PLoS ONE
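For intuition about cone-referred units, a linear-RGB pixel can be converted to approximate cone (L, M, S) responses via the standard sRGB-to-XYZ and Hunt-Pointer-Estevez XYZ-to-LMS matrices. This is only a generic colorimetric sketch: the database itself provides calibrated photoisomerization rates directly, which is what should be used in practice.

```python
# standard linear sRGB -> CIE XYZ matrix (D65 white point)
RGB_TO_XYZ = [(0.4124, 0.3576, 0.1805),
              (0.2126, 0.7152, 0.0722),
              (0.0193, 0.1192, 0.9505)]
# Hunt-Pointer-Estevez XYZ -> LMS cone-response matrix
XYZ_TO_LMS = [(0.38971, 0.68898, -0.07868),
              (-0.22981, 1.18340, 0.04641),
              (0.00000, 0.00000, 1.00000)]

def mat_vec(m, v):
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)

def rgb_to_lms(rgb):
    # chain the two linear transforms: RGB -> XYZ -> LMS
    return mat_vec(XYZ_TO_LMS, mat_vec(RGB_TO_XYZ, rgb))

L, M, S = rgb_to_lms((0.0, 1.0, 0.0))   # a pure-green linear-RGB pixel
```

As expected, a green pixel drives the middle-wavelength (M) cones most strongly and the short-wavelength (S) cones least.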
