Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics
Dozens of new models on fixation prediction are published every year and
compared on open benchmarks such as MIT300 and LSUN. However, progress in the
field can be difficult to judge because models are compared using a variety of
inconsistent metrics. Here we show that no single saliency map can perform well
under all metrics. Instead, we propose a principled approach to solve the
benchmarking problem by separating the notions of saliency models, maps and
metrics. Inspired by Bayesian decision theory, we define a saliency model to be
a probabilistic model of fixation density prediction and a saliency map to be a
metric-specific prediction derived from the model density which maximizes the
expected performance on that metric given the model density. We derive these
optimal saliency maps for the most commonly used saliency metrics (AUC, sAUC,
NSS, CC, SIM, KL-Div) and show that they can be computed analytically or
approximated with high precision. We show that this leads to consistent
rankings in all metrics and avoids the penalties of using one saliency map for
all metrics. Our method allows researchers to have their model compete on many
different metrics with state-of-the-art in those metrics: "good" models will
perform well in all metrics. Comment: published at ECCV 201
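As an illustration of one of the metrics listed above, the following is a minimal sketch of NSS using its common definition (the mean z-scored saliency value at fixated pixels). It is not the authors' code; the 2x2 density and the fixation list are hypothetical toy values.

```python
import math

def nss(saliency_map, fixations):
    """Normalized Scanpath Saliency: mean z-scored map value at fixated pixels."""
    values = [v for row in saliency_map for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    z_at_fixations = [(saliency_map[r][c] - mean) / std for r, c in fixations]
    return sum(z_at_fixations) / len(z_at_fixations)

# Hypothetical fixation density (sums to 1) and fixations given as (row, col).
density = [[0.1, 0.2],
           [0.3, 0.4]]
fixations = [(1, 1), (1, 0)]

score = nss(density, fixations)

# A positive affine rescaling of the map leaves NSS unchanged,
# because the z-scoring removes offset and scale.
rescaled = [[2.0 * v + 1.0 for v in row] for row in density]
score_rescaled = nss(rescaled, fixations)
```

Because NSS ignores offset and scale while other metrics (for example SIM or KL-Div) do not, the same map can score differently depending on the metric, which is the kind of inconsistency the abstract refers to.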
SeizureNet: Multi-Spectral Deep Feature Learning for Seizure Type Classification
Automatic classification of epileptic seizure types in electroencephalogram
(EEG) data can enable more precise diagnosis and efficient management of the
disease. This task is challenging due to factors such as low signal-to-noise
ratios, signal artefacts, high variance in seizure semiology among epileptic
patients, and limited availability of clinical data. To overcome these
challenges, in this paper, we present SeizureNet, a deep learning framework
which learns multi-spectral feature embeddings using an ensemble architecture
for cross-patient seizure type classification. We used the recently released
TUH EEG Seizure Corpus (V1.4.0 and V1.5.2) to evaluate the performance of
SeizureNet. Experiments show that SeizureNet can reach a weighted F1 score of
up to 0.94 for seizure-wise cross validation and 0.59 for patient-wise cross
validation for scalp EEG based multi-class seizure type classification. We also
show that the high-level feature embeddings learnt by SeizureNet considerably
improve the accuracy of smaller networks through knowledge distillation for
applications with low-memory constraints
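The weighted F1 score reported above can be sketched as follows, assuming the usual definition (per-class F1 averaged with weights proportional to each class's number of true instances). The labels "A", "B" and "C" are hypothetical placeholders, not seizure types from the corpus, and this is not the authors' evaluation code.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score

# Hypothetical ground truth and predictions for three classes.
y_true = ["A", "A", "B", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "C", "C"]

result = weighted_f1(y_true, y_pred)
```

In this toy example every class happens to have an F1 of 2/3, so the weighted average is also 2/3.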
Contribution of Color Information in Visual Saliency Model for Videos
Much research has been concerned with the contribution of the low-level features of a visual scene to the deployment of visual attention. Bottom-up saliency models have been developed to predict the location of gaze according to these features. So far, color, alongside brightness, contrast and motion, has been considered one of the primary features in computing bottom-up saliency. However, its contribution to guiding eye movements when viewing natural scenes has been debated. We investigated the contribution of color information in a bottom-up visual saliency model. The model's efficiency was tested using experimental data obtained from 45 observers who were eye-tracked while freely exploring a large data set of color and grayscale videos. The two datasets of recorded eye positions, for grayscale and color videos, were compared with a luminance-based saliency model. We then incorporated chrominance information into the model. Results show that color information improves the performance of the saliency model in predicting eye positions
Contextual cropping and scaling of TV productions
This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-011-0804-3. Copyright @ Springer Science+Business Media, LLC 2011. In this paper, an application is presented which automatically adapts SDTV (Standard Definition Television) sports productions to smaller displays through intelligent cropping and scaling. It crops regions of interest of sports productions based on a smart combination of production metadata and systematic video analysis methods. This approach allows a context-based composition of cropped images. It provides a differentiation between the original SD version of the production and the processed one adapted to the requirements for mobile TV. The system has been comprehensively evaluated by comparing the outcome of the proposed method with manually and statically cropped versions, as well as with non-cropped versions. Envisaged is the integration of the tool in post-production and live workflows
On the Distribution of Salient Objects in Web Images and its Influence on Salient Object Detection
It has become apparent that a Gaussian center bias can serve as an important
prior for visual saliency detection, which has been demonstrated for predicting
human eye fixations and salient object detection. Tseng et al. have shown that
the photographer's tendency to place interesting objects in the center is a
likely cause for the center bias of eye fixations. We investigate the influence
of the photographer's center bias on salient object detection, extending our
previous work. We show that the centroid locations of salient objects in
photographs of Achanta and Liu's data set in fact correlate strongly with a
Gaussian model. This is an important insight, because it provides an empirical
motivation and justification for the integration of such a center bias in
salient object detection algorithms and helps to understand why Gaussian models
are so effective. To assess the influence of the center bias on salient object
detection, we integrate an explicit Gaussian center bias model into two
state-of-the-art salient object detection algorithms. This way, first, we
quantify the influence of the Gaussian center bias on pixel- and segment-based
salient object detection. Second, we improve the performance in terms of F1
score, Fb score, area under the recall-precision curve, area under the receiver
operating characteristic curve, and hit-rate on the well-known data set by
Achanta and Liu. Third, by debiasing Cheng et al.'s region contrast model, we
exemplarily demonstrate that implicit center biases are partially responsible
for the outstanding performance of state-of-the-art algorithms. Last but not
least, as a result of debiasing Cheng et al.'s algorithm, we introduce a
non-biased salient object detection method, which is of interest for
applications in which the image data is not likely to have a photographer's
center bias (e.g., image data of surveillance cameras or autonomous robots)
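A minimal sketch of what an explicit Gaussian center bias might look like is given below. Both the width parameter `sigma_frac` and the pixel-wise multiplication used to combine the prior with a saliency map are assumptions made for illustration; the abstract does not specify how the authors integrate the bias.

```python
import math

def gaussian_center_prior(height, width, sigma_frac=0.25):
    """2-D Gaussian weight map peaking at the image center.

    sigma_frac is a hypothetical parameter: the standard deviation
    expressed as a fraction of the image height and width.
    """
    cy, cx = (height - 1) / 2, (width - 1) / 2
    sy, sx = sigma_frac * height, sigma_frac * width
    return [[math.exp(-0.5 * (((r - cy) / sy) ** 2 + ((c - cx) / sx) ** 2))
             for c in range(width)]
            for r in range(height)]

def apply_center_bias(saliency, prior):
    """Combine a saliency map with the prior by pixel-wise multiplication (assumed)."""
    return [[s * p for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(saliency, prior)]

prior = gaussian_center_prior(5, 5)
uniform_saliency = [[1.0] * 5 for _ in range(5)]
biased = apply_center_bias(uniform_saliency, prior)
```

With a uniform input map, the biased output simply reproduces the prior: the center pixel keeps its full weight while the corners are attenuated.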
Winner-take-all selection in a neural system with delayed feedback
We consider the effects of temporal delay in a neural feedback system with
excitation and inhibition. The topology of our model system reflects the
anatomy of the avian isthmic circuitry, a feedback structure found in all
classes of vertebrates. We show that the system is capable of performing a
`winner-take-all' selection rule for certain combinations of excitatory and
inhibitory feedback. In particular, we show that when the time delays are
sufficiently large a system with local inhibition and global excitation can
function as a `winner-take-all' network and exhibit oscillatory dynamics. We
demonstrate how the origin of the oscillations can be attributed to the finite
delays through a linear stability analysis. Comment: 8 pages, 6 figure
Multiscale Discriminant Saliency for Visual Attention
Bottom-up saliency, an early stage of human visual attention, can be
considered as a binary classification problem between center and surround
classes. The discriminant power of features for the classification is measured
as the mutual information between the features and the distribution of the two
classes. The estimated discrepancy between the two feature classes depends
strongly on the scale levels considered; multi-scale structure and discriminant
power are therefore integrated by employing discrete wavelet features and a
hidden Markov tree (HMT). With the wavelet coefficients and HMT parameters,
quad-tree-like label structures are constructed and used in maximum a
posteriori (MAP) estimation of the hidden class variables at the corresponding
dyadic sub-squares. The saliency value for each dyadic square at each scale
level is then computed from the discriminant power principle and the MAP
estimate. Finally, the per-scale results are integrated across scales into the
final saliency map by an information maximization rule. Both standard
quantitative tools such as NSS, LCC and AUC and qualitative assessments are
used to evaluate the proposed multiscale discriminant saliency method (MDIS)
against the well-known information-based saliency method AIM on its Bruce
Database with eye-tracking data. Simulation results are presented and analyzed
to verify the validity of MDIS as well as to point out its disadvantages as
directions for further research. Comment: 16 pages, ICCSA 2013 - BIOCA sessio
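The discriminant power measure described above, mutual information between a feature and the center/surround class label, can be sketched with plain histogram estimates. This is only an illustration of the principle under the assumption of already discretized features; it does not reproduce the wavelet and HMT machinery of the abstract, and the sample data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(features, labels):
    """Mutual information (in bits) between discretized features and class labels."""
    n = len(features)
    joint = Counter(zip(features, labels))
    feature_counts = Counter(features)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, l), count in joint.items():
        p_joint = count / n
        p_indep = (feature_counts[f] / n) * (label_counts[l] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi

# Hypothetical binned feature responses with their center/surround labels.
labels = ["center", "center", "surround", "surround"]
discriminant_feature = [0, 0, 1, 1]    # separates the two classes perfectly
uninformative_feature = [0, 1, 0, 1]   # independent of the class

mi_high = mutual_information(discriminant_feature, labels)
mi_low = mutual_information(uninformative_feature, labels)
```

A feature that separates the two classes perfectly carries one bit about a balanced binary label, while a feature independent of the label carries none.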
A fast neural-dynamical approach to scale-invariant object detection
We present a biologically-inspired method for object detection which is capable of online and one-shot learning of object appearance. We use a computationally efficient model of V1 keypoints to select object parts with the highest information content and model their surroundings by a simple binary descriptor based on responses of cortical cells. We feed these features into a dynamical neural network which binds compatible features together by employing a Bayesian criterion and a set of previously observed object views. We demonstrate the feasibility of our algorithm for cognitive robotic scenarios by evaluating detection performance on a dataset of common household items. © Springer International Publishing Switzerland 2014
A visual programming model to implement coarse-grained DSP applications on parallel and heterogeneous clusters
Digital signal processing (DSP) applications are among the biggest consumers of computing power. They process large volumes of data represented with high accuracy, use complex algorithms, and in most cases must satisfy time constraints. On the other hand, parallel and heterogeneous architectures are now necessary to speed up processing; the best examples are the supercomputers "Tianhe-2" and "Titan" from the top500 ranking. These architectures can contain several connected nodes, where each node includes a number of general-purpose (multi-core) processors and a number of (many-core) accelerators, allowing several levels of parallelism. However, for DSP programmers, it is still complicated to exploit all these levels of parallelism to reach good performance for their applications. They have to design their implementation to take advantage of all heterogeneous computing units, taking into account the architectural specificities of each of them: communication model, memory management, data management, job scheduling, synchronization, etc. In the present work, we characterize DSP applications and, based on their distinctiveness, propose a high-level visual programming model and an execution model that simplify their implementation while still achieving the desired performance
Visual saliency and semantic incongruency influence eye movements when inspecting pictures
Models of low-level saliency predict that when we first look at a photograph our first few eye movements should be made towards visually conspicuous objects. Two experiments investigated this prediction by recording eye fixations while viewers inspected pictures of room interiors that contained objects with known saliency characteristics. Highly salient objects did attract fixations earlier than less conspicuous objects, but only in a task requiring general encoding of the whole picture. When participants were required to detect the presence of a small target, then the visual saliency of nontarget objects did not influence fixations. These results support modifications of the model that take the cognitive override of saliency into account by allowing task demands to reduce the saliency weights of task-irrelevant objects. The pictures sometimes contained incongruent objects that were taken from other rooms. These objects were used to test the hypothesis that previous reports of the early fixation of congruent objects have not been consistent because the effect depends upon the visual conspicuity of the incongruent object. There was an effect of incongruency in both experiments, with earlier fixation of objects that violated the gist of the scene, but the effect was only apparent for inconspicuous objects, which argues against the hypothesis
