Multi-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree Modelling
The bottom-up saliency, an early stage of humans' visual attention, can be
considered as a binary classification problem between centre and surround
classes. Discriminant power of features for the classification is measured as
mutual information between distributions of image features and corresponding
classes. As the estimated discrepancy depends strongly on the considered scale
level, multi-scale structure and discriminant power are integrated by employing
discrete wavelet features and Hidden Markov Tree (HMT). With wavelet
coefficients and Hidden Markov Tree parameters, quad-tree-like label structures
are constructed and used for maximum a posteriori (MAP) estimation of hidden
class variables at corresponding dyadic sub-squares. Then, a saliency value for
each square block at each scale level is computed with the discriminant power
principle. Finally, the final saliency map is integrated across multiple scales
by an information maximization rule. Both standard quantitative tools such as
NSS, LCC, AUC and qualitative assessments are used for evaluating the proposed
multi-scale discriminant saliency (MDIS) method against the well-known
information-based approach AIM on its released image collection with
eye-tracking data. Simulation results are presented and analysed to verify the
validity of MDIS and to point out its limitations as directions for further
research.
Comment: arXiv admin note: substantial text overlap with arXiv:1301.396
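The discriminant-power idea in the abstract above (mutual information between image features and centre/surround classes) can be illustrated with a minimal sketch. It assumes features have already been quantized into discrete bins and that each sample carries a binary class label; the function name `mutual_information` and the toy data are hypothetical and are not taken from the MDIS paper, which works with wavelet coefficients and HMT parameters rather than raw counts.

```python
import math
from collections import Counter


def mutual_information(features, labels):
    """Mutual information I(X; Y) in bits between two discrete sequences.

    features: quantized feature values, one per sample (assumed discrete).
    labels:   class labels, e.g. 1 for centre and 0 for surround.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    n = len(features)
    joint = Counter(zip(features, labels))  # counts of (x, y) pairs
    px = Counter(features)                  # marginal counts of x
    py = Counter(labels)                    # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (px * py)
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


# Toy data (hypothetical): a feature that separates the classes perfectly
# versus one that is independent of the class label.
labels = [0, 0, 1, 1]
discriminant_feature = [0, 0, 1, 1]
uninformative_feature = [0, 1, 0, 1]
```

A feature with high mutual information separates centre from surround well, which is the sense in which such a location would count as salient under this principle.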
Distinct roles of delta- and theta-band neural tracking for sharpening and predictive coding of multi-level speech features during spoken language processing
The brain tracks and encodes multi‐level speech features during spoken language processing. It is evident that this speech tracking is dominant at low frequencies (<8 Hz) including delta and theta bands. Recent research has demonstrated distinctions between delta‐ and theta‐band tracking but has not elucidated how they differentially encode speech across linguistic levels. Here, we hypothesised that delta‐band tracking encodes prediction errors (enhanced processing of unexpected features) while theta‐band tracking encodes neural sharpening (enhanced processing of expected features) when people perceive speech with different linguistic contents. EEG responses were recorded when normal‐hearing participants attended to continuous auditory stimuli that contained different phonological/morphological and semantic contents: (1) real‐words, (2) pseudo‐words and (3) time‐reversed speech. We employed multivariate temporal response functions to measure EEG reconstruction accuracies in response to acoustic (spectrogram), phonetic and phonemic features with the partialling procedure that singles out unique contributions of individual features. We found higher delta‐band accuracies for pseudo‐words than real‐words and time‐reversed speech, especially during encoding of phonetic features. Notably, individual time‐lag analyses showed that significantly higher accuracies for pseudo‐words than real‐words started at early processing stages for phonetic encoding (<100 ms post‐feature) and later stages for acoustic and phonemic encoding (>200 and 400 ms post‐feature, respectively). Theta‐band accuracies, on the other hand, were higher when stimuli had richer linguistic content (real‐words > pseudo‐words > time‐reversed speech). Such effects also started at early stages (<100 ms post‐feature) during encoding of all individual features or when all features were combined. 
We argue that these results indicate that delta‐band tracking may play a role in predictive coding, leading to greater tracking of pseudo‐words due to the presence of unexpected/unpredicted semantic information, while theta‐band tracking encodes sharpened signals caused by more expected phonological/morphological and semantic contents. The early presence of these effects reflects rapid computations of sharpening and prediction errors. Moreover, by measuring changes in EEG alpha power, we did not find evidence that the observed effects can be explained solely by attentional demands or listening effort. Finally, we used directed information analyses to illustrate feedforward and feedback information transfers between prediction errors and sharpening across linguistic levels, showcasing how our results fit with the hierarchical Predictive Coding framework. Together, we suggest distinct roles of delta and theta neural tracking for sharpening and predictive coding of multi‐level speech features during spoken language processing.
Pose Invariant Face Recognition and Tracking for Human Identification
Real-time tracking and recognition of people in complex environments has been a widely researched area in computer vision, as it has huge potential in efficient security automation and surveillance. We propose a real-time system for detection and recognition of individuals in a scene by detecting, recognizing and tracking faces. The system integrates the multi-view face detection algorithm, the multi-pose face recognition algorithm and the extended multi-pose Kalman face tracker. The multi-view face detection algorithm contains frontal face and profile face detectors, which extract Haar-like features and detect faces at any pose by a cascade of boosted classifiers. The pose of the face is inherently determined by the face detection algorithm and is used in the multi-pose face recognition module, where, depending on the pose, the detected face is compared with a particular set of trained faces having the same pose range. The pose range of the trained faces is divided into bins into which the faces are sorted, and each bin is trained separately to have its own Eigenspace. The human faces are recognized by projecting them onto a suitable Eigenspace corresponding to the determined pose using the Weighted Modular Principal Component Analysis (WMPCA) technique, and are then tracked using the proposed multiple face tracker. This tracker is implemented by extracting suitable face features, which are represented by a variant of WMPCA, and then tracking these features across the scene using the Kalman filter. This low-level system is created using the same face database of twenty unrelated people trained using WMPCA, and classification is performed using a feature correlation metric. This system has the advantage of recognizing and tracking an individual in a cluttered environment with varying pose.
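The Kalman-filter tracking step mentioned in the abstract above can be sketched for a single image coordinate. This is only an illustration under stated assumptions: a constant-velocity motion model, one measured coordinate per frame, and noise parameters `q` and `r` chosen arbitrarily. The function name `kalman_track_1d` is hypothetical, and the abstract does not specify the state model of the extended multi-pose tracker.

```python
def kalman_track_1d(measurements, dt=1.0, q=1e-3, r=1.0):
    """Track one coordinate with a constant-velocity Kalman filter.

    measurements: observed positions, one per frame (e.g. face centre x).
    dt: time between frames; q: process noise; r: measurement noise.
    All parameter values are illustrative assumptions.
    Returns a list of (position, velocity) estimates, one per frame.
    """
    pos, vel = 0.0, 0.0
    # State covariance P, initialised large to express an uncertain start.
    p00, p01, p10, p11 = 100.0, 0.0, 0.0, 100.0
    track = []
    for z in measurements:
        # Predict: x' = F x, P' = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos = pos + dt * vel
        a00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
        a01 = p01 + dt * p11
        a10 = p10 + dt * p11
        a11 = p11 + q
        # Update with the position measurement, H = [1, 0].
        s = a00 + r                   # innovation covariance
        k0, k1 = a00 / s, a10 / s     # Kalman gain
        y = z - pos                   # innovation
        pos += k0 * y
        vel += k1 * y
        # P = (I - K H) P'
        p00 = (1.0 - k0) * a00
        p01 = (1.0 - k0) * a01
        p10 = a10 - k1 * a00
        p11 = a11 - k1 * a01
        track.append((pos, vel))
    return track


# Hypothetical example: a face moving 2 pixels per frame for 50 frames.
track = kalman_track_1d([2.0 * t for t in range(50)])
```

In a two-dimensional tracker the same filter would typically be run on both image coordinates, or extended to a four-dimensional state; the one-dimensional form is kept here only for brevity.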
A Unified Model for Tracking and Image-Video Detection Has More Power
Object detection (OD) has been one of the most fundamental tasks in
computer vision. Recent developments in deep learning have pushed the
performance of image OD to new heights by learning-based, data-driven
approaches. On the other hand, video OD remains less explored, mostly due to
much more expensive data annotation needs. At the same time, multi-object
tracking (MOT), which requires reasoning about track identities and
spatio-temporal trajectories, shares a similar spirit with video OD. However,
most MOT datasets are class-specific (e.g., person-annotated only), which
constrains a model's flexibility to perform tracking on other objects. We
propose TrIVD (Tracking and Image-Video Detection), the first framework that
unifies image OD, video OD, and MOT within one end-to-end model. To handle the
discrepancies and semantic overlaps across datasets, TrIVD formulates
detection/tracking as grounding and reasons about object categories via
visual-text alignments. The unified formulation enables cross-dataset,
multi-task training, and thus equips TrIVD with the ability to leverage
frame-level features, video-level spatio-temporal relations, as well as track
identity associations. With such joint training, we can now extend the
knowledge from OD data, which comes with much richer object category
annotations, to MOT and achieve zero-shot tracking capability. Experiments
demonstrate that TrIVD achieves state-of-the-art performance across all
image/video OD and MOT tasks.
Comment: (13 pages, 4 figures