
    Multi-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree Modelling

    Bottom-up saliency, an early stage of human visual attention, can be considered as a binary classification problem between centre and surround classes. The discriminant power of features for this classification is measured as the mutual information between the distributions of image features and the corresponding classes. As the estimated discrepancy depends strongly on the scale level considered, multi-scale structure and discriminant power are integrated by employing discrete wavelet features and a Hidden Markov Tree (HMT). From the wavelet coefficients and Hidden Markov Tree parameters, quad-tree-like label structures are constructed and used in maximum a posteriori (MAP) estimation of the hidden class variables at the corresponding dyadic sub-squares. A saliency value for each square block at each scale level is then computed with the discriminant power principle. Finally, the final saliency map is integrated across multiple scales by an information maximization rule. Both standard quantitative tools such as NSS, LCC and AUC and qualitative assessments are used to evaluate the proposed multi-scale discriminant saliency (MDIS) method against the well-known information-based approach AIM on its released image collection with eye-tracking data. Simulation results are presented and analysed to verify the validity of MDIS as well as to point out its limitations as directions for further research.
    Comment: arXiv admin note: substantial text overlap with arXiv:1301.396
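    The core discriminant-power measure, mutual information between a feature's distribution and the centre/surround class label, can be sketched with a simple histogram estimator. This is a minimal sketch, not the paper's implementation; the function and variable names are illustrative:

```python
import numpy as np

def discriminant_power(centre_feats, surround_feats, n_bins=32):
    """Estimate I(F; C), the mutual information between a feature F and
    the centre/surround class label C, from a joint histogram."""
    feats = np.concatenate([centre_feats, surround_feats])
    labels = np.concatenate([np.zeros(len(centre_feats), dtype=int),
                             np.ones(len(surround_feats), dtype=int)])
    edges = np.histogram_bin_edges(feats, bins=n_bins)
    idx = np.clip(np.digitize(feats, edges) - 1, 0, n_bins - 1)
    joint = np.zeros((n_bins, 2))
    for i, c in zip(idx, labels):
        joint[i, c] += 1
    joint /= joint.sum()
    pf = joint.sum(axis=1, keepdims=True)   # marginal P(F)
    pc = joint.sum(axis=0, keepdims=True)   # marginal P(C)
    nz = joint > 0                          # skip empty cells in the log
    return float((joint[nz] * np.log2(joint[nz] / (pf @ pc)[nz])).sum())
```

    A feature whose centre and surround distributions are well separated approaches the 1-bit upper bound (the entropy of the binary class label), while an undiscriminating feature scores near zero.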

    Distinct roles of delta- and theta-band neural tracking for sharpening and predictive coding of multi-level speech features during spoken language processing

    The brain tracks and encodes multi‐level speech features during spoken language processing. It is evident that this speech tracking is dominant at low frequencies (<8 Hz) including delta and theta bands. Recent research has demonstrated distinctions between delta‐ and theta‐band tracking but has not elucidated how they differentially encode speech across linguistic levels. Here, we hypothesised that delta‐band tracking encodes prediction errors (enhanced processing of unexpected features) while theta‐band tracking encodes neural sharpening (enhanced processing of expected features) when people perceive speech with different linguistic contents. EEG responses were recorded when normal‐hearing participants attended to continuous auditory stimuli that contained different phonological/morphological and semantic contents: (1) real‐words, (2) pseudo‐words and (3) time‐reversed speech. We employed multivariate temporal response functions to measure EEG reconstruction accuracies in response to acoustic (spectrogram), phonetic and phonemic features with the partialling procedure that singles out unique contributions of individual features. We found higher delta‐band accuracies for pseudo‐words than real‐words and time‐reversed speech, especially during encoding of phonetic features. Notably, individual time‐lag analyses showed that significantly higher accuracies for pseudo‐words than real‐words started at early processing stages for phonetic encoding (<100 ms post‐feature) and later stages for acoustic and phonemic encoding (>200 and 400 ms post‐feature, respectively). Theta‐band accuracies, on the other hand, were higher when stimuli had richer linguistic content (real‐words > pseudo‐words > time‐reversed speech). Such effects also started at early stages (<100 ms post‐feature) during encoding of all individual features or when all features were combined. 
    We argue that these results indicate that delta-band tracking plays a role in predictive coding, leading to greater tracking of pseudo-words due to the presence of unexpected/unpredicted semantic information, while theta-band tracking encodes sharpened signals driven by more expected phonological/morphological and semantic contents. The early onset of these effects reflects rapid computation of sharpening and prediction errors. Moreover, by measuring changes in EEG alpha power, we found no evidence that the observed effects can be explained solely by attentional demands or listening effort. Finally, we used directed information analyses to illustrate feedforward and feedback information transfer between prediction errors and sharpening across linguistic levels, showing how our results fit the hierarchical Predictive Coding framework. Together, we suggest distinct roles of delta- and theta-band neural tracking for sharpening and predictive coding of multi-level speech features during spoken language processing.
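    The reconstruction accuracies described above can be illustrated with a backward temporal response function model: ridge regression from time-lagged EEG channels to a stimulus feature, scored on held-out data by Pearson correlation. This is a minimal sketch under simplified assumptions, not the authors' pipeline; the partialling procedure is omitted and all names are illustrative:

```python
import numpy as np

def lag_matrix(eeg, lags):
    """Stack time-lagged copies of each EEG channel: (T, C) -> (T, C * len(lags))."""
    T, C = eeg.shape
    X = np.zeros((T, C * len(lags)))
    for j, L in enumerate(lags):
        shifted = np.roll(eeg, L, axis=0)
        if L > 0:
            shifted[:L] = 0   # zero out samples wrapped around by roll
        elif L < 0:
            shifted[L:] = 0
        X[:, j * C:(j + 1) * C] = shifted
    return X

def reconstruction_accuracy(eeg_train, stim_train, eeg_test, stim_test,
                            lags=range(-4, 1), lam=1e2):
    """Backward model: ridge-regress the stimulus onto lagged EEG,
    then score reconstruction of held-out stimulus by Pearson r."""
    lags = list(lags)
    Xtr, Xte = lag_matrix(eeg_train, lags), lag_matrix(eeg_test, lags)
    # closed-form ridge solution: w = (X'X + lam*I)^{-1} X'y
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]),
                        Xtr.T @ stim_train)
    return float(np.corrcoef(Xte @ w, stim_test)[0, 1])
```

    In the study's setting the "stimulus" would be an acoustic, phonetic or phonemic feature time series band-passed to the delta or theta range; here a generic feature vector stands in.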

    Pose Invariant Face Recognition and Tracking for Human Identification

    Real-time tracking and recognition of people in complex environments has been a widely researched area in computer vision, as it has huge potential for efficient security automation and surveillance. We propose a real-time system for detecting and recognizing individuals in a scene by detecting, recognizing and tracking faces. The system integrates a multi-view face detection algorithm, a multi-pose face recognition algorithm and an extended multi-pose Kalman face tracker. The multi-view face detection algorithm contains frontal-face and profile-face detectors which extract Haar-like features and detect faces at any pose with a cascade of boosted classifiers. The pose of the face is determined inherently by the face detection algorithm and is used in the multi-pose face recognition module, where, depending on the pose, the detected face is compared with a particular set of trained faces having the same pose range. The pose range of the trained faces is divided into bins into which the faces are sorted, and each bin is trained separately to have its own Eigenspace. Faces are recognized by projecting them onto the Eigenspace corresponding to the determined pose using the Weighted Modular Principal Component Analysis (WMPCA) technique, and are then tracked using the proposed multiple-face tracker. This tracker is implemented by extracting suitable face features, representing them with a variant of WMPCA, and tracking these features across the scene with a Kalman filter. The system is trained on a face database of twenty unrelated people using WMPCA, and classification is performed using a feature-correlation metric. The system has the advantage of recognizing and tracking an individual in a cluttered environment with varying pose.
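    The pose-binned Eigenspace scheme can be sketched as follows, with plain PCA standing in for WMPCA (a simplification) and nearest-neighbour matching in coefficient space in place of the paper's feature-correlation metric. Class and variable names are illustrative:

```python
import numpy as np

class PoseBinnedEigenfaces:
    """One Eigenspace per pose bin; plain PCA stands in for WMPCA here."""

    def __init__(self, n_components=10):
        self.n_components = n_components
        self.bins = {}  # pose_bin -> (mean, basis, gallery coeffs, identity labels)

    def fit(self, faces, ids, pose_bins):
        for b in set(pose_bins):
            X = np.array([f for f, p in zip(faces, pose_bins) if p == b])
            labels = [i for i, p in zip(ids, pose_bins) if p == b]
            mean = X.mean(axis=0)
            # eigenfaces: top right-singular vectors of the centred data
            _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
            basis = Vt[:self.n_components]
            self.bins[b] = (mean, basis, (X - mean) @ basis.T, labels)

    def predict(self, face, pose_bin):
        # project into the Eigenspace of the detected pose bin, then take
        # the nearest neighbour among that bin's gallery coefficients
        mean, basis, coeffs, labels = self.bins[pose_bin]
        c = (np.asarray(face) - mean) @ basis.T
        return labels[int(np.argmin(np.linalg.norm(coeffs - c, axis=1)))]
```

    The pose bin for each query face would come from the detector, as in the abstract; training each bin separately keeps pose variation out of each Eigenspace.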

    A Unified Model for Tracking and Image-Video Detection Has More Power

    Object detection (OD) is one of the most fundamental tasks in computer vision. Recent developments in deep learning have pushed the performance of image OD to new heights with learning-based, data-driven approaches. Video OD, on the other hand, remains less explored, mostly because its data annotation needs are much more expensive. At the same time, multi-object tracking (MOT), which requires reasoning about track identities and spatio-temporal trajectories, shares a similar spirit with video OD. However, most MOT datasets are class-specific (e.g., person-annotated only), which constrains a model's flexibility to track other objects. We propose TrIVD (Tracking and Image-Video Detection), the first framework that unifies image OD, video OD, and MOT within one end-to-end model. To handle the discrepancies and semantic overlaps across datasets, TrIVD formulates detection/tracking as grounding and reasons about object categories via visual-text alignments. The unified formulation enables cross-dataset, multi-task training, and thus equips TrIVD with the ability to leverage frame-level features, video-level spatio-temporal relations, and track identity associations. With such joint training, we can extend the knowledge from OD data, which comes with much richer object category annotations, to MOT and achieve zero-shot tracking capability. Experiments demonstrate that TrIVD achieves state-of-the-art performance across all image/video OD and MOT tasks.
    Comment: 13 pages, 4 figures
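    The visual-text alignment step that lets a grounding-style detector classify regions by category name (and hence generalize zero-shot to categories named only in text) can be sketched as a cosine-similarity match between region features and text embeddings. This is a generic sketch, not TrIVD's actual architecture; how the region features and category embeddings are produced (e.g. by visual and text encoders) is assumed, not specified here:

```python
import numpy as np

def classify_regions(region_feats, text_embeds, temperature=0.07):
    """Score each detected region against category-name embeddings by
    cosine similarity, then softmax over categories."""
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    logits = (r @ t.T) / temperature            # (n_regions, n_categories)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return p.argmax(axis=1), p
```

    Because categories are represented by text embeddings rather than fixed classifier heads, swapping in embeddings of unseen category names requires no retraining, which is the mechanism behind the zero-shot claim.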