
    Active Collaborative Ensemble Tracking

    A discriminative ensemble tracker employs multiple classifiers, each of which casts a vote on all of the obtained samples. The votes are then aggregated in an attempt to localize the target object. Such a method relies on the collective competence and the diversity of the ensemble to approach the target/non-target classification task from different views. However, by updating the entire ensemble with a shared set of samples and their final labels, this diversity is lost or reduced to the diversity provided by the underlying features or the internal classifiers' dynamics. Additionally, the classifiers do not exchange information with each other while striving to serve the collective goal, i.e., better classification. In this study, we propose an active collaborative information exchange scheme for ensemble tracking. This not only orchestrates the different classifiers towards a common goal but also provides an intelligent update mechanism that preserves the diversity of the classifiers and mitigates the shortcomings of each with the others. The data exchange is optimized with regard to an ensemble uncertainty utility function, and the ensemble is updated via co-training. The evaluations demonstrate promising results realized by the proposed algorithm for real-world online tracking.
    Comment: AVSS 2017 Submission
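
    As a rough illustration of the voting scheme this abstract describes, the sketch below aggregates per-classifier votes over candidate samples in Python/NumPy. It is a minimal sketch under assumptions: the `score` interface and the sample format are invented for illustration, and the co-training update and uncertainty-driven exchange are not shown.

```python
import numpy as np

def aggregate_votes(classifiers, samples):
    """Sum each ensemble member's vote over candidate samples and
    localize the target at the highest-scoring sample.

    classifiers : objects with a hypothetical .score(sample) -> float
    samples     : candidate patches drawn around the previous target
    """
    votes = np.zeros(len(samples))
    for clf in classifiers:
        for i, s in enumerate(samples):
            votes[i] += clf.score(s)   # each member casts a vote per sample
    return int(np.argmax(votes))       # index of the localized target
```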

    Gaze-Informed egocentric action recognition for memory aid systems

    Egocentric action recognition has been intensively studied in the fields of computer vision and clinical science, with applications in pervasive health care. The majority of existing egocentric action recognition techniques use features extracted from either the entire contents of video frames or from regions of interest as the inputs to action classifiers. The former can suffer from the moving backgrounds or irrelevant foregrounds usually associated with egocentric action videos, while the latter may be impaired by mismatches between the computed and ground-truth regions of interest. This paper proposes a new gaze-informed feature extraction approach in which features are extracted from the regions around the gaze points, which represent the genuine regions of interest from a first-person point of view. Activities of daily living can then be classified based only on the identified regions, using the extracted gaze-informed features. The proposed approach has been further applied to a memory support system for people with poor memory, such as those with amnesia or dementia, and their carers. The experimental results demonstrate the efficacy of the proposed approach in egocentric action recognition and thus the potential of the memory support tool in health care.
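
    A minimal sketch of the gaze-informed cropping step described above, assuming a fixed square window around the gaze point (the window size and the downstream feature extractor are assumptions, not the paper's specification):

```python
import numpy as np

def gaze_region(frame, gaze_xy, size=128):
    """Crop a square region centred on the gaze point, clamped to the
    frame bounds; features would be extracted from this crop rather
    than from the whole frame."""
    h, w = frame.shape[:2]
    x, y = gaze_xy
    half = size // 2
    x0, x1 = max(0, x - half), min(w, x + half)
    y0, y1 = max(0, y - half), min(h, y + half)
    return frame[y0:y1, x0:x1]

# usage on a dummy frame; a real pipeline would feed the crop to a
# feature extractor (e.g. a CNN) and then to the action classifier
frame = np.zeros((480, 640, 3), dtype=np.uint8)
crop = gaze_region(frame, gaze_xy=(320, 240))
```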

    Intelligent summarization of sports videos using automatic saliency detection

    The aim of this thesis is to present an efficient and intelligent way of creating sports summary videos by automatically identifying the highlights or salient events from one or more video feeds using computer vision techniques, and combining them to form a video summary of the game. The thesis presents a twofold solution: (1) identification of salient parts from single or multiple video footage of a sports event, and (2) remixing of the video by extracting and merging various segments, adding effects (such as slow replay), and mixing audio. The project applies machine learning and computer vision methods to identify regions of interest in video frames and to detect action areas and scoring attempts. These methods were developed for the sport of basketball; however, they may be tweaked or extended for other sports such as football or hockey. For creating the summary videos, various video processing techniques were experimented with to add visual effects that improve the quality of the summaries; a simple sketch of the segment-merging step follows below. The goal has been to deliver a fully automated, fast, and robust system that can work with large high-definition video files.
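
    As a hedged sketch of the remixing step (not the thesis implementation), the snippet below concatenates pre-detected highlight segments with a simple frame-duplication slow replay using OpenCV; segment detection and audio mixing are assumed to happen elsewhere:

```python
import cv2

def write_summary(src_path, segments, out_path, slow_factor=2):
    """Concatenate (start_frame, end_frame) highlight segments into a
    summary video, writing each frame `slow_factor` times to mimic a
    slow replay. Audio is not handled by OpenCV and is omitted here."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    for start, end in segments:
        cap.set(cv2.CAP_PROP_POS_FRAMES, start)   # seek to the segment
        for _ in range(start, end):
            ok, frame = cap.read()
            if not ok:
                break
            for _ in range(slow_factor):          # duplicated frames -> slow motion
                out.write(frame)
    cap.release()
    out.release()
```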

    Visual saliency computation for image analysis

    Visual saliency computation is about detecting and understanding salient regions and elements in a visual scene. Algorithms for visual saliency computation can give clues to where people will look in images, what objects are visually prominent in a scene, and so on. Such algorithms are useful in a wide range of applications in computer vision and graphics. In this thesis, we study the following visual saliency computation problems.
    1) Eye Fixation Prediction. Eye fixation prediction aims to predict where people look in a visual scene. For this problem, we propose a Boolean Map Saliency (BMS) model which leverages the global surroundedness cue using a Boolean map representation (see the sketch after this abstract). We draw a theoretical connection between BMS and the Minimum Barrier Distance (MBD) transform to provide insight into our algorithm. Experimental results show that BMS compares favorably with state-of-the-art methods on seven benchmark datasets.
    2) Salient Region Detection. Salient region detection entails computing a saliency map that highlights the regions of dominant objects in a scene. We propose a salient region detection method based on the Minimum Barrier Distance (MBD) transform. We present a fast approximate MBD transform algorithm with an error-bound analysis. Powered by this fast MBD transform algorithm, our method runs at about 80 FPS and achieves state-of-the-art performance on four benchmark datasets.
    3) Salient Object Detection. Salient object detection aims to localize each salient object instance in an image. We propose a method using a Convolutional Neural Network (CNN) model for proposal generation and a novel subset optimization formulation for bounding box filtering. In experiments, our subset optimization formulation consistently outperforms heuristic bounding box filtering baselines, such as non-maximum suppression, and our method substantially outperforms previous methods on three challenging datasets.
    4) Salient Object Subitizing. We propose a new visual saliency computation task, called Salient Object Subitizing, which is to predict the existence and the number of salient objects in an image using holistic cues. To this end, we present an image dataset of about 14K everyday images annotated using an online crowdsourcing marketplace. We show that an end-to-end trained CNN subitizing model can achieve promising performance without requiring any localization process. A method is proposed to further improve the training of the CNN subitizing model by leveraging synthetic images.
    5) Top-down Saliency Detection. Unlike the aforementioned tasks, top-down saliency detection entails generating task-specific saliency maps. We propose a weakly supervised top-down saliency detection approach by modeling the top-down attention of a CNN image classifier. We propose Excitation Backprop and the concept of contrastive attention to generate highly discriminative top-down saliency maps. Our top-down saliency detection method achieves superior performance in weakly supervised localization tasks on challenging datasets. Its usefulness is further validated in the text-to-region association task, where it provides state-of-the-art performance using only weakly labeled web images for training.
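
    The surroundedness cue behind BMS can be illustrated in a few lines: threshold the image at several levels and, in each Boolean map, keep only the connected regions that do not touch the image border. This is a toy sketch of the core idea only; the published model's colour channels, dilation, and normalization steps are omitted, and the threshold sampling here is an assumption:

```python
import numpy as np
from scipy.ndimage import label

def bms_saliency(gray, n_levels=8):
    """Average, over thresholded Boolean maps, the regions that are
    fully surrounded (i.e. not connected to the image border)."""
    gray = gray.astype(np.float32)
    sal = np.zeros_like(gray)
    thresholds = np.linspace(gray.min(), gray.max(), n_levels + 2)[1:-1]
    for t in thresholds:
        for bmap in (gray > t, gray <= t):       # a Boolean map and its inverse
            lab, _ = label(bmap)
            # labels that appear on the border are not surrounded
            border = np.unique(np.concatenate(
                [lab[0], lab[-1], lab[:, 0], lab[:, -1]]))
            sal += bmap & np.isin(lab, border, invert=True)
    return sal / sal.max() if sal.max() > 0 else sal
```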

    Unsupervised maritime target detection

    The unsupervised detection of maritime targets in greyscale video is a difficult problem in maritime video surveillance. Most approaches assume that the camera is static and employ pixel-wise background modelling techniques for foreground detection; other methods rely on colour or thermal information to detect targets. These methods fail in real-world situations when the static-camera assumption is violated or colour and thermal data are unavailable. In defence and security applications, prior information and training samples of targets may be unavailable for training a classifier, and learning a one-class classifier for the background may be impossible as well. Thus, an unsupervised online approach that attempts to learn from the scene data is highly desirable. In this thesis, the characteristics of the maritime scene and the ocean texture are exploited for foreground detection. Two fast and effective methods are investigated for target detection. Firstly, online region-based background texture models are explored for describing the appearance of the ocean. This approach avoids the need for frame registration because the model is built spatially rather than temporally. The texture appearance of the ocean is described using Local Binary Pattern (LBP) descriptors. Two models are proposed: one is a Gaussian Mixture Model (GMM) and the other, referred to as a Sparse Texture Model (STM), is a set of histogram texture distributions. The foreground detections are optimized using a Graph Cut (GC) that enforces spatial coherence. Secondly, feature tracking is investigated as a means of detecting stable features in an image frame that typically correspond to maritime targets, while unstable features correspond to background regions. This approach follows a Track-Before-Detect (TBD) concept and is implemented using a hierarchical scheme for motion estimation and matching of Scale-Invariant Feature Transform (SIFT) appearance features. The experimental results show that these approaches are feasible for foreground detection in maritime video whether the camera is static or moving. Receiver Operating Characteristic (ROC) curves were generated for five test sequences, and the Area Under the ROC Curve (AUC) was analyzed to assess the performance of the proposed methods. The texture models, without GC optimization, achieved an AUC of 0.85 or greater on four of the five test videos. At a 50% True Positive Rate (TPR), these four test scenarios had a False Positive Rate (FPR) of less than 2%. With GC optimization, an AUC of greater than 0.8 was achieved for all test cases, and the FPR was reduced in all cases compared to the results without the GC. In comparison to the state of the art in background modelling for maritime scenes, our texture-model methods achieved the best or comparable performance. The two texture models executed at a reasonable processing frame rate. The experimental results for TBD show that targets can be detected using a simple track score based on track length. At a 50% TPR, an FPR of less than 4% is achieved for four of the five test scenarios. These results are very promising for maritime target detection.
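
    A small sketch of the block-wise LBP description underlying the texture models, assuming uniform LBP codes and a fixed block size (the comparison of each block's histogram against the learned ocean model, and the graph-cut step, are omitted):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def region_lbp_histograms(gray, block=32, P=8, R=1.0):
    """Describe each block of a greyscale frame with a uniform-LBP
    histogram; foreground would be flagged where a block deviates
    from the background (ocean) texture model."""
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    n_bins = P + 2                    # uniform codes plus a non-uniform bin
    h, w = lbp.shape
    hists = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = lbp[y:y + block, x:x + block]
            hist, _ = np.histogram(patch, bins=n_bins,
                                   range=(0, n_bins), density=True)
            hists.append(hist)
    return np.array(hists)
```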

    Shape Representations Using Nested Descriptors

    The problem of shape representation is a core problem in computer vision. It can be argued that shape representation is the most central representational problem for computer vision, since unlike texture or color, shape alone can be used for perceptual tasks such as image matching, object detection, and object categorization. This dissertation introduces a new shape representation called the nested descriptor. A nested descriptor represents shape both globally and locally by pooling salient scaled and oriented complex gradients in a large nested support set. We show that this nesting property introduces a nested correlation structure that enables a new local distance function called the nesting distance, which provides a provably robust similarity function for image matching. Furthermore, the nesting property suggests an elegant flower-like normalization strategy called a log-spiral difference. We show that this normalization enables a compact binary representation and is equivalent to a form of bottom-up saliency. This suggests that the nested descriptor's representational power comes from representing salient edges, which makes a fundamental connection between the saliency and local feature descriptor literatures. In this dissertation, we introduce three examples of shape representation using nested descriptors: nested shape descriptors for imagery, nested motion descriptors for video, and nested pooling for activities. We report evaluation results for these representations that demonstrate state-of-the-art performance on image matching, wide-baseline stereo, and activity recognition tasks.
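
    As a loose illustration of the nested-support idea only (not the published descriptor, which uses scaled, oriented complex gradients and the log-spiral normalization), the sketch below pools gradient energy into one orientation histogram per nested circular support:

```python
import numpy as np

def nested_pooling(grad_mag, grad_ori, center, radii=(4, 8, 16, 32), n_ori=8):
    """Pool oriented gradient magnitude over concentric disks around a
    keypoint; each disk contains the previous one, so the histograms
    share a nested correlation structure."""
    y0, x0 = center
    h, w = grad_mag.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - y0, xs - x0)
    # orientation in [0, 2*pi) quantized into n_ori bins
    ori_bin = np.floor(grad_ori / (2 * np.pi) * n_ori).astype(int) % n_ori
    desc = []
    for r in radii:                    # nested supports, smallest first
        disk = dist < r
        hist = np.bincount(ori_bin[disk], weights=grad_mag[disk],
                           minlength=n_ori)
        desc.append(hist / (hist.sum() + 1e-8))
    return np.concatenate(desc)
```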