Exemplar-based Linear Discriminant Analysis for Robust Object Tracking
Tracking-by-detection has become an attractive tracking technique that treats tracking as a category detection problem. However, the task in tracking is to search for a specific object, rather than an object category as in detection. In this paper, we propose a novel tracking framework based on an exemplar detector rather than a category detector. The proposed tracker is an ensemble of exemplar-based linear discriminant analysis (ELDA) detectors. Each detector is highly specific and discriminative, because it is trained on a single object instance and a massive set of negatives. To improve adaptability, we update both the object and background models. Experimental results on several challenging video sequences demonstrate the effectiveness and robustness of our tracking algorithm.
Comment: ICIP201
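The core of an exemplar-LDA detector can be sketched as follows. In the formulation ELDA builds on (Hariharan et al.'s exemplar-LDA), the linear weights are the negative-covariance-whitened difference between the single positive exemplar and the negative mean; the toy features and variable names below are illustrative, not taken from the paper.

```python
import numpy as np

def train_elda(exemplar, negatives, reg=1e-3):
    """Train one exemplar-LDA detector: linear weights are the whitened
    difference between the single positive exemplar and the negative mean."""
    mu_neg = negatives.mean(axis=0)
    # regularised covariance of the negatives (the only class with many samples)
    cov = np.cov(negatives, rowvar=False) + reg * np.eye(negatives.shape[1])
    w = np.linalg.solve(cov, exemplar - mu_neg)
    b = -0.5 * w @ (exemplar + mu_neg)   # bias placing the boundary midway
    return w, b

# toy features: one exemplar, many background negatives (illustrative data)
rng = np.random.default_rng(0)
negatives = rng.normal(0.0, 1.0, size=(5000, 8))
exemplar = np.full(8, 3.0)
w, b = train_elda(exemplar, negatives)
score_pos = w @ exemplar + b
mean_neg = (negatives @ w + b).mean()
```

An ensemble tracker would train one such detector per object instance and score candidate windows with all of them; because each detector sees only one positive, the shared negative statistics do the heavy lifting.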
Multiple video object tracking using variational inference
In this article, a Bayesian filter approximation is proposed for simultaneous multiple-target detection and tracking, and then applied to object detection in video from a moving camera. The inference optimises the evidence lower bound for Gaussian mixtures. The proposed filter is capable of real-time data processing and may be used as a basis for data fusion. The method was tested on video with a dynamic background, where velocity relative to the background is used to discriminate the objects. The framework does not depend on the feature space, meaning that different feature spaces can be used freely while preserving the structure of the filter.
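The evidence-lower-bound optimisation for Gaussian mixtures can be illustrated with scikit-learn's variational mixture estimator; this is a generic variational GMM fit, not the authors' filter, and the synthetic two-target data are purely illustrative.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# two synthetic "targets" as 2-D point clusters in one frame
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal([0, 0], 0.3, (200, 2)),
                 rng.normal([5, 5], 0.3, (200, 2))])

# variational inference over a Gaussian mixture: the fit maximises the
# evidence lower bound, and the Dirichlet-process prior lets unused
# components shrink, so n_components is only an upper bound
gmm = BayesianGaussianMixture(n_components=5,
                              weight_concentration_prior=0.01,
                              random_state=0).fit(pts)
active = int((gmm.weights_ > 0.05).sum())
labels = gmm.predict(pts)
```

In a tracking setting, each surviving mixture component plays the role of one target hypothesis, updated frame to frame.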
YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection
Accurate polyp detection is essential for assisting clinical rectal cancer
diagnoses. Colonoscopy videos contain richer information than still images,
making them a valuable resource for deep learning methods. Great efforts have
been made to conduct video polyp detection through multi-frame temporal/spatial
aggregation. However, unlike common fixed-camera video, the camera-moving scene
in colonoscopy videos can cause rapid video jitters, leading to unstable
training for existing video detection models. Additionally, the concealed
nature of some polyps and the complex background environment further hinder the
performance of existing video detectors. In this paper, we propose the
\textbf{YONA} (\textbf{Y}ou \textbf{O}nly \textbf{N}eed one \textbf{A}djacent
Reference-frame) method, an efficient end-to-end training framework for video
polyp detection. YONA fully exploits the information of one previous adjacent
frame and conducts polyp detection on the current frame without multi-frame
collaborations. Specifically, for the foreground, YONA adaptively aligns the current frame's channel activation patterns with those of its adjacent reference frame according to their foreground similarity. For the background, YONA conducts background dynamic alignment guided by the inter-frame difference to eliminate the invalid features produced by drastic spatial jitters. Moreover, YONA applies cross-frame contrastive learning during training, leveraging the ground-truth bounding boxes to improve the model's perception of polyps and the background. Quantitative and qualitative experiments on three challenging public benchmarks demonstrate that our proposed YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
Comment: 11 pages, 3 figures, Accepted by MICCAI202
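One way to picture channel-level alignment against a single adjacent frame is a similarity-gated re-weighting; this is a loose sketch of the general idea under our own assumptions, not the paper's implementation, and the gating function and feature shapes are invented for illustration.

```python
import numpy as np

def align_channels(curr, prev, eps=1e-8):
    """Loose sketch of foreground alignment: re-weight each channel of the
    current frame's feature map (C, H, W) by its cosine similarity to the
    same channel in the single adjacent reference frame."""
    c = curr.reshape(curr.shape[0], -1)
    p = prev.reshape(prev.shape[0], -1)
    sim = (c * p).sum(1) / (np.linalg.norm(c, axis=1)
                            * np.linalg.norm(p, axis=1) + eps)
    gate = 1.0 / (1.0 + np.exp(-sim))       # squash similarity to (0, 1)
    return curr * gate[:, None, None]       # broadcast gate over H, W

rng = np.random.default_rng(2)
feat_t = rng.normal(size=(16, 8, 8)).astype(np.float32)
aligned = align_channels(feat_t, feat_t)    # identical frames: similarity = 1
```

The intuition is that channels whose activations agree with the reference frame are trusted, while channels disrupted by jitter are attenuated.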
Pedestrian detection for mobile bus surveillance
In this paper, we present a system for pedestrian detection in scenes captured by mobile bus surveillance cameras in busy city streets. Our approach integrates scene localization, foreground-background separation, and pedestrian detection modules into a unified detection framework. The scene localization module performs a two-stage clustering of the video data: in the first stage, SIFT homography is applied to cluster frames by their structural similarities, and the second stage further clusters these aligned frames by lighting. This produces clusters of images that are consistent in viewpoint and lighting. A kernel density estimation (KDE) method for colour and gradient foreground-background separation is then used to construct a background model for each image cluster, which is subsequently used to detect all foreground pixels. Finally, pedestrians are identified using a hierarchical template matching approach. We have tested our system on a set of real bus video datasets, and the experimental results verify that our system works well in practice.
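The per-pixel KDE background model at the heart of such pipelines can be sketched in a few lines. This is the classic Elgammal-style greyscale formulation for brevity; the paper's system also uses colour and gradient cues, and the bandwidth and threshold below are illustrative.

```python
import numpy as np

def kde_foreground(history, frame, bandwidth=10.0, thresh=1e-4):
    """Per-pixel KDE background model: a pixel is foreground when its
    current intensity has low density under a Gaussian kernel estimate
    built from the last N frames."""
    diff = frame[None] - history                  # (N, H, W) residuals
    k = np.exp(-0.5 * (diff / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    density = k.mean(axis=0)                      # average kernel response
    return density < thresh                       # low density -> foreground

rng = np.random.default_rng(3)
history = rng.normal(100, 2, size=(30, 16, 16))   # static background frames
frame = history.mean(axis=0).copy()
frame[4:8, 4:8] = 200.0                           # a bright "pedestrian" patch
mask = kde_foreground(history, frame)
```

Because each cluster of viewpoint- and lighting-consistent frames gets its own history, the density estimate stays sharp despite the moving camera.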
Semantic Background Subtraction
We introduce the notion of semantic background subtraction, a novel framework for motion detection in video sequences. The key innovation is to leverage object-level semantics to address the variety of challenging scenarios for background subtraction. Our framework combines the information of a semantic segmentation algorithm, expressed as a probability for each pixel, with the output of any background subtraction algorithm to reduce the false positive detections produced by illumination changes, dynamic backgrounds, strong shadows, and ghosts. In addition, it maintains a fully semantic background model to improve the detection of camouflaged foreground objects. Experiments conducted on the CDNet dataset show that we significantly improve almost all background subtraction algorithms on the CDNet leaderboard, reducing the mean overall error rate of all 34 algorithms (resp. of the best 5 algorithms) by roughly 50% (resp. 20%). A C++ implementation of the framework is available at http://www.telecom.ulg.ac.be/semantic
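The combination rule can be sketched as two pixel-wise overrides on top of any BGS output; the structure below follows the two-rule scheme the abstract describes, but the threshold values and toy inputs are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def semantic_bgs(bgs_mask, p_sem, p_sem_bg, tau_bg=0.1, tau_fg=0.3):
    """Combine any BGS output with pixel-wise semantics (sketch).
    Rule 1: near-zero semantic foreground probability -> background,
    cancelling false positives from shadows, ghosts, illumination.
    Rule 2: semantic probability far above the semantic background
    model -> foreground, recovering camouflaged objects.
    Otherwise keep the BGS decision unchanged."""
    out = bgs_mask.copy()
    out[p_sem <= tau_bg] = False               # rule 1: veto
    out[(p_sem - p_sem_bg) >= tau_fg] = True   # rule 2: rescue
    return out

bgs  = np.array([True,  True,  False, False])  # raw BGS decisions
sem  = np.array([0.05,  0.80,  0.90,  0.05])   # semantic fg probability
semb = np.array([0.05,  0.70,  0.10,  0.05])   # semantic background model
mask = semantic_bgs(bgs, sem, semb)
```

Pixel 0 (a shadow false positive) is vetoed by rule 1, while pixel 2 (a camouflaged object the BGS missed) is rescued by rule 2.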
Background Subtraction for Night Videos
Motion analysis is important in video surveillance systems, and background subtraction is useful for moving object detection in such systems. However, most existing background subtraction methods do not work well for surveillance systems in the evening, because objects are usually dark and reflected light is usually strong. To resolve these issues, we propose a framework that utilizes a Weber contrast descriptor, a texture feature extractor, and a light detection unit to extract the features of foreground objects. For the texture feature extractor, we propose a local pattern enhancement method. For the light detection unit, our method exploits the finding that lighted areas in the evening usually have low saturation in the hue-saturation-value and hue-saturation-lightness color spaces. Finally, we update the background model and the foreground objects within the framework. This approach improves foreground object detection in night videos without requiring a large dataset for pre-training.
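Two of the ingredients above have standard closed forms worth spelling out: Weber contrast measures relative rather than absolute intensity change (which is what makes it usable in dark scenes), and the light detector reduces to a saturation/value test. The thresholds below are illustrative, not the paper's values.

```python
import numpy as np

def weber_contrast(frame, background, eps=1.0):
    """Weber contrast of each pixel against the background model:
    W = (I - I_bg) / I_bg, i.e. relative intensity change."""
    return (frame - background) / (background + eps)

def light_mask(saturation, value, s_max=0.2, v_min=0.8):
    """Lighted areas at night tend to be bright but weakly saturated in
    HSV, so flag pixels with high value and low saturation."""
    return (saturation < s_max) & (value > v_min)

bg = np.full((4, 4), 40.0)                 # dark background model
frame = bg.copy()
frame[1, 1] = 120.0                        # headlight-lit pixel
w = weber_contrast(frame, bg)
lit = light_mask(np.array([0.1, 0.9]), np.array([0.9, 0.9]))
```

A dark object against a dark background still yields a usable Weber response, while the light mask lets the framework treat reflection areas separately from genuine foreground.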
Video content analysis for intelligent forensics
The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely; 1. Moving object detection and recognition, 2. Correction of colours in the video frames and recognition of colours of moving objects, 3. Make and model recognition of vehicles and identification of their type, 4. Detection and recognition of text information in outdoor scenes.
To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex backgrounds. The object detection part of the framework relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles, and background. The proposed feature descriptor captures the texture information present in the silhouettes of foreground objects.
To address the second issue, a framework for the correction and recognition of true colours of objects in videos is presented with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames. The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfection and artefacts due to high compression.
In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As part of this work, a novel feature representation technique for the distinctive representation of vehicle images was developed. The technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image, and the framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain.
The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image for the identification of text regions. Apart from detection, the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon based alignment procedure is adopted to finalize the recognition of strings present in word images.
Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. Performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique across various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets reveal the potential of the proposed scheme for accurate detection and recognition of text in the wild.
Moving Object Detection by Detecting Contiguous Outliers in the Low-Rank Representation
Object detection is a fundamental step for automated video analysis in many
vision applications. Object detection in a video is usually performed by object
detectors or background subtraction techniques. Often, an object detector
requires manually labeled examples to train a binary classifier, while
background subtraction needs a training sequence that contains no objects to
build a background model. To automate the analysis, object detection without a
separate training phase becomes a critical task. Prior work has tackled this task using motion information, but existing motion-based methods are usually limited when coping with complex scenarios such as nonrigid motion and dynamic backgrounds. In this paper, we show that the above challenges can be addressed in a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR). This formulation integrates object detection and background learning into a single optimization process, which can be solved efficiently by an alternating algorithm. We explain the relations between DECOLOR and other sparsity-based methods. Experiments on both simulated data and real sequences demonstrate that DECOLOR outperforms state-of-the-art approaches and works effectively on a wide range of complex scenarios.
Comment: 30 page
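The alternation between background learning and outlier detection can be sketched in a heavily simplified form: fit a low-rank background to the non-outlier entries, then mark entries with large residual as foreground, and repeat. This is a sketch in the spirit of DECOLOR, not the method itself: the real formulation penalizes rank and adds an MRF contiguity prior on the mask (optimised with graph cuts), both omitted here, and the toy "video" matrix is invented for illustration.

```python
import numpy as np

def decolor_sketch(D, rank=1, lam=10.0, iters=10):
    """Alternate between (1) a rank-constrained background estimate on the
    entries not currently marked as outliers and (2) re-marking entries
    with large residual as foreground outliers."""
    S = np.zeros_like(D, dtype=bool)           # foreground (outlier) support
    B = D.copy()
    for _ in range(iters):
        X = np.where(S, B, D)                  # fill outliers with current B
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        B = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # low-rank background
        S = np.abs(D - B) > lam                # large residual -> outlier
    return B, S

# rank-1 "video": identical columns (frames) plus one transient bright blob
rng = np.random.default_rng(4)
frames = np.outer(rng.uniform(50, 60, 64), np.ones(20))  # 64 pixels x 20 frames
frames[10, 5] += 30.0                                    # moving object
B, S = decolor_sketch(frames)
```

Even this stripped-down alternation separates the static background from the transient blob; the contiguity prior in the full method additionally forces the detected outliers to form spatially coherent regions.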