Automatic image segmentation by dynamic region growth and multiresolution merging
Image segmentation is a fundamental task in many computer vision applications. We present a novel unsupervised color image segmentation algorithm, named GSEG, that exploits the information obtained from detecting edges in color images. Using a color gradient detection technique, pixels without edges are clustered and labeled individually to identify the image content. Elements with higher gradient density are incorporated through dynamic generation of clusters as the segmentation progresses. By quantizing the colors in the image and extracting texture information from the neighborhood entropy of each pixel, the proposed method obtains accurate texture models that are highly effective for merging regions that belong to the same object. Experimental results on various image scenarios, in comparison with state-of-the-art segmentation techniques, demonstrate the performance advantages of the proposed method.
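A minimal sketch of the growth-and-merge idea described above, assuming OpenCV's watershed as the growth step and a plain mean-color test for merging; all thresholds are illustrative, and the paper's texture models are omitted. This is not the authors' GSEG implementation.

```python
# Gradient-seeded region growth with color-based merging (illustrative).
import numpy as np
import cv2

def gseg_like(img_bgr, seed_grad_max=15.0, merge_dist=12.0):
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)

    # Color gradient: largest per-channel Sobel magnitude at each pixel.
    gx = cv2.Sobel(lab, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(lab, cv2.CV_32F, 0, 1, ksize=3)
    grad = np.sqrt(gx ** 2 + gy ** 2).max(axis=2)

    # Pixels without edges (low gradient) form the initial clusters.
    seeds = (grad < seed_grad_max).astype(np.uint8)
    _, markers = cv2.connectedComponents(seeds)  # 0 = unlabeled pixels

    # Grow regions into the high-gradient pixels; watershed stands in
    # for the paper's dynamic cluster generation.
    markers = cv2.watershed(img_bgr, markers.astype(np.int32))

    # Merge regions whose mean Lab colors are close (union-find; the
    # paper's texture and adjacency cues are omitted in this sketch).
    ids = [int(i) for i in np.unique(markers) if i > 0]
    means = {i: lab[markers == i].mean(axis=0) for i in ids}
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a in ids:
        for b in ids:
            if b > a and np.linalg.norm(means[a] - means[b]) < merge_dist:
                ra, rb = find(a), find(b)
                if ra != rb:
                    parent[rb] = ra

    out = markers.copy()
    for i in ids:
        out[markers == i] = find(i)
    return out
```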
STV-based Video Feature Processing for Action Recognition
Compared with still-image-based processing, video features can provide rich and intuitive information about dynamic events occurring over a period of time, such as human actions, crowd behaviours, and other subject pattern changes. Although substantial progress has been made in image processing over the last decade, with successful applications in face matching and object recognition, video-based event detection remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method is proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and performance gains of the devised approach stem from a coefficient-factor-boosted 3D region intersection and matching mechanism developed in this research. The paper also reports on techniques for efficient STV data filtering that reduce the number of voxels (volumetric pixels) to be processed in each operational cycle of the implemented system. The encouraging features and operational performance improvements registered in the experiments are discussed at the end.
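A minimal sketch of the STV idea: stack per-frame silhouettes into a binary x-y-t volume and score a pair of volumes by a weighted 3D region intersection. The frame-difference silhouettes and the uniform per-slice weights are illustrative stand-ins for the paper's filtering and coefficient-boosting mechanism.

```python
# STV construction and region-intersection matching (illustrative).
import numpy as np

def build_stv(frames, thresh=25):
    """Stack per-frame foreground silhouettes into a binary x-y-t volume.

    frames: list of grayscale uint8 arrays of equal shape.
    """
    base = frames[0].astype(np.int16)
    vox = [np.abs(f.astype(np.int16) - base) > thresh for f in frames]
    return np.stack(vox, axis=0)  # shape (T, H, W), dtype bool

def region_intersection_score(stv_a, stv_b, weights=None):
    """Weighted per-slice 3D intersection-over-union of two aligned STVs."""
    t = min(len(stv_a), len(stv_b))
    a, b = stv_a[:t], stv_b[:t]
    if weights is None:
        weights = np.ones(t)  # stand-in for the learned coefficients
    inter = np.array([(x & y).sum() for x, y in zip(a, b)], dtype=float)
    union = np.array([(x | y).sum() for x, y in zip(a, b)], dtype=float)
    union[union == 0] = 1.0  # avoid division by zero on empty slices
    return float((weights * inter / union).mean())
```

A query volume would then be labeled with the action template that yields the highest score.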
Segmenting Foreground Objects from a Dynamic Textured Background via a Robust Kalman Filter
The algorithm presented in this paper segments foreground objects (e.g., people) in video given time-varying, textured backgrounds. Examples of time-varying backgrounds include waves on water, moving clouds, trees waving in the wind, automobile traffic, moving crowds, and escalators. We have developed a novel foreground-background segmentation algorithm that explicitly accounts for the non-stationary nature and clutter-like appearance of many dynamic textures. The dynamic texture is modeled by an autoregressive moving average (ARMA) model. A robust Kalman filter algorithm iteratively estimates the intrinsic appearance of the dynamic texture as well as the regions of the foreground objects. Preliminary experiments with this method have demonstrated promising results.
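A minimal sketch of the state-space view behind this approach: fit the ARMA dynamic-texture model with the standard subspace method (PCA of vectorized frames plus a least-squares state transition), predict each new frame, and flag pixels with large prediction residuals as foreground. The per-pixel MAD outlier test below is my own simplified stand-in for the paper's robust Kalman filter.

```python
# ARMA dynamic-texture background model with a robust residual test
# (illustrative stand-in for the paper's robust Kalman filter).
import numpy as np

def fit_arma(frames, n_states=10):
    """frames: (T, H*W) float array of vectorized background frames."""
    Y = frames.T                                   # (H*W, T) observations
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                            # observation matrix
    X = s[:n_states, None] * Vt[:n_states]         # states over time (n, T)
    # Least-squares state transition: X[:, 1:] ~= A @ X[:, :-1]
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C, X[:, -1]

def segment_frame(y, A, C, x_prev, k=3.0):
    """Flag pixels whose prediction residual exceeds k robust std-devs.

    y: vectorized new frame, shape (H*W,).
    """
    x_pred = A @ x_prev
    resid = y - C @ x_pred
    mad = np.median(np.abs(resid - np.median(resid))) + 1e-9
    fg = np.abs(resid) > k * 1.4826 * mad          # robust outlier test
    # "Robust" update: refit the state using background pixels only.
    bg = ~fg
    x_new, *_ = np.linalg.lstsq(C[bg], y[bg], rcond=None)
    return fg, x_new
```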
Dynamic low-level context for the detection of mild traumatic brain injury.
Mild traumatic brain injury (mTBI) appears as low-contrast lesions in magnetic resonance (MR) imaging. Standard automated detection approaches cannot detect the subtle changes caused by these lesions. The use of context has become integral to the detection of low-contrast objects in images; context is any information useful for object detection that is not directly due to the physical appearance of the object in the image. In this paper, new low-level static and dynamic context features are proposed and integrated into a discriminative voxel-level classifier to improve the detection of mTBI lesions. Visual features, including multiple texture measures, give an initial estimate of a lesion. From this initial estimate, novel proximity and directional-distance context features are calculated and fed to a second classifier, taking advantage of the spatial information supplied by the initial, visual-feature-only estimate. Dynamic context is captured by the proposed posterior marginal edge distance feature, which measures the distance from a hard estimate of the lesion at a previous time point. The approach is validated on a temporal mTBI rat-model dataset and shown to have improved Dice score and convergence compared with other state-of-the-art approaches. An analysis of feature importance and of the approach's versatility on other datasets is also provided.
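A minimal sketch of distance-style context features, assuming 2D slices and a 0.5 probability threshold for the hard estimate (both illustrative): the static proximity feature is each voxel's distance to the initial lesion estimate, and the dynamic feature is the same distance computed against the previous time point's estimate.

```python
# Distance-based static and dynamic context features (illustrative).
import numpy as np
from scipy.ndimage import distance_transform_edt

def proximity_feature(prob_map, thresh=0.5):
    """Distance of every voxel to the hard initial lesion estimate."""
    hard = prob_map >= thresh
    if not hard.any():
        return np.full(prob_map.shape, np.inf)  # no lesion detected yet
    return distance_transform_edt(~hard)        # 0 inside the estimate

def posterior_marginal_edge_distance(prob_map_prev, thresh=0.5):
    """Dynamic context: distance to the hard estimate at the previous scan."""
    return proximity_feature(prob_map_prev, thresh)

def context_feature_stack(visual_feats, prob_map, prob_map_prev):
    """Input for the second-stage classifier: visual + static + dynamic."""
    static = proximity_feature(prob_map)[..., None]
    dynamic = posterior_marginal_edge_distance(prob_map_prev)[..., None]
    return np.concatenate([visual_feats, static, dynamic], axis=-1)
```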
Fast Shadow Detection from a Single Image Using a Patched Convolutional Neural Network
In recent years, various single-image shadow detection methods have been proposed and used in vision systems; however, most are not appropriate for robotic applications because of their expensive time complexity. This paper introduces a fast shadow detection method based on a deep learning framework, with a time cost suited to robotic applications. In our solution, we first obtain a shadow prior map with the help of a multi-class support vector machine using statistical features. We then use a semantic-aware, patch-level convolutional neural network that trains efficiently on shadow examples by combining the original image with the shadow prior map. Experiments on benchmark datasets demonstrate that the proposed method decreases the time complexity of shadow detection by one to two orders of magnitude compared with state-of-the-art methods, without losing accuracy.
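A minimal sketch of the two-stage idea described above: a classic classifier produces a shadow prior map from simple per-pixel statistics, and a small patch-level CNN then consumes the RGB patch stacked with the prior-map patch. The features, patch size, and network layout are illustrative assumptions, not the architecture from the paper.

```python
# Two-stage shadow detection: prior map + patch-level CNN (illustrative).
import numpy as np
import torch
import torch.nn as nn

def pixel_features(img_rgb):
    """Toy per-pixel statistics: mean intensity plus normalized chroma."""
    f = img_rgb.astype(np.float32) / 255.0
    inten = f.mean(axis=2, keepdims=True)
    chroma = f / (f.sum(axis=2, keepdims=True) + 1e-6)
    return np.concatenate([inten, chroma], axis=2).reshape(-1, 4)

def shadow_prior(img_rgb, svm):
    """Stage 1: per-pixel shadow probability from a trained classifier.

    `svm` is assumed to be e.g. sklearn.svm.SVC(probability=True),
    already fitted on labeled shadow/non-shadow pixel features.
    """
    p = svm.predict_proba(pixel_features(img_rgb))[:, 1]
    return p.reshape(img_rgb.shape[:2]).astype(np.float32)

class PatchShadowNet(nn.Module):
    """Stage 2: tiny CNN over 4-channel (RGB + prior) 32x32 patches."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 8 * 8, 1),
        )

    def forward(self, x):  # x: (N, 4, 32, 32), RGB patch + prior patch
        return torch.sigmoid(self.net(x))
```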
Full Reference Objective Quality Assessment for Reconstructed Background Images
With increased interest in applications that require a clean background image, such as video surveillance, object tracking, street-view imaging, and location-based services on web-based maps, multiple algorithms have been developed to reconstruct a background image from cluttered scenes. Traditionally, statistical measures and existing image quality techniques have been applied to evaluate the quality of the reconstructed background images. Though these quality assessment methods have been widely used, their ability to gauge the perceived quality of a reconstructed background image has not been verified. In this work, we discuss the shortcomings of existing metrics and propose a full-reference Reconstructed Background image Quality Index (RBQI) that combines color and structural information at multiple scales, using a probability summation model to predict the perceived quality of a reconstructed background image given a reference image. To compare the performance of the proposed quality index with existing image quality assessment measures, we construct two datasets consisting of reconstructed background images and corresponding subjective scores. The quality assessment measures are evaluated by correlating their objective scores with the human subjective ratings. The correlation results show that the proposed RBQI outperforms all existing approaches. Additionally, the constructed datasets and corresponding subjective scores provide a benchmark for evaluating future metrics developed to assess the perceived quality of reconstructed background images.
Associated source code: https://github.com/ashrotre/RBQI; associated database: https://drive.google.com/drive/folders/1bg8YRPIBcxpKIF9BIPisULPBPcA5x-Bk?usp=sharing (email ashrotre@asu.edu for permissions).
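A minimal sketch in the spirit of RBQI's multi-scale color/structure comparison with probability-summation pooling (P = 1 - prod over scales of (1 - P_s)). The psychometric constants, the Laplacian structure term, and the equal color/structure weighting are illustrative assumptions; the authors' actual formulation is in the linked source code.

```python
# Multi-scale color/structure index with probability summation
# (illustrative, not the published RBQI formulation).
import numpy as np
import cv2

def _p_detect(diff, alpha=10.0, beta=2.0):
    """Map a difference magnitude to a probability of detecting it."""
    return 1.0 - np.exp(-(diff / alpha) ** beta)

def _structure(img_f32):
    gray = cv2.cvtColor(img_f32.astype(np.uint8), cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_32F)

def rbqi_like(ref_bgr, rec_bgr, n_scales=3):
    ref = ref_bgr.astype(np.float32)
    rec = rec_bgr.astype(np.float32)
    p_none = 1.0  # probability that no distortion is detected at any scale
    for _ in range(n_scales):
        color_d = np.abs(ref - rec).mean()
        struct_d = np.abs(_structure(ref) - _structure(rec)).mean()
        p_scale = 0.5 * _p_detect(color_d) + 0.5 * _p_detect(struct_d)
        p_none *= 1.0 - p_scale
        ref, rec = cv2.pyrDown(ref), cv2.pyrDown(rec)
    return p_none  # higher = better perceived background quality
```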
A dynamic texture based approach to recognition of facial actions and their temporal models
In this work, we propose a dynamic-texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of the temporal segments neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and appearance of the face region in an input video are compared: an extended version of Motion History Images (MHIs) and a novel method based on non-rigid registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domains. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested on recognition of all 27 lower- and upper-face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2 percent with the MHI method and 94.3 percent with the FFD method. The generalization performance of the FFD method was tested on the Cohn-Kanade database. Finally, we also explored performance on spontaneous expressions in the Sensitive Artificial Listener dataset.
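A minimal sketch of one of the two motion encodings compared above, the Motion History Image, followed by a crude orientation-histogram descriptor over it. The decay duration and motion threshold are illustrative, and the GentleBoost/HMM stages of the paper are omitted.

```python
# Motion History Image and a simple orientation descriptor (illustrative).
import numpy as np

def motion_history(frames, tau=20, thresh=25):
    """frames: list of grayscale uint8 arrays; returns an MHI in [0, tau].

    Moving pixels are set to tau; stationary pixels decay by 1 per frame.
    """
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    prev = frames[0].astype(np.int16)
    for f in frames[1:]:
        cur = f.astype(np.int16)
        moving = np.abs(cur - prev) > thresh
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
        prev = cur
    return mhi

def orientation_histogram(mhi, bins=8):
    """Histogram of MHI gradient orientations over recently moving pixels."""
    gy, gx = np.gradient(mhi)
    ang = np.arctan2(gy, gx)[mhi > 0]
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)
```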