Ear detection with convolutional neural networks
Object detection is still considered a difficult task in the field of computer vision. In particular, earlobe detection has become a popular application as interest in human identification using earlobe biometry has increased. So far, the earlobe detection problem has been solved using a combination of skin detection, edge detection, segmentation by fusion of histogram-based k-means, and template-matching algorithms. In this work we present a method of earlobe detection without template matching, using a convolutional neural network that performs image segmentation. With this method, which is invariant to the angle at which the photo was taken, earlobe shape, skin color, illumination, occlusions, and earlobe accessories, we were able to accurately detect the area of the image where an earlobe is present. Moreover, detection time was significantly improved compared to other methods for the same task. We expect our method to be used in the Annotated Web Ears Toolbox
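A common post-processing step for segmentation-based detectors like the one described is converting the network's per-pixel output into a detection region. The sketch below illustrates that step only; the function name, threshold value, and toy mask are illustrative assumptions, not details from the abstract.

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray, threshold: float = 0.5):
    """Convert a per-pixel segmentation probability map into a bounding box.

    `mask` is an (H, W) array of ear-pixel probabilities, e.g. the output
    of a segmentation CNN after a sigmoid. Returns (x_min, y_min, x_max,
    y_max) around all pixels above `threshold`, or None when no pixel is
    classified as ear.
    """
    ys, xs = np.nonzero(mask > threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy example: a 6x6 probability map with a high-confidence blob.
mask = np.zeros((6, 6))
mask[2:4, 1:5] = 0.9
print(mask_to_bbox(mask))  # (1, 2, 4, 3)
```

Because the box is derived from the mask rather than from a fixed template, this step inherits the segmentation network's invariance to pose, skin color, and occlusion.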
STV-based Video Feature Processing for Action Recognition
In comparison to still-image-based processing, video features can provide rich and intuitive information about dynamic events occurring over a period of time, such as human actions, crowd behaviours, and other subject pattern changes. Although substantial progress has been made in the last decade on image processing, with successful applications in face matching and object recognition, video-based event detection still remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method is proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stem from a coefficient factor-boosted 3D region intersection and matching mechanism developed in this research. This paper also reports an investigation into techniques for efficient STV data filtering that reduce the number of voxels (volumetric pixels) to be processed in each operational cycle of the implemented system. The encouraging features and improvements in operational performance registered in the experiments are discussed at the end
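The core ideas of region-intersection matching and voxel filtering can be illustrated on boolean (x, y, t) voxel grids. This is a simplified sketch under stated assumptions: the abstract does not give the form of the coefficient factor (here a plain `boost` multiplier) or the filtering rule (here a two-consecutive-frames test), so both are placeholders.

```python
import numpy as np

def stv_match_score(a: np.ndarray, b: np.ndarray, boost: float = 1.0) -> float:
    """Region-intersection similarity of two Spatio-Temporal Volumes.

    `a` and `b` are boolean (x, y, t) grids marking voxels occupied by
    the subject's silhouette. The score is a boosted intersection-over-
    union of the occupied voxels; `boost` stands in for the paper's
    coefficient factor, whose exact form the abstract does not give.
    """
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return boost * inter / union if union else 0.0

def filter_voxels(stv: np.ndarray) -> np.ndarray:
    """Illustrative STV filter: keep only voxels that are occupied in two
    consecutive frames, discarding isolated flickers to reduce the voxel
    count processed per operational cycle."""
    keep = np.zeros_like(stv, dtype=bool)
    keep[..., :-1] = stv[..., :-1] & stv[..., 1:]
    keep[..., 1:] |= stv[..., 1:] & stv[..., :-1]
    return keep
```

Matching two identical volumes scores 1.0, and any extra or missing voxels lower the score proportionally, which is what makes the measure usable for ranking candidate action templates.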
Visual Object Recognition and Tracking of Tools
A method has been created to automatically build an algorithm off-line, using computer-aided design (CAD) models, and to apply this at runtime. The object type is discriminated, and the position and orientation are identified. This system can work with a single image and can provide improved performance using multiple images provided from video. The spatial processing unit uses three stages: (1) segmentation; (2) initial type, pose, and geometry (ITPG) estimation; and (3) refined type, pose, and geometry (RTPG) calculation. The image segmentation module finds all the tools in an image and isolates them from the background. For this, the system uses edge detection and thresholding to find the pixels that are part of a tool. After the pixels are identified, nearby pixels are grouped into blobs. These blobs represent the potential tools in the image and are the product of the segmentation algorithm. The second module uses matched filtering (or template matching). This approach is used for condensing synthetic images using an image subspace that captures key information. Three degrees of orientation, three degrees of position, and any number of degrees of freedom in geometry change are included. To do this, a template-matching framework is applied. This framework uses an off-line system for calculating template images, measurement images, and the measurements of the template images. These results are used online to match segmented tools against the templates. The final module is the RTPG processor. Its role is to find the exact states of the tools given initial conditions provided by the ITPG module. The requirement that the initial conditions exist allows this module to use a local search (whereas the ITPG module had global scope). To perform the local search, 3D model matching is used, where a synthetic image of the object is created and compared to the sensed data. The availability of low-cost PC graphics hardware allows rapid creation of synthetic images.
In this approach, a function of orientation, distance, and articulation is defined as a metric on the difference between the captured image and a synthetic image with an object in the given orientation, distance, and articulation. The synthetic image is created using a model that is looked up in an object-model database. A composable software architecture is used for implementation. Video is first preprocessed to remove sensor anomalies (like dead pixels), and then is processed sequentially by a prioritized list of tracker-identifiers
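The refinement step described above amounts to minimizing a metric over pose parameters, starting from the ITPG estimate. The sketch below is a minimal greedy version of such a local search; the renderer interface, mean-squared-error metric, step size, and stopping rule are all illustrative assumptions, since the text does not specify the actual search strategy.

```python
import numpy as np

def pose_error(captured: np.ndarray, render, pose) -> float:
    """Metric between the sensed image and a synthetic image rendered at
    `pose` (orientation, distance, articulation). `render` is a stand-in
    for the CAD-model renderer described in the text."""
    synthetic = np.asarray(render(pose), dtype=float)
    return float(np.mean((captured.astype(float) - synthetic) ** 2))

def local_search(captured, render, pose0, step=0.1, iters=20):
    """Greedy local refinement: from the ITPG estimate `pose0`, perturb
    each pose parameter by +/- `step` and keep any change that lowers
    the metric, stopping when no perturbation improves it."""
    pose = np.asarray(pose0, dtype=float)
    best = pose_error(captured, render, pose)
    for _ in range(iters):
        improved = False
        for i in range(pose.size):
            for delta in (-step, step):
                cand = pose.copy()
                cand[i] += delta
                err = pose_error(captured, render, cand)
                if err < best:
                    pose, best, improved = cand, err, True
        if not improved:
            break
    return pose, best
```

Because the search is purely local, it depends on the ITPG module placing the initial pose within the metric's basin of attraction, which is exactly why the pipeline runs the global ITPG stage first.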
Planogram Compliance Checking Based on Detection of Recurring Patterns
In this paper, a novel method for automatic planogram compliance checking in retail chains is proposed that does not require product template images for training. The product layout is extracted from an input image by means of unsupervised recurring-pattern detection and matched, via graph matching, against the expected product layout specified by a planogram to measure the level of compliance. A divide-and-conquer strategy is employed to improve speed: the input image is divided into several regions based on the planogram, recurring patterns are detected in each region separately, and the results are merged to estimate the product layout. Experimental results on real data have verified the efficacy of the proposed method. Compared with a template-based method, higher accuracies are achieved by the proposed method over a wide range of products.

Comment: Accepted by MM (IEEE Multimedia Magazine) 201
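Once the product layout has been estimated, compliance reduces to comparing the detected layout against the planogram. The sketch below shows a deliberately simplified position-wise version of that comparison; the paper itself uses graph matching, and the grid representation, `None` marker, and example labels here are assumptions for illustration.

```python
def compliance_score(expected, detected):
    """Fraction of planogram positions whose detected product matches.

    `expected` and `detected` are row-major grids (lists of lists) of
    product labels; None in `detected` marks a position where no
    recurring pattern was found. A simplified stand-in for the paper's
    graph-matching step.
    """
    total = matched = 0
    for exp_row, det_row in zip(expected, detected):
        for exp, det in zip(exp_row, det_row):
            total += 1
            matched += (det is not None and det == exp)
    return matched / total if total else 0.0

planogram = [["cola", "cola", "chips"],
             ["soap", "soap", "soap"]]
observed  = [["cola", "cola", None],
             ["soap", "chips", "soap"]]
print(compliance_score(planogram, observed))  # 4/6, about 0.667
```

The divide-and-conquer strategy from the abstract would run the recurring-pattern detector once per planogram region and fill `detected` region by region before scoring.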
Real-Time RGB-D based Template Matching Pedestrian Detection
Pedestrian detection is one of the most popular topics in computer vision and robotics. Considering the challenging issues in multiple-pedestrian detection, we present a real-time depth-based template-matching people detector. In this paper, we propose different approaches for training the depth-based template. We train multiple templates to handle the various upper-body orientations of pedestrians and the different levels of detail in the depth maps of pedestrians at various distances from the camera. We also take into account the degree of reliability of different regions of the sliding window by proposing a weighted-template approach. Furthermore, we combine the depth detector with an appearance-based detector as a verifier, taking advantage of appearance cues to deal with the limitations of depth data. We evaluate our method on the challenging ETH dataset sequence and show that our method outperforms state-of-the-art approaches.

Comment: published in ICRA 201
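The weighted-template idea can be sketched as a sliding-window search over a depth map where each template pixel contributes according to a reliability weight. This is a minimal illustration under stated assumptions: the squared-difference score, uniform example weights, and exhaustive stride-1 search are placeholders, not the paper's actual training or weighting scheme.

```python
import numpy as np

def weighted_template_score(window: np.ndarray,
                            template: np.ndarray,
                            weights: np.ndarray) -> float:
    """Weighted dissimilarity between a depth patch and a depth template.
    Higher `weights` would mark reliable regions (e.g. head-shoulder
    contour) and lower ones the noisy border; the paper's exact scheme
    is not reproduced here."""
    diff = (window.astype(float) - template.astype(float)) ** 2
    return float(np.sum(weights * diff) / np.sum(weights))

def detect(depth: np.ndarray, template: np.ndarray,
           weights: np.ndarray, stride: int = 1):
    """Exhaustive sliding-window search: returns the top-left corner of
    the window with the lowest weighted score, plus that score."""
    th, tw = template.shape
    best, best_pos = np.inf, None
    for y in range(0, depth.shape[0] - th + 1, stride):
        for x in range(0, depth.shape[1] - tw + 1, stride):
            s = weighted_template_score(depth[y:y + th, x:x + tw],
                                        template, weights)
            if s < best:
                best, best_pos = s, (y, x)
    return best_pos, best
```

In the full pipeline described above, candidates surviving this depth-based stage would then be passed to the appearance-based verifier.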