Hand and face segmentation using motion and colour cues in digital image sequences
© 2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
In this paper, we present a hand and face segmentation algorithm using motion and color cues. The algorithm is proposed for the content based representation of sign language image sequences, where the hands and face constitute a video object. Our hand and face segmentation algorithm consists of three stages, namely color segmentation, temporal segmentation, and video object plane generation. In color segmentation, we model the skin color as a normal distribution and classify each pixel as skin or non-skin based on its Mahalanobis distance. The aim of temporal segmentation is to localize moving objects in image sequences. A statistical variance test is employed to detect object motion between two consecutive images. Finally, the results from color and temporal segmentation are analyzed to yield a change detection mask. The performance of the algorithm is illustrated by simulation carried out on the silent test sequence.
Nariman Habili; Cheng-Chew Lim; Alireza Moin
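The colour segmentation stage described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-channel (Cb, Cr) pixel representation, the function names, and the squared-distance threshold of 9.0 are all assumptions chosen for illustration. It fits a two-dimensional normal distribution to sample skin pixels and labels a pixel as skin when its squared Mahalanobis distance to the mean is small.

```python
from typing import Sequence, Tuple

Pixel = Tuple[float, float]        # assumed (Cb, Cr) chrominance pair
Cov = Tuple[float, float, float]   # (var_x, cov_xy, var_y) of a symmetric 2x2 matrix


def fit_gaussian(samples: Sequence[Pixel]) -> Tuple[Pixel, Cov]:
    """Estimate the mean and unbiased covariance of 2-D skin samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mx = sum(p[0] for p in samples) / n
    my = sum(p[1] for p in samples) / n
    sxx = sum((p[0] - mx) ** 2 for p in samples) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in samples) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p: Pixel, mean: Pixel, cov: Cov) -> float:
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def is_skin(p: Pixel, mean: Pixel, cov: Cov, threshold_sq: float = 9.0) -> bool:
    """Label a pixel as skin if it lies within the (hypothetical) threshold."""
    return mahalanobis_sq(p, mean, cov) <= threshold_sq
```

In practice the threshold would be tuned on labelled data; the value used here is only a placeholder.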
Using encoder-decoder architecture for material segmentation based on beam profile analysis
Abstract. Recognition and segmentation of materials has proven to be a challenging problem because of the wide variation in appearance within and between categories. Many recent material segmentation approaches treat materials as yet another set of labels, like objects. However, materials differ fundamentally from objects in that they have no basic shape or defined spatial extent. Our approach largely sets this distinction aside and relies primarily on the limited implicit context (local appearance) seen during training, because our training images have almost no global image context: (I) the materials used have no inherent shape or defined spatial extent that distinguishes them (for example, apple, orange and potato have approximately the same spherical shape); and (II) the images were taken against a black background, which largely removes the spatial features of the materials.
We introduce a new materials segmentation dataset, which was taken with a Beam Profile Analysis sensing device. The dataset contains 10 material categories, and it has image pair samples consisting of grayscale images with and without the laser spots (grayscale and laser images) in addition to annotated segmented images.
To the best of our knowledge, this is the first material segmentation dataset for Beam Profile Analysis images. As a second step, we propose a deep learning approach to perform material segmentation on our dataset; our proposed CNN is an encoder-decoder model based on the DeeplabV3+ model. Our main goal is to obtain segmented material maps and to discover how the laser spots contribute to the segmentation results; we therefore perform a comparative analysis across different types of architectures. We built our experiments on three main types of models, each using a different type of input; for each model, we implemented various backbone architectures. Our experimental results show that the laser spots contribute effectively to the segmentation results. The GrayLaser model achieves a significant accuracy improvement over the other models: its fine-tuned architecture reached 94% on the MIoU metric, and the one trained from scratch reached 62% on MIoU
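The MIoU (mean intersection over union) metric reported in this abstract can be illustrated as follows. This is a generic sketch of the metric, not the authors' evaluation code; the function name and the convention of skipping classes that appear in neither map are assumptions.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Mean IoU over the classes present in the prediction or the target."""
    inter: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # The pixel belongs to both the predicted and true region.
                inter[p] += 1
                union[p] += 1
            else:
                # The pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Assumption: classes absent from both maps are left out of the mean.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

For a 2x2 prediction `[[0, 0], [1, 1]]` against the target `[[0, 1], [1, 1]]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.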
Temporally coherent 3D point cloud video segmentation in generic scenes
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Video segmentation is an important building block for high level applications, such as scene understanding and interaction analysis. While outstanding results are achieved in this field by state-of-the-art learning and model-based methods, they are restricted to certain types of scenes or require a large amount of annotated training data to achieve object segmentation in generic scenes. On the other hand, RGBD data, widely available since the introduction of consumer depth sensors, provide real-world 3D geometry rather than 2D images. The explicit geometry in RGBD data greatly helps in computer vision tasks, but the lack of annotations in this type of data may also hinder the extension of learning-based methods to RGBD. In this paper, we present a novel generic segmentation approach for 3D point cloud video (stream data) that thoroughly exploits the explicit geometry in RGBD. Our proposal is based only on low level features, such as connectivity and compactness. We exploit temporal coherence by representing the rough estimation of objects in a single frame with a hierarchical structure and propagating this hierarchy along time. The hierarchical structure provides an efficient way to establish temporal correspondences at different scales of object-connectivity and to temporally manage the splits and merges of objects. This allows updating the segmentation according to the evidence observed in the history. The proposed method is evaluated on several challenging data sets, with promising results.
Peer Reviewed
Postprint (author's final draft
Classification of Material Surfaces Using the Polarization of Specular Highlights
Recently there has been interest, in computer vision research, in the segmentation of images based upon the actual material makeup of the objects or object parts that constitute image regions. The idea is to identify image characteristics which can be used to predict the material properties of objects that are being imaged. A majority of object surfaces can be simply classified according to their basic electrical properties; metal objects (e.g. Aluminum, Copper) conduct electricity rather well while dielectric objects (e.g. Rubber, Plastic, Ceramic) conduct electricity poorly. Distinguishing image regions according to whether they correspond to metal or dielectric material can provide important information for scene understanding, especially in industrial machine vision. One such major application is circuit board inspection, where the presence of dielectric or metal material in the wrong place can cause trouble. A previous approach to the problem of identifying metal or dielectric material in images is based upon careful spectral (i.e. color) analysis of reflected light from material objects. This paper presents a technique for identifying the material properties of objects in an image using a polarizing lens (i.e. Polaroid filter). Two images of the same scene are taken with a polarizing lens placed in front of a camera in two different orientations. Effectively these two images represent two linearly independent polarization components of the reflected light. It is shown that when the linearly independent components of polarization are taken parallel and perpendicular to the plane in which specular rays travel, dielectric objects can be distinguished from metallic objects when specular highlights are present.
In particular, the two polarization components are very similar at specular highlights on metals, while the two polarization components for specular highlights on dielectrics are very different, the perpendicular component having much larger magnitude than the parallel component. This is shown to hold regardless of whether the surface is polished or rough. Results for coated surfaces will be presented at a future date.
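The decision rule stated in this abstract (similar components on metals, a much larger perpendicular component on dielectrics) can be sketched as a simple ratio test. The function name, the ratio threshold of 2.0, and the handling of a zero parallel component are assumptions made for illustration and do not come from the paper.

```python
def classify_highlight(i_perp: float, i_par: float,
                       ratio_threshold: float = 2.0) -> str:
    """Classify a specular highlight from its two polarization components.

    i_perp and i_par are the measured intensities perpendicular and parallel
    to the plane in which the specular rays travel. ratio_threshold is a
    hypothetical cut-off, not a value taken from the paper.
    """
    if i_perp < 0 or i_par < 0:
        raise ValueError("intensities must be non-negative")
    if i_par == 0:
        # Assumption: no parallel signal at all is treated as strongly
        # polarized light if anything was measured, otherwise undecidable.
        return "dielectric" if i_perp > 0 else "unknown"
    return "dielectric" if i_perp / i_par > ratio_threshold else "metal"
```

For example, `classify_highlight(200, 190)` yields `"metal"` because the two components are nearly equal, whereas `classify_highlight(200, 40)` yields `"dielectric"`.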
VQ-NeRF: Neural Reflectance Decomposition and Editing with Vector Quantization
We propose VQ-NeRF, a two-branch neural network model that incorporates
Vector Quantization (VQ) to decompose and edit reflectance fields in 3D scenes.
Conventional neural reflectance fields use only continuous representations to
model 3D scenes, despite the fact that objects are typically composed of
discrete materials in reality. This lack of discretization can result in noisy
material decomposition and complicated material editing. To address these
limitations, our model consists of a continuous branch and a discrete branch.
The continuous branch follows the conventional pipeline to predict decomposed
materials, while the discrete branch uses the VQ mechanism to quantize
continuous materials into individual ones. By discretizing the materials, our
model can reduce noise in the decomposition process and generate a segmentation
map of discrete materials. Specific materials can be easily selected for
further editing by clicking on the corresponding area of the segmentation
outcomes. Additionally, we propose a dropout-based VQ codeword ranking strategy
to predict the number of materials in a scene, which reduces redundancy in the
material segmentation process. To improve usability, we also develop an
interactive interface to further assist material editing. We evaluate our model
on both computer-generated and real-world scenes, demonstrating its superior
performance. To the best of our knowledge, our model is the first to enable
discrete material editing in 3D scenes.
Comment: Accepted by TVCG. Project Page: https://jtbzhl.github.io/VQ-NeRF.github.io
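The quantization step that this abstract attributes to the discrete branch can be illustrated with a generic nearest-codeword lookup. This is only a sketch of plain vector quantization, not the VQ-NeRF implementation: the codebook is fixed rather than learned, the dropout-based ranking is not shown, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def quantize(vec: Vector, codebook: Sequence[Vector]) -> Tuple[int, Vector]:
    """Return the index and value of the codeword nearest to vec (squared L2)."""
    if not codebook:
        raise ValueError("codebook must not be empty")

    def dist_sq(k: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, codebook[k]))

    best = min(range(len(codebook)), key=dist_sq)
    return best, codebook[best]


def segmentation_map(materials: Sequence[Sequence[Vector]],
                     codebook: Sequence[Vector]) -> List[List[int]]:
    """Map each continuous material vector to its codeword index."""
    return [[quantize(v, codebook)[0] for v in row] for row in materials]
```

Because every pixel is assigned one of a small number of codeword indices, the resulting index map doubles as a segmentation of discrete materials, which is the property the abstract relies on for selection and editing.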
Moving object detection unaffected by cast shadows, highlights and ghosts
IEEE Copyright Policies: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
This paper describes a new approach to real-time segmentation of moving objects in images acquired by a fixed color video camera; it is the first tool of a larger project that aims to recognize abnormal human behavior in public areas. Moving object detection is based on background subtraction and is unaffected by changes in illumination, i.e., cast shadows and highlights. Furthermore, it does not require special attention during initialization, owing to its ability to detect and rectify ghosts. The results show that, for images with a resolution of 380x280 at 24 bits per pixel, segmentation takes around 80 ms on a computer with a 32-bit 3 GHz processor.
Fundação para a Ciência e a Tecnologia (FCT
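Background subtraction, the basis of the detector described in this abstract, can be sketched in its simplest form as follows. This is a deliberately reduced illustration on grayscale frames: it does not reproduce the paper's handling of cast shadows, highlights or ghosts, and the threshold and learning rate are hypothetical values.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def foreground_mask(frame: Image, background: Image,
                    threshold: float = 30.0) -> List[List[bool]]:
    """Mark pixels whose difference from the background exceeds threshold."""
    return [[abs(f - b) > threshold for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]


def update_background(background: Image, frame: Image,
                      mask: Sequence[Sequence[bool]],
                      alpha: float = 0.05) -> List[List[float]]:
    """Blend the frame into the background model at non-foreground pixels."""
    return [[b if m else (1.0 - alpha) * b + alpha * f
             for b, f, m in zip(b_row, f_row, m_row)]
            for b_row, f_row, m_row in zip(background, frame, mask)]
```

Updating the model only where no foreground was detected keeps moving objects from being absorbed into the background, although on its own it does nothing to correct ghosts left by objects present during initialization.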
Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and I evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.