Fair comparison of skin detection approaches on publicly available datasets
Skin detection is the process of discriminating skin and non-skin regions in
a digital image, and it is widely used in applications ranging from hand
gesture analysis and body-part tracking to face detection. Skin detection is a
challenging problem that has drawn extensive attention from the research
community; nevertheless, a fair comparison among approaches is very difficult
due to the lack of a common benchmark and a unified testing protocol. In this
work, we survey the most recent research in this field and we propose a
fair comparison among approaches using several different datasets. The major
contributions of this work are an exhaustive literature review of skin color
detection approaches; a framework to evaluate and combine different skin
detection approaches, whose source code is made freely available for future
research; and an extensive experimental comparison among several recent methods,
which have also been used to define an ensemble that works well in many
different problems. Experiments are carried out on 10 different datasets
including more than 10,000 labelled images: experimental results confirm that
the best method proposed here obtains very good performance with respect to
other stand-alone approaches, without requiring ad hoc parameter tuning. A
MATLAB version of the testing framework and of the methods proposed in this
paper will be freely available from https://github.com/LorisNann
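The paper's ensemble is not reproduced here, but the family of methods it surveys includes classic rule-based detectors that threshold a color subspace. A minimal sketch of one such detector, using the commonly cited CbCr skin cluster (the exact thresholds are an assumption, not taken from the paper):

```python
import numpy as np

def skin_mask_ycbcr(rgb):
    """Classify each pixel as skin/non-skin with a fixed YCbCr rule.

    rgb: H x W x 3 array of uint8 values.
    Returns a boolean H x W mask (True = skin).
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # ITU-R BT.601 RGB -> chroma (Cb, Cr) conversion.
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Commonly cited skin cluster in the CbCr plane.
    return (77 <= cb) & (cb <= 127) & (133 <= cr) & (cr <= 173)

# Tiny example: one skin-toned pixel and one green pixel.
img = np.array([[[200, 120, 100], [0, 255, 0]]], dtype=np.uint8)
mask = skin_mask_ycbcr(img)
```

Detectors of this kind need no training data, which is precisely why the paper's benchmark matters: their fixed thresholds are what learned methods and ensembles are compared against.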
Automated Detection and Counting of Pedestrians on an Urban Roadside
This thesis implements an automated system that counts pedestrians with 85% accuracy. Two approaches have been considered and evaluated in terms of count accuracy, cost and ease of deployment. The first approach employs the Autoscope Solo Terra, a traffic camera widely used to monitor vehicular traffic. The Solo Terra supports an image-processing-based detector that counts the number of objects crossing user-defined areas in the captured image. The count is updated based on the amount of movement across the selected regions. Therefore, a second approach has been considered that uses the histogram of oriented gradients (HoG), an advanced vision-based algorithm proposed by Dalal et al., which distinguishes a pedestrian from a non-pedestrian based on the omega shape formed by the head and shoulders of a human being. The implemented detection software processes video frames streamed from a low-cost digital camera. The frames are divided into sub-regions, which are scanned for an omega shape whenever movement is detected in those regions. It has been found that the HoG-based approach degrades in performance due to occlusion under dense pedestrian traffic conditions, whereas the Solo Terra approach appears to be more robust. Undercounts and overcounts were encountered using the Solo Terra approach. To combat the disadvantages of both approaches, they were integrated into a single system where the count is incremented predominantly using the Solo Terra; the HoG-based approach corrects the obtained count under certain conditions. A preliminary prototype of the integrated system has been verified
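The core of a HoG descriptor is a magnitude-weighted histogram of gradient orientations. The sketch below shows only that core step; the full Dalal-Triggs pipeline additionally tiles the image into cells and blocks, normalizes, and feeds the descriptor to a linear SVM, none of which is reproduced here:

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    This is the building block of a HoG descriptor: strong edges in a
    consistent direction concentrate mass in one orientation bin.
    """
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in Dalal-Triggs.
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())
    return hist / (hist.sum() + 1e-12)  # L1-normalized

# A vertical step edge: all gradient energy lands in the first bin.
patch = np.zeros((8, 8))
patch[:, 4:] = 1.0
h = orientation_histogram(patch)
```

For pedestrians, the characteristic head-and-shoulders contour produces a distinctive pattern across such histograms, which is what the trained classifier separates from the background.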
Automatic summarization of rushes video using bipartite graphs
In this paper we present a new approach for automatic summarization of rushes, or unstructured video. Our approach is composed of three major steps. First, based on shot and sub-shot segmentations, we filter sub-shots with low information content not likely to be useful in a summary. Second, a method using maximal matching in a bipartite graph is adapted to measure similarity between the remaining shots and to minimize inter-shot redundancy by removing repetitive retake shots common in rushes video. Finally, the presence of faces and motion intensity are characterised in each sub-shot. A measure of how representative the sub-shot is in the context of the overall video is then proposed. Video summaries composed of keyframe slideshows are then generated. In order to evaluate the effectiveness of this approach we re-run the evaluation carried out by TRECVid, using the same dataset and evaluation metrics used in the TRECVid video summarization task in 2007 but with our own assessors. Results show that our approach leads to a significant improvement on our own work in terms of the fraction of the TRECVid summary ground truth included and is competitive with the best of other approaches in TRECVid 2007
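The paper's exact similarity measure and features are not given in the abstract; the following is a hedged sketch of the idea of scoring inter-shot redundancy with a maximal matching between two shots' keyframe features, using a greedy matcher, a toy 1-D feature, and a hypothetical distance threshold:

```python
def greedy_maximal_matching(shot_a, shot_b, dist, threshold):
    """Greedily build a maximal matching between the keyframes of two
    shots: each keyframe is matched to at most one keyframe of the
    other shot, and a pair is admitted only when its distance is below
    the threshold. Returns the list of matched index pairs."""
    used_b = set()
    matches = []
    for i, fa in enumerate(shot_a):
        for j, fb in enumerate(shot_b):
            if j not in used_b and dist(fa, fb) < threshold:
                used_b.add(j)
                matches.append((i, j))
                break  # keyframe i is now matched
    return matches

def shot_similarity(shot_a, shot_b, dist, threshold):
    """Fraction of keyframes covered by the matching; values near 1.0
    suggest a repeated retake shot that can be dropped."""
    m = greedy_maximal_matching(shot_a, shot_b, dist, threshold)
    return 2 * len(m) / (len(shot_a) + len(shot_b))

# Toy 1-D 'features': two near-identical shots vs. a dissimilar one.
l1 = lambda a, b: abs(a - b)
s1, s2, s3 = [0.1, 0.5, 0.9], [0.12, 0.48, 0.91], [5.0, 6.0, 7.0]
```

Here `shot_similarity(s1, s2, l1, 0.1)` flags s1 and s2 as retakes of the same material, while s3 scores zero against either; in the paper's setting the features would be visual descriptors of sub-shot keyframes rather than scalars.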
Automatic Detection and Segmentation of Lentil Breeding Plots from Images Captured by Multi-spectral UAV-Mounted Camera
Bulge plus disc and Sérsic decomposition catalogues for 16,908 galaxies in the SDSS Stripe 82 co-adds: A detailed study of the structural measurements
Quantitative characterization of galaxy morphology is vital in enabling
comparison of observations to predictions from galaxy formation theory.
However, without significant overlap between the observational footprints of
deep and shallow galaxy surveys, the extent to which structural measurements
for large galaxy samples are robust to image quality (e.g., depth, spatial
resolution) cannot be established. Deep images from the Sloan Digital Sky
Survey (SDSS) Stripe 82 co-adds provide a unique solution to this problem,
offering magnitudes of improvement in depth with respect to SDSS Legacy
images. Having similar spatial resolution to Legacy, the co-adds make it
possible to examine the sensitivity of parametric morphologies to depth alone.
Using the Gim2D surface-brightness decomposition software, we provide public
morphology catalogs for 16,908 galaxies in the Stripe 82 co-adds. Our
methods and selection are completely consistent with the Simard et al. (2011)
and Mendel et al. (2014) photometric decompositions. We rigorously compare
measurements in the deep and shallow images. We find no systematics in total
magnitudes and sizes except for faint galaxies in the -band and the
brightest galaxies in each band. However, characterization of bulge-to-total
fractions is significantly improved in the deep images. Furthermore, statistics
used to determine whether single-Sérsic or two-component (e.g., bulge+disc)
models are required become more bimodal in the deep images. Lastly, we show
that asymmetries are enhanced in the deep images and that the enhancement is
positively correlated with the asymmetries measured in Legacy images.
Comment: 27 pages, 14 figures. MNRAS accepted. Our catalogs are available in
TXT and SQL formats at
http://orca.phys.uvic.ca/~cbottrel/share/Stripe82/Catalogs
Deep Learning-Based SOLO Architecture for Re-Identification of Single Persons by Locations
Analyzing and judging captured and retrieved images of targets from surveillance video cameras for person re-identification has been a herculean task for computer vision that is worth further research. Hence, re-identification of single persons by locations, based on the single objects by locations (SOLO) model, is proposed in this paper. To achieve the re-identification goal, we based the training of the re-identification model on synchronized stochastic gradient descent (SGD). SOLO is capable of exploiting contextual cues and segmenting individual persons by their motions. The proposed approach consists of the following steps: (1) reformulating person instance segmentation as (a) prediction of category and (b) mask generation tasks for each person instance; (2) dividing the input person image into uniform grids, i.e., G×G grid cells, in such a way that a grid cell can predict the semantic category and masks of the person instances provided the center of a person falls into that grid cell; and (3) conducting person segmentation. Discriminating features of individual persons are extracted using convolutional neural networks. On the Market-1501 person re-identification dataset, the SOLO model achieved an mAP of 84.1% and a 93.8% rank-1 identification rate, higher than what is achieved by other comparative algorithms such as PL-Net, SegHAN, Siamese, GoogLeNet, and M3L (IBN-Net50). On the CUHK03 person re-identification dataset, the SOLO model achieved an mAP of 82.1% and a 90.1% rank-1 identification rate, again higher than what is achieved by the same comparative algorithms. These results show that the SOLO model achieves the best results for person re-identification, indicating the high effectiveness of the model.
The research contributions are: (1) application of synchronized stochastic gradient descent (SGD) to SOLO training for person re-identification and (2) single objects by locations using a semantic category branch and an instance mask branch instead of the detect-then-segment method, thereby converting person instance segmentation into a solvable single-shot classification problem
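Step (2) above, the grid-cell assignment at the heart of SOLO, can be sketched in a few lines. The function and its example values are illustrative, not taken from the paper:

```python
def solo_grid_cell(center_xy, image_wh, G):
    """Map a person instance to the SOLO grid cell containing its
    center: the image is divided into G x G cells, and the cell that
    the instance center falls into is responsible for predicting that
    instance's semantic category and mask.

    center_xy: (x, y) center of the instance in pixels.
    image_wh:  (width, height) of the input image.
    Returns (row, col) indices of the responsible grid cell.
    """
    x, y = center_xy
    w, h = image_wh
    col = min(int(x / w * G), G - 1)  # clamp so x == w stays in-grid
    row = min(int(y / h * G), G - 1)
    return row, col

# A person centered at (300, 220) in a 640 x 480 frame with G = 12.
cell = solo_grid_cell((300, 220), (640, 480), 12)
```

Because each cell predicts a category and a mask independently, detection reduces to per-cell classification, which is the "single-shot" property contribution (2) refers to.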
Detecting semantic concepts in digital photographs: low-level features vs. non-homogeneous data fusion
Semantic concepts, such as faces, buildings, and other real world objects, are the most preferred instrument that humans use to navigate through and retrieve visual content from large multimedia databases. Semantic annotation of visual content in large collections is therefore essential if ease of access and use is to be ensured. Classification of images into broad categories such as indoor/outdoor, building/non-building, urban/landscape, people/no-people, etc., allows us to obtain the semantic labels without the full knowledge of all objects in the scene.
Inferring the presence of high-level semantic concepts from low-level visual features is a research
topic that has been attracting a significant amount of interest lately. However, the power of low-level visual features alone has been shown to be limited when faced with the task of semantic scene classification in heterogeneous, unconstrained, broad-topic image collections. Multi-modal fusion, or the combination of information from different modalities, has been identified as one possible way of overcoming the limitations of single-mode approaches. In the field of digital photography, the incorporation of readily available camera metadata, i.e. information about the image capture conditions stored in the EXIF header of each image, along with GPS information, offers a way to move towards a better understanding of the imaged scene.
In this thesis we focus on the detection of semantic concepts such as artificial text in video and large buildings in digital photographs, and examine how fusion of low-level visual features with selected camera metadata, using a Support Vector Machine as an integration device, affects the performance of the building detector in a genuine personal photo collection. We implemented two approaches to the detection of buildings that combine content-based and context-based information, and an approach to indoor/outdoor classification based exclusively on camera metadata. An outdoor detection rate of 85.6% was obtained using camera metadata only. The first approach to building detection, based on simple edge-orientation-based features extracted at three different scales, has been tested on a dataset of 1720 outdoor images, with a classification accuracy of 88.22%. The second approach integrates the edge-orientation-based features with the camera metadata-based features, both at the feature and at the decision level. The fusion approaches have been evaluated using an unconstrained dataset of 8000 genuine consumer photographs. The experiments demonstrate that the fusion approaches outperform the visual-features-only approach by 2-3% on average regardless of the operating point chosen, while all the performance measures are approximately 4% below the upper limit of performance. The early fusion approach consistently improves all performance measures
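The thesis's SVM-based detectors are not reproduced here, but the distinction between the two fusion levels it evaluates can be sketched minimally. The feature names and values below are hypothetical; in the thesis, an SVM consumes the fused representation:

```python
def early_fusion(visual_feats, metadata_feats):
    """Feature-level (early) fusion: concatenate the two feature
    vectors into a single vector for one downstream classifier."""
    return list(visual_feats) + list(metadata_feats)

def late_fusion(visual_score, metadata_score, w=0.5):
    """Decision-level (late) fusion: combine the scores of separate
    per-modality classifiers into one decision score."""
    return w * visual_score + (1 - w) * metadata_score

# Hypothetical inputs for a 'building' detector:
visual = [0.12, 0.40, 0.25, 0.23]   # edge-orientation histogram bins
metadata = [1.0, 0.008, 0.0]        # e.g. flash fired, exposure time,
                                    # subject distance (normalized)
fused = early_fusion(visual, metadata)  # 7-dimensional input vector
decision = late_fusion(0.8, 0.6)        # combined building score
```

Early fusion lets the classifier learn cross-modality interactions at the cost of a higher-dimensional input; late fusion keeps the modalities independent, which is why the thesis evaluates both operating points.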
A Training Assistant Tool for the Automated Visual Inspection System
This thesis considers the problem of assisting a human user in setting up an automated Visual Inspection (VI) system. The VI system uses a stationary camera on an automobile assembly line to inspect cars as they pass by. The inspection process is intended to identify when parts have been missed or incorrect parts have been assembled. The result is reported to a human working on the assembly line, who can then take corrective actions. As originally developed, the system requires a setup phase in which the human user places the camera and records a video of at least 30 minutes in length to use for training the system. Training includes specifying regions of passing cars that are to be inspected. After deployment of a number of systems, it was learned that users could benefit from guidance in best practices for delineating training data. It was also learned that users could benefit from simple visual feedback to ascertain whether an inspection problem was suitable for a VI system or whether the problem was too challenging. This thesis describes a few methods and a new software tool intended to address this need