Fair comparison of skin detection approaches on publicly available datasets
Skin detection is the process of discriminating skin and non-skin regions in
a digital image, and it is widely used in applications ranging from hand
gesture analysis and body-part tracking to face detection. Skin detection is a
challenging problem that has drawn extensive attention from the research
community; nevertheless, a fair comparison among approaches is very difficult
due to the lack of a common benchmark and a unified testing protocol. In this
work, we survey the most recent research in this field and we propose a
fair comparison among approaches using several different datasets. The major
contributions of this work are an exhaustive literature review of skin color
detection approaches; a framework to evaluate and combine different skin
detection approaches, whose source code is made freely available for future
research; and an extensive experimental comparison among several recent methods,
which have also been used to define an ensemble that works well in many
different problems. Experiments are carried out on 10 different datasets
including more than 10,000 labelled images: experimental results confirm that
the best method proposed here obtains very good performance with respect to
other stand-alone approaches, without requiring ad hoc parameter tuning. A
MATLAB version of the testing framework and of the methods proposed in this
paper will be freely available from https://github.com/LorisNann
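The paper's ensemble is not reproduced here, but the family of methods it surveys includes classic rule-based detectors that threshold a color subspace. A minimal sketch of one such detector, using the commonly cited CbCr skin cluster (the exact thresholds are an assumption, not taken from the paper):

```python
import numpy as np

def skin_mask_ycbcr(rgb):
    """Classify each pixel as skin/non-skin with a fixed YCbCr rule.

    rgb: H x W x 3 array of uint8 values.
    Returns a boolean H x W mask (True = skin).
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # ITU-R BT.601 RGB -> chroma (Cb, Cr) conversion.
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Commonly cited skin cluster in the CbCr plane.
    return (77 <= cb) & (cb <= 127) & (133 <= cr) & (cr <= 173)

# Tiny example: one skin-toned pixel and one green pixel.
img = np.array([[[200, 120, 100], [0, 255, 0]]], dtype=np.uint8)
mask = skin_mask_ycbcr(img)
```

Detectors of this kind need no training data, which is precisely why the paper's benchmark matters: their fixed thresholds are what learned methods and ensembles are compared against.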
Automated Detection and Counting of Pedestrians on an Urban Roadside
This thesis implements an automated system that counts pedestrians with 85% accuracy. Two approaches have been considered and evaluated in terms of count accuracy, cost and ease of deployment. The first approach employs the Autoscope Solo Terra, a traffic camera widely used to monitor vehicular traffic. The Solo Terra supports an image-processing-based detector that counts the number of objects crossing user-defined areas in the captured image. The count is updated based on the amount of movement across the selected regions. Therefore, a second approach has been considered that uses the histogram of oriented gradients (HoG), an advanced vision-based algorithm proposed by Dalal et al., which distinguishes a pedestrian from a non-pedestrian based on the omega shape formed by the head and shoulders of a human being. The implemented detection software processes video frames streamed from a low-cost digital camera. The frames are divided into sub-regions, which are scanned for an omega shape whenever movement is detected in those regions. It has been found that the HoG-based approach degrades in performance due to occlusion under dense pedestrian traffic conditions, whereas the Solo Terra approach appears to be more robust. Undercounts and overcounts were encountered using the Solo Terra approach. To combat the disadvantages of both approaches, they were integrated into a single system where the count is incremented predominantly using the Solo Terra; the HoG-based approach corrects the obtained count under certain conditions. A preliminary prototype of the integrated system has been verified
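The core of a HoG descriptor is a magnitude-weighted histogram of gradient orientations. The sketch below shows only that core step; the full Dalal-Triggs pipeline additionally tiles the image into cells and blocks, normalizes, and feeds the descriptor to a linear SVM, none of which is reproduced here:

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    This is the building block of a HoG descriptor: strong edges in a
    consistent direction concentrate mass in one orientation bin.
    """
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in Dalal-Triggs.
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())
    return hist / (hist.sum() + 1e-12)  # L1-normalized

# A vertical step edge: all gradient energy lands in the first bin.
patch = np.zeros((8, 8))
patch[:, 4:] = 1.0
h = orientation_histogram(patch)
```

For pedestrians, the characteristic head-and-shoulders contour produces a distinctive pattern across such histograms, which is what the trained classifier separates from the background.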
Automatic summarization of rushes video using bipartite graphs
In this paper we present a new approach for automatic summarization of rushes, or unstructured video. Our approach is composed of three major steps. First, based on shot and sub-shot segmentations, we filter sub-shots with low information content not likely to be useful in a summary. Second, a method using maximal matching in a bipartite graph is adapted to measure similarity between the remaining shots and to minimize inter-shot redundancy by removing repetitive retake shots common in rushes video. Finally, the presence of faces and motion intensity are characterised in each sub-shot. A measure of how representative the sub-shot is in the context of the overall video is then proposed. Video summaries composed of keyframe slideshows are then generated. In order to evaluate the effectiveness of this approach we re-run the evaluation carried out by TRECVid, using the same dataset and evaluation metrics used in the TRECVid video summarization task in 2007 but with our own assessors. Results show that our approach leads to a significant improvement on our own work in terms of the fraction of the TRECVid summary ground truth included and is competitive with the best of other approaches in TRECVid 2007
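The paper's exact similarity measure and features are not given in the abstract; the following is a hedged sketch of the idea of scoring inter-shot redundancy with a maximal matching between two shots' keyframe features, using a greedy matcher, a toy 1-D feature, and a hypothetical distance threshold:

```python
def greedy_maximal_matching(shot_a, shot_b, dist, threshold):
    """Greedily build a maximal matching between the keyframes of two
    shots: each keyframe is matched to at most one keyframe of the
    other shot, and a pair is admitted only when its distance is below
    the threshold. Returns the list of matched index pairs."""
    used_b = set()
    matches = []
    for i, fa in enumerate(shot_a):
        for j, fb in enumerate(shot_b):
            if j not in used_b and dist(fa, fb) < threshold:
                used_b.add(j)
                matches.append((i, j))
                break  # keyframe i is now matched
    return matches

def shot_similarity(shot_a, shot_b, dist, threshold):
    """Fraction of keyframes covered by the matching; values near 1.0
    suggest a repeated retake shot that can be dropped."""
    m = greedy_maximal_matching(shot_a, shot_b, dist, threshold)
    return 2 * len(m) / (len(shot_a) + len(shot_b))

# Toy 1-D 'features': two near-identical shots vs. a dissimilar one.
l1 = lambda a, b: abs(a - b)
s1, s2, s3 = [0.1, 0.5, 0.9], [0.12, 0.48, 0.91], [5.0, 6.0, 7.0]
```

Here `shot_similarity(s1, s2, l1, 0.1)` flags s1 and s2 as retakes of the same material, while s3 scores zero against either; in the paper's setting the features would be visual descriptors of sub-shot keyframes rather than scalars.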
Automatic Detection and Segmentation of Lentil Breeding Plots from Images Captured by Multi-spectral UAV-Mounted Camera
Bulge plus disc and Sérsic decomposition catalogues for 16,908 galaxies in the SDSS Stripe 82 co-adds: A detailed study of the structural measurements
Quantitative characterization of galaxy morphology is vital in enabling
comparison of observations to predictions from galaxy formation theory.
However, without significant overlap between the observational footprints of
deep and shallow galaxy surveys, the extent to which structural measurements
for large galaxy samples are robust to image quality (e.g., depth, spatial
resolution) cannot be established. Deep images from the Sloan Digital Sky
Survey (SDSS) Stripe 82 co-adds provide a unique solution to this problem,
offering magnitudes of improvement in depth with respect to SDSS Legacy
images. Having similar spatial resolution to Legacy, the co-adds make it
possible to examine the sensitivity of parametric morphologies to depth alone.
Using the Gim2D surface-brightness decomposition software, we provide public
morphology catalogs for 16,908 galaxies in the Stripe 82 co-adds. Our
methods and selection are completely consistent with the Simard et al. (2011)
and Mendel et al. (2014) photometric decompositions. We rigorously compare
measurements in the deep and shallow images. We find no systematics in total
magnitudes and sizes except for faint galaxies in the -band and the
brightest galaxies in each band. However, characterization of bulge-to-total
fractions is significantly improved in the deep images. Furthermore, statistics
used to determine whether single-Sérsic or two-component (e.g., bulge+disc)
models are required become more bimodal in the deep images. Lastly, we show
that asymmetries are enhanced in the deep images and that the enhancement is
positively correlated with the asymmetries measured in Legacy images.
Comment: 27 pages, 14 figures. MNRAS accepted. Our catalogs are available in
TXT and SQL formats at
http://orca.phys.uvic.ca/~cbottrel/share/Stripe82/Catalogs
Deep Learning-Based SOLO Architecture for Re-Identification of Single Persons by Locations
Analyzing and judging captured and retrieved images of targets from surveillance video cameras for person re-identification has been a herculean task for computer vision that is worth further research. Hence, re-identification of single persons by locations, based on the single objects by locations (SOLO) model, is proposed in this paper. To achieve the re-identification goal, we based the training of the re-identification model on synchronized stochastic gradient descent (SGD). SOLO is capable of exploiting contextual cues and segmenting individual persons by their motions. The proposed approach consists of the following steps: (1) reformulating person instance segmentation as (a) prediction of category and (b) mask generation tasks for each person instance; (2) dividing the input person image into uniform grids, i.e., G×G grid cells, in such a way that a grid cell can predict the semantic category and masks of the person instances provided the center of a person falls into that grid cell; and (3) conducting person segmentation. Discriminating features of individual persons are extracted using convolutional neural networks. On the Market-1501 person re-identification dataset, the SOLO model achieved an mAP of 84.1% and a 93.8% rank-1 identification rate, higher than what is achieved by other comparative algorithms such as PL-Net, SegHAN, Siamese, GoogLeNet, and M3L (IBN-Net50). On the CUHK03 person re-identification dataset, the SOLO model achieved an mAP of 82.1% and a 90.1% rank-1 identification rate, again higher than what is achieved by the same comparative algorithms. These results show that the SOLO model achieves the best results for person re-identification, indicating the high effectiveness of the model.
The research contributions are: (1) application of synchronized stochastic gradient descent (SGD) to SOLO training for person re-identification and (2) single objects by locations using a semantic category branch and an instance mask branch instead of the detect-then-segment method, thereby converting person instance segmentation into a solvable single-shot classification problem
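Step (2) above, the grid-cell assignment at the heart of SOLO, can be sketched in a few lines. The function and its example values are illustrative, not taken from the paper:

```python
def solo_grid_cell(center_xy, image_wh, G):
    """Map a person instance to the SOLO grid cell containing its
    center: the image is divided into G x G cells, and the cell that
    the instance center falls into is responsible for predicting that
    instance's semantic category and mask.

    center_xy: (x, y) center of the instance in pixels.
    image_wh:  (width, height) of the input image.
    Returns (row, col) indices of the responsible grid cell.
    """
    x, y = center_xy
    w, h = image_wh
    col = min(int(x / w * G), G - 1)  # clamp so x == w stays in-grid
    row = min(int(y / h * G), G - 1)
    return row, col

# A person centered at (300, 220) in a 640 x 480 frame with G = 12.
cell = solo_grid_cell((300, 220), (640, 480), 12)
```

Because each cell predicts a category and a mask independently, detection reduces to per-cell classification, which is the "single-shot" property contribution (2) refers to.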
Detecting semantic concepts in digital photographs: low-level features vs. non-homogeneous data fusion
Semantic concepts, such as faces, buildings, and other real world objects, are the most preferred instrument that humans use to navigate through and retrieve visual content from large multimedia databases. Semantic annotation of visual content in large collections is therefore essential if ease of access and use is to be ensured. Classification of images into broad categories such as indoor/outdoor, building/non-building, urban/landscape, people/no-people, etc., allows us to obtain the semantic labels without the full knowledge of all objects in the scene.
Inferring the presence of high-level semantic concepts from low-level visual features is a research
topic that has been attracting a significant amount of interest lately. However, the power of low-level visual features alone has been shown to be limited when faced with the task of semantic scene classification in heterogeneous, unconstrained, broad-topic image collections. Multi-modal fusion, or the combination of information from different modalities, has been identified as one possible way of overcoming the limitations of single-mode approaches. In the field of digital photography, the incorporation of readily available camera metadata, i.e. information about the image capture conditions stored in the EXIF header of each image, along with GPS information, offers a way to move towards a better understanding of the imaged scene.
In this thesis we focus on the detection of semantic concepts such as artificial text in video and large buildings in digital photographs, and examine how fusion of low-level visual features with selected camera metadata, using a Support Vector Machine as an integration device, affects the performance of the building detector in a genuine personal photo collection. We implemented two approaches to the detection of buildings that combine content-based and context-based information, and an approach to indoor/outdoor classification based exclusively on camera metadata. An outdoor detection rate of 85.6% was obtained using camera metadata only. The first approach to building detection, based on simple edge-orientation-based features extracted at three different scales, has been tested on a dataset of 1720 outdoor images, with a classification accuracy of 88.22%. The second approach integrates the edge-orientation-based features with the camera metadata-based features, both at the feature and at the decision level. The fusion approaches have been evaluated using an unconstrained dataset of 8000 genuine consumer photographs. The experiments demonstrate that the fusion approaches outperform the visual-features-only approach by 2-3% on average regardless of the operating point chosen, while all the performance measures are approximately 4% below the upper limit of performance. The early fusion approach consistently improves all performance measures
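The thesis's SVM-based detectors are not reproduced here, but the distinction between the two fusion levels it evaluates can be sketched minimally. The feature names and values below are hypothetical; in the thesis, an SVM consumes the fused representation:

```python
def early_fusion(visual_feats, metadata_feats):
    """Feature-level (early) fusion: concatenate the two feature
    vectors into a single vector for one downstream classifier."""
    return list(visual_feats) + list(metadata_feats)

def late_fusion(visual_score, metadata_score, w=0.5):
    """Decision-level (late) fusion: combine the scores of separate
    per-modality classifiers into one decision score."""
    return w * visual_score + (1 - w) * metadata_score

# Hypothetical inputs for a 'building' detector:
visual = [0.12, 0.40, 0.25, 0.23]   # edge-orientation histogram bins
metadata = [1.0, 0.008, 0.0]        # e.g. flash fired, exposure time,
                                    # subject distance (normalized)
fused = early_fusion(visual, metadata)  # 7-dimensional input vector
decision = late_fusion(0.8, 0.6)        # combined building score
```

Early fusion lets the classifier learn cross-modality interactions at the cost of a higher-dimensional input; late fusion keeps the modalities independent, which is why the thesis evaluates both operating points.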
A Training Assistant Tool for the Automated Visual Inspection System
This thesis considers the problem of assisting a human user in setting up an automated Visual Inspection (VI) system. The VI system uses a stationary camera on an automobile assembly line to inspect cars as they pass by. The inspection process is intended to identify when parts have been missed or incorrect parts have been assembled. The result is reported to a human working on the assembly line, who can then take corrective actions. As originally developed, the system requires a setup phase in which the human user places the camera and records a video of at least 30 minutes in length to use for training the system. Training includes specifying regions of passing cars that are to be inspected. After deployment of a number of systems, it was learned that users could benefit from guidance in best practices for delineating training data. It was also learned that users could benefit from simple visual feedback to ascertain whether an inspection problem was suitable for a VI system or whether the problem was too challenging. This thesis describes a few methods and a new software tool intended to address this need