
    Fair comparison of skin detection approaches on publicly available datasets

    Skin detection is the process of discriminating skin and non-skin regions in a digital image, and it is widely used in applications ranging from hand gesture analysis and body-part tracking to face detection. Skin detection is a challenging problem that has drawn extensive attention from the research community; nevertheless, a fair comparison among approaches is difficult due to the lack of a common benchmark and a unified testing protocol. In this work, we survey the most recent research in this field and propose a fair comparison among approaches using several different datasets. The major contributions of this work are an exhaustive literature review of skin color detection approaches; a framework to evaluate and combine different skin detectors, whose source code is made freely available for future research; and an extensive experimental comparison among several recent methods, which have also been used to define an ensemble that works well across many different problems. Experiments are carried out on 10 different datasets including more than 10,000 labelled images: the results confirm that the best method proposed here obtains very good performance with respect to other stand-alone approaches, without requiring ad hoc parameter tuning. A MATLAB version of the framework for testing and of the methods proposed in this paper will be freely available from https://github.com/LorisNann
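    As a hedged illustration of the kind of pipeline this abstract describes, the Python sketch below pairs a classic explicit RGB skin rule (Peer et al.) with a simple per-pixel voting ensemble. The thresholds and the voting scheme are illustrative stand-ins, not the authors' MATLAB framework.

```python
import numpy as np

def rgb_rule_skin(img):
    """Classic explicit RGB thresholds for skin pixels (Peer et al.);
    img is an (H, W, 3) uint8 RGB array, returns a boolean mask."""
    r = img[..., 0].astype(int)
    g = img[..., 1].astype(int)
    b = img[..., 2].astype(int)
    spread = img.max(axis=-1).astype(int) - img.min(axis=-1).astype(int)
    return ((r > 95) & (g > 40) & (b > 20) & (spread > 15)
            & (np.abs(r - g) > 15) & (r > g) & (r > b))

def ensemble_skin_mask(img, detectors, vote_threshold=0.5):
    """Fuse per-pixel boolean masks from several stand-alone detectors
    by majority voting, a stand-in for an ensemble of skin detectors."""
    votes = np.mean([d(img).astype(float) for d in detectors], axis=0)
    return votes >= vote_threshold

# Example: an ensemble of one rule (more detectors could be appended).
# mask = ensemble_skin_mask(image, [rgb_rule_skin])
```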

    Automatic summarization of rushes video using bipartite graphs

    In this paper we present a new approach for automatic summarization of rushes, or unstructured video. Our approach is composed of three major steps. First, based on shot and sub-shot segmentations, we filter out sub-shots with low information content that are unlikely to be useful in a summary. Second, a method using maximal matching in a bipartite graph is adapted to measure similarity between the remaining shots and to minimize inter-shot redundancy by removing the repetitive retake shots common in rushes video. Finally, the presence of faces and the motion intensity are characterised in each sub-shot, and a measure of how representative the sub-shot is in the context of the overall video is proposed. Video summaries composed of keyframe slideshows are then generated. In order to evaluate the effectiveness of this approach we re-ran the evaluation carried out by TRECVid, using the same dataset and evaluation metrics as the TRECVid video summarization task in 2007 but with our own assessors. Results show that our approach leads to a significant improvement on our own earlier work in terms of the fraction of the TRECVid summary ground truth included, and is competitive with the best of the other approaches in TRECVid 2007.
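    A minimal sketch of the bipartite-matching idea, assuming each shot is represented by an (n, d) matrix of per-keyframe descriptors; the cosine edge weights and the SciPy matcher are illustrative choices, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def shot_similarity(feats_a, feats_b):
    """Similarity between two shots via a maximal (maximum-weight)
    matching in the bipartite graph whose nodes are the shots'
    keyframes and whose edge weights are cosine similarities."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    weights = a @ b.T                             # (n_a, n_b) edge weights
    rows, cols = linear_sum_assignment(-weights)  # maximize total weight
    return weights[rows, cols].mean()

# Shots whose similarity exceeds a threshold could be treated as
# retakes of the same scene and pruned to reduce redundancy.
```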

    Automatic Detection and Segmentation of Lentil Breeding Plots from Images Captured by Multi-spectral UAV-Mounted Camera

    Bulge plus disc and Sérsic decomposition catalogues for 16,908 galaxies in the SDSS Stripe 82 co-adds: A detailed study of the ugriz structural measurements

    Quantitative characterization of galaxy morphology is vital in enabling comparison of observations to predictions from galaxy formation theory. However, without significant overlap between the observational footprints of deep and shallow galaxy surveys, the extent to which structural measurements for large galaxy samples are robust to image quality (e.g., depth, spatial resolution) cannot be established. Deep images from the Sloan Digital Sky Survey (SDSS) Stripe 82 co-adds provide a unique solution to this problem, offering a 1.6-1.8 magnitude improvement in depth with respect to SDSS Legacy images. Having similar spatial resolution to Legacy, the co-adds make it possible to examine the sensitivity of parametric morphologies to depth alone. Using the GIM2D surface-brightness decomposition software, we provide public morphology catalogs for 16,908 galaxies in the Stripe 82 ugriz co-adds. Our methods and selection are completely consistent with the Simard et al. (2011) and Mendel et al. (2014) photometric decompositions. We rigorously compare measurements in the deep and shallow images. We find no systematics in total magnitudes and sizes except for faint galaxies in the u-band and the brightest galaxies in each band. However, characterization of bulge-to-total fractions is significantly improved in the deep images. Furthermore, statistics used to determine whether single-Sérsic or two-component (e.g., bulge+disc) models are required become more bimodal in the deep images. Lastly, we show that asymmetries are enhanced in the deep images and that the enhancement is positively correlated with the asymmetries measured in Legacy images.
    Comment: 27 pages, 14 figures. MNRAS accepted. Our catalogs are available in TXT and SQL formats at http://orca.phys.uvic.ca/~cbottrel/share/Stripe82/Catalogs
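    For reference, a small Python sketch of the Sérsic profile underlying these decompositions, using the standard Ciotti & Bertin (1999) approximation for b_n; the two-component helper is an illustrative bulge+disc form, not GIM2D's fitting code.

```python
import numpy as np

def sersic(r, I_e, r_e, n):
    """Sersic surface-brightness profile
    I(r) = I_e * exp(-b_n * ((r / r_e)**(1/n) - 1));
    n=1 gives an exponential disc, n=4 a de Vaucouleurs bulge.
    b_n from the Ciotti & Bertin (1999) asymptotic approximation."""
    b_n = 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n)
    return I_e * np.exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))

def bulge_plus_disc(r, I_e, r_e, n_bulge, I_0, h):
    """Two-component model: Sersic bulge plus exponential disc
    with central intensity I_0 and scale length h."""
    return sersic(r, I_e, r_e, n_bulge) + I_0 * np.exp(-r / h)
```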

    Deep Learning-Based SOLO Architecture for Re-Identification of Single Persons by Locations

    Analyzing captured and retrieved images of targets from surveillance video cameras for person re-identification has been a herculean task in computer vision that is worth further research. Hence, re-identification of single persons by location, based on the single-objects-by-locations (SOLO) model, is proposed in this paper. To achieve the re-identification goal, we base the training of the re-identification model on synchronized stochastic gradient descent (SGD). SOLO is capable of exploiting contextual cues and segmenting individual persons by their motions. The proposed approach consists of the following steps: (1) reformulating person instance segmentation as (a) a category prediction task and (b) a mask generation task for each person instance; (2) dividing the input person image into a uniform G×G grid such that a grid cell predicts the semantic category and the masks of a person instance provided the center of the person falls into that grid cell; and (3) conducting person segmentation. Discriminative features of individual persons are extracted using convolutional neural networks. On the Market-1501 person re-identification dataset, the SOLO model achieved an mAP of 84.1% and a rank-1 identification rate of 93.8%, higher than comparative algorithms such as PL-Net, SegHAN, Siamese, GoogLeNet, and M3L (IBN-Net50). On the CUHK03 person re-identification dataset, the SOLO model achieved an mAP of 82.1% and a rank-1 identification rate of 90.1%, again higher than the same comparative algorithms. These results show that the SOLO model achieves the best results for person re-identification, indicating the high effectiveness of the model. The research contributions are: (1) application of synchronized stochastic gradient descent (SGD) to SOLO training for person re-identification and (2) single objects by locations using a semantic category branch and an instance mask branch instead of a detect-then-segment method, thereby converting person instance segmentation into a solvable single-shot classification problem.
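    A minimal sketch of the grid-assignment rule in step (2), with hypothetical box inputs; published SOLO implementations use multiple grid scales and a center-region test rather than the exact center point.

```python
import numpy as np

def assign_instances_to_cells(boxes, img_h, img_w, G=12):
    """Assign each person instance to the G x G grid cell that
    contains its bounding-box center; that cell is then responsible
    for predicting the instance's semantic category and mask.
    boxes: (N, 4) array of [x1, y1, x2, y2] in pixels."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0
    col = np.clip((cx / img_w * G).astype(int), 0, G - 1)
    row = np.clip((cy / img_h * G).astype(int), 0, G - 1)
    return row * G + col  # flat cell index in [0, G*G)
```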

    Detecting semantic concepts in digital photographs: low-level features vs. non-homogeneous data fusion

    Semantic concepts, such as faces, buildings, and other real-world objects, are the most preferred instrument that humans use to navigate through and retrieve visual content from large multimedia databases. Semantic annotation of visual content in large collections is therefore essential if ease of access and use is to be ensured. Classification of images into broad categories such as indoor/outdoor, building/non-building, urban/landscape, people/no-people, etc., allows us to obtain semantic labels without full knowledge of all the objects in the scene. Inferring the presence of high-level semantic concepts from low-level visual features is a research topic that has been attracting a significant amount of interest lately. However, the power of low-level visual features alone has been shown to be limited when faced with the task of semantic scene classification in heterogeneous, unconstrained, broad-topic image collections. Multi-modal fusion, or the combination of information from different modalities, has been identified as one possible way of overcoming the limitations of single-mode approaches. In the field of digital photography, the incorporation of readily available camera metadata, i.e. information about the image capture conditions stored in the EXIF header of each image, along with GPS information, offers a way to move towards a better understanding of the imaged scene. In this thesis we focus on the detection of semantic concepts such as artificial text in video and large buildings in digital photographs, and examine how fusion of low-level visual features with selected camera metadata, using a Support Vector Machine as an integration device, affects the performance of the building detector on a genuine personal photo collection. We implemented two approaches to building detection that combine content-based and context-based information, and an approach to indoor/outdoor classification based exclusively on camera metadata. An outdoor detection rate of 85.6% was obtained using camera metadata only. The first approach to building detection, based on simple edge orientation-based features extracted at three different scales, was tested on a dataset of 1720 outdoor images, with a classification accuracy of 88.22%. The second approach integrates the edge orientation-based features with the camera metadata-based features, both at the feature level and at the decision level. The fusion approaches were evaluated on an unconstrained dataset of 8000 genuine consumer photographs. The experiments demonstrate that the fusion approaches outperform the visual-features-only approach by 2-3% on average regardless of the operating point chosen, while all the performance measures remain approximately 4% below the upper limit of performance. The early fusion approach consistently improves all performance measures.
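    As a hedged sketch of the feature-level (early) fusion step, assuming edge-orientation and EXIF-derived feature matrices are computed upstream; the scikit-learn pipeline is an illustrative modern stand-in for the thesis's SVM integration, and the named metadata fields are examples only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_early_fusion_svm(edge_feats, exif_feats, labels):
    """Early fusion: concatenate visual (edge-orientation) features
    with camera-metadata features (e.g. exposure time, flash fired,
    focal length) and train a single SVM on the joint vector."""
    X = np.hstack([edge_feats, exif_feats])  # (n_images, d_vis + d_meta)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X, labels)
    return clf

# Late (decision-level) fusion would instead train one SVM per
# modality and combine their scores, e.g. by averaging probabilities.
```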

    A Training Assistant Tool for the Automated Visual Inspection System

    This thesis considers the problem of assisting a human user in setting up an automated Visual Inspection (VI) system. The VI system uses a stationary camera on an automobile assembly line to inspect cars as they pass by. The inspection process is intended to identify when parts have been missed or incorrect parts have been assembled; the result is reported to a human working on the assembly line, who can then take corrective action. As originally developed, the system requires a setup phase in which the human user places the camera and records a video of at least 30 minutes in length for training the system. Training includes specifying the regions of passing cars that are to be inspected. After deployment of a number of systems, it was learned that users could benefit from guidance on best practices for delineating training data. It was also learned that users could benefit from simple visual feedback to ascertain whether an inspection problem was suitable for a VI system or too challenging for it. This thesis describes several methods and a new software tool intended to address this need.