8 research outputs found

    ARTYCUL: A Privacy-Preserving ML-Driven Framework to Determine the Popularity of a Cultural Exhibit on Display.

    We present ARTYCUL (ARTifact popularitY for CULtural heritage), a machine learning (ML)-based framework that graphically represents the footfall around an artifact on display at a museum or a heritage site. The framework is motivated by the now-universal presence of security cameras, including at sites of cultural heritage. ARTYCUL uses the video streams of closed-circuit television (CCTV) cameras installed on such premises to detect human figures; their coordinates with respect to the camera frames are used to visualize the density of visitors around specific display items. A framework that can display the popularity of artifacts would help curators organize exhibits more effectively, and it could also reveal whether a display item is being neglected because of poor placement. Items of similar interest can be placed in the vicinity of each other, and an online recommendation system may also use the reputation of an artifact to catch the eye of visitors. Artificial intelligence-based solutions are well suited to the analysis of Internet of Things (IoT) traffic because of the inherent veracity and volatile nature of the transmissions. The development of ARTYCUL provided deeper insight into applications of IoT technology in the cultural heritage domain and into the suitability of ML for processing real-time data at a fast pace. We also observed common issues that hinder the adoption of IoT in the cultural domain; the proposed framework was designed with these obstacles, and a preference for backward compatibility, in mind.
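The pipeline the abstract outlines — detect people in CCTV frames, then turn their frame coordinates into a visitor-density view around a display item — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the person detector is abstracted away, and the grid size, frame dimensions, exhibit cell, and detection coordinates are all invented for the example.

```python
from collections import Counter

def density_grid(detections, frame_w, frame_h, cells=8):
    """Bin person-detection centers (pixel coordinates) into a coarse
    cells x cells occupancy grid covering the camera frame."""
    grid = Counter()
    for x, y in detections:
        cx = min(int(x * cells / frame_w), cells - 1)
        cy = min(int(y * cells / frame_h), cells - 1)
        grid[(cx, cy)] += 1
    return grid

def footfall_near(grid, cell, radius=1):
    """Sum detections in grid cells within `radius` of the cell an
    exhibit occupies -- a crude popularity score for that exhibit."""
    ex, ey = cell
    return sum(n for (cx, cy), n in grid.items()
               if abs(cx - ex) <= radius and abs(cy - ey) <= radius)

# Illustrative detection centers from a sequence of CCTV frames.
detections = [(120, 200), (130, 210), (125, 195), (600, 400)]
grid = density_grid(detections, frame_w=640, frame_h=480)
print(footfall_near(grid, cell=(1, 3)))  # -> 3
```

Accumulating such grids over time, and rendering them as a heatmap, would give the graphical popularity view the framework describes.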

    Abordagens multiescala para descrição de textura (Multiscale Approaches for Texture Description)

    Advisors: Hélio Pedrini, William Robson Schwartz. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação; Master's in Computer Science. Abstract: Computer vision and image processing techniques play an important role in several fields, including object detection and image classification, tasks with applications in medical imagery, remote sensing, forensic analysis, and skin detection, among others. These tasks depend strongly on visual information extracted from images that can describe them efficiently. Texture is one of the main properties used to describe information such as spatial distribution, brightness, and the structural arrangement of surfaces. For image recognition and classification, a large set of texture descriptors was investigated in this work, of which only a small fraction is actually multi-scale. Gray-level co-occurrence matrices (GLCM) are widely used in the literature and known to be an effective texture descriptor. However, the descriptor only discriminates information at a single scale, that is, the original image. Scales can offer important information in image analysis, since texture can be perceived as different patterns at distinct scales. Accordingly, two strategies for extending the GLCM to multiple scales are presented: (i) a Gaussian scale-space representation, constructed by smoothing the image with a low-pass filter, and (ii) an image pyramid, defined by sampling the image in both space and scale. The proposed descriptor is evaluated against other descriptors on different data sets and is then applied in a skin-detection context as a means of improving the accuracy of the detection process. Experimental results show that the multi-scale GLCM extension yields considerable improvements on the tested data sets, outperforming many other feature descriptors, including the original single-scale GLCM.
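The image-pyramid strategy (ii) can be sketched in a few lines: compute a GLCM at each pyramid level and concatenate the results into one multi-scale feature vector. This is a minimal sketch with an illustrative 4-level gray image and a single (1, 0) offset; the dissertation's actual descriptor also uses further offsets and a Gaussian scale-space variant.

```python
def glcm(img, dx=1, dy=0, levels=4):
    """Gray-level co-occurrence matrix for one pixel offset."""
    h, w = len(img), len(img[0])
    m = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                m[img[y][x]][img[y2][x2]] += 1
    return m

def downsample(img):
    """One pyramid level: average (with truncation) over 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*y][2*x] + img[2*y][2*x+1] +
              img[2*y+1][2*x] + img[2*y+1][2*x+1]) // 4
             for x in range(w)] for y in range(h)]

def multiscale_glcm(img, scales=2):
    """Concatenate GLCMs computed on successive pyramid levels."""
    feats = []
    for _ in range(scales):
        feats.extend(v for row in glcm(img) for v in row)
        img = downsample(img)
    return feats

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [2, 2, 3, 3],
       [2, 2, 3, 3]]
feat = multiscale_glcm(img, scales=2)
print(len(feat))  # two 4x4 co-occurrence matrices -> 32 values
```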

    The Revisiting Problem in Simultaneous Localization and Mapping: A Survey on Visual Loop Closure Detection

    Where am I? This is one of the most critical questions an intelligent system must answer to decide whether it is navigating a previously visited area. The problem has long been acknowledged as challenging in simultaneous localization and mapping (SLAM), where the robot must correctly associate incoming sensory data with its database to generate a consistent map. Significant advances in computer vision over the last 20 years, increased computational power, and the growing demand for long-term exploration have made it possible to perform this complex task efficiently with inexpensive perception sensors. In this article, visual loop closure detection, which formulates a solution based solely on appearance input data, is surveyed. We start by briefly introducing place recognition and SLAM concepts in robotics. Then, we describe the structure of a loop closure detection system, covering an extensive collection of topics, including feature extraction, environment representation, the decision-making step, and the evaluation process. We conclude by discussing open and new research challenges, particularly robustness in dynamic environments, computational complexity, and scalability in long-term operations. The article aims to serve as a tutorial and a position paper for newcomers to visual loop closure detection. Comment: 25 pages, 15 figures.
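The decision-making step described above often reduces to a similarity search over appearance descriptors — e.g. bag-of-visual-words histograms — with the most recent frames excluded so the robot does not trivially match its immediate past. A minimal sketch under that assumption; the histograms, acceptance threshold, and exclusion window are all illustrative, not taken from the survey.

```python
import math

def cosine(a, b):
    """Cosine similarity between two appearance histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def detect_loop(query, database, threshold=0.9, exclude_recent=2):
    """Return the index of the most similar stored frame whose
    similarity exceeds the threshold, ignoring the most recent
    frames; None means no loop closure is declared."""
    best, best_sim = None, threshold
    candidates = database[:max(0, len(database) - exclude_recent)]
    for i, hist in enumerate(candidates):
        sim = cosine(query, hist)
        if sim >= best_sim:
            best, best_sim = i, sim
    return best

# Illustrative bag-of-words histograms: the query resembles frame 0.
database = [[5, 0, 1], [0, 5, 0], [1, 1, 5], [4, 1, 1]]
print(detect_loop([5, 1, 1], database))  # -> 0
```

Real systems add temporal consistency checks and geometric verification before accepting a match; this sketch covers only the similarity decision.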

    Compound Models for Vision-Based Pedestrian Recognition

    This thesis addresses the problem of recognizing pedestrians in video images acquired from a moving camera in real-world cluttered environments. Instead of focusing on the development of novel feature primitives or pattern classifiers, we follow an orthogonal direction and develop feature- and classifier-independent compound techniques which integrate complementary information from multiple image-based sources with the objective of improved pedestrian classification performance. After establishing a performance baseline in terms of a thorough experimental study on monocular pedestrian recognition, we investigate the use of multiple cues at the module level. A motion-based focus-of-attention stage is proposed based on a learned probabilistic pedestrian-specific model of motion features. The model is used to generate pedestrian localization hypotheses for subsequent shape- and texture-based classification modules. In the remainder of this work, we focus on integrating complementary information directly into the pattern classification step. We present a combination of shape and texture information by means of pose-specific generative shape and texture models. The generative models are integrated with discriminative classification models by utilizing synthesized virtual pedestrian training samples from the former to enhance the classification performance of the latter. Both models are linked using Active Learning to guide the training process towards informative samples. A multi-level mixture-of-experts classification framework is proposed which involves local pose-specific expert classifiers operating on multiple image modalities and features. In terms of image modalities, we consider gray-level intensity, depth cues derived from dense stereo vision, and motion cues arising from dense optical flow. We furthermore employ shape-based, gradient-based, and texture-based features. 
The mixture-of-experts formulation compares favorably to joint-space approaches in terms of performance and practical feasibility. Finally, we extend this mixture-of-experts framework with multi-cue partial occlusion handling and the estimation of pedestrian body orientation. Our occlusion model examines occlusion boundaries, which manifest as discontinuities in depth and motion space. Occlusion-dependent weights, which relate to the visibility of certain body parts, focus the decision on unoccluded body components. We further apply the pose-specific nature of our mixture-of-experts framework to estimating the density of pedestrian body orientation from single images, again integrating shape and texture information. Throughout this work, particular emphasis is laid on thorough performance evaluation, both regarding methodology and competitive real-world datasets. Several datasets used in this thesis are made publicly available for benchmarking purposes. Our results indicate significant performance boosts over the state of the art for all aspects considered in this thesis, i.e., pedestrian recognition, partial occlusion handling, and body orientation estimation. The pedestrian recognition performance in particular is considerably advanced; false detections at constant detection rates are reduced by significantly more than an order of magnitude.
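The occlusion-dependent weighting idea — focusing the mixture-of-experts decision on visible body components — can be illustrated with a toy fusion rule. The part names, scores, and visibility weights below are invented for the example; the thesis derives its weights from discontinuities in depth and motion, and each part expert itself fuses several modalities.

```python
def moe_score(expert_scores, visibility):
    """Fuse per-body-part expert scores with occlusion-dependent
    weights: parts judged occluded contribute less to the decision."""
    total_w = sum(visibility.values())
    if total_w == 0:
        return 0.0
    return sum(expert_scores[p] * w for p, w in visibility.items()) / total_w

# Illustrative: a pedestrian whose legs are occluded. The leg expert's
# low score is down-weighted instead of vetoing the detection.
scores = {"head": 0.9, "torso": 0.8, "legs": 0.1}
vis    = {"head": 1.0, "torso": 1.0, "legs": 0.2}
print(round(moe_score(scores, vis), 3))  # -> 0.782
```

With uniform visibility the same inputs would average to 0.6; the occlusion-dependent weights let the visible head and torso dominate.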

    Robust CoHOG Feature Extraction in Human-Centered Image/Video Management System

    Many human-centered image and video management systems depend on robust human detection. To extract robust features for human detection, this paper investigates the following shortcomings of co-occurrence histograms of oriented gradients (CoHOG), which significantly limit its advantages: 1) the magnitudes of the gradients are discarded, and only the orientations are used; 2) the gradients are not smoothed, so aliasing effects exist; and 3) the dimensionality of the CoHOG feature vector is very large (e.g., 200,000). To deal with these problems, we propose a framework that 1) utilizes a novel gradient decomposition and combination strategy to make full use of the gradient information; 2) adopts a two-stage gradient smoothing scheme to perform efficient gradient interpolation; and 3) employs incremental principal component analysis to reduce the large dimensionality of the CoHOG features. Experimental results on two different human databases demonstrate the effectiveness of the proposed method.
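Shortcoming 1) — discarding gradient magnitudes — can be illustrated by a co-occurrence histogram that weights each orientation pair by the product of the two gradient magnitudes instead of a plain count. This is a toy sketch of that one idea, not the paper's gradient decomposition-and-combination strategy; the image, bin count, and offset are illustrative.

```python
import math

def cohog_cell(img, bins=4, dx=1, dy=0):
    """Magnitude-weighted co-occurrence histogram of quantized
    gradient orientations for a single pixel offset."""
    h, w = len(img), len(img[0])
    ori = [[0] * w for _ in range(h)]
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag[y][x] = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % math.pi    # unsigned orientation
            ori[y][x] = min(int(ang / math.pi * bins), bins - 1)
    hist = [[0.0] * bins for _ in range(bins)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            x2, y2 = x + dx, y + dy
            if 1 <= x2 < w - 1 and 1 <= y2 < h - 1:
                # weight the pair by both magnitudes, rather than +1
                hist[ori[y][x]][ori[y2][x2]] += mag[y][x] * mag[y2][x2]
    return hist

# A vertical step edge: all interior gradients are horizontal (bin 0).
edge = [[0, 0, 10, 10] for _ in range(4)]
hist = cohog_cell(edge)
print(hist[0][0])  # -> 200.0
```

Plain CoHOG would record a count of 2 in that bin; the magnitude weighting lets strong edges dominate weak noise gradients.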