1,600 research outputs found

    The color of smiling: computational synaesthesia of facial expressions

    This note gives a preliminary account of the transcoding, or rechanneling, problem between different stimuli, as it is of interest to the natural interaction and affective computing fields. By considering a simple example, namely the color response of an affective lamp to a sensed facial expression, we frame the problem within an information-theoretic perspective. A full justification in terms of the Information Bottleneck principle promotes a latent affective space, hitherto surmised as an appealing and intuitive solution, as a suitable mediator between the different stimuli. Comment: Submitted to: 18th International Conference on Image Analysis and Processing (ICIAP 2015), 7-11 September 2015, Genova, Italy
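
For reference, the Information Bottleneck principle named in this abstract is usually stated as the trade-off below. Reading $X$ as the sensed facial expression, $Y$ as the lamp's color response, and $T$ as the latent affective space is an assumption made here for illustration; the paper's own notation may differ.

```latex
% Standard Information Bottleneck objective:
% compress X into T while keeping T informative about Y.
% I(.;.) is mutual information; beta > 0 sets the trade-off.
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```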

    Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns

    First-person stories can be analyzed by means of egocentric pictures acquired throughout the whole active day with wearable cameras. This manuscript presents an egocentric dataset with more than 45,000 pictures from four people in different environments such as working or studying. All the images were manually labeled to identify three patterns of interest regarding people's lifestyle: socializing, eating and sedentary. Additionally, two different approaches are proposed to classify egocentric images into one of the 12 target categories defined to characterize these three patterns. The approaches are based on machine learning and deep learning techniques, including traditional classifiers and state-of-the-art convolutional neural networks. The experimental results obtained when applying these methods to the egocentric dataset demonstrated their adequacy for the problem at hand. Comment: Accepted at First International Workshop on Social Signal Processing and Beyond, 19th International Conference on Image Analysis and Processing (ICIAP), September 201

    Efficient moving point handling for incremental 3D manifold reconstruction

    As incremental Structure from Motion algorithms become effective, a good sparse point cloud representing the map of the scene becomes available frame by frame. From the 3D Delaunay triangulation of these points, state-of-the-art algorithms build a rough manifold model of the scene. These algorithms incrementally integrate new points into the 3D reconstruction only if their position estimate does not change. Indeed, whenever a point moves in a 3D Delaunay triangulation, for instance because its estimate gets refined, a set of tetrahedra has to be removed and replaced with new ones to maintain the Delaunay property; managing the manifold reconstruction thus becomes complex and entails a potentially large overhead. In this paper we investigate different approaches and propose an efficient policy to deal with moving points in the manifold estimation process. We tested our approach on four sequences of the KITTI dataset and show the effectiveness of our proposal in comparison with state-of-the-art approaches. Comment: Accepted in International Conference on Image Analysis and Processing (ICIAP 2015)

    A topological approach for segmenting human body shape

    Segmentation of a 3D human body is a very challenging problem in applications exploiting human scan data. To tackle this problem, the paper proposes a topological approach based on the discrete Reeb graph (DRG), which is an extension of the classical Reeb graph that handles unorganized clouds of 3D points. The essence of the approach is detecting critical nodes in the DRG, thereby permitting the extraction of branches that represent parts of the body. Because the human body shape representation is built upon global topological features that are preserved so long as the whole structure of the human body does not change, our approach is quite robust against noise, holes, irregular sampling, frame change and posture variation. Experimental results on real scan data demonstrate the validity of our method.

    ART Neural Networks for Remote Sensing Image Analysis

    ART and ARTMAP neural networks for adaptive recognition and prediction have been applied to a variety of problems, including automatic mapping from remote sensing satellite measurements, parts design retrieval at the Boeing Company, medical database prediction, and robot vision. This paper features a self-contained introduction to ART and ARTMAP dynamics. An application of these networks to image processing is illustrated by means of a remote sensing example. The basic ART and ARTMAP networks feature winner-take-all (WTA) competitive coding, which groups inputs into discrete recognition categories. WTA coding in these networks enables fast learning, which allows the network to encode important rare cases but which may lead to inefficient category proliferation with noisy training inputs. This problem is partially solved by ART-EMAP, which uses WTA coding for learning but distributed category representations for test-set prediction. Recently developed ART models (dART and dARTMAP) retain stable coding, recognition, and prediction, but allow arbitrarily distributed category representation during learning as well as performance.
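
The winner-take-all coding and fast learning described in this abstract can be illustrated with a minimal sketch. It assumes the fuzzy ART variant (complement coding, a choice function, and a vigilance test), which the abstract does not name; the class name `FuzzyART` and the parameter values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def fuzzy_and(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(x, y) for x, y in zip(a, b)]


class FuzzyART:
    """Illustrative fuzzy ART with winner-take-all coding and fast learning."""

    def __init__(self, vigilance: float = 0.75, alpha: float = 0.001) -> None:
        self.vigilance = vigilance  # minimum match needed to accept a category
        self.alpha = alpha          # small choice parameter
        self.weights: List[List[float]] = []  # one weight vector per category

    @staticmethod
    def complement_code(x: Sequence[float]) -> List[float]:
        """Append 1 - x so that every coded input has the same L1 norm."""
        return list(x) + [1.0 - v for v in x]

    def _choice(self, i: Sequence[float], j: int) -> float:
        w = self.weights[j]
        return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

    def train(self, x: Sequence[float]) -> int:
        """Present one input in [0, 1]^d and return the index of its category."""
        i = self.complement_code(x)
        # Winner-take-all search: try categories from highest choice value down.
        order = sorted(range(len(self.weights)),
                       key=lambda j: self._choice(i, j), reverse=True)
        for j in order:
            match = sum(fuzzy_and(i, self.weights[j])) / sum(i)
            if match >= self.vigilance:
                # Fast learning: the winner's weights jump to the fuzzy AND.
                self.weights[j] = fuzzy_and(i, self.weights[j])
                return j
        # No category passed the vigilance test: commit a new one.
        self.weights.append(i)
        return len(self.weights) - 1
```

In this sketch a single distant (or noisy) input immediately commits a new category, which is the proliferation effect the abstract mentions; two nearby inputs such as `[0.1, 0.1]` and `[0.12, 0.1]` share one category at vigilance 0.75.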

    Scraping social media photos posted in Kenya and elsewhere to detect and analyze food types

    Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56 million images over a period of 20 days in March 2019. We also propose a scrape-by-keywords methodology and used it to scrape ~30,000 images and their captions of 38 Kenyan food types. We publish two datasets of 104,000 and 8,174 image/caption pairs, respectively. With the first dataset, Kenya104K, we train a Kenyan Food Classifier, called KenyanFC, to distinguish Kenyan food from non-food images posted in Kenya. We used the second dataset, KenyanFood13, to train a classifier KenyanFTR, short for Kenyan Food Type Recognizer, to recognize 13 popular food types in Kenya. The KenyanFTR is a multimodal deep neural network that can identify 13 types of Kenyan foods using both images and their corresponding captions. Experiments show that the average top-1 accuracy of KenyanFC is 99% over 10,400 tested Instagram images and that of KenyanFTR is 81% over 8,174 tested data points. Ablation studies show that three of the 13 food types are particularly difficult to categorize based on image content only and that adding analysis of captions to the image analysis yields a classifier that is 9 percentage points more accurate than a classifier that relies only on images. Our food trend analysis revealed that cakes and roasted meats were the most popular foods in photographs on Instagram in Kenya in March 2019. Accepted manuscript

    OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

    The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements. In the context of automatic tagging classification and multimodal retrieval, prior works either defined a poorly generalizable supervised learning approach or used more reusable CLIP-based techniques that were, however, trained on closed-source data. In this work, we propose OpenFashionCLIP, a vision-and-language contrastive learning method that adopts only open-source fashion data stemming from diverse domains and characterized by varying degrees of specificity. Our approach is extensively validated across several tasks and benchmarks, and experimental results highlight a significant out-of-domain generalization capability and consistent improvements over state-of-the-art methods in terms of both accuracy and recall. Source code and trained models are publicly available at: https://github.com/aimagelab/open-fashion-clip. Comment: International Conference on Image Analysis and Processing (ICIAP) 202
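
As a rough illustration of the vision-and-language contrastive learning mentioned in this abstract, the sketch below computes a symmetric CLIP-style contrastive loss over a small batch of paired embeddings. It is not taken from the OpenFashionCLIP code; the function names and the temperature value of 0.07 are assumptions made for illustration, and plain Python lists stand in for real encoder outputs.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_style_loss(image_emb: List[List[float]],
                    text_emb: List[List[float]],
                    temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: pair i of images and texts is the positive."""
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    # Similarity matrix: rows index images, columns index texts.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits_t))
```

The loss is small when each image embedding is closest to its own caption embedding and large when the pairs are mismatched, which is the signal that pulls matching image/text pairs together during training.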