The color of smiling: computational synaesthesia of facial expressions
This note gives a preliminary account of the transcoding or rechanneling
problem between different stimuli as it is of interest for the natural
interaction or affective computing fields. By the consideration of a simple
example, namely the color response of an affective lamp to a sensed facial
expression, we frame the problem within an information-theoretic perspective.
A full justification in terms of the Information Bottleneck principle promotes
a latent affective space, hitherto surmised as an appealing and intuitive
solution, as a suitable mediator between the different stimuli.
Comment: Submitted to: 18th International Conference on Image Analysis and
Processing (ICIAP 2015), 7-11 September 2015, Genova, Ital
Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns
First-person stories can be analyzed by means of egocentric pictures acquired
throughout the whole active day with wearable cameras. This manuscript presents
an egocentric dataset with more than 45,000 pictures from four people in
different environments such as working or studying. All the images were
manually labeled to identify three patterns of interest regarding people's
lifestyle: socializing, eating and sedentary. Additionally, two different
approaches are proposed to classify egocentric images into one of the 12 target
categories defined to characterize these three patterns. The approaches are
based on machine learning and deep learning techniques, including traditional
classifiers and state-of-the-art convolutional neural networks. The experimental
results obtained when applying these methods to the egocentric dataset
demonstrated their adequacy for the problem at hand.
Comment: Accepted at First International Workshop on Social Signal Processing
and Beyond, 19th International Conference on Image Analysis and Processing
(ICIAP), September 201
Efficient moving point handling for incremental 3D manifold reconstruction
As incremental Structure from Motion algorithms become effective, a good
sparse point cloud representing the map of the scene becomes available
frame-by-frame. From the 3D Delaunay triangulation of these points,
state-of-the-art algorithms build a rough manifold model of the scene. These
algorithms incrementally integrate new points into the 3D reconstruction only if
their position estimate does not change. Indeed, whenever a point moves in a 3D
Delaunay triangulation, for instance because its estimate gets refined, a set
of tetrahedra has to be removed and replaced with new ones to maintain the
Delaunay property; managing the manifold reconstruction thus becomes
complex and entails a potentially large overhead. In this paper we investigate
different approaches and we propose an efficient policy to deal with moving
points in the manifold estimation process. We tested our approach with four
sequences of the KITTI dataset and we show the effectiveness of our proposal in
comparison with state-of-the-art approaches.
Comment: Accepted in International Conference on Image Analysis and Processing
(ICIAP 2015
A topological approach for segmenting human body shape
Segmentation of a 3D human body is a very challenging problem in applications exploiting human scan data. To tackle this problem, the paper proposes a topological approach based on the discrete Reeb graph (DRG), which is an extension of the classical Reeb graph to handle unorganized clouds of 3D points. The essence of the approach concerns detecting critical nodes in the DRG, thereby permitting the extraction of branches that represent parts of the body. Because the human body shape representation is built upon global topological features that are preserved so long as the whole structure of the human body does not change, our approach is quite robust against noise, holes, irregular sampling, frame change and posture variation. Experimental results performed on real scan data demonstrate the validity of our method.
ART Neural Networks for Remote Sensing Image Analysis
ART and ARTMAP neural networks for adaptive recognition and prediction have been applied to a variety of problems, including automatic mapping from remote sensing satellite measurements, parts design retrieval at the Boeing Company, medical database prediction, and robot vision. This paper features a self-contained introduction to ART and ARTMAP dynamics. An application of these networks to image processing is illustrated by means of a remote sensing example. The basic ART and ARTMAP networks feature winner-take-all (WTA) competitive coding, which groups inputs into discrete recognition categories. WTA coding in these networks enables fast learning, which allows the network to encode important rare cases but which may lead to inefficient category proliferation with noisy training inputs. This problem is partially solved by ART-EMAP, which uses WTA coding for learning but distributed category representations for test-set prediction. Recently developed ART models (dART and dARTMAP) retain stable coding, recognition, and prediction, but allow arbitrarily distributed category representation during learning as well as performance.
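As a rough illustration of the winner-take-all coding and fast learning mentioned in the abstract above, the following minimal sketch uses the Fuzzy ART variant (complement coding, choice function, vigilance test). The choice of Fuzzy ART, the class name `FuzzyART`, and the parameter values are assumptions made here for illustration; the abstract does not specify them, and this is not the paper's implementation.

```python
def complement_code(x):
    """Complement-code a feature vector whose values lie in [0, 1]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(u, v) for u, v in zip(a, b)]


class FuzzyART:
    """Hypothetical minimal Fuzzy ART clusterer with fast learning."""

    def __init__(self, vigilance=0.75, alpha=0.001):
        self.vigilance = vigilance  # match threshold (rho), assumed value
        self.alpha = alpha          # choice parameter, assumed value
        self.weights = []           # one weight vector per category

    def train_one(self, x):
        """Present one input; return the index of the category that codes it."""
        i = complement_code(x)
        norm_i = sum(i)

        def choice(j):
            w = self.weights[j]
            return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

        # Winner-take-all: try categories in order of decreasing choice value.
        for j in sorted(range(len(self.weights)), key=choice, reverse=True):
            match = sum(fuzzy_and(i, self.weights[j])) / norm_i
            if match >= self.vigilance:
                # Fast learning: the winning weight jumps to I AND w in one step.
                self.weights[j] = fuzzy_and(i, self.weights[j])
                return j
        # No category passed the vigilance test: commit a new one.
        self.weights.append(i)
        return len(self.weights) - 1


net = FuzzyART(vigilance=0.75)
a = net.train_one([0.10, 0.10])
b = net.train_one([0.12, 0.10])  # close to the first input
c = net.train_one([0.90, 0.90])  # far from both
print(a, b, c, len(net.weights))
```

With noisy inputs, each vigilance failure commits a new category, which is one way to picture the category proliferation the abstract describes.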
Scraping social media photos posted in Kenya and elsewhere to detect and analyze food types
Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56 million images over a period of 20 days in March 2019. We also propose a scrape-by-keywords methodology and used it to scrape ~30,000 images and their captions of 38 Kenyan food types. We publish two datasets of 104,000 and 8,174 image/caption pairs, respectively. With the first dataset, Kenya104K, we train a Kenyan Food Classifier, called KenyanFC, to distinguish Kenyan food from non-food images posted in
Kenya. We used the second dataset, KenyanFood13, to train a classifier KenyanFTR, short for Kenyan Food Type Recognizer, to recognize 13 popular food types in Kenya. The KenyanFTR is a multimodal deep neural network that can identify 13 types of Kenyan foods using both images and their corresponding captions. Experiments show that the average top-1 accuracy of KenyanFC is 99% over 10,400 tested Instagram images and that of KenyanFTR is 81% over 8,174 tested data points. Ablation studies show that three of the 13 food types are particularly difficult to categorize based on image content only and that adding analysis of captions to the image analysis yields a classifier that is 9 percentage points more accurate than a classifier that relies only on images. Our food trend analysis revealed that cakes and roasted meats were the most popular foods in photographs on Instagram in Kenya in March 2019.
Accepted manuscrip
OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data
The inexorable growth of online shopping and e-commerce demands scalable and
robust machine learning-based solutions to accommodate customer requirements.
In the context of automatic tagging classification and multimodal retrieval,
prior works either defined supervised learning approaches that generalize poorly or
more reusable CLIP-based techniques that were, however, trained on closed-source
data. In this work, we propose OpenFashionCLIP, a vision-and-language
contrastive learning method that only adopts open-source fashion data stemming
from diverse domains, and characterized by varying degrees of specificity. Our
approach is extensively validated across several tasks and benchmarks, and
experimental results highlight a significant out-of-domain generalization
capability and consistent improvements over state-of-the-art methods both in
terms of accuracy and recall. Source code and trained models are publicly
available at: https://github.com/aimagelab/open-fashion-clip.
Comment: International Conference on Image Analysis and Processing (ICIAP)
202
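For readers unfamiliar with vision-and-language contrastive learning, the sketch below shows the symmetric contrastive loss commonly associated with CLIP-style training: matching image/text pairs sit on the diagonal of a similarity matrix and are scored against all mismatched pairs in the batch. The function names, the temperature value, and the toy embeddings are assumptions for illustration only; the abstract does not state the exact loss used by OpenFashionCLIP.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss.

    temperature=0.07 is an assumed, commonly used value.
    """
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    # Cosine similarities scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits_t))


# Toy batch: aligned pairs should give a lower loss than mismatched pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_style_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_style_loss(images, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In practice the embeddings would come from trained image and text encoders; the toy vectors here only show how the loss rewards correct pairings.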