5 research outputs found

    Affordance segmentation of hand-occluded containers from exocentric images

    Full text link
    Visual affordance segmentation identifies the surfaces of an object an agent can interact with. Common challenges for the identification of affordances are the variety of the geometry and physical properties of these surfaces as well as occlusions. In this paper, we focus on occlusions of an object that is hand-held by a person manipulating it. To address this challenge, we propose an affordance segmentation model that uses auxiliary branches to process the object and hand regions separately. The proposed model learns affordance features under hand-occlusion by weighting the feature map through hand and object segmentation. To train the model, we annotated the visual affordances of an existing dataset with mixed-reality images of hand-held containers in third-person (exocentric) images. Experiments on both real and mixed-reality images show that our model achieves better affordance segmentation and generalisation than existing models. Comment: Paper accepted to the Workshop on Assistive Computer Vision and Robotics (ACVR) at the International Conference on Computer Vision (ICCV) 2023; 10 pages, 4 figures, 2 tables. Data, code, and trained models are available at https://apicis.github.io/projects/acanet.htm
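    A minimal sketch of the core idea, weighting a shared feature map with predicted hand and object segmentation masks before the affordance head (PyTorch; the module and tensor names below are illustrative, not the authors' released code):

```python
# Illustrative sketch of mask-weighted feature fusion; names are hypothetical, not the released ACANet code.
import torch
import torch.nn as nn

class MaskWeightedFusion(nn.Module):
    """Weights a shared feature map with hand and object segmentation probabilities
    before predicting per-pixel affordance classes."""
    def __init__(self, in_channels: int, num_affordances: int):
        super().__init__()
        self.head = nn.Conv2d(in_channels, num_affordances, kernel_size=1)

    def forward(self, features, hand_logits, object_logits):
        hand_prob = torch.sigmoid(hand_logits)    # (B, 1, H, W)
        obj_prob = torch.sigmoid(object_logits)   # (B, 1, H, W)
        # Emphasise the hand and object regions in the shared features,
        # so affordance predictions are conditioned on where the occluding hand is.
        weighted = features * obj_prob + features * hand_prob
        return self.head(weighted)                # per-pixel affordance logits
```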

    Emotion Recognition on Edge Devices: Training and Deployment

    No full text
    Emotion recognition, among other natural language processing tasks, has greatly benefited from the use of large transformer models. Deploying these models on resource-constrained devices, however, is a major challenge due to their computational cost. In this paper, we show that the combination of large transformers, as high-quality feature extractors, and simple hardware-friendly classifiers based on linear separators can achieve competitive performance while allowing real-time inference and fast training. Various solutions, including batch and Online Sequential Learning, are analyzed. Additionally, our experiments show that latency and performance can be further improved via dimensionality reduction and pre-training, respectively. The resulting system is implemented on two types of edge devices, namely an edge accelerator and two smartphones.
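    A minimal sketch of this pipeline, a frozen transformer used as a feature extractor feeding a simple linear separator, with optional dimensionality reduction (the model name, the logistic-regression classifier, and the toy data below are illustrative assumptions, not the paper's exact setup):

```python
# Sketch: frozen transformer as feature extractor + linear classifier.
# The encoder name, classifier choice, and toy data are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased").eval()

def embed(texts):
    """Fixed-size sentence embeddings (first-token state) from the frozen encoder."""
    with torch.no_grad():
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        return encoder(**batch).last_hidden_state[:, 0].numpy()

train_texts = ["I love this!", "This is terrible.", "I'm scared.", "What a surprise!"]
train_labels = [0, 1, 2, 3]                       # toy emotion labels

X = embed(train_texts)
pca = PCA(n_components=4).fit(X)                  # dimensionality reduction to cut latency
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X), train_labels)
print(clf.predict(pca.transform(embed(["I am so happy today"]))))
```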

    Arm-Container Affordance Network (model card and parameters)

    No full text
    [arXiv] [webpage] [code] [mixed-reality data]
    Arm-Container Affordance Network (ACANet) is an affordance segmentation model that uses auxiliary branches to focus on the object and hand regions separately. The model learns affordance features under hand-occlusion by weighting the feature map through hand and object segmentation. ACANet was designed and trained in a collaboration between the Smart Embedded Applications Laboratory of the University of Genoa (Italy) and the Centre for Intelligent Sensing of Queen Mary University of London (U.K.).
    Model date. V1.0.0, 27 May 2023 (note: this is the date the model was trained).
    Model type. ACANet is a UNet-like convolutional neural network with a ResNet encoder. The decoder is composed of 3 branches: one performs arm segmentation, one performs container segmentation, and one fuses the outputs of the other two branches with the features and performs arm and container affordance segmentation.
    Training setup. For ACANet, we use a linear combination of a Dice loss for the arm and container affordance segmentation branch, a binary cross-entropy loss with weight 1 for object segmentation, and a binary cross-entropy loss with weight 1 for arm segmentation. We set the batch size to 2 and the initial learning rate to 0.001, and we use mini-batch gradient descent as the optimizer with a momentum of 0.9 and a weight decay of 0.0001. We schedule the learning rate to decrease by a factor of 0.5 if the mean Intersection over Union on the validation set does not increase for 3 consecutive epochs. We use early stopping with a patience of 10 epochs to reduce overfitting, and set the maximum number of epochs to 100. We apply the following sequence of transformations: resize by a factor randomly sampled in the interval [1, 1.5] to avoid degrading quality; center crop the resized image with a W × H window to restore the original image resolution; and horizontal flip with a probability of 0.5 to simulate the other arm. We set the window size to W = H = 480.
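    A minimal sketch of how this training configuration could be expressed in PyTorch (the stand-in model, the Dice loss, and the augmentation helper are placeholders for illustration; this is not the released training code):

```python
# Sketch of the training configuration above; `model` is a stand-in, not ACANet.
import random
import torch
from torch import nn, optim
import torchvision.transforms.functional as TF

model = nn.Conv2d(3, 8, kernel_size=3, padding=1)   # stand-in for ACANet's encoder-decoder

optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)
# Halve the learning rate if validation mIoU does not improve for 3 consecutive epochs.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", factor=0.5, patience=3)
bce = nn.BCEWithLogitsLoss()

def dice_loss(logits, target, eps=1e-6):
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    return 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)

def total_loss(aff_logits, obj_logits, arm_logits, aff_gt, obj_gt, arm_gt):
    # Linear combination: Dice on the affordance branch, BCE (weight 1) on the object and arm branches.
    return dice_loss(aff_logits, aff_gt) + bce(obj_logits, obj_gt) + bce(arm_logits, arm_gt)

def augment(img):
    """Random up-scaling by a factor in [1, 1.5], center crop to 480x480, horizontal flip with p=0.5.
    In practice the same parameters must also be applied to the segmentation masks."""
    s = random.uniform(1.0, 1.5)
    img = TF.resize(img, [int(img.shape[-2] * s), int(img.shape[-1] * s)])
    img = TF.center_crop(img, [480, 480])
    return TF.hflip(img) if random.random() < 0.5 else img

# Each epoch (batch size 2): train, compute validation mIoU, then scheduler.step(val_miou);
# stop early if mIoU has not improved for 10 epochs, for at most 100 epochs.
```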
    Citation details. Affordance segmentation of hand-occluded containers from exocentric images. T. Apicella, A. Xompero, E. Ragusa, R. Berta, A. Cavallaro, P. Gastaldo. IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2023.
    @inproceedings{apicella2023affordance,
      title={Affordance segmentation of hand-occluded containers from exocentric images},
      author={Apicella, Tommaso and Xompero, Alessio and Ragusa, Edoardo and Berta, Riccardo and Cavallaro, Andrea and Gastaldo, Paolo},
      booktitle={IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
      year={2023},
    }
    License. Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
    Enquiries, Questions and Comments. For enquiries, questions, or comments, please contact Tommaso Apicella.
    Primary intended uses. The primary intended users of this model are academic researchers, scholars, and practitioners working in the fields of computer vision and robotics. The primary intended uses of ACANet are: assistive technologies for robotics and prosthetic applications (e.g., grasping, object manipulation) or collaborative human-robot scenarios (e.g., handovers); and a baseline for affordance segmentation.
    Out-of-scope use cases. Any application that requires a high degree of accuracy and/or real-time operation.
    Factors. The model was trained on the extended version of the CHOC dataset, which includes human forearms and hands with textures from the SURREAL dataset. Note that these textures vary widely in skin tones. Backgrounds include both indoor and outdoor settings. Factors that may influence performance are: cluttered background, lighting conditions, tablecloths with drawings, textured clothing, and object categories.
    Training Data.
    Datasets: mixed-reality training and validation sets from the CORSMAL Hand-Occluded Containers (CHOC) dataset, complemented with object affordance annotations.
    Motivation: mixed-reality datasets make it easy to scale the generation of a large number of images under different realistic backgrounds while varying the hand and object poses.
    Preprocessing: RGB images are normalised to the [0, 1] range and standardised using the [0.485, 0.456, 0.406] per-channel mean and [0.229, 0.224, 0.225] per-channel standard deviation. Images can have different resolutions, so we apply a square cropping window of fixed size to avoid distortions or padding. Assuming a perfect object detector, we crop a W × W window around the center of the bounding box obtained from the object mask annotation to restrict the visual field and obtain an object-centric view. However, the cropping window can exceed the image support if the bounding box is close to the image border; in this case, we extend the side of the window that lies inside the image support to avoid padding. If the bounding box is bigger than the cropping window, we crop the image inside the bounding box and resize it to the window size. W = 480 pixels (see the cropping sketch after this model card).
    Evaluation Data.
    Datasets: we evaluated the model on the following test sets. Mixed-reality: 2 testing sets, one containing 13,824 images and the other 17,280 images. CCM: 150 images sampled from the released training and public test sets. HO-3D: 150 images sampled with the objects box and mug.
    Motivation: the mixed-reality sets evaluate the model's generalisation to different backgrounds and different object instances; CCM and HO-3D present various challenges, such as the presence of the human body, real interactions, and different object instances and hand-object poses.
    Preprocessing: RGB images are normalised to the [0, 1] range and standardised using the [0.485, 0.456, 0.406] per-channel mean and [0.229, 0.224, 0.225] per-channel standard deviation. We used exactly the same cropping procedure as in training to evaluate the model on the mixed-reality testing sets. For the CCM and HO-3D testing sets, we considered the visible object segmentation mask to recover the bounding box and, consequently, the W × W window.
    Metrics.
    Model performance measures: precision measures the percentage of true positives among all positively predicted pixels; recall measures the percentage of true positive pixels with respect to the total number of positive pixels; the Jaccard index (Intersection over Union, IoU) measures the overlap between two regions with the same support.
    Decision thresholds: the object and arm segmentations are rounded to the nearest integer, so the output is 0 when the probability is below 0.5 and 1 when it is greater than or equal to 0.5 (see the metrics sketch after this model card).
    Quantitative Analyses. Provided in the paper. ACANet achieves better affordance segmentation and generalisation than existing models.
    Ethical Considerations. Even though the model is designed for assistive applications, it was not tested in real use cases with humans involved. A proper analysis of the risks should be conducted before employing the model in such applications.
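    A minimal sketch of the object-centric cropping described in the Preprocessing steps above (NumPy; the function name is hypothetical and the border handling is our reading of the description):

```python
# Illustrative sketch of the W x W object-centric crop; the resize fallback is left as a comment.
import numpy as np

def object_centric_crop(image, bbox, window=480):
    """image: H x W x 3 array; bbox: (x_min, y_min, x_max, y_max) from the object mask annotation."""
    h, w = image.shape[:2]
    x_min, y_min, x_max, y_max = bbox
    if (x_max - x_min) > window or (y_max - y_min) > window:
        # Bounding box larger than the window: crop inside the box, then resize to window x window.
        return image[y_min:y_max, x_min:x_max]   # resize to (window, window) with e.g. OpenCV
    cx, cy = (x_min + x_max) // 2, (y_min + y_max) // 2
    # Keep the window inside the image support instead of padding:
    # if it would cross a border, extend (shift) it towards the opposite side.
    x0 = int(np.clip(cx - window // 2, 0, max(w - window, 0)))
    y0 = int(np.clip(cy - window // 2, 0, max(h - window, 0)))
    return image[y0:y0 + window, x0:x0 + window]
```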
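    And a short sketch of the per-pixel measures and the 0.5 decision threshold for one binary mask (NumPy; names are illustrative):

```python
import numpy as np

def segmentation_metrics(prob, gt, thr=0.5):
    """prob: predicted probabilities in [0, 1]; gt: binary ground-truth mask of the same shape."""
    pred = (prob >= thr).astype(np.uint8)                   # probabilities >= 0.5 are rounded to 1
    tp = np.logical_and(pred == 1, gt == 1).sum()
    fp = np.logical_and(pred == 1, gt == 0).sum()
    fn = np.logical_and(pred == 0, gt == 1).sum()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0    # Jaccard index
    return precision, recall, iou
```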

    Electroencephalography correlates of fear of heights in a virtual reality environment

    No full text
    An electroencephalography (EEG)-based classification system for three levels of fear of heights is proposed. A virtual reality (VR) scenario representing a canyon was exploited to gradually expose the subjects to fear-inducing stimuli of increasing intensity. An elevating platform allowed the subjects to reach three different height levels. Psychometric tools were employed to initially assess the severity of fear of heights and to assess the effectiveness of fear induction. A feasibility study was conducted on eight subjects who underwent three experimental sessions. The EEG signals were acquired through a 32-channel headset during exposure to the eliciting VR scenario. The main EEG bands and scalp regions were explored to identify which are most affected by fear of heights. The gamma band, followed by the high-beta band, and the frontal area of the scalp proved the most significant. The average accuracies for the three-class fear classification task were computed in the within-subject case. The frontal region of the scalp was particularly relevant, and an average accuracy of (68.20 ± 11.60) % was achieved using the absolute powers in the five EEG bands as features. Considering the frontal region only, the most significant EEG bands were the high-beta and gamma bands, achieving accuracies of (57.90 ± 10.10) % and (61.30 ± 8.43) %, respectively. Sequential Feature Selection (SFS) confirmed these results: over the whole set of channels it selected the gamma band in 48.26 % of the cases and the high-beta band in 22.92 %, and achieved an average accuracy of (86.10 ± 8.29) %.
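    As a hedged illustration of the feature pipeline described above (the band edges, channel selection, sampling rate, and classifier below are assumptions, not the study's exact protocol), absolute band powers can be computed from Welch spectra and fed to a simple classifier:

```python
# Sketch: absolute EEG band powers (Welch PSD) + a simple 3-class classifier.
# Band edges, the channel selection, fs, and the SVM are illustrative assumptions.
import numpy as np
from scipy.signal import welch
from scipy.integrate import simpson
from sklearn.svm import SVC

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "high_beta": (20, 30), "gamma": (30, 45)}

def band_powers(eeg, fs=128):
    """eeg: (n_channels, n_samples) for one epoch. Returns absolute power per channel and band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2, axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        idx = (freqs >= lo) & (freqs < hi)
        feats.append(simpson(psd[:, idx], x=freqs[idx], axis=-1))  # integrate PSD over the band
    return np.concatenate(feats)     # one feature vector per epoch

# X: band-power features per EEG epoch (optionally frontal channels only), y: fear level in {0, 1, 2}
# clf = SVC(kernel="linear").fit(X_train, y_train); acc = clf.score(X_test, y_test)
```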