
    Video Registration in Egocentric Vision under Day and Night Illumination Changes

    With the spread of wearable devices and head-mounted cameras, a wide range of applications requiring precise user localization is now possible. In this paper we propose to treat the problem of obtaining the user's position with respect to a known environment as a video registration problem. Video registration, i.e. the task of aligning an input video sequence to a pre-built 3D model, relies on matching local keypoints extracted from the query sequence to a 3D point cloud. The overall registration performance is strictly tied to the quality of this 2D-3D matching and can degrade under environmental changes such as the steep differences in lighting between day and night. To effectively register an egocentric video sequence under these conditions, we propose to tackle the source of the problem: the matching process. To overcome the shortcomings of standard matching techniques, we introduce a novel embedding space that allows us to obtain robust matches by jointly taking into account local descriptors, their spatial arrangement and their temporal robustness. The proposal is evaluated on unconstrained egocentric video sequences, both in terms of matching quality and of the resulting registration performance, using different 3D models of historical landmarks. The results show that the proposed method can outperform state-of-the-art registration algorithms, in particular when dealing with the challenges of night and day sequences.
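
    To make the 2D-3D matching step concrete, the sketch below shows a plain nearest-neighbour matching baseline with a ratio test; it is only an illustration of the matching process the paper improves upon, not the paper's joint embedding of descriptors, spatial arrangement and temporal robustness, and all names are hypothetical.

```python
# Illustrative baseline for 2D-3D matching (not the paper's embedding space):
# match 2D query descriptors against descriptors attached to 3D points using
# nearest-neighbour search with Lowe's ratio test to discard ambiguous matches.
import numpy as np

def match_2d_3d(query_desc, model_desc, ratio=0.8):
    """query_desc: (Nq, D) descriptors from the current frame.
    model_desc: (Nm, D) descriptors associated with 3D points.
    Returns a list of (query_idx, model_idx) putative matches."""
    matches = []
    for i, q in enumerate(query_desc):
        d = np.linalg.norm(model_desc - q, axis=1)   # distances to all 3D-point descriptors
        j1, j2 = np.argsort(d)[:2]                   # two nearest neighbours
        if d[j1] < ratio * d[j2]:                    # ratio test keeps only distinctive matches
            matches.append((i, j1))
    return matches
```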

    Su un frammento di Sallustio

    The fortune (and misfortune) of Lipsius' conjecture on Sallust, fr. II, fr. 77 Maur., shows how a fragment or quotation may sometimes tempt the reader to consider it, even against the evidence, a complete sentence.

    A Data-Driven Approach for Tag Refinement and Localization in Web Videos

    Tagging of visual content is becoming more and more widespread as web-based services and social networks have popularized tagging functionalities among their users. These user-generated tags are used to ease browsing and exploration of media collections, e.g. using tag clouds, or to retrieve multimedia content. However, not all media are equally tagged by users. With current systems it is easy to tag a single photo, and even tagging a part of a photo, such as a face, has become common on sites like Flickr and Facebook. On the other hand, tagging a video sequence is more complicated and time-consuming, so users tend to tag only the overall content of a video. In this paper we present a method for automatic video annotation that increases the number of tags originally provided by users and localizes them temporally, associating tags to keyframes. Our approach exploits collective knowledge embedded in user-generated tags and web sources, and the visual similarity of keyframes to images uploaded to social sites like YouTube and Flickr, as well as web sources like Google and Bing. Given a keyframe, our method selects on the fly from these visual sources the training exemplars that should be most relevant for this test sample, and proceeds to transfer labels across similar images. Compared to existing video tagging approaches that require training classifiers for each tag, our system has few parameters, is easy to implement and can deal with an open-vocabulary scenario. We demonstrate the approach on tag refinement and localization on DUT-WEBV, a large dataset of web videos, and show state-of-the-art results. Comment: Preprint submitted to Computer Vision and Image Understanding (CVIU).
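
    The core label-transfer idea can be sketched as a similarity-weighted nearest-neighbour vote over tagged web images; the snippet below is a simplified illustration under our own assumptions (feature extraction and the retrieved image pool are taken as given), not the paper's exact pipeline.

```python
# Hypothetical sketch of nearest-neighbour tag transfer: given a keyframe
# feature, retrieve the most similar tagged web images and vote their tags,
# weighting each vote by visual similarity.
import numpy as np
from collections import defaultdict

def transfer_tags(keyframe_feat, pool_feats, pool_tags, k=20, top_n=5):
    """pool_feats: (N, D) L2-normalised features of tagged web images.
    pool_tags: list of tag lists, one per image in the pool."""
    sims = pool_feats @ keyframe_feat          # cosine similarity (features assumed normalised)
    nn = np.argsort(-sims)[:k]                 # k most similar exemplars, selected on the fly
    votes = defaultdict(float)
    for idx in nn:
        for tag in pool_tags[idx]:
            votes[tag] += sims[idx]            # similarity-weighted vote per tag
    ranked = sorted(votes.items(), key=lambda t: -t[1])
    return [tag for tag, _ in ranked[:top_n]]  # tags assigned to this keyframe
```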

    UniUD Submission to the EPIC-Kitchens-100 Multi-Instance Retrieval Challenge 2023

    In this report, we present the technical details of our submission to the EPIC-Kitchens-100 Multi-Instance Retrieval Challenge 2023. To participate in the challenge, we ensembled two models trained with two different loss functions on 25% of the training data. Our submission, visible on the public leaderboard, obtains an average score of 56.81% nDCG and 42.63% mAP.
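
    The report does not spell out here how the two models were combined, so the following is only a generic illustration of similarity-level ensembling for text-video retrieval; the fusion actually used in the submission may differ.

```python
# Illustrative ensembling of two cross-modal retrieval models: average their
# min-max-normalised text-video similarity matrices and rank videos per query.
import numpy as np

def normalise(s):
    return (s - s.min()) / (s.max() - s.min() + 1e-8)

def ensemble_rankings(sim_a, sim_b):
    """sim_a, sim_b: (num_queries, num_videos) similarity matrices from the two models."""
    fused = 0.5 * normalise(sim_a) + 0.5 * normalise(sim_b)
    return np.argsort(-fused, axis=1)          # per-query ranking of videos, best first
```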

    Alfalfa hay digestibility in Sardinian does

    A feeding trial was carried out on ten Sardinian does to measure the digestibility of alfalfa hay: five does (group F) were fed alfalfa hay alone, and the other five (group C+F) alfalfa hay plus 0.6 kg of concentrate. Digestibility was measured by the marker method (lignin) and by an enzymatic method. The marker method showed that adding concentrate to the hay increased OM digestibility, and the enzymatic method gave similar results. The authors calculated a regression equation between in vivo and in vitro OM digestibility in the C+F group. The changes in lignin composition before and after digestion were also analyzed.
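
    For reference, the standard internal-marker (indicator) calculation that a lignin-based method relies on can be sketched as below; the abstract does not detail the trial's analytical workflow, so this is only an illustration of the general technique.

```python
# Standard internal-marker (indicator) formula for apparent digestibility, with
# lignin playing the role of the indigestible marker. All values are expressed
# as percentages of dry matter.
def apparent_digestibility(marker_feed, marker_feces, nutrient_feed, nutrient_feces):
    return 100.0 - 100.0 * (marker_feed / marker_feces) * (nutrient_feces / nutrient_feed)

# Example with made-up numbers: 8% lignin in feed, 16% in feces; 90% OM in feed,
# 80% in feces -> apparent OM digestibility of about 55.6%.
print(apparent_digestibility(8.0, 16.0, 90.0, 80.0))
```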

    Egocentric Video Summarization of Cultural Tour based on User Preferences

    In this paper, we propose a new method to obtain customized video summaries according to specific user preferences. Our approach is tailored to the Cultural Heritage scenario and is designed to identify candidate shots, select from the original streams only the scenes whose behavior patterns indicate the presence of relevant experiences, and further filter them in order to obtain a summary matching the user's requested preferences. Our preliminary results show that the proposed approach is able to leverage user preferences to obtain a customized summary, so that different users may extract different summaries from the same stream.
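
    A minimal sketch of preference-driven shot selection is given below; it assumes each candidate shot already carries detected behavior/content labels with scores, ranks shots by how well they match the user's declared preferences, and greedily fills a summary of bounded duration. This is our own simplified reading, not the paper's exact selection logic.

```python
# Hypothetical preference-based summarization: rank candidate shots by how well
# their detected labels match the user's preference weights, then greedily pack
# the best-matching shots into a summary of bounded duration.
def summarise(shots, preferences, budget_sec):
    """shots: list of dicts {'start': s, 'end': e, 'labels': {label: score}}.
    preferences: dict {label: weight} expressing the user's interests."""
    def relevance(shot):
        return sum(shot['labels'].get(lbl, 0.0) * w for lbl, w in preferences.items())

    summary, used = [], 0.0
    for shot in sorted(shots, key=relevance, reverse=True):
        length = shot['end'] - shot['start']
        if used + length <= budget_sec:        # keep the shot only if it fits the budget
            summary.append(shot)
            used += length
    return sorted(summary, key=lambda s: s['start'])   # restore temporal order
```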

    A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval

    Every hour, huge amounts of visual content are posted on social media and user-generated content platforms. To find relevant videos by means of a natural language query, text-video retrieval methods have received increased attention over the past few years. Data augmentation techniques were introduced to improve performance on unseen test examples by creating new training samples through semantics-preserving transformations, such as color-space or geometric transformations on images. Yet, these techniques are usually applied to raw data, leading to more resource-demanding solutions and also requiring the raw data to be shareable, which is not always the case, e.g. due to copyright issues with clips from movies or TV series. To address this shortcoming, we propose a multimodal data augmentation technique which works in the feature space and creates new videos and captions by mixing semantically similar samples. We evaluate our solution on a large-scale public dataset, EPIC-Kitchens-100, and achieve considerable improvements over a baseline method and improved state-of-the-art performance, while also performing multiple ablation studies. We release code and pretrained models on GitHub at https://github.com/aranciokov/FSMMDA_VideoRetrieval. Comment: Accepted for presentation at the 30th ACM International Conference on Multimedia (ACM MM).
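
    The feature-space mixing can be sketched in the spirit of mixup: blend the video and caption features of one sample with those of a semantically similar sample using a single shared coefficient. The snippet below is a sketch under our own assumptions, not the paper's exact procedure; consult the released code for the actual implementation.

```python
# Illustrative feature-space augmentation (mixup-style): combine the video and
# caption features of two semantically similar samples with one shared mixing
# coefficient, producing a new synthetic video-caption pair.
import numpy as np

def mix_pair(video_a, text_a, video_b, text_b, alpha=0.2, rng=np.random):
    lam = rng.beta(alpha, alpha)               # mixing coefficient sampled from a Beta distribution
    new_video = lam * video_a + (1.0 - lam) * video_b
    new_text  = lam * text_a  + (1.0 - lam) * text_b
    return new_video, new_text, lam            # lam can also be used to weight the loss
```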

    Data augmentation techniques for the Video Question Answering task

    Video Question Answering (VideoQA) is a task that requires a model to analyze and understand both the visual content of the input video and the textual content of the question, as well as the interaction between them, in order to produce a meaningful answer. In our work we focus on the Egocentric VideoQA task, which exploits first-person videos, because of its potential impact on many different fields, such as social assistance and industrial training. Recently, an Egocentric VideoQA dataset, called EgoVQA, has been released. Given its small size, models tend to overfit quickly. To alleviate this problem, we propose several augmentation techniques which give us a +5.5% improvement in final accuracy over the considered baseline. Comment: 16 pages, 5 figures; to be published in Egocentric Perception, Interaction and Computing (EPIC) Workshop Proceedings, at ECCV 202
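
    The abstract does not list the specific techniques, so the snippet below shows one common style of augmentation for multiple-choice VideoQA purely as an illustration (not necessarily one of the techniques used in the paper): keep the correct answer and resample the wrong candidates from other questions to obtain additional training items.

```python
# Purely illustrative augmentation for multiple-choice VideoQA: rebuild the
# candidate-answer set by keeping the ground-truth answer and drawing fresh
# distractors from answers belonging to other questions.
import random

def resample_distractors(item, answer_pool, num_candidates=5, rng=random):
    """item: {'question': q, 'answer': a, 'candidates': [...]}.
    answer_pool: list of answers drawn from other questions."""
    distractors = rng.sample([a for a in answer_pool if a != item['answer']],
                             num_candidates - 1)
    candidates = distractors + [item['answer']]
    rng.shuffle(candidates)                    # avoid a fixed position for the correct answer
    return {**item, 'candidates': candidates}
```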

    FArMARe: a Furniture-Aware Multi-task methodology for Recommending Apartments based on the user interests

    Nowadays, many people frequently have to search for new accommodation options. Searching for a suitable apartment is a time-consuming process, especially because visiting the apartments in person is often necessary to assess the truthfulness of the advertisements found on the Web. While this process could be alleviated by visiting the apartments in the metaverse, current Web-based recommendation platforms are not suitable for the task. To address this shortcoming, in this paper we define a new problem called text-to-apartment recommendation, which requires ranking apartments by their relevance to a textual query expressing the user's interests. To tackle this problem, we introduce FArMARe, a multi-task approach that supports cross-modal contrastive training with a furniture-aware objective. Since public datasets related to indoor scenes do not contain detailed descriptions of the furniture, we collect and annotate a dataset comprising more than 6000 apartments. Thorough experimentation with three different methods and two raw feature extraction procedures reveals the effectiveness of FArMARe in dealing with the problem at hand. Comment: accepted for presentation at the ICCV 2023 CV4Metaverse workshop.
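
    One way to read "cross-modal contrastive training with a furniture-aware objective" is as a symmetric contrastive loss between query and apartment embeddings plus an auxiliary furniture-classification term; the sketch below is built on that assumption and is not FArMARe's exact formulation.

```python
# Sketch of a multi-task objective combining a symmetric cross-modal contrastive
# loss (text queries vs. apartment embeddings) with an auxiliary multi-label
# furniture-classification loss. Hyperparameters and loss shapes are assumptions.
import torch
import torch.nn.functional as F

def multitask_loss(text_emb, apt_emb, furniture_logits, furniture_labels,
                   temperature=0.07, aux_weight=0.5):
    text_emb = F.normalize(text_emb, dim=-1)
    apt_emb = F.normalize(apt_emb, dim=-1)
    logits = text_emb @ apt_emb.t() / temperature        # (B, B) query-apartment similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = 0.5 * (F.cross_entropy(logits, targets) +
                         F.cross_entropy(logits.t(), targets))
    furniture = F.binary_cross_entropy_with_logits(furniture_logits,
                                                   furniture_labels.float())
    return contrastive + aux_weight * furniture           # joint multi-task objective
```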

    Efficient Keyphrase Generation with GANs

    Keyphrase generation is the task of predicting keyphrases: short text sequences that convey the main semantic meaning of a document. In this paper, we introduce a keyphrase generation approach that makes use of a Generative Adversarial Network (GAN) architecture. In our system, the Generator produces a sequence of keyphrases for an input document, while the Discriminator tries to distinguish between machine-generated and human-curated keyphrases. We propose a novel Discriminator architecture based on a pretrained BERT model fine-tuned for sequence classification. We train the proposed architecture using only a small subset of the standard training dataset, amounting to less than 1% of the total, achieving a high level of data efficiency. The resulting model is evaluated on five public datasets, obtaining competitive and promising results with respect to four state-of-the-art generative models.
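
    The discriminator side can be sketched with the HuggingFace transformers API as a sequence-pair classifier over a document and its keyphrases; the pairing format, checkpoint and all hyperparameters below are our own assumptions, not the paper's settings.

```python
# Minimal sketch of a BERT-based discriminator for keyphrase sequences: encode
# the document together with the keyphrase list as a sentence pair and output
# the probability that the keyphrases are human-curated rather than generated.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)

def discriminator_score(document, keyphrases):
    """Returns the probability that the keyphrases are human-curated."""
    text_pair = "; ".join(keyphrases)          # keyphrases joined into a single segment
    inputs = tokenizer(document, text_pair, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()
```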