10 research outputs found

    Behaviour understanding through the analysis of image sequences collected by wearable cameras

    Get PDF
    Describing people's lifestyle has become a hot topic in the field of artificial intelligence. Lifelogging is described as the process of collecting personal activity data describing the daily behaviour of a person. Nowadays, the development of new technologies and the increasing use of wearable sensors make it possible to automatically record data from our daily living. In this paper, we describe the automatic tools we developed for the analysis of collected visual data that describes the daily behaviour of a person. For this analysis, we rely on sequences of images collected by wearable cameras, called egocentric photo-streams. These images are a rich source of information about the behaviour of the camera wearer, since they show an objective, first-person view of his or her lifestyle.

    Mutual Context Network for Jointly Estimating Egocentric Gaze and Actions

    Full text link
    In this work, we address two coupled tasks, gaze prediction and action recognition in egocentric videos, by exploring their mutual context. Our assumption is that, while a person performs a manipulation task, what the person is doing determines where they are looking, and the gaze point reveals gaze and non-gaze regions which contain important and complementary information about the ongoing action. We propose a novel mutual context network (MCN) that jointly learns action-dependent gaze prediction and gaze-guided action recognition in an end-to-end manner. Experiments on public egocentric video datasets demonstrate that our MCN achieves state-of-the-art performance on both gaze prediction and action recognition.
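    A minimal sketch of how such a mutual-context model could be wired is given below, assuming a PyTorch implementation. The backbone, the single-frame input, the gaze/non-gaze pooling and all layer sizes are illustrative assumptions for exposition; they are not the authors' MCN architecture.

        # Sketch of a mutual-context style model (assumed design, not the authors' MCN):
        # a gaze branch predicts a saliency map from shared features, the action branch
        # pools features inside and outside the predicted gaze region, and both heads
        # are trained jointly end-to-end.
        import torch
        import torch.nn as nn

        class MutualContextSketch(nn.Module):
            def __init__(self, num_actions: int, feat_dim: int = 256):
                super().__init__()
                # shared backbone producing a spatial feature map (B, C, h, w)
                self.backbone = nn.Sequential(
                    nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3),
                    nn.ReLU(inplace=True),
                )
                # gaze branch: one spatial saliency map per frame
                self.gaze_head = nn.Conv2d(feat_dim, 1, kernel_size=1)
                # action branch: consumes gaze-pooled and complementary context features
                self.action_head = nn.Linear(2 * feat_dim, num_actions)

            def forward(self, frames):                      # frames: (B, 3, H, W)
                feats = self.backbone(frames)               # (B, C, h, w)
                gaze_logits = self.gaze_head(feats)         # (B, 1, h, w)
                b, _, h, w = gaze_logits.shape
                gaze = gaze_logits.flatten(2).softmax(dim=-1).view(b, 1, h, w)
                # pool features inside the predicted gaze region ...
                gaze_feat = (feats * gaze).flatten(2).sum(dim=-1)          # (B, C)
                # ... and over the complementary (non-gaze) region
                non_gaze = 1.0 - gaze
                non_gaze = non_gaze / non_gaze.flatten(2).sum(dim=-1).view(b, 1, 1, 1)
                ctx_feat = (feats * non_gaze).flatten(2).sum(dim=-1)       # (B, C)
                action_logits = self.action_head(torch.cat([gaze_feat, ctx_feat], dim=1))
                return gaze, action_logits

    A joint loss (for example, a divergence term on the gaze map plus cross-entropy on the action logits) would then be back-propagated through both branches, which is what lets the two tasks inform each other during training.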

    Mining reality to explore the 21st century student experience

    Get PDF
    Understanding student experience is a key aspect of higher education research. To date, the dominant methods for advancing this area have been the use of surveys and interviews, methods that typically rely on post-event recollections or perceptions, which can be incomplete and unreliable. Advances in mobile sensor technologies afford the opportunity to capture continuous, naturally-occurring student activity. In this thesis, I propose a new research approach for higher education that redefines student experience in terms of objective activity observation, rather than a construct of perception. I argue that novel, technologically driven research practices such as ‘Reality Mining’ (the continuous capture of digital data from wearable devices) and the use of multi-modal datasets captured over prolonged periods offer a deeper, more accurate representation of students’ lived experience. To explore the potential of these new methods, I implemented and evaluated three approaches to gathering student activity and behaviour data. I collected data from 21 undergraduate health science students at the University of Otago, over the period of a single semester (approximately four months). The data captured included GPS trace data from a smartphone app to explore student spaces and movements; photo data from a wearable auto-camera (which takes a photo from the wearer’s point of view every 30 seconds) to investigate student activities; and computer usage data captured via the RescueTime software to gain insight into students’ digital practices. I explored the findings of these three datasets, visualising the student experience in different ways to demonstrate different perspectives on student activity, and utilised a number of new analytical approaches (such as Computer Vision algorithms for automatically categorising photostream data) to make sense of the voluminous data generated. To help future researchers wanting to utilise similar techniques, I also outlined the limitations and challenges encountered in using these new methods/devices for research. The findings of the three method explorations offer some insights into various aspects of the student experience, but serve mostly to highlight the idiographic nature of student life. The principal finding of this research is that these types of ‘student analytics’ are most readily useful to the students themselves, for highlighting their practices and informing self-improvement. I look at this aspect through the lens of a movement called the ‘Quantified Self’, which promotes the use of self-tracking technologies for personal development. To conclude my thesis, I discuss broadly how these methods could feature in higher education research, for researchers, for the institution, and, most importantly, for the students themselves. To this end, I develop a conceptual framework derived from Tschumi’s (1976) Space-Event-Movement framework. At the same time, I also take a critical perspective about the role of these types of personal analytics in the future of higher education, and question how involved the institution should be in the capture and utilisation of these data. Ultimately, there is value in exploring these data capture methods further, but always keeping the ‘student’ placed squarely at the centre of the ‘student experience’.
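    The thesis mentions Computer Vision algorithms for automatically categorising photostream data; a hedged sketch of one simple way to do this with an off-the-shelf classifier is shown below. The torchvision model, the directory layout and the daily tag summary are assumptions for illustration and are not the pipeline used in the thesis (ImageNet labels are object categories rather than activity categories, so a real analysis would need to map or retrain them).

        # Hedged sketch: auto-tag a day of wearable-camera photos with a pretrained
        # torchvision classifier and summarise the most frequent labels.
        from collections import Counter
        from pathlib import Path

        import torch
        from PIL import Image
        from torchvision import models

        weights = models.ResNet18_Weights.DEFAULT
        model = models.resnet18(weights=weights).eval()
        preprocess = weights.transforms()
        labels = weights.meta["categories"]

        def top_label(image_path: Path) -> str:
            """Return the most likely ImageNet label for one photo."""
            img = Image.open(image_path).convert("RGB")
            with torch.no_grad():
                logits = model(preprocess(img).unsqueeze(0))
            return labels[int(logits.argmax(dim=1))]

        # Summarise one day of images (e.g. a photo every 30 seconds, as in the study);
        # the directory name is hypothetical.
        day_dir = Path("photostream/day_01")
        counts = Counter(top_label(p) for p in sorted(day_dir.glob("*.jpg")))
        print(counts.most_common(10))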

    An Outlook into the Future of Egocentric Vision

    Full text link
    What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated into our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration so as to unlock our path to the future always-on, personalised and life-enhancing egocentric vision.
    Comment: We invite comments, suggestions and corrections here: https://openreview.net/forum?id=V3974SUk1

    Text-to-Video: Image Semantics and NLP

    Get PDF
    When aiming at automatically translating an arbitrary text into a visual story, the main challenge consists in finding a semantically close visual representation whereby the displayed meaning should remain the same as in the given text. Moreover, the appearance of an image itself largely influences how its meaningful information is conveyed to an observer. This thesis demonstrates that investigating both image semantics and the semantic relatedness between visual and textual sources enables us to tackle the challenging semantic gap and to find a semantically close translation from natural language to a corresponding visual representation. Within the last years, social networking has become highly popular, leading to an enormous and still increasing amount of data available online. Photo sharing sites like Flickr allow users to associate textual information with their uploaded imagery. This thesis exploits this huge knowledge source of user-generated data, which provides initial links between images, words, and other meaningful data. In order to approach visual semantics, this work presents various methods to analyze the visual structure as well as the appearance of images in terms of meaningful similarities, aesthetic appeal, and emotional effect on an observer. In detail, our GPU-based approach efficiently finds visual similarities between images in large datasets across visual domains and identifies various meanings for ambiguous words by exploring similarity in online search results. Further, we investigate the highly subjective aesthetic appeal of images and make use of deep learning to directly learn aesthetic rankings from a broad diversity of user reactions in social online behavior. To gain even deeper insights into the influence of visual appearance on an observer, we explore how simple image processing is capable of actually changing the emotional perception, and derive a simple but effective image filter. To identify meaningful connections between written text and visual representations, we employ methods from Natural Language Processing (NLP). Extensive textual processing allows us to create semantically relevant illustrations for simple text elements as well as complete storylines. More precisely, we present an approach that resolves dependencies in textual descriptions to arrange 3D models correctly. Further, we develop a method that finds semantically relevant illustrations for texts of different types based on a novel hierarchical querying algorithm. Finally, we present an optimization-based framework that is capable of generating not only semantically relevant but also visually coherent picture stories in different styles.
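    As one concrete reading of the hierarchical querying idea mentioned above, the sketch below backs off from the most specific keyword set of a sentence to smaller subsets until a tagged image matches. The toy tag index, the naive keyword extraction and the back-off order are simplifying assumptions; the thesis relies on extensive textual processing rather than this keyword split.

        # Hedged sketch of hierarchical (back-off) querying against a tagged image
        # collection: try the full keyword set first, then progressively relax it.
        from typing import Optional

        # toy index: image id -> set of user tags (as on Flickr-style photo sharing)
        IMAGE_TAGS = {
            "img_001.jpg": {"dog", "park", "ball"},
            "img_002.jpg": {"dog", "beach"},
            "img_003.jpg": {"park", "sunset"},
        }

        def extract_keywords(sentence: str) -> list[str]:
            # naive keyword extraction; real NLP (POS tagging, dependency parsing) goes here
            stopwords = {"a", "the", "in", "on", "with", "is", "plays"}
            return [w for w in sentence.lower().strip(".").split() if w not in stopwords]

        def find_illustration(sentence: str) -> Optional[str]:
            keywords = extract_keywords(sentence)
            # back off from the full keyword set to smaller subsets (hierarchy of queries)
            for k in range(len(keywords), 0, -1):
                query = set(keywords[:k])
                for image_id, tags in IMAGE_TAGS.items():
                    if query <= tags:
                        return image_id
            return None

        print(find_illustration("A dog plays with a ball in the park"))  # -> img_001.jpg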

    Ramon Llull's Ars Magna

    Get PDF