
    Ego perspective video indexing for life logging videos

    This thesis deals with life logging videos recorded by head-worn devices. The goal is to develop a method that filters out the important parts of such videos, which first requires determining which parts count as important. To do this, we look at how autobiographical memory works and adapt an indexing mechanism that operates on similar principles. To index life logging videos with expressive metadata, we first need to extract information from the video itself. Since faces are an important cue for autobiographical memory recall, image processing consisting of face detection, tracking and recognition is used to identify the people in a scene. Location data is obtained from GPS. Once all the information is gathered, it is indexed into so-called events; for each event we record which people are present, at which place, and at what time the event takes place. An indexing algorithm was developed for this purpose that segments the video into smaller parts based on faces, location and time. The result is a prototype algorithm that can be developed further to improve the actual segmentation of life logging videos. This project serves as an information collection and creation application for future life logging video navigation tools.
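
    The indexing scheme described above, events defined by which people are present, at which place, and at what time, can be illustrated with a small sketch. The names and thresholds below (Frame, Event, max_gap, max_dist_deg) are assumptions made for illustration, not the thesis' actual implementation; the sketch only shows how face identities, GPS readings and timestamps could be folded into an event segmentation.

    from dataclasses import dataclass, field
    from typing import List, Set, Tuple

    @dataclass
    class Frame:
        timestamp: float          # seconds since the start of the recording
        faces: Set[str]           # identities returned by face recognition
        gps: Tuple[float, float]  # (latitude, longitude) from the GPS log

    @dataclass
    class Event:
        start: float
        end: float
        people: Set[str] = field(default_factory=set)
        places: List[Tuple[float, float]] = field(default_factory=list)

    def segment_into_events(frames, max_gap=60.0, max_dist_deg=0.001):
        """Open a new event whenever the visible people change, a long
        temporal gap occurs, or the wearer moves outside a small GPS radius."""
        events, current = [], None
        for f in frames:
            moved = bool(current and current.places) and (
                abs(f.gps[0] - current.places[-1][0])
                + abs(f.gps[1] - current.places[-1][1]) > max_dist_deg)
            if (current is None or f.faces != current.people
                    or f.timestamp - current.end > max_gap or moved):
                current = Event(start=f.timestamp, end=f.timestamp,
                                people=set(f.faces))
                events.append(current)
            current.end = f.timestamp
            current.people |= f.faces
            current.places.append(f.gps)
        return events

    A real pipeline would populate the Frame objects from the face detection, tracking and recognition stage and from the parsed GPS track before calling segment_into_events.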

    Visual Summary of Egocentric Photostreams by Representative Keyframes

    Building a visual summary from an egocentric photostream captured by a lifelogging wearable camera is of high interest for different applications (e.g. memory reinforcement). In this paper, we propose a new summarization method based on keyframe selection that uses visual features extracted by means of a convolutional neural network. Our method applies unsupervised clustering to divide the photostream into events, and then extracts the most relevant keyframe for each event. We assess the results by applying a blind-taste test to a group of 20 people who rated the quality of the summaries.
    Comment: Paper accepted at the IEEE First International Workshop on Wearable and Ego-vision Systems for Augmented Experience (WEsAX), Turin, Italy, July 3, 2015.
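
    The abstract does not name the clustering algorithm, so the sketch below uses k-means from scikit-learn as a stand-in: per-photo CNN descriptors are grouped into pseudo-events and the photo closest to each cluster centre is kept as that event's keyframe. The feature dimensionality, the number of events and the random features in the example are assumptions for illustration only.

    import numpy as np
    from sklearn.cluster import KMeans

    def summarize(features: np.ndarray, n_events: int = 10):
        """Cluster per-photo CNN descriptors into pseudo-events and return,
        for each cluster, the index of the photo nearest the centroid."""
        km = KMeans(n_clusters=n_events, n_init=10, random_state=0).fit(features)
        keyframes = []
        for c in range(n_events):
            members = np.where(km.labels_ == c)[0]
            dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
            keyframes.append(int(members[np.argmin(dists)]))
        return sorted(keyframes)

    # Stand-in features: one 4096-dimensional descriptor per photo.
    feats = np.random.rand(500, 4096).astype(np.float32)
    print(summarize(feats, n_events=8))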

    Information access tasks and evaluation for personal lifelogs

    Emerging personal lifelog (PL) collections contain permanent digital records of information associated with individuals’ daily lives. This can include materials such as emails received and sent, web content and other documents with which they have interacted, photographs, videos and music experienced passively or created, logs of phone calls and text messages, as well as personal and contextual data such as location (e.g. via GPS sensors), persons and objects present (e.g. via Bluetooth) and physiological state (e.g. via biometric sensors). PLs can be collected by individuals over very extended periods, potentially running to many years. Such archives have many potential applications, including helping individuals recover partially forgotten information, sharing experiences with friends or family, telling the story of one’s life, clinical applications for the memory impaired, and fundamental psychological investigations of memory. The Centre for Digital Video Processing (CDVP) at Dublin City University is currently engaged in the collection and exploration of applications of large PLs. We are collecting rich archives of daily life including textual and visual materials and contextual data. An important part of this work is to consider how the effectiveness of our ideas can be measured in terms of metrics and experimental design. While these studies have considerable similarity with traditional evaluation activities in areas such as information retrieval and summarization, the characteristics of PLs mean that new challenges and questions emerge. We are currently exploring these issues through a series of pilot studies and questionnaires. Our initial results indicate that there are many research questions to be explored and that the relationships between personal memory, context and content for these tasks are complex and fascinating.
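
    As a concrete illustration of the kinds of content and contextual data listed above, a single lifelog record might be modelled along the following lines; the field names and types are assumptions for illustration, not the CDVP schema.

    from dataclasses import dataclass
    from datetime import datetime
    from typing import List, Optional, Tuple

    @dataclass
    class LifelogItem:
        """One entry in a personal lifelog: a content item plus the
        contextual data captured alongside it."""
        timestamp: datetime
        kind: str                                   # e.g. "email", "photo", "sms", "web_page"
        content_ref: str                            # path or URI of the stored item
        gps: Optional[Tuple[float, float]] = None   # location via GPS sensor
        nearby_devices: Optional[List[str]] = None  # persons/objects present via Bluetooth
        heart_rate_bpm: Optional[int] = None        # physiological state via biometric sensor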

    The Evolution of First Person Vision Methods: A Survey

    The emergence of new wearable technologies such as action cameras and smart glasses has increased the interest of computer vision scientists in the first-person perspective. Nowadays, this field is attracting the attention and investment of companies aiming to develop commercial devices with First Person Vision recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, and user-machine interaction. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges and opportunities within the field.
    Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction

    Analysis of Hand Segmentation in the Wild

    A large number of works in egocentric vision have concentrated on action and object recognition. Detection and segmentation of hands in first-person videos, however, has been less explored. For many applications in this domain, it is necessary to accurately segment not only the hands of the camera wearer but also the hands of others with whom they are interacting. Here, we take an in-depth look at the hand segmentation problem. In the quest for robust hand segmentation methods, we evaluated the performance of state-of-the-art semantic segmentation methods, off the shelf and fine-tuned, on existing datasets. We fine-tune RefineNet, a leading semantic segmentation method, for hand segmentation and find that it does much better than the best contenders. Existing hand segmentation datasets were collected in laboratory settings. To overcome this limitation, we contribute two new datasets: a) EgoYouTubeHands, containing egocentric videos with hands in the wild, and b) HandOverFace, for analyzing the performance of our models in the presence of similar-appearance occlusions. We further explore whether conditional random fields can help refine the generated hand segmentations. To demonstrate the benefit of accurate hand maps, we train a CNN for hand-based activity recognition and achieve higher accuracy when the CNN is trained using hand maps produced by the fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for fine-grained action recognition and show that an accuracy of 58.6% can be achieved by looking at a single hand pose, which is much better than the chance level (12.5%).
    Comment: Accepted at CVPR 2018
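
    RefineNet itself is not bundled with common frameworks, so the sketch below substitutes torchvision's DeepLabv3 (ResNet-50) as the semantic segmentation model being fine-tuned for two classes, background and hand. It assumes a recent torchvision (0.13+) and uses dummy tensors in place of EgoYouTubeHands/HandOverFace-style image-mask pairs; it is an illustrative fine-tuning step, not the paper's actual training setup.

    import torch
    from torch import nn
    from torchvision.models.segmentation import deeplabv3_resnet50

    # Stand-in for RefineNet: start from a pretrained DeepLabv3 and re-head
    # it for two classes (0 = background, 1 = hand).
    model = deeplabv3_resnet50(weights="DEFAULT")
    model.classifier[4] = nn.Conv2d(256, 2, kernel_size=1)
    if model.aux_classifier is not None:
        model.aux_classifier[4] = nn.Conv2d(256, 2, kernel_size=1)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    # One illustrative fine-tuning step on dummy data; a real run would loop
    # over (image, binary hand mask) pairs from the hand datasets.
    images = torch.randn(2, 3, 384, 384)        # normalized RGB frames
    masks = torch.randint(0, 2, (2, 384, 384))  # per-pixel hand labels

    model.train()
    logits = model(images)["out"]               # (N, 2, H, W)
    loss = criterion(logits, masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()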