4 research outputs found
Advanced content-based semantic scene analysis and information retrieval: the SCHEMA project
The aim of the SCHEMA Network of Excellence is to bring together a critical mass of universities, research centers, industrial partners and end users, in order to design a reference system for content-based semantic scene analysis, interpretation and understanding. Relevant research areas include: content-based multimedia analysis and automatic annotation of semantic multimedia content, combined textual and multimedia information retrieval, the Semantic Web, the MPEG-7 and MPEG-21 standards, user interfaces and human factors. In this paper, recent advances in content-based analysis, indexing and retrieval of digital media within the SCHEMA Network are presented. These advances will be integrated in the SCHEMA module-based, expandable reference system.
Similarity Pyramids for Browsing and Organization of Large Image Databases
The advent of large image databases (>10,000 images) has created a need for tools which can search and organize images automatically by their content. This paper presents a method for designing a hierarchical browsing environment which we call a similarity pyramid. The similarity pyramid groups similar images together while allowing users to view the database at varying levels of resolution. We show that the similarity pyramid is best constructed using agglomerative (bottom-up) clustering methods, and present a fast-sparse clustering method which dramatically reduces both memory and computation over conventional methods. We then present an objective measure of pyramid organization called dispersion, and we use it to show that our fast-sparse clustering method produces better similarity pyramids than top-down approaches.
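The bottom-up construction described in this abstract can be sketched in a few lines. The sketch below is illustrative only: it uses random feature vectors in place of real image features, SciPy's standard agglomerative linkage in place of the paper's fast-sparse method, and arbitrary level sizes for the coarse-to-fine pyramid.

```python
# Sketch: build coarse-to-fine "pyramid" levels by agglomerative
# (bottom-up) clustering of image feature vectors. Features and
# level sizes are illustrative assumptions, not the paper's method.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
features = rng.random((100, 16))  # 100 images, 16-dim feature vectors

# Agglomerative clustering: repeatedly merge the closest groups.
tree = linkage(features, method="average", metric="euclidean")

# Cut the merge tree at several depths to obtain pyramid levels,
# from coarse (few clusters) to fine (many clusters).
levels = {k: fcluster(tree, t=k, criterion="maxclust") for k in (4, 16, 64)}
for k, labels in levels.items():
    print(f"level with at most {k} clusters: {len(set(labels))} found")
```

Each level assigns every image to a cluster, so a browsing interface can show one representative per cluster at the coarse level and drill down through finer levels.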
Organising and structuring a visual diary using visual interest point detectors
As wearable cameras become more popular, researchers are increasingly focusing on novel applications to manage the large volume of data these devices produce. One such application is the construction of a Visual Diary from an individual's photographs. Microsoft's SenseCam, a device designed to passively record a Visual Diary covering a typical day of the wearer, is an example of one such device. The vast quantity of images generated by these devices means that the management and organisation of these collections is not a trivial matter.
We believe wearable cameras such as SenseCam will become more popular in the future, making the management of the volume of data they generate a key issue.
Although there is a significant volume of work in the literature in the object detection and recognition
and scene classification fields, there is little work in the area of setting detection. Furthermore, few authors have examined the issues involved in analysing extremely large image collections (like a Visual Diary) gathered over a long period of time. An algorithm developed for setting
detection should be capable of clustering images captured at the same real world locations (e.g. in the dining room at home, in front of the computer in the office, in the park, etc.). This requires the selection and implementation of suitable methods to identify visually similar backgrounds in images using their visual features. We present a number of approaches to setting detection based on
the extraction of visual interest points from the images. We also analyse the performance of two of the most popular descriptors - Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). We present an implementation of a Visual Diary application and evaluate
its performance via a series of user experiments. Finally, we also outline some techniques to allow the Visual Diary to automatically detect new settings, to scale as the image collection continues to grow substantially over time, and to allow the user to generate a personalised summary of their data.
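The core step in comparing images by their interest points is matching descriptors between two images. The sketch below illustrates this with Lowe's standard ratio test; synthetic vectors stand in for real SIFT/SURF descriptors (which would normally come from a library such as OpenCV), and the shared-point setup is a contrived assumption for illustration.

```python
# Sketch: descriptor matching with Lowe's ratio test, the usual
# criterion for comparing interest points between two images.
# Synthetic vectors stand in for real SIFT/SURF descriptors.
import numpy as np

rng = np.random.default_rng(1)
desc_a = rng.random((50, 128))  # descriptors from image A
# Image B shares ~20 interest points with A (plus small noise),
# along with 40 unrelated points.
desc_b = np.vstack([desc_a[:20] + 0.01 * rng.random((20, 128)),
                    rng.random((40, 128))])

def ratio_matches(da, db, ratio=0.75):
    """Match each descriptor in da to its nearest neighbour in db,
    keeping only matches where the nearest neighbour is much closer
    than the second nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(da):
        dists = np.linalg.norm(db - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

matches = ratio_matches(desc_a, desc_b)
print(f"{len(matches)} reliable matches")
```

Images captured in the same setting share many such reliable matches, which is the signal a setting-detection algorithm can cluster on.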
Designing and evaluating a user interface for continuous embedded lifelogging based on physical context
PhD Thesis
An increase in both personal information and storage capacity has encouraged people to
store and archive their life experience in multimedia formats. The usefulness of such
large amounts of data will remain inadequate without the development of both retrieval
techniques and interfaces that help people access and navigate their personal collections.
The research described in this thesis investigates lifelogging technology from the
perspective of the psychology of memory and human-computer interaction. It seeks
to improve my understanding of what data can trigger memories and how this
insight might be used to retrieve past life experiences through interfaces to
lifelogging technology.
The review of memory research and of previous work on lifelogging technology
allowed me to establish a clear understanding of how memory works and to design
novel and effective memory cues, while also critiquing existing lifelogging
systems and their approaches to retrieving memories of past actions and
activities. In the initial experiments I evaluated the design and implementation
of a prototype, which exposed numerous problems in both data visualisation and
usability. These findings informed the design of a novel lifelogging prototype
to facilitate retrieval. I assessed this second prototype and determined how the
improved system supported access and retrieval of users' past life experiences:
in particular, how users group their data into events, how they interact with
their data, and the classes of memories that it supported.
In this doctoral thesis I found that visualising the movements of users' hands
and bodies facilitated the grouping of activities into events when combined with
the photos and other data captured at the same time. In addition, the movements
of the user's hands and body, and the movements of some objects, can support
activity recognition or help the user to detect activities and group them into
events. Furthermore, the ability to search for specific movements significantly
reduced the time it took to retrieve data related to specific events. I
identified three major strategies that users followed to understand the combined
data: skimming sequences, cross-sensor jumping and continued scanning.