Designing an interface for a digital movie browsing system in the film studies domain
This article describes our work in designing an interface for a digital movie browsing system in the specific application context of film studies. The development of MOVIEBROWSER2 follows general design guidelines derived from an earlier user study with film studies students at Dublin City University, and these guidelines served as input to the MOVIEBROWSER2 system design. The rationale for the interface design decisions is elaborated. An experiment was carried out among film studies students, together with a one-semester trial deployment. The results show positive feedback and improved performance in the students' essay outcomes, with a higher perceived level of satisfaction.
An Outlook into the Future of Egocentric Vision
What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated into our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration so as to unlock our path to the future always-on, personalised and life-enhancing egocentric vision.
Comment: We invite comments, suggestions and corrections here: https://openreview.net/forum?id=V3974SUk1
Smart PIN: utility-based replication and delivery of multimedia content to mobile users in wireless networks
Next-generation wireless networks rely on heterogeneous connectivity technologies to support various rich media services such as personal information storage, file sharing and multimedia streaming. Due to users' mobility and the dynamic characteristics of wireless networks, data availability across collaborating devices is a critical issue. In this context, Smart PIN was proposed as a personal information network that focuses on delivery performance and cost efficiency. Smart PIN uses a novel data replication scheme, based on individual and overall system utility, to best balance the requirements of static data and multimedia content delivery with variable device availability due to user mobility. Simulations show improved results in comparison with other general-purpose data replication schemes in terms of data availability.
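The abstract does not give the utility formulation, so the following is only a minimal sketch of what a utility-based replica-placement decision could look like: the gain in expected availability of an item is traded against a storage/transfer cost on a candidate device. The device model, weights and utility function are illustrative assumptions, not Smart PIN's actual scheme.

```python
# Hedged sketch of utility-based replica placement (illustrative, not the paper's formulas).
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    availability: float     # probability the device is reachable (0..1)
    free_storage_mb: float
    replica_cost_mb: float  # cost-per-MB proxy (energy/bandwidth)

def replica_utility(item_size_mb: float, popularity: float, device: Device,
                    current_availability: float, cost_weight: float = 0.3) -> float:
    """Expected availability gain for the item minus a weighted storage cost."""
    if device.free_storage_mb < item_size_mb:
        return float("-inf")  # device cannot host the replica
    # Gain if one extra, independent replica is placed on this device.
    gain = popularity * (1.0 - current_availability) * device.availability
    cost = cost_weight * item_size_mb * device.replica_cost_mb
    return gain - cost

def best_device(item_size_mb, popularity, devices, current_availability):
    """Pick the device with the highest positive utility, or None if no placement helps."""
    scored = [(replica_utility(item_size_mb, popularity, d, current_availability), d)
              for d in devices]
    utility, device = max(scored, key=lambda t: t[0])
    return device if utility > 0 else None
```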
User-centred video abstraction
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.
The rapid growth of digital video content in recent years has created the need for technologies capable of producing condensed but semantically rich versions of an input video stream in an effective manner. Consequently, the topic of video summarisation is becoming increasingly popular in the multimedia community, and numerous video abstraction approaches have been proposed accordingly. These techniques can be divided into two major categories, automatic and semi-automatic, according to the level of human intervention required in the summarisation process. Fully automated methods mainly adopt low-level visual, aural and textual features alongside mathematical and statistical algorithms to extract the most significant segments of the original video. However, the effectiveness of this type of technique is restricted by a number of factors such as domain dependency, computational expense and the inability to understand the semantics of videos from low-level features. The second category of techniques attempts to improve the quality of summaries by involving humans in the abstraction process to bridge the semantic gap. Nonetheless, a single user's subjectivity and other external factors such as distraction can deteriorate the performance of this group of approaches. Accordingly, in this thesis we focus on the development of three user-centred video summarisation techniques that can be applied to different video categories and generate satisfactory results.
In our first proposed approach, a novel mechanism for user-centred video summarisation is presented for scenarios in which multiple actors are involved in the summarisation process, in order to minimise the negative effects of relying on a single user. In this algorithm, video frames are initially scored by a group of video annotators 'on the fly'. The assigned scores are then averaged to generate a single saliency score for each video frame and, finally, the highest-scored video frames, alongside the corresponding audio and textual content, are extracted to be included in the final summary. The effectiveness of our approach has been assessed by comparing the video summaries generated with this approach against the results obtained from three existing automatic summarisation tools that adopt different modalities for abstraction purposes. The experimental results indicated that our proposed method is capable of delivering remarkable outcomes in terms of Overall Satisfaction and Precision with an acceptable Recall rate, indicating the usefulness of involving user input in the video summarisation process.
To provide a better user experience, we then propose a personalised video summarisation method with the ability to customise the generated summaries according to viewers' preferences. The end-user's priority levels towards different video scenes are captured and used to update the average scores previously assigned by the video annotators. Our earlier summarisation method is then adopted to extract the most significant audio-visual content of the video. Experimental results indicated the capability of this approach to deliver superior outcomes compared with our previously proposed method and the three other automatic summarisation tools.
Finally, we attempt to reduce the level of audience involvement required for personalisation by proposing a new method for producing personalised video summaries. SIFT visual features are adopted to identify the semantic categories of video scenes. By fusing this information with pre-built user profiles, personalised video abstracts can be created. Experimental results showed the effectiveness of this method in delivering superior outcomes compared with our previously recommended algorithm and the three other automatic summarisation techniques.
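The first approach described in the abstract above (annotator scores averaged per frame, then the top-scoring frames kept) lends itself to a short sketch. The array shapes, summary budget and score scale below are assumptions made for illustration only.

```python
# Hedged sketch: average per-frame scores from several annotators, keep the top frames.
import numpy as np

def summarise(annotator_scores: np.ndarray, summary_fraction: float = 0.1) -> np.ndarray:
    """annotator_scores: shape (num_annotators, num_frames); higher means more salient."""
    saliency = annotator_scores.mean(axis=0)           # one saliency score per frame
    budget = max(1, int(summary_fraction * saliency.size))
    keep = np.argsort(saliency)[::-1][:budget]         # indices of the top-scoring frames
    return np.sort(keep)                               # restore temporal order

# Toy usage: 3 annotators scoring 10 frames on a 0-5 scale.
scores = np.random.randint(0, 6, size=(3, 10))
print(summarise(scores, summary_fraction=0.3))
```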
Indexing of fictional video content for event detection and summarisation
This paper presents an approach to movie video indexing that utilises audiovisual analysis to detect important and meaningful temporal video segments, which we term events. We consider three event classes, corresponding to dialogues, action sequences, and montages, where the latter also includes musical sequences. These three event classes are intuitive for a viewer to understand and recognise whilst accounting for over 90% of the content of most movies. To detect events, we leverage traditional filmmaking principles and map these to a set of computable low-level audiovisual features. Finite state machines (FSMs) are used to detect when temporal sequences of specific features occur. A set of heuristics, again inspired by filmmaking conventions, is then applied to the output of multiple FSMs to detect the required events. A movie search system, named MovieBrowser, built upon this approach is also described. The overall approach is evaluated against a ground truth of over twenty-three hours of movie content drawn from various genres and consistently obtains high precision and recall for all event classes. A user experiment designed to evaluate the usefulness of an event-based structure for both searching and browsing movie archives is also described, and the results indicate the usefulness of the proposed approach.
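To make the FSM idea concrete, here is a minimal sketch of a state machine that scans per-shot feature labels and emits candidate dialogue segments when it sees a sustained run of speech-dominant, low-motion shots. The feature names, thresholds, state set and run-length rule are illustrative assumptions rather than the paper's actual features or heuristics.

```python
# Hedged sketch of an FSM over low-level shot features (illustrative only).
def detect_dialogue_segments(shots, min_run=4):
    """shots: list of dicts with 'speech' (bool) and 'motion' (float in 0..1).
    Returns (start_index, end_index) pairs for candidate dialogue events."""
    events, state, start = [], "IDLE", None
    for i, shot in enumerate(shots):
        dialogue_like = shot["speech"] and shot["motion"] < 0.3
        if state == "IDLE" and dialogue_like:
            state, start = "IN_DIALOGUE", i          # enter the dialogue state
        elif state == "IN_DIALOGUE" and not dialogue_like:
            if i - start >= min_run:                 # keep only sustained runs
                events.append((start, i - 1))
            state, start = "IDLE", None
    if state == "IN_DIALOGUE" and len(shots) - start >= min_run:
        events.append((start, len(shots) - 1))       # close a run ending at the last shot
    return events
```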
Automatic movie analysis and summarisation
Automatic movie analysis is the task of applying Machine Learning methods to screenplays, movie scripts, and motion pictures to facilitate or enable various tasks throughout the entirety of a movie's life-cycle. From helping with making informed decisions about a new movie script with respect to aspects such as its originality, similarity to other movies, or even commercial viability, all the way to offering consumers new and interesting ways of viewing the final movie, many stages in the life-cycle of a movie stand to benefit from Machine Learning techniques that promise to reduce human effort, time, or both. Within this field of automatic movie analysis, this thesis addresses the task of summarising the content of screenplays, enabling users at any stage to gain a broad understanding of a movie from greatly reduced data. The contributions of this thesis are four-fold: (i) We introduce ScriptBase, a new large-scale data set of original movie scripts, annotated with additional meta-information such as genre and plot tags, cast information, and log- and tag-lines. To our knowledge, ScriptBase is the largest data set of its kind, containing scripts and information for almost 1,000 Hollywood movies. (ii) We present a dynamic summarisation model for the screenplay domain, which allows for the extraction of highly informative and important scenes from movie scripts. The extracted summaries allow the content of the original script to stay largely intact and provide the user with its important parts, while greatly reducing the script-reading time. (iii) We extend our summarisation model to capture additional modalities beyond the screenplay text. The model is rendered multi-modal by introducing visual information obtained from the actual movie and by extracting scenes from the movie, allowing users to generate visual summaries of motion pictures. (iv) We devise a novel end-to-end neural network model for generating natural language screenplay overviews. This model enables the user to generate short descriptive and informative texts that capture certain aspects of a movie script, such as its genres, approximate content, or style, allowing them to gain a fast, high-level understanding of the screenplay. Multiple automatic and human evaluations were carried out to assess the performance of our models, demonstrating that they are well-suited to the tasks set out in this thesis and outperform strong baselines. Furthermore, the ScriptBase data set has started to gain traction, and is currently used by a number of other researchers in the field to tackle various tasks relating to screenplays and their analysis.
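The abstract describes extracting important scenes while greatly reducing script-reading time; the thesis's actual model is not reproduced here. As a loose illustration only, the sketch below greedily selects scenes by informativeness-per-minute under a reading-time budget. The per-scene scores, the words-per-minute constant and the greedy strategy are all assumptions.

```python
# Hedged sketch: budgeted extractive scene selection (illustrative, not the thesis's model).
def extract_scenes(scenes, budget_minutes=20.0, words_per_minute=200.0):
    """scenes: list of dicts with 'id', 'text', and a precomputed 'informativeness' score."""
    def reading_minutes(scene):
        return len(scene["text"].split()) / words_per_minute
    # Greedily keep scenes with the best informativeness-per-minute ratio.
    ranked = sorted(scenes,
                    key=lambda s: s["informativeness"] / max(reading_minutes(s), 1e-6),
                    reverse=True)
    chosen, used = [], 0.0
    for scene in ranked:
        cost = reading_minutes(scene)
        if used + cost <= budget_minutes:
            chosen.append(scene)
            used += cost
    return sorted(chosen, key=lambda s: s["id"])   # present in original script order
```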
Using facial recognition services as implicit feedback for recommenders
User authentication and feedback gathering are crucial aspects of recommender systems. The most common implementations, a username/password login and star-rating systems, require user interaction and cognitive effort from the user. As a result, users opt to save their password in the interface, and optional feedback with a star-rating system is often skipped, especially for applications such as video watching in a home environment. In this article, we propose an alternative method for user authentication based on facial recognition, and an automatic feedback gathering method based on detecting various face characteristics. Using facial recognition with a camera in a tablet, smartphone, or smart TV, the persons in front of the screen can be identified in order to link video watching sessions to their user profiles. During video watching, implicit feedback is automatically gathered through emotion recognition, attention measurements, and behavior analysis. An emotion fingerprint, which is defined as a unique spectrum of expected emotions for a video scene, is compared to the recognized emotions in order to estimate the experience of a user while watching. An evaluation with a test panel showed that happiness can be detected most accurately and that the recognized emotions are correlated with the user's star rating.
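The emotion-fingerprint comparison described above can be sketched as matching two per-scene emotion distributions. The emotion categories, the use of cosine similarity and the averaging into a single experience score below are assumptions for illustration, not the article's actual measure.

```python
# Hedged sketch: compare an expected emotion fingerprint to recognized emotions per scene.
import numpy as np

EMOTIONS = ["happiness", "sadness", "anger", "fear", "surprise", "neutral"]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def experience_score(fingerprint: np.ndarray, recognized: np.ndarray) -> float:
    """fingerprint, recognized: arrays of shape (num_scenes, len(EMOTIONS)).
    Returns a crude proxy for how closely the viewing matched expectations."""
    per_scene = [cosine(f, r) for f, r in zip(fingerprint, recognized)]
    return float(np.mean(per_scene))
```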
Linked Data Entity Summarization
On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements, and it becomes difficult to comprehend the data unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions involve two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion.
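To illustrate the problem setting (not any of the thesis's approaches), the sketch below keeps the k statements about an entity judged most informative, scoring each triple by the global rarity of its predicate. The scoring heuristic and the data layout are assumptions.

```python
# Hedged sketch of entity summarization as top-k fact selection (illustrative heuristic only).
from collections import Counter
import math

def summarize_entity(triples, all_triples, k=5):
    """triples: (subject, predicate, object) tuples for one entity;
    all_triples: the full dataset, used to estimate how rare each predicate is."""
    pred_counts = Counter(p for _, p, _ in all_triples)
    total = sum(pred_counts.values())

    def informativeness(triple):
        _, p, _ = triple
        return -math.log(pred_counts[p] / total)   # rarer predicate -> higher score

    return sorted(triples, key=informativeness, reverse=True)[:k]
```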