Storytelling Maps Classification
Interactive maps play an increasingly important part in various fields including journalism, education, traveling, and entertainment, among others. Interactive maps require interactive user engagement. The scope of this interaction can vary from the basic mouse scroll to the complex logical sequence of steps including extensive toolboxes. One of the major applications of interactive maps is in storytelling. A map serves as a powerful tool to tell a story and modern technologies make this tool flexible and potent.
This research analyzes and compares cartographic JavaScript APIs and libraries and classifies storytelling maps, concentrating on the ‘path visualization’ type of map and its technical implementation. It includes an extensive review of how map-related APIs and libraries work under the hood and how they can be improved. A supplementary web platform demonstrates an example of each class and subclass of the classification. The proposed classification has been evaluated by reviewers working with interactive storytelling maps.
The web platform provides the examples discussed in this work. The main chapter references this platform throughout, so readers should treat it as an essential part of this work while reading: https://konstantinbiryukov.github.io/storytelling-classification/
A neural network approach to audio-assisted movie dialogue detection
A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function indicates whether an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of classifiers, including multilayer perceptrons, voted perceptrons, radial basis function networks, support vector machines, and particle swarm optimization-based multilayer perceptrons, are tested. Experiments are carried out to validate the feasibility of this approach using ground-truth indicator functions determined by human observers on 6 different movies. A total of 41 dialogue instances and another 20 non-dialogue instances are employed. The average detection accuracy achieved is high, ranging between 84.78%±5.499% and 91.43%±4.239%.
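The two input features named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each indicator function is a binary sequence sampled at a fixed rate, and it estimates the cross-power spectral density as the discrete Fourier transform of the cross-correlation sequence. The function name `dialogue_features` and the sample data are hypothetical.

```python
import numpy as np


def dialogue_features(ind_a, ind_b):
    """Cross-correlation and cross-power spectral density magnitude
    of two equally long binary indicator sequences (assumed format)."""
    a = np.asarray(ind_a, dtype=float)
    b = np.asarray(ind_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("indicator sequences must be 1-D and equally long")

    # Full linear cross-correlation over lags -(n-1) .. (n-1);
    # the zero-lag value sits at index n-1.
    xcorr = np.correlate(a, b, mode="full")

    # Assumption: estimate the cross-power spectral density as the DFT of
    # the cross-correlation sequence and keep only its magnitude.
    csd_magnitude = np.abs(np.fft.rfft(xcorr))
    return xcorr, csd_magnitude


# Hypothetical example: two actors who speak in strict alternation.
actor_a = [1, 1, 0, 0, 1, 1, 0, 0]
actor_b = [0, 0, 1, 1, 0, 0, 1, 1]
xcorr, csd_magnitude = dialogue_features(actor_a, actor_b)
```

For strictly alternating speakers the zero-lag correlation is zero while nearby lags are large, which is the kind of pattern a classifier could pick up; how the original work segments and normalises these features is not stated in the abstract.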
High-level feature detection from video in TRECVid: a 5-year retrospective of achievements
Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, or on matching one video frame against others using low-level characteristics like colour, texture, or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one which determines the presence or absence of a high-level or semantic feature, within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically however, this depends on being able to determine whether each feature is or is not present in a video clip.
The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity where dozens of research groups measure the effectiveness of their techniques on common data and using an open, metrics-based approach. In this chapter we summarise the work done on the TRECVid high-level feature task, showing the progress made year-on-year. This provides a fairly comprehensive statement on where the state-of-the-art is regarding this important task, not just for one research group or for one approach, but across the spectrum. We then use this past and on-going work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video
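As an illustration of what a metrics-based comparison of feature detectors can look like, the sketch below computes average precision over a ranked list of shots. Average precision is a common ranked-retrieval measure; that it matches the exact metric used in the chapter is an assumption, and the function name and the normalisation by the number of relevant shots found in the list are choices made for this example.

```python
def average_precision(ranked_relevance):
    """Average precision of a ranked shot list.

    ranked_relevance: booleans ordered by detector confidence, True where
    the shot really contains the semantic feature.
    Assumption: normalise by the number of relevant shots in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision at the rank of each relevant shot.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical ranking: relevant shots at ranks 1 and 3.
score = average_precision([True, False, True, False])
```

Averaging this score over all semantic features gives a single number per system, which is one way results from different groups can be compared on common data.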
Multimedia Annotation Interoperability Framework
Multimedia systems typically contain digital documents of mixed media types, which are indexed on the basis of strongly divergent metadata standards. This severely hampers the interoperation of such systems. Therefore, machine understanding of metadata coming from different applications is a basic requirement for the interoperation of distributed multimedia systems. In this document, we present how interoperability among metadata, vocabularies/ontologies and services is enhanced using Semantic Web technologies. In addition, the document provides guidelines for semantic interoperability, illustrated by use cases. Finally, it presents an overview of the most commonly used metadata standards and tools, and provides the general research direction for semantic interoperability using Semantic Web technologies
Efficient Analysis in Multimedia Databases
The rapid progress of digital technology has led to a situation
where computers have become ubiquitous tools. Now we can find them
in almost every environment, be it industrial or even private. With
ever increasing performance, computers have assumed more and more vital
tasks in engineering, climate and environmental research, medicine
and the content industry. Previously, these tasks could only be
accomplished by spending enormous amounts of time and money. By
using digital sensor devices, like earth observation satellites,
genome sequencers or video cameras, the amount and complexity of
data with a spatial or temporal relation has grown enormously. This
has led to new challenges for the data analysis and requires the use
of modern multimedia databases.
This thesis aims at developing efficient techniques for the analysis
of complex multimedia objects such as CAD data, time series and
videos. It is assumed that the data is modeled by commonly used
representations. For example, CAD data is represented as a set of
voxels, while audio and video data are represented as multi-represented,
multi-dimensional time series.
The main part of this thesis focuses on finding efficient methods
for collision queries of complex spatial objects. One way to speed
up those queries is to employ a cost-based decompositioning,
which uses interval groups to approximate a spatial object. For
example, this technique can be used for the Digital Mock-Up (DMU)
process, which helps engineers to ensure short product cycles. This
thesis defines and discusses a new similarity measure for time
series called threshold-similarity. Two time series are
considered similar if they exhibit a similar behavior regarding the
transgression of a given threshold value. Another part of the thesis
is concerned with the efficient calculation of reverse
k-nearest neighbor (RkNN) queries in general metric spaces
using conservative and progressive approximations. The aim of such
RkNN queries is to determine the impact of single objects on the
whole database. At the end, the thesis deals with video
retrieval and hierarchical genre classification of music
using multiple representations. The practical relevance of the
discussed genre classification approach is highlighted with a
prototype tool that helps the user to organize large music
collections.
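To make the reverse k-nearest neighbor query mentioned above concrete, here is a naive baseline sketch. It is not the thesis's method: the conservative and progressive approximations are not reproduced, and the function name, the tie handling and the sample data are assumptions. An object belongs to the result if the query would be among its k nearest neighbors.

```python
def rknn(db, query, k, dist):
    """Indices of objects in db that have `query` among their k nearest
    neighbours, by brute force (quadratic in len(db)).

    dist: any metric, called as dist(x, y).
    Assumption: ties count in favour of the query.
    """
    result = []
    for i, obj in enumerate(db):
        d_query = dist(obj, query)
        # Count the other database objects strictly closer to obj
        # than the query is.
        closer = sum(
            1
            for j, other in enumerate(db)
            if j != i and dist(obj, other) < d_query
        )
        if closer < k:
            result.append(i)
    return result


# Hypothetical 1-D example with the absolute difference as the metric.
points = [0.0, 1.0, 5.0, 6.0]
influenced = rknn(points, query=2.0, k=1, dist=lambda x, y: abs(x - y))
```

The size of the result set indicates how many objects the query influences, which is the sense in which such queries measure the impact of a single object on the database. The brute-force cost is what an approximation-based approach would aim to avoid.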
Both the efficiency and the effectiveness of the presented
techniques are thoroughly analyzed. The benefits over traditional
approaches are shown by evaluating the new methods on real-world
test datasets
The enhanced ebook: Its past, present, and future place in the North American publishing industry
The enhanced ebook format (an ebook featuring multimedia elements such as audio, video, and animations) was released in 2010, yet it has been largely unused. Despite its potential, only 23% of publishers in Canada produced an enhanced ebook each year between 2014 and 2017. The format can excel in the scholarly/professional, trade/consumer, and educational/K to 12 marketplaces; however, it is held back by the same hurdles that halted its progress in 2010. Poor retailer and device support, lack of classification and discoverability, slow consumer adoption, and publishers' reluctance to invest were, and still are, roadblocks that keep the enhanced ebook format from gaining popularity. To understand why the format has not gained traction, this report assesses the enhanced ebook format and its past, present, and future place in the North American publishing industry
Highly efficient low-level feature extraction for video representation and retrieval.
PhD
Witnessing the omnipresence of digital video media, the research community has
raised the question of its meaningful use and management. Stored in immense
multimedia databases, digital videos need to be retrieved and structured in an
intelligent way, relying on the content and the rich semantics involved. Current
Content Based Video Indexing and Retrieval systems face the problem of the semantic
gap between the simplicity of the available visual features and the richness of user
semantics.
This work focuses on the issues of efficiency and scalability in video indexing and
retrieval to facilitate a video representation model capable of semantic annotation. A
highly efficient algorithm for temporal analysis and key-frame extraction is developed.
It is based on the prediction information extracted directly from the compressed domain
features and the robust scalable analysis in the temporal domain. Furthermore,
a hierarchical quantisation of the colour features in the descriptor space is presented.
Derived from the extracted set of low-level features, a video representation model that
enables semantic annotation and contextual genre classification is designed.
Results demonstrate the efficiency and robustness of the temporal analysis algorithm
that runs in real time while maintaining the high precision and recall of the detection task.
Adaptive key-frame extraction and summarisation achieve a good overview of the
visual content, while the colour quantisation algorithm efficiently creates a hierarchical
set of descriptors. Finally, the video representation model, supported by the genre
classification algorithm, achieves excellent results in an automatic annotation system by
linking the video clips with a limited lexicon of related keywords
Background-tracking acoustic features for genre identification of broadcast shows
This paper presents a novel method for extracting acoustic features that characterise the background environment in audio recordings. These features are based on the output of an alignment that fits multiple parallel background-based Constrained Maximum Likelihood Linear Regression transformations asynchronously to the input audio signal. With this setup, the resulting features can track changes in the audio background, such as the appearance and disappearance of music, applause or laughter, independently of the speakers in the foreground of the audio. The ability to provide this type of acoustic description in audiovisual data has many potential applications, including automatic classification of broadcast archives or improving automatic transcription and subtitling. In this paper, the performance of these features is explored in a genre identification task on a set of 332 BBC shows. The proposed background-tracking features outperform short-term Perceptual Linear Prediction features in this task using Gaussian Mixture Model classifiers (72% vs 62% accuracy). The use of more complex classifiers, Hidden Markov Models and Support Vector Machines, increases the performance of the system with the novel background-tracking features to 79% and 81% accuracy, respectively