
    Storytelling Maps Classification

    Interactive maps play an increasingly important part in fields including journalism, education, travel, and entertainment. They require user engagement, which can range from a basic mouse scroll to a complex logical sequence of steps involving extensive toolboxes. One of the major applications of interactive maps is storytelling. A map serves as a powerful tool to tell a story, and modern technologies make this tool flexible and potent. This research analyzes and compares cartographic JavaScript APIs and libraries, and classifies storytelling maps, concentrating on the ‘path visualization’ type of map and its technical implementation, with an extensive review of how map-related APIs and libraries work under the hood and how they could be improved. A web platform accompanying this work demonstrates an example of each class and subclass of the classification. The proposed classification has been evaluated by reviewers working with interactive storytelling maps. The main chapter references this platform throughout, so readers should treat it as an essential part of this work: https://konstantinbiryukov.github.io/storytelling-classification/

    A neural network approach to audio-assisted movie dialogue detection

    A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function indicates whether an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of artificial neural networks, including multilayer perceptrons, voted perceptrons, radial basis function networks, support vector machines, and particle swarm optimization-based multilayer perceptrons, are tested. Experiments are carried out to validate the feasibility of the aforementioned approach by using ground-truth indicator functions determined by human observers on 6 different movies. A total of 41 dialogue instances and another 20 non-dialogue instances are employed. The average detection accuracy achieved is high, ranging between 84.78%±5.499% and 91.43%±4.239%.
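    The two input features named in this abstract can be illustrated with a minimal sketch. It assumes binary indicator sequences sampled at a fixed frame rate and uses a plain DFT product as a simplified estimate of the cross-power spectral density; the function names and the example sequences are hypothetical and are not taken from the paper.

```python
import cmath


def cross_correlation(a, b):
    """Linear cross-correlation of two equal-length sequences.

    Returns a dict mapping each lag to sum_t a[t + lag] * b[t].
    """
    n = len(a)
    result = {}
    for lag in range(-(n - 1), n):
        total = 0
        for t in range(n):
            if 0 <= t + lag < n:
                total += a[t + lag] * b[t]
        result[lag] = total
    return result


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), enough for a short example."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def cross_power_spectrum_magnitude(a, b):
    """Magnitude of A(k) * conj(B(k)), a simple cross-power spectrum estimate."""
    spec_a, spec_b = dft(a), dft(b)
    return [abs(sa * sb.conjugate()) for sa, sb in zip(spec_a, spec_b)]


# Hypothetical indicator functions: 1 while the actor is present in a frame.
# The two actors alternate, as they would in a turn-taking dialogue.
actor_a = [1, 1, 0, 0, 1, 1, 0, 0]
actor_b = [0, 0, 1, 1, 0, 0, 1, 1]

xcorr = cross_correlation(actor_a, actor_b)
spectrum = cross_power_spectrum_magnitude(actor_a, actor_b)

# A feature vector of this kind could then be passed to a classifier.
features = [xcorr[lag] for lag in sorted(xcorr)] + spectrum
```

    In this toy case the cross-correlation is zero at lag 0 (the actors never overlap) and peaks at nonzero lags, which is the kind of alternating pattern a dialogue would be expected to produce.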

    High-level feature detection from video in TRECVid: a 5-year retrospective of achievements

    Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, or on matching one video frame against others using low-level characteristics like colour, texture, or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one which determines the presence or absence of a high-level or semantic feature, within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically however, this depends on being able to determine whether each feature is or is not present in a video clip. The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity where dozens of research groups measure the effectiveness of their techniques on common data and using an open, metrics-based approach. In this chapter we summarise the work done on the TRECVid high-level feature task, showing the progress made year-on-year. This provides a fairly comprehensive statement on where the state-of-the-art is regarding this important task, not just for one research group or for one approach, but across the spectrum. We then use this past and on-going work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video

    Multimedia Annotation Interoperability Framework

    Multimedia systems typically contain digital documents of mixed media types, which are indexed on the basis of strongly divergent metadata standards. This severely hampers the inter-operation of such systems. Therefore, machine understanding of metadata coming from different applications is a basic requirement for the inter-operation of distributed multimedia systems. In this document, we present how interoperability among metadata, vocabularies/ontologies and services is enhanced using Semantic Web technologies. In addition, the document provides guidelines for semantic interoperability, illustrated by use cases. Finally, it presents an overview of the most commonly used metadata standards and tools, and provides the general research direction for semantic interoperability using Semantic Web technologies.

    Efficient Analysis in Multimedia Databases

    The rapid progress of digital technology has led to a situation where computers have become ubiquitous tools. Now we can find them in almost every environment, be it industrial or even private. With ever increasing performance, computers have assumed more and more vital tasks in engineering, climate and environmental research, medicine and the content industry. Previously, these tasks could only be accomplished by spending enormous amounts of time and money. By using digital sensor devices, like earth observation satellites, genome sequencers or video cameras, the amount and complexity of data with a spatial or temporal relation has grown enormously. This has led to new challenges for data analysis and requires the use of modern multimedia databases. This thesis aims at developing efficient techniques for the analysis of complex multimedia objects such as CAD data, time series and videos. It is assumed that the data is modeled by commonly used representations. For example, CAD data is represented as a set of voxels, while audio and video data is represented as multi-represented, multi-dimensional time series. The main part of this thesis focuses on finding efficient methods for collision queries of complex spatial objects. One way to speed up those queries is to employ a cost-based decompositioning, which uses interval groups to approximate a spatial object. For example, this technique can be used for the Digital Mock-Up (DMU) process, which helps engineers to ensure short product cycles. This thesis defines and discusses a new similarity measure for time series called threshold-similarity. Two time series are considered similar if they exhibit a similar behavior regarding the transgression of a given threshold value. Another part of the thesis is concerned with the efficient calculation of reverse k-nearest neighbor (RkNN) queries in general metric spaces using conservative and progressive approximations. The aim of such RkNN queries is to determine the impact of single objects on the whole database. At the end, the thesis deals with video retrieval and hierarchical genre classification of music using multiple representations. The practical relevance of the discussed genre classification approach is highlighted with a prototype tool that helps the user to organize large music collections. Both the efficiency and the effectiveness of the presented techniques are thoroughly analyzed. The benefits over traditional approaches are shown by evaluating the new methods on real-world test datasets.
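    To make the reverse k-nearest neighbor query concrete, here is a naive brute-force sketch. It is only a baseline illustrating the query definition under an arbitrary distance function, not the approximation-based method the thesis develops; the function name `rknn` and the sample data are hypothetical.

```python
def rknn(query, database, k, dist):
    """Brute-force reverse k-nearest-neighbor query.

    Returns every object in `database` that would count `query` among its
    k nearest neighbors, i.e. fewer than k other database objects lie
    strictly closer to it than `query` does.
    """
    result = []
    for i, obj in enumerate(database):
        d_query = dist(obj, query)
        # Count the other objects that are strictly closer to obj than the query.
        closer = sum(
            1
            for j, other in enumerate(database)
            if j != i and dist(obj, other) < d_query
        )
        if closer < k:
            result.append(obj)
    return result


def absolute_difference(x, y):
    """A metric on the real line, used here only as an example distance."""
    return abs(x - y)


# Hypothetical one-dimensional database and query point.
points = [1, 2, 10, 11]
influenced_k1 = rknn(3, points, 1, absolute_difference)
influenced_k2 = rknn(3, points, 2, absolute_difference)
```

    With k = 1 only the point 2 has the query as its nearest neighbor, while with k = 2 every point in this small database does. The brute-force version needs a quadratic number of distance computations per query, which is the cost that index-based approximations aim to avoid.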

    The enhanced ebook: Its past, present, and future place in the North American publishing industry

    The enhanced ebook format (an ebook featuring multimedia elements such as audio, video, and animations) was released in 2010, yet it has been largely unused. Despite its potential, only 23% of publishers in Canada produced an enhanced ebook each year between 2014 and 2017. The format can excel in the scholarly/professional, trade/consumer, and educational/K to 12 marketplaces; however, it is held back by the same hurdles that halted its progress in 2010. Poor retailer and device support, lack of classification and discoverability, slow consumer adoption, and publishers' caution about investing were, and still are, roadblocks that inhibit the enhanced ebook format from gaining popularity. In an effort to understand why the enhanced ebook format has not gained traction, this report assesses the format and its past, present, and future place in the North American publishing industry.

    Highly efficient low-level feature extraction for video representation and retrieval.

    Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current Content Based Video Indexing and Retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed domain features and the robust scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining the high precision and recall of the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.

    Background-tracking acoustic features for genre identification of broadcast shows

    This paper presents a novel method for extracting acoustic features that characterise the background environment in audio recordings. These features are based on the output of an alignment that fits multiple parallel background-based Constrained Maximum Likelihood Linear Regression transformations asynchronously to the input audio signal. With this setup, the resulting features can track changes in the audio background like appearance and disappearance of music, applause or laughter, independently of the speakers in the foreground of the audio. The ability to provide this type of acoustic description in audiovisual data has many potential applications, including automatic classification of broadcast archives or improving automatic transcription and subtitling. In this paper, the performance of these features in a genre identification task in a set of 332 BBC shows is explored. The proposed background-tracking features outperform short-term Perceptual Linear Prediction features in this task using Gaussian Mixture Model classifiers (62% vs 72% accuracy). The use of more complex classifiers, Hidden Markov Models and Support Vector Machines, increases the performance of the system with the novel background-tracking features to 79% and 81% in accuracy respectively

    Multimedia Retrieval
