
    VideoAnalysis4ALL: An On-line Tool for the Automatic Fragmentation and Concept-based Annotation, and the Interactive Exploration of Videos.

    This paper presents the VideoAnalysis4ALL tool, which supports the automatic fragmentation and concept-based annotation of videos and the exploration of the annotated video fragments through an interactive user interface. The developed web application decomposes the video at two granularities, namely shots and scenes, and annotates each fragment by evaluating the presence of several hundred high-level visual concepts in the keyframes extracted from these fragments. Through this analysis, the tool enables the identification and labeling of semantically coherent video fragments, while its user interfaces allow the discovery of these fragments with the help of human-interpretable concepts. The integrated state-of-the-art video analysis technologies perform very well and, by exploiting the processing capabilities of multi-thread / multi-core architectures, reduce the time required for analysis to approximately one third of the video’s duration, thus making the analysis three times faster than real-time processing.

    Casual Information Visualization on Exploring Spatiotemporal Data

    The goal of this thesis is to study how the diverse, widely familiar data on the Web can be visualized, with special consideration of their spatial and temporal information. We introduce novel approaches and visualization techniques for different types of data content: interactively browsing large numbers of tags linked to geospace and time, navigating and locating spatiotemporal photos or videos in collections, and, especially, providing visual support for the exploration of diverse Web content on arbitrary webpages through augmented Web browsing.

    Deliverable D1.6 Intelligent hypervideo analysis evaluation, final results

    This deliverable describes the evaluation activities conducted to assess the performance of a number of methods developed for intelligent hypervideo analysis and the usability of the implemented Editor Tool for supporting video annotation and enrichment. Building on the performance evaluations reported in D1.4 for a set of LinkedTV analysis components, we extended our experiments to assess the effectiveness of newer versions of these methods as well as of entirely new techniques, in terms of the accuracy and time efficiency of the analysis. For this purpose, we conducted in-house experiments and participated in international benchmarking activities, and the outcomes are reported in this deliverable. Moreover, we present the results of user trials of the developed Editor Tool, in which groups of experts assessed its usability and supported functionalities, and evaluated the usefulness and accuracy of the implemented video segmentation approaches based on the analysis requirements of the LinkedTV scenarios. With this deliverable we complete the reporting of the WP1 evaluations, which aimed to assess the efficiency of the multimedia analysis methods developed throughout the project, according to the analysis requirements of the LinkedTV scenarios.

    An MPEG-7 scheme for semantic content modelling and filtering of digital video

    Part 5 of the MPEG-7 standard specifies Multimedia Description Schemes (MDS); that is, the format that multimedia content models should conform to in order to ensure interoperability across multiple platforms and applications. However, the standard does not specify how the content or the associated model may be filtered. This paper proposes an MPEG-7 scheme which can be deployed for digital video content modelling and filtering. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that only analyses content relating directly to the preferred content requirements of the user. We present details of the scheme, the front-end systems used for content modelling and filtering, and experiences with a number of users.

    Scale-Adaptive Video Understanding.

    The recent rise of large-scale, diverse video data has prompted a new era of high-level video understanding. It is increasingly critical for intelligent systems to extract semantics from videos. In this dissertation, we explore the use of supervoxel hierarchies as a type of video representation for high-level video understanding. The supervoxel hierarchies contain rich multiscale decompositions of video content, where various structures can be found at various levels. However, no single level of scale contains all the structures we need. It is essential to adaptively choose the scales for subsequent video analysis. Thus, we present a set of tools to manipulate scales in supervoxel hierarchies, including both scale generation and scale selection methods. In our scale generation work, we evaluate a set of seven supervoxel methods in the context of what we consider to be a good supervoxel for video representation. We address a key limitation that has traditionally prevented supervoxel scale generation on long videos. We do so by proposing an approximation framework for streaming hierarchical scale generation that is able to generate multiscale decompositions for arbitrarily long videos using constant memory. Subsequently, we present two scale selection methods that are able to adaptively choose the scales according to application needs. The first method flattens the entire supervoxel hierarchy into a single segmentation that overcomes the limitation induced by trivial selection of a single scale. We show that the selection can be driven by various post hoc feature criteria. The second scale selection method combines the supervoxel hierarchy with a conditional random field for the task of labeling actors and actions in videos. We formulate the scale selection problem and the video labeling problem in a joint framework. Experiments on a novel large-scale video dataset demonstrate the effectiveness of the explicit consideration of scale selection in video understanding. Aside from the computational methods, we present a visual psychophysical study to quantify how well the actor and action semantics in high-level video understanding are retained in supervoxel hierarchies. The ultimate findings suggest that some semantics are well retained in the supervoxel hierarchies and can be used for further video analysis.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/133202/1/cliangxu_1.pd

    Deliverable D1.1 State of the art and requirements analysis for hypervideo

    This deliverable presents a state-of-the-art and requirements analysis report for hypervideo, authored as part of WP1 of the LinkedTV project. Initially, we present some use-case (viewer) scenarios in the LinkedTV project and, through the analysis of the distinctive needs and demands of each scenario, we point out the technical requirements from a user-side perspective. Subsequently, we study methods for the automatic and semi-automatic decomposition of the audiovisual content in order to effectively support the annotation process. Considering that the multimedia content comprises different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three different streams. Finally, we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each of the different classes of techniques discussed in the deliverable, we present the evaluation results from the application of one such method from the literature to a dataset well suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary.

    A Systematic Survey of ML Datasets for Prime CV Research Areas-Media and Metadata

    The ever-growing capabilities of computers have enabled the pursuit of Computer Vision through Machine Learning (i.e., MLCV). ML tools require large amounts of information to learn from (ML datasets). These are costly to produce but have received little attention regarding standardization. This prevents the cooperative production and exploitation of these resources, impedes countless synergies, and hinders ML research. No global view exists of the MLCV dataset tissue. Acquiring it is fundamental to enable standardization. We provide an extensive survey of the evolution and current state of MLCV datasets (1994 to 2019) for a set of specific CV areas, as well as a quantitative and qualitative analysis of the results. Data were gathered from online scientific databases (e.g., Google Scholar, CiteSeerX). We reveal the heterogeneous plethora that comprises the MLCV dataset tissue; their continuous growth in volume and complexity; the specificities of the evolution of their media and metadata components regarding a range of aspects; and that MLCV progress requires the construction of a global standardized (structuring, manipulating, and sharing) MLCV "library". Accordingly, we formulate a novel interpretation of this dataset collective as a global tissue of synthetic cognitive visual memories and define the immediately necessary steps to advance its standardization and integration.

    Video Content Understanding Using Text

    The rise of the social media and video streaming industries has provided us with a plethora of videos and their corresponding descriptive information in the form of concepts (words) and textual video captions. Given the massive amount of available videos and textual data, today is the best time ever to study the Computer Vision and Machine Learning problems related to videos and text. In this dissertation, we tackle multiple problems associated with the joint understanding of videos and text. We first address the task of multi-concept video retrieval, where the input is a set of words as concepts and the output is a ranked list of full-length videos. This approach deals with multi-concept input and the prolonged length of videos by incorporating multiple latent variables to tie together the information within each shot (a short clip of a full video) and across shots. Secondly, we address the problem of video question answering, in which the task is to answer a question, in the form of Fill-In-the-Blank (FIB), given a video. Answering a question is a task of retrieving a word from a dictionary (all possible words suitable for an answer) based on the input question and video. Following the FIB problem, we introduce a new problem, called Visual Text Correction (VTC), i.e., detecting and replacing an inaccurate word in the textual description of a video. We propose a deep network that can detect an inaccuracy in a sentence, using 1D-CNNs/LSTMs to encode short/long-term dependencies, and simultaneously fix it by replacing the inaccurate word(s). Finally, as the last part of the dissertation, we propose to tackle the problem of video generation from user-input natural language sentences. Our proposed video generation method constructs two distributions out of the input text, corresponding to the latent representations of the first and last frames. We generate high-fidelity videos by interpolating latent representations and applying a sequence of CNN-based up-pooling blocks.