    A lightweight web video model with content and context descriptions for integration with linked data

    The rapid increase of video data on the Web has created an urgent need for effective representation, management and retrieval of web videos. Many studies have been carried out on the ontological representation of videos, using either domain-dependent or generic schemas such as MPEG-7, MPEG-4, and COMM. In spite of their extensive coverage and sound theoretical grounding, these schemas have yet to be widely adopted; two likely reasons are the complexity involved and a lack of tool support. We propose a lightweight video content model for content-context description and integration. The distinguishing feature of the model is that it captures the emerging social context used to describe and interpret a video. Our approach is grounded in exploiting easily extractable, evolving contextual metadata and in the availability of existing data on the Web. This enables representational homogeneity and a firm basis for information integration among semantically enabled data sources. The model reuses many existing schemas to describe its ontology classes and shows the scope for interlinking with the Linked Data cloud.
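
    A minimal sketch of how such a content-context description might look with rdflib is given below; the ex: namespace, the VideoClip class, and properties such as uploadedBy and depictsPlace are hypothetical illustrations, not the paper's actual schema.

    ```python
    # Hypothetical lightweight content-context description for a web video,
    # reusing existing vocabularies (FOAF, Dublin Core) and interlinking with
    # the Linked Data cloud (DBpedia), in the spirit of the model above.
    from rdflib import Graph, Namespace, URIRef, Literal
    from rdflib.namespace import RDF, FOAF, DCTERMS

    EX = Namespace("http://example.org/videomodel#")  # assumed namespace
    g = Graph()
    g.bind("ex", EX)

    clip = URIRef("http://example.org/videos/42")
    g.add((clip, RDF.type, EX.VideoClip))
    g.add((clip, DCTERMS.title, Literal("Street parade in Dublin")))

    # Social context: the uploader, expressed with the existing FOAF schema.
    alice = URIRef("http://example.org/users/alice")
    g.add((alice, RDF.type, FOAF.Person))
    g.add((clip, EX.uploadedBy, alice))

    # Interlinking with the Linked Data cloud via an existing DBpedia resource.
    g.add((clip, EX.depictsPlace, URIRef("http://dbpedia.org/resource/Dublin")))

    print(g.serialize(format="turtle"))
    ```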

    Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos

    When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher-level representations based on these low-level features. In this work we propose to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modalities of videos by employing Mel-frequency cepstral coefficients (MFCC) and color values in the HSV color space. We also incorporate dense-trajectory-based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the valence-arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher-level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently of the chosen representation.
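
    A minimal late-fusion sketch with scikit-learn is shown below; the random feature arrays stand in for the MFCC-based audio, HSV-based CNN, and dense-trajectory descriptors, and simple probability averaging is only one of several possible fusion mechanisms.

    ```python
    # Late fusion of per-modality SVMs over placeholder features for the four
    # VA-quadrant classes; real extracted features would replace the arrays.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n = 200                                   # number of music-video clips
    y = rng.integers(0, 4, size=n)            # four VA-quadrant labels

    feats = {
        "audio":  rng.normal(size=(n, 60)),   # e.g. pooled MFCC statistics
        "visual": rng.normal(size=(n, 128)),  # e.g. CNN mid-level activations
        "motion": rng.normal(size=(n, 96)),   # e.g. dense-trajectory descriptors
    }

    # One SVM per modality; fuse by averaging class-probability estimates.
    probs = []
    for name, X in feats.items():
        clf = SVC(kernel="rbf", probability=True).fit(X[:150], y[:150])
        probs.append(clf.predict_proba(X[150:]))

    fused = np.mean(probs, axis=0)            # score-level (late) fusion
    pred = fused.argmax(axis=1)
    print("fused accuracy:", (pred == y[150:]).mean())
    ```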

    Extracting semantics and content adaptive summarisation for effective video retrieval

    Content-based Information Retrieval (CBIR) has been widely investigated to overcome the limitations of text-based systems. Automatic extraction of semantics is one of the fundamental tasks for CBIR applications, and it is particularly important to extract objects and semantics from content-rich video sources for effective retrieval. Content-adaptive summarisation is useful in achieving effective data representation and transmission.
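
    As a rough illustration of content-adaptive summarisation, the OpenCV sketch below keeps a frame whenever its colour histogram departs strongly from the last selected keyframe, so visually dynamic content yields denser summaries; the threshold and histogram settings are assumptions, not the paper's method.

    ```python
    # Histogram-difference keyframe selection: the summary adapts to content,
    # since rapidly changing footage triggers more keyframes than static shots.
    import cv2

    def summarise(video_path, threshold=0.4):
        cap = cv2.VideoCapture(video_path)
        keyframes, prev_hist, idx = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
            cv2.normalize(hist, hist)
            # Keep the frame if sufficiently dissimilar to the last keyframe.
            if prev_hist is None or cv2.compareHist(
                    prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                keyframes.append(idx)
                prev_hist = hist
            idx += 1
        cap.release()
        return keyframes

    # print(summarise("example.mp4"))  # hypothetical input file
    ```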

    An audio-based sports video segmentation and event detection algorithm

    In this paper, we present an audio-based event detection algorithm shown to be effective when applied to soccer video. The main benefit of this approach is the ability to recognise patterns that display high levels of crowd response correlated with key events. The soundtrack of a soccer sequence is first parameterised using Mel-frequency cepstral coefficients. It is then segmented into homogeneous components using a windowing algorithm with a decision process based on Bayesian model selection; this decision process eliminates the need to define a heuristic set of rules for segmentation. Each audio segment is then labelled using a series of hidden Markov model (HMM) classifiers, each a representation of one of six predefined semantic content classes found in soccer video. Exciting events are identified as those segments belonging to a crowd-cheering class. Experimentation indicated that the algorithm was more effective at classifying crowd response than traditional model-based segmentation and classification techniques.
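
    The Bayesian model-selection decision described above is often realised with the Bayesian Information Criterion (BIC) change-point test; the sketch below is a simplified single-boundary version over synthetic MFCC-like frames, with the penalty weight lambda_ as an assumed tuning parameter rather than the paper's exact procedure.

    ```python
    # Delta-BIC change detection: compare one Gaussian over the window against
    # two Gaussians split at frame t; a positive value suggests a boundary.
    import numpy as np

    def delta_bic(X, t, lambda_=1.0):
        n, d = X.shape
        def logdet(cov):
            _, val = np.linalg.slogdet(cov + 1e-6 * np.eye(d))
            return val
        full = logdet(np.cov(X, rowvar=False))
        left = logdet(np.cov(X[:t], rowvar=False))
        right = logdet(np.cov(X[t:], rowvar=False))
        penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
        return 0.5 * (n * full - t * left - (n - t) * right) - lambda_ * penalty

    # Synthetic 13-dim "MFCC" frames with a statistics change at frame 100.
    X = np.vstack([np.random.randn(100, 13), np.random.randn(100, 13) + 3.0])
    # Keep both sides well away from the edges so covariances stay full rank.
    t_best = max(range(30, 170), key=lambda t: delta_bic(X, t))
    print("estimated boundary near frame", t_best)
    ```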

    Optimized Adaptive Streaming Representations based on System Dynamics

    Adaptive streaming addresses the increasing and heterogeneous demand for multimedia content over the Internet by offering several encoded versions of each video sequence. Each version (or representation) has a different resolution and bit rate, aimed at a specific set of users, such as TV or mobile phone clients. While most existing works on adaptive streaming deal with effective playout-control strategies at the client side, in this paper we take a provider's perspective and propose solutions to improve user satisfaction by optimizing the encoding rates of the video sequences. We formulate an integer linear program that maximizes users' average satisfaction, taking into account the network dynamics, the video content information, and the user population characteristics. The solution of the optimization is a set of encoding parameters that permits the creation of different streams that robustly satisfy users' requests over time. We simulate multiple adaptive streaming sessions characterized by realistic network connection models, where the proposed solution outperforms commonly used vendor recommendations, not only in terms of user satisfaction but also in terms of fairness and outage probability. The simulation results further show that video content information, network constraints, and user statistics play a crucial role in selecting proper encoding parameters to provide fairness among users and to reduce network resource usage. We finally propose a few practical guidelines that can be used to choose the encoding parameters based on the user base characteristics, the network capacity, and the type of video content.
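
    A toy version of such an integer linear program, written with PuLP, is sketched below; the candidate bitrates, user-class capacities, and the utility function are illustrative placeholders rather than the paper's actual formulation.

    ```python
    # Select at most 3 encoding bitrates and assign each user class to one
    # selected representation so that population-weighted satisfaction is
    # maximized; a crude stand-in for the satisfaction model in the paper.
    import pulp

    bitrates = [400, 800, 1500, 3000, 6000]                    # kbps (assumed)
    users    = {"mobile": 0.5, "desktop": 0.3, "tv": 0.2}      # population shares
    capacity = {"mobile": 1200, "desktop": 4000, "tv": 8000}   # downlink kbps

    def utility(rate, cap):
        # Higher feasible rate is better; an infeasible rate gives no utility.
        return rate / cap if rate <= cap else 0.0

    x = pulp.LpVariable.dicts("encode", bitrates, cat="Binary")
    y = pulp.LpVariable.dicts("assign",
                              [(u, r) for u in users for r in bitrates],
                              cat="Binary")

    prob = pulp.LpProblem("representation_selection", pulp.LpMaximize)
    prob += pulp.lpSum(users[u] * utility(r, capacity[u]) * y[(u, r)]
                       for u in users for r in bitrates)

    for u in users:
        prob += pulp.lpSum(y[(u, r)] for r in bitrates) == 1  # one stream each
        for r in bitrates:
            prob += y[(u, r)] <= x[r]     # only assigned to encoded versions

    prob += pulp.lpSum(x[r] for r in bitrates) <= 3           # encoding budget

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print("chosen bitrates:", [r for r in bitrates if x[r].value() == 1])
    ```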

    Video Shot Clustering and Summarization through dendrograms

    In the context of the analysis of video documents, effective clustering of shots facilitates access to the content and helps in understanding the associated semantics. This paper introduces a cluster analysis of video shots which employs a dendrogram representation to produce hierarchical summaries of the video document. Vector quantization codebooks are used to represent the visual content and to group shots with similar chromatic consistency. The evaluation of the cluster codebook distortions, and the exploitation of the dependency relationships in the dendrogram, make it possible to obtain only a few significant summaries of the whole video. Finally, the user can navigate through the summaries and decide which one best suits his or her needs for possible post-processing. The effectiveness of the proposed method is demonstrated, on a collection of different video programmes, in terms of metrics that measure the content-representational value of the summarization technique.
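
    A minimal SciPy sketch of the dendrogram step is given below, assuming each shot is already represented by a flattened vector-quantization codebook; real codebook distortions would replace the Euclidean distances used by linkage here.

    ```python
    # Hierarchical clustering of shot codebooks; cutting the dendrogram at
    # different depths yields summaries of different granularity.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

    rng = np.random.default_rng(1)
    shot_codebooks = rng.normal(size=(20, 48))   # 20 shots, flattened codebooks

    Z = linkage(shot_codebooks, method="average")    # builds the dendrogram
    labels = fcluster(Z, t=4, criterion="maxclust")  # cut into 4 shot clusters
    print(labels)

    # dendrogram(Z)  # visual summary hierarchy (requires matplotlib)
    ```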

    Extraction of Significant Video Summaries by Dendrogram Analysis

    In the current video analysis scenario, effective clustering of shots facilitates access to the content and helps in understanding the associated semantics. This paper introduces a cluster analysis of shots which employs a dendrogram representation to produce hierarchical summaries of the video document. Vector quantization codebooks are used to represent the visual content and to group shots with similar chromatic consistency. The evaluation of the cluster codebook distortions, and the exploitation of the dependency relationships in the dendrograms, make it possible to obtain only a few significant summaries of the whole video. Finally, the user can navigate through the summaries and decide which one best suits his or her needs for possible post-processing. The effectiveness of the proposed method is demonstrated by testing it on a collection of video data from different kinds of programmes. Results are evaluated in terms of metrics that measure the content-representational value of the summarization technique.