10 research outputs found

    Activity-driven content adaptation for effective video summarisation

    Get PDF
    In this paper, we present a novel method for content adaptation and video summarization implemented fully in the compressed domain. First, summarization of generic videos is modelled as the process of extracting human objects under various activities/events. Accordingly, frames are classified by fuzzy decision into five categories, covering shot changes (cut and gradual transitions), motion activities (camera motion and object motion) and others, using two inter-frame measurements. Second, human objects are detected using Haar-like features. From the detected human objects and the assigned frame categories, an activity level is determined for each frame to adapt to the video content. Consecutive frames belonging to the same category are grouped into one activity entry, a content of interest (COI), which converts the original video into a series of activities. An adjustable overall quota controls the size of the generated summary for efficient streaming. Under this quota, the frames selected for the summary are determined by evenly sampling the accumulated activity levels. Quantitative evaluations demonstrate the effectiveness and efficiency of the proposed approach, which offers a more flexible and general solution to this problem, since domain-specific tasks such as accurate object recognition can be avoided
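    The quota-driven selection step can be sketched as follows; the function name and the toy activity values are illustrative, not taken from the paper:

```python
import numpy as np

def sample_by_activity(activity, quota):
    """Pick `quota` frame indices by evenly sampling the cumulative
    activity curve, so high-activity regions contribute
    proportionally more frames to the summary."""
    cum = np.cumsum(activity, dtype=float)
    total = cum[-1]
    # Evenly spaced targets along the accumulated-activity axis.
    targets = (np.arange(quota) + 0.5) * total / quota
    # First frame whose cumulative activity reaches each target.
    return np.searchsorted(cum, targets).tolist()

# The high-activity middle (frames 2 and 3) gets two of three picks.
frames = sample_by_activity([1, 1, 4, 4, 1, 1], 3)  # → [1, 2, 3]
```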

    Overview of Image Processing and Various Compression Schemes

    Get PDF
    Image processing is a key research area. Compression of images is required whenever they must be transmitted or stored. The growth of multimedia demand strains network bandwidth and memory storage devices, and advanced imaging requires the capacity to store extensive amounts of digitised information. Data compression is therefore needed to reduce data redundancy and so save hardware space and transmission bandwidth. Various techniques exist for image compression, some of which are discussed in this paper

    A similarity measure between videos using alignment, graphical and speech features

    Get PDF
    A novel video similarity measure is proposed that uses visual features, alignment distances and speech transcripts. First, video files are represented by a sequence of segments, each of which contains colour histograms, a starting time and a set of phonemes. Next, textual, alignment and visual features are extracted from these segments. In the following step, bipartite matching and statistical features are applied to find correspondences between segments. Finally, a similarity score is computed between the videos. Experiments have been carried out and promising results have been obtained. (Ministerio de Ciencia e Innovación TIN2009–14378-C02–0)
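    The segment-correspondence step relies on bipartite matching. A minimal sketch, assuming segments are represented only by their colour histograms (the exhaustive search is for illustration; a real system would use a polynomial-time matching algorithm):

```python
from itertools import permutations

def l1(h1, h2):
    """L1 distance between two colour histograms."""
    return sum(abs(x - y) for x, y in zip(h1, h2))

def match_segments(segs_a, segs_b):
    """Exhaustive minimum-cost bipartite matching between two
    equally long lists of segment histograms."""
    best_cost, best_pairs = float("inf"), None
    for perm in permutations(range(len(segs_b))):
        cost = sum(l1(segs_a[i], segs_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_pairs, best_cost

pairs, cost = match_segments([[1, 0], [0, 1]], [[0, 1], [1, 0]])
# → pairs [(0, 1), (1, 0)], cost 0
```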

    Real-Time Near-Duplicate Elimination for Web Video Search With Content and Context

    Full text link

    Scalable content authentication in H.264/SVC videos using perceptual hashing based on Dempster-Shafer theory

    Get PDF

    Efficient video identification based on locality sensitive hashing and triangle inequality

    Get PDF
    Master of Science thesis

    Centrality measures and analyzing dot-product graphs

    Full text link
    In this thesis we investigate two topics in data mining on graphs: in the first part we investigate the notion of centrality in graphs; in the second part we look at reconstructing graphs from aggregate information. In many graph-related problems the goal is to rank nodes based on an importance score. This score is in general referred to as node centrality. In Part I we start by giving a novel and more efficient algorithm for computing betweenness centrality. In many applications not an individual node but rather a set of nodes is chosen to perform some task. We generalize the notion of centrality to groups of nodes. While group centrality was first formally defined by Everett and Borgatti (1999), we are the first to pose it as a combinatorial optimization problem: find a group of k nodes with largest centrality. We give an algorithm for solving this optimization problem for a general notion of centrality that subsumes various instantiations of centrality based on finding paths in the graph. We prove that this problem is NP-hard for specific centrality definitions, and we provide a universal algorithm for it that can be modified to optimize the specific measures. We also investigate the problem of increasing node centrality by adding or deleting edges in the graph. We conclude this part by solving the optimization problem for two specific applications: one for minimizing redundancy in information propagation networks, and one for optimizing the expected number of interceptions of a group in a random navigational network. In the second part of the thesis we investigate what we can infer about a bipartite graph if only some aggregate information -- the number of common neighbors among each pair of nodes -- is given. First, we observe that the given data is equivalent to the dot products of the adjacency vectors of the nodes. Based on this observation we develop an algorithm, based on the singular value decomposition (SVD), that is capable of almost perfectly reconstructing graphs from such neighborhood data. We investigate two versions of this problem, in which the dot products of nodes with themselves, i.e. the node degrees, are either known or hidden
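    The dot-product observation can be sketched as follows; the names are illustrative. A low-rank factor of the common-neighbour matrix matches the hidden adjacency matrix only up to an orthogonal transform, and the thesis's algorithm additionally resolves this ambiguity to a 0/1 matrix:

```python
import numpy as np

def factor_from_common_neighbors(G, rank):
    """Given G[i, j] = number of common neighbours of nodes i and j
    (the Gram matrix A @ A.T of the unknown bipartite adjacency
    matrix A), recover a rank-`rank` factor F with F @ F.T ≈ G."""
    # G is symmetric positive semidefinite, so its SVD coincides
    # with its eigendecomposition.
    U, s, _ = np.linalg.svd(G)
    return U[:, :rank] * np.sqrt(s[:rank])

# Hidden bipartite adjacency: 3 left nodes, 2 right nodes.
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
G = A @ A.T                      # observed common-neighbour counts
F = factor_from_common_neighbors(G, rank=2)
# F reproduces G exactly, but equals A only up to rotation/reflection.
```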

    A study of the behaviour of iDistance in content-based video retrieval

    Get PDF
    This project presents iDistance as an indexing method for high-dimensional data using dimensionality reduction, and studies its behaviour in a content-based video retrieval (CBVR) system. To build the iDistance index, reference points of the dim-dimensional data set must be obtained; for this, a clustering technique called k-means is used. Once built, the iDistance index can be plugged into a CBVR system to test its behaviour in video subsequence identification: iDistance retrieves the similar frames for later processing aimed at identifying the query subsequence. For comparison, another technique for tackling the curse of dimensionality, based on approximation vectors, the VA-File, has been used, while on the video search side improvements are made for video subsequence identification. Regarding content, this project presents the most important characteristics of high-dimensional data as well as the metrics used for distance-based classification. The B+-tree is introduced as the core structure on which iDistance is based, together with all the operations associated with that data structure. Bipartite graph theory and matching are also covered, since they are essential for video subsequence identification. Subsequently, iDistance is studied and implemented as an indexing engine for high-dimensional databases, paying special attention to the indexing methodology and to search in K-nearest-neighbour queries. After this study, a series of experiments with real video data is proposed in order to study performance as key parameters of the iDistance configuration vary.
Once iDistance has been studied, this indexing engine is introduced into a content-based video retrieval system for video subsequence identification. In addition, this project proposes retrieving the K best subsequences as a ranked list, and studies their accuracy in a subsequent battery of experiments
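    The core of iDistance is its mapping from a high-dimensional point to a one-dimensional key that a B+-tree can index. A minimal sketch, assuming Euclidean distance and given reference points (e.g. k-means centroids); names and values are illustrative:

```python
import math

def euclid(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def idistance_key(point, refs, c=10.0):
    """Map a point to the 1-D iDistance key i * c + dist(point, refs[i]),
    where refs[i] is the nearest reference point and c is a constant
    larger than any within-partition distance, so partitions do not
    overlap on the key axis."""
    i, ref = min(enumerate(refs), key=lambda r: euclid(point, r[1]))
    return i * c + euclid(point, ref)

refs = [(0.0, 0.0), (5.0, 5.0)]
key = idistance_key((5.0, 6.0), refs)   # nearest ref is index 1 → 11.0
```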

    Semantics of video shots for content-based retrieval

    Get PDF
    Content-based video retrieval research combines expertise from many different areas, such as signal processing, machine learning, pattern recognition, and computer vision. As video extends into both the spatial and the temporal domain, we require techniques for the temporal decomposition of footage so that specific content can be accessed. This content may then be semantically classified - ideally in an automated process - to enable filtering, browsing, and searching. An important aspect that must be considered is that pictorial representation of information may be interpreted differently by individual users because it is less specific than its textual representation. In this thesis, we address several fundamental issues of content-based video retrieval for effective handling of digital footage. Temporal segmentation, the common first step in handling digital video, is the decomposition of video streams into smaller, semantically coherent entities. This is usually performed by detecting the transitions that separate single camera takes. While abrupt transitions - cuts - can be detected relatively well with existing techniques, effective detection of gradual transitions remains difficult. We present our approach to temporal video segmentation, proposing a novel algorithm that evaluates sets of frames using a relatively simple histogram feature. Our technique has been shown to rank among the best existing shot segmentation algorithms in large-scale evaluations. The next step is semantic classification of each video segment to generate an index for content-based retrieval in video databases. Machine learning techniques can be applied effectively to classify video content. However, these techniques require manually classified examples for training before automatic classification of unseen content can be carried out. Manually classifying training examples is not trivial because of the implied ambiguity of visual content. 
We propose an unsupervised learning approach based on latent class modelling in which we obtain multiple judgements per video shot and model the users' response behaviour over a large collection of shots. This technique yields a more generic classification of the visual content. Moreover, it enables the quality assessment of the classification, and maximises the number of training examples by resolving disagreement. We apply this approach to data from a large-scale, collaborative annotation effort and present ways to improve the effectiveness of manual annotation of visual content through better design and specification of the process. Automatic speech recognition techniques, along with semantic classification of video content, can be used to implement video search using textual queries. This requires the application of text search techniques to video and the combination of different information sources. We explore several text-based query expansion techniques for speech-based video retrieval, and propose a fusion method to improve overall effectiveness. To combine both text and visual search approaches, we explore a fusion technique that combines spoken information and visual information using semantic keywords automatically assigned to the footage based on the visual content. The techniques that we propose help to facilitate effective content-based video retrieval and highlight the importance of considering different user interpretations of visual content. This allows better understanding of video content and a more holistic approach to multimedia retrieval in the future.
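    The thesis's segmentation algorithm evaluates sets of frames with a histogram feature; as a hedged illustration of the underlying idea only, here is the classic pairwise baseline it builds on, flagging a cut wherever consecutive frame histograms differ sharply (threshold and data are invented for the example):

```python
def hist_diff(h1, h2):
    """Normalised L1 distance between two frame histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2.0

def detect_cuts(histograms, threshold=0.5):
    """Flag a cut between consecutive frames whenever the histogram
    distance exceeds `threshold`. Gradual transitions, which the
    thesis targets, need more than this pairwise test."""
    return [i for i in range(1, len(histograms))
            if hist_diff(histograms[i - 1], histograms[i]) > threshold]

frames = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
cuts = detect_cuts(frames)   # → [2]: a cut between frames 1 and 2
```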