
    Legal multimedia management and semantic annotation for improved search and retrieval

    In this work, we study the possibilities of multimedia management and automatic annotation in the legal domain. In this field, professionals spend most of their time searching for and retrieving legal information. For instance, in scenarios such as e-discovery and e-learning, search and retrieval of multimedia content are the basis of the whole application. In addition, the legal multimedia explosion increases the need to store these files in a structured form that facilitates efficient and effective access to this information. Furthermore, the improvements achieved by sensors and video recorders in recent years have increased the size of these files, producing an enormous demand for storage capacity. JPEG2000 and MPEG-7 are international ISO/IEC standards that reduce, to some degree, the amount of data needed to store these files. These standards also allow semantic annotations to be embedded in the corresponding file formats and accessed without decompressing the contained video or image. How to obtain semantic information from multimedia is also studied, as well as the different techniques to exploit and combine this information.
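
    As an illustration of this kind of annotation-level access, the sketch below reads FreeTextAnnotation elements from an MPEG-7 XML description without touching the compressed video itself. It is a minimal Python sketch: the element name and namespace follow the MPEG-7 schema, but the file name and the e-discovery keyword filter are hypothetical.

        import xml.etree.ElementTree as ET

        # MPEG-7 descriptions are XML documents in this namespace; the
        # annotations sit alongside (not inside) the coded video stream,
        # so reading them never requires decompressing the video.
        MPEG7_TAG = "{urn:mpeg:mpeg7:schema:2001}FreeTextAnnotation"

        def free_text_annotations(path):
            """Collect every FreeTextAnnotation string from an MPEG-7 file."""
            tree = ET.parse(path)
            return [el.text.strip() for el in tree.iter(MPEG7_TAG) if el.text]

        # Hypothetical usage: filter legal recordings by their annotations.
        for note in free_text_annotations("hearing_2011_03.mp7.xml"):
            if "e-discovery" in note.lower():
                print(note)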

    Selection of Concept Detectors for Video Search by Ontology-Enriched Semantic Spaces


    Video Content Understanding Using Text

    The rise of the social media and video streaming industries has provided a plethora of videos and their corresponding descriptive information in the form of concepts (words) and textual video captions. Given this mass of available video and textual data, there has never been a better time to study Computer Vision and Machine Learning problems involving videos and text. In this dissertation, we tackle multiple problems associated with the joint understanding of videos and text. We first address the task of multi-concept video retrieval, where the input is a set of words as concepts and the output is a ranked list of full-length videos. This approach deals with multi-concept input and the prolonged length of videos by incorporating multiple latent variables to tie the information within each shot (a short clip of a full video) and across shots. Secondly, we address the problem of video question answering, in which the task is to answer a question, in the form of Fill-In-the-Blank (FIB), given a video. Answering a question is a task of retrieving a word from a dictionary (all possible words suitable as an answer) based on the input question and video. Following the FIB problem, we introduce a new problem, called Visual Text Correction (VTC): detecting and replacing an inaccurate word in the textual description of a video. We propose a deep network that simultaneously detects an inaccuracy in a sentence, benefiting from 1D-CNNs/LSTMs to encode short- and long-term dependencies, and fixes it by replacing the inaccurate word(s). Finally, in the last part of the dissertation, we tackle the problem of video generation from user-input natural language sentences. Our proposed method constructs two distributions from the input text, corresponding to the latent representations of the first and last frames, and generates high-fidelity videos by interpolating between latent representations through a sequence of CNN-based up-pooling blocks.
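
    As a rough illustration of the interpolation step, the sketch below linearly interpolates between two latent vectors standing in for the first- and last-frame representations, decoding each step with a small stack of up-pooling CNN blocks. This is a minimal PyTorch sketch under stated assumptions: the decoder architecture, latent dimension, and random latent codes are illustrative stand-ins, not the dissertation's actual text-conditioned model.

        import torch
        import torch.nn as nn

        class UpPoolDecoder(nn.Module):
            """Toy decoder: latent vector -> one 32x32 RGB frame via three
            upsample-then-convolve ("up-pooling") blocks."""

            def __init__(self, latent_dim=64):
                super().__init__()
                self.fc = nn.Linear(latent_dim, 128 * 4 * 4)
                self.blocks = nn.Sequential(
                    nn.Upsample(scale_factor=2),                  # 4x4 -> 8x8
                    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
                    nn.Upsample(scale_factor=2),                  # 8x8 -> 16x16
                    nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                    nn.Upsample(scale_factor=2),                  # 16x16 -> 32x32
                    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
                )

            def forward(self, z):
                return self.blocks(self.fc(z).view(-1, 128, 4, 4))

        def interpolate_video(z_first, z_last, decoder, n_frames=16):
            """Decode frames along the line between two latent codes."""
            alphas = torch.linspace(0.0, 1.0, n_frames)
            frames = [decoder((1 - a) * z_first + a * z_last) for a in alphas]
            return torch.stack(frames, dim=1)  # (batch, time, C, H, W)

        # Random codes stand in for samples from the two text-conditioned
        # distributions described in the abstract.
        decoder = UpPoolDecoder()
        z0, z1 = torch.randn(1, 64), torch.randn(1, 64)
        print(interpolate_video(z0, z1, decoder).shape)  # (1, 16, 3, 32, 32)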