
    Query Clip Genre Recognition Using Tree Pruning Technique for Video Retrieval

    The efficiency of a retrieval system depends on the search methodology it uses; an inappropriate methodology can render the system ineffective. In recent years multimedia storage has grown and the cost of storing multimedia data has fallen, so video repositories now hold huge numbers of videos, and retrieving the videos relevant to a user's interest from such a large repository is difficult. An effective recognition-based video retrieval system is therefore essential for finding videos relevant to a user query within a large collection. This paper presents an approach that retrieves videos from a repository by recognizing the genre of the user's query clip. The method extracts regions of interest from every frame of the query clip based on motion descriptors. These regions are treated as objects and compared with similar objects from a knowledge base built from videos of various genres; a tree pruning technique then performs genre recognition of the query clip, and videos of the same genre are retrieved from the repository. The method is evaluated on a dataset containing three genres, i.e. sports, movie and news videos. Experimental results indicate that the proposed algorithm is effective in genre recognition and retrieval.
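    To make the tree-pruning step concrete, here is a minimal branch-and-bound sketch in Python over per-object match scores. The genre list matches the paper's dataset, but the scoring scheme, the bound, and all identifiers are illustrative assumptions rather than the paper's actual implementation.

```python
# Illustrative sketch: pick a genre by accumulating per-object match
# scores, pruning a genre branch as soon as its best possible total
# can no longer beat the best complete score seen so far.

GENRE_TREE = {"root": ["sports", "movie", "news"]}  # the paper's three genres

def recognise_genre(object_scores):
    """object_scores: {genre: [match score per query-clip object]}."""
    best_genre, best_score = None, float("-inf")
    for genre in GENRE_TREE["root"]:
        scores = object_scores[genre]
        remaining = sum(s for s in scores if s > 0)  # optimistic upper bound
        running, pruned = 0.0, False
        for s in scores:
            running += s
            remaining -= max(s, 0.0)
            if running + remaining < best_score:  # cannot win: prune branch
                pruned = True
                break
        if not pruned and running > best_score:
            best_genre, best_score = genre, running
    return best_genre

print(recognise_genre({
    "sports": [0.9, 0.8, 0.7],
    "movie":  [0.2, 0.1, 0.3],
    "news":   [0.4, 0.5, 0.1],
}))  # -> "sports"
```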

    A Smart Browsing System with Colour Image Enhancement for Surveillance Videos

    Surveillance cameras have been widely installed in large cities to monitor and record human activities for different applications. Since surveillance cameras often record all events 24 hours a day, searching the resulting videos for specific targets takes an enormous amount of manual effort, so a system that helps the user quickly find targets of interest is in high demand. This paper proposes a smart surveillance-video browsing system with colour image enhancement. The basic idea is to collect all the moving objects, which carry the most significant information in surveillance videos, and construct a corresponding compact video by adjusting the positions of these moving objects. The compact video rearranges the spatiotemporal coordinates of the moving objects to improve compression while still preserving the temporal relationships among them, so the essential activities of the original surveillance video are retained. The paper presents the details of the browsing system and the approach to producing the compact video from a source surveillance video; the resulting compact video retains high resolution. DOI: 10.17762/ijritcc2321-8169.15038
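    The core of the compact-video construction is shifting each moving object's spatiotemporal "tube" earlier in time while keeping the objects' relative order. Below is a toy sketch assuming object tubes have already been extracted as frame spans; the greedy packing and the gap parameter are illustrative, not the paper's algorithm.

```python
# Illustrative greedy packing: shift each object's tube (its span of
# frames) as early as possible while preserving the original ordering
# among objects, so the output video is much shorter than the source.

def compact_timeline(tubes, gap=5):
    """tubes: list of (start_frame, end_frame) in the source video.

    Returns shifted (start, end) spans: each tube begins as early as
    possible but never before the previous tube's shifted start.
    """
    shifted, next_free = [], 0
    for start, end in sorted(tubes):
        new_start = next_free
        shifted.append((new_start, new_start + (end - start)))
        next_free = new_start + gap  # stagger tubes instead of stacking them
    return shifted

# Three objects appearing minutes apart compress into a short clip.
print(compact_timeline([(0, 100), (3000, 3080), (9000, 9200)]))
```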

    Optimizing Neural Architecture Search using Limited GPU Time in a Dynamic Search Space: A Gene Expression Programming Approach

    Efficient identification of people and objects, segmentation of regions of interest, and extraction of relevant data from images, text, audio and video have advanced considerably in recent years, with deep learning methods, combined with recent improvements in computational resources, contributing greatly to this progress. Despite its outstanding potential, the development of efficient architectures and modules requires expert knowledge and a large amount of compute time. In this paper, we propose an evolutionary neural architecture search (NAS) approach for the efficient discovery of convolutional models in a dynamic search space, within only 24 GPU hours. With its efficient search environment and phenotype representation, Gene Expression Programming is adapted to generate the network's cells. Despite the limited GPU time and broad search space, our proposal achieved results comparable to state-of-the-art manually designed convolutional networks and NAS-generated ones, even beating similarly constrained evolutionary NAS works. The best cells across different runs achieved stable results, with a mean error of 2.82% on CIFAR-10 (the best model reached 2.67%) and 18.83% on CIFAR-100 (best model 18.16%). On ImageNet in the mobile setting, our best model achieved top-1 and top-5 errors of 29.51% and 10.37%, respectively. Although evolutionary NAS works have been reported to require a considerable amount of GPU time for architecture search, our approach obtained promising results in little time, encouraging further experiments in evolutionary NAS for improvements in search and network representation.
    Comment: Accepted for presentation at the IEEE Congress on Evolutionary Computation (IEEE CEC) 202
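    A toy sketch of the evolutionary loop follows, with linear genomes (as in Gene Expression Programming) decoding into a cell's operation sequence. The operation set, genome length, and especially the stub fitness function are illustrative placeholders; the paper evaluates candidates by actually training them within the 24-GPU-hour budget.

```python
import random

# Illustrative GEP-style NAS loop: fixed-length integer genomes decode
# into a sequence of cell operations; selection keeps the best half and
# refills the population with mutated parents.

OPS = ["conv3x3", "conv5x5", "sep3x3", "maxpool", "identity"]
GENOME_LEN, POP, GENERATIONS = 8, 10, 5

def random_genome():
    return [random.randrange(len(OPS)) for _ in range(GENOME_LEN)]

def decode(genome):
    return [OPS[g] for g in genome]  # phenotype: the cell's op sequence

def fitness(genome):
    # Stub standing in for "train the decoded cell, return accuracy".
    return -abs(sum(genome) - 2 * GENOME_LEN)

def mutate(genome, rate=0.2):
    return [random.randrange(len(OPS)) if random.random() < rate else g
            for g in genome]

population = [random_genome() for _ in range(POP)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[: POP // 2]
    population = parents + [mutate(random.choice(parents)) for _ in parents]

print("best cell:", decode(max(population, key=fitness)))
```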

    Association rules implementation for affinity analysis between elements composing multimedia objects

    Multimedia objects are a constantly growing resource on the World Wide Web, which has created a need for methods and tools that can extract new knowledge from the information analyzed. Association rules are a data mining technique whose purpose is to find correlations between elements of a data collection, supporting decision making through the identification and analysis of these correlations, using algorithms such as Apriori, Frequent Pattern Growth (FP-Growth), the QFP algorithm, CBA, CMAR and CPAR, among others. Multimedia applications today require the processing of unstructured data supplied by multimedia objects, which are composed of text, images, audio and video. For the storage, processing and management of multimedia objects, solutions have been developed that allow efficient search of data of interest to the end user, considering that the semantics of a multimedia object must be expressed by all the elements that compose it. This article analyzes the state of the art regarding the application of association rules to the processing of multimedia objects; the literature reviewed also raises questions about the possibility of developing an association-rule method for the analysis of these objects.
    Universidad de la Costa, Universidad Pontificia Bolivariana
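    Since Apriori is the canonical algorithm named here, a minimal pure-Python sketch may help: transactions are sets of tags describing multimedia objects (here, the modalities each object contains). The toy data and support threshold are illustrative assumptions, and the sketch omits the classic algorithm's full candidate-pruning step.

```python
from itertools import combinations

# Illustrative Apriori: grow itemsets level by level, keeping only
# those whose support clears the threshold.

transactions = [
    {"text", "image"},
    {"text", "image", "audio"},
    {"text", "video"},
    {"text", "image", "video"},
]
min_support = 0.5

def frequent_itemsets(transactions, min_support):
    n = len(transactions)
    candidates = list({frozenset([item]) for t in transactions for item in t})
    frequent = {}
    while candidates:
        # Keep candidates whose support clears the threshold.
        kept = {}
        for c in candidates:
            support = sum(c <= t for t in transactions) / n
            if support >= min_support:
                kept[c] = support
        frequent.update(kept)
        # Build (k+1)-item candidates by joining surviving k-itemsets.
        candidates = list({a | b for a, b in combinations(kept, 2)
                           if len(a | b) == len(a) + 1})
    return frequent

for itemset, support in frequent_itemsets(transactions, min_support).items():
    print(set(itemset), f"support={support:.2f}")
```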

    Searching surveillance video contents using convolutional neural network

    Manually inspecting, searching, and analyzing video is exhausting and inefficient. This paper presents an intelligent system for searching surveillance video contents using deep learning. The proposed system reduces the amount of work needed to perform video searching and improves speed and accuracy. A pre-trained VGG-16 CNN model is used for dataset training. In addition, key frames of the videos are extracted to save space, reduce the amount of work, and shorten the execution time. The extracted key frames are processed with the Sobel edge detector and max-pooling to eliminate redundancy; this increases compaction and avoids similarities between extracted frames. The system produces a text file containing each key frame's index, its time of occurrence, and the VGG-16 classification, which enables humans to easily search for objects of interest. The VIRAT and IVY LAB datasets were used in the experiments, in which 128 different classes representing important objects for surveillance systems were identified; users can also define other classes and apply the proposed methodology. Experiments and evaluation showed that the proposed system outperformed existing methods by an order of magnitude, achieving the best results in speed while providing high classification accuracy.
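    A hedged sketch of the key-frame de-duplication step: combine a Sobel edge map with max-pooling into a compact per-frame signature and keep a frame only when its signature differs enough from the last kept one. The pool size and threshold are illustrative, and the VGG-16 classification of kept frames is omitted.

```python
import cv2
import numpy as np

def signature(gray, pool=8):
    # Sobel gradients in x and y, summed into one edge-strength map.
    edges = np.abs(cv2.Sobel(gray, cv2.CV_32F, 1, 0)) + \
            np.abs(cv2.Sobel(gray, cv2.CV_32F, 0, 1))
    h = (edges.shape[0] // pool) * pool
    w = (edges.shape[1] // pool) * pool
    blocks = edges[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))  # max-pooling into a coarse signature

def key_frames(video_path, diff_thresh=25.0):
    cap = cv2.VideoCapture(video_path)
    kept, last_sig, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        sig = signature(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        if last_sig is None or np.mean(np.abs(sig - last_sig)) > diff_thresh:
            kept.append((idx, frame))  # frame index feeds the searchable text file
            last_sig = sig
        idx += 1
    cap.release()
    return kept
```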

    Annotation of multimedia learning materials for semantic search

    Multimedia is the main source of online learning materials such as videos, slides and textbooks, and its volume is growing with the popularity of online programs offered by universities and Massive Open Online Courses (MOOCs). The increasing amount of multimedia learning resources available online makes it very challenging to browse through the materials or find where a specific concept of interest is covered. To enable semantic search over lecture materials, their content must be annotated and indexed. Manual annotation of learning materials such as videos is tedious and cannot be envisioned for the growing quantity of online materials. One of the most common methods for annotating lecture videos is to index the video based on a transcript obtained by translating its audio track into text; however, existing speech-to-text translators require extensive training, especially for non-native English speakers, and are known to have low accuracy. This dissertation instead proposes to index the slides based on keywords: the keywords extracted from the textbook index and the presentation slides form the basis of the indexing scheme. Two types of lecture videos are generally used (classroom recordings made with a regular camera, and slide-presentation screen captures made with dedicated software), and their quality varies widely. Screen-capture videos generally have good quality and sometimes come with metadata, but the metadata is often unreliable, so image-processing techniques are used to segment the videos. Since learning videos have a static slide background, detecting shot boundaries is challenging; a comparative analysis of state-of-the-art techniques is presented to determine the feature descriptors best suited to detecting transitions in a learning video. The videos are indexed with keywords obtained from the slides, and a correspondence is established by segmenting the video temporally using feature descriptors to match and align the video segments with the presentation slides converted into images. Classroom recordings made with regular video cameras often have poor illumination, with objects partially or totally occluded; for such videos, slide-localization techniques based on segmentation and heuristics are presented to improve the accuracy of transition detection. A region-prioritized ranking mechanism is proposed that integrates the location of a keyword within the presentation into the ranking of the slides when searching for a slide that covers that keyword, which helps return the most relevant results first. With the increasing amount of course material gathered online, a user trying to understand a given concept can become overwhelmed; the standard “one size fits all” way of learning is no longer the best way for millennials to learn, so personalized concept recommendation based on the user’s background knowledge is presented. Finally, the contributions of this dissertation have been integrated into the Ultimate Course Search (UCS), a tool for effective search of course materials that combines presentations, lecture videos and textbook content in a single platform with topic-based search and easy navigation of lecture materials.
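    The region-prioritized ranking idea can be illustrated with a toy scoring function in which a keyword hit counts more in a more prominent slide region. The region weights and slide structure below are illustrative assumptions, not the dissertation's implementation.

```python
# Hypothetical region weights: a hit in the title outranks one in the body.
REGION_WEIGHT = {"title": 3.0, "body": 1.5, "footer": 0.5}

def slide_score(slide, keyword):
    kw = keyword.lower()
    return sum(REGION_WEIGHT.get(region, 1.0) * text.lower().count(kw)
               for region, text in slide.items())

slides = [
    {"title": "Quicksort in depth", "body": "quicksort, partitioning, pivots"},
    {"title": "Graph algorithms", "body": "shortest paths; quicksort mentioned once",
     "footer": "cf. quicksort"},
]
ranked = sorted(slides, key=lambda s: slide_score(s, "quicksort"), reverse=True)
print([s["title"] for s in ranked])  # the dedicated quicksort slide ranks first
```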

    A framework for automatic semantic video annotation

    The rapidly increasing quantity of publicly available videos has driven research into developing automatic tools for indexing, rating, searching and retrieval. Textual semantic representations, such as tagging, labelling and annotation, are often important factors in the process of indexing any video because of their user-friendly way of representing semantics appropriate for search and retrieval. Ideally, this annotation should be inspired by the human cognitive way of perceiving and describing videos. The difference between low-level visual content and the corresponding human perception is referred to as the ‘semantic gap’. Tackling this gap is even harder in the case of unconstrained videos, mainly due to the lack of any prior information about the analyzed video on the one hand, and the huge amount of generic knowledge required on the other. This paper introduces a framework for the automatic semantic annotation of unconstrained videos. The proposed framework utilizes two non-domain-specific layers: low-level visual similarity matching, and an annotation analysis that employs commonsense knowledge bases. A commonsense ontology is created by incorporating multiple structured semantic relationships. Experiments and black-box tests are carried out on standard video databases for action recognition and video information retrieval, while white-box tests examine the performance of the framework's individual intermediate layers. The evaluation of the results and the statistical analysis show that integrating visual similarity matching with commonsense semantic relationships provides an effective approach to automated video annotation.
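    A toy sketch of how the two layers might combine: labels carried by visually similar annotated videos are reinforced through commonsense relationships, so a related concept can outrank any directly matched label. The similarity scores, relation graph, and propagation weight are made-up inputs, not the framework's actual knowledge base.

```python
from collections import defaultdict

# Labels from the top visually similar annotated videos, with scores.
matches = [("kick ball", 0.9), ("run", 0.7), ("grass field", 0.6)]

# A tiny commonsense graph: related concepts share annotation weight.
related = {
    "kick ball": ["play soccer"],
    "run": ["play soccer", "exercise"],
    "grass field": ["play soccer", "outdoors"],
}

scores = defaultdict(float)
for label, sim in matches:
    scores[label] += sim
    for concept in related.get(label, []):
        scores[concept] += 0.5 * sim  # propagate half the weight to related concepts

print(sorted(scores.items(), key=lambda kv: -kv[1])[:3])
# "play soccer" surfaces although no single match carried that label
```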