
    An Enhanced Spatio-Temporal Human Detected Keyframe Extraction

    The widespread availability of Closed-Circuit Television surveillance makes crime investigation quite difficult, owing to huge storage volumes and complex backgrounds. Content-based video retrieval is an excellent method for identifying the best keyframes in these surveillance videos; however, because crime surveillance contains numerous action scenes, existing keyframe extraction is not exemplary. Here, a spatio-temporal Histogram of Oriented Gradients - Support Vector Machine (HOG-SVM) feature method, combined with background subtraction, is applied over the recovered crime video to highlight human presence in the surveillance frames. Additionally, a Visual Geometry Group (VGG) network is trained on these frames to classify the human-detected frames. The detected frames are then processed to extract keyframes by thresholding the inter-frame difference, in favor of the requisite human-detected keyframes. The experimental results show that the proposed work achieves a compression ratio of 98.71%, preferable to the 98.54% of HOG-SVM alone, which supports criminal investigation.
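
    A minimal sketch of the inter-frame-difference keyframe step described above, assuming the frames have already been filtered to those with detected humans; the threshold value and function name are illustrative, not the paper's.

```python
import cv2
import numpy as np

def extract_keyframes(frames, diff_threshold=30.0):
    """Keep frames whose mean absolute difference from the last
    retained keyframe exceeds diff_threshold (inter-frame difference)."""
    keyframes, prev_gray = [], None
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is None or np.mean(cv2.absdiff(gray, prev_gray)) > diff_threshold:
            keyframes.append(frame)  # retained as a keyframe
            prev_gray = gray
    return keyframes
```

    The compression ratio the abstract reports would then correspond roughly to the fraction of frames discarded, i.e. 1 - len(keyframes) / len(frames).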

    Recent Trends in Computational Intelligence

    Traditional models struggle to cope with complexity, noise, and changing environments, while Computational Intelligence (CI) offers solutions to complicated problems as well as inverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically inspired techniques such as swarm intelligence, as part of evolutionary computation, and encompasses wider areas such as image processing, data collection, and natural language processing. This book discusses the use of CI for optimally solving various applications, proving its wide reach and relevance. Combining optimization methods and data mining strategies makes for a strong and reliable prediction tool for handling real-life applications.

    A review of content-based video retrieval techniques for person identification

    The rise of technology spurs advancement in the surveillance field. Many commercial spaces have reduced patrol guards in favor of Closed-Circuit Television (CCTV) installations, and some countries already use surveillance drones, which have greater mobility. In recent years, CCTV footage has also been used by law enforcement for crime investigation, such as in the 2013 Boston Marathon bombing. However, this has produced huge, unmanageable footage collections, a common issue of the Big Data era. While there is more information with which to identify a potential suspect, the massive volume of data that must be reviewed manually makes this a very laborious task. Therefore, some researchers have proposed Content-Based Video Retrieval (CBVR) methods that enable querying for a specific feature of an object or a human. Due to limitations such as the visibility and quality of video footage, only certain features are selected for recognition, based on Chicago Police Department guidelines. This paper presents a comprehensive review of CBVR techniques used for clothing, gender and ethnic recognition of a person of interest, and how they can be applied in crime investigation. From the findings, the three recognition types can be combined to create a Content-Based Video Retrieval system for person identification.

    Improving Content Based Video Retrieval Performance by Using Hadoop-MapReduce Model

    In this paper, we present a distributed Content-Based Video Retrieval (CBVR) system based on the MapReduce programming model. A CBVR system called Bounded Coordinate of Motion Histogram (BCMH) has been implemented as a case study using the Hadoop framework. We propose a distributed model that extracts video signatures and computes similarity with the BCMH system through a set of MapReduce jobs assigned to multiple nodes of the Hadoop cluster, in order to reduce the computation time of the training process. The proposed approach is tested on the HOLLYWOOD2 dataset, and the obtained results demonstrate its efficiency.
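
    To make the map/reduce split concrete, here is a toy, single-process stand-in for the distributed pipeline the paper describes: the map phase computes one compact signature per video, and the reduce phase ranks signatures against a query. The intensity-histogram signature is a placeholder assumption, not the actual BCMH descriptor.

```python
import numpy as np

def mapper(video_id, frames):
    """Map job: emit (video_id, signature); the signature here is a
    normalised mean intensity histogram over all frames (placeholder)."""
    hists = [np.histogram(f, bins=16, range=(0, 255))[0] for f in frames]
    sig = np.mean(hists, axis=0)
    return video_id, sig / sig.sum()

def reducer(query_sig, mapped_pairs):
    """Reduce job: score each mapped signature against the query using
    histogram intersection, and return a ranked list."""
    scores = {vid: float(np.minimum(sig, query_sig).sum())
              for vid, sig in mapped_pairs}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# In the actual system, these roles would run as Hadoop jobs spread across
# cluster nodes; here they are invoked sequentially for illustration only.
```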

    Signature-based videos’ visual similarity detection and measurement

    The quantity of digital videos is huge, due to technological advances in video capture, storage and compression. However, the usefulness of these enormous volumes is limited by the effectiveness of content-based video retrieval (CBVR) systems, which still require time-consuming annotating/tagging to feed text-based search. Visual similarity is the core of these CBVR systems, where videos are matched based on their respective visual features and their evolution across video frames. It also acts as an essential foundational layer for inferring semantic similarity at a more advanced stage, in collaboration with metadata. Furthermore, handling such amounts of video data, especially in the compressed domain, poses certain challenges for CBVR systems: speed, scalability and genericness. The situation is even more challenging given the availability of non-pixelated features due to compression, e.g. DC/AC coefficients and motion vectors, which require sophisticated processing. Thus, careful feature selection is important to realize visual similarity based matching within the boundaries of the aforementioned challenges. Matching speed is crucial, because most current research is biased towards accuracy and leaves speed lagging behind, which in many cases affects practical use. Scalability is the key to benefiting from these enormous amounts of available video. Genericness is essential for developing systems that are applicable to both compressed and uncompressed videos.

    This thesis presents a signature-based framework for efficient visual similarity based video matching. The proposed framework represents a vital component for search and retrieval systems, where it could be used in three different ways: (1) directly for CBVR systems, where a user submits a query video and the system retrieves a ranked list of visually similar ones; (2) for text-based video retrieval systems, e.g. YouTube, where a user submits a textual description and the system retrieves a ranked list of relevant videos. Retrieval in this case works by finding videos that were manually assigned similar textual descriptions (annotations), and the framework could enhance this annotation process by suggesting an annotation set for newly uploaded videos, derived from other visually similar videos retrieved by the framework. In this way, the framework could make annotations more relevant to video contents (compared to the manual way), which improves overall CBVR system performance as well; (3) the top-N matched list obtained by the framework could be used as input to higher layers, e.g. semantic analysis, where it is easier to perform complex processing on this limited set of videos.

    The proposed framework addresses the aforementioned problems, i.e. speed, scalability and genericness, by encoding a given video shot into a single compact fixed-length signature. This signature robustly encodes the shot contents for later speedy matching and retrieval tasks, in contrast with the current research trend of using exhaustive, complex features/descriptors, e.g. dense trajectories. Moreover, towards higher matching speed, the framework operates over a sequence of tiny images (DC-images) rather than full-size frames. This limits the need to fully decompress compressed videos, as the DC-images are extracted directly from the compressed stream. The DC-image is highly useful for complex processing, due to its small size compared to the full-size frame. In addition, it can be generated from uncompressed videos as well, with the proposed framework still applicable in the same manner (the genericness aspect). Furthermore, for robust capturing of the visual similarity, scene and motion information are extracted independently, to better address their different characteristics. Scene information is captured using a statistical representation of scene key-colour profiles, while motion information is captured using a graph-based structure. Both are then fused to generate an overall video signature. The signature's compact fixed-length form contributes to scalability, because compact fixed-length signatures are highly indexable entities, which facilitates retrieval over large-scale video data. The proposed framework is adaptive and provides two different fixed-length video signatures. Both work in a speedy and accurate manner, but with different degrees of matching speed and retrieval accuracy; such granularity is useful to accommodate different applications' trade-offs between speed and accuracy. The proposed framework was extensively evaluated using black-box tests for the overall fused signatures and white-box tests for its individual components. The evaluation was performed on multiple challenging large-scale datasets against a diverse set of state-of-the-art baselines, and the results of the quantitative evaluation demonstrated the promise of the proposed framework for supporting real-time applications.
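
    As a rough sketch of the fusion idea, assuming greyscale DC-images and substituting plain histograms for the thesis's key-colour profiles and graph-based motion structure:

```python
import numpy as np

def video_signature(dc_images, bins=32):
    """Fuse scene and motion information from a shot's DC-images into
    one fixed-length vector (length 2 * bins)."""
    stack = np.stack(dc_images).astype(np.float32)
    # Scene component: global intensity distribution over the shot.
    scene, _ = np.histogram(stack, bins=bins, range=(0, 256), density=True)
    # Motion component: distribution of inter-frame absolute differences.
    diffs = np.abs(np.diff(stack, axis=0))
    motion, _ = np.histogram(diffs, bins=bins, range=(0, 256), density=True)
    return np.concatenate([scene, motion])
```

    Because the output length is independent of shot duration, such signatures index cleanly, which is the scalability property the abstract emphasises.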

    Exploratory search through large video corpora

    Activity retrieval is a growing field in electrical engineering that specializes in the search and retrieval of relevant activities and events in video corpora. With the affordability and popularity of cameras for government, personal and retail use, the quantity of available video data is rapidly outpacing our ability to reason over it. To empower users to navigate and interact with the contents of these video corpora, we propose a framework for exploratory search that emphasizes activity structure and search space reduction over complex feature representations. Exploratory search is a user-driven process wherein a person provides a system with a query describing the activity, event, or object they are interested in finding. Typically, this description takes the implicit form of one or more exemplar videos, but it can also involve an explicit description. The system returns candidate matches, followed by query refinement and iteration. System performance is judged by the run-time of the system and the precision/recall curve of the query matches returned. Scaling is one of the primary challenges in video search. From vast web-video archives like YouTube (1 billion videos and counting) to the 30 million active surveillance cameras shooting an estimated 4 billion hours of footage every week in the United States, trying to find a set of matches can be like looking for a needle in a haystack. Our goal is to create an efficient archival representation of video corpora that can be calculated in real time as video streams in, and that then enables a user to quickly get a set of matching results.

    First, we design a system for rapidly identifying simple queries in large-scale video corpora. Instead of focusing on feature design, our system focuses on the spatiotemporal relationships between those features as a means of disambiguating an activity of interest from background. We define a semantic feature vocabulary of concepts that are both readily extracted from video and easily understood by an operator. As data streams in, features are hashed to an inverted index and retrieved in constant time once the system is presented with a user's query. We take a zero-shot approach to exploratory search: the user manually assembles vocabulary elements like color, speed, size and type into a graph. Given that information, we perform an initial downsampling of the archived data and design a novel dynamic programming approach, based on genome sequencing, to search for similar patterns. Experimental results indicate that this approach outperforms other methods for detecting activities in surveillance video datasets.

    Second, we address the problem of representing complex activities that take place over long spans of space and time. Subgraph and graph matching methods have seen limited use in exploratory search because both problems are provably NP-hard. In this work, we render these problems computationally tractable by identifying the maximally discriminative spanning tree (MDST) and using dynamic programming to optimally reduce the archive data, based on a custom algorithm for tree matching in attributed relational graphs. We demonstrate the efficacy of this approach on popular surveillance video datasets in several modalities.

    Finally, we design an approach for successive search space reduction in subgraph matching problems. Given a query graph and archival data, our algorithm iteratively selects spanning trees from the query graph that optimize the expected search space reduction at each step, until the archive converges. We use this approach to efficiently reason over video surveillance datasets, simulated data, and large graphs of protein data.
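
    A minimal sketch of the constant-time inverted-index lookup described in the first part, assuming a hypothetical vocabulary of colour/type/speed tags; the graph assembly and dynamic-programming matching stages are omitted.

```python
from collections import defaultdict

index = defaultdict(set)  # vocabulary element -> {(video_id, time)}

def ingest(video_id, t, elements):
    """As video streams in, hash each observed vocabulary element
    into the inverted index."""
    for e in elements:
        index[e].add((video_id, t))

def lookup(elements):
    """Constant-time retrieval per element; intersect the postings to
    find detections matching every requested element."""
    postings = [index[e] for e in elements]
    return set.intersection(*postings) if postings else set()

ingest("cam1", 0, {"color:red", "type:car", "speed:fast"})
ingest("cam1", 5, {"color:blue", "type:person"})
print(lookup({"color:red", "type:car"}))  # {('cam1', 0)}
```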

    Enhancing the Potential of the Conventional Gaussian Mixture Model for Segmentation: from Images to Videos

    Segmentation in images and videos has continuously played an important role in image processing, pattern recognition and machine vision. Despite having been studied for over three decades, the problem of segmentation remains challenging yet appealing due to its ill-posed nature. Maintaining spatial coherence, particularly at object boundaries, remains difficult for image segmentation. Extending to videos, maintaining spatial and temporal coherence, even partially, proves computationally burdensome for recent methods. Finally, connecting the two, foreground segmentation, also known as background suppression, suffers from noisy or dynamic backgrounds, slow foregrounds and illumination variations, to name a few. This dissertation focuses on probabilistic model-based segmentation, primarily due to its applicability to images as well as videos, its past success, and mainly because it can be enhanced by incorporating spatial and temporal cues. The first part of the dissertation focuses on enhancing the conventional Gaussian Mixture Model (GMM) for image segmentation using the bilateral filter, owing to its power of spatial smoothing while preserving object boundaries. Quantitative and qualitative evaluations show the improvements over a number of recent approaches. The latter part of the dissertation concentrates on enhancing GMM for foreground segmentation, as a connection between image and video segmentation. First, we propose an efficient way to include multiresolution features in GMM. This novel procedure implicitly incorporates spatial information to improve foreground segmentation by suppressing noisy backgrounds. The procedure is demonstrated with wavelets and gradually extended into a generic framework that admits other multiresolution decompositions. Second, we propose a more accurate foreground segmentation method by enhancing GMM with Adaptive Support Weights and Histograms of Gradients. Extensive analyses and quantitative and qualitative experiments demonstrate performance comparable to other state-of-the-art methods. The final part of the dissertation proposes a novel application of GMM to spatio-temporal video segmentation, connecting spatial segmentation for images and temporal segmentation to extract the foreground. The proposed approach has a simple architecture and requires little memory for processing. The analysis section demonstrates the architectural efficiency over other methods, while quantitative and qualitative experiments show the competitive performance of the proposed method.
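
    For reference, the conventional GMM baseline this dissertation builds on is available off the shelf: the sketch below pairs OpenCV's GMM-based MOG2 background subtractor with a bilateral pre-filter, which is one plausible (assumed) placement of the smoothing enhancement described above, not the dissertation's exact pipeline.

```python
import cv2

# GMM-based background subtractor (MOG2 variant of the conventional model).
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

def foreground_mask(frame):
    # Edge-preserving smoothing before the GMM update: suppresses noise
    # while keeping object boundaries, the property the bilateral filter
    # is used for in the first part of the dissertation.
    smoothed = cv2.bilateralFilter(frame, d=9, sigmaColor=75, sigmaSpace=75)
    return subtractor.apply(smoothed)
```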

    Unsupervised and Semi-supervised Methods for Human Action Analysis


    Efficient Search and Localization of Human Actions in Video Databases

    As digital video databases grow, so grows the problem of effectively navigating through them. In this paper we propose a novel content-based video retrieval approach to searching such video databases, specifically those involving human actions, incorporating spatio-temporal localization. We outline a novel, highly efficient localization model that first performs temporal localization based on histograms of evenly spaced time-slices, then spatial localization based on histograms of a 2-D spatial grid. We further argue that our retrieval model, based on the aforementioned localization followed by relevance ranking, results in a highly discriminative system, while remaining an order of magnitude faster than the current state-of-the-art method. We also show how relevance feedback can be applied to our localization and ranking algorithms. As a result, the presented system is more directly applicable to real-world problems than any prior content-based video retrieval system.
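
    A schematic sketch of the temporal stage, assuming per-frame feature histograms are already available; histogram intersection stands in for the paper's actual ranking function, and the spatial 2-D grid stage would follow the same sliding-comparison pattern.

```python
import numpy as np

def pool_time_slices(frame_hists, n_slices):
    """Pool per-frame histograms into n evenly spaced time-slice histograms."""
    return [s.mean(axis=0) for s in np.array_split(np.asarray(frame_hists), n_slices)]

def temporal_localization(video_slices, query_slices):
    """Slide the query's slice histograms over the video's and return the
    best-matching start offset together with its score."""
    q = np.concatenate(query_slices)
    best_start, best_score = 0, -np.inf
    for start in range(len(video_slices) - len(query_slices) + 1):
        window = np.concatenate(video_slices[start:start + len(query_slices)])
        score = float(np.minimum(window, q).sum())  # histogram intersection
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score
```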