52 research outputs found

    Activity recognition from videos with parallel hypergraph matching on GPUs

    Full text link
    In this paper, we propose a method for activity recognition from videos based on sparse local features and hypergraph matching. We benefit from special properties of the temporal domain in the data to derive a sequential and fast graph matching algorithm for GPUs. Traditionally, graphs and hypergraphs are frequently used to recognize complex and often non-rigid patterns in computer vision, either through graph matching or point-set matching with graphs. Most formulations resort to the minimization of a difficult discrete energy function mixing geometric or structural terms with data attached terms involving appearance features. Traditional methods solve this minimization problem approximately, for instance with spectral techniques. In this work, instead of solving the problem approximatively, the exact solution for the optimal assignment is calculated in parallel on GPUs. The graphical structure is simplified and regularized, which allows to derive an efficient recursive minimization algorithm. The algorithm distributes subproblems over the calculation units of a GPU, which solves them in parallel, allowing the system to run faster than real-time on medium-end GPUs

    Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery

    Get PDF
    A robust and fast automatic moving object detection and tracking system is essential to characterize target object and extract spatial and temporal information for different functionalities including video surveillance systems, urban traffic monitoring and navigation, robotic. In this dissertation, I present a collaborative Spatial Pyramid Context-aware moving object detection and Tracking system. The proposed visual tracker is composed of one master tracker that usually relies on visual object features and two auxiliary trackers based on object temporal motion information that will be called dynamically to assist master tracker. SPCT utilizes image spatial context at different level to make the video tracking system resistant to occlusion, background noise and improve target localization accuracy and robustness. We chose a pre-selected seven-channel complementary features including RGB color, intensity and spatial pyramid of HoG to encode object color, shape and spatial layout information. We exploit integral histogram as building block to meet the demands of real-time performance. A novel fast algorithm is presented to accurately evaluate spatially weighted local histograms in constant time complexity using an extension of the integral histogram method. Different techniques are explored to efficiently compute integral histogram on GPU architecture and applied for fast spatio-temporal median computations and 3D face reconstruction texturing. We proposed a multi-component framework based on semantic fusion of motion information with projected building footprint map to significantly reduce the false alarm rate in urban scenes with many tall structures. The experiments on extensive VOTC2016 benchmark dataset and aerial video confirm that combining complementary tracking cues in an intelligent fusion framework enables persistent tracking for Full Motion Video and Wide Aerial Motion Imagery.Comment: PhD Dissertation (162 pages

    Learning Situation Hyper-Graphs for Video Question Answering

    Full text link
    Answering questions about complex situations in videos requires not only capturing the presence of actors, objects, and their relations but also the evolution of these relationships over time. A situation hyper-graph is a representation that describes situations as scene sub-graphs for video frames and hyper-edges for connected sub-graphs and has been proposed to capture all such information in a compact structured form. In this work, we propose an architecture for Video Question Answering (VQA) that enables answering questions related to video content by predicting situation hyper-graphs, coined Situation Hyper-Graph based Video Question Answering (SHG-VQA). To this end, we train a situation hyper-graph decoder to implicitly identify graph representations with actions and object/human-object relationships from the input video clip. and to use cross-attention between the predicted situation hyper-graphs and the question embedding to predict the correct answer. The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction. The effectiveness of the proposed architecture is extensively evaluated on two challenging benchmarks: AGQA and STAR. Our results show that learning the underlying situation hyper-graphs helps the system to significantly improve its performance for novel challenges of video question-answering tasks

    OpenCog Hyperon: A Framework for AGI at the Human Level and Beyond

    Full text link
    An introduction to the OpenCog Hyperon framework for Artificiai General Intelligence is presented. Hyperon is a new, mostly from-the-ground-up rewrite/redesign of the OpenCog AGI framework, based on similar conceptual and cognitive principles to the previous OpenCog version, but incorporating a variety of new ideas at the mathematical, software architecture and AI-algorithm level. This review lightly summarizes: 1) some of the history behind OpenCog and Hyperon, 2) the core structures and processes underlying Hyperon as a software system, 3) the integration of this software system with the SingularityNET ecosystem's decentralized infrastructure, 4) the cognitive model(s) being experimentally pursued within Hyperon on the hopeful path to advanced AGI, 5) the prospects seen for advanced aspects like reflective self-modification and self-improvement of the codebase, 6) the tentative development roadmap and various challenges expected to be faced, 7) the thinking of the Hyperon team regarding how to guide this sort of work in a beneficial direction ... and gives links and references for readers who wish to delve further into any of these aspects

    Deliverable D1.1 State of the art and requirements analysis for hypervideo

    Get PDF
    This deliverable presents a state-of-art and requirements analysis report for hypervideo authored as part of the WP1 of the LinkedTV project. Initially, we present some use-case (viewers) scenarios in the LinkedTV project and through the analysis of the distinctive needs and demands of each scenario we point out the technical requirements from a user-side perspective. Subsequently we study methods for the automatic and semi-automatic decomposition of the audiovisual content in order to effectively support the annotation process. Considering that the multimedia content comprises of different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three different streams. Finally we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each one of the different classes of techniques being discussed in the deliverable we present the evaluation results from the application of one such method of the literature to a dataset well-suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary

    Robust Modular Feature-Based Terrain-Aided Visual Navigation and Mapping

    Get PDF
    The visual feature-based Terrain-Aided Navigation (TAN) system presented in this thesis addresses the problem of constraining inertial drift introduced into the location estimate of Unmanned Aerial Vehicles (UAVs) in GPS-denied environment. The presented TAN system utilises salient visual features representing semantic or human-interpretable objects (roads, forest and water boundaries) from onboard aerial imagery and associates them to a database of reference features created a-priori, through application of the same feature detection algorithms to satellite imagery. Correlation of the detected features with the reference features via a series of the robust data association steps allows a localisation solution to be achieved with a finite absolute bound precision defined by the certainty of the reference dataset. The feature-based Visual Navigation System (VNS) presented in this thesis was originally developed for a navigation application using simulated multi-year satellite image datasets. The extension of the system application into the mapping domain, in turn, has been based on the real (not simulated) flight data and imagery. In the mapping study the full potential of the system, being a versatile tool for enhancing the accuracy of the information derived from the aerial imagery has been demonstrated. Not only have the visual features, such as road networks, shorelines and water bodies, been used to obtain a position ’fix’, they have also been used in reverse for accurate mapping of vehicles detected on the roads into an inertial space with improved precision. Combined correction of the geo-coding errors and improved aircraft localisation formed a robust solution to the defense mapping application. A system of the proposed design will provide a complete independent navigation solution to an autonomous UAV and additionally give it object tracking capability
    • …
    corecore