
    Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

    We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes. While most existing works rely only on hand-crafted appearance and motion features, we propose Appearance and Motion DeepNet (AMDN), which uses deep neural networks to learn feature representations automatically. To exploit the complementary information in appearance and motion patterns, we introduce a novel double fusion framework that combines the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn appearance and motion features as well as a joint representation (early fusion). Based on the learned representations, multiple one-class SVM models predict the anomaly scores of each input, and these scores are then integrated with a late fusion strategy for final anomaly detection. We evaluate the proposed method on two publicly available video surveillance datasets, showing competitive performance with respect to state-of-the-art approaches.
    Comment: Oral paper in BMVC 201
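The late fusion step described in this abstract can be illustrated with a minimal sketch. It assumes the fusion is a normalized weighted sum of per-stream anomaly scores; the abstract does not state the fusion rule, so the function name `late_fusion`, the weights, the scores, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Sequence


def late_fusion(stream_scores: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Fuse per-stream anomaly scores into one score per input.

    Assumption: a normalized weighted sum. The abstract only says the
    scores are "integrated with a late fusion strategy".
    """
    if len(stream_scores) != len(weights):
        raise ValueError("need exactly one weight per stream")
    n = len(stream_scores[0])
    if any(len(s) != n for s in stream_scores):
        raise ValueError("all streams must score the same inputs")
    total = sum(weights)
    return [
        sum(w * s[i] for w, s in zip(weights, stream_scores)) / total
        for i in range(n)
    ]


# Hypothetical anomaly scores for three inputs, one list per model
# (appearance, motion, and the jointly learned representation).
appearance = [0.1, 0.8, 0.2]
motion = [0.2, 0.9, 0.1]
joint = [0.1, 0.7, 0.3]

fused = late_fusion([appearance, motion, joint], weights=[1.0, 1.0, 2.0])
flags = [score > 0.5 for score in fused]  # 0.5 is an illustrative threshold
```

In this toy example only the second input exceeds the threshold after fusion. In practice the per-stream scores would come from the one-class SVM models, and the weights would need to be chosen or learned.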

    GPlates – Building a Virtual Earth Through Deep Time

    GPlates is an open-source, cross-platform plate tectonic geographic information system, enabling the interactive manipulation of plate-tectonic reconstructions and the visualization of geodata through geological time. GPlates allows the building of topological plate models representing the mosaic of evolving plate boundary networks through time, useful for computing plate velocity fields as surface boundary conditions for mantle convection models and for investigating physical and chemical exchanges of material between the surface and the deep Earth along tectonic plate boundaries. The ability of GPlates to visualize subsurface 3-D scalar fields together with traditional geological surface data enables researchers to analyze their relationships through geological time in a common plate tectonic reference frame. To achieve this, a hierarchical cube map framework is used for rendering reconstructed surface raster data to support the rendering of subsurface 3-D scalar fields using graphics-hardware-accelerated ray-tracing techniques. GPlates enables the construction of plate deformation zones: regions combining extension, compression, and shearing that accommodate the relative motion between rigid blocks. Users can explore how strain rates, stretching/shortening factors, and crustal thickness evolve through space and time and interactively update the kinematics associated with deformation. Where data sets described by geometries (points, lines, or polygons) fall within deformation regions, the deformation can be applied to these geometries. Together, these tools allow users to build virtual Earth models that quantitatively describe continental assembly, fragmentation and dispersal and are interoperable with many other mapping and modeling tools, enabling applications in tectonics, geodynamics, basin evolution, orogenesis, deep Earth resource exploration, paleobiology, paleoceanography, and paleoclimate.
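As background for the plate velocity fields mentioned in this abstract, the sketch below shows the standard rigid-plate relation v = ω × r for a plate rotating about an Euler pole. It is not GPlates code and does not use the GPlates API; the function names, the spherical-Earth radius, and the example pole and rate are assumptions for illustration.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def unit_vector(lat_deg: float, lon_deg: float) -> Vec3:
    """Convert latitude/longitude in degrees to a Cartesian unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plate_velocity(pole_lat: float, pole_lon: float, rate_deg_per_myr: float,
                   lat: float, lon: float) -> Vec3:
    """Velocity v = omega x r of a surface point on a rigid plate.

    Returned in km/Myr, which is numerically equal to mm/yr.
    """
    omega = math.radians(rate_deg_per_myr)  # rad/Myr
    w = tuple(omega * c for c in unit_vector(pole_lat, pole_lon))
    r = tuple(EARTH_RADIUS_KM * c for c in unit_vector(lat, lon))
    return cross(w, r)


def speed(v: Vec3) -> float:
    """Magnitude of a velocity vector."""
    return math.sqrt(sum(c * c for c in v))


# Hypothetical plate rotating 1 degree/Myr about the geographic north pole.
equator_speed = speed(plate_velocity(90.0, 0.0, 1.0, 0.0, 0.0))
pole_speed = speed(plate_velocity(90.0, 0.0, 1.0, 90.0, 0.0))
```

A point 90 degrees from the pole moves fastest (about 111 mm/yr here), while a point at the pole itself does not move. Evaluating the same function over a grid of points would give a velocity field of the kind used as a surface boundary condition.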

    Single View Modeling and View Synthesis

    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D, using mature technologies. However, there is always a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually make use of a camera array, which suffers from tedious setup and calibration processes, as well as a lack of portability, limiting its application to lab experiments. In this thesis, I try to produce 3D content using a single camera, making it as simple as shooting pictures. This requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences and achieves 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instant, partial surfaces are assembled to form a complete 3D model by a novel warping algorithm. Inspired by the success of single view 3D modeling, I extended my exploration into 2D-3D video conversion that does not utilize a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze the optical flow in order to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist with the user's labeling work.
    In this thesis, I developed new algorithms to produce 3D content from a single camera. Depending on the input data, my algorithm can build high-fidelity 3D models of dynamic and deformable objects if depth maps are provided. Otherwise, it can turn the video clips into stereoscopic video.

    Deformable Shape Completion with Graph Convolutional Autoencoders

    The availability of affordable and portable depth sensors has made scanning objects and people simpler than ever. However, dealing with occlusions and missing parts is still a significant challenge. The problem of reconstructing a (possibly non-rigidly moving) 3D object from a single or multiple partial scans has received increasing attention in recent years. In this work, we propose a novel learning-based method for the completion of partial shapes. Unlike the majority of existing approaches, our method focuses on objects that can undergo non-rigid deformations. The core of our method is a variational autoencoder with graph convolutional operations that learns a latent space for complete realistic shapes. At inference, we optimize to find the representation in this latent space that best fits the generated shape to the known partial input. The completed shape exhibits a realistic appearance on the unknown part. We show promising results towards the completion of synthetic and real scans of human body and face meshes exhibiting different styles of articulation and partiality.
    Comment: CVPR 201
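The inference-time optimization described in this abstract, searching the latent space for the code whose decoded shape best matches the known part of the input, can be sketched on a toy problem. The sketch assumes a linear decoder as a stand-in for the graph convolutional decoder, plain gradient descent, and a squared-error fit on known entries only; `decode`, `fit_latent`, and all data are hypothetical.

```python
from typing import List, Sequence


def decode(z: Sequence[float], basis: Sequence[Sequence[float]],
           mean: Sequence[float]) -> List[float]:
    """Toy linear decoder: mean shape plus a weighted sum of basis shapes.

    Stand-in for the learned decoder, which the abstract describes as a
    variational autoencoder with graph convolutional operations.
    """
    return [m + sum(zk * bk[i] for zk, bk in zip(z, basis))
            for i, m in enumerate(mean)]


def fit_latent(partial: Sequence[float], known: Sequence[bool],
               basis: Sequence[Sequence[float]], mean: Sequence[float],
               steps: int = 500, lr: float = 0.1) -> List[float]:
    """Gradient descent on the squared error over the known entries only."""
    z = [0.0] * len(basis)
    for _ in range(steps):
        recon = decode(z, basis, mean)
        # Residual is zero wherever the input is missing, so the unknown
        # part does not influence the fit.
        resid = [(recon[i] - partial[i]) if known[i] else 0.0
                 for i in range(len(mean))]
        grad = [2.0 * sum(r * bk[i] for i, r in enumerate(resid))
                for bk in basis]
        z = [zk - lr * g for zk, g in zip(z, grad)]
    return z


# Hypothetical 4-value "shape" with two latent dimensions.
mean = [0.0, 0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, 1.0]]
partial = [2.0, -1.0, 0.0, 0.0]      # last two values are missing
known = [True, True, False, False]

z_fit = fit_latent(partial, known, basis, mean)
completed = decode(z_fit, basis, mean)  # unknown entries filled by the decoder
```

Because the decoder ties the unknown entries to the same latent code as the known ones, fitting only the visible part still yields a full shape. With a learned nonlinear decoder the same loop would typically rely on automatic differentiation rather than a hand-written gradient.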