
    Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions

    3D action recognition has broad applications in human-computer interaction and intelligent surveillance. However, recognizing similar actions remains challenging since previous literature fails to capture motion and shape cues effectively from noisy depth data. In this paper, we propose a novel two-layer Bag-of-Visual-Words (BoVW) model, which suppresses noise disturbances and jointly encodes both motion and shape cues. First, background clutter is removed by a background modeling method designed for depth data. Then, motion and shape cues are jointly used to generate robust and distinctive spatial-temporal interest points (STIPs): motion-based STIPs and shape-based STIPs. In the first layer of our model, a multi-scale 3D local steering kernel (M3DLSK) descriptor is proposed to describe the local appearances of cuboids around motion-based STIPs. In the second layer, a spatial-temporal vector (STV) descriptor is proposed to describe the spatial-temporal distributions of shape-based STIPs. Using the BoVW model, motion and shape cues are combined to form a fused action representation. Our model performs favorably compared with common STIP detection and description methods. Thorough experiments verify that our model is effective in distinguishing similar actions and is robust to background clutter, partial occlusions and pepper noise.
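
    As a rough illustration of the final fusion step, the sketch below quantises precomputed M3DLSK and STV descriptors against separate visual-word codebooks and concatenates the two histograms into one action representation. This is a minimal sketch under stated assumptions: the codebook sizes, the concatenation-based fusion, and names such as `all_m3dlsk_train` are illustrative, not taken from the paper.

```python
# Hedged sketch of the two-layer BoVW fusion: one codebook per descriptor
# type, histograms concatenated into a fused action representation.
# Codebook sizes and the training arrays (all_m3dlsk_train, all_stv_train)
# are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def bovw_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest visual word and return an
    L1-normalised word-count histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# One codebook per layer, trained on descriptors pooled over training videos.
motion_codebook = KMeans(n_clusters=1000, n_init=4).fit(all_m3dlsk_train)
shape_codebook = KMeans(n_clusters=500, n_init=4).fit(all_stv_train)

def fused_representation(m3dlsk_desc, stv_desc):
    """Concatenate motion-cue and shape-cue histograms for one video."""
    h_motion = bovw_histogram(m3dlsk_desc, motion_codebook)
    h_shape = bovw_histogram(stv_desc, shape_codebook)
    return np.concatenate([h_motion, h_shape])
```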

    Outcrop to subsurface stratigraphy of the Pennsylvanian Hermosa Group, southern Paradox Basin, U.S.A.

    Pennsylvanian (Desmoinesian) sedimentary rocks within the Paradox Basin, Four Corners area of the western United States, afford a unique opportunity to study the development of sedimentary successions in a complex marine to nonmarine depositional setting. The close association of thick intervals of nonmarine fan-delta facies adjacent to, and in time-equivalent position to, marine carbonate-evaporite facies suggests complex relationships between the factors affecting deposition. Development of an effective scheme to differentiate the depositional signatures within these sedimentary successions is the primary goal of this study. To achieve this goal, two objectives were pursued. The first was to calibrate the diverse range of rock types in the Hermosa Group to in-situ wellbore measurements. To facilitate this process, a neural network evaluation procedure coupled with standard petrophysical evaluation techniques was employed to aid facies succession prediction and lateral facies correlation. This process proved to be as accurate as standard wireline analysis procedures and was able to account for variations not as detectable with a conventional scheme. The second objective was to correlate the stratigraphy of the Hermosa Group from outcrops of the Animas Valley to the subsurface along the southern Paradox Basin. The key to understanding the depositional sequences within the Middle Pennsylvanian section is to determine the spatial and temporal relationships between the evaporites and black-shale deposits associated with carbonate algal mound buildups and the juxtaposed terrigenous clastic fan-delta depositional facies. Once the relationships of these facies successions are delineated, a three-dimensional architectural framework can be manipulated to examine all possible lateral facies successions. Using these analyses, several members of the Paradox Formation were shown to be laterally equivalent to and physically continuous with parts of the previously designated undifferentiated Honaker Trail Formation of the San Juan Dome region. The study required a rigorous integration process in a digital workstation environment, combining larger and more diverse datasets than previously utilized for improved correlation control. Evaluation of facies successions employed cores (42), subsurface wells (4,000+), and measured sections (12+).
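
    The neural-network log evaluation lends itself to a short sketch: a small classifier that maps in-situ wireline measurements to core-calibrated facies labels and can then predict facies successions in uncored wells. The choice of log curves, the network size, and the file names below are illustrative assumptions, not the study's actual configuration.

```python
# Hedged sketch of neural-network facies prediction from wireline logs.
# The log curves (e.g. gamma ray, bulk density, neutron porosity, sonic),
# network size, and file names are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: depth samples x log curves; y: facies labels calibrated against core.
X_train = np.loadtxt("logs_train.csv", delimiter=",")
y_train = np.loadtxt("facies_train.csv", dtype=int)

model = make_pipeline(
    StandardScaler(),  # wireline curves are on very different scales
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000),
)
model.fit(X_train, y_train)

# Predict the facies succession of an uncored well from its logs alone.
facies = model.predict(np.loadtxt("logs_new_well.csv", delimiter=","))
```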

    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as the motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, producing meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.
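
    The ModDrop idea is compact enough to sketch: during training, each modality's input is independently zeroed out with some probability, so the fusion layers must learn cross-modality correlations that survive missing channels. The sketch below is a minimal PyTorch rendition under assumed shapes and drop probability; it is not the authors' architecture.

```python
# Hedged sketch of ModDrop-style fusion training: each modality is
# independently dropped (zeroed) per sample during training. Encoder
# shapes, class count and drop probability are illustrative assumptions.
import torch
import torch.nn as nn

class ModDropFusion(nn.Module):
    def __init__(self, modality_dims, hidden=128, n_classes=20, p_drop=0.5):
        super().__init__()
        # One encoder per modality (carefully initialised per modality
        # in the paper; plain linear layers here for brevity).
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in modality_dims
        )
        self.p_drop = p_drop
        self.classifier = nn.Linear(hidden * len(modality_dims), n_classes)

    def forward(self, inputs):  # inputs: one tensor per modality
        feats = []
        for enc, x in zip(self.encoders, inputs):
            if self.training:
                # Drop this modality independently for each sample.
                keep = (torch.rand(x.size(0), 1, device=x.device)
                        >= self.p_drop).float()
                x = x * keep
            feats.append(enc(x))
        return self.classifier(torch.cat(feats, dim=1))
```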

    Patient-specific modelling in orthopedics: from image to surgery

    In orthopedic surgery, to decide upon an intervention and how it can be optimized, surgeons usually rely on subjective analysis of medical images of the patient, obtained from computed tomography, magnetic resonance imaging, ultrasound or other techniques. Recent advancements in computational performance, image analysis and in silico modeling techniques have started to revolutionize clinical practice through the development of quantitative tools, including patient-specific models aiming at improving clinical diagnosis and surgical treatment. Extraction of anatomical and surgical landmarks and other features can be automated, allowing for the creation of general or patient-specific models based on statistical shape models. Preoperative virtual planning and rapid prototyping tools allow the implementation of customized surgical solutions in real clinical environments. In the present chapter we discuss the applications of some of these techniques in orthopedics and present new computer-aided tools that can take us from image analysis to customized surgical treatment.
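
    As an illustration of the statistical-shape-model ingredient mentioned above, the sketch below builds a mean shape plus principal modes of variation from aligned training landmarks, then fits a patient's landmarks with a few mode weights. The landmark format, the alignment assumption, and the number of retained modes are illustrative assumptions, not a specific tool from the chapter.

```python
# Hedged sketch of a PCA-based statistical shape model (SSM): a mean
# shape plus modes of variation, fitted to new patient landmarks.
# Assumes shapes are already in correspondence and rigidly aligned.
import numpy as np

def build_ssm(shapes, n_modes=10):
    """shapes: array (n_subjects, n_points, 3) of corresponding landmarks."""
    X = shapes.reshape(len(shapes), -1)      # flatten each shape to a vector
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_modes]                # mean shape and principal modes

def fit_patient(landmarks, mean, modes):
    """Project patient landmarks onto the model and reconstruct them."""
    weights = modes @ (landmarks.ravel() - mean)
    recon = mean + weights @ modes
    return weights, recon.reshape(-1, 3)
```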

    Learning discriminative features for human motion understanding

    Human motion understanding has attracted considerable interest in recent research for its applications to video surveillance, content-based search and healthcare. With different capturing methods, human motion can be recorded in various forms (e.g., skeletal data, video and images). Compared to 2D video and images, skeletal data recorded by motion capture devices contain full 3D movement information. We first look into a gait motion analysis problem based on 3D skeletal data, proposing an automatic framework for identifying musculoskeletal and neurological disorders among older people. In this framework, a feature selection strategy and two new gait features are proposed to choose an optimal feature set from the input features to optimise classification accuracy. Due to self-occlusion caused by a single shooting angle, 2D video and images cannot record full 3D geometric information, so viewpoint variation dramatically affects the performance of many 2D-based applications (e.g., arbitrary-view action recognition and image-based 3D human shape reconstruction). Leveraging view-invariance from 3D models is a popular idea for improving performance on 2D computer vision problems. Therefore, in the second contribution, we adopt 3D models built with computer graphics technology to assist in solving the problem of arbitrary-view action recognition. As a solution, a new transfer dictionary learning framework that utilises computer graphics technologies to synthesise realistic 2D and 3D training videos is proposed, which can project a real-world 2D video into a view-invariant sparse representation. In the third contribution, 3D models are utilised to build an end-to-end 3D human shape reconstruction system, which can recover the 3D human shape from a single image without any prior parametric model. In contrast to most existing methods that calculate 3D joint locations, the method proposed in this thesis produces a richer and more useful point-cloud-based representation. Synthesised high-quality 2D images and dense 3D point clouds are used to train a CNN-based encoder and 3D regression module. In summary, the methods introduced in this thesis explore human motion understanding from 3D to 2D, investigating how to compensate for the lack of full geometric information in 2D-based applications with view-invariance learnt from 3D models.
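
    The second contribution's view-invariant sparse representation can be sketched compactly: learn a dictionary on features from synthesised multi-view videos, then sparse-code a real-world video's feature against it. This simplifies the thesis's transfer dictionary learning to a single shared dictionary; the feature source, dictionary size, and names such as `synth_feats` are illustrative assumptions.

```python
# Hedged sketch of projecting a 2D video feature into a sparse code over
# a dictionary learnt from synthesised multi-view training features.
# Dictionary size and the variable synth_feats are illustrative assumptions.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

# synth_feats: (n_samples, n_dims) features from synthesised training videos.
dict_learner = DictionaryLearning(n_components=256,
                                  transform_algorithm="lasso_lars")
dict_learner.fit(synth_feats)

def view_invariant_code(video_feat):
    """Sparse-code one real-world video feature against the dictionary."""
    return sparse_encode(video_feat.reshape(1, -1),
                         dict_learner.components_,
                         algorithm="lasso_lars")
```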