Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions
3D action recognition has broad applications in human-computer interaction
and intelligent surveillance. However, recognizing similar actions remains
challenging because previous methods fail to capture motion and shape cues
effectively from noisy depth data. In this paper, we propose a novel two-layer
Bag-of-Visual-Words (BoVW) model, which suppresses the noise disturbances and
jointly encodes both motion and shape cues. First, background clutter is
removed by a background modeling method that is designed for depth data. Then,
motion and shape cues are jointly used to generate robust and distinctive
spatial-temporal interest points (STIPs): motion-based STIPs and shape-based
STIPs. In the first layer of our model, a multi-scale 3D local steering kernel
(M3DLSK) descriptor is proposed to describe local appearances of cuboids around
motion-based STIPs. In the second layer, a spatial-temporal vector (STV)
descriptor is proposed to describe the spatial-temporal distributions of
shape-based STIPs. Using the BoVW model, motion and shape cues are combined
to form a fused action representation. Our model performs
favorably compared with common STIP detection and description methods. Thorough
experiments verify that our model is effective in distinguishing similar
actions and robust to background clutter, partial occlusions, and pepper noise.
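The encoding step shared by both layers is the standard BoVW recipe: quantise local descriptors against a learned codebook, then pool the assignments into a per-clip histogram. A minimal NumPy sketch of that step, where the toy k-means codebook and raw descriptors are illustrative placeholders rather than the paper's M3DLSK or STV features:

```python
import numpy as np

def build_codebook(descriptors, k=64, iters=20, seed=0):
    """Toy k-means codebook over local descriptors (one row per descriptor)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center (Euclidean distance)
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, centers):
    """Encode one clip's descriptors as an L1-normalised codeword histogram."""
    d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

In a two-layer scheme like the one above, the fused representation would simply concatenate the motion-layer and shape-layer histograms before classification.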
Outcrop to subsurface stratigraphy of the Pennsylvanian Hermosa Group, southern Paradox Basin, U.S.A.
Pennsylvanian (Desmoinesian) sedimentary rocks within the Paradox Basin, Four Corners area of the western United States, afford a unique opportunity to study the development of sedimentary successions in a complex marine to nonmarine depositional setting. The close association of thick intervals of nonmarine fan-delta facies adjacent to, and in time-equivalent position with, marine carbonate-evaporite facies suggests complex relationships between the factors affecting deposition. Development of an effective scheme to differentiate the depositional signatures within these sedimentary successions is the primary goal of this study. To achieve this goal, two objectives were pursued. The first was to calibrate the diverse range of rock types in the Hermosa Group to in-situ wellbore measurements. To facilitate this process, a neural network evaluation procedure coupled with standard petrophysical evaluation techniques was employed to aid in facies succession prediction and lateral facies correlation. This process proved to be as accurate as standard wireline analysis procedures and accounted for variations less detectable with conventional schemes. The second objective was to correlate the stratigraphy of the Hermosa Group from outcrops of the Animas Valley to the subsurface along the southern Paradox Basin. The key to understanding the depositional sequences within the Middle Pennsylvanian section is to determine spatial and temporal relationships between the evaporites and black-shale deposits associated with carbonate algal mound buildups and juxtaposed terrigenous clastic fan-delta depositional facies. Once the relationships of these facies successions are delineated, a three-dimensional architectural framework can be manipulated to examine all possible lateral facies successions.
By utilizing these analyses, several members of the Paradox Formation were shown to be laterally equivalent to, and physically continuous with, parts of the previously undifferentiated Honaker Trail Formation of the San Juan Dome region. The study required a rigorous integration process in a digital workstation environment, combining larger and more diverse datasets than previously utilized for improved correlation control. Techniques for evaluating facies successions employed cores (42), subsurface wells (4,000+), and measured sections (12+).
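The neural-network step described above pairs wireline-log measurements with core-calibrated facies labels to predict facies successions away from cored wells. As a hedged illustration only, since the study's actual network architecture, log curves, and facies classes are not specified here, a minimal softmax log-to-facies classifier might look like:

```python
import numpy as np

def train_facies_classifier(logs, facies, n_classes, lr=0.1, epochs=500, seed=0):
    """Softmax classifier mapping wireline-log vectors to facies labels.
    logs: (n_samples, n_curves), e.g. [gamma_ray, density, neutron_porosity].
    facies: integer facies label per sample (core-calibrated)."""
    rng = np.random.default_rng(seed)
    X = np.hstack([logs, np.ones((len(logs), 1))])  # append bias column
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    onehot = np.eye(n_classes)[facies]
    for _ in range(epochs):
        z = X @ W
        p = np.exp(z - z.max(axis=1, keepdims=True))  # stable softmax
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (p - onehot) / len(X)  # cross-entropy gradient step
    return W

def predict_facies(logs, W):
    """Predict a facies label for each depth sample."""
    X = np.hstack([logs, np.ones((len(logs), 1))])
    return (X @ W).argmax(axis=1)
```

In practice such a classifier would be trained on the cored intervals and then run on the 4,000+ uncored wells to extend the facies correlations laterally.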
ModDrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on
multi-scale and multi-modal deep learning. Each visual modality captures
spatial information at a particular spatial scale (such as motion of the upper
body or a hand), and the whole system operates at three temporal scales. Key to
our technique is a training strategy which exploits: i) careful initialization
of individual modalities; and ii) gradual fusion involving random dropping of
separate channels (dubbed ModDrop) for learning cross-modality correlations
while preserving uniqueness of each modality-specific representation. We
present experiments on the ChaLearn 2014 Looking at People Challenge gesture
recognition track, in which we placed first out of 17 teams. Fusing multiple
modalities at several spatial and temporal scales leads to a significant
increase in recognition rates, allowing the model to compensate for errors of
the individual classifiers as well as noise in the separate channels.
Furthermore, the proposed ModDrop training technique ensures robustness of the
classifier to missing signals in one or several channels to produce meaningful
predictions from any number of available modalities. In addition, we
demonstrate the applicability of the proposed fusion scheme to modalities of
arbitrary nature by experiments on the same dataset augmented with audio.
Comment: 14 pages, 7 figures
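The core of the ModDrop training strategy, randomly dropping whole modality channels during fusion training so the network learns cross-modality correlations without depending on any single channel, can be sketched as follows. This is a simplified stand-in for the paper's network-level implementation; the feature vectors and drop probability here are illustrative:

```python
import numpy as np

def moddrop(modalities, drop_prob=0.5, rng=None, training=True):
    """ModDrop-style fusion input: during training, each modality's feature
    vector is independently zeroed with probability drop_prob, forcing the
    fusion layers to tolerate missing channels. At test time all modalities
    pass through unchanged, so predictions degrade gracefully when a
    channel is absent."""
    if not training:
        return np.concatenate(modalities)
    rng = rng or np.random.default_rng()
    kept = []
    for m in modalities:
        mask = 0.0 if rng.random() < drop_prob else 1.0  # drop whole channel
        kept.append(mask * m)
    return np.concatenate(kept)
```

Each training step would feed the concatenated (possibly partially zeroed) vector into the shared fusion layers, while per-modality layers are first pretrained individually, mirroring the two-stage strategy described above.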
Patient-specific modelling in orthopedics: from image to surgery
In orthopedic surgery, to decide upon intervention and how it can be optimized, surgeons usually rely on subjective analysis of medical images of the patient, obtained from computed tomography, magnetic resonance imaging, ultrasound or other techniques. Recent advancements in computational performance, image analysis and in silico modeling techniques have started to revolutionize clinical practice through the development of quantitative tools, including patient-specific models aimed at improving clinical diagnosis and surgical treatment. Anatomical and surgical landmark identification as well as feature extraction can be automated, allowing for the creation of general or patient-specific models based on statistical shape models. Preoperative virtual planning and rapid prototyping tools allow the implementation of customized surgical solutions in real clinical environments. In the present chapter we discuss the applications of some of these techniques in orthopedics and present new computer-aided tools that can take us from image analysis to customized surgical treatment.
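One building block mentioned above, the statistical shape model, is conventionally built by principal component analysis over aligned landmark sets. A minimal sketch under that assumption, using synthetic landmark coordinates rather than real patient imagery:

```python
import numpy as np

def build_shape_model(shapes, n_modes=2):
    """Statistical shape model from pre-aligned landmark sets.
    shapes: (n_subjects, n_landmarks * dim) flattened coordinates.
    Returns the mean shape, the leading modes of variation, and the
    per-mode standard deviations."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # principal modes of anatomical variation via SVD of centered data
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_modes], s[:n_modes] / np.sqrt(len(shapes) - 1)

def synthesize_shape(mean, modes, b):
    """Generate a plausible new shape from mode weights b."""
    return mean + b @ modes
```

Fitting such a model to landmarks extracted from a new patient's scan is what turns a general anatomical model into a patient-specific one for surgical planning.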
Learning discriminative features for human motion understanding
Human motion understanding has attracted considerable research interest for its applications in video surveillance, content-based search and healthcare. Depending on the capturing method, human motion can be recorded in various forms (e.g. skeletal data, video, or images). Compared to 2D video and images, skeletal data recorded by motion-capture devices contain full 3D movement information. We first look into a gait-analysis problem based on 3D skeletal data, proposing an automatic framework for identifying musculoskeletal and neurological disorders among older people from 3D skeletal motion data. In this framework, a feature-selection strategy and two new gait features are proposed to choose an optimal feature set from the input features and optimise classification accuracy.
Due to self-occlusion caused by a single shooting angle, 2D video and images cannot record full 3D geometric information. Viewpoint variation therefore dramatically affects the performance of many 2D-based applications (e.g. arbitrary-view action recognition and image-based 3D human shape reconstruction). Leveraging view-invariance from 3D models is a popular way to improve performance on 2D computer vision problems. In the second contribution, we therefore adopt 3D models built with computer graphics technology to address arbitrary-view action recognition. We propose a new transfer dictionary learning framework that uses computer graphics technologies to synthesise realistic 2D and 3D training videos, projecting a real-world 2D video into a view-invariant sparse representation.
In the third contribution, 3D models are utilised to build an end-to-end 3D human shape reconstruction system, which can recover the 3D human shape from a single image without any prior parametric model. In contrast to most existing methods that calculate 3D joint locations, the method proposed in this thesis can produce a richer and more useful point cloud based representation. Synthesised high-quality 2D images and dense 3D point clouds are used to train a CNN-based encoder and 3D regression module.
In summary, the methods introduced in this thesis explore human motion understanding from 3D to 2D: we investigate how to compensate for the lack of full geometric information in 2D-based applications with view-invariance learnt from 3D models.
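As one concrete instance of the feature-selection strategy mentioned in the first contribution (the thesis's actual selection criterion and gait features are not detailed here), a greedy forward-selection sketch with a toy nearest-centroid score:

```python
import numpy as np

def centroid_accuracy(Xs, y):
    """Toy score: nearest-class-centroid classification accuracy."""
    classes = np.unique(y)
    cents = np.array([Xs[y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Xs[:, None] - cents[None], axis=2)
    return (classes[d.argmin(axis=1)] == y).mean()

def greedy_feature_selection(X, y, score_fn, max_features=3):
    """Forward selection: repeatedly add the feature that most improves
    the score, stopping when no candidate improves it further."""
    remaining = list(range(X.shape[1]))
    chosen, best_score = [], -np.inf
    while remaining and len(chosen) < max_features:
        s, f = max((score_fn(X[:, chosen + [f]], y), f) for f in remaining)
        if s <= best_score:
            break  # no remaining feature improves the score
        chosen.append(f)
        remaining.remove(f)
        best_score = s
    return chosen, best_score
```

In a gait setting, the columns of `X` would be candidate gait features per subject and `y` the disorder labels; the selected subset is then what feeds the final classifier.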