126 research outputs found
Transport-Based Neural Style Transfer for Smoke Simulations
Artistically controlling fluids has always been a challenging task.
Optimization techniques rely on approximating simulation states towards target
velocity or density field configurations, which are often handcrafted by
artists to indirectly control smoke dynamics. Patch synthesis techniques
transfer image textures or simulation features to a target flow field. However,
these are either limited to adding structural patterns or augmenting coarse
flows with turbulent structures, and hence cannot capture the full spectrum of
different styles and semantically complex structures. In this paper, we propose
the first Transport-based Neural Style Transfer (TNST) algorithm for volumetric
smoke data. Our method is able to transfer features from natural images to
smoke simulations, enabling general content-aware manipulations ranging from
simple patterns to intricate motifs. The proposed algorithm is physically
inspired, since it computes the density transport from a source input smoke to
a desired target configuration. Our transport-based approach allows direct
control over the divergence of the stylization velocity field by optimizing
incompressible and irrotational potentials that transport smoke towards
stylization. Temporal consistency is ensured by transporting and aligning
subsequent stylized velocities, and 3D reconstructions are computed by
seamlessly merging stylizations from different camera viewpoints.Comment: ACM Transaction on Graphics (SIGGRAPH ASIA 2019), additional
materials: http://www.byungsoo.me/project/neural-flow-styl
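The abstract's key idea, optimizing incompressible and irrotational potentials that transport smoke, rests on the fact that a velocity field derived as the perpendicular gradient of a scalar stream function is divergence-free by construction. The snippet below is a minimal 2D sketch of that incompressible half only (a hypothetical NumPy illustration, not the paper's implementation):

```python
import numpy as np

# Incompressible 2D velocity from a scalar stream function psi:
# u = d(psi)/dy, v = -d(psi)/dx, so div(u, v) = 0 analytically.
n = 64
h = 1.0 / n
x, y = np.meshgrid(np.arange(n) * h, np.arange(n) * h, indexing="ij")
psi = np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)  # arbitrary potential

u = np.gradient(psi, h, axis=1)    # d(psi)/dy
v = -np.gradient(psi, h, axis=0)   # -d(psi)/dx

# Discrete divergence; the mixed finite differences commute, so this
# vanishes to machine precision everywhere on the grid.
div = np.gradient(u, h, axis=0) + np.gradient(v, h, axis=1)
print(np.abs(div).max())
```

Controlling the irrotational (curl-free) part separately is what gives the method direct control over how much the stylization velocity creates or destroys density.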
Context-driven Object Detection and Segmentation with Auxiliary Information
One fundamental problem in computer vision and robotics is to
localize objects of interest in an image. The task can be formulated
either as an object detection problem, if the objects are described by
a set of pose parameters, or as an object segmentation problem, if we
recover object boundaries precisely. A key issue in
object detection and segmentation concerns exploiting the spatial
context, as local evidence is often insufficient to determine
object pose in the presence of heavy occlusions or large object
appearance variations. This thesis addresses the object detection
and segmentation problem in such adverse conditions with
auxiliary depth data provided by RGBD cameras. We focus on four
main issues in context-aware object detection and segmentation:
1) what are the effective context representations? 2) how can we
work with limited and imperfect depth data? 3) how to design
depth-aware features and integrate depth cues into conventional
visual inference tasks? 4) how to make use of unlabeled data to
relax the labeling requirements for training data?
We discuss three object detection and segmentation scenarios
based on varying amounts of available auxiliary information. In
the first case, depth data are available for model training but
not available for testing. We propose a structured Hough voting
method for detecting objects with heavy occlusion in indoor
environments, in which we extend the Hough hypothesis space to
include both the object's location and its visibility pattern.
We design a new score function that accumulates votes for object
detection and occlusion prediction. In addition, we explore the
correlation between objects and their environment, building a
depth-encoded object-context model based on RGBD data. In the
second case, we address the problem of localizing glass objects
with noisy and incomplete depth data. Our method integrates the
intensity and depth information from a single viewpoint, and
builds a Markov Random Field that predicts glass boundary and
region jointly. In addition, we propose a nonparametric,
data-driven label transfer scheme for local glass boundary
estimation. A weighted voting scheme based on a joint feature
manifold is adopted to integrate depth and appearance cues, and
we learn a distance metric on the depth-encoded feature manifold.
In the third case, we make use of unlabeled data to relax the
annotation requirements for object detection and segmentation,
and propose a novel data-dependent margin distribution learning
criterion for boosting, which utilizes the intrinsic geometric
structure of datasets. One key aspect of this method is that it
can seamlessly incorporate unlabeled data by including a graph
Laplacian regularizer. We demonstrate the performance of our
models and compare with baseline methods on several real-world
object detection and segmentation tasks, including indoor object
detection, glass object segmentation and foreground segmentation
in video.
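The graph Laplacian regularizer used in the third scenario can be illustrated with a small, hypothetical sketch (invented data, not the thesis code): on an affinity graph over labeled and unlabeled points, the quadratic form f'Lf penalizes label functions that vary across strongly connected points, so a labeling that respects the data's cluster structure incurs a much smaller penalty than one that ignores it:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),    # cluster A near (0, 0)
               rng.normal(3, 0.3, (20, 2))])   # cluster B near (3, 3)

# Gaussian affinity graph over all pairs, then L = D - W.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 0.5)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(1)) - W

f_smooth = np.array([0.0] * 20 + [1.0] * 20)     # respects the clusters
f_rough = rng.integers(0, 2, 40).astype(float)   # ignores the structure

# f'Lf equals 0.5 * sum_ij W_ij (f_i - f_j)^2: smoothness on the graph.
print(f_smooth @ L @ f_smooth, f_rough @ L @ f_rough)
```

Adding this quadratic term to a supervised loss is what lets unlabeled points influence the learned classifier: they shape W and L even though they contribute no label.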
A multidisciplinary approach to the study of shape and motion processing and representation in rats
During my PhD I investigated how shape and motion information are processed by the rat visual system, so as to establish how advanced is the representation of higher-order visual information in this species and, ultimately, to understand to what extent rats can present a valuable alternative to monkeys, as experimental models, in vision studies. Specifically, in my thesis work, I have investigated:
1) The possible visual strategies underlying shape recognition.
2) The ability of rat visual cortical areas to represent motion and shape information.
My work contemplated two different, but complementary experimental approaches:
psychophysical measurements of the rat's recognition ability and strategy, and in vivo extracellular recordings in anaesthetized animals passively exposed to various (static and moving) visual stimuli.
The first approach implied training the rats on an invariant object recognition task, i.e. to tolerate different ranges of transformations in the object's appearance, and the application of an image classification technique known as Bubbles to reveal the visual strategy the animals were able, under different conditions of stimulus discriminability, to adopt in order to perform the task.
The second approach involved electrophysiological exploration of different visual areas in the rat's cortex, in order to investigate putative functional hierarchies (or streams of processing) in the computation of motion and shape information. Results show, on one hand, that rats are able, under conditions of high stimulus discriminability, to adopt a shape-based, view-invariant, multi-featural recognition strategy; on the other hand, the functional properties of neurons recorded from different visual areas suggest the presence of a putative shape-based, ventral-like stream of processing in the rat's visual cortex.
The general purpose of my work has been to unveil the neural mechanisms that make object recognition happen, with the goal of eventually 1) relating my findings on rats to those on more visually advanced species, such as human and non-human primates; and 2) collecting enough biological data to support the artificial simulation of visual recognition processes, which still presents an important scientific challenge.
Multiperspective mosaics and layered representation for scene visualization
This thesis documents the efforts made to implement multiperspective mosaicking for the purpose of mosaicking undervehicle and roadside sequences. For the undervehicle sequences, it is desired to create a large, high-resolution mosaic that may be used to quickly inspect the entire scene shot by a camera making a single pass underneath the vehicle. Several constraints are placed on the video data in order to facilitate the assumption that the entire scene in the sequence exists on a single plane. Therefore, a single mosaic is used to represent a single video sequence. Phase correlation is used to perform motion analysis in this case. For roadside video sequences, it is assumed that the scene is composed of several planar layers, as opposed to a single plane. Layer extraction techniques are implemented in order to perform this decomposition. Instead of using phase correlation to perform motion analysis, the Lucas-Kanade motion tracking algorithm is used in order to create dense motion maps. Using these motion maps, spatial support for each layer is determined based on a pre-initialized layer model. By separating the pixels in the scene into motion-specific layers, it is possible to sample each element in the scene correctly while performing multiperspective mosaicking. It is also possible to fill in many gaps in the mosaics caused by occlusions, hence creating more complete representations of the objects of interest. The results are several mosaics, with each mosaic representing a single planar layer of the scene.
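Phase correlation, the motion-analysis step used for the undervehicle sequences, can be sketched in a few lines: a pure translation between two frames appears as a sharp peak in the inverse FFT of the normalized cross-power spectrum, at the location of the shift. The function below is a simplified illustration of the principle, not the thesis implementation:

```python
import numpy as np

def phase_correlate(a, b):
    """Estimate the integer (dy, dx) translation taking frame b to frame a."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12           # keep only the phase
    peak = np.abs(np.fft.ifft2(cross))       # delta-like peak at the shift
    dy, dx = np.unravel_index(peak.argmax(), peak.shape)
    # Wrap shifts larger than half the frame back to negative values.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)

rng = np.random.default_rng(1)
frame = rng.random((64, 64))
shifted = np.roll(frame, (5, -3), axis=(0, 1))
print(phase_correlate(shifted, frame))       # -> (5, -3)
```

Because only phase is kept, the estimate is robust to global illumination changes, which is one reason the technique suits the single-plane undervehicle setting; the layered roadside scenes need the dense per-pixel motion that Lucas-Kanade provides instead.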
A topological solution to object segmentation and tracking
The world is composed of objects, the ground, and the sky. Visual perception
of objects requires solving two fundamental challenges: segmenting visual input
into discrete units, and tracking identities of these units despite appearance
changes due to object deformation, changing perspective, and dynamic occlusion.
Current computer vision methods for segmentation and tracking that approach
human performance all require learning, raising the question: can objects be
segmented and tracked without learning? Here, we show that the mathematical
structure of light rays reflected from environment surfaces yields a natural
representation of persistent surfaces, and this surface representation provides
a solution to both the segmentation and tracking problems. We describe how to
generate this surface representation from continuous visual input, and
demonstrate that our approach can segment and invariantly track objects in
cluttered synthetic video despite severe appearance changes, without requiring
learning.
Comment: 21 pages, 6 main figures, 3 supplemental figures, and supplementary material containing mathematical proof
Facial Expression Analysis under Partial Occlusion: A Survey
Automatic machine-based Facial Expression Analysis (FEA) has made substantial
progress in the past few decades driven by its importance for applications in
psychology, security, health, entertainment and human computer interaction. The
vast majority of completed FEA studies are based on non-occluded faces
collected in a controlled laboratory environment. Automatic expression
recognition tolerant to partial occlusion remains less understood, particularly
in real-world scenarios. In recent years, efforts to handle partial
occlusion in FEA have increased, and the time is right for a comprehensive
review of these developments and the state of the art. This survey provides such a review of
recent advances in dataset creation, algorithm development, and investigations
of the effects of occlusion critical for robust performance in FEA systems. It
outlines existing challenges in overcoming partial occlusion and discusses
possible opportunities in advancing the technology. To the best of our
knowledge, it is the first FEA survey dedicated to occlusion and aimed at
promoting better informed and benchmarked future work.
Comment: Authors' pre-print of the article accepted for publication in ACM Computing Surveys (accepted on 02-Nov-2017).
Estimating Correspondences of Deformable Objects “In-the-wild”
During the past few years we have witnessed the development of many methodologies for building and fitting Statistical Deformable Models (SDMs). The construction of accurate SDMs requires careful annotation of images with regard to a consistent set of landmarks. However, the manual annotation of a large number of images is a tedious, laborious and expensive procedure. Furthermore, for several deformable objects, e.g. the human body, it is difficult to define a consistent set of landmarks, and, thus, it becomes impossible to train humans to accurately annotate a collection of images. Nevertheless, for the majority of objects, it is possible to extract the shape by object segmentation or even by shape drawing. In this paper, we show for the first time, to the best of our knowledge, that it is possible to construct SDMs by putting object shapes in dense correspondence. Such SDMs can be built with much less effort for a large battery of objects. Additionally, we show that, by sampling the dense model, a part-based SDM can be learned with its parts being in correspondence. We employ our framework to develop SDMs of human arms and legs, which can be used for the segmentation of the outline of the human body, as well as to provide better and more consistent annotations for body joints.
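Building an SDM from corresponding shapes typically starts by removing similarity differences between them. The sketch below shows ordinary Procrustes alignment of two 2D point sets already in correspondence, a standard ingredient in SDM construction, though not necessarily the exact procedure of this paper:

```python
import numpy as np

def procrustes_align(src, dst):
    """Similarity transform (s, R, t) minimizing ||s * src @ R.T + t - dst||."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    A, B = src - mu_s, dst - mu_d              # center both shapes
    U, S, Vt = np.linalg.svd(B.T @ A)
    d = np.ones(len(S))
    d[-1] = np.sign(np.linalg.det(U @ Vt))     # guard against reflections
    R = U @ np.diag(d) @ Vt
    s = (S * d).sum() / (A ** 2).sum()
    t = mu_d - s * (R @ mu_s)
    return s, R, t

# Recover a known similarity transform of a unit square.
shape = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
theta = np.pi / 6
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
target = 2.0 * shape @ Rot.T + np.array([3., -1.])

s, R, t = procrustes_align(shape, target)
aligned = s * shape @ R.T + t
print(np.abs(aligned - target).max())
```

Once all training shapes are aligned this way, the remaining variation is pure deformation, which is what the statistical model is fitted to.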
Recognizing complex faces and gaits via novel probabilistic models
In the field of computer vision, developing automated systems to recognize people under unconstrained scenarios is a partially solved problem. In unconstrained scenarios a number of common variations and complexities such as occlusion, illumination, cluttered background and so on impose vast uncertainty on the recognition process. Among the various biometrics that have been emerging recently, this dissertation focuses on two of them, namely face and gait recognition.
Firstly we address the problem of recognizing faces with major occlusions amidst other variations such as pose, scale, expression and illumination using a novel PRObabilistic Component based Interpretation Model (PROCIM) inspired by key psychophysical principles that are closely related to reasoning under uncertainty. The model employs Bayesian Networks to establish, learn, interpret and exploit intrinsic similarity mappings from the face domain. Then, by incorporating efficient inference strategies, robust decisions are made for successfully recognizing faces under uncertainty. PROCIM reports improved recognition rates over recent approaches.
Secondly we address the newly emerging gait recognition problem and show that PROCIM can be easily adapted to the gait domain as well. We scientifically define and formulate sub-gaits and propose a novel modular training scheme to efficiently learn subtle sub-gait characteristics from the gait domain. Our results show that the proposed model is robust to several uncertainties and yields significant recognition performance. Apart from PROCIM, finally we show how simple component-based gait reasoning can be coherently modeled using the recently prominent Markov Logic Networks (MLNs) by intuitively fusing imaging, logic and graphs.
We have discovered that face and gait domains exhibit interesting similarity mappings between object entities and their components. We have proposed intuitive probabilistic methods to model these mappings to perform recognition under various uncertainty elements. Extensive experimental validation justifies the robustness of the proposed methods over the state-of-the-art techniques.
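The Bayesian-network reasoning that PROCIM builds on can be illustrated with a deliberately tiny, hypothetical example (the variables and numbers below are invented for illustration, not taken from the dissertation): a hidden binary "component visible" state V is inferred from an observed binary "match score high" evidence M by enumeration over V and Bayes' rule:

```python
# Hypothetical two-node network: prior P(V) and likelihood P(M=high | V).
p_v = 0.7                       # prior: the face component is visible
p_m_given_v = {True: 0.9,       # visible components usually match well
               False: 0.2}      # occluded components rarely match well

# Enumerate the hidden state V and normalize to get P(V=1 | M=high).
joint_v1 = p_v * p_m_given_v[True]
joint_v0 = (1 - p_v) * p_m_given_v[False]
posterior = joint_v1 / (joint_v1 + joint_v0)
print(round(posterior, 4))
```

In a full component-based model the same inversion runs over many component and occlusion nodes at once, which is where the efficient inference strategies the abstract mentions become necessary.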