Geometry-Aware Neighborhood Search for Learning Local Models for Image Reconstruction
Local learning of sparse image models has proven to be very effective for
solving inverse problems in many computer vision applications. To learn such
models, the data samples are often clustered using the K-means algorithm with
the Euclidean distance as a dissimilarity metric. However, the Euclidean
distance may not always be a good dissimilarity measure for comparing data
samples lying on a manifold. In this paper, we propose two algorithms for
determining a local subset of training samples from which a good local model
can be computed for reconstructing a given input test sample, where we take
into account the underlying geometry of the data. The first algorithm, called
Adaptive Geometry-driven Nearest Neighbor search (AGNN), is an adaptive scheme
which can be seen as an out-of-sample extension of the replicator graph
clustering method for local model learning. The second method, called
Geometry-driven Overlapping Clusters (GOC), is a less complex nonadaptive
alternative for training subset selection. The proposed AGNN and GOC methods
are evaluated in image super-resolution, deblurring and denoising applications
and shown to outperform spectral clustering, soft clustering, and geodesic
distance based subset selection in most settings.
Comment: 15 pages, 10 figures and 5 tables
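The manifold-aware neighborhood idea above can be contrasted with plain Euclidean K-means by selecting a local training subset via graph geodesic distances. The sketch below is a hedged illustration of that general principle, not the authors' AGNN or GOC algorithms; the function name and parameters are hypothetical:

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from scipy.spatial.distance import cdist

def geodesic_knn(train, query, k_graph=5, k_subset=10):
    """Select a local training subset for `query` using graph geodesic
    distances (shortest paths on a kNN graph) instead of raw Euclidean
    distance, which better respects data lying on a manifold."""
    pts = np.vstack([train, query[None, :]])
    d = cdist(pts, pts)                        # pairwise Euclidean distances
    # keep only each point's k_graph nearest edges -> sparse kNN graph
    adj = np.full_like(d, np.inf)
    for i, row in enumerate(d):
        nn = np.argsort(row)[1:k_graph + 1]    # skip self at position 0
        adj[i, nn] = row[nn]
    adj[np.isinf(adj)] = 0                     # 0 = "no edge" for csgraph
    geo = dijkstra(adj, directed=False, indices=len(pts) - 1)
    return np.argsort(geo[:-1])[:k_subset]     # closest training samples

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 2))
subset = geodesic_knn(train, np.zeros(2))
print(subset.shape)  # (10,)
```

A local model (e.g. a dictionary) would then be learned from `train[subset]` rather than from a global K-means cluster.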
Generative Models as Distributions of Functions
Generative models are typically trained on grid-like data such as images. As
a result, the size of these models usually scales directly with the underlying
grid resolution. In this paper, we abandon discretized grids and instead
parameterize individual data points by continuous functions. We then build
generative models by learning distributions over such functions. By treating
data points as functions, we can abstract away from the specific type of data
we train on and construct models that are agnostic to discretization. To train
our model, we use an adversarial approach with a discriminator that acts on
continuous signals. Through experiments on a wide variety of data modalities
including images, 3D shapes and climate data, we demonstrate that our model can
learn rich distributions of functions independently of data type and
resolution.
Comment: Added experiments for learning distributions of functions on
manifolds. Added more 3D experiments and comparisons to baselines.
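The idea of parameterizing a data point as a continuous function, so that it can be sampled at any grid resolution, can be sketched with a coordinate-based network. This is a minimal, hedged illustration with an untrained toy MLP, not the paper's model or training procedure:

```python
import numpy as np

# A data point as a function: coordinates in, signal values out.
# A tiny random MLP stands in for a learned implicit representation;
# the key property is that it can be queried at any resolution.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 3)), np.zeros(3)

def f(coords):                       # coords: (N, 2) in [-1, 1]^2
    h = np.tanh(coords @ W1 + b1)
    return np.tanh(h @ W2 + b2)      # (N, 3) "RGB" values

def sample_grid(res):
    """Evaluate the same function on a res x res grid."""
    xs = np.linspace(-1, 1, res)
    xy = np.stack(np.meshgrid(xs, xs), -1).reshape(-1, 2)
    return f(xy).reshape(res, res, 3)

print(sample_grid(16).shape, sample_grid(64).shape)  # (16, 16, 3) (64, 64, 3)
```

A generative model in this framework learns a distribution over such function parameters, so the discriminator acts on coordinate/value pairs rather than on a fixed grid.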
Generalized Sparse Convolutional Neural Networks for Semantic Segmentation of Point Clouds Derived from Tri-Stereo Satellite Imagery
We studied the applicability of point clouds derived from tri-stereo satellite imagery for
semantic segmentation with generalized sparse convolutional neural networks, using an
Austrian study area as an example. We examined, in particular, whether the distorted geometric information, in addition
to color, influences the performance of segmenting clutter, roads, buildings, trees, and vehicles. In this
regard, we trained a fully convolutional neural network that uses generalized sparse convolution
once solely on 3D geometric information (i.e., a 3D point cloud derived by dense image matching),
and twice on 3D geometric as well as color information. In the first experiment, we did not use
class weights, whereas in the second we did. We compared the results with a fully convolutional
neural network that was trained on a 2D orthophoto, and a decision tree that was once trained on
hand-crafted 3D geometric features, and once trained on hand-crafted 3D geometric as well as color
features. The decision tree using hand-crafted features has been successfully applied to aerial laser
scanning data in the literature. Hence, we compared our main interest of study, a representation
learning technique, with another representation learning technique, and a non-representation learning
technique. Our study area is located in Waldviertel, a region in Lower Austria. The territory is
a hilly region covered mainly by forests, agriculture, and grasslands. Our classes of interest are heavily
unbalanced. However, we did not use any data augmentation techniques to counter overfitting. For our
study area, we reported that geometric and color information only improves the performance of the
Generalized Sparse Convolutional Neural Network (GSCNN) on the dominant class, which leads to a
higher overall performance in our case. We also found that training the network with median class
weighting partially reverts the effects of adding color. The network also started to learn the classes
with lower occurrences. The fully convolutional neural network that was trained on the 2D orthophoto
generally outperforms the other two with a kappa score of over 90% and an average per class accuracy
of 61%. However, the decision tree trained on colors and hand-crafted geometric features has a 2%
higher accuracy for roads.
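The median class weighting mentioned above is commonly computed as median-frequency balancing. The following is a hedged sketch of that standard scheme, not the authors' exact implementation; the toy label counts are invented:

```python
import numpy as np

def median_frequency_weights(labels, n_classes):
    """Median-frequency class weights: w_c = median(freq) / freq_c,
    so rare classes get weights > 1 and the dominant class < 1."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    freq = counts / counts.sum()
    return np.median(freq[freq > 0]) / np.maximum(freq, 1e-12)

# heavily unbalanced toy labels, echoing the class imbalance in the study
labels = np.array([0] * 900 + [1] * 80 + [2] * 20)
w = median_frequency_weights(labels, 3)
print(np.round(w, 2))  # dominant class 0 gets the smallest weight
```

Scaling the per-class loss by these weights pushes the network to also learn the classes with lower occurrences, as reported in the second experiment.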
Learning based Deep Disentangling Light Field Reconstruction and Disparity Estimation Application
Light field cameras have a wide range of uses due to their ability to
simultaneously record light intensity and direction. The angular resolution of
light fields is important for downstream tasks such as depth estimation, yet is
often difficult to improve due to hardware limitations. Conventional methods
tend to perform poorly against the challenge of large disparity in sparse light
fields, while general CNNs have difficulty extracting spatial and angular
features coupled together in 4D light fields. The light field disentangling
mechanism transforms the 4D light field into a 2D image format that is more
favorable for CNN-based feature extraction. In this paper, we propose a Deep
Disentangling Mechanism, which inherits the principle of the light field
disentangling mechanism and further develops the design of the feature
extractor and adds a more advanced network structure. We design a light-field
reconstruction network (i.e., DDASR) on the basis of the Deep Disentangling
Mechanism, and achieve SOTA performance in the experiments. In addition, we
design a Block Traversal Angular Super-Resolution Strategy for the practical
application of depth estimation enhancement, where the number of input views
is often larger than 2x2 and the resulting memory usage is high; the strategy
reduces memory usage while achieving better reconstruction performance.
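The disentangling step, rearranging a 4D light field into 2D layouts that a CNN can process, can be sketched with array reshapes. This is a hedged illustration of the general mechanism, not the paper's DDASR architecture; the axis names and sizes are illustrative:

```python
import numpy as np

# 4D light field L(u, v, h, w): angular axes (u, v), spatial axes (h, w).
U, V, H, W = 3, 3, 4, 5
lf = np.arange(U * V * H * W).reshape(U, V, H, W)

# Spatial ("sub-aperture") 2D layout: tile the U*V views side by side.
spatial = lf.transpose(0, 2, 1, 3).reshape(U * H, V * W)

# Angular ("macro-pixel") 2D layout: a U*V block at each spatial position.
angular = lf.transpose(2, 0, 3, 1).reshape(H * U, W * V)

print(spatial.shape, angular.shape)  # (12, 15) (12, 15)
```

Spatial and angular features coupled in the 4D volume become separable 2D structures in these layouts, which is what makes standard 2D convolutions effective on them.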
Algorithms for super-resolution of images based on Sparse Representation and Manifolds
Image super-resolution is defined as a class of techniques that enhance the spatial resolution of images. Super-resolution methods can be subdivided into single-image and multi-image methods. This thesis focuses on developing algorithms based on mathematical theories for single-image super-resolution problems. Indeed, in order to estimate an output image, we adopt a mixed approach: i.e., we use both a dictionary of patches with sparsity constraints (typical of learning-based methods) and regularization terms (typical of reconstruction-based methods). Although the existing methods already perform well, they do not take into account the geometry of the data to: regularize the solution, cluster data samples (samples are often clustered using algorithms with the Euclidean distance as a dissimilarity metric), or learn dictionaries (they are often learned using PCA or K-SVD). Thus, state-of-the-art methods still suffer from shortcomings. In this work, we propose three new methods to overcome these deficiencies. First, we developed SE-ASDS (a structure-tensor-based regularization term) in order to improve the sharpness of edges. SE-ASDS achieves much better results than many state-of-the-art algorithms. Then, we proposed the AGNN and GOC algorithms for determining a local subset of training samples from which a good local model can be computed for reconstructing a given input test sample, taking into account the underlying geometry of the data. The AGNN and GOC methods outperform spectral clustering, soft clustering, and geodesic distance based subset selection in most settings. Next, we proposed the aSOB strategy, which takes into account the geometry of the data and the dictionary size. The aSOB strategy outperforms both PCA and PGA methods. Finally, we combine all our methods in a unique algorithm, named G2SR.
Our proposed G2SR algorithm shows better visual and quantitative results (in terms of PSNR, SSIM, and FSIM) when compared to state-of-the-art methods.
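The structure-tensor idea behind SE-ASDS, measuring local edge orientation strength to regularize toward sharp edges, can be sketched as follows. This is a hedged, pointwise illustration of the generic structure-tensor coherence measure, not the thesis's SE-ASDS term; real implementations also smooth the tensor entries over a window:

```python
import numpy as np

def coherence(img, eps=1e-8):
    """Per-pixel structure-tensor coherence: ~1 on a clean oriented edge,
    ~0 in flat regions. Derived from the 2x2 tensor [[Jxx, Jxy], [Jxy, Jyy]]."""
    gy, gx = np.gradient(img.astype(float))
    Jxx, Jxy, Jyy = gx * gx, gx * gy, gy * gy
    tr = Jxx + Jyy
    det = Jxx * Jyy - Jxy ** 2
    disc = np.sqrt(np.maximum(tr ** 2 - 4 * det, 0))
    l1, l2 = (tr + disc) / 2, (tr - disc) / 2     # tensor eigenvalues
    return (l1 - l2) / (l1 + l2 + eps)

img = np.zeros((8, 8)); img[:, 4:] = 1.0          # a vertical step edge
coh = coherence(img)
print(coh[4, 4] > 0.9, coh[4, 0] < 0.1)  # True True
```

A regularization term weighted by such an edge measure penalizes smoothing across strong edges, which is the intuition behind improving edge sharpness in the reconstruction.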
Learning from the Artist: Theory and Practice of Example-Based Character Deformation
Movie and game production is very laborious, frequently involving hundreds of person-years for a single project. At present this work is difficult to fully automate, since it involves subjective and artistic judgments.
Broadly speaking, in this thesis we explore an approach that works with the artist, accelerating their work without attempting to replace them. More specifically, we describe an “example-based” approach, in which artists provide examples of the desired shapes of the character, and the results gradually improve as more examples are given. Since a character’s skin shape deforms as the pose or expression changes, our particular problem will be termed character deformation.
The overall goal of this thesis is to contribute a complete investigation and development of an example-based approach to character deformation. A central observation guiding this research is that character animation can be formulated as a high-dimensional problem, rather than the two- or three-dimensional viewpoint that is commonly adopted in computer graphics. A second observation guiding our inquiry is that statistical learning concepts are relevant. We show that example-based character animation algorithms can be informed, developed, and improved using these observations.
This thesis provides definitive surveys of example-based facial and body skin deformation.
This thesis analyzes the two leading families of example-based character deformation algorithms from the point of view of statistical regression. In doing so we show that a wide variety of existing tools in machine learning are applicable to our problem. We also identify several techniques that are not suitable due to the nature of the training data, and the high-dimensional nature of this regression problem. We evaluate the design decisions underlying these example-based algorithms, thus providing the groundwork for a “best practice” choice of specific algorithms.
This thesis develops several new algorithms for accelerating example-based facial animation. The first algorithm allows unspecified degrees of freedom to be automatically determined based on the style of previous, completed animations. A second algorithm allows rapid editing and control of the process of transferring motion capture of a human actor to a computer graphics character.
The thesis identifies and develops several unpublished relations between the underlying mathematical techniques.
Lastly, the thesis provides novel tutorial derivations of several mathematical concepts, using only the linear algebra tools that are likely to be familiar to experts in computer graphics.
Portions of the research in this thesis have been published in eight papers, with two appearing in premier forums in the field.
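The framing of example-based deformation as high-dimensional regression can be sketched with radial basis function interpolation from pose vectors to per-vertex offsets. This is a hedged toy sketch of the general regression viewpoint, not an algorithm from the thesis; all names and sizes are illustrative:

```python
import numpy as np

def fit_rbf(poses, shapes, gamma=1.0):
    """Fit Gaussian-RBF weights mapping pose vectors to example shape
    offsets, i.e. skin deformation as high-dimensional regression."""
    K = np.exp(-gamma * np.square(poses[:, None] - poses[None, :]).sum(-1))
    return np.linalg.solve(K + 1e-8 * np.eye(len(poses)), shapes)

def eval_rbf(poses, weights, query, gamma=1.0):
    k = np.exp(-gamma * np.square(poses - query).sum(-1))
    return k @ weights

# toy setup: 3 example poses (2 dof) -> offsets for 4 vertices * 3 coords
poses = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
shapes = np.random.default_rng(0).normal(size=(3, 12))
w = fit_rbf(poses, shapes)
# interpolation property: at an example pose we recover its example shape
print(np.allclose(eval_rbf(poses, w, poses[0]), shapes[0], atol=1e-4))
```

The interpolation property is what makes the approach attractive to artists: each provided example is reproduced exactly, and the results improve gradually as more examples are added.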
Blur aware metric depth estimation with multi-focus plenoptic cameras
While a traditional camera only captures one point of view of a scene, a
plenoptic or light-field camera, is able to capture spatial and angular
information in a single snapshot, enabling depth estimation from a single
acquisition. In this paper, we present a new metric depth estimation algorithm
using only raw images from a multi-focus plenoptic camera. The proposed
approach is especially suited for the multi-focus configuration where several
micro-lenses with different focal lengths are used. The main goal of our blur
aware depth estimation (BLADE) approach is to improve disparity estimation for
defocus stereo images by integrating both correspondence and defocus cues. We
thus leverage blur information where it was previously considered a drawback.
We explicitly derive an inverse projection model including the defocus blur
providing depth estimates up to a scale factor. A method to calibrate the
inverse model is then proposed. We thus take into account depth scaling to
achieve precise and accurate metric depth estimates. Our results show that
introducing defocus cues improves the depth estimation. We demonstrate the
effectiveness of our framework and depth scaling calibration on relative depth
estimation setups and on real-world 3D complex scenes with ground truth
acquired with a 3D lidar scanner.
Comment: 21 pages, 12 figures, 3 tables
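The way defocus blur carries a depth cue can be sketched with the thin-lens model: blur grows as an object departs from the focus distance, so measured blur constrains depth. This is a hedged, simplified sketch of the standard thin-lens relation, not the paper's BLADE inverse projection model; the numeric parameters are invented:

```python
def blur_radius(depth, focal=0.05, aperture=0.01, focus_dist=2.0):
    """Thin-lens defocus blur radius (sensor-plane metres) vs. object depth.
    Objects at `focus_dist` are sharp; blur grows away from it."""
    # image distances from the thin-lens equation 1/f = 1/z + 1/z'
    img_focus = 1.0 / (1.0 / focal - 1.0 / focus_dist)
    img_obj = 1.0 / (1.0 / focal - 1.0 / depth)
    return 0.5 * aperture * abs(img_obj - img_focus) / img_obj

print(blur_radius(2.0), blur_radius(1.0) > 0)  # 0.0 True
```

Inverting this relation (per micro-lens focal length, in the multi-focus case) is what lets blur complement the correspondence cue instead of degrading it, and calibration fixes the remaining scale factor for metric depth.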
Context-Patch Face Hallucination Based on Thresholding Locality-Constrained Representation and Reproducing Learning
Face hallucination is a technique that reconstructs high-resolution (HR) faces from low-resolution (LR) faces by using prior knowledge learned from HR/LR face pairs. Most state-of-the-art methods leverage position-patch prior knowledge of the human face to estimate the optimal representation coefficients for each image patch. However, they focus only on the position information and usually ignore the context information of the image patch. In addition, when they are confronted with misalignment or the Small Sample Size (SSS) problem, the hallucination performance is very poor. To this end, this study incorporates the contextual information of the image patch and proposes a powerful and efficient context-patch based face hallucination approach, namely Thresholding Locality-constrained Representation and Reproducing learning (TLcR-RL). Under the context-patch based framework, we advance a thresholding-based representation method to enhance the reconstruction accuracy and reduce the computational complexity. To further improve the performance of the proposed algorithm, we propose a promotion strategy called reproducing learning: by adding the estimated HR face to the training set, we simulate the case in which the HR version of the input LR face is present in the training set, thus iteratively enhancing the final hallucination result. Experiments demonstrate that the proposed TLcR-RL method achieves a substantial improvement in the hallucinated results, both subjectively and objectively. Additionally, the proposed framework is more robust to face misalignment and the SSS problem, and its hallucinated HR faces remain very good when the LR test face comes from the real world. The MATLAB source code is available at https://github.com/junjun-jiang/TLcR-RL
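The thresholding-plus-locality step can be sketched as follows: keep only the few training patches nearest the input (the threshold), solve a small least-squares problem over that subset, and apply the coefficients to the paired HR patches. This is a hedged toy sketch of the general locality-constrained patch representation, not the released TLcR-RL code; all names and sizes are illustrative:

```python
import numpy as np

def patch_hallucinate(lr_patch, lr_dict, hr_dict, k=5):
    """Reconstruct an HR patch from the k nearest LR training patches."""
    d = np.linalg.norm(lr_dict - lr_patch, axis=1)
    idx = np.argsort(d)[:k]                        # locality thresholding
    w, *_ = np.linalg.lstsq(lr_dict[idx].T, lr_patch, rcond=None)
    w /= w.sum()                                   # sum-to-one constraint
    return hr_dict[idx].T @ w                      # coefficients -> HR space

rng = np.random.default_rng(0)
lr_dict = rng.normal(size=(50, 9))    # 50 LR training patches (3x3)
hr_dict = rng.normal(size=(50, 36))   # paired HR patches (6x6)
hr = patch_hallucinate(lr_dict[0], lr_dict, hr_dict)
print(hr.shape)  # (36,)
```

Restricting the solve to k patches is what keeps the representation both local and cheap; reproducing learning would then append the assembled HR estimate to the training set and iterate.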