Symmetric image registration with directly calculated inverse deformation field
This paper presents a novel technique for symmetric deformable image registration based on a new method for fast and accurate direct inversion of a large-motion deformation field. The proposed registration algorithm maintains a one-to-one mapping between the registered images by symmetrically warping them to each other and by enforcing the inverse consistency criterion at each iteration. This makes the final estimates of the forward and backward deformation fields anatomically plausible. The method has been quantitatively validated on magnetic resonance data of the pelvic area, demonstrating its applicability to adaptive prostate radiotherapy. The experiments demonstrate improved robustness in terms of inverse consistency error when compared to previously proposed methods for symmetric image registration.
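The inverse consistency criterion above can be illustrated with a minimal sketch: compose the forward and backward displacement fields and measure the mean residual. This is an assumed, simplified 2D implementation with nearest-neighbour sampling, not the paper's own inversion algorithm.

```python
import numpy as np

def inverse_consistency_error(fwd, bwd):
    """Mean inverse-consistency residual of a forward/backward displacement
    field pair, each of shape (2, H, W) (row/col displacements in pixels).
    Nearest-neighbour sampling keeps the sketch dependency-free."""
    H, W = fwd.shape[1:]
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    # Follow the forward field, then look up the backward field there.
    y2, x2 = ys + fwd[0], xs + fwd[1]
    yi = np.clip(np.rint(y2), 0, H - 1).astype(int)
    xi = np.clip(np.rint(x2), 0, W - 1).astype(int)
    by, bx = bwd[0][yi, xi], bwd[1][yi, xi]
    # A perfect inverse maps every point back onto itself.
    res = np.hypot(y2 + by - ys, x2 + bx - xs)
    return res.mean()
```

For a field pair that are exact inverses of each other (e.g. opposite constant translations), the error is zero; a large residual indicates the backward field does not undo the forward one.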
Structure-aware image denoising, super-resolution, and enhancement methods
Denoising, super-resolution and structure enhancement are classical image processing applications. The motive behind their existence is to aid our visual analysis of raw digital images. Despite tremendous progress in these fields, certain difficult problems are still open to research. For example, denoising and super-resolution techniques which possess all of the following properties are very scarce: they must preserve critical structures like corners, be robust to the type of noise distribution, avoid undesirable artefacts, and also be fast. The area of structure enhancement also has an unresolved issue: very little effort has been put into designing models that can tackle anisotropic deformations in the image acquisition process. In this thesis, we design novel methods in the form of partial differential equations, patch-based approaches and variational models to overcome the aforementioned obstacles. In most cases, our methods outperform the existing approaches in both quality and speed, despite being applicable to a broader range of practical situations.
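As an illustration of the structure-preserving PDE family the thesis builds on, here is a minimal sketch of classical Perona-Malik diffusion (a well-known prior method, not one of the thesis's own models); `kappa` and the explicit time step are assumed tuning parameters.

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=1.0, dt=0.2):
    """Edge-preserving PDE denoising: isotropic smoothing where the image
    is flat, suppressed diffusion across strong gradients (edges, corners).
    Explicit finite-difference scheme with periodic boundaries."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Finite differences toward the four grid neighbours.
        dn = np.roll(u, -1, 0) - u
        ds = np.roll(u, 1, 0) - u
        de = np.roll(u, -1, 1) - u
        dw = np.roll(u, 1, 1) - u
        # Diffusivity decays at strong edges, so structure survives.
        g = lambda d: np.exp(-(d / kappa) ** 2)
        u += dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```

A constant image is a fixed point of the scheme, while additive noise is progressively averaged out; dt below 0.25 keeps the explicit update stable.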
Deep Convolutional Network on Point Clouds for 3D Scene Understanding
As one of the most popular data types, the point cloud is widely used in various applications, including computer vision, computer graphics and robotics. The capability to directly measure 3D point clouds is invaluable in those applications, as depth information can remove many of the segmentation ambiguities in 2D images. Unlike images, which are represented on regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. To address this problem, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points, comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks, and the density functions through kernel density estimation. The most important contribution of this work is a novel reformulation proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in 3D space.
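The weight-and-density construction described above can be sketched roughly as follows. The tiny MLP, the all-points neighbourhood and the Gaussian density estimate are simplifying assumptions of this sketch, not the paper's efficient reformulation.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    # Tiny two-layer perceptron: relative 3D offsets -> per-channel weights.
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def pointconv(points, feats, W1, b1, W2, b2, bandwidth=0.5):
    """PointConv-style layer on a toy scale: kernel weights are an MLP of
    local coordinates, and features are reweighted by the inverse of a
    Gaussian kernel density estimate to compensate for uneven sampling."""
    n, c = feats.shape
    # Kernel density estimate at every point (includes the point itself).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2 * bandwidth**2)).mean(1)
    out = np.empty((n, c))
    for i in range(n):
        rel = points - points[i]          # local coordinates around point i
        w = mlp(rel, W1, b1, W2, b2)      # (n, c) learned kernel weights
        out[i] = ((w * feats) / density[:, None]).sum(0)
    return out
```

Because the weights depend only on relative coordinates and the sum is order-free, the operation is translation- and permutation-invariant, as the abstract states.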
The proposed PointConv has opened doors to new 3D-centric approaches to scene understanding. We show how we can adapt and apply PointConv to an important perception problem in robotics: 3D scene flow estimation. We propose a novel end-to-end deep scene flow model, called PointPWC-Net, that directly processes 3D point cloud scenes with large motions in a coarse-to-fine fashion. Flow computed at the coarse level is upsampled and warped to a finer level, enabling the algorithm to accommodate large motion without a prohibitive search space. We introduce novel cost volume, upsampling, and warping layers to efficiently handle 3D point cloud data. Unlike traditional cost volumes that require exhaustively computing all the cost values on a high-dimensional grid, our point-based formulation discretizes the cost volume onto the input 3D points, and a PointConv operation efficiently computes convolutions on the cost volume.
Finally, inspired by the recent development of Transformers, we introduce PointConvFormer, a novel building block for point-cloud-based deep neural network architectures. PointConvFormer combines ideas from point convolution, where filter weights are based only on relative position, and Transformers, where the attention computation takes the features into account. In our proposed new operation, the feature difference between points in a neighborhood serves as an indicator to re-weight the convolutional weights. Hence, we preserve some of the translation-invariance of the convolution operation while taking attention into account to choose the relevant points for convolution. We also explore multi-head mechanisms. To validate the effectiveness of PointConvFormer, we experiment on both semantic segmentation and scene flow estimation tasks on point clouds with multiple datasets, including ScanNet, SemanticKitti, FlyingThings3D and KITTI. Our results show that PointConvFormer substantially outperforms classic convolutions, regular transformers, and voxelized sparse convolution approaches with smaller, more computationally efficient networks.
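The feature-difference re-weighting idea can be reduced to a toy sketch: positional kernel weights are scaled by an attention score computed from feature differences. `Wp` and `Wa` stand in for learned parameters and are assumptions of this sketch, not the paper's architecture.

```python
import numpy as np

def pointconvformer_weights(rel_pos, feat_diff, Wp, Wa):
    """Toy PointConvFormer-style kernel: positional weights (a function of
    relative coordinates, here a single tanh layer) are modulated by a
    sigmoid attention score derived from feature differences between the
    centre point and each neighbour."""
    w_pos = np.tanh(rel_pos @ Wp)                     # (k, c) positional weights
    score = 1.0 / (1.0 + np.exp(-(feat_diff @ Wa)))   # (k, 1) attention in (0, 1)
    return w_pos * score                              # attended kernel weights
```

With zero attention parameters the score is uniformly 0.5 and the operation degenerates to a plain (scaled) point convolution, which makes explicit how the attention term selects relevant neighbours on top of the position-only kernel.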
Integration of Static and Self-motion-Based Depth Cues for Efficient Reaching and Locomotor Actions
The common approach to estimating the distance of an object in computer vision and robotics is to use stereo vision. Stereopsis, however, provides good estimates only within near space and is thus more suitable for reaching actions. In order to successfully plan and execute an action in far space, other depth cues must be taken into account. Self-body movements, such as head and eye movements or locomotion, can provide rich depth information. This paper proposes a model for the integration of static and self-motion-based depth cues for a humanoid robot. Our results show that self-motion-based visual cues improve the accuracy of distance perception and, combined with other depth cues, provide the robot with a robust distance estimator suitable for both reaching and walking actions.
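One standard way to combine independent depth cues, plausibly related to (though not necessarily identical to) the paper's integration model, is inverse-variance weighting of Gaussian estimates:

```python
import numpy as np

def fuse_depth_cues(estimates, variances):
    """Maximum-likelihood fusion of independent Gaussian depth cues
    (e.g. stereo and motion parallax): inverse-variance weighted average.
    Returns the fused distance and its (always smaller) fused variance."""
    w = 1.0 / np.asarray(variances, float)
    z = (w * np.asarray(estimates, float)).sum() / w.sum()
    var = 1.0 / w.sum()
    return z, var
```

The fused variance is never larger than that of the best single cue, which is the statistical sense in which adding self-motion cues "improves the accuracy of distance perception".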
Bootstrap Based Surface Reconstruction
Surface reconstruction is one of the main research areas in computer graphics. The goal is to find the best surface representation of the boundary of a real object. The typical input of a surface reconstruction algorithm is a point cloud, possibly obtained by a laser 3D scanner. The raw data from the scanner is usually noisy and contains outliers. Apart from creating models of high visual quality, assuring that a model is as faithful as possible to the original object is also one of the main aims of surface reconstruction.
Most surface reconstruction algorithms proposed in the literature assess the reconstructed models either by visual inspection or, in cases where subjective manual input is not possible, by measuring the training error of the model. However, the training error systematically underestimates the test error and encourages overfitting.
In this thesis, we provide a method for quantitative assessment in surface reconstruction. We adapt a model averaging method from statistics, the bootstrap, to our context. Bootstrapping is a resampling procedure that provides estimates of statistical parameters. In surface fitting, we obtain error estimates which detect errors caused by noise or poor fitting. We also define the bootstrap method in the context of normal estimation, obtaining variance and error estimates which we use as a quality measure of normal estimates. As applications, we provide a smoothing algorithm for point clouds and a normal smoothing algorithm that can handle feature areas. We also develop a feature detection algorithm.
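The bootstrap idea described above can be sketched generically: resample the point set with replacement, refit, and read off the spread of the fitted quantity. The `fit` callback here is an assumed placeholder for any surface or normal fitting routine, not the thesis's specific estimator.

```python
import numpy as np

def bootstrap_std(points, fit, n_boot=200, seed=0):
    """Bootstrap estimate of the variability of a fitted quantity: draw
    n_boot resamples of the (n, d) point array with replacement, refit
    each, and return the standard deviation across resamples. A large
    spread flags noise-sensitive or poorly fitted regions."""
    rng = np.random.default_rng(seed)
    n = len(points)
    stats = np.array([fit(points[rng.integers(0, n, n)]) for _ in range(n_boot)])
    return stats.std(axis=0)
```

Unlike the training error, this spread is a resampling-based proxy for estimator variance, so it does not reward overfitting the one observed point set.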
Log-Euclidean Bag of Words for Human Action Recognition
Representing videos by densely extracted local space-time features has recently become a popular approach for analysing actions. In this paper, we tackle the problem of categorising human actions by devising Bag of Words (BoW) models based on covariance matrices of spatio-temporal features, with the features formed from histograms of optical flow. Since covariance matrices form a special type of Riemannian manifold, the space of Symmetric Positive Definite (SPD) matrices, non-Euclidean geometry should be taken into account while discriminating between covariance matrices. To this end, we propose to embed SPD manifolds into Euclidean spaces via a diffeomorphism and extend the BoW approach to its Riemannian version. The proposed BoW approach takes into account the manifold geometry of SPD matrices during the generation of the codebook and histograms. Experiments on challenging human action datasets show that the proposed method obtains notable improvements in discrimination accuracy in comparison to several state-of-the-art methods.
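The log-Euclidean embedding referred to above (mapping SPD matrices to a flat Euclidean space via the matrix logarithm) can be sketched as follows; the sqrt(2) off-diagonal scaling is the usual Frobenius-norm-preserving vectorisation, assumed here rather than taken from the paper.

```python
import numpy as np

def log_euclidean_embed(spd):
    """Map an SPD matrix to a Euclidean vector: take the matrix logarithm
    via eigendecomposition (valid because SPD eigenvalues are positive),
    then vectorise the symmetric result as diagonal plus sqrt(2)-scaled
    upper triangle, so Euclidean distance matches the log-Euclidean metric."""
    vals, vecs = np.linalg.eigh(spd)
    log_m = (vecs * np.log(vals)) @ vecs.T      # V diag(log lambda) V^T
    iu = np.triu_indices(spd.shape[0], k=1)
    off = np.sqrt(2.0) * log_m[iu]
    return np.concatenate([np.diag(log_m), off])
```

After this embedding, standard Euclidean machinery (k-means codebooks, histograms) applies directly, which is what makes a Riemannian BoW pipeline tractable.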
Human and Group Activity Recognition from Video Sequences
A good solution to human activity recognition enables the creation of a wide variety of useful applications, such as visual surveillance, vision-based Human-Computer-Interaction (HCI) and gesture recognition.
In this thesis, a graph based approach to human activity recognition is proposed which models spatio-temporal features as contextual space-time graphs.
In this method, spatio-temporal gradient cuboids are extracted at significant regions of activity,
and feature graphs (gradient, space-time, local neighbours, immediate neighbours) are constructed using the similarity matrix.
The Laplacian representation of the graph is utilised to reduce the computational complexity and to allow the use of traditional statistical classifiers.
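The Laplacian representation mentioned above can be sketched with the standard symmetric normalised Laplacian of a similarity matrix; whether the thesis uses exactly this normalisation is an assumption of the sketch.

```python
import numpy as np

def normalized_laplacian(S):
    """Symmetric normalised Laplacian L = I - D^{-1/2} S D^{-1/2} of a
    similarity matrix S. Its eigenvalues lie in [0, 2] and give a compact,
    fixed-size graph descriptor usable with standard statistical classifiers."""
    d = S.sum(1)
    d_inv_sqrt = 1.0 / np.sqrt(np.where(d > 0, d, 1.0))  # guard isolated nodes
    return np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
```

Classifying the Laplacian spectrum instead of the raw graph is one common way to obtain the complexity reduction the abstract describes.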
A second methodology is proposed to detect and localise abnormal activities in crowded scenes.
This approach has two stages: training and testing. During the training stage, specific human activities are identified and characterised
by employing modelling of medium-term movement flow through streaklines. Each streakline is formed by multiple optical flow vectors that represent
and track locally the movement in the scene. A dictionary of activities is recorded for a given scene during the training stage. During the testing stage,
the consistency of each observed activity with those from the dictionary is verified using the Kullback-Leibler (KL) divergence.
The anomaly detection performance of the proposed methodology is compared to the state of the art, producing state-of-the-art results for localising anomalous activities.
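The dictionary-consistency test can be sketched with a plain discrete KL divergence over normalised histograms; the `threshold` value and the histogram representation of an activity are assumptions of this sketch, not values from the thesis.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """Discrete KL divergence D(p || q) between two histograms
    (e.g. streakline direction histograms); eps avoids log(0)."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float((p * np.log(p / q)).sum())

def is_anomalous(observed, dictionary, threshold):
    """Flag an observed activity whose best match in the trained activity
    dictionary still exceeds the divergence threshold."""
    return min(kl_divergence(observed, d) for d in dictionary) > threshold
```

An activity matching any dictionary entry yields a near-zero divergence and passes; one unlike every recorded activity exceeds the threshold and is flagged.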
Finally, we propose an automatic group activity recognition approach by modelling the interdependencies of
group activity features over time. We propose to model the group
interdependencies in both motion and location spaces. These spaces are extended to time-space and time-movement spaces and modelled
using Kernel Density Estimation (KDE).
The proposed methodology improves recognition performance over state-of-the-art results on group activity datasets.
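The KDE modelling of time-space and time-movement features can be sketched with a plain Gaussian kernel density estimator; the isotropic bandwidth is an assumed simplification of whatever kernel the thesis actually uses.

```python
import numpy as np

def kde_log_likelihood(train, query, bandwidth=0.5):
    """Gaussian KDE log-likelihood of query feature vectors under a model
    fit to training vectors (e.g. time-location samples of a group
    activity); higher values mean the query resembles the training data."""
    train, query = np.atleast_2d(train), np.atleast_2d(query)
    n, d = train.shape
    d2 = ((query[:, None, :] - train[None, :, :]) ** 2).sum(-1)
    norm = (2 * np.pi * bandwidth**2) ** (d / 2)
    dens = np.exp(-d2 / (2 * bandwidth**2)).sum(1) / (n * norm)
    return np.log(dens)
```

Recognition then amounts to scoring an observed group's features against the KDE of each activity class and picking the most likely one.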