Efficient Algorithms for Clustering and Interpolation of Large Spatial Data Sets
Categorizing, analyzing, and integrating large spatial data sets are of great importance in various areas such as image processing, pattern recognition, remote sensing, and life sciences. For example, NASA alone is faced with huge data sets gathered from around the globe on a daily basis to help scientists better understand our planet. Many approaches for accurately clustering, interpolating, and integrating these data sets are very computationally expensive.
The focus of my PhD thesis is on the development of efficient implementations of data clustering and interpolation methods for large spatial data sets, and the application of these methods to geostatistics and remote sensing. In particular, I have developed fast implementations of ISODATA clustering and kriging interpolation algorithms. These implementations derive their efficiency through the use of both exact and approximate computational techniques from computational geometry and scientific computing.
My work on the ISODATA clustering algorithm employs the kd-tree data structure and the filtering algorithm to speed up distance and nearest neighbor calculations. In the case of kriging interpolation, I applied techniques from scientific computing including iterative methods, tapering, fast multipole methods, and nearest neighbor searching techniques. I also present an application of the kriging interpolation method to the problem of data fusion of remotely sensed data.
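To make the role of the kd-tree concrete, here is a minimal sketch of nearest neighbor search over a small set of cluster centers, written in pure Python. It is not the filtering algorithm mentioned in the abstract and not the thesis implementation; the function names `build_kdtree` and `nearest`, and the sample coordinates, are assumptions made for illustration only.

```python
from math import dist, inf

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree as nested (point, axis, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinate axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # the median point splits the set
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, best=None, best_d=inf):
    """Return (point, distance) for the stored point closest to query."""
    if node is None:
        return best, best_d
    point, axis, left, right = node
    d = dist(point, query)
    if d < best_d:
        best, best_d = point, d
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best, best_d = nearest(near, query, best, best_d)
    if abs(diff) < best_d:                 # the far side may still hold a closer point
        best, best_d = nearest(far, query, best, best_d)
    return best, best_d

# Hypothetical cluster centers and samples (illustrative values only).
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
tree = build_kdtree(centers)
samples = [(1.0, 1.0), (6.0, 4.0), (9.0, 1.0)]
assignment = [nearest(tree, s)[0] for s in samples]
```

Each sample is assigned to its closest center without comparing it against every center, because whole subtrees are skipped whenever the splitting plane is farther away than the best match found so far.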
A visual workspace for constructing hybrid MDS algorithms and coordinating multiple views
Data can be distinguished according to volume, variable types and distribution, and each of these characteristics imposes constraints upon the choice of applicable algorithms for their visualisation. This has led to an abundance of often disparate algorithmic techniques. Previous work has shown that a hybrid algorithmic approach can be successful in addressing the impact of data volume on the feasibility of multidimensional scaling (MDS). This paper presents a system and framework in which a user can easily explore algorithms as well as their hybrid conjunctions and the data flowing through them. Visual programming and a novel algorithmic architecture let the user semi-automatically define data flows and the co-ordination of multiple views of algorithmic and visualisation components. We propose that our approach has two main benefits: significant improvements in run times of MDS algorithms can be achieved, and intermediate views of the data and the visualisation program structure can provide greater insight and control over the visualisation process.
Interpolating point spread function anisotropy
Planned wide-field weak lensing surveys are expected to reduce the
statistical errors on the shear field to unprecedented levels. In contrast,
systematic errors like those induced by the convolution with the point spread
function (PSF) will not benefit from that scaling effect and will require very
accurate modeling and correction. While numerous methods have been devised to
carry out the PSF correction itself, modeling of the PSF shape and its spatial
variations across the instrument field of view has, so far, attracted much less
attention. This step is nevertheless crucial because the PSF is only known at
star positions while the correction has to be performed at any position on the
sky. A reliable interpolation scheme is therefore mandatory and a popular
approach has been to use low-order bivariate polynomials. In the present paper,
we evaluate four other classical spatial interpolation methods based on splines
(B-splines), inverse distance weighting (IDW), radial basis functions (RBF) and
ordinary Kriging (OK). These methods are tested on the Star-challenge part of
the GRavitational lEnsing Accuracy Testing 2010 (GREAT10) simulated data and
are compared with the classical polynomial fitting (Polyfit). We also test all
our interpolation methods independently of the way the PSF is modeled, by
interpolating the GREAT10 star fields themselves (i.e., the PSF parameters are
known exactly at star positions). We find in that case RBF to be the clear
winner, closely followed by the other local methods, IDW and OK. The global
methods, Polyfit and B-splines, are largely behind, especially in fields with
(ground-based) turbulent PSFs. In fields with non-turbulent PSFs, all
interpolators reach a variance on PSF systematics better than
the upper bound expected by future space-based surveys, with
the local interpolators performing better than the global ones.
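As an illustration of one of the local methods named above, the following is a minimal sketch of inverse distance weighting (IDW) in pure Python. It is not the code used in the paper: the function name `idw`, the `power` parameter default, and the star positions and ellipticity values are all assumptions chosen for the example, and this version weights every star rather than only a local neighborhood.

```python
from math import dist

def idw(positions, values, query, power=2.0):
    """Inverse distance weighted estimate at `query` from known samples."""
    num = den = 0.0
    for pos, val in zip(positions, values):
        d = dist(pos, query)
        if d == 0.0:
            return val                     # query coincides with a sample
        w = 1.0 / d ** power               # closer samples get larger weights
        num += w * val
        den += w
    return num / den

# Made-up PSF parameter (e.g. one ellipticity component) known at four stars.
star_positions = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
ellipticity = [0.01, 0.03, 0.02, 0.04]

# Estimate the parameter at a position where no star was observed.
estimate = idw(star_positions, ellipticity, (1.0, 1.0))
```

Because the query point here is equidistant from all four stars, the estimate reduces to their plain average; at a star position the function returns the known value exactly.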
Machine Learning for Neuroimaging with Scikit-Learn
Statistical machine learning methods are increasingly used for neuroimaging
data analysis. Their main virtue is their ability to model high-dimensional
datasets, e.g. multivariate analysis of activation images or resting-state time
series. Supervised learning is typically used in decoding or encoding settings
to relate brain images to behavioral or clinical observations, while
unsupervised learning can uncover hidden structures in sets of images (e.g.
resting state functional MRI) or find sub-populations in large cohorts. By
considering different functional neuroimaging applications, we illustrate how
scikit-learn, a Python machine learning library, can be used to perform some
key analysis steps. Scikit-learn contains a very large set of statistical
learning algorithms, both supervised and unsupervised, and its application to
neuroimaging data provides a versatile tool to study the brain.
Comment: Frontiers in neuroscience, Frontiers Research Foundation, 2013, pp.1
A virtual workspace for hybrid multidimensional scaling algorithms
In visualising multidimensional data, it is well known that different types of data require different types of algorithms to process them. Data sets might be distinguished according to volume, variable types and distribution, and each of these characteristics imposes constraints upon the choice of applicable algorithms for their visualisation. Previous work has shown that a hybrid algorithmic approach can be successful in addressing the impact of data volume on the feasibility of multidimensional scaling (MDS). This suggests that hybrid combinations of appropriate algorithms might also successfully address other characteristics of data. This paper presents a system and framework in which a user can easily explore hybrid algorithms and the data flowing through them. Visual programming and a novel algorithmic architecture let the user semi-automatically define data flows and the co-ordination of multiple views.
Sparse optical flow regularisation for real-time visual tracking
Optical flow can greatly improve the robustness of visual tracking algorithms. While dense optical flow algorithms have various applications, they cannot be used for real-time solutions without resorting to GPU calculations. Furthermore, most optical flow algorithms fail in challenging lighting environments due to the violation of the brightness constraint. We propose a simple but effective iterative regularisation scheme for real-time, sparse optical flow algorithms that is shown to be robust to sudden illumination changes and can handle large displacements. The algorithm outperforms well-known techniques on real-life video sequences, while being much faster to calculate. Our solution increases the robustness of a real-time, particle-filter-based tracking application, consuming only a fraction of the available CPU power. Furthermore, a new and realistic optical flow dataset with annotated ground truth is created and made freely available for research purposes.
Solving Inverse Problems with Piecewise Linear Estimators: From Gaussian Mixture Models to Structured Sparsity
A general framework for solving image inverse problems is introduced in this
paper. The approach is based on Gaussian mixture models, estimated via a
computationally efficient MAP-EM algorithm. A dual mathematical interpretation
of the proposed framework with structured sparse estimation is described, which
shows that the resulting piecewise linear estimate stabilizes the estimation
when compared to traditional sparse inverse problem techniques. This
interpretation also suggests an effective dictionary motivated initialization
for the MAP-EM algorithm. We demonstrate that in a number of image inverse
problems, including inpainting, zooming, and deblurring, the same algorithm
produces results that are equal to, often significantly better than, or only
marginally worse than the best published ones, at a lower computational cost.
Comment: 30 pages
A Multiscale Pyramid Transform for Graph Signals
Multiscale transforms designed to process analog and discrete-time signals
and images cannot be directly applied to analyze high-dimensional data residing
on the vertices of a weighted graph, as they do not capture the intrinsic
geometric structure of the underlying graph data domain. In this paper, we
adapt the Laplacian pyramid transform for signals on Euclidean domains so that
it can be used to analyze high-dimensional data residing on the vertices of a
weighted graph. Our approach is to study existing methods and develop new
methods for the four fundamental operations of graph downsampling, graph
reduction, and filtering and interpolation of signals on graphs. Equipped with
appropriate notions of these operations, we leverage the basic multiscale
constructs and intuitions from classical signal processing to generate a
transform that yields both a multiresolution of graphs and an associated
multiresolution of a graph signal on the underlying sequence of graphs.
Comment: 16 pages, 13 figures
Algorithmic patterns for H-matrices on many-core processors
In this work, we consider the reformulation of hierarchical (H)
matrix algorithms for many-core processors with a model implementation on
graphics processing units (GPUs). H-matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of H-matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing H-matrix CPU implementations by many-core
processors, we here aim at relying entirely on that processor type. As our main
contribution, we introduce the necessary parallel algorithmic patterns that allow
us to map the full H-matrix construction and the fast matrix-vector
product to many-core hardware. Here, crucial ingredients are space-filling
curves, parallel tree traversal and batching of linear algebra operations. The
resulting model GPU implementation hmglib is, to the best of the authors'
knowledge, the first entirely GPU-based Open Source H-matrix library of
this kind. We conclude this work with an in-depth performance analysis and a
comparative performance study against a standard H-matrix library,
highlighting profound speedups of our many-core parallel approach.
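The abstract lists space-filling curves as one ingredient without naming a specific curve. As a hedged illustration only, the sketch below computes a Z-order (Morton) index for 2D integer grid cells in pure Python; the choice of the Z-order curve and the helper names `interleave_bits` and `morton2d` are assumptions for this example, not a description of hmglib.

```python
def interleave_bits(v, bits=16):
    """Spread the low `bits` bits of v so a zero bit sits between each pair."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (2 * i)
    return out

def morton2d(x, y, bits=16):
    """Z-order (Morton) index of the integer grid cell (x, y)."""
    return interleave_bits(x, bits) | (interleave_bits(y, bits) << 1)

# Sorting cells by Morton index places spatially nearby cells close together
# in memory, which is one way to turn a point set into a flat, tree-friendly order.
cells = [(3, 1), (0, 0), (1, 1), (2, 2), (0, 1), (1, 0)]
ordered = sorted(cells, key=lambda c: morton2d(*c))
```

A one-dimensional ordering of this kind is useful on parallel hardware because a spatial tree can then be expressed as contiguous index ranges over a sorted array rather than as pointer-linked nodes.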