Automatic Derivation of Statistical Data Analysis Algorithms: Planetary Nebulae and Beyond
AUTOBAYES is a fully automatic program synthesis system for the data analysis domain. Its input is a declarative problem description in the form of a statistical model; its output is documented and optimized C/C++ code. The synthesis process relies on the combination of three key techniques. Bayesian networks are used as a compact internal representation mechanism which enables problem decompositions and guides the algorithm derivation. Program schemas are used as independently composable building blocks for the algorithm construction; they can encapsulate advanced algorithms and data structures. A symbolic-algebraic system is used to find closed-form solutions for problems and emerging subproblems. In this paper, we describe the application of AUTOBAYES to the analysis of planetary nebulae images taken by the Hubble Space Telescope. We explain the system architecture, and present in detail the automatic derivation of the scientists' original analysis [1] as well as a refined analysis using clustering models. This study demonstrates that AUTOBAYES is now mature enough to be applied to realistic scientific data analysis tasks.
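For context, a typical model in this setting is a clustering (mixture) model whose parameter updates have closed-form solutions. The following Python sketch is purely illustrative of the kind of expectation-maximization loop such a derivation produces for a one-dimensional Gaussian mixture; it is hand-written here, not AUTOBAYES output, and all names are hypothetical.

```python
# Illustrative sketch only: a hand-written EM loop of the kind a schema-based
# synthesis system might derive for a 1-D Gaussian mixture model.
import numpy as np

def em_gaussian_mixture(x, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    mu = rng.choice(x, size=k, replace=False)      # component means
    sigma = np.full(k, x.std() + 1e-9)             # component standard deviations
    w = np.full(k, 1.0 / k)                        # mixing weights
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = (w / (np.sqrt(2 * np.pi) * sigma)
                * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates (the step a symbolic-algebraic solver finds)
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-12
    return w, mu, sigma
```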
How to Extract the Geometry and Topology from Very Large 3D Segmentations
Segmentation is often an essential intermediate step in image analysis. A
volume segmentation characterizes the underlying volume image in terms of
geometric information--segments, faces between segments, curves in which
several faces meet--as well as a topology on these objects. Existing algorithms
encode this information in designated data structures, but require that these
data structures fit entirely in Random Access Memory (RAM). Today, 3D images
with several billion voxels are acquired, e.g. in structural neurobiology.
Since these large volumes can no longer be processed with existing methods, we
present a new algorithm which performs geometry and topology extraction with a
runtime linear in the number of voxels and log-linear in the number of faces
and curves. The parallelizable algorithm proceeds in a block-wise fashion and
constructs a consistent representation of the entire volume image on the hard
drive, making the structure of very large volume segmentations accessible to
image analysis. The parallelized C++ source code, free command line tools and
MATLAB mex files are available from
http://hci.iwr.uni-heidelberg.de/software.php
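As an illustration of the block-wise idea (not the authors' algorithm or API), the following Python sketch streams a labelled volume in slabs and collects the faces between segments, so that only one slab plus a small summary structure has to be held in RAM at a time.

```python
# Illustrative sketch only (not the authors' algorithm or API): stream a labelled
# volume in slabs and collect the faces between segments, i.e. pairs of adjacent
# labels, keeping only one slab in memory at a time.
import numpy as np

def faces_blockwise(volume, block=64):
    """volume: 3-D array of segment labels; returns the set of (label_a, label_b) faces."""
    faces = set()
    zdim = volume.shape[0]
    for z0 in range(0, zdim, block):
        # one slab plus a one-voxel overlap, so faces across slab boundaries are seen
        blk = volume[z0:z0 + block + 1]
        for axis in range(3):
            a = blk.take(np.arange(blk.shape[axis] - 1), axis=axis)
            b = blk.take(np.arange(1, blk.shape[axis]), axis=axis)
            touching = a != b
            pairs = np.stack([a[touching], b[touching]], axis=1)
            for la, lb in np.unique(pairs, axis=0):
                faces.add((int(min(la, lb)), int(max(la, lb))))
    return faces
```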
Parallel processors and nonlinear structural dynamics algorithms and software
The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction, multiple data) computer, the Connection Machine, is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the elements, with an exchange of nodal forces at each time step. The architectural and C* programming language features of the Connection Machine are also summarized. Various alternate data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the Connection Machine is capable of outperforming the CRAY X-MP/14.
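The exchange algorithm assigns all nodal variables to elements and exchanges nodal forces once per explicit time step. The Python sketch below mimics that structure for one-dimensional bar elements; it is an illustrative analogue with hypothetical names, not the original C* implementation.

```python
# Illustrative sketch only: an element-wise explicit time step in the spirit of an
# exchange scheme, where every element owns copies of its nodal variables and the
# forces at shared nodes are summed (exchanged) once per step. One-dimensional bar
# elements and hypothetical names; not the original C* implementation.
import numpy as np

def explicit_step(x_elem, v_elem, conn, mass, k, dt):
    """x_elem, v_elem: (n_elem, 2) element-local copies of nodal position/velocity;
    conn: (n_elem, 2) global node ids; mass: (n_nodes,) lumped nodal masses."""
    # element-local internal force, computed independently for every element
    axial = k * (x_elem[:, 1] - x_elem[:, 0])
    f_elem = np.stack([axial, -axial], axis=1)
    # exchange: accumulate the forces of all elements sharing each node
    f_node = np.zeros(mass.shape[0])
    np.add.at(f_node, conn, f_elem)
    # scatter the assembled accelerations back to the element-local copies
    a_node = f_node / mass
    v_elem = v_elem + dt * a_node[conn]
    x_elem = x_elem + dt * v_elem
    return x_elem, v_elem
```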
Reliability analysis of laminated CMC components through shell subelement techniques
An updated version of the integrated design program Composite Ceramics Analysis and Reliability Evaluation of Structures (C/CARES) was developed for the reliability evaluation of ceramic matrix composites (CMC) laminated shell components. The algorithm is now split into two modules: a finite-element data interface program and a reliability evaluation algorithm. More flexibility is achieved, allowing for easy implementation with various finite-element programs. The interface program creates a neutral data base which is then read by the reliability module. This neutral data base concept allows easy data transfer between different computer systems. The new interface program from the finite-element code Matrix Automated Reduction and Coupling (MARC) also includes the option of using hybrid laminates (a combination of plies of different materials or different layups) and allows for variations in temperature fields throughout the component. In the current version of C/CARES, a subelement technique was implemented, enabling stress gradients within an element to be taken into account. The noninteractive reliability function is now evaluated at each Gaussian integration point instead of using averaging techniques. As a result of the increased number of stress evaluation points, considerable improvements in the accuracy of reliability analyses were realized.
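As a hedged illustration of evaluating reliability at Gaussian integration points rather than from element averages, the sketch below accumulates a weakest-link survival probability from per-point stresses and integration volumes. The two-parameter Weibull form is an assumption made for illustration, not the specific noninteractive model implemented in C/CARES.

```python
# Illustrative sketch only: weakest-link reliability accumulated over Gaussian
# integration points, each with its own stress and integration volume. The
# two-parameter Weibull form below is an assumption for illustration, not the
# specific noninteractive model implemented in C/CARES.
import numpy as np

def component_reliability(stress, volume, sigma_0, m):
    """stress, volume: arrays over all Gauss points; sigma_0, m: Weibull parameters."""
    stress = np.maximum(stress, 0.0)               # only tensile stress contributes
    risk = volume * (stress / sigma_0) ** m        # risk of rupture per Gauss point
    return float(np.exp(-risk.sum()))              # survival probability of the component
```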
MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package
Generalized linear mixed models provide a flexible framework for modeling a range of data, although with non-Gaussian response variables the likelihood cannot be obtained in closed form. Markov chain Monte Carlo methods solve this problem by sampling from a series of simpler conditional distributions that can be evaluated. The R package MCMCglmm implements such an algorithm for a range of model-fitting problems. More than one response variable can be analyzed simultaneously, and these variables are allowed to follow Gaussian, Poisson, multi(bi)nomial, exponential, zero-inflated and censored distributions. A range of variance structures are permitted for the random effects, including interactions with categorical or continuous variables (i.e., random regression), and more complicated variance structures that arise through shared ancestry, either through a pedigree or through a phylogeny. Missing values are permitted in the response variable(s) and data can be known up to some level of measurement error, as in meta-analysis. All simulation is done in C/C++ using the CSparse library for sparse linear systems.
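To illustrate the underlying idea of sampling from simpler full conditional distributions (this is not MCMCglmm's actual sampler, which is written in C/C++ for generalized linear mixed models), here is a minimal Gibbs sampler for a toy normal model with unknown mean and variance.

```python
# Illustrative sketch only: Gibbs sampling alternates draws from full conditional
# distributions. Toy conjugate normal model (unknown mean and variance); this shows
# the general idea only, not MCMCglmm's sampler.
import numpy as np

def gibbs_normal(y, n_samples=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, ybar = len(y), np.mean(y)
    mu, sigma2 = ybar, np.var(y)
    draws = []
    for _ in range(n_samples):
        # conditional of mu given sigma2 (flat prior on mu): Normal(ybar, sigma2/n)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))
        # conditional of sigma2 given mu (Jeffreys prior): scaled inverse chi-square
        ss = np.sum((y - mu) ** 2)
        sigma2 = ss / rng.chisquare(n)
        draws.append((mu, sigma2))
    return np.array(draws)
```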
A PDE Approach to Data-driven Sub-Riemannian Geodesics in SE(2)
We present a new flexible wavefront propagation algorithm for the boundary value problem for sub-Riemannian (SR) geodesics in the roto-translation group SE(2), with a metric tensor depending on a smooth external cost C computed from image data. The method consists of a first step in which an SR-distance map is computed as a viscosity solution of a Hamilton-Jacobi-Bellman (HJB) system derived via Pontryagin's Maximum Principle (PMP). Subsequent backward integration, again relying on PMP, gives the SR-geodesics. For the uniform cost C = 1 we show that our method produces the global minimizers. Comparison with exact solutions shows a remarkable accuracy of the SR-spheres and the SR-geodesics. We present numerical computations of Maxwell points and cusp points, which we again verify for the uniform cost case C = 1. Regarding image analysis applications, tracking of elongated structures in retinal and synthetic images shows that our line tracking generically deals with crossings. We show the benefits of including the sub-Riemannian geometry.
Comment: Extended version of the SSVM 2015 conference article "Data-driven Sub-Riemannian Geodesics in SE(2)"
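To make the two-step structure concrete (a distance map computed as a viscosity solution, then backward integration), the following sketch solves a plain isotropic 2-D analogue on a grid in the plane: an iterative Eikonal-type relaxation followed by steepest-descent backtracking. It is only an analogue; it is not the authors' sub-Riemannian HJB scheme on SE(2).

```python
# Illustrative sketch only: a plain isotropic 2-D analogue of the two-step scheme
# (compute a distance map, then integrate backwards). The real method solves an
# HJB system on SE(2) with a sub-Riemannian metric; this is not that scheme.
import numpy as np

def distance_map(cost, seed, n_iter=500):
    """cost: 2-D array of positive local costs; seed: (row, col) source point."""
    d = np.full(cost.shape, np.inf)
    d[seed] = 0.0
    for _ in range(n_iter):
        # relax each cell against its four grid neighbours (Bellman-type update)
        best = np.full(cost.shape, np.inf)
        best[1:, :] = np.minimum(best[1:, :], d[:-1, :])
        best[:-1, :] = np.minimum(best[:-1, :], d[1:, :])
        best[:, 1:] = np.minimum(best[:, 1:], d[:, :-1])
        best[:, :-1] = np.minimum(best[:, :-1], d[:, 1:])
        d = np.minimum(d, best + cost)
        d[seed] = 0.0
    return d

def backtrack(d, start):
    """Steepest-descent path from start back to the seed (start must be reachable)."""
    path = [start]
    r, c = start
    while d[r, c] > 0:
        nbs = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
               if 0 <= r + dr < d.shape[0] and 0 <= c + dc < d.shape[1]]
        r, c = min(nbs, key=lambda rc: d[rc])
        path.append((r, c))
    return path
```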
Consistency of random forests
Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45
(2001) 5--32] that combines several randomized decision trees and aggregates
their predictions by averaging. Despite its wide usage and outstanding
practical performance, little is known about the mathematical properties of the
procedure. This disparity between theory and practice originates in the
difficulty of simultaneously analyzing both the randomization process and the
highly data-dependent tree structure. In the present paper, we take a step
forward in forest exploration by proving a consistency result for Breiman's
[Mach. Learn. 45 (2001) 5--32] original algorithm in the context of additive
regression models. Our analysis also sheds an interesting light on how random
forests can nicely adapt to sparsity.
1. Introduction. Random forests are an
ensemble learning method for classification and regression that constructs a
number of randomized decision trees during the training phase and predicts by
averaging the results. Since its publication in the seminal paper of Breiman
(2001), the procedure has become a major data analysis tool that performs well
in practice in comparison with many standard methods. What has greatly
contributed to the popularity of forests is the fact that they can be applied
to a wide range of prediction problems and have few parameters to tune. Aside
from being simple to use, the method is generally recognized for its accuracy
and its ability to deal with small sample sizes, high-dimensional feature
spaces and complex data structures. The random forest methodology has been
successfully involved in many practical problems, including air quality
prediction (winning code of the EMC data science global hackathon in 2012, see
http://www.kaggle.com/c/dsg-hackathon), chemoinformatics [Svetnik et al.
(2003)], ecology [Prasad, Iverson and Liaw (2006), Cutler et al. (2007)].
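The procedure analyzed here combines bootstrap resampling, randomized trees, and averaging. A minimal sketch of that pipeline is shown below, using scikit-learn's DecisionTreeRegressor for the individual trees (the production implementation is sklearn.ensemble.RandomForestRegressor); hyperparameter choices are illustrative.

```python
# Illustrative sketch only: Breiman-style forest = bootstrap + randomized trees +
# averaging. Uses scikit-learn's DecisionTreeRegressor for the individual trees;
# sklearn.ensemble.RandomForestRegressor is the production implementation.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_forest(X, y, n_trees=100, max_features="sqrt", seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))          # bootstrap sample
        tree = DecisionTreeRegressor(max_features=max_features,
                                     random_state=int(rng.integers(2**31 - 1)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    # aggregate by averaging the individual tree predictions
    return np.mean([t.predict(X) for t in trees], axis=0)
```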
Convex Constraint Decomposition of Circular Dichroism Curves of Proteins
A new algorithm, called convex constraint analysis, has been developed to deduce the chiral contribution of the common secondary structures directly from experimental circular dichroism (CD) curves of a large number of proteins. The analysis is based on CD data reported by Yang et al. Test runs were performed on sets of artificial protein spectra created by the Monte Carlo technique using poly-L-lysine based component spectra. Application of the decomposition algorithm to the created sets of spectra resulted in component spectra [B(λ, i)] and weights [C(i, k)] with excellent Pearson correlation coefficients (r). The algorithm, independent of X-ray data, revealed that the CD spectrum of a given protein is composed of at least four independent sources of chirality. Three of the computed component curves show remarkable resemblance to the CD spectra of known protein secondary structures. This approach yields a significant improvement compared to the eigenvector analysis of Hennessey and Johnson. The new method is a useful tool not only in analyzing CD spectra but also in treating other decomposition problems where an additivity constraint is valid.
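As a hedged sketch of decomposition under an additivity constraint (each protein's weights non-negative and summing to one), the following alternating least-squares loop factors a CD data matrix into component spectra and weights. The simplex step is a crude projection heuristic used for illustration; it is not the authors' convex constraint algorithm.

```python
# Illustrative sketch only: alternating least squares for A ≈ B @ C with the
# additivity ("convex") constraint on the weights: each protein's weights are
# non-negative and sum to one. The simplex step below is a crude heuristic,
# not the authors' algorithm.
import numpy as np
from scipy.optimize import nnls

def decompose_cd(A, k, n_iter=200, seed=0):
    """A: (n_wavelengths, n_proteins) CD data; k: number of component spectra."""
    rng = np.random.default_rng(seed)
    n_wl, n_prot = A.shape
    C = rng.random((k, n_prot))
    C /= C.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # update component spectra B by ordinary least squares
        B = np.linalg.lstsq(C.T, A.T, rcond=None)[0].T        # (n_wl, k)
        # update weights column by column: non-negative LS, then renormalise
        for j in range(n_prot):
            c, _ = nnls(B, A[:, j])
            C[:, j] = c / c.sum() if c.sum() > 0 else 1.0 / k
    return B, C
```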