
    Automatic Derivation of Statistical Data Analysis Algorithms: Planetary Nebulae and Beyond

    AUTOBAYES is a fully automatic program synthesis system for the data analysis domain. Its input is a declarative problem description in the form of a statistical model; its output is documented and optimized C/C++ code. The synthesis process relies on the combination of three key techniques. Bayesian networks are used as a compact internal representation mechanism which enables problem decompositions and guides the algorithm derivation. Program schemas are used as independently composable building blocks for the algorithm construction; they can encapsulate advanced algorithms and data structures. A symbolic-algebraic system is used to find closed-form solutions for problems and emerging subproblems. In this paper, we describe the application of AUTOBAYES to the analysis of planetary nebulae images taken by the Hubble Space Telescope. We explain the system architecture, and present in detail the automatic derivation of the scientists' original analysis [1] as well as a refined analysis using clustering models. This study demonstrates that AUTOBAYES is now mature enough to be applied to realistic scientific data analysis tasks.
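
    The abstract's own clustering analysis suggests what such a synthesized algorithm looks like. The sketch below is a hand-written Python illustration (not AUTOBAYES output, which is generated C/C++) of an EM algorithm for a one-dimensional Gaussian mixture; its closed-form M-step updates are the kind of solution the symbolic-algebraic subsystem is described as finding for emerging subproblems. All names and defaults here are illustrative assumptions.

    # A hand-written sketch (not AUTOBAYES output) of the kind of clustering
    # algorithm the system derives from a declarative mixture-model description:
    # EM for a one-dimensional Gaussian mixture, with closed-form M-step updates.
    import numpy as np

    def em_gaussian_mixture(x, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        n = x.shape[0]
        mu = rng.choice(x, size=k, replace=False)      # component means
        sigma2 = np.full(k, x.var())                   # component variances
        pi = np.full(k, 1.0 / k)                       # mixing weights
        for _ in range(iters):
            # E-step: posterior responsibilities r[n, k]
            log_p = (-0.5 * (x[:, None] - mu) ** 2 / sigma2
                     - 0.5 * np.log(2 * np.pi * sigma2) + np.log(pi))
            log_p -= log_p.max(axis=1, keepdims=True)
            r = np.exp(log_p)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: closed-form updates (the "symbolic" solutions of subproblems)
            nk = r.sum(axis=0)
            pi = nk / n
            mu = (r * x[:, None]).sum(axis=0) / nk
            sigma2 = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        return pi, mu, sigma2

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        data = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 0.5, 500)])
        print(em_gaussian_mixture(data, k=2))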

    How to Extract the Geometry and Topology from Very Large 3D Segmentations

    Segmentation is often an essential intermediate step in image analysis. A volume segmentation characterizes the underlying volume image in terms of geometric information (segments, faces between segments, curves in which several faces meet) as well as a topology on these objects. Existing algorithms encode this information in designated data structures, but require that these data structures fit entirely in Random Access Memory (RAM). Today, 3D images with several billion voxels are acquired, e.g. in structural neurobiology. Since these large volumes can no longer be processed with existing methods, we present a new algorithm which performs geometry and topology extraction with a runtime linear in the number of voxels and log-linear in the number of faces and curves. The parallelizable algorithm proceeds in a block-wise fashion and constructs a consistent representation of the entire volume image on the hard drive, making the structure of very large volume segmentations accessible to image analysis. The parallelized C++ source code, free command line tools and MATLAB mex files are available from http://hci.iwr.uni-heidelberg.de/software.php
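
    As a rough illustration of the face-extraction part of the algorithm, the toy Python sketch below scans a labeled volume block by block (with a one-voxel halo) and collects every unordered pair of segment labels that meet across a voxel face. It is purely in-memory and ignores curves, topology, parallelism and the out-of-core machinery of the actual C++ implementation; all names are hypothetical.

    import numpy as np

    def faces_blockwise(labels, block=64):
        """Collect unordered pairs of segment labels that share a voxel face."""
        faces = set()
        zs, ys, xs = labels.shape
        for z0 in range(0, zs, block):
            for y0 in range(0, ys, block):
                for x0 in range(0, xs, block):
                    # load one block plus a one-voxel halo so faces on block
                    # borders are not missed
                    blk = labels[z0:min(z0 + block + 1, zs),
                                 y0:min(y0 + block + 1, ys),
                                 x0:min(x0 + block + 1, xs)]
                    for axis in range(3):
                        lo = [slice(None)] * 3
                        hi = [slice(None)] * 3
                        lo[axis] = slice(None, -1)
                        hi[axis] = slice(1, None)
                        a, b = blk[tuple(lo)], blk[tuple(hi)]
                        diff = a != b
                        pairs = np.sort(np.stack([a[diff], b[diff]], axis=1), axis=1)
                        for u, v in np.unique(pairs, axis=0):
                            faces.add((int(u), int(v)))
        return faces

    if __name__ == "__main__":
        vol = np.zeros((4, 4, 4), dtype=np.int32)
        vol[2:, :, :] = 1                       # two segments stacked along z
        print(faces_blockwise(vol, block=2))    # -> {(0, 1)}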

    Parallel processors and nonlinear structural dynamics algorithms and software

    The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction, multiple data) computer, the Connection Machine, is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the elements, with an exchange of nodal forces at each time step. The architectural and C* programming language features of the Connection Machine are also summarized. Various alternative data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the Connection Machine is capable of outperforming the CRAY XMP/14.
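
    The following serial Python sketch mimics the data movement behind the exchange algorithm for a chain of linear spring elements: nodal displacements are gathered into per-element copies, element forces are computed independently, and the force contributions of coincident nodal copies are then exchanged (summed) before the explicit update. It only illustrates the pattern, not the nonlinear C* implementation on the Connection Machine; all names and constants are assumptions.

    import numpy as np

    def explicit_step(u, v, conn, k, m, dt):
        """One explicit time step for a chain of linear spring elements.

        u, v  : nodal displacement / velocity arrays
        conn  : (n_elem, 2) node indices of each element
        """
        # gather private element copies of the nodal displacements
        u_elem = u[conn]
        stretch = u_elem[:, 1] - u_elem[:, 0]
        f_elem = np.stack([k * stretch, -k * stretch], axis=1)
        # "exchange": sum the force contributions of all copies of each node
        f = np.zeros_like(u)
        np.add.at(f, conn, f_elem)
        # explicit (central-difference style) update of the nodal variables
        v = v + dt * f / m
        u = u + dt * v
        return u, v

    if __name__ == "__main__":
        n = 10
        conn = np.stack([np.arange(n - 1), np.arange(1, n)], axis=1)
        u = np.zeros(n); u[-1] = 0.1            # pull the last node
        v = np.zeros(n)
        for _ in range(100):
            u, v = explicit_step(u, v, conn, k=1.0, m=1.0, dt=0.05)
        print(u)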

    Reliability analysis of laminated CMC components through shell subelement techniques

    An updated version of the integrated design program Composite Ceramics Analysis and Reliability Evaluation of Structures (C/CARES) was developed for the reliability evaluation of ceramic matrix composite (CMC) laminated shell components. The algorithm is now split into two modules: a finite-element data interface program and a reliability evaluation algorithm. More flexibility is achieved, allowing for easy implementation with various finite-element programs. The interface program creates a neutral database which is then read by the reliability module. This neutral database concept allows easy data transfer between different computer systems. The new interface program from the finite-element code Matrix Automated Reduction and Coupling (MARC) also includes the option of using hybrid laminates (a combination of plies of different materials or different layups) and allows for variations in temperature fields throughout the component. In the current version of C/CARES, a subelement technique was implemented, enabling stress gradients within an element to be taken into account. The noninteractive reliability function is now evaluated at each Gaussian integration point instead of using averaging techniques. As a result of the increased number of stress evaluation points, considerable improvements in the accuracy of reliability analyses were realized.
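
    The accumulation pattern behind evaluating reliability at Gaussian integration points can be sketched as below, here under an assumed two-parameter Weibull weakest-link model. The abstract does not state the exact noninteractive criterion, so the formula, variable names and parameters are illustrative assumptions rather than the C/CARES model itself.

    # A minimal sketch of a weakest-link (two-parameter Weibull) reliability
    # measure accumulated at each Gaussian integration point rather than from an
    # element-averaged stress. sigma, weight and detJ per Gauss point are assumed
    # to come from the finite-element interface.
    import numpy as np

    def survival_probability(sigma_gp, weight_gp, detJ_gp, sigma_0, m):
        """Probability of survival of one component.

        sigma_gp   : tensile stress at each Gauss point (compression clipped to 0)
        weight_gp  : quadrature weights
        detJ_gp    : Jacobian determinants (volume measure) at the Gauss points
        sigma_0, m : assumed Weibull scale and modulus
        """
        s = np.clip(sigma_gp, 0.0, None)
        risk = np.sum((s / sigma_0) ** m * weight_gp * detJ_gp)   # risk of rupture
        return np.exp(-risk)

    if __name__ == "__main__":
        # example: four Gauss points of a single element
        print(survival_probability(np.array([120.0, 150.0, 80.0, 200.0]),
                                   np.ones(4), np.full(4, 0.25),
                                   sigma_0=300.0, m=10.0))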

    MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package

    Generalized linear mixed models provide a flexible framework for modeling a range of data, although with non-Gaussian response variables the likelihood cannot be obtained in closed form. Markov chain Monte Carlo methods solve this problem by sampling from a series of simpler conditional distributions that can be evaluated. The R package MCMCglmm implements such an algorithm for a range of model fitting problems. More than one response variable can be analyzed simultaneously, and these variables are allowed to follow Gaussian, Poisson, multi(bi)nomial, exponential, zero-inflated and censored distributions. A range of variance structures are permitted for the random effects, including interactions with categorical or continuous variables (i.e., random regression), and more complicated variance structures that arise through shared ancestry, either through a pedigree or through a phylogeny. Missing values are permitted in the response variable(s) and data can be known up to some level of measurement error as in meta-analysis. All simulation is done in C/C++ using the CSparse library for sparse linear systems.
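
    To make the idea of sampling from a series of simpler conditional distributions concrete, the sketch below implements a plain Gibbs sampler for a Gaussian random-intercept model in Python, where every full conditional is available in closed form. It is not the MCMCglmm implementation (which handles non-Gaussian responses and sparse mixed-model equations in C/C++ via CSparse); the priors, names and defaults are illustrative assumptions.

    import numpy as np

    def gibbs_random_intercept(y, group, n_iter=2000, a=0.001, b=0.001, seed=0):
        """Gibbs sampler for y_ij = mu + u_j + e_ij with Gaussian effects."""
        rng = np.random.default_rng(seed)
        J, N = group.max() + 1, y.size
        n_j = np.bincount(group, minlength=J)
        mu, u = y.mean(), np.zeros(J)
        sigma2, tau2 = y.var(), y.var()
        draws = []
        for _ in range(n_iter):
            # mu | rest  (flat prior)
            resid = y - u[group]
            mu = rng.normal(resid.mean(), np.sqrt(sigma2 / N))
            # u_j | rest
            prec = n_j / sigma2 + 1.0 / tau2
            mean = np.bincount(group, weights=y - mu, minlength=J) / sigma2 / prec
            u = rng.normal(mean, np.sqrt(1.0 / prec))
            # variance components | rest  (inverse-gamma conditionals)
            e = y - mu - u[group]
            sigma2 = 1.0 / rng.gamma(a + N / 2, 1.0 / (b + 0.5 * e @ e))
            tau2 = 1.0 / rng.gamma(a + J / 2, 1.0 / (b + 0.5 * u @ u))
            draws.append((mu, sigma2, tau2))
        return np.array(draws)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        group = np.repeat(np.arange(20), 10)
        y = 2.0 + rng.normal(0, 1.0, 20)[group] + rng.normal(0, 0.5, 200)
        print(gibbs_random_intercept(y, group)[1000:].mean(axis=0))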

    A PDE Approach to Data-driven Sub-Riemannian Geodesics in SE(2)

    We present a new flexible wavefront propagation algorithm for the boundary value problem for sub-Riemannian (SR) geodesics in the roto-translation group $SE(2) = \mathbb{R}^2 \rtimes S^1$ with a metric tensor depending on a smooth external cost $\mathcal{C}: SE(2) \to [\delta, 1]$, $\delta > 0$, computed from image data. The method consists of a first step in which an SR-distance map is computed as a viscosity solution of a Hamilton-Jacobi-Bellman (HJB) system derived via Pontryagin's Maximum Principle (PMP). Subsequent backward integration, again relying on PMP, gives the SR-geodesics. For $\mathcal{C} = 1$ we show that our method produces the global minimizers. Comparison with exact solutions shows a remarkable accuracy of the SR-spheres and the SR-geodesics. We present numerical computations of Maxwell points and cusp points, which we again verify for the uniform-cost case $\mathcal{C} = 1$. Regarding image analysis applications, tracking of elongated structures in retinal and synthetic images shows that our line tracking generically deals with crossings. We show the benefits of including the sub-Riemannian geometry. Extended version of the SSVM 2015 conference article "Data-driven Sub-Riemannian Geodesics in SE(2)".
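
    As a loose discrete analogue of the wavefront propagation and backtracking steps, the sketch below runs Dijkstra's algorithm on an (x, y, θ) grid in which a state may either step spatially along its current orientation or rotate in place, weighted by an external cost, and then backtracks the shortest path. This is a coarse graph surrogate, not the authors' HJB viscosity-solution solver; the parameter xi and all other names are assumptions.

    import heapq
    import numpy as np

    def sr_like_shortest_path(cost, n_theta, src, dst, xi=1.0):
        """Dijkstra on an (y, x, theta) grid with moves restricted to a step along
        the current orientation or a rotation in place, as a coarse discrete
        surrogate for the continuous SR wavefront propagation."""
        ny, nx = cost.shape
        thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
        dist = np.full((ny, nx, n_theta), np.inf)
        parent = {}
        dist[src] = 0.0
        pq = [(0.0, src)]
        while pq:
            d, node = heapq.heappop(pq)
            if d > dist[node]:
                continue
            if node == dst:
                break
            y, x, t = node
            # spatial step along the current orientation, plus +/- one rotation
            dy, dx = int(round(np.sin(thetas[t]))), int(round(np.cos(thetas[t])))
            moves = [((y + dy, x + dx, t), cost[y, x]),
                     ((y, x, (t + 1) % n_theta), xi * cost[y, x]),
                     ((y, x, (t - 1) % n_theta), xi * cost[y, x])]
            for nxt, w in moves:
                yy, xx, tt = nxt
                if 0 <= yy < ny and 0 <= xx < nx and d + w < dist[nxt]:
                    dist[nxt] = d + w
                    parent[nxt] = node
                    heapq.heappush(pq, (d + w, nxt))
        # backtracking, analogous to the backward integration of the geodesic
        path, node = [dst], dst
        while node != src:
            node = parent[node]
            path.append(node)
        return path[::-1]

    if __name__ == "__main__":
        cost = np.ones((20, 20))                 # uniform external cost
        print(sr_like_shortest_path(cost, 8, (0, 0, 0), (10, 10, 2)))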

    Consistency of random forests

    Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5--32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical performance, little is known about the mathematical properties of the procedure. This disparity between theory and practice originates in the difficulty of simultaneously analyzing both the randomization process and the highly data-dependent tree structure. In the present paper, we take a step forward in forest exploration by proving a consistency result for Breiman's [Mach. Learn. 45 (2001) 5--32] original algorithm in the context of additive regression models. Our analysis also sheds an interesting light on how random forests can nicely adapt to sparsity.

    1. Introduction. Random forests are an ensemble learning method for classification and regression that constructs a number of randomized decision trees during the training phase and predicts by averaging the results. Since its publication in the seminal paper of Breiman (2001), the procedure has become a major data analysis tool that performs well in practice in comparison with many standard methods. What has greatly contributed to the popularity of forests is the fact that they can be applied to a wide range of prediction problems and have few parameters to tune. Aside from being simple to use, the method is generally recognized for its accuracy and its ability to deal with small sample sizes, high-dimensional feature spaces and complex data structures. The random forest methodology has been successfully involved in many practical problems, including air quality prediction (winning code of the EMC data science global hackathon in 2012, see http://www.kaggle.com/c/dsg-hackathon), chemoinformatics [Svetnik et al. (2003)] and ecology [Prasad, Iverson and Liaw (2006), Cutler et al. (2007)].
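
    For readers unfamiliar with the procedure being analyzed, the short sketch below builds a forest in the way described above, growing decision trees on bootstrap samples with random feature selection at each split and averaging their predictions, using scikit-learn trees purely for illustration (this is not Breiman's reference implementation, and the example data is an arbitrary additive regression model).

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def forest_predict(X, y, X_new, n_trees=100, max_features="sqrt", seed=0):
        rng = np.random.default_rng(seed)
        preds = np.zeros((n_trees, X_new.shape[0]))
        for i in range(n_trees):
            idx = rng.integers(0, X.shape[0], X.shape[0])   # bootstrap sample
            tree = DecisionTreeRegressor(max_features=max_features, random_state=i)
            tree.fit(X[idx], y[idx])                        # randomized tree
            preds[i] = tree.predict(X_new)
        return preds.mean(axis=0)                           # aggregate by averaging

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        X = rng.uniform(size=(500, 5))
        y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=500)
        print(forest_predict(X, y, X[:5]))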

    Convex Constraint Decomposition of Circular Dichroism Curves of Proteins

    A new algorithm, called convex analysis, has been developed to deduce the chiral contribution of the common secondary structures directly from experimental circular dichroism (CD) curves of a large number of proteins. The analysis is based on CD data reported by Yang et al. Test runs were performed on sets of artificial protein spectra created by the Monte Carlo technique using poly-L-lysine based component spectra. Application of the decomposition algorithm to the created sets of spectra resulted in component spectra [B(λ, i)] and weights [C(i, k)] with excellent Pearson correlation coefficients (r). The algorithm, independent of X-ray data, revealed that the CD spectrum of a given protein is composed of at least four independent sources of chirality. Three of the computed component curves show remarkable resemblance to the CD spectra of known protein secondary structures. This approach yields a significant improvement compared to the eigenvector analysis of Hennessey and Johnson. The new method is a useful tool not only in analyzing CD spectra but also in treating other decomposition problems where an additivity constraint is valid.
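
    The additivity (convex-combination) constraint at the heart of the method can be illustrated with the small sketch below, which fits a measured spectrum as a non-negative, sum-to-one mixture of given component spectra via non-negative least squares with a penalty row. The published algorithm also estimates the component spectra themselves from many proteins; the routine, names and the penalty trick here are assumptions for illustration only.

    import numpy as np
    from scipy.optimize import nnls

    def convex_weights(components, spectrum, rho=1e3):
        """Fit a spectrum as a convex combination of component spectra.

        components : (n_wavelengths, n_components)
        spectrum   : (n_wavelengths,)
        """
        # append a heavily weighted row that softly enforces sum(weights) = 1
        A = np.vstack([components, rho * np.ones((1, components.shape[1]))])
        b = np.concatenate([spectrum, [rho]])
        w, _ = nnls(A, b)            # non-negativity constraint
        return w / w.sum()           # renormalize to an exact convex combination

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        basis = rng.normal(size=(80, 4))          # stand-in component spectra
        true_w = np.array([0.5, 0.3, 0.2, 0.0])
        measured = basis @ true_w + 0.01 * rng.normal(size=80)
        print(convex_weights(basis, measured))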