2,043 research outputs found

    AutoBayes: A System for Generating Data Analysis Programs from Statistical Models

    Data analysis is an important scientific task which is required whenever information needs to be extracted from raw data. Statistical approaches to data analysis, which use methods from probability theory and numerical analysis, are well-founded but difficult to implement: the development of a statistical data analysis program for any given application is time-consuming and requires substantial knowledge and experience in several areas. In this paper, we describe AutoBayes, a program synthesis system for the generation of data analysis programs from statistical models. A statistical model specifies the properties for each problem variable (i.e., observation or parameter) and its dependencies in the form of a probability distribution. It is a fully declarative problem description, similar in spirit to a set of differential equations. From such a model, AutoBayes generates optimized and fully commented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Code is produced by a schema-guided deductive synthesis process. A schema consists of a code template and applicability constraints which are checked against the model during synthesis using theorem proving technology. AutoBayes augments schema-guided synthesis by symbolic-algebraic computation and can thus derive closed-form solutions for many problems. It is well-suited for tasks like estimating best-fitting model parameters for the given data. Here, we describe AutoBayes's system architecture, in particular the schema-guided synthesis kernel. Its capabilities are illustrated by a number of advanced textbook examples and benchmarks

    Synthesizing Certified Code

    Code certification is a lightweight approach for formally demonstrating software quality. Its basic idea is to require code producers to provide formal proofs that their code satisfies certain quality properties. These proofs serve as certificates that can be checked independently. Since code certification uses the same underlying technology as program verification, it requires detailed annotations (e.g., loop invariants) to make the proofs possible. However, manually adding annotations to the code is time-consuming and error-prone. We address this problem by combining code certification with automatic program synthesis. Given a high-level specification, our approach simultaneously generates code and all annotations required to certify the generated code. We describe a certification extension of AutoBayes, a synthesis tool for automatically generating data analysis programs. Based on built-in domain knowledge, proof annotations are added and used to generate proof obligations that are discharged by the automated theorem prover E-SETHEO. We demonstrate our approach by certifying operator- and memory-safety on a data-classification program. For this program, our approach was faster and more precise than PolySpace, a commercial static analysis tool

    Automatic Derivation of Statistical Data Analysis Algorithms: Planetary Nebulae and Beyond

    AUTOBAYES is a fully automatic program synthesis system for the data analysis domain. Its input is a declarative problem description in the form of a statistical model; its output is documented and optimized C/C++ code. The synthesis process relies on the combination of three key techniques. Bayesian networks are used as a compact internal representation mechanism which enables problem decompositions and guides the algorithm derivation. Program schemas are used as independently composable building blocks for the algorithm construction; they can encapsulate advanced algorithms and data structures. A symbolic-algebraic system is used to find closed-form solutions for problems and emerging subproblems. In this paper, we describe the application of AUTOBAYES to the analysis of planetary nebulae images taken by the Hubble Space Telescope. We explain the system architecture, and present in detail the automatic derivation of the scientists’ original analysis [1] as well as a refined analysis using clustering models. This study demonstrates that AUTOBAYES is now mature enough to be applied to realistic scientific data analysis tasks

    Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

    Machine learning has reached a point where many probabilistic methods can be understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms customized for different models. Here, we describe the AUTOBAYES system which takes a high-level statistical model specification, uses powerful symbolic techniques based on schema-based program synthesis and computer algebra to derive an efficient specialized algorithm for learning that model, and generates executable code implementing that algorithm. This capability is far beyond that of code collections such as Matlab toolboxes or even tools for model-independent optimization such as BUGS for Gibbs sampling: complex new algorithms can be generated without new programming, algorithms can be highly specialized and tightly crafted for the exact structure of the model and data, and efficient and commented code can be generated for different languages or systems. We present automatically-derived algorithms ranging from closed-form solutions of Bayesian textbook problems to recently-proposed EM algorithms for clustering, regression, and a multinomial form of PCA
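As a rough illustration of the kind of model-specific algorithm the abstract refers to, the following is a hand-written sketch of EM for a two-component, one-dimensional Gaussian mixture. It is not AUTOBAYES output, and it does not reproduce the system's synthesis process; the function names, the initialisation, and the fixed iteration count are assumptions made for this example only.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, iterations=100):
    """Fit a two-component 1D Gaussian mixture with plain EM (illustrative sketch)."""
    n = len(data)
    # Crude initialisation (an assumption): means at the data extremes,
    # both variances set to the overall sample variance, equal weights.
    means = [min(data), max(data)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: responsibility of component 0 for each data point.
        resp0 = []
        for x in data:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            resp0.append(p0 / (p0 + p1))

        # M-step: closed-form weighted updates for each component.
        for k in (0, 1):
            resp = resp0 if k == 0 else [1.0 - r for r in resp0]
            total = sum(resp)
            means[k] = sum(r * x for r, x in zip(resp, data)) / total
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(resp, data)) / total
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
            weights[k] = total / n

    return weights, means, variances


if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(-2.0, 0.5) for _ in range(300)]
    sample += [random.gauss(3.0, 1.0) for _ in range(300)]
    print(em_two_gaussians(sample))
```

The M-step here uses the closed-form weighted mean and variance updates, which is the sort of subproblem the abstract says can be solved symbolically rather than numerically.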

    Entwicklung von Diagnose- und Förderkompetenz in Mathematik

    The teacher-education pilot project within the joint project OLAW, which received an award from the Stifterverband für die Deutsche Wissenschaft, is devoted to building diagnostic competence in prospective teachers. It focuses on performance assessment, analysis of learning processes, and support-oriented diagnostics. An interim evaluation showed clear progress in diagnostic competence. However, it also revealed fundamental problems that cannot be resolved by courses dedicated to diagnostics alone

    Long range corrections in liquid-vapor interface simulations

    Long range corrections (lrc) for the potential energy and for the force in planar liquid-vapor interface simulations are considered for spherically symmetric interactions. First, it is stated that for the Lennard-Jones (LJ) fluid the lrc for the energy Δu of Janeček [J. Phys. Chem. B 110, 6264 (2006)] is the same as that of Lotfi et al. [Mol. Simul. 5, 233 (1990)]. Second, we present the lrc for the force ΔF for any spherically symmetric interaction as a derivative of Δu plus a surface integral over the cut-off sphere by using the extended Leibniz rule of Flanders [Am. Math. Monthly 80, 615 (1973)]. This ΔF corrects the incomplete lrc Δ1F of Lotfi et al. and agrees with the result of Janeček obtained by direct averaging of the forces. Third, we show that the molecular dynamics (MD) results for the surface tension γ of the LJ fluid with size parameter σ obtained by Werth et al. [Physica A 392, 2359 (2013)] with the lrc ΔF of Janeček and a cut-off radius rc = 3σ agree with the results of Mecke et al. [J. Chem. Phys. 107, 9264 (1997)] obtained with the lrc Δ1F of Lotfi et al. and rc = 6.5σ within −0.4% to +1.6%. Moreover, using only the MD results for γ of Werth et al., we obtain for the LJ fluid a new surface tension correlation which also represents the γ-values of Mecke et al. within ±0.7%. The critical temperature resulting from the correlation is Tc = 1.317 66 and is in very good agreement with Tc,ref = 1.32 of the reference equation of state for the LJ fluid given by Thol et al. [J. Phys. Chem. Ref. Data 45, 023101 (2016)]
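For orientation only, the sketch below evaluates the standard textbook energy tail correction for a homogeneous LJ fluid, which assumes a uniform density and g(r) = 1 beyond the cut-off. This is not the interface-specific lrc of Janeček or of Lotfi et al. discussed in the abstract, which must account for the density profile across the planar interface. The function name and the reduced-unit defaults are assumptions of this example.

```python
import math


def lj_energy_tail_correction(density, cutoff, epsilon=1.0, sigma=1.0):
    """Per-particle energy tail correction for a homogeneous LJ fluid.

    Assumes g(r) = 1 for r > cutoff (textbook homogeneous case, not the
    planar-interface correction treated in the abstract):
        u_tail = (8/3) * pi * rho * eps * sigma^3 * [ (1/3)(sigma/rc)^9 - (sigma/rc)^3 ]
    """
    sr3 = (sigma / cutoff) ** 3
    prefactor = (8.0 / 3.0) * math.pi * density * epsilon * sigma ** 3
    return prefactor * (sr3 ** 3 / 3.0 - sr3)


if __name__ == "__main__":
    # Compare the two cut-off radii mentioned in the abstract (in units of sigma)
    # at an arbitrary, illustrative liquid-like reduced density.
    for rc in (3.0, 6.5):
        print(rc, lj_energy_tail_correction(density=0.8, cutoff=rc))
```

The correction shrinks roughly as the inverse cube of the cut-off radius, which gives some intuition for why a simulation with a short cut-off such as rc = 3σ depends more heavily on an accurate lrc than one with rc = 6.5σ.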

    Bicropping-Verfahren von Winterungen (Getreide und Raps) mit Beisaaten abfrierender Körnerleguminosen

    Bicropping systems based on forage legumes undersown in winter cereals can achieve yield and quality improvements with sufficient water supply during the main growth phase, but fail under dry site conditions (continental climate, sandy soil). To improve the N-supply without the risk of interspecific water and nutrient competition, bicropping systems were developed with non-hardy grain legumes intercropped simultaneously or in parallel in early-sown winter cereals or oilseed rape. They were tested in practice with on-farm experiments carried out in Brandenburg and Bavaria. Besides sufficient erosion protection, a significant N-input can be achieved with grain legumes in autumn, especially within rape and early-sown winter cereals. But only cereals showed a significant increase in yield and quality. Lodging frozen biomass (e.g. pea) can cause yield losses in rape

    Counting Finite Topologies

    In this paper we study the number of finite topologies on an n-element set subject to various restrictions. Comment: 12 pages
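To make the counting problem concrete, here is a minimal brute-force sketch that counts all topologies on an n-element set with no extra restrictions. It is not the method of the paper; it simply enumerates every family of subsets containing the empty set and the whole set, and keeps those closed under pairwise union and intersection (which suffices for finite sets). The function name is an assumption, and the approach is only practical for n up to about 4.

```python
from itertools import combinations


def count_topologies(n):
    """Count the topologies on {0, ..., n-1} by exhaustive enumeration."""
    full = (1 << n) - 1  # bitmask of the whole set
    # Subsets are bitmasks. The empty set (0) and the full set are always
    # included, so only the remaining proper non-empty subsets are optional.
    optional = list(range(1, full))
    count = 0
    for mask in range(1 << len(optional)):
        family = {0, full}
        family.update(s for i, s in enumerate(optional) if mask >> i & 1)
        # On a finite set, closure under pairwise unions and intersections
        # is enough for the family to be a topology.
        if all((a | b) in family and (a & b) in family
               for a, b in combinations(family, 2)):
            count += 1
    return count


if __name__ == "__main__":
    for n in range(5):
        print(n, count_topologies(n))
```

The number of candidate families grows as 2^(2^n - 2), so exhaustive search stops being feasible almost immediately, which is part of why the problem is studied with structural arguments instead.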