PALSE: Python Analysis of Large Scale (Computer) Experiments

Abstract

A tenet of Science is the ability to reproduce the results, and a related issue is the possibility to archive and interpret the raw results of (computer) experiments. This paper presents an elementary python framework addressing this latter goal. Consider a computing pipeline consisting of raw data generation, raw data parsing, and data analysis i.e. graphical and statistical analysis. palse addresses these last two steps by leveraging the hierarchical structure of XML documents. More precisely, assume that the raw results of a program are stored in XML format, possibly generated by the serialization mechanism of the boost C++ libraries. For raw data parsing, palse imports the raw data as XML documents, and exploits the tree structure of the XML together with the XML Path Language to access and select specific values. For graphical and statistical analysis, palse gives direct access to ScientificPython, R, and gnuplot. In a nutshell, palse combines standards languages ( python, XML, XML Path Language) and tools (Boost serialization, ScientificPython, R, gnuplot) in such a way that once the raw data have been generated, graphical plots and statistical analysis just require a handful of lines of python code. The framework applies to virtually any type of data, and may find a broad class of applications

    Similar works