36 research outputs found
Electronic Laboratory Notebook on Web2py Framework
Proper experimental record-keeping is an important cornerstone of research and development, serving the purpose of auditing. The gold standard of record-keeping is the judicious use of physical, permanent notebooks. However, advances in technology have resulted in large amounts of electronic records, making it virtually impossible to maintain a full set of records in physical notebooks. Electronic laboratory notebook systems aim to meet the stringency required for keeping records electronically. This manuscript describes CyNote, an electronic laboratory notebook system compliant with 21 CFR Part 11 controls on electronic records, the requirements set by the USA Food and Drug Administration for electronic records. CyNote is implemented on the web2py framework and adheres to the model-view-controller (MVC) architectural paradigm, allowing extension modules to be built for CyNote. CyNote is available at http://cynote.sf.net
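The core 21 CFR Part 11 requirement the abstract alludes to is a secure, computer-generated audit trail in which records cannot be silently altered. The following is a minimal sketch of that idea (not CyNote's actual schema or code), using only the Python standard library: each entry's hash covers the previous entry's hash, so any later edit breaks the chain.

```python
# Minimal sketch (not CyNote's actual schema): a tamper-evident,
# append-only notebook table in the spirit of 21 CFR Part 11 audit
# trails, using only the standard library.
import hashlib
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE entry (
    id INTEGER PRIMARY KEY,
    author TEXT, created TEXT, body TEXT,
    prev_hash TEXT, hash TEXT)""")

def add_entry(author, body):
    # Each record hashes its content together with the previous
    # record's hash, so any later edit invalidates the chain.
    row = db.execute("SELECT hash FROM entry ORDER BY id DESC LIMIT 1").fetchone()
    prev_hash = row[0] if row else ""
    created = datetime.now(timezone.utc).isoformat()
    digest = hashlib.sha256(f"{author}|{created}|{body}|{prev_hash}".encode()).hexdigest()
    db.execute(
        "INSERT INTO entry (author, created, body, prev_hash, hash) VALUES (?, ?, ?, ?, ?)",
        (author, created, body, prev_hash, digest))
    return digest

def verify_chain():
    prev = ""
    for author, created, body, prev_hash, digest in db.execute(
            "SELECT author, created, body, prev_hash, hash FROM entry ORDER BY id"):
        expected = hashlib.sha256(f"{author}|{created}|{body}|{prev_hash}".encode()).hexdigest()
        if prev_hash != prev or digest != expected:
            return False
        prev = digest
    return True

add_entry("alice", "Prepared buffer solution, pH 7.4")
add_entry("alice", "Ran gel electrophoresis, 45 min")
print(verify_chain())  # True for an untampered record chain
```

In a web2py deployment this table would live in the framework's database abstraction layer, with the controller exposing only insert and read operations, never update or delete.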
PyCon Singapore 2013
Python Conference (PyCon) is a series of community-based conferences where Pythonistas gather and exchange updates and experiences on various topics related to the Python programming language. Singapore hosted PyCon APAC, the PyCon representing the region, until 2012, when Singapore handed the 2013 event over to Tokyo
Principles for data analysis workflows
Traditional data science education often omits training on research
workflows: the process that moves a scientific investigation from raw data to
coherent research question to insightful contribution. In this paper, we
elaborate basic principles of a reproducible data analysis workflow by defining
three phases: the Exploratory, Refinement, and Polishing Phases. Each workflow
phase is roughly centered around the audience to whom research decisions,
methodologies, and results are being immediately communicated. Importantly,
each phase can also give rise to a number of research products beyond
traditional academic publications. Where relevant, we draw analogies between
principles for data-intensive research workflows and established practice in
software development. The guidance provided here is not intended to be a strict
rulebook; rather, the suggestions for practices and tools to advance
reproducible, sound data-intensive analysis may furnish support for both
students and current professionals
Hardware-accelerated interactive data visualization for neuroscience in Python.
Large datasets are becoming more and more common in science, particularly in neuroscience where experimental techniques are rapidly evolving. Obtaining interpretable results from raw data can sometimes be done automatically; however, there are numerous situations where there is a need, at all processing stages, to visualize the data in an interactive way. This enables the scientist to gain intuition, discover unexpected patterns, and find guidance about subsequent analysis steps. Existing visualization tools mostly focus on static publication-quality figures and do not support interactive visualization of large datasets. While working on Python software for visualization of neurophysiological data, we developed techniques to leverage the computational power of modern graphics cards for high-performance interactive data visualization. We were able to achieve very high performance despite the interpreted and dynamic nature of Python, by using state-of-the-art, fast libraries such as NumPy, PyOpenGL, and PyTables. We present applications of these methods to visualization of neurophysiological data. We believe our tools will be useful in a broad range of domains, in neuroscience and beyond, where there is an increasing need for scalable and fast interactive visualization
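A large part of the performance described above comes from preparing data on the CPU in exactly the memory layout the GPU expects, so that the upload is a single contiguous copy. The sketch below (an illustration, not the authors' actual code) shows this NumPy-side step: converting a large 1-D signal into a normalized float32 (x, y) vertex array, the layout one would then hand to PyOpenGL's glBufferData for hardware-accelerated line rendering.

```python
# Sketch of the CPU-side preparation step such tools perform: turn a
# large 1-D signal into a contiguous, normalized float32 (x, y) vertex
# array, ready for a single GPU upload (e.g. via PyOpenGL glBufferData).
import numpy as np

def to_vertex_array(signal):
    n = len(signal)
    verts = np.empty((n, 2), dtype=np.float32)
    verts[:, 0] = np.linspace(-1.0, 1.0, n)              # x in clip space
    lo, hi = signal.min(), signal.max()
    verts[:, 1] = 2.0 * (signal - lo) / (hi - lo) - 1.0  # y scaled to [-1, 1]
    return verts  # C-contiguous float32: one memcpy to GPU memory

signal = np.sin(np.linspace(0, 40 * np.pi, 1_000_000))
verts = to_vertex_array(signal)
print(verts.dtype, verts.shape)  # float32 (1000000, 2)
```

Keeping the array in float32 rather than Python objects or float64 halves the transfer size and matches the native vertex format of OpenGL shaders.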
Dias: Dynamic Rewriting of Pandas Code
In recent years, dataframe libraries, such as pandas have exploded in
popularity. Due to their flexibility, they are increasingly used in ad-hoc
exploratory data analysis (EDA) workloads. These workloads are diverse,
including custom functions which can span libraries or be written in pure
Python. The majority of systems available to accelerate EDA workloads focus on
bulk-parallel workloads, which contain vastly different computational patterns,
typically within a single library. As a result, they can introduce excessive
overheads for ad-hoc EDA workloads due to their expensive optimization
techniques. Instead, we identify program rewriting as a lightweight technique
which can offer substantial speedups while also avoiding slowdowns. We
implemented our techniques in Dias, which rewrites notebook cells to be more
efficient for ad-hoc EDA workloads. We develop techniques for efficient
rewrites in Dias, including dynamic checking of preconditions under which
rewrites are correct and just-in-time rewrites for notebook environments. We
show that Dias can rewrite individual cells to be 57× faster compared to
pandas and 1909× faster compared to optimized systems such as Modin.
Furthermore, Dias can accelerate whole notebooks by up to 3.6× compared
to pandas and 26.4× compared to Modin
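The "dynamic checking of preconditions" mentioned above can be illustrated with a toy guarded rewrite (this is an illustration of the idea, not Dias's actual rule set): a slow row-wise `apply` is replaced by its vectorized equivalent only after a run-time check confirms the rewrite preserves semantics for the data at hand.

```python
# Illustrative sketch (not Dias's actual rules): a guarded rewrite that
# replaces a row-wise apply with its vectorized equivalent, but only
# after a dynamic precondition check confirms the rewrite is safe here.
import pandas as pd

def fused_add(df, col_a, col_b):
    # Precondition checked at run time, in the spirit of Dias: both
    # columns must be numeric for `+` to mean elementwise addition.
    if pd.api.types.is_numeric_dtype(df[col_a]) and pd.api.types.is_numeric_dtype(df[col_b]):
        return df[col_a] + df[col_b]  # fast, vectorized path
    # Precondition failed: fall back to the original, always-correct form.
    return df.apply(lambda row: row[col_a] + row[col_b], axis=1)

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
print(fused_add(df, "a", "b").tolist())  # [11, 22, 33]
```

The fallback path is what makes the technique lightweight: when the precondition fails, the original code runs unchanged, so the rewrite can never cause a slowdown from incorrect results.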
tsdownsample: high-performance time series downsampling for scalable visualization
Interactive line chart visualizations greatly enhance the effective
exploration of large time series. Although downsampling has emerged as a
well-established approach to enable efficient interactive visualization of
large datasets, it is not an inherent feature in most visualization tools.
Furthermore, there is no library offering a convenient interface for
high-performance implementations of prominent downsampling algorithms. To
address these shortcomings, we present tsdownsample, an open-source Python
package specifically designed for CPU-based, in-memory time series
downsampling. Our library focuses on performance and convenient integration,
offering optimized implementations of leading downsampling algorithms. We
achieve this optimization by leveraging low-level SIMD instructions and
multithreading capabilities in Rust. In particular, SIMD instructions were
employed to optimize the argmin and argmax operations. This SIMD optimization,
along with some algorithmic tricks, proved crucial in enhancing the performance
of various downsampling algorithms. We evaluate the performance of tsdownsample
and demonstrate its interoperability with an established visualization
framework. Our performance benchmarks indicate that the algorithmic runtime of
tsdownsample approximates the CPU's memory bandwidth. This work marks a
significant advancement in bringing high-performance time series downsampling
to the Python ecosystem, enabling scalable visualization. The open-source code
can be found at https://github.com/predict-idlab/tsdownsample
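The argmin/argmax operations highlighted above sit at the heart of MinMax-style downsampling: each output bin keeps the indices of its smallest and largest samples, so visual extremes survive. A plain-NumPy sketch of this idea (tsdownsample implements and SIMD-optimizes it in Rust; this is not its actual code) looks as follows.

```python
# Plain-NumPy sketch of the MinMax downsampling idea that tsdownsample
# accelerates with SIMD argmin/argmax in Rust: per bin, keep the indices
# of the minimum and maximum samples so extremes survive downsampling.
import numpy as np

def minmax_downsample(y, n_out):
    # n_out must be even: each bin contributes one min and one max index.
    n_bins = n_out // 2
    usable = (len(y) // n_bins) * n_bins
    bins = y[:usable].reshape(n_bins, -1)          # one row per bin
    offsets = np.arange(n_bins) * bins.shape[1]
    idx_min = bins.argmin(axis=1) + offsets        # the argmin hot loop
    idx_max = bins.argmax(axis=1) + offsets        # the argmax hot loop
    return np.sort(np.concatenate([idx_min, idx_max]))  # indices into y

y = np.sin(np.linspace(0, 20 * np.pi, 100_000))
idx = minmax_downsample(y, 200)
print(len(idx))  # 200 retained sample indices
```

Returning indices rather than values is a deliberate choice: the caller can use them to slice both the x and y arrays, which is what line-chart tools need.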
Python Programmers Have GPUs Too: Automatic Python Loop Parallelization with Staged Dependence Analysis
Python is a popular language for end-user software development in many application domains. End-users want to harness parallel compute resources effectively, by exploiting commodity manycore technology including GPUs. However, existing approaches to parallelism in Python are esoteric, and generally seem too complex for the typical end-user developer. We argue that implicit, or automatic, parallelization is the best way to deliver the benefits of manycore to end-users, since it avoids domain-specific languages, specialist libraries, complex annotations or restrictive language subsets. Auto-parallelization fits the Python philosophy, provides effective performance, and is convenient for non-expert developers.
Despite being a dynamic language, we show that Python is a suitable target for auto-parallelization. In an empirical study of 3000+ open-source Python notebooks, we demonstrate that typical loop behaviour ‘in the wild’ is amenable to auto-parallelization. We show that staging the dependence analysis is an effective way to maximize performance. We apply classical dependence analysis techniques, then leverage the Python runtime’s rich introspection capabilities to resolve additional loop bounds and variable types in a just-in-time manner. The parallel loop nest code is then converted to CUDA kernels for GPU execution. We achieve orders of magnitude speedup over baseline interpreted execution and some speedup (up to 50x, although not consistently) over CPU JIT-compiled execution, across 12 loop-intensive standard benchmarks
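The staging described above can be caricatured in a few lines (a toy check, not the paper's actual analysis): a static AST pass admits only loops whose array subscripts are exactly the bare loop variable, since a subscript like `a[i - 1]` would introduce a loop-carried dependence; a dynamic stage would then use runtime introspection to resolve bounds and element types before generating CUDA kernels.

```python
# Toy two-stage check in the spirit of the paper (not its analysis):
# a static AST pass flags loops with no loop-carried dependence; the
# dynamic stage would then resolve bounds and types at run time.
import ast

def parallelizable(src):
    loop = ast.parse(src).body[0]
    if not isinstance(loop, ast.For) or not isinstance(loop.target, ast.Name):
        return False
    i = loop.target.id
    subs = [n for n in ast.walk(loop) if isinstance(n, ast.Subscript)]
    # Safe only if every subscript is the bare loop index, e.g. a[i];
    # a[i - 1] would carry a value between iterations.
    return all(isinstance(s.slice, ast.Name) and s.slice.id == i for s in subs)

print(parallelizable("for i in range(n):\n    a[i] = b[i] * 2"))         # True
print(parallelizable("for i in range(1, n):\n    a[i] = a[i - 1] + 1"))  # False
```

A real analysis is far less conservative, but the division of labor is the same: cheap static filtering first, with the expensive precision deferred to just-in-time information.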