36 research outputs found
Electronic Laboratory Notebook on Web2py Framework
Proper experimental record-keeping is an important cornerstone of research and development, serving the purpose of auditing. The gold standard of record-keeping is the judicious use of physical, permanent notebooks. However, advances in technology have resulted in large amounts of electronic records, making it virtually impossible to maintain a full set of records in physical notebooks. Electronic laboratory notebook systems aim to meet the stringency required for keeping records electronically. This manuscript describes CyNote, an electronic laboratory notebook system compliant with 21 CFR Part 11 controls on electronic records, the requirements set by the USA Food and Drug Administration for electronic records. CyNote is implemented on the web2py framework and adheres to the model-view-controller (MVC) architectural paradigm, allowing extension modules to be built for CyNote. CyNote is available at http://cynote.sf.net
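The core 21 CFR Part 11 requirement the abstract alludes to is a secure, computer-generated audit trail in which records cannot be silently altered. The following is a minimal sketch of that idea (not CyNote's actual schema or code), using only the Python standard library: each entry's hash covers the previous entry's hash, so any later edit breaks the chain.

```python
# Minimal sketch (not CyNote's actual schema): a tamper-evident,
# append-only notebook table in the spirit of 21 CFR Part 11 audit
# trails, using only the standard library.
import hashlib
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE entry (
    id INTEGER PRIMARY KEY,
    author TEXT, created TEXT, body TEXT,
    prev_hash TEXT, hash TEXT)""")

def add_entry(author, body):
    # Each record hashes its content together with the previous
    # record's hash, so any later edit invalidates the chain.
    row = db.execute("SELECT hash FROM entry ORDER BY id DESC LIMIT 1").fetchone()
    prev_hash = row[0] if row else ""
    created = datetime.now(timezone.utc).isoformat()
    digest = hashlib.sha256(f"{author}|{created}|{body}|{prev_hash}".encode()).hexdigest()
    db.execute(
        "INSERT INTO entry (author, created, body, prev_hash, hash) VALUES (?, ?, ?, ?, ?)",
        (author, created, body, prev_hash, digest))
    return digest

def verify_chain():
    prev = ""
    for author, created, body, prev_hash, digest in db.execute(
            "SELECT author, created, body, prev_hash, hash FROM entry ORDER BY id"):
        expected = hashlib.sha256(f"{author}|{created}|{body}|{prev_hash}".encode()).hexdigest()
        if prev_hash != prev or digest != expected:
            return False
        prev = digest
    return True

add_entry("alice", "Prepared buffer solution, pH 7.4")
add_entry("alice", "Ran gel electrophoresis, 45 min")
print(verify_chain())  # True for an untampered record chain
```

In a web2py deployment this table would live in the framework's database abstraction layer, with the controller exposing only insert and read operations, never update or delete.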
PyCon Singapore 2013
Python Conference (PyCon) is a series of community-based conferences where Pythonistas gather and exchange updates and experiences on various topics related to the Python programming language. Singapore hosted PyCon APAC, the PyCon representing the region, until 2012, when Singapore handed the 2013 event over to Tokyo
Principles for data analysis workflows
Traditional data science education often omits training on research
workflows: the process that moves a scientific investigation from raw data to
coherent research question to insightful contribution. In this paper, we
elaborate basic principles of a reproducible data analysis workflow by defining
three phases: the Exploratory, Refinement, and Polishing Phases. Each workflow
phase is roughly centered around the audience to whom research decisions,
methodologies, and results are being immediately communicated. Importantly,
each phase can also give rise to a number of research products beyond
traditional academic publications. Where relevant, we draw analogies between
principles for data-intensive research workflows and established practice in
software development. The guidance provided here is not intended to be a strict
rulebook; rather, the suggestions for practices and tools to advance
reproducible, sound data-intensive analysis may furnish support for both
students and current professionals
Hardware-accelerated interactive data visualization for neuroscience in Python.
Large datasets are becoming more and more common in science, particularly in neuroscience where experimental techniques are rapidly evolving. Obtaining interpretable results from raw data can sometimes be done automatically; however, there are numerous situations where there is a need, at all processing stages, to visualize the data in an interactive way. This enables the scientist to gain intuition, discover unexpected patterns, and find guidance about subsequent analysis steps. Existing visualization tools mostly focus on static publication-quality figures and do not support interactive visualization of large datasets. While working on Python software for visualization of neurophysiological data, we developed techniques to leverage the computational power of modern graphics cards for high-performance interactive data visualization. We were able to achieve very high performance despite the interpreted and dynamic nature of Python, by using state-of-the-art, fast libraries such as NumPy, PyOpenGL, and PyTables. We present applications of these methods to visualization of neurophysiological data. We believe our tools will be useful in a broad range of domains, in neuroscience and beyond, where there is an increasing need for scalable and fast interactive visualization
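A large part of the performance described above comes from preparing data on the CPU in exactly the memory layout the GPU expects, so that the upload is a single contiguous copy. The sketch below (an illustration, not the authors' actual code) shows this NumPy-side step: converting a large 1-D signal into a normalized float32 (x, y) vertex array, the layout one would then hand to PyOpenGL's glBufferData for hardware-accelerated line rendering.

```python
# Sketch of the CPU-side preparation step such tools perform: turn a
# large 1-D signal into a contiguous, normalized float32 (x, y) vertex
# array, ready for a single GPU upload (e.g. via PyOpenGL glBufferData).
import numpy as np

def to_vertex_array(signal):
    n = len(signal)
    verts = np.empty((n, 2), dtype=np.float32)
    verts[:, 0] = np.linspace(-1.0, 1.0, n)              # x in clip space
    lo, hi = signal.min(), signal.max()
    verts[:, 1] = 2.0 * (signal - lo) / (hi - lo) - 1.0  # y scaled to [-1, 1]
    return verts  # C-contiguous float32: one memcpy to GPU memory

signal = np.sin(np.linspace(0, 40 * np.pi, 1_000_000))
verts = to_vertex_array(signal)
print(verts.dtype, verts.shape)  # float32 (1000000, 2)
```

Keeping the array in float32 rather than Python objects or float64 halves the transfer size and matches the native vertex format of OpenGL shaders.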
Dias: Dynamic Rewriting of Pandas Code
In recent years, dataframe libraries, such as pandas have exploded in
popularity. Due to their flexibility, they are increasingly used in ad-hoc
exploratory data analysis (EDA) workloads. These workloads are diverse,
including custom functions which can span libraries or be written in pure
Python. The majority of systems available to accelerate EDA workloads focus on
bulk-parallel workloads, which contain vastly different computational patterns,
typically within a single library. As a result, they can introduce excessive
overheads for ad-hoc EDA workloads due to their expensive optimization
techniques. Instead, we identify program rewriting as a lightweight technique
which can offer substantial speedups while also avoiding slowdowns. We
implemented our techniques in Dias, which rewrites notebook cells to be more
efficient for ad-hoc EDA workloads. We develop techniques for efficient
rewrites in Dias, including dynamic checking of preconditions under which
rewrites are correct and just-in-time rewrites for notebook environments. We
show that Dias can rewrite individual cells to be 57× faster compared to
pandas and 1909× faster compared to optimized systems such as Modin.
Furthermore, Dias can accelerate whole notebooks by up to 3.6× compared
to pandas and 26.4× compared to Modin
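The "dynamic checking of preconditions" mentioned above can be illustrated with a toy guarded rewrite (this is an illustration of the idea, not Dias's actual rule set): a slow row-wise `apply` is replaced by its vectorized equivalent only after a run-time check confirms the rewrite preserves semantics for the data at hand.

```python
# Illustrative sketch (not Dias's actual rules): a guarded rewrite that
# replaces a row-wise apply with its vectorized equivalent, but only
# after a dynamic precondition check confirms the rewrite is safe here.
import pandas as pd

def fused_add(df, col_a, col_b):
    # Precondition checked at run time, in the spirit of Dias: both
    # columns must be numeric for `+` to mean elementwise addition.
    if pd.api.types.is_numeric_dtype(df[col_a]) and pd.api.types.is_numeric_dtype(df[col_b]):
        return df[col_a] + df[col_b]  # fast, vectorized path
    # Precondition failed: fall back to the original, always-correct form.
    return df.apply(lambda row: row[col_a] + row[col_b], axis=1)

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
print(fused_add(df, "a", "b").tolist())  # [11, 22, 33]
```

The fallback path is what makes the technique lightweight: when the precondition fails, the original code runs unchanged, so the rewrite can never cause a slowdown from incorrect results.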
tsdownsample: high-performance time series downsampling for scalable visualization
Interactive line chart visualizations greatly enhance the effective
exploration of large time series. Although downsampling has emerged as a
well-established approach to enable efficient interactive visualization of
large datasets, it is not an inherent feature in most visualization tools.
Furthermore, there is no library offering a convenient interface for
high-performance implementations of prominent downsampling algorithms. To
address these shortcomings, we present tsdownsample, an open-source Python
package specifically designed for CPU-based, in-memory time series
downsampling. Our library focuses on performance and convenient integration,
offering optimized implementations of leading downsampling algorithms. We
achieve this optimization by leveraging low-level SIMD instructions and
multithreading capabilities in Rust. In particular, SIMD instructions were
employed to optimize the argmin and argmax operations. This SIMD optimization,
along with some algorithmic tricks, proved crucial in enhancing the performance
of various downsampling algorithms. We evaluate the performance of tsdownsample
and demonstrate its interoperability with an established visualization
framework. Our performance benchmarks indicate that the algorithmic runtime of
tsdownsample approximates the CPU's memory bandwidth. This work marks a
significant advancement in bringing high-performance time series downsampling
to the Python ecosystem, enabling scalable visualization. The open-source code
can be found at https://github.com/predict-idlab/tsdownsample
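The argmin/argmax operations highlighted above sit at the heart of MinMax-style downsampling: each output bin keeps the indices of its smallest and largest samples, so visual extremes survive. A plain-NumPy sketch of this idea (tsdownsample implements and SIMD-optimizes it in Rust; this is not its actual code) looks as follows.

```python
# Plain-NumPy sketch of the MinMax downsampling idea that tsdownsample
# accelerates with SIMD argmin/argmax in Rust: per bin, keep the indices
# of the minimum and maximum samples so extremes survive downsampling.
import numpy as np

def minmax_downsample(y, n_out):
    # n_out must be even: each bin contributes one min and one max index.
    n_bins = n_out // 2
    usable = (len(y) // n_bins) * n_bins
    bins = y[:usable].reshape(n_bins, -1)          # one row per bin
    offsets = np.arange(n_bins) * bins.shape[1]
    idx_min = bins.argmin(axis=1) + offsets        # the argmin hot loop
    idx_max = bins.argmax(axis=1) + offsets        # the argmax hot loop
    return np.sort(np.concatenate([idx_min, idx_max]))  # indices into y

y = np.sin(np.linspace(0, 20 * np.pi, 100_000))
idx = minmax_downsample(y, 200)
print(len(idx))  # 200 retained sample indices
```

Returning indices rather than values is a deliberate choice: the caller can use them to slice both the x and y arrays, which is what line-chart tools need.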
Python Programmers Have GPUs Too: Automatic Python Loop Parallelization with Staged Dependence Analysis
Python is a popular language for end-user software development in many application domains. End-users want to harness parallel compute resources effectively, by exploiting commodity manycore technology including GPUs. However, existing approaches to parallelism in Python are esoteric, and generally seem too complex for the typical end-user developer. We argue that implicit, or automatic, parallelization is the best way to deliver the benefits of manycore to end-users, since it avoids domain-specific languages, specialist libraries, complex annotations or restrictive language subsets. Auto-parallelization fits the Python philosophy, provides effective performance, and is convenient for non-expert developers.
Despite being a dynamic language, we show that Python is a suitable target for auto-parallelization. In an empirical study of 3000+ open-source Python notebooks, we demonstrate that typical loop behaviour ‘in the wild’ is amenable to auto-parallelization. We show that staging the dependence analysis is an effective way to maximize performance. We apply classical dependence analysis techniques, then leverage the Python runtime’s rich introspection capabilities to resolve additional loop bounds and variable types in a just-in-time manner. The parallel loop nest code is then converted to CUDA kernels for GPU execution. We achieve orders of magnitude speedup over baseline interpreted execution and some speedup (up to 50x, although not consistently) over CPU JIT-compiled execution, across 12 loop-intensive standard benchmarks
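The staging described above can be caricatured in a few lines (a toy check, not the paper's actual analysis): a static AST pass admits only loops whose array subscripts are exactly the bare loop variable, since a subscript like `a[i - 1]` would introduce a loop-carried dependence; a dynamic stage would then use runtime introspection to resolve bounds and element types before generating CUDA kernels.

```python
# Toy two-stage check in the spirit of the paper (not its analysis):
# a static AST pass flags loops with no loop-carried dependence; the
# dynamic stage would then resolve bounds and types at run time.
import ast

def parallelizable(src):
    loop = ast.parse(src).body[0]
    if not isinstance(loop, ast.For) or not isinstance(loop.target, ast.Name):
        return False
    i = loop.target.id
    subs = [n for n in ast.walk(loop) if isinstance(n, ast.Subscript)]
    # Safe only if every subscript is the bare loop index, e.g. a[i];
    # a[i - 1] would carry a value between iterations.
    return all(isinstance(s.slice, ast.Name) and s.slice.id == i for s in subs)

print(parallelizable("for i in range(n):\n    a[i] = b[i] * 2"))         # True
print(parallelizable("for i in range(1, n):\n    a[i] = a[i - 1] + 1"))  # False
```

A real analysis is far less conservative, but the division of labor is the same: cheap static filtering first, with the expensive precision deferred to just-in-time information.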