Experimental Directory Structure (Exdir): An Alternative to HDF5 Without Introducing a New File Format
Natural sciences generate an increasing amount of data in a wide range of formats developed by different research groups and commercial companies. At the same time, there is a growing desire to share data along with publications in order to enable reproducible research. Open formats have publicly available specifications, which facilitate data sharing and reproducible research. Hierarchical Data Format 5 (HDF5) is a popular open format widely used in neuroscience, often as a foundation for other, more specialized formats. However, drawbacks related to HDF5's complex specification have prompted a discussion about an improved replacement. We propose a novel alternative, the Experimental Directory Structure (Exdir), an open specification for data storage in experimental pipelines that addresses the drawbacks associated with HDF5 while retaining its advantages. HDF5 stores data and metadata in a hierarchy within a complex binary file which, among other things, is not human-readable, is not optimal for version control systems, and lacks support for easy access to raw data from external applications. Exdir, on the other hand, uses file system directories to represent the hierarchy, with metadata stored in human-readable YAML files, datasets stored in binary NumPy files, and raw data stored directly in subdirectories. Furthermore, storing data in multiple files makes it easier for version control systems to track. Exdir is not a file format in itself, but a specification for organizing files in a directory structure. Exdir uses the same abstractions as HDF5 and is compatible with the HDF5 Abstract Data Model. Several research groups are already using data stored in a directory hierarchy as an alternative to HDF5, but no common standard exists. This complicates and limits the opportunity for data sharing and development of common tools for reading, writing, and analyzing data.
Exdir facilitates improved data storage, data sharing, reproducible research, and novel insight from interdisciplinary collaboration. With the publication of Exdir, we invite the scientific community to join the development to create an open specification that will serve as many needs as possible and as a foundation for open access to and exchange of data.
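The layout described above (directories for the hierarchy, YAML for metadata, NumPy files for datasets) can be illustrated with a minimal sketch. This is not the Exdir library or its specification: the file names `meta.yaml` and `data.npy`, the helper functions, and the restriction to flat scalar attributes are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_dataset(root: Path, name: str, data: np.ndarray, attrs: dict) -> Path:
    """Store one dataset as a directory holding a .npy file and a YAML file.

    The file names used here are hypothetical, not taken from the Exdir spec.
    """
    dataset_dir = root / name  # nested names such as "a/b" become nested directories
    dataset_dir.mkdir(parents=True, exist_ok=True)
    np.save(dataset_dir / "data.npy", data)
    # Flat scalar attributes only: plain "key: value" lines are valid YAML.
    lines = [f"{key}: {value}" for key, value in attrs.items()]
    (dataset_dir / "meta.yaml").write_text("\n".join(lines) + "\n")
    return dataset_dir


def read_dataset(dataset_dir: Path):
    """Load the array and return the attributes as a dict of strings."""
    data = np.load(dataset_dir / "data.npy")
    attrs = {}
    for line in (dataset_dir / "meta.yaml").read_text().splitlines():
        key, _, value = line.partition(": ")
        attrs[key] = value
    return data, attrs


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    path = write_dataset(
        root, "session1/voltage", np.zeros((2, 3)), {"unit": "mV"}
    )
    print(sorted(p.name for p in path.iterdir()))
```

Because every dataset is an ordinary directory with ordinary files, the metadata can be inspected in a text editor and each file can be tracked separately by a version control system, which is the property the abstract emphasises.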
Uncertainpy: A Python Toolbox for Uncertainty Quantification and Sensitivity Analysis in Computational Neuroscience
Computational models in neuroscience typically contain many parameters that are poorly constrained by experimental data. Uncertainty quantification and sensitivity analysis provide rigorous procedures to quantify how the model output depends on this parameter uncertainty. Unfortunately, the application of such methods is not yet standard within the field of neuroscience. Here we present Uncertainpy, an open-source Python toolbox, tailored to perform uncertainty quantification and sensitivity analysis of neuroscience models. Uncertainpy aims to make it quick and easy to get started with uncertainty analysis, without any need for detailed prior knowledge. The toolbox allows uncertainty quantification and sensitivity analysis to be performed on already existing models without needing to modify the model equations or model implementation. Uncertainpy bases its analysis on polynomial chaos expansions, which are more efficient than the more standard Monte Carlo-based approaches. Uncertainpy is tailored for neuroscience applications by its built-in capability for calculating characteristic features in the model output. The toolbox does not merely perform a point-to-point comparison of the “raw” model output (e.g., membrane voltage traces), but can also calculate the uncertainty and sensitivity of salient model response features such as spike timing, action potential width, average interspike interval, and other features relevant for various neural and neural network models. Uncertainpy comes with several common models and features built in, and including custom models and new features is easy. The aim of the current paper is to present Uncertainpy to the neuroscience community in a user-oriented manner.
To demonstrate its broad applicability, we perform an uncertainty quantification and sensitivity analysis of three case studies relevant for neuroscience: the original Hodgkin-Huxley point-neuron model for action potential generation, a multi-compartmental model of a thalamic interneuron implemented in the NEURON simulator, and a sparsely connected recurrent network model implemented in the NEST simulator.
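The efficiency argument for polynomial chaos expansions can be sketched for a toy model with one uncertain parameter. The example below does not use Uncertainpy's API; it is a hand-written, one-dimensional expansion in probabilists' Hermite polynomials, assuming a standard normal parameter and a smooth model. The function names, the expansion order, and the number of quadrature nodes are illustrative choices.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def pce_mean_variance(model, order=6, n_quad=20):
    """Mean and variance of model(q), q ~ N(0, 1), from a Hermite expansion.

    The model is evaluated only at the n_quad quadrature nodes.
    """
    nodes, weights = He.hermegauss(n_quad)
    weights = weights / math.sqrt(2.0 * math.pi)  # weights now sum to 1
    values = model(nodes)
    coeffs = []
    for k in range(order + 1):
        basis = He.hermeval(nodes, [0.0] * k + [1.0])  # He_k at the nodes
        # Projection onto He_k; E[He_k^2] = k! for a standard normal.
        coeffs.append(np.sum(weights * values * basis) / math.factorial(k))
    mean = coeffs[0]
    variance = sum(coeffs[k] ** 2 * math.factorial(k) for k in range(1, order + 1))
    return mean, variance


def mc_mean_variance(model, n_samples, seed=0):
    """Plain Monte Carlo estimate of the same two statistics, for comparison."""
    rng = np.random.default_rng(seed)
    values = model(rng.standard_normal(n_samples))
    return values.mean(), values.var()


if __name__ == "__main__":
    toy_model = lambda q: np.exp(0.5 * q)  # hypothetical smooth model output
    print("PCE, 20 model runs:      ", pce_mean_variance(toy_model))
    print("Monte Carlo, 20 000 runs:", mc_mean_variance(toy_model, 20_000))
```

For a smooth model such as this one, the expansion reaches high accuracy with a few dozen model evaluations, whereas the Monte Carlo error shrinks only with the square root of the sample count. The advantage generally narrows as the number of uncertain parameters grows or the model output becomes less smooth.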
Intelligent Radio Spectrum Monitoring
Spectrum monitoring is an important part of the radio spectrum management
process, providing feedback on the workflow that allows for our current wirelessly
interconnected lifestyle. The constantly increasing number of users and uses of wireless
technologies is pushing the limits and capabilities of the existing infrastructure,
demanding new alternatives to manage and analyse the extremely large volume of data
produced by existing spectrum monitoring networks. This study addresses this problem
by proposing an information management system architecture able to increase the
analytical level of a spectrum monitoring measurement network. This proposal includes
an alternative for managing the data produced by such a network, methods to analyse the
spectrum data and to automate the data gathering process. The study was conducted
employing system requirements from the Brazilian National Telecommunications
Agency and related functional concepts were aggregated from the reviewed scientific
literature and publications from the International Telecommunication Union. The
proposed solution employs microservice architecture to manage the data, including tasks
such as format conversion, analysis, optimization and automation. To enable efficient
data exchange between services, we proposed the use of a hierarchical structure created
using the HDF5 format. The suggested architecture was partially implemented as a pilot
project, which made it possible to demonstrate the viability of the presented ideas and to perform an
initial refinement of the proposed data format and analytical algorithms. The results
pointed to the potential of the solution to solve some of the limitations of the existing
spectrum monitoring workflow. The proposed system may play a crucial role in the
integration of the spectrum monitoring activities into open data initiatives, promoting
transparency and data reusability for this important public service.
Santos Lobão, F. (2019). Intelligent Radio Spectrum Monitoring. http://hdl.handle.net/10251/128850 TFG
Scalable software and models for large-scale extracellular recordings
The brain represents information about the world through the electrical activity of
populations of neurons. By placing an electrode near a neuron that is firing (spiking), it
is possible to detect the resulting extracellular action potential (EAP) that is transmitted
down an axon to other neurons. In this way, it is possible to monitor the communication
of a group of neurons to uncover how they encode and transmit information. As the
number of recorded neurons continues to increase, however, so do the data processing
and analysis challenges. It is crucial that scalable software and analysis tools are developed
and made available to the neuroscience community to keep up with the large
amounts of data that are already being gathered.
This thesis is composed of three pieces of work which I develop in order to better
process and analyze large-scale extracellular recordings. My work spans all stages of extracellular
analysis from the processing of raw electrical recordings to the development
of statistical models to reveal underlying structure in neural population activity.
In the first work, I focus on developing software to improve the comparison and adoption
of different computational approaches for spike sorting. When analyzing neural
recordings, most researchers are interested in the spiking activity of individual neurons,
which must be extracted from the raw electrical traces through a process called
spike sorting. Much development has been directed towards improving the performance
and automation of spike sorting. This continuous development, while essential,
has contributed to an over-saturation of new, incompatible tools that hinders rigorous
benchmarking and complicates reproducible analysis. To address these limitations, I
develop SpikeInterface, an open-source Python framework designed to unify preexisting
spike sorting technologies into a single toolkit and to facilitate straightforward
benchmarking of different approaches. With this framework, I demonstrate that modern,
automated spike sorters have low agreement when analyzing the same dataset, i.e.
they find different numbers of neurons with different activity profiles. This result holds
true for a variety of simulated and real datasets. Also, I demonstrate that utilizing a
consensus-based approach to spike sorting, where the outputs of multiple spike sorters
are combined, can dramatically reduce the number of falsely detected neurons.
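The consensus idea can be sketched as follows. This is not the SpikeInterface API: the data layout (one dict of unit id to sorted spike times per sorter), the agreement score, and all thresholds are assumptions chosen for illustration. A unit from the first sorter is kept only if enough other sorters contain a unit with a sufficiently similar spike train.

```python
def count_matches(times_a, times_b, tolerance):
    """Greedy one-to-one matching of two sorted spike-time lists."""
    i = j = matched = 0
    while i < len(times_a) and j < len(times_b):
        delta = times_a[i] - times_b[j]
        if abs(delta) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif delta < 0:
            i += 1  # spike in a is too early to match anything left in b
        else:
            j += 1  # spike in b is too early to match anything left in a
    return matched


def agreement(times_a, times_b, tolerance):
    """Matched spikes divided by the total number of distinct spikes."""
    matched = count_matches(times_a, times_b, tolerance)
    total = len(times_a) + len(times_b) - matched
    return matched / total if total else 0.0


def consensus_units(sortings, tolerance=2, min_agreement=0.5, min_sorters=2):
    """Return ids of units from the first sorter that other sorters also found."""
    names = list(sortings)
    reference, others = names[0], names[1:]
    kept = []
    for unit_id, times in sortings[reference].items():
        support = 1  # the reference sorter itself
        for name in others:
            if any(
                agreement(times, other_times, tolerance) >= min_agreement
                for other_times in sortings[name].values()
            ):
                support += 1
        if support >= min_sorters:
            kept.append(unit_id)
    return kept


if __name__ == "__main__":
    # Hypothetical outputs of three sorters, spike times in samples.
    sortings = {
        "sorter_a": {0: [10, 50, 90], 1: [20, 60]},
        "sorter_b": {0: [11, 49, 91]},
        "sorter_c": {5: [300, 400]},
    }
    print(consensus_units(sortings))
```

Units found by only one sorter are discarded, which is how a consensus step can lower the number of falsely detected neurons at the cost of possibly dropping real units that only a single sorter recovered.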
In the second work, I focus on developing an unsupervised machine learning approach
for determining the source location of individually detected spikes that are
recorded by high-density microelectrode arrays. By localizing the source of individual
spikes, my method is able to determine the approximate position of the recorded neurons
in relation to the microelectrode array. To allow my model to work with large-scale
datasets, I utilize deep neural networks, a family of machine learning algorithms that
can be trained to approximate complicated functions in a scalable fashion. I evaluate
my method on both simulated and real extracellular datasets, demonstrating that it is
more accurate than other commonly used methods. Also, I show that location estimates
for individual spikes can be utilized to improve the efficiency and accuracy of spike
sorting. After training, my method allows for localization of one million spikes in approximately
37 seconds on a TITAN X GPU, enabling real-time analysis of massive
extracellular datasets.
In my third and final presented work, I focus on developing an unsupervised machine
learning model that can uncover patterns of activity from neural populations
associated with a behaviour being performed. Specifically, I introduce Targeted Neural
Dynamical Modelling (TNDM), a statistical model that jointly models the neural activity
and any external behavioural variables. TNDM decomposes neural dynamics (i.e.
temporal activity patterns) into behaviourally relevant and behaviourally irrelevant dynamics;
the behaviourally relevant dynamics constitute all activity patterns required
to generate the behaviour of interest while behaviourally irrelevant dynamics may be
completely unrelated (e.g. other behavioural or brain states), or even related to behaviour
execution (e.g. dynamics that are associated with behaviour generally but are not
task specific). Again, I implement TNDM using a deep neural network to improve its
scalability and expressivity. On synthetic data and on real recordings from the premotor
(PMd) and primary motor cortex (M1) of a monkey performing a center-out reaching
task, I show that TNDM is able to extract low-dimensional neural dynamics that are
highly predictive of behaviour without sacrificing its fit to the neural data.