361 research outputs found
A common goodness-of-fit framework for neural population models using marked point process time-rescaling
A critical component of any statistical modeling procedure is the ability to assess the goodness-of-fit between a model and observed data. For spike train models of individual neurons, many goodness-of-fit measures rely on the time-rescaling theorem and assess model quality using rescaled spike times. Recently, there has been increasing interest in statistical models that describe the simultaneous spiking activity of neuron populations, either in a single brain region or across brain regions. Classically, such models have used spike sorted data to describe relationships between the identified neurons, but more recently clusterless modeling methods have been used to describe population activity using a single model. Here we develop a generalization of the time-rescaling theorem that enables comprehensive goodness-of-fit analysis for either of these classes of population models. We use the theory of marked point processes to model population spiking activity, and show that under the correct model, each spike can be rescaled individually to generate a uniformly distributed set of events in time and the space of spike marks. After rescaling, multiple well-established goodness-of-fit procedures and statistical tests are available. We demonstrate the application of these methods both to simulated data and real population spiking in rat hippocampus. We have made the MATLAB and Python code used for the analyses in this paper publicly available through our GitHub repository at https://github.com/Eden-Kramer-Lab/popTRT. This work was supported by grants from the NIH (MH105174, NS094288) and the Simons Foundation (542971).
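As a point of reference for the abstract above, the classical single-neuron time-rescaling check that the paper generalizes can be sketched as follows. This is a minimal illustration, not the marked point process method and not the code in the popTRT repository. The function names, the sinusoidal intensity used in the usage note, and the grid step `dt` are all assumptions made for the example.

```python
import numpy as np


def simulate_inhomogeneous_poisson(intensity, lam_max, t_end, rng):
    """Simulate spike times on [0, t_end] by thinning (hypothetical helper).

    `intensity` must satisfy 0 <= intensity(t) <= lam_max on the interval.
    """
    n_candidates = rng.poisson(lam_max * t_end)
    candidates = np.sort(rng.uniform(0.0, t_end, n_candidates))
    keep = rng.uniform(0.0, 1.0, n_candidates) < intensity(candidates) / lam_max
    return candidates[keep]


def rescale_spike_times(spike_times, intensity, t_end, dt=1e-3):
    """Map inter-spike intervals to values that are Uniform(0, 1) under the model.

    The integrated intensity is approximated with the trapezoid rule on a grid.
    """
    grid = np.arange(0.0, t_end + dt, dt)
    lam = intensity(grid)
    cumulative = np.concatenate(([0.0], np.cumsum(0.5 * (lam[1:] + lam[:-1]) * dt)))
    big_lambda = np.interp(spike_times, grid, cumulative)
    # Rescaled intervals are Exponential(1) if the intensity model is correct.
    taus = np.diff(np.concatenate(([0.0], big_lambda)))
    return 1.0 - np.exp(-taus)


def ks_statistic(u):
    """Kolmogorov-Smirnov distance between the sample `u` and Uniform(0, 1)."""
    u = np.sort(u)
    n = len(u)
    upper = np.arange(1, n + 1) / n - u
    lower = u - np.arange(0, n) / n
    return max(upper.max(), lower.max())
```

A typical use would simulate spikes from an assumed intensity such as `lambda t: 20.0 + 15.0 * np.sin(2 * np.pi * t)`, rescale them with the same intensity, and compare `ks_statistic(u)` against the approximate 95% bound `1.36 / np.sqrt(len(u))`. A model with the wrong intensity would tend to produce a larger statistic.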
Statistical Machine Learning Methods for the Large Scale Analysis of Neural Data
Modern neurotechnologies enable the recording of neural activity at the scale of entire brains and with single-cell resolution. However, the lack of principled approaches to extract structure from these massive data streams prevents us from fully exploiting the potential of these technologies. This thesis, divided into three parts, introduces new statistical machine learning methods to enable the large-scale analysis of some of these complex neural datasets. In the first part, I present a method that leverages Gaussian quadrature to accelerate inference of neural encoding models from a certain type of observed neural point process (spike trains), resulting in substantial improvements over existing methods.
The second part focuses on the simultaneous electrical stimulation and recording of neurons using large electrode arrays. There, identification of neural activity is hindered by stimulation artifacts that are much larger than spikes and overlap temporally with them. To surmount this challenge, I develop an algorithm to infer and cancel this artifact, enabling inference of the neural signal of interest. This algorithm is based on a Bayesian generative model for recordings, where a structured Gaussian process is used to represent prior knowledge of the artifact. The algorithm achieves near-perfect accuracy and enables the analysis of data hundreds of times faster than previous approaches.
The third part is motivated by the problem of inferring neural dynamics in the worm C. elegans. A data-driven approach to this question, for example one using whole-brain calcium imaging data, requires matching neural recordings to canonical neural identities, a step that in practice is resolved by tedious human labor. Alternatively, in a Bayesian setup this problem may be cast as posterior inference of a latent permutation. I introduce methods that enable gradient-based approximate posterior inference of permutations, overcoming the difficulties imposed by the combinatorial and discrete nature of this object. Results suggest the feasibility of automating neural identification, and demonstrate that variational inference over permutations is a sensible alternative to MCMC.
Contributions to statistical analysis methods for neural spiking activity
With the technical advances in neuroscience experiments in the past few decades, we have seen a massive expansion in our ability to record neural activity. These advances enable neuroscientists to analyze more complex neural coding and communication properties, and at the same time, raise new challenges for analyzing neural spiking data, which keeps growing in scale, dimension, and complexity.
This thesis proposes several new statistical methods that advance statistical analysis approaches for neural spiking data, including sequential Monte Carlo (SMC) methods for efficient estimation of neural dynamics from membrane potential threshold crossings, state-space models using multimodal observation processes, and goodness-of-fit analysis methods for neural marked point process models.
In a first project, we derive a set of iterative formulas that enable us to simulate trajectories from stochastic, dynamic neural spiking models that are consistent with a set of spike time observations. We develop an SMC method to simultaneously estimate the parameters of the model and the unobserved dynamic variables from spike train data. We investigate the performance of this approach on a leaky integrate-and-fire model.
In another project, we define a semi-latent state-space model to estimate information related to the phenomenon of hippocampal replay. Replay is a recently discovered phenomenon where patterns of hippocampal spiking activity that typically occur during exploration of an environment are reactivated when an animal is at rest. This reactivation is accompanied by high frequency oscillations in hippocampal local field potentials. However, methods to define replay mathematically remain undeveloped. In this project, we construct a novel state-space model that enables us to identify whether replay is occurring, and if so to estimate the movement trajectories consistent with the observed neural activity, and to categorize the content of each event. The state-space model integrates information from the spiking activity from the hippocampal population, the rhythms in the local field potential, and the rat's movement behavior.
Finally, we develop a new, general time-rescaling theorem for marked point processes, and use this to develop a general goodness-of-fit framework for neural population spiking models. We investigate this approach through simulation and a real data application.
Inference of functional neural connectivity and convergence acceleration methods
Knowledge of the maps of neuronal interactions is key for systems neuroscience, but at the moment we possess relatively little of it. The recent development of experimental methods that allow simultaneous recording of the spiking activity, but not the intracellular voltage, of thousands of neurons gives us an opportunity to start filling that gap. In Chapter 2, I present a method for inferring the parameters of the leaky integrate-and-fire (LIF) model featuring time-dependent currents and conductances, based only on the extracellular recording of spiking in the network. The fitted parameters can describe the functional connections in the network, as well as the internal properties of the cells. The method can also be used to determine whether a single-compartment model of a neuron should include conductance-based synapses, current-based synapses, or a mixture of the two. In addition, because the same mathematical model describes some variants of the Drift Diffusion Model (DDM), popular in studies of the decision-making process, the presented method can be readily used to fit their parameters. Making the proposed inference procedure, which is based on the expectation-maximization (EM) algorithm, accurate and robust necessitated the development of a new numerical adaptive-grid (AG) method for the forward-backward (FB) propagation of the probability density, which is required in the computation of the sufficient statistic in the EM algorithm. These topics are covered in Chapter 3. Another issue that had to be addressed in order to obtain a usable inference algorithm is the well-known slow convergence of the EM algorithm in the flat regions of the log-likelihood. Two complementary approaches to this issue are presented in this dissertation.
In Chapter 4, I present a new framework for accelerating the convergence of iterative algorithms (not limited to EM), which unifies all previously known methods and allows us to construct a new method that performs best among them. To make the computations even faster, I wrote a MATLAB package that allows them to be run in parallel on several machines and clusters. All of the aforementioned projects grew out of one "head" project on the inference of the LIF model parameters. At the end of the dissertation, I briefly describe an unrelated project devoted to the development of a flexible experimental setup (software and hardware) for behavioral experiments, with a specific application to a particular type of virtual Morris water maze (VMWM) experiment.
Advances in point process modeling: feature selection, goodness-of-fit and novel applications
The research contained in this thesis extends multivariate marked point process modeling methods for neuroscience, generalizes goodness-of-fit techniques for the class of marked point processes, and introduces the use of a general history-dependent point process model to the domain of sleep apnea.
Our first project involves further development of a modeling tool for spiking data from neural populations using the theory of marked point processes. This marked point process model uses features of spike waveforms as marks in order to estimate a state variable of interest. We examine the informational content of geometric features as well as principal components of the waveforms in hippocampal place cell activity by comparing decoding accuracies of a rat's position along a track. We determined that there was additional information available beyond that contained in the traditional geometric features used for decoding in practice.
The expanded use of this marked point process model in neuroscience necessitates corresponding goodness-of-fit protocols for the marked case. In our second project, we develop a generalized time-rescaling method for marked point processes that produces uniformly distributed spikes under a proper model. Once rescaled, the ground process then behaves as a Poisson process and can be analyzed using traditional point process goodness-of-fit methods. We demonstrate the method's ability to detect quality and manner of fit through both simulation and real neural data analysis.
In the final project, we introduce history-dependent point process modeling as a superior method for characterizing severe sleep apnea over the current clinical standard known as the apnea-hypopnea index (AHI). We analyze model fits using combinations of both clinical covariates and event observations themselves through functions of history. Ultimately, apnea onset times were consistently estimated with significantly higher accuracy when history was incorporated alongside sleep stage. We present this method to the clinical audience as a means to gain detailed information on patterns of apnea and to provide more customized diagnoses and treatment prescriptions.
These separate yet complementary projects extend existing point process modeling methods and further demonstrate their value in the neurosciences, sleep sciences, and beyond.
A Hierarchical, Fuzzy Inference Approach to Data Filtration and Feature Prioritization in the Connected Manufacturing Enterprise
In the current big data landscape, the technology and capability to capture and store data have preceded and outpaced the corresponding capability to analyze and interpret it. This has led naturally to the development of elegant and powerful algorithms for data mining, machine learning, and artificial intelligence to harness the potential of the big data environment. A competing reality, however, is that limitations exist in how and to what extent human beings can process complex information. The convergence of these realities is a tension between the technical sophistication or elegance of a solution and its transparency or interpretability by the human data scientist or decision maker. This dissertation, contextualized in the connected manufacturing enterprise, presents an original Fuzzy Approach to Feature Reduction and Prioritization (FAFRAP) that is designed to assist the data scientist in filtering and prioritizing data for inclusion in supervised machine learning models. A set of sequential filters reduces the initial set of independent variables, and a fuzzy inference system outputs a crisp numeric value associated with each feature, which is used to rank-order and prioritize features for inclusion in model training. Additionally, the fuzzy inference system outputs a descriptive label to assist in the interpretation of the feature's usefulness with respect to the problem of interest. Model testing is performed using three publicly available datasets from an online machine learning data repository and later applied to a case study in electronic assembly manufacture. Consistency of model results is experimentally verified using Fisher's Exact Test, and results of filtered models are compared to results obtained by the unfiltered sets of features using a proposed novel metric, the performance-size ratio (PSR).
Acoustic sounding of snow water equivalent
An acoustic frequency-swept wave was investigated as a means for determining Snow Water Equivalent (SWE) in cold wind-swept prairie and sub-alpine environments. Building on previous research conducted by investigators who have examined the propagation of sound in snow, digital signal processing was used to determine acoustic pressure wave reflection coefficients at the interfaces between 'layers' indicative of changes in acoustic impedance. Using an iterative approach involving boundary conditions at the interfaces, the depth-integrated SWE was determined using the Berryman equation from porous media physics. Apparatuses used to send and receive sound waves were designed and deployed during the winter season at field sites situated near the city of Saskatoon, Saskatchewan, and in Yoho National Park, British Columbia. Data collected by gravimetric sampling were used as a comparison for the SWE values determined by acoustic sounding. The results are encouraging and suggest that this procedure is similar in accuracy to SWE data collected using gravimetric sampling. Further research is required to determine the applicability of this technique for snow situated at other geographic locations.
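For orientation, the textbook relation between a pressure reflection coefficient and a change in acoustic impedance can be written as below. This is the idealized normal-incidence result for a plane wave crossing an interface between two homogeneous fluid-like layers. It is an assumption made here for illustration and may differ from the porous-media formulation used in the work above. The symbols $Z_i$, $\rho_i$, and $c_i$ (impedance, density, and sound speed of layer $i$) are not taken from the abstract.

```latex
\[
  R_{12} = \frac{Z_2 - Z_1}{Z_2 + Z_1},
  \qquad Z_i = \rho_i c_i ,
\]
```

Under this idealization, $R_{12} = 0$ when the two layers have equal impedance, and $|R_{12}|$ grows as the impedance contrast between adjacent layers increases.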
A Multi-Band Far-Infrared Survey with a Balloon-Borne Telescope
Nine additional radiation sources, above a 3-sigma confidence level of 1300 Jy, were identified at 100 microns by far-infrared photometry of the galactic plane using a 0.4 meter aperture, liquid-helium-cooled, multichannel far-infrared balloon-borne telescope. The instrument is described, including its electronics, pointing and suspension systems, and ground support equipment. Testing procedures and flight staging are discussed along with the reduction and analysis of the data acquired. The history of infrared astronomy is reviewed. General infrared techniques and the concerns of balloon astronomers are explored.
Bayesian Parametric Receptive-Field Identification from Sparse or Noisy Data
Characterizing the stimulus selectivity of sensory neurons is an important step towards understanding how information about the world is represented in the brain. However, this is a computationally challenging task, in particular due to the probabilistic nature of the relationship between external stimuli and neural responses and the high dimensionality of the space of natural stimuli. State-of-the-art receptive field identification methods based on empirical Bayes scale poorly to high-dimensional settings, and computationally efficient implementations rely on stringent assumptions about the spike generation process. Furthermore, these models fail to provide principled credible intervals for experimentally relevant parameters, making it hard to propagate uncertainty for hypothesis testing in regimes of sparse or noisy data.
Here, we present a full Bayesian approach to identify receptive fields in sparse data regimes, which also provides a principled quantification of the estimation uncertainty. We take advantage of the fact that, for many sensory areas, there are canonical models that explain how neurons encode their inputs into firing rates. These models usually rely on a few interpretable parameters and can be used to constrain the space of receptive fields that can explain the data. While such models may not be flexible enough to capture all nuances of a particular receptive field, they can be effective for obtaining a fast characterization of the encoding properties of a neuron.
We perform Bayesian inference directly on these model parameters and show that we can detect the presence of a receptive field with a few tens of measured spikes in physiological conditions. Furthermore, we investigate how different amounts of data constrain the model parameters, and we illustrate how a full Bayesian approach can be used to test competing hypotheses and characterize a dataset of real, sparsely sampled neurons. In this work, our focus is on modeling neurons in visual cortical areas, but our flexible approach has the potential to be generalized to neurons in other brain areas with different input-output properties.
Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.