Unsupervised discovery of temporal sequences in high-dimensional datasets, with applications to neuroscience.
Identifying low-dimensional features that describe large-scale neural recordings is a major challenge in neuroscience. Repeated temporal patterns (sequences) are thought to be a salient feature of neural dynamics, but are not succinctly captured by traditional dimensionality reduction techniques. Here, we describe a software toolbox, called seqNMF, with new methods for extracting informative, non-redundant sequences from high-dimensional neural data, testing the significance of these extracted patterns, and assessing the prevalence of sequential structure in data. We test these methods on simulated data under multiple noise conditions, and on several real neural and behavioral datasets. In hippocampal data, seqNMF identifies neural sequences that match those calculated manually by reference to behavioral events. In songbird data, seqNMF discovers neural sequences in untutored birds that lack stereotyped songs. Thus, by identifying temporal structure directly from neural data, seqNMF enables dissection of complex neural circuits without relying on temporal references from stimuli or behavioral outputs.
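To make the notion of a "sequence" factor concrete, here is a minimal sketch of a convolutive reconstruction, the kind of model that convolutional NMF methods are built on. It is an illustration under stated assumptions, not the seqNMF toolbox code: the function name conv_reconstruct, the array layout (neurons x factors x lags) and the toy data are all hypothetical.

```python
import numpy as np

def conv_reconstruct(W, H):
    """Rebuild a neurons-by-time matrix from sequence templates.

    W: (N, K, L) array, template of factor k over L time lags (assumed layout).
    H: (K, T) array, times at which each factor occurs.
    Returns X_hat with X_hat[n, t] = sum_k sum_l W[n, k, l] * H[k, t - l].
    """
    N, K, L = W.shape
    T = H.shape[1]
    X_hat = np.zeros((N, T))
    for lag in range(L):
        # Shift H right by `lag` time bins, padding the start with zeros.
        H_shift = np.zeros_like(H)
        H_shift[:, lag:] = H[:, :T - lag]
        X_hat += W[:, :, lag] @ H_shift
    return X_hat

# Toy example: one factor in which neuron n fires n bins after onset.
W = np.zeros((3, 1, 3))
for n in range(3):
    W[n, 0, n] = 1.0
H = np.zeros((1, 10))
H[0, 2] = 1.0  # first occurrence of the sequence
H[0, 6] = 1.0  # second occurrence
X_hat = conv_reconstruct(W, H)
```

In this toy case each occurrence in H stamps the same three-neuron sequence into X_hat, which is what lets a single factor describe a repeated temporal pattern.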
Real-time detection of overlapping sound events with non-negative matrix factorization
In this paper, we investigate the problem of real-time detection of overlapping sound events by employing non-negative matrix factorization techniques. We consider a setup where audio streams arrive in real-time to the system and are decomposed onto a dictionary of event templates learned off-line prior to the decomposition. An important drawback of existing approaches in this context is the lack of controls on the decomposition. We propose and compare two provably convergent algorithms that address this issue, by controlling respectively the sparsity of the decomposition and the trade-off of the decomposition between the different frequency components. Sparsity regularization is considered in the framework of convex quadratic programming, while frequency compromise is introduced by employing the beta-divergence as a cost function. The two algorithms are evaluated on the multi-source detection tasks of polyphonic music transcription, drum transcription and environmental sound recognition. The obtained results show how the proposed approaches can improve detection in such applications, while maintaining low computational costs that are suitable for real-time
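The setup described above, where each incoming frame is decomposed onto a fixed, pre-learned dictionary, can be sketched with the generic multiplicative update for the beta-divergence. This is only an illustration: it is the standard heuristic update, not the tailored algorithms the paper proposes, and the function name decompose_frame, the toy dictionary and the parameter defaults are assumptions.

```python
import numpy as np

def decompose_frame(v, W, beta=1.0, n_iter=200, eps=1e-12):
    """Estimate nonnegative activations h with v ~= W @ h, keeping W fixed.

    v: (F,) nonnegative spectrum of one audio frame.
    W: (F, K) nonnegative dictionary of event templates (learned off-line).
    beta: beta-divergence parameter (1.0 is Kullback-Leibler, 2.0 is Euclidean).
    """
    h = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(n_iter):
        wh = W @ h + eps
        # Generic multiplicative update; it keeps h nonnegative by construction.
        h *= (W.T @ (wh ** (beta - 2) * v)) / (W.T @ wh ** (beta - 1) + eps)
    return h

# Toy dictionary with three hypothetical event templates over five frequency bins.
W = np.array([
    [1.0, 0.0, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.0, 0.3, 1.0],
    [0.1, 0.0, 0.9],
])
# A frame in which events 0 and 2 overlap and event 1 is absent.
v = W @ np.array([1.0, 0.0, 2.0])
h = decompose_frame(v, W)
```

Thresholding h frame by frame would then give a detection decision per event; because W never changes, the per-frame cost depends only on the dictionary size and the iteration count.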
Nonnegative Matrix Factorization for Signal and Data Analytics: Identifiability, Algorithms, and Applications
Nonnegative matrix factorization (NMF) has become a workhorse for signal and
data analytics, triggered by its model parsimony and interpretability. Perhaps
a bit surprisingly, the understanding of its model identifiability (the major
reason behind the interpretability in many applications such as topic mining
and hyperspectral imaging) had been rather limited until recent years.
Beginning from the 2010s, the identifiability research of NMF has progressed
considerably: Many interesting and important results have been discovered by
the signal processing (SP) and machine learning (ML) communities. NMF
identifiability has a great impact on many aspects in practice, such as
ill-posed formulation avoidance and performance-guaranteed algorithm design. On
the other hand, there is no tutorial paper that introduces NMF from an
identifiability viewpoint. In this paper, we aim at filling this gap by
offering a comprehensive and deep tutorial on model identifiability of NMF as
well as the connections to algorithms and applications. This tutorial will help
researchers and graduate students grasp the essence and insights of NMF,
thereby avoiding typical 'pitfalls' that are oftentimes due to unidentifiable
NMF formulations. This paper will also help practitioners pick/design suitable
factorization tools for their own problems.
Comment: accepted version, IEEE Signal Processing Magazine; supplementary
materials added. Some minor revisions implemented.
Constrained Nonnegative Matrix Factorization with Applications to Music Transcription
In this work we explore using nonnegative matrix factorization (NMF) for music transcription, as well as several other applications. NMF is an unsupervised learning method capable of finding a parts-based additive model of data. Since music has an additive property (each time point in a musical piece is composed of a sum of notes) NMF is a natural fit for analysis. NMF is able to exploit this additivity in order to factorize out both the individual notes and the transcription from an audio sample.
In order to improve the performance of NMF, we apply different constraints to the model. We consider sparsity as well as piecewise smoothness with aligned breakpoints. We show the novelty of our method on real music data and demonstrate promising results which exceed the current state of the art. Other applications are also considered, such as instrument and speaker separation and handwritten character analysis.
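The additive model described in this abstract can be illustrated with plain, unconstrained NMF. The sketch below uses the classical multiplicative updates for the Euclidean cost; it does not include the sparsity or piecewise-smoothness constraints the work adds, and the function name nmf, the variable names and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (frequency x time) as W @ H.

    W: (frequency, rank) holds one spectrum per note-like part.
    H: (rank, time) holds the activation of each part over time,
       which plays the role of the transcription.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic "spectrogram": two parts mixed additively over 40 frames.
rng = np.random.default_rng(1)
W_true = rng.random((6, 2))
H_true = rng.random((2, 40))
V = W_true @ H_true

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Constraints such as sparsity would typically enter as extra penalty terms in the cost being minimized, which changes the update rules but not the overall structure of the loop.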
Correlative Information Maximization Based Biologically Plausible Neural Networks for Correlated Source Separation
The brain effortlessly extracts latent causes of stimuli, but how it does
this at the network level remains unknown. Most prior attempts at this problem
proposed neural networks that implement independent component analysis which
works under the limitation that latent causes are mutually independent. Here,
we relax this limitation and propose a biologically plausible neural network
that extracts correlated latent sources by exploiting information about their
domains. To derive this network, we choose maximum correlative information
transfer from inputs to outputs as the separation objective under the
constraint that the outputs are restricted to their presumed sets. The online
formulation of this optimization problem naturally leads to neural networks
with local learning rules. Our framework incorporates infinitely many source
domain choices and flexibly models complex latent structures. Choices of
simplex or polytopic source domains result in networks with piecewise-linear
activation functions. We provide numerical examples to demonstrate the superior
correlated source separation capability for both synthetic and natural sources.
Comment: Preprint, 32 pages
Scalable Tools for Information Extraction and Causal Modeling of Neural Data
Over the past 20 years, systems neuroscience has entered an era that one might call "large scale systems neuroscience". From tuning curves and single neuron recordings, there has been a conceptual shift towards a more holistic understanding of how neural circuits work and, as a result, how their representations produce neural tunings.
With the introduction of a plethora of datasets across various scales, modalities, animals, and systems, we as a community have witnessed the invaluable insights that can be gained from a collective view of a neural circuit, which was not possible with small scale experimentation. The concurrence of advances in neural recordings, such as wide field imaging technologies and Neuropixels, with developments in statistical machine learning, and specifically deep learning, has brought systems neuroscience one step closer to data science. With this abundance of data, the need for developing computational models has become crucial. We need to make sense of the data, and thus we need to build models that are constrained to an acceptable amount of biological detail and probe those models in search of neural mechanisms.
This thesis consists of sections covering a wide range of ideas from computer vision, statistics, machine learning, and dynamical systems. But all of these ideas share a common purpose, which is to help automate the neuroscientific experimentation process at different levels. In chapters 1, 2, and 3, I develop tools that automate the process of extracting useful information from raw neuroscience data in the model organism C. elegans. The goal is to avoid manual labor and pave the way for high throughput data collection aimed at better quantification of variability across the population of worms. Due to its high level of structural and functional stereotypy, and its relative simplicity, the nematode C. elegans has been an attractive model organism for systems and developmental research. With 383 neurons in males and 302 neurons in hermaphrodites, the positions and functions of neurons are remarkably conserved across individuals. Furthermore, C. elegans remains the only organism for which a complete cellular, lineage, and anatomical map of the entire nervous system has been described for both sexes. Here, I describe the analysis pipeline that we developed for the recently proposed NeuroPAL technique in C. elegans. Our proposed pipeline consists of atlas building (chapter 1), registration, segmentation, neural tracking (chapter 2), and signal extraction (chapter 3). I emphasize that categorizing the analysis techniques as a pipeline consisting of the above steps is general and can be applied to virtually every single animal model and emerging imaging modality. I use the language of probabilistic generative modeling and graphical models to communicate the ideas in a rigorous form; some familiarity with those concepts could therefore help the reader navigate the chapters of this thesis more easily.
In chapters 4 and 5, I build models that aim to automate hypothesis testing and causal interrogation of neural circuits. The notion of functional connectivity (FC) has been instrumental in our understanding of how information propagates in a neural circuit. However, an important limitation is that current techniques do not dissociate causal connections from purely functional connections with no mechanistic correspondence. I start chapter 4 by introducing causal inference as a unifying language for the following chapters. In chapter 4, I define the notion of interventional connectivity (IC) as a way to summarize the effect of stimulation in a neural circuit, providing a more mechanistic description of the information flow. I then investigate which functional connectivity metrics best predict IC in simulations and real data. Following this framework, I discuss how stimulations and interventions can be used to improve the fitting and generalization properties of time series models. Building on the literature of model identification and active causal discovery, I develop a switching time series model and a method for finding stimulation patterns that help the model to generalize to the vicinity of the observed neural trajectories. Finally, in chapter 5, I develop a new FC metric that separates the information transferred from one variable to the other into unique and synergistic sources.
In all projects, I have abstracted out concepts that are specific to the datasets at hand and developed the methods in the most general form. This makes the presented methods applicable to a broad range of datasets, potentially leading to new findings. In addition, all projects are accompanied by extensible and documented code packages, allowing theorists to repurpose the modules for novel applications and experimentalists to run analysis on their datasets efficiently and scalably.
In summary, my main contributions in this thesis are the following:
1) Building the first atlases of hermaphrodite and male C. elegans and developing a generic statistical framework for constructing atlases for a broad range of datasets.
2) Developing a semi-automated analysis pipeline for neural registration, segmentation, and tracking in C. elegans.
3) Extending the framework of non-negative matrix factorization to datasets with deformable motion and developing algorithms for joint tracking and signal demixing from videos of semi-immobilized C. elegans.
4) Defining the notion of interventional connectivity (IC) as a way to summarize the effect of stimulation in a neural circuit and investigating which functional connectivity metrics best predict IC in simulations and real data.
5) Developing a switching time series model and a method for finding stimulation patterns that help the model to generalize to the vicinity of the observed neural trajectories.
6) Developing a new functional connectivity metric that separates the transferred information from one variable to the other into unique and synergistic sources.
7) Implementing extensible, well-documented, open-source code packages for each of the above contributions.