9 research outputs found

    A variational formulation for GTM through time: Theoretical foundations

    Generative Topographic Mapping (GTM) is a latent variable model that, in its standard version, was conceived to provide clustering and visualization of multivariate, real-valued, i.i.d. data. It was also extended to deal with non-i.i.d. data such as multivariate time series in a variant called GTM Through Time (GTM-TT), defined as a constrained Hidden Markov Model (HMM). In this technical report, we provide the theoretical foundations of the reformulation of GTM-TT within the Variational Bayesian framework. In application, this approach should naturally handle the presence of noise in the time series, helping to avert the problem of data overfitting.

    Capturing the dynamics of multivariate time series through visualization using generative topographic mapping through time

    Most of the existing research on time series concerns supervised forecasting problems. In comparison, little research has been devoted to unsupervised methods for the visual exploration of multivariate time series. In this paper, the capabilities of the Generative Topographic Mapping Through Time, a model with solid foundations in probability theory that performs simultaneous time series data clustering and visualization, are assessed in detail in several experiments. The focus is placed on the detection of atypical data, the visualization of the evolution of signal regimes, and the exploration of sudden transitions, for which a novel identification index is defined.

    Tracking topic birth and death in LDA.

    Most topic modeling algorithms that address the evolution of documents over time use the same number of topics at all times. This obscures the common occurrence in the data where new subjects arise and old ones diminish or disappear entirely. We propose an algorithm to model the birth and death of topics within an LDA-like framework. The user selects an initial number of topics, after which new topics are created and retired without further supervision. Our approach also accommodates many of the acceleration and parallelization schemes developed in recent years for standard LDA. In recent years, topic modeling algorithms such as latent semantic analysis (LSA) [17], latent Dirichlet allocation (LDA) [10] and their descendants have offered a powerful way to explore and interrogate corpora far too large for any human to grasp without assistance. Using such algorithms we are able to search for similar documents, model and track the volume of topics over time, search for correlated topics or model them with a hierarchy. Most of these algorithms are intended for use with static corpora where the number of documents and the size of the vocabulary are known in advance. Moreover, almost all current topic modeling algorithms fix the number of topics as one of the input parameters and keep it fixed across the entire corpus. While this is appropriate for static corpora, it becomes a serious handicap when analyzing time-varying data sets where topics come and go as a matter of course. This is doubly true for online algorithms that may not have the option of revising earlier results in light of new data. To be sure, these algorithms will account for changing data one way or another, but without the ability to adapt to structural changes such as entirely new topics they may do so in counterintuitive ways.

    Advanced Statistical Machine Learning Methods for the Analysis of Neurophysiologic Data with Medical Application

    Transcranial magnetic stimulation procedures use a magnetic field to carry a short-lasting electrical current pulse into the brain, where it stimulates neurons, particularly in superficial regions of the cerebral cortex. It is a powerful tool to calculate several parameters related to the intracortical excitability and inhibition of the motor cortex. The cortical silent period (CSP), evoked by magnetic stimulation, corresponds to the suppression of muscle activity for a short period after a muscle response to a magnetic stimulation. The duration of the CSP is paramount to assess intracortical inhibition, and it is known to be correlated with the prognosis of stroke patients’ motor ability. Current mechanisms to estimate the duration of the CSP are mostly based on the analysis of the raw electromyographic (EMG) signal, and they are very sensitive to the presence of noise. This master's thesis is devoted to the analysis of the EMG signal of stroke patients under rehabilitation. The use of advanced statistical machine learning techniques that behave robustly in the presence of noise for this analysis allows us to accurately estimate signal parameters such as the CSP. The research reported in this thesis provides initial evidence of their applicability in other areas of neuroscience.

    A dynamic probabilistic model to visualise topic evolution in text streams

    No full text
    We propose a novel probabilistic method, based on latent variable models, for unsupervised topographic visualisation of dynamically evolving, coherent textual information. This can be seen as a complementary tool for topic detection and tracking applications. This is achieved by exploiting the available a priori domain knowledge, namely that there are relatively homogeneous temporal segments in the data stream. Unlike the topographical techniques previously used for static text collections, in the proposed model the topography is an outcome of the coherence in time of the data stream. Simulation results on both toy-data settings and an actual application to Internet chat line discussion analysis are presented by way of demonstration.