487 research outputs found

    Efficient transfer entropy analysis of non-stationary neural time series

    Information theory allows us to investigate information processing in neural systems in terms of information transfer, storage and modification. The measure of information transfer in particular, transfer entropy, has seen a dramatic surge of interest in neuroscience. Estimating transfer entropy between two processes requires multiple realizations of these processes to estimate the associated probability density functions. To obtain these observations, available estimators assume stationarity of the processes, which allows observations to be pooled over time. This assumption, however, is a major obstacle to applying these estimators in neuroscience, as observed processes are often non-stationary. As a solution, Gomez-Herrero and colleagues showed theoretically that the stationarity assumption can be avoided by estimating transfer entropy from an ensemble of realizations. Such an ensemble is often readily available in neuroscience experiments in the form of experimental trials. In this work we therefore combine the ensemble method with a recently proposed transfer entropy estimator to make transfer entropy estimation applicable to non-stationary time series. We present an efficient implementation of the approach that deals with the increased computational demand of applying the ensemble method in practice. In particular, we use a massively parallel implementation on a graphics processing unit to handle the most computationally demanding aspects of the ensemble method. We test the performance and robustness of our implementation on data from simulated stochastic processes and demonstrate the method's applicability to magnetoencephalographic data. While we mainly evaluate the proposed method on neuroscientific data, we expect it to be applicable in a variety of fields concerned with the analysis of information transfer in complex biological, social, and artificial systems. (Comment: 27 pages, 7 figures, submitted to PLOS ONE)
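    The ensemble method's key move is to pool realizations across trials at a fixed time point rather than across time. As a toy illustration of that idea only, the Python sketch below computes a time-resolved plug-in estimate of transfer entropy from discretised trial data; the paper itself pairs the ensemble method with a different, more data-efficient estimator and a GPU implementation, and all names and the binned estimator here are illustrative assumptions.

```python
import numpy as np

def transfer_entropy_ensemble(x, y, t):
    """Time-resolved plug-in estimate of TE(X -> Y) at time point t.

    x, y : int arrays of shape (n_trials, n_samples) holding
           discretised (binned) data. Realizations are pooled over
           trials at a fixed time point instead of over time, so no
           stationarity assumption is needed.
    """
    # One joint observation (y_{t+1}, y_t, x_t) per trial.
    triples = np.stack([y[:, t + 1], y[:, t], x[:, t]], axis=1)
    n = len(triples)

    def H(cols):
        # Plug-in Shannon entropy (bits) of the chosen columns.
        _, counts = np.unique(triples[:, cols], axis=0, return_counts=True)
        p = counts / n
        return -np.sum(p * np.log2(p))

    # TE(X -> Y) at t  =  I(Y_{t+1}; X_t | Y_t)
    #                  =  H(Y_{t+1}, Y_t) + H(Y_t, X_t)
    #                     - H(Y_t) - H(Y_{t+1}, Y_t, X_t)
    return H([0, 1]) + H([1, 2]) - H([1]) - H([0, 1, 2])
```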

    AMIC: An Adaptive Information Theoretic Method to Identify Multi-Scale Temporal Correlations in Big Time Series Data


    Self-improving Algorithms for Convex Hulls


    Design and analysis of rule induction systems

    This work reviews the RULES family of algorithms and investigates the drawback of variation in their generalisation performance. This results in a new data ordering method (DOM) for the RULES family of inductive learning algorithms. DOM is based on selecting the most representative example; the method has been tested as a pre-processing stage on many data sets and has shown promising results. Another difficulty is the growing size of training data sets, which leads to long execution times and less compact generated rules. This study therefore develops a new data sorting method (DSM) for ordering the whole data set and reducing training time, based on selecting relevant attributes and the best possible examples to represent a data set. Finally, the order in which raw data is introduced to the RULES family of algorithms considerably affects the accuracy of the generated rules. To solve this problem, the work presents a new clustering-based data grouping method (DGM), sketched below. This method, in the form of an algorithm, is integrated into a data mining tool and applied to a real project; as a result, reduced variation in classification accuracy and fewer generated rules were achieved.
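    The clustering step behind such a grouping/ordering scheme can be illustrated as follows. This is a hedged sketch of one plausible reading, in which "most representative" means closest to a cluster centroid; it is not the published DOM/DSM/DGM code, and all names and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def order_by_representativeness(X, n_clusters=3, random_state=0):
    """Order training examples so the most representative come first.

    Examples are clustered, then sorted by distance to their cluster
    centroid: points near a centroid are taken as typical of their
    group and are presented to the rule inducer before outliers.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=random_state).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    return np.argsort(dist)  # indices, most representative first

# Example: reorder a data set before feeding it to an inductive learner.
X = np.random.rand(100, 4)
X_ordered = X[order_by_representativeness(X)]
```

    A rule inducer such as a RULES-family algorithm would then be trained on X_ordered, so that typical examples shape the rules before outliers are seen.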

    Discovering robust dependencies from data

    Science revolves around forming hypotheses, designing experiments, collecting data, and testing. It was not until recently, with the advent of modern hardware and data analytics, that science shifted towards a big-data-driven paradigm, leading to unprecedented success across various fields. Perhaps the most astounding feature of this new era is that interesting hypotheses can now be automatically discovered from observational data. This dissertation investigates knowledge discovery procedures that do exactly this. In particular, we seek algorithms that discover the most informative models able to compactly “describe” aspects of the phenomena under investigation, in both supervised and unsupervised settings. We consider interpretable models in the form of subsets of the original variable set. We want the models to capture all possible interactions (e.g., linear, non-linear) between all types of variables (e.g., discrete, continuous), and lastly, we want their quality to be meaningfully assessed. For this, we employ information-theoretic measures: the fraction of information for the supervised setting, and the normalized total correlation for the unsupervised one. The former measures the reduction in uncertainty of the target variable conditioned on a model; the latter measures the information overlap of the variables included in a model. Without access to the true underlying data-generating process, we estimate these measures from observational data. This process is prone to statistical errors, which in our case manifest as biases towards larger models. This can lead to situations where the results are utterly random, hindering further analysis. We correct this behavior with notions from statistical learning theory. In particular, we propose regularized estimators that are unbiased under the hypothesis of independence, leading to robust estimation from limited data samples and arbitrary dimensionalities. Moreover, we do this for models consisting of both discrete and continuous variables. Lastly, to discover the top-scoring models, we derive effective optimization algorithms for exact, approximate, and heuristic search. These algorithms are powered by admissible, tight, and efficient-to-compute bounding functions for our proposed estimators that can be used to greatly prune the search space. Overall, the products of this dissertation can successfully assist data analysts with exploring data, discovering powerful description models, or concluding that no satisfactory models exist, implying that new experiments and data are required for the phenomena under investigation. This statement is supported by Materials Science researchers who corroborated our discoveries.
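    For the supervised setting, the fraction of information can be illustrated with a minimal plug-in sketch for discrete data. The naive estimate below exhibits exactly the upward bias towards larger variable subsets that the dissertation's regularized estimators are designed to correct; the code and its names are an illustrative assumption, not the dissertation's estimator.

```python
import numpy as np

def entropy(labels):
    """Plug-in Shannon entropy (bits) of discrete data (rows)."""
    _, counts = np.unique(labels, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def fraction_of_information(X, y):
    """F(X; Y) = I(X; Y) / H(Y): the fraction of the target's
    uncertainty removed by the variable subset X (columns of X).

    Naive plug-in estimate: I(X; Y) = H(X) + H(Y) - H(X, Y).
    It is biased upwards for larger subsets, which is what the
    regularised estimators correct."""
    H_y = entropy(y)
    H_xy = entropy(np.column_stack([X, y]))
    return (entropy(X) + H_y - H_xy) / H_y
```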

    Cortico-hippocampal activations for high entropy visual stimulus: an fMRI perspective

    We perceive the environment around us in order to act upon it. To achieve a desired outcome effectively, we not only need the incoming information to be processed efficiently, but we also need to know how reliable that information is. How this uncertainty is extracted from the visual input and how it is represented in the brain are still open questions. The hippocampus reacts to different measures of uncertainty. Because it is strongly connected to various cortical and subcortical regions, the hippocampus has the resources to communicate such information to other brain regions involved in visual processing and other cognitive processes. In this thesis, we investigate the aspects of uncertainty to which the hippocampus reacts. Is it the uncertainty in the ongoing recognition attempt of a temporally unfolding stimulus, or is it the low-level spatiotemporal entropy? To answer this question, we used a dynamic visual stimulus with varying spatial and spatiotemporal entropy. We used well-structured virtual-tunnel videos and corresponding phase-scrambled videos with matching local luminance and contrast per frame, and also included pixel-scrambled videos with high spatial and spatiotemporal entropy in our stimulus set. Brain responses (fMRI images) were recorded while participants watched these videos and performed an engaging but cognitively independent task. Using the general linear model (GLM), we modeled the brain responses to the different video types and found that the early visual cortex and the hippocampus responded more strongly to videos with higher spatiotemporal entropy. Using independent component analysis, we further investigated which underlying networks were recruited in processing high-entropy visual information, and how these networks might influence each other. We found two cortico-hippocampal networks involved in processing our stimulus videos. While one represented a general primary visual processing network, the other was strongly activated by the high-entropy videos and deactivated by the well-structured virtual-tunnel videos. We also found a hierarchy in the processing stream, with information flowing from less stimulus-specific to more stimulus-specific networks.
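    The GLM step described above can be sketched as follows: one boxcar regressor per video type, convolved with a canonical haemodynamic response function and fitted per voxel by least squares. This is a minimal illustration under simplifying assumptions (scan-aligned onsets, a generic double-gamma HRF, no nuisance regressors); all names and parameters are illustrative, not the thesis's analysis pipeline.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=30.0):
    """Double-gamma haemodynamic response function sampled at the TR."""
    t = np.arange(0, duration, tr)
    peak = gamma.pdf(t, 6)          # positive response, peak around 5 s
    undershoot = gamma.pdf(t, 16)   # late undershoot
    h = peak - undershoot / 6.0
    return h / h.sum()

def glm_betas(bold, onsets_per_condition, tr):
    """Fit a per-voxel GLM with one HRF-convolved boxcar per condition.

    bold : (n_scans, n_voxels) array of fMRI time series
    onsets_per_condition : list of onset-scan index lists, one per
        video type (e.g. structured tunnel / phase-scrambled /
        pixel-scrambled)
    Returns (n_conditions + 1, n_voxels) betas; the last row is the
    intercept.
    """
    n_scans = bold.shape[0]
    h = canonical_hrf(tr)
    X = []
    for onsets in onsets_per_condition:
        box = np.zeros(n_scans)
        box[onsets] = 1.0
        X.append(np.convolve(box, h)[:n_scans])  # HRF-convolved regressor
    X.append(np.ones(n_scans))                   # intercept
    X = np.column_stack(X)
    betas, *_ = np.linalg.lstsq(X, bold, rcond=None)
    return betas
```

    Contrasting the resulting betas between video types (e.g. pixel-scrambled minus tunnel) is what would reveal regions responding more strongly to higher spatiotemporal entropy.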