16,773 research outputs found

    Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information

    Get PDF
    Conditional independence testing is a fundamental problem underlying causal discovery and a particularly challenging task in the presence of nonlinear and high-dimensional dependencies. Here a fully non-parametric test for continuous data based on conditional mutual information combined with a local permutation scheme is presented. Through a nearest neighbor approach, the test efficiently adapts also to non-smooth distributions due to strongly nonlinear dependencies. Numerical experiments demonstrate that the test reliably simulates the null distribution even for small sample sizes and with high-dimensional conditioning sets. The test is better calibrated than kernel-based tests utilizing an analytical approximation of the null distribution, especially for non-smooth densities, and reaches the same or higher power levels. Combining the local permutation scheme with the kernel tests leads to better calibration, but suffers in power. For smaller sample sizes and lower dimensions, the test is faster than random fourier feature-based kernel tests if the permutation scheme is (embarrassingly) parallelized, but the runtime increases more sharply with sample size and dimensionality. Thus, more theoretical research to analytically approximate the null distribution and speed up the estimation for larger sample sizes is desirable.Comment: 17 pages, 12 figures, 1 tabl

    Efficient transfer entropy analysis of non-stationary neural time series

    Full text link
    Information theory allows us to investigate information processing in neural systems in terms of information transfer, storage and modification. Especially the measure of information transfer, transfer entropy, has seen a dramatic surge of interest in neuroscience. Estimating transfer entropy from two processes requires the observation of multiple realizations of these processes to estimate associated probability density functions. To obtain these observations, available estimators assume stationarity of processes to allow pooling of observations over time. This assumption however, is a major obstacle to the application of these estimators in neuroscience as observed processes are often non-stationary. As a solution, Gomez-Herrero and colleagues theoretically showed that the stationarity assumption may be avoided by estimating transfer entropy from an ensemble of realizations. Such an ensemble is often readily available in neuroscience experiments in the form of experimental trials. Thus, in this work we combine the ensemble method with a recently proposed transfer entropy estimator to make transfer entropy estimation applicable to non-stationary time series. We present an efficient implementation of the approach that deals with the increased computational demand of the ensemble method's practical application. In particular, we use a massively parallel implementation for a graphics processing unit to handle the computationally most heavy aspects of the ensemble method. We test the performance and robustness of our implementation on data from simulated stochastic processes and demonstrate the method's applicability to magnetoencephalographic data. While we mainly evaluate the proposed method for neuroscientific data, we expect it to be applicable in a variety of fields that are concerned with the analysis of information transfer in complex biological, social, and artificial systems.Comment: 27 pages, 7 figures, submitted to PLOS ON

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Get PDF
    Perhaps the most straightforward classifier in the arsenal or machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance is not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on; mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN

    Science Concierge: A fast content-based recommendation system for scientific publications

    Full text link
    Finding relevant publications is important for scientists who have to cope with exponentially increasing numbers of scholarly material. Algorithms can help with this task as they help for music, movie, and product recommendations. However, we know little about the performance of these algorithms with scholarly material. Here, we develop an algorithm, and an accompanying Python library, that implements a recommendation system based on the content of articles. Design principles are to adapt to new content, provide near-real time suggestions, and be open source. We tested the library on 15K posters from the Society of Neuroscience Conference 2015. Human curated topics are used to cross validate parameters in the algorithm and produce a similarity metric that maximally correlates with human judgments. We show that our algorithm significantly outperformed suggestions based on keywords. The work presented here promises to make the exploration of scholarly material faster and more accurate.Comment: 12 pages, 5 figure
    • …
    corecore