50,892 research outputs found

    Neural activity classification with machine learning models trained on interspike interval series data

    Full text link
    The flow of information through the brain is reflected by the activity patterns of neural cells. Indeed, these firing patterns are widely used as input data to predictive models that relate stimuli and animal behavior to the activity of a population of neurons. However, relatively little attention was paid to single neuron spike trains as predictors of cell or network properties in the brain. In this work, we introduce an approach to neuronal spike train data mining which enables effective classification and clustering of neuron types and network activity states based on single-cell spiking patterns. This approach is centered around applying state-of-the-art time series classification/clustering methods to sequences of interspike intervals recorded from single neurons. We demonstrate good performance of these methods in tasks involving classification of neuron type (e.g. excitatory vs. inhibitory cells) and/or neural circuit activity state (e.g. awake vs. REM sleep vs. nonREM sleep states) on an open-access cortical spiking activity dataset

    Finding Statistically Significant Interactions between Continuous Features

    Full text link
    The search for higher-order feature interactions that are statistically significantly associated with a class variable is of high relevance in fields such as Genetics or Healthcare, but the combinatorial explosion of the candidate space makes this problem extremely challenging in terms of computational efficiency and proper correction for multiple testing. While recent progress has been made regarding this challenge for binary features, we here present the first solution for continuous features. We propose an algorithm which overcomes the combinatorial explosion of the search space of higher-order interactions by deriving a lower bound on the p-value for each interaction, which enables us to massively prune interactions that can never reach significance and to thereby gain more statistical power. In our experiments, our approach efficiently detects all significant interactions in a variety of synthetic and real-world datasets.Comment: 13 pages, 5 figures, 2 tables, accepted to the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019

    Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome

    Full text link
    We evaluate a version of the recently-proposed classification system named Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space of sequences of generic objects. The ODSE system has been originally presented as a classification system for patterns represented as labeled graphs. However, since ODSE is founded on the dissimilarity space representation of the input data, the classifier can be easily adapted to any input domain where it is possible to define a meaningful dissimilarity measure. Here we demonstrate the effectiveness of the ODSE classifier for sequences by considering an application dealing with the recognition of the solubility degree of the Escherichia coli proteome. Solubility, or analogously aggregation propensity, is an important property of protein molecules, which is intimately related to the mechanisms underlying the chemico-physical process of folding. Each protein of our dataset is initially associated with a solubility degree and it is represented as a sequence of symbols, denoting the 20 amino acid residues. The herein obtained computational results, which we stress that have been achieved with no context-dependent tuning of the ODSE system, confirm the validity and generality of the ODSE-based approach for structured data classification.Comment: 10 pages, 49 reference

    Nonparametric Hierarchical Clustering of Functional Data

    Full text link
    In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data-grid which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing technique, aiming at reducing the number of clusters in order to improve the interpretability of the clustering, is proposed. It consists in optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly this measure is none other than the sum of the Kullback-Leibler divergences between clusters distributions before and after the merges. The practical interest of the approach for functional data exploratory analysis is presented and compared with an alternative approach on an artificial and a real world data set

    Discovering Patterns of Interest in IP Traffic Using Cliques in Bipartite Link Streams

    Full text link
    Studying IP traffic is crucial for many applications. We focus here on the detection of (structurally and temporally) dense sequences of interactions, that may indicate botnets or coordinated network scans. More precisely, we model a MAWI capture of IP traffic as a link streams, i.e. a sequence of interactions (t1,t2,u,v)(t_1 , t_2 , u, v) meaning that devices uu and vv exchanged packets from time t1t_1 to time t2t_2 . This traffic is captured on a single router and so has a bipartite structure: links occur only between nodes in two disjoint sets. We design a method for finding interesting bipartite cliques in such link streams, i.e. two sets of nodes and a time interval such that all nodes in the first set are linked to all nodes in the second set throughout the time interval. We then explore the bipartite cliques present in the considered trace. Comparison with the MAWILab classification of anomalous IP addresses shows that the found cliques succeed in detecting anomalous network activity

    Inferring Social Status and Rich Club Effects in Enterprise Communication Networks

    Full text link
    Social status, defined as the relative rank or position that an individual holds in a social hierarchy, is known to be among the most important motivating forces in social behaviors. In this paper, we consider the notion of status from the perspective of a position or title held by a person in an enterprise. We study the intersection of social status and social networks in an enterprise. We study whether enterprise communication logs can help reveal how social interactions and individual status manifest themselves in social networks. To that end, we use two enterprise datasets with three communication channels --- voice call, short message, and email --- to demonstrate the social-behavioral differences among individuals with different status. We have several interesting findings and based on these findings we also develop a model to predict social status. On the individual level, high-status individuals are more likely to be spanned as structural holes by linking to people in parts of the enterprise networks that are otherwise not well connected to one another. On the community level, the principle of homophily, social balance and clique theory generally indicate a "rich club" maintained by high-status individuals, in the sense that this community is much more connected, balanced and dense. Our model can predict social status of individuals with 93% accuracy.Comment: 13 pages, 4 figure
    • 

    corecore