2,924 research outputs found

    Statistical clustering of temporal networks through a dynamic stochastic block model

    Statistical node clustering in discrete-time dynamic networks is an emerging field that raises many challenges. Here, we explore statistical properties and frequentist inference in a model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the nodes' groups through time. We model binary data as well as weighted dynamic random graphs (with discrete or continuous edge values). Our approach, motivated by the importance of controlling for label-switching issues across the different time steps, focuses on detecting groups characterized by stable within-group connectivity behavior. We study the identifiability of the model parameters and propose an inference procedure based on a variational expectation-maximization algorithm, as well as a model selection criterion for choosing the number of groups. We carefully discuss our initialization strategy, which plays an important role in the method, and compare our procedure with existing ones on synthetic datasets. We also illustrate our approach on dynamic contact networks: one of encounters among high school students and two others of animal interactions. An implementation of the method is available as an R package called dynsbm.
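    To make the generative model concrete, here is a minimal Python sketch (not the dynsbm API) that simulates binary, undirected snapshots: each node's group follows its own Markov chain, and edges are drawn from an SBM given the current groups. The function name and parameters are hypothetical, and for simplicity one connectivity matrix is shared by every time step, which is stricter than the abstract's requirement of stable within-group connectivity only.

```python
import numpy as np


def simulate_dynamic_sbm(n_nodes, n_steps, init_probs, transition, connectivity, seed=0):
    """Simulate a binary, undirected dynamic SBM (hypothetical helper).

    init_probs   : (Q,) distribution of groups at the first time step
    transition   : (Q, Q) Markov transition matrix, each row sums to 1
    connectivity : (Q, Q) symmetric edge probabilities, assumed fixed over time
    """
    rng = np.random.default_rng(seed)
    q = len(init_probs)

    # Group memberships: one independent Markov chain per node.
    groups = np.empty((n_steps, n_nodes), dtype=int)
    groups[0] = rng.choice(q, size=n_nodes, p=init_probs)
    for t in range(1, n_steps):
        for i in range(n_nodes):
            groups[t, i] = rng.choice(q, p=transition[groups[t - 1, i]])

    # Edges: Bernoulli draws given the current groups (upper triangle, mirrored).
    adj = np.zeros((n_steps, n_nodes, n_nodes), dtype=int)
    for t in range(n_steps):
        z = groups[t]
        probs = connectivity[np.ix_(z, z)]  # probs[i, j] = connectivity[z_i, z_j]
        upper = np.triu(rng.random((n_nodes, n_nodes)) < probs, k=1)
        adj[t] = upper | upper.T  # symmetric, no self-loops
    return groups, adj
```

    For example, `simulate_dynamic_sbm(30, 5, np.array([0.5, 0.5]), np.array([[0.9, 0.1], [0.1, 0.9]]), np.array([[0.8, 0.05], [0.05, 0.7]]))` returns group labels of shape (5, 30) and adjacency matrices of shape (5, 30, 30). Such synthetic sequences resemble the kind of data one would use to compare inference procedures.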

    A Simple BATSE Measure of GRB Duty Cycle

    We introduce a definition of gamma-ray burst (GRB) duty cycle that describes the GRB's efficiency as an emitter: it is the GRB's average flux relative to its peak flux. This GRB duty cycle is easily expressed in terms of measured BATSE parameters; it is essentially the fluence divided by the product of peak flux and duration. Since fluence and duration are two of the three defining characteristics of the GRB classes identified by statistical clustering techniques (the other is spectral hardness), duty cycle is a potentially valuable probe for studying the properties of these classes. Comment: 4 pages, 1 figure, presented at the 5th Huntsville Gamma-Ray Burst Symposium
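    The definition reduces to one line of arithmetic: average flux (fluence / duration) divided by peak flux. The Python sketch below assumes fluence and peak flux are expressed in consistent energy units and that duration is a single burst-duration measure such as T90; the abstract says only "essentially", so any BATSE-specific unit conversion is not shown, and the function name is hypothetical.

```python
def grb_duty_cycle(fluence, peak_flux, duration):
    """Average flux over the burst divided by the peak flux (hypothetical helper).

    fluence   : time-integrated flux, e.g. erg cm^-2
    peak_flux : peak flux in matching units per second, e.g. erg cm^-2 s^-1
    duration  : burst duration in seconds (assumed here to be a T90-like measure)
    """
    if peak_flux <= 0 or duration <= 0:
        raise ValueError("peak_flux and duration must be positive")
    average_flux = fluence / duration
    return average_flux / peak_flux
```

    For instance, a burst with fluence 1e-5 erg cm^-2, peak flux 1e-6 erg cm^-2 s^-1, and duration 20 s has an average flux of 5e-7 erg cm^-2 s^-1 and therefore a duty cycle of 0.5. Because the average flux cannot exceed the peak flux, consistent measurements keep the value between 0 and 1.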

    Properties of Gamma-Ray Burst Classes

    The three gamma-ray burst (GRB) classes identified by statistical clustering analysis (Mukherjee et al. 1998) are examined using the pattern recognition algorithm C4.5 (Quinlan 1986). Although the statistical existence of Class 3 (intermediate duration, intermediate fluence, soft) is supported, the properties of this class need not arise from a distinct source population. Class 3 properties can easily be produced from Class 1 (long, high fluence, intermediate hardness) by a combination of measurement error, the hardness/intensity correlation, and a newly identified BATSE bias (the fluence duration bias). Class 2 (short, low fluence, hard) does not appear to be related to Class 1. Comment: 5 pages, 4 embedded figures, presented at the 5th Huntsville Gamma-Ray Burst Symposium
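    C4.5 chooses decision-tree splits by gain ratio: the information gain of a split divided by the split's own entropy (split information). The Python sketch below computes that criterion for a binary threshold split on one continuous attribute, such as burst duration. It is a simplification (it scores midpoints between sorted distinct values and omits tree growth, pruning, and missing-value handling), and the helper names and toy data are hypothetical, not BATSE measurements.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting a continuous attribute at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    weights = (len(left) / n, len(right) / n)
    gain = entropy(labels) - weights[0] * entropy(left) - weights[1] * entropy(right)
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info


def best_threshold(values, labels):
    """Candidate threshold (midpoint of adjacent distinct values) with the highest gain ratio."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    if not candidates:
        return None  # attribute is constant, so no split is possible
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))
```

    With toy durations `[0.2, 0.5, 0.8, 20.0, 35.0, 60.0]` labelled three "short" and three "long", `best_threshold` returns the midpoint 10.4, whose split separates the classes perfectly and has a gain ratio of 1.0.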

    Statistical Clustering of Glioblastoma Multiforme for Graph Theory Analysis

    In statistical clustering, proteins that cluster together are likely to possess a functional relationship with each other. By statistically clustering and filtering proteomic data, networks can be created so that the vast complexity of protein-protein interaction data can be understood and meaningfully analyzed. Here, glioblastoma and glioblastoma multiforme phosphorylation data were obtained from PhosphoSitePlus and subsequently analyzed using R. The binary data were loaded into a data frame and collapsed by gene name. The Spearman-Euclidean and Euclidean distances were then calculated, and t-distributed stochastic neighbor embedding (t-SNE) was performed separately on each output. The results were then divided into discrete clusters. Overly large clusters were broken down to a manageable size via a penalized matrix decomposition. The rank of the penalized matrix decomposition was determined by imputing missing values in the data cluster using DINEOF, running PCA on the completed data frame, plotting the number of principal components against the proportion of variance explained, and finally choosing the point of diminishing returns that still explained over 90% of the variance. Clusters were transformed into networks and then visualized in Cytoscape. The final networks represent a useful tool for researchers studying protein-protein interactions in glioblastomas. Work is underway to integrate these networks with those obtained from mass spectrometry peak intensities, allowing meaningful analysis of legacy datasets.
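    The rank-selection step can be sketched as finding the smallest number of principal components whose cumulative explained variance exceeds 90%. This Python sketch assumes the matrix is already complete (the DINEOF imputation is not reproduced), uses an SVD of the column-centered data in place of a PCA routine, and implements only the 90% floor, not the visual "diminishing returns" judgment; the function name is hypothetical.

```python
import numpy as np


def n_components_for_variance(data, threshold=0.90):
    """Smallest number of principal components explaining more than `threshold` of the variance."""
    centered = data - data.mean(axis=0)  # PCA works on column-centered data
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2  # proportional to each component's variance
    cumulative = np.cumsum(variance) / variance.sum()
    # side="right" finds the first component count whose cumulative share is strictly above threshold
    idx = int(np.searchsorted(cumulative, threshold, side="right"))
    return min(idx + 1, len(cumulative))
```

    A rank-one matrix such as `np.outer(np.arange(10.0), [1.0, 2.0, 3.0])` needs a single component, whereas `np.eye(4)`, whose centered variance is spread evenly over three directions, needs three.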

    Discovery of Activities via Statistical Clustering of Fixation Patterns

    Human behavior often consists of a series of distinct activities, each characterized by a unique signature of visual behavior. This is true even in a restricted domain, such as piloting an aircraft, where patterns of visual signatures might represent activities like communicating, navigating, and monitoring. We propose a novel analysis method for gaze-tracking data that performs blind discovery of these activities based on their behavioral signatures. The method is in some respects similar to recurrence analysis, but instead of comparing individual fixations, we compare groups of fixations aggregated over a fixed time interval whose duration is a parameter of the method. We assume that the environment has been divided into a set of N different areas of interest (AOIs). For each interval, we compute the proportion of time spent fixating each AOI, resulting in an N-dimensional vector. These proportions can be converted to counts by multiplying them by the interval duration divided by the average fixation duration (another parameter, which we fix at 280 milliseconds). We compare different intervals by computing the chi-square statistic. The associated p-value is the probability of obtaining a statistic at least this large under the hypothesis that the data in the two intervals were generated by a single process, with a single set of probabilities governing the fixation of each AOI. We have investigated the method using a set of 10 synthetic "activities" that sample 4 AOIs. Four of these activities visit 3 of the 4 AOIs with equal probability; since there are four different ways to leave one AOI out, there are four such activities. Similarly, there are six different activities that leave two AOIs out. Sequences of simulated behavior were generated by running each activity for 40 seconds, in sequence, for a total of about 6.7 minutes. The figure to the right shows the matrix of chi-square statistics computed with an interval duration of 2.8 seconds, corresponding to 10 fixations.
Low values (dark) indicate weak evidence for activity differences, while high values (bright) indicate strong evidence. The dark squares along the main diagonal each correspond to the forty-second intervals in which the activity was held constant; the 4x4 block at the lower left corresponds to the four leave-one-out activities, while the 6x6 block in the upper right corresponds to the leave-two-out activities. (The anti-diagonal pattern of white squares indicates the activity pairs that share no AOIs.) The chi-square values can be binarized by choosing a particular significance level; because we are interested in grouping bins that represent the same activity, we are effectively accepting the null hypothesis. We may therefore adopt a relatively lax criterion; for example, choosing a p-value of 0.2 means that two behaviors that have only a 1-in-5 chance of being produced by a single activity might nevertheless be clustered together. We have explored several methods for clustering the data and solving for the activity probabilities. Greedy methods begin by selecting the time bin that is similar to the largest (or smallest) number of other bins, and then form a cluster from it and all other bins that cannot be discriminated from it. These methods perform only moderately well because they do not take temporal contiguity into account. Preliminary results indicate that methods that "grow" clusters in time from seed points perform better.
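    The comparison and the temporal-growing idea can be sketched as follows. This Python sketch assumes the chi-square statistic is the standard homogeneity test on a 2 x N table of AOI counts (the text does not specify the exact form), drops AOIs fixated in neither interval to avoid zero expected counts, and computes the p-value with a closed-form chi-square survival function so that only the standard library and NumPy are needed. `grow_segments` is a hypothetical, simplified scheme: it extends a segment forward in time while each new bin is not discriminable from the segment's seed bin at level `alpha`, and otherwise starts a new segment. It is not the authors' algorithm.

```python
import math
import numpy as np

FIXATION_S = 0.280  # average fixation duration stated in the text (280 ms)


def proportions_to_counts(proportions, interval_s):
    """Convert per-AOI dwell proportions into approximate fixation counts."""
    return np.asarray(proportions, dtype=float) * interval_s / FIXATION_S


def chi2_sf(x, dof):
    """Chi-square survival function P(X >= x) for integer dof >= 1."""
    half = x / 2.0
    if dof % 2 == 0:
        # Q(x; 2m) = exp(-x/2) * sum_{k<m} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return total
    # Odd dof: Q(x; 1) = erfc(sqrt(x/2)), then Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    total = math.erfc(math.sqrt(half))
    for k in range(1, dof, 2):
        total += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
    return total


def chi_square_two_intervals(counts_a, counts_b):
    """Chi-square homogeneity test between two AOI count vectors."""
    table = np.vstack([counts_a, counts_b]).astype(float)
    table = table[:, table.sum(axis=0) > 0]  # drop AOIs fixated in neither interval
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    stat = float(((table - expected) ** 2 / expected).sum())
    dof = table.shape[1] - 1  # (rows - 1) * (cols - 1) with two rows
    p_value = chi2_sf(stat, dof) if dof > 0 else 1.0
    return stat, p_value


def grow_segments(bins, alpha=0.2):
    """Grow contiguous segments forward in time from seed bins (hypothetical scheme)."""
    labels = [0]
    seed = bins[0]
    for current in bins[1:]:
        _, p_value = chi_square_two_intervals(seed, current)
        if p_value >= alpha:  # not discriminable from the seed: extend the segment
            labels.append(labels[-1])
        else:  # significant difference: start a new segment seeded here
            labels.append(labels[-1] + 1)
            seed = current
    return labels
```

    For example, with a 2.8 s interval, a leave-one-out activity that avoids the last AOI gives counts `proportions_to_counts([1/3, 1/3, 1/3, 0.0], 2.8)` (about 10 fixations in total); comparing it with the activity that avoids the first AOI yields a statistic of 20/3 with 3 degrees of freedom (p of roughly 0.08), so `grow_segments([a, a, a, b, b])` returns `[0, 0, 0, 1, 1]`. Because segments only grow forward in time, an activity that recurs later receives a new label; merging recurring segments would require an additional pass.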

    Novel Statistical Clustering Method for Accurate Characterization of Word Pronunciation

    This paper discusses the development of a method to determine the accuracy of a word's pronunciation using global statistical signal analysis parameters. The engineering word chosen is ‘leaching’. Its pronunciation in French was recorded from one native speaker and four students. The recordings were made with a microphone connected to a laptop, and the signals were analyzed in MATLAB. Time- and frequency-domain plots show a variety of waveforms depending on the recorded pronunciation. For data processing, the statistical signal analysis parameters used to extract the signals' features are kurtosis, root mean square, and skewness. Each sample was then mapped into this feature space and clustered. The positions of the students' samples are referenced to those of the native speaker's samples, so the pronunciation accuracy of each student can be evaluated by comparing the positions of all the samples. In conclusion, the proposed mapping and clustering method is able to characterize the accuracy of word pronunciation.
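    The feature-extraction and comparison steps can be sketched in Python with NumPy. The abstract does not say how positions are compared, so this sketch assumes that each recording is reduced to the three stated features (kurtosis, RMS, skewness) and that each student sample is scored by its Euclidean distance to the mean feature vector of the native-speaker samples. It uses biased moment estimates and non-excess kurtosis (a normal signal gives about 3). Function names are hypothetical.

```python
import numpy as np


def signal_features(x):
    """Return [kurtosis, RMS, skewness] of a 1-D signal (biased moment estimates)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()
    if sigma == 0:
        raise ValueError("constant signal: skewness and kurtosis are undefined")
    rms = np.sqrt(np.mean(x ** 2))
    skewness = np.mean((x - mu) ** 3) / sigma ** 3
    kurtosis = np.mean((x - mu) ** 4) / sigma ** 4  # non-excess (Pearson) kurtosis
    return np.array([kurtosis, rms, skewness])


def distances_to_reference(reference_signals, student_signals):
    """Euclidean distance from each student's features to the reference centroid (assumed metric)."""
    centroid = np.mean([signal_features(s) for s in reference_signals], axis=0)
    return [float(np.linalg.norm(signal_features(s) - centroid)) for s in student_signals]
```

    As a sanity check, a pure sine sampled over whole periods has an RMS of about 0.707, a skewness of about 0, and a kurtosis of 1.5. Because the three features live on different scales, a real analysis would probably standardize them before computing distances.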
