8 research outputs found

    On methods to assess the significance of community structure in networks of financial time series

    We consider the problem of determining whether the community structure found by a clustering algorithm applied to financial time series is statistically significant, or is due to pure chance, when no other information than the observed values and a similarity measure among time series is available. As a subsidiary problem we also analyse the influence of the choice of similarity measure on the accuracy of the clustering method. We propose two raw-data based methods for assessing robustness of clustering algorithms on time-dependent data linked by a relation of similarity: one based on community scoring functions that quantify some topological property that characterises ground-truth communities, and another based on random perturbations and quantification of the variation in the community structure. These methodologies are well established in the realm of unweighted networks; our contribution is a set of versions of these methodologies properly adapted to complete weighted networks. Peer reviewed. Postprint (published version).

    On methods to assess the significance of community structure in networks of financial time series

    We consider the problem of determining whether the community structure found by a clustering algorithm applied to financial time series is statistically significant, when no other information than the observed values and a similarity measure among time series is available. We propose two raw-data based methods for assessing robustness of clustering algorithms on time-dependent data linked by a relation of similarity: one based on community scoring functions that quantify some topological property that characterizes ground-truth communities, the other based on random perturbations and quantification of the variation in the community structure. These methodologies are well established in the realm of unweighted networks; our contribution is a set of versions adapted to complete weighted networks. We reinforce our assessment of the accuracy of the clustering algorithm by testing its performance on synthetic ground-truth communities of time series built through Monte Carlo simulations of VARMA processes.
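As an illustration of what a community scoring function looks like on a complete weighted network, the following minimal sketch computes a weighted conductance, where sums of edge weights replace the edge counts of the unweighted definition. The abstract does not say which scoring functions the authors adapt, so the choice of conductance, the function name `weighted_conductance`, and the toy matrix `W` are assumptions made here for illustration only.

```python
def weighted_conductance(weights, community):
    """Weighted conductance of `community` in a symmetric weight matrix.

    Illustrative only: cut weight divided by the smaller of the two
    weighted volumes. Lower values indicate a better-separated community.
    """
    n = len(weights)
    inside = set(community)
    cut = 0.0      # total weight of edges leaving the community
    vol_in = 0.0   # sum of weighted degrees of nodes inside the community
    vol_out = 0.0  # sum of weighted degrees of nodes outside the community
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in a similarity network
            w = weights[i][j]
            if i in inside:
                vol_in += w
                if j not in inside:
                    cut += w
            else:
                vol_out += w
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 0.0


# Hypothetical complete weighted network: nodes 0-1 and 2-3 are strongly similar.
W = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
]
good = weighted_conductance(W, [0, 1])  # groups the two similar series
bad = weighted_conductance(W, [0, 2])   # splits both similar pairs
```

In a significance test of the kind the abstract describes, a score like this would be compared against the values obtained on suitably randomized data rather than read in isolation.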

    Dynamic sporulation gene co-expression networks for Bacillus subtilis 168 and the food-borne isolate Bacillus amyloliquefaciens: a transcriptomic model

    Sporulation is a survival strategy adopted by bacterial cells in response to harsh environmental conditions. The adaptation potential differs between strains, and the variations may arise from differences in gene regulation. Gene networks are a valuable way of studying such regulation processes and establishing associations between genes. We reconstructed and compared sporulation gene co-expression networks (GCNs) of the model laboratory strain Bacillus subtilis 168 and the food-borne industrial isolate Bacillus amyloliquefaciens. Transcriptome data obtained from samples of six stages during the sporulation process were used for network inference. Subsequently, a gene set enrichment analysis was performed to compare the reconstructed GCNs of B. subtilis 168 and B. amyloliquefaciens with respect to biological functions, which showed enriched modules with coherent functional groups associated with sporulation. On the basis of the GCNs and the time evolution of differentially expressed genes, we could identify novel candidate genes strongly associated with sporulation in B. subtilis 168 and B. amyloliquefaciens. The GCNs offer a framework for exploring transcription factors, their targets, and co-expressed genes during sporulation. Furthermore, the methodology described here can conveniently be applied to other species or biological processes.

    Modeling Employment and Automation in the United States

    When people change jobs, it is useful for both employers and employees to find best-fit jobs on the basis of the employees’ skillsets. We utilize the O*NET database to introduce the notion of job distance, which allows us to measure the difference between jobs based on the skillsets required to successfully perform them. We then apply this measure to data from the Bureau of Labor Statistics (BLS) to model the job distribution in each metropolitan or rural area. Novel graph metrics are found along the way, but we ultimately address the impact of automation by combining a gravity model and a Markov model.

    Clustering assessment in weighted networks

    We provide a systematic approach to validate the results of clustering methods on weighted networks, in particular for the cases where the existence of a community structure is unknown. Our validation of clustering comprises a set of criteria for assessing their significance and stability. To test for cluster significance, we introduce a set of community scoring functions adapted to weighted networks, and systematically compare their values to those of a suitable null model. For this we propose a switching model to produce randomized graphs with weighted edges while maintaining the degree distribution constant. To test for cluster stability, we introduce a nonparametric bootstrap method combined with similarity metrics derived from information theory and combinatorics. In order to assess the effectiveness of our clustering quality evaluation methods, we test them on synthetically generated weighted networks with a ground-truth community structure of varying strength based on the stochastic block model construction. When applied to the clusters of these synthetic ground-truth networks, as well as to other weighted networks with known community structure, the proposed methods correctly identify the best-performing algorithms, which suggests their adequacy for cases where the clustering structure is not known. We test our clustering validation methods on a varied collection of well-known clustering algorithms applied to the synthetically generated networks and to several real-world weighted networks. All our clustering validation methods are implemented in R, and will be released in the upcoming package clustAnalytics.
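The stability test described above needs a way to compare two clusterings of the same nodes, for example the original partition and one obtained from a bootstrap resample. The sketch below computes the variation of information, one widely used information-theoretic metric for this purpose. The abstract does not name the metrics it uses, so this choice is an assumption, and the code is an independent Python illustration, not the R implementation from clustAnalytics.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two partitions.

    Each argument assigns a cluster label to the same sequence of nodes.
    The result is 0 when the partitions coincide up to relabeling and
    grows as they diverge.
    """
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    count_a = Counter(labels_a)             # cluster sizes in partition A
    count_b = Counter(labels_b)             # cluster sizes in partition B
    joint = Counter(zip(labels_a, labels_b))  # sizes of pairwise overlaps
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # VI = H(A|B) + H(B|A), accumulated overlap by overlap
        vi -= p_ab * (log(p_ab / p_a) + log(p_ab / p_b))
    return vi


same = variation_of_information([0, 0, 1, 1], [5, 5, 7, 7])   # relabeled copy
cross = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])  # unrelated split
```

A clustering would be judged stable under this metric if the distances between the original partition and the resampled ones stay small.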

    Proclivity or Popularity? Exploring Agent Heterogeneity in Network Formation

    The Barabasi-Albert model (BA model) is the standard algorithm used to describe the emergent mechanism of a scale-free network. This dissertation argues that the BA model, and its variants, rarely take agent heterogeneity into account in the analysis of network formation. In social networks, however, people's decisions to connect are strongly affected by the extent of similarity. In this dissertation, the author applies an agent-based modeling (ABM) approach to reassess the Barabasi-Albert model. This study proposes that, in forming social networks, agents are constantly balancing between instrumental and intrinsic preferences. After systematic simulation and subsequent analysis, this study finds that agents' preferences for popularity and proclivity strongly shape various attributes of simulated social networks. Moreover, this analysis of simulated networks investigates potential ways to detect this balance within real-world networks. In particular, the scale parameter of the power-law distribution is found to be sensitive solely to agents' preference for popularity. Finally, this study employs social media data (i.e., diffusion of different emotions) from Sina Weibo, a Chinese counterpart of Twitter, to validate the findings, and results suggest that diffusion of anger is more popularity-driven.
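The balance between popularity and proclivity can be pictured with a small growth simulation in which each new agent links to one existing agent, chosen with a probability that mixes degree-based attachment (as in the BA model) with trait similarity. This is only a sketch under assumed rules: the scalar trait, the linear mixing parameter `popularity_weight`, and the function `grow_network` are hypothetical and are not taken from the dissertation.

```python
import random


def grow_network(n_nodes, popularity_weight, seed=0):
    """Grow a network one agent at a time (illustrative model only).

    popularity_weight = 1.0 gives pure degree-based attachment;
    popularity_weight = 0.0 gives pure similarity-based attachment.
    Returns the edge list and the final degree of every agent.
    """
    rng = random.Random(seed)
    traits = [rng.random(), rng.random()]  # one scalar trait per agent
    degree = [1, 1]                        # start from a single edge 0-1
    edges = [(0, 1)]
    for new in range(2, n_nodes):
        trait = rng.random()
        sims = [1.0 - abs(trait - t) for t in traits]  # similarity in (0, 1]
        total_degree = sum(degree)
        total_sim = sum(sims)
        # Mix two probability distributions over the existing agents.
        scores = [
            popularity_weight * degree[old] / total_degree
            + (1.0 - popularity_weight) * sims[old] / total_sim
            for old in range(new)
        ]
        target = rng.choices(range(new), weights=scores, k=1)[0]
        edges.append((target, new))
        degree[target] += 1
        traits.append(trait)
        degree.append(1)
    return edges, degree


edges, degree = grow_network(50, popularity_weight=0.5)
```

Sweeping `popularity_weight` from 0 to 1 and inspecting the resulting degree sequences is one simple way to see how the two preferences shape the network.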

    A clustering coefficient for complete weighted networks

    No full text