
    Computing Vertex Centrality Measures in Massive Real Networks with a Neural Learning Model

    Vertex centrality measures are a multi-purpose analysis tool, commonly used in many application environments to retrieve information and uncover knowledge about the structural properties of graphs and networks. However, the algorithms for these metrics are computationally expensive in real-time applications or on massive real-world networks. Thus, approximation techniques have been developed and used to compute the measures in such scenarios. In this paper, we demonstrate and analyze the use of neural network learning algorithms to tackle this task and compare their performance, in terms of solution quality and computation time, with other techniques from the literature. Our work offers several contributions. We highlight both the pros and cons of approximating centralities through neural learning. By empirical means and statistics, we then show that the regression model generated with a feedforward neural network trained by the Levenberg-Marquardt algorithm is not only the best option considering computational resources, but also achieves the best solution quality for relevant applications and large-scale networks.
    Keywords: Vertex Centrality Measures, Neural Networks, Complex Network Models, Machine Learning, Regression Model
    Comment: 8 pages, 5 tables, 2 figures, version accepted at IJCNN 2018. arXiv admin note: text overlap with arXiv:1810.1176
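
    The abstract does not spell out the learning pipeline, so the following is only a minimal sketch of the general idea under our own assumptions: compute an expensive centrality (closeness, here) exactly for some vertices, fit a small feedforward regressor from cheap local features, then predict the centrality for every vertex. The feature choice (degree and mean neighbour degree), the network size, and the use of plain gradient descent instead of the Levenberg-Marquardt algorithm are hypothetical simplifications; `random_graph`, `closeness`, `features`, `train_mlp` and `predict` are illustrative names, not the authors' code.

```python
import numpy as np
from collections import deque

def random_graph(n, p, seed=0):
    """Erdos-Renyi G(n, p) as a list of neighbour sets (illustrative input only)."""
    rng = np.random.default_rng(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def closeness(adj):
    """Exact closeness via one BFS per vertex, O(n * m): the costly target."""
    n = len(adj)
    out = np.zeros(n)
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        # Closeness within s's component; isolated vertices get 0.
        out[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return out

def features(adj):
    """Hypothetical cheap features: degree and mean neighbour degree, standardised."""
    deg = np.array([len(nb) for nb in adj], dtype=float)
    nbr = np.array([deg[list(nb)].mean() if nb else 0.0 for nb in adj])
    X = np.column_stack([deg, nbr])
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def train_mlp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """One tanh hidden layer, MSE loss, full-batch gradient descent (not LM)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - y
        losses.append(float(np.mean(err ** 2)))
        g = 2.0 * err / len(y)                  # dLoss/dprediction
        gW2, gb2 = H.T @ g, g.sum()
        gZ = np.outer(g, W2) * (1.0 - H ** 2)   # back through tanh
        gW1, gb1 = X.T @ gZ, gZ.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

adj = random_graph(60, 0.1)
X, y = features(adj), closeness(adj)
train = np.arange(0, 60, 2)                     # fit on half of the vertices
params, losses = train_mlp(X[train], y[train])
y_hat = predict(params, X)                      # cheap estimates for all vertices
```

    For simplicity the sketch computes exact closeness for every vertex; in the approximation setting only the training vertices would need the expensive computation.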

    Discriminative Distance-Based Network Indices with Application to Link Prediction

    In large networks, using the length of shortest paths as the distance measure has shortcomings. The first, well-studied shortcoming is that extending it to disconnected graphs and directed graphs is controversial. The second shortcoming is that a huge number of vertices may have exactly the same score. The third shortcoming is that in many applications, the distance between two vertices depends not only on the length of shortest paths but also on the number of shortest paths. In this paper, we first develop a new distance measure between vertices of a graph that yields discriminative distance-based centrality indices. This measure is proportional to the length of shortest paths and inversely proportional to the number of shortest paths. We present algorithms for the exact computation of the proposed discriminative indices. Second, we develop randomized algorithms that precisely estimate average discriminative path length and average discriminative eccentricity, and show that they give (ε,δ)-approximations of these indices. Third, we perform extensive experiments over several real-world networks from different domains. We first show that, compared to the traditional indices, discriminative indices usually have much greater discriminability. We then show that our randomized algorithms can very precisely estimate average discriminative path length and average discriminative eccentricity using only a few samples. Next, we show that real-world networks usually have a tiny average discriminative path length, bounded by a constant (e.g., 2). Fourth, to better motivate the usefulness of the proposed distance measure, we present a novel link prediction method that uses discriminative distance to decide which vertices are more likely to form a link in the future, and show its superior performance compared to well-known existing measures.
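
    A minimal sketch of the idea, assuming the simplest form consistent with the description above: the discriminative distance is taken to be d(u, v) / σ(u, v), where d is the hop distance and σ the number of shortest paths, both obtained from a single BFS per source. Averaging only over reachable ordered pairs, and estimating the average from uniformly sampled BFS sources, are our own assumptions and not necessarily the paper's exact definitions or its (ε,δ)-approximation algorithm; all function names are hypothetical.

```python
from collections import deque
import random

def bfs_counts(adj, source):
    """Hop distances and numbers of shortest paths from source (unweighted graph)."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma

def discriminative_distances(adj, source):
    """Assumed form d(s, v) / sigma(s, v) for every v reachable from s."""
    dist, sigma = bfs_counts(adj, source)
    return {v: dist[v] / sigma[v] for v in dist if v != source}

def average_dd_exact(adj):
    """Average over all ordered reachable pairs: one BFS per vertex."""
    total, pairs = 0.0, 0
    for s in adj:
        dd = discriminative_distances(adj, s)
        total += sum(dd.values())
        pairs += len(dd)
    return total / pairs

def average_dd_sampled(adj, num_samples, seed=0):
    """Estimate from BFS runs rooted at uniformly sampled source vertices."""
    rng = random.Random(seed)
    nodes = list(adj)
    total, pairs = 0.0, 0
    for _ in range(num_samples):
        dd = discriminative_distances(adj, rng.choice(nodes))
        total += sum(dd.values())
        pairs += len(dd)
    return total / pairs

cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(average_dd_exact(cycle))  # 1.0: opposite vertices have d = 2 and sigma = 2
print(average_dd_exact(path))   # 8/6: d(0, 2) = 2 with a single shortest path
```

    The 4-cycle illustrates the discriminability argument: under plain shortest-path length its opposite pairs sit at distance 2, but because each is joined by two shortest paths their discriminative distance drops to 1.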

    Domino D5.1 - Metrics and analysis approach

    This deliverable presents the metrics proposed to assess the impact of innovations in the ATM system, together with a stylized agent-based model (ABM), called a 'toy model', to be used as a test ground for the metrics. Existing network metrics are reviewed, and their limitations are highlighted by applying them to real data. New metrics are then proposed to overcome these limitations, and empirical case studies show that they better measure the interconnections and causal relationships between elements of the ATM system. The design of the toy model is presented, and preliminary results of its baseline implementation are shown.

    Model-based clustering reveals vitamin D dependent multi-centrality hubs in a network of vitamin-related proteins

    Background: Nutritional systems biology offers the potential for comprehensive predictions that account for all metabolic changes, together with the intricate biological organization and the many interactions between cellular proteins. Protein-protein interaction (PPI) networks can be used for an integrative description of molecular processes. Although widely adopted in nutritional systems biology, these networks typically encompass a single category of functional interaction (i.e., metabolic, regulatory or signaling) or a single nutrient. Incorporating multiple nutrients and functional interaction categories into an integrated framework is an informative approach for gaining system-level insight into nutrient metabolism.
    Results: We constructed a multi-level PPI network starting from the interactions of 200 vitamin-related proteins. Its final size was 1,657 proteins with 2,700 interactions. To characterize the role of the proteins, we computed 6 centrality indices and applied model-based clustering. We detected a subgroup of 22 proteins that were highly central and significantly related to vitamin D. Immune system and cancer-related processes were strongly represented among these proteins. Clustering of the centralities revealed a degree of redundancy among the indices; a repeated analysis using subsets of the centralities performed well in identifying the original set of 22 most central proteins.
    Conclusions: Hierarchical and model-based clustering revealed multi-centrality hubs in a vitamin PPI network and redundancies among the centrality indices. Vitamin D-related proteins were strongly represented among network hubs, highlighting the pervasive effects of this nutrient. Our integrated approach to network construction identified promiscuous transcription factors, cytokines and enzymes, primarily related to immune system and cancer processes, that represent potential gatekeepers linking vitamin intake to disease.

    A graphical LASSO analysis of global quality of life, sub scales of the EORTC QLQ-C30 instrument and depression in early breast cancer

    We aimed to (a) investigate the interplay between depression, symptoms and level of functioning, and (b) understand the paths through which they influence health-related quality of life (QOL) during the first year of the rehabilitation period after early breast cancer. A network analysis method was used. The population consisted of 487 women aged 35-68 years who had recently completed adjuvant chemotherapy or started endocrine therapy for early breast cancer. At baseline and one year after randomization, QOL, symptoms and functioning were measured with the EORTC QLQ-C30 and BR-23 questionnaires, and depression with the Finnish version of Beck's 13-item depression scale. The multivariate interplay between the scales was analysed via regularized partial correlation networks (graphical LASSO). The median global quality of life (gQoL) at baseline was 69.9 ± 19.0 (16.7-100) and improved to 74.9 ± 19.0 (0-100) after 1 year. Scales related to mental health (emotional functioning, cognitive functioning, depression, insomnia, body image, future perspective) clustered together at both time points. Fatigue was mediated through a different route, having the strongest connection with physical functioning and no direct connection with depression. Multiple paths connected symptoms and functioning types with gQoL. The factors with the strongest connections to gQoL were social functioning, depression and fatigue at baseline, and emotional functioning and fatigue at month 12. Overall, the most important nodes were depression, gQoL and fatigue. The graphical LASSO network analysis revealed that scales related to fatigue and emotional health had the strongest associations with the EORTC QLQ-C30 gQoL score. When planning interventions for patients with impaired QOL, it is important to consider both psychological support and interventions that improve fatigue and physical function, such as exercise.
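
    For reference, one common form of the graphical LASSO estimator is shown below. Here S is the sample covariance (or correlation) matrix of the questionnaire scales, Θ the estimated precision matrix, and λ ≥ 0 the sparsity penalty; the abstract reports neither λ nor how it was selected, so it is left symbolic, and penalizing only the off-diagonal entries is one variant among several. An edge between scales i and j is drawn when the resulting partial correlation ρ_ij is non-zero.

```latex
\hat{\Theta} = \operatorname*{arg\,max}_{\Theta \succ 0}
  \Big\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \sum_{i \neq j} \lvert \theta_{ij} \rvert \Big\},
\qquad
\rho_{ij} = -\frac{\hat{\theta}_{ij}}{\sqrt{\hat{\theta}_{ii}\,\hat{\theta}_{jj}}}
```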

    Using graph theory to analyze biological networks

    Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. The need emerges to investigate a system not only as individual components but as a whole. This can be done by examining the elementary constituents individually and then how they are connected. The myriad components of a system and their interactions are best characterized as networks, and they are mainly represented as graphs in which thousands of nodes are connected by thousands of edges. In this article, we demonstrate approaches, models and methods from graph theory and discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling, combined with knowledge extraction, will help us better understand the biological significance of the system.

    Maximum likelihood estimation for randomized shortest paths with trajectory data

    Randomized shortest paths (RSPs) are a tool developed in recent years for various graph and network analysis applications, such as modelling movement or flow in networks. In essence, the RSP framework considers the temperature-dependent Gibbs–Boltzmann distribution over paths in the network. At low temperatures, the distribution focuses solely on the shortest or least-cost paths, while with increasing temperature, the distribution spreads over random walks on the network. Many relevant quantities can be computed conveniently from this distribution, and these often generalize traditional network measures in a sensible way. However, when modelling real phenomena with RSPs, one needs a principled way of estimating the parameters from data. In this work, we develop methods for computing the maximum likelihood estimate of the model parameters, with a focus on the temperature parameter, when modelling phenomena based on movement, flow or spreading processes. We test the validity of the derived methods with trajectories generated on artificial networks, as well as with real data on the movement of wild reindeer in a geographic landscape, used for estimating the degree of randomness in the animals' movement. These examples demonstrate the attractiveness of the RSP framework as a generic model for diverse applications.
    Keywords: randomized shortest paths; random walk; shortest path; parameter estimation; maximum likelihood; animal movement modelling
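
    A minimal sketch of the RSP model and of a maximum likelihood estimate of the inverse temperature θ = 1/T, following the usual RSP formulation with a single absorbing target: with W = P_ref ∘ exp(−θC) (target row set to zero) and Z = (I − W)^(−1), the biased transition probabilities toward target t are p_ij = w_ij z_jt / z_it, and a trajectory's likelihood is the product of its transition probabilities. The uniform reference walk, the unit edge costs, the assumption that each trajectory ends on its first visit to its target, and the plain grid search over θ (in place of the estimation methods derived in the paper) are all our own simplifications; the function names and the toy graph are hypothetical.

```python
import numpy as np

def rsp_transitions(A, C, target, theta):
    """Biased RSP transition matrix toward an absorbing target node.

    A: symmetric 0/1 adjacency matrix of a connected graph.
    C: edge-cost matrix (only entries where A > 0 matter).
    theta: inverse temperature; theta = 0 gives the reference random walk.
    """
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)   # uniform reference random walk
    W = P_ref * np.exp(-theta * C)              # Gibbs-Boltzmann weighting
    W[target, :] = 0.0                          # walks are absorbed at the target
    Z = np.linalg.inv(np.eye(n) - W)            # fundamental matrix, sum_k W^k
    z = Z[:, target]                            # partition function per start node
    return W * z[np.newaxis, :] / z[:, np.newaxis]

def log_likelihood(A, C, trajectories, theta):
    """Sum of log transition probabilities; each trajectory ends at its target."""
    ll = 0.0
    for path in trajectories:
        P = rsp_transitions(A, C, path[-1], theta)
        for i, j in zip(path[:-1], path[1:]):
            ll += np.log(P[i, j])
    return ll

def estimate_theta(A, C, trajectories, grid):
    """Grid-search maximum likelihood estimate of theta."""
    scores = [log_likelihood(A, C, trajectories, t) for t in grid]
    return grid[int(np.argmax(scores))]

# Toy graph: two routes from node 0 to node 3, via 1 (length 2) or via 2, 4 (length 3).
A = np.zeros((5, 5))
for u, v in [(0, 1), (0, 2), (1, 3), (2, 4), (4, 3)]:
    A[u, v] = A[v, u] = 1.0
C = A.copy()                                    # unit cost per edge
P0 = rsp_transitions(A, C, 3, 0.0)              # pure random walk: P0[0, 1] == 0.5
P10 = rsp_transitions(A, C, 3, 10.0)            # low temperature: P10[0, 1] near 1
theta_hat = estimate_theta(A, C, [[0, 1, 3]] * 5, [0.0, 1.0, 5.0, 10.0])
```

    Trajectories that always follow the shortest route push the estimate toward the largest θ on the grid (low temperature, little randomness), whereas detours make smaller θ values more likely, which is the sense in which θ measures the degree of randomness in movement.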