    Predicting Sequences of Traversed Nodes in Graphs using Network Models with Multiple Higher Orders

    We propose a novel sequence prediction method for sequential data capturing node traversals in graphs. Our method builds on a statistical modelling framework that combines multiple higher-order network models into a single multi-order model. We develop a technique to fit such multi-order models to empirical sequential data and to select the optimal maximum order. Our framework facilitates both next-element and full sequence prediction given a sequence prefix of any length. We evaluate our model on six empirical data sets containing sequences from website navigation as well as public transport systems. The results show that our method outperforms state-of-the-art algorithms for next-element prediction. We further demonstrate the accuracy of our method during out-of-sample sequence prediction and validate that our method can scale to data sets with millions of sequences. Comment: 18 pages, 5 figures, 2 tables

    Understanding Complex Systems: From Networks to Optimal Higher-Order Models

    To better understand the structure and function of complex systems, researchers often represent direct interactions between components in complex systems with networks, assuming that indirect influence between distant components can be modelled by paths. Such network models assume that actual paths are memoryless. That is, the way a path continues as it passes through a node does not depend on where it came from. Recent studies of data on actual paths in complex systems question this assumption and instead indicate that memory in paths does have considerable impact on central methods in network science. A growing research community working with so-called higher-order network models addresses this issue, seeking to take advantage of information that conventional network representations disregard. Here we summarise the progress in this area and outline remaining challenges calling for more research. Comment: 8 pages, 4 figures
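
    The memoryless assumption described above can be illustrated with a small sketch. It is not taken from any of the listed papers: the function names `fit_transitions` and `predict_next` and the toy paths are hypothetical, and the counting scheme is a plain fixed-order Markov model rather than the multi-order framework mentioned in the first entry. It only shows how a first-order model, which looks at the current node alone, can give a different prediction from a second-order model that also remembers the previous node.

```python
from collections import Counter, defaultdict


def fit_transitions(paths, order):
    """Count how often each node follows each history of length `order`."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            history = tuple(path[i - order:i])
            counts[history][path[i]] += 1
    return counts


def predict_next(counts, history):
    """Return the most frequent successor of `history`, or None if unseen."""
    successors = counts.get(tuple(history))
    if not successors:
        return None
    return successors.most_common(1)[0][0]


# Toy paths (hypothetical data): what happens after C depends on the origin.
paths = [["A", "C", "D"]] * 3 + [["B", "C", "E"]] * 2

first_order = fit_transitions(paths, order=1)   # memoryless view
second_order = fit_transitions(paths, order=2)  # remembers the previous node

print(predict_next(first_order, ["C"]))         # same answer for every path through C
print(predict_next(second_order, ["A", "C"]))   # path that arrived from A
print(predict_next(second_order, ["B", "C"]))   # path that arrived from B
```

    In this toy data the first-order model predicts the same successor for every path through C, while the second-order model separates paths arriving from A and from B.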

    Predicting Off-target Effects in CRISPR-Cas9 System using Graph Convolutional Network

    CRISPR-Cas9 is a powerful genome editing technology that has been widely applied in target gene repair and gene expression regulation. One of the main challenges for the CRISPR-Cas9 system is the occurrence of unexpected cleavage at some sites (off-targets), and predicting them is necessary given their relevance to gene editing research. Very few deep learning models have been developed so far that predict the off-target propensity of single guide RNA (sgRNA) at specific DNA fragments by using artificial feature extraction operations and machine learning techniques. Unfortunately, they implement a convoluted process that is difficult for researchers to understand and implement. This thesis focuses on developing a novel graph-based approach to predict the off-target efficacy of sgRNA in the CRISPR-Cas9 system that is easy for researchers to understand and replicate. This is achieved by creating a graph with sequences as nodes and by performing link prediction using a Graph Convolutional Network (GCN) to predict the presence of links between sgRNA and off-target inducing target DNA sequences. Features for the sequences are extracted from within the sequences

    Locating Community Smells in Software Development Processes Using Higher-Order Network Centralities

    Community smells are negative patterns in software development teams' interactions that impede their ability to successfully create software. Examples are team members working in isolation, lack of communication and collaboration across departments or sub-teams, or areas of the codebase that only a few team members can work on. Current approaches aim to detect community smells by analysing static network representations of software teams' interaction structures. In doing so, they are insufficient to locate community smells within development processes. Extending beyond the capabilities of traditional social network analysis, we show that higher-order network models provide a robust means of revealing such hidden patterns and complex relationships. To this end, we develop a set of centrality measures based on the MOGen higher-order network model and show their effectiveness in predicting influential nodes using five empirical datasets. We then employ these measures for a comprehensive analysis of a product team at the German IT security company genua GmbH, showcasing our method's success in identifying and locating community smells. Specifically, we uncover critical community smells in two areas of the team's development process. Semi-structured interviews with five team members validate our findings: while the team was aware of one community smell and employed measures to address it, it was not aware of the second. This highlights the potential of our approach as a robust tool for identifying and addressing community smells in software development teams. More generally, our work contributes to the social network analysis field with a powerful set of higher-order network centralities that effectively capture community dynamics and indirect relationships. Comment: 48 pages, 19 figures, 4 tables; accepted at Social Network Analysis and Mining (SNAM)

    Disease spread through animal movements: a static and temporal network analysis of pig trade in Germany

    Background: Animal trade plays an important role in the spread of infectious diseases in livestock populations. As a case study, we consider pig trade in Germany, where trade actors (agricultural premises) form a complex network. The central question is how infectious diseases can potentially spread within the system of trade contacts. We address this question by analyzing the underlying network of animal movements. Methodology/Findings: The considered pig trade dataset spans several years and is analyzed with respect to its potential to spread infectious diseases. Focusing on measurements of network-topological properties, we avoid the usage of external parameters, since these properties are independent of specific pathogens. On the contrary, they are of great importance for understanding any general spreading process on this particular network. We analyze the system using different network models, which include varying amounts of information: (i) static network, (ii) network as a time series of uncorrelated snapshots, (iii) temporal network, where causality is explicitly taken into account. Findings: Our approach provides a general framework for a topological-temporal characterization of livestock trade networks. We find that a static network view captures many relevant aspects of the trade system, and premises can be classified into two clearly defined risk classes. Moreover, our results allow for an efficient allocation strategy for intervention measures using centrality measures. Data on trade volume barely alter the results and are therefore of secondary importance. Although a static network description yields useful results, the temporal resolution of data plays an outstanding role for an in-depth understanding of spreading processes. This applies in particular to an accurate calculation of the maximum outbreak size. Comment: main text 33 pages, 17 figures, supporting information 7 pages, 7 figures
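
    The difference between the static view (i) and the temporal view (iii) can be sketched as follows. This is an illustration only, not the authors' method: the function names, the premises `farm_A` to `farm_C`, and the trade days are hypothetical, and the sketch assumes that an animal received on day t can be passed on only on a strictly later day.

```python
def static_reachable(edges, source):
    """Nodes reachable from `source` when trade dates are ignored."""
    adjacency = {}
    for u, v, _day in edges:
        adjacency.setdefault(u, set()).add(v)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {source}


def temporal_reachable(edges, source):
    """Nodes reachable from `source` along paths whose trade days increase.

    Assumption: a premise can forward an infection only on a day strictly
    after the day it received it.
    """
    earliest = {source: float("-inf")}  # earliest known arrival day per node
    for u, v, day in sorted(edges, key=lambda e: e[2]):
        if u in earliest and earliest[u] < day and v not in earliest:
            earliest[v] = day
    return set(earliest) - {source}


# Hypothetical trades: (seller, buyer, day). B sold to C before buying from A.
trades = [("farm_A", "farm_B", 2), ("farm_B", "farm_C", 1)]

print(static_reachable(trades, "farm_A"))    # ignores the order of trades
print(temporal_reachable(trades, "farm_A"))  # respects the order of trades
```

    In this toy example the static view counts `farm_C` as reachable from `farm_A`, whereas the time-respecting view does not, which is one way a static description can misjudge how far an outbreak can spread.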

    Terrain and Behavior Modeling for Projecting Multistage Cyber Attacks

    Contributions from the information fusion community have enabled comprehensible traces of intrusion alerts occurring on computer networks. Traced or tracked cyber attacks are the basis for threat projection in this work. Due to its complexity, we separate threat projection into two subtasks: predicting likely next targets and predicting attacker behavior. A virtual cyber terrain is proposed for identifying likely targets. Overlaying traced alerts onto the cyber terrain reveals exposed vulnerabilities, services, and hosts. Meanwhile, a novel attempt to extract cyber attack behavior is discussed. Leveraging traditional work on prediction and compression, this work identifies behavior patterns from traced cyber attack data. The extracted behavior patterns are expected to further refine projections deduced from the cyber terrain.

    Route Planning in Transportation Networks

    We survey recent advances in algorithms for route planning in transportation networks. For road networks, we show that one can compute driving directions in milliseconds or less even at continental scale. A variety of techniques provide different trade-offs between preprocessing effort, space requirements, and query time. Some algorithms can answer queries in a fraction of a microsecond, while others can deal efficiently with real-time traffic. Journey planning on public transportation systems, although conceptually similar, is a significantly harder problem due to its inherent time-dependent and multicriteria nature. Although exact algorithms are fast enough for interactive queries on metropolitan transit systems, dealing with continent-sized instances requires simplifications or heavy preprocessing. The multimodal route planning problem, which seeks journeys combining schedule-based transportation (buses, trains) with unrestricted modes (walking, driving), is even harder, relying on approximate solutions even for metropolitan inputs. Comment: This is an updated version of the technical report MSR-TR-2014-4, previously published by Microsoft Research. This work was mostly done while the authors Daniel Delling, Andrew Goldberg, and Renato F. Werneck were at Microsoft Research Silicon Valley.