Predicting Sequences of Traversed Nodes in Graphs using Network Models with Multiple Higher Orders
We propose a novel sequence prediction method for sequential data capturing
node traversals in graphs. Our method builds on a statistical modelling
framework that combines multiple higher-order network models into a single
multi-order model. We develop a technique to fit such multi-order models to
empirical sequential data and to select the optimal maximum order. Our
framework facilitates both next-element and full sequence prediction given a
sequence-prefix of any length. We evaluate our model based on six empirical
data sets containing sequences from website navigation as well as public
transport systems. The results show that our method outperforms
state-of-the-art algorithms for next-element prediction. We further demonstrate
the accuracy of our method during out-of-sample sequence prediction and
validate that our method can scale to data sets with millions of sequences.
Comment: 18 pages, 5 figures, 2 tables
Understanding Complex Systems: From Networks to Optimal Higher-Order Models
To better understand the structure and function of complex systems,
researchers often represent direct interactions between components in complex
systems with networks, assuming that indirect influence between distant
components can be modelled by paths. Such network models assume that actual
paths are memoryless. That is, the way a path continues as it passes through a
node does not depend on where it came from. Recent studies of data on actual
paths in complex systems question this assumption and instead indicate that
memory in paths does have considerable impact on central methods in network
science. A growing research community working with so-called higher-order
network models addresses this issue, seeking to take advantage of information
that conventional network representations disregard. Here we summarise the
progress in this area and outline remaining challenges calling for more
research.
Comment: 8 pages, 4 figures
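The memoryless assumption described in this abstract can be illustrated with a small sketch. It is not the authors' method: the function name `next_step_distributions` and the toy paths are assumptions made up for illustration. The sketch estimates where a path continues given only the current node (first order) and given the current node plus the node it came from (second order).

```python
from collections import Counter, defaultdict


def next_step_distributions(paths, order):
    """Estimate P(next node | last `order` nodes) from observed paths."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            history = tuple(path[i - order:i])
            counts[history][path[i]] += 1
    return {
        history: {node: n / sum(c.values()) for node, n in c.items()}
        for history, c in counts.items()
    }


# Hypothetical toy data: every path passes through C, but paths that
# arrive from A continue to D, while paths that arrive from B continue to E.
paths = [("A", "C", "D")] * 4 + [("B", "C", "E")] * 4

first = next_step_distributions(paths, order=1)
second = next_step_distributions(paths, order=2)

print(first[("C",)])       # memoryless view: D and E look equally likely
print(second[("A", "C")])  # with one step of memory: only D
print(second[("B", "C")])  # with one step of memory: only E
```

In this toy data, the first-order view cannot tell the two kinds of path apart at C, whereas the second-order view recovers the dependence on the previous node. That dependence is the kind of information a conventional network representation disregards.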
Predicting Off-target Effects in CRISPR-Cas9 System using Graph Convolutional Network
CRISPR-Cas9 is a powerful genome editing technology that has been widely applied in target
gene repair and gene expression regulation. One of the main challenges for the CRISPR-Cas9
system is the occurrence of unexpected cleavage at some sites (off-targets) and predicting them
is necessary due to its relevance in gene editing research. Very few deep learning models have
been developed so far that predict the off-target propensity of single guide RNA (sgRNA) at
specific DNA fragments by using artificial feature extraction operations and machine learning techniques.
Unfortunately, they rely on a convoluted process that is difficult for researchers to understand and
implement. This thesis focuses on developing a novel graph-based approach to
predict off-target efficacy of sgRNA in the CRISPR-Cas9 system that is easy for researchers to understand and
replicate. This is achieved by creating a graph with sequences as nodes and by
performing link prediction using Graph Convolutional Network (GCN) to predict the presence
of links between sgRNA and off-target inducing target DNA sequences. Features for the sequences
are extracted from within the sequences
Locating Community Smells in Software Development Processes Using Higher-Order Network Centralities
Community smells are negative patterns in software development teams'
interactions that impede their ability to successfully create software.
Examples are team members working in isolation, lack of communication and
collaboration across departments or sub-teams, or areas of the codebase that
only a few team members can work on. Current approaches aim to detect community
smells by analysing static network representations of software teams'
interaction structures. In doing so, they are insufficient to locate community
smells within development processes. Extending beyond the capabilities of
traditional social network analysis, we show that higher-order network models
provide a robust means of revealing such hidden patterns and complex
relationships. To this end, we develop a set of centrality measures based on
the MOGen higher-order network model and show their effectiveness in predicting
influential nodes using five empirical datasets. We then employ these measures
for a comprehensive analysis of a product team at the German IT security
company genua GmbH, showcasing our method's success in identifying and locating
community smells. Specifically, we uncover critical community smells in two
areas of the team's development process. Semi-structured interviews with five
team members validate our findings: while the team was aware of one community
smell and employed measures to address it, it was not aware of the second. This
highlights the potential of our approach as a robust tool for identifying and
addressing community smells in software development teams. More generally, our
work contributes to the social network analysis field with a powerful set of
higher-order network centralities that effectively capture community dynamics
and indirect relationships.
Comment: 48 pages, 19 figures, 4 tables; accepted at Social Network Analysis
and Mining (SNAM)
Disease spread through animal movements: a static and temporal network analysis of pig trade in Germany
Background: Animal trade plays an important role in the spread of infectious
diseases in livestock populations. As a case study, we consider pig trade in
Germany, where trade actors (agricultural premises) form a complex network. The
central question is how infectious diseases can potentially spread within the
system of trade contacts. We address this question by analyzing the underlying
network of animal movements.
Methodology/Findings: The considered pig trade dataset spans several years
and is analyzed with respect to its potential to spread infectious diseases.
Focusing on measurements of network-topological properties, we avoid the usage
of external parameters, since these properties are independent of specific
pathogens. On the contrary, they are of great importance for understanding any
general spreading process on this particular network. We analyze the system
using different network models, which include varying amounts of information:
(i) static network, (ii) network as a time series of uncorrelated snapshots,
(iii) temporal network, where causality is explicitly taken into account.
Findings: Our approach provides a general framework for a
topological-temporal characterization of livestock trade networks. We find that
a static network view captures many relevant aspects of the trade system, and
premises can be classified into two clearly defined risk classes. Moreover, our
results allow for an efficient allocation strategy for intervention measures
using centrality measures. Data on trade volume barely alters the results
and is therefore of secondary importance. Although a static network description
yields useful results, the temporal resolution of data plays an outstanding
role in an in-depth understanding of spreading processes. This applies in
particular to an accurate calculation of the maximum outbreak size.
Comment: main text 33 pages, 17 figures, supporting information 7 pages, 7
figures
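The difference between a static view and a temporal view that respects causality can be sketched as follows. This is a minimal illustration, not the paper's analysis: the function names, the contact list, and the rule that a premise can pass on an infection only on a strictly later day than it received it are all assumptions.

```python
def static_reachable(contacts, seed):
    """Nodes reachable from `seed` when contact times are ignored."""
    adjacency = {}
    for source, target, _day in contacts:
        adjacency.setdefault(source, set()).add(target)
    reached, stack = {seed}, [seed]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in reached:
                reached.add(neighbour)
                stack.append(neighbour)
    return reached


def temporal_reachable(contacts, seed):
    """Nodes reachable from `seed` along time-respecting paths.

    Assumption: the seed is infectious from the start, and any other
    node can transmit only on days strictly after its own infection.
    """
    arrival = {seed: float("-inf")}
    for source, target, day in sorted(contacts, key=lambda c: c[2]):
        if source in arrival and arrival[source] < day:
            # Contacts are processed in time order, so the first
            # recorded arrival is also the earliest one.
            arrival.setdefault(target, day)
    return set(arrival)


# Hypothetical trade contacts: (source premise, target premise, day).
contacts = [("A", "B", 2), ("B", "C", 1), ("C", "D", 3)]

print(sorted(static_reachable(contacts, "A")))
print(sorted(temporal_reachable(contacts, "A")))
```

In this toy example the static view reports that an outbreak starting at A can reach all four premises, while the temporal view reaches only A and B, because the shipment from B to C happened before B could have been infected.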
Terrain and Behavior Modeling for Projecting Multistage Cyber Attacks
Contributions from the information fusion community have enabled comprehensible traces of intrusion alerts occurring on computer networks. Traced or tracked cyber attacks are the basis for threat projection in this work. Because threat projection is complex, we separate it into two subtasks: predicting likely next targets and predicting attacker behavior. A virtual cyber terrain is proposed for identifying likely targets. Overlaying traced alerts onto the cyber terrain reveals exposed vulnerabilities, services, and hosts. Meanwhile, a novel attempt to extract cyber attack behavior is discussed. Leveraging traditional work on prediction and compression, this work identifies behavior patterns from traced cyber attack data. The extracted behavior patterns are expected to further refine projections deduced from the cyber terrain.
Route Planning in Transportation Networks
We survey recent advances in algorithms for route planning in transportation
networks. For road networks, we show that one can compute driving directions in
milliseconds or less even at continental scale. A variety of techniques provide
different trade-offs between preprocessing effort, space requirements, and
query time. Some algorithms can answer queries in a fraction of a microsecond,
while others can deal efficiently with real-time traffic. Journey planning on
public transportation systems, although conceptually similar, is a
significantly harder problem due to its inherent time-dependent and
multicriteria nature. Although exact algorithms are fast enough for interactive
queries on metropolitan transit systems, dealing with continent-sized instances
requires simplifications or heavy preprocessing. The multimodal route planning
problem, which seeks journeys combining schedule-based transportation (buses,
trains) with unrestricted modes (walking, driving), is even harder, relying on
approximate solutions even for metropolitan inputs.
Comment: This is an updated version of the technical report MSR-TR-2014-4,
previously published by Microsoft Research. This work was mostly done while
the authors Daniel Delling, Andrew Goldberg, and Renato F. Werneck were at
Microsoft Research Silicon Valley.