API Requirements for Dynamic Graph Prediction
Given a large-scale, time-evolving, multi-modal, and multi-relational complex network (a.k.a. a large-scale dynamic semantic graph), we want to implement algorithms that discover patterns of activity on the graph and learn predictive models of those discovered patterns. This document outlines the application programming interface (API) requirements for fast prototyping of feature extraction, learning, and prediction algorithms on large dynamic semantic graphs. Since our algorithms must operate on large-scale dynamic semantic graphs, we have chosen to use the graph API developed in the CASC Complex Networks Project. This API is supported on the back end by a semantic graph database (developed by Scott Kohn and his team). The advantages of using this API are that (i) we have full control of its development and (ii) the current API meets almost all of the requirements outlined in this document.
Leveraging Structure to Improve Classification Performance in Sparsely Labeled Networks
We address the problem of classification in a partially labeled network (a.k.a. within-network classification), with an emphasis on tasks in which we have very few labeled instances to start with. Recent work has demonstrated the utility of collective classification (i.e., simultaneous inference over the class labels of related instances) in this general problem setting. However, the performance of collective classification algorithms can be adversely affected by the sparseness of labels in real-world networks. We show that on several real-world data sets, collective classification appears to offer little advantage in general and hurts performance in the worst cases. In this paper, we explore a complementary approach to within-network classification that takes advantage of network structure. Our approach is motivated by the observation that real-world networks often provide a great deal more structural information than attribute information (e.g., class labels). Through experiments on supervised and semi-supervised classifiers of network data, we demonstrate that a small number of structural features can lead to consistent and sometimes dramatic improvements in classification performance. We also examine the relative utility of individual structural features and show that, in many cases, it is a combination of both local and global network structure that is most informative.
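As a rough illustration of the structural-feature idea described in this abstract (not the paper's actual feature set), the sketch below computes two simple local features, degree and local clustering coefficient, on a toy graph; such features could then be fed to any standard classifier alongside whatever sparse attribute data is available.

```python
# Hedged sketch: two simple structural features that could augment
# sparse attribute data in within-network classification. The graph
# and the feature choice are illustrative assumptions.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

def degree(v):
    # number of neighbors of v
    return len(adj[v])

def clustering(v):
    # fraction of v's neighbor pairs that are themselves connected
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
    return 2 * links / (k * (k - 1))

features = {v: (degree(v), clustering(v)) for v in adj}
```

A real pipeline would add global features (e.g., betweenness) and train a classifier on the feature vectors of the few labeled nodes.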
A Collection of Features for Semantic Graphs
Semantic graphs are commonly used to represent data from one or more data sources. Such graphs extend traditional graphs by imposing types on both nodes and links. This type information defines permissible links among specified nodes and can be represented as a graph commonly referred to as an ontology or schema graph. Figure 1 depicts an ontology graph for data from the National Association of Securities Dealers. Each node type and link type may also have a list of attributes. To capture the increased complexity of semantic graphs, concepts derived for standard graphs have to be extended. This document briefly explains features commonly used to characterize graphs and their extensions to semantic graphs. This document is divided into two sections. Section 2 contains the feature descriptions for static graphs. Section 3 extends the features to semantic graphs that vary over time.
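To make the extension from standard to semantic graphs concrete, here is a minimal sketch of one such extension: replacing a node's single degree count with a per-link-type degree. The toy schema and names below are invented for illustration and are not drawn from the NASD data.

```python
from collections import defaultdict

# Toy typed edge list: (source, target, link type). Illustrative only.
edges = [
    ("alice", "acct1", "owns"),
    ("alice", "acct2", "owns"),
    ("bob",   "acct2", "owns"),
    ("alice", "bob",   "calls"),
]

# Typed degree: for each node, count incident links per link type.
# A standard graph would collapse this to one number per node.
typed_degree = defaultdict(lambda: defaultdict(int))
for src, dst, link_type in edges:
    typed_degree[src][link_type] += 1
    typed_degree[dst][link_type] += 1
```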
Link homophily in the application layer and its usage in traffic classification
This paper addresses the following questions. Is there link homophily in application-layer traffic? If so, can it be used to accurately classify traffic in network trace data without relying on payloads or properties at the flow level? Our research shows that the answers to both of these questions are affirmative in real network trace data. Specifically, we define link homophily to be the tendency for flows with common IP hosts to have the same application (P2P, Web, etc.) compared to randomly selected flows. The presence of link homophily in trace data provides us with statistical dependencies between flows that share common IP hosts. We utilize these dependencies to classify application-layer traffic without relying on payloads or properties at the flow level. In particular, we introduce a new statistical relational learning algorithm, called Neighboring Link Classifier with Relaxation Labeling (NLC+RL). Our algorithm has no training phase and does not require features to be constructed. All that it needs to start the classification process is traffic information on a small portion of the initial flows, which we refer to as seeds. In all our traces, NLC+RL achieves above 90% accuracy with less than 5% seed size; it is robust to errors in the seeds and various seed-selection biases; and it is able to accurately classify challenging traffic such as P2P with over 90% precision and recall.
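The flavor of the seed-and-propagate idea can be sketched as follows. This is a deliberately simplified majority-vote label propagation, not the paper's actual NLC+RL algorithm, and the flows, neighborhoods, and labels are invented.

```python
from collections import Counter

# flow -> set of neighboring flows (flows sharing an IP host);
# illustrative assumption, not real trace data
neighbors = {
    "f1": {"f2", "f3"},
    "f2": {"f1", "f3"},
    "f3": {"f1", "f2", "f4"},
    "f4": {"f3"},
}
seeds = {"f1": "P2P"}  # small set of pre-labeled seed flows

labels = dict(seeds)
for _ in range(5):  # a few propagation sweeps
    new = dict(seeds)  # seed labels stay fixed
    for flow in neighbors:
        if flow in seeds:
            continue
        # each labeled neighboring flow casts a vote
        votes = Counter(labels[n] for n in neighbors[flow] if n in labels)
        if votes:
            new[flow] = votes.most_common(1)[0][0]
    labels = new
```

Relaxation labeling in the paper maintains soft label distributions and iterates to convergence; the hard majority vote above is only meant to convey the structure of the process.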
What science can do for democracy – A complexity science approach
Political scientists have conventionally assumed that achieving democracy is a one-way ratchet. Only very recently has the question of ‘democratic backsliding’ attracted any research attention. We argue that democratic instability is best understood with tools from complexity science. The explanatory power of complexity science arises from several features of complex systems. Their relevance in the context of democracy is discussed. Several policy recommendations are offered to help (re)stabilize current systems of representative democracy.
Data Sciences Technology for Homeland Security Information Management and Knowledge Discovery
The Department of Homeland Security (DHS) has vast amounts of data available, but its ultimate value cannot be realized without powerful technologies for knowledge discovery that enable better decision making by analysts. Past evidence has shown that terrorist activities leave detectable footprints, but these footprints generally have not been discovered until the opportunity for maximum benefit has passed. The challenge faced by the DHS is to discover the money transfers, border crossings, and other activities in advance of an attack and to use that information to identify potential threats and vulnerabilities. The data to be analyzed by the DHS come from many sources, ranging from news feeds, to raw sensors, to intelligence reports, and more. The amount of data is staggering; some estimates place the number of entities to be processed at 10^15. The uses for the data are varied as well, including tracking entities over space and time, identifying complex and evolving relationships between entities, and identifying organizational structure, to name a few. Because they are ideal for representing relationship and linkage information, semantic graphs have emerged as a key technology for fusing and organizing DHS data. A semantic graph organizes relational data by using nodes to represent entities and edges to connect related entities. Hidden relationships in the data are then uncovered by examining the structure and properties of the semantic graph.
Link prediction in complex networks: a local naïve Bayes model
Common-neighbor-based methods are simple yet effective for predicting missing links: they assume that two nodes are more likely to be connected if they have more common neighbors. In such methods, each common neighbor of a node pair contributes equally to the connection likelihood. In this Letter, we argue that different common neighbors may play different roles and thus make different contributions, and we propose a local naïve Bayes model accordingly. Extensive experiments were carried out on eight real networks. Compared with common-neighbor-based methods, the proposed method provides more accurate predictions. Finally, we give a detailed case study on the US air transportation network.
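A minimal sketch of the contrast this abstract draws: plain common-neighbor counting versus a variant in which each common neighbor carries a role-dependent weight. The weight used here (a smoothed ratio of connected to disconnected pairs among the neighbor's own neighbors) is an illustrative stand-in for the paper's per-neighbor term, not its exact model.

```python
import math

# Toy undirected graph as an adjacency dict; purely illustrative.
adj = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2, 4},
    4: {1, 3, 5},
    5: {4},
}

def common_neighbors(x, y):
    return adj[x] & adj[y]

def cn_score(x, y):
    # baseline: every common neighbor contributes equally
    return len(common_neighbors(x, y))

def role_weight(w):
    # smoothed ratio of connected to disconnected pairs among w's
    # neighbors; an assumed proxy for how "clustered" w's role is
    nbrs = sorted(adj[w])
    conn = sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:] if v in adj[u])
    total = len(nbrs) * (len(nbrs) - 1) // 2
    return (conn + 1) / (total - conn + 1)

def weighted_score(x, y):
    # common neighbors no longer count equally: each contributes the
    # log of its own role weight
    return sum(math.log(role_weight(w)) for w in common_neighbors(x, y))
```

Candidate node pairs would then be ranked by `weighted_score` instead of the raw count, so a common neighbor embedded in many triangles influences the prediction more than one that merely bridges unrelated nodes.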