46 research outputs found

    Sequence queries on temporal graphs

    Get PDF
    Graphs that evolve over time are called temporal graphs. They can be used to describe and represent real-world networks, including transportation networks, social networks, and communication networks, with higher fidelity and accuracy. However, research is still limited on how to manage large scale temporal graphs and execute queries over these graphs efficiently and effectively. This thesis investigates the problems of temporal graph data management related to node and edge sequence queries. In temporal graphs, nodes and edges can evolve over time. Therefore, sequence queries on nodes and edges can be key components in managing temporal graphs. In this thesis, the node sequence query decomposes into two parts: graph node similarity and subsequence matching. For node similarity, this thesis proposes a modified tree edit distance that is metric and polynomially computable and has a natural, intuitive interpretation. Note that the proposed node similarity works even for inter-graph nodes and therefore can be used for graph de-anonymization, network transfer learning, and cross-network mining, among other tasks. The subsequence matching query proposed in this thesis is a framework that can be adopted to index generic sequence and time-series data, including trajectory data and even DNA sequences for subsequence retrieval. For edge sequence queries, this thesis proposes an efficient storage and optimized indexing technique that allows for efficient retrieval of temporal subgraphs that satisfy certain temporal predicates. For this problem, this thesis develops a lightweight data management engine prototype that can support time-sensitive temporal graph analytics efficiently even on a single PC

    Security and Privacy for Big Data: A Systematic Literature Review

    Get PDF
    Big data is currently a hot research topic, with four million hits on Google scholar in October 2016. One reason for the popularity of big data research is the knowledge that can be extracted from analyzing these large data sets. However, data can contain sensitive information, and data must therefore be sufficiently protected as it is stored and processed. Furthermore, it might also be required to provide meaningful, proven, privacy guarantees if the data can be linked to individuals. To the best of our knowledge, there exists no systematic overview of the overlap between big data and the area of security and privacy. Consequently, this review aims to explore security and privacy research within big data, by outlining and providing structure to what research currently exists. Moreover, we investigate which papers connect security and privacy with big data, and which categories these papers cover. Ultimately, is security and privacy research for big data different from the rest of the research within the security and privacy domain? To answer these questions, we perform a systematic literature review (SLR), where we collect recent papers from top conferences, and categorize them in order to provide an overview of the security and privacy topics present within the context of big data. Within each category we also present a qualitative analysis of papers representative for that specific area. Furthermore, we explore and visualize the relationship between the categories. Thus, the objective of this review is to provide a snapshot of the current state of security and privacy research for big data, and to discover where further research is required

    Transforming Graph Representations for Statistical Relational Learning

    Full text link
    Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed

    Recommender Systems

    Get PDF
    The ongoing rapid expansion of the Internet greatly increases the necessity of effective recommender systems for filtering the abundant information. Extensive research for recommender systems is conducted by a broad range of communities including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theoretical and practical achievements, unification and comparison of different approaches are lacking, which impedes further advances. In this article, we review recent developments in recommender systems and discuss the major challenges. We compare and evaluate available algorithms and examine their roles in the future developments. In addition to algorithms, physical aspects are described to illustrate macroscopic behavior of recommender systems. Potential impacts and future directions are discussed. We emphasize that recommendation has a great scientific depth and combines diverse research fields which makes it of interests for physicists as well as interdisciplinary researchers.Comment: 97 pages, 20 figures (To appear in Physics Reports

    On designing large, secure and resilient networked systems

    Get PDF
    2019 Summer.Includes bibliographical references.Defending large networked systems against rapidly evolving cyber attacks is challenging. This is because of several factors. First, cyber defenders are always fighting an asymmetric warfare: While the attacker needs to find just a single security vulnerability that is unprotected to launch an attack, the defender needs to identify and protect against all possible avenues of attacks to the system. Various types of cost factors, such as, but not limited to, costs related to identifying and installing defenses, costs related to security management, costs related to manpower training and development, costs related to system availability, etc., make this asymmetric warfare even challenging. Second, newer and newer cyber threats are always emerging - the so called zero-day attacks. It is not possible for a cyber defender to defend against an attack for which defenses are yet unknown. In this work, we investigate the problem of designing large and complex networks that are secure and resilient. There are two specific aspects of the problem that we look into. First is the problem of detecting anomalous activities in the network. While this problem has been variously investigated, we address the problem differently. We posit that anomalous activities are the result of mal-actors interacting with non mal-actors, and such anomalous activities are reflected in changes to the topological structure (in a mathematical sense) of the network. We formulate this problem as that of Sybil detection in networks. For our experimentation and hypothesis testing we instantiate the problem as that of Sybil detection in on-line social networks (OSNs). Sybil attacks involve one or more attackers creating and introducing several mal-actors (fake identities in on-line social networks), called Sybils, into a complex network. Depending on the nature of the network system, the goal of the mal-actors can be to unlawfully access data, to forge another user's identity and activity, or to influence and disrupt the normal behavior of the system. The second aspect that we look into is that of building resiliency in a large network that consists of several machines that collectively provide a single service to the outside world. Such networks are particularly vulnerable to Sybil attacks. While our Sybil detection algorithms achieve very high levels of accuracy, they cannot guarantee that all Sybils will be detected. Thus, to protect against such "residual" Sybils (that is, those that remain potentially undetected and continue to attack the network services), we propose a novel Moving Target Defense (MTD) paradigm to build resilient networks. The core idea is that for large enterprise level networks, the survivability of the network's mission is more important than the security of one or more of the servers. We develop protocols to re-locate services from server to server in a random way such that before an attacker has an opportunity to target a specific server and disrupt it’s services, the services will migrate to another non-malicious server. The continuity of the service of the large network is thus sustained. We evaluate the effectiveness of our proposed protocols using theoretical analysis, simulations, and experimentation. For the Sybil detection problem we use both synthetic and real-world data sets. We evaluate the algorithms for accuracy of Sybil detection. For the moving target defense protocols we implement a proof-of-concept in the context of access control as a service, and run several large scale simulations. The proof-of- concept demonstrates the effectiveness of the MTD paradigm. We evaluate the computation and communication complexity of the protocols as we scale up to larger and larger networks

    Clustering sequence graphs

    Get PDF
    In application domains ranging from social networks to e-commerce, it is important to cluster users with respect to both their relationships (e.g., friendship or trust) and their actions (e.g., visited locations or rated products). Motivated by these applications, we introduce here the task of clustering the nodes of a sequence graph, i.e., a graph whose nodes are labeled with strings (e.g., sequences of users’ visited locations or rated products). Both string clustering algorithms and graph clustering algorithms are inappropriate to deal with this task, as they do not consider the structure of strings and graph simultaneously. Moreover, attributed graph clustering algorithms generally construct poor solutions because they need to represent a string as a vector of attributes, which inevitably loses information and may harm clustering quality. We thus introduce the problem of clustering a sequence graph. We first propose two pairwise distance measures for sequence graphs, one based on edit distance and shortest path distance and another one based on SimRank. We then formalize the problem under each measure, showing also that it is NP-hard. In addition, we design a polynomial-time 2-approximation algorithm, as well as a heuristic for the problem. Experiments using real datasets and a case study demonstrate the effectiveness and efficiency of our methods

    Reconstructing networks

    Get PDF
    Complex networks datasets often come with the problem of missing information: interactions data that have not been measured or discovered, may be affected by errors, or are simply hidden because of privacy issues. This Element provides an overview of the ideas, methods and techniques to deal with this problem and that together define the field of network reconstruction. Given the extent of the subject, we shall focus on the inference methods rooted in statistical physics and information theory. The discussion will be organized according to the different scales of the reconstruction task, that is, whether the goal is to reconstruct the macroscopic structure of the network, to infer its mesoscale properties, or to predict the individual microscopic connections.Comment: 107 pages, 25 figure

    Security and Privacy for Big Data: A Systematic Literature Review

    Get PDF
    Abstract-Big data is currently a hot research topic, with four million hits on Google scholar in October 2016. One reason for the popularity of big data research is the knowledge that can be extracted from analyzing these large data sets. However, data can contain sensitive information, and data must therefore be sufficiently protected as it is stored and processed. Furthermore, it might also be required to provide meaningful, proven, privacy guarantees if the data can be linked to individuals. To the best of our knowledge, there exists no systematic overview of the overlap between big data and the area of security and privacy. Consequently, this review aims to explore security and privacy research within big data, by outlining and providing structure to what research currently exists. Moreover, we investigate which papers connect security and privacy with big data, and which categories these papers cover. Ultimately, is security and privacy research for big data different from the rest of the research within the security and privacy domain? To answer these questions, we perform a systematic literature review (SLR), where we collect recent papers from top conferences, and categorize them in order to provide an overview of the security and privacy topics present within the context of big data. Within each category we also present a qualitative analysis of papers representative for that specific area. Furthermore, we explore and visualize the relationship between the categories. Thus, the objective of this review is to provide a snapshot of the current state of security and privacy research for big data, and to discover where further research is required
    corecore