11 research outputs found

    Recent Advances in Fully Dynamic Graph Algorithms

    Full text link
    In recent years, significant advances have been made in the design and analysis of fully dynamic algorithms. However, these theoretical results have received very little attention from the practical perspective. Few of the algorithms are implemented and tested on real datasets, and their practical potential is far from understood. Here, we present a quick reference guide to recent engineering and theory results in the area of fully dynamic graph algorithms

    JGraphT -- A Java library for graph data structures and algorithms

    Full text link
    Mathematical software and graph-theoretical algorithmic packages to efficiently model, analyze and query graphs are crucial in an era where large-scale spatial, societal and economic network data are abundantly available. One such package is JGraphT, a programming library which contains very efficient and generic graph data-structures along with a large collection of state-of-the-art algorithms. The library is written in Java with stability, interoperability and performance in mind. A distinctive feature of this library is the ability to model vertices and edges as arbitrary objects, thereby permitting natural representations of many common networks including transportation, social and biological networks. Besides classic graph algorithms such as shortest-paths and spanning-tree algorithms, the library contains numerous advanced algorithms: graph and subgraph isomorphism; matching and flow problems; approximation algorithms for NP-hard problems such as independent set and TSP; and several more exotic algorithms such as Berge graph detection. Due to its versatility and generic design, JGraphT is currently used in large-scale commercial, non-commercial and academic research projects. In this work we describe in detail the design and underlying structure of the library, and discuss its most important features and algorithms. A computational study is conducted to evaluate the performance of JGraphT versus a number of similar libraries. Experiments on a large number of graphs over a variety of popular algorithms show that JGraphT is highly competitive with other established libraries such as NetworkX or the BGL.Comment: Major Revisio

    Exploring Multi-Level Parallelism For Graph-Based Applications Via Algorithm And System Co-Design

    Get PDF
    Graph processing is at the heart of many modern applications where graphs are used as the basic data structure to represent the entities of interest and the relationships between them. Improving the performance of graph-based applications, especially using parallelism techniques, has drawn significant interest both in academia and industry. On the one hand, modern CPU architectures are able to provide massive computational power by using sophisticated memory hierarchy and multi-level parallelism, including thread-level parallelism, data-level parallelism, etc. On the other hand, graph processing workloads are notoriously challenging for achieving high performance due to their irregular computation pattern and unpredictable control flow. Therefore, how to accelerate the performance of graph-based applications using parallelism is still an open question. This dissertation focuses on providing high performance for graph-based applications. To take full advantage of multi-level parallelism resources provided by CPUs, this dissertation studies the characteristics of graph-based applications and matches their parallel solutions with the underlying hardware via algorithm and system co-design. This dissertation divides graph-based applications into three categories: typical graph algorithms, sequential graph-based applications, and applications with graph-based solutions. The first category comprises typical graph algorithms with available parallel solutions. This dissertation proposes GraphPhi as a new approach to graph processing on emerging Intel Xeon Phi-like architectures. The second category includes specialized graph applications without nontrivial parallel solutions. This dissertation studies a state-of-the-art 2-hop labeling approach named Pruned Landmark Labeling (PLL). This dissertation proposes Batched Vertex-Centric PLL (BVC-PLL), which breaks PLL\u27s inherent dependencies and parallelizes it in a scalable way. The third category includes applications that rely on graph-based solutions. This dissertation studies the sequential search algorithm for the graph-based indexing methods used for the Approximate Nearest Neighbor Search (ANNS) problem. This dissertation proposes Speed-ANN, a parallel similarity search algorithm that reveals hidden intra-query parallelism to accelerate the search speed while fulfilling the high accuracy requirement. Moreover, this dissertation further explores the optimization opportunities for computational graph-based deep neural network inference running on tiny devices, specifically microcontrollers (MCUs). Altogether, this dissertation studies graph-based applications and improves their performance by providing solutions of multi-level parallelism via algorithm and system co-design to match them with the underlying multi-core CPU architectures

    On The Application Of Computational Modeling To Complex Food Systems Issues

    Get PDF
    Transdisciplinary food systems research aims to merge insights from multiple fields, often revealing confounding, complex interactions. Computational modeling offers a means to discover patterns and formulate novel solutions to such systems-level problems. The best models serve as hubs—or boundary objects—which ground and unify a collaborative, iterative, and transdisciplinary process of stakeholder engagement. This dissertation demonstrates the application of agent-based modeling, network analytics, and evolutionary computational optimization to the pressing food systems problem areas of livestock epidemiology and global food security. It is comprised of a methodological introduction, an executive summary, three journal-article formatted chapters, and an overarching discussion section. Chapter One employs an agent-based computer model (RUSH-PNBM v.1.1) developed to study the potential impact of the trend toward increased producer specialization on resilience to catastrophic epidemics within livestock production chains. In each run, an infection is introduced and may spread according to probabilities associated with the various modes of contact between hog producer, feed mill, and slaughter plant agents. Experimental data reveal that more-specialized systems are vulnerable to outbreaks at lower spatial densities, have more abrupt percolation transitions, and are characterized by less-predictable outcomes; suggesting that reworking network structures may represent a viable means to increase biosecurity. Chapter Two uses a calibrated, spatially-explicit version of RUSH-PNBM (v.1.2) to model the hog production chains within three U.S. states. Key metrics are calculated after each run, some of which pertain to overall network structures, while others describe each actor’s positionality within the network. A genetic programming algorithm is then employed to search for mathematical relationships between multiple individual indicators that effectively predict each node’s vulnerability. This “meta-metric” approach could be applied to aid livestock epidemiologists in the targeting of biosecurity interventions and may also be useful to study a wide range of complex network phenomena. Chapter Three focuses on food insecurity resulting from the projected gap between global food supply and demand over the coming decades. While no single solution has been identified, scholars suggest that investments into multiple interventions may stack together to solve the problem. However, formulating an effective plan of action requires knowledge about the level of change resulting from a given investment into each wedge, the time before that effect unfolds, the expected baseline change, and the maximum possible level of change. This chapter details an evolutionary-computational algorithm to optimize investment schedules according to the twin goals of maximizing global food security and minimizing cost. Future work will involve parameterizing the model through an expert informant advisory process to develop the existing framework into a practicable food policy decision-support tool

    Graph Processing in Main-Memory Column Stores

    Get PDF
    Evermore, novel and traditional business applications leverage the advantages of a graph data model, such as the offered schema flexibility and an explicit representation of relationships between entities. As a consequence, companies are confronted with the challenge of storing, manipulating, and querying terabytes of graph data for enterprise-critical applications. Although these business applications operate on graph-structured data, they still require direct access to the relational data and typically rely on an RDBMS to keep a single source of truth and access. Existing solutions performing graph operations on business-critical data either use a combination of SQL and application logic or employ a graph data management system. For the first approach, relying solely on SQL results in poor execution performance caused by the functional mismatch between typical graph operations and the relational algebra. To the worse, graph algorithms expose a tremendous variety in structure and functionality caused by their often domain-specific implementations and therefore can be hardly integrated into a database management system other than with custom coding. Since the majority of these enterprise-critical applications exclusively run on relational DBMSs, employing a specialized system for storing and processing graph data is typically not sensible. Besides the maintenance overhead for keeping the systems in sync, combining graph and relational operations is hard to realize as it requires data transfer across system boundaries. A basic ingredient of graph queries and algorithms are traversal operations and are a fundamental component of any database management system that aims at storing, manipulating, and querying graph data. Well-established graph traversal algorithms are standalone implementations relying on optimized data structures. The integration of graph traversals as an operator into a database management system requires a tight integration into the existing database environment and a development of new components, such as a graph topology-aware optimizer and accompanying graph statistics, graph-specific secondary index structures to speedup traversals, and an accompanying graph query language. In this thesis, we introduce and describe GRAPHITE, a hybrid graph-relational data management system. GRAPHITE is a performance-oriented graph data management system as part of an RDBMS allowing to seamlessly combine processing of graph data with relational data in the same system. We propose a columnar storage representation for graph data to leverage the already existing and mature data management and query processing infrastructure of relational database management systems. At the core of GRAPHITE we propose an execution engine solely based on set operations and graph traversals. Our design is driven by the observation that different graph topologies expose different algorithmic requirements to the design of a graph traversal operator. We derive two graph traversal implementations targeting the most common graph topologies and demonstrate how graph-specific statistics can be leveraged to select the optimal physical traversal operator. To accelerate graph traversals, we devise a set of graph-specific, updateable secondary index structures to improve the performance of vertex neighborhood expansion. Finally, we introduce a domain-specific language with an intuitive programming model to extend graph traversals with custom application logic at runtime. We use the LLVM compiler framework to generate efficient code that tightly integrates the user-specified application logic with our highly optimized built-in graph traversal operators. Our experimental evaluation shows that GRAPHITE can outperform native graph management systems by several orders of magnitude while providing all the features of an RDBMS, such as transaction support, backup and recovery, security and user management, effectively providing a promising alternative to specialized graph management systems that lack many of these features and require expensive data replication and maintenance processes

    On the design of efficient caching systems

    Get PDF
    Content distribution is currently the prevalent Internet use case, accounting for the majority of global Internet traffic and growing exponentially. There is general consensus that the most effective method to deal with the large amount of content demand is through the deployment of massively distributed caching infrastructures as the means to localise content delivery traffic. Solutions based on caching have been already widely deployed through Content Delivery Networks. Ubiquitous caching is also a fundamental aspect of the emerging Information-Centric Networking paradigm which aims to rethink the current Internet architecture for long term evolution. Distributed content caching systems are expected to grow substantially in the future, in terms of both footprint and traffic carried and, as such, will become substantially more complex and costly. This thesis addresses the problem of designing scalable and cost-effective distributed caching systems that will be able to efficiently support the expected massive growth of content traffic and makes three distinct contributions. First, it produces an extensive theoretical characterisation of sharding, which is a widely used technique to allocate data items to resources of a distributed system according to a hash function. Based on the findings unveiled by this analysis, two systems are designed contributing to the abovementioned objective. The first is a framework and related algorithms for enabling efficient load-balanced content caching. This solution provides qualitative advantages over previously proposed solutions, such as ease of modelling and availability of knobs to fine-tune performance, as well as quantitative advantages, such as 2x increase in cache hit ratio and 19-33% reduction in load imbalance while maintaining comparable latency to other approaches. The second is the design and implementation of a caching node enabling 20 Gbps speeds based on inexpensive commodity hardware. We believe these contributions advance significantly the state of the art in distributed caching systems

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    Get PDF
    LIPIcs, Volume 244, ESA 2022, Complete Volum
    corecore