Synthesis, Interdiction, and Protection of Layered Networks
This research developed the foundation, theory, and framework for a set of analysis techniques to assist decision makers in analyzing questions regarding the synthesis, interdiction, and protection of infrastructure networks. This includes extension of traditional network interdiction to directly model nodal interdiction; new techniques to identify potential targets in social networks based on extensions of shortest path network interdiction; extension of traditional network interdiction to include layered network formulations; and development of models/techniques to design robust layered networks while considering trade-offs with cost. These approaches identify the maximum protection/disruption possible across layered networks with limited resources, find the most robust layered network design possible given the budget limitations while ensuring that the demands are met, include traditional social network analysis, and incorporate new techniques to model the interdiction of nodes and edges throughout the formulations. In addition, the importance and effects of multiple optimal solutions for these (and similar) models are investigated. All the models developed are demonstrated on notional examples and were tested on a range of sample problem sets.
SLACID - Sparse Linear Algebra in a Column-Oriented In-Memory Database System
Scientific computations and analytical business applications are often based on linear algebra operations on large, sparse matrices. With the hardware shift of primary storage from disc into memory, it is now feasible to execute linear algebra queries directly in the database engine. This paper presents and compares different approaches to storing sparse matrices in an in-memory column-oriented database system. We show that a system layout derived from the compressed sparse row representation integrates well with a columnar database design and that the resulting architecture is moreover amenable to a wide range of non-numerical use cases when dictionary encoding is used. Dynamic matrix manipulation operations, like online insertion or deletion of elements, are not covered by most linear algebra frameworks. Therefore, we present a hybrid architecture that consists of a read-optimized main and a write-optimized delta structure, and evaluate the performance for dynamic sparse matrix workloads by applying workflows from nuclear science and network graphs.
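To make the compressed sparse row (CSR) idea and the main/delta split concrete, here is a minimal Python sketch. It is not the SLACID implementation: the names `CsrMatrix`, `csr_from_triples`, `spmv`, and `merge_delta` are hypothetical, the three CSR arrays merely stand in for database columns, and the delta is assumed to be a plain dictionary whose `None` values mark deletions.

```python
from dataclasses import dataclass


@dataclass
class CsrMatrix:
    """Read-optimized 'main' part: three flat arrays, one per column."""
    n_rows: int
    n_cols: int
    row_ptr: list[int]   # row r occupies positions row_ptr[r]..row_ptr[r+1]-1
    col_idx: list[int]   # column index of each stored element
    values: list[float]  # value of each stored element


def csr_from_triples(n_rows, n_cols, triples):
    """Build CSR from (row, col, value) triples; assumes no duplicate cells."""
    row_ptr = [0] * (n_rows + 1)
    col_idx, values = [], []
    for r, c, v in sorted(triples):
        row_ptr[r + 1] += 1          # count elements per row
        col_idx.append(c)
        values.append(v)
    for r in range(n_rows):          # prefix sum turns counts into offsets
        row_ptr[r + 1] += row_ptr[r]
    return CsrMatrix(n_rows, n_cols, row_ptr, col_idx, values)


def spmv(m, x):
    """Sparse matrix-vector product y = m @ x over the CSR arrays."""
    y = [0.0] * m.n_rows
    for r in range(m.n_rows):
        for k in range(m.row_ptr[r], m.row_ptr[r + 1]):
            y[r] += m.values[k] * x[m.col_idx[k]]
    return y


def merge_delta(main, delta):
    """Fold a write-optimized delta {(row, col): value or None} into main.

    A value of None is treated as a deletion (an assumption of this sketch).
    """
    cells = {}
    for r in range(main.n_rows):
        for k in range(main.row_ptr[r], main.row_ptr[r + 1]):
            cells[(r, main.col_idx[k])] = main.values[k]
    for key, v in delta.items():
        if v is None:
            cells.pop(key, None)
        else:
            cells[key] = v
    triples = [(r, c, v) for (r, c), v in cells.items()]
    return csr_from_triples(main.n_rows, main.n_cols, triples)


main = csr_from_triples(3, 3, [(0, 0, 1.0), (0, 2, 2.0), (2, 1, 3.0)])
merged = merge_delta(main, {(1, 1): 5.0, (0, 2): None})
```

In this sketch, writes only touch the small dictionary, while reads such as `spmv` run over the compact arrays; the occasional merge rebuilds the arrays in one pass.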
On Improving Distributed Pregel-like Graph Processing Systems
The considerable interest in distributed systems that can execute algorithms to process large graphs has led to the creation of many graph processing systems. However, existing systems suffer from two major issues: (1) poor performance due to frequent global synchronization barriers and limited scalability; and (2) lack of support for graph algorithms that require serializability, the guarantee that parallel executions of an algorithm produce the same results as some serial execution of that algorithm.
Many graph processing systems use the bulk synchronous parallel (BSP) model, which allows graph algorithms to be easily implemented and reasoned about. However, BSP suffers from poor performance due to stale messages and frequent global synchronization barriers. While asynchronous models have been proposed to alleviate these overheads, existing systems that implement such models have limited scalability or retain frequent global barriers and do not always support graph mutations or algorithms with multiple computation phases. We propose barrierless asynchronous parallel (BAP), a new computation model that overcomes the limitations of existing asynchronous models by reducing both message staleness and global synchronization while retaining support for graph mutations and algorithms with multiple computation phases. We present GiraphUC, which implements our BAP model in the open source distributed graph processing system Giraph, and evaluate it at scale to demonstrate that BAP provides efficient and transparent asynchronous execution of algorithms that are programmed synchronously.
Secondly, very few systems provide serializability, despite the fact that many graph algorithms require it for accuracy, correctness, or termination. To address this deficiency, we provide a complete solution that can be implemented on top of existing graph processing systems to provide serializability. Our solution formalizes the notion of serializability and the conditions under which it can be provided for graph processing systems. We propose a partition-based synchronization technique that enforces these conditions efficiently to provide serializability. We implement this technique into Giraph and GiraphUC to demonstrate that it is configurable, transparent to algorithm developers, and more performant than existing techniques.
Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System
Linear algebra operations appear in nearly every application in advanced analytics, machine learning, and various scientific domains. To this day, many data analysts and scientists tend to use statistics software packages or hand-crafted solutions for their analysis. In the era of data deluge, however, external statistics packages and custom analysis programs that often run on single workstations are unable to keep up with the vast increase in data volume and size. In particular, there is an increasing demand among scientists for large-scale data manipulation, orchestration, and advanced data management capabilities. These are among the key features of a mature relational database management system (DBMS). With the rise of main memory database systems, it has now become feasible to also consider applications that build on linear algebra.
This thesis presents a deep integration of linear algebra functionality into an in-memory column-oriented database system. In particular, this work shows that it has become feasible to execute linear algebra queries on large data sets directly in a DBMS-integrated engine (LAPEG), without the need to transfer data and without being restricted by hard disc latencies. From various application examples that are cited in this work, we deduce a number of requirements that are relevant for a database system that includes linear algebra functionality. Besides the deep integration of matrices and numerical algorithms, these include optimization of expressions, transparent matrix handling, scalability and data-parallelism, and data manipulation capabilities. These requirements are addressed by our linear algebra engine. In particular, the core contributions of this thesis are as follows. Firstly, we show that the columnar storage layer of an in-memory DBMS allows easy adoption of efficient sparse matrix data types and algorithms. Furthermore, we show that the execution of linear algebra expressions benefits significantly from different techniques that are inspired by database technology. In a novel way, we implemented several of these optimization strategies in LAPEG’s optimizer (SpMachO), which uses an advanced density estimation method (SpProdest) to predict the matrix density of intermediate results. Moreover, we present an adaptive matrix data type, AT Matrix, so that scientists no longer need to select appropriate matrix representations themselves. The tiled substructure of AT Matrix is exploited by our matrix multiplication to saturate the different sockets of a multicore main-memory platform, reaching a speed-up of up to 6x compared to alternative approaches. Finally, a major part of this thesis is devoted to the topic of data manipulation, where we propose a matrix manipulation API and present different mutable matrix types to enable fast insertions and deletions.
We finally conclude that our linear algebra engine is well-suited to process dynamic, large matrix workloads in an optimized way. In particular, the DBMS-integrated LAPEG fills the linear algebra gap, and makes columnar in-memory DBMS attractive as an efficient, scalable ad hoc analysis platform for scientists.
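As an illustration of why predicting the density of intermediate results is possible at all, the following Python sketch implements a simple textbook estimate. It is not SpProdest, whose details are not given in the abstract. The function name `estimate_product_density` is hypothetical, and the estimate assumes that non-zero elements are spread uniformly and independently over both operands and that no numerical cancellation occurs. Under those assumptions, an element of C = A·B is zero only if all `inner_dim` partial products are zero, which gives a density of 1 - (1 - rho_a * rho_b) ** inner_dim.

```python
def estimate_product_density(rho_a: float, rho_b: float, inner_dim: int) -> float:
    """Estimate the density of C = A @ B from the operand densities.

    Assumes uniformly and independently distributed non-zeros and ignores
    cancellation; real matrices with skewed structure can deviate strongly.
    """
    if not (0.0 <= rho_a <= 1.0 and 0.0 <= rho_b <= 1.0):
        raise ValueError("densities must lie in [0, 1]")
    if inner_dim < 0:
        raise ValueError("inner_dim must be non-negative")
    # Probability that at least one of the inner_dim partial products is non-zero.
    return 1.0 - (1.0 - rho_a * rho_b) ** inner_dim


# Two operands at 1% density with an inner dimension of 10,000.
estimated = estimate_product_density(0.01, 0.01, 10_000)
```

An optimizer could use such an estimate to decide, for example, whether an intermediate result is better stored in a sparse or a dense representation; which thresholds make sense is left open here.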
Investigating the performance of transport infrastructure using real-time data and a scalable multi-modal agent based model
The idea of including more information in more dynamic and iterative ways is central to the promise of the big data paradigm. The hope is that via new data sources, such as remote sensors and mobile phones, the reliance on heavily simplified generalised functions for model inputs will be erased. This trade of idealised data for actual empirical data will be matched with dynamic models which consider complexity at a fundamental level, inherently mirroring the systems they are attempting to replicate. Cloud computing brings the possibility of doing all of this in less time than the simplified macro models of the past, thus enabling better answers at critical decision-making junctures.
This research was task driven: the question of high speed rail versus aviation led to an investigation into the simplifications and assumptions that back up many of the commonly held beliefs on the sustainability of different modes of transport. The literature ultimately highlighted the need for context-specific information: actual load factors, actual journey times considering traffic/engineering works, and so on.
Thus, rather than being explicitly an exercise in answering a specific question, a specific question was used to drive the development of a tool which may hold promise for answering a range of transportation related questions. The original contributions of this work are, firstly, the use of real-time data sources to quantify temporally and spatially dynamic network performance metrics (e.g. journey times on different transport models) and, secondly, the organisation of these data sources in a framework which can handle the volume and type of the data and arrange it so that it is useful for the dynamic agent based modelling of future scenarios.
EPSRC I Case Studentship with Ove Arup & Partner
Data preparation and visualization for the SWAN refraction model
Includes bibliographical references. This research and development project seeks to provide a usable interactive graphical interface to an environment that otherwise involves primarily numerical data in a static, non-interactive format. Tools will be developed that enable users to prepare numerical data required for the SWAN refraction model and to visualize the results in an interactive three-dimensional graphical context. SWAN (acronym for Simulating Waves Near shore) is a numerical wave model that is used to predict wave parameters according to a given set of conditions. The design of the 2-D and 3-D graphical interfaces and their impact on the system will be discussed.