Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature reports results on much smaller graphs, and the studies
that do use the Hyperlink graph rely on distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly available as the Graph-Based Benchmark Suite
(GBBS). Comment: This is the full version of the paper appearing in the ACM
Symposium on Parallelism in Algorithms and Architectures (SPAA), 201
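To give a flavor of the frontier-based style of parallel graph algorithm such a suite implements, the sketch below is a minimal level-synchronous parallel BFS in Java. This is illustrative only and is not GBBS code; the class and method names are invented for this sketch.

```java
import java.util.*;
import java.util.stream.*;

// Minimal frontier-based parallel BFS sketch (not GBBS code).
// adj[v] lists the neighbors of vertex v; returns hop distances from source.
public class ParallelBfs {
    static int[] bfsDistances(int[][] adj, int source) {
        int n = adj.length;
        int[] dist = new int[n];
        Arrays.fill(dist, -1);          // -1 marks "unvisited"
        dist[source] = 0;
        List<Integer> frontier = List.of(source);
        int level = 0;
        while (!frontier.isEmpty()) {
            final int next = level + 1;
            // Expand the whole frontier in parallel; the filter only reads
            // dist, and the marking writes below happen sequentially, so
            // the sketch stays race-free.
            List<Integer> nextFrontier = frontier.parallelStream()
                .flatMap(v -> Arrays.stream(adj[v]).boxed())
                .distinct()
                .filter(u -> dist[u] == -1)
                .collect(Collectors.toList());
            for (int u : nextFrontier) dist[u] = next;
            frontier = nextFrontier;
            level = next;
        }
        return dist;
    }
}
```

Real work-efficient implementations avoid materializing boxed frontiers and use atomic compare-and-swap to mark vertices; the structure (expand, deduplicate, filter, advance) is the same.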
Formal Derivation of Concurrent Garbage Collectors
Concurrent garbage collectors are notoriously difficult to implement
correctly. Previous approaches to the issue of producing correct collectors
have mainly been based on posit-and-prove verification or on the application of
domain-specific templates and transformations. We show how to derive the upper
reaches of a family of concurrent garbage collectors by refinement from a
formal specification, emphasizing the application of domain-independent design
theories and transformations. A key contribution is an extension to the
classical lattice-theoretic fixpoint theorems to account for the dynamics of
concurrent mutation and collection. Comment: 38 pages, 21 figures. The short
version of this paper appeared in the Proceedings of MPC 201
Making non-volatile memory programmable
Byte-addressable, non-volatile memory (NVM) is emerging as a revolutionary memory technology that provides persistence, near-DRAM performance, and scalable capacity. By using NVM, applications can directly create and manipulate durable data in place without the need for serialization out to SSDs.
Ideally, through NVM, persistent applications will be able to maintain crash consistency at minimal cost. However, before this is possible, improvements must be made at both the hardware and software levels to support persistent applications. Currently, software support for NVM places too great a burden on the developer, introducing many opportunities for mistakes while also being too rigid for compiler optimizations. Likewise, at the hardware level, too little information is passed to the processor about the instruction-level ordering requirements of persistent applications; this forces the hardware to require the use of coarse fences, which significantly slow down execution.
To help realize the promise of NVM, this thesis proposes new software and hardware support that together make NVM programmable. On the software side, this thesis proposes a new NVM programming model which relieves the programmer of much of the accounting work in persistent applications, instead relying on the runtime to perform these error-prone tasks. Specifically, within the proposed model, the user only needs to provide minimal markings to identify the persistent data set and to ensure data is updated in a crash-consistent manner.
Given this new NVM programming model, this thesis next presents an implementation of the model in Java. I call my implementation AutoPersist and build my support into the Maxine research Java Virtual Machine (JVM). In this thesis I describe how the JVM can be changed to support the proposed NVM programming model, including adding new Java libraries, adding new JVM runtime features, and augmenting the behavior of existing Java bytecodes.
In addition to being easy to use, another advantage of the proposed model is that it is amenable to compiler optimizations. In this thesis I highlight two profile-guided optimizations: eagerly allocating objects directly into NVM and speculatively pruning control flow to include only expected-to-be-taken paths. I also describe how to apply these optimizations to AutoPersist and show that they have a substantial performance impact.
While designing AutoPersist, I often observed that dependency information known by the compiler cannot be passed down to the underlying hardware; instead, the compiler must insert coarse-grain fences to enforce the needed dependencies. This is because current instruction set architectures (ISAs) cannot describe arbitrary instruction-level execution ordering constraints. To address this limitation, I introduce the Execution Dependency Extension (EDE) and describe how EDE can be added to an existing ISA as well as implemented in current processor pipelines.
Overall, emerging NVM technologies can deliver programmer-friendly high performance. However, for this to happen, both software and hardware improvements are necessary. This thesis takes steps to close the current software and hardware gaps: I propose new software support to assist in the development of persistent applications and also introduce new instructions which allow arbitrary instruction-level dependencies to be conveyed to and enforced by the underlying hardware. With these improvements, the dream of programmable high-performance NVM is, hopefully, one step closer to being realized.
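The crash-consistency problem this work targets can be illustrated with a toy undo log that makes a multi-field update failure-atomic. This is a plain-Java simulation, not AutoPersist's implementation: "persistent memory" here is just a map, real NVM support additionally needs cache-line flushes and ordering fences, and all names are invented for this sketch.

```java
import java.util.*;

// Toy undo-logging sketch: roll back uncommitted stores after a "crash"
// so the persistent state is never seen half-updated. Not AutoPersist code.
public class UndoLogDemo {
    final Map<String, Long> pmem = new HashMap<>();   // simulated persistent heap
    final Deque<Object[]> undo = new ArrayDeque<>();  // entries: {key, oldValue}

    void store(String key, long value) {
        undo.push(new Object[]{key, pmem.get(key)});  // log old value before writing
        pmem.put(key, value);
    }

    void commit() {
        undo.clear();                                 // updates are now durable
    }

    void recover() {
        // After a crash mid-transaction, undo every unlogged-as-committed store.
        while (!undo.isEmpty()) {
            Object[] e = undo.pop();
            if (e[1] == null) pmem.remove(e[0]);
            else pmem.put((String) e[0], (Long) e[1]);
        }
    }
}
```

For example, a transfer that crashes after debiting one account but before crediting the other rolls back to the pre-transaction balances on recovery.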
Timing Sensitive Dependency Analysis and its Application to Software Security
I present new methods for the static analysis of timing-sensitive
information flow control in software systems. I apply these methods
to the analysis of concurrent Java programs, as well as to the analysis
of timing side channels in implementations of cryptographic primitives.
Information flow control methods aim to restrict the flow of
information (e.g., between the different external interfaces of a
software component) according to explicit policies. Such methods
can therefore be used to enforce both confidentiality and integrity.
The goal of sound static program analyses in this setting is to prove
that all executions of a given program comply with the relevant
policies. Such a proof requires a security criterion that formalizes
under which conditions this is the case.
Every formal security criterion implicitly corresponds to a program
and attacker model. The simplest noninterference criteria, for
example, describe only non-interactive programs: programs that
permit input and output only at the beginning and the end of
execution. In the corresponding attacker model, the attacker knows
the program but observes or provides only certain (public) outputs
and inputs. A program is noninterferent if the attacker cannot draw
any conclusions from these observations about the secret inputs and
outputs of terminating executions. From non-terminating executions,
however, this model does permit the attacker to draw conclusions
about secret inputs.
Side channels arise when an attacker can infer confidential
information from observations of real systems that would be
impossible in the formal model. Typical side channels (i.e., ones
left unmodeled in many formal security criteria) include, besides
nontermination, power consumption and the execution time of
programs. If the execution time depends on secret inputs, an
attacker can infer the input (e.g., the value of individual secret
parameters) from the observed execution time.
In my dissertation I present new dependency analyses that also
account for nontermination and execution-time channels.
With regard to nontermination channels, I introduce new methods
for computing program dependencies. To this end, I develop a
unifying framework in which both nontermination-sensitive and
nontermination-insensitive dependencies arise from mutually dual
notions of postdominance.
For execution-time channels, I develop new notions of dependency
and corresponding methods for computing them. In two applications
I substantiate the thesis:
execution-time-sensitive dependencies enable sound static
information flow analysis that accounts for execution-time channels.
Based on execution-time-sensitive dependencies, I design new
analyses for concurrent programs. There, execution-time-sensitive
dependencies are relevant even for execution-time-insensitive
attacker models, since internal timing channels between different
threads can be externally observable. My implementation for
concurrent Java programs builds on the program analysis system
JOANA.
I also present new analyses for execution-time channels caused by
microarchitectural dependencies. As a case study, I examine
implementations of AES256 block encryption. For some
implementations, data caches cause the execution time to depend on
the key and the ciphertext, so that both can be inferred from the
execution time. For other implementations, my automatic static
analysis proves (assuming a simple concrete cache microarchitecture)
the absence of such channels
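A minimal illustration (not taken from the dissertation) of the kind of execution-time channel such an analysis targets: an early-exit comparison whose running time depends on a secret, next to a constant-time variant whose running time does not. Both names are invented for this sketch.

```java
// Illustrative execution-time side channel: the early-exit comparison's
// running time reveals the length of the matching prefix of the secret;
// the constant-time variant always scans the full array.
public class TimingCompare {
    static boolean leakyEquals(byte[] secret, byte[] guess) {
        if (secret.length != guess.length) return false;
        for (int i = 0; i < secret.length; i++)
            if (secret[i] != guess[i]) return false;  // early exit leaks timing
        return true;
    }

    static boolean constantTimeEquals(byte[] secret, byte[] guess) {
        if (secret.length != guess.length) return false;
        int diff = 0;
        for (int i = 0; i < secret.length; i++)
            diff |= secret[i] ^ guess[i];             // no data-dependent branch
        return diff == 0;
    }
}
```

Both methods compute the same predicate; a timing-sensitive dependency analysis would flag the branch on secret data in the first, but find no execution-time dependency on the secret in the second.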
Web and Semantic Web Query Languages
A number of techniques have been developed to facilitate
powerful data retrieval on the Web and Semantic Web. Three categories
of Web query languages can be distinguished, according to the format
of the data they can retrieve: XML, RDF and Topic Maps. This article
introduces the spectrum of languages falling into these categories
and summarises their salient aspects. The languages are introduced using
common sample data and query types. Key aspects of the query
languages considered are stressed in a conclusion.
Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data have emerged in recent years, both for practical use and as an area of research in the scientific community. At the same time, the broad adoption of the internet, where keyword search is used in many applications (e.g. search engines), has familiarized casual users with keyword queries for retrieving information. Unlike this easy-to-use style of querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming to enable simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.
Subheap-Augmented Garbage Collection
Automated memory management avoids the tedium and danger of manual techniques. However, as no programmer input is required, no widely available interface exists to permit principled control over sometimes unacceptable performance costs. This dissertation explores the idea that performance-oriented languages should give programmers greater control over where and when the garbage collector (GC) expends effort. We describe an interface and implementation that expose heap partitioning and collection decisions without compromising type safety. We show that our interface allows the programmer to encode a form of reference counting using Hayes' notion of key objects. Preliminary experimental data suggests that our proposed mechanism can avoid the high overheads suffered by tracing collectors in some scenarios, especially with tight heaps. However, for other applications, the costs of applying subheaps – in human effort and runtime overheads – remain daunting.
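A hypothetical sketch of what a programmer-facing subheap interface might look like in Java. The names and shape are invented for illustration; this is not the dissertation's actual API, which operates inside the runtime rather than over a plain list.

```java
import java.util.*;

// Hypothetical subheap-style interface (invented names): the programmer
// chooses which partition an allocation lands in and when that whole
// partition is reclaimed, instead of leaving both decisions to the GC.
public class Subheap<T> {
    private final List<T> objects = new ArrayList<>();

    // Place obj in this subheap and hand it back to the caller.
    public T alloc(T obj) {
        objects.add(obj);
        return obj;
    }

    public int size() {
        return objects.size();
    }

    // Reclaim the entire partition at once, at a moment the
    // programmer knows is cheap (e.g. end of a request).
    public void collect() {
        objects.clear();
    }
}
```

The point of the sketch is the shape of the control: allocation placement and collection timing become explicit program decisions, which is the kind of interface the dissertation argues performance-oriented languages should expose.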