Algorithm Diversity for Resilient Systems
Diversity can significantly increase the resilience of systems by reducing
the prevalence of shared vulnerabilities and making vulnerabilities harder to
exploit. Work on software diversity for security typically creates variants of
a program using low-level code transformations. This paper is the first to
study algorithm diversity for resilience. We first describe how a method based
on high-level invariants and systematic incrementalization can be used to
create algorithm variants. Executing multiple variants in parallel and
comparing their outputs provides greater resilience than executing one variant.
To prevent different parallel schedules from causing variants' behaviors to
diverge, we present a synchronized execution algorithm for DistAlgo, an
extension of Python for high-level, precise, executable specifications of
distributed algorithms. We propose static and dynamic metrics for measuring
diversity. An experimental evaluation of algorithm diversity combined with
implementation-level diversity for several sequential algorithms and
distributed algorithms shows the benefits of algorithm diversity.
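As an illustration of the core idea (not the paper's DistAlgo-based method), a minimal sketch of executing several algorithm variants of the same specification and majority-voting their outputs might look like the following; the three summation variants and the `vote` helper are illustrative assumptions:

```python
from collections import Counter

def variant_a(xs):
    # Iterative summation.
    total = 0
    for x in xs:
        total += x
    return total

def variant_b(xs):
    # Built-in reduction: an algorithmic variant of the same specification.
    return sum(xs)

def variant_c(xs):
    # Recursive divide-and-conquer variant.
    if not xs:
        return 0
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return variant_c(xs[:mid]) + variant_c(xs[mid:])

def vote(variants, xs):
    """Run all variants and return the majority output; disagreement
    signals that at least one variant is faulty or compromised."""
    outputs = [v(xs) for v in variants]
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority among variant outputs")
    return value
```

A single compromised variant then changes the vote's disagreement signal rather than the delivered result, which is the resilience benefit the abstract describes.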
Statistical structures for internet-scale data management
Efficient query processing in traditional database management systems relies on statistics over the base data. For centralized systems, there is a rich body of research on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper, we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We then show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental evaluation of our contributions in terms of efficiency, accuracy, and scalability.
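As a hedged sketch of one synopsis named above, a simple Equi-Width histogram can be built from per-peer local bucket counts and combined by element-wise addition; the `local_counts` and `merge` helpers below are an illustration of the mergeability that makes such synopses decentralizable, not the paper's overlay protocol:

```python
def local_counts(data, lo, hi, buckets):
    """Count each peer's values into equal-width buckets over [lo, hi)."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for x in data:
        # Clamp to the last bucket so x == hi does not overflow.
        i = min(int((x - lo) / width), buckets - 1)
        counts[i] += 1
    return counts

def merge(c1, c2):
    """Equi-width counts merge by simple element-wise addition,
    so partial histograms can be aggregated along an overlay tree."""
    return [a + b for a, b in zip(c1, c2)]
```

Because `merge` is associative and commutative, partial counts can be combined in any order as they travel through the overlay.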
Distributed Memory Compiler Methods for Irregular Problems -- Data Copy Reuse and Runtime Partitioning
This paper outlines two methods that we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on an iPSC/860 to demonstrate the usefulness of our methods.
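The data-copy-reuse idea can be sketched as a cache of off-processor values that is reused across loops until a write invalidates it; the `RemoteCache` class below is a hypothetical illustration of that bookkeeping, not the compiler mechanism described in the paper:

```python
class RemoteCache:
    """Cache fetched off-processor values so later loops that read the
    same remote locations reuse the stored copy instead of refetching."""

    def __init__(self, fetch):
        self.fetch = fetch    # function: remote location -> value
        self.cache = {}
        self.fetches = 0      # counts actual remote communications

    def read(self, loc):
        if loc not in self.cache:
            self.cache[loc] = self.fetch(loc)
            self.fetches += 1
        return self.cache[loc]

    def invalidate(self, loc):
        # Called when loc may have been modified on its owner processor;
        # the next read must refetch.
        self.cache.pop(loc, None)
```

The compiler's job, per the abstract, is proving that the cached locations stay unmodified between loops, so `invalidate` is only needed where that proof fails.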
ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms
Connected components is a fundamental kernel in graph applications due to its
usefulness in measuring how well-connected a graph is, as well as its use as a
subroutine in many other graph algorithms. The fastest existing parallel
multicore algorithms for connectivity are based on some form of edge sampling
and/or linking and compressing trees. However, many combinations of these
design choices have been left unexplored. In this paper, we design the
ConnectIt framework, which provides different sampling strategies as well as
various tree linking and compression schemes. ConnectIt enables us to obtain
several hundred new variants of connectivity algorithms, most of which extend
to computing spanning forest. In addition to static graphs, we also extend
ConnectIt to support mixes of insertions and connectivity queries in the
concurrent setting.
We present an experimental evaluation of ConnectIt on a 72-core machine,
which we believe is the most comprehensive evaluation of parallel connectivity
algorithms to date. Compared to a collection of state-of-the-art static
multicore algorithms, we obtain an average speedup of 37.4x (2.36x average
speedup over the fastest existing implementation for each graph). Using
ConnectIt, we are able to compute connectivity on the largest
publicly-available graph (with over 3.5 billion vertices and 128 billion edges)
in under 10 seconds using a 72-core machine, providing a 3.1x speedup over the
fastest existing connectivity result for this graph, in any computational
setting. For our incremental algorithms, we show that they can ingest graph
updates at up to several billion edges per second. Finally, to guide the
user in selecting the best variants in ConnectIt for different situations, we
provide a detailed analysis of the different strategies in terms of their work
and locality.
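One design point of the kind ConnectIt explores (union-find with a particular linking rule and full path compression on find) can be sketched as follows; this is a generic sequential illustration, not ConnectIt's parallel implementation:

```python
class UnionFind:
    """Union-find with one linking choice (link to the smaller root id)
    and one compression choice (full path compression)."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, u):
        root = u
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[u] != root:
            self.parent[u], u = root, self.parent[u]
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            # Linking rule: the larger root id is linked under the smaller.
            if ru < rv:
                self.parent[rv] = ru
            else:
                self.parent[ru] = rv

def connected_components(n, edges):
    """Label each vertex with the smallest vertex id in its component."""
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return [uf.find(u) for u in range(n)]
```

Swapping the linking rule, the compression scheme, or prefixing a sampling phase yields the kind of variant space the framework enumerates.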
Sysinformatics & Systems Mimicry: New Fields Emerging from a “Science” of Systems Processes Engineering
This paper gives an overview of the Systems Processes-Patterns/Systems Pathology (SP3) project of the INCOSE SSWG (Systems Science Working Group), because we see two new specialties arising from that research program. The project is based on research conducted over the last 300 years on natural systems in the seven natural sciences (astronomy, physics, chemistry, geology, biology, math, and computers). It seeks to unify that knowledge with systems thinking from human and social systems research. The resulting integrated knowledge base (KB) is overwhelming, even when abstracted to only those 50 principles and pathologies that tell us something experimentally about how natural systems work or don’t work. The SP3 project collects facts in 25 categories (each described in this paper) and hundreds of Linkage Propositions (LPs) that explain how 55 systems processes influence each other to maintain systems health. We suggest a new approach called sysinformatics, modeled after the bioinformatics courses that today prepare workers for genetics and systems biology. We need corresponding courses in sysinformatics to prepare workers in systems science, systems engineering, computer-based modeling, and new fields such as sustainability studies. The need for bioinformatics resulted from a range of advances in new experimental tools and techniques that led to terabytes of data requiring meaningful analysis to bring from the lab bench to the hospital bed. Similarly, sysinformatics would attempt to make more meaningful the mountains of data and relations resulting from a new science of systems. It would require active participation of the computer sciences, engineering specialties, and deep mathematics. Overall, it would help in understanding how systems processes work and don’t work. Sysinformatics would be a rigorously transdisciplinary field aimed at the discovery, storage, retrieval, organization, classification, and analysis of systems science data.
It would stimulate discovery of new tools and algorithms for analysis of data, and encourage invention of new ways to present and display data that are more easily or intuitively meaningful to humans (like the Circos maps of genomics that interpret many different whole-genome changes). Sysinformatics would add data mining, AI, simulation, image processing, a range of algorithms, new companies, and computerized tools to the toolboxes of practicing systems engineers. This paper describes initial similarities and differences between bioinformatics and sysinformatics. It also introduces “systems mimicry” as a potential new specialty and suggests similarities and differences between biomimicry and systems mimicry.
Rapid Visual Categorization is not Guided by Early Salience-Based Selection
The current dominant visual processing paradigm in both human and machine
research is the feedforward, layered hierarchy of neural-like processing
elements. Within this paradigm, visual saliency is seen by many to have a
specific role, namely that of early selection. Early selection is thought to
enable very fast visual performance by limiting processing to only the most
salient candidate portions of an image. This strategy has led to a plethora of
saliency algorithms that have indeed improved processing time efficiency in
machine algorithms, which in turn have strengthened the suggestion that human
vision also employs a similar early selection strategy. However, at least one
set of critical tests of this idea has never been performed with respect to the
role of early selection in human vision. How would the best of the current
saliency models perform on the stimuli used by experimentalists who first
provided evidence for this visual processing paradigm? Would the algorithms
really provide correct candidate sub-images to enable fast categorization on
those same images? Do humans really need this early selection for their
impressive performance? Here, we report on a new series of tests of these
questions whose results suggest that it is quite unlikely that such an early
selection process has any role in human rapid visual categorization.
Main Memory Adaptive Indexing for Multi-core Systems
Adaptive indexing is a concept that considers index creation in databases as
a by-product of query processing; as opposed to traditional full index creation
where the indexing effort is performed up front before answering any queries.
Adaptive indexing has received a considerable amount of attention, and several
algorithms have been proposed over the past few years; including a recent
experimental study comparing a large number of existing methods. Until now,
however, most adaptive indexing algorithms have been single-threaded; with
multi-core systems now well established, the idea of designing parallel
algorithms for adaptive indexing is very natural. In this regard, only one
parallel algorithm for adaptive indexing has appeared in the literature: the
parallel version of standard cracking. In this paper we
describe three alternative parallel algorithms for adaptive indexing, including
a second variant of a parallel standard cracking algorithm. Additionally, we
describe a hybrid parallel sorting algorithm, and a NUMA-aware method based on
sorting. We then thoroughly compare all these algorithms experimentally, along
with a variant of a recently published parallel version of radix sort. Parallel
sorting algorithms serve as a realistic baseline for multi-threaded adaptive
indexing techniques. In total we experimentally compare seven parallel
algorithms. Additionally, we extensively profile all considered algorithms. The
initial set of experiments considered in this paper indicates that our parallel
algorithms significantly improve over previously known ones. Our results
suggest that, although adaptive indexing algorithms are a good design choice in
single-threaded environments, the rules change considerably in the parallel
case. That is, in future highly-parallel environments, sorting algorithms could
be serious alternatives to adaptive indexing.
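The baseline technique, standard cracking, can be sketched single-threaded as follows; `CrackedColumn` is an illustrative toy showing how each range query partially sorts the column around its bounds, not any of the paper's parallel algorithms:

```python
class CrackedColumn:
    """Standard cracking sketch: every range query partitions ("cracks")
    the column copy around its bounds, so the index emerges as a
    by-product of query processing and later queries touch smaller pieces."""

    def __init__(self, values):
        self.data = list(values)
        self.index = {}   # pivot value -> split position in self.data

    def _crack(self, pivot):
        if pivot in self.index:
            return self.index[pivot]
        # Restrict work to the existing piece enclosing the pivot.
        lo = max([p for v, p in self.index.items() if v <= pivot], default=0)
        hi = min([p for v, p in self.index.items() if v > pivot],
                 default=len(self.data))
        piece = self.data[lo:hi]
        left = [x for x in piece if x < pivot]
        right = [x for x in piece if x >= pivot]
        self.data[lo:hi] = left + right
        pos = lo + len(left)
        self.index[pivot] = pos
        return pos

    def range_query(self, low, high):
        """Return values x with low <= x < high, cracking as a side effect."""
        a, b = self._crack(low), self._crack(high)
        return self.data[a:b]
```

After enough queries the column converges toward sorted order, which is why fully sorting up front is the natural competitor the abstract compares against.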
A Running Time Improvement for Two Thresholds Two Divisors Algorithm
Chunking algorithms play an important role in data de-duplication systems. The Basic Sliding Window (BSW) algorithm is the first prototype of the content-based chunking algorithm that can handle most types of data. The Two Thresholds Two Divisors (TTTD) algorithm was proposed to improve the BSW algorithm in terms of controlling the variation of the chunk size. In this project, we investigate and compare the BSW and TTTD algorithms across several factors through a series of systematic experiments. To date, no paper has conducted such experimental evaluations of these two algorithms; this is the first contribution of this paper. Based on our analyses and experimental results, we provide a running-time improvement for the TTTD algorithm. Compared with the original TTTD algorithm, our new solution reduces the total running time by about 7%, reduces the number of large-sized chunks by about 50%, and makes the average chunk size closer to the expected chunk size. These significant results are the second contribution of this project.
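The TTTD structure (a minimum and a maximum chunk size, a main divisor for normal cut points, and a backup divisor for forced cuts) can be sketched as below; the parameter values and the toy additive rolling hash are illustrative assumptions, not the paper's choices:

```python
def tttd_chunks(data, t_min=460, t_max=2800, d=540, d_backup=270, window=48):
    """Split `data` (bytes) into content-defined chunks, TTTD-style:
    ignore positions before t_min, cut on the main divisor d, remember
    backup-divisor positions, and force a cut by t_max at the latest."""
    chunks, start, backup = [], 0, -1
    for i in range(len(data)):
        size = i - start + 1
        if size < t_min:
            continue                      # first threshold: minimum size
        h = sum(data[max(0, i - window + 1):i + 1])   # toy rolling hash
        if h % d_backup == d_backup - 1:
            backup = i                    # remember a backup cut point
        if h % d == d - 1:
            chunks.append(data[start:i + 1])          # main-divisor cut
            start, backup = i + 1, -1
        elif size >= t_max:               # second threshold: maximum size
            cut = backup if backup >= start else i
            chunks.append(data[start:cut + 1])        # forced cut
            start, backup = cut + 1, -1
    if start < len(data):
        chunks.append(data[start:])       # final partial chunk
    return chunks
```

Preferring the backup cut point over a hard cut at `t_max` is what keeps forced boundaries content-defined, which is the source of TTTD's tighter chunk-size distribution.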