1,114 research outputs found
Synapse: Synthetic Application Profiler and Emulator
We introduce Synapse motivated by the needs to estimate and emulate workload
execution characteristics on high-performance and distributed heterogeneous
resources. Synapse has a platform independent application profiler, and the
ability to emulate profiled workloads on a variety of heterogeneous resources.
Synapse is used as a proxy application (or "representative application") for
real workloads, with the added advantage that it can be tuned at arbitrary
levels of granularity in ways that are simply not possible using real
applications. Experiments show that automated profiling using Synapse
represents application characteristics with high fidelity. Emulation using
Synapse can reproduce the application behavior in the original runtime
environment, as well as reproducing properties when used in a different
run-time environments
Improved Algorithms for Time Decay Streams
In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and gives a time-decay coreset for polynomial time decay functions.
We also consider the exponential time decay model for k-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores O(k log(h Delta)+h) points where h is the half-life of the decay function and Delta is the aspect ratio of the dataset. Our techniques extend to k-means clustering and M-estimators as well
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled)(directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give an historical overview of its main development, and study the main current
systems that implement them
Sparse Hopsets in Congested Clique
We give the first Congested Clique algorithm that computes a sparse hopset
with polylogarithmic hopbound in polylogarithmic time. Given a graph ,
a -hopset with "hopbound" , is a set of edges
added to such that for any pair of nodes and in there is a path
with at most hops in with length within of
the shortest path between and in .
Our hopsets are significantly sparser than the recent construction of
Censor-Hillel et al. [6], that constructs a hopset of size
, but with a smaller polylogarithmic hopbound. On the other
hand, the previously known constructions of sparse hopsets with polylogarithmic
hopbound in the Congested Clique model, proposed by Elkin and Neiman
[10],[11],[12], all require polynomial rounds.
One tool that we use is an efficient algorithm that constructs an
-limited neighborhood cover, that may be of independent interest.
Finally, as a side result, we also give a hopset construction in a variant of
the low-memory Massively Parallel Computation model, with improved running time
over existing algorithms
Independence in CLP Languages
Studying independence of goals has proven very useful in the context of logic programming. In particular, it has provided a formal basis for powerful automatic parallelization tools, since independence ensures that two goals may be evaluated in parallel while preserving correctness and eciency. We extend the concept of independence to constraint logic programs (CLP) and
prove that it also ensures the correctness and eciency of the parallel evaluation of independent goals. Independence for CLP languages is more complex than for logic programming as search space preservation is necessary but no longer sucient for ensuring correctness and eciency. Two
additional issues arise. The rst is that the cost of constraint solving may depend upon the order constraints are encountered. The second is the need to handle dynamic scheduling. We clarify these issues by proposing various types of search independence and constraint solver independence, and show how they can be combined to allow dierent optimizations, from parallelism to intelligent
backtracking. Sucient conditions for independence which can be evaluated \a priori" at run-time are also proposed. Our study also yields new insights into independence in logic programming languages. In particular, we show that search space preservation is not only a sucient but also a necessary condition for ensuring correctness and eciency of parallel execution
Data management techniques
Today, it is projected that data storage and management is becoming one of the key challenges in order to achieve ultrascale computing for several reasons. First, data is expected to grow exponentially in the coming years and this progression will imply that disruptive technologies will be needed to store large amounts of data and more importantly to access it in a timely manner. Second, the improvement of computing elements and their scalability are shifting application execution from CPU bound to I/O bound. This creates additional challenges for significantly improving the access to data to keep with computation time and thus avoid high-performance computing (HPC) from being underutilized due to large periods of I/O activity. Third, the two initially separate worlds of HPC that mainly consisted on one hand of simulations that are CPU bound and on the other hand of analytics that mainly perform huge data scans to discover information and are I/O bound are blurring. Now, simulations and analytics need to work cooperatively and share the same I/O infrastructure
- …