6,502 research outputs found
BigDataBench: a Big Data Benchmark Suite from Internet Services
As architecture, systems, and data management communities pay greater
attention to innovative big data systems and architectures, the pressure of
benchmarking and evaluating these systems rises. Considering the broad use of
big data systems, big data benchmarks must include diversity of data and
workloads. Most of the state-of-the-art big data benchmarking efforts target
evaluating specific types of applications or system software stacks, and hence
they are not qualified for serving the purposes mentioned above. This paper
presents our joint research efforts on this issue with several industrial
partners. Our big data benchmark suite BigDataBench not only covers broad
application scenarios, but also includes diverse and representative data sets.
BigDataBench is publicly available from http://prof.ict.ac.cn/BigDataBench .
Also, we comprehensively characterize 19 big data workloads included in
BigDataBench with varying data inputs. On a typical state-of-practice
processor, Intel Xeon E5645, we have the following observations: First, in
comparison with the traditional benchmarks: including PARSEC, HPCC, and
SPECCPU, big data applications have very low operation intensity; Second, the
volume of data input has non-negligible impact on micro-architecture
characteristics, which may impose challenges for simulation-based big data
architecture research; Last but not least, corroborating the observations in
CloudSuite and DCBench (which use smaller data inputs), we find that the
numbers of L1 instruction cache misses per 1000 instructions of the big data
applications are higher than in the traditional benchmarks; also, we find that
L3 caches are effective for the big data applications, corroborating the
observation in DCBench.Comment: 12 pages, 6 figures, The 20th IEEE International Symposium On High
Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando,
Florida, US
Cache Serializability: Reducing Inconsistency in Edge Transactions
Read-only caches are widely used in cloud infrastructures to reduce access
latency and load on backend databases. Operators view coherent caches as
impractical at genuinely large scale and many client-facing caches are updated
in an asynchronous manner with best-effort pipelines. Existing solutions that
support cache consistency are inapplicable to this scenario since they require
a round trip to the database on every cache transaction.
Existing incoherent cache technologies are oblivious to transactional data
access, even if the backend database supports transactions. We propose T-Cache,
a novel caching policy for read-only transactions in which inconsistency is
tolerable (won't cause safety violations) but undesirable (has a cost). T-Cache
improves cache consistency despite asynchronous and unreliable communication
between the cache and the database. We define cache-serializability, a variant
of serializability that is suitable for incoherent caches, and prove that with
unbounded resources T-Cache implements this new specification. With limited
resources, T-Cache allows the system manager to choose a trade-off between
performance and consistency.
Our evaluation shows that T-Cache detects many inconsistencies with only
nominal overhead. We use synthetic workloads to demonstrate the efficacy of
T-Cache when data accesses are clustered and its adaptive reaction to workload
changes. With workloads based on the real-world topologies, T-Cache detects
43-70% of the inconsistencies and increases the rate of consistent transactions
by 33-58%.Comment: Ittay Eyal, Ken Birman, Robbert van Renesse, "Cache Serializability:
Reducing Inconsistency in Edge Transactions," Distributed Computing Systems
(ICDCS), IEEE 35th International Conference on, June~29 2015--July~2 201
A Guide to Distributed Digital Preservation
This volume is devoted to the broad topic of distributed digital preservation, a still-emerging field of practice for the cultural memory arena. Replication and distribution hold out the promise of indefinite preservation of materials without degradation, but establishing effective organizational and technical processes to enable this form of digital preservation is daunting. Institutions need practical examples of how this task can be accomplished in manageable, low-cost ways."--P. [4] of cove
Basis Token Consistency: A Practical Mechanism for Strong Web Cache Consistency
With web caching and cache-related services like CDNs and edge services playing an increasingly significant role in the modern internet, the problem of the weak consistency and coherence provisions in current web protocols is becoming increasingly significant and drawing the attention of the standards community [LCD01]. Toward this end, we present definitions of consistency and coherence for web-like environments, that is, distributed client-server information systems where the semantics of interactions with resource are more general than the read/write operations found in memory hierarchies and distributed file systems. We then present a brief review of proposed mechanisms which strengthen the consistency of caches in the web, focusing upon their conceptual contributions and their weaknesses in real-world practice. These insights motivate a new mechanism, which we call "Basis Token Consistency" or BTC; when implemented at the server, this mechanism allows any client (independent of the presence and conformity of any intermediaries) to maintain a self-consistent view of the server's state. This is accomplished by annotating responses with additional per-resource application information which allows client caches to recognize the obsolescence of currently cached entities and identify responses from other caches which are already stale in light of what has already been seen. The mechanism requires no deviation from the existing client-server communication model, and does not require servers to maintain any additional per-client state. We discuss how our mechanism could be integrated into a fragment-assembling Content Management System (CMS), and present a simulation-driven performance comparison between the BTC algorithm and the use of the Time-To-Live (TTL) heuristic.National Science Foundation (ANI-9986397, ANI-0095988
Prioritized Garbage Collection: Explicit GC Support for Software Caches
Programmers routinely trade space for time to increase performance, often in
the form of caching or memoization. In managed languages like Java or
JavaScript, however, this space-time tradeoff is complex. Using more space
translates into higher garbage collection costs, especially at the limit of
available memory. Existing runtime systems provide limited support for
space-sensitive algorithms, forcing programmers into difficult and often
brittle choices about provisioning.
This paper presents prioritized garbage collection, a cooperative programming
language and runtime solution to this problem. Prioritized GC provides an
interface similar to soft references, called priority references, which
identify objects that the collector can reclaim eagerly if necessary. The key
difference is an API for defining the policy that governs when priority
references are cleared and in what order. Application code specifies a priority
value for each reference and a target memory bound. The collector reclaims
references, lowest priority first, until the total memory footprint of the
cache fits within the bound. We use this API to implement a space-aware
least-recently-used (LRU) cache, called a Sache, that is a drop-in replacement
for existing caches, such as Google's Guava library. The garbage collector
automatically grows and shrinks the Sache in response to available memory and
workload with minimal provisioning information from the programmer. Using a
Sache, it is almost impossible for an application to experience a memory leak,
memory pressure, or an out-of-memory crash caused by software caching.Comment: to appear in OOPSLA 201
- …