A distributed proxy cache replacement algorithm for improving web server performance
Web processing performance needs to increase to keep pace with the growth of internet usage, one approach being the use of a cache on the web proxy server. This study examines the implementation of a proxy cache replacement algorithm to increase cache hits in the proxy server. The study was conducted by building a clustered (distributed) web server system using eight web server nodes. The system improved latency by 90% and throughput by a factor of 5.33
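The abstract does not name the specific replacement algorithm evaluated; purely as an illustration of how a replacement policy turns a request stream into cache hits and misses, here is a minimal LRU sketch (all names and the request trace are invented for this example):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU replacement policy: evict the least recently used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, fetch):
        if key in self.store:
            self.store.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = fetch(key)                   # simulate fetching from origin
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used
        return value

cache = LRUCache(capacity=2)
for url in ["/a", "/b", "/a", "/c", "/a"]:
    cache.get(url, fetch=lambda k: f"body of {k}")
print(cache.hits, cache.misses)  # 2 hits (both on "/a"), 3 misses
```

A higher hit ratio on the proxy means fewer requests reach the origin web servers, which is the mechanism behind the latency and throughput gains the study reports.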
On the design of efficient caching systems
Content distribution is currently the prevalent Internet use case, accounting for the majority of global Internet traffic and growing exponentially. There is general consensus that the most effective method to deal with the large amount of content demand is through the deployment of massively distributed caching infrastructures as the means to localise content delivery traffic. Solutions based on caching have been already widely deployed through Content Delivery Networks. Ubiquitous caching is also a fundamental aspect of the emerging Information-Centric Networking paradigm which aims to rethink the current Internet architecture for long term evolution. Distributed content caching systems are expected to grow substantially in the future, in terms of both footprint and traffic carried and, as such, will become substantially more complex and costly. This thesis addresses the problem of designing scalable and cost-effective distributed caching systems that will be able to efficiently support the expected massive growth of content traffic and makes three distinct contributions. First, it produces an extensive theoretical characterisation of sharding, which is a widely used technique to allocate data items to resources of a distributed system according to a hash function. Based on the findings unveiled by this analysis, two systems are designed contributing to the abovementioned objective. The first is a framework and related algorithms for enabling efficient load-balanced content caching. This solution provides qualitative advantages over previously proposed solutions, such as ease of modelling and availability of knobs to fine-tune performance, as well as quantitative advantages, such as 2x increase in cache hit ratio and 19-33% reduction in load imbalance while maintaining comparable latency to other approaches. The second is the design and implementation of a caching node enabling 20 Gbps speeds based on inexpensive commodity hardware. 
We believe these contributions significantly advance the state of the art in distributed caching systems
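Sharding, as characterised in the thesis, allocates data items to the resources of a distributed system via a hash function. A minimal sketch of the idea (the hash choice and shard count here are illustrative, not the thesis's construction):

```python
import hashlib

def shard_for(key, n_shards):
    """Map a content item to a shard with a stable hash (basic sharding)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# Hashing spreads items roughly evenly, but only roughly: the residual
# load imbalance is exactly what the thesis's analysis quantifies.
counts = [0] * 4
for i in range(10000):
    counts[shard_for(f"item-{i}", 4)] += 1
print(counts)  # each shard gets approximately 2500 items
```

Because real request popularity is skewed, per-item load is far less uniform than per-item counts, which motivates the load-balanced caching framework the thesis proposes.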
High-performance and hardware-aware computing: proceedings of the first International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'08)
The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach
Flashing up the storage hierarchy
The focus of this thesis is on systems that employ both flash and magnetic disks as
storage media. Considering the widely disparate I/O costs of flash disks currently on
the market, our approach is a cost-aware one: we explore techniques that exploit the
I/O costs of the underlying storage devices to improve I/O performance. We also study
the asymmetric I/O properties of magnetic and flash disks and propose algorithms that
take advantage of this asymmetry. Our work is geared towards database systems; however,
most of the ideas presented in this thesis can be generalised to any data-intensive
application.
For the case of low-end, inexpensive flash devices with large capacities, we propose
using them at the same level of the memory hierarchy as magnetic disks. In such
setups, we study the problem of data placement, that is, on which type of storage
medium each data page should be stored. We present a family of online algorithms that
can be used to dynamically decide the optimal placement of each page. Our algorithms
adapt to changing workloads for maximum I/O efficiency. We found that substantial
performance benefits can be gained with such a design, especially for queries touching
large sets of pages with read-intensive workloads.
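The cost-aware placement idea can be sketched as follows. This is a hypothetical toy model, not the thesis's online algorithms, and the per-access costs are invented for illustration (low-end flash with slow random writes versus a magnetic disk):

```python
# Illustrative per-access costs in milliseconds -- invented numbers, chosen
# only to show the read/write asymmetry of low-end flash vs. magnetic disk.
COSTS = {
    "flash":    {"read": 0.1, "write": 20.0},  # fast reads, slow writes
    "magnetic": {"read": 5.0, "write": 5.0},   # symmetric I/O cost
}

def place(reads, writes):
    """Pick the device with the lower expected I/O cost for a page,
    given the page's observed read/write mix."""
    def cost(dev):
        return reads * COSTS[dev]["read"] + writes * COSTS[dev]["write"]
    return min(COSTS, key=cost)

print(place(reads=100, writes=1))   # read-heavy page -> "flash"
print(place(reads=1, writes=100))   # write-heavy page -> "magnetic"
```

An online version would update the read/write counts as the workload runs and migrate pages when the preferred device changes, which is the adaptivity the abstract describes.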
Moving one level higher in the storage hierarchy, we study the problem of buffer
allocation in databases that store data across multiple storage devices. We present our
novel approach to per-device memory allocation, under which both the I/O costs of the
storage devices and the cache behaviour of the data stored on each medium determine
the size of the main memory buffers that will be allocated to each device. Towards
informed decisions, we found that the ability to predict the cache behaviour of devices
under various cache sizes is of paramount importance. In light of this, we study the
problem of efficiently tracking the hit ratio curve for each device and introduce a low-overhead
technique that provides high accuracy.
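A classical way to obtain the LRU hit ratio curve for every cache size in one pass over a trace is Mattson's stack-distance algorithm. The sketch below is the simple O(N·M) version; the low-overhead technique the thesis introduces is an efficient approximation of this kind of curve, not this exact procedure:

```python
def lru_hit_ratio_curve(trace, max_size):
    """Exact LRU hit ratios for every cache size up to max_size,
    computed via stack distances (Mattson's algorithm)."""
    stack = []                       # pages ordered most recently used first
    hits = [0] * (max_size + 1)      # hits[c] = hit count with cache size c
    for page in trace:
        if page in stack:
            depth = stack.index(page) + 1    # stack distance of this access
            for c in range(depth, max_size + 1):
                hits[c] += 1                 # a hit for any cache >= depth
            stack.remove(page)
        stack.insert(0, page)
    n = len(trace)
    return [h / n for h in hits]

curve = lru_hit_ratio_curve(["a", "b", "a", "c", "b", "a"], max_size=3)
print(curve)  # hit ratio grows with cache size; size 3 captures all reuses
```

Knowing the whole curve, rather than the hit ratio at one size, is what lets the allocator predict how each device would respond to a larger or smaller buffer.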
The price and performance characteristics of high-end flash disks make them perfectly
suitable for use as caches between the main memory and the magnetic disk(s)
of a storage system. In this context, we primarily focus on the problem of deciding
which data should be placed in the flash cache of a system: how the data flows from
one level of the memory hierarchy to the others is crucial for the performance of such a
system. Considering such decisions, we found that the I/O costs of the flash cache play
a major role. We also study several implementation issues such as the optimal size of
flash pages and the properties of the page directory of a flash cache.
Finally, we explore sorting in external memory using external merge-sort, as the
latter employs access patterns that can take full advantage of the I/O characteristics of
flash memory. We study the problem of sorting hierarchical data, as such is necessary
for a wide variety of applications including archiving scientific data and dealing with
large XML datasets. The proposed algorithm efficiently exploits the hierarchical structure
in order to minimize the number of disk accesses and optimise the utilization of
available memory. Our proposals are not specific to sorting over flash memory: the
presented techniques are highly efficient over magnetic disks as well
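The run-formation and k-way merge pattern of external merge-sort can be sketched briefly. For brevity this toy version keeps the sorted runs in memory; a real implementation writes each run to storage and streams the merge, and it is precisely those sequential run reads and writes that suit flash's I/O characteristics:

```python
import heapq

def external_sort(items, run_size):
    """External merge-sort sketch: form sorted runs of bounded size,
    then k-way merge them (runs held in memory here for illustration)."""
    runs, buf = [], []
    for x in items:
        buf.append(x)
        if len(buf) == run_size:     # memory budget reached: close the run
            runs.append(sorted(buf))
            buf = []
    if buf:
        runs.append(sorted(buf))
    # heapq.merge streams the sorted runs, touching each element once
    return list(heapq.merge(*runs))

print(external_sort([5, 2, 9, 1, 7, 3, 8], run_size=3))
# [1, 2, 3, 5, 7, 8, 9]
```

The thesis's contribution goes further by exploiting the hierarchical structure of the records being sorted; the sketch above shows only the generic merge-sort skeleton.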
Algorithms incorporating concurrency and caching
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 189-203). This thesis describes provably good algorithms for modern large-scale computer systems, including today's multicores. Designing efficient algorithms for these systems involves overcoming many challenges, including concurrency (dealing with parallel accesses to the same data) and caching (achieving good memory performance). This thesis includes two parallel algorithms that focus on testing for atomicity violations in a parallel fork-join program. These algorithms augment a parallel program with a data structure that answers queries about the program's structure on the fly. Specifically, one data structure, called SP-ordered-bags, maintains the series-parallel relationships among threads, which is vital for uncovering race conditions (bugs) in the program. Another data structure, called XConflict, aids in detecting conflicts in a transactional-memory system with nested parallel transactions. For a program with work T1 and span T∞, maintaining either data structure adds an overhead of O(PT∞) to the running time of the parallel program when executed on P processors using an efficient scheduler, yielding a total runtime of O(T1/P + PT∞). For each of these data structures, queries can be answered in O(1) time. This thesis also introduces the compressed sparse blocks (CSB) storage format for sparse matrices, which allows both Ax and A^T x to be computed efficiently in parallel, where A is an n x n sparse matrix with nnz > n nonzeros and x is a dense n-vector. The parallel multiplication algorithm uses Θ(nnz) work and O(√n lg n) span, yielding a parallelism of Θ(nnz / (√n lg n)), which is amply high for virtually any large matrix. Also addressing concurrency, this thesis considers two scheduling problems.
The first scheduling problem, motivated by transactional memory, considers randomized backoff when jobs have different lengths. I give an analysis showing that binary exponential backoff achieves makespan V·2^Θ(√lg V) with high probability, where V is the total length of all n contending jobs. This bound is significantly larger than when jobs are all the same size. A variant of exponential backoff, however, achieves makespan of ... with high probability. I also present the size-hashed backoff protocol, specifically designed for jobs having different lengths, that achieves makespan ... with high probability. The second scheduling problem considers scheduling n unit-length jobs on m unrelated machines, where each job may fail probabilistically. Specifically, an input consists of a set of n jobs, a directed acyclic graph G describing the precedence constraints among jobs, and a failure probability qij for each job j and machine i. The goal is to find a schedule that minimizes the expected makespan. I give an O(log log(min {m, n}))-approximation for the case of independent jobs (when there are no precedence constraints) and an O(log(n + m) log log(min {m, n}))-approximation algorithm when precedence constraints form disjoint chains. This chain algorithm can be extended into one that supports precedence constraints that are trees, which worsens the approximation by another log(n) factor. To address caching, this thesis includes several new variants of cache-oblivious dynamic dictionaries. A cache-oblivious dictionary fills the same niche as a classic B-tree, but it does so without tuning for particular memory parameters. Thus, cache-oblivious dictionaries optimize for all levels of a multilevel hierarchy and are more portable than traditional B-trees. I describe how to add concurrency to several previously existing cache-oblivious dictionaries.
I also describe two new data structures that achieve significantly cheaper insertions with a small overhead on searches. The cache-oblivious lookahead array (COLA) supports insertions/deletions and searches in O((1/B) log N) and O(log N) memory transfers, respectively, where B is the block size, M is the memory size, and N is the number of elements in the data structure. The xDict supports these operations in O((1/(εB^(1-ε))) log_B(N/M)) and O((1/ε) log_B(N/M)) memory transfers, respectively, where 0 < ε < 1 is a tunable parameter. Also on caching, this thesis answers the question: what is the worst possible page-replacement strategy? The goal of this whimsical chapter is to devise an online strategy that achieves the highest possible fraction of page faults / cache misses as compared to the worst offline strategy. I show that there is no deterministic strategy that is competitive with the worst offline. I also give a randomized strategy based on the most-recently-used heuristic and show that it is the worst possible page-replacement policy. On a more serious note, I also show that direct mapping is, in some sense, a worst possible page-replacement policy. Finally, this thesis includes a new algorithm, following a new approach, for the problem of maintaining a topological ordering of a dag as edges are dynamically inserted. The main result included here is an O(n^2 log n) algorithm for maintaining a topological ordering in the presence of up to m < n(n - 1)/2 edge insertions. In contrast, the previously best algorithm has a total running time of O(min{m^(3/2), n^(5/2)}). Although these algorithms are not parallel and do not exhibit particularly good locality, some of the data structural techniques employed in my solution are similar to others in this thesis. by Jeremy T. Fineman. Ph.D
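A toy simulation illustrates why most-recently-used (MRU) eviction can be pessimal. This is only an illustration of the intuition, not the thesis's randomized construction: on a recency-friendly trace, MRU keeps evicting the page about to be reused, while LRU misses only on first touches (the trace and capacity below are invented for the example):

```python
def simulate(trace, capacity, evict):
    """Simulate a replacement policy on a page trace and count misses.
    `stack` is ordered most recently used first; `evict` picks a victim index."""
    stack, misses = [], 0
    for page in trace:
        if page in stack:
            stack.remove(page)               # hit: refresh recency
        else:
            misses += 1
            if len(stack) == capacity:
                stack.pop(evict(stack))      # cache full: evict a victim
        stack.insert(0, page)                # page becomes most recent
    return misses

trace = ["a", "b"] + ["c", "b"] * 4          # strong recency after warm-up
lru_misses = simulate(trace, 2, evict=lambda s: len(s) - 1)  # evict LRU
mru_misses = simulate(trace, 2, evict=lambda s: 0)           # evict MRU
print(lru_misses, mru_misses)  # LRU: 3 cold misses; MRU: all 10 accesses miss
```

Once warm, MRU here misses on every access because it always discards the page the trace is about to request again, which is the flavour of "worst possible" behaviour the chapter formalizes.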
Spacelab data management subsystem phase B study
The Spacelab data management system is described. The data management subsystem (DMS) integrates the avionics equipment into an operational system by providing the computations, logic, signal flow, and interfaces needed to effectively command, control, monitor, and check out the experiment and subsystem hardware. Also, the DMS collects/retrieves experiment data and other information by recording and by command of the data relay link to ground. The major elements of the DMS are the computer subsystem, data acquisition and distribution subsystem, controls and display subsystem, onboard checkout subsystem, and software. The results of the DMS portion of the Spacelab Phase B Concept Definition Study are analyzed
Universal Database System Analysis for Insight and Adaptivity
Database systems are ubiquitous; they serve as the cornerstone of modern
application infrastructure due to their efficient data access and
storage. Database systems are commonly deployed in a wide range of environments,
from transaction processing to analytics.
Unfortunately, this broad support comes with a trade-off in system
complexity. Database systems contain many components and features that
must work together to meet client demand. Administrators responsible
for maintaining database systems face a daunting task: they must
determine the access characteristics of the client workload they are
serving and tailor the system to optimize for
it. Complicating matters, client workloads are known to shift in
access patterns and load. Thus, administrators continuously
perform this optimization task, refining system design and
configuration to meet ever-changing client request patterns.
Researchers have focused on creating next-generation, natively adaptive database systems to
address this administrator burden. Natively adaptive database systems construct
client-request models, determine workload characteristics, and tailor
processing strategies to optimize accordingly. These systems
continuously refine their models, ensuring they are responsive to
workload shifts. While these new systems show promise in adapting system
behaviour to their environment, existing, widely used database systems
lack these adaptive capabilities. Porting the ideas in these new
adaptive systems to existing infrastructure requires monumental
engineering effort, slowing their adoption and leaving users stranded
with their existing, non-adaptive database systems.
In this thesis, I present Dendrite, a framework that easily
``bolts on'' to existing database systems to endow them with adaptive
capabilities. Dendrite captures database system behaviour in
a system-agnostic fashion, ensuring that its techniques are
generalizable. It compares captured behaviour to determine
how system behaviour changes over time and with respect to idealized
system performance. These differences are matched against
configurable adaptation rules, which deploy user-defined
functions to remedy performance problems. As such, Dendrite can
deploy whatever adaptations are necessary to address a behaviour shift
and tailor the system to the workload at hand. Dendrite
has low tracking overhead, making it practical for intensive database
system deployments
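The rule-matching idea can be sketched in a few lines. Everything here is hypothetical, not Dendrite's actual API: a rule compares an observed metric against a baseline and fires a user-defined remedy when the drift exceeds a threshold:

```python
# Hypothetical "bolt-on" adaptation rule in the spirit described above.
# All names, metrics, and thresholds are invented for illustration.
def make_rule(metric, baseline, threshold, remedy):
    def check(observed):
        drift = (observed[metric] - baseline) / baseline
        if drift > threshold:
            return remedy(drift)      # user-defined function addresses the shift
        return None                   # behaviour within tolerance: do nothing
    return check

rule = make_rule("p99_latency_ms", baseline=10.0, threshold=0.5,
                 remedy=lambda d: f"increase buffer pool (drift {d:.0%})")
print(rule({"p99_latency_ms": 18.0}))  # 80% drift exceeds threshold: remedy fires
print(rule({"p99_latency_ms": 12.0}))  # 20% drift within tolerance: None
```

Keeping the rule interface system-agnostic (metric name in, remedy out) is what would let such a framework bolt onto different database systems without porting their internals.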
Mobile computing with the Rover Toolkit
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (leaves 138-147). by Anthony Douglas Joseph. Ph.D
Database and System Design for Emerging Storage Technologies
Emerging storage technologies offer an alternative to disk that is durable and allows faster data access. Flash memory, made popular by mobile devices, provides block access with low latency random reads. New nonvolatile memories (NVRAM) are expected in upcoming years, presenting DRAM-like performance alongside persistent storage. Whereas both technologies accelerate data accesses due to increased raw speed, used merely as disk replacements they may fail to achieve their full potential. Flash's asymmetric read/write access (i.e., reads execute faster than writes) opens new opportunities to optimize Flash-specific access. Similarly, NVRAM's low latency persistent accesses allow new designs for high performance failure-resistant applications.
This dissertation addresses software and hardware system design for such storage technologies. First, I investigate analytics query optimization for Flash, expecting Flash's fast random access to require new query planning. While intuition suggests scan and join selection should shift between disk and Flash, I find that query plans chosen assuming disk are already near-optimal for Flash. Second, I examine new opportunities for durable, recoverable transaction processing with NVRAM. Existing disk-based recovery mechanisms impose large software overheads, yet updating data in-place requires frequent device synchronization that limits throughput. I introduce a new design, NVRAM Group Commit, to amortize synchronization delays over many transactions, increasing throughput at some cost to transaction latency. Finally, I propose a new framework for persistent programming and memory systems to enable high performance recoverable data structures with NVRAM, extending memory consistency with persistent semantics to introduce memory persistency. PhD. Computer Science & Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/107114/1/spelley_1.pd
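The group-commit trade-off described above is simple arithmetic: paying one synchronization barrier per batch instead of per transaction amortizes the barrier cost at the price of queueing latency for early transactions in the batch. The costs below are invented for illustration, not measured NVRAM numbers:

```python
# Toy cost model of group commit: amortize one persist barrier per batch.
# These microsecond figures are illustrative assumptions, not measurements.
TX_WORK_US = 5       # CPU work to execute one transaction
BARRIER_US = 50      # cost of one persist barrier / device synchronization

def time_per_tx(group_size):
    """Average cost per committed transaction for a given batch size."""
    return TX_WORK_US + BARRIER_US / group_size

print(time_per_tx(1))    # 55.0 us per tx: the barrier dominates
print(time_per_tx(50))   # 6.0 us per tx: barrier amortized across the group
```

Throughput scales with the inverse of this per-transaction cost, while a transaction that arrives first in a batch waits for the group to fill, which is exactly the latency cost the abstract mentions.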