Efficient Proactive Caching for Supporting Seamless Mobility
We present a distributed proactive caching approach that exploits user
mobility information to decide where to proactively cache data to support
seamless mobility, while efficiently utilizing cache storage through a congestion
pricing scheme. The proposed approach is applicable to the case where objects
have different sizes and to a two-level cache hierarchy, for both of which the
proactive caching problem is hard. Additionally, our modeling framework
considers the case where the delay is independent of the requested data object
size and the case where the delay is a function of the object size. Our
evaluation results show how various system parameters influence the delay gains
of the proposed approach, which achieves robust and good performance relative
to an oracle and an optimal scheme for a flat cache structure.
Comment: 10 pages, 9 figures
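As a rough illustration of the congestion-pricing idea described above (not the paper's actual algorithm), the following Python sketch places an object at a candidate cell only when the mobility-weighted delay saving outweighs the cell's current storage price; all names and numbers are hypothetical.

    # Hypothetical sketch: proactively cache where the expected gain exceeds the
    # congestion price charged for the storage the object would occupy.
    def proactive_placements(obj_size, handover_prob, congestion_price, delay_saving):
        placements = set()
        for cell, prob in handover_prob.items():
            expected_gain = prob * delay_saving          # delay saved if the user moves there
            storage_cost = congestion_price[cell] * obj_size
            if expected_gain > storage_cost:
                placements.add(cell)
        return placements

    # Example: the user is most likely to hand over to cell "B".
    print(proactive_placements(
        obj_size=2.0,
        handover_prob={"A": 0.1, "B": 0.7, "C": 0.2},
        congestion_price={"A": 0.5, "B": 0.1, "C": 0.3},
        delay_saving=1.5,
    ))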
Amorphous Placement and Informed Diffusion for Timely Monitoring by Autonomous, Resource-Constrained, Mobile Sensors
Personal communication devices are increasingly equipped with sensors for passive monitoring of encounters and surroundings. We envision the emergence of services that enable a community of mobile users carrying such resource-limited devices to query this information at remote locations in the field in which they collectively roam. One approach to implement such a service is directed placement and retrieval (DPR), whereby readings/queries about a specific location are routed to a node responsible for that location. In a mobile, potentially sparse setting, where end-to-end paths are unavailable, DPR is not an attractive solution as it would require the use of delay-tolerant (flooding-based store-carry-forward) routing of both readings and queries, which is inappropriate for applications with data freshness constraints, and which is incompatible with stringent device power/memory constraints. Alternatively, we propose the use of amorphous placement and retrieval (APR), in which routing and field monitoring are integrated through the use of a cache management scheme coupled with an informed exchange of cached samples to diffuse sensory data throughout the network, in such a way that a query answer is likely to be found close to the query origin. We argue that knowledge of the distribution of query targets could be used effectively by an informed cache management policy to maximize the utility of collective storage of all devices. Using a simple analytical model, we show that the use of informed cache management is particularly important when the mobility model results in a non-uniform distribution of users over the field. We present results from extensive simulations, which show that in sparsely-connected networks, APR is more cost-effective than DPR, that it provides extra resilience to node failures and packet losses, and that its use of informed cache management yields superior performance.
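To make the "informed cache management" idea concrete, here is a minimal Python sketch, assuming a policy that ranks cached samples by the probability that their location will be queried, discounted by staleness; the decay window and parameter names are illustrative, not taken from the paper.

    import time

    # Illustrative only: keep the samples most likely to answer future queries.
    def informed_evict(cache, query_prob, capacity, now=None, decay_s=60.0):
        """cache: list of (location, timestamp, reading) tuples."""
        now = time.time() if now is None else now
        def utility(entry):
            location, timestamp, _ = entry
            freshness = max(0.0, 1.0 - (now - timestamp) / decay_s)  # linear staleness decay
            return query_prob.get(location, 0.0) * freshness
        return sorted(cache, key=utility, reverse=True)[:capacity]

    samples = [("park", 100.0, 21.5), ("bridge", 150.0, 19.0), ("mall", 155.0, 22.0)]
    print(informed_evict(samples, {"park": 0.6, "bridge": 0.3, "mall": 0.1}, capacity=2, now=160.0))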
A Lightweight Distributed Solution to Content Replication in Mobile Networks
Performance and reliability of content access in mobile networks is
conditioned by the number and location of content replicas deployed at the
network nodes. Facility location theory has been the traditional, centralized
approach to study content replication: computing the number and placement of
replicas in a network can be cast as an uncapacitated facility location
problem. The endeavour of this work is to design a distributed, lightweight
solution to the above joint optimization problem, while taking into account the
network dynamics. In particular, we devise a mechanism that lets nodes share
the burden of storing and providing content, so as to achieve load balancing,
and decide whether to replicate or drop the information so as to adapt to a
dynamic content demand and time-varying topology. We evaluate our mechanism
through simulation, by exploring a wide range of settings and studying
realistic content access mechanisms that go beyond the traditional
assumption of matching demand points to their closest content replica. Results show
that our mechanism, which uses local measurements only, is: (i) extremely
precise in approximating an optimal solution to content placement and
replication; (ii) robust against network mobility; (iii) flexible in
accommodating various content access patterns, including variation in time and
space of the content demand.
Comment: 12 pages
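A minimal sketch of the kind of local, measurement-driven rule such a distributed mechanism relies on (the actual thresholds and neighbour-selection logic in the paper differ and are not reproduced here):

    # Hypothetical thresholds; each node runs this on its own local measurements.
    def replication_decision(local_request_rate, capacity_rate, high=0.8, low=0.2):
        load = local_request_rate / capacity_rate
        if load > high:
            return "replicate"   # push a copy to a neighbour to share the serving burden
        if load < low:
            return "drop"        # local demand no longer justifies holding the replica
        return "keep"

    print(replication_decision(local_request_rate=90, capacity_rate=100))   # -> 'replicate'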
Rethinking Distributed Caching Systems Design and Implementation
Distributed caching systems based on in-memory key-value stores have become a
crucial aspect of fast and efficient content delivery in modern web applications. However, due to the dynamic and skewed execution environments and workloads under which such systems typically operate, several problems arise in the form of load imbalance. This thesis addresses the sources of load imbalance in caching systems, mainly: i) data placement, which relates to the distribution of data items across servers, and ii) data item access frequency, which describes the number of requests each server has to process and how well each server copes with them. We thus provide several strategies to overcome these sources of imbalance in isolation.
As a use case, we analyse Memcached and its variants, and we propose a novel solution for distributed caching systems. Our solution revolves around increasing parallelism through load segregation, together with strategies to overcome load discrepancies in high-saturation scenarios, mostly through access re-arrangement and internal replication.
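For readers unfamiliar with the two imbalance sources, the following Python sketch shows one common way they are addressed in Memcached-style systems: hash-based data placement plus replication of hot keys across several servers. It illustrates the problem space rather than the solution proposed in this thesis; the server names, hash choice, and hot-key set are made up.

    import hashlib

    SERVERS = ["cache-0", "cache-1", "cache-2", "cache-3"]

    def owners(key, hot_keys, replicas=3):
        """Return the servers that should hold `key`."""
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        primary = h % len(SERVERS)
        if key in hot_keys:                       # spread hot keys to relieve the primary
            return [SERVERS[(primary + i) % len(SERVERS)] for i in range(replicas)]
        return [SERVERS[primary]]

    print(owners("user:42", hot_keys={"user:42"}))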
Improving the Performance and Energy Efficiency of GPGPU Computing through Adaptive Cache and Memory Management Techniques
As the performance and energy efficiency requirements of GPGPUs have risen, memory management techniques of GPGPUs have improved to meet these requirements by employing hardware caches and utilizing heterogeneous memory. These techniques can improve GPGPUs by providing lower memory latency and higher memory bandwidth. However, these methods do not always guarantee improved performance and energy efficiency due to the small cache size and the heterogeneity of the memory nodes. While prior works have proposed various techniques to address this issue, relatively little work has been done to investigate holistic support for memory management techniques.
In this dissertation, we analyze performance pathologies and propose various techniques to improve memory management. First, we investigate the effectiveness of advanced cache indexing (ACI) for high-performance and energy-efficient GPGPU computing. Specifically, we discuss the designs of various static and adaptive cache indexing schemes and present an implementation for GPGPUs. We then quantify and analyze the effectiveness of the ACI schemes based on a cycle-accurate GPGPU simulator. Our quantitative evaluation shows that ACI schemes achieve significant performance and energy-efficiency gains over the baseline conventional indexing scheme. We also analyze the performance sensitivity of ACI to key architectural parameters (i.e., capacity, associativity, and ICN bandwidth) and the cache indexing latency, and demonstrate that ACI continues to achieve high performance in various settings.
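As a point of reference for what cache indexing schemes do, the sketch below contrasts conventional modulo indexing with a simple XOR-based hash index, one representative example of advanced indexing; the dissertation's ACI designs are not necessarily this scheme, and the cache geometry used here is arbitrary.

    def conventional_index(addr, num_sets=64, line_bytes=128):
        return (addr // line_bytes) % num_sets

    def xor_index(addr, num_sets=64, line_bytes=128):
        block = addr // line_bytes
        set_bits = num_sets.bit_length() - 1
        low = block & (num_sets - 1)
        high = (block >> set_bits) & (num_sets - 1)
        return low ^ high   # XOR of two block-address slices spreads conflicting strides

    # Strided addresses that all collide under modulo indexing map to distinct sets here.
    addrs = [i * 64 * 128 for i in range(8)]
    print([conventional_index(a) for a in addrs])
    print([xor_index(a) for a in addrs])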
Second, we propose IACM, integrated adaptive cache management for high-performance and energy-efficient GPGPU computing. Based on the performance pathology analysis of GPGPUs, we integrate state-of-the-art adaptive cache management techniques (i.e., cache indexing, bypassing, and warp limiting) in a unified architectural framework to eliminate performance pathologies. Our quantitative evaluation demonstrates that IACM significantly improves the performance and energy efficiency of various GPGPU workloads over the baseline architecture (i.e., 98.1% and 61.9% on average, respectively) and achieves considerably higher performance than the state-of-the-art technique (i.e., 361.4% at maximum and 7.7% on average). Furthermore, IACM delivers significant performance and energy efficiency gains over the baseline GPGPU architecture even when enhanced with advanced architectural technologies (e.g., higher capacity, associativity).
Third, we propose bandwidth- and latency-aware page placement (BLPP) for GPGPUs with heterogeneous memory. BLPP analyzes the characteristics of an application and determines the optimal page allocation ratio between the GPU and CPU memory. Based on this ratio, BLPP dynamically allocates pages across the heterogeneous memory nodes. Our experimental results show that BLPP considerably outperforms the baseline and the state-of-the-art technique (by 13.4% and 16.7%, respectively) and performs similarly to the static-best version (1.2% difference), which requires extensive offline profiling.
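A minimal sketch of what a bandwidth-proportional page allocation ratio could look like; BLPP's actual analysis is profile-driven and more sophisticated, and the bandwidth numbers below are placeholders rather than measurements from the dissertation.

    def split_pages(num_pages, gpu_bw_gbs=900.0, cpu_bw_gbs=100.0):
        # Allocate pages to GPU and CPU memory in proportion to their bandwidths.
        ratio = gpu_bw_gbs / (gpu_bw_gbs + cpu_bw_gbs)
        gpu_pages = round(num_pages * ratio)
        return gpu_pages, num_pages - gpu_pages

    print(split_pages(10_000))   # -> (9000, 1000)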
Near-Memory Address Translation
Memory and logic integration on the same chip is becoming increasingly cost
effective, creating the opportunity to offload data-intensive functionality to
processing units placed inside memory chips. The introduction of memory-side
processing units (MPUs) into conventional systems faces virtual memory as the
first big showstopper: without efficient hardware support for address
translation, MPUs have highly limited applicability. Unfortunately, conventional
translation mechanisms fall short of providing fast translations as
contemporary memories exceed the reach of TLBs, making expensive page walks
common.
In this paper, we are the first to show that the historically important
flexibility to map any virtual page to any page frame is unnecessary in today's
servers. We find that limiting the associativity of the virtual-to-physical mapping incurs no penalty and, when combined with careful data placement in the MPU's memory, breaks the translate-then-fetch serialization, allowing translation and data fetch to proceed
independently and in parallel. We propose the Distributed Inverted Page Table
(DIPTA), a near-memory structure in which the smallest memory partition keeps
the translation information for its data share, ensuring that the translation
completes together with the data fetch. DIPTA completely eliminates the
performance overhead of translation, achieving speedups of up to 3.81x and
2.13x over conventional translation using 4KB and 1GB pages, respectively.
Comment: 15 pages, 9 figures
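To illustrate why restricted associativity helps (a conceptual sketch, not DIPTA's hardware organization): once a virtual page can only map to a few frames determined by its address, the memory partition that will hold the data, and hence its partition-local translation entry, is known before translation completes, so both lookups can proceed in parallel. All sizes below are arbitrary assumptions.

    PAGE = 4096
    NUM_PARTITIONS = 8
    WAYS = 4   # associativity of the virtual-to-physical mapping

    def candidate_frames(vaddr, frames_per_partition=1024):
        vpn = vaddr // PAGE
        partition = vpn % NUM_PARTITIONS   # data partition is a pure function of the address
        set_index = (vpn // NUM_PARTITIONS) % (frames_per_partition // WAYS)
        base = partition * frames_per_partition + set_index * WAYS
        return partition, list(range(base, base + WAYS))

    print(candidate_frames(0x7f3a_2000))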
Content Distribution in P2P Systems
The report provides a literature review of the state of the art in content distribution. The report's contributions are threefold. First, it gives more insight into traditional Content Distribution Networks (CDN), their requirements and open issues. Second, it discusses Peer-to-Peer (P2P) systems as a cheap and scalable alternative to CDN and extracts their design challenges. Finally, it evaluates the existing P2P systems dedicated to content distribution against the identified requirements and challenges.
Transactional and analytical data management on persistent memory
The increasing number of smart devices and sensors, as well as social media, is causing the volume of data, and thus the demanded processing speed, to grow steadily. At the same time, many applications need to store data persistently or even comply with strict transactional guarantees. The novel storage technology Persistent Memory (PMem), with its unique properties, seems to be a natural candidate to meet these requirements efficiently. Compared to DRAM, it is more scalable, less expensive, and durable. In contrast to disks, it is significantly faster and directly addressable. Therefore, this dissertation investigates the deliberate employment of PMem to fit the needs of modern applications. After presenting the fundamentals of working with PMem, we focus primarily on three aspects of data management. First, we disassemble several persistent data and index structures into their underlying design primitives to reveal the trade-offs for various access patterns. This allows us to identify their best use cases and vulnerabilities, but also to gain general insights into the design of PMem-based data structures.
Second, we propose two storage layouts that target analytical workloads and enable efficient query execution on arbitrary attributes. While the first approach employs a linked list of multi-dimensional clustered blocks that potentially span several storage layers, the second approach is a multi-dimensional index that caches nodes in DRAM. Third, we show how to improve stream and event processing systems involving transactional state management using the preceding data structures and insights. In this context, we propose a novel Transactional Stream Processing (TSP) model with appropriate consistency and concurrency protocols adapted to PMem. Together, the discussed aspects are intended to provide a foundation for developing even more sophisticated PMem-enabled systems. At the same time, they show how data management tasks can take advantage of PMem by opening up new application domains; improving performance, scalability, and recovery guarantees; simplifying code complexity; and reducing economic and environmental costs.
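As a generic illustration of the DRAM-node-caching pattern mentioned for the second layout (a sketch under assumed interfaces, not the dissertation's implementation), an index can keep its authoritative nodes in a persistent store and serve hot nodes from a DRAM-side cache:

    class NodeCache:
        def __init__(self, persistent_store, capacity=1024):
            self.store = persistent_store    # e.g. a PMem-resident node array in a real system
            self.capacity = capacity
            self.dram = {}                   # node_id -> node, the DRAM-side cache

        def get(self, node_id):
            if node_id not in self.dram:
                if len(self.dram) >= self.capacity:
                    self.dram.pop(next(iter(self.dram)))   # FIFO-style eviction for brevity
                self.dram[node_id] = self.store[node_id]   # load from the persistent store
            return self.dram[node_id]

    print(NodeCache({0: "root", 1: "leaf"}).get(1))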