30 research outputs found

    Exploring non-typical memcache architectures for decreased latency and distributed network usage.

    Get PDF
    Memcache is a distributed in-memory data store designed to reduce database load for web applications by caching frequently used data across multiple machines. While memcache already offers excellent performance, we explore how data locality can increase performance under certain environments and workloads.
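    As a minimal sketch of the data-locality idea, the client below routes a GET to a replica in its own rack when one exists, falling back to the primary otherwise. The node names, rack map, and two-replica policy are illustrative assumptions, not the paper's actual design:

```python
import hashlib

def node_index(key: str, n: int) -> int:
    # Stable hash of the key into [0, n): the primary replica's slot.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big") % n

def pick_nodes(key: str, nodes: list[str], replicas: int = 2) -> list[str]:
    # The primary plus the next (replicas - 1) nodes hold copies of the key.
    i = node_index(key, len(nodes))
    return [nodes[(i + k) % len(nodes)] for k in range(replicas)]

def route_get(key: str, nodes: list[str], local_rack: str,
              rack_of: dict[str, str]) -> str:
    # Prefer a replica in the caller's rack to avoid cross-rack hops;
    # otherwise fall back to the primary.
    candidates = pick_nodes(key, nodes)
    for node in candidates:
        if rack_of[node] == local_rack:
            return node
    return candidates[0]

nodes = ["mc-a1", "mc-b1", "mc-a2", "mc-b2"]
racks = {"mc-a1": "rack-a", "mc-a2": "rack-a",
         "mc-b1": "rack-b", "mc-b2": "rack-b"}
print(route_get("user:42:profile", nodes, "rack-b", racks))
```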

    Exploiting Data Locality in Dynamic Web Applications

    Get PDF
    The Internet has grown from a static document retrieval system to a dynamic medium where users are both consumers and producers of information. Users may experience above-average website latencies due to the physical distances information must travel. Because user satisfaction is related to a website's responsiveness, e-commerce may be hindered, preventing online businesses from reaching their full potential. This dissertation analyzes how temporal and relational dependencies in web applications limit their ability to become distributed. Two contributions are made: the first shows that the location of data inside a datacenter influences the web system's performance, and the second that relaxing strict consistency inside the web application at a fine-grained level can greatly lower latencies for geographically diverse users. Experiments show when and how much these optimizations can benefit a dynamic web application.
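    As a minimal sketch of fine-grained consistency relaxation, the store below applies a per-field policy: fields marked "strong" always read from the distant authoritative primary, while "eventual" fields read from a nearby, possibly stale replica. The field names and two-tier replica model are illustrative assumptions:

```python
# Per-field consistency policy: only some fields pay the cross-datacenter
# round trip; the rest are served from a nearby replica at low latency.
POLICY = {
    "account.balance": "strong",   # must reflect the latest write
    "profile.bio": "eventual",     # a slightly stale value is acceptable
    "post.view_count": "eventual",
}

class Store:
    def __init__(self):
        self.primary = {}   # authoritative copy, geographically distant
        self.replica = {}   # nearby copy, refreshed asynchronously

    def write(self, field, value):
        self.primary[field] = value   # the replica catches up later

    def sync(self):
        # Stand-in for asynchronous replication.
        self.replica.update(self.primary)

    def read(self, field):
        if POLICY.get(field, "strong") == "strong":
            return self.primary[field]    # slow but always current
        return self.replica.get(field)    # fast local read, may be stale

s = Store()
s.write("profile.bio", "hello")
print(s.read("profile.bio"))   # None: replica not yet synced, but low latency
s.sync()
print(s.read("profile.bio"))   # "hello"
```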

    Network-Wide Monitoring And Debugging

    Get PDF
    Modern networks can encompass over 100,000 servers. Managing such an extensive network with a diverse set of network policies has become more complicated with the introduction of programmable hardware and distributed network functions. Furthermore, service level agreements (SLAs) require operators to maintain high performance and availability with low latencies, so it is crucial for operators to resolve any issues in the network quickly. Problems can occur at any layer of the stack: network (load imbalance), data plane (incorrect packet processing), control plane (bugs in configuration), and the coordination among them. Unfortunately, existing debugging tools are not sufficient to monitor, analyze, or debug modern networks: they lack visibility into the network, require manual analysis, or cannot check for some properties. These limitations arise from an outdated view of networks, i.e., that we can look at a single component in isolation. In this thesis, we describe a new approach that measures, understands, and debugs the network across devices and time. We also target modern stateful packet-processing devices, programmable data planes and distributed network functions, as these are becoming an increasingly common part of the network. Our key insight is to leverage both in-network packet processing (to collect precise measurements) and out-of-network processing (to coordinate measurements and scale analytics). The systems we design based on this approach can support testing and monitoring at datacenter scale and can handle stateful data in the network. We automate the collection and analysis of measurement data to save operator time and take a step towards self-driving networks.
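    As a minimal sketch of combining in-network collection with out-of-network analysis, the snippet below assumes switches export compact per-flow byte counts and an off-path collector correlates them across devices to flag load imbalance; the record format and threshold are assumptions, not the thesis's actual design:

```python
from collections import defaultdict

# Records exported by the data plane: (switch_id, flow_id, bytes).
records = [
    ("sw1", "f1", 9_000), ("sw1", "f2", 8_500), ("sw1", "f3", 9_200),
    ("sw2", "f4", 1_000), ("sw2", "f5", 1_200),
]

def detect_imbalance(records, ratio=1.5):
    # Out-of-network analysis: aggregate per-switch load across the fabric
    # and flag any switch carrying `ratio` times the mean load.
    load = defaultdict(int)
    for switch, _flow, nbytes in records:
        load[switch] += nbytes
    mean = sum(load.values()) / len(load)
    return [sw for sw, b in load.items() if b > ratio * mean]

print(detect_imbalance(records))   # ['sw1'] under this toy trace
```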

    Athlos: A Framework for Developing Scalable MMOG Backends on Commodity Clouds

    Get PDF
    The development of resource-intensive, distributed, real-time applications like Massively Multiplayer Online Game (MMOG) backends entails a variety of challenges, some of which have been extensively studied. Despite some advancements, the development and deployment of MMOG backends on commodity clouds and high-level computing layers continue to face several obstacles, including a non-standardized development methodology, a lack of provisions for scalability, and the need for abstractions and tools to support the development process. In this paper, we describe a set of models, methods, and tools for developing scalable MMOG backends and hosting them on commodity cloud platforms. We present Athlos, a framework that allows game developers to leverage our methodology to rapidly prototype MMOG backends that can run on any type of cloud environment. We evaluate this framework by conducting simulations based on several case-study MMOGs to benchmark its performance and scalability, and compare the development effort required and the quality of the code produced against other approaches. We find that MMOGs developed using this framework: (a) can support a very high number of simultaneous players under a given latency threshold, (b) elastically scale both in terms of runtime and state, and (c) require significantly less development effort. Coupled with the advantages of high-level computing layers such as Platform-, Backend-, and Function-as-a-Service, we argue that our framework accelerates the development of high-performance, scalable MMOGs that leverage the resources of commodity cloud platforms.

    Accelerating orchestration with in-network offloading

    Get PDF
    The demand for low-latency Internet applications has pushed functionality that was originally placed in commodity hardware into the network. Either in the form of binaries for the programmable data plane or virtualised network functions, services are implemented within the network fabric with the aim of improving their performance and placing them close to the end user. Training of machine learning algorithms, aggregation of network traffic, and virtualised radio access components are just some of the functions that have been deployed within the network. Therefore, as the network fabric becomes the accelerator for various applications, it is imperative that the orchestration of their components is also adapted to the constraints and capabilities of the deployment environment. This work identifies performance limitations of in-network compute use cases for both cloud and edge environments and makes suitable adaptations. Within cloud infrastructure, this thesis proposes a platform that relies on programmable switches to accelerate the performance of data replication. It then discusses design adaptations of an orchestrator that allow in-network data offloading and enable accelerated service deployment. At the edge, the topic of inefficient orchestration of virtualised network functions is explored, mainly with respect to energy usage and resource contention. An orchestrator is adapted to schedule requests taking edge constraints into account, in order to minimise resource contention and accelerate service processing times. With data transfers consuming valuable resources at the edge, an efficient data representation mechanism is implemented to provide statistical insight into the provenance of data at the edge and enable smart allocation of queries to nodes with relevant data. Compared with the previous state of the art, the proposed data-plane replication method appears to be the most computationally efficient and scalable in-network data replication platform available, with significant improvements in throughput and up to an order-of-magnitude decrease in latency. The orchestrator of virtual network functions at the edge was shown to reduce event rejections, total processing time, and energy consumption imbalances over the default orchestrator, demonstrating more efficient use of the infrastructure. Lastly, computational cost at the edge was further reduced with the proposed query allocation mechanism, which minimised redundant engagement of nodes.
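    The abstract does not name the data representation used for provenance summaries; a per-node Bloom filter is one plausible instance, sketched below with assumed sizes and key names. Each edge node ships its filter to the orchestrator, which sends a query only to nodes whose summary may contain the requested key (false positives are possible, false negatives are not):

```python
import hashlib

class BloomFilter:
    # Compact set summary: a few bits per item, no false negatives.
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = 0   # bit array packed into one integer

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, item):
        for pos in self._positions(item):
            self.array |= 1 << pos

    def might_contain(self, item):
        return all(self.array >> pos & 1 for pos in self._positions(item))

summaries = {"edge-1": BloomFilter(), "edge-2": BloomFilter()}
summaries["edge-1"].add("sensor:42")

def allocate(query_key):
    # Engage only nodes whose summary may hold the data.
    return [n for n, f in summaries.items() if f.might_contain(query_key)]

print(allocate("sensor:42"))   # ['edge-1'] (plus rare false positives)
```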

    DRAM-Based Processing-in-Memory Microarchitectures for Memory-Intensive Machine Learning Applications

    Get PDF
    Doctoral dissertation (Ph.D.) -- Seoul National University, Graduate School of Convergence Science and Technology, Department of Convergence Science (Intelligent Convergence Systems), February 2022. Advisor: Jung Ho Ahn.
    Recently, as research on neural networks has gained significant traction, a number of memory-intensive neural network models such as recurrent neural network (RNN) models and recommendation models have been introduced to process various tasks. RNN models and recommendation models spend most of their execution time processing matrix-vector multiplication (MV-mul) and processing embedding layers, respectively. A fundamental primitive of embedding layers, tensor gather-and-reduction (GnR), gathers embedding vectors and then reduces them to a new embedding vector. Because the matrices in RNNs and the embedding tables in recommendation models have poor reusability, and their ever-increasing sizes become too large to fit in the on-chip storage of devices, the performance and energy efficiency of MV-mul and GnR are determined by those of main-memory DRAM. Therefore, computing these operations within DRAM draws significant attention. In this dissertation, we first propose a main-memory architecture called MViD, which performs MV-mul by placing MAC units inside DRAM banks. For higher computational efficiency, we use a sparse matrix format and exploit quantization. Because of the limited power budget for DRAM devices, we implement the MAC units on only a portion of the DRAM banks. We architect MViD to slow down or pause MV-mul in order to concurrently process memory requests from processors while satisfying the limited power budget. Our results show that MViD provides 7.2× higher throughput compared to the baseline system with four DRAM ranks (performing MV-mul in a chip-multiprocessor) while running inference of Deep Speech 2 with a memory-intensive workload. We then propose TRiM, an NDP architecture for accelerating recommendation systems. Based on the observation that the DRAM datapath has a hierarchical tree structure, TRiM augments the DRAM datapath with "in-DRAM" reduction units at the DDR4/5 rank/bank-group/bank level. We modify the interface of DRAM to provide commands effectively to multiple reduction units running in parallel. We also propose a host-side architecture with hot embedding-vector replication to alleviate the load imbalance that arises across the reduction units. An optimal TRiM design based on DDR5 achieves up to 7.7× and 3.9× speedups, and reduces the energy consumption of embedding-vector gather and reduction by 55% and 50%, over the baseline and a state-of-the-art NDP architecture respectively, with minimal area overhead equivalent to 2.66% of DRAM chips.
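    As a host-side reference for the semantics that TRiM offloads, the sketch below implements GnR exactly as the abstract defines it: gather embedding vectors by index, then reduce them to a single vector. Table dimensions and indices are toy assumptions:

```python
import numpy as np

table = np.random.rand(1000, 64).astype(np.float32)   # embedding table
indices = np.array([3, 17, 256, 911])                  # one lookup's indices

def gnr(table: np.ndarray, indices: np.ndarray) -> np.ndarray:
    # On conventional hardware every gathered row crosses the memory bus;
    # TRiM's in-DRAM reduction units sum rows near the banks instead, so
    # only the reduced vector leaves the device.
    return table[indices].sum(axis=0)

print(gnr(table, indices).shape)   # (64,)
```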

    Cloudarmor: Supporting Reputation-Based Trust Management for Cloud Services

    Get PDF
    Cloud services have become predominant in the current technological era. Given the rich set of features they provide, consumers want to access cloud services while protecting their privacy, which makes the protection of cloud services a significant problem. Research has therefore turned to systems that let users access cloud services without losing the privacy of their data. Trust management and identity models fit this need: the identity model maintains authentication and authorization of the components involved in the system, while the trust-based model provides a dynamic way of identifying issues and attacks and taking appropriate action. However, a trust-management-based system brings a new set of challenges, such as reputation-based attacks, availability of components, and misleading trust feedback; collusion attacks and Sybil attacks form a significant part of these challenges. This paper aims to solve these problems by introducing a credibility model on top of a new trust management model that addresses these use cases and also provides reliability and availability.
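    As a minimal sketch of the credibility idea, the function below weights each consumer's feedback by a credibility score before aggregating, so that coordinated outliers (e.g., collusion or Sybil accounts) count for less. Down-weighting raters who deviate from the consensus is an illustrative assumption, not the paper's exact model:

```python
feedback = {            # rater -> rating of one cloud service, in [0, 1]
    "alice": 0.9, "bob": 0.85, "carol": 0.88,
    "sybil1": 0.1, "sybil2": 0.1,   # suspicious coordinated low ratings
}

def trust_score(feedback: dict[str, float]) -> float:
    mean = sum(feedback.values()) / len(feedback)
    # Credibility falls with distance from the majority opinion.
    cred = {r: 1.0 - abs(v - mean) for r, v in feedback.items()}
    total = sum(cred.values())
    return sum(cred[r] * v for r, v in feedback.items()) / total

# The credibility-weighted score sits above the plain mean, closer to the
# honest majority's ratings.
print(round(trust_score(feedback), 3))
```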

    Distribution of over-the-top multimedia content in wireless networks

    Get PDF
    Master's dissertation in Electronics and Telecommunications Engineering. Nowadays, the Internet is considered an essential good, due to the fact that there is a need to communicate, but also to access and share information. The increasing use of the Internet, allied with the increased bandwidth provided by telecommunication operators, has created conditions for the growth of Over-the-Top (OTT) multimedia services, demonstrated by the huge success of Netflix and Youtube. The OTT service encompasses the delivery of video and audio through the Internet without direct control of telecommunication operators, presenting an attractive low-cost and profitable proposal. Although OTT delivery is captivating, it has some limitations. In order to increase the number of clients and keep high Quality of Experience (QoE) standards, an enhanced architecture for the content distribution network is needed.
    This enhanced architecture needs to provide good quality for the user in an efficient and scalable way, supporting the requirements imposed by future mobile networks. This dissertation approaches content distribution in wireless networks through a cache model distributed among several access points, thus increasing the cache size and decreasing the load on the upstream servers. The proposed architecture was tested in three different scenarios: the consumer is at home with fixed access; the consumer moves between several access points in the street; and the consumer is on a high-speed train. Several solutions were evaluated, such as Redis2, Cachelot and Memcached to serve as caches, along with the evaluation of several proxy servers to fulfill the required features. Two distribution algorithms were also tested, namely Consistent and Rendezvous Hashing. Moreover, this dissertation integrated a prefetching mechanism, which consists of inserting content into caches before it is requested by consumers. In the end, it was verified that the distributed model with prefetching improved consumer QoE and reduced the load on the upstream servers.
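    As a minimal sketch of Rendezvous (highest-random-weight) Hashing, one of the two placement algorithms evaluated, each key is scored against every access point and cached at the highest-scoring one; the node names are illustrative:

```python
import hashlib

def score(key: str, node: str) -> int:
    # Deterministic pseudo-random weight for a (key, node) pair.
    return int.from_bytes(hashlib.sha1(f"{key}@{node}".encode()).digest()[:8], "big")

def owner(key: str, nodes: list[str]) -> str:
    # Removing an AP only remaps the keys it owned; all other keys keep
    # their cache, which suits churn among access points.
    return max(nodes, key=lambda n: score(key, n))

aps = ["ap-1", "ap-2", "ap-3"]
chunk = "video:chunk:1337"
primary = owner(chunk, aps)
print(primary)
print(owner(chunk, [n for n in aps if n != primary]))   # fallback after failure
```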