1,454 research outputs found
Evaluation of Storage Systems for Big Data Analytics
abstract: Recent trends in big data storage systems show a shift from disk centric models to memory centric models. The primary challenges faced by these systems are speed, scalability, and fault tolerance. It is interesting to investigate the performance of these two models with respect to some big data applications. This thesis studies the performance of Ceph (a disk centric model) and Alluxio (a memory centric model) and evaluates whether a hybrid model provides any performance benefits with respect to big data applications. To this end, an application TechTalk is created that uses Ceph to store data and Alluxio to perform data analytics. The functionalities of the application include offline lecture storage, live recording of classes, content analysis and reference generation. The knowledge base of videos is constructed by analyzing the offline data using machine learning techniques. This training dataset provides knowledge to construct the index of an online stream. The indexed metadata enables the students to search, view and access the relevant content. The performance of the application is benchmarked in different use cases to demonstrate the benefits of the hybrid model.Dissertation/ThesisMasters Thesis Computer Science 201
Cost-Effective Cache Deployment in Mobile Heterogeneous Networks
This paper investigates one of the fundamental issues in cache-enabled
heterogeneous networks (HetNets): how many cache instances should be deployed
at different base stations, in order to provide guaranteed service in a
cost-effective manner. Specifically, we consider two-tier HetNets with
hierarchical caching, where the most popular files are cached at small cell
base stations (SBSs) while the less popular ones are cached at macro base
stations (MBSs). For a given network cache deployment budget, the cache sizes
for MBSs and SBSs are optimized to maximize network capacity while satisfying
the file transmission rate requirements. As cache sizes of MBSs and SBSs affect
the traffic load distribution, inter-tier traffic steering is also employed for
load balancing. Based on stochastic geometry analysis, the optimal cache sizes
for MBSs and SBSs are obtained, which are threshold-based with respect to cache
budget in the networks constrained by SBS backhauls. Simulation results are
provided to evaluate the proposed schemes and demonstrate the applications in
cost-effective network deployment
Search strategies in unstructured overlays
Trabalho de projecto de mestrado em Engenharia Informática, apresentado à Universidade de Lisboa, através da Faculdade de Ciências, 2008Unstructured peer-to-peer networks have a low maintenance cost, high resilience and tolerance to the continuous arrival and departure of nodes. In these networks search is usually performed by flooding, which generates a high number of duplicate messages. To improve scalability, unstructured overlays evolved to a two-tiered architecture where regular nodes rely on special nodes, called supernodes or superpeers, to locate resources, thus reducing the scope of flooding based searches. While this approach takes advantage of node heterogeneity, it makes the overlay less resilient to accidental and malicious faults, and less attractive to users concerned with the consumption of their resources and who may not desire to commit additional resources that are required by nodes selected as superpeers. Another point of concern is churn, defined as the constant entry and departure of nodes. Churn affects both structured and unstructured overlay networks and, in order to build resilient search protocols, it must be taken into account. This dissertation proposes a novel search algorithm, called FASE, which combines a replication policy and a search space division technique to achieve low hop counts using a small number of messages, on unstructured overlays with nonhierarquical topologies. The problem of churn is mitigated by a distributed monitoring algorithm designed with FASE in mind. Simulation results validate FASE efficiency when compared to other search algorithms for peer-to-peer networks. The evaluation of the distributed monitoring algorithm shows that it maintains FASE performance when subjected to churn.Os sistemas peer-to-peer, como aplicações de partilha e distribuição de conteúdos ou voz-sobre-IP, são construídos sobre redes sobrepostas. Redes sobrepostas são redes virtuais que existem sobre uma rede subjacente, em que a topologia da rede sobreposta não tem de ter uma correspondência com a topologia da rede subjacente. Ao contrário das suas congéneres estruturadas, as redes sobrepostas não-estru-turadas não restringem a localização dos seus participantes, ou seja, não limitam a escolha de vizinhos de um dado nó, o que torna a sua manutenção mais simples. O baixo custo de manutenção das redes sobrepostas não-estruturadas torna estas especialmente adequadas para a construção de sistemas peer-to-peer capazes de tolerar o comportamento dinâmico dos seus participantes, uma vez que estas redes são permanentemente afectadas pela entrada e saída de nós na rede, um fénomeno conhecido como churn. O algoritmo de pesquisa mais comum em redes sobrepostas não-estruturadas consiste em inundar a rede, o que origina uma grande quantidade de mensagens duplicadas por cada pesquisa. A escalabilidade destes algoritmos é limitada porque consomem demasiados recursos da rede em sistemas com muitos participantes. Para reduzir o número de mensagens, as redes sobrepostas não-estruturadas podem ser organizadas em topologias hierárquicas. Nestas topologias alguns nós da rede, chamados supernós, assumem um papel mais importante, responsabilizando-se pela localização de objectos. A utilização de supernós cria novos problemas, como a sua selecção e a dependência da rede de uma pequena percentagem dos nós. Esta dissertação apresenta um novo algoritmo de pesquisa, chamado FASE, criado para operar sobre redes sobrepostas não estruturadas com topologias não-hierárquicas. Este algoritmo combina uma política de replicação com uma técnica de divisão do espaço de procura para resolver pesquisas ao alcançe de um número reduzido de saltos com o menor custo possível. Adicionalmente, o algoritmo procura nivelar a contribuição dos participantes, já que todos contribuem de uma forma semelhante para o desempenho da pesquisa. A estratégia seguida pelo algo- ritmo consiste em dividir tanto os nós da rede como as chaves dos seus conteúdos por diferentes “frequências” e replicar chaves nas respectivas frequências, sem, no entanto, limitar a localização de um nó ou impor uma estrutura à rede ou mesmo aplicar uma definição rígida de chave. Com o objectivo de mitigar o problema do churn, é apresentado um algoritmo de monitorização distribuído para as réplicas originadas pelo FASE. Os algoritmos propostos são avaliados através de simulações, que validam a eficiência do FASE quando comparado com outros algoritmos de pesquisa em redes sobrepostas não-estruturadas. É também demonstrado que o FASE mantém o seu desempenho em redes sob o efeito do churn quando combinado com o algoritmo de monitorização
CRAID: Online RAID upgrades using dynamic hot data reorganization
Current algorithms used to upgrade RAID arrays typically require large amounts of data to be migrated, even those that move only the minimum amount of data required to keep a balanced data load. This paper presents CRAID, a self-optimizing RAID array that performs an online block reorganization of frequently used, long-term accessed data in order to reduce this migration even further. To achieve this objective, CRAID tracks frequently used, long-term data blocks and copies them to a dedicated partition spread across all the disks in the array. When new disks are added, CRAID only needs to extend this process to the new devices to redistribute this partition, thus greatly reducing the overhead of the upgrade process. In addition, the reorganized access patterns within this partition improve the array’s performance, amortizing the copy overhead and allowing CRAID to offer a performance competitive with traditional RAIDs.
We describe CRAID’s motivation and design and we evaluate it by replaying seven real-world workloads including a file server, a web server and a user share. Our experiments show that CRAID can successfully detect hot data variations and begin using new disks as soon as they are added to the array. Also, the usage of a dedicated
partition improves the sequentiality of relevant data access, which amortizes the cost of reorganizations. Finally, we prove that a full-HDD CRAID array with a small distributed partition (<1.28% per disk) can compete in performance with an ideally restriped RAID-5 and a hybrid RAID-5 with a small SSD cache.Peer ReviewedPostprint (published version
On Optimal and Fair Service Allocation in Mobile Cloud Computing
This paper studies the optimal and fair service allocation for a variety of
mobile applications (single or group and collaborative mobile applications) in
mobile cloud computing. We exploit the observation that using tiered clouds,
i.e. clouds at multiple levels (local and public) can increase the performance
and scalability of mobile applications. We proposed a novel framework to model
mobile applications as a location-time workflows (LTW) of tasks; here users
mobility patterns are translated to mobile service usage patterns. We show that
an optimal mapping of LTWs to tiered cloud resources considering multiple QoS
goals such application delay, device power consumption and user cost/price is
an NP-hard problem for both single and group-based applications. We propose an
efficient heuristic algorithm called MuSIC that is able to perform well (73% of
optimal, 30% better than simple strategies), and scale well to a large number
of users while ensuring high mobile application QoS. We evaluate MuSIC and the
2-tier mobile cloud approach via implementation (on real world clouds) and
extensive simulations using rich mobile applications like intensive signal
processing, video streaming and multimedia file sharing applications. Our
experimental and simulation results indicate that MuSIC supports scalable
operation (100+ concurrent users executing complex workflows) while improving
QoS. We observe about 25% lower delays and power (under fixed price
constraints) and about 35% decrease in price (considering fixed delay) in
comparison to only using the public cloud. Our studies also show that MuSIC
performs quite well under different mobility patterns, e.g. random waypoint and
Manhattan models
- …