
    On the Synchronization Bottleneck of OpenStack Swift-like Cloud Storage Systems

    As one of the most popular types of cloud storage services, OpenStack Swift and its follow-up systems replicate each object across multiple storage nodes and leverage object sync protocols to achieve high reliability and eventual consistency. The performance of object sync protocols heavily relies on two key parameters: r (the number of replicas for each object) and n (the number of objects hosted by each storage node). In existing tutorials and demos, the configurations are usually r = 3 and n < 1000. However, when r > 3 and n >> 1000, the sync process is significantly delayed and produces massive network overhead, referred to as the sync bottleneck problem. By reviewing the source code of OpenStack Swift, we find that its object sync protocol utilizes a fairly simple and network-intensive approach to check the consistency among replicas of objects. Hence, in a sync round, the number of exchanged hash values per node is Theta(n x r). To tackle the problem, we propose a lightweight and practical object sync protocol, LightSync, which not only remarkably reduces the sync overhead, but also preserves high reliability and eventual consistency. LightSync derives this capability from three novel building blocks: 1) Hashing of Hashes, which aggregates all the h hash values of each data partition into a single representative hash value with a Merkle tree (sketched below); 2) Circular Hash Checking, which checks the consistency of different partition replicas by sending the aggregated hash value only to the clockwise neighbor; and 3) Failed Neighbor Handling, which properly detects and handles node failures with moderate overhead to effectively strengthen the robustness of LightSync. The design of LightSync offers a provable guarantee on reducing the per-node network overhead from Theta(n x r) to Theta(n/h). Furthermore, we have implemented LightSync as an open-source patch and applied it to OpenStack Swift, reducing the sync delay by up to 879x and the network overhead by up to 47.5x.
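
    The Hashing of Hashes and Circular Hash Checking ideas can be illustrated with a short sketch: the per-object hash values of a partition are folded pairwise into one Merkle root, and a node then sends only that single digest to its clockwise neighbor. This is a minimal illustration of the general technique under invented names, not the LightSync patch itself.

        import hashlib

        def merkle_root(hashes):
            """Aggregate a partition's object hashes into one representative digest."""
            if not hashes:
                return hashlib.sha256(b"").hexdigest()
            level = list(hashes)
            while len(level) > 1:
                if len(level) % 2:  # duplicate the last node on odd-sized levels
                    level.append(level[-1])
                level = [hashlib.sha256((a + b).encode()).hexdigest()
                         for a, b in zip(level[::2], level[1::2])]
            return level[0]

        # Circular checking idea: send only the root to the clockwise neighbor,
        # instead of exchanging all n per-object hashes with all r replicas.
        object_hashes = [hashlib.sha256(f"obj-{i}".encode()).hexdigest() for i in range(8)]
        print(merkle_root(object_hashes))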

    Improving performance and capacity utilization in cloud storage for content delivery and sharing services

    Content delivery and sharing (CDS) is a popular and cost-effective cloud-based service that organizations use to deliver contents to, and share them with, end-users, partners, and internal users. This type of service improves data availability and I/O performance by producing and distributing replicas of shared contents. However, such a technique increases storage and network resource utilization. This paper introduces a threefold methodology to improve the trade-off between the I/O performance and capacity utilization of cloud storage for CDS services. The methodology includes: i) definition of a classification model that identifies types of users and contents by analyzing their consumption/demand and sharing patterns (sketched below); ii) usage of the classification model to define content availability and load balancing schemes; and iii) integration of a dynamic availability scheme into a cloud-based CDS system. Our method was implemented. This work was partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under grant TIN2016-79637-P, "Towards Unification of HPC and Big Data Paradigms".
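
    As an illustration of step i), the sketch below classifies contents by observed demand and sharing rates using simple thresholds. The thresholds, class names, and fields are assumptions made up for this example; the abstract does not specify the actual model.

        from dataclasses import dataclass

        @dataclass
        class ContentStats:
            downloads_per_day: float  # consumption/demand pattern
            shares_per_day: float     # sharing pattern

        def classify(stats: ContentStats) -> str:
            """Toy classifier: hotter content earns more replicas."""
            if stats.downloads_per_day > 100 or stats.shares_per_day > 10:
                return "hot"   # replicate widely for availability
            if stats.downloads_per_day > 10:
                return "warm"  # keep a couple of replicas
            return "cold"      # single replica to save capacity

        print(classify(ContentStats(downloads_per_day=250, shares_per_day=3)))  # hot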

    A Study of Application-awareness in Software-defined Data Center Networks

    A data center (DC) has been a fundamental infrastructure for academia and industry for many years. Applications in DCs have diverse communication requirements, creating huge demands on data center network (DCN) control frameworks (CFs) for coordinating communication traffic. Simultaneously satisfying all demands is difficult and inefficient using existing traditional network devices and protocols. Recently, agile Software-Defined Networking (SDN) has been introduced to DCNs to speed up the development of DCN control frameworks (DCNCFs). Application-awareness preserves application semantics, including the collective goals of communications. Previous works have illustrated that application-aware DCNCFs can allocate network resources much more efficiently by explicitly considering applications' needs. An application-aware software-defined DCNCF (SDDCNCF) operating at the level of transfer application tasks is designed for OpenFlow software-defined DCNs (SDDCNs) exchanging big data. The SDDCNCF achieves application-aware load balancing, short average transfer application task completion time, and high link utilization, and it is immediately deployable on SDDCNs consisting of OpenFlow 1.3 switches. The Big Data Research Integration with Cyberinfrastructure for LSU (BIC-LSU) project adopts the SDDCNCF to construct a 40 Gb/s high-speed storage area network to efficiently transfer big data and accelerate big-data-related research at Louisiana State University. On the basis of the success of BIC-LSU, a coflow-level application-aware SDDCNCF for OpenFlow-based storage area networks, MinCOF, is designed. MinCOF incorporates all desirable features of existing coflow scheduling and routing frameworks and requires minimal changes on hosts. To avoid the architectural limitations of the OpenFlow SDN implementation, a coflow-level application-aware SDDCNCF using a fast packet processing library, Coflourish, is designed. Coflourish exploits congestion feedback from SDN switches in the DCN to schedule coflows and can smoothly co-exist with arbitrary applications in a shared DCN. It is implemented using the fast packet processing library on an SDN switch, Open vSwitch with DPDK. Simulation and experimental results indicate that Coflourish effectively shortens average application completion time.
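
    To make the coflow-level idea concrete, the sketch below orders coflows by their bottleneck (slowest) flow so that small coflows are not starved behind large ones. This is a generic heuristic of the kind used in coflow scheduling frameworks, written here with invented names; it is not the actual MinCOF or Coflourish algorithm.

        def bottleneck_time(coflow, bandwidth):
            """A coflow finishes when its largest flow finishes."""
            return max(coflow["flows"]) / bandwidth

        def schedule(coflows, bandwidth=40e9 / 8):  # 40 Gb/s link, in bytes/s
            return sorted(coflows, key=lambda c: bottleneck_time(c, bandwidth))

        coflows = [
            {"id": "shuffle-A", "flows": [5e9, 2e9, 7e9]},
            {"id": "shuffle-B", "flows": [1e8, 3e8]},
        ]
        for c in schedule(coflows):
            print(c["id"])  # shuffle-B first: its bottleneck flow is far smaller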

    An access control and authorization model with Open stack cloud for Smart Grid

    Whereas authentication is concerned with identifying a user and relating that identity to its tasks and processes within the system, authorization in access control is concerned with ensuring that a user, and its tasks running as system processes, are granted access to the assets of a particular domain only when they comply with the identified policies. Access control and authorization have been areas of interest for researchers seeking to enhance the security of critical assets for many decades. Our primary focus is on access control models based on attribute-based access control (ABAC); in this paper we integrate ABAC with the OpenStack cloud to achieve a finer level of granularity in access policies for domains such as the smart grid. The technical advancement of the current era demands that critical infrastructure like the traditional electrical grid open up to modern information and communication technology (ICT) to benefit in terms of efficiency, scalability, accessibility, and transparency, for better adaptability in the real world. Incorporating ICT into the electric grid enables a greater level of bi-directional interaction among stakeholders such as customers, generation units, distribution units, and administrations, and this has led international organizations to contribute to the standardization of smart grid concepts and technology so that the smart grid can become reality. The smart grid is by nature a very large-scale distributed system and needs to integrate available legacy systems with its own security requirements. Cloud computing has proven to be an efficient approach to these requirements, and we have identified OpenStack as our cloud platform. We have integrated the ABAC approach with the default RBAC approach of OpenStack and provide a framework that supports and integrates multiple access control policies in making authorization decisions. The smart grid domain is considered as a case study, since it requires support for multiple access policies (RBAC, ABAC, DAC, etc.) within our model for access control and authorization.
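
    A minimal sketch of an ABAC decision point, assuming a policy expressed as attribute predicates that must all hold. The attribute names and the smart-grid example rule are invented for illustration; this is neither the paper's framework nor OpenStack's policy engine.

        def abac_permit(subject, resource, action, environment, policy):
            """Grant access only if every attribute predicate in the policy holds."""
            request = {"subject": subject, "resource": resource,
                       "action": action, "environment": environment}
            return all(predicate(request) for predicate in policy)

        # Hypothetical policy for reading smart-grid meter data:
        policy = [
            lambda r: r["subject"].get("role") == "distribution_operator",
            lambda r: r["resource"].get("type") == "meter_reading",
            lambda r: r["action"] == "read",
            lambda r: 8 <= r["environment"].get("hour", -1) <= 18,  # office hours only
        ]

        print(abac_permit({"role": "distribution_operator"},
                          {"type": "meter_reading"}, "read", {"hour": 10}, policy))  # True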

    Storage methods for a distributed Dockerized cloud

    Distributed object and block storage systems are studied in this thesis, and their suitability as a storage solution for a Dockerized cloud is evaluated. Docker is a relatively new virtualization framework. In the beginning it was designed for containerizing processes in single-host environments. However, it has started to be used in multi-host configurations and clouds, which has created a need for persistent storage solutions that do not rely on host machine storage. Two open-source distributed storage solutions were studied. Swift is an eventually consistent object storage system developed for the OpenStack project. Ceph is a consistent storage system comprising object, block, and file system storage subsystems. The Swift and Ceph object storage systems were compared against each other, and the Ceph block storage performance was evaluated against the virtual machine disk. The results show that Ceph has double the throughput of Swift for small objects from 8 KB to 128 KB, and 30% better performance for files from 256 KB to 100 MB. The main trend between Swift and Ceph is that Ceph has better throughput on read operations at all object sizes. The Ceph block storage system was able to utilize 88.5% of the virtual machine disk write throughput. Throughput efficiency was calculated by multiplying the write throughput of the Ceph block device by three and dividing it by the virtual machine disk write throughput; the Ceph throughput needed to be tripled because replication triples the number of disk writes (see the formula below). Ceph journal files were not stored on the disk, so they do not affect the efficiency.
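
    Written out, the efficiency calculation described above is (notation introduced here for clarity, not taken from the thesis):

        % Ceph replicates each write three times, so its client-visible write
        % throughput is tripled before comparing against the raw VM disk.
        \[
          \eta = \frac{3 \cdot T_{\mathrm{write}}^{\mathrm{Ceph}}}{T_{\mathrm{write}}^{\mathrm{VM\ disk}}} = 88.5\,\%
        \]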

    Cloud-efficient modelling and simulation of magnetic nano materials

    Scientific simulations are rarely attempted in a cloud due to the substantial performance costs of virtualization. Considerable communication overheads, intolerable latencies, and inefficient hardware emulation are the main reasons why this emerging technology has not been fully exploited. On the other hand, the progress of computing infrastructure nowadays depends strongly on the development of prospective storage media, where efficient micromagnetic simulations play a vital role in future memory design. This thesis addresses both these topics by merging micromagnetic simulations with the latest OpenStack cloud implementation, providing a time- and cost-effective alternative to expensive computing centers. However, many challenges have to be addressed before a high-performance cloud platform emerges as a solution for problems in micromagnetic research communities. First, the best solver candidate has to be selected and further improved, particularly in the parallelization and process communication domain. Second, a 3-level cloud communication hierarchy needs to be recognized and each segment adequately addressed. The required steps include breaking the VM isolation to activate the host's shared memory, tuning and optimizing the cloud network stack, and integrating efficient communication hardware. The project work concludes with practical measurements and confirmation of the successfully implemented simulation in an open-source cloud environment. As a result, the renewed Magpar solver runs for the first time in the OpenStack cloud by using ivshmem for shared memory communication. Extensive measurements also proved the effectiveness of our solutions, yielding from sixty percent to over ten times better results than those achieved in the standard cloud.
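
    As a rough illustration of the shared-memory path: QEMU's ivshmem-plain device maps a host memory backend into the guest as a PCI BAR, and a host-side process can open the same backing file. The sketch below is a minimal assumed setup (the path and size are made up for the example), not the thesis's actual configuration.

        # Host-side sketch: map the file backing a QEMU ivshmem-plain device.
        # Guests would be started with roughly:
        #   -object memory-backend-file,id=hostmem,share=on,mem-path=/dev/shm/ivshmem,size=1M
        #   -device ivshmem-plain,memdev=hostmem
        import mmap
        import os

        SHM_PATH = "/dev/shm/ivshmem"  # assumed path; must match mem-path above
        SHM_SIZE = 1 << 20             # 1 MiB; must match the backend size

        fd = os.open(SHM_PATH, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(fd, SHM_SIZE)
        shm = mmap.mmap(fd, SHM_SIZE)

        shm[0:5] = b"hello"            # becomes visible to the guest via the BAR
        print(bytes(shm[0:5]))
        shm.close()
        os.close(fd)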

    Towards Transparent Data Access with Context Awareness

    Applying the principles of open research data is an important factor in accelerating the production and analysis of scientific results and worldwide collaboration. However, still very little data is being shared. The aim of this article is to analyze existing data access solutions in order to identify the reasons for this situation. After analyzing existing solutions and the needs of data access stakeholders, the authors propose their own vision of data access model evolution.

    VISOR: virtual machine images management service for cloud infrastructures

    Cloud Computing is a relatively novel paradigm that aims to fulfill the dream of computing as a utility. It appeared to make it possible to provide computing resources (such as servers, storage, and networks) as a service, on demand, accessible through common Internet protocols. Through cloud offers, users only pay for the amount of resources they need and for the time they use them. Virtualization is the cloud's key technology, acting upon virtual machine images to deliver fully functional virtual machine instances. Therefore, virtual machine images play an important role in Cloud Computing, and their efficient management becomes a key concern that should be carefully addressed. To tackle this requirement, most cloud offers provide their own image repository, where images are stored and retrieved from in order to instantiate new virtual machines. However, the rise of Cloud Computing has brought new problems in managing large collections of images. Existing image repositories are not able to efficiently manage, store, and catalogue virtual machine images from other clouds through the same centralized service and repository. This becomes especially important when considering the management of multiple heterogeneous cloud offers. In fact, despite the hype around Cloud Computing, there are still barriers to its widespread adoption. Among them, cloud interoperability is one of the most notable issues. Interoperability limitations arise from the fact that current cloud offers provide proprietary interfaces and their services are tied to their own requirements. Therefore, when dealing with multiple heterogeneous clouds, users face hard-to-manage integration and compatibility issues. The management and delivery of virtual machine images across different clouds is an example of such interoperability constraints. This dissertation presents VISOR, a cloud-agnostic virtual machine image management service and repository. Our work on VISOR aims to provide a service designed not to fit a specific cloud offer but rather to overcome sharing and interoperability limitations among different clouds. With VISOR, the management of cloud interoperability can be seamlessly abstracted from the underlying procedural details. In this way, it aims to provide users with the ability to manage and expose virtual machine images across heterogeneous clouds through the same generic and centralized repository and management service. VISOR is open-source software with a community-driven development process, so it can be freely customized and further improved by anyone. The tests conducted to evaluate its performance and resource usage show that VISOR is a stable, high-performance service, even when compared with other services already in production. Lastly, placing clouds as the main target audience is not a limitation for other use cases: virtualization and virtual machine images are not exclusively linked to cloud environments, and given the service's agnostic design, it is possible to adapt it to other usage scenarios as well.
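
    The cloud-agnostic idea can be sketched as a thin adapter layer: one generic interface for image operations with interchangeable per-cloud backends behind it. The interface and class names below are invented for illustration; they are not VISOR's actual API.

        from abc import ABC, abstractmethod

        class ImageStore(ABC):
            """Generic interface each cloud-specific backend must implement."""
            @abstractmethod
            def put(self, name: str, data: bytes) -> str: ...
            @abstractmethod
            def get(self, image_id: str) -> bytes: ...

        class InMemoryStore(ImageStore):
            """Stand-in backend; a real one would call S3, Glance, etc."""
            def __init__(self):
                self._images = {}
            def put(self, name, data):
                image_id = f"img-{len(self._images)}"
                self._images[image_id] = (name, data)
                return image_id
            def get(self, image_id):
                return self._images[image_id][1]

        # Client code sees only ImageStore, so heterogeneous backends are interchangeable.
        store: ImageStore = InMemoryStore()
        image_id = store.put("ubuntu-22.04.qcow2", b"QFI\xfb")  # qcow2 magic bytes
        print(image_id, len(store.get(image_id)))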

    Behrooz File System (BFS)

    In this thesis, the Behrooz File System (BFS) is presented, which provides an in-memory distributed file system. BFS is a simple design that combines the best of in-memory and remote file systems. BFS stores data in the main memory of commodity servers and provides a shared, unified file system view over them, utilizing backend storage for persistence and availability. Unlike most existing distributed in-memory storage systems, BFS supports a general-purpose POSIX-like file interface. BFS is built by grouping the memory of multiple servers together; therefore, if applications and BFS servers are co-located, BFS is a highly efficient design because this architecture minimizes inter-node communication. This pattern is common in distributed computing environments and data analytics applications. A set of microbenchmarks and the SPEC SFS 2014 benchmark are used to evaluate different aspects of BFS, such as throughput, reliability, and scalability. The evaluation results indicate that the simple design of BFS succeeds in delivering the expected performance, while certain workloads reveal limitations of BFS in handling a large number of files. Addressing these limitations, as well as other potential improvements, is considered future work.
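
    A toy sketch of the grouped-memory and co-location idea: a deterministic placement rule maps each path to one server's RAM, so an application co-located with that server reads locally and only otherwise pays for inter-node communication. The placement rule and names here are assumptions for illustration, not BFS's actual design.

        import hashlib

        class Node:
            def __init__(self, name):
                self.name = name
                self.memory = {}  # path -> bytes, this node's share of RAM

        class ToyFS:
            def __init__(self, nodes):
                self.nodes = nodes
            def _owner(self, path):
                """Deterministic placement: hash the path onto one node."""
                h = int(hashlib.md5(path.encode()).hexdigest(), 16)
                return self.nodes[h % len(self.nodes)]
            def write(self, path, data):
                self._owner(path).memory[path] = data
            def read(self, path, local):
                owner = self._owner(path)
                return ("local" if owner is local else "remote"), owner.memory[path]

        nodes = [Node("a"), Node("b"), Node("c")]
        fs = ToyFS(nodes)
        fs.write("/data/part-0", b"rows")
        print(fs.read("/data/part-0", local=nodes[0]))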