26 research outputs found

    HyperSCSI: Design and development of a new protocol for storage networking

    Ph.D. (Doctor of Philosophy)

    Security comparison of ownCloud, Nextcloud, and Seafile in open source cloud storage solutions

    Cloud storage has become one of the most efficient and economical ways to store data over the web. Although most organizations have adopted cloud storage, there are numerous privacy and security concerns about cloud storage and collaboration. Furthermore, adopting public cloud storage may be costly for many enterprises. An open-source cloud storage solution for cloud file sharing is a possible alternative in this case. Despite widespread awareness, there is limited information on the system architecture, security measures, and overall throughput consequences of selecting an open-source cloud storage solution, and no comprehensive comparisons are available that evaluate open-source cloud storage solutions (specifically ownCloud, Nextcloud, and Seafile) and analyze the impact of platform selection. This thesis presents the concept of cloud storage and examines in detail the features, architecture, security measures, vulnerabilities, and other aspects of three popular open-source solutions. The goal of the study is to compare these cloud solutions so that users may better understand the various open-source cloud storage options and make more knowledgeable selections. The author focuses on four attributes of the three cloud storage solutions ("ownCloud," "Nextcloud," and "Seafile"): features, architecture, security, and vulnerabilities, since most of the critical issues fall into one of these classifications. The findings show that, while the three services take slightly different approaches to confidentiality, integrity, and availability, they all achieve the same purpose. As a result of this research, users will have a better understanding of the relevant factors and will be able to make a more informed decision on cloud storage options.
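
    As a hedged illustration of the kind of check such a security comparison can involve (not a procedure taken from the thesis), the sketch below inspects a few HTTP security headers returned by a self-hosted ownCloud, Nextcloud, or Seafile instance; the URL is a placeholder.

```python
# Illustrative only: inspect a few HTTP security headers commonly reviewed when
# hardening self-hosted ownCloud, Nextcloud, or Seafile servers. The URL is a
# placeholder; this is not a check prescribed by the thesis.
import urllib.request

HEADERS_OF_INTEREST = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
]

def check_security_headers(base_url: str) -> None:
    with urllib.request.urlopen(base_url) as resp:  # follows redirects by default
        for name in HEADERS_OF_INTEREST:
            value = resp.headers.get(name)
            print(f"{name}: {value if value else 'MISSING'}")

# check_security_headers("https://cloud.example.org")  # hypothetical instance
```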

    Assessment of In-Cloud Enterprise Resource Planning System Performed in a Virtual Cluster

    This paper introduces a high-performance, high-availability in-cloud enterprise resource planning (in-cloud ERP) system deployed in a virtual machine cluster. The proposed approach resolves the crucial problems of ERP failure due to unexpected downtime and failover between physical hosts, which cause operation termination and hence data loss. In addition, together with access-control authentication and network security, the proposed system is capable of preventing intrusions and malicious attacks via the Internet. For system assessment, the cost-performance (C-P) ratio, a measure of cost effectiveness, has been applied to several notable ERP systems. The C-P ratios evaluated from the experiments show that the proposed approach outperforms two well-known benchmark ERP systems, namely in-house ECC 6.0 and in-cloud ByDesign.
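
    The abstract does not spell out how the cost-performance (C-P) ratio is computed; the sketch below assumes the simple definition of a benchmark performance score divided by total cost, with placeholder figures rather than measured data.

```python
# Illustrative sketch: comparing ERP deployments by a cost-performance (C-P) ratio.
# Assumption: C-P ratio = benchmark performance score / total cost; the paper may
# define it differently, and the numbers below are placeholders, not measured data.

def cost_performance_ratio(performance_score: float, total_cost: float) -> float:
    """Higher is better: performance delivered per unit of cost."""
    return performance_score / total_cost

deployments = {
    "in-cloud ERP (proposed)": {"score": 1000.0, "cost": 400.0},  # placeholder values
    "in-house ECC 6.0":        {"score": 900.0,  "cost": 650.0},
    "in-cloud ByDesign":       {"score": 800.0,  "cost": 500.0},
}

for name, d in deployments.items():
    print(f"{name}: C-P ratio = {cost_performance_ratio(d['score'], d['cost']):.2f}")
```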

    Replication and Caching Systems for the support of VMs stored in File Systems with Snapshots

    Recently, in a relatively short timeframe, there have been fundamental changes in the way computing power is used. Virtualisation technology has changed both the model of a data centre's infrastructure and the way physical computers are managed. This shift is a consequence of today's fast deployment rate of Virtual Machines (VMs) in highly consolidated environments with minimal need for human management. New approaches to virtualisation are being developed at a surprisingly fast rate, leading to an exciting and vibrant new ecosystem of platforms and services. The big industry players tackle problems such as Desktop Virtualisation with moderate success, but completely ignore the computation power already present in their clients' infrastructures, opting instead for costly solutions based on powerful new machines. There is still room for improvement in Virtual Desktop Infrastructure (VDI) and for new architectures that take advantage of the computation power available at the user's desk with minimum effort on the management side; Infrastructure for Client-Based Desktops (iCBD) is one of these projects. This thesis focuses on the development of mechanisms for the replication and caching of VM images stored in a local filesystem, albeit one with the ability to perform snapshots. There are some challenges to address: the proposed architecture must be entirely distributed and completely integrated with the already existing client-based VDI platform, and it must be able to cope efficiently with very large, read-only files (some of them snapshots) and handle their multiple versions. This work also explores the challenges and advantages of deploying such a system on a high-throughput network, with both high availability and scalability, while efficiently supporting a large number of users (and their workstations).
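
    As a purely illustrative sketch (not the iCBD design), the following shows a minimal read-through local cache for large, read-only VM image snapshots, keyed by image name and version; the cache path and the fetch callback are hypothetical.

```python
# Illustrative sketch (not the iCBD implementation): a simple read-through local
# cache for large, read-only VM image snapshots, keyed by image name and version.
# The cache path and the fetch_from_replica callback are hypothetical placeholders.
import hashlib
import shutil
from pathlib import Path

CACHE_DIR = Path("/var/cache/vm-images")  # hypothetical cache location

def cache_key(image_name: str, version: str) -> str:
    """Derive a stable cache file name from the image identity."""
    return hashlib.sha256(f"{image_name}@{version}".encode()).hexdigest()

def get_image(image_name: str, version: str, fetch_from_replica) -> Path:
    """Return a local path to the requested snapshot, fetching it on a cache miss."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    local = CACHE_DIR / cache_key(image_name, version)
    if not local.exists():                      # cache miss: pull from a replica
        tmp = local.with_suffix(".partial")
        with fetch_from_replica(image_name, version) as src, open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
        tmp.rename(local)                       # atomic publish into the cache
    return local
```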

    A shared-disk parallel cluster file system

    Dissertation presented to obtain the degree of Doctor in Informatics at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia. Today, clusters are the de facto cost-effective platform both for high performance computing (HPC) and for IT environments. HPC and IT are quite different environments, and their differences include, among others, their choices of file systems and storage: HPC favours parallel file systems geared towards maximum I/O bandwidth, but which are not fully POSIX-compliant and were devised to run on top of (fault-prone) partitioned storage; conversely, IT data centres favour both external disk arrays (to provide highly available storage) and POSIX-compliant file systems (either general purpose or shared-disk cluster file systems, CFSs). These specialised file systems perform very well in their target environments provided that applications do not require certain lateral features, e.g., file locking on parallel file systems, or high-performance writes over cluster-wide shared files on CFSs. In brief, none of the above approaches solves the problem of providing high levels of reliability and performance to both worlds. Our pCFS proposal is a contribution to change this situation: the rationale is to take advantage of the best of both, the reliability of cluster file systems and the high performance of parallel file systems. We don't claim to provide the absolute best of each, but we aim at full POSIX compliance, a rich feature set, and levels of reliability and performance good enough for broad usage, e.g., traditional as well as HPC applications, support for clustered DBMS engines that may run over regular files, and video streaming. pCFS' main ideas include:
    · Cooperative caching, a technique that has been used in file systems for distributed disks but, as far as we know, was never used either in SAN-based cluster file systems or in parallel file systems. As a result, pCFS may use all infrastructures (LAN and SAN) to move data.
    · Fine-grain locking, whereby processes running across distinct nodes may define non-overlapping byte-range regions in a file (instead of the whole file) and access them in parallel, reading and writing over those regions at the infrastructure's full speed (provided that no major metadata changes are required).
    A prototype was built on top of GFS (a Red Hat shared-disk CFS): GFS' kernel code was slightly modified, and two kernel modules and a user-level daemon were added. In the prototype, fine-grain locking is fully implemented and a cluster-wide coherent cache is maintained through data (page fragment) movement over the LAN. Our benchmarks for non-overlapping writers over a single file shared among processes running on different nodes show that pCFS' bandwidth is 2 times greater than NFS' while being comparable to that of the Parallel Virtual File System (PVFS), both requiring about 10 times more CPU. pCFS' bandwidth also surpasses GFS' (600 times for small record sizes, e.g., 4 KB, decreasing to 2 times for large record sizes, e.g., 4 MB), at about the same CPU usage. Lusitania, Companhia de Seguros S.A.; Programa IBM Shared University Research (SUR).
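
    pCFS implements fine-grain locking inside the GFS kernel; as a user-space analogy only, POSIX advisory byte-range locks let processes lock disjoint regions of a shared file and write to them in parallel. A minimal sketch using Python's fcntl module (plain POSIX locking, not pCFS code):

```python
# User-space analogy of fine-grain (byte-range) locking: each writer locks only its
# own non-overlapping region of the shared file instead of the whole file.
# This is plain POSIX advisory locking, not the pCFS kernel implementation.
import fcntl
import os

def write_region(path: str, offset: int, data: bytes) -> None:
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Lock only [offset, offset + len(data)): writers touching disjoint
        # ranges are not blocked by this lock.
        fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        os.pwrite(fd, data, offset)
        fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
    finally:
        os.close(fd)

# Two cooperating processes (e.g., on different nodes of a shared file system)
# could each write their own region concurrently:
#   write_region("shared.dat", 0,    b"A" * 4096)   # writer 1
#   write_region("shared.dat", 4096, b"B" * 4096)   # writer 2
```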

    File system metadata virtualization

    The advance of computing systems has brought new ways to use and access stored data that push the architecture of traditional file systems to its limits, making them inadequate to handle the new needs. Current challenges affect both the performance of high-end computing systems and their usability from the applications' perspective. On one side, high-performance computing equipment is rapidly developing into large-scale aggregations of computing elements in the form of clusters, grids or clouds. On the other side, there is a widening range of scientific and commercial applications that seek to exploit these new computing facilities. The requirements of such applications are also heterogeneous, leading to dissimilar patterns of use of the underlying file systems. Data centres have tried to compensate for this situation by providing several file systems to fulfil distinct requirements. Typically, the different file systems are mounted on different branches of a directory tree, and the preferred use of each branch is publicised to users. A similar approach is used in personal computing devices. Typically, in a personal computer, there is a visible and clear distinction between the portion of the file system name space dedicated to local storage, the part corresponding to remote file systems and, recently, the areas linked to cloud services, for example, directories that keep data synchronized across devices, are shared with other users, or are remotely backed up. In practice, this approach compromises the usability of the file systems and the possibility of exploiting all their potential benefits. We consider that this burden can be alleviated by determining applicable features on a per-file basis, rather than associating them with a location in a static, rigid name space. Moreover, usability would be further increased by providing multiple dynamic name spaces that could be adapted to specific application needs. This thesis contributes to this goal by proposing a mechanism to decouple the user view of the storage from its underlying structure. The mechanism consists in the virtualization of file system metadata (including both the name space and the object attributes) and the interposition of a sensible layer that decides where and how files should be stored in order to benefit from the underlying file system features, without incurring usability or performance penalties due to inadequate usage. This technique makes it possible to present multiple, simultaneous virtual views of the name space and the file system object attributes that can be adapted to specific application needs without altering the underlying storage configuration. The first contribution of the thesis introduces the design of a metadata virtualization framework that makes the above-mentioned decoupling possible; the second contribution consists of a method to improve file system performance in large-scale systems by using such a metadata virtualization framework; finally, the third contribution consists of a technique to improve the usability of cloud-based storage systems in personal computing devices.
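
    A toy sketch of the idea behind metadata virtualization, assuming a simple in-memory table that maps virtual paths to (backend, physical path) pairs; the class and backend names are invented for illustration and are not the thesis framework.

```python
# Toy illustration of metadata virtualization: a virtual name space decoupled from
# where files physically live. The mapping policy and backend names are invented
# for illustration; the thesis framework is far more general.
from typing import Dict, Tuple

class VirtualNamespace:
    def __init__(self) -> None:
        # virtual path -> (backend name, physical path)
        self._table: Dict[str, Tuple[str, str]] = {}

    def bind(self, virtual_path: str, backend: str, physical_path: str) -> None:
        """Record where a user-visible file actually resides."""
        self._table[virtual_path] = (backend, physical_path)

    def resolve(self, virtual_path: str) -> Tuple[str, str]:
        """Translate a user-visible path into its physical location."""
        return self._table[virtual_path]

ns = VirtualNamespace()
# The same user-visible tree can span, say, a parallel FS and a cloud-synced area.
ns.bind("/home/alice/results.dat", "parallel_fs", "/pfs/proj42/results.dat")
ns.bind("/home/alice/notes.txt",   "cloud_sync",  "/sync/alice/notes.txt")
print(ns.resolve("/home/alice/results.dat"))
```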

    Bayesian Prognostic Framework for High-Availability Clusters

    Critical services from domains as diverse as finance, manufacturing and healthcare are often delivered by complex enterprise applications (EAs). High-availability clusters (HACs) are software-managed IT infrastructures that enable these EAs to operate with minimum downtime. To that end, HACs monitor the health of EA layers (e.g., application servers and databases) and resources (i.e., components), and attempt to reinitialise or restart failed resources swiftly. When this is unsuccessful, HACs try to fail over (i.e., relocate) the resource group to which the failed resource belongs to another server. If the resource group failover is also unsuccessful, or when a system-wide critical failure occurs, HACs initiate a complete system failover. Despite the availability of multiple commercial and open-source HAC solutions, these HACs (i) disregard important sources of historical and runtime information, and (ii) have limited reasoning capabilities. Therefore, they may conservatively perform unnecessary resource group or system failovers, or delay justified failovers for longer than necessary. This thesis introduces the first HAC taxonomy, uses it to carry out an extensive survey of current HAC solutions, and develops a novel Bayesian prognostic (BP) framework that addresses the significant HAC limitations mentioned above and identified by the survey. The BP framework comprises four modules. The first module is a technique for modelling high availability using a combination of established and new HAC characteristics. The second is a suite of methods for obtaining and maintaining the information required by the other modules. The third is a HAC-independent Bayesian decision network (BDN) that predicts whether resource failures can be managed locally (i.e., without failovers). The fourth is a method for constructing a HAC-specific Bayesian network for the fast prediction of resource group and system failures. Used together, these modules significantly reduce the downtime of HAC-protected EAs. The experiments presented in this thesis show that the BP framework can deliver downtimes between 5.5 and 7.9 times smaller than those obtained with an established open-source HAC.
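
    A minimal numerical sketch of the kind of inference the BDN performs when deciding whether a failure can be handled locally, using plain Bayes' rule with made-up probabilities; the thesis builds a full Bayesian decision network rather than this single formula.

```python
# Minimal Bayes-rule sketch of the decision the BP framework's BDN makes: given a
# resource failure and some observed evidence, how likely is it that a local restart
# will succeed (so no failover is needed)? All probabilities below are invented for
# illustration; the thesis uses a full Bayesian decision network, not this formula.

def posterior_local_recovery(p_prior: float,
                             p_evidence_given_recovery: float,
                             p_evidence_given_no_recovery: float) -> float:
    """P(local recovery | evidence) via Bayes' rule."""
    num = p_evidence_given_recovery * p_prior
    den = num + p_evidence_given_no_recovery * (1.0 - p_prior)
    return num / den

# Example: local restarts usually work (prior 0.8), but the observed error pattern
# is seen more often when local recovery ultimately fails.
p = posterior_local_recovery(p_prior=0.8,
                             p_evidence_given_recovery=0.2,
                             p_evidence_given_no_recovery=0.7)
print(f"P(local recovery succeeds | evidence) = {p:.2f}")  # ~0.53
```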

    Scalability in extensible and heterogeneous storage systems

    The evolution of computer systems has brought an exponential growth in data volumes, which strains the ability of current storage architectures to organize and access this information effectively: as the unending creation of and demand for computer-generated data grows at an estimated rate of 40-60% per year, storage infrastructures need increasingly scalable data distribution layouts that are able to adapt to this growth with adequate performance. In order to provide the required performance and reliability, large-scale storage systems have traditionally relied on multiple RAID-5 or RAID-6 storage arrays, interconnected with high-speed networks like FibreChannel or SAS. Unfortunately, the performance of the currently most commonly used storage technology, the magnetic disk drive, cannot keep up with this explosive growth. Moreover, storage architectures based on solid-state devices (the successors of current magnetic drives) do not seem poised to replace HDD-based storage for the next 5-10 years, at least in data centers. Though the performance of SSDs significantly exceeds that of hard drives, it would cost the NAND industry hundreds of billions of dollars to build enough manufacturing plants to satisfy the forecasted demand. Besides the problems derived from technological and mechanical limitations, the massive data growth poses more challenges: to build a storage infrastructure, the most flexible approach consists in using pools of storage devices that can be expanded as needed by adding new devices or replacing older ones, thus seamlessly increasing the system's performance and capacity. This approach, however, needs data layouts that can adapt to these topology changes and also exploit the potential performance offered by the hardware. Such strategies should be able to rebuild the data layout to accommodate the new devices in the infrastructure, extracting the utmost performance from the hardware and offering a balanced workload distribution. An inadequate data layout might not effectively use the enlarged capacity or better performance provided by newer devices, thus leading to unbalancing problems like bottlenecks or resource underusage. Besides, massive storage systems will inevitably be composed of a collection of heterogeneous hardware: as capacity and performance requirements grow, new storage devices must be added to cope with demand, but it is unlikely that these devices will have the same capacity or performance as those already installed. Moreover, upon failure, disks are most commonly replaced by faster and larger ones, since it is not always easy (or cheap) to find a particular model of drive. In the long run, any large-scale storage system will have to cope with a myriad of devices. The title of this dissertation, "Scalability in Extensible and Heterogeneous Storage Systems", refers to the main focus of our contributions: scalable data distributions that can adapt to increasing volumes of data. Our first contribution is the design of a scalable data layout that can adapt to hardware changes while redistributing only the minimum amount of data needed to keep a balanced workload. With the second contribution, we perform a comparative study of the influence of pseudo-random number generators on the performance and distribution quality of randomized layouts, and prove that a badly chosen generator can degrade the quality of the strategy.
Our third contribution is an analysis of long-term data access patterns in several real-world traces to determine whether it is possible to offer high performance and a balanced load with less than minimal data rebalancing. In our final contribution, we apply the knowledge learnt about long-term access patterns to design an extensible RAID architecture that can adapt to changes in the number of disks without migrating large amounts of data, and prove that it can be competitive with current RAID arrays with an overhead of at most 1.28% of the storage capacity.
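
    A compact, generic illustration of a layout that adapts to device additions with minimal data movement, using consistent hashing with virtual nodes; this is a standard technique shown for intuition, not the distribution strategy proposed in the dissertation.

```python
# Generic illustration (not the dissertation's layout): consistent hashing places
# data blocks on devices so that adding a device relocates only a small fraction of
# the blocks, instead of reshuffling everything.
import bisect
import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashLayout:
    def __init__(self, devices, vnodes: int = 100):
        # Each device gets several points ("virtual nodes") on the hash ring.
        self.ring = sorted((h(f"{d}#{i}"), d) for d in devices for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def locate(self, block_id: str) -> str:
        # A block is stored on the first device clockwise from its hash.
        idx = bisect.bisect(self.keys, h(block_id)) % len(self.ring)
        return self.ring[idx][1]

blocks = [f"block-{i}" for i in range(10_000)]
before = ConsistentHashLayout(["disk0", "disk1", "disk2", "disk3"])
after = ConsistentHashLayout(["disk0", "disk1", "disk2", "disk3", "disk4"])
moved = sum(before.locate(b) != after.locate(b) for b in blocks)
print(f"blocks moved after adding one disk: {moved / len(blocks):.1%}")  # roughly 1/5
```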

    Intelligent Management of Virtualised Computer Based Workloads and Systems

    Managing the complexity within virtualised IT infrastructure platforms is a common problem for many organisations today. Computer systems are now highly consolidated into a relatively small physical footprint compared with the decades before the late 2000s, so much thought, planning and control is necessary to operate such systems effectively within the enterprise computing space. With the development of private, hybrid and public cloud utility computing this has become even more relevant; this work examines how such cloud systems use virtualisation technology and embedded software to leverage these advantages, and it takes the fresh approach of developing an intelligent decision engine (expert system). Its aim is to help reduce the complexity of managing virtualised computer-based platforms through tight integration and high levels of automation that minimise human input and errors and enforce standards and consistency, in order to achieve better management and control. The thesis investigates whether an expert system known as the Intelligent Decision Engine (IDE) could aid the management of virtualised computer-based platforms. Through a series of mixed quantitative and qualitative experiments in the areas of research, the initial findings and evaluation are presented in detail, using repeatable and observable processes, together with detailed analysis of the recorded outputs. The results of the investigation establish the advantages of using the IDE (expert system) to achieve the goal of reducing the complexity of managing virtualised computer-based platforms. In each area examined, it is demonstrated how a global management approach, in combination with VM provisioning, migration, failover, and system resource controls, can create a powerful autonomous system.
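
    A toy, expert-system-style sketch of the kind of rule-based decision such an engine can automate; the rules and thresholds are invented for illustration and are not taken from the IDE.

```python
# Toy expert-system-style sketch: a few hand-written rules that map observed host
# metrics to a management action (provision, migrate, fail over, or do nothing).
# Rules and thresholds are invented for illustration; the IDE described in the
# thesis is a far richer decision engine.
from dataclasses import dataclass

@dataclass
class HostState:
    cpu_util: float      # 0.0 - 1.0
    mem_util: float      # 0.0 - 1.0
    heartbeat_ok: bool

def decide(host: HostState) -> str:
    if not host.heartbeat_ok:
        return "failover: restart the host's resource group on another node"
    if host.cpu_util > 0.90 or host.mem_util > 0.90:
        return "migrate: move one or more VMs to a less loaded host"
    if host.cpu_util > 0.75:
        return "provision: add capacity to the pool"
    return "no action"

print(decide(HostState(cpu_util=0.95, mem_util=0.60, heartbeat_ok=True)))
```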