9 research outputs found

Designing SSI clusters with hierarchical checkpointing and single I/O space

Author: Chow E
Hwang K
Jin H
Wang CL
Xu Z
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1999
Field of study

Adopting a new hierarchical checkpointing architecture, the authors develop a single I/O address space for building highly available clusters of computers. They propose a systematic approach to achieving a single system image by integrating existing middleware support with the newly developed features.

HKU Scholars Hub

SplitFS: Reducing Software Overhead in File Systems for Persistent Memory

Author: Belay Adam
Chidambaram Vijay
DeBergalis Matt
Peter Simon
Tarasov Vasily
Volos Haris
Wallace Grant
Xu Jian
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/09/2019
Field of study

We present SplitFS, a file system for persistent memory (PM) that reduces software overhead significantly compared to state-of-the-art PM file systems. SplitFS presents a novel split of responsibilities between a user-space library file system and an existing kernel PM file system. The user-space library file system handles data operations by intercepting POSIX calls, memory-mapping the underlying file, and serving reads and overwrites using processor loads and stores. Metadata operations are handled by the kernel PM file system (ext4 DAX). SplitFS introduces a new primitive termed relink to efficiently support file appends and atomic data operations. SplitFS provides three consistency modes, which different applications can choose from without interfering with each other. SplitFS reduces software overhead by up to 4x compared to the NOVA PM file system, and by up to 17x compared to ext4 DAX. On a number of micro-benchmarks and applications, such as the LevelDB key-value store running the YCSB benchmark, SplitFS increases application performance by up to 2x compared to ext4 DAX and NOVA while providing similar consistency guarantees.
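The data path described in this abstract (memory-map the file, then serve reads and in-place overwrites through the mapping) can be illustrated with a minimal sketch. This is not SplitFS code: it uses Python's standard `mmap` module on an ordinary file rather than a DAX mapping on persistent memory, it does not intercept POSIX calls, and the function names `read_via_mmap`, `overwrite_via_mmap` and `demo` are hypothetical.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Serve a read by copying bytes out of a memory mapping of the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bytes(m[offset:offset + length])


def overwrite_via_mmap(path, offset, data):
    """Overwrite existing bytes in place through the mapping (file size unchanged)."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            if offset + len(data) > len(m):
                # Appends change the file size; they are outside this sketch.
                raise ValueError("write would extend the file")
            m[offset:offset + len(data)] = data
            m.flush()  # write the modified pages back to the file


def demo():
    """Create a small file, overwrite its first bytes, and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello persistent memory")
        overwrite_via_mmap(path, 0, b"HELLO")
        return read_via_mmap(path, 0, 23)
    finally:
        os.remove(path)
```

The sketch rejects writes that would grow the file, which mirrors the abstract's distinction between overwrites (served through the mapping) and appends (handled by the separate relink primitive, not shown here).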

arXiv.org e-Print Archive

Crossref

File server scaling with network-attached secure disks

Author: Berend Ozceri
Chen Lee
David F. Nagle
David Rochberg
Erik Riedel
Eugene M. Feinberg
Fay W. Chang
Garth A. Gibson
Howard Gobioff
Jim Zelenka
Khalil Amiri
Publication venue: ACM Press
Publication date: 01/01/1997
Field of study

By providing direct data transfer between storage and client, network-attached storage devices have the potential to improve scalability for existing distributed file systems (by removing the server as a bottleneck) and bandwidth for new parallel and distributed file systems (through network striping and more efficient data paths). Together, these advantages influence a large enough fraction of the storage market to make commodity network-attached storage feasible. Realizing the technology’s full potential requires careful consideration across a wide range of file system, networking and security issues. This paper contrasts two network-attached storage architectures: (1) Networked SCSI disks (NetSCSI) are network-attached storage devices with minimal changes from the familiar SCSI interface, while (2) Network-Attached Secure Disks (NASD) are drives that support independent client access to drive object services. To estimate the potential performance benefits of these architectures, we develop an analytic model and perform trace-driven replay experiments based on AFS and NFS traces. Our results suggest that NetSCSI can reduce file server load during a burst of NFS or AFS activity by about 30%. With the NASD architecture, server load (during burst activity) can be reduced by a factor of up to five for AFS and up to ten for NFS.

CiteSeerX

Crossref

File server scaling with network-attached secure disks

Author: Benner A.E
Berend Ozceri
Birrell A.D.
Chen Lee
Cooper E.
Dahlin M.
David F. Nagle
David Rochberg
Erik Riedel
Eugene M. Feinberg
Fay W. Chang
Garth A. Gibson
Golding R.
Hitz D.
Howard Gobioff
IEEE
Jim Zelenka
Khalil Amiri
Long D.D.E.
Minshall G.
Ousterhout J.K.
Riedel E.
Sandberg R.
Storage Technology Corporation
Van Meter R.
Weingart S.H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Flexible allocation and space management in storage systems

Author: Kang Suk Woo
Publication venue: Texas A&M University
Publication date: 17/09/2007
Field of study

In this dissertation, we examine some of the challenges faced by emerging networked storage systems. We focus on two main issues. Current file systems allocate storage statically at the time of their creation. This results in many suboptimal scenarios, for example: (a) space on the disk is not allocated well across multiple file systems, and (b) data is not organized well for typical access patterns. We propose Virtual Allocation for flexible storage allocation. Virtual allocation separates storage allocation from the file system. It employs an allocate-on-write strategy, which lets applications fit into the actual usage of storage space without regard to the configured file system size. This improves flexibility by allowing storage space to be shared across different file systems. We present the design of virtual allocation and an evaluation of it through benchmarks based on a prototype system on Linux. Next, based on virtual allocation, we consider the problem of balancing locality and load in networked storage systems with multiple storage devices (or bricks). Data distribution affects locality and load balance across the devices in a networked storage system. We propose a user-optimal data migration scheme that tries to balance locality and load in such networked storage systems. The presented approach automatically and transparently manages the migration of data blocks among disks as data access patterns and loads change over time. We built a prototype system on Linux and present the design of user-optimal migration and an evaluation of it through realistic experiments.
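The allocate-on-write idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's prototype: the classes `PhysicalPool` and `VirtualVolume`, the block-granular interface, and the first-free allocation policy are all hypothetical. The point it shows is that a volume's configured size costs nothing until a block is first written, so several volumes can share one physical pool.

```python
class PhysicalPool:
    """Hypothetical shared pool of physical blocks."""

    def __init__(self, total_blocks):
        self.free = list(range(total_blocks))  # unallocated physical blocks
        self.blocks = {}                       # physical block -> contents

    def allocate(self):
        if not self.free:
            raise RuntimeError("physical pool exhausted")
        return self.free.pop(0)


class VirtualVolume:
    """Hypothetical volume that binds a physical block only on first write."""

    def __init__(self, configured_blocks, pool):
        self.configured_blocks = configured_blocks  # advertised size only
        self.pool = pool
        self.mapping = {}                           # virtual -> physical block

    def write(self, vblock, payload):
        if not 0 <= vblock < self.configured_blocks:
            raise IndexError("block outside configured size")
        if vblock not in self.mapping:
            # Allocate-on-write: space is taken from the pool only now.
            self.mapping[vblock] = self.pool.allocate()
        self.pool.blocks[self.mapping[vblock]] = payload

    def read(self, vblock):
        pblock = self.mapping.get(vblock)
        # Never-written blocks have no physical backing and read as empty.
        return self.pool.blocks[pblock] if pblock is not None else b""

    def used_blocks(self):
        return len(self.mapping)
```

For example, two volumes that each advertise 100 blocks can sit on an 8-block pool: writing block 42 of the first volume twice and block 7 of the second consumes only two physical blocks in total.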

Texas A&M Repository

Optimizing the recovery of data consistency gossip algorithms on distributed object-store systems (CEPH)

Author: Mouratidis Theofilos
Μουρατίδης Θεόφιλος
Publication venue
Publication date: 01/01/2021
Field of study

Data on the internet is growing rapidly, and systems for storing and preserving this volume of information are increasingly popular. Ceph is a distributed object storage system for handling large amounts of data; it was initially developed by Sage Weil (Red Hat) and has gained popularity over the years. Ceph is used for big-data storage at large organizations such as Cisco, CERN and Deutsche Telekom. As in any other distributed system, the nodes of its cluster fail over time, and recovery mechanisms must then run to restore lost data. In this thesis, we introduce a new way to synchronise data between replicas and make it consistent, by identifying and filtering out unchanged objects. The current recovery algorithm in Ceph is a durable yet simple implementation with respect to disk access and memory consumption. As technology evolves and faster storage solutions emerge (e.g. PCIe SSDs, NVMe), practices such as Write-Ahead Logging (WAL) for data consistency can also introduce new problems. Logging thousands of write operations per second in a degraded cluster can rapidly increase memory consumption and cause a storage node to fail (degraded is a cluster state in which a storage node is down for any reason). Although Ceph now supports an upper limit on the number of entries in its WAL, this limit is often reached, which invalidates the log entirely because any new entries would be lost. The system is then left to check every object on the replicas in order to synchronise them, which is a very slow process. Hence, we introduce Merkle trees as an alternative to Bloom filters, so that the recovery procedure can identify regions where objects were not modified and thus reduce recovery time. The recovery process has an observable impact on the performance of users’ object operations (reads and writes), and their overall experience can be improved by reducing the cluster’s recovery times. The benchmarks show a performance increase of 10% to 400%, which varies with the number of objects affected during the downtime of one or more nodes.
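The way a Merkle tree lets recovery skip regions of unmodified objects can be sketched as follows. This is neither Ceph's recovery code nor the thesis implementation: the choice of SHA-256, one leaf per object, equal object counts on both replicas, and the function names `build_tree` and `changed_leaves` are all assumptions made for illustration.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def build_tree(objects):
    """Return the tree as a list of levels, leaf hashes first, root last."""
    level = [_h(o) for o in objects]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by repeating the last hash
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def changed_leaves(a, b):
    """Indices of differing leaves, descending only into differing subtrees."""
    if len(a[0]) != len(b[0]):
        raise ValueError("sketch assumes both replicas hold the same object count")
    if not a[0]:
        return []
    suspects = [0]  # start at the root
    for depth in range(len(a) - 1, 0, -1):
        below = []
        for i in suspects:
            if a[depth][i] != b[depth][i]:
                # Only a differing node can hide a differing leaf.
                for child in (2 * i, 2 * i + 1):
                    if child < len(a[depth - 1]):
                        below.append(child)
        suspects = below
    return [i for i in suspects if a[0][i] != b[0][i]]
```

When two replicas agree on a subtree hash, every object under that subtree is skipped without being read, which is the property the abstract relies on to avoid checking every object after the log has been invalidated.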

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

HyperSCSI: Design and development of a new protocol for storage networking

Author: WANG YONG HONG
Publication venue
Publication date: 02/11/2005
Field of study

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

Arquitectura multiagente para E/S de alto rendimiento en clusters [Multi-agent architecture for high-performance I/O in clusters]

Author: Pérez Hernández María de los Santos
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2003
Field of study

I/O is currently one of the main bottlenecks of general-purpose distributed systems, owing to the imbalance between computation time and I/O time. One of the solutions proposed for this problem is parallel I/O. This area has produced a large number of parallel I/O libraries and parallel file systems. Systems of this kind suffer from several defects and shortcomings. Many of them were conceived for parallel machines and do not integrate well into distributed environments and clusters. Given the intensive use of clusters of workstations in recent years, such systems are not suited to the current computing landscape. Other systems, which do adapt to such environments, lack dynamic reconfiguration capabilities and therefore offer limited functionality. Finally, most I/O systems that apply I/O optimizations do not give applications the flexibility to make use of them, and instead try to hide these techniques from the user. To optimize I/O operations, however, it is important that applications be able to describe their access patterns by interacting with the I/O system. Separately, within the area of distributed systems there is the agent paradigm, which gives applications a set of properties well suited to adapting to complex, dynamic environments. The characteristics of this paradigm make it a priori promising for tackling some of the existing problems in parallel I/O.
This thesis proposes a solution to the current I/O problems along three main lines: (i) the use of agent theory in high-performance I/O systems, (ii) the definition of a formalism that allows dynamic reconfiguration of storage nodes in a cluster, and (iii) the use of configurable, application-oriented I/O optimization techniques.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM