Search CORE

4 research outputs found

Practical erasure codes for storage systems: the study of entanglement codes, an approach that propagates redundancy to increase reliability and performance

Author: Estrada Galiñanes Verónica del Carmen
Felber Pascal
Publication venue
Publication date: 10/10/2018
Field of study

Cette dissertation traite de la conception de codes d'effacement pratiques pour les systèmes de stockage. Les défaillances de disque physiques et logiques sont une source commune d'erreurs; cependant, il est prédit que les disques dur à plateaux vont rester le support de stockage standard dans les grands centres de données. Le stockage dans le cloud a besoin de codes efficaces pour devenir fiables malgré ses composants à faible coût. À mesure que les systèmes augmentent en taille et en complexité, leurs propriétés et leurs exigences peuvent changer. Lorsque les données vieillissent, elles sont habituellement déplacées vers des archives dédiées. Pourtant, les limites entre les systèmes de stockage et les archives deviennent diffuses à mesure que nous évoluons vers des applications nécessitant un faible temps de latence, telles que l’exploration de données des grandes archives scientifiques. De plus, la centralisation des services de sauvegarde dans le cloud amène des préoccupations en matière de protection de la vie privée et de coûts de maintenance. Certaines études suggèrent que les réseaux coopératifs peer-to-peer sont plus durables à long terme. Mais les noeuds peer-to-peer et les disques dur partagent une propriété indésirable: aucun des deux ne sont assez fiables pour un déploiement à grande échelle. La motivation de cette étude est de concevoir un code flexible et pratique qui offre une haute tolérance aux pannes pour améliorer la durabilité et la disponibilité des données même dans des scénarios de type "catastrophe". La survie des données est basée sur la robustesse et la redondance. Il est difficile de concevoir une solution basée sur des codes classiques qui tiennent compte de ces cinq aspects: disponibilité, fiabilité, sécurité, intégrité et maintenabilité. Les compromis sont généralement obtenus par la combinaison de différentes techniques. Cette thèse cherche à montrer que les techniques de codage basées exclusivement sur l'utilisation de réseaux parallèles (comme la réplication) ou concentrés sur l'utilisation de réseaux en série (comme dans les opérations de division et d'expansion utilisées dans les codes classiques d'effacement) ne permettent pas d'exploiter toutes les ressources disponibles dans le système. Les codes d'enchevêtrement (entanglement codes) créent une redondance en enroulant de nouveaux blocs de données avec les anciens, en construisant des chaînes de données enchevêtrées qui sont tissées dans un maillage croissant de contenu interdépendant. Nous proposons: 1) chaîne enchevêtrements, qui peuvent être ouverts ou fermés, comme des alternatives plus fiables que la mise en miroir de disques, 2) des enchevêtrements alpha pour obtenir une tolérance de panne extrêmement élevée avec un faible coût de stockage et des coûts de réparation faibles, et 3) des codes de robinet (spigot codes) pour réduire l'empreinte spatiale à partir de données enchevêtrées sans perte significative des propriétés de l'enchevêtrement. Ces codes peuvent exploiter efficacement le stockage et les ressources de bande passante en exploitant la puissance combinatoire de la fiabilité du réseau. En outre, leur conception flexible basée sur des chaînes virtuelles de données enchevêtrées offre une solution évolutive et adaptée aux besoins futurs. Enfin, en raison de la puissance combinatoire des données enchevêtrées, dans l'ensemble, les cinq aspects sont renforcés.This dissertation deals with the design of practical erasure codes for storage systems. Hardware and logical disk failures are a common source of system failures that may lead to data loss. Nevertheless, it is predicted that spinning disks would remain the standard storage medium in large datacenters. Cloud storage needs efficient codes to become reliable despite its low-cost components. As systems scale in size and complexity, their properties and requirements may change. When data ages, it is usually moved to dedicated archives. Yet the boundaries between storage systems and archives are getting diffuse as we move into applications that require low latency access such as mining data from large scientific archives. Moreover, the centralized approach of cloud backup services brings privacy and economics concerns. Some studies suggest that cooperative peer-to-peer networks are more sustainable for the long term. But peer-to-peer nodes and spinning disks share an undesirable property: both are unreliable. The motivation for this study is to design flexible and practical codes that can provide high fault-tolerance to improve data durability and availability even in catastrophic scenarios. Survivability comes through the strength built with redundancy. It is difficult to devise a solution based on classic codes that considers all aspects of dependability: availability, reliability, safety, integrity and maintainability. Compromises are generally found through the complex combination of many techniques. This thesis argues that codes that are based exclusively on the use of parallel networks (such as replication) or mainly on the use of serial networks (as it is seen in the split and expand operations behind classic erasure codes) do not leverage all the resources available in a system. Entanglement codes create redundancy by tangling new data blocks with old ones, building entangled data chains that are woven into a growing mesh of interdependent content. We propose: 1) open and close entanglements as more reliable alternatives than mirroring, 2) alpha entanglements to achieve extremely high fault-tolerance with low storage overhead and low repair costs, and 3) spigot codes to reduce the space footprint from entangled data without significant loss of the entanglement's properties. These codes can leverage storage and bandwidth resources efficiently by exploiting the combinatorial power of network reliability. Furthermore, their flexible design based on virtual chains of entangled data yields a scalable and suitable solution to accommodate future requirements. Finally, due to the combinatorial power of entangled data, all in all, dependability is boosted.Diese Dissertation beschäftigt sich mit der Gestaltung von praktischen Löschcodes für Speichersysteme. Hardware und logische Festplattenfehler sind eine häufige Quelle von Systemfehlern, die zu Datenverlust führen kann. Es wird jedoch vorausgesagt, dass Festplatten das Standard-Speichermedium in grossen Rechenzentren bleiben würden. Cloud-Storage benötigt effiziente Codes, um trotz seiner kostengünstigen Komponenten zuverlässig zu werden. Da Systeme in Grösse und Komplexität wachsen, können sich ihre Eigenschaften und Anforderungen ändern. Veraltete Daten werden normalerweise auf dedizierte Archive transferiert. Die Grenzen zwischen Speichersystemen und Archiven werden immer diffuser mit zunehmendem Einsatz von Anwendungen, die eine kurze Latenzzeit erfordern, wie z.B. Data-Mining aus grossen wissenschaftlichen Archiven. Darüber hinaus bringt der zentralisierte Ansatz von Cloud-Backup-Diensten Bedenken hinsichtlich der Privatsphäre und Wartungskosten auf lange Sicht. Einige Studien deuten darauf hin, dass kooperative Peer-to-Peer-Netzwerke langfristig nachhaltiger sind. Aber Peer-to-Peer-Knoten und Festplatten haben eine unerwünschte Eigenschaft: beide sind unzuverlässig. Die Motivation für diese Studie ist es, flexible und praktische Codes zu entwickeln, die eine hohe Fehlertoleranz zur Verfügung stellen können um die Datenlanglebigkeit und -verfügbarkeit auch in Katastrophenszenarien zu verbessern. Überlebensfähigkeit eines Speichersystems kommt durch die Stärke, die mit Redundanz gebaut wird. Es ist schwierig, eine Lösung zu entwickeln, die auf klassischen Codes basiert und die die folgenden fünf Aspekte berücksichtigt: Verfügbarkeit, Zuverlässigkeit, Sicherheit, Integrität und Wartbarkeit. Kompromisse werden in der Regel durch die komplexe Kombination von vielen Techniken gefunden. Diese These argumentiert, dass Codierungstechniken, die ausschliesslich auf der Verwendung von parallelen Netzwerken (wie z. B. Replikation) oder hauptsächlich auf der Verwendung von seriellen Netzwerken basieren (wie z. B. Split- und Erweiterungsoperationen in klassischen Löschcodes) nicht alle verfügbaren Ressourcen in einem System nutzen. Verschränkung-Codes (Entanglement-Codes) schaffen Redundanzen durch die Verknüpfung neuer Datenblöcke mit alten, wodurch eine verschränkte Datenkette in ein wachsendes Netz von voneinander abhängigen Inhalten gewebt wird. Wir schlagen vor: 1) offen und geschlossene Verschränkungsketten sind zuverlässigere Alternativen als Spiegelung, 2) Alpha-Verschränkung-Codes, um eine extrem hohe Fehlertoleranz mit niedrigem Speicheraufwand und niedrigen Reparaturkosten zu erzielen, und 3) Zapfencodes (Spigot-Codes), um den Platzbedarf von verschränkten Daten zu reduzieren ohne signifikanten Verlust der Eigenschaften der Verschränkung. Diese Codes können Speicher- und Bandbreitenressourcen effizient nutzen, indem sie die kombinatorische Stärke der Netzwerkzuverlässigkeit nutzen. Darüber hinaus liefert ihr flexibles Design basierend auf virtuellen Ketten von verschränkten Daten eine skalierbare und geeignete Lösung für zukünftige Anforderungen. Schliesslich wird aufgrund der kombinatorische Stärke der verschränkten Daten, alles in allem, die Vertrauenswürdigkeit gesteigert.Esta tesis trata del diseño de códigos de borrado que sean prácticos para sistemas de almacenamiento de datos. Los errores de disco, sean estos ocasionados por una falla de hardware o una falla lógica, son una fuente común de errores de sistema que pueden ocasionar pérdida de datos. Sin embargo, se predice que los discos duros giratorios seguirían siendo el medio de almacenamiento estándar en grandes centros de datos. El almacenamiento en la nube necesita códigos eficientes para que el sistema sea confiable a pesar de sus componentes de bajo costo. A medida que los sistemas varían en tamaño y complejidad, sus propiedades y requerimientos pueden cambiar. Cuando los datos envejecen, se los suele mover a archivos dedicados. Sin embargo, las fronteras entre los sistemas de almacenamiento y los archivos se vuelven difusas a medida que avanzamos hacia aplicaciones que requieren un acceso de baja latencia, como la minería de datos en grandes archivos científicos. Además, el enfoque centralizado de los servicios de respaldo en la nube trae consigo preocupaciones relacionadas con la privacidad y los costos de mantenimiento a largo plazo. Algunos estudios sugieren que las redes cooperativas peer-to-peer son más sostenibles a largo plazo. Pero los nodos peer-to-peer y los discos giratorios comparten una propiedad indeseable: ambos son poco confiables. La motivación para este estudio es diseñar códigos flexibles y prácticos que pueden proporcionar una alta tolerancia a fallos para mejorar la durabilidad y disponibilidad de datos incluso en escenarios catastróficos. La supervivencia de los datos se logra con la robustez que ofrece un almacenamiento de datos redundante. Es difícil concebir una solución basada en códigos clásicos que considere estos cinco aspectos: disponibilidad, fiabilidad, seguridad, integridad y mantenibilidad. Los compromisos se encuentran generalmente a través de la compleja combinación de varias técnicas. Esta tesis argumenta que las técnicas de codificación que se basan exclusivamente en el uso de redes paralelas (como la replicación) o principalmente en el uso de redes seriales (como se observa en las operaciones de división y expansión usadas en los códigos de borrado clásicos) no aprovechan todos los recursos disponibles en un sistema. Los códigos de entrelazado de datos (entanglement codes) crean redundancia al mezclar nuevos bloques de datos con los antiguos, construyendo cadenas de datos entrelazados que se tejen en una creciente malla de contenido interdependiente. Proponemos: 1) cadenas de datos entrelazados abiertas y cerradas como alternativas más confiables que el uso de discos espejados, 2) códigos de entrelazados alfa (alpha entanglement codes) para lograr una tolerancia a fallos extremadamente alta con bajos costos de almacenamiento y reparación, y 3) códigos de grifo (spigot codes) para reducir aún mas los gastos de almacenamiento sin pérdida significativa de las propiedades de los códigos de entrelazado. Estos códigos pueden aprovechar los recursos de almacenamiento y ancho de banda eficientemente aprovechando la potencia combinatoria de la confiabilidad de los elementos que conforman la red. Además, su diseño flexible basado en cadenas virtuales de datos entrelazados proporciona una solución escalable y adecuada para adaptarse a los requisitos futuros. Por último, debido a la potencia combinatoria de los datos entrelazados se refuerzan los cinco aspectos arriba mencionados

An erasure-resilient and compute-efficient coding scheme for storage applications

Author: Kalcher Sebastian
Publication venue
Publication date: 01/01/2013
Field of study

Driven by rapid technological advancements, the amount of data that is created, captured, communicated, and stored worldwide has grown exponentially over the past decades. Along with this development it has become critical for many disciplines of science and business to being able to gather and analyze large amounts of data. The sheer volume of the data often exceeds the capabilities of classical storage systems, with the result that current large-scale storage systems are highly distributed and are comprised of a high number of individual storage components. As with any other electronic device, the reliability of storage hardware is governed by certain probability distributions, which in turn are influenced by the physical processes utilized to store the information. The traditional way to deal with the inherent unreliability of combined storage systems is to replicate the data several times. Another popular approach to achieve failure tolerance is to calculate the block-wise parity in one or more dimensions. With better understanding of the different failure modes of storage components, it has become evident that sophisticated high-level error detection and correction techniques are indispensable for the ever-growing distributed systems. The utilization of powerful cyclic error-correcting codes, however, comes with a high computational penalty, since the required operations over finite fields do not map very well onto current commodity processors. This thesis introduces a versatile coding scheme with fully adjustable fault-tolerance that is tailored specifically to modern processor architectures. To reduce stress on the memory subsystem the conventional table-based algorithm for multiplication over finite fields has been replaced with a polynomial version. This arithmetically intense algorithm is better suited to the wide SIMD units of the currently available general purpose processors, but also displays significant benefits when used with modern many-core accelerator devices (for instance the popular general purpose graphics processing units). A CPU implementation using SSE and a GPU version using CUDA are presented. The performance of the multiplication depends on the distribution of the polynomial coefficients in the finite field elements. This property has been used to create suitable matrices that generate a linear systematic erasure-correcting code which shows a significantly increased multiplication performance for the relevant matrix elements. Several approaches to obtain the optimized generator matrices are elaborated and their implications are discussed. A Monte-Carlo-based construction method allows it to influence the specific shape of the generator matrices and thus to adapt them to special storage and archiving workloads. Extensive benchmarks on CPU and GPU demonstrate the superior performance and the future application scenarios of this novel erasure-resilient coding scheme

GSI Repository

Third International Symposium on Space Mission Operations and Ground Data Systems, part 1

Author: Rash James L.
Publication venue
Publication date
Field of study

Under the theme of 'Opportunities in Ground Data Systems for High Efficiency Operations of Space Missions,' the SpaceOps '94 symposium included presentations of more than 150 technical papers spanning five topic areas: Mission Management, Operations, Data Management, System Development, and Systems Engineering. The papers focus on improvements in the efficiency, effectiveness, productivity, and quality of data acquisition, ground systems, and mission operations. New technology, techniques, methods, and human systems are discussed. Accomplishments are also reported in the application of information systems to improve data retrieval, reporting, and archiving; the management of human factors; the use of telescience and teleoperations; and the design and implementation of logistics support for mission operations

The Music Sound

Author: Nicolae Sfetcu
Publication venue
Publication date: 01/01/2006
Field of study

A guide for music: compositions, events, forms, genres, groups, history, industry, instruments, language, live music, musicians, songs, musicology, techniques, terminology , theory, music video. Music is a human activity which involves structured and audible sounds, which is used for artistic or aesthetic, entertainment, or ceremonial purposes. The traditional or classical European aspects of music often listed are those elements given primacy in European-influenced classical music: melody, harmony, rhythm, tone color/timbre, and form. A more comprehensive list is given by stating the aspects of sound: pitch, timbre, loudness, and duration. Common terms used to discuss particular pieces include melody, which is a succession of notes heard as some sort of unit; chord, which is a simultaneity of notes heard as some sort of unit; chord progression, which is a succession of chords (simultaneity succession); harmony, which is the relationship between two or more pitches; counterpoint, which is the simultaneity and organization of different melodies; and rhythm, which is the organization of the durational aspects of music

PhilPapers