
    A construction for balancing non-binary sequences based on Gray code prefixes

    Abstract: We introduce a new construction for the balancing of non-binary sequences that makes use of Gray codes for prefix coding. Our construction provides full encoding and decoding of sequences, including the prefix. It is based on a generalization of Knuth's parallel balancing approach, which can handle very long information sequences. However, the overall sequence, composed of the information sequence together with the prefix, must be balanced; this is reminiscent of Knuth's serial algorithm. The encoding of our construction does not make use of lookup tables, while the decoding process is simple and can be done in parallel.
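    As a purely illustrative aside, the binary version of Knuth's parallel balancing idea (the approach this construction generalizes) can be sketched as follows; this is not the paper's non-binary construction, and the function name is hypothetical:

```python
def knuth_balance(bits):
    """Find an index k such that flipping the first k bits of `bits`
    (an even-length 0/1 list) yields a word with equal numbers of 0s and 1s.
    Returns (k, balanced_word). Illustrative binary sketch only."""
    n = len(bits)
    assert n % 2 == 0, "Knuth's argument needs an even length"
    weight = sum(bits)
    for k in range(n + 1):
        # Weight after flipping the first k bits: each flipped 1 loses one,
        # each flipped 0 gains one.
        flipped_ones = sum(bits[:k])
        new_weight = weight - flipped_ones + (k - flipped_ones)
        if new_weight == n // 2:
            balanced = [1 - b for b in bits[:k]] + bits[k:]
            return k, balanced
    raise ValueError("unreachable: some k always balances an even-length word")

word = [1, 1, 1, 0, 1, 1, 0, 1]
k, balanced = knuth_balance(word)
print(k, balanced)  # k=2 -> [0, 0, 1, 0, 1, 1, 0, 1]
```

    Because the running weight changes by exactly one as k increases, some balancing index always exists; encoding that index in a (balanced) prefix is the part the paper replaces with Gray codes.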

    Routing on the Channel Dependency Graph: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks

    In the pursuit of ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing has started to scale out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on the total cost, application performance, energy consumption, and overall system efficiency of a supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable due to irregular topologies, which are either irregular by design or, more often, the result of hardware failures. Exchanging faulty network components potentially requires whole-system downtime, further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardware and software management, are necessary to mitigate the negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. The fail-in-place strategy, a well-established method in storage systems of repairing only critical component failures, is a feasible solution for current and future HPC interconnects as well as for other large-scale installations such as data center networks. However, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system. This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while the system is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent, property-preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to the routing-deadlock problem. Therefore, this thesis further advances the state of the art by introducing a novel concept of routing on the channel dependency graph, which allows the design of a universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock avoidance during path calculation, instead of solving routing and deadlock freedom separately as all previous solutions do.
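    The deadlock problem referenced here is usually analyzed on the channel dependency graph (CDG): channels are vertices, and a dependency edge from channel a to channel b exists whenever some route uses b immediately after a; a cycle-free CDG guarantees deadlock freedom (Dally and Seitz). A minimal illustrative sketch, assuming routes are simply given as lists of channel identifiers (not the thesis's actual data structures or routing algorithm):

```python
from collections import defaultdict

def build_cdg(routes):
    """Build a channel dependency graph: edge a -> b whenever some route
    uses channel b immediately after channel a.
    `routes` is a list of paths, each path a list of channel identifiers."""
    cdg = defaultdict(set)
    for path in routes:
        for a, b in zip(path, path[1:]):
            cdg[a].add(b)
    return cdg

def has_cycle(cdg):
    """Detect a cycle in the CDG with an iterative DFS (three-color marking).
    An acyclic CDG means the routing cannot deadlock."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    for start in list(cdg):
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(cdg[start]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color[nxt] == GRAY:
                    return True          # back edge -> cycle -> potential deadlock
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(cdg[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False

# Hypothetical 3-node ring routed clockwise: channels c01, c12, c20 form a cycle.
routes = [["c01", "c12"], ["c12", "c20"], ["c20", "c01"]]
print(has_cycle(build_cdg(routes)))  # True -> this routing can deadlock
```

    The thesis goes further by performing path calculation directly on this graph so that cycles are never introduced in the first place; the sketch above only shows the classical after-the-fact check.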

    Reordering Columns for Smaller Indexes

    Column-oriented indexes, such as projection or bitmap indexes, are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right order before sorting can reduce the number of runs by a factor of two or more. Unfortunately, determining the best column order is NP-hard. For many cases, we prove that the number of runs in table columns is minimized if we sort columns by increasing cardinality. Experimentally, sorting based on Hilbert space-filling curves is poor at minimizing the number of runs.
    Comment: to appear in Information Sciences
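    As an illustrative sketch only (not the paper's proof or algorithm), the effect of column order on run-length compression can be measured by sorting rows under a candidate column permutation and counting value runs per column; the heuristic described in the abstract orders columns by increasing cardinality:

```python
def total_runs(table, column_order):
    """Sort rows lexicographically using the given column order, then count
    the total number of value runs across all columns (fewer runs compress
    better under run-length encoding)."""
    rows = sorted(table, key=lambda r: [r[c] for c in column_order])
    runs = 0
    for c in column_order:
        previous = object()           # sentinel that matches nothing
        for row in rows:
            if row[c] != previous:
                runs += 1
                previous = row[c]
    return runs

# Toy table: column 0 has cardinality 4, column 1 has cardinality 2.
table = [(1, 'a'), (2, 'b'), (3, 'a'), (4, 'b'), (1, 'b'), (2, 'a')]
by_cardinality = sorted(range(2), key=lambda c: len({row[c] for row in table}))
print(by_cardinality, total_runs(table, by_cardinality))   # [1, 0] 8
print(total_runs(table, list(reversed(by_cardinality))))   # 10: more runs
```

    Even on this toy table, putting the low-cardinality column first yields fewer runs, which is the intuition behind the cardinality heuristic.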

    DNA-based data storage system

    Despite the many advances in traditional data recording techniques, the surge of Big Data platforms and energy conservation issues has imposed new challenges on the storage community in terms of identifying extremely high-volume, non-volatile, and durable recording media. The potential for using macromolecules for ultra-dense storage was recognized as early as 1959, when Richard Feynman outlined his vision for nanotechnology in the lecture "There's Plenty of Room at the Bottom". Among known macromolecules, DNA is unique insofar as it lends itself to implementations of non-volatile recording media of outstanding integrity and extremely high storage capacity. The basic system implementation steps for DNA-based data storage systems include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. In this work we advance the field of macromolecular data storage in three directions. First, we introduce the notion of weakly mutually uncorrelated (WMU) sequences. WMU sequences are characterized by the property that no sufficiently long suffix of one sequence is a prefix of the same or another sequence. For the purpose of primer design in DNA-based data storage systems, WMU sequences are also required to be at a large mutual Hamming distance from each other, have balanced compositions of symbols, and avoid primer-dimer byproducts. We derive bounds on the size of WMU and various constrained WMU codes and present a number of constructions for balanced, error-correcting, primer-dimer-free WMU codes using Dyck paths, prefix-synchronized codes, and cyclic codes. Second, we describe the first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks. The newly developed architecture overcomes drawbacks of existing read-only methods that require decoding the whole file in order to read one data fragment. Our system is based on the newly developed WMU coding techniques and accompanying DNA editing methods that ensure data reliability, specificity and sensitivity of access, and at the same time provide exceptionally high data storage capacity. As a proof of concept, we encoded parts of the Wikipedia pages of six universities in the USA, and selected and edited parts of the text written in DNA corresponding to three of these schools. The results suggest that DNA is a versatile medium suitable for both ultrahigh-density archival and rewritable storage applications. Third, we demonstrate for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. Every solution for DNA-based data storage systems so far has focused exclusively on Illumina sequencing devices, but such sequencers are expensive and designed for laboratory use only. Instead, we propose using a new technology, MinION, Oxford Nanopore's handheld sequencer. Nanopore sequencing is fast and cheap, but it results in reads with high error rates. To deal with this issue, we designed an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. As a proof of concept, we stored and sequenced around 3.6 kB of binary data that includes two compressed images (a Citizen Kane poster and a smiley face emoji), using a portable data storage system, and obtained error-free read-outs.
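    As a purely illustrative sketch of the WMU property defined above (not the paper's constructions or bounds), one can check that no sufficiently long proper suffix of any sequence in a set is a prefix of the same or another sequence; the threshold parameter k below is an assumption of this sketch:

```python
def is_weakly_mutually_uncorrelated(sequences, k):
    """Check that no proper suffix of length >= k of any sequence appears as
    a prefix of the same or another sequence (the WMU-style property)."""
    for s in sequences:
        for length in range(k, len(s)):        # proper suffixes only
            suffix = s[-length:]
            for t in sequences:
                if t.startswith(suffix):
                    return False
    return True

# Hypothetical short primer set over the DNA alphabet.
primers = ["ACGTAC", "GGATCC", "TTAGGC"]
print(is_weakly_mutually_uncorrelated(primers, k=3))  # True for this toy set
```

    In an actual primer-design setting one would additionally enforce the balance, Hamming-distance, and primer-dimer constraints mentioned in the abstract.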

    Fault-Tolerant Load Management for Real-Time Distributed Computer Systems

    This paper presents a fault-tolerant scheme applicable to any decentralized load balancing algorithm used in soft real-time distributed systems. Using the theory of distance-transitive graphs to represent the topologies of these systems, the proposed strategy partitions the systems into independent symmetric regions (spheres) centered at some control points. These central points, called fault-control points, provide two-level task redundancy and efficiently redistribute the load of failed nodes within their spheres. Using the algebraic characteristics of these topologies, it is shown that the identification of spheres and fault-control points is, in general, an NP-complete problem. An efficient solution for this problem is presented by making exclusive use of a combinatorial structure known as the Hadamard matrix. Assuming a realistic failure-repair system environment, the performance of the proposed strategy has been evaluated and compared with a no-fault environment through an extensive and detailed simulation. For our fault-tolerant strategy, we propose two measures of goodness, namely, the percentage of re-scheduled tasks that meet their deadlines and the overhead incurred for fault management. It is shown that, using the proposed strategy, up to 80% of the tasks can still meet their deadlines. The proposed strategy is general enough to be applicable to many networks belonging to a number of families of distance-transitive graphs. Through simulation, we have analyzed the sensitivity of this strategy to various system parameters and have shown that the performance degradation due to failures does not depend on these parameters. Also, the probability of a task being lost altogether due to multiple failures has been shown to be extremely low.
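    For illustration only, the Hadamard structure mentioned in the abstract can be generated with Sylvester's recursive construction; the 0/1 versions of its rows are pairwise at Hamming distance n/2 and therefore form a well-spread set of candidate sphere centers in a hypercube. How the paper actually identifies spheres and fault-control points is not reproduced here:

```python
import numpy as np

def sylvester_hadamard(m):
    """Sylvester's recursive construction of a 2^m x 2^m Hadamard matrix."""
    H = np.array([[1]])
    for _ in range(m):
        H = np.block([[H, H], [H, -H]])
    return H

m = 3
H = sylvester_hadamard(m)                 # 8 x 8 matrix with +1/-1 entries
centers = (1 - H) // 2                    # map +1 -> 0, -1 -> 1

# Orthogonal +/-1 rows agree in exactly half of their positions, so the 0/1
# rows are pairwise at Hamming distance 2^(m-1): a well-spread candidate set
# of sphere centers in the 2^m-dimensional hypercube.
for i in range(len(centers)):
    for j in range(i + 1, len(centers)):
        assert np.sum(centers[i] != centers[j]) == 2 ** (m - 1)
print(centers)
```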

    Quantum Low-Density Parity-Check Codes

    Quantum error correction is an indispensable ingredient for scalable quantum computing. In this Perspective we discuss a particular class of quantum codes called “quantum low-density parity-check (LDPC) codes.” The codes we discuss are alternatives to the surface code, which is currently the leading candidate to implement quantum fault tolerance. We introduce the zoo of quantum LDPC codes and discuss their potential for making quantum computers robust with regard to noise. In particular, we explain recent advances in the theory of quantum LDPC codes related to certain product constructions and discuss open problems in the field
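    One well-known member of the product-construction family the Perspective refers to is the hypergraph product of Tillich and Zémor; a minimal sketch, using a toy classical parity-check matrix, builds the X and Z check matrices and verifies the CSS commutation condition:

```python
import numpy as np

def hypergraph_product(H1, H2):
    """Hypergraph product of two classical parity-check matrices H1 (m1 x n1)
    and H2 (m2 x n2). Returns the CSS check matrices HX, HZ over GF(2)."""
    m1, n1 = H1.shape
    m2, n2 = H2.shape
    HX = np.hstack([np.kron(H1, np.eye(n2, dtype=int)),
                    np.kron(np.eye(m1, dtype=int), H2.T)])
    HZ = np.hstack([np.kron(np.eye(n1, dtype=int), H2),
                    np.kron(H1.T, np.eye(m2, dtype=int))])
    return HX % 2, HZ % 2

# Parity-check matrix of the classical [3,1] repetition code as a toy input.
H = np.array([[1, 1, 0],
              [0, 1, 1]])
HX, HZ = hypergraph_product(H, H)
# CSS condition: every X check commutes with every Z check (HX * HZ^T = 0 mod 2).
assert not np.any((HX @ HZ.T) % 2)
print(HX.shape, HZ.shape)   # (6, 13) (6, 13): 13 qubits, 6 X and 6 Z checks
```

    The product inherits the sparsity of the classical inputs, which is what makes the resulting quantum code low-density; good classical LDPC inputs yield quantum LDPC codes with constant rate.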