
    Compression of interferometric radio-astronomical data

    The volume of radio-astronomical data is a considerable burden in the processing and storing of radio observations with high time and frequency resolutions and large bandwidths. Lossy compression of interferometric radio-astronomical data is considered to reduce the volume of visibility data and to speed up processing. A new compression technique named "Dysco" is introduced that consists of two steps: a normalization step, in which grouped visibilities are normalized to have a similar distribution; and a quantization and encoding step, which rounds values to a given quantization scheme using a dithering scheme. Several non-linear quantization schemes are tested and combined with different methods for normalizing the data. Four data sets with observations from the LOFAR and MWA telescopes are processed with different processing strategies and different combinations of normalization and quantization. The effects of compression are measured in the image plane. The noise added by the lossy compression technique acts like normal system noise. The accuracy of Dysco depends on the signal-to-noise ratio of the data: noisy data can be compressed with a smaller loss of image quality. Data with typical correlator time and frequency resolutions can be compressed by a factor of 6.4 for LOFAR and 5.3 for MWA observations with less than 1% added system noise. An implementation of the compression technique is released that provides a Casacore storage manager and allows transparent encoding and decoding. Encoding and decoding are faster than the read/write speed of typical disks. The technique can be used for LOFAR and MWA to reduce the archival space requirements for storing observed data. Data from SKA-low will likely be compressible by the same amount as LOFAR. The same technique can be used to compress data from other telescopes, but a different bit rate might be required.
    Comment: Accepted for publication in A&A. 13 pages, 8 figures. Abstract was abridged.
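
    The two steps described in the abstract (per-group normalization, then dithered quantization) can be sketched as follows. This is a minimal illustration under assumed choices (RMS normalization per group, a uniform grid over roughly plus/minus three normalized units, subtractive dithering); it is not Dysco's actual normalization or one of its non-linear quantization schemes, and the function and parameter names are hypothetical.

        import numpy as np

        def compress_group(vis, nbits=4, rng=None):
            # Step 1: normalize the group of complex visibilities to a common scale.
            rng = rng if rng is not None else np.random.default_rng(0)
            scale = np.sqrt(np.mean(np.abs(vis) ** 2))   # per-group RMS (assumed normalization)
            norm = np.asarray(vis) / scale

            # Step 2: dithered quantization of the real and imaginary parts onto
            # a uniform grid (an assumption; Dysco tests non-linear schemes).
            levels = 2 ** nbits
            step = 6.0 / levels                          # grid spans about +/- 3 normalized units

            def quantize(x):
                dither = rng.uniform(-0.5, 0.5, size=x.shape)      # decoder regenerates this from a shared seed
                codes = np.clip(np.round(x / step - dither),
                                -levels // 2, levels // 2 - 1)     # the nbits-wide integers actually stored
                return (codes + dither) * step                     # value the decoder reconstructs

            decoded = quantize(norm.real) + 1j * quantize(norm.imag)
            return decoded * scale, scale                          # reconstruction plus the stored scale factor

    The subtractive dither spreads the rounding error into noise-like error, which is consistent with the abstract's observation that the added compression error behaves like ordinary system noise.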

    Computing in the RAIN: a reliable array of independent nodes

    The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly-available video server, a highly-available Web server, and a distributed checkpointing system. Also, we describe a commercial product, Rainwall, built with the RAIN technology.
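
    The third contribution, data storage based on error-control codes, can be illustrated in miniature with a single XOR parity block striped across nodes, so that any one failed node can be rebuilt from the survivors. This is a generic single-parity sketch, not RAIN's actual storage scheme or API; the function names are hypothetical.

        from typing import List, Optional

        def encode_parity(blocks: List[bytes]) -> bytes:
            # XOR parity over equal-length data blocks stored on separate nodes.
            parity = bytearray(len(blocks[0]))
            for block in blocks:
                for i, b in enumerate(block):
                    parity[i] ^= b
            return bytes(parity)

        def recover(blocks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
            # Rebuild at most one lost block (marked None) from the survivors and the parity.
            lost = [i for i, blk in enumerate(blocks) if blk is None]
            if not lost:
                return [bytes(blk) for blk in blocks]
            if len(lost) > 1:
                raise ValueError("single-parity scheme tolerates only one lost block")
            rebuilt = bytearray(parity)
            for blk in blocks:
                if blk is not None:
                    for i, b in enumerate(blk):
                        rebuilt[i] ^= b
            restored = list(blocks)
            restored[lost[0]] = bytes(rebuilt)
            return restored

    Codes that tolerate more simultaneous failures (as a cluster with multiple node, link, and switch failures requires) replace the single XOR parity with several parity blocks, at higher computational cost.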

    Reliability of Erasure Coded Storage Systems: A Geometric Approach

    We consider the probability of data loss, or equivalently, the reliability function for an erasure coded distributed data storage system under worst case conditions. Data loss in an erasure coded system depends on probability distributions for the disk repair duration and the disk failure duration. In previous works, the data loss probability of such systems has been studied under the assumption of exponentially distributed disk failure and disk repair durations, using well-known analytic methods from the theory of Markov processes. These methods lead to an estimate of the integral of the reliability function. Here, we address the problem of directly calculating the data loss probability for general repair and failure duration distributions. A closed limiting form is developed for the probability of data loss, and it is shown that the probability of the event that a repair duration exceeds a failure duration is sufficient for characterizing the data loss probability. For the case of constant repair duration, we develop an expression for the conditional data loss probability given the number of failures experienced by each node in a given time window. We do so by developing a geometric approach that relies on the computation of volumes of a family of polytopes that are related to the code. An exact calculation is provided and an upper bound on the data loss probability is obtained by posing the problem as a set avoidance problem. Theoretical calculations are compared to simulation results.
    Comment: 28 pages. 8 figures. Presented in part at IEEE International Conference on BigData 2013, Santa Clara, CA, Oct. 2013 and to be presented in part at 2014 IEEE Information Theory Workshop, Tasmania, Australia, Nov. 2014. New analysis added May 2015. Further update Aug. 201
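
    The central quantity in the abstract, the probability that a repair duration exceeds a failure duration, is easy to estimate by Monte Carlo for the two repair models mentioned (exponential and constant). The sketch below is an illustrative assumption-laden aid, not the paper's method; the function name, rates, and sample count are made up for the example.

        import numpy as np

        def p_repair_exceeds_failure(mean_failure, mean_repair,
                                     constant_repair=False, n_samples=1_000_000, seed=0):
            rng = np.random.default_rng(seed)
            failures = rng.exponential(mean_failure, n_samples)      # time until the next failure
            if constant_repair:
                repairs = np.full(n_samples, mean_repair)            # deterministic repair duration
            else:
                repairs = rng.exponential(mean_repair, n_samples)    # exponentially distributed repair
            return np.mean(repairs > failures)

        # With mean time to failure 1000 h and mean repair time 10 h, the estimates
        # approach 10 / (1000 + 10) for exponential repairs and 1 - exp(-10/1000)
        # for constant repairs.
        print(p_repair_exceeds_failure(1000.0, 10.0))
        print(p_repair_exceeds_failure(1000.0, 10.0, constant_repair=True))
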