46,528 research outputs found
Studies of disk arrays tolerating two disk failures and a proposal for a heterogeneous disk array
There has been an explosion in the amount of generated data in the past decade. Online access to these data is made possible by large disk arrays, especially in the RAID (Redundant Array of Independent Disks) paradigm. According to the RAID level a disk array can tolerate one or more disk failures, so that the storage subsystem can continue operating with disk failure(s). RAID 5 is a single disk failure tolerant array which dedicates the capacity of one disk to parity information. The content on the failed disk can be reconstructed on demand and written onto a spare disk. However, RAID5 does not provide enough protection for data since the data loss may occur when there is a media failure (unreadable sectors) or a second disk failure during the rebuild process. Due to the high cost of downtime in many applications, two disk failure tolerant arrays, such as RAID6 and EVENODD, have become popular. These schemes use 2/N of the capacity of the array for redundant information in order to tolerate two disk failures. RM2 is another scheme that can tolerate two disk failures, with slightly higher redundancy ratio. However, the performance of these two disk failure tolerant RAID schemes is impaired, since there are two check disks to be updated for each write request. Therefore, their performance, especially when there are disk failure(s), is of interest.
In the first part of the dissertation, the operations for the RAID5, RAID6, EVENODD and RM2 schemes are described. A cost model is developed for these RAID schemes by analyzing the operations in various operating modes. This cost model offers a measure of the volume of data being transmitted, and provides adevice-independent comparison of the efficiency of these RAID schemes. Based on this cost model, the maximum throughput of a RAID scheme can be obtained given detailed disk characteristic and RAID configuration. Utilizing M/G/1 queuing model and other favorable modeling assumptions, a queuing analysis to obtain the mean read response time is described. Simulation is used to validate analytic results, as well as to evaluate the RAID systems in analytically intractable cases.
The second part of this dissertation describes a new disk array architecture, namely Heterogeneous Disk Array (HDA). The HDA is motivated by a few observations of the trends in storage technology. The HDA architecture allows a disk array to have two forms of heterogeneity: (1) device heterogeneity, i.e., disks of different types can be incorporated in a single HDA; and (2) RAID level heterogeneity, i.e., various RAID schemes can coexist in the same array. The goal of this architecture is (1) utilizing the extra resource (i.e. bandwidth and capacity) introduced by new disk drives in an automated and efficient way; and (2) using appropriate RAID levels to meet the varying availability requirements for different applications.
In HDA, each new object is associated with an appropriate RAID level and the allocation is carried out in a way to keep disk bandwidth and capacity utilizations balanced. Design considerations for the data structures of HDA metadata are described, followed by the actual design of the data structures and flowcharts for the most frequent operations. Then a data allocation algorithm is described in detail. Finally, the HDA architecture is prototyped based on the DASim simulation toolkit developed at NJIT and simulation results of an HDA with two RAID levels (RAID 1 and RAIDS) are presented
Redundancy and Aging of Efficient Multidimensional MDS-Parity Protected Distributed Storage Systems
The effect of redundancy on the aging of an efficient Maximum Distance
Separable (MDS) parity--protected distributed storage system that consists of
multidimensional arrays of storage units is explored. In light of the
experimental evidences and survey data, this paper develops generalized
expressions for the reliability of array storage systems based on more
realistic time to failure distributions such as Weibull. For instance, a
distributed disk array system is considered in which the array components are
disseminated across the network and are subject to independent failure rates.
Based on such, generalized closed form hazard rate expressions are derived.
These expressions are extended to estimate the asymptotical reliability
behavior of large scale storage networks equipped with MDS parity-based
protection. Unlike previous studies, a generic hazard rate function is assumed,
a generic MDS code for parity generation is used, and an evaluation of the
implications of adjustable redundancy level for an efficient distributed
storage system is presented. Results of this study are applicable to any
erasure correction code as long as it is accompanied with a suitable structure
and an appropriate encoding/decoding algorithm such that the MDS property is
maintained.Comment: 11 pages, 6 figures, Accepted for publication in IEEE Transactions on
Device and Materials Reliability (TDMR), Nov. 201
Offshore Turbine Arrays: Numerical Modeling and Experimental Validation
The interaction between wind turbines in a large wind farm needs to be better understood to reduce array losses and improve energy production. A numerical test bed for an array of offshore wind turbines was developed in the open-source computational fluid dynamics (CFD) framework OpenFOAM. It provides a computational tool which can be used in combination with physical model turbine array studies in the Flow Physics Facility (FPF) at UNH as well as other test facilities.
Turbines were modeled as actuator disks with turbulence sources to reduce computational cost. Both k-ϵ and k-ω SST turbulence models were utilized to capture the flow in the near-wall, wake, and free stream regions.
Experimental studies were performed in the FPF to validate the numerical results and to provide realistic initial and boundary conditions, for example turbulent boundary layer inlet velocity profiles. Mesh refinement and boundary condition studies were performed. Numerical simulations were executed on a custom-built server, designed to be the head node of a future CFD cluster. The entire project was built on open-source software to facilitate replication and expansion. The numerical model provides building blocks for simulations of large wind turbine arrays, computational resources permitting.
The numerical model currently replicates a three by one array of wind turbines in the FPF, and provides detailed insight into the array fluid dynamics
Workload-Based Configuration of MEMS-Based Storage Devices for Mobile Systems
Because of its small form factor, high capacity, and expected low cost, MEMS-based storage is a suitable storage technology for mobile systems. However, flash memory may outperform MEMS-based storage in terms of performance, and energy-efficiency. The problem is that MEMS-based storage devices have a large number (i.e., thousands) of heads, and to deliver peak performance, all heads must be deployed simultaneously to access each single sector. Since these devices are mechanical and thus some housekeeping information is needed for each head, this results in a huge capacity loss and increases the energy consumption of MEMS-based storage with respect to flash.
We solve this problem by proposing new techniques to lay out data in MEMS-based storage devices. Data layouts represent optimizations in a design space spanned by three parameters: the number of active heads, sector parallelism, and sector size. We explore this design space and show that by exploiting knowledge of the expected workload, MEMS-based devices can employ all heads, thus delivering peak performance, while decreasing the energy consumption and compromising only a little on the capacity. Our exploration shows that MEMS-based storage is competitive with flash in most cases, and outperforms flash in a few cases
Self-Repairing Disk Arrays
As the prices of magnetic storage continue to decrease, the cost of replacing
failed disks becomes increasingly dominated by the cost of the service call
itself. We propose to eliminate these calls by building disk arrays that
contain enough spare disks to operate without any human intervention during
their whole lifetime. To evaluate the feasibility of this approach, we have
simulated the behavior of two-dimensional disk arrays with n parity disks and
n(n-1)/2 data disks under realistic failure and repair assumptions. Our
conclusion is that having n(n+1)/2 spare disks is more than enough to achieve a
99.999 percent probability of not losing data over four years. We observe that
the same objectives cannot be reached with RAID level 6 organizations and would
require RAID stripes that could tolerate triple disk failures.Comment: Part of ADAPT Workshop proceedings, 2015 (arXiv:1412.2347
Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems
Two emerging hardware trends will dominate the database system technology in
the near future: increasing main memory capacities of several TB per server and
massively parallel multi-core processing. Many algorithmic and control
techniques in current database technology were devised for disk-based systems
where I/O dominated the performance. In this work we take a new look at the
well-known sort-merge join which, so far, has not been in the focus of research
in scalable massively parallel multi-core data processing as it was deemed
inferior to hash joins. We devise a suite of new massively parallel sort-merge
(MPSM) join algorithms that are based on partial partition-based sorting.
Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a
hard to parallelize final merge step to create one complete sort order. Rather
they work on the independently created runs in parallel. This way our MPSM
algorithms are NUMA-affine as all the sorting is carried out on local memory
partitions. An extensive experimental evaluation on a modern 32-core machine
with one TB of main memory proves the competitive performance of MPSM on large
main memory databases with billions of objects. It scales (almost) linearly in
the number of employed cores and clearly outperforms competing hash join
proposals - in particular it outperforms the "cutting-edge" Vectorwise parallel
query engine by a factor of four.Comment: VLDB201
Tools for modelling and simulating migration-based preservation
This report describes two tools for modelling and simulating the costs and risks of using IT storage systems for the long-term archiving of file-based AV assets. The tools include a model of storage costs, the ingest and access of files, the possibility of data corruption and loss from a range of mechanisms, and the impact of having limited resources with which to fulfill access requests and preservation actions. Applications include archive planning, development of a technology strategy, cost estimation for business planning, operational decision support, staff training and generally promoting awareness of the issues and challenges archives face in digital preservation
- …