5 research outputs found

    Rebuild performance enhancement using onboard caching and delayed vacation termination in clustered RAID5

    The Clustered RAID5 (CRAID5) architecture with a parity group size (G) smaller than the number of disks (N) increases the load on the surviving disks by the declustering ratio α = (G - 1)/(N - 1), which can be smaller than the corresponding increase in RAID5, both while switching to and subsequently operating in rebuild mode. The Nearly Random Permutation (NRP) layout provides the flexibility to vary the declustering ratio (α) for a given N, and the Vacationing Server Model (VSM) of processing rebuild requests provides acceptable rebuild and user response times. Rebuild performance and user response time can be improved by introducing an onboard buffer in the disks, which caches a single track upon the arrival of a rebuild request while in rebuild mode. Such an enhancement is proposed, and the architecture is described along with an analysis using the DASim simulation toolkit developed at NJIT. Also proposed is the delayed termination of vacations with two user requests, as this improves rebuild performance with a negligible negative impact on user response time. Finally, the effect of limiting the rebuild buffer on rebuild performance is presented in the context of three different disk utilizations and declustering ratios.
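    As a quick illustration of the declustering ratio, a minimal sketch follows; the disk count and parity group size below are invented for illustration, not values from the dissertation.

```python
def declustering_ratio(G: int, N: int) -> float:
    """Declustering ratio alpha = (G - 1) / (N - 1) for a Clustered RAID5
    array of N disks with parity group size G (G <= N)."""
    return (G - 1) / (N - 1)

# Illustrative values only: with N = 20 disks and parity groups of G = 5,
# each surviving disk sees about 21% extra load in degraded mode, versus
# 100% extra (alpha = 1) for plain RAID5, where G = N.
print(declustering_ratio(5, 20))   # 0.2105...
print(declustering_ratio(20, 20))  # 1.0, the RAID5 case
```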

    Solar activity detection and prediction using image processing and machine learning techniques

    The objective of the research in this dissertation is to develop methods for the automatic detection and prediction of solar activities, including prominence eruptions, emerging flux regions and solar flares. Image processing and machine learning techniques are applied in this study. These methods can be used for automatic observation of solar activities and prediction of space weather that may have great influence on the near-Earth environment. The research presented in this dissertation covers the following topics: i) automatic detection of prominence eruptions (PEs), ii) automatic detection of emerging flux regions (EFRs), and iii) automatic prediction of solar flares.

    In detection of prominence eruptions, an automated method is developed by combining image processing and pattern recognition techniques. Consecutive Hα solar images are used as the input. The image processing techniques, including image transformation, segmentation and morphological operations, are used to extract the limb objects and measure their associated properties. Pattern recognition techniques, such as the Support Vector Machine (SVM), are applied to classify the objects and generate a list of identified PEs as the output.

    In detection of emerging flux regions, an automatic detection method is developed using multi-scale circular harmonic filters, a Kalman filter and an SVM. The method takes a sequence of consecutive Michelson Doppler Imager (MDI) magnetograms as the input. The multi-scale circular harmonic filters are applied to detect bipolar regions on the solar disk surface, and these regions are traced by the Kalman filter until their disappearance. Finally, an SVM classifier is applied to distinguish EFRs from other regions based on statistical properties.

    Solar flare prediction is modeled as a conditional density estimation (CDE) problem. A novel method is proposed to solve the CDE problem using kernel-based nonlinear regression and moment-based density function reconstruction techniques. The method involves two main steps. In the first step, kernel-based nonlinear regression techniques are applied to predict the conditional moments of the target variable, such as flare peak intensity or flare index. In the second step, the conditional density function is reconstructed from the estimated moments. The method is compared with the traditional double-kernel density estimator, and the experimental results show that it yields performance comparable to that of the double-kernel density estimator. The most important merit of the new method is that it can handle high-dimensional data effectively, whereas the double-kernel density estimator is confined to the bivariate case due to the difficulty of determining optimal bandwidths. The method can be used to predict the conditional density function of either flare peak intensity or flare index, which makes it of practical significance in automated flare forecasting.
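    The two-step CDE approach can be sketched compactly. The sketch below assumes only the first two conditional moments and a Gaussian reconstruction, the simplest moment-based variant; the regressor choice (scikit-learn's KernelRidge), the synthetic data and all parameter values are assumptions for illustration, not details from the dissertation.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def fit_moments(X, y, alpha=1.0, gamma=0.5):
    """Step 1: kernel regression of the first two conditional moments,
    E[y|x] and E[y^2|x]."""
    m1 = KernelRidge(kernel="rbf", alpha=alpha, gamma=gamma).fit(X, y)
    m2 = KernelRidge(kernel="rbf", alpha=alpha, gamma=gamma).fit(X, y ** 2)
    return m1, m2

def conditional_density(m1, m2, x, y_grid):
    """Step 2: reconstruct p(y|x) on y_grid from the predicted moments
    (a Gaussian here, the two-moment special case)."""
    mu = m1.predict(x)[0]
    var = max(m2.predict(x)[0] - mu ** 2, 1e-6)  # Var[y|x] = E[y^2|x] - mu^2
    return np.exp(-(y_grid - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Toy usage on synthetic data; in flare prediction, y would be the flare
# peak intensity or flare index and X the predictive features.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 3))
y = X[:, 0] ** 2 + 0.3 * rng.standard_normal(200)
m1, m2 = fit_moments(X, y)
density = conditional_density(m1, m2, X[:1], np.linspace(-1, 5, 50))
```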

    Studies of disk arrays tolerating two disk failures and a proposal for a heterogeneous disk array

    There has been an explosion in the amount of generated data in the past decade. Online access to these data is made possible by large disk arrays, especially in the RAID (Redundant Array of Independent Disks) paradigm. Depending on the RAID level, a disk array can tolerate one or more disk failures, so that the storage subsystem can continue operating despite disk failure(s). RAID5 is a single disk failure tolerant array which dedicates the capacity of one disk to parity information. The content of a failed disk can be reconstructed on demand and written onto a spare disk. However, RAID5 does not provide enough protection for data, since data loss may occur when there is a media failure (unreadable sectors) or a second disk failure during the rebuild process. Due to the high cost of downtime in many applications, two disk failure tolerant arrays, such as RAID6 and EVENODD, have become popular. These schemes use 2/N of the capacity of the array for redundant information in order to tolerate two disk failures. RM2 is another scheme that can tolerate two disk failures, with a slightly higher redundancy ratio. However, the performance of these two disk failure tolerant RAID schemes is impaired, since two check disks must be updated for each write request. Therefore, their performance, especially when there are disk failure(s), is of interest.

    In the first part of the dissertation, the operations of the RAID5, RAID6, EVENODD and RM2 schemes are described. A cost model is developed for these RAID schemes by analyzing their operations in various operating modes. This cost model offers a measure of the volume of data being transmitted and provides a device-independent comparison of the efficiency of these RAID schemes. Based on this cost model, the maximum throughput of a RAID scheme can be obtained given detailed disk characteristics and the RAID configuration. Utilizing an M/G/1 queuing model and other favorable modeling assumptions, a queuing analysis to obtain the mean read response time is described. Simulation is used to validate the analytic results, as well as to evaluate the RAID systems in analytically intractable cases.

    The second part of this dissertation describes a new disk array architecture, namely the Heterogeneous Disk Array (HDA). The HDA is motivated by several observations of trends in storage technology. The HDA architecture allows a disk array to have two forms of heterogeneity: (1) device heterogeneity, i.e., disks of different types can be incorporated in a single HDA; and (2) RAID level heterogeneity, i.e., various RAID schemes can coexist in the same array. The goals of this architecture are (1) utilizing the extra resources (i.e., bandwidth and capacity) introduced by new disk drives in an automated and efficient way, and (2) using appropriate RAID levels to meet the varying availability requirements of different applications. In HDA, each new object is associated with an appropriate RAID level, and allocation is carried out in a way that keeps disk bandwidth and capacity utilizations balanced. Design considerations for the data structures of HDA metadata are described, followed by the actual design of the data structures and flowcharts for the most frequent operations. Then a data allocation algorithm is described in detail. Finally, the HDA architecture is prototyped based on the DASim simulation toolkit developed at NJIT, and simulation results of an HDA with two RAID levels (RAID1 and RAID5) are presented.
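    The flavor of a device-independent cost model can be conveyed by counting blocks transferred per request. The sketch below gives only the textbook read-modify-write costs in normal mode; the dissertation's model also covers degraded and rebuild modes, and the EVENODD and RM2 figures here are approximations.

```python
def blocks_transferred(scheme: str, op: str) -> int:
    """Blocks moved per single-block user request in normal mode
    (no failed disks)."""
    if op == "read":
        return 1                   # one data block, for any scheme
    write_cost = {
        "RAID5": 4,    # read old data + old parity, write new data + parity
        "RAID6": 6,    # both check blocks (P and Q) must also be updated
        "EVENODD": 6,  # approximate: diagonal parity may touch extra blocks
        "RM2": 6,      # approximate: two parity blocks updated per write
    }
    return write_cost[scheme]

# Mean blocks per request for a 2:1 read:write mix under RAID6:
mean_cost = (2 * blocks_transferred("RAID6", "read")
             + blocks_transferred("RAID6", "write")) / 3   # about 2.67
```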

    RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)

    The RAID proposal advocated replacing large disks with arrays of PC disks, but as the capacity of small disks increased 100-fold in the 1990s, the production of large disks was discontinued. Storage dependability is increased via replication or erasure coding. Cloud storage providers store multiple copies of data, obviating the need for further redundancy. Variations of RAID based on local recovery codes and partial MDS codes reduce recovery cost. NAND flash Solid State Disks (SSDs) have low latency and high bandwidth, are more reliable, consume less power and have a lower TCO than Hard Disk Drives, making them more viable for hyperscalers.
    Comment: Submitted to ACM Computing Surveys. arXiv admin note: substantial text overlap with arXiv:2306.0876

    Rebuild Strategies for Redundant Disk Arrays

    RAID5 performance is critical while a rebuild is in progress since, in addition to the increased load of recreating lost data on demand, there is interference caused by rebuild requests. We report on simulation results which show that processing user requests at a higher priority than rebuild requests, rather than at the same priority, results in a lower response time for user requests as well as a reduced rebuild time. Several other parameters related to rebuild processing are also explored.
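    A minimal sketch of the priority idea, not the paper's simulator: a single disk queue in which queued user requests are always dispatched ahead of queued rebuild requests (non-preemptive priority).

```python
import heapq
import itertools

queue, seq = [], itertools.count()

def submit(kind: str, service_time: float):
    prio = 0 if kind == "user" else 1   # user requests jump the queue
    heapq.heappush(queue, (prio, next(seq), kind, service_time))

def drain(start: float = 0.0):
    """Serve all queued requests in priority order; return finish times."""
    t, finished = start, []
    while queue:
        _, _, kind, s = heapq.heappop(queue)
        t += s
        finished.append((kind, t))
    return finished

submit("rebuild", 8.0)   # e.g., a track-sized rebuild read
submit("user", 5.0)      # arrives later but is served first
print(drain())           # [('user', 5.0), ('rebuild', 13.0)]
```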