
    RAIDX: RAID EXTENDED FOR HETEROGENEOUS ARRAYS

    The computer hard drive market has diversified with the establishment of solid state disks (SSDs) as an alternative to magnetic hard disks (HDDs). Each hard drive technology has its advantages: SSDs are faster than HDDs, but HDDs are cheaper. Our goal is to construct a parallel storage system with HDDs and SSDs such that the parallel system is as fast as the SSDs. Achieving this goal is challenging since the slow HDDs store more data and become bottlenecks while the SSDs remain idle. RAIDX is a parallel storage system designed for disks of different speeds, capacities, and technologies. The RAIDX hardware consists of an array of disks; the RAIDX software consists of data structures and algorithms that allow the disks to be viewed as a single storage unit whose capacity equals the sum of the capacities of its disks, whose failure rate is lower than the failure rate of its individual disks, and whose speed is close to that of its faster disks. RAIDX achieves its performance goals with the aid of a novel parallel data organization technique that allows storage data to be moved on the fly without impacting the upper-level file system. We show that storage data accesses satisfy the locality of reference principle, whereby only a small fraction of storage data is accessed frequently. RAIDX has a monitoring program that identifies frequently accessed blocks and a migration program that moves frequently accessed blocks to faster disks. The faster disks act as caches that store the solo copy of frequently accessed data. Experimental evaluation has shown that an HDD+SSD RAIDX array is as fast as an all-SSD array when the workload shows locality of reference.
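    To make the monitor-and-migrate idea above concrete, the Python sketch below counts block accesses and promotes the most frequently accessed blocks to the SSD tier. The class and function names, the hot_fraction parameter, and the capacity accounting are illustrative assumptions; this is a sketch of the general principle, not the RAIDX implementation.

    # Sketch of a hot-block monitor and a migration pass in the spirit of the
    # description above. Names, thresholds, and the capacity model are illustrative.
    from collections import Counter

    class HotBlockMonitor:
        def __init__(self, hot_fraction=0.05):
            self.access_counts = Counter()   # block id -> access count
            self.hot_fraction = hot_fraction # fraction of blocks treated as "hot"

        def record_access(self, block_id):
            self.access_counts[block_id] += 1

        def hot_blocks(self):
            # Locality of reference: only a small fraction of blocks is hot.
            n_hot = max(1, int(len(self.access_counts) * self.hot_fraction))
            return [b for b, _ in self.access_counts.most_common(n_hot)]

    def migrate(monitor, block_location, ssd_capacity_blocks):
        """Move the hottest blocks to the SSD tier, up to its capacity in blocks.
        The SSD holds the solo copy of each migrated block."""
        migrated = []
        for block in monitor.hot_blocks():
            if len(migrated) >= ssd_capacity_blocks:
                break
            if block_location.get(block) != "ssd":
                block_location[block] = "ssd"   # logical remap only; the file system is untouched
                migrated.append(block)
        return migrated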

    Quality of Service based Retrieval Strategy for Distributed Video on Demand on Multiple Servers

    The recent advances and development of inexpensive computers and high-speed networking technology have enabled the Video on Demand (VoD) application to connect to shared-computing servers, replacing the traditional computing environments where each application had its own dedicated computing hardware. The VoD application enables the viewer to select his favorite video file from a list of video files and watch its reproduction at will. Early video on demand applications were based on a single video server, from which all video streams were initiated; as the number of clients interested in VoD services increased, the focus shifted to Distributed VoD (DVoD) architectures, where distribution may refer to distributed system components, distributed streaming servers, distributed media content, etc. The VoD server must handle several issues in order to present a successful service. It has to receive the clients' requests and analyze them, calculate the necessary resources for each request, and decide whether a request can be admitted or not. Once the request is admitted, the server must schedule the request, retrieve the required video data, and send the video data in a timely manner so that the client does not suffer data starvation in his buffer during the video reproduction. The overall objective of a VoD service provider is therefore to provide a better Quality of Service (QoS). Issues related to QoS include efficient use of bandwidth and better throughput. One of the important issues is to retrieve the video data from the servers in minimum time and to start the playback of the video at the client side with a minimum waiting time. The overall time elapsed in retrieving the video data and starting the playback is known as access time. The thesis presents an efficient retrieval strategy for a distributed VoD environment whose basic objective is to minimize the access time while maintaining presentation continuity at the client side. We have neglected some of the network parameters which may affect the access time by assuming a high-speed network between the servers and the client. The performance of the strategy has been analyzed and compared with the PAR (Play After Retrieval) strategy from the literature. Further, the strategy is also analyzed under an availability condition, which is a more realistic approach.
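    As a rough illustration of the access-time idea, i.e. the smallest start-up waiting time that still avoids buffer starvation during reproduction, here is a small Python sketch. The function name, the per-second discretization, and the example numbers are illustrative assumptions and do not reproduce the thesis's retrieval strategy.

    def minimal_startup_delay(downloaded_by_sec, playback_rate):
        """downloaded_by_sec[k] = total bytes retrieved from all servers after k+1 seconds.
        playback_rate = bytes consumed per second of playback.
        Returns the smallest delay d (seconds) such that, for every playback second t,
        the data available by wall-clock time d + t covers what playback has consumed."""
        total = downloaded_by_sec[-1]
        play_seconds = int(total // playback_rate)
        for d in range(len(downloaded_by_sec)):
            ok = True
            for t in range(play_seconds):
                k = min(d + t, len(downloaded_by_sec) - 1)
                if downloaded_by_sec[k] < (t + 1) * playback_rate:
                    ok = False   # buffer would run dry at playback second t
                    break
            if ok:
                return d
        return len(downloaded_by_sec)

    # Example: servers jointly deliver 500 KB/s for 60 s, playback consumes 400 KB/s.
    per_sec = [500 * 1024] * 60
    cumulative = [sum(per_sec[:k + 1]) for k in range(len(per_sec))]
    print(minimal_startup_delay(cumulative, 400 * 1024))  # 0: retrieval outpaces playback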

    Dependence-driven techniques in system design

    Burstiness in workloads is often found in multi-tier architectures, storage systems, and communication networks. This feature is extremely important in system design because it can significantly degrade system performance and availability. This dissertation focuses on how to use knowledge of burstiness to develop new techniques and tools for performance prediction, scheduling, and resource allocation under bursty workload conditions.

    For multi-tier enterprise systems, burstiness in the service times is catastrophic for performance. Via detailed experimentation, we identify the cause of performance degradation as the persistent bottleneck switch among various servers. This results in unstable behavior that cannot be captured by existing capacity planning models. In this dissertation, beyond identifying the cause and effects of bottleneck switch in multi-tier systems, we also propose modifications to the classic TPC-W benchmark to emulate bursty arrivals in multi-tier systems.

    This dissertation also demonstrates how burstiness can be used to improve system performance. Two dependence-driven scheduling policies, SWAP and ALoC, are developed. These general scheduling policies counteract burstiness in workloads and maintain high availability by delaying selected requests that contribute to burstiness. Extensive experiments show that both SWAP and ALoC achieve good estimates of service times based on the knowledge of burstiness in the service process. As a result, SWAP successfully approximates shortest-job-first (SJF) scheduling without requiring a priori information of job service times. ALoC adaptively controls system load by infinitely delaying only a small fraction of the incoming requests.

    The knowledge of burstiness can also be used to forecast the length of idle intervals in storage systems. In practice, background activities are scheduled during system idle times. The scheduling of background jobs is crucial in terms of the performance degradation of foreground jobs and the utilization of idle times. In this dissertation, new background scheduling schemes are designed to determine when and for how long idle times can be used for serving background jobs, without violating predefined performance targets of foreground jobs. Extensive trace-driven simulation results illustrate that the proposed schemes are effective and robust in a wide range of system conditions. Furthermore, if there is burstiness within idle times, then maintenance features like disk scrubbing and intra-disk data redundancy can be successfully scheduled as background activities during idle times.
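    As a toy illustration of delaying requests that contribute to burstiness, the Python sketch below postpones jobs whose estimated service time exceeds a moving average of recently observed service times. The class name, window size, and fixed delay are illustrative assumptions; this is not the SWAP or ALoC policy itself, only the general delay-the-likely-long-jobs idea.

    import collections

    class DelayingScheduler:
        def __init__(self, window=100, delay_slots=5):
            self.recent = collections.deque(maxlen=window)  # recently observed service times
            self.delay_slots = delay_slots                  # postponement applied to "long" jobs

        def observe(self, service_time):
            # Feed back measured service times to refine the running estimate.
            self.recent.append(service_time)

        def schedule(self, estimated_service_time, now):
            """Return the dispatch time for a request: immediate if it looks short,
            postponed by delay_slots time units if it looks long (burst-contributing)."""
            if not self.recent:
                return now
            avg = sum(self.recent) / len(self.recent)
            return now if estimated_service_time <= avg else now + self.delay_slots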

    Scalable analysis of stochastic process algebra models

    The performance modelling of large-scale systems using discrete-state approaches is fundamentally hampered by the well-known problem of state-space explosion, which causes exponential growth of the reachable state space as a function of the number of components which constitute the model. Because they are mapped onto continuous-time Markov chains (CTMCs), models described in the stochastic process algebra PEPA are no exception. This thesis presents a deterministic continuous-state semantics of PEPA which employs ordinary differential equations (ODEs) as the underlying mathematics for the performance evaluation. This is suitable for models consisting of large numbers of replicated components, as the ODE problem size is insensitive to the actual population levels of the system under study. Furthermore, the ODE is given an interpretation as the fluid limit of a properly defined CTMC model when the initial population levels go to infinity. This framework allows the use of existing results which give error bounds to assess the quality of the differential approximation. The computation of performance indices such as throughput, utilisation, and average response time are interpreted deterministically as functions of the ODE solution and are related to corresponding reward structures in the Markovian setting. The differential interpretation of PEPA provides a framework that is conceptually analogous to established approximation methods in queueing networks based on mean-value analysis, as both approaches aim at reducing the computational cost of the analysis by providing estimates for the expected values of the performance metrics of interest. The relationship between these two techniques is examined in more detail in a comparison between PEPA and the Layered Queueing Network (LQN) model. General patterns of translation of LQN elements into corresponding PEPA components are applied to a substantial case study of a distributed computer system. This model is analysed using stochastic simulation to gauge the soundness of the translation. Furthermore, it is subjected to a series of numerical tests to compare execution runtimes and accuracy of the PEPA differential analysis against the LQN mean-value approximation method. Finally, this thesis discusses the major elements concerning the development of a software toolkit, the PEPA Eclipse Plug-in, which offers a comprehensive modelling environment for PEPA, including modules for static analysis, explicit state-space exploration, numerical solution of the steady-state equilibrium of the Markov chain, stochastic simulation, the differential analysis approach herein presented, and a graphical framework for model editing and visualisation of performance evaluation results.
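    To give a feel for the fluid (ODE) interpretation, the Python sketch below integrates, with explicit Euler, the population-level ODEs of a toy client/server model in which clients and servers synchronise on a shared request action whose flow is bounded by the scarcer population. The model structure, rate values, and min-based synchronisation are illustrative assumptions, not taken from the thesis; note that the ODE has four variables regardless of how many clients or servers there are.

    def fluid_trajectory(c_think, c_wait, s_idle, s_busy,
                         r_think=1.0, r_req=2.0, r_serve=0.5,
                         dt=0.01, t_end=20.0):
        """Euler integration of the fluid approximation of a toy client/server model."""
        t = 0.0
        trace = []
        while t < t_end:
            # Shared 'request' action: flow limited by the scarcer of the two populations.
            req_flow = r_req * min(c_wait, s_idle)
            d_c_think = -r_think * c_think + req_flow
            d_c_wait = r_think * c_think - req_flow
            d_s_idle = -req_flow + r_serve * s_busy
            d_s_busy = req_flow - r_serve * s_busy
            c_think += dt * d_c_think
            c_wait += dt * d_c_wait
            s_idle += dt * d_s_idle
            s_busy += dt * d_s_busy
            t += dt
            trace.append((t, c_think, c_wait, s_idle, s_busy))
        return trace

    # Population levels only change the initial conditions, not the ODE size.
    print(fluid_trajectory(c_think=1000, c_wait=0, s_idle=50, s_busy=0)[-1])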

    Web page performance analysis

    Computer systems play an increasingly crucial and ubiquitous role in human endeavour by carrying out or facilitating tasks and providing information and services. How much work these systems can accomplish, within a certain amount of time, using a certain amount of resources, characterises the systems' performance, which is a major concern when the systems are planned, designed, implemented, deployed, and evolve. As one of the most popular computer systems, the Web is inevitably scrutinised in terms of performance analysis that deals with its speed, capacity, resource utilisation, and availability. Performance analyses for the Web are normally done from the perspective of the Web servers and the underlying network (the Internet). This research, on the other hand, approaches Web performance analysis from the perspective of Web pages. The performance metric of interest here is response time. Response time is studied as an attribute of Web pages, instead of being considered purely a result of network and server conditions. A framework consisting of measurement, modelling, and monitoring (the 3Ms) of Web pages, revolving around response time, is adopted to support the performance analysis activity. The measurement module enables Web page response time to be measured and is used to support the modelling module, which in turn provides references for the monitoring module. The monitoring module estimates response time. The three modules are used in the software development lifecycle to ensure that developed Web pages deliver at worst satisfactory response time (within a maximum acceptable time), or preferably much better response time, thereby maximising the efficiency of the pages. The framework proposes a systematic way to understand response time as it is related to specific characteristics of Web pages and explains how individual Web page response time can be examined and improved.
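    A minimal sketch of the measurement and monitoring steps follows: time how long a page takes to download and flag it against a maximum acceptable response time. The URL placeholder and the 2-second budget are assumptions for illustration, not values or tooling from the thesis.

    import time
    import urllib.request

    def measure_response_time(url, timeout=10):
        """Return the elapsed seconds to fetch the full page body."""
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        return time.monotonic() - start

    def monitor(url, max_acceptable=2.0):
        """Compare a measured response time against a maximum acceptable budget."""
        rt = measure_response_time(url)
        status = "OK" if rt <= max_acceptable else "SLOW"
        print(f"{url}: {rt:.3f}s [{status}]")

    # monitor("https://example.com/")   # example invocation with a placeholder URL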

    DEVELOPMENT OF DIAGNOSTIC AND PROGNOSTIC METHODOLOGIES FOR ELECTRONIC SYSTEMS BASED ON MAHALANOBIS DISTANCE

    Diagnostic and prognostic capabilities are one aspect of the many interrelated and complementary functions in the field of Prognostic and Health Management (PHM). These capabilities are sought after by industries in order to provide maximum operational availability of their products, maximum usage life, minimum periodic maintenance inspections, lower inventory cost, accurate tracking of part life, and no false alarms. Several challenges associated with the development and implementation of these capabilities are the consideration of a system's dynamic behavior under various operating environments; complex system architecture, where the components that form the overall system have complex interactions with each other with feed-forward and feedback loops of instructions; the unavailability of failure precursors; unseen events; and the absence of unique mathematical techniques that can address fault and failure events in various multivariate systems. The Mahalanobis distance methodology distinguishes multivariable data groups in a multivariate system by a univariate distance measure calculated from the normalized values of performance parameters and their correlation coefficients. The Mahalanobis distance measure does not suffer from the scaling effect, a situation where the variability of one parameter masks the variability of another parameter, which happens when the measurement ranges or scales of two parameters are different. A literature review showed that the Mahalanobis distance has been used for classification purposes. In this thesis, the Mahalanobis distance measure is utilized for fault detection, fault isolation, degradation identification, and prognostics. For fault detection, a probabilistic approach is developed to establish a threshold Mahalanobis distance, such that the presence of a fault in a product can be identified and the product can be classified as healthy or unhealthy. A technique is presented to construct a control chart for the Mahalanobis distance for detecting trends and bias in system health or performance. An error function is defined to establish a fault-specific threshold Mahalanobis distance. A fault isolation approach is developed to isolate faults by identifying the parameters that are associated with that fault. This approach utilizes the design-of-experiments concept for calculating a residual Mahalanobis distance for each parameter (i.e., the contribution of each parameter to a system's health determination). An expected contribution range for each parameter, estimated from the distribution of the residual Mahalanobis distance, is used to isolate the parameters that are responsible for a system's anomalous behavior. A methodology to detect degradation in a system's health using a health indicator is developed. The health indicator is defined as the weighted sum of a histogram bin's fractional contribution. The histogram's optimal bin width is determined from the number of data points in a moving window. This moving window approach is utilized for progressive estimation of the health indicator over time. The health indicator is compared with a threshold value defined from the system's healthy data to indicate the system's health or performance degradation. A symbolic time-series-based health assessment approach is developed. Prognostic measures are defined for detecting anomalies in a product and predicting a product's time and probability of approaching a faulty condition. These measures are computed from a hidden Markov model developed from the symbolic representation of product dynamics. The symbolic representation of a product's dynamics is obtained by representing a Mahalanobis distance time series in symbolic form. Case studies were performed to demonstrate the capability of the proposed methodology for real-time health monitoring. Notebook computers were exposed to a set of environmental conditions representative of the extremes of their life cycle profiles. The performance parameters were monitored in situ during the experiments, and the resulting data were used as a training dataset. The dataset was also used to identify specific parameter behavior, estimate correlation among parameters, and extract features for defining a healthy baseline. Field-returned computer data and data corresponding to artificially injected faults in computers were used as test data.
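    The core computation underlying the approach is the Mahalanobis distance of an observation from a healthy baseline, compared against a threshold derived from the baseline distances. The Python sketch below uses a simple percentile-based threshold as an illustrative stand-in for the probabilistic threshold approach described above; function names and the synthetic data are assumptions for demonstration only.

    import numpy as np

    def fit_baseline(healthy):                      # healthy: (n_samples, n_params)
        """Estimate the healthy mean vector and (pseudo-)inverse covariance."""
        mu = healthy.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(healthy, rowvar=False))
        return mu, cov_inv

    def mahalanobis(x, mu, cov_inv):
        d = x - mu
        return float(np.sqrt(d @ cov_inv @ d))

    def detect_faults(healthy, observations, pct=99.0):
        """Classify each observation as healthy/unhealthy against a baseline percentile threshold."""
        mu, cov_inv = fit_baseline(healthy)
        baseline_md = [mahalanobis(row, mu, cov_inv) for row in healthy]
        threshold = np.percentile(baseline_md, pct)
        return ["unhealthy" if mahalanobis(row, mu, cov_inv) > threshold else "healthy"
                for row in observations]

    # Synthetic example: 200 healthy samples of 5 parameters, plus one observation
    # shifted well away from the baseline.
    rng = np.random.default_rng(0)
    healthy = rng.normal(size=(200, 5))
    test = np.vstack([rng.normal(size=(1, 5)), rng.normal(loc=6.0, size=(1, 5))])
    print(detect_faults(healthy, test))             # typically ['healthy', 'unhealthy']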

    Building information modeling – A game changer for interoperability and a chance for digital preservation of architectural data?

    Digital data associated with the architectural design-and-construction process is an essential resource alongside, and even beyond, the lifecycle of the construction object it describes. Despite this, digital architectural data remains largely neglected in digital preservation research, and vice versa, digital preservation is so far neglected in the design-and-construction process. In the last five years, Building Information Modeling (BIM) has seen growing adoption in the architecture and construction domains, marking a large step towards much-needed interoperability. The open standard IFC (Industry Foundation Classes) is one way in which data is exchanged in BIM processes. This paper presents a first look at BIM processes from a digital preservation perspective, highlighting the history and adoption of the method as well as the open file format standard IFC as one way to store and preserve BIM data.