319 research outputs found

    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    The increasing demand for exabyte-scale storage capacity by high-end computing (HEC) applications requires a higher level of scalability and dependability than current file and storage systems provide. This proposal addresses metadata management for scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit that extends the features of, and fully leverages the peak performance promised by, the state-of-the-art cluster-based parallel and distributed file storage systems used by the high-performance computing community. There is a large body of research on scaling data movement and management; however, the need to scale the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. Understanding the characteristics of metadata traffic, and applying appropriate load-balancing, caching, prefetching and grouping mechanisms to metadata management accordingly, will lead to high scalability. It is anticipated that plugging the scalable and adaptive metadata management components into state-of-the-art cluster-based parallel and distributed file storage systems could increase the performance of applications and file systems, and help translate the high peak performance of such systems into real application performance improvements. The project involves the following components:
    1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns.
    2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability.
    3. Develop decentralized, locality-aware metadata grouping schemes to facilitate bulk metadata operations such as prefetching.
    4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching.
    5. Prototype the SAM2 components in the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at the University of Nebraska-Lincoln, and conduct benchmark, evaluation and validation studies.
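    The file name mapping component above relies on a Bloom filter array. The abstract gives no implementation details, so the following is only a minimal sketch under stated assumptions: each metadata server summarizes the paths it owns in its own Bloom filter, and a copy of all filters is consulted to find candidate home servers for a path. The class names, the server IDs and the filter sizes are hypothetical and are not taken from the SAM2 project.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; a Python int serves as the bit vector."""

    def __init__(self, num_bits=4096, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class BloomFilterArray:
    """One filter per metadata server (assumed layout, for illustration)."""

    def __init__(self, server_ids, num_bits=4096, num_hashes=4):
        self.filters = {
            sid: BloomFilter(num_bits, num_hashes) for sid in server_ids
        }

    def register(self, server_id, path):
        # Record that server_id holds the metadata for path.
        self.filters[server_id].add(path)

    def candidates(self, path):
        # Servers whose filter reports a possible hit. False positives can
        # occur, so a caller would still verify with the server itself.
        return [
            sid for sid, bf in self.filters.items() if bf.might_contain(path)
        ]
```

    A lookup such as `BloomFilterArray(["mds0", "mds1"]).candidates(path)` returns every server that may own the path. The owning server is never missed, because a Bloom filter has no false negatives, but an occasional extra candidate is possible.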

    ON OPTIMIZATIONS OF VIRTUAL MACHINE LIVE STORAGE MIGRATION FOR THE CLOUD

    Virtual Machine (VM) live storage migration is widely performed in the data centers of the Cloud for the purposes of load balancing, reliability, availability, hardware maintenance and system upgrades. It entails moving all the state information of the VM being migrated, including memory state, network state and storage state, from one physical server to another within the same data center or across different data centers. To minimize its performance impact, the migration process is required to be transparent to applications running within the migrating VM, meaning that applications keep running inside the VM as if there were no migration operations at all. In this dissertation, a thorough literature review is conducted to provide a big picture of the VM live storage migration process, its problems and existing solutions. After an in-depth examination, we observe that severe IO interference exists between the VM IO threads and the migration IO threads, causing both types of IO threads to suffer performance degradation. This interference stems from the fact that both types of IO threads share the same critical IO path by reading from and writing to the same shared storage system. Owing to IO resource contention and interference between the two types of IO requests, not only will the IO request queue in the storage system lengthen, but time-consuming disk seek operations will also become more frequent. Based on this fundamental observation, this dissertation presents three related but orthogonal solutions that tackle the IO interference problem in order to improve VM live storage migration performance. First, we introduce the Workload-Aware IO Outsourcing scheme, called WAIO, to improve VM live storage migration efficiency.
Second, we propose a novel scheme, called SnapMig, that improves VM live storage migration efficiency and eliminates its performance impact on user applications at the source server by effectively leveraging the existing VM snapshots in the backup servers. Third, we propose the IOFollow scheme to improve both VM performance and migration performance simultaneously. Finally, we outline directions for future research. Advisor: Hong Jian

    Cloud-computing strategies for sustainable ICT utilization : a decision-making framework for non-expert Smart Building managers

    Virtualization of processing power, storage, and networking applications via cloud-computing allows Smart Buildings to operate computing resources that are in heavy demand off-premises. While this approach reduces in-house costs and energy use, recent case studies have highlighted complexities in the decision-making processes associated with implementing cloud-computing. This complexity is due to the rapid evolution of these technologies without a standardized approach among the organizations that offer cloud-computing provision commercially. This study defines the term Smart Building as an ICT environment where a degree of system integration is accomplished. Non-expert managers are highlighted as key users of the outcomes from this project, given the diverse nature of Smart Buildings’ operational objectives. This research evaluates different ICT management methods to effectively support decisions made by non-expert clients to deploy different models of cloud-computing services in their Smart Buildings’ ICT environments. The objective of this study is to reduce the need for costly third-party ICT consultancy providers, so that non-experts can focus more on their Smart Buildings’ core competencies rather than on the complex, expensive, and energy-consuming processes of ICT management. The gap identified by this research leaves non-expert managers vulnerable when making decisions regarding cloud-computing cost estimation, deployment assessment, associated power consumption, and management flexibility in their Smart Buildings’ ICT environments. The project analyses cloud-computing decision-making concepts with reference to different Smart Building ICT attributes. In particular, it focuses on a structured programme of data collection, which is achieved through semi-structured interviews, cost simulations and risk-analysis surveys.
The main output is a theoretical management framework for non-expert decision-makers across variously operated Smart Buildings. Furthermore, a decision-support tool is designed to enable non-expert managers to identify the extent of virtualization potential by evaluating different implementation options. The tool's output is presented so that it correlates with contract limitations, security challenges, system integration levels, sustainability, and long-term costs. These requirements are explored in contrast to cloud demand changes observed across specified periods. Dependencies were found to vary greatly with numerous organizational aspects such as performance, size, and workload. The study argues that constructing long-term, sustainable, and cost-efficient strategies for any cloud deployment depends on the thorough identification of the services required off- and on-premises. It points out that most of today’s heavily burdened Smart Buildings outsource these services to costly independent suppliers, which causes unnecessary management complexity, additional cost, and system incompatibility. The main conclusions are that cloud-computing cost can differ depending on the Smart Building's attributes and ICT requirements, and that although cloud services are in most cases more convenient and cost-effective at the early stages of the deployment and migration process, they can become costly in the future if not planned carefully using cost estimation service patterns. The results of the study can be exploited to enhance core competencies within Smart Buildings in order to maximize growth and attract new business opportunities.

    Tunable Security for Deployable Data Outsourcing

    Security mechanisms like encryption negatively affect other software quality characteristics like efficiency. To cope with such trade-offs, it is preferable to build approaches that allow the trade-offs to be tuned after the design and implementation phases. This book introduces a methodology that can be used to build such tunable approaches. The book shows how the proposed methodology can be applied in the domains of database outsourcing, identity management, and credential management.

    Using TCP/IP traffic shaping to achieve iSCSI service predictability

    This thesis reproduces the properties of load interference common in many storage devices that use resource sharing for flexibility and maximum hardware utilization. The nature of resource sharing and load is studied and compared to assumptions and models used in previous work. The results are used to design a method for throttling iSCSI initiators attached to an iSCSI target server, using a packet delay module in Linux Traffic Control. The packet delay throttle enables close-to-linear rate reduction for both read and write operations. Iptables and Ipset are used to add the dynamic packet matching needed for rapidly changing throttling values. All throttling is achieved without triggering the TCP retransmit timeout and the subsequent slow start caused by packet loss. A control mechanism for dynamically adapting throttling values to rapidly changing workloads is implemented using a modified proportional integral derivative (PID) controller. Using experiments, control engineering filtering techniques and results from previous research, a suitable per-resource saturation indicator was found. The indicator is an exponential moving average of the wait time of active resource consumers. It is used as the input value to the PID controller that manages the packet rates of resource consumers, creating a closed control loop. Finally, a prototype of an autonomic resource prioritization framework is designed. The framework identifies and maintains information about resources, their consumers, the average wait time of their active consumers and their set of throttleable consumers. The information is kept in shared memory and a PID controller is spawned for each resource, thus safeguarding read response times by throttling writers on a per-resource basis. The framework is exposed to extreme workload changes and demonstrates a high ability to keep read response time below a predefined threshold.
With moderate tuning effort, the framework exhibits low overhead and resource consumption, which suggests suitability for large-scale operation in production environments.
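    The control loop described in this abstract, an exponential moving average of wait time feeding a PID controller that sets a throttle, can be sketched as follows. This is a textbook PID with output clamping, not the modified controller from the thesis, whose changes the abstract does not describe. The class names, the gains, the millisecond units and the interpretation of the output as a packet delay are all assumptions made for illustration.

```python
class ExponentialMovingAverage:
    """Smooths noisy wait-time samples; alpha weights the newest sample."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample  # the first sample seeds the average
        else:
            self.value = self.alpha * sample + (1.0 - self.alpha) * self.value
        return self.value


class PIDController:
    """Textbook PID whose output is clamped to [out_min, out_max]."""

    def __init__(self, kp, ki, kd, setpoint, out_min, out_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt=1.0):
        # Positive error: the smoothed wait time exceeds the threshold,
        # so the throttle (assumed here to be a packet delay) should grow.
        error = measurement - self.setpoint
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        output = (
            self.kp * error + self.ki * self.integral + self.kd * derivative
        )
        return min(max(output, self.out_min), self.out_max)


def throttle_step(ema, pid, wait_time_ms):
    """One loop iteration: smooth the sample, then compute the new delay."""
    return pid.update(ema.update(wait_time_ms))
```

    In a deployment along the lines the abstract describes, one such controller would exist per resource, and the returned value would be applied to that resource's throttleable consumers. How the delay is actually installed, for example through Linux Traffic Control, is outside this sketch.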

    A Server Consolidation Solution

    Advances in server architecture have enabled corporations to strategically redesign their data centers in order to realign the system infrastructure with business needs. Physically and logically consolidating servers onto fewer and smaller hardware platforms can reduce data center overhead costs while improving quality of service. To allow the organization to take advantage of this architectural opportunity, a server consolidation project was proposed that combined blade technology with server virtualization. Physical consolidation reduced the data center facility requirements, while server virtualization reduced the number of required hardware platforms. With the constant threat of outsourcing, coupled with the explosive growth of the organization, the IT managers were challenged to provide increased system services and functionality to a larger user community while maintaining the same head count. One means of reducing the overhead costs associated with the in-house data center was to reduce the required facility and hardware resources. The smaller data center footprint required less real estate, electricity, fire suppression infrastructure, and HVAC capacity. In addition, because the numerous stand-alone servers were consolidated onto a standard platform, system administration became more responsive to business opportunities.

    Enhancing Existing Disaster Recovery Plans Using Backup Performance Indicators

    Companies that perform data backup lose valuable data because they lack reliable data backup or restoration methods. The purpose of this study was to examine the need for a Six Sigma data backup performance indicator tool that clarifies the current state of a data backup method using an intuitive numerical scale. The theoretical framework for the study included backup theory, disaster recovery theory, and Six Sigma theory. The independent variables were implementation of data backup, data backup quality, and data backup confidence. The dependent variable was the need for a data backup performance indicator. An adapted survey instrument that measured an organization's data backup plan, originally administered by Information Week, was used to survey 107 businesses with 15 to 250 employees in the Greater Cincinnati area. The results revealed that 69 out of 107 small businesses did not need a data backup performance indicator, and the binary logistic regression model indicated no significant relationship between the dependent and independent variables. The conclusion of the study is that many small businesses have not experienced a disaster and cannot see the importance of a data backup indicator that quantifies recovery potential in case of a disaster. Further research is recommended to determine whether this phenomenon applies only to small businesses in the Greater Cincinnati area, through comparisons based on business size and location. This study contributes to positive social change through improvement of data backup, which enables organizations to recover quickly from a disaster, thereby saving jobs and contributing to the stability of city, state, and national economies.

    A proposed model to analyse risk and return for a large computing system adoption

    This thesis presents Organisational Sustainability Modelling (OSM), a new method to model and analyse risk and return systematically for the adoption of large systems such as Cloud Computing. Return includes improvements in technical efficiency, profitability and service. Risk includes controlled risk (risk-control rate) and uncontrolled risk (beta), although uncontrolled risk cannot be evaluated directly. Three OSM metrics (actual return value, expected return value and risk-control rate) are used to calculate uncontrolled risk. The OSM data collection process, in which hundreds of datasets (rows of data containing the three OSM metrics) are used as inputs, is explained. Outputs including standard error, mean squared error, the Durbin-Watson statistic, p-value and R-squared value are calculated. Visualisation is used to illustrate the quality and accuracy of the data analysis. The metrics, process and interpretation of data analysis are presented, and the rationale is explained in the review of the OSM method. Three case studies are used to illustrate the validity of OSM:
    • National Health Service (NHS) is a technical application concerned with backing up data files and focuses on improvement in efficiency.
    • Vodafone/Apple is a cost application and focuses on profitability.
    • The iSolutions Group, University of Southampton focuses on service improvement using user feedback.
    The NHS case study is explained in detail. The expected execution time calculated by OSM to complete all backup activity in Cloud-based systems matches actual execution time to within 0.01%. The Cloud system shows improved efficiency in both sets of comparisons. All three case studies confirm there are benefits for the adoption of a large computer system such as the Cloud. Together these demonstrations answer the two research questions for this thesis:
    1. How do you model and analyse risk and return on adoption of large computing systems systematically and coherently?
    2. Can the same method be used in risk mitigation of system adoption?
    Limitations of this study, a reproducibility case, comparisons with similar approaches, research contributions and future work are also presented.