1,110 research outputs found

    Proactive software rejuvenation solution for web enviroments on virtualized platforms

    Get PDF
    The availability of the Information Technologies for everything, from everywhere, at all times is a growing requirement. We use information Technologies from common and social tasks to critical tasks like managing nuclear power plants or even the International Space Station (ISS). However, the availability of IT infrastructures is still a huge challenge nowadays. In a quick look around news, we can find reports of corporate outage, affecting millions of users and impacting on the revenue and image of the companies. It is well known that, currently, computer system outages are more often due to software faults, than hardware faults. Several studies have reported that one of the causes of unplanned software outages is the software aging phenomenon. This term refers to the accumulation of errors, usually causing resource contention, during long running application executions, like web applications, which normally cause applications/systems to hang or crash. Gradual performance degradation could also accompany software aging phenomena. The software aging phenomena are often related to memory bloating/ leaks, unterminated threads, data corruption, unreleased file-locks or overruns. We can find several examples of software aging in the industry. The work presented in this thesis aims to offer a proactive and predictive software rejuvenation solution for Internet Services against software aging caused by resource exhaustion. To this end, we first present a threshold based proactive rejuvenation to avoid the consequences of software aging. This first approach has some limitations, but the most important of them it is the need to know a priori the resource or resources involved in the crash and the critical condition values. Moreover, we need some expertise to fix the threshold value to trigger the rejuvenation action. Due to these limitations, we have evaluated the use of Machine Learning to overcome the weaknesses of our first approach to obtain a proactive and predictive solution. Finally, the current and increasing tendency to use virtualization technologies to improve the resource utilization has made traditional data centers turn into virtualized data centers or platforms. We have used a Mathematical Programming approach to virtual machine allocation and migration to optimize the resources, accepting as many services as possible on the platform while at the same time, guaranteeing the availability (via our software rejuvenation proposal) of the services deployed against the software aging phenomena. The thesis is supported by an exhaustive experimental evaluation that proves the effectiveness and feasibility of our proposals for current systems

    High-Performance Cloud Computing: A View of Scientific Applications

    Full text link
    Scientific computing often requires the availability of a massive number of computers for performing large scale experiments. Traditionally, these needs have been addressed by using high-performance computing solutions and installed facilities such as clusters and super computers, which are difficult to setup, maintain, and operate. Cloud computing provides scientists with a completely new model of utilizing the computing infrastructure. Compute resources, storage resources, as well as applications, can be dynamically provisioned (and integrated within the existing infrastructure) on a pay per use basis. These resources can be released when they are no more needed. Such services are often offered within the context of a Service Level Agreement (SLA), which ensure the desired Quality of Service (QoS). Aneka, an enterprise Cloud computing solution, harnesses the power of compute resources by relying on private and public Clouds and delivers to users the desired QoS. Its flexible and service based infrastructure supports multiple programming paradigms that make Aneka address a variety of different scenarios: from finance applications to computational science. As examples of scientific computing in the Cloud, we present a preliminary case study on using Aneka for the classification of gene expression data and the execution of fMRI brain imaging workflow.Comment: 13 pages, 9 figures, conference pape

    Cloud-computing strategies for sustainable ICT utilization : a decision-making framework for non-expert Smart Building managers

    Get PDF
    Virtualization of processing power, storage, and networking applications via cloud-computing allows Smart Buildings to operate heavy demand computing resources off-premises. While this approach reduces in-house costs and energy use, recent case-studies have highlighted complexities in decision-making processes associated with implementing the concept of cloud-computing. This complexity is due to the rapid evolution of these technologies without standardization of approach by those organizations offering cloud-computing provision as a commercial concern. This study defines the term Smart Building as an ICT environment where a degree of system integration is accomplished. Non-expert managers are highlighted as key users of the outcomes from this project given the diverse nature of Smart Buildings’ operational objectives. This research evaluates different ICT management methods to effectively support decisions made by non-expert clients to deploy different models of cloud-computing services in their Smart Buildings ICT environments. The objective of this study is to reduce the need for costly 3rd party ICT consultancy providers, so non-experts can focus more on their Smart Buildings’ core competencies rather than the complex, expensive, and energy consuming processes of ICT management. The gap identified by this research represents vulnerability for non-expert managers to make effective decisions regarding cloud-computing cost estimation, deployment assessment, associated power consumption, and management flexibility in their Smart Buildings ICT environments. The project analyses cloud-computing decision-making concepts with reference to different Smart Building ICT attributes. In particular, it focuses on a structured programme of data collection which is achieved through semi-structured interviews, cost simulations and risk-analysis surveys. The main output is a theoretical management framework for non-expert decision-makers across variously-operated Smart Buildings. Furthermore, a decision-support tool is designed to enable non-expert managers to identify the extent of virtualization potential by evaluating different implementation options. This is presented to correlate with contract limitations, security challenges, system integration levels, sustainability, and long-term costs. These requirements are explored in contrast to cloud demand changes observed across specified periods. Dependencies were identified to greatly vary depending on numerous organizational aspects such as performance, size, and workload. The study argues that constructing long-term, sustainable, and cost-efficient strategies for any cloud deployment, depends on the thorough identification of required services off and on-premises. It points out that most of today’s heavy-burdened Smart Buildings are outsourcing these services to costly independent suppliers, which causes unnecessary management complexities, additional cost, and system incompatibility. The main conclusions argue that cloud-computing cost can differ depending on the Smart Building attributes and ICT requirements, and although in most cases cloud services are more convenient and cost effective at the early stages of the deployment and migration process, it can become costly in the future if not planned carefully using cost estimation service patterns. The results of the study can be exploited to enhance core competencies within Smart Buildings in order to maximize growth and attract new business opportunities

    Monitoring and analysis system for performance troubleshooting in data centers

    Get PDF
    It was not long ago. On Christmas Eve 2012, a war of troubleshooting began in Amazon data centers. It started at 12:24 PM, with an mistaken deletion of the state data of Amazon Elastic Load Balancing Service (ELB for short), which was not realized at that time. The mistake first led to a local issue that a small number of ELB service APIs were affected. In about six minutes, it evolved into a critical one that EC2 customers were significantly affected. One example was that Netflix, which was using hundreds of Amazon ELB services, was experiencing an extensive streaming service outage when many customers could not watch TV shows or movies on Christmas Eve. It took Amazon engineers 5 hours 42 minutes to find the root cause, the mistaken deletion, and another 15 hours and 32 minutes to fully recover the ELB service. The war ended at 8:15 AM the next day and brought the performance troubleshooting in data centers to world’s attention. As shown in this Amazon ELB case.Troubleshooting runtime performance issues is crucial in time-sensitive multi-tier cloud services because of their stringent end-to-end timing requirements, but it is also notoriously difficult and time consuming. To address the troubleshooting challenge, this dissertation proposes VScope, a flexible monitoring and analysis system for online troubleshooting in data centers. VScope provides primitive operations which data center operators can use to troubleshoot various performance issues. Each operation is essentially a series of monitoring and analysis functions executed on an overlay network. We design a novel software architecture for VScope so that the overlay networks can be generated, executed and terminated automatically, on-demand. From the troubleshooting side, we design novel anomaly detection algorithms and implement them in VScope. By running anomaly detection algorithms in VScope, data center operators are notified when performance anomalies happen. We also design a graph-based guidance approach, called VFocus, which tracks the interactions among hardware and software components in data centers. VFocus provides primitive operations by which operators can analyze the interactions to find out which components are relevant to the performance issue. VScope’s capabilities and performance are evaluated on a testbed with over 1000 virtual machines (VMs). Experimental results show that the VScope runtime negligibly perturbs system and application performance, and requires mere seconds to deploy monitoring and analytics functions on over 1000 nodes. This demonstrates VScope’s ability to support fast operation and online queries against a comprehensive set of application to system/platform level metrics, and a variety of representative analytics functions. When supporting algorithms with high computation complexity, VScope serves as a ‘thin layer’ that occupies no more than 5% of their total latency. Further, by using VFocus, VScope can locate problematic VMs that cannot be found via solely application-level monitoring, and in one of the use cases explored in the dissertation, it operates with levels of perturbation of over 400% less than what is seen for brute-force and most sampling-based approaches. We also validate VFocus with real-world data center traces. The experimental results show that VFocus has troubleshooting accuracy of 83% on average.Ph.D

    Digital Twin in the IoT context: a survey on technical features, scenarios and architectural models

    Get PDF
    Digital Twin is an emerging concept that is gaining attention in various industries. It refers to the ability to clone a physical object into a software counterpart. The softwarized object, termed logical object, reflects all the important properties and characteristics of the original object within a specific application context. To fully determine the expected properties of the Digital Twin, this paper surveys the state of the art starting from the original definition within the manufacturing industry. It takes into account related proposals emerging in other fields, namely, Augmented and Virtual Reality (e.g., avatars), Multi-agent systems, and virtualization. This survey thereby allows for the identification of an extensive set of Digital Twin features that point to the “softwarization” of physical objects. To properly consolidate a shared Digital Twin definition, a set of foundational properties is identified and proposed as a common ground outlining the essential characteristics (must-haves) of a Digital Twin. Once the Digital Twin definition has been consolidated, its technical and business value is discussed in terms of applicability and opportunities. Four application scenarios illustrate how the Digital Twin concept can be used and how some industries are applying it. The scenarios also lead to a generic DT architectural Model. This analysis is then complemented by the identification of software architecture models and guidelines in order to present a general functional framework for the Digital Twin. The paper, eventually, analyses a set of possible evolution paths for the Digital Twin considering its possible usage as a major enabler for the softwarization process
    corecore