3,311 research outputs found

    C2MS: Dynamic Monitoring and Management of Cloud Infrastructures

    Full text link
    Server clustering is a common design principle employed by many organisations who require high availability, scalability and easier management of their infrastructure. Servers are typically clustered according to the service they provide whether it be the application(s) installed, the role of the server or server accessibility for example. In order to optimize performance, manage load and maintain availability, servers may migrate from one cluster group to another making it difficult for server monitoring tools to continuously monitor these dynamically changing groups. Server monitoring tools are usually statically configured and with any change of group membership requires manual reconfiguration; an unreasonable task to undertake on large-scale cloud infrastructures. In this paper we present the Cloudlet Control and Management System (C2MS); a system for monitoring and controlling dynamic groups of physical or virtual servers within cloud infrastructures. The C2MS extends Ganglia - an open source scalable system performance monitoring tool - by allowing system administrators to define, monitor and modify server groups without the need for server reconfiguration. In turn administrators can easily monitor group and individual server metrics on large-scale dynamic cloud infrastructures where roles of servers may change frequently. Furthermore, we complement group monitoring with a control element allowing administrator-specified actions to be performed over servers within service groups as well as introduce further customized monitoring metrics. This paper outlines the design, implementation and evaluation of the C2MS.Comment: Proceedings of the The 5th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2013), 8 page

    The Management and Security Expert (MASE)

    Get PDF
    The Management and Security Expert (MASE) is a distributed expert system that monitors the operating systems and applications of a network. It is capable of gleaning the information provided by the different operating systems in order to optimize hardware and software performance; recognize potential hardware and/or software failure, and either repair the problem before it becomes an emergency, or notify the systems manager of the problem; and monitor applications and known security holes for indications of an intruder or virus. MASE can eradicate much of the guess work of system management

    Zackup, a scalable centralized backup service

    Get PDF
    The currently available open source, centralized to disk, backup services use a custom data storage structure to effectively store backup sets. While cus- tom data storage structures are necessary to store backup sets in differential or space efficient manner, they often lead to hard to navigate structures and increase the complexity of the backup service. To solve these problems a backup service was developed that relies on the Sun Microsystems, Inc. open sourced file system ZFS. With the variety of features ZFS includes, the major- ity of common tasks to create backup sets were delegated to the file system. Along with ZFS, a number of Ruby libraries were used to help further reduce the amount of author created code. Lastly to increase the scalability of the developed backup service a job queuing based communication architecture between service entities was employed allowing for a multiple backup node solution

    Fat or thin? Is the verdict in?

    Get PDF
    ABSTRACT Thin client or network computing is a hot topic. The hype claims lower total cost of ownership, faster applications deployment and reduced management pain, compared to traditional computing architectures. Early in 1998 the Flinders University Library installed network computers in the Central and branch libraries for student access to the Internet. This paper is a review of network computers in the light of our experience over the past two years. Do network computers offer all that is claimed in the hype? Are there hidden costs? What are the issues of configuration, server scaling, network performance and fault diagnosis? Do they have a future in the Library arena

    Purple Computational Environment With Mappings to ACE Requirements for the General Availability User Environment Capabilities

    Full text link

    Ensuring system integrity and security on limited environment systems

    Get PDF
    Cyber security threats have rapidly developed in recent years and should also be considered when building or implementing systems that traditionally have not been connected to networks. More and more these systems are getting networked and controlled remotely, which widens their attack surface and lays them open to cyber threats. This means the systems should be able to detect and block malware threats without letting the controls affect daily operations. File integrity monitoring and protection could be one way to protect systems from emerging threats. The use case for this study is a computer system, that controls medical device. This kind of system does not necessarily have an internet connection and is not connected to a LAN network by default. Ensuring integrity on the system is critical as if the system would be infected by a malware, it could affect to the test results. This thesis studies what are the feasible ways to ensure system integrity on limited environment systems. Firstly these methods and tools are listed through a literature review. All of the tools are studied how they protect the system integrity. The literature review aims to select methods for further testing through a deductive reasoning. After selecting methods for testing, their implementations are installed to the testing environment. The methods are first tested for performance and then their detection and blocking capability is tested against real life threats. Finally, this thesis proposes a method which could be implemented to the presented use case. The proposal at the end is based on the conducted tests

    Monitoring and Failure Recovery of Cloud-Managed Digital Signage

    Get PDF
    Digitaal signage kasutatakse laialdaselt erinevates valdkondades, nagu näiteks transpordisüsteemid, turustusvõimalused, meelelahutus ja teised, et kuvada teavet piltide, videote ja teksti kujul. Nende ressursside usaldusväärsus, vajalike teenuste kättesaadavus ja turvameetmed on selliste süsteemide vastuvõtmisel võtmeroll. Digitaalse märgistussüsteemi tõhus haldamine on teenusepakkujatele keeruline ülesanne. Selle süsteemi rikkeid võib põhjustada mitmeid põhjuseid, nagu näiteks vigased kuvarid, võrgu-, riist- või tarkvaraprobleemid, mis on üsna korduvad. Traditsiooniline protsess sellistest ebaõnnestumistest taastumisel hõlmab sageli tüütuid ja tülikaid diagnoose. Paljudel juhtudel peavad tehnikud kohale füüsiliselt külastama, suurendades seeläbi hoolduskulusid ja taastumisaega.Selles väites pakume lahendust, mis jälgib, diagnoosib ja taandub tuntud tõrgetest, ühendades kuvarid pilvega. Pilvepõhine kaug- ja autonoomne server konfigureerib kaugseadete sisu ja uuendab neid dünaamiliselt. Iga kuva jälgib jooksvat protsessi ja saadab trace’i, logib süstemisse perioodiliselt. Negatiivide puhul analüüsitakse neid serverisse salvestatud logisid, mis optimaalselt kasutavad kohandatud logijuhtimismoodulit. Lisaks näitavad ekraanid ebaõnnestumistega toimetulemiseks enesetäitmise protseduure, kui nad ei suuda pilvega ühendust luua. Kavandatud lahendus viiakse läbi Linuxi süsteemis ja seda hinnatakse serveri kasutuselevõtuga Amazon Web Service (AWS) pilves. Peamisteks tulemusteks on meetodite kogum, mis võimaldavad kaugjuhtimisega kuvariprobleemide lahendamist.Digital signage is widely used in various fields such as transport systems, trading outlets, entertainment, and others, to display information in the form of images, videos, and text. The reliability of these resources, availability of required services and security measures play a key role in the adoption of such systems. Efficient management of the digital signage system is a challenging task to the service providers. There could be many reasons that lead to the malfunctioning of this system such as faulty displays, network, hardware or software failures that are quite repetitive. The traditional process of recovering from such failures often involves tedious and cumbersome diagnosis. In many cases, technicians need to physically visit the site, thereby increasing the maintenance costs and the recovery time. In this thesis, we propose a solution that monitors, diagnoses and recovers from known failures by connecting the displays to a cloud. A cloud-based remote and autonomous server configures the content of remote displays and updates them dynamically. Each display tracks the running process and sends the trace and system logs to the server periodically. These logs, stored at the server optimally using a customized log management module, are analysed for failures. In addition, the displays incorporate self-recovery procedures to deal with failures, when they are unable to create connection to the cloud. The proposed solution is implemented on a Linux system and evaluated by deploying the server on the Amazon Web Service (AWS) cloud. The main result of the thesis is a collection of techniques for resolving the display system failures remotely

    Tortoise: Interactive System Configuration Repair

    Full text link
    System configuration languages provide powerful abstractions that simplify managing large-scale, networked systems. Thousands of organizations now use configuration languages, such as Puppet. However, specifications written in configuration languages can have bugs and the shell remains the simplest way to debug a misconfigured system. Unfortunately, it is unsafe to use the shell to fix problems when a system configuration language is in use: a fix applied from the shell may cause the system to drift from the state specified by the configuration language. Thus, despite their advantages, configuration languages force system administrators to give up the simplicity and familiarity of the shell. This paper presents a synthesis-based technique that allows administrators to use configuration languages and the shell in harmony. Administrators can fix errors using the shell and the technique automatically repairs the higher-level specification written in the configuration language. The approach (1) produces repairs that are consistent with the fix made using the shell; (2) produces repairs that are maintainable by minimizing edits made to the original specification; (3) ranks and presents multiple repairs when relevant; and (4) supports all shells the administrator may wish to use. We implement our technique for Puppet, a widely used system configuration language, and evaluate it on a suite of benchmarks under 42 repair scenarios. The top-ranked repair is selected by humans 76% of the time and the human-equivalent repair is ranked 1.31 on average.Comment: Published version in proceedings of IEEE/ACM International Conference on Automated Software Engineering (ASE) 201
    corecore