50 research outputs found

    Adaptive Failure-Aware Scheduling for Hadoop

    Given the dynamic nature of cloud environments, failures are the norm rather than the exception in the data centers powering cloud frameworks. Despite the diversity of recovery mechanisms integrated into cloud frameworks, their schedulers still make poor scheduling decisions that lead to task failures when faced with unforeseen events such as unpredicted service demands or hardware outages. Traditionally, simulation and analytical modeling have been widely used to analyze the impact of scheduling decisions on failure rates. However, they cannot provide accurate results or exhaustive coverage of cloud systems, especially when failures occur. In this thesis, we present new approaches for modeling and verifying an adaptive failure-aware scheduling algorithm for Hadoop that detects these failures early and reschedules tasks according to changes in the cloud. Hadoop is the framework of choice on many off-the-shelf clusters in the cloud for processing data-intensive applications by efficiently running them across multiple distributed machines. The proposed scheduling algorithm relies on predictions made by machine learning algorithms trained on previously executed tasks and on data collected from the Hadoop environment. To further improve Hadoop's scheduling decisions on the fly, we use reinforcement learning techniques to select an appropriate scheduling action for each scheduled task. Furthermore, we propose an adaptive algorithm to dynamically detect node failures in Hadoop. We implement these approaches in ATLAS, an AdapTive Failure-Aware Scheduling algorithm that can be built on top of existing Hadoop schedulers. To illustrate the usefulness and benefits of ATLAS, we conduct a large empirical study on a Hadoop cluster deployed on Amazon Elastic MapReduce (EMR), comparing the performance of ATLAS to that of three Hadoop scheduling algorithms (FIFO, Fair, and Capacity). The results show that ATLAS outperforms these algorithms in terms of failure rates, execution times, and resource utilization. Finally, we propose a new methodology to formally identify the impact of Hadoop's scheduling decisions on failure rates. We use model checking to verify some of the most important scheduling properties in Hadoop (schedulability, resource-deadlock freeness, and fairness) and provide possible strategies to avoid violations of these properties in ATLAS. The formal verification of the Hadoop scheduler makes it possible to identify more task failures and hence reduce the number of failures in ATLAS.
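    The ATLAS abstract above describes two technical ingredients: a machine learning predictor trained on previously executed tasks to estimate failure risk, and reinforcement learning to pick a scheduling action on the fly. The thesis text here includes no code, so the sketch below is only a minimal, self-contained Python illustration of that general combination; the risk heuristic, feature names, action set, risk buckets, and reward shape are all assumptions made for illustration, not ATLAS's actual design.

        import random
        from collections import defaultdict

        # Illustrative action set; the real ATLAS actions are not specified here.
        ACTIONS = ["schedule_normally", "reschedule_on_other_node", "launch_speculative_copy"]

        class FailureAwareSchedulerSketch:
            """Epsilon-greedy Q-learning over coarse failure-risk states."""

            def __init__(self, epsilon=0.1, alpha=0.5, gamma=0.9):
                self.q = defaultdict(float)   # Q[(state, action)] -> estimated value
                self.epsilon = epsilon        # exploration rate
                self.alpha = alpha            # learning rate
                self.gamma = gamma            # discount factor

            def predict_failure_risk(self, cpu_load, recent_node_failures):
                # Stand-in for a trained ML model over logs of past task executions.
                return min(1.0, 0.5 * cpu_load + 0.1 * recent_node_failures)

            def state_of(self, cpu_load, recent_node_failures):
                # Discretize the predicted risk into low / medium / high states.
                risk = self.predict_failure_risk(cpu_load, recent_node_failures)
                return "low" if risk < 0.3 else ("medium" if risk < 0.7 else "high")

            def choose_action(self, state):
                # Explore with probability epsilon, otherwise pick the best-known action.
                if random.random() < self.epsilon:
                    return random.choice(ACTIONS)
                return max(ACTIONS, key=lambda a: self.q[(state, a)])

            def update(self, state, action, reward, next_state):
                # One-step Q-learning update after observing the task's outcome.
                best_next = max(self.q[(next_state, a)] for a in ACTIONS)
                target = reward + self.gamma * best_next
                self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

        if __name__ == "__main__":
            sched = FailureAwareSchedulerSketch()
            state = sched.state_of(cpu_load=0.8, recent_node_failures=2)
            action = sched.choose_action(state)
            # Reward of +1.0 for an assumed successful task completion.
            sched.update(state, action, reward=1.0, next_state=state)
            print(state, action)

    In a real deployment the reward would come from observing whether the scheduled task succeeded or failed, and the predictor would be trained offline on the collected Hadoop execution data that the abstract mentions.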

    Emergent relational schemas for RDF


    The twofold role of Cloud Computing in Digital Forensics: target of investigations and helping hand to evidence analysis

    This PhD thesis discusses the impact of Cloud Computing infrastructures on Digital Forensics in their twofold role as a target of investigations and as a helping hand to investigators. The Cloud offers cheap and almost limitless computing power and storage space for data, which can be leveraged to commit either new or old crimes and to host the related traces. Conversely, the Cloud can help forensic examiners find clues better and earlier than traditional analysis applications, thanks to its dramatically improved evidence-processing capabilities. In both cases, a new arsenal of software tools needs to be made available. The development of this novel weaponry and its technical and legal implications, from the point of view of the repeatability of technical assessments, is discussed throughout the following pages and constitutes the unprecedented contribution of this work.

    Multi-level analysis of Malware using Machine Learning


    BIG DATA and Advanced Analytics: conference proceedings

    The proceedings present the results of research and development in the field of BIG DATA and Advanced Analytics for optimizing IT and business solutions, as well as case studies in medicine, education, and ecology.

    BGSU Graduate College 2015-2016 Catalog

    Bowling Green State University graduate catalog for 2015-2016.

    Literature Survey of Big Data

    Mention the topic of big data, and a person is bound to experience information overload. Indeed, it is so complex, with so many terms and details, that people want to run away from it. When used right, big data (BD) helps people access the data they need in real time and helps managers make better decisions. The purpose of this paper is to evaluate methods, procedures, and architectures for the storage and retrieval of all Federal Aviation Administration (FAA) research, engineering, and development (RE&D) data sets, in order to leverage the technology innovation and advancement opportunities in the field of BD analytics. The paper also discusses all relevant Executive Orders (EOs), laws, and Office of Management and Budget (OMB) memorandums that were written to address what federal agencies under the OMB's jurisdiction must do to comply with various aspects of BD.

    BGSU Graduate College 2013-2014 Catalog

    Bowling Green State University graduate catalog for 2013-2014.

    BGSU Graduate College 2014-2015 Catalog

    Bowling Green State University graduate catalog for 2014-2015.