
    StreamFlow: cross-breeding cloud with HPC

    Workflows are among the most commonly used tools in a variety of execution environments. Many of them target a specific environment; few make it possible to execute an entire workflow across different environments, e.g. Kubernetes and batch clusters. We present a novel approach to workflow execution, called StreamFlow, that complements the workflow graph with a declarative description of potentially complex execution environments and enables execution on multiple sites that do not share a common data space. StreamFlow is then exemplified on a novel bioinformatics pipeline for single-cell transcriptomic data analysis.
    Comment: 30 pages, 2020 IEEE Transactions on Emerging Topics in Computing
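    The abstract describes the core idea of binding each workflow step to a declared execution environment and staging data explicitly because the sites do not share a data space. The snippet below is a toy sketch of that general idea only; the class names, bindings, and steps are hypothetical illustrations, not StreamFlow's actual API or configuration format.

```python
from dataclasses import dataclass

# Toy sketch: each workflow step is bound to a declared execution environment,
# and inputs are staged explicitly because sites do not share a data space.
# All names below are hypothetical, not StreamFlow's API.

@dataclass
class Environment:
    name: str
    kind: str          # e.g. "kubernetes" or "slurm"
    staging_dir: str   # where inputs must be copied before execution

bindings = {
    "align_reads":  Environment("hpc-cluster", "slurm", "/scratch/run01"),
    "build_report": Environment("cloud",       "kubernetes", "/data/run01"),
}

def run_step(step: str, inputs: list[str]) -> None:
    env = bindings[step]
    # 1. stage inputs into the target site's data space
    for path in inputs:
        print(f"copy {path} -> {env.name}:{env.staging_dir}")
    # 2. dispatch the step with the site-appropriate mechanism
    print(f"submit '{step}' via {env.kind} on {env.name}")

run_step("align_reads", ["sample.fastq"])
```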

    Towards risk-aware communications networking


    Performance modelling, analysis and prediction of Spark jobs in Hadoop cluster : a thesis by publications presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, School of Mathematical & Computational Sciences, Massey University, Auckland, New Zealand

    Big Data frameworks have received tremendous attention from industry and academic research over the past decade. Distributed computing frameworks such as Hadoop MapReduce and Spark offer an efficient solution for analysing large-scale datasets running on a Hadoop cluster. Spark has been established as one of the most popular large-scale data processing engines because of its speed, low-latency in-memory computation, and advanced analytics. Spark's computational performance heavily depends on the selection of suitable parameters, and configuring these parameters is a challenging task. Although Spark has default parameters and can deploy applications without much effort, a significant drawback of default parameter selection is that it is not always optimal for cluster performance. A major limitation of existing Spark performance-prediction models is that they require either large input data or time-consuming system configuration. Therefore, an analytical model could be a better solution for performance prediction and for establishing appropriate job configurations. This thesis proposes two distinct parallelisation models for performance prediction: the 2D-Plate model and the Fully-Connected Node model. Both models were constructed based on serial boundaries for a certain arrangement of executors and size of the data. To evaluate the cluster performance, various HiBench workloads were used, and the workloads' empirical data were fitted to the models for performance prediction analysis. The developed models were benchmarked against existing models such as Amdahl's law, Gustafson's law, ERNEST, and machine-learning models. Our experimental results show that the two proposed models can quickly and accurately predict performance in terms of runtime, and they can outperform the accuracy of machine-learning models when extrapolating predictions.
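    The thesis's own 2D-Plate and Fully-Connected Node models are not specified in the abstract, so they are not reproduced here. As a rough illustration of the kind of baseline it benchmarks against, the sketch below fits Amdahl's law to hypothetical executor-count/runtime measurements and extrapolates; the numbers are invented for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical HiBench-style measurements: executor counts and observed runtimes (s).
executors = np.array([1, 2, 4, 8, 16], dtype=float)
runtimes  = np.array([1000.0, 560.0, 330.0, 215.0, 160.0])

def amdahl(p, t1, f):
    """Amdahl's law: only a fraction f of the serial runtime t1 parallelises."""
    return t1 * ((1.0 - f) + f / p)

(t1_hat, f_hat), _ = curve_fit(amdahl, executors, runtimes, p0=[1000.0, 0.9],
                               bounds=([0.0, 0.0], [np.inf, 1.0]))
print(f"serial runtime ~ {t1_hat:.0f}s, parallel fraction ~ {f_hat:.2f}")
print("predicted runtime at 32 executors:", round(amdahl(32, t1_hat, f_hat), 1), "s")
```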

    The connectivity between damage to physical infrastructure and social science: a new field study protocol concept

    The primary objective of this thesis is to introduce a field study methodology that will be calibrated over the next several years to enable researchers to collect field data that can be used to better understand and quantify community resilience. Specifically, a key objective is to provide a mechanism for linking damage to the physical infrastructure to the social and economic dimensions of a community in a measurable way. Although there have been several past attempts at creating a common post-disaster field study protocol, none of them has attempted to quantify community resilience in a form that can be used for risk and resilience analysis. The methodology explained in this thesis is unique because it discusses potential metrics that can be used to quantify community resilience and describes methods of quantifying these metrics using field data. These metrics come from a combination of disciplines, including engineering, sociology, and economics. This work combines a literature review of past field study protocols with perceived data requirements in order to outline a field study methodology that can be used for disasters (primarily natural, not anthropogenic) of any type, including tornadoes, hurricanes, floods, tsunamis, wildland-urban interface (WUI) fires, and earthquakes. Algorithms were derived to process raw field study data into probabilistic models of resilience metrics (i.e., fragility functions). These algorithms were then demonstrated using existing field data on population dislocation caused by Hurricane Andrew. Finally, a community resilience field study was conducted five years into the recovery process to investigate and model the long-term effects of the May 22, 2011 tornado in Joplin, MO. The planning and execution of this study are described, and the gathered data are used to provide an illustrative example of the interconnectivity between physical damage and socio-economic consequences.
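    The thesis's own algorithms for turning field data into fragility functions are not given in the abstract. As a minimal sketch of the standard technique it refers to, the code below fits a lognormal fragility curve to hypothetical binary damage observations by maximum likelihood; the data, parameter choices, and intensity measure are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical field observations: intensity measure (e.g. wind speed) and
# whether each structure reached the damage state of interest (0/1).
im = np.array([30, 45, 50, 60, 70, 80, 95, 110], dtype=float)
damaged = np.array([0, 0, 1, 0, 1, 1, 1, 1])

def neg_log_lik(params):
    theta, beta = params              # median capacity and lognormal dispersion
    p = norm.cdf(np.log(im / theta) / beta)   # P(damage | IM)
    p = np.clip(p, 1e-9, 1 - 1e-9)            # guard against log(0)
    return -np.sum(damaged * np.log(p) + (1 - damaged) * np.log(1 - p))

res = minimize(neg_log_lik, x0=[60.0, 0.5], method="Nelder-Mead")
theta_hat, beta_hat = res.x
print(f"fragility fit: median capacity = {theta_hat:.1f}, dispersion = {beta_hat:.2f}")
```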

    Cyber Defense Remediation in Energy Delivery Systems

    The integration of Information Technology (IT) and Operational Technology (OT) in Cyber-Physical Systems (CPS) has resulted in increased efficiency and facilitated real-time information acquisition, processing, and decision making. However, the increase in automation technology and the use of the internet for connecting, remotely controlling, and supervising systems and facilities has also increased the likelihood of cybersecurity threats that can impact the safety of humans and property. There is a need to assess cybersecurity risks in the power grid, nuclear plants, chemical factories, etc., to gain insight into the likelihood of safety hazards. Quantitative cybersecurity risk assessment leads to informed cyber defense remediation and ensures the presence of a mitigation plan to prevent safety hazards. In this dissertation, using Energy Delivery Systems (EDS) as a use case to contextualize a CPS, we address key research challenges in managing cyber risk for cyber defense remediation.
    First, we developed a platform for modeling and analyzing the effect of cyber threats and random system faults on EDS safety that could lead to catastrophic damage. We developed a data-driven attack graph and fault graph-based model to characterize the exploitability and impact of threats in EDS. We created an operational impact assessment to quantify the damages, and we developed a strategic response decision capability that presents optimal mitigation actions and policies balancing the tradeoff between operational resilience (tactical risk) and strategic risk.
    Next, we addressed the challenge of managing tactical risk through a prioritized cyber defense remediation plan, which is critical for effective risk management in EDS. Because of EDS's complexity, arising from the heterogeneous blend of IT, OT, and Industrial Control Systems (ICS), its scale, and its critical process tasks, prioritized remediation should be applied gradually to protect critical assets. We proposed a methodology for prioritizing cyber risk remediation plans by detecting and evaluating critical paths through EDS nodes. We evaluated critical node characteristics based on the nodes' architectural positions, centrality measures derived from connectivity and frequency of network traffic, and the amount of electrical power controlled. The model also examines the relationship between cost models for allocating a budget to remove vulnerabilities on critical nodes and their impact on gradual readiness. The proposed cost models were empirically validated by computing node criticality in an existing ICS network test-bed. Two cost models were examined and, although they varied, we found no correlation between the type of cost model and either the most damaging attack path or critical node readiness.
    Finally, we proposed a time-varying dynamical model for cyber defense remediation in EDS. We utilized a stochastic evolutionary game model to simulate the dynamic interplay of cyber attack and defense, and leveraged the Logit Quantal Response Dynamics (LQRD) model to quantify real-world players' cognitive differences. We proposed an optimal decision-making approach that calculates the stable evolutionary equilibrium and balances defense costs and benefits. Case studies on EDS indicate that the proposed method can help the defender predict possible attack actions, select the related optimal defense strategy over time, and gain the maximum defense payoffs.
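    The abstract describes ranking critical nodes by architectural position, centrality, and traffic. As a minimal sketch of that ranking step (not the dissertation's actual model), the snippet below scores nodes of a small made-up ICS topology with two structural centrality measures; the topology, node names, and weights are hypothetical.

```python
import networkx as nx

# Hypothetical EDS/ICS topology; node names and links are illustrative only.
G = nx.Graph()
G.add_edges_from([
    ("hmi", "scada_server"), ("scada_server", "historian"),
    ("scada_server", "plc_feeder1"), ("scada_server", "plc_feeder2"),
    ("plc_feeder1", "relay_a"), ("plc_feeder2", "relay_b"),
    ("corp_gateway", "scada_server"),
])

# Combine two simple structural signals: betweenness (path criticality) and
# degree (connectivity). A fuller model would also weight in traffic frequency
# and the amount of electrical power each node controls.
betweenness = nx.betweenness_centrality(G)
degree = nx.degree_centrality(G)
score = {n: 0.6 * betweenness[n] + 0.4 * degree[n] for n in G}

# Rank candidate nodes for prioritized remediation.
for node, s in sorted(score.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node:15s} {s:.3f}")
```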
We also leveraged software-defined networking (SDN) in EDS for dynamic cyber defense remediation. We presented an approach to aid the dynamic selection of security controls in an SDN-enabled EDS and achieve tradeoffs between providing security and Quality of Service (QoS). We modeled the security costs based on end-to-end packet delay and throughput. We proposed a non-dominated-sorting-based multi-objective optimization framework, implementable within an SDN controller, that addresses the joint problem of optimizing between security and QoS parameters with time complexity O(MN²), where M is the number of objective functions and N is the population size per generation. We presented simulation results that illustrate how data availability and data integrity can be achieved while maintaining QoS constraints.
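    For context on the O(MN²) figure, the sketch below implements the classic fast non-dominated sort (the NSGA-II building block that the abstract's framework appears to build on); the candidate (security_cost, qos_delay) values are invented for illustration and do not come from the dissertation.

```python
from typing import List

def dominates(a, b):
    """True if solution a dominates b (minimisation on every objective)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(pop: List[tuple]) -> List[List[int]]:
    """Classic fast non-dominated sort: O(M*N^2) comparisons for N solutions, M objectives."""
    n_dominating = [0] * len(pop)         # how many solutions dominate i
    dominated_by = [[] for _ in pop]      # indices of solutions that i dominates
    fronts = [[]]
    for i, p in enumerate(pop):
        for j, q in enumerate(pop):
            if dominates(p, q):
                dominated_by[i].append(j)
            elif dominates(q, p):
                n_dominating[i] += 1
        if n_dominating[i] == 0:
            fronts[0].append(i)           # Pareto-optimal solutions
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                n_dominating[j] -= 1
                if n_dominating[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]

# Hypothetical candidate configurations scored as (security_cost, qos_delay),
# both to be minimised; values are illustrative only.
candidates = [(0.9, 12.0), (0.4, 30.0), (0.6, 18.0), (0.7, 20.0), (0.95, 11.0)]
print(fast_non_dominated_sort(candidates))   # first list is the Pareto front
```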