
642 research outputs found

Resilient IoT-based Monitoring System for Crude Oil Pipelines

Author: Ahmed Safuriyawu
Le Mouël Frédéric
Stouls Nicolas
Publication venue: HAL CCSD
Publication date: 14/12/2020

Pipeline networks dominate the oil and gas mid-stream sector, and although they are the safest means of transportation for oil and gas products, they are susceptible to failures. These failures are due to manufacturing defects, environmental effects, material degradation, or third-party interference through sabotage and vandalism. Internet of Things (IoT)-based solutions are promising for addressing these failures by monitoring and predicting them. However, some challenges remain in the deployment of industrial IoT-based solutions, such as reliability, robustness, maintainability, scalability, and energy consumption. This paper therefore aims to highlight potential solutions for the detection and mitigation of pipeline failures while addressing the robustness, cost, and scalability issues of such an approach efficiently across the network infrastructure, data, and service layers.

INRIA a CCSD electronic archive server

Security Analysis of Interdependent Critical Infrastructures: Power, Cyber and Gas

Publication date: 01/01/2018

Our daily life is becoming more and more reliant on services provided by infrastructures such as power, gas, and communication networks. Ensuring the security of these infrastructures is of utmost importance. This task becomes ever more challenging as the interdependence among these infrastructures grows and a security breach in one infrastructure can spill over to the others. The implication is that the security practices and analyses recommended for these infrastructures should be carried out in coordination. This thesis, focusing on the power grid, explores strategies to secure the system that look into the coupling of the power grid to the cyber infrastructure used to manage and control it, and to the gas grid, which supplies an increasing amount of reserves to overcome contingencies. The first part (Part I) of the thesis, comprising chapters 2 through 4, focuses on the coupling of the power grid and the cyber infrastructure that is used for its control and operations. The goal is to detect malicious attacks that gather information about the operation of the power grid in order to attack the system later. In chapter 2, we propose a hierarchical architecture that correlates the analysis of high-resolution Micro-Phasor Measurement Unit (microPMU) data with traffic analysis of the Supervisory Control and Data Acquisition (SCADA) packets, to infer the security status of the grid and detect the presence of possible intruders. An essential part of this architecture is tied to the analysis of the microPMU data. In chapter 3, we establish a set of anomaly detection rules on microPMU data that flag "abnormal behavior". A placement strategy for microPMU sensors is also proposed to maximize the sensitivity in detecting anomalies. In chapter 4, we focus on developing rules that can localize the source of an event using microPMU data, to further check whether a cyber attack is causing the anomaly, by correlating SCADA traffic with the microPMU data analysis results.
The thread that unifies the data analysis in this chapter is the fact that decisions are made without fully estimating the state of the system; on the contrary, decisions are made using a set of physical measurements that falls short, by orders of magnitude, of meeting the needs for observability. More specifically, in the first part of this chapter (sections 4.1-4.2), methodologies for online identification of the source Thevenin parameters, using microPMU data in the substation, are presented. This methodology is used to identify reconnaissance activity on the normally-open switches in the substation, initiated by attackers to gauge its controllability over the cyber network. The application of this methodology to monitoring the voltage stability of the grid is also discussed. In the second part of this chapter (sections 4.3-4.5), we investigate the localization of faults. Since the number of PMU sensors available to carry out the inference is insufficient to ensure observability, the problem can be viewed as that of under-sampling a "graph signal"; the analysis leads to a PMU placement strategy that can achieve the highest resolution in localizing the fault for a given number of sensors. In both cases, the results of the analysis are leveraged in the detection of cyber-physical attacks, where microPMU data and relevant SCADA network traffic information are compared to determine if a network breach has affected the integrity of the system information and/or operations. In the second part of this thesis (Part II), the security analysis considers the adequacy and reliability of schedules for the gas and power networks. Jointly scheduling supply in gas and power networks is motivated by the increasing reliance of power grids on natural gas generators (and, indirectly, on gas pipelines) to provide critical reserves.
Chapter 5 focuses on unveiling the challenges and providing solutions to this problem. Dissertation/Thesis. Doctoral Dissertation, Electrical Engineering 201
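
The abstract above mentions identifying source Thevenin parameters from microPMU measurements. As a rough illustration only (not the thesis's online method), the textbook two-point estimate can be sketched as follows, assuming the model V = E - Z*I with noise-free phasors. The function name and the sample values are hypothetical.

```python
def thevenin_from_two_points(v1, i1, v2, i2):
    """Estimate the Thevenin source (E, Z) from two phasor measurements.

    Assumes the model V = E - Z * I, where V and I are complex phasors
    taken at two distinct operating points. Hypothetical helper for
    illustration; real microPMU data is noisy and would call for a
    least-squares or recursive estimate over many samples.
    """
    delta_i = i1 - i2
    if abs(delta_i) < 1e-12:
        raise ValueError("operating points must differ in current")
    z = (v2 - v1) / delta_i  # from V2 - V1 = Z * (I1 - I2)
    e = v1 + z * i1          # from V1 = E - Z * I1
    return e, z


# Synthetic per-unit example: generate two measurements from a known source.
E_TRUE, Z_TRUE = 1.0 + 0.0j, 0.01 + 0.05j
I1, I2 = 0.5 - 0.1j, 0.8 - 0.2j
V1, V2 = E_TRUE - Z_TRUE * I1, E_TRUE - Z_TRUE * I2

e_est, z_est = thevenin_from_two_points(V1, I1, V2, I2)
```

With the synthetic values above, the estimate recovers the source that generated the measurements, up to floating-point error.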

ASU Digital Repository


Enabling Resilience in Cyber-Physical-Human Water Infrastructures

Author: Han Qing
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Rapid urbanization and growth in urban populations have forced community-scale infrastructures (e.g., water, power and natural gas distribution systems, and transportation networks) to operate at their limits. Aging (and failing) infrastructures around the world are becoming increasingly vulnerable to operational degradation, extreme weather, natural disasters and cyber attacks/failures. These trends have wide-ranging socioeconomic consequences and raise public safety concerns. In this thesis, we introduce the notion of cyber-physical-human infrastructures (CPHIs) - smart community-scale infrastructures that bridge technologies with physical infrastructures and people. CPHIs are highly dynamic stochastic systems characterized by complex physical models that exhibit regionwide variability and uncertainty under disruptions. Failures in these distributed settings tend to be difficult to predict and estimate, and expensive to repair. Real-time fault identification is crucial to ensure continuity of lifeline services to customers at adequate levels of quality. Emerging smart community technologies have the potential to transform our failing infrastructures into robust and resilient future CPHIs. In this thesis, we explore one such CPHI - community water infrastructures. Current urban water infrastructures, which are decades (sometimes over 100 years) old, encompass diverse geophysical regimes. Water stress concerns include the scarcity of supply and an increase in demand due to urbanization. Deterioration and damage to the infrastructure can disrupt water service; contamination events can result in economic and public health consequences. Unfortunately, little investment has gone into modernizing this key lifeline. To enhance the resilience of water systems, we propose an integrated middleware framework for quick and accurate identification of failures in complex water networks that exhibit uncertain behavior.
Our proposed approach integrates IoT-based sensing, domain-specific models and simulations with machine learning methods to identify failures (pipe breaks, contamination events). The composition of techniques results in cost-accuracy-latency tradeoffs in fault identification, inherent in CPHIs due to the constraints imposed by cyber components, physical mechanics and human operators. Three key resilience problems are addressed in this thesis: isolation of multiple faults under a small number of failures, state estimation of water systems under extreme events such as earthquakes, and contaminant source identification in water networks using human-in-the-loop based sensing. By working with real-world water agencies (WSSC, DC and LADWP, LA), we first develop an understanding of the operations of water CPHI systems. We design and implement a sensor-simulation-data integration framework, AquaSCALE, and apply it to localize multiple concurrent pipe failures. We use a mixture of infrastructure measurements (i.e., historical and live water pressure/flow), environmental data (i.e., weather) and human inputs (i.e., Twitter feeds), combined and enhanced with the domain model and supervised learning techniques, to locate multiple failures at fine levels of granularity (individual pipeline level) with detection time reduced by orders of magnitude (from hours/days to minutes). We next consider the resilience of water infrastructures under extreme events (i.e., earthquakes) - the challenge here is the lack of a priori knowledge and the increased number and severity of damages to infrastructures. We present a graphical model based approach for efficient online state estimation, where the offline graph factorization partitions a given network into disjoint subgraphs, and the belief propagation based inference is executed on-the-fly in a distributed manner on those subgraphs.
Our proposed approach can isolate 80% of broken pipes and 99% of loss-of-service to end-users during an earthquake. Finally, we address issues of water quality - today this is a human-in-the-loop process where operators need to gather water samples for lab tests. We incorporate the necessary abstractions with event processing methods into a workflow, which iteratively selects and refines the set of potential failure points via human-driven grab sampling. Our approach utilizes Hidden Markov Model based representations for event inference, along with reinforcement learning methods for further refining event locations and reducing the cost of human efforts. The proposed techniques are integrated into a middleware architecture, which enables components to communicate/collaborate with one another. We validate our approaches through a prototype implementation with multiple real-world water networks, supply-demand patterns from water utilities and policies set by the U.S. EPA. While our focus here is on water infrastructures in a community, the developed end-to-end solution is applicable to other infrastructures and community services which operate in disruptive and resource-constrained environments.

eScholarship - University of California

Performance Evaluation of Node Placement Schemes for Water Pipelines Monitoring


KFUPM ePrints

Artificial intelligence driven anomaly detection for big data systems

Author: Alnafessah Ahmad
Publication venue: Computing, Imperial College London
Publication date: 01/06/2022

The main goal of this thesis is to contribute to the research on automated performance anomaly detection and interference prediction by implementing Artificial Intelligence (AI) solutions for complex distributed systems, especially for Big Data platforms within cloud computing environments. The late detection and manual resolution of performance anomalies and system interference in Big Data systems may lead to performance violations and financial penalties. Motivated by this issue, we propose AI-based methodologies for anomaly detection and interference prediction tailored to Big Data and containerized batch platforms to better analyze system performance and effectively utilize computing resources within cloud environments. New, precise and efficient performance management methods are therefore the key to handling performance anomalies and interference impacts and to improving the efficiency of data center resources. The first part of this thesis contributes to performance anomaly detection for in-memory Big Data platforms. We examine the performance of Big Data platforms and justify our choice of the in-memory Apache Spark platform. An artificial neural network-driven methodology is proposed to detect and classify performance anomalies for batch workloads based on the RDD characteristics and operating system monitoring metrics. Our method is evaluated against other popular machine learning (ML) algorithms, as well as against four different monitoring datasets. The results prove that our proposed method outperforms other ML methods, typically achieving 98–99% F-scores. Moreover, we prove that a random start instant, a random duration, and overlapped anomalies do not significantly impact the performance of our proposed methodology. The second contribution addresses the challenge of anomaly identification within an in-memory streaming Big Data platform by investigating agile hybrid learning techniques.
We develop TRACK (neural neTwoRk Anomaly deteCtion in sparK) and TRACK-Plus, two methods to efficiently train a class of machine learning models for performance anomaly detection using a fixed number of experiments. Our model revolves around using artificial neural networks with Bayesian Optimization (BO) to find the optimal training dataset size and configuration parameters to efficiently train the anomaly detection model to achieve high accuracy. The objective is to accelerate the search process for finding the size of the training dataset, optimizing neural network configurations, and improving the performance of anomaly classification. A validation based on several datasets from a real Apache Spark Streaming system is performed, demonstrating that the proposed methodology can efficiently identify performance anomalies, near-optimal configuration parameters, and a near-optimal training dataset size while reducing the number of experiments by up to 75% compared with naïve anomaly detection training. The last contribution overcomes the challenges of predicting the completion time of containerized batch jobs and proactively avoiding performance interference by introducing an automated prediction solution to estimate interference among colocated batch jobs within the same computing environment. An AI-driven model is implemented to predict the interference among batch jobs before it occurs within the system. Our interference detection model can estimate and alleviate the task slowdown caused by the interference. This model assists the system operators in making an accurate decision to optimize job placement. Our model is agnostic to the business logic internal to each job. Instead, it is learned from system performance data by applying artificial neural networks to establish the completion time prediction of batch jobs within the cloud environments.
We compare our model with three other baseline models (queueing-theoretic model, operational analysis, and an empirical method) on historical measurements of job completion time and CPU run-queue size (i.e., the number of active threads in the system). The proposed model captures multithreading, operating system scheduling, sleeping time, and job priorities. A validation of 4500 experiments based on the DaCapo benchmarking suite was carried out, confirming the predictive efficiency and capabilities of the proposed model by achieving up to 10% MAPE compared with the other models. Open Access
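
The abstract above reports accuracy as MAPE (mean absolute percentage error). For reference, a minimal sketch of that metric is given here; the function name and the sample completion times are hypothetical and are not taken from the thesis.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    MAPE = (100 / n) * sum(|actual_i - predicted_i| / |actual_i|).
    It is undefined when any actual value is zero, so that case is rejected.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical job completion times in seconds (measured vs. predicted).
measured = [100.0, 200.0, 400.0]
estimated = [110.0, 180.0, 400.0]
error_percent = mape(measured, estimated)  # relative errors: 10%, 10%, 0%
```

Because each error is divided by the measured value, short jobs weigh as much as long ones, which is one reason the metric suits completion-time prediction across jobs of very different lengths.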

Spiral - Imperial College Digital Repository

How Can AI be Distributed in the Computing Continuum? Introducing the Neural Pub/Sub Paradigm

Author: Kumar Abhishek
Lovén Lauri
Morabito Roberto
Pirttikangas Susanna
Riekki Jukka
Tarkoma Sasu
Publication date: 05/09/2023

This paper proposes the neural publish/subscribe paradigm, a novel approach to orchestrating AI workflows in large-scale distributed AI systems in the computing continuum. Traditional centralized broker methodologies are increasingly struggling with managing the data surge resulting from the proliferation of 5G systems, connected devices, and ultra-reliable applications. Moreover, the advent of AI-powered applications, particularly those leveraging advanced neural network architectures, necessitates a new approach to orchestrating and scheduling AI processes within the computing continuum. In response, the neural pub/sub paradigm aims to overcome these limitations by efficiently managing training, fine-tuning and inference workflows, improving distributed computation, facilitating dynamic resource allocation, and enhancing system resilience across the computing continuum. We explore this new paradigm through various design patterns and use cases, and discuss open research questions for further exploration.

arXiv.org e-Print Archive

Water quality sensor placement: a multi-objective and multi-criteria approach

Author: Barros Daniel
Brentan Bruno
Carpitella Silvia
Certa Antonella
Izquierdo Sebastián Joaquín
Meirelles Gustavo
Publication venue: Springer Science and Business Media LLC
Publication date: 03/01/2021

To satisfy their main goal, namely providing quality water to consumers, water distribution networks (WDNs) need to be suitably monitored. Only well-designed and reliable monitoring data enable WDN managers to make sound decisions on their systems. In this belief, water utilities worldwide have invested in monitoring and data acquisition systems. However, good monitoring needs optimal sensor placement and presents a multi-objective problem where cost and quality are conflicting objectives (among others). In this paper, we address the solution to this multi-objective problem by integrating quality simulations using EPANET-MSX with two optimization techniques. First, multi-objective optimization is used to build a Pareto front of non-dominated solutions relating contamination detection time and detection probability with cost. To assist decision makers with the selection of an optimal solution that provides the best trade-off for their utility, a multi-criteria decision-making technique is then used with a twofold objective: 1) to cluster Pareto solutions according to network sensitivity and entropy as evaluation parameters; and 2) to rank the solutions within each cluster to provide deeper insight into the problem when considering the utility perspectives. The clustering process, which considers features related to water utility needs and available information, helps decision makers select reliable and useful solutions from the Pareto front. Thus, while several works on sensor placement stop at multi-objective optimization, this work goes a step further and provides a reduced and simplified Pareto front where optimal solutions are highlighted. The proposed methodology uses the NSGA-II algorithm to solve the optimization problem, and clustering is performed through ELECTRE TRI. The developed methodology is applied to a very well-known benchmarking WDN, for which the usefulness of the approach is shown.
The final results, which correspond to four optimal solution clusters, are useful for decision makers during the planning and development of projects on networks of quality sensors. The obtained clusters exhibit distinctive features, opening ways for a final project to prioritize the most convenient solution, with the assurance of implementing a Pareto-optimal solution. Brentan, B.; Carpitella, S.; Barros, D.; Meirelles, G.; Certa, A.; Izquierdo Sebastián, J. (2021). Water quality sensor placement: a multi-objective and multi-criteria approach. Water Resources Management. 35(1):225-241. https://doi.org/10.1007/s11269-020-02720-3
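
The abstract above builds a Pareto front of non-dominated solutions relating cost, detection time, and detection probability. As a minimal sketch of what "non-dominated" means, the brute-force filter below keeps the candidates that no other candidate beats on every objective. It is not NSGA-II, which the paper uses; the candidate values are made up for illustration, and every objective is assumed to be minimized (detection probability is expressed as a miss probability).

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the non-dominated points, preserving input order.

    Brute-force O(n^2) comparison, fine for small candidate sets.
    """
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical sensor layouts: (cost in sensors, detection time in minutes,
# probability of missing a contamination event).
candidates = [
    (2, 40, 0.30),
    (3, 25, 0.20),
    (3, 45, 0.35),  # worse than (2, 40, 0.30) on every objective
    (5, 20, 0.10),
    (5, 30, 0.25),  # worse than (3, 25, 0.20) on every objective
]
front = pareto_front(candidates)
```

The surviving layouts are trade-offs: each is cheaper, faster, or more reliable than the others in at least one respect, which is why a further multi-criteria step is needed to pick among them.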

RiuNet

The Family of MapReduce and Large Scale Data Processing Systems

Author: Anna Liu
Ayman G. Fayoumi
Sherif Sakr
Publication date: 12/02/2013

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as issues of data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works after its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
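
The MapReduce programming model described in the abstract above can be illustrated with a single-process sketch: a map function emits key/value pairs, the pairs are grouped by key (the shuffle), and a reduce function folds each group. The helper names below are hypothetical; a real framework would also handle the data distribution, scheduling, and fault tolerance that the abstract mentions, none of which is modeled here.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce step: fold all values emitted for one key into a total."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reducer(key, values) for key, values in groups.items())


documents = ["big data big clusters", "data on clusters"]
word_counts = run_mapreduce(documents, map_words, reduce_counts)
```

The application code is limited to the two small functions; everything in `run_mapreduce` stands in for what the framework would provide.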

arXiv.org e-Print Archive

CiteSeerX