
    Modelling low power compute clusters for cloud simulation

    In order to minimise their energy use, data centre operators are constantly exploring new ways to construct computing infrastructures. As low-power CPUs, exemplified by ARM-based devices, become increasingly popular, there is a growing trend towards the large-scale deployment of low-power servers in data centres. For example, recent research has shown promising results on constructing small-scale data centres using Raspberry Pi (RPi) single-board computers as their building blocks. To enable larger-scale experimentation and feasibility studies, cloud simulators could be utilised. Unfortunately, state-of-the-art simulators often need significant modification to include such low-power devices as core data centre components. In this paper, we introduce models and extensions to estimate the behaviour of these new components in the DISSECT-CF cloud computing simulator. We show how an RPi-based cloud can be simulated with the use of the new models. We evaluate the precision and behaviour of the implemented models using a Hadoop-based application scenario executed in both real-life and simulated clouds.
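    The abstract does not reproduce the model equations, but a minimal sketch of the kind of node model such a simulator extension needs, assuming a simple linear idle-to-peak power curve and purely illustrative Raspberry Pi figures, could look like this (all names and numbers below are hypothetical, not the paper's actual DISSECT-CF models, which are implemented in Java and calibrated against measurements):

```python
# Minimal sketch of a linear power model for a low-power node, of the kind
# a cloud simulator extension might use. Illustrative only; names and
# wattage figures are assumptions, not taken from the paper.
from dataclasses import dataclass

@dataclass
class NodePowerModel:
    idle_watts: float   # power draw at 0% CPU utilisation
    peak_watts: float   # power draw at 100% CPU utilisation

    def power(self, utilisation: float) -> float:
        """Linear interpolation between idle and peak draw."""
        u = min(max(utilisation, 0.0), 1.0)
        return self.idle_watts + (self.peak_watts - self.idle_watts) * u

# Hypothetical Raspberry Pi-like figures: ~2 W idle, ~4 W under full load.
rpi = NodePowerModel(idle_watts=2.0, peak_watts=4.0)
print(rpi.power(0.75))  # estimated draw at 75% utilisation -> 3.5 W
```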

    Harnessing single board computers for military data analytics

    Executive summary: This chapter covers the use of Single Board Computers (SBCs) to expedite onsite data analytics for a variety of military applications. Onsite data summarization and analytics is increasingly critical for command, control, and intelligence (C2I) operations, as excessive power consumption and communication latency can restrict the efficacy of down-range operations. SBCs offer power-efficient, inexpensive data-processing capabilities while maintaining a small form factor. We discuss the use of SBCs in a variety of domains, including wireless sensor networks, unmanned vehicles, and cluster computing. We conclude with a discussion of existing challenges and opportunities for future use.

    Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures

    One of the significant shifts in next-generation computing technologies will certainly be in the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD landmark, has evolved into a widely deployed BD operating system. Its new features include a federation structure and many associated frameworks, which give Hadoop 3.x the maturity to serve different markets. This dissertation addresses two leading issues involved in exploiting BD and the large-scale data analytics realm using the Hadoop platform: (i) scalability, which directly affects system performance and overall throughput, addressed using portable Docker containers; and (ii) security, which spreads the adoption of data protection practices among practitioners, addressed using access controls. An Enhanced MapReduce Environment (EME), an OPportunistic and Elastic Resource Allocation (OPERA) scheduler, a BD Federation Access Broker (BDFAB), and a Secure Intelligent Transportation System (SITS) with a multi-tier architecture for data streaming to the cloud are the main contributions of this thesis.

    Understanding the Performance of Low Power Raspberry Pi Cloud for Big Data

    Nowadays, Internet-of-Things (IoT) devices generate data at high speed and in large volumes. Often the data require real-time processing to support high system responsiveness, which can be provided by localised Cloud and/or Fog computing paradigms. However, there are considerably large deployments of IoT, such as sensor networks in remote areas where Internet connectivity is sparse, challenging the localised Cloud and/or Fog computing paradigms. With the advent of the Raspberry Pi, a credit card-sized single-board computer, there is a great opportunity to construct low-cost, low-power portable clouds to support real-time data processing next to IoT deployments. In this paper, we extend our previous work on constructing a Raspberry Pi Cloud to study its feasibility for real-time big data analytics under realistic application-level workloads in both native and virtualised environments. We have extensively tested the performance of a single-node Raspberry Pi 2 Model B with httperf and a cluster of 12 nodes with Apache Spark and HDFS (Hadoop Distributed File System). Our results demonstrate that our portable cloud is useful for supporting real-time big data analytics. On the other hand, our results also unveil that the overhead for CPU-bound workloads in the virtualised environment is surprisingly high, at 67.2%. We have found that, for big data applications, the virtualisation overhead is fractional for small jobs but becomes more significant for large jobs, up to 28.6%.
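    As an illustration of the kind of application-level workload such a Spark/HDFS cluster can run, a minimal Spark job reading from and writing to HDFS might look as follows. This is a sketch only; the hostname, port, and paths are hypothetical placeholders, and the paper's actual benchmark jobs are not reproduced here.

```python
# Minimal PySpark word count reading from and writing to HDFS.
# Cluster address and paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rpi-cluster-wordcount").getOrCreate()

lines = spark.sparkContext.textFile("hdfs://pi-master:9000/data/input.txt")

counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

counts.saveAsTextFile("hdfs://pi-master:9000/data/wordcount-output")
spark.stop()
```

    Submitted with spark-submit against a small cluster, a job like this exercises both the CPU and the HDFS I/O paths to which the virtualisation overhead measurements above refer.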

    Design and management of image processing pipelines within CPS: Acquired experience towards the end of the FitOptiVis ECSEL Project

    Cyber-Physical Systems (CPSs) are dynamic and reactive systems interacting with processes, the environment and, sometimes, humans. They are often distributed, with sensors and actuators, and are characterized by being smart, adaptive, and predictive and by reacting in real time. Image- and video-processing pipelines are a prime source of environmental information for such systems, allowing them to take better decisions according to what they see. Therefore, in FitOptiVis, we are developing novel methods and tools to integrate complex image- and video-processing pipelines. FitOptiVis aims to deliver a reference architecture for describing and optimizing quality and resource management for imaging and video pipelines in CPSs at both design time and run time. The architecture is concretized in low-power, high-performance, smart components, and in methods and tools for combined design-time and run-time multi-objective optimization and adaptation within system and environment constraints.

    Dynamic resource provisioning for data center workloads with data constraints

    Dynamic resource provisioning, an important data center software building block, helps to achieve high resource usage efficiency, leading to enormous monetary benefits. Most existing work on data center dynamic provisioning targets stateless servers, where any request can be routed to any server. However, the assumption of stateless behavior no longer holds for subsystems that are subject to data constraints, as a request may depend on a certain dataset stored on a small subset of servers. Routing a request to a server without the required dataset violates data locality or data availability properties, which may negatively impact response times. To solve this problem, this thesis provides a unified framework consisting of two main steps: 1) determining the proper amount of resources to serve the workload by analyzing the schedulability utilization bound; 2) avoiding transition penalties during cluster resizing operations through deliberately designed data distribution policies. We apply this framework to both storage and computing subsystems, where the former includes distributed file systems, databases, and memory caches, and the latter refers to systems such as Hadoop, Spark, and Storm. The proposed solutions are implemented in Memcached, HBase/HDFS, and Spark, and evaluated using various datasets, including Wikipedia, NYC taxi traces, and Twitter traces.
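    As a rough illustration of the first step, sizing a cluster against a utilization bound reduces to a small calculation. The sketch below assumes a uniform per-server capacity and a single aggregate demand figure, which is a deliberate simplification of the thesis's actual schedulability analysis; all numbers are hypothetical.

```python
# Simplified cluster-sizing sketch: pick the smallest server count such
# that per-server load stays under a schedulability utilization bound.
# This is an illustrative assumption, not the thesis's actual analysis.
import math

def servers_needed(total_demand: float,
                   per_server_capacity: float,
                   utilization_bound: float) -> int:
    usable_capacity = per_server_capacity * utilization_bound
    return math.ceil(total_demand / usable_capacity)

# Hypothetical numbers: 120,000 requests/s of demand, 10,000 requests/s
# per server, and a 0.8 utilization bound -> 15 servers.
print(servers_needed(120_000, 10_000, 0.8))
```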

    The Role of Distributed Computing in Big Data Science: Case Studies in Forensics and Bioinformatics

    The era of Big Data is leading to the generation of large amounts of data, which require storage and analysis capabilities that can only be addressed by distributed computing systems. To facilitate large-scale distributed computing, many programming paradigms and frameworks have been proposed, such as MapReduce and Apache Hadoop, which transparently address some issues of distributed systems and hide most of their technical details. Hadoop is currently the most popular and mature framework supporting the MapReduce paradigm, and it is widely used to store and process Big Data using a cluster of computers. Solutions such as Hadoop are attractive, since they simplify the transformation of an application from a non-parallel to a distributed one by means of general utilities and without requiring much expertise. However, without any algorithm engineering activity, some target applications are not altogether fast and efficient, and they can suffer from several problems and drawbacks when executed on a distributed system. In fact, a distributed implementation is a necessary but not sufficient condition for obtaining remarkable performance with respect to a non-parallel counterpart. Therefore, it is required to assess how distributed solutions run on a Hadoop cluster, and/or how their performance can be improved to reduce resource consumption and completion times. In this dissertation, we show how Hadoop-based implementations can be enhanced through careful algorithm engineering, tuning, profiling and code improvements. We also analyze how to achieve these goals by working on some critical points, such as data-local computation, input split size, number and granularity of tasks, cluster configuration, and input/output representation. In particular, to address these issues, we choose case studies from two research areas where the amount of data is rapidly increasing, namely Digital Image Forensics and Bioinformatics. We mainly describe full-fledged implementations to show how to design, engineer, improve and evaluate Hadoop-based solutions for the Source Camera Identification problem, i.e., recognizing the camera used for taking a given digital image, adopting the algorithm by Fridrich et al., and for two of the main problems in Bioinformatics, i.e., alignment-free sequence comparison and extraction of k-mer cumulative or local statistics. The results achieved by our improved implementations show that they are substantially faster than their non-parallel counterparts, and remarkably faster than the corresponding naive Hadoop-based implementations. In some cases, for example, our solution for k-mer statistics is approximately 30× faster than our naive Hadoop-based implementation, and about 40× faster than an analogous tool built on Hadoop. In addition, our applications are also scalable, i.e., execution times are (approximately) halved by doubling the computing units. Indeed, algorithm engineering activities based on the implementation of smart improvements and supported by careful profiling and tuning may lead to much better experimental performance while avoiding potential problems. We also highlight how the proposed solutions, tips, tricks and insights can be used in other research areas and problems. Although Hadoop simplifies some tasks of distributed environments, one must know it thoroughly to achieve remarkable performance. It is not enough to be an expert in the application domain to build Hadoop-based implementations; to achieve good performance, expertise in distributed systems, algorithm engineering, tuning, profiling, etc. is also required. Therefore, the best performance depends heavily on the degree of cooperation between the domain expert and the distributed algorithm engineer. [edited by Author]
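    The point about input split size and task granularity can be made concrete with a small back-of-the-envelope calculation. The sketch below assumes the stock Hadoop MapReduce behaviour of one map task per input split and uses illustrative numbers only; it is not taken from the dissertation.

```python
# Rough map-task count for a given input size and split size, assuming one
# map task per input split (as in stock Hadoop MapReduce). Numbers are
# illustrative, not taken from the dissertation's case studies.
import math

def num_map_tasks(input_bytes: int, split_bytes: int) -> int:
    return math.ceil(input_bytes / split_bytes)

GIB = 1024 ** 3
MIB = 1024 ** 2

# 50 GiB of input with a 128 MiB split (the Hadoop 2.x/3.x default block
# size) yields 400 map tasks; raising the split size to 512 MiB coarsens
# the granularity to 100 tasks, trading per-task scheduling overhead
# against load balancing and data locality.
print(num_map_tasks(50 * GIB, 128 * MIB))  # 400
print(num_map_tasks(50 * GIB, 512 * MIB))  # 100
```

    In an actual Hadoop job the split size would typically be steered through configuration properties such as mapreduce.input.fileinputformat.split.maxsize; the figures above are only meant to show why that knob, among those listed in the abstract, matters for completion times.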