145 research outputs found

    Extreme scale parallel NBody algorithm with event driven constraint based execution model

    Traditional scientific applications such as Computational Fluid Dynamics and numerical methods for Partial Differential Equations (e.g., Finite Difference and Finite Element Methods) achieve sufficient efficiency on state-of-the-art high performance computing systems and have been widely studied and implemented using conventional programming models. For emerging application domains such as graph applications, however, scalability and efficiency are significantly constrained by conventional systems and their supporting programming models. Furthermore, technology trends such as multicore, manycore, and heterogeneous system architectures introduce new challenges and possibilities, and require a rethinking of how the underlying parallelism is exposed to applications and end users. This thesis explores the space of effective parallel execution of ephemeral, dynamically generated graphs. The standard particle-based simulation, solved using the Barnes-Hut algorithm, is chosen to exemplify such dynamic workloads. The workloads are expressed using sequential execution semantics, a conventional parallel programming model (shared-memory semantics), and the semantics of ParalleX, an execution model designed for efficient, scalable performance towards Exascale computing. The main outcomes of this research are parallel processing of dynamic ephemeral workloads, dynamic load balancing at runtime, and the use of advanced execution semantics to expose parallelism in scaling-constrained applications.
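
    The dynamic, data-dependent structure of such workloads can be seen in the Barnes-Hut opening-angle test: whether a tree cell is approximated as a single pseudo-particle or opened further depends on the particle distribution itself. The following minimal Python sketch illustrates that general test only; it is not code from the thesis, and the theta value is an assumed typical setting.

        import math

        def accept(cell_size, cell_com, particle, theta=0.5):
            # Far-field test: a cell of width cell_size whose centre of mass is at
            # cell_com may be treated as one pseudo-particle when size/distance < theta.
            d = math.dist(cell_com, particle)
            return d > 0.0 and cell_size / d < theta

        # A distant cell is accepted (approximated); a nearby one must be opened,
        # so its children are visited instead; the traversal graph is ephemeral.
        print(accept(10.0, (100.0, 0.0), (0.0, 0.0)))   # True
        print(accept(10.0, (12.0, 0.0), (0.0, 0.0)))    # False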

    Model for Predicting Bluetooth Low Energy Micro-Location Beacon Coin Cell Battery Lifetime

    Bluetooth Low Energy beacon devices, typically operating on coin cell batteries, have emerged as key components of micro-location wireless sensor networks. To design efficient and reliable networks, designers require tools for predicting battery and beacon lifetime based on design parameters that are specific to micro-location applications. This design science research contributes to the implementation of an artifact functioning as a predictive tool for coin cell battery lifetime when powering Bluetooth Low Energy beacon devices. Building upon effective and corroborated components from other researchers, the Beacon Lifetime Model 1.0 was developed as a spreadsheet workbook, providing a user interface for designers to specify parameters and a predictive engine to predict coin cell battery lifetime. Results showed that the measured and calculated predictions were consistent with those derived through other methodologies, while providing a uniquely extensible user interface which may accommodate future work on emerging components. Future work may include research on real-world scenarios, as beacon devices are deployed for robust micro-location applications. Future work may also include improved battery models that capture increasingly accurate performance under micro-location workloads. Beacon Lifetime Model 1.x is designed to incorporate those emerging components, with Beacon Lifetime Model 1.0 serving as the initial instantiation of this design science artifact.
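
    The core relation behind any such predictive engine is that lifetime equals battery capacity divided by average current draw, with the average current dominated by the advertising duty cycle. The numbers below are illustrative assumptions, not values from the Beacon Lifetime Model.

        capacity_mah    = 220.0    # nominal CR2032 coin cell capacity (assumed)
        adv_interval_s  = 0.5      # one advertising event every 500 ms (assumed)
        event_charge_c  = 15e-6    # charge per advertising event, in coulombs (assumed)
        sleep_current_a = 1e-6     # sleep current between events (assumed)

        avg_current_a = event_charge_c / adv_interval_s + sleep_current_a
        lifetime_h    = (capacity_mah / 1000.0) / avg_current_a
        print(f"average current {avg_current_a * 1e6:.0f} uA, "
              f"lifetime about {lifetime_h / 24:.0f} days")

    A full model additionally accounts for effects such as battery self-discharge, voltage droop under pulsed load, and temperature, which is why measured lifetimes deviate from this simple estimate.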

    New directions for remote data integrity checking of cloud storage

    Cloud storage services allow data owners to outsource their data, and thus reduce their workload and cost in data storage and management. However, most data owners today are still reluctant to outsource their data to cloud storage providers (CSPs), simply because they do not trust the CSPs and have no confidence that the CSPs will secure their valuable data. This dissertation focuses on Remote Data Checking (RDC), a collection of protocols that allow a client (data owner) to check the integrity of data outsourced at an untrusted server, and thus to audit whether the server fulfills its contractual obligations. Robustness has not been considered for the dynamic RDCs in the literature. The R-DPDP scheme designed here is the first RDC scheme that provides robustness and, at the same time, supports dynamic data updates, while requiring small, constant client storage. The main challenge that has to be overcome is to reduce the client-server communication during updates under an adversarial setting. A security analysis for R-DPDP is provided. Single-server RDCs are useful for detecting server misbehavior, but have no provisions to recover damaged data. Thus, in practice, they should be extended to a distributed setting, in which the data is stored redundantly at multiple servers. The client can use RDC to check each server and, upon detecting a corrupted server, can repair it by retrieving data from healthy servers, so that the reliability level is maintained. Previously, RDC has been investigated for replication-based and erasure coding-based distributed storage systems. However, RDC has not been investigated for network coding-based distributed storage systems that rely on untrusted servers. RDC-NC is the first RDC scheme for network coding-based distributed storage systems to ensure data remain intact when faced with data corruption, replay, and pollution attacks. Experimental evaluation shows that RDC-NC is inexpensive for both the clients and the servers. The setting considered so far outsources the storage of the data, but the data owner is still heavily involved in the data management process (especially during the repair of damaged data). A new paradigm is proposed, in which the data owner fully outsources both the data storage and the management of the data. In traditional distributed RDC schemes, the repair phase imposes a significant burden on the client, who needs to expend a significant amount of computation and communication, making it very difficult to keep the client lightweight. A new self-repairing concept is developed, in which the servers are responsible for repairing the corruption, while the client acts as a lightweight coordinator during repair. To realize this new concept, two novel RDC schemes, RDC-SR and ERDC-SR, are designed for replication-based distributed storage systems, which enable Server-side Repair and minimize the load on the client side. Version control systems (VCS) provide the ability to track and control changes made to the data over time. The changes are usually stored in a VCS repository which, due to its massive size, is often hosted at an untrusted CSP. RDC can be used to address concerns about the untrusted nature of the VCS server by allowing a data owner to periodically check that the server continues to store the data. The RDC-AVCS scheme designed here relies on RDC to ensure all the data versions are retrievable from the untrusted server over time. The RDC-AVCS prototype, built on top of Apache SVN, incurs only a modest decrease in performance compared to a regular (non-secure) SVN system.
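
    At its simplest, remote data checking is a challenge-response spot-check: the owner keeps a small secret, tags each block before outsourcing, and later challenges the server on a few random block indices. The Python sketch below shows only this basic idea under assumed parameters; the schemes above additionally use homomorphic tags so proofs stay constant-size, and erasure coding for robustness, none of which is reproduced here.

        import hashlib, hmac, os, random

        def tag(key, index, block):
            # Per-block MAC binding each block to its position (a simplified stand-in
            # for the homomorphic verification tags used by real RDC schemes).
            return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()

        # Setup by the data owner: split the file, tag each block, outsource both.
        key    = os.urandom(32)
        blocks = [os.urandom(4096) for _ in range(256)]
        tags   = [tag(key, i, b) for i, b in enumerate(blocks)]

        # Periodic audit: challenge a random sample and verify what the server returns.
        challenge = random.sample(range(len(blocks)), 10)
        proof     = [(i, blocks[i], tags[i]) for i in challenge]   # server's response
        intact    = all(hmac.compare_digest(t, tag(key, i, b)) for i, b, t in proof)
        print("sampled blocks verified:", intact)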

    Fault- and Yield-Aware On-Chip Memory Design and Management

    Ever-decreasing device sizes cause more frequent hard faults, which have become a serious burden for processor design and yield management. This problem is particularly pronounced in on-chip memory, which consumes up to 70% of a processor's total chip area. Traditional circuit-level techniques, such as redundancy and error correction codes, become less effective in error-prevalent environments because of their large area overhead. In this work, we suggest an architectural solution for building reliable on-chip memory in the future processor environment. Our approach has two parts: a design framework and architectural techniques for on-chip memory structures. The design framework provides important architectural evaluation metrics such as yield, area, and performance based on low-level defect and process-variation parameters, so that processor architects can quickly evaluate their designs' characteristics. With the framework, we develop architectural yield enhancement solutions for on-chip memory structures including the L1 cache, L2 cache, and directory memory. Our proposed solutions greatly improve yield with negligible area and performance overhead. Furthermore, we develop a decoupled yield model of compute cores and L2 caches in CMPs, which shows that there will be many more L2 caches than compute cores on a chip. We propose efficient utilization techniques for these excess caches. Evaluation results show that excess caches significantly improve the overall performance of CMPs.
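
    The kind of relation such a framework evaluates can be illustrated with the standard negative-binomial die-yield model, in which raw array yield falls as area and defect density grow. This is a generic textbook model shown for illustration only; the thesis framework combines yield, area, and performance models with redundancy and architectural repair, and its exact formulation and parameters are not reproduced here.

        def yield_negative_binomial(area_cm2, defect_density_per_cm2, alpha=2.0):
            # alpha is the defect-clustering parameter; smaller alpha means more clustering.
            return (1.0 + area_cm2 * defect_density_per_cm2 / alpha) ** (-alpha)

        # Illustrative numbers only: a larger memory array loses noticeably more yield.
        print(yield_negative_binomial(0.1, 0.4))   # ~0.96
        print(yield_negative_binomial(0.5, 0.4))   # ~0.83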

    Scalability in the Presence of Variability

    Supercomputers are used to solve some of the world's most computationally demanding problems. Exascale systems, composed of over one million cores and capable of 10^18 floating point operations per second, will probably exist by the early 2020s, and will provide unprecedented computational power for parallel computing workloads. Unfortunately, while these machines hold tremendous promise and opportunity for applications in High Performance Computing (HPC), graph processing, and machine learning, it will be a major challenge to fully realize their potential, because doing so requires balanced execution across the entire system and its millions of processing elements. When different processors take different amounts of time to perform the same amount of work, performance imbalance arises, large portions of the system sit idle, and time and energy are wasted. Larger systems incorporate more processors, and thus offer greater opportunity for imbalance to arise, as well as larger performance and energy penalties when it does. This phenomenon is referred to as performance variability and is the focus of this dissertation. In this dissertation, we explain how to design system software to mitigate variability on large-scale parallel machines. Our approaches span (1) the design, implementation, and evaluation of a new high performance operating system to reduce some classes of performance variability, (2) a new performance evaluation framework to holistically characterize key features of variability on new and emerging architectures, and (3) a distributed modeling framework that derives predictions of how and where imbalance is manifesting in order to drive reactive operations such as load balancing and speed scaling. Collectively, these efforts provide a holistic set of tools to promote scalability through the mitigation of variability.
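
    A common way to quantify the problem is the ratio of the slowest participant's time to the mean: since a bulk-synchronous step finishes only when the slowest processor does, that ratio bounds the fraction of machine capacity lost to variability. The sketch below uses made-up timings purely for illustration.

        per_rank_seconds = [1.00, 1.02, 0.98, 1.31, 1.01, 0.99, 1.00, 1.03]

        t_max  = max(per_rank_seconds)
        t_mean = sum(per_rank_seconds) / len(per_rank_seconds)
        imbalance    = t_max / t_mean          # 1.0 would be perfectly balanced
        wasted_fract = 1.0 - t_mean / t_max    # capacity idle while waiting on the straggler
        print(f"imbalance {imbalance:.2f}, wasted capacity {wasted_fract:.0%}")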

    Geospatial Data Indexing Analysis and Visualization via Web Services with Autonomic Resource Management

    With the exponential growth in the use of web-based map services, web GIS applications have become more and more popular. Spatial data indexing, search, analysis, visualization, and the resource management of such services are becoming increasingly important to deliver user-desired Quality of Service. First, spatial indexing is typically time-consuming and is not available to end users. To address this, we introduce TerraFly sksOpen, an open-sourced Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user's data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze the spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map that can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response time targets of map requests while using the resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand. v-TerraFly is a set of techniques that predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly predicts workload demands 18.91% more accurately and allocates resources efficiently to meet the QoS target, improving QoS by 26.19% and saving resource usage by 20.83% compared to traditional peak-load-based resource allocation.
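
    One simple way to realize a predict-then-allocate loop of this kind is to forecast the next interval's request rate and size a tier's CPU share from the forecast and a per-request cost estimate. The sketch below is a hypothetical illustration only; the function, parameter names, and constants are assumptions and do not describe v-TerraFly's actual prediction or allocation algorithms.

        def ewma_forecast(observed_rates, alpha=0.3):
            # Exponentially weighted moving average of recent request rates.
            estimate = observed_rates[0]
            for rate in observed_rates[1:]:
                estimate = alpha * rate + (1 - alpha) * estimate
            return estimate

        requests_per_s = [120, 135, 150, 180, 175]   # hypothetical observed map-request rates
        cpu_s_per_req  = 0.004                       # hypothetical CPU cost per request
        headroom       = 1.2                         # slack kept to protect response time

        demand = ewma_forecast(requests_per_s)
        cpu_allocation = demand * cpu_s_per_req * headroom   # cores for the next interval
        print(f"predicted {demand:.0f} req/s -> allocate {cpu_allocation:.2f} cores")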

    An Adaptive Middleware for Improved Computational Performance


    Cache Resource Allocation in Large-Scale Chip Multiprocessors.

    Chip multiprocessors (CMPs) have become virtually ubiquitous due to the increasing impact of power and thermal constraints being placed on processor design, as well as the diminishing returns of building ever more complex uniprocessors. While the number of cores on a chip has increased rapidly, changes to other aspects of system design have been slower in coming. Namely, the on-chip memory hierarchy has largely been unchanged despite the shift to multicore designs. The last level of cache on chip is generally shared by all hardware threads on the chip, creating a ripe environment for resource allocation problems. This dissertation examines cache resource allocation in large-scale chip multiprocessors. It begins by performing extensive supporting research in the area of shared cache metric analysis, concluding that there is no single optimal shared cache design metric. This result supports the idea that shared caches ought not explicitly attempt to achieve optimal partitions; rather they should only react when unfavorable performance is detected. This study is followed by some studies using machine learning analysis to extract salient characteristics in predicting poor shared cache performance. The culmination of this dissertation is a shared cache management framework called SLAM (Scalable, Lightweight, Adaptive Management). SLAM is a scalable and feasible mechanism which detects inefficiency of cache usage by hardware threads sharing the cache. An inefficient thread can be easily punished by effectively reducing its cache occupancy via a modified cache insertion policy. The crux of SLAM is the detection of inefficient threads, which relies on two novel performance monitors in the cache which stem from the results of the machine learning studies: the Misses Per Access Counter (MPAC) and the Relative Insertion Tracker (RIT), each of which requires only tens of bits of storage per thread. SLAM not only provides a means for extracting significant performance gains over current cache designs (up to 13.1% improvement), but SLAM also provides a means for granting differentiated quality of service to various cache sharers. Particularly as commercialized virtual servers become increasingly common, being able to provide differentiated quality of service at low cost potentially has significant value.
    Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/64727/1/hsul_1.pd
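
    The detect-and-punish idea can be sketched in software terms: track each sharer's misses per access and demote the insertion position of lines from a thread that misses far more per access than its peers. This is a rough analogy only, not SLAM's hardware design; the threshold, associativity, and names below are assumptions.

        class SharerStats:
            """Per-thread counters in the spirit of a misses-per-access monitor."""
            def __init__(self):
                self.accesses = 0
                self.misses = 0
            def mpac(self):
                return self.misses / self.accesses if self.accesses else 0.0

        def insertion_way(thread, stats, ways=16, penalty_factor=2.0):
            # Insert at MRU (way 0) by default; a thread missing far more per access
            # than its co-runners is inserted near LRU, shrinking its occupancy.
            peers = [s.mpac() for t, s in stats.items() if t != thread]
            peer_avg = sum(peers) / len(peers) if peers else 0.0
            if peer_avg > 0 and stats[thread].mpac() > penalty_factor * peer_avg:
                return ways - 1
            return 0

        stats = {"A": SharerStats(), "B": SharerStats()}
        stats["A"].accesses, stats["A"].misses = 1000, 50    # well-behaved sharer
        stats["B"].accesses, stats["B"].misses = 1000, 400   # cache-inefficient sharer
        print(insertion_way("A", stats), insertion_way("B", stats))   # 0 15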

    A Study of Adaptation Mechanisms for Simulation Algorithms

    The performance of a program can sometimes greatly improve if it was known in advance the features of the input the program is supposed to process, the actual operating parameters it is supposed to work with, or the specific environment it is to run on. However, this information is typically not available until too late in the program’s operation to take advantage of it. This is especially true for simulation algorithms, which are sensitive to this late-arriving information, and whose role in the solution of decision-making, inference and valuation problems is crucial. To overcome this limitation we need to provide the flexibility for a program to adapt its behaviour to late-arriving information once it becomes available. In this thesis, I study three adaptation mechanisms: run-time code generation, model-specific (quasi) Monte Carlo sampling and dynamic computation offloading, and evaluate their benefits on Monte Carlo algorithms. First, run-time code generation is studied in the context of Monte Carlo algorithms for time-series filtering in the form of the Input-Adaptive Kalman filter, a dynamically generated state estimator for non-linear, non-Gaussian dynamic systems. The second adaptation mechanism consists of the application of the functional-ANOVA decomposition to generate model-specific QMC-samplers which can then be used to improve Monte Carlo-based integration. The third adaptive mechanism treated here, dynamic computation offloading, is applied to wireless communication management, where network conditions are assessed via option valuation techniques to determine whether a program should offload computations or carry them out locally in order to achieve higher run-time (and correspondingly battery-usage) efficiency. This ability makes the program well suited for operation in mobile environments. At their core, all these applications carry out or make use of (quasi) Monte Carlo simulations on dynamic Bayesian networks (DBNs). The DBN formalism and its associated simulation-based algorithms are of great value in the solution to problems with a large uncertainty component. This characteristic makes adaptation techniques like those studied here likely to gain relevance in a world where computers are endowed with perception capabilities and are expected to deal with an ever-increasing stream of sensor and time-series data
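
    The baseline that all three mechanisms ultimately accelerate is the plain Monte Carlo estimator sketched below; replacing the uniform sampler with a model-specific QMC sequence is what the second mechanism targets. This is a generic illustration, not code from the thesis.

        import random
        import statistics

        def mc_integrate(f, n, sampler=random.random):
            # Plain Monte Carlo estimate of the integral of f over [0, 1],
            # together with the standard error of the estimate.
            samples = [f(sampler()) for _ in range(n)]
            return statistics.fmean(samples), statistics.stdev(samples) / n ** 0.5

        estimate, error = mc_integrate(lambda x: x * x, 100_000)   # true value is 1/3
        print(f"estimate {estimate:.4f} +/- {error:.4f}")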

    New simulation techniques for energy aware cloud computing environments

    In this thesis we propose a new simulation platform specifically designed for modelling cloud computing environments, their underlying architectures, and the energy consumed by hardware devices. The server models are divided into five basic subsystems: the processing system, memory system, network system, storage system, and power supply unit. Each of these subsystems has been built with new strategies for simulating its energy consumption. On top of these models, virtualization models are deployed to simulate the hypervisor and its scheduling policies. In addition, the cloud manager, the core of the simulation platform, is responsible for the resource-provisioning management policies. Its design offers APIs to researchers, allowing them to perform studies on scheduling policies of cloud computing systems. The simulation platform is aimed at modelling existing and new designs of cloud computing architectures, with a customizable environment for configuring the energy consumption of the different components. The main characteristics of the platform are flexibility, allowing a wide range of designs; scalability, to study large environments; and a good compromise between accuracy and performance. The simulation platform has been validated by comparing results from real experiments with results from simulation executions that model those experiments. Then, to evaluate the ability to foresee the energy consumption of a real cloud environment, an experiment deploying a model of a real application has been studied. Finally, scalability experiments have been performed to study the behaviour of the simulation platform with large-scale environments. The main aim of the scalability tests is to calculate the amount of time and memory needed to execute large simulations, depending on the size of the simulated environment and on the hardware resources available to execute them.
    This work has been partially funded under grant TIN2013-41350-P of the Spanish Ministry of Economics and Competitiveness, the COST Action IC1305 "Network on Sustainable Ultrascale Computing (NESUS)", ESTuDIo (TIN2012-36812-C02-01), SICOMORo-CM (S2013/ICE-3006), the SEPE (Servicio Público de Empleo Estatal, commonly known as INEM), my entire savings, and in part by my parents.
    Official Doctoral Programme in Computer Science and Technology (Programa Oficial de Doctorado en Ciencia y Tecnología Informática). Committee: Félix García Carballeira (chair), Jorge Enrique Pérez Martínez (secretary), Manuel Núñez Garcí (member).
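
    The per-subsystem accounting described above amounts to summing each subsystem's power over the time it spends in each state. The sketch below shows only that basic bookkeeping with made-up power figures; the simulator's own models (power states, scheduling, virtualization) are far more detailed.

        # Subsystem names follow the abstract; the power figures are illustrative only.
        subsystem_power_w = {
            "processing": 85.0,
            "memory":     18.0,
            "network":     6.0,
            "storage":    10.0,
            "power_supply_loss": 12.0,
        }
        interval_s = 3600   # one simulated hour at these average power levels

        energy_wh = sum(p * interval_s for p in subsystem_power_w.values()) / 3600.0
        print(f"server energy over the interval: {energy_wh:.0f} Wh")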