
    CDAR: contour detection aggregation and routing in sensor networks

    Wireless sensor networks offer the advantages of low cost, flexible measurement of phenomena in a wide variety of applications, and easy deployment. Since sensor nodes are typically battery powered, energy efficiency is an important objective in designing sensor network algorithms. These algorithms are often application-specific, owing to the need to carefully optimize energy usage and since deployments usually support one or very few applications. This thesis concerns applications in which the sensors monitor a continuous scalar field, such as temperature, and addresses the problem of determining the location of a contour line in this scalar field in response to a query and communicating this information to a designated sink node. An energy-efficient solution to this problem is proposed and evaluated. This solution includes new contour detection and query propagation algorithms, in-network processing algorithms, and routing algorithms. Only a small fraction of network nodes may be adjacent to the desired contour line, and the contour detection and query propagation algorithms attempt to minimize processing and communication by the other network nodes. The in-network processing algorithms reduce communication volume through suppression, compression, and aggregation techniques. Finally, the routing algorithms attempt to route the contour information to the sink as efficiently as possible while meshing with the other algorithms. Simulation results show that the proposed algorithms yield significant improvements in data and message volumes compared to baseline models, while maintaining the integrity of the contour representation.
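
    The core local test the abstract describes — deciding whether a node lies next to the queried contour — can be illustrated with a minimal sketch. This is an illustrative reading of the idea, not code from the thesis; the names `Node` and `is_contour_adjacent` are hypothetical.

```python
# Hypothetical sketch: a node is contour-adjacent when its own reading and at
# least one neighbor's reading fall on opposite sides of the queried level, so
# only those nodes need to process and answer the contour query.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    reading: float                                          # local sample of the scalar field
    neighbor_readings: dict = field(default_factory=dict)   # neighbor id -> reading

def is_contour_adjacent(node: Node, level: float) -> bool:
    """True if the contour at `level` passes between this node and a neighbor."""
    own_side = node.reading >= level
    return any((r >= level) != own_side for r in node.neighbor_readings.values())

# Example: only nodes straddling the 25.0-degree isotherm respond to the query.
n = Node(1, reading=24.2, neighbor_readings={2: 26.1, 3: 23.9})
print(is_contour_adjacent(n, 25.0))                          # True
```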

    Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

    We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity), and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling, and bioimaging. The data matrices are tall and skinny, which enables the algorithms to map conveniently onto Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
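
    As a hedged illustration of why tall-and-skinny matrices suit a data-parallel engine, the NumPy sketch below mimics the common Gramian-based route to PCA: each row block contributes a small d x d matrix, and only those small matrices need to be combined. This is not the paper's implementation; the partitioning here is simulated with plain Python lists.

```python
# Illustrative sketch: with few columns d, each partition of rows computes a
# local d x d Gramian; a single sum-reduce of these small matrices is the only
# communication, after which PCA is a cheap local eigendecomposition.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                                        # "skinny": few columns
partitions = [rng.standard_normal((1000, d)) for _ in range(4)]  # "tall": many rows

# Map: local Gramians. Reduce: elementwise sum (the only communication step).
gram = sum(block.T @ block for block in partitions)

# Centering term so we decompose the covariance rather than the raw Gramian.
n = sum(block.shape[0] for block in partitions)
mean = sum(block.sum(axis=0) for block in partitions) / n
cov = (gram - n * np.outer(mean, mean)) / (n - 1)

eigvals, eigvecs = np.linalg.eigh(cov)                       # d x d: trivial on one node
components = eigvecs[:, ::-1][:, :3]                         # top-3 principal directions
print(components.shape)                                      # (8, 3)
```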

    Resilience for large ensemble computations

    With the increasing power of supercomputers, ever more detailed models of physical systems can be simulated, and ever larger problem sizes can be considered for any kind of numerical system. During the last twenty years, the performance of the fastest clusters went from the teraFLOPS domain (ASCI Red: 2.3 teraFLOPS) to the pre-exaFLOPS domain (Fugaku: 442 petaFLOPS), and we will soon have the first supercomputer with a peak performance surpassing one exaFLOPS (El Capitan: 1.5 exaFLOPS). Ensemble techniques are experiencing a renaissance with the availability of these extreme scales, and recent techniques such as particle filters will especially benefit from them. Current ensemble methods in climate science, such as ensemble Kalman filters, exhibit a linear dependency between the problem size and the ensemble size, while particle filters show an exponential dependency. Nevertheless, with the prospect of massive computing power come challenges such as power consumption and fault tolerance. The mean time between failures shrinks with the number of components in the system, and failures are expected every few hours at exascale. In this thesis, we explore and develop techniques to protect large ensemble computations from failures. We present novel approaches in differential checkpointing, elastic recovery, fully asynchronous checkpointing, and checkpoint compression. Furthermore, we design and implement a fault-tolerant particle filter with pre-emptive particle prefetching and caching. Finally, we design and implement a framework for the automatic validation and application of lossy compression in ensemble data assimilation. Altogether, we present five contributions in this thesis: the first two improve state-of-the-art checkpointing techniques, and the last three address the resilience of ensemble computations. The contributions are stand-alone fault-tolerance techniques; however, they can also be used to improve the properties of each other. For instance, we utilize elastic recovery (2nd contribution) to provide resilience in an online ensemble data assimilation framework (3rd contribution), and we build our validation framework (5th contribution) on top of our particle filter implementation (4th contribution). We further demonstrate that our contributions improve resilience and performance with experiments on various architectures such as Intel, IBM, and ARM processors.
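
    The differential-checkpointing idea mentioned among the contributions can be sketched briefly. This is a minimal sketch under my own assumptions, not the thesis's implementation; the block size, hashing scheme, and function names are illustrative.

```python
# Hedged sketch of differential checkpointing: split application state into
# fixed-size blocks, fingerprint each block, and write only the blocks whose
# fingerprint changed since the previous checkpoint.
import hashlib

BLOCK = 1 << 20                                   # 1 MiB blocks (illustrative)

def diff_checkpoint(state: bytes, prev_hashes: dict) -> tuple[dict, dict]:
    """Return (blocks to write, updated hash table)."""
    to_write, hashes = {}, {}
    for offset in range(0, len(state), BLOCK):
        chunk = state[offset:offset + BLOCK]
        h = hashlib.sha256(chunk).hexdigest()
        hashes[offset] = h
        if prev_hashes.get(offset) != h:          # unchanged blocks are skipped
            to_write[offset] = chunk
    return to_write, hashes

# First checkpoint writes everything; later ones write only dirty blocks.
state = bytearray(8 * BLOCK)
writes, table = diff_checkpoint(bytes(state), {})            # 8 blocks written
state[0:4] = b"\x01\x02\x03\x04"                             # small state update
writes, table = diff_checkpoint(bytes(state), table)
print(len(writes))                                           # 1: only block 0 changed
```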

    Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets

    Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate that 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. The basic building block of this knowledge discovery process is analytic queries, encompassing both query instrumentation and evaluation. This dissertation is centered on query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both of these types of analysis represent a higher-level abstraction over classical query models; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time-series and geospatial aspects) and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment, and are broadly applicable across other storage frameworks.
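
    One kind of "statistical synopsis" such a framework can keep per dimension is a mergeable running summary that answers mean/variance queries without scanning raw observations. The sketch below uses Welford's method with the standard parallel merge; it is an illustration of the general technique, not the dissertation's data structure, and the class name is hypothetical.

```python
# Illustrative mergeable synopsis: count/mean/M2 via Welford's online update,
# combinable across partitions with the Chan et al. parallel formula.
class Synopsis:
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def add(self, x: float):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Synopsis"):
        # Lets per-partition synopses combine without revisiting raw data.
        if other.n == 0:
            return
        delta = other.mean - self.mean
        n = self.n + other.n
        self.mean += delta * other.n / n
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.n = n

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# Two partitions summarize their observations locally, then merge.
a, b = Synopsis(), Synopsis()
for x in (20.1, 21.4, 19.8):
    a.add(x)
for x in (22.0, 20.6):
    b.add(x)
a.merge(b)
print(round(a.mean, 3), round(a.variance, 3))
```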

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer's series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA's first funding phase, and provides an overview of SPPEXA's contributions to exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    Dynamic optimization techniques for MPI-based parallel applications

    Parallel computation on cluster architectures has become the most common solution for developing high-performance scientific applications. The Message Passing Interface (MPI) [Mes94] is the message-passing library most widely used to provide communications in clusters. MPI provides a standard interface for operations such as point-to-point communication, collective communication, synchronization, and I/O. During the I/O phase, processes frequently access a common data set by issuing a large number of small non-contiguous I/O requests [NKP+96a, SR98], which can create bottlenecks in the I/O subsystem. These bottlenecks are even more severe in commodity clusters, where commercial networks are usually installed. Many of those networks, such as Fast Ethernet or Gigabit Ethernet, have high latency and low bandwidth, which introduces performance penalties during program execution. Scalability is also an important issue in cluster systems when many processors are used, since this may cause network saturation and even higher latencies. As communication-intensive parallel applications spend a significant amount of their total execution time exchanging data between processes, these problems may lead to poor performance not only in the I/O subsystem, but also in the communication phase. Therefore, it is necessary to develop techniques for improving the performance of both the communication and I/O subsystems. The main goal of this Ph.D. thesis is to improve the scalability and performance of MPI-based applications executed in clusters by reducing the overhead of the I/O and communication subsystems. In summary, this work proposes two techniques that solve these problems efficiently while managing the high complexity of a heterogeneous environment:

    • Reduction in the number of communications in collective I/O operations: This technique targets the bottleneck in the I/O subsystem. Many applications use collective I/O operations to read/write data from/to disk. One of the most widely used is the Two-Phase I/O technique extended by Thakur and Choudhary in ROMIO. In this technique, many communications among the processes are performed, which can create a bottleneck. This bottleneck is even more severe in commodity clusters, where commercial networks are usually installed, and in CMP clusters, where the I/O bus is shared by the cores of a single node. Therefore, we propose improving locality in order to reduce the number of communications performed in Two-Phase I/O.

    • Reduction of the transferred data volume: This thesis attempts to reduce the cost of exchanged messages by shrinking the data volume with lossless compression between processes. Furthermore, we propose turning compression on and off and selecting the most appropriate compression algorithm at run time, depending on the characteristics of each message, network performance, and the behavior of the compression algorithms, as sketched below.
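
    A minimal sketch of the second technique's run-time decision follows, under my own assumptions: zlib as the lossless compressor, a measured bandwidth figure, and the function name `maybe_compress` are all illustrative, and the receiver's decompression cost is ignored for brevity.

```python
# Hedged sketch: compress a message only when the estimated compress-plus-
# transfer time beats sending it raw over the measured network.
import time
import zlib

def maybe_compress(payload: bytes, bandwidth_bps: float) -> tuple[bytes, bool]:
    """Return (data to send, compressed?) for one message."""
    t0 = time.perf_counter()
    packed = zlib.compress(payload, level=1)      # fast level for the hot path
    compress_s = time.perf_counter() - t0
    raw_s = len(payload) * 8 / bandwidth_bps
    packed_s = compress_s + len(packed) * 8 / bandwidth_bps
    return (packed, True) if packed_s < raw_s else (payload, False)

# Highly compressible data on a slow (Fast Ethernet-like) link: compression wins.
msg = b"0" * 1_000_000
data, used = maybe_compress(msg, bandwidth_bps=100e6)
print(used, len(data))
```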

    Deduplication potential of HPC applications' checkpoints

    HPC systems contain an increasing number of components, which decreases the mean time between failures. Checkpoint mechanisms help long-running applications overcome such failures. A viable solution to remove the resulting pressure from the I/O backends is to deduplicate the checkpoints. However, there is little knowledge about the potential to save I/O for HPC applications by using deduplication within the checkpointing process. In this paper, we perform a broad study of the deduplication behavior of HPC application checkpointing and its impact on system design.
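
    One simple way to estimate the deduplication potential such a study measures is to chunk checkpoints, fingerprint the chunks, and compare unique bytes against total bytes. The sketch below uses fixed-size chunking as the simplest baseline (real studies also consider content-defined chunking); it is illustrative, not the paper's methodology.

```python
# Illustrative dedup-potential estimate: the fraction of checkpoint bytes that
# need not be rewritten because an identical chunk was already stored.
import hashlib
import os

def dedup_ratio(checkpoints: list[bytes], chunk_size: int = 4096) -> float:
    seen, total, unique = set(), 0, 0
    for cp in checkpoints:
        for i in range(0, len(cp), chunk_size):
            chunk = cp[i:i + chunk_size]
            total += len(chunk)
            fp = hashlib.sha256(chunk).digest()
            if fp not in seen:
                seen.add(fp)
                unique += len(chunk)
    return 1 - unique / total             # fraction of I/O dedup could save

# Two nearly identical checkpoints: roughly half the bytes are duplicates.
c1 = os.urandom(64 * 4096)
c2 = c1[:-4096] + b"\xff" * 4096
print(round(dedup_ratio([c1, c2]), 2))    # about 0.49
```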

    A Comprehensive Approach to WSN-Based ITS Applications: A Survey

    In order to perform sensing tasks, most current Intelligent Transportation Systems (ITS) rely on expensive sensors, which offer only limited functionality. A more recent trend consists of using Wireless Sensor Networks (WSN) for this purpose, which reduces the required investment and enables the development of new collaborative and intelligent applications that further contribute to improving both driving safety and traffic efficiency. This paper surveys the application of WSNs to such ITS scenarios, tackling the main issues that may arise when developing these systems. The paper is divided into sections that address different matters, including vehicle detection and classification, as well as the selection of appropriate communication protocols, network architecture, topology, and some important design parameters. In addition, in line with the multiplicity of technologies that take part in ITS, it considers WSNs not just as stand-alone systems, but also as key components of heterogeneous systems cooperating with other technologies employed in vehicular scenarios.
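
    As a hedged illustration of the vehicle-detection task mentioned above, the sketch below shows one representative approach used with roadside WSN nodes: tracking an adaptive baseline of a magnetometer signal and flagging a vehicle when the reading deviates beyond a threshold. It is not taken from the survey, and all names and constants are illustrative.

```python
# Illustrative threshold detector: a vehicle is "present" while the sensor
# reading deviates from a slowly adapting baseline by more than a threshold.
def detect_vehicles(samples, threshold=5.0, alpha=0.01):
    """Yield True per sample while a vehicle is judged to be over the sensor."""
    baseline = samples[0]
    for s in samples:
        present = abs(s - baseline) > threshold
        if not present:                    # adapt baseline only in quiet periods
            baseline = (1 - alpha) * baseline + alpha * s
        yield present

# A flat background with a short bump as a vehicle passes the node.
readings = [40.0] * 5 + [52.0, 58.0, 49.0] + [40.2] * 4
print(list(detect_vehicles(readings)))
```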