1,054 research outputs found

    ASCR/HEP Exascale Requirements Review Report

    Full text link
    This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June, 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude -- and in some cases greater -- than that available currently. 2) The growth rate of data produced by simulations is overwhelming the current ability, of both facilities and researchers, to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.Comment: 77 pages, 13 Figures; draft report, subject to further revisio

    Trace-based Performance Analysis for Hardware Accelerators

    Get PDF
    This thesis presents how performance data from hardware accelerators can be included in event logs. It extends the capabilities of trace-based performance analysis to also monitor and record data from this novel parallelization layer. The increasing awareness to power consumption of computing devices has led to an interest in hybrid computing architectures as well. High-end computers, workstations, and mobile devices start to employ hardware accelerators to offload computationally intense and parallel tasks, while at the same time retaining a highly efficient scalar compute unit for non-parallel tasks. This execution pattern is typically asynchronous so that the scalar unit can resume other work while the hardware accelerator is busy. Performance analysis tools provided by the hardware accelerator vendors cover the situation of one host using one device very well. Yet, they do not address the needs of the high performance computing community. This thesis investigates ways to extend existing methods for recording events from highly parallel applications to also cover scenarios in which hardware accelerators aid these applications. After introducing a generic approach that is suitable for any API based acceleration paradigm, the thesis derives a suggestion for a generic performance API for hardware accelerators and its implementation with NVIDIA CUPTI. In a next step the visualization of event logs containing data from execution streams on different levels of parallelism is discussed. In order to overcome the limitations of classic performance profiles and timeline displays, a graph-based visualization using Parallel Performance Flow Graphs (PPFGs) is introduced. This novel technical approach is using program states in order to display similarities and differences between the potentially very large number of event streams and, thus, enables a fast way to spot load imbalances. The thesis concludes with the in-depth analysis of a case-study of PIConGPU---a highly parallel, multi-hybrid plasma physics simulation---that benefited greatly from the developed performance analysis methods.Diese Dissertation zeigt, wie der Ablauf von Anwendungsteilen, die auf Hardwarebeschleuniger ausgelagert wurden, als Programmspur mit aufgezeichnet werden kann. Damit wird die bekannte Technik der Leistungsanalyse von Anwendungen mittels Programmspuren so erweitert, dass auch diese neue Parallelitätsebene mit erfasst wird. Die Beschränkungen von Computersystemen bezüglich der elektrischen Leistungsaufnahme hat zu einer steigenden Anzahl von hybriden Computerarchitekturen geführt. Sowohl Hochleistungsrechner, aber auch Arbeitsplatzcomputer und mobile Endgeräte nutzen heute Hardwarebeschleuniger um rechenintensive, parallele Programmteile auszulagern und so den skalaren Hauptprozessor zu entlasten und nur für nicht parallele Programmteile zu verwenden. Dieses Ausführungsschema ist typischerweise asynchron: der Skalarprozessor kann, während der Hardwarebeschleuniger rechnet, selbst weiterarbeiten. Die Leistungsanalyse-Werkzeuge der Hersteller von Hardwarebeschleunigern decken den Standardfall (ein Host-System mit einem Hardwarebeschleuniger) sehr gut ab, scheitern aber an einer Unterstützung von hochparallelen Rechnersystemen. Die vorliegende Dissertation untersucht, in wie weit auch multi-hybride Anwendungen die Aktivität von Hardwarebeschleunigern aufzeichnen können. Dazu wird die vorhandene Methode zur Erzeugung von Programmspuren für hochparallele Anwendungen entsprechend erweitert. In dieser Untersuchung wird zuerst eine allgemeine Methodik entwickelt, mit der sich für jede API-gestützte Hardwarebeschleunigung eine Programmspur erstellen lässt. Darauf aufbauend wird eine eigene Programmierschnittstelle entwickelt, die es ermöglicht weitere leistungsrelevante Daten aufzuzeichnen. Die Umsetzung dieser Schnittstelle wird am Beispiel von NVIDIA CUPTI darstellt. Ein weiterer Teil der Arbeit beschäftigt sich mit der Darstellung von Programmspuren, welche Aufzeichnungen von den unterschiedlichen Parallelitätsebenen enthalten. Um die Einschränkungen klassischer Leistungsprofile oder Zeitachsendarstellungen zu überwinden, wird mit den parallelen Programmablaufgraphen (PPFGs) eine neue graphenbasisierte Darstellungsform eingeführt. Dieser neuartige Ansatz zeigt eine Programmspur als eine Folge von Programmzuständen mit gemeinsamen und unterchiedlichen Abläufen. So können divergierendes Programmverhalten und Lastimbalancen deutlich einfacher lokalisiert werden. Die Arbeit schließt mit der detaillierten Analyse von PIConGPU -- einer multi-hybriden Simulation aus der Plasmaphysik --, die in großem Maße von den in dieser Arbeit entwickelten Analysemöglichkeiten profiert hat

    Survey and Analysis of Production Distributed Computing Infrastructures

    Full text link
    This report has two objectives. First, we describe a set of the production distributed infrastructures currently available, so that the reader has a basic understanding of them. This includes explaining why each infrastructure was created and made available and how it has succeeded and failed. The set is not complete, but we believe it is representative. Second, we describe the infrastructures in terms of their use, which is a combination of how they were designed to be used and how users have found ways to use them. Applications are often designed and created with specific infrastructures in mind, with both an appreciation of the existing capabilities provided by those infrastructures and an anticipation of their future capabilities. Here, the infrastructures we discuss were often designed and created with specific applications in mind, or at least specific types of applications. The reader should understand how the interplay between the infrastructure providers and the users leads to such usages, which we call usage modalities. These usage modalities are really abstractions that exist between the infrastructures and the applications; they influence the infrastructures by representing the applications, and they influence the ap- plications by representing the infrastructures

    New cross-layer techniques for multi-criteria scheduling in large-scale systems

    Get PDF
    The global ecosystem of information technology (IT) is in transition to a new generation of applications that require more and more intensive data acquisition, processing and storage systems. As a result of that change towards data intensive computing, there is a growing overlap between high performance computing (HPC) and Big Data techniques in applications, since many HPC applications produce large volumes of data, and Big Data needs HPC capabilities. The hypothesis of this PhD. thesis is that the potential interoperability and convergence of the HPC and Big Data systems are crucial for the future, being essential the unification of both paradigms to address a broad spectrum of research domains. For this reason, the main objective of this Phd. thesis is purposing and developing a monitoring system to allow the HPC and Big Data convergence, thanks to giving information about behaviors of applications in a system which execute both kind of them, giving information to improve scalability, data locality, and to allow adaptability to large scale computers. To achieve this goal, this work is focused on the design of resource monitoring and discovery to exploit parallelism at all levels. These collected data are disseminated to facilitate global improvements at the whole system, and, thus, avoid mismatches between layers. The result is a two-level monitoring framework (both at node and application level) with a low computational load, scalable, and that can communicate with different modules thanks to an API provided for this purpose. All data collected is disseminated to facilitate the implementation of improvements globally throughout the system, and thus avoid mismatches between layers, which combined with the techniques applied to deal with fault tolerance, makes the system robust and with high availability. On the other hand, the developed framework includes a task scheduler capable of managing the launch of applications, their migration between nodes, as well as the possibility of dynamically increasing or decreasing the number of processes. All these thanks to the cooperation with other modules that are integrated into LIMITLESS, and whose objective is to optimize the execution of a stack of applications based on multi-criteria policies. This scheduling mode is called coarse-grain scheduling based on monitoring. For better performance and in order to further reduce the overhead during the monitorization, different optimizations have been applied at different levels to try to reduce communications between components, while trying to avoid the loss of information. To achieve this objective, data filtering techniques, Machine Learning (ML) algorithms, and Neural Networks (NN) have been used. In order to improve the scheduling process and to design new multi-criteria scheduling policies, the monitoring information has been combined with other ML algorithms to identify (through classification algorithms) the applications and their execution phases, doing offline profiling. Thanks to this feature, LIMITLESS can detect which phase is executing an application and tries to share the computational resources with other applications that are compatible (there is no performance degradation between them when both are running at the same time). This feature is called fine-grain scheduling, and can reduce the makespan of the use cases while makes efficient use of the computational resources that other applications do not use.El ecosistema global de las tecnologías de la información (IT) se encuentra en transición a una nueva generación de aplicaciones que requieren sistemas de adquisición de datos, procesamiento y almacenamiento cada vez más intensivo. Como resultado de ese cambio hacia la computación intensiva de datos, existe una superposición, cada vez mayor, entre la computación de alto rendimiento (HPC) y las técnicas Big Data en las aplicaciones, pues muchas aplicaciones HPC producen grandes volúmenes de datos, y Big Data necesita capacidades HPC. La hipótesis de esta tesis es que hay un gran potencial en la interoperabilidad y convergencia de los sistemas HPC y Big Data, siendo crucial para el futuro tratar una unificación de ambos para hacer frente a un amplio espectro de problemas de investigación. Por lo tanto, el objetivo principal de esta tesis es la propuesta y desarrollo de un sistema de monitorización que facilite la convergencia de los paradigmas HPC y Big Data gracias a la provisión de datos sobre el comportamiento de las aplicaciones en un entorno en el que se pueden ejecutar aplicaciones de ambos mundos, ofreciendo información útil para mejorar la escalabilidad, la explotación de la localidad de datos y la adaptabilidad en los computadores de gran escala. Para lograr este objetivo, el foco se ha centrado en el diseño de mecanismos de monitorización y localización de recursos para explotar el paralelismo en todos los niveles de la pila del software. El resultado es un framework de monitorización en dos niveles (tanto a nivel de nodo como de aplicación) con una baja carga computacional, escalable, y que se puede comunicar con distintos módulos gracias a una API proporcionada para tal objetivo. Todos datos recolectados se difunden para facilitar la realización de mejoras de manera global en todo el sistema, y así evitar desajustes entre capas, lo que combinado con las técnicas aplicadas para lidiar con la tolerancia a fallos, hace que el sistema sea robusto y con una alta disponibilidad. Por otro lado, el framework desarrollado incluye un planificador de tareas capaz de gestionar el lanzamiento de aplicaciones, la migración de las mismas entre nodos, además de la posibilidad de incrementar o disminuir su número de procesos de forma dinámica. Todo ello gracias a la cooperación con otros módulos que se integran en LIMITLESS, y cuyo objetivo es optimizar la ejecución de una pila de aplicaciones en base a políticas multicriterio. Esta funcionalidad se llama planificación de grano grueso. Para un mejor desempeño y con el objetivo de reducir más aún la carga durante la ejecución, se han aplicado distintas optimizaciones en distintos niveles para tratar de reducir las comunicaciones entre componentes, a la vez que se trata de evitar la pérdida de información. Para lograr este objetivo se ha hecho uso de técnicas de filtrado de datos, algoritmos de Machine Learning (ML), y Redes Neuronales (NN). Finalmente, para obtener mejores resultados en la planificación de aplicaciones y para diseñar nuevas políticas de planificación multi-criterio, los datos de monitorización recolectados han sido combinados con nuevos algoritmos de ML para identificar (por medio de algoritmos de clasificación) aplicaciones y sus fases de ejecución. Todo ello realizando tareas de profiling offline. Gracias a estas técnicas, LIMITLESS puede detectar en qué fase de su ejecución se encuentra una determinada aplicación e intentar compartir los recursos de computacionales con otras aplicaciones que sean compatibles (no se produce una degradación del rendimiento entre ellas cuando ambas se ejecutan a la vez en el mismo nodo). Esta funcionalidad se llama planificación de grano fino y puede reducir el tiempo total de ejecución de la pila de aplicaciones en los casos de uso porque realiza un uso más eficiente de los recursos de las máquinas.This PhD dissertation has been partially supported by the Spanish Ministry of Science and Innovation under an FPI fellowship associated to a National Project with reference TIN2016-79637-P (from July 1, 2018 to October 10, 2021)Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: Félix García Carballeira.- Secretario: Pedro Ángel Cuenca Castillo.- Vocal: María Cristina V. Marinesc

    A Machine Learning Enhanced Scheme for Intelligent Network Management

    Get PDF
    The versatile networking services bring about huge influence on daily living styles while the amount and diversity of services cause high complexity of network systems. The network scale and complexity grow with the increasing infrastructure apparatuses, networking function, networking slices, and underlying architecture evolution. The conventional way is manual administration to maintain the large and complex platform, which makes effective and insightful management troublesome. A feasible and promising scheme is to extract insightful information from largely produced network data. The goal of this thesis is to use learning-based algorithms inspired by machine learning communities to discover valuable knowledge from substantial network data, which directly promotes intelligent management and maintenance. In the thesis, the management and maintenance focus on two schemes: network anomalies detection and root causes localization; critical traffic resource control and optimization. Firstly, the abundant network data wrap up informative messages but its heterogeneity and perplexity make diagnosis challenging. For unstructured logs, abstract and formatted log templates are extracted to regulate log records. An in-depth analysis framework based on heterogeneous data is proposed in order to detect the occurrence of faults and anomalies. It employs representation learning methods to map unstructured data into numerical features, and fuses the extracted feature for network anomaly and fault detection. The representation learning makes use of word2vec-based embedding technologies for semantic expression. Next, the fault and anomaly detection solely unveils the occurrence of events while failing to figure out the root causes for useful administration so that the fault localization opens a gate to narrow down the source of systematic anomalies. The extracted features are formed as the anomaly degree coupled with an importance ranking method to highlight the locations of anomalies in network systems. Two types of ranking modes are instantiated by PageRank and operation errors for jointly highlighting latent issue of locations. Besides the fault and anomaly detection, network traffic engineering deals with network communication and computation resource to optimize data traffic transferring efficiency. Especially when network traffic are constrained with communication conditions, a pro-active path planning scheme is helpful for efficient traffic controlling actions. Then a learning-based traffic planning algorithm is proposed based on sequence-to-sequence model to discover hidden reasonable paths from abundant traffic history data over the Software Defined Network architecture. Finally, traffic engineering merely based on empirical data is likely to result in stale and sub-optimal solutions, even ending up with worse situations. A resilient mechanism is required to adapt network flows based on context into a dynamic environment. Thus, a reinforcement learning-based scheme is put forward for dynamic data forwarding considering network resource status, which explicitly presents a promising performance improvement. In the end, the proposed anomaly processing framework strengthens the analysis and diagnosis for network system administrators through synthesized fault detection and root cause localization. The learning-based traffic engineering stimulates networking flow management via experienced data and further shows a promising direction of flexible traffic adjustment for ever-changing environments

    The Inter-cloud meta-scheduling

    Get PDF
    Inter-cloud is a recently emerging approach that expands cloud elasticity. By facilitating an adaptable setting, it purposes at the realization of a scalable resource provisioning that enables a diversity of cloud user requirements to be handled efficiently. This study’s contribution is in the inter-cloud performance optimization of job executions using metascheduling concepts. This includes the development of the inter-cloud meta-scheduling (ICMS) framework, the ICMS optimal schemes and the SimIC toolkit. The ICMS model is an architectural strategy for managing and scheduling user services in virtualized dynamically inter-linked clouds. This is achieved by the development of a model that includes a set of algorithms, namely the Service-Request, Service-Distribution, Service-Availability and Service-Allocation algorithms. These along with resource management optimal schemes offer the novel functionalities of the ICMS where the message exchanging implements the job distributions method, the VM deployment offers the VM management features and the local resource management system details the management of the local cloud schedulers. The generated system offers great flexibility by facilitating a lightweight resource management methodology while at the same time handling the heterogeneity of different clouds through advanced service level agreement coordination. Experimental results are productive as the proposed ICMS model achieves enhancement of the performance of service distribution for a variety of criteria such as service execution times, makespan, turnaround times, utilization levels and energy consumption rates for various inter-cloud entities, e.g. users, hosts and VMs. For example, ICMS optimizes the performance of a non-meta-brokering inter-cloud by 3%, while ICMS with full optimal schemes achieves 9% optimization for the same configurations. The whole experimental platform is implemented into the inter-cloud Simulation toolkit (SimIC) developed by the author, which is a discrete event simulation framework

    High Performance Computing Facility Operational Assessment, FY 2011 Oak Ridge Leadership Computing Facility

    Get PDF
    Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) continues to deliver the most powerful resources in the U.S. for open science. At 2.33 petaflops peak performance, the Cray XT Jaguar delivered more than 1.4 billion core hours in calendar year (CY) 2011 to researchers around the world for computational simulations relevant to national and energy security; advancing the frontiers of knowledge in physical sciences and areas of biological, medical, environmental, and computer sciences; and providing world-class research facilities for the nation's science enterprise. Users reported more than 670 publications this year arising from their use of OLCF resources. Of these we report the 300 in this review that are consistent with guidance provided. Scientific achievements by OLCF users cut across all range scales from atomic to molecular to large-scale structures. At the atomic scale, researchers discovered that the anomalously long half-life of Carbon-14 can be explained by calculating, for the first time, the very complex three-body interactions between all the neutrons and protons in the nucleus. At the molecular scale, researchers combined experimental results from LBL's light source and simulations on Jaguar to discover how DNA replication continues past a damaged site so a mutation can be repaired later. Other researchers combined experimental results from ORNL's Spallation Neutron Source and simulations on Jaguar to reveal the molecular structure of ligno-cellulosic material used in bioethanol production. This year, Jaguar has been used to do billion-cell CFD calculations to develop shock wave compression turbo machinery as a means to meet DOE goals for reducing carbon sequestration costs. General Electric used Jaguar to calculate the unsteady flow through turbo machinery to learn what efficiencies the traditional steady flow assumption is hiding from designers. Even a 1% improvement in turbine design can save the nation billions of gallons of fuel

    Scalable Observation, Analysis, and Tuning for Parallel Portability in HPC

    Get PDF
    It is desirable for general productivity that high-performance computing applications be portable to new architectures, or can be optimized for new workflows and input types, without the need for costly code interventions or algorithmic re-writes. Parallel portability programming models provide the potential for high performance and productivity, however they come with a multitude of runtime parameters that can have significant impact on execution performance. Selecting the optimal set of parameters, so that HPC applications perform well in different system environments and on different input data sets, is not trivial.This dissertation maps out a vision for addressing this parallel portability challenge, and then demonstrates this plan through an effective combination of observability, analysis, and in situ machine learning techniques. A platform for general-purpose observation in HPC contexts is investigated, along with support for its use in human-in-the-loop performance understanding and analysis. The dissertation culminates in a demonstration of lessons learned in order to provide automated tuning of HPC applications utilizing parallel portability frameworks

    Reinforcing Digital Trust for Cloud Manufacturing Through Data Provenance Using Ethereum Smart Contracts

    Get PDF
    Cloud Manufacturing(CMfg) is an advanced manufacturing model that caters to fast-paced agile requirements (Putnik, 2012). For manufacturing complex products that require extensive resources, manufacturers explore advanced manufacturing techniques like CMfg as it becomes infeasible to achieve high standards through complete ownership of manufacturing artifacts (Kuan et al., 2011). CMfg, with other names such as Manufacturing as a Service (MaaS) and Cyber Manufacturing (NSF, 2020), addresses the shortcoming of traditional manufacturing by building a virtual cyber enterprise of geographically distributed entities that manufacture custom products through collaboration. With manufacturing venturing into cyberspace, Digital Trust issues concerning product quality, data, and intellectual property security, become significant concerns (R. Li et al., 2019). This study establishes a trust mechanism through data provenance for ensuring digital trust between various stakeholders involved in CMfg. A trust model with smart contracts built on the Ethereum blockchain implements data provenance in CMfg. The study covers three data provenance models using Ethereum smart contracts for establishing digital trust in CMfg. These are Product Provenance, Order Provenance, and Operational Provenance. The models of provenance together address the most important questions regarding CMfg: What goes into the product, who manufactures the product, who transports the products, under what conditions the products are manufactured, and whether regulatory constraints/requisites are met
    corecore