    On the co-design of scientific applications and long vector architectures

    The landscape of High Performance Computing (HPC) system architectures keeps expanding with new technologies and increased complexity. To improve the efficiency of next-generation compute devices, architects are looking for solutions beyond the commodity CPU approach. In 2021, the five most powerful supercomputers in the world used either GP-GPU (general-purpose computing on graphics processing units) accelerators or customized CPUs specially designed for HPC applications. This trend is expected to grow in the coming years, driven by the compute demands of science and industry. As architectures evolve, the ecosystem of tools and applications must follow. Choices such as the number of cores per socket, the floating-point units per core, and the bandwidth through the memory hierarchy have a large impact on the power consumption and compute capabilities of the devices. To balance CPUs and accelerators, designers require accurate tools for analyzing and predicting the impact of new architectural features on the performance of complex scientific applications at scale. In such a large design space, capturing and modeling with simulators the complex interactions between the system software and hardware components is a daunting challenge. Moreover, applications must be able to exploit those designs with aggressive compute capabilities and memory bandwidth configurations. Algorithms and data structures will need to be redesigned accordingly to expose a high degree of data-level parallelism, allowing them to scale in large systems. Therefore, next-generation computing devices will be the result of a co-design effort in hardware and applications, supported by advanced simulation tools. In this thesis, we focus our work on the co-design of scientific applications and long vector architectures. We significantly extend a multi-scale simulation toolchain, enabling accurate performance and power estimations of large-scale HPC systems. Through simulation, we explore the large design space of current HPC trends over a wide range of applications. We extract speedup and energy consumption figures, analyzing the trade-offs and optimal configurations for each of the applications. We describe in detail the optimization process of two challenging applications on real vector accelerators, achieving outstanding operational performance and full memory bandwidth utilization. Overall, we provide evidence-based architectural and programming recommendations that will serve as hardware and software co-design guidelines for the next generation of specialized compute devices.
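    The redesign for data-level parallelism mentioned above can be illustrated with a minimal, hypothetical sketch (not taken from the thesis): an element-wise loop rewritten as a whole-array operation, the shape of computation that long vector units reward.

        # Minimal sketch: exposing data-level parallelism by replacing an
        # element-wise scalar loop with a single array expression.
        import numpy as np

        def axpy_scalar(a, x, y):
            # One element per iteration: little work for a vector unit to exploit.
            out = np.empty_like(y)
            for i in range(len(y)):
                out[i] = a * x[i] + y[i]
            return out

        def axpy_vectorized(a, x, y):
            # Whole-array expression: the entire loop becomes data-parallel work
            # that maps naturally onto long vector registers.
            return a * x + y

        x = np.random.rand(1_000_000)
        y = np.random.rand(1_000_000)
        assert np.allclose(axpy_scalar(2.0, x, y), axpy_vectorized(2.0, x, y))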

    Evaluation of low-power architectures in a scientific computing environment

    HPC (High Performance Computing) represents, together with theory and experiments, the third pillar of science. Through HPC, scientists can simulate phenomena otherwise impossible to study. The need to perform larger and more accurate simulations requires HPC to improve every day. HPC is constantly looking for new computational platforms that can improve cost and power efficiency. The Mont-Blanc project is an EU-funded research project that aims to study new hardware and software solutions that can improve the efficiency of HPC systems. The vision of the project is to leverage the fast-growing market of mobile devices to develop the next generation of supercomputers. In this work we contribute to the objectives of the Mont-Blanc project by evaluating the performance of production scientific applications on innovative low-power architectures. To do so, we describe our experiences porting and evaluating state-of-the-art scientific applications on the Mont-Blanc prototype, the first HPC system built with commodity low-power embedded technology. We then extend our study to compare off-the-shelf ARMv8 platforms. We finally discuss the most significant issues encountered during the development of the Mont-Blanc prototype system.

    Double and Multiple Stellar Systems: Observational Techniques, Data Administration and Scientific Results

    This dissertation, written as a compendium of research articles, was proposed and supervised by J.A. Docobo, Full Professor in Astronomy and Director of the Ramon María Aller Astronomical Observatory of the University of Santiago de Compostela. The focus is on the practical application of speckle interferometry techniques, including the initiation and development of several observational campaigns employing OARMA's eMCCD speckle camera attached to the 2.6 m telescope at BAO. Additionally, we present 26 orbits of accessible binaries of the Southern hemisphere based on SOAR speckle data. Also, the orbital information of the double-lined spectroscopic binaries HD 183255, HD 114882, and HD 30712, together with new speckle measurements performed using large telescopes, allowed us to determine the main physical parameters of these systems.
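    As background on how such orbits translate into physical parameters, the standard relation for visual binaries (not specific to this dissertation) links the apparent orbit and parallax to the total mass, and the spectroscopic mass ratio of a double-lined system then separates the individual components:

        % a'' = semi-major axis and \varpi'' = parallax, both in arcseconds;
        % P = orbital period in years; masses in solar units;
        % K_1, K_2 = radial-velocity semi-amplitudes of the two components.
        \mathcal{M}_1 + \mathcal{M}_2 = \frac{a''^{\,3}}{\varpi''^{\,3} P^{2}},
        \qquad
        \frac{\mathcal{M}_1}{\mathcal{M}_2} = \frac{K_2}{K_1}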

    Customer-oriented risk assessment in Network Utilities

    For companies that distribute services such as telecommunications, water, energy, or gas, the quality perceived by customers has a strong impact on the fulfillment of financial goals: it increases demand and reduces the risk of customer churn (loss of customers). Failures in these companies' networks may affect customers on a massive scale, increasing their intention to leave the company. Therefore, maintenance performance, and specifically service reliability, has a strong influence on financial goals. This paper proposes a methodology to evaluate the contribution of the maintenance department in economic terms, based on service unreliability caused by network failures. The methodology aims to provide an analysis of failures that facilitates decision making about maintenance (preventive/predictive and corrective) costs versus the negative impact on end-customer invoicing, based on the probability of losing customers. Survival analysis of recurrent failures with the General Renewal Process distribution is used for this novel purpose, with the intention of establishing a standard procedure to calculate the expected financial impact of maintenance over a given period of time. Geographical areas of coverage are also distinguished, enabling the comparison of different technical or management alternatives. Two case studies in a telecommunications services company are presented to illustrate the applicability of the methodology.
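    For readers unfamiliar with the General Renewal Process, the minimal Python sketch below (illustrative assumptions, not the paper's implementation) simulates recurrent failures with a Weibull baseline and a Kijima Type I virtual age, producing the expected failure count that a maintenance cost model of this kind would feed on.

        # General Renewal Process sketch: Weibull(alpha, beta) baseline with
        # Kijima Type I virtual age, V_i = V_{i-1} + q * X_i.  The restoration
        # factor q interpolates between as-good-as-new repair (q = 0) and
        # minimal, as-bad-as-old repair (q = 1).
        import math, random

        def next_interval(alpha, beta, v, rng):
            # Inverse-transform sample of the time to the next failure,
            # conditional on the current virtual age v.
            u = 1.0 - rng.random()                 # u in (0, 1]
            return alpha * ((v / alpha) ** beta - math.log(u)) ** (1.0 / beta) - v

        def expected_failures(alpha, beta, q, horizon, n_runs=20000, seed=1):
            rng = random.Random(seed)
            total = 0
            for _ in range(n_runs):
                t, v = 0.0, 0.0
                while True:
                    x = next_interval(alpha, beta, v, rng)
                    t += x
                    if t > horizon:
                        break
                    v += q * x    # imperfect repair keeps a share q of the last interval's ageing
                    total += 1
            return total / n_runs

        # Illustrative numbers only: characteristic life 5 years, wear-out shape 2.2,
        # imperfect repair (q = 0.4), 3-year planning horizon.
        print(expected_failures(alpha=5.0, beta=2.2, q=0.4, horizon=3.0))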

    Failure mode prediction and energy forecasting of PV plants to assist dynamic maintenance tasks by ANN based models

    In the field of renewable energy, reliability analysis techniques that combine the operating time of the system with observations of operational and environmental conditions are gaining importance over time. In this paper, reliability models are adapted to incorporate monitoring data from operating assets, as well as information on their environmental conditions, into their calculations. To that end, a logical decision tool based on two artificial neural network models is presented. This tool allows asset reliability analyses to be updated according to changes in operational and/or environmental conditions. The proposed tool could easily be automated within a supervisory control and data acquisition system, where reference values and the corresponding warnings and alarms could then be generated dynamically by the tool. Thanks to this capability, on-line diagnosis and/or prediction of potential asset degradation can certainly be improved. The reliability models in the tool are developed according to the available amount of failure data and are used for early detection of degradation in energy production due to functional failures of the power inverter and solar trackers. Another capability of the presented tool is to assess the economic risk associated with the system under existing conditions and for a certain period of time. This information can then also be used to trigger preventive maintenance activities.
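    A minimal sketch of one of the two ANN roles described above, under assumed inputs and synthetic data (none of it from the paper): a regression network forecasts the expected plant output from environmental conditions, and a measured shortfall beyond a tolerance raises a degradation warning.

        # Forecast-and-compare sketch for PV degradation warnings.
        import numpy as np
        from sklearn.neural_network import MLPRegressor
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        irradiance = rng.uniform(100.0, 1000.0, 500)    # plane-of-array irradiance, W/m^2
        temperature = rng.uniform(10.0, 60.0, 500)      # module temperature, deg C
        # Synthetic "healthy plant" output with a mild temperature derating plus noise.
        power = 0.8 * irradiance * (1.0 - 0.004 * (temperature - 25.0)) + rng.normal(0.0, 10.0, 500)

        X = np.column_stack([irradiance, temperature])
        model = make_pipeline(StandardScaler(),
                              MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=3000, random_state=0))
        model.fit(X, power)

        def degradation_warning(irr, temp, measured_power, tolerance=0.10):
            # Warn when measured output falls more than `tolerance` below the forecast.
            expected = model.predict(np.array([[irr, temp]]))[0]
            return measured_power < (1.0 - tolerance) * expected

        print(degradation_warning(800.0, 35.0, measured_power=520.0))  # shortfall -> warning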

    Analysis of the impact of the Asset Health Index in a Maintenance Strategy

    Hosted by the Johannes Kepler University, Linz, Austria, May 23-24, 2019 - European Safety, Reliability & Data Association (ESReDA). For many years, the asset management methodologies used in industry focused on knowing and analysing the operational control of daily work and the impact of maintenance on availability. Later, costs became the priority, and strategies focused on assessing a longer lifecycle and on optimizing processes and contracts. More recently, standards have introduced concepts such as “knowing and managing the risks”, and the target is to prioritize maintenance tasks for the critical assets. However, even with a balanced asset management model for the operational environment, many facilities in the Oil & Gas sector are reaching the end of their initially estimated lifecycle. The new challenges are to extend the life of the main items of the facilities or, at least, to find the optimal replacement moment that guarantees that the maintenance strategy is being optimized. The Asset Health Index (AHI) methodology considers a theoretical lifecycle for an item, in which the probability of failure increases as the item approaches the end of its useful life. Taking this theoretical lifecycle as a base, different operating location factors or O&M aspects can modify this period. All these factors are quantified and allow us to calculate a new theoretical profile. This paper assesses the impact of the AHI on maintenance strategy optimisation. The AHI enables us to compare future alternative cost profiles and to assess the impact on the failure probability of the item. As a result, we are able to quantify the risk taken when we extend the operation of an item, and its impact on operational costs.
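    The short sketch below captures the generic shape of the AHI idea described above, with illustrative numbers rather than the paper's factor tables: a theoretical service life is adjusted by location and O&M factors, and the resulting effective life drives the failure probability used to compare life-extension scenarios.

        # Generic AHI-style life adjustment and failure probability (illustrative only).
        import math

        def effective_life(theoretical_life, location_factor, om_factor):
            # Factors > 1 represent benign conditions that extend life; < 1 shorten it.
            return theoretical_life * location_factor * om_factor

        def failure_probability(age, adjusted_life, shape=3.0):
            # Weibull CDF with characteristic life tied to the adjusted service life.
            return 1.0 - math.exp(-((age / adjusted_life) ** shape))

        life = effective_life(theoretical_life=30.0, location_factor=0.9, om_factor=1.1)
        for extra_years in (0, 5, 10):
            age = 25.0 + extra_years
            print(f"age {age:.0f} y -> P(failure) = {failure_probability(age, life):.3f}")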

    On the role of Prognostics and Health Management in advanced maintenance systems

    The advanced use of Information and Communication Technologies is changing the way systems are managed and maintained. A great number of techniques and methods have emerged in the light of these advances, providing accurate knowledge of the evolution of a system’s condition and of its remaining useful life. These advances are recognized as outcomes of an innovative discipline, nowadays discussed under the term Prognostics and Health Management (PHM). In order to analyze how maintenance will change by using PHM, a conceptual model built upon three views is proposed. The model highlights: (i) how PHM may impact the definition of maintenance policies; (ii) how PHM fits within Condition Based Maintenance (CBM); and (iii) how PHM can be integrated into Reliability Centered Maintenance (RCM) programs. The conceptual model is the research finding of this review note and helps to discuss the role of PHM in advanced maintenance systems. EU Framework Programme Horizon 2020, 645733 - Sustain-Owner - H2020-MSCA-RISE-201
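    As a concrete, hypothetical illustration of the prognostic element PHM adds (not the note's model), the sketch below extrapolates a monitored degradation indicator to a failure threshold to obtain a remaining-useful-life estimate.

        # Remaining useful life from a linear extrapolation of a degradation trend.
        import numpy as np

        hours = np.array([0, 100, 200, 300, 400, 500], dtype=float)
        wear  = np.array([0.02, 0.05, 0.09, 0.14, 0.18, 0.23])   # monitored indicator
        threshold = 0.60                                          # functional-failure level

        slope, intercept = np.polyfit(hours, wear, 1)             # fitted degradation trend
        time_at_threshold = (threshold - intercept) / slope
        rul = time_at_threshold - hours[-1]
        print(f"estimated RUL: {rul:.0f} h")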

    A framework for effective management of condition based maintenance programs in the context of industrial development of E-Maintenance strategies

    CBM (Condition Based Maintenance) solutions are increasingly present in industrial systems due to two main circumstances: an unprecedented, rapid evolution in the capture and analysis of data, and a significant cost reduction in the supporting technologies. CBM programs in industrial systems can become extremely complex, especially when considering the effective introduction of the new capabilities provided by the PHM (Prognostics and Health Management) and E-maintenance disciplines. In this scenario, any CBM solution involves the management of numerous technical aspects that the maintenance manager needs to understand in order to implement the solution properly and effectively, according to the company’s strategy. This paper provides a comprehensive representation of the key components of a generic CBM solution, presented as a framework, or supporting structure, for the effective management of CBM programs. The concept of a “symptom of failure”, its corresponding analysis techniques (introduced by ISO 13379-1 and linked with RCM/FMEA analysis), and other international standards for CBM open-software application development (for instance, ISO 13374 and OSA-CBM) are used in the paper to develop the framework. An original template has been developed, adopting the formal structure of RCM analysis templates, to integrate the information of the PHM techniques used to capture failure mode behaviour and to manage maintenance. Finally, a case study illustrates the framework using this template. Gobierno de Andalucía P11-TEP-7303 M
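    A hedged sketch of the kind of template row the framework describes, linking an RCM/FMEA failure mode to its observable symptom and the PHM technique that captures it; the field names and example values are illustrative assumptions, not the paper's template.

        # Illustrative data structure for one row of an RCM-style CBM template.
        from dataclasses import dataclass

        @dataclass
        class CbmTemplateRow:
            asset: str
            failure_mode: str          # from the RCM/FMEA analysis
            symptom: str               # observable evidence of the failure mode (ISO 13379-1 sense)
            descriptor: str            # measured parameter that carries the symptom
            technique: str             # detection / prognostic technique applied to the descriptor
            maintenance_task: str      # task triggered when the symptom is confirmed

        row = CbmTemplateRow(
            asset="pump P-101",
            failure_mode="bearing wear",
            symptom="rising vibration at bearing frequencies",
            descriptor="velocity spectrum, 10-1000 Hz band",
            technique="envelope analysis with trend-based alarm",
            maintenance_task="plan bearing replacement",
        )
        print(row.failure_mode, "->", row.technique)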

    Criticality Analysis for Maintenance Purposes: A Study for Complex In‐service Engineering Assets

    The purpose of this paper is to establish a basis for a criticality analysis, considered here as a prerequisite, the first required step in reviewing the current maintenance programs of complex in‐service engineering assets. Review is understood as a reality check, a test of whether the current maintenance activities are well aligned with actual business objectives and needs. This paper describes an efficient and rational working process and a model that results in a hierarchy of assets, based on risk analysis and cost–benefit principles, ranked according to their importance for the business to meet specific goals. Starting from a multicriteria analysis, the proposed model converts the relevant criteria impacting equipment criticality into a single score representing the criticality level. Although the detailed implementation of techniques like Root Cause Failure Analysis and Reliability Centered Maintenance is recommended for further optimization of the maintenance activities, the reasons why criticality analysis deserves the attention of engineers and of maintenance and reliability managers are precisely explained here. A case study is presented to help the reader understand the process and to operationalize the model.
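    The single-score conversion described above can be sketched as follows; the criteria, weights, and scales are illustrative assumptions, not the paper's model.

        # Multicriteria criticality score: frequency class times weighted consequences.
        def criticality_score(failure_frequency, consequences, weights):
            # consequences / weights: dicts keyed by criterion (safety, environment,
            # production loss, maintenance cost), each consequence scored on 1-10.
            consequence = sum(weights[c] * consequences[c] for c in consequences)
            return failure_frequency * consequence

        weights = {"safety": 0.4, "environment": 0.2, "production": 0.3, "cost": 0.1}
        assets = {
            "compressor K-200": criticality_score(
                4, {"safety": 8, "environment": 5, "production": 9, "cost": 6}, weights),
            "cooling pump P-7": criticality_score(
                2, {"safety": 3, "environment": 2, "production": 7, "cost": 4}, weights),
        }
        # Rank assets from most to least critical.
        for name, score in sorted(assets.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{name}: {score:.1f}")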