433,453 research outputs found

    Development of an oceanographic application in HPC

    Get PDF
    High Performance Computing (HPC) is used to run advanced application programs efficiently, reliably, and quickly. In earlier decades, the performance of HPC applications was evaluated in terms of speed, thread scalability, and memory hierarchy. Today it is also essential to consider the energy, or power, consumed by the system while executing an application. In fact, High Power Consumption (HPC) is one of the biggest problems for the High Performance Computing (HPC) community and one of the major obstacles to the design of exascale systems. The new generations of HPC systems aim to achieve exaflop performance and will demand even more energy for processing and cooling; indeed, the growth of HPC systems is currently limited by energy issues. Recently, many research centers have focused on the automatic tuning of HPC applications, which requires a broad study of such applications in terms of power efficiency. In this context, this paper proposes the study of an oceanographic application, named OceanVar, which implements a Domain Decomposition based 4D Variational model (DD-4DVar), one of the most commonly used HPC applications, evaluating not only the classic aspects of performance but also aspects related to power efficiency in different case studies. This work was carried out at BSC (Barcelona Supercomputing Center), Spain, within the Mont-Blanc project, performing the tests first on an HCA server with Intel technology and then on the Thunder mini-cluster with ARM technology. This thesis first explains the concept of data assimilation, the context in which it is developed, and briefly describes the 4DVAR mathematical model. After a close examination of the problem, the Matlab description of the data-assimilation problem was ported to a sequential version in the C language. Secondly, after identifying the most computationally expensive kernels in terms of execution time, a parallel version of the application was developed in a multiprocessor programming style, using the MPI (Message Passing Interface) protocol. The experimental results show that, when running on the HCA server with Intel architecture, the efficiency of the two most expensive functions remains approximately 80% as the number of processes grows. When running on the ARM architecture, specifically on the Thunder mini-cluster, the observed trend is instead a "superlinear speedup", which in our case can be explained by a more efficient use of resources (cache memory access) compared with the sequential case. The second part of this paper presents an analysis of the issues of this application that affect its energy efficiency. After a brief discussion of the energy consumption characteristics of the Thunder chip in the current technological landscape, energy consumption values of the Thunder mini-cluster were measured with a power consumption detector, the Yokogawa Power Meter, in order to give an overview of the power-to-solution behavior of this application, to be used as a baseline for subsequent analyses with other parallel styles. Finally, a comprehensive performance evaluation, aimed at assessing the quality of the MPI parallelization, is conducted using a suitable performance tool named Paraver, developed by BSC.
Paraver is a performance analysis and visualisation tool that can be used to analyse MPI, threaded, or mixed-mode programmes, and it is key to profiling parallel code and optimising it for High Performance Computing. A set of graphical representations of these statistics makes it easy for a developer to identify performance problems, such as load-imbalanced decompositions, excessive communication overheads, and poor average floating-point operations per second. Paraver can also report statistics based on hardware counters provided by the underlying hardware. This project used Paraver configuration files to allow certain metrics to be analysed for this application. To explain the performance trend observed on the Thunder mini-cluster, traces were extracted from various case studies, and the results matched expectations: a drastic drop in cache misses from the ppn (processes per node) = 1 case to the ppn = 16 case. This in part explains the more efficient use of cluster resources as the number of processes increases.
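
The efficiency figures quoted above follow from the standard definitions of speedup and parallel efficiency. Below is a minimal sketch of that bookkeeping in Python; the timing values are invented for illustration and are not the thesis's measurements.

```python
# Minimal sketch: speedup and parallel efficiency from wall-clock timings.
# The timing values below are illustrative, not measurements from the paper.

def speedup(t_serial, t_parallel):
    """Classic speedup: serial time divided by parallel time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speedup normalized by process count."""
    return speedup(t_serial, t_parallel) / n_procs

t1 = 120.0                                       # hypothetical 1-process runtime (s)
timings = {2: 62.0, 4: 33.0, 8: 18.0, 16: 7.0}   # hypothetical parallel runtimes

for p, tp in timings.items():
    # Efficiency above 1.0 (speedup above p) is the "superlinear" regime the
    # abstract attributes to better cache behavior on the ARM mini-cluster.
    print(f"p={p:2d}  speedup={speedup(t1, tp):5.2f}  "
          f"efficiency={efficiency(t1, tp, p):5.2f}")
```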

    Investigating data throughput and partial dynamic reconfiguration in a commodity FPGA cluster framework

    Get PDF
    There are many computational kernels where parallelism can be exploited in application-specific hardware, yielding significant speedup over a general-purpose processor-based solution. Commodity cluster computing technologies have been combined with FPGA coprocessors, resulting in even greater performance capability through the exploitation of multiple levels of parallelism. One particularly economic solution, both in terms of cost and power consumption, is to cluster hybrid FPGAs with commodity network interconnects. Hybrid FPGAs combine embedded microprocessors with reconfigurable hardware resources on a single chip, offering lower power consumption and cost compared to a traditional I/O bus FPGA coprocessor solution. While there is a lot of promise in using commodity hybrid FPGAs in a cluster configuration, the design flow and performance characteristics of such systems are currently a limiting factor to the range of applications that could benefit from such a system. The contribution of this thesis is a framework for clustering commodity FPGAs which integrates high-speed DMA data transfers with a flexible FPGA resource sharing scheme enabled through partial reconfiguration. The framework includes an embedded Linux operating system, with a custom device driver to manage data transfers and hardware reconfiguration. User-space tools for cluster computing, including ssh and MPI, are deployed, allowing tasks to be split among nodes in the cluster. Performance analysis is performed with a homogeneous cluster composed of four Virtex-5 FXT based FPGA boards. The results demonstrate the advantages over previous work in terms of data throughput and reconfiguration, as well as promote future research efforts.
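
As a rough illustration of the task-splitting workflow the framework enables, the following mpi4py sketch scatters work across cluster nodes and gathers partial results. The fpga_kernel function is a hypothetical stand-in: in the real framework the per-node work would be dispatched to the reconfigurable fabric through the custom device driver and DMA engine, which are not modeled here.

```python
# Hedged sketch of MPI task splitting across FPGA cluster nodes.
# Run with: mpiexec -n 4 python split.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def fpga_kernel(chunk):
    # Placeholder for work offloaded to the node's reconfigurable fabric;
    # here it is just computed on the CPU for illustration.
    return sum(x * x for x in chunk)

if rank == 0:
    data = list(range(1024))
    chunks = [data[i::size] for i in range(size)]  # one chunk per node
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)    # distribute work across nodes
partial = fpga_kernel(chunk)            # each node processes its share
results = comm.gather(partial, root=0)  # collect partial results

if rank == 0:
    print("total:", sum(results))
```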

    Wide-Area Measurement Application and Power System Dynamics

    Get PDF
    Frequency monitoring network (FNET) is a GPS-synchronized, distribution-level phasor measurement system. It is a powerful synchronized monitoring network for large-area power systems that provides significant information and data for power system situational awareness, real-time and post-event analysis, and other important aspects of bulk systems. This work explored FNET measurements and applied them to different applications and power system analyses. An island system was built and validated with FNET measurements to study the stability of OTEC integration. FNET measurements were also used to validate a large system model, the U.S. Eastern Interconnection, by adjusting the simulation model until the simulated response matched the measured frequency of a real event. The system model was tuned with combinations of different impact factors for several confirmed actual events, and some general rules and specific tuning quantities were derived from the model validation process. This work also investigated the behavior of the power system frequency during large-scale, synchronous societal events, such as the World Cup, the Super Bowl, and the Royal Wedding. It is apparent that large groups of people engaging in the same event at roughly the same time can have significant impacts on the power grid frequency. A systematic analysis of the accumulated FNET frequency data and its statistics offers an incisive view of power grid frequency behavior during such events. To better understand system events recorded by FNET, a visualization tool was developed to visualize major events in the North American power grid; the measurement plot combined with a geographical contour map provides an intuitive visualization of each event. Finally, the EI system was simplified and clustered into four groups based on FNET measurements and simulation results of generator trip cases. The generation and load capacity of each cluster were calculated from the clustering result and the simulation model, and a flow diagram of this simplified EI system was presented, showing the clusters and the power flow between them.
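
One step described above is tuning the simulation model until it matches the measured frequency of a confirmed event. A minimal sketch of that idea follows, assuming synthetic traces and a simple RMSE match criterion; the work's actual tuning procedure and impact factors are not reproduced.

```python
# Illustrative model-validation loop: score candidate simulation parameters
# against a measured frequency trace. Traces and criterion are assumptions.
import numpy as np

def rmse(measured, simulated):
    """Root-mean-square error between measurement and simulation."""
    return float(np.sqrt(np.mean((measured - simulated) ** 2)))

t = np.linspace(0.0, 30.0, 301)  # 30 s window at 10 samples/s
# Synthetic "measured" event: a damped oscillation around 60 Hz.
measured = 60.0 - 0.02 * np.exp(-t / 8.0) * np.sin(0.5 * t)

# Sweep a candidate damping constant in the simulation and keep the best fit.
scores = {}
for d in (4.0, 6.0, 8.0, 10.0):
    simulated = 60.0 - 0.02 * np.exp(-t / d) * np.sin(0.5 * t)
    scores[d] = rmse(measured, simulated)

best = min(scores, key=scores.get)
print(f"best damping constant: {best} (RMSE={scores[best]:.6f})")
```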

    A transprecision floating-point cluster for efficient near-sensor data analytics

    Full text link
    Recent applications in the domain of near-sensor computing require the adoption of floating-point arithmetic to reconcile high-precision results with a wide dynamic range. In this paper, we propose a multi-core computing cluster that leverages the fine-grained tunable principles of transprecision computing to support near-sensor applications at a minimum power budget. Our design, based on the open-source RISC-V architecture, combines parallelization and sub-word vectorization with near-threshold operation, leading to a highly scalable and versatile system. We perform an exhaustive exploration of the design space of the transprecision cluster on a cycle-accurate FPGA emulator, with the aim of identifying the most efficient configurations in terms of performance, energy efficiency, and area efficiency. We also provide full-fledged software stack support, including a parallel runtime and a compilation toolchain, to enable the development of end-to-end applications. We perform an experimental assessment of our design on a set of benchmarks representative of the near-sensor processing domain, complementing the timing results with a post-place-and-route analysis of the power consumption. Finally, a comparison with the state of the art shows that our solution outperforms the competitors in energy efficiency, reaching a peak of 97 Gflop/s/W on single-precision scalars and 162 Gflop/s/W on half-precision vectors.
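
The headline Gflop/s/W figures are sustained throughput divided by average power, so a reported peak can be sanity-checked from an operating point. A trivial sketch, with an assumed operating point chosen only to reproduce the 162 Gflop/s/W half-precision peak:

```python
# Back-of-envelope check of an energy-efficiency figure in Gflop/s/W.
# The operating point below is an assumption, not a number from the paper.

def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency: sustained Gflop/s divided by average power draw."""
    return gflops / watts

# E.g. 4.86 Gflop/s sustained at 30 mW average power.
print(f"{gflops_per_watt(4.86, 0.030):.1f} Gflop/s/W")  # -> 162.0
```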

    Data Mining and Machine Learning Applications of Wide-Area Measurement Data in Electric Power Systems

    Get PDF
    Wide-area measurement systems (WAMS) are quickly becoming an important part of modern power system operation. By utilizing the Global Positioning System, WAMS offer highly accurate time-synchronized measurements that can reveal previously unobtainable insights into the grid’s status. An example WAMS is the Frequency Monitoring Network (FNET), which utilizes a large number of Internet-connected, low-cost Frequency Disturbance Recorders (FDRs) installed at the distribution level. The large amounts of data collected by FNET and other WAMS present unique opportunities for data mining and machine learning applications, yet these techniques have only recently been applied in this domain. The research presented here explores some additional applications that may prove useful once WAMS are fully integrated into the power system. Chapter 1 provides a brief overview of the FNET system that supplies the data used for this research. Chapter 2 reviews recent research efforts in the application of data mining and machine learning techniques to wide-area measurement data. In Chapter 3, patterns in frequency extrema in the Eastern and Western Interconnections are explored using cluster analysis. In Chapter 4, an artificial neural network (ANN)-based classifier is presented that can reliably distinguish between different types of power system disturbances based solely on their frequency signatures. Chapter 5 presents a technique for constructing electromechanical transient speed maps for large power systems using FNET data from previously detected events. Chapter 6 describes an object-oriented software framework useful for developing FNET data analysis applications. In the United States, recent environmental regulations will likely result in the removal of nearly 30 GW of oil- and coal-fired generation from the grid, mostly in the Eastern Interconnection (EI). The effects of this transition on voltage stability and transmission line flows have previously not been studied from a system-wide perspective. Chapter 7 discusses the results of power flow studies designed to simulate the evolution of the EI over the next few years as traditional generation sources are replaced with greener ones such as natural gas and wind. Conclusions, a summary of the main contributions of this work, and a discussion of possible future research topics are given in Chapter 8.
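
The Chapter 4 disturbance classifier can be sketched with off-the-shelf tools. The example below trains scikit-learn's MLPClassifier, standing in for the dissertation's ANN, on synthetic frequency windows whose shapes loosely mimic generation-trip and load-shedding signatures; the real work uses FDR measurements, not generated data.

```python
# Hedged sketch: classify disturbance types from frequency signatures.
# Synthetic traces stand in for real FDR data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 100)

def gen_trip():   # generation trip: frequency sags toward a new level
    return 60.0 - 0.03 * (1 - np.exp(-t / 2.0)) + rng.normal(0, 1e-3, t.size)

def load_shed():  # load shedding: frequency rises
    return 60.0 + 0.03 * (1 - np.exp(-t / 2.0)) + rng.normal(0, 1e-3, t.size)

X = np.array([gen_trip() for _ in range(200)] +
              [load_shed() for _ in range(200)])
y = np.array([0] * 200 + [1] * 200)  # 0 = generation trip, 1 = load shed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```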

    A Performance/Cost Evaluation for a GPU-Based Drug Discovery Application on Volunteer Computing

    Get PDF
    Bioinformatics is an interdisciplinary research field that develops tools for the analysis of large biological databases, and, thus, the use of high performance computing (HPC) platforms is mandatory for the generation of useful biological knowledge. The latest generation of graphics processing units (GPUs) has democratized the use of HPC, as they push desktop computers to cluster-level performance. Many applications within this field have been developed to leverage these powerful and low-cost architectures. However, these applications still need to scale to larger GPU-based systems to enable remarkable advances in the fields of healthcare, drug discovery, genome research, etc. The inclusion of GPUs in HPC systems exacerbates power and temperature issues, increasing the total cost of ownership (TCO). This paper explores the benefits of volunteer computing to scale bioinformatics applications as an alternative to owning large GPU-based local infrastructures. We use as a benchmark a GPU-based drug discovery application called BINDSURF, whose computational requirements go beyond those of a single desktop machine. Volunteer computing is presented as a cheap and valid HPC option for those bioinformatics applications that need to process huge amounts of data and for which response time is not a critical factor.
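
A toy version of the performance/cost comparison follows, assuming invented hardware, energy, and workload figures; the paper derives its own numbers, and real volunteer computing still carries server and coordination costs.

```python
# Toy performance/cost comparison in the spirit of the paper's TCO argument.
# All figures are invented placeholders.

def cost_per_run(runs_per_year, capital, lifetime_years,
                 power_kw, kwh_price, hours_per_run):
    """Rough cost of one run on an owned GPU cluster: amortized hardware
    purchase plus the energy consumed by the run."""
    hardware = capital / (lifetime_years * runs_per_year)
    energy = power_kw * hours_per_run * kwh_price
    return hardware + energy

owned = cost_per_run(runs_per_year=500, capital=100_000, lifetime_years=4,
                     power_kw=12.0, kwh_price=0.15, hours_per_run=6.0)
# Volunteer computing shifts hardware and energy costs to donors; the
# project still pays for coordination servers, approximated here per run.
volunteer = 5_000 / 500
print(f"owned: ~${owned:.2f}/run  volunteer: ~${volunteer:.2f}/run")
```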

    Parallel detrended fluctuation analysis for fast event detection on massive PMU data

    Get PDF
    ("(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")Phasor measurement units (PMUs) are being rapidly deployed in power grids due to their high sampling rates and synchronized measurements. The devices high data reporting rates present major computational challenges in the requirement to process potentially massive volumes of data, in addition to new issues surrounding data storage. Fast algorithms capable of processing massive volumes of data are now required in the field of power systems. This paper presents a novel parallel detrended fluctuation analysis (PDFA) approach for fast event detection on massive volumes of PMU data, taking advantage of a cluster computing platform. The PDFA algorithm is evaluated using data from installed PMUs on the transmission system of Great Britain from the aspects of speedup, scalability, and accuracy. The speedup of the PDFA in computation is initially analyzed through Amdahl's Law. A revision to the law is then proposed, suggesting enhancements to its capability to analyze the performance gain in computation when parallelizing data intensive applications in a cluster computing environment

    Submodular Load Clustering with Robust Principal Component Analysis

    Full text link
    Traditional load analysis is facing challenges from new electricity usage patterns due to demand response as well as the increasing deployment of distributed generation, including photovoltaics (PV), electric vehicles (EV), and energy storage systems (ESS). At the transmission level, despite irregular load behaviors in different areas, highly aggregated load shapes still share similar characteristics. Load clustering aims to discover such intrinsic patterns and provide useful information to other load applications, such as load forecasting and load modeling. This paper proposes an efficient submodular load clustering method for transmission-level load areas. Robust principal component analysis (R-PCA) first decomposes the annual load profiles into low-rank and sparse components to extract key features. A novel submodular cluster center selection technique is then applied to determine the optimal cluster centers through a constructed similarity graph. Following the selection results, load areas are efficiently assigned to different clusters for further load analysis and applications. Numerical results obtained from PJM load data demonstrate the effectiveness of the proposed approach. (Accepted by the 2019 IEEE PES General Meeting, Atlanta, GA.)
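
The submodular center selection step can be illustrated with a greedy facility-location objective, a standard monotone submodular function for which the greedy algorithm carries a (1 - 1/e) approximation guarantee. The sketch below is an assumption about the flavor of the method, not the paper's exact formulation, and it skips the R-PCA feature extraction, using random load profiles instead:

```python
# Hedged sketch: greedy submodular cluster-center selection via a
# facility-location objective on a similarity matrix.
import numpy as np

def greedy_facility_location(sim, k):
    """Greedily pick k centers maximizing the sum, over all points, of
    similarity to the closest chosen center (monotone submodular)."""
    n = sim.shape[0]
    centers, best = [], np.zeros(n)
    for _ in range(k):
        gains = [np.maximum(best, sim[:, j]).sum() - best.sum()
                 for j in range(n)]          # marginal gain of candidate j
        j = int(np.argmax(gains))
        centers.append(j)
        best = np.maximum(best, sim[:, j])
    return centers

rng = np.random.default_rng(2)
profiles = rng.random((40, 24))              # 40 load areas x 24-hour shapes
d = np.linalg.norm(profiles[:, None] - profiles[None, :], axis=2)
sim = np.exp(-d**2 / d.mean()**2)            # Gaussian similarity graph
centers = greedy_facility_location(sim, k=4)
labels = np.argmax(sim[:, centers], axis=1)  # assign areas to chosen centers
print("centers:", centers)
```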

    Clustering analysis of railway driving missions with niching

    Get PDF
    A wide range of applications requires classifying or grouping data into a set of categories or clusters. The most popular clustering techniques for this objective are K-means clustering and hierarchical clustering. However, both of these methods require the number of clusters to be set a priori. In this paper, a clustering method based on a niching genetic algorithm is presented, with the aim of finding the best compromise between maximizing the inter-cluster distance and minimizing the intra-cluster distance. This method is applied to three clustering benchmarks and to the classification of driving missions for railway applications.
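
The compromise objective can be made concrete as a ratio of inter-cluster separation to intra-cluster compactness. The fitness function below is an illustrative guess at that trade-off, evaluated on synthetic blobs; the paper optimizes its objective with a niching genetic algorithm, which is not reproduced here.

```python
# Illustrative clustering fitness: inter-cluster separation over
# intra-cluster compactness (higher is better). Data are synthetic.
import numpy as np

def clustering_fitness(points, labels):
    """Mean pairwise inter-center distance / mean point-to-center distance."""
    centers = np.array([points[labels == c].mean(axis=0)
                        for c in np.unique(labels)])
    intra = np.mean([np.linalg.norm(p - centers[l])
                     for p, l in zip(points, labels)])
    diffs = centers[:, None] - centers[None, :]
    inter = np.linalg.norm(diffs, axis=2)
    inter_mean = inter[np.triu_indices(len(centers), k=1)].mean()
    return inter_mean / (intra + 1e-12)

rng = np.random.default_rng(3)
pts = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in ((0, 0), (3, 0), (0, 3))])
labels = np.repeat([0, 1, 2], 30)            # the true grouping of the blobs
print(f"fitness of the true grouping: {clustering_fitness(pts, labels):.2f}")
```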