On the Exact Solution to a Smart Grid Cyber-Security Analysis Problem
This paper considers a smart grid cyber-security problem analyzing the vulnerability of electric power networks to false data attacks. The analysis problem is related to a constrained cardinality minimization problem. The main result shows that an ℓ1 relaxation technique provides an exact optimal solution to this cardinality minimization problem. The result is based on a polyhedral combinatorics argument and differs from well-known results based on mutual coherence and the restricted isometry property. The results are illustrated on benchmarks including the IEEE 118-bus and 300-bus systems.
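To make the relaxation step concrete, here is a minimal sketch, assuming a random toy measurement matrix H and a targeted measurement index k (both illustrative assumptions, not the paper's security-index formulation or its IEEE benchmark data): the cardinality objective ||Hc||_0 is replaced by ||Hc||_1, which is then solved exactly as a linear program with scipy.optimize.linprog.

```python
# Sketch of the l1 relaxation: min ||H c||_1 s.t. (H c)_k = 1, as an LP.
# H, sizes, and k are toy placeholders, not the paper's benchmark data.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 8, 5                       # toy sizes: measurements x states
H = rng.standard_normal((m, n))   # toy measurement matrix
k = 2                             # targeted measurement index

# Variables z = [c (n), t (m)]: minimize sum(t) s.t. -t <= H c <= t, (H c)_k = 1.
cost = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[H, -np.eye(m)], [-H, -np.eye(m)]])
b_ub = np.zeros(2 * m)
A_eq = np.concatenate([H[k], np.zeros(m)]).reshape(1, -1)
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(None, None)] * n + [(0, None)] * m)
attack = H @ res.x[:n]
print("nonzeros in H c:", int(np.sum(np.abs(attack) > 1e-8)))
```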
A block-diagonal structured model reduction scheme for power grid networks
We propose a block-diagonal structured model order reduction (BDSM) scheme for fast power grid analysis. Compared with existing power grid model order reduction (MOR) methods, BDSM has several advantages. First, unlike many power grid reduction methods that rely on terminal reduction and are therefore error-prone, BDSM utilizes exact column-by-column moment matching to provide higher numerical accuracy. Second, with similar accuracy and macromodel size, BDSM generates very sparse block-diagonal reduced-order models (ROMs) for massive-port systems at a lower cost, whereas traditional algorithms such as PRIMA produce full dense models that are inefficient for subsequent simulation. Third, unlike MOR schemes based on the extended Krylov subspace (EKS) technique, BDSM is input-signal independent, so the resulting ROM is reusable under different excitations. Finally, due to its block-diagonal structure, the obtained ROM can be simulated very quickly. The accuracy and efficiency of BDSM are verified on industrial power grid benchmarks. © 2011 EDAA. Published in Design, Automation and Test in Europe Conference and Exhibition (DATE 2011), Grenoble, France, 14-18 March 2011. In Design, Automation, and Test in Europe Conference and Exhibition Proceedings, 2011, p. 44-4
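For context, here is a minimal sketch of the generic one-point moment-matching projection that schemes in this family (including PRIMA) build on; the toy RC-grid-like matrices are illustrative assumptions, and the sketch does not reproduce BDSM's column-by-column, block-diagonal construction.

```python
# Generic moment-matching MOR sketch (PRIMA-style Galerkin projection).
# Shows the mechanics the abstract builds on; BDSM's block-diagonal
# column-by-column variant organizes the basis differently.
import numpy as np

def moment_matching_rom(G, C, B, q):
    """Orthonormal basis matching the first q block moments of
    (G + sC)^{-1} B around s = 0, then Galerkin-reduced matrices."""
    blocks = [np.linalg.solve(G, B)]                 # 0th moment block
    for _ in range(q - 1):
        blocks.append(np.linalg.solve(G, C @ blocks[-1]))
    V, _ = np.linalg.qr(np.hstack(blocks))           # orthonormalize
    return V.T @ G @ V, V.T @ C @ V, V.T @ B, V      # reduced G, C, B

# Toy RC-grid-like system: G symmetric positive definite, 2 ports.
rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
G = A @ A.T + n * np.eye(n)
Cm = np.diag(rng.uniform(0.1, 1.0, n))
B = np.zeros((n, 2)); B[0, 0] = B[1, 1] = 1.0
Gr, Cr, Br, V = moment_matching_rom(G, Cm, B, q=3)
print(Gr.shape)  # (6, 6): reduced model from a 50-state system
```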
PowerPlanningDL: Reliability-Aware Framework for On-Chip Power Grid Design using Deep Learning
With the increase in the complexity of chip designs, VLSI physical design has become a time-consuming, iterative design process. Power planning is the part of floorplanning in VLSI physical design in which power grid networks are designed to provide adequate power to all the underlying functional blocks. Power planning also requires multiple iterative steps to create the power grid network while satisfying the allowed worst-case IR drop and electromigration (EM) margins. This paper introduces, for the first time, a deep learning (DL)-based framework to approximately predict the initial design of the power grid network under different reliability constraints. The proposed framework eliminates many iterative design steps and speeds up the total design cycle. A neural-network-based multi-target regression technique is used to create the DL model. Features are extracted and the training dataset is generated from the floorplans of power grid designs extracted from the IBM processor. The DL model is trained using the generated dataset. The proposed DL-based framework is validated using a new set of power grid specifications (obtained by perturbing the designs used in the training phase). The results show that the predicted power grid design is close to the original design, with minimal prediction error (~2%). The proposed DL-based approach also improves the design cycle time, with a speedup of ~6X for standard power grid benchmarks.
Comment: Published in the proceedings of the IEEE/ACM Design, Automation and Test in Europe Conference (DATE) 2020, 6 pages.
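A hedged sketch of the multi-target regression step described above, using scikit-learn's MLPRegressor; the feature and target definitions are hypothetical placeholders, since the paper's actual dataset comes from IBM processor power grid floorplans.

```python
# Hedged sketch: a neural network mapping floorplan-derived features to
# multiple power-grid design targets at once. Feature/target names are
# hypothetical placeholders, not the paper's dataset.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical features: block area, peak current, allowed IR drop, EM margin.
X = rng.uniform(0, 1, size=(n, 4))
# Hypothetical targets: wire width, pitch, number of power straps.
Y = np.column_stack([
    0.5 * X[:, 0] + 0.3 * X[:, 1],
    0.8 * X[:, 2] - 0.2 * X[:, 3],
    X[:, 1] * X[:, 2],
]) + 0.01 * rng.standard_normal((n, 3))

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                     random_state=0).fit(X_tr, Y_tr)
print("held-out R^2:", round(model.score(X_te, Y_te), 3))
```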
Load-Varying LINPACK: A Benchmark for Evaluating Energy Efficiency in High-End Computing
For decades, performance has driven the high-end computing (HEC) community. However, as highlighted in recent exascale studies that chart a path from petascale to exascale computing, power consumption is fast becoming the major design constraint in HEC. Consequently, the HEC community needs to address this issue in future petascale and exascale computing systems.
Current scientific benchmarks, such as LINPACK and SPEChpc, only evaluate HEC systems when running at full throttle, i.e., 100% workload, resulting in a focus on performance and ignoring the issues of power and energy consumption. In contrast, efforts like SPECpower evaluate the energy efficiency of a compute server at varying workloads. This is analogous to evaluating the energy efficiency (i.e., fuel efficiency) of an automobile at varying speeds (e.g., miles per gallon highway versus city). SPECpower, however, only evaluates the energy efficiency of a single compute server rather than a HEC system; furthermore, it is based on SPEC's Java Business Benchmarks (SPECjbb) rather than a scientific benchmark. Given the absence of a load-varying scientific benchmark to evaluate the energy efficiency of HEC systems at different workloads, we propose the load-varying LINPACK (LV-LINPACK) benchmark. In this paper, we identify application parameters that affect performance and provide a methodology to vary the workload of LINPACK, thus enabling a more rigorous study of energy efficiency in supercomputers or, more generally, HEC systems.
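As a sketch of the load-varying idea, the following runs a LINPACK-style dense solve at several problem sizes and reports achieved GFLOP/s per load level; the sizes are arbitrary assumptions, and the energy side is left as a comment because power readings require platform-specific counters (e.g., RAPL or an external meter).

```python
# Sketch: vary the workload of a LINPACK-style kernel (dense solve) and
# record achieved GFLOP/s at each load level. Pairing each level with a
# power reading (platform-specific, omitted here) yields FLOPS/watt.
import time
import numpy as np

def linpack_like(n):
    """One dense solve Ax = b, the core LINPACK operation (~2/3 n^3 flops)."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    np.linalg.solve(A, b)
    dt = time.perf_counter() - t0
    return (2.0 / 3.0) * n**3 / dt / 1e9  # GFLOP/s

for n in (500, 1000, 2000, 4000):         # varying workload levels
    print(f"n={n:5d}  {linpack_like(n):7.2f} GFLOP/s")
```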
Performance analysis and optimization of automotive GPUs
Advanced Driver Assistance Systems (ADAS) and Autonomous Driving (AD) have drastically increased the performance demands of automotive systems. Suitable high-performance platforms building upon Graphics Processing Units (GPUs) have been developed to respond to this demand, the NVIDIA Jetson TX2 being a relevant representative. However, whether high-performance GPU configurations are appropriate for automotive setups remains an open question. This paper aims to shed light on this question by modelling an automotive GPU (the Jetson TX2), analyzing its microarchitectural parameters against relevant benchmarks, and identifying specific configurations able to meaningfully increase performance within similar cost envelopes, or to decrease cost while preserving the original performance level. Overall, our analysis opens the door to the optimization of automotive GPUs for further system efficiency.
MATEX: A Distributed Framework for Transient Simulation of Power Distribution Networks
We propose MATEX, a distributed framework for transient simulation of power distribution networks (PDNs). MATEX uses a matrix exponential kernel with Krylov subspace approximations to solve the differential equations of linear circuits. First, the whole simulation task is divided into subtasks based on decompositions of the current sources, in order to reduce computational overhead. These subtasks are then distributed to different computing nodes and processed in parallel. Within each node, after the matrix factorization at the beginning of the simulation, an adaptive time-stepping solver runs without extra matrix re-factorizations. MATEX overcomes the stiffness limitation of previous matrix exponential-based circuit simulators via a rational Krylov subspace method, which allows larger step sizes with smaller Krylov subspace bases and greatly accelerates the whole computation. MATEX outperforms both traditional fixed and adaptive time-stepping methods, e.g., achieving around 13X speedup over the trapezoidal framework with fixed time step on the IBM power grid benchmarks.
Comment: ACM/IEEE DAC 2014. arXiv admin note: substantial text overlap with arXiv:1505.0669.
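A minimal sketch of the kernel at MATEX's core: approximating exp(hA)v in a small Krylov subspace via standard Arnoldi. The toy matrix and dimensions are assumptions, and the rational (shift-and-invert) Krylov variant MATEX uses for stiff circuits is not reproduced here.

```python
# Sketch: advance x' = A x one step of size h by approximating exp(hA) v
# in an m-dimensional Krylov subspace (standard Arnoldi; MATEX's rational
# Krylov variant for stiff circuits is omitted).
import numpy as np
from scipy.linalg import expm

def expm_krylov(A, v, h, m=20):
    """Approximate exp(h A) v using an m-dimensional Arnoldi basis."""
    n = len(v)
    beta = np.linalg.norm(v)
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):          # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:          # happy breakdown: exact subspace
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m); e1[0] = 1.0
    # Project: exp(hA) v ~ beta * V_m exp(h H_m) e1, with H_m small (m x m).
    return beta * V[:, :m] @ (expm(h * H[:m, :m]) @ e1)

# Toy stable system: compare against the dense matrix exponential.
rng = np.random.default_rng(0)
A = -np.eye(100) + 0.1 * rng.standard_normal((100, 100))
v = rng.standard_normal(100)
x = expm_krylov(A, v, h=0.1)
print(np.linalg.norm(x - expm(0.1 * A) @ v))  # small approximation error
```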
Correlated Resource Models of Internet End Hosts
Understanding and modelling resources of Internet end hosts is essential for
the design of desktop software and Internet-distributed applications. In this
paper we develop a correlated resource model of Internet end hosts based on
real trace data taken from the SETI@home project. This data covers a 5-year
period with statistics for 2.7 million hosts. The resource model is based on
statistical analysis of host computational power, memory, and storage as well
as how these resources change over time and the correlations between them. We
find that resources with few discrete values (core count, memory) are well
modeled by exponential laws governing the change of relative resource
quantities over time. Resources with a continuous range of values are well
modeled with either correlated normal distributions (processor speed for
integer operations and floating point operations) or log-normal distributions
(available disk space). We validate and show the utility of the models by
applying them to a resource allocation problem for Internet-distributed
applications, and demonstrate their value over other models. We also make our
trace data and tool for automatically generating realistic Internet end hosts
publicly available.
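A hedged sketch of what such a correlated model looks like when sampled: correlated normal draws for the two processor speeds and a log-normal draw for available disk space. All means, variances, and the correlation below are illustrative placeholders, not the SETI@home-fitted parameters from the paper.

```python
# Sketch of the host model: correlated normals for integer/floating-point
# speeds, log-normal for free disk space. All parameters are toy values.
import numpy as np

rng = np.random.default_rng(0)
n_hosts = 10_000

mean = np.array([2000.0, 1500.0])   # toy int/float speed means
std = np.array([400.0, 300.0])
corr = 0.8                          # toy correlation between the two speeds
cov = np.array([[std[0]**2, corr * std[0] * std[1]],
                [corr * std[0] * std[1], std[1]**2]])
speeds = rng.multivariate_normal(mean, cov, size=n_hosts)

disk_gb = rng.lognormal(mean=3.5, sigma=1.0, size=n_hosts)  # toy params

print("empirical speed correlation:",
      round(np.corrcoef(speeds[:, 0], speeds[:, 1])[0, 1], 3))
print("median free disk (GB):", round(np.median(disk_gb), 1))
```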
Get Out of the Valley: Power-Efficient Address Mapping for GPUs
GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize the threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem causing low performance and poor power-efficiency. The key issue is that it is highly application-dependent which memory address bits exhibit high variability.
To solve this problem, we first provide an entropy analysis approach tailored for the highly concurrent memory request behavior in GPU-compute workloads. Our window-based entropy metric captures the information content of each address bit of the memory requests that are likely to co-exist in the memory system at runtime. Using this metric, we find that GPU-compute workloads exhibit entropy valleys distributed throughout the lower order address bits. This indicates that efficient GPU-address mapping schemes need to harvest entropy from broad address-bit ranges and concentrate the entropy into the bits used for channel and bank selection in the memory subsystem. This insight leads us to propose the Page Address Entropy (PAE) mapping scheme which concentrates the entropy of the row, channel and bank bits of the input address into the bank and channel bits of the output address. PAE maps straightforwardly to hardware and can be implemented with a tree of XOR-gates. PAE improves performance by 1.31 x and power-efficiency by 1.25 x compared to state-of-the-art permutation-based address mapping