Optimizing mining rates under financial uncertainty in global mining complexes
This paper presents a distributed and dynamic programming framework for the mining production rate target tracking of multiple metal mines under financial uncertainty. A single mine's target tracking is stated as a stochastic optimization problem, and the solution is obtained by solving the dynamic program, which gives the optimal production rate schedule of each mine as a Markovian feedback control on the price process. The global solution is distributed across multiple mines by a policy iteration method, and this iterative method is shown to provide the unique equilibrium among Markovian strategies. Numerical results confirm the efficacy of the proposed global method when compared to individual optimization of mining rate target tracking
The Case for Asymmetric Systolic Array Floorplanning
The widespread proliferation of deep learning applications has triggered the
need to accelerate them directly in hardware. General Matrix Multiplication
(GEMM) kernels are elemental deep-learning constructs and they inherently map
onto Systolic Arrays (SAs). SAs are regular structures that are well-suited for
accelerating matrix multiplications. Typical SAs use a pipelined array of
Processing Elements (PEs), which communicate with local connections and
pre-orchestrated data movements. In this work, we show that the physical layout
of SAs should be asymmetric to minimize wirelength and improve energy
efficiency. The floorplan of the SA adjusts better to the asymmetric widths of
the horizontal and vertical data buses and their switching activity profiles.
It is demonstrated that such physically asymmetric SAs reduce interconnect
power by 9.1% when executing state-of-the-art Convolutional Neural Network
(CNN) layers, as compared to SAs of the same size but with a square (i.e.,
symmetric) layout. The savings in interconnect power translate, in turn, to
2.1% overall power savings. Comment: CNNA 202
Low-Power Data Streaming in Systolic Arrays with Bus-Invert Coding and Zero-Value Clock Gating
Systolic Array (SA) architectures are well suited for accelerating matrix
multiplications through the use of a pipelined array of Processing Elements
(PEs) communicating with local connections and pre-orchestrated data movements.
Even though most of the dynamic power consumption in SAs is due to
multiplications and additions, pipelined data movement within the SA
constitutes an additional important contributor. The goal of this work is to
reduce the dynamic power consumption associated with the feeding of data to the
SA, by synergistically applying bus-invert coding and zero-value clock gating.
By exploiting salient attributes of state-of-the-art CNNs, such as the value
distribution of the weights, the proposed SA applies appropriate encoding only
to the data that exhibits high switching activity. Similarly, when one of the
inputs is zero, unnecessary operations are entirely skipped. This selectively
targeted, application-aware encoding approach is demonstrated to reduce the
dynamic power consumption of data streaming in CNN applications using Bfloat16
arithmetic by 1%-19%. This translates to an overall dynamic power reduction of
6.2%-9.4%. Comment: International Conference on Modern Circuits and Systems Technologies
(MOCAST
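The two techniques named in the abstract above can be illustrated with a minimal sketch. It is not the paper's hardware design: the function names, the 16-bit bus width, the all-zero initial bus state, and the simplified inversion rule (which ignores the invert line when counting toggles) are all assumptions made for illustration.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def bus_invert_encode(words, width=16):
    """Encode a word stream with a simplified bus-invert scheme.

    A word is sent inverted when sending it as-is would toggle more
    than half of the bus lines. Returns (bus_value, invert_flag) pairs.
    """
    mask = (1 << width) - 1
    prev_bus = 0  # assumed initial state of the bus lines
    encoded = []
    for w in words:
        w &= mask
        toggles = popcount(prev_bus ^ w)
        if toggles > width // 2:
            bus, inv = (~w) & mask, 1
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width=16):
    """Recover the original words from (bus_value, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [(~bus) & mask if inv else bus for bus, inv in encoded]


def count_toggles(values):
    """Total bit transitions on a bus that starts at zero."""
    total, prev = 0, 0
    for v in values:
        total += popcount(prev ^ v)
        prev = v
    return total


def gated_mac(acc, a, b):
    """Multiply-accumulate that skips the operation when an input is zero.

    Returns the new accumulator and whether the multiplier was active.
    """
    if a == 0 or b == 0:
        return acc, False
    return acc + a * b, True
```

On a stream that alternates between nearly all-zero and nearly all-one words, the encoded bus toggles far less than the raw one, at the cost of one extra invert line. The gated MAC leaves the accumulator untouched whenever either operand is zero, which is the software analogue of holding a register's clock.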
IndexMAC: A Custom RISC-V Vector Instruction to Accelerate Structured-Sparse Matrix Multiplications
Structured sparsity has been proposed as an efficient way to prune the
complexity of modern Machine Learning (ML) applications and to simplify the
handling of sparse data in hardware. The acceleration of ML models - for both
training and inference - relies primarily on equivalent matrix multiplications
that can be executed efficiently on vector processors or custom matrix engines.
The goal of this work is to incorporate the simplicity of structured sparsity
into vector execution, thereby accelerating the corresponding matrix
multiplications. Toward this objective, a new vector index-multiply-accumulate
instruction is proposed, which enables the implementation of low-cost indirect
reads from the vector register file. This reduces unnecessary memory traffic
and increases data locality. The proposed new instruction was integrated in a
decoupled RISC-V vector processor with negligible hardware cost. Extensive
evaluation demonstrates significant speedups of 1.80x-2.14x, as compared to
state-of-the-art vectorized kernels, when executing layers of varying sparsity
from state-of-the-art Convolutional Neural Networks (CNNs). Comment: DATE 202
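The arithmetic behind an index-multiply-accumulate over structured-sparse data can be sketched in plain Python. This is not the proposed instruction's semantics or encoding. It assumes an N:M sparsity pattern (at most `n` nonzeros in every block of `m` columns), which the abstract does not specify, and the helper names are hypothetical.

```python
def compress_n_m(row, n=2, m=4):
    """Compress one row into (values, column indices), n entries per m-wide block.

    Blocks with fewer than n nonzeros are padded with zero values so that
    every block contributes exactly n entries.
    """
    values, indices = [], []
    for start in range(0, len(row), m):
        block = row[start:start + m]
        nonzeros = [(v, start + j) for j, v in enumerate(block) if v != 0]
        if len(nonzeros) > n:
            raise ValueError("block violates the assumed n:m sparsity pattern")
        nonzeros += [(0, start)] * (n - len(nonzeros))
        for v, j in nonzeros:
            values.append(v)
            indices.append(j)
    return values, indices


def sparse_row_times_dense(values, indices, dense):
    """Compute one output row: sum over k of values[k] * dense[indices[k]].

    The stored index selects which row of the dense matrix is read,
    which plays the role of the indirect read in this sketch.
    """
    ncols = len(dense[0])
    out = [0] * ncols
    for v, k in zip(values, indices):
        selected = dense[k]  # indirect read chosen by the stored index
        for c in range(ncols):
            out[c] += v * selected[c]
    return out
```

Only the rows of the dense operand named by an index are touched, so zero entries of the sparse row cost neither a multiplication nor a read.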
ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining
Convolutional Neural Networks (CNNs) are the state-of-the-art solution for
many deep learning applications. For maximum scalability, their computation
should combine high performance and energy efficiency. In practice, the
convolutions of each CNN layer are mapped to a matrix multiplication that
includes all input features and kernels of each layer and is computed using a
systolic array. In this work, we focus on the design of a systolic array with
configurable pipeline with the goal to select an optimal pipeline configuration
for each CNN layer. The proposed systolic array, called ArrayFlex, can operate
in normal, or in shallow pipeline mode, thus balancing the execution time in
cycles and the operating clock frequency. By selecting the appropriate pipeline
configuration per CNN layer, ArrayFlex reduces the inference latency of
state-of-the-art CNNs by 11%, on average, as compared to a traditional
fixed-pipeline systolic array. Most importantly, this result is achieved while
using 13%-23% less power, for the same applications, thus offering a combined
energy-delay-product efficiency between 1.4x and 1.8x. Comment: DATE 202
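The per-layer trade-off between cycle count and clock frequency described above can be sketched as a simple selection rule: latency is cycles divided by frequency, and the mode with the lower latency wins. The function name, the mode labels, and any numbers passed to it are hypothetical and are not taken from the paper.

```python
def pick_pipeline_mode(cycles_by_mode, freq_hz_by_mode):
    """Return the mode with the lowest latency and that latency in seconds.

    Latency for each mode is its cycle count divided by its clock frequency.
    """
    latency = {
        mode: cycles_by_mode[mode] / freq_hz_by_mode[mode]
        for mode in cycles_by_mode
    }
    best = min(latency, key=latency.get)
    return best, latency[best]
```

A shallow pipeline that saves many cycles can win despite a slower clock, while a layer that saves only a few cycles is better served by the normal mode at the higher frequency.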
Real-time ECG Monitoring using Compressive sensing on a Heterogeneous Multicore Edge-Device
In a typical ambulatory health monitoring system, wearable medical sensors
are deployed on the human body to continuously collect and transmit physiological
signals to a nearby gateway that forwards the measured data to the
cloud-based healthcare platform. However, this model often fails to respect the
strict requirements of healthcare systems. Wearable medical sensors are very
limited in terms of battery lifetime; in addition, the system's reliance on the cloud
makes it vulnerable to connectivity and latency issues. Compressive sensing
(CS) theory has been widely deployed in electrocardiogram (ECG) monitoring
applications to optimize the wearable sensors' power consumption. The proposed
solution in this paper aims to tackle these limitations by empowering a gateway-centric
connected health solution, where the most power-consuming tasks are
performed locally on a multicore processor. This paper explores the efficiency
of real-time CS-based recovery of ECG signals on an IoT-gateway embedded
with ARM’s big.LITTLE multicore for different signal dimensions and allocated
computational resources. Experimental results show that the gateway is able
to reconstruct ECG signals in real-time. Moreover, it demonstrates that using
a high number of cores speeds up the execution time and it further optimizes
energy consumption. The paper identifies the best configurations of resource
allocation that provide the optimal performance. The paper concludes that
multicore processors have the computational capacity and energy efficiency to
promote gateway-centric solutions rather than cloud-centric platforms
Functional trait variation among and within species and plant functional types in mountainous Mediterranean forests
Plant structural and biochemical traits are frequently used to characterise the life history of plants. Although some common patterns of trait covariation have been identified, recent studies suggest these patterns of covariation may differ with growing location and/or plant functional type (PFT). Mediterranean forest tree/shrub species are often divided into three PFTs based on their leaf habit and form, being classified as either needleleaf evergreen (Ne), broadleaf evergreen (Be), or broadleaf deciduous (Bd). Working across 61 mountainous Mediterranean forest sites of contrasting climate and soil type, we sampled and analysed 626 individuals in order to evaluate differences in key foliage trait covariation as modulated by growing conditions both within and between the Ne, Be, and Bd functional types. We found significant differences between PFTs for most traits. When considered across PFTs and by ignoring intraspecific variation, three independent functional dimensions supporting the Leaf-Height-Seed framework were identified. Some traits illustrated a common scaling relationship across and within PFTs, but others scaled differently when considered across PFTs or even within PFTs. For most traits much of the observed variation was attributable to PFT identity and not to growing location, although for some traits there was a strong environmental component and considerable intraspecific and residual variation. Nevertheless, environmental conditions as related to water availability during the dry season, and to a lesser extent to soil nutrient status and soil texture, clearly influenced trait values. When compared across species, about half of the trait-environment relationships were species-specific. Our study highlights the importance of the ecological scale within which trait covariation is considered and suggests that at regional to local scales, common trait-by-trait scaling relationships should be treated with caution.
PFT definitions by themselves can potentially be an important predictor variable when inferring one trait from another. These findings have important implications for local scale dynamic vegetation models
Ambipolar charge injection and transport in a single pentacene monolayer island
Electrons and holes are locally injected in a single pentacene monolayer
island. The two-dimensional distribution and concentration of the injected
carriers are measured by electrical force microscopy. In crystalline monolayer
islands, both carriers are delocalized over the whole island. On disordered
monolayers, carriers stay localized at their injection point. These results
provide insight into the electronic properties, at the nanometer scale, of
organic monolayers governing performances of organic transistors and molecular
devices. Comment: To be published in Nano Letter
Exsolution-enhanced reverse water-gas shift chemical looping activity of Sr2FeMo0.6Ni0.4O6-δ double perovskite
This study investigates the structural evolution and redox characteristics of the double perovskite Sr2FeMo0.6Ni0.4O6-δ (SFMN) during hydrogen (H2) and carbon dioxide (CO2) redox cycles and explores the material performance in the Reverse Water-Gas Shift Chemical Looping (RWGS-CL) reaction. In-situ and ex-situ X-Ray Diffraction (XRD) and High-Resolution Transmission Electron Microscopy (HRTEM) studies reveal that H2 reduction at temperatures above 800 °C leads to the exsolution of bimetallic Ni-Fe alloy particles and the formation of a Ruddlesden-Popper (RP) phase. A core–shell structure with Ni-Fe core and a perovskite oxide shell is formed with subsequent redox cycles, and the resulting material exhibits better performance and high stability in the RWGS-CL process. Thermogravimetric (TGA) and Temperature Programmed Reduction (TPR) and Oxidation (TPO) analyses show that the optimal reduction and oxidation temperatures for maximizing the CO yield are around 850 °C and 750 °C respectively, and that the cycled material is able to work steadily under isothermal conditions at 850 °C