129 research outputs found

    IOPS: An Unified SpMM Accelerator Based on Inner-Outer-Hybrid Product

    Sparse matrix multiplication (SpMM) is widely applied in numerous domains, such as graph processing, machine learning, and data analytics. However, inner-product-based SpMM induces redundant zero-element computing for mismatched nonzero operands, while the outer-product-based approach lacks input reuse across Processing Elements (PEs) and has poor output locality when accumulating partial sum (psum) matrices. Besides, current works focus on either sparse-sparse matrix multiplication (SSMM) or sparse-dense matrix multiplication (SDMM), and rarely perform efficiently for both. To address these problems, this paper proposes a unified SpMM accelerator, called IOPS, that hybridizes inner and outer products. It reuses the input matrix among PEs with an inner product dataflow and removes zero-element calculations with an outer product approach in each PE, so it can efficiently process both SSMM and SDMM. Moreover, an address mapping method is designed to accumulate the irregular sparse psum matrices, reducing the latency and DRAM access of psum accumulation. Furthermore, an adaptive partition strategy is proposed to tile the input matrices based on their sparsity ratios, effectively utilizing the storage of the architecture and reducing DRAM access. Compared with the SSMM accelerator SpArch, we achieve 1.7x~6.3x energy efficiency and 1.2x~4.4x resource efficiency, with 1.4x~2.1x DRAM access saving.
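The contrast between the two dataflows in this abstract can be illustrated in software. The following is a minimal sketch, not the IOPS hardware design: the dict-of-dicts sparse layout and the helper names `by_row`, `by_col`, `inner_product_spmm`, and `outer_product_spmm` are assumptions made for illustration.

```python
from collections import defaultdict


def by_row(triples):
    """Group (row, col, value) triples as {row: {col: value}}."""
    out = defaultdict(dict)
    for i, j, v in triples:
        out[i][j] = v
    return dict(out)


def by_col(triples):
    """Group (row, col, value) triples as {col: {row: value}}."""
    out = defaultdict(dict)
    for i, j, v in triples:
        out[j][i] = v
    return dict(out)


def inner_product_spmm(a_rows, b_cols):
    """C[i][j] = dot(row i of A, column j of B)."""
    c = defaultdict(dict)
    for i, row in a_rows.items():
        for j, col in b_cols.items():
            # Only indices present in both operands contribute. Nonzeros
            # without a partner are the mismatched operands that still
            # have to be scanned in an inner-product dataflow.
            common = row.keys() & col.keys()
            if common:
                c[i][j] = sum(row[k] * col[k] for k in common)
    return dict(c)


def outer_product_spmm(a_cols, b_rows):
    """C = sum over k of (column k of A) x (row k of B)."""
    c = defaultdict(lambda: defaultdict(float))
    for k in a_cols.keys() & b_rows.keys():
        for i, a in a_cols[k].items():
            for j, b in b_rows[k].items():
                # Every multiply is useful, but the partial sums scatter
                # across the whole output matrix.
                c[i][j] += a * b
    return {i: dict(r) for i, r in c.items()}


# Hypothetical 2x3 and 3x2 operands given as (row, col, value) triples.
A = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)]
B = [(0, 1, 4.0), (2, 0, 5.0), (1, 1, 6.0)]
C_inner = inner_product_spmm(by_row(A), by_col(B))
C_outer = outer_product_spmm(by_col(A), by_row(B))
```

Both functions produce the same product; they differ only in where the wasted work (index matching) or the poor locality (scattered partial sums) shows up.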

    Sense: Model Hardware Co-design for Accelerating Sparse CNN on Systolic Array

    Sparsity is an intrinsic property of convolutional neural networks (CNNs) and is worth exploiting in CNN accelerators, but the extra processing comes with hardware overhead, so many architectures gain only a minor benefit. Meanwhile, the systolic array has become increasingly competitive for CNN acceleration thanks to its high spatiotemporal locality and low hardware overhead. However, the irregularity of sparsity induces an imbalanced workload under the rigid systolic dataflow, causing performance degradation. Thus, this paper proposes a systolic-array-based architecture, called Sense, for sparse CNN acceleration by model-hardware co-design, achieving a large performance improvement. To balance input feature map (IFM) and weight loads across the Processing Element (PE) array, we apply channel clustering to gather IFMs with approximately equal sparsity for array computation, and co-design a load-balancing weight pruning method that keeps the sparsity ratio of each kernel at a certain value with little accuracy loss, improving PE utilization and overall performance. Additionally, Adaptive Dataflow Configuration is applied to determine the computing strategy based on the storage ratio of IFMs and weights, reducing DRAM access by 1.17x-1.8x compared with Swallow and further reducing system energy consumption. The whole design is implemented on a Zynq ZCU102 at 200 MHz and performs at 471, 34, 53 and 191 images/s for AlexNet, VGG-16, ResNet-50 and GoogleNet respectively. Compared against the sparse systolic-array-based accelerators Swallow, FESA and SPOTS, Sense achieves 1x-2.25x, 1.95x-2.5x and 1.17x-2.37x performance improvement on these CNNs respectively, with reasonable overhead.
    Comment: 14 pages, 29 figures, 6 tables, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS
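The idea of holding every kernel at the same sparsity ratio can be sketched as per-kernel magnitude pruning. This is an illustrative assumption, not the pruning method of Sense: the abstract does not say how the pruned weights are chosen, and the names `prune_kernel` and `prune_layer` are hypothetical.

```python
def prune_kernel(weights, sparsity):
    """Zero the smallest-magnitude weights so a fixed fraction is zero.

    Assumption: weights are selected by magnitude; the abstract does not
    state the selection criterion.
    """
    n_zero = int(len(weights) * sparsity)
    # Indices in ascending order of magnitude; the first n_zero are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_zero])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def prune_layer(kernels, sparsity):
    """Apply the same sparsity ratio to every kernel of a layer."""
    return [prune_kernel(k, sparsity) for k in kernels]


# Hypothetical flattened kernels of one layer.
kernels = [[0.1, -0.9, 0.5, 0.05], [2.0, -0.2, 0.3, -1.5]]
balanced = prune_layer(kernels, 0.5)
nonzeros_per_kernel = [sum(1 for w in k if w != 0.0) for k in balanced]
```

Because every kernel keeps the same number of nonzeros, each PE assigned one kernel receives the same amount of work, which is the load-balancing property the abstract describes.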

    BandMap: Application Mapping with Bandwidth Allocation for Coarse-Grained Reconfigurable Array

    This paper proposes an application mapping algorithm, BandMap, for coarse-grained reconfigurable arrays (CGRAs). It allocates bandwidth in the PE array according to the transfer demands of the data, especially data with high spatial reuse, to reduce the number of routing PEs. To cover bandwidth allocation, BandMap maps the data flow graphs (DFGs) abstracted from applications by solving the maximum independent set (MIS) problem on a mixed tuple and quadruple resource occupation conflict graph. Compared to the state-of-the-art BusMap, BandMap achieves fewer routing PEs with the same or an even smaller initiation interval (II).
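To make the role of the conflict graph concrete, the sketch below picks a set of mutually compatible candidates with a greedy minimum-degree heuristic. It is an assumption for illustration only: it returns a maximal (not necessarily maximum) independent set, it is not BandMap's solver, and the vertex labels are hypothetical placements.

```python
def greedy_independent_set(conflicts):
    """Return a maximal independent set of a symmetric conflict graph.

    conflicts maps each vertex to the set of vertices it conflicts with.
    """
    remaining = {v: set(nbrs) for v, nbrs in conflicts.items()}
    chosen = []
    while remaining:
        # Pick the vertex with the fewest remaining conflicts (ties by name).
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        # Drop the chosen vertex and everything that conflicts with it.
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return chosen


# Hypothetical candidates: "op@PE" means placing a DFG operation on a PE.
conflicts = {
    "add@pe0": {"mul@pe0"},
    "mul@pe0": {"add@pe0", "mul@pe1"},
    "mul@pe1": {"mul@pe0"},
}
mapping = greedy_independent_set(conflicts)
```

Each vertex stands for one candidate resource assignment, and an edge marks two assignments that cannot coexist, so any independent set is a conflict-free partial mapping.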

    A digital twin to quantitatively understand aging mechanisms coupled effects of NMC battery using dynamic aging profiles

    Traditional lithium-ion battery modeling does not provide sufficient information to accurately verify battery performance under real-time dynamic operating conditions, particularly when considering various aging modes and mechanisms. To improve on current methods, this paper proposes a lithium-ion battery digital twin that can capture real-time data and integrate the strong coupling between SEI layer growth, anode crack propagation, and lithium plating. It can be used to estimate aging behavior from the macroscopic full-cell level to the microscopic particle level, including voltage-current profiles in dynamic aging conditions, to predict the degradation behavior of Nickel-Manganese-Cobalt-Oxide (NMC) based lithium-ion batteries, and to assist in electrochemical analysis. This model can improve the root cause analysis of cell aging, enabling a quantitative understanding of the coupled effects of aging mechanisms. Three charging protocols with dynamic discharging profiles are developed to simulate real vehicle operation scenarios and used to validate the digital twin, combining operando impedance measurements, post-mortem analysis, and SEM to further support the conclusions. The digital twin can accurately predict battery capacity fade within 0.4% MAE. The results indicate that SEI layer growth is the primary contributor to capacity degradation and resistance increase. Based on the analysis of the model, it is concluded that one of the proposed multi-step charging protocols, in comparison to a standard continuous charging protocol, can reduce the degradation of NMC-based lithium-ion batteries. This paper provides a firm physical foundation for future physics-informed machine learning development.

    Elastic scattering and total reaction cross sections of $^{6}$Li studied with a microscopic continuum discretized coupled channels model

    We present a systematic study of $^{6}$Li elastic scattering and total reaction cross sections at incident energies around the Coulomb barrier within the continuum discretized coupled-channels (CDCC) framework, where $^{6}$Li is treated in an $\alpha$+$d$ two-body model. Collisions with $^{27}$Al, $^{64}$Zn, $^{138}$Ba and $^{208}$Pa are analyzed. The microscopic optical potentials (MOP) based on the Skyrme nucleon-nucleon interaction for $\alpha$ and $d$ are adopted in the CDCC calculations, and satisfactory agreement with the experimental data is obtained without any adjustment of the MOPs. For comparison, the $\alpha$ and $d$ global phenomenological optical potentials (GOP) are also used in the CDCC analysis, and a reduction of no less than 50\% in the surface imaginary part of the deuteron GOP is required to describe the data. In all cases, the $^{6}$Li breakup effect is significant and provides a repulsive correction to the folding model potential. The reduction in the deuteron GOP reveals a strong suppression of the reaction probability of the deuteron as a component of $^{6}$Li compared with that of a free deuteron. A further investigation is made by taking the $d$ breakup process into account equivalently within the dynamic polarization potential approach, and it shows that $d$ behaves like a tightly bound nucleus in $^{6}$Li-induced reactions. We also compare the CDCC results with those calculated with a $^{6}$Li GOP, and the comparison shows that CDCC calculations better reproduce the elastic scattering angular distributions in the sub-barrier energy region and the total reaction cross sections at energies around the Coulomb barrier.
    Comment: 10 pages, 12 figures

    NDVI With Artificial Neural Networks For SRTM Elevation Model Improvement – Hydrological Model Application

    Digital elevation models (DEMs) play a substantial role in hydrological studies, from understanding catchment characteristics and setting up a hydrological model to mapping the flood risk in a region. Depending on the nature of the study and its objectives, a high-resolution and reliable DEM is often desired to set up a sound hydrological model. However, such a DEM is not always available and is generally expensive. Obtained through radar-based remote sensing, the Shuttle Radar Topography Mission (SRTM) provides a publicly available DEM with a resolution of 92 m outside the US. It is a valuable source where no surveyed DEM is available. However, apart from its coarse resolution, SRTM suffers from inaccuracy, especially in areas with dense vegetation coverage, because radar signals do not penetrate the canopy. This leads to improper setup of the model as well as erroneous mapping of flood risk. This paper attempts to improve the SRTM dataset using the Normalised Difference Vegetation Index (NDVI), derived from the Visible Red and Near Infra-Red bands obtained from Landsat at a resolution of 30 m, and Artificial Neural Networks (ANN). The improvement and the applicability of this method in hydrology are assessed and discussed.
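NDVI is conventionally defined as (NIR - Red) / (NIR + Red). The sketch below computes it per pixel; the function names, the list-of-lists band layout, and the handling of zero-signal pixels are assumptions for illustration and are not taken from the paper.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        # Assumption: pixels with no signal in either band are reported as 0.
        return 0.0
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() cell by cell to two equally sized 2-D bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical reflectance values for a 2x2 tile.
nir_band = [[0.5, 0.2], [0.0, 0.3]]
red_band = [[0.1, 0.2], [0.0, 0.6]]
tile = ndvi_grid(nir_band, red_band)
```

Values near 1 indicate dense vegetation, which is where the abstract says the SRTM elevations are least reliable, so NDVI is a natural input feature for a correction model.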

    The HD-GYP Domain Protein RpfG of Xanthomonas oryzae pv. oryzicola Regulates Synthesis of Extracellular Polysaccharides that Contribute to Biofilm Formation and Virulence on Rice

    Bacterial leaf streak caused by Xanthomonas oryzae pv. oryzicola (Xoc) is one of the most important diseases in rice. However, little is known about the pathogenicity mechanisms of Xoc. Here we have investigated the function of three HD-GYP domain regulatory proteins in biofilm formation, the synthesis of virulence factors and virulence of Xoc. Deletion of rpfG resulted in altered production of extracellular polysaccharides (EPS), abolished virulence on rice and enhanced biofilm formation, but had little effect on the secretion of proteases and motility. In contrast, mutational analysis showed that the other two HD-GYP domain proteins had no effect on virulence factor synthesis and tested phenotypes. Mutation of rpfG led to up-regulation of the type III secretion system and altered expression of three putative glycosyltransferase genes gumD, pgaC and xagB, which are part of operons directing the synthesis of different extracellular polysaccharides. The pgaABCD and xagABCD operons were greatly up-regulated in the Xoc ΔrpfG mutant, whereas the expression of the gum genes was unaltered or slightly enhanced. The elevated biofilm formation of the Xoc ΔrpfG mutant was dramatically reduced upon deletion of gumD, xagA and xagB, but not when pgaA and pgaC were deleted. Interestingly, only the ΔgumD mutant, among these single gene mutants, exhibits multiple phenotype alterations including reduced biofilm and EPS production and attenuated virulence on rice. These data indicate that RpfG is a global regulator that controls biofilm formation, EPS production and bacterial virulence in Xoc and that both gumD- and xagB-dependent EPS contribute to biofilm formation under different conditions.

    A KDM6 inhibitor potently induces ATF4 and its target gene expression through HRI activation and by UTX inhibition

    UTX/KDM6A encodes a major histone H3 lysine 27 (H3K27) demethylase, and is frequently mutated in various types of human cancers. Although UTX appears to play a crucial role in oncogenesis, the mechanisms involved are still largely unknown. Here we show that a specific pharmacological inhibitor of H3K27 demethylases, GSK-J4, induces the expression of activating transcription factor 4 (ATF4) protein as well as the ATF4 target genes (e.g. PCK2, CHOP, REDD1, CHAC1 and TRIB3). ATF4 induction by GSK-J4 was due to neither transcriptional nor post-translational regulation. In support of this view, the ATF4 induction was almost exclusively dependent on the heme-regulated eIF2α kinase (HRI) in mouse embryonic fibroblasts (MEFs). Gene expression profiles with UTX disruption by CRISPR-Cas9 editing and the following stable re-expression of UTX showed that UTX specifically suppresses the expression of the ATF4 target genes, suggesting that UTX inhibition is at least partially responsible for the ATF4 induction. Apoptosis induction by GSK-J4 was partially and cell-type specifically correlated with the activation of ATF4-CHOP. These findings highlight that the anti-cancer drug candidate GSK-J4 strongly induces ATF4 and its target genes via HRI activation and raise the possibility that UTX might modulate cancer formation by regulating the HRI-ATF4 axis.