17 research outputs found

    RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN

    Full text link
    Second-order training methods can converge much faster than first-order optimizers in DNN training, because they use the inverse of the second-order information (SOI) matrix to find a more accurate descent direction and step size. However, the huge SOI matrices bring significant computational and memory overheads on traditional architectures such as GPUs and CPUs. On the other hand, ReRAM-based process-in-memory (PIM) technology is suitable for second-order training for three reasons: first, PIM's computation happens in memory, which reduces data movement overheads; second, ReRAM crossbars can compute the SOI's inverse in O(1) time; third, if architected properly, ReRAM crossbars can perform the matrix inversion and vector-matrix multiplications that are important to second-order training algorithms. Nevertheless, current ReRAM-based PIM techniques still face a key challenge in accelerating second-order training: the existing ReRAM-based matrix inversion circuitry supports only 8-bit-accurate matrix inversion, which is not sufficient for second-order training, which needs at least 16-bit-accurate matrix inversion. In this work, we propose a method to achieve high-precision matrix inversion based on proven 8-bit matrix inversion (INV) circuitry and vector-matrix multiplication (VMM) circuitry. We design RePAST, a ReRAM-based PIM accelerator architecture for second-order training. Moreover, we propose a software mapping scheme for RePAST to further optimize performance by fusing VMM and INV crossbars. Experiments show that RePAST achieves an average 115.8×/11.4× speedup and 41.9×/12.8× energy saving compared to a GPU counterpart and PipeLayer, respectively, on large-scale DNNs. Comment: 13 pages, 13 figures
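
The abstract above does not say how the 8-bit INV and VMM circuits are combined to reach higher precision. As one illustration of the general idea, the sketch below uses the textbook Newton-Schulz iteration, which refines a coarse inverse using only matrix multiplications. This is an assumption chosen for illustration, not the RePAST method; the helper names (`quantize`, `refine`, `max_residual`) and the 2x2 test matrix are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Exact closed-form inverse of a 2x2 matrix (stands in for an INV unit)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quantize(m, bits=8):
    """Round entries to a signed fixed-point grid to mimic a low-precision result."""
    scale = max(abs(x) for row in m for x in row)
    levels = 2 ** (bits - 1) - 1
    return [[round(x / scale * levels) / levels * scale for x in row] for row in m]

def refine(a, x, steps=3):
    """Newton-Schulz update X <- X (2I - A X); each step squares the residual I - A X."""
    n = len(a)
    for _ in range(steps):
        ax = matmul(a, x)
        corr = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                for i in range(n)]
        x = matmul(x, corr)
    return x

def max_residual(a, x):
    """Largest absolute entry of A X - I, a simple accuracy measure."""
    ax = matmul(a, x)
    n = len(a)
    return max(abs(ax[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

A = [[4.0, 1.0], [2.0, 3.0]]                # hypothetical, well-conditioned matrix
x0 = quantize(inverse_2x2(A), bits=8)       # coarse 8-bit-like inverse
x3 = refine(A, x0, steps=3)                 # refined with multiplications only
print(max_residual(A, x0), max_residual(A, x3))
```

The iteration converges only when the starting residual is small (norm below 1), which is why a reasonable low-precision inverse is needed as the starting point.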

    A Hierarchical Scrubbing Technique for SEU Mitigation on SRAM-Based FPGAs

    No full text

    Association between remnant cholesterol and arterial stiffness: A secondary analysis based on a cross‐sectional study

    No full text
    The relationship between conventional lipid parameters and arterial stiffness (AS) has been verified by previous studies. However, it remains unknown whether non‐conventional lipid parameters have a predictive effect on AS as represented by brachial‐ankle pulse wave velocity (baPWV). Therefore, this study aimed to explore the relationship between remnant cholesterol (RC) and other non‐conventional lipid parameters and AS in the general population free from cardiovascular disease. The study included 912 participants aged 24–84 years from a medical health checkup center of Murakami Memorial Hospital. Logistic regression analysis and receiver operating characteristic (ROC) curves were used to examine the association between non‐conventional lipid parameters and AS. The results showed that, compared with the non‐AS group, the AS group had higher RC, non‐high‐density lipoprotein cholesterol (Non‐HDL‐C), atherogenic index of plasma (AIP), lipoprotein combine index (LCI), atherosclerosis index (AI), triglycerides/HDL‐C (TG/HDL‐C), Castelli's risk index I (CRI‐I) and Castelli's risk index II (CRI‐II). Then, the authors divided participants into two groups by the optimal RC cutoff point of 23.6, determined by the Youden index. The baPWV was significantly higher in the higher RC group than in the lower RC group, and RC was positively correlated with baPWV. Multivariate logistic regression analysis showed that, with lower RC as the reference, higher RC was associated with a higher risk of AS, independent of other risk factors (OR = 1.794, 95% CI: 1.267‐2.539, p = .001). The area under the curve of AS predicted by RC was higher than that of other non‐conventional lipid parameters (almost all p < .05). The findings indicated that increased RC was a significant predictor of AS.
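
The Youden index mentioned in the abstract above is defined as J = sensitivity + specificity - 1, and the optimal cutoff is the threshold that maximizes J. A minimal sketch of that selection follows. The function name `youden_cutoff` and the toy values are hypothetical and are not data from the study.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its value is >= cutoff.
    labels are 1 for the positive class and 0 for the negative class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_j, best_cut = -1.0, None
    for cut in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cut)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cut)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_j, best_cut = j, cut
    return best_cut, best_j

# Made-up marker values and outcome labels, for illustration only.
toy_values = [10, 15, 20, 22, 25, 30, 35, 40]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
cut, j = youden_cutoff(toy_values, toy_labels)
print(cut, j)
```

On this toy input the cutoffs 22 and 30 tie at J = 0.75, and the sketch returns the lower one. A real analysis would need an explicit tie-breaking rule.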

    A high-throughput tumor location system with deep learning for colorectal cancer histopathology image

    No full text
    Colorectal cancer is one of the major causes of morbidity and mortality worldwide; however, when discovered at an early stage, it is highly treatable. As the number of specimens increases every year, the diagnostic workload on pathologists has grown in recent years. In parallel with the development of digital pathology, deep learning has demonstrated its strong capability in feature extraction and interpretation in a variety of medical applications. In this paper, we propose a high-throughput whole-slide image (WSI) analysis system to localize tumor regions accurately with a patch-based convolutional neural network (CNN). We employ Monte Carlo adaptive sampling for fast detection of tumors at slide level and a conditional random field (CRF) model to integrate spatial correlation for better classification accuracy. We use three datasets of colorectal cancer from The Cancer Genome Atlas (TCGA) for performance evaluation. Compared with regular WSI analysis, the experimental benchmark shows an obvious decrease in processing time along with a noticeable improvement in classification accuracy.

    Priority Branches for Ship Detection in Optical Remote Sensing Images

    No full text
    Much attention is being paid to using high-performance convolutional neural networks (CNNs) in the area of ship detection in optical remote sensing (ORS) images. However, the problem of false negatives (FNs) caused by side-by-side ships remains unsolved, and the number of false positives (FPs) remains high. This paper uses a DLA-34 network with deformable convolution layers as the backbone. The network has two priority branches: a recall-priority branch for reducing the number of FNs, and a precision-priority branch for reducing the number of FPs. In our single-shot detection method, the recall-priority branch is based on an anchor-free module without non-maximum suppression (NMS), while the precision-priority branch utilizes an anchor-based module with NMS. We perform recall-priority branch functions based on the output part of the CenterNet object detector to precisely predict center points of bounding boxes. The Bidirectional Feature Pyramid Network (BiFPN), combined with the inference part of YOLOv3, is used to improve the precision of the precision-priority branch. Finally, the boxes from the two branches are merged, and we propose priority-based selection (PBS) for choosing the accurate ones. Results show that our proposed method sharply improves the recall rate of side-by-side ships and significantly reduces the number of false alarms. Our method also achieves the best trade-off on our improved version of the HRSC2016 dataset, with 95.57% AP at 56 frames per second on an Nvidia RTX-2080 Ti GPU. Compared with the HRSC2016 dataset, not only are our annotations more accurate, but our dataset also contains more images and samples. Our evaluation metrics also included tests on small ships and incomplete forms of ships.
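
For readers unfamiliar with non-maximum suppression (NMS), which the abstract above uses in one branch and avoids in the other, the following is a minimal sketch of the standard greedy form. It is not the paper's priority-based selection (PBS), whose details the abstract does not give. The function names, the box format `(x1, y1, x2, y2)`, the 0.5 threshold, and the toy boxes are all assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Toy detections: the first two boxes overlap heavily, the third is separate.
toy_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
toy_scores = [0.9, 0.8, 0.7]
kept = nms(toy_boxes, toy_scores)
print(kept)
```

In this toy case the second box is suppressed because its IoU with the first (81/119, about 0.68) exceeds the threshold. When two heavily overlapping boxes belong to two distinct adjacent objects, the same rule discards a true detection, which is one plausible reading of why the recall-priority branch in the abstract avoids NMS.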

    IBOM: An Integrated and Balanced On-Chip Memory for High Performance GPGPUs

    No full text

    Timing-driven placement for carbon nanotube circuits

    No full text
    © 2015 IEEE. Carbon nanotube field effect transistors (CNFETs), which use carbon nanotubes (CNTs) as the transistor channel, are a promising substitute for conventional CMOS technology. However, due to the stochastic assembly process of CNTs, the number of CNTs in each CNFET varies widely, resulting in large circuit delay variation and timing yield degradation. To overcome this, we propose a timing-driven placement method for CNFET circuits. It exploits a unique feature of CNFET circuits, namely asymmetric spatial correlation: CNFETs that lie along the CNT growth direction are highly correlated in terms of their electrical properties. Our method distributes CNFETs of the same critical paths to different rows perpendicular to the CNT growth direction during both global and detailed placement phases, while optimizing the timing of these critical paths. Experimental results demonstrate that our approach reduces both the mean and the variance of circuit delay, leading to an improvement in timing yield.

    CNFET-Based High Throughput SIMD Architecture

    No full text