Hier-RTLMP: A Hierarchical Automatic Macro Placer for Large-scale Complex IP Blocks
In a typical RTL to GDSII flow, floorplanning or macro placement is a
critical step in achieving decent quality of results (QoR). Moreover, in
today's physical synthesis flows (e.g., Synopsys Fusion Compiler or Cadence
Genus iSpatial), a floorplan .def with macro and IO pin placements is typically
needed as an input to the front-end physical synthesis. Recently, with the
increasing complexity of IP blocks, and in particular with auto-generated RTL
for machine learning (ML) accelerators, the number of hard macros in a single
RTL block can easily run into the hundreds. This makes the task of
generating an automatic floorplan (.def) with IO pin and macro placements for
front-end physical synthesis even more critical and challenging. The so-called
peripheral approach of forcing macros to the periphery of the layout is no
longer viable when the ratio of the sum of the macro perimeters to the
floorplan perimeter is large, since this increases the required stacking depth
of macros. In this paper, we develop a novel multilevel physical planning
approach that exploits the hierarchy and dataflow inherent in the design RTL,
and describe its realization in a new hierarchical macro placer, Hier-RTLMP.
Hier-RTLMP borrows from traditional approaches used in manual system-on-chip
(SoC) floorplanning to create an automatic macro placement for use with large
IP blocks containing very large numbers of hard macros. Empirical studies
demonstrate substantial improvements over the previous RTL-MP macro placement
approach, and promising post-route improvements relative to a leading
commercial place-and-route tool.
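The viability criterion mentioned in the abstract (the ratio of the sum of the macro perimeters to the floorplan perimeter) can be sanity-checked with a few lines of arithmetic. The sketch below is not from the Hier-RTLMP code base; the macro sizes and the stacking-depth heuristic are illustrative assumptions only.

# Minimal sketch (hypothetical helper, not Hier-RTLMP code): estimate how crowded
# the periphery becomes when macros are forced to the block boundary.

def peripheral_crowding(macros, floorplan_w, floorplan_h):
    """macros: list of (width, height) tuples for the hard macros."""
    macro_perimeter_sum = sum(2 * (w + h) for w, h in macros)
    floorplan_perimeter = 2 * (floorplan_w + floorplan_h)
    ratio = macro_perimeter_sum / floorplan_perimeter
    # Crude proxy: if the summed macro perimeters exceed the floorplan perimeter
    # several times over, macros must be stacked in multiple peripheral rings.
    est_stacking_depth = max(1, round(ratio))
    return ratio, est_stacking_depth

# Example (made-up numbers): 300 macros of 100x80 um in a 3000x3000 um block.
ratio, depth = peripheral_crowding([(100, 80)] * 300, 3000, 3000)
print(f"perimeter ratio ~ {ratio:.1f}, estimated stacking depth ~ {depth}")

With several hundred macros, the ratio quickly exceeds one, which is the regime in which the abstract argues the peripheral approach breaks down.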
BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction
Although deep pre-trained language models have shown promising benefits in a
wide range of industrial scenarios, including Click-Through-Rate (CTR)
prediction, integrating pre-trained language models, which handle only
textual signals, into a prediction pipeline with non-textual features remains
challenging.
To date, two directions have been explored for integrating multi-modal inputs
when fine-tuning pre-trained language models. The first fuses the outputs of
the language model with the non-textual features through an aggregation layer,
resulting in an ensemble framework in which the cross-information between
textual and non-textual inputs is learned only in the aggregation layer. The
second splits non-textual features into fine-grained fragments and transforms
the fragments into new tokens that are combined with the textual ones, so that
they can be fed directly to the transformer layers of the language model.
However, this approach increases the complexity of learning and inference
because of the numerous additional tokens.
To address these limitations, we propose in this work a novel framework,
BERT4CTR, whose Uni-Attention mechanism benefits from the interactions between
non-textual and textual features while keeping training and inference costs
low through dimensionality reduction. Comprehensive experiments on both public
and commercial data demonstrate that BERT4CTR significantly outperforms
state-of-the-art frameworks for handling multi-modal inputs and is well suited
to CTR prediction.
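For readers unfamiliar with the first of the two fusion directions discussed in the abstract, the sketch below shows a minimal late-fusion (aggregation-layer) baseline in PyTorch. It is an illustration under assumed module names and dimensions, not BERT4CTR's Uni-Attention mechanism.

# Hypothetical late-fusion baseline, not BERT4CTR itself.
import torch
import torch.nn as nn

class LateFusionCTR(nn.Module):
    def __init__(self, text_dim=768, tabular_dim=64, hidden=128):
        super().__init__()
        self.tabular_proj = nn.Linear(tabular_dim, hidden)
        self.aggregator = nn.Sequential(
            nn.Linear(text_dim + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, text_embedding, tabular_features):
        # text_embedding: pooled output of a pre-trained language model (e.g. [CLS]).
        # Cross-information between modalities is learned only in this aggregator,
        # which is exactly the limitation the abstract points out.
        tab = torch.relu(self.tabular_proj(tabular_features))
        fused = torch.cat([text_embedding, tab], dim=-1)
        return torch.sigmoid(self.aggregator(fused)).squeeze(-1)

The second direction instead tokenizes the non-textual features so attention can mix modalities in every layer, at the cost of longer sequences; BERT4CTR's contribution is to get cross-modal interaction without paying that full token-length penalty.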
Assessment of Reinforcement Learning for Macro Placement
We provide an open, transparent implementation and assessment of Google
Brain's deep reinforcement learning approach to macro placement and its
Circuit Training (CT) implementation on GitHub. We implement in open source
key "blackbox" elements of CT and clarify discrepancies between CT and the
Nature paper. New testcases on open enablements are developed and released. We
assess CT alongside multiple alternative macro placers, with all evaluation
flows and related scripts public on GitHub. Our experiments also encompass
academic mixed-size placement benchmarks, as well as ablation and stability
studies. We comment on the impact of the Nature paper and CT, as well as
directions for future research.
Performance Analysis of DNN Inference/Training with Convolution and non-Convolution Operations
Today's performance analysis frameworks for deep learning accelerators suffer
from two significant limitations. First, although modern convolutional neural
networks (CNNs) consist of many types of layers other than convolution,
especially during training, these frameworks largely focus on convolution
layers only. Second, these frameworks are generally targeted towards inference,
and lack support for training operations. This work proposes a novel
performance analysis framework, SimDIT, for general ASIC-based systolic
hardware accelerator platforms. The modeling effort of SimDIT comprehensively
covers convolution and non-convolution operations of both CNN inference and
training on a highly parameterizable hardware substrate. SimDIT is integrated
with a backend silicon implementation flow and provides detailed end-to-end
performance statistics (i.e., data access cost, cycle counts, energy, and
power) for executing CNN inference and training workloads. SimDIT-enabled
performance analysis reveals that, on a 64×64 processing array, non-convolution
operations constitute 59.5% of the total runtime for the ResNet-50 training workload.
In addition, by optimally distributing available off-chip DRAM bandwidth and
on-chip SRAM resources, SimDIT achieves an 18× performance improvement over a
generic static resource allocation for ResNet-50 inference.
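As a rough illustration of the kind of analytical estimate such a framework produces, the sketch below compares compute-bound and bandwidth-bound cycle counts for a single convolution layer on a square processing array. The formulas are generic systolic-array approximations under the stated assumptions, not SimDIT's actual cost model.

# Back-of-the-envelope estimate; all parameters and simplifications are assumptions.

def conv_layer_cycles(out_h, out_w, out_c, in_c, k, array_dim=64,
                      dram_bw_bytes_per_cycle=32, bytes_per_elem=1):
    macs = out_h * out_w * out_c * in_c * k * k
    # Compute bound: one MAC per PE per cycle on an array_dim x array_dim array.
    compute_cycles = macs / (array_dim * array_dim)
    # Memory bound: assume every input, weight, and output element crosses DRAM
    # exactly once (ignores SRAM reuse, which a detailed model would capture).
    traffic = bytes_per_elem * (out_h * out_w * out_c
                                + in_c * out_c * k * k
                                + (out_h + k - 1) * (out_w + k - 1) * in_c)
    memory_cycles = traffic / dram_bw_bytes_per_cycle
    return max(compute_cycles, memory_cycles)

# Example: a ResNet-50-like 3x3 convolution, 56x56x64 -> 56x56x64.
print(f"~{conv_layer_cycles(56, 56, 64, 64, 3):,.0f} cycles")

Such per-layer estimates make it easy to see why non-convolution operations and resource allocation, rather than raw MAC throughput alone, dominate end-to-end training time on these accelerators.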