47 research outputs found

Computational and biological studies of mechanical prophylaxis against deep venous thrombosis

Author: Dai Guohao, 1970-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2001

Thesis (Ph. D.)--Harvard--Massachusetts Institute of Technology Division of Health Sciences and Technology, 2001. Includes bibliographical references (p. 137-151). Deep vein thrombosis (DVT) of the lower extremity and induced pulmonary embolism are common complications resulting from prolonged periods of bed-rest or immobilization of the limbs. One of the most effective methods of prophylaxis against DVT is external pneumatic compression (EPC). In spite of its wide acceptance as an effective means of prophylaxis, its mechanism remains poorly understood and optimal compression conditions have not been defined. Understanding the biological consequences of EPC is an important goal for optimizing the performance of compression devices and providing guidance for clinical use. In the first part of this thesis, a computational model of the leg was developed to simulate hemodynamic conditions under EPC, and the influence of different modes of compression was analyzed and compared. Then, a new in vitro cell culture system was developed that can be used to examine the effect of hemodynamic conditions during EPC on endothelial cell (EC) function. The biologic response was assessed through changes in cell morphology and the expression of various pro-thrombotic and anti-thrombotic factors related to EC. The results show that intermittent flow associated with EPC up-regulates EC fibrinolytic potential and vasomotor function. Data on thrombo-regulatory factors obtained with DNA microarray technology indicate that EC gene expression shifts toward an anti-thrombotic rather than a pro-thrombotic profile under EPC. Finally, nitric oxide (NO), an important regulator of vasomotor and platelet functions, was studied in detail under various cycles of EPC. The results show that NO production and eNOS mRNA respond differentially to modes of EPC. Further exploration using the system can potentially reveal the optimum combination of forces to better regulate thromboresistant effects desired for DVT prophylaxis. By Guohao Dai. Ph.D.

DSpace@MIT

BARS: Towards Open Benchmarking for Recommender Systems

Author: Cai Guohao
Dai Quanyu
Liu Jinyang
Ma Rong
Su Liangcai
Xiao Xi
Zhang Rui
Zhu Jieming
Publication date: 17/07/2022

The past two decades have witnessed the rapid development of personalized recommendation techniques. Despite significant progress made in both research and practice of recommender systems, to date, there is a lack of a widely-recognized benchmarking standard in this field. Many existing studies perform model evaluations and comparisons in an ad-hoc manner, for example, by employing their own private data splits or using different experimental settings. Such conventions not only increase the difficulty in reproducing existing studies, but also lead to inconsistent experimental results among them. This largely limits the credibility and practical value of research results in this field. To tackle these issues, we present an initiative project (namely BARS) aiming for open benchmarking for recommender systems. In comparison to some earlier attempts towards this goal, we take a further step by setting up a standardized benchmarking pipeline for reproducible research, which integrates all the details about datasets, source code, hyper-parameter settings, running logs, and evaluation results. The benchmark is designed with comprehensiveness and sustainability in mind. It covers both matching and ranking tasks, and also enables researchers to easily follow and contribute to the research in this field. This project will not only reduce the redundant efforts of researchers to re-implement or re-run existing baselines, but also drive more solid and reproducible research on recommender systems. We would like to call upon everyone to use the BARS benchmark for future evaluation, and contribute to the project through the portal at: https://openbenchmark.github.io/BARS. Comment: Accepted by SIGIR 2022. Note that version v5 is updated to keep consistency with the ACM camera-ready version.

arXiv.org e-Print Archive

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

Author: Dai Guohao
Han Song
Hong Ke
Li Xiuyu
Liu Zhijian
Tang Haotian
Wang Yu
Yang Shang
Yu Zhongming
Publication date: 25/10/2023

Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems. Since the computation pattern is sparse and irregular, specialized high-performance kernels are required. Existing GPU libraries offer two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is easy to implement but not optimal in performance, while the dataflows with overlapped computation and memory access (e.g., implicit GEMM) are highly performant but have very high engineering costs. In this paper, we introduce TorchSparse++, a new GPU library that achieves the best of both worlds. We create a highly efficient Sparse Kernel Generator that generates performant sparse convolution kernels at less than one-tenth of the engineering cost of the current state-of-the-art system. On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads. Consequently, TorchSparse++ achieves 2.9x, 3.3x, 2.2x and 1.7x measured end-to-end speedup on an NVIDIA A100 GPU over state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse and SpConv v2 in inference; and is 1.2-1.3x faster than SpConv v2 in mixed precision training across seven representative autonomous driving benchmarks. It also seamlessly supports graph convolutions, achieving 2.6-7.6x faster inference speed compared with state-of-the-art graph deep learning libraries. Comment: MICRO 2023; Haotian Tang and Shang Yang contributed equally to this project.

arXiv.org e-Print Archive
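
The gather-GEMM-scatter dataflow named in the TorchSparse++ abstract above can be illustrated with a minimal pure-Python sketch. This is not TorchSparse++ code or its API: the function names, the `kernel_maps` layout (one list of `(input_index, output_index)` pairs per kernel offset), and the list-of-lists matrices are all assumptions made for illustration. For each kernel offset, the sketch gathers the participating input features, multiplies them by that offset's weight matrix, and scatter-adds the rows into the output.

```python
def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices (the GEMM step)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gather_gemm_scatter(features, weights, kernel_maps, num_out):
    """Illustrative sparse convolution via gather -> GEMM -> scatter-add.

    features:    list of input feature rows, one per active input point
    weights:     dict {offset: C_in x C_out matrix} (hypothetical layout)
    kernel_maps: dict {offset: [(input_index, output_index), ...]}
    num_out:     number of active output points
    """
    c_out = len(next(iter(weights.values()))[0])
    out = [[0.0] * c_out for _ in range(num_out)]
    for offset, pairs in kernel_maps.items():
        if not pairs:
            continue
        # Gather: collect the input rows that this kernel offset touches.
        gathered = [features[i] for i, _ in pairs]
        # GEMM: one dense product per kernel offset.
        partial = matmul(gathered, weights[offset])
        # Scatter: accumulate each result row into its output point.
        for (_, o), row in zip(pairs, partial):
            for c, val in enumerate(row):
                out[o][c] += val
    return out
```

The three steps run one after another here, so every gathered buffer is written and read back in full. The abstract contrasts this with dataflows that overlap computation and memory access.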

FlashDecoding++: Faster Large Language Model Inference on GPUs

Author: Chen Kangdi
Dai Guohao
Dong Yuhan
Hong Ke
Li Xiuhong
Liu Jun
Mao Qiuli
Wang Yu
Xu Jiaming
Publication date: 05/01/2024

Large Language Models (LLMs) are becoming increasingly important in various domains. However, the following challenges remain unsolved in accelerating LLM inference: (1) Synchronized partial softmax update. The softmax operation requires a synchronized update operation among each partial softmax result, leading to ~20% overhead for the attention computation in LLMs. (2) Under-utilized computation of flat GEMM. The shape of matrices performing GEMM in LLM inference is flat, leading to under-utilized computation and >50% performance loss after padding zeros in previous designs. (3) Performance loss due to static dataflow. Kernel performance in LLMs depends on varied input data features, hardware configurations, etc. A single and static dataflow may lead to a 50.25% performance loss for GEMMs of different shapes in LLM inference. We present FlashDecoding++, a fast LLM inference engine supporting mainstream LLMs and hardware back-ends. To tackle the above challenges, FlashDecoding++ proposes: (1) Asynchronized softmax with unified max value. FlashDecoding++ introduces a unified max value technique for different partial softmax computations to avoid synchronization. (2) Flat GEMM optimization with double buffering. FlashDecoding++ points out that flat GEMMs with different shapes face varied bottlenecks. Then, techniques like double buffering are introduced. (3) Heuristic dataflow with hardware resource adaptation. FlashDecoding++ heuristically optimizes dataflow using different hardware resources considering input dynamics. Due to the versatility of optimizations in FlashDecoding++, FlashDecoding++ can achieve up to 4.86x and 2.18x speedup on both NVIDIA and AMD GPUs compared to Hugging Face implementations. FlashDecoding++ also achieves an average speedup of 1.37x compared to state-of-the-art LLM inference engines on mainstream LLMs.

arXiv.org e-Print Archive
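
The difference between a synchronized partial softmax and one that uses a unified max value, as described in the FlashDecoding++ abstract above, can be sketched in plain Python. This is a toy illustration, not the FlashDecoding++ implementation: the function names, the chunking scheme, and the choice of `phi` are assumptions, and the sketch does not cover what happens when a score is far enough from `phi` to overflow or underflow. The first function keeps a running maximum and must rescale earlier partial results whenever it changes. The second subtracts the same fixed `phi` in every chunk, so the partial sums are independent and can simply be added.

```python
import math


def softmax_weighted_sum_sync(scores, values, chunk):
    """Chunked softmax-weighted sum with a running max (needs rescaling)."""
    running_max = -math.inf
    numer = 0.0
    denom = 0.0
    for start in range(0, len(scores), chunk):
        s = scores[start:start + chunk]
        v = values[start:start + chunk]
        new_max = max(running_max, max(s))
        # Earlier partial results depend on the old max and must be rescaled:
        # this is the dependency between partial softmax results.
        scale = math.exp(running_max - new_max)
        numer = numer * scale + sum(math.exp(x - new_max) * y for x, y in zip(s, v))
        denom = denom * scale + sum(math.exp(x - new_max) for x in s)
        running_max = new_max
    return numer / denom


def softmax_weighted_sum_unified(scores, values, chunk, phi):
    """Chunked softmax-weighted sum with one shared offset phi (assumed given)."""
    partials = []
    for start in range(0, len(scores), chunk):
        s = scores[start:start + chunk]
        v = values[start:start + chunk]
        # Every chunk uses the same phi, so no chunk waits on another.
        n = sum(math.exp(x - phi) * y for x, y in zip(s, v))
        d = sum(math.exp(x - phi) for x in s)
        partials.append((n, d))
    numer = sum(n for n, _ in partials)
    denom = sum(d for _, d in partials)
    return numer / denom
```

Both functions compute the same quantity because the factor exp(-phi) cancels between numerator and denominator. The unified variant is only numerically safe when the scores stay within a range around `phi` where `exp` neither overflows nor underflows.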

Direct cell reprogramming for tissue engineering and regenerative medicine

Author: Alexander Grath
Guohao Dai
Publication venue: Springer Science and Business Media LLC
Publication date: 01/02/2019

Direct cell reprogramming, also called transdifferentiation, allows for the reprogramming of one somatic cell type directly into another, without the need to transition through an induced pluripotent state. Thus, it is an attractive approach to develop novel tissue engineering applications to treat diseases and injuries where there is a shortage of proliferating cells for tissue repair. In certain types of tissue damage, terminally differentiated somatic cells lose their ability to proliferate; as a result, damaged tissues cannot heal by themselves. Examples of these scenarios include myocardial infarctions, neurodegenerative diseases, and cartilage injuries. Transdifferentiation is capable of reprogramming cells that are abundant in the body into desired cell phenotypes that are able to restore tissue function in damaged areas. Therefore, direct cell reprogramming is a promising direction in the cell and tissue engineering and regenerative medicine fields. In recent years, several methods for transdifferentiation have been developed, ranging from the overexpression of transcription factors via viral vectors, to small molecules, to clustered regularly interspaced short palindromic repeats (CRISPR) and its associated protein (Cas9) for both genetic and epigenetic reprogramming. Overexpressing transcription factors by use of a lentivirus is currently the most prevalent technique; however, it lacks high reprogramming efficiencies and can pose problems when transitioning to human subjects and clinical trials. CRISPR/Cas9, fused with proteins that modulate transcription, has been shown to improve efficiencies greatly. Transdifferentiation has successfully generated many cell phenotypes, including endothelial cells, skeletal myocytes, neuronal cells, and more. These cells have been shown to emulate mature adult cells such that they are able to mimic major functions, and some are capable of promoting regeneration of damaged tissue in vivo. While transdifferentiated cells have not yet seen clinical use, they have shown promise in mouse models, with success in treating liver disease and several brain-related diseases, while also being utilized as a cell source for tissue engineered vascular grafts to treat damaged blood vessels. Recently, localized transdifferentiated cells have been generated in situ, allowing for treatments without invasive surgeries and more complete transdifferentiation. In this review, we summarize recent developments in various cell reprogramming techniques, their applications in converting various somatic cells, their uses in tissue regeneration, and the challenges of transitioning to a clinical setting, accompanied by potential solutions.

Directory of Open Access Journals