FairGV: Fair and Fast GPU Virtualization
Increasingly, high-performance computing (HPC) application developers are opting to use cloud resources because of their higher availability. Virtualized GPUs would be an obvious and attractive option for HPC application developers using cloud hosting services. Unfortunately, existing GPU virtualization software is not ready to address the fairness, utilization, and performance limitations associated with consolidating mixed HPC workloads. This paper presents FairGV, a radically redesigned GPU virtualization system that achieves system-wide weighted fair sharing and strong performance isolation in mixed workloads that use GPUs with varying degrees of intensity. To achieve its objectives, FairGV introduces a trap-less GPU processing architecture, a new fair queuing method integrated with work-conserving and GPU-centric co-scheduling policies, and a collaborative scheduling method for non-preemptive GPUs. Our prototype implementation achieves near-ideal fairness (≥ 0.97 Min-Max Ratio) with little performance degradation (≤ 1.02 aggregated overhead) in a range of mixed HPC workloads that leverage GPUs.
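The abstract does not specify FairGV's fair queuing algorithm or exactly how its Min-Max Ratio is computed. As a minimal sketch of the general idea only, the code below uses classic self-clocked fair queuing (SCFQ), a swapped-in textbook technique rather than FairGV's method, to order GPU requests from several VMs by weighted virtual finish tags. It also assumes the Min-Max Ratio is the smallest weight-normalized GPU time divided by the largest, so 1.0 means perfectly weighted-fair sharing. The class and function names, the per-request `cost`, and the weights are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(order=True)
class _Request:
    finish_tag: float                    # virtual finish time; smaller runs first
    seq: int                             # FIFO tie-breaker for equal tags
    vm: str = field(compare=False)
    cost: float = field(compare=False)   # estimated GPU time of the request


class SelfClockedFairQueue:
    """Self-clocked fair queuing (SCFQ) over GPU requests from several VMs."""

    def __init__(self, weights: Dict[str, float]) -> None:
        self.weights = dict(weights)
        self.last_finish = {vm: 0.0 for vm in self.weights}
        self.virtual_time = 0.0          # finish tag of the last dispatched request
        self._heap: List[_Request] = []
        self._seq = 0

    def submit(self, vm: str, cost: float) -> None:
        # A request starts at the later of the current virtual time and the
        # finish tag of the same VM's previous request.
        start = max(self.virtual_time, self.last_finish[vm])
        finish = start + cost / self.weights[vm]
        self.last_finish[vm] = finish
        heapq.heappush(self._heap, _Request(finish, self._seq, vm, cost))
        self._seq += 1

    def dispatch(self) -> Tuple[str, float]:
        req = heapq.heappop(self._heap)
        self.virtual_time = req.finish_tag
        return req.vm, req.cost

    def __len__(self) -> int:
        return len(self._heap)


def min_max_ratio(usage: Dict[str, float], weights: Dict[str, float]) -> float:
    """Smallest weight-normalized usage divided by the largest (1.0 = ideal)."""
    normalized = [usage[vm] / weights[vm] for vm in weights]
    return min(normalized) / max(normalized)


def run_window(weights: Dict[str, float], per_vm: int, window: int) -> Dict[str, float]:
    """Backlog `per_vm` unit-cost requests per VM, then dispatch `window` of them."""
    q = SelfClockedFairQueue(weights)
    for _ in range(per_vm):
        for vm in weights:
            q.submit(vm, 1.0)
    usage = {vm: 0.0 for vm in weights}
    # Work-conserving: never idle while a request is queued (up to `window`).
    for _ in range(window):
        if not q:
            break
        vm, cost = q.dispatch()
        usage[vm] += cost
    return usage


if __name__ == "__main__":
    weights = {"A": 2.0, "B": 1.0}
    usage = run_window(weights, per_vm=6, window=6)
    print(usage, min_max_ratio(usage, weights))  # {'A': 4.0, 'B': 2.0} 1.0
```

With weights 2:1 and both VMs continuously backlogged, VM A receives twice the GPU time of VM B over the window, which yields a ratio of 1.0. Because the dispatcher is work-conserving, a VM that stops submitting simply leaves its share to the others instead of leaving the GPU idle.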
ScissionLite: accelerating distributed deep learning with lightweight data compression for IIoT
Funding: This work was supported in part by the Electronics and Telecommunications Research Institute through the Korean government under Grant 23zs1300 (Research on High Performance Computing Technology to overcome limitations of AI processing) and in part by the Korea Institute for Advancement of Technology (KIAT) through the Korea Government (MOTIE) under Grant P0017011 (HRD Program for Industrial Innovation). Paper no. TII-23-4829.
Industrial Internet of Things (IIoT) applications can greatly benefit from leveraging edge computing. For instance, applications relying on deep neural network (DNN) models can be sliced and distributed across IIoT devices and the network edge to reduce inference latency. However, low network performance between IIoT devices and the edge often becomes a bottleneck. In this study, we propose ScissionLite, a holistic framework designed to accelerate distributed DNN inference using lightweight data compression. Our compression method features a novel lightweight down/upsampling network tailored for performance-limited IIoT devices, which is inserted at the slicing point of a DNN model to reduce outbound network traffic without causing a significant drop in accuracy. In addition, we have developed a benchmarking tool to accurately identify the optimal slicing point of the DNN for the best inference latency. ScissionLite improves inference latency by up to 15.7× with minimal accuracy degradation.
Peer reviewed.
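The abstract states that ScissionLite's benchmarking tool identifies the optimal slicing point but does not give its cost model. A minimal sketch, assuming end-to-end latency is the device-side compute for the layers before the cut, plus compression/decompression time, plus transfer of the intermediate tensor, plus edge-side compute for the remaining layers, is to evaluate every possible cut and keep the cheapest. The `Layer` fields, the fixed `compression_ratio`, and `codec_ms` are hypothetical stand-ins for profiled measurements and for the effect of the down/upsampling network.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Layer:
    name: str
    device_ms: float   # profiled latency on the IIoT device
    edge_ms: float     # profiled latency on the edge server
    out_bytes: int     # size of this layer's output tensor


def best_slicing_point(
    layers: List[Layer],
    input_bytes: int,
    bandwidth_bytes_per_ms: float,
    compression_ratio: float = 1.0,
    codec_ms: float = 0.0,
) -> Tuple[int, float]:
    """Return (k, latency_ms) for the cut with the lowest modeled latency.

    layers[:k] run on the device and layers[k:] on the edge. k == 0 uploads the
    raw input; k == len(layers) runs everything locally (result upload ignored).
    For 0 < k < len(layers), the intermediate tensor is shrunk by
    `compression_ratio` at an extra cost of `codec_ms` (down- plus upsampling).
    """
    n = len(layers)
    best: Optional[Tuple[int, float]] = None
    for k in range(n + 1):
        device = sum(layer.device_ms for layer in layers[:k])
        edge = sum(layer.edge_ms for layer in layers[k:])
        if k == n:
            transfer, codec = 0.0, 0.0
        elif k == 0:
            transfer, codec = input_bytes / bandwidth_bytes_per_ms, 0.0
        else:
            payload = layers[k - 1].out_bytes / compression_ratio
            transfer, codec = payload / bandwidth_bytes_per_ms, codec_ms
        total = device + codec + transfer + edge
        if best is None or total < best[1]:
            best = (k, total)
    assert best is not None
    return best


if __name__ == "__main__":
    # Illustrative profile only, not measurements from the paper.
    profile = [
        Layer("conv1", device_ms=10.0, edge_ms=1.0, out_bytes=400_000),
        Layer("conv2", device_ms=20.0, edge_ms=2.0, out_bytes=50_000),
        Layer("fc", device_ms=40.0, edge_ms=4.0, out_bytes=1_000),
    ]
    print(best_slicing_point(profile, 600_000, 10_000.0))             # (2, 39.0)
    print(best_slicing_point(profile, 600_000, 10_000.0, 4.0, 1.0))   # (1, 27.0)
```

In this toy profile, compressing the intermediate tensor moves the best cut earlier, because a large early activation becomes cheap enough to send; without compression the later, smaller activation wins. This is the trade-off the abstract motivates, although the numbers here are purely illustrative.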
DETC2005-84974: Stress Analysis and Life Assessment of Rotor and Retaining Ring of Generator for Fossil Power Plant
Increasing the rated capacity of a generator can be achieved by increasing either the length or the diameter of the generator rotor body. Increasing the rotor length requires ensuring dynamic stability, whereas increasing the rotor diameter requires satisfying the strength limit of the current rotor material. In addition to the higher centrifugal forces during normal operation at 3600 rpm, a generator rotor body is subjected to contact pressures from the shrink fit between the generator rotor and the retaining ring. To obtain the structural reliability and life assessment of the generator, finite element models were developed and structural analyses were carried out. The stress distributions and the critical locations of the rotor body were identified. Further, a fatigue life assessment was performed to estimate the remaining life of the generator. The critical crack size and probability of failure were also evaluated based on the analysis results, with critical crack sizes predicted using linear elastic fracture mechanics. These results will be applied to the development of a larger, 1000 MW capacity generator. This paper presents both stress analysis and life assessment results for the new 1000 MW generator rotor assembly. The baseline design of the 800 MW generator rotor was also evaluated to verify the reliability of the analysis results. Two load cases were considered: the contact pressures from the shrink fit between the rotor and the retaining ring, and the centrifugal forces during normal operation at 3600 rpm.
INTRODUCTION
With the rapid technological advancement of fossil power plants, it is inevitable that the output of a given turbine generator frame size will be increased from time to time. This has required redesigning the generator to keep pace with the increased rating. For turbine generators, an increased rating presents challenges for designers, who must ensure that the new design satisfies the performance capabilities and electrical rating requirements while staying within mechanical, thermal, and magnetic limits. These challenges arise largely from increased stresses, vibrational instability, fatigue, and stress corrosion cracking. To obtain the structural reliability and life assessment of the new generator, stress analyses, fatigue life assessment, and critical crack evaluation are required, and finite element analysis of the generator rotor assembly is used for this purpose.
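The abstract says critical crack sizes were predicted with linear elastic fracture mechanics but does not give the formulation. A common textbook form, shown here as an assumption rather than the paper's exact model, takes the mode I stress intensity factor K_I = Y σ √(πa) and sets it equal to the material's fracture toughness K_IC, which gives a_c = (1/π)(K_IC / (Y σ))². The geometry factor Y, the units (MPa and metres), and the example numbers are illustrative placeholders, not values from the rotor analysis.

```python
import math


def stress_intensity(sigma_mpa: float, a_m: float, y: float = 1.0) -> float:
    """Mode I stress intensity factor K_I = Y * sigma * sqrt(pi * a), in MPa*sqrt(m)."""
    return y * sigma_mpa * math.sqrt(math.pi * a_m)


def critical_crack_size(k_ic: float, sigma_mpa: float, y: float = 1.0) -> float:
    """Crack size a_c (m) at which K_I reaches the fracture toughness K_IC.

    Solving K_IC = Y * sigma * sqrt(pi * a_c) for a_c gives
    a_c = (1 / pi) * (K_IC / (Y * sigma))**2.
    """
    return (k_ic / (y * sigma_mpa)) ** 2 / math.pi


if __name__ == "__main__":
    # Illustrative inputs only: K_IC = 100 MPa*sqrt(m), stress 500 MPa,
    # Y = 1.12 (a typical factor for a shallow edge crack).
    a_c = critical_crack_size(100.0, 500.0, y=1.12)
    print(f"critical crack size ~ {a_c * 1000:.2f} mm")
    # Plugging a_c back in recovers K_IC up to floating-point rounding.
    print(stress_intensity(500.0, a_c, y=1.12))
```

A crack smaller than a_c at the governing stress is stable under this model; in a fatigue assessment, the remaining life is the number of cycles needed for an initial flaw to grow to a_c.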
Heterogeneous Secure Multi-level Remote Acceleration Service for Low-Power Integrated Systems and Devices
This position paper presents a novel heterogeneous CPU-GPU multi-level cloud acceleration approach focusing on applications running on embedded systems found in low-power devices. A runtime system performs energy and performance estimations in order to automatically select local CPU-based and GPU-based tasks that should be seamlessly executed on more powerful remote devices or cloud infrastructures. Moreover, the paper proposes, for the first time, a secure unified model in which almost any device or infrastructure can operate as an accelerated entity and/or as an accelerator that securely serves other, less powerful devices.
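The abstract describes a runtime that uses energy and performance estimates to decide which CPU- or GPU-based tasks to offload, but it does not specify the estimation model or the decision rule. A minimal sketch, assuming a throughput-based time model, a device-side energy model (active power for local execution; radio power while transferring and idle power while waiting for a remote result), and a rule that picks the lowest-energy option meeting a deadline, could look like this. All function names, parameters, and the policy itself are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Estimate:
    time_s: float
    energy_j: float   # energy drawn from the low-power device itself


def local_estimate(ops: float, ops_per_s: float, active_power_w: float) -> Estimate:
    """Run the task on a local CPU or GPU."""
    t = ops / ops_per_s
    return Estimate(t, active_power_w * t)


def remote_estimate(ops: float, remote_ops_per_s: float, payload_bytes: float,
                    bandwidth_bytes_per_s: float, radio_power_w: float,
                    idle_power_w: float) -> Estimate:
    """Ship the task's data to a remote accelerator and wait for the result."""
    transfer = payload_bytes / bandwidth_bytes_per_s
    compute = ops / remote_ops_per_s
    # The device pays radio power while sending and idle power while waiting.
    return Estimate(transfer + compute,
                    radio_power_w * transfer + idle_power_w * compute)


def select_target(estimates: Dict[str, Estimate], deadline_s: float) -> str:
    """Lowest-energy target that meets the deadline, else the fastest one."""
    feasible = {name: e for name, e in estimates.items() if e.time_s <= deadline_s}
    if feasible:
        return min(feasible, key=lambda name: feasible[name].energy_j)
    return min(estimates, key=lambda name: estimates[name].time_s)


if __name__ == "__main__":
    ops = 1e9  # illustrative task size
    estimates = {
        "local_cpu": local_estimate(ops, 1e8, active_power_w=2.0),
        "local_gpu": local_estimate(ops, 1e9, active_power_w=5.0),
        "remote": remote_estimate(ops, 1e10, payload_bytes=1e6,
                                  bandwidth_bytes_per_s=1e6,
                                  radio_power_w=1.5, idle_power_w=0.1),
    }
    print(select_target(estimates, deadline_s=2.0))   # remote
    print(select_target(estimates, deadline_s=1.05))  # local_gpu
```

With a loose deadline the remote accelerator wins on device energy, while a tight deadline keeps the task on the local GPU. A fuller model would also charge for the security mechanisms the abstract emphasizes, such as encryption of offloaded data, which this sketch omits.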
On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework
The astonishing development of diverse hardware platforms is twofold: on one side, the challenge of exascale performance for big data processing and management; on the other, mobile and embedded devices for data collection and human-machine interaction. This has driven a highly hierarchical evolution of programming models. GVirtuS is a general virtualization system, developed in 2009 and first introduced in 2010, that provides a completely transparent layer between GPUs and VMs. This paper shows the latest achievements and developments of GVirtuS, which now supports CUDA 6.5, memory management, and scheduling. Thanks to its new and improved remoting capabilities, GVirtuS now enables GPU sharing among physical and virtual machines based on x86 and ARM CPUs on local workstations, computing clusters, and distributed cloud appliances.
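GPU remoting of the kind GVirtuS provides is generally built on API call forwarding: a frontend library in the guest or client serializes each GPU API call, a backend process on the machine that owns the GPU executes it, and the result is sent back. The abstract does not describe GVirtuS's wire protocol or CUDA call coverage, so the sketch below is a hypothetical, in-process illustration of that frontend/backend pattern. A fake device whose `malloc`/`memcpy_h2d`/`memcpy_d2h`/`free` calls stand in for real CUDA runtime calls, and JSON passed through a plain function call stands in for the real transport.

```python
import json
from typing import Any, Callable, Dict, List


class FakeGpuBackend:
    """Backend side: owns the (simulated) device memory and executes calls."""

    def __init__(self) -> None:
        self._buffers: Dict[int, bytearray] = {}
        self._next_handle = 1
        self._handlers: Dict[str, Callable[..., Any]] = {
            "malloc": self._malloc,
            "memcpy_h2d": self._memcpy_h2d,
            "memcpy_d2h": self._memcpy_d2h,
            "free": self._free,
        }

    def handle(self, request: str) -> str:
        """Decode one request, dispatch it, and encode the reply."""
        msg = json.loads(request)
        try:
            result = self._handlers[msg["fn"]](*msg["args"])
            return json.dumps({"ok": True, "result": result})
        except (KeyError, ValueError, TypeError) as exc:
            return json.dumps({"ok": False, "error": repr(exc)})

    def _malloc(self, size: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = bytearray(size)
        return handle

    def _memcpy_h2d(self, handle: int, data: List[int]) -> None:
        buf = self._buffers[handle]
        if len(data) > len(buf):
            raise ValueError("copy exceeds allocation")
        buf[: len(data)] = bytes(data)

    def _memcpy_d2h(self, handle: int, size: int) -> List[int]:
        return list(self._buffers[handle][:size])

    def _free(self, handle: int) -> None:
        del self._buffers[handle]


class RemoteGpu:
    """Frontend side: looks like a local API but forwards every call."""

    def __init__(self, transport: Callable[[str], str]) -> None:
        self._send = transport

    def _call(self, fn: str, *args: Any) -> Any:
        reply = json.loads(self._send(json.dumps({"fn": fn, "args": list(args)})))
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]

    def malloc(self, size: int) -> int:
        return self._call("malloc", size)

    def memcpy_h2d(self, handle: int, data: bytes) -> None:
        self._call("memcpy_h2d", handle, list(data))

    def memcpy_d2h(self, handle: int, size: int) -> bytes:
        return bytes(self._call("memcpy_d2h", handle, size))

    def free(self, handle: int) -> None:
        self._call("free", handle)


if __name__ == "__main__":
    backend = FakeGpuBackend()
    gpu = RemoteGpu(transport=backend.handle)  # a socket would sit here in practice
    h = gpu.malloc(4)
    gpu.memcpy_h2d(h, b"\x01\x02\x03")
    print(gpu.memcpy_d2h(h, 3))  # b'\x01\x02\x03'
    gpu.free(h)
```

Because requests are encoded as text rather than raw in-memory structures, the frontend and backend need not agree on byte order or pointer width, which is one reason a remoting layer can serve both ARM and x86 clients. Device memory is exposed to the client only as opaque integer handles.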
Revisiting the arguments for edge computing research
The first author is supported by a Royal Society Short Industry Fellowship.
This article argues that low latency, high bandwidth, device proliferation, sustainable digital infrastructure, and data privacy and sovereignty continue to motivate the need for edge computing research, even though its initial concepts were formulated more than a decade ago.
Postprint. Peer reviewed.
Subtle cytotoxicity and genotoxicity differences in superparamagnetic iron oxide nanoparticles coated with various functional groups
Superparamagnetic iron oxide nanoparticles (SPIONs) have been widely utilized for the diagnosis and therapy of specific diseases, as magnetic resonance imaging (MRI) contrast agents and drug-delivery carriers, because they can be easily transported to targeted areas by an external magnetic field. For such biomedical applications, SPIONs must have multifunctional characteristics, including optimized size and modified surfaces. However, the biofunctionality and biocompatibility of SPIONs of different sizes bearing various surface functional groups have yet to be clearly elucidated. Therefore, it is important to carefully monitor the cytotoxicity and genotoxicity of SPIONs of different sizes that are surface-modified with various functional groups. In this study, we evaluated SPIONs with diameters of approximately 10 nm and 100–150 nm bearing different surface functional groups. SPIONs covered with −O− groups are the so-called bare SPIONs. These were then modified with three different functional groups, namely hydroxyl (−OH), carboxylic (−COOH), and amine (−NH2), by coating their surfaces with tetraethyl orthosilicate (TEOS), (3-aminopropyl)trimethoxysilane (APTMS), TEOS-APTMS, or citrate, which imparted different surface charges and sizes to the particles. The effects of SPIONs coated with these functional groups on mitochondrial activity, intracellular accumulation of reactive oxygen species, membrane integrity, and DNA stability in L-929 fibroblasts were determined by water-soluble tetrazolium, 2′,7′-dichlorodihydrofluorescein, lactate dehydrogenase, and comet assays, respectively. Our toxicological observations suggest that the functional groups and sizes of SPIONs are critical determinants of cellular responses, degrees of cytotoxicity and genotoxicity, and potential mechanisms of toxicity.
Nanoparticles with various surface modifications and of different sizes induced slight, but possibly meaningful, changes in cytotoxicity and genotoxicity, which may be highly valuable for further studies of bioconjugation and cell interaction in drug delivery, cell culture, and cancer-targeting applications.
Loss of primary cilia promotes mitochondria-dependent apoptosis in thyroid cancer
The primary cilium is well preserved in human differentiated thyroid cancers such as papillary and follicular carcinoma. Specific thyroid cancers, such as Hurthle cell carcinoma, the oncocytic variant of papillary thyroid carcinoma (PTC), and PTC with Hashimoto's thyroiditis, show reduced biogenesis of primary cilia; these cancers are often associated with abnormalities in mitochondrial function. Here, we examined the association between primary cilia and the mitochondria-dependent apoptosis pathway. Tg-Cre;Ift88(flox/flox) mice (in which thyroid follicles lacked primary cilia) showed irregularly dilated follicles and increased apoptosis of thyrocytes. Defective ciliogenesis caused by deleting the IFT88 and KIF3A genes from thyroid cancer cell lines increased VDAC1 oligomerization following VDAC1 overexpression, thereby facilitating upregulation of mitochondria-dependent apoptosis. Furthermore, VDAC1 localized with the basal bodies of primary cilia in thyroid cancer cells. These results demonstrate that loss of function of primary cilia generates apoptogenic stimuli, which are responsible for mitochondria-dependent apoptotic cell death in differentiated thyroid cancers. Therefore, regulating primary ciliogenesis might be a therapeutic approach to targeting differentiated thyroid cancers.
- …