Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver both applications and infrastructure as services over the Internet. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. Large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM), system software that monitors application behavior at run-time, analyzes the collected information, and optimizes cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation and the underlying hardware through performance counters, optimizing its computing configuration based on the analyzed data.
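The monitor-analyze-adapt loop the abstract describes can be sketched as follows. This is a minimal illustration, not RTM's actual interface: the class name, the fixed `target_util` threshold, and the stubbed `sample()` method (which stands in for a real performance-counter read) are all assumptions.

```python
# Minimal sketch of a monitor/analyze/adapt loop in the spirit of RTM.
# All names and thresholds are illustrative assumptions; the real RTM reads
# hardware performance counters and instruments libraries.
class RunTimeMonitor:
    def __init__(self, target_util=0.8):
        self.target_util = target_util
        self.num_workers = 4  # current computing configuration

    def sample(self):
        # Stand-in for a performance-counter read; returns utilization in [0, 1].
        return 0.5

    def adapt(self):
        util = self.sample()
        # Analyze the monitored value and adjust the configuration.
        if util > self.target_util and self.num_workers > 1:
            self.num_workers -= 1   # contention: shrink the worker pool
        elif util < self.target_util / 2:
            self.num_workers += 1   # headroom: grow the worker pool
        return self.num_workers
```

In a real deployment the `sample()` stub would be replaced by a counter read (e.g. via the OS performance-monitoring interface), and `adapt()` would run periodically alongside the application.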
QuantPipe: Applying Adaptive Post-Training Quantization for Distributed Transformer Pipelines in Dynamic Edge Environments
Pipeline parallelism has achieved great success in deploying large-scale transformer models in cloud environments, but has received less attention in edge environments. Unlike cloud scenarios with high-speed and stable network interconnects, dynamic bandwidth in edge systems can degrade distributed pipeline performance. We address this issue with QuantPipe, a communication-efficient distributed edge system that introduces post-training quantization (PTQ) to compress the communicated tensors. QuantPipe uses adaptive PTQ to change bitwidths in response to bandwidth dynamics, maintaining transformer pipeline performance while incurring limited inference accuracy loss. We further improve accuracy with a directed-search analytical clipping for integer quantization method (DS-ACIQ), which bridges the gap between estimated and real data distributions. Experimental results show that QuantPipe adapts to dynamic bandwidth to maintain pipeline performance while achieving practical model accuracy across a wide range of quantization bitwidths, e.g., improving accuracy under 2-bit quantization by 15.85% on ImageNet compared to naive quantization.
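The two building blocks of this approach, picking a bitwidth from the current bandwidth and uniformly quantizing with a clipping range, can be sketched as below. This is not QuantPipe's API: the function names, the bitwidth options, and the latency-budget heuristic are assumptions for illustration.

```python
# Hedged sketch of adaptive PTQ for pipeline communication (illustrative only).
def choose_bitwidth(n_elems, bandwidth_bps, budget_s, options=(2, 4, 8, 16)):
    # Keep the largest bitwidth whose transfer time still meets the budget;
    # fall back to the smallest option when bandwidth is too low for any.
    feasible = [b for b in options if n_elems * b / bandwidth_bps <= budget_s]
    return max(feasible) if feasible else min(options)

def quantize(xs, bits, clip):
    # Symmetric uniform quantization into signed integers, values clipped to
    # [-clip, clip]; the clipping range is what a method like ACIQ estimates.
    qmax = 2 ** (bits - 1) - 1
    scale = clip / qmax
    return [max(-qmax, min(qmax, round(max(-clip, min(clip, x)) / scale)))
            for x in xs]
```

The key design point the abstract highlights is that `clip` matters most at low bitwidths (e.g. 2-bit), which is where a better clipping search such as DS-ACIQ pays off.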
Opportunities for Concurrent Dynamic Analysis with Explicit Inter-core Communication
Multicore is now the dominant processor trend, and the number of cores is rapidly increasing. The paradigm shift to multicore forces a redesign of the software stack, including dynamic analysis. Dynamic analyses provide rich features to software in areas such as debugging, testing, optimization, and security. However, these techniques often suffer from excessive overhead, which makes them less practical. Previously, this overhead was overcome by improved processor performance as each generation grew faster, but the performance requirements of dynamic analyses in the multicore era cannot be fulfilled without redesigning them for parallelism. Scalable design of dynamic analysis is a challenging problem. Not only must the analysis itself be parallel, but it must also be decoupled from the application and run concurrently. A typical method of decoupling the analysis from the application is to send the analysis data from the application to the core that runs the analysis thread via buffering. However, buffering can perturb application cache performance, and the cache coherence protocol may not be efficient, or even implemented, with the large core counts of future processors. This paper presents our initial effort to explore the hardware design space and software approaches that alleviate the scalability problem for dynamic analysis on multicore. We make use of the explicit inter-core communication already available in a real processor, the TILE64, and evaluate the opportunity for scalable dynamic analyses. We provide our model and implement concurrent call graph profiling as a case study. Our evaluation shows that pure communication overhead from the application's point of view is as low as 1%. We expect that our work will help in designing scalable dynamic analyses and will influence the design of future many-core processors.
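The decoupled, buffered analysis the abstract describes can be illustrated in software with a producer/consumer pair: the application enqueues call events cheaply, while a separate analysis thread consumes them and builds the call graph. This is a thread-and-queue analogue, assumed for illustration; the paper's point is that the TILE64's explicit inter-core channels can replace this shared-memory buffer.

```python
import queue
import threading
from collections import defaultdict

# Sketch of decoupled concurrent call-graph profiling (illustrative analogue
# of the paper's buffered inter-core communication, not its implementation).
def profile(events):
    buf = queue.Queue()           # stands in for the inter-core channel
    call_graph = defaultdict(int)

    def analyzer():
        # Analysis thread: drain the buffer and count caller->callee edges.
        while True:
            item = buf.get()
            if item is None:      # sentinel: the application has finished
                break
            call_graph[item] += 1

    t = threading.Thread(target=analyzer)
    t.start()
    for e in events:              # application side: cheap enqueue only
        buf.put(e)
    buf.put(None)
    t.join()
    return dict(call_graph)
```

The application-side cost is just the enqueue, which is why the measured pure communication overhead can be as low as the 1% the abstract reports when the channel is a hardware one.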
PIM- and Stream Processor-based Processing for Radar Signal Applications
The growing gap between processor and memory speeds has created a problem for data-intensive applications. Recent approaches to this problem are processor-in-memory (PIM) technology and stream processor technology. In this paper, we assess the performance of systems based on PIM and stream processors by implementing data-intensive applications. The implementation results are compared with the measured performance of conventional systems based on PowerPC and Pentium processors. The results show that the performance of systems based on these processors is improved by up to 70 compared with conventional systems for these data-intensive applications.
Dynamic power management of multiprocessor systems
Power management is critical to power-constrained real-time systems. In this paper, we present a dynamic power management algorithm. Unlike other approaches that focus on the tradeoff between power and performance, our algorithm maximizes both power utilization and performance. It considers the dynamic nature of the environment, such as changes in the available energy, and adapts system parameters such as the operating voltage, frequency, and number of processors. We divide the power management problem into three sub-problems: initial power allocation, computation of system parameters based on the allocated power, and dynamic update of the power and system parameters at run time. Initial power allocation minimizes wasted energy by using extra energy for useful work; it also avoids undersupplied-power situations by reducing power usage before they occur. The system parameters are computed to maximize performance for a given power. At runtime, the system parameters are updated continuously to accommodate differences between expected and actual conditions. Simulation results of the algorithm for a satellite system using eight Processor-In-Memory (PIM) processors are presented.
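The three sub-problems can be sketched as three small functions. Everything here is an assumption for illustration: the P ≈ n·V²·f dynamic-power model, the performance proxy f·n, and the proportional runtime correction are not the paper's actual formulation.

```python
# Illustrative three-step dynamic power management sketch (assumed model:
# dynamic power ~ n * V^2 * f, performance ~ f * n; not the paper's equations).

def allocate(total_energy_j, horizon_s):
    # Sub-problem 1: initial allocation spreads available energy over the
    # mission horizon, yielding an average power budget in watts.
    return total_energy_j / horizon_s

def configure(power_w, configs):
    # Sub-problem 2: among (freq, volts, n_procs) candidates, pick the one
    # with the best performance whose modeled power fits the budget.
    feasible = [(f * n, (f, v, n)) for f, v, n in configs
                if n * v * v * f <= power_w]
    return max(feasible)[1] if feasible else min(configs)

def update(budget_w, used_avg_w, elapsed_frac):
    # Sub-problem 3: at runtime, redistribute the energy surplus or deficit
    # relative to plan over the remaining fraction of the horizon.
    expected_e = budget_w * elapsed_frac
    actual_e = used_avg_w * elapsed_frac
    remaining = 1.0 - elapsed_frac
    return budget_w + (expected_e - actual_e) / max(remaining, 1e-9)
```

For example, a system that has spent only 8 W of a 10 W budget halfway through the horizon can raise its budget to 12 W for the remainder, using the surplus for useful work rather than wasting it.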