Search CORE

223 research outputs found

x86 instruction reordering for code compression

Author: Paroczi Zsombor
Publication venue
Publication date: 01/01/2012
Field of study

Runtime executable code compression is a method which uses standard data compression methods and binary machine code transformations to achieve smaller file size, yet maintaining the ability to execute the compressed file as a regular executable. With a disassembler, an almost perfect instructional and functional level disassembly can be generated. Using the structural information of the compiled machine code each function can be split into so called basic blocks. In this work we show that reordering instructions within basic blocks using data flow constraints can improve code compression without changing the behavior of the code. We use two kinds of data affection (read, write) and 20 data types including registers: 8 basic x86 registers, 11 eflags, and memory data. Due to the complexity of the reordering, some simplification is required. Our solution is to search local optimum of the compression on the function level and then combine the results to get a suboptimal global result. Using the reordering method better results can be achieved, namely the compression size gain for gzip can be as high as 1.24%, for lzma 0.68% on the tested executables

University of Szeged

Gen-acceleration: Pioneering work for hardware accelerator generation using large language models

Author: Vungarala Durga Lakshmi Venkata Deepak
Publication venue: Digital Commons @ NJIT
Publication date: 31/12/2023
Field of study

Optimizing computational power is critical in the age of data-intensive applications and Artificial Intelligence (AI)/Machine Learning (ML). While facing challenging bottlenecks, conventional Von-Neumann architecture with implementing such huge tasks looks seemingly impossible. Hardware Accelerators are critical in efficiently deploying these technologies and have been vastly explored in edge devices. This study explores a state-of-the-art hardware accelerator; Gemmini is studied; we leveraged the open-sourced tool. Furthermore, we developed a Hardware Accelerator in the study we compared with the Non-Von-Neumann architecture. Gemmini is renowned for efficient matrix multiplication, but configuring it for specific tasks requires manual effort and expertise. We propose implementing it by reducing manual intervention and domain expertise, making it easy to develop and deploy hardware accelerators that are time-consuming and need expertise in the field; by leveraging the Large Language Models (LLMs), they enable data-informed decision-making, enhancing performance. This work introduces an innovative method for hardware accelerator generation by undertaking the Gemmini to generate optimizing hardware accelerators for AI/ML applications and paving the way for automation and customization in the field

Digital Commons @ New Jersey Institute of Technology (NJIT)

Intelligent systems for efficiency and security

Author: Pothukuchi Raghavendra Pradyumna
Publication venue
Publication date: 01/08/2020
Field of study

As computing becomes ubiquitous and personalized, resources like energy, storage and time are becoming increasingly scarce and, at the same time, computing systems must deliver in multiple dimensions, such as high performance, quality of service, reliability, security and low power. Building such computers is hard, particularly when the operating environment is becoming more dynamic, and systems are becoming heterogeneous and distributed. Unfortunately, computers today manage resources with many ad hoc heuristics that are suboptimal, unsafe, and cannot be composed across the computer’s subsystems. Continuing this approach has severe consequences: underperforming systems, resource waste, information loss, and even life endangerment. This dissertation research develops computing systems which, through intelligent adaptation, deliver efficiency along multiple dimensions. The key idea is to manage computers with principled methods from formal control. It is with these methods that the multiple subsystems of a computer sense their environment and configure themselves to meet system-wide goals. To achieve the goal of intelligent systems, this dissertation makes a series of contributions, each building on the previous. First, it introduces the use of formal MIMO (Multiple Input Multiple Output) control for processors, to simultaneously optimize many goals like performance, power, and temperature. Second, it develops the Yukta control system, which uses coordinated formal controllers in different layers of the stack (hardware and operating system). Third, it uses robust control to develop a fast, globally coordinated and decentralized control framework called Tangram, for heterogeneous computers. Finally, it presents Maya, a defense against power side-channel attacks that uses formal control to reshape the power dissipated by a computer, confusing the attacker. The ideas in the dissertation have been demonstrated successfully with several prototypes, including one built along with AMD (Advanced Micro Devices, Inc.) engineers. These designs significantly outperformed the state of the art. The research in this dissertation brought formal control closer to computer architecture and has been well-received in both domains. It has the first application of full-fledged MIMO control for processors, the first use of robust control in computer systems, and the first application of formal control for side-channel defense. It makes a significant stride towards intelligent systems that are efficient, secure and reliable

Illinois Digital Environment for Access to Learning and Scholarship Repository

Edge Computing for Extreme Reliability and Scalability

Author: Scotece Domenico <1988>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 03/04/2020
Field of study

The massive number of Internet of Things (IoT) devices and their continuous data collection will lead to a rapid increase in the scale of collected data. Processing all these collected data at the central cloud server is inefficient, and even is unfeasible or unnecessary. Hence, the task of processing the data is pushed to the network edges introducing the concept of Edge Computing. Processing the information closer to the source of data (e.g., on gateways and on edge micro-servers) not only reduces the huge workload of central cloud, also decreases the latency for real-time applications by avoiding the unreliable and unpredictable network latency to communicate with the central cloud

AMS Tesi di Dottorato

Energy Transparency for Deeply Embedded Programs

Author: Chamski Zbigniew
Eder Kerstin
Georgiou Kyriakos
Kerrison Steven
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/03/2017
Field of study

Energy transparency is a concept that makes a program's energy consumption visible, from hardware up to software, through the different system layers. Such transparency can enable energy optimizations at each layer and between layers, and help both programmers and operating systems make energy-aware decisions. In this paper, we focus on deeply embedded devices, typically used for Internet of Things (IoT) applications, and demonstrate how to enable energy transparency through existing Static Resource Analysis (SRA) techniques and a new target-agnostic profiling technique, without hardware energy measurements. Our novel mapping technique enables software energy consumption estimations at a higher level than the Instruction Set Architecture (ISA), namely the LLVM Intermediate Representation (IR) level, and therefore introduces energy transparency directly to the LLVM optimizer. We apply our energy estimation techniques to a comprehensive set of benchmarks, including single- and also multi-threaded embedded programs from two commonly used concurrency patterns, task farms and pipelines. Using SRA, our LLVM IR results demonstrate a high accuracy with a deviation in the range of 1% from the ISA SRA. Our profiling technique captures the actual energy consumption at the LLVM IR level with an average error of 3%.Comment: 33 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1510.0709

arXiv.org e-Print Archive

Explore Bristol Research