User experience driven CPU frequency scaling on mobile devices towards better energy efficiency
With the development of modern smartphones, mobile devices have become ubiquitous
in our daily lives. Thanks to their high processing capabilities and a vast number of applications,
users now rely on them for both business and personal tasks. Unfortunately, battery technology
has not scaled at the same pace as computational power; hence, modern
smartphone batteries often last less than a day before they need to be recharged.
One of the most power-hungry components is the central processing unit (CPU). Multiple
techniques are applied to reduce CPU energy consumption. Among them is dynamic
voltage and frequency scaling (DVFS). This technique reduces energy consumption
by dynamically changing the CPU supply voltage depending on the currently running
workload. Reducing voltage, however, also makes it necessary to reduce the clock
frequency, which can have a significant impact on task performance. Current DVFS
algorithms deliver a good user experience; however, as experiments conducted later in
this thesis show, they do not achieve optimal energy efficiency for interactive
mobile workloads. This thesis presents methods and tools to determine where energy
can be saved during mobile workload execution when using DVFS. Furthermore, an
improved DVFS technique is developed that achieves a higher energy efficiency than
the current standard.
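The load-to-frequency mapping that a conventional DVFS governor performs can be sketched in a few lines. This is a minimal illustration in the spirit of Linux's "ondemand" policy, not the technique developed in this thesis; the frequency table and thresholds are invented for the example.

```python
# Minimal sketch of a load-based DVFS policy. The frequency table and
# the 80% "busy" threshold are illustrative values, not from the thesis.

FREQUENCIES_MHZ = [300, 600, 1200, 1800, 2400]  # hypothetical OPP table

def pick_frequency(cpu_load: float) -> int:
    """Map a utilisation sample (0.0-1.0) to a clock frequency.

    High load jumps straight to the maximum frequency so interactive
    work is not slowed down; lower load steps down to save energy.
    """
    if cpu_load >= 0.8:                 # busy: favour performance
        return FREQUENCIES_MHZ[-1]
    # otherwise pick the slowest frequency that still covers the demand
    target = cpu_load * FREQUENCIES_MHZ[-1]
    for f in FREQUENCIES_MHZ:
        if f >= target:
            return f
    return FREQUENCIES_MHZ[-1]

print(pick_frequency(0.9))   # heavy load -> 2400
print(pick_frequency(0.1))   # near idle -> 300
```

The tension the thesis addresses is visible even here: any such heuristic must decide how aggressively to slow down without knowing whether the user will perceive the resulting delay.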
One important question when developing a DVFS technique is: How much can you
slow down a task to save energy before the negative effect on performance becomes
intolerable? The ultimate goal when optimising a mobile system is to provide a high
quality of experience (QoE) to the end user. In that context, task slowdowns become
intolerable when they have a perceptible effect on QoE. Experiments conducted in
this thesis answer this question by identifying workload periods in which performance
changes are directly perceptible by the end user and periods where they are imperceptible,
namely interaction lags and interaction idle periods. Interaction lags are the time
it takes the system to process a user interaction and display a corresponding response.
Idle periods are the periods between interactions where the user perceives the system
as idle and ready for the next input. By knowing where those periods are and how
they are affected by frequency changes, a more energy efficient DVFS governor can be
developed.
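The split into interaction lags and idle periods can be illustrated with a small sketch that partitions a timestamped UI event trace. The event names and trace format here are hypothetical, not the instrumentation used in the thesis.

```python
# Hypothetical sketch: partition a UI event trace into interaction lags
# (input -> response) and idle periods (response -> next input).

def split_periods(events):
    """events: list of (time_ms, kind) with kind in {'input', 'response'}.

    Returns (lags, idles), each a list of (start, end) intervals in ms.
    """
    lags, idles = [], []
    pending_input = None     # time of the input awaiting a response
    last_response = None     # time the system last became idle
    for t, kind in events:
        if kind == 'input':
            if last_response is not None:
                idles.append((last_response, t))
            pending_input = t
        elif kind == 'response' and pending_input is not None:
            lags.append((pending_input, t))
            last_response = t
            pending_input = None
    return lags, idles

trace = [(0, 'input'), (120, 'response'), (2000, 'input'), (2080, 'response')]
lags, idles = split_periods(trace)
print(lags)    # [(0, 120), (2000, 2080)]
print(idles)   # [(120, 2000)]
```

A governor can then treat the two interval types differently: slowdowns inside a lag risk a perceptible delay, while slowdowns inside an idle period do not.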
This thesis begins by introducing a methodology that measures the duration of interaction
lags as perceived by the user. It uses them as an indicator to benchmark the
quality of experience for a workload execution. A representative benchmark workload
is generated comprising 190 minutes of interactions collected from real users. In conjunction
with this QoE benchmark, a DVFS Oracle study is conducted. The study finds a
frequency profile for an interactive mobile workload that achieves the maximum
energy savings possible without a perceptible performance impact on the user. The
developed Oracle performance profile achieves a QoE which is indistinguishable from
always running on the fastest frequency while needing 45% less energy. Furthermore,
this Oracle is used as a baseline to evaluate how well current mobile frequency governors
are performing. The evaluation shows that none of these governors performs particularly
well and that up to 32% energy savings remain possible. Equipped with a benchmark and an optimisation
baseline, a user perception aware DVFS technique is developed in the second
part of this thesis. First, a runtime heuristic is introduced that detects
interaction lags as the user would perceive them. Using this heuristic, a
reinforcement-learning-driven governor is developed that learns good frequency settings
for interaction lag and idle periods based on sample observations. It consumes up to
22% less energy than current standard governors on mobile devices, and maintains a
low impact on QoE.
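The core of such a learning governor can be sketched as a tabular value table with one entry per (period type, frequency) pair, updated from observed rewards. Everything below is an illustrative toy: the frequencies, the reward model trading energy savings against perceptible delay, and the update schedule are all invented for the example, not taken from the thesis.

```python
# Toy sketch of a tabular reinforcement-learning governor: one value per
# (period type, frequency) pair, updated from sampled rewards. The reward
# model below is invented: idle periods reward low frequency (energy
# saved), while lag periods heavily penalise frequencies slow enough to
# cause a perceptible delay.

FREQS = [600, 1200, 2400]   # MHz, hypothetical
ALPHA = 0.5                  # learning rate

q = {(p, f): 0.0 for p in ('lag', 'idle') for f in FREQS}

def reward(period, freq):
    energy_saving = 1.0 - freq / max(FREQS)
    if period == 'idle':
        return energy_saving                      # idle: save freely
    return energy_saving - (2.0 if freq < 1200 else 0.0)  # lag: stay fast

def update(period, freq):
    # standard exponential-average value update toward the observed reward
    q[(period, freq)] += ALPHA * (reward(period, freq) - q[(period, freq)])

for _ in range(20):                               # sample observations
    for p in ('lag', 'idle'):
        for f in FREQS:
            update(p, f)

best = {p: max(FREQS, key=lambda f: q[(p, f)]) for p in ('lag', 'idle')}
print(best)   # learned frequency preference per period type
```

After a few observations the table converges to the intuitive policy: run idle periods at the lowest frequency and lags at the slowest frequency that avoids a perceptible delay.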
Large Language Models for Compiler Optimization
We explore the novel application of Large Language Models to code
optimization. We present a 7B-parameter transformer model trained from scratch
to optimize LLVM assembly for code size. The model takes as input unoptimized
assembly and outputs a list of compiler options to best optimize the program.
Crucially, during training, we ask the model to predict the instruction counts
before and after optimization, and the optimized code itself. These auxiliary
learning tasks significantly improve the optimization performance of the model
and improve the model's depth of understanding.
We evaluate on a large suite of test programs. Our approach achieves a 3.0%
improvement in reducing instruction counts over the compiler, outperforming two
state-of-the-art baselines that require thousands of compilations. Furthermore,
the model shows surprisingly strong code reasoning abilities, generating
compilable code 91% of the time and perfectly emulating the output of the
compiler 70% of the time.
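A central quantity in this evaluation is the instruction count of a module before and after optimisation. As a hedged illustration of that metric, here is a crude line-based counter for textual LLVM IR; a real evaluation would use the compiler's own statistics rather than this heuristic, and the sample IR is invented.

```python
# Crude, illustrative instruction counter for textual LLVM IR: count the
# non-label, non-comment lines inside function bodies. Real pipelines
# would query the compiler's statistics instead of parsing text.

def count_instructions(llvm_ir: str) -> int:
    count, in_function = 0, False
    for line in llvm_ir.splitlines():
        s = line.strip()
        if s.startswith('define'):
            in_function = True
        elif s == '}':
            in_function = False
        elif in_function and s and not s.endswith(':') and not s.startswith(';'):
            count += 1          # treat each remaining body line as one instruction
    return count

ir = """
define i32 @square(i32 %x) {
entry:
  %m = mul i32 %x, %x
  ret i32 %m
}
"""
print(count_instructions(ir))   # 2
```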
A Parallel Dynamic Binary Translator for Efficient Multi-Core Simulation
In recent years multi-core processors have seen broad adoption in application domains ranging from embedded systems through general-purpose computing to large-scale data centres. Simulation technology for multi-core systems, however, lags behind and does not provide the simulation speed required to effectively support design space exploration and parallel software development. While state-of-the-art instruction set simulators (ISS) for single-core machines reach or exceed the performance levels of speed-optimised silicon implementations of embedded processors, the same does not hold for multi-core simulators, where large performance penalties are to be paid. In this paper we develop a fast and scalable simulation methodology for multi-core platforms based on parallel and just-in-time (JIT) dynamic binary translation (DBT). Our approach can model large-scale multi-core configurations, does not rely on prior profiling, instrumentation, or compilation, and works for all binaries targeting a state-of-the-art embedded multi-core platform implementing the ARCompact instruction set architecture (ISA). We have evaluated our parallel simulation methodology against the industry-standard SPLASH-2 and EEMBC MultiBench benchmarks and demonstrate simulation speeds of up to 25,307 MIPS on a 32-core x86 host machine for as many as 2,048 target processors, whilst exhibiting minimal and near-constant overhead, including memory considerations.
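The core idea of JIT dynamic binary translation, translating each guest basic block once, caching the result, and reusing it on every later execution, can be sketched in miniature. The toy "guest ISA" below (add/jmp tuples) is invented purely to show the translate-cache-dispatch loop; a real DBT emits host machine code, not Python closures.

```python
# Pure-Python sketch of the translate/cache/dispatch loop at the heart of
# JIT dynamic binary translation. The guest "ISA" here is invented:
# ('add', reg, imm) and ('jmp', target) tuples stand in for real opcodes.

translation_cache = {}

def translate(pc, program):
    """Translate the guest basic block starting at pc into a callable."""
    ops = []
    while pc < len(program):
        op = program[pc]
        ops.append(op)
        pc += 1
        if op[0] == 'jmp':          # control flow terminates the block
            break
    def block(state):
        for op in ops:
            if op[0] == 'add':
                state[op[1]] += op[2]
            elif op[0] == 'jmp':
                return op[1]        # next guest pc
        return None                 # fell off the end: halt
    return block

def run(program, state):
    pc = 0
    while pc is not None:
        if pc not in translation_cache:      # translate on first visit only
            translation_cache[pc] = translate(pc, program)
        pc = translation_cache[pc](state)    # execute cached translation

prog = [('add', 'r0', 1), ('jmp', 2), ('add', 'r0', 10)]
regs = {'r0': 0}
run(prog, regs)
print(regs['r0'])   # 11
```

The paper's contribution lies in running many such translator/dispatch loops in parallel, one per simulated core, while keeping the shared translation machinery scalable.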
GNU Radio
GNU Radio is a free and open-source software development toolkit that provides signal processing blocks to implement software radios. It can be used with readily available, low-cost external RF hardware to create software-defined radios, or without hardware in a simulation-like environment. It is widely used in hobbyist, academic, and commercial environments to support both wireless communications research and real-world radio systems.
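GNU Radio models a radio as a flowgraph of connected signal processing blocks. The stdlib-only sketch below mimics that idea with plain Python callables to convey the concept; it is not the GNU Radio API itself, which lives in the `gnuradio` package and connects real block objects under a top-level flowgraph.

```python
# Conceptual sketch of a block flowgraph: source -> gain -> sink.
# This imitates GNU Radio's block-based model with plain functions;
# it does NOT use the actual gnuradio package.

import math

def source(n, freq_hz, rate_hz):
    """Signal source block: n samples of a sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

def multiply_const(samples, k):
    """Gain block: scale every sample by a constant."""
    return [k * s for s in samples]

def probe_max(samples):
    """Sink block: report the peak amplitude seen."""
    return max(abs(s) for s in samples)

# "Connect" the blocks, as a flowgraph would:
samples = source(n=1000, freq_hz=440, rate_hz=48000)
peak = probe_max(multiply_const(samples, 0.5))
print(round(peak, 2))   # ~0.5
```

In GNU Radio proper, the same three-stage chain would be built from stock blocks (a signal source, a multiply-const block, and a probe or sink) connected inside a top-level flowgraph, with the scheduler streaming samples between them.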