Bayesian inference for challenging scientific models
Advances in technology and computation have led to ever more complicated
scientific models of phenomena across a wide variety of fields. Many of these
models present challenges for Bayesian inference, as a result of computationally
intensive likelihoods, high-dimensional parameter spaces or large dataset sizes.
In this thesis we show how we can apply developments in probabilistic machine
learning and statistics to do inference with examples of these types of models.
As a demonstration of an applied inference problem involving a non-trivial
likelihood computation, we show how a combination of optimisation and
MCMC methods along with careful consideration of priors can be used to infer
the parameters of an ODE model of the cardiac action potential.
We then consider the problem of pileup, a phenomenon that occurs in
astronomy when using CCD detectors to observe bright sources. It complicates
the fitting of even simple spectral models by introducing an observation model
with a large number of continuous and discrete latent variables that scales with
the size of the dataset. We develop an MCMC-based method that can work in
the presence of pileup by explicitly marginalising out discrete variables and
using adaptive HMC on the remaining continuous variables. We show with
synthetic experiments that it allows us to fit spectral models in the presence
of pileup without biasing the results. We also compare it to neural Simulation-
Based Inference approaches, and find that they perform comparably to the
MCMC-based approach whilst being able to scale to larger datasets.
As an example of a problem where we wish to do inference with extremely
large datasets, we consider the Extreme Deconvolution method. The method
fits a probability density to a dataset where each observation has Gaussian
noise added with a known sample-specific covariance, originally intended
for use with astronomical datasets. The existing fitting method is batch EM,
which would not normally be applied to large datasets such as the Gaia catalog
containing noisy observations of a billion stars. In this thesis we propose two
minibatch variants of extreme deconvolution, based on an online variation of
the EM algorithm, and direct gradient-based optimisation of the log-likelihood,
both of which can run on GPUs. We demonstrate that these methods provide
faster fitting, whilst being able to scale to much larger models for use with
larger datasets.
We then extend the extreme deconvolution approach to work with non-
Gaussian noise, and to use more flexible density estimators such as normalizing
flows. Since both adjustments lead to an intractable likelihood, we resort to
amortized variational inference in order to fit them. We show that for some
datasets flows can outperform Gaussian mixtures for extreme deconvolution,
and that fitting with non-Gaussian noise is now possible.
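To make the extreme deconvolution setting above concrete: if the underlying density is a Gaussian mixture with weights alpha_j, means m_j and covariances V_j, and observation w_i carries Gaussian noise with known covariance S_i, then the density of w_i is sum_j alpha_j N(w_i | m_j, V_j + S_i). The sketch below evaluates the mean log-likelihood of one minibatch under that form. The function name, argument names and array shapes are assumptions made for illustration; this is not the thesis's implementation, and it uses NumPy rather than a GPU framework.

```python
import numpy as np

def xd_minibatch_log_likelihood(w, S, alpha, m, V):
    """Mean log-likelihood of a minibatch under a noise-convolved Gaussian mixture.

    w: (n, d) noisy observations      S: (n, d, d) known noise covariances
    alpha: (k,) mixture weights       m: (k, d) means      V: (k, d, d) covariances
    (All names and shapes are illustrative assumptions.)
    """
    d = w.shape[1]
    # Each component covariance is inflated by the sample's own noise covariance.
    T = V[None, :, :, :] + S[:, None, :, :]            # (n, k, d, d)
    diff = w[:, None, :] - m[None, :, :]               # (n, k, d)
    L = np.linalg.cholesky(T)                          # batched Cholesky factors
    # Solve L z = diff, so that z.z is the squared Mahalanobis distance.
    z = np.linalg.solve(L, diff[..., None])[..., 0]    # (n, k, d)
    maha = np.sum(z * z, axis=-1)                      # (n, k)
    log_det = 2.0 * np.sum(
        np.log(np.diagonal(L, axis1=-2, axis2=-1)), axis=-1
    )                                                  # (n, k)
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)
    log_joint = np.log(alpha)[None, :] + log_norm      # (n, k)
    # Numerically stable log-sum-exp over mixture components.
    mx = log_joint.max(axis=1, keepdims=True)
    log_p = mx[:, 0] + np.log(np.exp(log_joint - mx).sum(axis=1))
    return log_p.mean()
```

A gradient-based variant would maximise this quantity over alpha, m and V one minibatch at a time, typically through an automatic-differentiation library.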
Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems
This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Semi-online Scheduling with Lookahead
The knowledge of future partial information in the form of a lookahead to
design efficient online algorithms is a theoretically-efficient and realistic
approach to solving computational problems. Design and analysis of semi-online
algorithms with extra-piece-of-information (EPI) as a new input parameter has
gained the attention of the theoretical computer science community in the last
couple of decades. Though competitive analysis is a pessimistic worst-case
performance measure to analyze online algorithms, it has immense theoretical
value in developing the foundation and advancing the state-of-the-art
contributions in online and semi-online scheduling. In this paper, we study and
explore the impact of lookahead as an EPI in the context of online scheduling
in identical machine frameworks. We introduce a -lookahead model and design
improved competitive semi-online algorithms. For a -identical machine
setting, we prove a lower bound of and design an optimal
algorithm with a matching upper bound of on the competitive
ratio. For a -identical machine setting, we show a lower bound of
and design a -competitive improved semi-online
algorithm. Comment: 14 pages, 1 figure
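For context on the online baseline that semi-online algorithms aim to improve on, the sketch below implements classical greedy list scheduling on identical machines, which places each arriving job on the currently least-loaded machine and is known to be (2 - 1/m)-competitive for makespan on m machines. It is not the lookahead algorithm of this paper, whose model parameters are not recoverable from the abstract as listed here; the function name and arguments are illustrative assumptions.

```python
import heapq

def greedy_makespan(jobs, machines):
    """Schedule jobs in arrival order, each on the least-loaded machine.

    jobs: iterable of processing times; machines: number of identical machines.
    Returns the resulting makespan (the maximum machine load).
    """
    loads = [0.0] * machines
    heapq.heapify(loads)                      # min-heap of machine loads
    for p in jobs:
        least = heapq.heappop(loads)          # currently least-loaded machine
        heapq.heappush(loads, least + p)      # assign the job to it
    return max(loads)
```

With two machines, the arrival order 1, 1, 2 gives a makespan of 3 under this rule, whereas an optimal offline schedule achieves 2. Extra information about upcoming jobs is what lets a semi-online algorithm avoid committing the two short jobs to different machines.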
Regulating ChatGPT and other Large Generative AI Models
Large generative AI models (LGAIMs), such as ChatGPT or Stable Diffusion, are
rapidly transforming the way we communicate, illustrate, and create. However,
AI regulation, in the EU and beyond, has primarily focused on conventional AI
models, not LGAIMs. This paper will situate these new generative models in the
current debate on trustworthy AI regulation, and ask how the law can be
tailored to their capabilities. After laying technical foundations, the legal
part of the paper proceeds in four steps, covering (1) direct regulation, (2)
data protection, (3) content moderation, and (4) policy proposals. It suggests
a novel terminology to capture the AI value chain in LGAIM settings by
differentiating between LGAIM developers, deployers, professional and
non-professional users, as well as recipients of LGAIM output. We tailor
regulatory duties to these different actors along the value chain and suggest
four strategies to ensure that LGAIMs are trustworthy and deployed for the
benefit of society at large. Rules in the AI Act and other direct regulation
must match the specificities of pre-trained models. In particular, regulation
should focus on concrete high-risk applications, and not the pre-trained model
itself, and should include (i) obligations regarding transparency and (ii) risk
management. Non-discrimination provisions (iii) may, however, apply to LGAIM
developers. Lastly, (iv) the core of the DSA content moderation rules should be
expanded to cover LGAIMs. This includes notice and action mechanisms, and
trusted flaggers. In all areas, regulators and lawmakers need to act fast to
keep pace with the dynamics of ChatGPT et al. Comment: under review
Mapping the Focal Points of WordPress: A Software and Critical Code Analysis
Programming languages or code can be examined through numerous analytical lenses. This project is a critical analysis of WordPress, a prevalent web content management system, applying four modes of inquiry. The project draws on theoretical perspectives and areas of study in media, software, platforms, code, language, and power structures. The applied research is based on Critical Code Studies, an interdisciplinary field of study that holds the potential as a theoretical lens and methodological toolkit to understand computational code beyond its function. The project begins with a critical code analysis of WordPress, examining its origins and source code and mapping selected vulnerabilities. An examination of the influence of digital and computational thinking follows this. The work also explores the intersection of code patching and vulnerability management and how code shapes our sense of control, trust, and empathy, ultimately arguing that a rhetorical-cultural lens can be used to better understand code's controlling influence. Recurring themes throughout these analyses and observations are the connections to power and vulnerability in WordPress' code and how cultural, processual, rhetorical, and ethical implications can be expressed through its code, creating a particular worldview. Code's emergent properties help illustrate how human values and practices (e.g., empathy, aesthetics, language, and trust) become encoded in software design and how people perceive the software through its worldview. These connected analyses reveal cultural, processual, and vulnerability focal points and the influence these entanglements have concerning WordPress as code, software, and platform. WordPress is a complex sociotechnical platform worthy of further study, as is the interdisciplinary merging of theoretical perspectives and disciplines to critically examine code.
Ultimately, this project helps further enrich the field by introducing focal points in code, examining sociocultural phenomena within the code, and offering techniques to apply critical code methods.
R^3: On-device Real-Time Deep Reinforcement Learning for Autonomous Robotics
Autonomous robotic systems, like autonomous vehicles and robotic search and
rescue, require efficient on-device training for continuous adaptation of Deep
Reinforcement Learning (DRL) models in dynamic environments. This research is
fundamentally motivated by the need to understand and address the challenges of
on-device real-time DRL, which involves balancing timing and algorithm
performance under memory constraints, as exposed through our extensive
empirical studies. This intricate balance requires co-optimizing two pivotal
parameters of DRL training -- batch size and replay buffer size. Configuring
these parameters significantly affects timing and algorithm performance, while
both (unfortunately) require substantial memory allocation to achieve
near-optimal performance.
This paper presents R^3, a holistic solution for managing timing, memory, and
algorithm performance in on-device real-time DRL training. R^3 employs (i) a
deadline-driven feedback loop with dynamic batch sizing for optimizing timing,
(ii) efficient memory management to reduce memory footprint and allow larger
replay buffer sizes, and (iii) a runtime coordinator guided by heuristic
analysis and a runtime profiler for dynamically adjusting memory resource
reservations. These components collaboratively tackle the trade-offs in
on-device DRL training, improving timing and algorithm performance while
minimizing the risk of out-of-memory (OOM) errors.
We implemented and evaluated R^3 extensively across various DRL frameworks
and benchmarks on three hardware platforms commonly adopted by autonomous
robotic systems. Additionally, we integrate R^3 with a popular realistic
autonomous car simulator to demonstrate its real-world applicability.
Evaluation results show that R^3 achieves efficacy across diverse platforms,
ensuring consistent latency performance and timing predictability with minimal
overhead. Comment: Accepted by RTSS 202
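The deadline-driven feedback loop with dynamic batch sizing described above can be pictured with a very small controller. The sketch below is an assumption-laden illustration and not R^3's actual mechanism: it assumes that the latency of a training step grows roughly in proportion to batch size, and the function name, parameters and bounds are all hypothetical.

```python
def next_batch_size(batch_size, measured_latency, deadline,
                    min_size=1, max_size=1024):
    """Rescale the batch size so the next training step targets the deadline.

    Assumes (hypothetically) that step latency is roughly proportional to
    batch size. Latency and deadline are in the same time unit.
    """
    if measured_latency <= 0:
        return batch_size                      # no usable measurement yet
    scaled = int(batch_size * deadline / measured_latency)
    # Clamp to the allowed range so memory use and progress stay bounded.
    return max(min_size, min(max_size, scaled))
```

A step that overruns its deadline by a factor of two halves the next batch, and a step that finishes in half the deadline doubles it, up to the configured maximum.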
NCC: Natural Concurrency Control for Strictly Serializable Datastores by Avoiding the Timestamp-Inversion Pitfall
Strictly serializable datastores greatly simplify the development of correct
applications by providing strong consistency guarantees. However, existing
techniques pay unnecessary costs for naturally consistent transactions, which
arrive at servers in an order that is already strictly serializable. We find
these transactions are prevalent in datacenter workloads. We exploit this
natural arrival order by executing transaction requests with minimal costs
while optimistically assuming they are naturally consistent, and then leverage
a timestamp-based technique to efficiently verify if the execution is indeed
consistent. In the process of designing such a timestamp-based technique, we
identify a fundamental pitfall in relying on timestamps to provide strict
serializability, and name it the timestamp-inversion pitfall. We find
timestamp-inversion has affected several existing works.
We present Natural Concurrency Control (NCC), a new concurrency control
technique that guarantees strict serializability and ensures minimal costs --
i.e., one-round latency, lock-free, and non-blocking execution -- in the best
(and common) case by leveraging natural consistency. NCC is enabled by three
key components: non-blocking execution, decoupled response control, and
timestamp-based consistency check. NCC avoids timestamp-inversion with a new
technique: response timing control, and proposes two optimization techniques,
asynchrony-aware timestamps and smart retry, to reduce false aborts. Moreover,
NCC designs a specialized protocol for read-only transactions, which is the
first to achieve the optimal best-case performance while ensuring strict
serializability, without relying on synchronized clocks. Our evaluation shows
that NCC outperforms state-of-the-art solutions by an order of magnitude on
many workloads.