Exploring missing heritability in neurodevelopmental disorders: Learning from regulatory elements
In this thesis, I aimed to resolve part of the missing heritability in neurodevelopmental disorders using computational approaches. In addition to investigating a novel epilepsy syndrome and the regulation of the gene involved, I investigated and prioritized genomic sequences implicated in gene regulation during the developmental stages of the human brain. The goal was to create an atlas of high-confidence non-coding regulatory elements that future studies can screen for genetic variants in genetically unexplained individuals with neurodevelopmental disorders of suspected genetic origin.
Enhancing Cloud System Runtime to Address Complex Failures
As the reliance on cloud systems intensifies in our progressively digital world, understanding and reinforcing their reliability becomes more crucial than ever. Despite impressive advancements in augmenting the resilience of cloud systems, the growing incidence of complex failures now poses a substantial challenge to the availability of these systems. With cloud systems continuing to scale and increase in complexity, failures not only become more elusive to detect but can also lead to more catastrophic consequences. Such failures question the foundational premises of conventional fault-tolerance designs, necessitating the creation of novel system designs to counteract them.
This dissertation aims to enhance distributed systems’ capabilities to detect, localize, and react to complex failures at runtime. To this end, it makes contributions that address three emerging categories of failures in cloud systems. The first part investigates partial failures and introduces OmegaGen, a tool that generates tailored checkers for detecting and localizing such failures. The second part addresses silent semantic failures prevalent in cloud systems, presenting the findings of our study and introducing Oathkeeper, a tool that leverages past failures to infer rules that expose these silent issues. The third part explores solutions to slow failures via RESIN, a framework specifically designed to detect, diagnose, and mitigate memory leaks in cloud-scale infrastructures, developed in collaboration with Microsoft Azure. The dissertation concludes by offering insights into future directions for building reliable cloud systems.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Learning and Control of Dynamical Systems
Despite the remarkable success of machine learning in various domains in recent years, our understanding of its fundamental limitations remains incomplete. This knowledge gap poses a grand challenge when deploying machine learning methods in critical decision-making tasks, where incorrect decisions can have catastrophic consequences. To effectively utilize these learning-based methods in such contexts, it is crucial to explicitly characterize their performance. Over the years, significant research efforts have been dedicated to learning and control of dynamical systems whose underlying dynamics are unknown or only partially known a priori and must be inferred from collected data. However, many of these classical results focus on asymptotic guarantees, providing limited insight into the amount of data required to achieve the desired control performance while satisfying operational constraints such as safety and stability, especially in the presence of statistical noise.
In this thesis, we study the statistical complexity of learning and control of unknown dynamical systems. By utilizing recent advances in statistical learning theory, high-dimensional statistics, and control-theoretic tools, we aim to establish a fundamental understanding of the number of samples required to achieve the desired (i) accuracy in learning the unknown dynamics, (ii) performance in controlling the underlying system, and (iii) satisfaction of operational constraints such as safety and stability. We provide finite-sample guarantees for these objectives and propose efficient learning and control algorithms that achieve the desired performance at these statistical limits in various dynamical systems. Our investigation covers a broad range of dynamical systems, from fully observable linear dynamical systems to partially observable linear dynamical systems and, ultimately, nonlinear systems.
We deploy our learning and control algorithms in various adaptive control tasks on real-world control systems and demonstrate their strong empirical performance along with their learning, robustness, and stability guarantees. In particular, we implement one of our proposed methods, Fourier Adaptive Learning and Control (FALCON), on an experimental aerodynamic testbed under extreme turbulent flow dynamics in a wind tunnel. The results show that FALCON achieves state-of-the-art stabilization performance and consistently outperforms conventional and other learning-based methods by at least 37%, despite using one-eighth of the data. FALCON's superior performance arises from its physically and theoretically accurate modeling of the underlying nonlinear turbulent dynamics, which yields rigorous finite-sample learning and performance guarantees. These findings underscore the importance of characterizing the statistical complexity of learning and control of unknown dynamical systems.
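The abstract does not spell out FALCON or the thesis's estimators. As a generic illustration of what "learning the unknown dynamics" from a finite number of samples means, here is a minimal sketch that assumes a scalar linear system x_{t+1} = a x_t + w_t with Gaussian noise w_t, and estimates a by ordinary least squares from a single trajectory. The function names `simulate_scalar_lds` and `least_squares_estimate` are hypothetical, and this is not the method proposed in the thesis.

```python
import random


def simulate_scalar_lds(a: float, steps: int, noise_std: float = 1.0,
                        seed: int = 0) -> list[float]:
    """Roll out x_{t+1} = a * x_t + w_t with w_t ~ N(0, noise_std^2), starting at x_0 = 0."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(steps):
        xs.append(a * xs[-1] + rng.gauss(0.0, noise_std))
    return xs


def least_squares_estimate(xs: list[float]) -> float:
    """Ordinary least-squares estimate from one trajectory:
    a_hat = sum_t x_{t+1} * x_t / sum_t x_t^2."""
    num = sum(x_next * x for x, x_next in zip(xs[:-1], xs[1:]))
    den = sum(x * x for x in xs[:-1])
    if den == 0.0:
        raise ValueError("trajectory carries no excitation; cannot estimate a")
    return num / den


if __name__ == "__main__":
    true_a = 0.8  # assumed ground truth, used only to measure the estimation error
    for steps in (100, 1_000, 10_000):
        a_hat = least_squares_estimate(simulate_scalar_lds(true_a, steps))
        print(f"T={steps:>6}  a_hat={a_hat:.4f}  |error|={abs(a_hat - true_a):.4f}")
```

For a stable system (|a| < 1), the estimation error typically shrinks as the trajectory length T grows. Finite-sample analyses of this kind quantify how large T must be for a target accuracy.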
Multi-Level Data-Driven Battery Management: From Internal Sensing to Big Data Utilization
A battery management system (BMS) is essential for the safe and long-lasting use of lithium-ion batteries (LIBs). With the rapid development of new sensing techniques and artificial intelligence, and the availability of huge amounts of battery operational data, data-driven battery management has attracted ever-widening attention as a promising solution. This review article surveys the recent progress and future trends of data-driven battery management from a multi-level perspective. The widely explored data-driven methods that rely on routine measurements of current, voltage, and surface temperature are reviewed first. At a deeper, microscopic level, emerging management strategies that use multi-dimensional battery data enabled by new sensing techniques are then reviewed. Enabled by the fast growth of big data technologies and platforms, the efficient use of battery big data for enhanced battery management is further reviewed; this constitutes the upper, macroscopic level of the data-driven BMS framework. With this endeavor, we aim to motivate new insights into the development of next-generation data-driven battery management.
Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework
Generative Artificial Intelligence (AI) tools based on Large Language
Models (LLMs) use billions of parameters to extensively analyse large datasets
and extract critical private information such as context, specific details,
and identifying information. This has raised serious threats to user privacy
and made users reluctant to use such tools. This article proposes a conceptual
model called PrivChatGPT, a privacy-preserving model for LLMs that consists of
two main components: preserving user privacy, including private context, during
data curation/pre-processing, and a private training process for large-scale
data. To demonstrate its applicability, we show how a private mechanism could be
integrated into the existing model for training LLMs to protect user privacy;
specifically, we employed differential privacy and private training using
Reinforcement Learning (RL). We measure the privacy loss and evaluate the
measure of uncertainty or randomness once differential privacy is applied. The
model further recursively evaluates the level of privacy guarantees and the
measure of uncertainty of public databases and resources during each update,
when new information is added for training purposes. To critically evaluate
the use of differential privacy for private LLMs, we hypothetically compared it
with other mechanisms, e.g., Blockchain, private information retrieval (PIR),
and randomisation, on various performance measures such as model performance
and accuracy, computational complexity, and privacy vs. utility. We conclude
that differential privacy, randomisation, and obfuscation can impact the
utility and performance of trained models; conversely, the use of Tor,
Blockchain, and PIR may introduce additional computational complexity and high
training latency. We believe that the proposed model could be used as a
benchmark for proposing privacy-preserving LLMs for generative AI tools.
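The abstract mentions measuring the privacy loss and the uncertainty introduced once differential privacy is applied, but does not specify PrivChatGPT's mechanism. As a generic illustration only, here is a minimal sketch of the standard Laplace mechanism for a numeric query with sensitivity Δ. The noise scale is b = Δ/ε. The privacy loss ln(p(o | D) / p(o | D')) for neighbouring datasets is bounded by ε. The differential entropy of the added noise, 1 + ln(2b) nats, serves as one possible measure of randomness. All function names and the example counts are hypothetical.

```python
import math
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_value + Laplace(0, b) noise, sampled as the difference
    of two i.i.d. exponential draws with mean b."""
    b = laplace_scale(sensitivity, epsilon)
    return true_value + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def laplace_log_density(x: float, loc: float, b: float) -> float:
    """Log-density of Laplace(loc, b) evaluated at x."""
    return -math.log(2.0 * b) - abs(x - loc) / b


def privacy_loss(output: float, value_d: float, value_d_prime: float, b: float) -> float:
    """Privacy loss ln(p(output | D) / p(output | D')) of one released output."""
    return laplace_log_density(output, value_d, b) - laplace_log_density(output, value_d_prime, b)


def laplace_entropy(b: float) -> float:
    """Differential (Shannon) entropy of Laplace(0, b) noise in nats: 1 + ln(2b)."""
    return 1.0 + math.log(2.0 * b)


if __name__ == "__main__":
    rng = random.Random(42)
    count_d, count_d_prime = 120.0, 121.0  # hypothetical neighbouring query answers
    eps = 0.5
    b = laplace_scale(1.0, eps)  # a counting query has sensitivity 1
    released = laplace_mechanism(count_d, 1.0, eps, rng)
    loss = privacy_loss(released, count_d, count_d_prime, b)
    print(f"released={released:.2f}  loss={loss:.3f} (bound {eps})  "
          f"noise entropy={laplace_entropy(b):.3f} nats")
```

A smaller ε gives a larger b, which raises the noise entropy and lowers the accuracy of the released value. This is one concrete way to see the privacy vs. utility trade-off the abstract concludes with.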
- …