Active Coverage for PAC Reinforcement Learning
Collecting and leveraging data with good coverage properties plays a crucial
role in different aspects of reinforcement learning (RL), including reward-free
exploration and offline learning. However, the notion of "good coverage" really
depends on the application at hand, as data suitable for one context may not be
so for another. In this paper, we formalize the problem of active coverage in
episodic Markov decision processes (MDPs), where the goal is to interact with
the environment so as to fulfill given sampling requirements. This framework is
sufficiently flexible to specify any desired coverage property, making it
applicable to any problem that involves online exploration. Our main
contribution is an instance-dependent lower bound on the sample complexity of
active coverage and a simple game-theoretic algorithm, CovGame, that nearly
matches it. We then show that CovGame can be used as a building block to solve
different PAC RL tasks. In particular, we obtain a simple algorithm for PAC
reward-free exploration with an instance-dependent sample complexity that, in
certain MDPs which are "easy to explore", is lower than the minimax one. By
further coupling this exploration algorithm with a new technique to do implicit
eliminations in policy space, we obtain a computationally-efficient algorithm
for best-policy identification whose instance-dependent sample complexity
scales with gaps between policy values.
Comment: Accepted at COLT 202
An investigation of entorhinal spatial representations in self-localisation behaviours
Spatially modulated cells of the medial entorhinal cortex (MEC) and neighbouring cortices are thought to provide the neural substrate for self-localisation behaviours. These cells include grid cells of the MEC, which are thought to compute path integration operations to update self-location estimates. In order to read this grid code, downstream cells are thought to reconstruct a positional estimate as a simple rate-coded representation of space.
Here, I show the coding schemes of grid cells and putative readout cells recorded from mice performing a virtual reality (VR) linear location task which engaged them in both beaconing and path integration behaviours. I found that grid cells can exhibit two distinct coding schemes on the linear track: a position code, which reflects periodic grid fields anchored to salient features of the track, and a distance code, which reflects periodic grid fields without this anchoring. Grid cells were found to switch between these coding schemes within sessions. When grid cells were encoding position, mice performed better on trials that required path integration but not on trials that required beaconing. This result provides the first mechanistic evidence linking grid cell activity to path integration-dependent behaviour.
Putative readout cells were found in the form of ramp cells which fire proportionally as a function of location in defined regions of the linear track. This ramping activity was found to be primarily explained by track position rather than other kinematic variables like speed and acceleration. These representations were found to be maintained across both trial types and outcomes indicating they likely result from recall of the track structure.
Together, these results support the functional importance of grid and ramp cells for self-localisation behaviours. Future investigations will look into the coherence between these two neural populations, which may together form a complete neural system for coding and decoding self-location in the brain.
Modular lifelong machine learning
Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge.
Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to only reuse the subset of modules which are useful for the task at hand.
This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems.
First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures.
Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations.
Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods.
Overall, this thesis identifies a number of important LML properties, which have not all been attained in past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.
Towards A Practical High-Assurance Systems Programming Language
Writing correct and performant low-level systems code is a notoriously demanding job, even for experienced developers. To make matters worse, formally reasoning about its correctness properties introduces yet another level of complexity to the task. It requires considerable expertise in both systems programming and formal verification. If not assisted by appropriate tools that provide abstraction and automation, the development can be extremely costly due to the sheer complexity of the systems and the nuances in them.
Cogent is designed to alleviate the burden on developers when writing and verifying systems code. It is a high-level functional language with a certifying compiler, which automatically proves the correctness of the compiled code and also provides a purely functional abstraction of the low-level program to the developer. Equational reasoning techniques can then be used to prove functional correctness properties of the program on top of this abstract semantics, which is notably less laborious than directly verifying the C code.
To make Cogent a more approachable and effective tool for developing real-world systems, we further strengthen the framework by extending the core language and its ecosystem. Specifically, we enrich the language to allow users to control the memory representation of algebraic data types, while retaining the automatic proof with a data layout refinement calculus. We repurpose existing tools in a novel way and develop an intuitive foreign function interface, which provides users with a seamless experience when using Cogent in conjunction with native C. We augment the Cogent ecosystem with a property-based testing framework, which helps developers better understand the impact formal verification has on their programs and enables a progressive approach to producing high-assurance systems. Finally, we explore refinement type systems, which we plan to incorporate into Cogent for more expressiveness and better integration of systems programmers with the verification process.
Online Network Source Optimization with Graph-Kernel MAB
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn
online the optimal source placement in large scale networks, such that the
reward obtained from a priori unknown network processes is maximized. The
uncertainty calls for online learning, which suffers however from the curse of
dimensionality. To achieve sample efficiency, we describe the network processes
with an adaptive graph dictionary model, which typically leads to sparse
spectral representations. This enables a data-efficient learning framework,
whose learning rate scales with the dimension of the spectral representation
model instead of the one of the network. We then propose Grab-UCB, an online
sequential decision strategy that learns the parameters of the spectral
representation while optimizing the action strategy. We derive the performance
guarantees that depend on network parameters, which further influence the
learning curve of the sequential decision strategy. We introduce a
computationally simplified solving method, Grab-arm-Light, an algorithm that
walks along the edges of the polytope representing the objective function.
Simulation results show that the proposed online learning algorithm
outperforms baseline offline methods that typically separate the learning phase
from the testing one. The results confirm the theoretical findings, and further
highlight the gain of the proposed online learning strategy in terms of
cumulative regret, sample efficiency, and computational complexity.
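The key claim above, that the learning rate scales with the dimension of the spectral representation rather than the size of the network, can be illustrated with a generic linear-bandit sketch. This is not the Grab-UCB algorithm itself; the feature vectors, noise model, and parameters below are illustrative assumptions. Each arm (a candidate source placement) is described by a d-dimensional feature vector, and a LinUCB-style learner's confidence widths shrink with d, not with the number of arms:

```python
import numpy as np

def lin_ucb(features, true_theta, rounds=2000, alpha=1.0, noise=0.1, seed=0):
    """Minimal LinUCB sketch: the confidence width depends on the feature
    dimension d, not on the number of arms (the network size)."""
    rng = np.random.default_rng(seed)
    n_arms, d = features.shape
    A = np.eye(d)                      # regularised Gram matrix
    b = np.zeros(d)
    total_reward = 0.0
    for _ in range(rounds):
        theta_hat = np.linalg.solve(A, b)   # ridge estimate of theta
        A_inv = np.linalg.inv(A)
        # upper confidence bound for every arm: mean + exploration bonus
        ucb = features @ theta_hat + alpha * np.sqrt(
            np.einsum("ad,dc,ac->a", features, A_inv, features))
        a = int(np.argmax(ucb))
        x = features[a]
        r = x @ true_theta + noise * rng.standard_normal()
        A += np.outer(x, x)            # rank-one update of the Gram matrix
        b += r * x
        total_reward += r
    return total_reward / rounds, theta_hat
```

In this sketch the per-round cost and the regret bound depend on d through the d-by-d Gram matrix, which mirrors the abstract's point about learning in the (sparse) spectral domain instead of the full network.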
Secure Short-Packet Communications via UAV-Enabled Mobile Relaying: Joint Resource Optimization and 3D Trajectory Design
Short-packet communication (SPC) and unmanned aerial vehicles (UAVs) are
anticipated to play crucial roles in the development of 5G-and-beyond wireless
networks and the Internet of Things (IoT). In this paper, we propose a secure
SPC system, where a UAV serves as a mobile decode-and-forward (DF) relay,
periodically receiving and relaying small data packets from a remote IoT device
to its receiver in two hops with strict latency requirements, in the presence
of an eavesdropper. This system requires careful optimization of important
design parameters, such as the coding blocklengths of both hops, transmit
powers, and the UAV's trajectory. While the overall optimization problem is
nonconvex, we tackle it by applying a block successive convex approximation
(BSCA) approach to divide the original problem into three subproblems and solve
them separately. Then, an overall iterative algorithm is proposed to obtain the
final design with guaranteed convergence. Our proposed low-complexity algorithm
incorporates 3D trajectory design and resource management to optimize the
effective average secrecy throughput of the communication system over the
course of UAV-relay's mission. Simulation results demonstrate significant
performance improvements compared to various benchmark schemes and provide
useful design insights on the coding blocklengths and transmit powers along the
trajectory of the UAV.
TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning
Deep learning (DL) models for tabular data problems are receiving
increasingly more attention, while the algorithms based on gradient-boosted
decision trees (GBDT) remain a strong go-to solution. Following the recent
trends in other domains, such as natural language processing and computer
vision, several retrieval-augmented tabular DL models have been recently
proposed. For a given target object, a retrieval-based model retrieves other
relevant objects, such as the nearest neighbors, from the available (training)
data and uses their features or even labels to make a better prediction.
However, we show that the existing retrieval-based tabular DL solutions provide
only minor, if any, benefits over the properly tuned simple retrieval-free
baselines. Thus, it remains unclear whether the retrieval-based approach is a
worthy direction for tabular DL.
In this work, we give a strong positive answer to this question. We start by
incrementally augmenting a simple feed-forward architecture with an
attention-like retrieval component similar to those of many (tabular)
retrieval-based models. Then, we highlight several details of the attention
mechanism that turn out to have a massive impact on the performance on tabular
data problems, but that were not explored in prior work. As a result, we design
TabR -- a simple retrieval-based tabular DL model which, on a set of public
benchmarks, demonstrates the best average performance among tabular DL models,
becomes the new state-of-the-art on several datasets, and even outperforms GBDT
models on the recently proposed ``GBDT-friendly'' benchmark (see the first
figure).
Comment: Code: https://github.com/yandex-research/tabular-dl-tab
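The retrieval component described above can be sketched generically. This is a hedged illustration, not TabR's actual architecture (the paper highlights attention-mechanism details that this sketch deliberately omits): for a target row, retrieve the nearest training rows and combine their labels with softmax weights over similarity.

```python
import numpy as np

def retrieval_predict(x_query, X_train, y_train, k=5, temperature=1.0):
    """Attention-like retrieval sketch: predict for x_query from the labels
    of its k nearest training rows, weighted by a softmax over negative
    squared distances -- the flavour of component that retrieval-augmented
    tabular models build on."""
    d2 = ((X_train - x_query) ** 2).sum(axis=1)   # squared distances
    idx = np.argsort(d2)[:k]                      # k nearest neighbours
    logits = -d2[idx] / temperature
    w = np.exp(logits - logits.max())
    w /= w.sum()                                  # attention weights
    return float(w @ y_train[idx])
```

A retrieval-free baseline would predict from x_query's own features alone; the abstract's point is that whether this extra component pays off hinges on such design details, properly tuned.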
Technology for Low Resolution Space Based RSO Detection and Characterisation
Space Situational Awareness (SSA) refers to all activities to detect, identify, and track objects in Earth orbit. SSA is critical to all current and future space activities and protects space assets by providing access control, conjunction warnings, and monitoring of the status of active satellites. Current SSA methods and infrastructure are not sufficient to account for the proliferation of space debris. In response to the need for better SSA, many different areas of research have sought to improve it, most of them requiring dedicated ground- or space-based infrastructure. In this thesis, a novel approach for the characterisation of RSOs (Resident Space Objects) from passive low-resolution space-based sensors is presented, together with all the background work performed to enable this novel method. Low-resolution space-based sensors are common on current satellites; since many of these sensors are already in space, using them passively to detect RSOs can greatly augment SSA without expensive infrastructure or long lead times. One of the largest hurdles for research in this area is the lack of publicly available labelled data with which to test and confirm results. To overcome this hurdle, a simulation software package, ORBITALS, was created. To verify and validate the ORBITALS simulator, it was compared with images from the Fast Auroral Imager, one of the only publicly available sources of low-resolution space-based images with auxiliary data. During the development of the ORBITALS simulator, it was found that the generation of these simulated images is computationally intensive when propagating the entire space catalog. To overcome this, the currently used propagation method, Simplified General Perturbations 4 (SGP4), was upgraded to run in parallel, reducing the computational time required to propagate entire catalogs of RSOs.
From the results it was found that the standard facet model with particle swarm optimisation performed best, estimating an RSO's attitude with 0.66 degrees RMSE accuracy across a sequence and ~1% MAPE accuracy for the optical properties. This accomplished the thesis's goal of demonstrating the feasibility of low-resolution passive RSO characterisation from space-based platforms in a simulated environment.
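The particle swarm optimisation used for the attitude fit above can be illustrated with a minimal, generic PSO loop. This is an illustrative sketch, not the thesis code, shown here minimising a simple test function rather than an attitude/optical-property objective:

```python
import numpy as np

def pso_minimise(f, dim, n_particles=30, iters=200, seed=0,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal particle swarm optimisation sketch: each particle tracks its
    personal best and is pulled toward the swarm's global best."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # positions
    v = np.zeros_like(x)                               # velocities
    pbest = x.copy()
    pbest_val = np.apply_along_axis(f, 1, x)
    g = pbest[np.argmin(pbest_val)].copy()             # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia + cognitive (personal-best) + social (global-best) pulls
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(pbest_val.min())
```

In an attitude-estimation setting, f would score a candidate attitude sequence against the observed low-resolution light curve; here it is a stand-in objective.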
A Low-Delay MAC for IoT Applications: Decentralized Optimal Scheduling of Queues without Explicit State Information Sharing
We consider a system of several collocated nodes sharing a time slotted
wireless channel, and seek a MAC (medium access control) that (i) provides low
mean delay, (ii) has distributed control (i.e., there is no central scheduler),
and (iii) does not require explicit exchange of state information or control
signals. The design of such MAC protocols must keep in mind the need for
contention access at light traffic, and scheduled access in heavy traffic,
leading to the long-standing interest in hybrid, adaptive MACs.
Working in the discrete time setting, for the distributed MAC design, we
consider a practical information structure where each node has local
information and some common information obtained from overhearing. In this
setting, "ZMAC" is an existing protocol that is hybrid and adaptive. We
approach the problem via two steps: (1) We show that it is sufficient for the
policy to be "greedy" and "exhaustive". Limiting the policy to this class
reduces the problem to obtaining a queue switching policy at queue emptiness
instants. (2) Formulating the delay optimal scheduling as a POMDP (partially
observed Markov decision process), we show that the optimal switching rule is
Stochastic Largest Queue (SLQ).
Using this theory as the basis, we then develop a practical distributed
scheduler, QZMAC, which is also tunable. We implement QZMAC on standard
off-the-shelf TelosB motes and also use simulations to compare QZMAC with the
full-knowledge centralized scheduler, and with ZMAC. We use our implementation
to study the impact of false detection while overhearing the common
information, and the efficiency of QZMAC. Our simulation results show that the
mean delay with QZMAC is close to that of the full-knowledge centralized
scheduler.
Comment: 28 pages, 19 figures
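The greedy, exhaustive structure from step (1) can be sketched in a toy discrete-time simulation. This is a simplification for illustration only: it switches to the longest queue at emptiness instants, whereas the paper's Stochastic Largest Queue (SLQ) rule operates under partial observability (a POMDP), which this sketch ignores by assuming queue lengths are known.

```python
import random
from collections import deque

def simulate_largest_queue(arrival_probs, slots=10000, seed=0):
    """Toy slotted-time scheduler: serve the current queue exhaustively
    (one packet per slot); at an emptiness instant, switch to the longest
    queue. Returns the mean delay over departed packets."""
    rng = random.Random(seed)
    n = len(arrival_probs)
    queues = [deque() for _ in range(n)]   # each entry: arrival slot
    current, total_delay, departures = 0, 0, 0
    for t in range(slots):
        for i, p in enumerate(arrival_probs):
            if rng.random() < p:           # Bernoulli arrival at queue i
                queues[i].append(t)
        if not queues[current]:            # emptiness instant: switch rule
            current = max(range(n), key=lambda i: len(queues[i]))
        if queues[current]:                # exhaustive service, 1 pkt/slot
            total_delay += t - queues[current].popleft()
            departures += 1
    return total_delay / max(departures, 1)
```

Because the policy is greedy and exhaustive, the only decision left is which queue to serve next when the current one empties, which is exactly the reduction the abstract describes.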
Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives
Deep learning has demonstrated remarkable performance across various tasks in
medical imaging. However, these approaches primarily focus on supervised
learning, assuming that the training and testing data are drawn from the same
distribution. Unfortunately, this assumption may not always hold true in
practice. To address these issues, unsupervised domain adaptation (UDA)
techniques have been developed to transfer knowledge from a labeled domain to a
related but unlabeled domain. In recent years, significant advancements have
been made in UDA, resulting in a wide range of methodologies, including feature
alignment, image translation, self-supervision, and disentangled representation
methods, among others. In this paper, we provide a comprehensive literature
review of recent deep UDA approaches in medical imaging from a technical
perspective. Specifically, we categorize current UDA research in medical
imaging into six groups and further divide them into finer subcategories based
on the different tasks they perform. We also discuss the respective datasets
used in the studies to assess the divergence between the different domains.
Finally, we discuss emerging areas and provide insights and discussions on
future research directions to conclude this survey.
Comment: Under Review
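Of the method families listed, feature alignment is perhaps the easiest to make concrete. A common alignment statistic in UDA is the squared maximum mean discrepancy (MMD) between source and target features; the biased RBF-kernel estimate below is an illustrative example, not drawn from any specific surveyed method:

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared MMD with an RBF kernel. Training a
    feature extractor to drive this toward zero pushes source (X) and
    target (Y) feature distributions to match."""
    def k(A, B):
        # pairwise RBF kernel: exp(-gamma * ||a - b||^2)
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()
```

In a UDA pipeline this statistic (or a similar divergence) is added to the supervised loss on the labeled domain, so the network learns features that are both discriminative and domain-invariant.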