Modular lifelong machine learning
Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a long sequence of problems remains a challenge.
Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to only reuse the subset of modules which are useful for the task at hand.
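To make the reuse pattern concrete, here is a minimal sketch (PyTorch; the module names and library contents are hypothetical, not taken from the thesis): a network assembled from a frozen, previously trained feature module and a fresh problem-specific head, so only the new module's parameters receive gradients.

```python
import torch.nn as nn

class ModularNet(nn.Module):
    """A network assembled from reusable sub-networks (modules)."""
    def __init__(self, features: nn.Module, head: nn.Module):
        super().__init__()
        self.features = features   # possibly reused from an earlier problem
        self.head = head           # new, problem-specific module

    def forward(self, x):
        return self.head(self.features(x))

# Library of modules accumulated over earlier problems (contents hypothetical).
library = {"conv_features": nn.Sequential(nn.Conv2d(1, 16, 3), nn.ReLU(), nn.Flatten())}

# Reuse a stored module, freeze it, and train only the new head.
features = library["conv_features"]
for p in features.parameters():
    p.requires_grad = False        # frozen: the old problem is not forgotten
model = ModularNet(features, nn.LazyLinear(10))
```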
This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. We show that a modular approach can be used to achieve more of the desired LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale, in order to retain said properties on longer sequences of problems.
First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures.
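As an illustration of the synthesis step, the toy sketch below (hypothetical names; not HOUDINI's actual interface) enumerates module compositions whose type signatures line up, which is the kind of search space a program synthesiser explores before training and evaluating candidates.

```python
from itertools import product

# Hypothetical library: name -> (input_type, output_type, is_pretrained)
modules = {
    "conv_mnist":  ("image", "features", True),
    "conv_speech": ("audio", "features", True),
    "classify10":  ("features", "logits10", False),
}

def type_correct_pipelines(in_type, out_type, depth=2):
    """Enumerate module sequences whose type signatures compose."""
    names = list(modules)
    for combo in product(names, repeat=depth):
        t = in_type
        ok = True
        for name in combo:
            i, o, _ = modules[name]
            if i != t:
                ok = False
                break
            t = o
        if ok and t == out_type:
            yield combo

print(list(type_correct_pipelines("image", "logits10")))
# -> [('conv_mnist', 'classify10')]; each candidate is then trained/evaluated
```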
Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations.
Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improved anytime performance in the HPO setting and discuss how it can in turn be used to augment modular LML methods.
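A minimal sketch of the combination being described, under assumed details (toy objective, quadratic surrogate, one promotion round; none of this is the thesis's method): low-fidelity evaluations feed a model, and the model decides which configurations earn the full budget.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x, budget):
    # Toy objective: cheap evaluations (small budget) are noisier.
    return -(x - 0.3) ** 2 + rng.normal(0, 0.1 / budget)

configs = rng.uniform(0, 1, size=32)
low = np.array([objective(x, budget=1) for x in configs])

# Surrogate model: quadratic least-squares fit to the low-fidelity scores.
coef = np.polyfit(configs, low, deg=2)
pred = np.polyval(coef, configs)

# Promote the top quarter (by predicted score) to the expensive fidelity.
survivors = configs[np.argsort(pred)[-8:]]
high = np.array([objective(x, budget=10) for x in survivors])
best = survivors[np.argmax(high)]
print(f"best config ~ {best:.2f}")
```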
Overall, this thesis identifies a number of important LML properties, which have not all been attained by past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.
Wave-function parametrization of a probability measure
We show that the unitary operator on a separable Hilbert space is a
parametrization of any conditional probability measure in a standard measure
space. We propose unitary inference, a generalization of Bayesian inference. We
study implications for classical statistical mechanics.
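For intuition, a standard way to connect unit vectors and probability measures is the Born-rule correspondence, sketched below; this is my reading of the setting, not necessarily the paper's exact construction.

```latex
% Born-rule correspondence (an assumed reading, not the paper's construction).
% A probability density $p = \mathrm{d}\mu/\mathrm{d}\nu$ on a standard
% measure space $(X, \nu)$ yields a unit vector
\[
  \psi = \sqrt{p} \in L^2(X, \nu), \qquad
  \lVert \psi \rVert^2 = \int_X p \,\mathrm{d}\nu = 1.
\]
% Conversely, a unitary $U$ applied to a fixed reference state $\psi_0$
% yields a unit vector and hence a probability measure
\[
  \mu_U(A) = \int_A \lvert (U\psi_0)(x) \rvert^2 \,\mathrm{d}\nu(x),
\]
% so unitaries parametrize such measures.
```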
Analysing Equilibrium States for Population Diversity
Population diversity is crucial in evolutionary algorithms as it helps with
global exploration and facilitates the use of crossover. Despite many runtime
analyses showing advantages of population diversity, we have no clear picture
of how diversity evolves over time. We study how population diversity of
$(\mu+1)$ algorithms, measured by the sum of pairwise Hamming distances,
evolves in a fitness-neutral environment. We give an exact formula for the
drift of population diversity and show that it is driven towards an equilibrium
state. Moreover, we bound the expected time for getting close to the
equilibrium state. We find that these dynamics, including the location of the
equilibrium, are unaffected by surprisingly many algorithmic choices. All
unbiased mutation operators with the same expected number of bit flips have the
same effect on the expected diversity. Many crossover operators have no effect
at all, including all binary unbiased, respectful operators. We review
crossover operators from the literature and identify crossovers that are
neutral towards the evolution of diversity and crossovers that are not.
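For concreteness, the diversity measure in question can be computed per position, since each position contributes (number of ones) times (number of zeros) disagreeing pairs; a short sketch with a toy population (not taken from the paper):

```python
import numpy as np

def population_diversity(pop: np.ndarray) -> int:
    """Sum of Hamming distances over all unordered pairs of rows."""
    n, _ = pop.shape
    ones = pop.sum(axis=0)              # per-position count of 1-bits
    # Each position contributes (#ones) * (#zeros) disagreeing pairs.
    return int((ones * (n - ones)).sum())

pop = np.array([[0, 1, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1]])
print(population_diversity(pop))        # 8 = 2 + 2 + 4 over the three pairs
```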
Attractor identification in asynchronous Boolean dynamics with network reduction
Identification of attractors, that is, stable states and sustained
oscillations, is an important step in the analysis of Boolean models and
exploration of potential variants. We describe an approach to the search for
asynchronous cyclic attractors of Boolean networks that exploits, in a novel
way, the established technique of elimination of components. Computation of
attractors of simplified networks allows the identification of a limited number
of candidate attractor states, which are then screened with techniques of
reachability analysis combined with trap space computation. An implementation
that brings together recently developed Boolean network analysis tools, tested
on biological models and random benchmark networks, shows the potential to
significantly reduce running times.
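To fix terminology, here is a sketch of asynchronous Boolean dynamics on a toy three-component network (illustrative only; the paper's tools operate on biological models): an asynchronous step updates a single component, and attractors are the terminal strongly connected sets of the resulting state graph.

```python
from itertools import product

# Hypothetical 3-component network: one update function per component.
rules = [
    lambda s: s[1],        # x0' = x1
    lambda s: s[0],        # x1' = x0
    lambda s: not s[2],    # x2' = NOT x2 (sustains an oscillation)
]

def async_successors(state):
    """States reachable by updating one component whose value changes."""
    succ = []
    for i, f in enumerate(rules):
        v = f(state)
        if v != state[i]:
            succ.append(state[:i] + (v,) + state[i + 1:])
    return succ

for s in product([False, True], repeat=3):
    print(s, "->", async_successors(s))
```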
On questions of uniqueness for the vacant set of Wiener sausages and Brownian interlacements
We consider connectivity properties of the vacant set of (random) ensembles
of Wiener sausages in $\mathbb{R}^d$ in the transient dimensions $d \geq 3$. We
prove that the vacant set of Brownian interlacements contains at most one
infinite connected component almost surely. For finite ensembles of Wiener
sausages, we provide sharp polynomial bounds on the probability that their
vacant set contains at least two connected components in microscopic balls. The
main proof ingredient is a sharp polynomial bound on the probability that
several Brownian motions visit jointly all hemiballs of the unit ball while
avoiding a slightly smaller ball.
Modularizing and Assembling Cognitive Map Learners via Hyperdimensional Computing
Biological organisms must learn how to control their own bodies to achieve
deliberate locomotion, that is, predict their next body position based on their
current position and selected action. Such learning is goal-agnostic with
respect to maximizing (minimizing) an environmental reward (penalty) signal. A
cognitive map learner (CML) is a collection of three separate yet
collaboratively trained artificial neural networks which learn to construct
representations for the node states and edge actions of an arbitrary
bidirectional graph. In so doing, a CML learns how to traverse the graph nodes;
however, the CML does not learn when and why to move from one node state to
another. This work created CMLs with node states expressed as high dimensional
vectors suitable for hyperdimensional computing (HDC), a form of symbolic
machine learning (ML). In so doing, graph knowledge (CML) was segregated from
target node selection (HDC), allowing each ML approach to be trained
independently. The first approach used HDC to engineer an arbitrary number of
hierarchical CMLs, where each graph node state specified target node states for
the next lower level CMLs to traverse to. Second, an HDC-based
stimulus-response experience model was demonstrated per CML. Because
hypervectors may be in superposition with each other, multiple experience
models were added together and run in parallel without any retraining. Lastly,
a CML-HDC ML unit was modularized: trained with proxy symbols such that
arbitrary, application-specific stimulus symbols could be operated upon without
retraining either CML or HDC model. These methods provide a template for
engineering heterogeneous ML systems.
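A minimal sketch of the HDC ingredients relied on above (dimensionality and symbols are illustrative): random bipolar hypervectors are nearly orthogonal, so several associations can be superposed by addition and later recovered by a nearest-neighbour cleanup.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                   # hypervector dimensionality

symbols = {name: rng.choice([-1, 1], size=D) for name in ["A", "B", "C"]}

# Superpose two stored symbols; random hypervectors are nearly orthogonal,
# so both remain recoverable from the sum.
memory = symbols["A"] + symbols["B"]

def cleanup(query):
    """Return the stored symbol most similar (by dot product) to the query."""
    return max(symbols, key=lambda n: symbols[n] @ query)

print(cleanup(memory))   # "A" or "B" (both score high), never "C"
```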
Multiscale structural optimisation with concurrent coupling between scales
A robust three-dimensional multiscale topology optimisation framework with concurrent coupling between scales is presented. Concurrent coupling ensures that only the microscale data required to evaluate the macroscale model during each iteration of optimisation is collected, which results in considerable computational savings. This represents the principal novelty of the framework and permits a previously intractable number of design variables to be used in the parametrisation of the microscale geometry, which in turn enables accessibility to a greater range of mechanical point properties during optimisation. Additionally, the microscale data collected during optimisation is stored in a re-usable database, further reducing the computational expense of subsequent iterations or entirely new optimisation problems. Application of this methodology enables structures with precise functionally-graded mechanical properties over two scales to be derived, which satisfy one or multiple functional objectives. For all applications of the framework presented within this thesis, only a small fraction of the microstructure database is required to derive the optimised multiscale solutions, which demonstrates a significant reduction in the computational expense of optimisation in comparison to contemporary sequential frameworks.
The derivation and integration of novel additive manufacturing constraints for open-walled microstructures within the concurrently coupled multiscale topology optimisation framework is also presented. Problematic fabrication features are discouraged through the application of an augmented projection filter and two relaxed binary integral constraints, which prohibit the formation of unsupported members, isolated assemblies of overhanging members and slender members during optimisation. Through the application of these constraints, it is possible to derive self-supporting, hierarchical structures with varying topology, suitable for fabrication through additive manufacturing processes.
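The re-usable database amounts to evaluate-on-demand with memoisation; a sketch under assumed interfaces (the solver stub is hypothetical):

```python
from functools import lru_cache

def run_microscale_analysis(design_vars):
    # Stand-in for the expensive microscale homogenisation solve.
    return sum(design_vars)  # placeholder "property"

@lru_cache(maxsize=None)
def homogenised_properties(design_vars: tuple):
    """Microscale data is computed only when the macroscale model requests
    it, then stored so later iterations or new problems reuse it."""
    return run_microscale_analysis(design_vars)

homogenised_properties((0.2, 0.5))   # computed on first request
homogenised_properties((0.2, 0.5))   # served from the database
```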
Domain Sparsification of Discrete Distributions Using Entropic Independence
We present a framework for speeding up the time it takes to sample from discrete distributions $\mu$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime where $k$ is much smaller than $n$. We show that if one has access to estimates of marginals $\mathbb{P}_{S \sim \mu}[i \in S]$, then the task of sampling from $\mu$ can be reduced to sampling from related distributions $\nu$ supported on size-$k$ subsets of a ground set of only $n^{1-\alpha} \cdot \mathrm{poly}(k)$ elements. Here, $1/\alpha \in [1, k]$ is the parameter of entropic independence for $\mu$. Further, our algorithm only requires sparsified distributions $\nu$ that are obtained by applying a sparse (mostly 0) external field to $\mu$, an operation that, for many distributions $\mu$ of interest, retains algorithmic tractability of sampling from $\nu$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $\mu$, and in return reduce the amortized cost needed to produce many samples from the distribution $\mu$, as is often needed in upstream tasks such as counting and inference.
For a wide range of distributions where $\alpha = \Omega(1)$, our result reduces the domain size, and as a corollary, the cost-per-sample, by a $\mathrm{poly}(n)$ factor. Examples include monomers in a monomer-dimer system, non-symmetric determinantal point processes, and partition-constrained Strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński, who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $\alpha = 1$). As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates even for the case of log-concave polynomials; roughly speaking, we show that constant-factor approximation is enough for domain sparsification, improving over the $O(1/k)$ relative error established in prior work.
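A toy illustration of the flavour of domain sparsification (not the paper's algorithm, and with crude Monte Carlo marginals standing in for the estimates it assumes): marginal estimates select a small subdomain, and subsequent samples are drawn within it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
weights = rng.random(n) ** 4          # unnormalised element weights

def sample_subset(domain):
    """Weighted sampling without replacement over `domain`
    (a stand-in for the target size-k subset distribution)."""
    w = weights[domain] / weights[domain].sum()
    return rng.choice(domain, size=k, replace=False, p=w)

# One-time cost: crude Monte Carlo estimates of the marginals P[i in S].
counts = np.zeros(n)
for _ in range(2000):
    counts[sample_subset(np.arange(n))] += 1
marginals = counts / 2000

# Sparsify: keep only elements with non-negligible estimated marginals.
subdomain = np.flatnonzero(marginals > 0.5 * k / n)
print(len(subdomain), "of", n, "elements retained")
samples = [sample_subset(subdomain) for _ in range(10)]   # cheaper per sample
```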
Foundations for programming and implementing effect handlers
First-class control operators provide programmers with an expressive and efficient
means for manipulating control through reification of the current control state as a first-class object, enabling programmers to implement their own computational effects and
control idioms as shareable libraries. Effect handlers provide a particularly structured
approach to programming with first-class control by naming control-reifying operations
and separating them from their handling.
This thesis is composed of three strands of work in which I develop operational
foundations for programming and implementing effect handlers as well as exploring
the expressive power of effect handlers.
The first strand develops a fine-grain call-by-value core calculus of a statically
typed programming language with a structural notion of effect types, as opposed to the
nominal notion of effect types that dominates the literature. With the structural approach,
effects need not be declared before use. The usual safety properties of statically typed
programming are retained by making crucial use of row polymorphism to build and
track effect signatures. The calculus features three forms of handlers: deep, shallow,
and parameterised. They each offer a different approach to manipulate the control state
of programs. Traditional deep handlers are defined by folds over computation trees,
and are the original construct proposed by Plotkin and Pretnar. Shallow handlers are
defined by case splits (rather than folds) over computation trees. Parameterised handlers
are deep handlers extended with a state value that is threaded through the folds over
computation trees. To demonstrate the usefulness of effects and handlers as a practical
programming abstraction, I implement the essence of a small UNIX-style operating
system, complete with multi-user environment, time-sharing, and file I/O.
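The deep/shallow distinction can be rendered in a few lines of Python (illustrative only; the thesis works in a typed core calculus, not Python): a computation is a tree of operations, a deep handler folds over the whole tree by re-applying itself to resumptions, and a shallow handler case-splits on a single operation.

```python
class Return:
    def __init__(self, value): self.value = value

class Op:
    def __init__(self, name, arg, cont):
        self.name, self.arg, self.cont = name, arg, cont

def deep_handle(comp, clauses):
    """Deep handler: a fold; the handler re-applies itself to the resumption."""
    if isinstance(comp, Return):
        return comp.value
    resume = lambda v: deep_handle(comp.cont(v), clauses)
    return clauses[comp.name](comp.arg, resume)

def shallow_handle(comp, clauses):
    """Shallow handler: a case split; the resumption is left unhandled."""
    if isinstance(comp, Return):
        return comp.value
    return clauses[comp.name](comp.arg, comp.cont)

# Example: handle a "get" effect by supplying 42, deeply.
prog = Op("get", None, lambda x: Op("get", None, lambda y: Return(x + y)))
print(deep_handle(prog, {"get": lambda _arg, k: k(42)}))   # 84
```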
The second strand studies continuation passing style (CPS) and abstract machine
semantics, which are foundational techniques that admit a unified basis for implementing deep, shallow, and parameterised effect handlers in the same environment. The
CPS translation is obtained through a series of refinements of a basic first-order CPS
translation for a fine-grain call-by-value language into an untyped language. Each refinement moves toward a more intensional representation of continuations, eventually
arriving at the notion of generalised continuations, which admit simultaneous support for
deep, shallow, and parameterised handlers. The initial refinement adds support for deep
handlers by representing stacks of continuations and handlers as a curried sequence of
arguments. The image of the resulting translation is not properly tail-recursive, meaning some function application terms do not appear in tail position. To rectify this, the
CPS translation is refined once more to obtain an uncurried representation of stacks
of continuations and handlers. Finally, the translation is made higher-order in order to
contract administrative redexes at translation time. The generalised continuation representation is used to construct an abstract machine that provides simultaneous support for
deep, shallow, and parameterised effect handlers.
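For readers unfamiliar with CPS, the starting point of that refinement chain looks like the following (a generic illustration, not the thesis's translation): control is reified as an explicit continuation argument k, which generalised continuations later extend to a stack of continuation/handler frames.

```python
def fact_direct(n):
    return 1 if n == 0 else n * fact_direct(n - 1)

def fact_cps(n, k):
    # `k` is the reified control state: "what to do with the result".
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda r: k(n * r))

print(fact_direct(5), fact_cps(5, lambda r: r))   # 120 120
```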
The third strand explores the expressiveness of effect handlers. First, I show that
deep, shallow, and parameterised notions of handlers are interdefinable by way of typed
macro-expressiveness, which provides a syntactic notion of expressiveness that affirms
the existence of encodings between handlers, but it provides no information about the
computational content of the encodings. Second, using the semantic notion of expressiveness, I show that, for a class of programs, a programming language with first-class control (e.g. effect handlers) admits asymptotically faster implementations than are possible in a language without first-class control.
Digital asset management via distributed ledgers
Distributed ledgers rose to prominence with the advent of Bitcoin, the first provably secure protocol to solve consensus in an open-participation setting. Since then, active research and engineering efforts have proposed a multitude of applications and alternative designs, the most prominent being Proof-of-Stake (PoS). This thesis expands the scope of secure and efficient asset management over a distributed ledger around three axes: i) cryptography; ii) distributed systems; iii) game theory and economics. First, we analyze the security of various wallets. We start with a formal model of hardware wallets, followed by an analytical framework of PoS wallets, each outlining the unique properties of Proof-of-Work (PoW) and PoS respectively. The latter also provides a rigorous design to form collaborative participating entities, called stake pools. We then propose Conclave, a stake pool design which enables a group of parties to participate in a PoS system in a collaborative manner, without a central operator. Second, we focus on efficiency. Decentralized systems are aimed at thousands of users across the globe, so a rigorous design for minimizing memory and storage consumption is a prerequisite for scalability. To that end, we frame ledger maintenance as an optimization problem and design a multi-tier framework for designing wallets which ensure that updates increase the ledger’s global state only to a minimal extent, while preserving the security guarantees outlined in the security analysis. Third, we explore incentive-compatibility and analyze blockchain systems from a micro- and a macroeconomic perspective. We enrich our cryptographic and systems results by analyzing the incentives of collective pools and designing a state-efficient Bitcoin fee function. We then analyze the Nash dynamics of distributed ledgers, introducing a formal model that evaluates whether rational, utility-maximizing participants are disincentivized from exhibiting undesirable infractions, and highlighting the differences between PoW- and PoS-based ledgers, both in a standalone setting and under external parameters, like market price fluctuations. We conclude by introducing a macroeconomic principle, cryptocurrency egalitarianism, and then describing two mechanisms for enabling taxation in blockchain-based currency systems.