Bridging the reality gap in quantum devices with physics-aware machine learning
The discrepancies between reality and simulation impede the optimization and scalability of solid-state quantum devices. Disorder induced by the unpredictable distribution of material defects is one of the major contributions to the reality gap. We bridge this gap using physics-aware machine learning, in particular an approach combining a physical model, deep learning, a Gaussian random field, and Bayesian inference. This approach enables us to infer the disorder potential of a nanoscale electronic device from electron-transport data. This inference is validated by verifying the algorithm's predictions about the gate-voltage values required for a laterally defined quantum-dot device in AlGaAs/GaAs to produce current features corresponding to a double-quantum-dot regime.
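As a point of reference for the Gaussian-random-field prior mentioned above, the sketch below draws one smooth 2-D disorder potential by spectral (Fourier) synthesis. The grid size, squared-exponential covariance, and length scale are illustrative assumptions, not the paper's actual modelling choices.

```python
import numpy as np

def sample_gaussian_random_field(n=64, length_scale=0.1, seed=0):
    """Draw one stationary 2-D Gaussian random field on an n x n grid over
    the unit square, using spectral (Fourier) synthesis."""
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n)                     # frequencies in cycles/sample
    kx, ky = np.meshgrid(k, k)
    k2 = (kx**2 + ky**2) * n**2               # cycles per unit length, squared
    # Spectral density of a squared-exponential covariance (illustrative).
    power = np.exp(-2.0 * (np.pi * length_scale) ** 2 * k2)
    noise = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    field = np.fft.ifft2(np.sqrt(power) * noise).real
    return field / field.std()                # normalise to unit variance

disorder_potential = sample_gaussian_random_field()
print(disorder_potential.shape)               # (64, 64)
```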
Random Fourier signature features
Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length, the signature kernel, which is accompanied by attractive theoretical guarantees from stochastic analysis. Previous algorithms to compute the signature kernel scale quadratically in both the length and the number of the sequences. To mitigate this severe computational bottleneck, we develop a random Fourier feature-based acceleration of the signature kernel acting on the inherently non-Euclidean domain of sequences. We show uniform approximation guarantees for the proposed unbiased estimator of the signature kernel, while keeping its computation linear in the sequence length and number. In addition, combined with recent advances in tensor projections, we derive two even more scalable time series features with favourable concentration properties and computational complexity in both time and memory. Our empirical results show that the reduction in computational cost comes at a negligible price in terms of accuracy on moderate-sized datasets, and it enables one to scale to large datasets of up to a million time series.
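For background, the classic random Fourier feature construction that this work lifts from vectors to sequences can be sketched in a few lines. The sketch below approximates the ordinary RBF kernel on vectors, not the signature kernel; the feature count and lengthscale are illustrative.

```python
import numpy as np

def random_fourier_features(X, num_features=256, lengthscale=1.0, seed=0):
    """Random Fourier features for the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)):
    returns Z with Z @ Z.T approximating the kernel matrix."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(X.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(100, 5))
Z = random_fourier_features(X)                  # cost linear in len(X)
K_exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
print(np.abs(Z @ Z.T - K_exact).max())          # shrinks as num_features grows
```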
Inference of transport phenomena in quantum devices
This thesis is concerned with charge transport in electrostatically defined quantum dot devices. Such devices display a wide range of transport phenomena in both open and closed configurations. The transport regime can be tuned experimentally by controlling the voltages applied to gate electrodes, but the precise electrostatic landscape which determines the transport regime is unknown. This uncertainty arises from variations in device fabrication, material defects, and other sources of electrostatic disorder.
The research chapters of this thesis consider a range of transport regimes in quantum dot devices, and infer properties of the devices using both experimental and theoretical techniques. The first research chapter considers the detection of single charge transport events through a double quantum dot. By fitting an open-quantum-systems model to the measured sub-attoampere currents, tunnel rates are inferred. The second results chapter considers an electrostatic simulation of a quantum dot device and how it can be accelerated using deep learning. This accelerated model is then used in the third results chapter, along with experimental measurements of the transport regime, to inform a Bayesian inference algorithm and produce a set of disorder potentials that narrow the gap between simulation and reality. The final results chapter develops a differentiable quantum master equation solver, which is used for parameter estimation in a theoretical study of transport in single and double quantum dots.
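For readers unfamiliar with the final chapter's setting, a quantum master equation in Lindblad form, dρ/dt = -i[H, ρ] + Σ_k (L_k ρ L_k† - ½{L_k†L_k, ρ}), can be integrated in a few lines. The two-level Hamiltonian, decay operator, and forward-Euler integrator below are illustrative stand-ins, not the thesis's solver, which is additionally differentiable for parameter estimation.

```python
import numpy as np

def lindblad_rhs(rho, H, Ls):
    """drho/dt = -i[H, rho] + sum_k (L rho L+ - 0.5 {L+ L, rho})."""
    out = -1j * (H @ rho - rho @ H)
    for L in Ls:
        LdL = L.conj().T @ L
        out += L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)
    return out

# Illustrative two-level system: a coherent drive plus a decay channel.
H = np.array([[0.0, 0.5], [0.5, 1.0]], dtype=complex)                # toy Hamiltonian
L = np.sqrt(0.1) * np.array([[0.0, 1.0], [0.0, 0.0]], dtype=complex)  # decay
rho = np.array([[0.0, 0.0], [0.0, 1.0]], dtype=complex)              # excited state

dt, steps = 0.01, 1000
for _ in range(steps):                  # forward Euler, adequate for a sketch
    rho = rho + dt * lindblad_rhs(rho, H, [L])
print(rho[1, 1].real)                   # excited-state population at t = 10
```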
Rethinking Attention with Performers
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention kernels, Performers use a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+), which may be of independent interest for scalable kernel methods. FAVOR+ can also be used to efficiently model kernelizable attention mechanisms beyond softmax. This representational power is crucial to accurately compare softmax with other kernels for the first time on large-scale tasks, beyond the reach of regular Transformers, and to investigate optimal attention kernels. Performers are linear architectures fully compatible with regular Transformers and with strong theoretical guarantees: unbiased or nearly-unbiased estimation of the attention matrix, uniform convergence and low estimation variance. We tested Performers on a rich set of tasks stretching from pixel prediction through text models to protein sequence modeling. We demonstrate competitive results with other examined efficient sparse and dense attention methods, showcasing the effectiveness of the novel attention-learning paradigm leveraged by Performers.
Comment: Published as a conference paper + oral presentation at ICLR 2021. 38 pages. See https://github.com/google-research/google-research/tree/master/protein_lm for protein language model code, https://github.com/google-research/google-research/tree/master/performer for Performer code, and https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html for the Google AI Blog post.
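The heart of FAVOR+ is a positive random-feature map φ(x) = exp(wᵀx - ‖x‖²/2)/√m, under which φ(q)·φ(k) is an unbiased estimate of exp(q·k), so softmax attention factorises and can be computed in time linear in sequence length. The single-head sketch below uses i.i.d. Gaussian projections for brevity; the paper additionally orthogonalises them, and the feature count and shapes are illustrative.

```python
import numpy as np

def positive_random_features(X, W):
    """phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m), so phi(q) . phi(k) is an
    unbiased estimate of exp(q . k) for rows W_i ~ N(0, I)."""
    m = W.shape[0]
    return np.exp(X @ W.T - 0.5 * (X ** 2).sum(-1, keepdims=True)) / np.sqrt(m)

def linear_attention(Q, K, V, num_features=128, seed=0):
    """Approximate softmax attention without forming the n x n matrix."""
    d = Q.shape[-1]
    W = np.random.default_rng(seed).normal(size=(num_features, d))
    Qp = positive_random_features(Q / d ** 0.25, W)   # the 1/sqrt(d) scaling is
    Kp = positive_random_features(K / d ** 0.25, W)   # split between Q and K
    numer = Qp @ (Kp.T @ V)                           # O(n m d), not O(n^2 d)
    denom = Qp @ Kp.sum(axis=0)                       # per-query normaliser
    return numer / denom[:, None]

Q, K, V = (np.random.default_rng(i).normal(size=(512, 16)) for i in range(3))
print(linear_attention(Q, K, V).shape)                # (512, 16)
```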
Structure in Machine Learning: Graphical Models and Monte Carlo Methods
This thesis is concerned with two main areas: approximate inference in discrete graphical models, and random embeddings for dimensionality reduction and approximate inference in kernel methods. Approximate inference is a fundamental problem in machine learning and statistics, with strong connections to other domains such as theoretical computer science. At the same time, there has often been a gap between the success of many algorithms in this area in practice, and what can be explained by theory; thus, an important research effort is to bridge this gap. Random embeddings for dimensionality reduction and approximate inference have led to great improvements in scalability of a wide variety of methods in machine learning. In recent years, there has been much work on how the stochasticity introduced by these approaches can be better controlled, and what further computational improvements can be made.
In the first part of this thesis, we study approximate inference algorithms for discrete graphical models. Firstly, we consider linear programming methods for approximate MAP inference, and develop our understanding of conditions for exactness of these approximations. Such guarantees of exactness are typically based on either structural restrictions on the underlying graph corresponding to the model (such as low treewidth), or restrictions on the types of potential functions that may be present in the model (such as log-supermodularity). We contribute two new classes of exactness guarantees: the first of these takes the form of particular hybrid restrictions on a combination of graph structure and potential types, whilst the second is given by excluding particular substructures from the underlying graph, via graph minor theory. We also study a particular family of transformation methods of graphical models, uprooting and rerooting, and their effect on approximate MAP and marginal inference methods. We prove new theoretical results on the behaviour of particular approximate inference methods under these transformations, in particular showing that the triplet relaxation of the marginal polytope is unique in being universally rooted. We also introduce a heuristic which quickly picks a rerooting, and demonstrate benefits empirically on models over several graph topologies.
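To make the LP relaxations studied in this part concrete, the sketch below builds the standard local-polytope relaxation of MAP inference for a binary pairwise model on a triangle and solves it with scipy. The graph, potentials, and variable layout are illustrative assumptions, not the thesis's constructions; an integral solution indicates the relaxation is tight on this instance, while fractional values signal looseness.

```python
import numpy as np
from scipy.optimize import linprog

# Binary pairwise model on a triangle; theta are illustrative log-potentials.
edges = [(0, 1), (1, 2), (0, 2)]
rng = np.random.default_rng(0)
theta_n = rng.normal(size=(3, 2))               # unary log-potentials
theta_e = rng.normal(size=(len(edges), 2, 2))   # pairwise log-potentials

# Variables: mu_i(a) for each node, then mu_ij(a, b) for each edge.
n_vars = 3 * 2 + len(edges) * 4
node_idx = lambda i, a: 2 * i + a
edge_idx = lambda e, a, b: 6 + 4 * e + 2 * a + b

c = np.zeros(n_vars)                            # linprog minimises, so negate
for i in range(3):
    c[node_idx(i, 0):node_idx(i, 1) + 1] = -theta_n[i]
for e in range(len(edges)):
    c[edge_idx(e, 0, 0):edge_idx(e, 1, 1) + 1] = -theta_e[e].ravel()

A_eq, b_eq = [], []
for i in range(3):                              # normalisation: sum_a mu_i(a) = 1
    row = np.zeros(n_vars)
    row[node_idx(i, 0)] = row[node_idx(i, 1)] = 1.0
    A_eq.append(row); b_eq.append(1.0)
for e, (i, j) in enumerate(edges):              # local marginal consistency
    for a in range(2):                          # sum_b mu_ij(a, b) = mu_i(a)
        row = np.zeros(n_vars)
        row[edge_idx(e, a, 0)] = row[edge_idx(e, a, 1)] = 1.0
        row[node_idx(i, a)] = -1.0
        A_eq.append(row); b_eq.append(0.0)
    for b in range(2):                          # sum_a mu_ij(a, b) = mu_j(b)
        row = np.zeros(n_vars)
        row[edge_idx(e, 0, b)] = row[edge_idx(e, 1, b)] = 1.0
        row[node_idx(j, b)] = -1.0
        A_eq.append(row); b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, 1))
print("LP optimum:", -res.fun)
print("node pseudomarginals:", res.x[:6].round(3))
```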
In the second part of this thesis, we study Monte Carlo methods for both linear dimensionality reduction and approximate inference in kernel machines. We prove the statistical benefit of coupling Monte Carlo samples to be almost-surely orthogonal in a variety of contexts, and study fast approximate methods of inducing this coupling. A surprising result is that these approximate methods can simultaneously offer improved statistical benefits, time complexity, and space complexity over i.i.d. Monte Carlo samples. We evaluate our methods on a variety of datasets, directly studying their effects on approximate kernel evaluation, as well as on downstream tasks such as Gaussian process regression.
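The orthogonal coupling referred to above can be illustrated with orthogonal random features for the RBF kernel: i.i.d. Gaussian projection rows are replaced by the rows of QR-orthogonalised Gaussian blocks, rescaled so each row's norm matches that of an i.i.d. Gaussian vector. The dataset, dimensions, and feature count below are illustrative, and this is a sketch of the general technique rather than the thesis's methods.

```python
import numpy as np

def orthogonal_gaussian_matrix(m, d, rng):
    """Stack QR-orthogonalised Gaussian d x d blocks, then rescale every row
    by a chi-distributed norm so its marginal matches an i.i.d. Gaussian row."""
    blocks = []
    for _ in range(-(-m // d)):                 # ceil(m / d) blocks
        Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
        blocks.append(Q)
    W = np.vstack(blocks)[:m]
    return W * np.sqrt(rng.chisquare(d, size=m))[:, None]

def rbf_features(X, W):
    """Fourier features for the unit-lengthscale RBF kernel."""
    proj = X @ W.T
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(W.shape[0])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))    # exact kernel
for name, W in [("iid", rng.normal(size=(64, 8))),
                ("orthogonal", orthogonal_gaussian_matrix(64, 8, rng))]:
    Z = rbf_features(X, W)
    print(name, np.abs(Z @ Z.T - K).mean())     # orthogonal is typically lower
```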