Meta-learning algorithms and applications
Meta-learning in the broader context concerns how an agent learns about their own learning, allowing them to improve their learning process. Learning how to learn is not only beneficial for humans, but it has also shown vast benefits for improving how machines learn. In the context of machine learning, meta-learning enables models to improve their learning process by selecting suitable meta-parameters that influence the learning. For deep learning specifically, the meta-parameters typically describe details of the training of the model but can also include description of the model itself - the architecture. Meta-learning is usually done with specific goals in mind, for example trying to improve ability to generalize or learn new concepts from only a few examples.
Meta-learning can be powerful, but it comes with a key downside: it is often computationally costly. If these costs were reduced, meta-learning would be more accessible to developers of new artificial intelligence models, allowing them to achieve greater goals or save resources. As a result, one key focus of our research is on significantly improving the efficiency of meta-learning. We develop two approaches, EvoGrad and PASHA, which significantly improve meta-learning efficiency in two common scenarios. EvoGrad allows us to efficiently optimize the values of a large number of differentiable meta-parameters, while PASHA enables us to efficiently optimize meta-parameters of any type, though fewer of them.
Meta-learning is a tool that can be applied to solve various problems. Most commonly it is applied to learning new concepts from only a small number of examples (few-shot learning), but other applications exist too. To showcase the practical impact that meta-learning can make in the context of neural networks, we use meta-learning as a novel solution for two selected problems: more accurate uncertainty quantification (calibration) and general-purpose few-shot learning. Both are practically important problems, and using meta-learning approaches we can obtain better solutions than those obtained with existing approaches. Calibration is important for safety-critical applications of neural networks, while general-purpose few-shot learning tests a model's ability to generalize few-shot learning abilities across diverse tasks such as recognition, segmentation and keypoint estimation.
More efficient algorithms as well as novel applications enable the field of meta-learning to make a more significant impact on the broader area of deep learning and potentially solve problems that were previously too challenging. Ultimately, both allow us to better utilize the opportunities that artificial intelligence presents.
UMSL Bulletin 2023-2024
The 2023-2024 Bulletin and Course Catalog for the University of Missouri St. Louis. https://irl.umsl.edu/bulletin/1088/thumbnail.jp
Classical and quantum algorithms for scaling problems
This thesis is concerned with scaling problems, which have a plethora of connections to different areas of mathematics, physics and computer science. Although many structural aspects of these problems are understood by now, we only know how to solve them efficiently in special cases. We give new algorithms for non-commutative scaling problems with complexity guarantees that match the prior state of the art. To this end, we extend the well-known (self-concordance based) interior-point method (IPM) framework to Riemannian manifolds, motivated by its success in the commutative setting. Moreover, the IPM framework does not obviously suffer from the same obstructions to efficiency as previous methods. It also yields the first high-precision algorithms for other natural geometric problems in non-positive curvature. For the (commutative) problems of matrix scaling and balancing, we show that quantum algorithms can outperform the (already very efficient) state-of-the-art classical algorithms. Their time complexity can be sublinear in the input size; in certain parameter regimes they are also optimal, whereas in others we show no quantum speedup over the classical methods is possible. Along the way, we provide improvements over the long-standing state of the art for searching for all marked elements in a list, and computing the sum of a list of numbers. We identify a new application in the context of tensor networks for quantum many-body physics. We define a computable canonical form for uniform projected entangled pair states (as the solution to a scaling problem), circumventing previously known undecidability results. We also show, by characterizing the invariant polynomials, that the canonical form is determined by evaluating the tensor network contractions on networks of bounded size.
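To make the (commutative) matrix scaling problem mentioned above concrete, here is a minimal sketch of the classical Sinkhorn iteration, which alternately rescales rows and columns of a strictly positive square matrix until both sum to one. It is a textbook baseline, not one of the interior-point or quantum algorithms developed in the thesis; the function name `sinkhorn_scale` and its parameters are illustrative assumptions.

```python
def sinkhorn_scale(matrix, iterations=500, tol=1e-9):
    """Scale a strictly positive square matrix towards doubly stochastic form.

    Returns (scaled, x, y) where scaled[i][j] = x[i] * matrix[i][j] * y[j].
    Hypothetical helper for illustration only.
    """
    n = len(matrix)
    x = [1.0] * n  # row scaling factors
    y = [1.0] * n  # column scaling factors
    for _ in range(iterations):
        # Pick x so that every row of diag(x) * A * diag(y) sums to 1.
        for i in range(n):
            x[i] = 1.0 / sum(matrix[i][j] * y[j] for j in range(n))
        # Pick y so that every column sums to 1 (this may disturb the rows).
        for j in range(n):
            y[j] = 1.0 / sum(x[i] * matrix[i][j] for i in range(n))
        # Columns are now exact; stop once the rows are close enough too.
        row_error = max(
            abs(sum(x[i] * matrix[i][j] * y[j] for j in range(n)) - 1.0)
            for i in range(n)
        )
        if row_error < tol:
            break
    scaled = [[x[i] * matrix[i][j] * y[j] for j in range(n)] for i in range(n)]
    return scaled, x, y


if __name__ == "__main__":
    scaled, x, y = sinkhorn_scale([[1.0, 2.0], [3.0, 4.0]])
    for row in scaled:
        print(row)
```

Each pass reads every entry of the input, so this baseline is at least linear in the input size per iteration, which is the cost the abstract's sublinear quantum algorithms are measured against.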
LIPIcs, Volume 251, ITCS 2023, Complete Volume
When Deep Learning Meets Polyhedral Theory: A Survey
In the past decade, deep learning became the prevalent methodology for
predictive modeling thanks to the remarkable accuracy of deep neural networks
in tasks such as computer vision and natural language processing. Meanwhile,
the structure of neural networks converged back to simpler representations
based on piecewise constant and piecewise linear functions such as the
Rectified Linear Unit (ReLU), which became the most commonly used type of
activation function in neural networks. That made certain types of network
structure, such as the typical fully-connected feedforward
neural network, amenable to analysis through polyhedral theory
and to the application of methodologies such as Linear Programming (LP) and
Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this
paper, we survey the main topics emerging from this fast-paced area of work,
which bring a fresh perspective to understanding neural networks in more detail
as well as to applying linear optimization techniques to train, verify, and
reduce the size of such networks.
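The piecewise linear structure described in this abstract can be illustrated with a small sketch: for a fixed pattern of active ReLUs, a one-hidden-layer network coincides with a single affine function, and the inputs sharing that pattern form a polyhedron. The functions `relu_net` and `local_affine_map` and the weights in the usage example are hypothetical, chosen only for illustration and not taken from the survey.

```python
def relu_net(x, W1, b1, w2, b2):
    """Evaluate a one-hidden-layer ReLU network with scalar output."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def local_affine_map(x, W1, b1, w2, b2):
    """Return (gradient, offset) of the affine piece whose region contains x."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    # Activation pattern: 1 where the unit is active, 0 where it is clipped.
    active = [1.0 if p > 0 else 0.0 for p in pre]
    units = range(len(W1))
    gradient = [
        sum(w2[k] * active[k] * W1[k][j] for k in units) for j in range(len(x))
    ]
    offset = sum(w2[k] * active[k] * b1[k] for k in units) + b2
    return gradient, offset


if __name__ == "__main__":
    # Illustrative weights (assumed, not from the source).
    W1 = [[1.0, -1.0], [2.0, 1.0], [-1.0, 0.5]]
    b1 = [0.0, -1.0, 0.5]
    w2 = [1.0, -2.0, 3.0]
    b2 = 0.25
    x = [0.3, 0.7]
    gradient, offset = local_affine_map(x, W1, b1, w2, b2)
    affine_value = sum(g * xi for g, xi in zip(gradient, x)) + offset
    print(relu_net(x, W1, b1, w2, b2), affine_value)
```

The region on which a given activation pattern holds is cut out by linear inequalities on the pre-activations, which is why tools such as LP and MILP apply.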
Variants of Pseudo-deterministic Algorithms and Duality in TFNP
We introduce a new notion of ``faux-deterministic'' algorithms for search problems in query complexity. Roughly, for a search problem $\cS$, a faux-deterministic algorithm is a probability distribution over deterministic algorithms such that no computationally bounded adversary making black-box queries to a sampled algorithm $A$ can find an input $x$ on which $A$ fails to solve $\cS$, that is, $(x, A(x)) \notin \cS$. Faux-deterministic algorithms are a relaxation of \emph{pseudo-deterministic} algorithms, which are randomized algorithms with the guarantee that for any given input $x$, the algorithm outputs a unique output with high probability. Pseudo-deterministic algorithms are statistically indistinguishable from deterministic algorithms, while faux-deterministic algorithms relax this statistical indistinguishability to computational indistinguishability.
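The pseudo-determinism guarantee stated informally above can be written out as follows. This is a sketch of the standard formulation, not the paper's exact definition: the threshold $2/3$ is an assumed stand-in for "with high probability", $r$ denotes the algorithm's random coins, $s(x)$ is a name introduced here for the canonical output, and $\mathcal{S}$ is assumed to be what the abstract's macro \cS expands to.

```latex
% Assumption: 2/3 stands in for "with high probability"; s(x) is a name
% introduced here for the canonical output on input x.
\[
  A \text{ is pseudo-deterministic for } \mathcal{S}
  \;\iff\;
  \forall x \;\, \exists\, s(x) \text{ with } (x, s(x)) \in \mathcal{S}
  \text{ such that }
  \Pr_{r}\bigl[\, A(x; r) = s(x) \,\bigr] \;\ge\; \tfrac{2}{3}.
\]
```

A plain randomized algorithm for $\mathcal{S}$ only needs $(x, A(x; r)) \in \mathcal{S}$ with high probability, so different runs may return different valid outputs; the requirement of one canonical $s(x)$ is what the faux-deterministic notion relaxes.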
We prove that in the query model, every verifiable search problem that has a randomized algorithm also has a faux-deterministic algorithm. By considering the pseudo-deterministic lower bound of Goldwasser et al. \cite{goldwasser_et_al:LIPIcs.CCC.2021.36}, we immediately prove an exponential gap between pseudo-deterministic and faux-deterministic complexities in query complexity. We additionally show that our faux-deterministic algorithm is also secure against quantum adversaries that can make black-box queries in superposition.
We highlight two reasons to study faux-deterministic algorithms. First, for practical purposes, one can use a faux-deterministic algorithm instead of pseudo-deterministic algorithms in most cases where the latter is required. Second, since efficient faux-deterministic algorithms exist even when pseudo-deterministic ones do not, their existence demonstrates a barrier to proving pseudo-deterministic lower bounds: Lower bounds on pseudo-determinism must distinguish pseudo-determinism from faux-determinism.
Finally, changing our perspective to the adversaries' viewpoint, we introduce a notion of ``dual problem'' $\cS^*$ for search problems $\cS$. In the dual problem $\cS^*$, the input is an algorithm $A$ purporting to solve $\cS$, and our goal is to find an adverse input $x$ on which $A$ fails to solve $\cS$. We discuss several properties in the query and Turing machine model that show the new problem $\cS^*$ is analogous to a dual for $\cS$.
- …