Approximate Bayesian Deep Learning for Resource-Constrained Environments
Deep learning models have shown promising results in areas including computer vision, natural language processing, speech recognition, and more. However, existing point estimation-based training methods for these models may result in predictive uncertainties that are not well calibrated, including the occurrence of confident errors. Approximate Bayesian inference methods can help address these issues in a principled way by accounting for uncertainty in model parameters. However, these methods are computationally expensive both when computing approximations to the parameter posterior and when using an approximate parameter posterior to make predictions. They can also require significantly more storage than point-estimated models.
In this thesis, we address a range of questions related to trade-offs between the quality of inference and prediction and the computational scalability of Bayesian deep learning methods. We begin by developing a framework for comprehensive evaluation of Bayesian neural network models and applying this framework to a range of existing models and inference methods. Second, we address the problem of providing flexible trade-offs between prediction quality, run time, and storage by developing and evaluating a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier. Third, we investigate the trade-offs between model sparsity and inference performance for deep neural network models using several approaches to deriving sparse model structures. Fourth, we present a framework for correcting approximate posterior predictive distributions, encouraging them to prefer high-utility decisions. Finally, we investigate the use of approximate Bayesian deep learning in object detection and present an evaluation of approaches for quantifying different facets of uncertainty related to object classes and locations.
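The expectation that such distillation targets can be made concrete with a small sketch. Below is a minimal numpy illustration of a Monte Carlo posterior-predictive estimate: the "posterior samples" are simulated stand-ins for draws from an approximate parameter posterior of a linear classifier (not the thesis's actual models or inference methods), and the averaged softmax output is the per-input expectation a student model would be trained to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical "posterior samples": S weight matrices for a linear classifier,
# standing in for draws from an approximate posterior p(W | data).
S, D, C = 50, 4, 3
posterior_W = rng.normal(size=(S, D, C))

def posterior_predictive(x):
    # Monte Carlo estimate of E_{W ~ p(W|data)}[softmax(x @ W)]:
    # one softmax prediction per posterior sample, averaged over samples.
    probs = softmax(x @ posterior_W)   # shape (S, C)
    return probs.mean(axis=0)          # shape (C,)

x = rng.normal(size=D)
p = posterior_predictive(x)
assert abs(p.sum() - 1.0) < 1e-9
```

Making a prediction this way costs S forward passes and S copies of the parameters, which is exactly the run-time and storage overhead that distilling the expectation into a single network avoids.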
Taylor Polynomial Estimator for Estimating Frequency Moments
We present a randomized algorithm for estimating the $p$th moment of
the frequency vector of a data stream in the general update (turnstile) model
to within a multiplicative factor of $1 \pm \epsilon$, for $p > 2$, with high
constant confidence. For $0 < \epsilon \le 1$, the algorithm uses space $O(n^{1-2/p}\epsilon^{-2} + n^{1-2/p}\epsilon^{-4/p}\log n)$ words. This
improves over the current bound of $O(n^{1-2/p}\epsilon^{-2-4/p}\log n)$
words by Andoni et al. in \cite{ako:arxiv10}. Our space upper bound matches
the lower bound of Li and Woodruff \cite{liwood:random13} for $\epsilon = \Omega((\log n)/n^{1/p})$ and the lower bound of Andoni et al. \cite{anpw:icalp13}
for $\epsilon = O(1/\log n)$.

Comment: Supersedes arXiv:1104.4552. Extended abstract of this paper to appear
in Proceedings of ICALP 2015
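To fix ideas about the estimand, here is a minimal sketch of the $p$th frequency moment computed exactly from a turnstile stream of signed updates. It illustrates the quantity the paper estimates, not the paper's sublinear-space algorithm, which never materializes the frequency vector.

```python
from collections import defaultdict

def frequency_moment(stream, p):
    """Exact p-th frequency moment F_p = sum_i |f_i|^p of a turnstile stream.

    `stream` is a sequence of (item, delta) updates; deltas may be negative,
    as allowed by the general update (turnstile) model.
    """
    freq = defaultdict(int)
    for item, delta in stream:
        freq[item] += delta
    return sum(abs(f) ** p for f in freq.values())

updates = [("a", 3), ("b", 1), ("a", -1), ("c", 2)]
# final frequencies: f = {a: 2, b: 1, c: 2}; F_3 = 2^3 + 1^3 + 2^3 = 17
assert frequency_moment(updates, 3) == 17
```

This exact computation needs one word per distinct item; the point of the streaming literature is to approximate the same value in space sublinear in the number of distinct items.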
How Technology Impacts and Compares to Humans in Socially Consequential Arenas
One of the main promises of technology development is for it to be adopted by
people, organizations, societies, and governments -- incorporated into their
life, work stream, or processes. Often, this is socially beneficial as it
automates mundane tasks, frees up more time for other more important things, or
otherwise improves the lives of those who use the technology. However, these
beneficial results do not apply in every scenario and may not impact everyone
in a system the same way. Sometimes a technology is developed that both
produces benefits and inflicts harms. These harms may come at a higher cost to
some people than others, raising the question: {\it how are benefits and harms
weighed when deciding if and how a socially consequential technology gets
developed?} The most natural way to answer this question, and in fact how
people first approach it, is to compare the new technology to what used to
exist. As such, in this work, I make comparative analyses between humans and
machines in three scenarios and seek to understand how sentiment about a
technology, performance of that technology, and the impacts of that technology
combine to influence how one decides to answer my main research question.

Comment: Doctoral thesis proposal. arXiv admin note: substantial text overlap
with arXiv:2110.08396, arXiv:2108.12508, arXiv:2006.1262
Integrated High-Resolution Modeling for Operational Hydrologic Forecasting
Current advances in Earth-sensing technologies, physically-based modeling, and computational processing offer the promise of a major revolution in hydrologic forecasting, with profound implications for the management of water resources and protection from related disasters. However, access to the necessary capabilities for managing information from heterogeneous sources, and for its deployment in robust-enough modeling engines, remains the province of large governmental agencies. Moreover, even within this type of centralized operations, success is still challenged by the sheer computational complexity associated with overcoming uncertainty in the estimation of parameters and initial conditions in large-scale or high-resolution models.
In this dissertation we seek to facilitate access to hydrometeorological data products from various U.S. agencies and to advanced watershed modeling tools through the implementation of a lightweight GIS-based software package. Accessible data products currently include gauge, radar, and satellite precipitation; stream discharge; distributed soil moisture and snow cover; and multi-resolution weather forecasts. Additionally, we introduce a suite of open-source methods aimed at the efficient parameterization and initialization of complex geophysical models in contexts of high uncertainty, scarce information, and limited computational resources. The developed products in this suite include: 1) model calibration based on state-of-the-art ensemble evolutionary Pareto optimization, 2) automatic parameter estimation boosted through the incorporation of expert criteria, 3) data assimilation that hybridizes particle smoothing and variational strategies, 4) model state compression by means of optimized clustering, 5) high-dimensional stochastic approximation of watershed conditions through a novel lightweight Gaussian graphical model, and 6) simultaneous estimation of model parameters and states for hydrologic forecasting applications.
Each of these methods was tested using established distributed, physically-based hydrologic modeling engines (VIC and the DHSVM) applied to U.S. watersheds of different sizes, from a small, highly instrumented catchment in Pennsylvania to the basin of the Blue River in Oklahoma. A series of experiments demonstrated statistically significant improvements in the predictive accuracy of the proposed methods in contrast with traditional approaches. Taken together, these accessible and efficient tools can therefore be integrated within various model-based workflows for complex operational applications in water resources and beyond.
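At the core of any multi-objective calibration such as the ensemble evolutionary Pareto optimization mentioned above is the selection of non-dominated parameter sets. The sketch below shows only that selection step, with hypothetical calibration objectives (the two error metrics and their values are illustrative, not drawn from the dissertation's experiments).

```python
import numpy as np

def pareto_front(costs):
    """Indices of non-dominated rows of a (n, k) cost matrix (minimization).

    Row j dominates row i when j is no worse on every objective and
    strictly better on at least one.
    """
    n = costs.shape[0]
    keep = []
    for i in range(n):
        dominated = any(
            j != i
            and np.all(costs[j] <= costs[i])
            and np.any(costs[j] < costs[i])
            for j in range(n)
        )
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical errors for 4 candidate parameter sets on two objectives,
# e.g. peak-flow error vs. low-flow error (values are illustrative only).
costs = np.array([[0.2, 0.9],
                  [0.4, 0.4],
                  [0.9, 0.2],
                  [0.5, 0.5]])   # last row is dominated by [0.4, 0.4]
assert pareto_front(costs) == [0, 1, 2]
```

An evolutionary calibrator repeats this selection each generation, perturbing the surviving parameter sets to propose the next ensemble.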
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Online learning on the programmable dataplane
This thesis makes the case for managing computer networks with data-driven methods: automated statistical inference and control based on measurement data and runtime observations. It argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, which are currently dominated by the use of hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, and their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or per-workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the network: runtime reprogrammability, precise traffic measurement, and low-latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network.
To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switch-scale data. Each requires co-design aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device-local state. I show that data-driven solutions still require great care to correctly design, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation to histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volumes. Moving reinforcement learning to the dataplane is shown to offer substantial benefits to state-action latency and online learning throughput versus host machines, allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasible: to port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms.
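The combination of classical (non-neural) reinforcement learning and fixed-point arithmetic can be illustrated with a single Bellman update. The sketch below emulates integer-only dataplane math in Python using a 16.16 fixed-point format; the format choice and hyperparameter values are illustrative assumptions, not the thesis's actual SmartNIC implementation.

```python
# One Q-learning update in 16.16 fixed point, emulating integer-only
# dataplane arithmetic (no floating point on the forwarding path).
FRAC = 16                 # fractional bits
ONE = 1 << FRAC           # fixed-point representation of 1.0

def to_fix(x):
    return int(round(x * ONE))

def mul(a, b):
    # Fixed-point multiply: full integer product, then rescale.
    return (a * b) >> FRAC

# Hypothetical hyperparameters, encoded once at compile/configure time.
ALPHA = to_fix(0.5)       # learning rate
GAMMA = to_fix(0.9)       # discount factor

def q_update(q, reward, q_next_max):
    """Bellman update: q += alpha * (reward + gamma * max_a' Q(s',a') - q)."""
    target = reward + mul(GAMMA, q_next_max)
    return q + mul(ALPHA, target - q)

q = q_update(to_fix(0.0), to_fix(1.0), to_fix(2.0))
# float reference: target = 1 + 0.9*2 = 2.8; q = 0 + 0.5*2.8 = 1.4
assert abs(q / ONE - 1.4) < 1e-3
```

Shifts and integer adds like these map directly onto match-action pipeline primitives, which is what makes per-packet or per-event policy updates plausible at line rate.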
LIPIcs, Volume 274, ESA 2023, Complete Volume
From Weakly Supervised Learning to Online Cataloguing (De l'apprentissage faiblement supervisé au catalogage en ligne)
Applied mathematics and machine computations have raised a lot of hope since the recent success of supervised learning. Many practitioners in industry have been trying to switch from their old paradigms to machine learning. Interestingly, those data scientists spend more time scraping, annotating and cleaning data than fine-tuning models. This thesis is motivated by the following question: can we derive a more generic framework than the one of supervised learning in order to learn from cluttered data? This question is approached through the lens of weakly supervised learning, assuming that the bottleneck of data collection lies in annotation. We model weak supervision as giving, rather than a unique target, a set of target candidates. We argue that one should look for an "optimistic" function that matches most of the observations. This allows us to derive a principle to disambiguate partial labels. We also discuss the advantage of incorporating unsupervised learning techniques into our framework, in particular manifold regularization approached through diffusion techniques, for which we derive a new algorithm that scales better with input dimension than the baseline method. Finally, we switch from passive to active weakly supervised learning, introducing the "active labeling" framework, in which a practitioner can query weak information about chosen data under a budget constraint. Among others, we leverage the fact that one does not need full information to access stochastic gradients and perform stochastic gradient descent.
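The "optimistic" disambiguation principle for partial labels can be sketched in a few lines: when supervision gives a set of candidate targets instead of one, charge the model only for the candidate it already finds most plausible. The function names and the cross-entropy instantiation below are illustrative assumptions, one common way to realize an infimum-style loss, not the thesis's exact formulation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def optimistic_loss(scores, candidates):
    """Infimum ("optimistic") loss for a partially labeled example.

    `candidates` is the set of target candidates supplied as weak
    supervision; the loss is the cross-entropy of the best-matching one.
    """
    p = softmax(scores)
    return min(-np.log(p[c]) for c in candidates)

scores = np.array([2.0, 0.1, -1.0])
# With candidate set {0, 2}, the optimistic loss commits to class 0,
# the candidate the model currently scores highest, so it is no larger
# than the fully supervised loss for class 2 alone.
assert optimistic_loss(scores, {0, 2}) <= optimistic_loss(scores, {2})
```

Minimizing this loss over a dataset drives the model toward a labeling consistent with most candidate sets, which is the disambiguation behavior the abstract describes.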