LIPIcs, Volume 251, ITCS 2023, Complete Volume
Symbolic Synthesis of Neural Networks
Neural networks adapt very well to distributed and continuous
representations, but struggle to generalize from small amounts of data.
Symbolic systems commonly achieve data-efficient generalization by exploiting
modularity to benefit from local and discrete features of a representation.
These features allow symbolic programs to be improved one module at a time and
to experience combinatorial growth in the values they can successfully process.
However, it is difficult to design a component that can be used to form
symbolic abstractions and which is adequately overparametrized to learn
arbitrary high-dimensional transformations. I present Graph-based Symbolically
Synthesized Neural Networks (G-SSNNs), a class of neural modules that operate
on representations modified with synthesized symbolic programs to include a
fixed set of local and discrete features. I demonstrate that the choice of
injected features within a G-SSNN module modulates the data efficiency and
generalization of baseline neural models, creating predictable patterns of both
heightened and curtailed generalization. By training G-SSNNs, we also derive
information about desirable semantics of symbolic programs without manual
engineering. This information is compact and amenable to abstraction, but can
also be flexibly recontextualized for other high-dimensional settings. In
future work, I will investigate data-efficient generalization and the
transferability of learned symbolic representations in more complex G-SSNN
designs based on more complex classes of symbolic programs. Experimental code
and data are available at
https://github.com/shlomenu/symbolically_synthesized_networks .
Comment: 8 pages, 1 figure. Minor formula correction and minor textual revision.
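To make the construction concrete, here is a minimal sketch of a G-SSNN-style module under simplifying assumptions: the "symbolic program" below is a fixed, hand-written stand-in (the actual method synthesizes such programs over graphs), and all names and dimensions are illustrative, not the author's code.

    import torch
    import torch.nn as nn

    # Stand-in "symbolic program": injects local, discrete features (one hard
    # threshold bit per fixed slice of the input) alongside the continuous
    # representation, as the abstract describes.
    def symbolic_program(x: torch.Tensor, n_slices: int = 4) -> torch.Tensor:
        chunks = x.chunk(n_slices, dim=-1)
        codes = torch.cat([(c.mean(dim=-1, keepdim=True) > 0).float()
                           for c in chunks], dim=-1)
        return torch.cat([x, codes], dim=-1)  # continuous + discrete features

    class GSSNNModule(nn.Module):
        """Sketch of a G-SSNN-style module: symbolic feature injection
        followed by an overparametrized dense network."""
        def __init__(self, in_dim: int, hidden: int = 256,
                     out_dim: int = 10, n_slices: int = 4):
            super().__init__()
            self.n_slices = n_slices
            self.net = nn.Sequential(
                nn.Linear(in_dim + n_slices, hidden),
                nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(symbolic_program(x, self.n_slices))

    module = GSSNNModule(in_dim=64)
    print(module(torch.randn(8, 64)).shape)  # torch.Size([8, 10])

Varying the injected features (here, the slicing and thresholding rule) is what the abstract reports as modulating data efficiency and generalization.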
Deep Statistical Models with Application to Environmental Data
When analyzing environmental data, constructing a realistic statistical model is important, not only to fully characterize the physical phenomena, but also to provide valid and useful predictions. Gaussian process models are amongst the most popular tools used for this purpose. However, many assumptions are usually made when using Gaussian processes, such as stationarity of the covariance function. There are several approaches to constructing nonstationary spatial and spatio-temporal Gaussian processes, including the deformation approach. In the deformation approach, the geographical domain is warped into a new domain, on which the Gaussian process is modeled as stationary. One of the main challenges with this approach is how to construct a deformation function that is complicated enough to adequately capture the nonstationarity in the process, but simple enough to facilitate statistical inference and prediction. In this thesis, by using ideas from deep learning, we construct deformation functions that are compositions of simple warping units. In particular, deformation functions that are composed of aligning functions and warping functions are introduced to model nonstationary and asymmetric multivariate spatial processes, while spatial and temporal warping functions are used to model nonstationary spatio-temporal processes. As in the traditional deformation approach, familiar stationary models are used on the warped domain. It is shown that this new approach to modeling nonstationarity is computationally efficient, and that it can lead to predictions that are superior to those from stationary models. We show the utility of these models on both simulated data and real-world environmental data: ocean temperatures and surface-ice elevation. The developed warped nonstationary processes can also be used for emulation. We show that a warped, gradient-enhanced Gaussian process surrogate model can be embedded in algorithms such as importance sampling and delayed-acceptance Markov chain Monte Carlo. Our surrogate models can provide more accurate emulation than other traditional surrogate models, and can help speed up Bayesian inference in problems with exponential-family likelihoods with intractable normalizing constants, for example when analyzing satellite images using the Potts model.
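A minimal sketch of the deformation idea (the specific warping units and parameters below are invented for illustration, not the thesis's models): compose simple warping functions, then evaluate an ordinary stationary covariance on the warped coordinates to obtain a nonstationary covariance on the original domain.

    import numpy as np

    # Hypothetical elementary warping unit: an axial sigmoid-type expansion
    # along one coordinate, a simple building block for compositions.
    def axial_warp(s, axis=0, center=0.0, scale=1.0, rate=5.0):
        out = s.copy()
        out[:, axis] += scale * np.tanh(rate * (s[:, axis] - center))
        return out

    def deformation(s):
        # Composition of simple units: f = f2 . f1, as in deep warping.
        return axial_warp(axial_warp(s, axis=0, center=0.3),
                          axis=1, center=-0.2, scale=0.5)

    def stationary_cov(h, sigma2=1.0, ell=0.5):
        # Squared-exponential covariance on the warped domain.
        return sigma2 * np.exp(-0.5 * (h / ell) ** 2)

    # Nonstationary covariance on the original domain via warping:
    # C(s1, s2) = C0(||f(s1) - f(s2)||)
    S = np.random.default_rng(0).uniform(-1, 1, size=(100, 2))
    W = deformation(S)
    D = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)
    K = stationary_cov(D)
    print(K.shape)  # (100, 100)

The appeal of the approach is visible even in this toy version: all inference machinery for stationary models applies unchanged on the warped domain.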
LIPIcs, Volume 261, ICALP 2023, Complete Volume
First Order Logic and Twin-Width in Tournaments
We characterise the classes of tournaments with tractable first-order model checking. For every hereditary class of tournaments T, first-order model checking is either fixed-parameter tractable or AW[*]-hard. This dichotomy coincides with the fact that T has either bounded or unbounded twin-width, and that the growth of T is either at most exponential or at least factorial. From the model-theoretic point of view, we show that NIP classes of tournaments coincide with classes of bounded twin-width. Twin-width is also characterised by three infinite families of obstructions: T has bounded twin-width if and only if it excludes at least one tournament from each family. This generalises results of Bonnet et al. on ordered graphs.
The key to these results is a polynomial-time algorithm which takes as input a tournament T and computes a linear order < on V(T) such that the twin-width of the birelation (T, <) is at most some function of the twin-width of T. Since approximating twin-width can be done in FPT time for an ordered structure (T, <), this provides an FPT approximation of twin-width for tournaments.
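Stated compactly (notation mine, not the authors': $\mathrm{tww}(T)$ for the twin-width of $T$, and $g$ for the unspecified function above), the dichotomy and the algorithm's guarantee read:

    \[
      \mathrm{FO\ model\ checking\ on\ } \mathcal{T} \in \mathrm{FPT}
      \;\iff\;
      \sup_{T \in \mathcal{T}} \mathrm{tww}(T) < \infty
      \;\iff\;
      \text{growth of } \mathcal{T} \text{ is at most exponential,}
    \]
    \[
      \text{and the computed order } < \text{ satisfies }
      \mathrm{tww}\bigl((T, <)\bigr) \;\le\; g\bigl(\mathrm{tww}(T)\bigr).
    \]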
A Strong Composition Theorem for Junta Complexity and the Boosting of Property Testers
We prove a strong composition theorem for junta complexity and show how such
theorems can be used to generically boost the performance of property testers.
The $\varepsilon$-approximate junta complexity of a function $f$ is the
smallest integer $r$ such that $f$ is $\varepsilon$-close to a function that
depends only on $r$ variables. A strong composition theorem states that if $f$
has large $\varepsilon$-approximate junta complexity, then the composition
$g \circ f$ has even larger $\varepsilon'$-approximate junta complexity, even
for $\varepsilon' \gg \varepsilon$. We develop a fairly complete understanding
of this behavior, proving that the junta complexity of $g \circ f$ is
characterized by that of $f$ along with the multivariate noise sensitivity of
$g$. For the important case of symmetric functions $g$, we relate their
multivariate noise sensitivity to the simpler and well-studied case of
univariate noise sensitivity.
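In symbols (notation mine, with $\mathrm{dist}$ denoting relative Hamming distance), the central quantity and the shape of a strong composition theorem are:

    \[
      J_{\varepsilon}(f) \;=\; \min\bigl\{\, r \in \mathbb{N} \;:\;
      \exists\, h \text{ depending on at most } r \text{ variables, }
      \mathrm{dist}(f, h) \le \varepsilon \,\bigr\},
    \]
    \[
      J_{\varepsilon'}(g \circ f) \;\gg\; J_{\varepsilon}(f)
      \quad \text{even for } \varepsilon' \gg \varepsilon.
    \]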
We then show how strong composition theorems yield boosting algorithms for
property testers: with a strong composition theorem for any class of functions,
a large-distance tester for that class is immediately upgraded into one for
small distances. Combining our contributions yields a booster for junta
testers, and with it new implications for junta testing. This is the first
boosting-type result in property testing, and we hope that the connection to
composition theorems adds compelling motivation to the study of both topics.
Comment: 44 pages, 1 figure, FOCS 2023
Deployment of Deep Neural Networks on Dedicated Hardware Accelerators
Deep Neural Networks (DNNs) have established themselves as powerful tools for
a wide range of complex tasks, for example computer vision or natural language
processing. DNNs are notoriously demanding on compute resources and, as a
result, dedicated hardware accelerators are developed for all kinds of use
cases. Different accelerators provide solutions ranging from hyperscale cloud
environments for the training of DNNs to inference devices in embedded
systems. They implement intrinsics for complex operations directly in
hardware; a common example is intrinsics for matrix multiplication. However,
there exists a gap between the ecosystems of applications for deep learning
practitioners and hardware accelerators. How DNNs can efficiently utilize the
specialized hardware intrinsics is still mainly defined by human hardware and
software experts.
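As a toy illustration of this intrinsic-centric programming model (the tile size, the intrinsic, and all names here are invented, not tied to any particular accelerator), a matrix multiplication can be tiled so that every innermost step is exactly one fixed-size multiply-accumulate intrinsic call:

    import numpy as np

    TILE = 4  # hypothetical fixed tile size supported by the accelerator

    def mma_intrinsic(a_tile, b_tile, acc):
        # Stand-in for a hardware matrix-multiply-accumulate intrinsic that
        # only accepts TILE x TILE operands.
        return acc + a_tile @ b_tile

    def matmul_via_intrinsic(A, B):
        # The mapping problem, solved by hand here: tile the loop nest so
        # every innermost computation is exactly one intrinsic call.
        m, k = A.shape
        k2, n = B.shape
        assert k == k2 and m % TILE == n % TILE == k % TILE == 0
        C = np.zeros((m, n))
        for i in range(0, m, TILE):
            for j in range(0, n, TILE):
                for p in range(0, k, TILE):
                    C[i:i+TILE, j:j+TILE] = mma_intrinsic(
                        A[i:i+TILE, p:p+TILE], B[p:p+TILE, j:j+TILE],
                        C[i:i+TILE, j:j+TILE])
        return C

    A, B = np.random.rand(8, 8), np.random.rand(8, 8)
    assert np.allclose(matmul_via_intrinsic(A, B), A @ B)

Automating exactly this kind of tiling and layout decision for arbitrary operators is the gap the thesis addresses.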
Methods to automatically utilize hardware intrinsics in DNN operators are a
subject of active research. Existing literature often works with
transformation-driven approaches, which aim to establish a sequence of program
rewrites and data-layout transformations such that the hardware intrinsic can
be used to compute the operator. However, the complexity of this task has not
yet been explored, especially for less frequently used operators like Capsule
Routing. Not only is the implementation of DNN operators with intrinsics
challenging; their optimization on the target device is also difficult.
Hardware-in-the-loop tools are often used for this problem. They use latency
measurements of implementation candidates to find the fastest one. However,
specialized accelerators can have memory and programming limitations, so that
not every arithmetically correct implementation is a valid program for the
accelerator. These invalid implementations can lead to unnecessarily long
optimization times.
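A minimal sketch of this hardware-in-the-loop pattern, including the validity filtering whose absence causes the wasted time described above (the candidate fields, the 64 KiB buffer limit, and both helper functions are invented for illustration):

    import random

    def is_valid(candidate):
        # Stand-in validity check: specialized accelerators reject candidates
        # that exceed on-chip memory or violate programming constraints.
        return candidate["buffer_bytes"] <= 64 * 1024

    def measure_latency(candidate):
        # Stand-in for executing the candidate on the device and timing it.
        return candidate["est_cycles"] * (1.0 + 0.05 * random.random())

    def hardware_in_the_loop(candidates):
        # Filtering invalid candidates *before* measuring avoids spending
        # device time on programs that can never run.
        valid = [c for c in candidates if is_valid(c)]
        return min(valid, key=measure_latency) if valid else None

    candidates = [{"name": f"impl_{i}",
                   "buffer_bytes": random.choice([16, 48, 128]) * 1024,
                   "est_cycles": random.randint(1000, 5000)}
                  for i in range(10)]
    print(hardware_in_the_loop(candidates))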
This work investigates the complexity of transformation-driven processes to
automatically embed hardware intrinsics into DNN operators. It is explored
with a custom, graph-based intermediate representation (IR). While operators
like Fully Connected Layers can be handled with reasonable effort, increasing
operator complexity or advanced data-layout transformations can lead to
scaling issues.
Building on these insights, this work proposes a novel method to embed
hardware intrinsics into DNN operators. It is based on a dataflow analysis.
The dataflow embedding method allows the exploration of how intrinsics and
operators match without explicit transformations. From the results it can
derive the data layout and program structure necessary to compute the operator
with the intrinsic. A prototype implementation for a dedicated hardware
accelerator demonstrates state-of-the-art performance for a wide range of
convolutions, while being agnostic to the data layout. For some operators in
the benchmark, the presented method can also generate alternative
implementation strategies to improve hardware utilization, resulting in a
geometric-mean speed-up of ×2.813 while reducing the memory footprint. Lastly,
by curating the initial set of possible implementations for the
hardware-in-the-loop optimization, the median time-to-solution is reduced by a
factor of ×2.40. At the same time, the possibility of prolonged searches due
to a bad initial set of implementations is reduced, improving the
optimization's robustness by ×2.35.
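To make "matching without explicit transformations" concrete, here is a toy sketch (not the thesis's actual IR or algorithm): it tests whether a multiply-accumulate intrinsic's dataflow pattern embeds into an operator's dataflow graph via subgraph matching.

    import networkx as nx
    from networkx.algorithms import isomorphism

    # Hypothetical dataflow graphs: nodes are operations, edges carry data.
    def operator_dataflow():
        g = nx.DiGraph()
        g.add_node("load_x", op="load"); g.add_node("load_w", op="load")
        g.add_node("mul", op="mul"); g.add_node("add", op="add")
        g.add_node("store", op="store")
        g.add_edges_from([("load_x", "mul"), ("load_w", "mul"),
                          ("mul", "add"), ("add", "store")])
        return g

    def mac_intrinsic_pattern():
        # Multiply-accumulate pattern implemented by the hardware intrinsic.
        p = nx.DiGraph()
        p.add_node("a", op="mul"); p.add_node("b", op="add")
        p.add_edge("a", "b")
        return p

    # Search for an embedding of the intrinsic's dataflow into the operator's
    # dataflow; no program rewrites are needed to test the match.
    matcher = isomorphism.DiGraphMatcher(
        operator_dataflow(), mac_intrinsic_pattern(),
        node_match=lambda n1, n2: n1["op"] == n2["op"])
    print(matcher.subgraph_is_monomorphic())  # True: the intrinsic fits

In the thesis, the found embedding additionally determines the data layout and program structure; this sketch only covers the matching step.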