A Near-Optimal Depth-Hierarchy Theorem for Small-Depth Multilinear Circuits
We study the size blow-up that is necessary to convert an algebraic circuit
of product-depth to one of product-depth in the multilinear
setting.
We show that for every positive
there is an explicit multilinear polynomial on variables
that can be computed by a multilinear formula of product-depth and
size , but not by any multilinear circuit of product-depth and
size less than . This result is tight up to the
constant implicit in the double exponent for all
This strengthens a result of Raz and Yehudayoff (Computational Complexity
2009) who prove a quasipolynomial separation for constant-depth multilinear
circuits, and a result of Kayal, Nair and Saha (STACS 2016) who give an
exponential separation in the case
Our separating examples may be viewed as algebraic analogues of variants of
the Graph Reachability problem studied by Chen, Oliveira, Servedio and Tan
(STOC 2016), who used them to prove lower bounds for constant-depth Boolean
circuits.
Functional Lower Bounds for Restricted Arithmetic Circuits of Depth Four
Recently, Forbes, Kumar and Saptharishi [CCC, 2016] proved that there exists
an explicit -variate and degree polynomial such
that if any depth four circuit of bounded formal degree which computes
a polynomial of bounded individual degree , that is functionally
equivalent to , then must have size .
The motivation for their work comes from Boolean Circuit Complexity. Based on
a characterization for circuits by Yao [FOCS, 1985] and Beigel and
Tarui [CC, 1994], Forbes, Kumar and Saptharishi [CCC, 2016] observed that
functions in can also be computed by algebraic
circuits (i.e., circuits of the form -- sums
of powers of polynomials) of size. Thus they argued that a
"functional" lower bound for an explicit
polynomial against circuits would imply a
lower bound for the "corresponding Boolean function" of against non-uniform
. In their work, they ask whether their lower bound can be extended to
circuits.
In this paper, for large integers and such that , we show that any circuit of
bounded individual degree at most that
functionally computes the Iterated Matrix Multiplication polynomial
() over must have size . Since Iterated
Matrix Multiplication over is functionally in
, improvement of the aforementioned lower bound to hold for
quasipolynomially large values of individual degree would imply a fine-grained
separation of from
Lower Bounds for Depth Three Arithmetic Circuits with Small Bottom Fanin
Shpilka and Wigderson (CCC 99) posed the problem of proving exponential lower bounds for (nonhomogeneous) depth three arithmetic circuits with bounded bottom fanin over a field F of characteristic zero. We resolve this problem by proving an N^(Omega(d/t)) lower bound for (nonhomogeneous) depth three arithmetic circuits with bottom fanin at most t computing an explicit N-variate polynomial of degree d over F.
Meanwhile, Nisan and Wigderson (CC 97) posed the problem of proving superpolynomial lower bounds for homogeneous depth five arithmetic circuits. Over fields of characteristic zero, we show a lower bound of N^(Omega(sqrt(d))) for homogeneous depth five circuits (and also for depth three circuits) with bottom fanin at most N^(u), for any fixed u < 1. This resolves the problem posed by Nisan and Wigderson only partially, because of the added restriction on the bottom fanin (a general homogeneous depth five circuit has bottom fanin at most N).
Format Abstraction for Sparse Tensor Algebra Compilers
This paper shows how to build a sparse tensor algebra compiler that is
agnostic to tensor formats (data layouts). We develop an interface that
describes formats in terms of their capabilities and properties, and show how
to build a modular code generator where new formats can be added as plugins. We
then describe six implementations of the interface that compose to form the
dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants
thereof. With these implementations at hand, our code generator can generate
code to compute any tensor algebra expression on any combination of the
aforementioned formats.
To demonstrate our technique, we have implemented it in the taco tensor
algebra compiler. Our modular code generator design makes it simple to add
support for new tensor formats, and the performance of the generated code is
competitive with hand-optimized implementations. Furthermore, by extending taco
to support a wider range of formats specialized for different application and
data characteristics, we can improve end-user application performance. For
example, if input data is provided in the COO format, our technique allows
computing a single matrix-vector multiplication directly with the data in COO,
which is up to 3.6× faster than by first converting the data to CSR.
Comment: Presented at OOPSLA 201
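To make the COO example concrete, the following is a minimal Python sketch of the two routes the abstract compares: multiplying directly on COO data, and first converting COO to CSR and then multiplying. It is not code generated by taco and does not reflect taco's API; the function names (`spmv_coo`, `coo_to_csr`, `spmv_csr`) and the small test matrix are assumptions made purely for illustration.

```python
def spmv_coo(n_rows, rows, cols, vals, x):
    """y = A @ x with A stored as COO triples (row, col, value), in any order."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triples to CSR arrays with a counting sort over the rows."""
    row_ptr = [0] * (n_rows + 1)
    for r in rows:                       # count nonzeros per row
        row_ptr[r + 1] += 1
    for i in range(n_rows):              # prefix sum gives row start offsets
        row_ptr[i + 1] += row_ptr[i]
    next_slot = list(row_ptr[:-1])       # next free position within each row
    col_idx = [0] * len(vals)
    csr_vals = [0.0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        csr_vals[k] = v
        next_slot[r] += 1
    return row_ptr, col_idx, csr_vals


def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x with A stored as CSR."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


# Hypothetical 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]], triples unsorted.
rows = [2, 0, 1, 0, 2]
cols = [0, 0, 1, 2, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y_coo = spmv_coo(3, rows, cols, vals, x)           # one pass over the triples
row_ptr, col_idx, csr_vals = coo_to_csr(3, rows, cols, vals)
y_csr = spmv_csr(row_ptr, col_idx, csr_vals, x)    # conversion pass + multiply
```

Both routes give the same result. The direct COO route touches each nonzero once, whereas the CSR route pays for an extra conversion pass, which is wasted work when only a single multiplication is needed.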