Simple Reductions from Formula-SAT to Pattern Matching on Labeled Graphs
  and Subtree Isomorphism by Gibney, Daniel et al.
arXiv:2008.11786v1 [cs.CC] 26 Aug 2020
Daniel Gibney∗ Gary Hoppenworth† Sharma V. Thankachan‡
Abstract
The CNF formula satisfiability problem (CNF-SAT) has been reduced to many fundamental
problems in P to prove tight lower bounds under the Strong Exponential Time Hypothesis
(SETH). Recently, the works of Abboud, Hansen, Vassilevska W. and Williams (STOC16), and
later, Abboud and Bringmann (ICALP18) have proposed basing lower bounds on the hardness of
general boolean formula satisfiability (Formula-SAT). Reductions from Formula-SAT have two
advantages over the usual reductions from CNF-SAT: (1) conjectures on the hardness of Formula-
SAT are arguably much more plausible than those of CNF-SAT, and (2) these reductions give
consequences even for logarithmic improvements in a problem's upper bounds.
Here we give tight reductions from Formula-SAT to two more problems: pattern matching on
labeled graphs (PMLG) and subtree isomorphism. Previous reductions from Formula-SAT were
to sequence alignment problems such as Edit Distance, LCS, and Fréchet Distance and required
some technical work. This paper uses ideas similar to those used previously, but in a decidedly
simpler setting, helping to illustrate the most salient features of the underlying techniques.
∗Dept. of CS, University of Central Florida, Orlando, USA. e-mail: daniel.j.gibney@gmail.com
†Dept. of CS, University of Central Florida, Orlando, USA. e-mail: gary.hoppenworth@gmail.com
‡Dept. of CS, University of Central Florida, Orlando, USA. e-mail: sharma.thankachan@ucf.edu
1 Introduction and Related Work
The Strong Exponential Time Hypothesis (SETH) has proven to be a powerful tool in establish-
ing conditional lower bounds for many problems with known polynomial-time solutions. However,
recent work by Abboud, Hansen, Vassilevska W., and Williams [3], as well as Abboud and Bring-
mann [2] has sought to use the hardness of general Formula-SAT problems as the basis for fine-
grained conditional lower bounds, rather than CNF-SAT and SETH. Since general Formula-SAT
contains within it all CNF-SAT instances, Formula-SAT is at least as hard as CNF-SAT. Addi-
tionally, when basing conditional lower bounds on Formula-SAT rather than CNF-SAT, the same
algorithmic breakthroughs that previously would have violated SETH now have far more remarkable
consequences (see Section 1.2 for examples). This makes it plausible that conjectures based on
the hardness of Formula-SAT are more likely to hold than those based on the hardness of CNF-SAT.
Aside from a plausible increase in the robustness of the conjectures, using Formula-SAT as a
starting point has the advantage of allowing for tighter hardness results. Previous lower bounds
based on SETH have been effective in establishing results of the form: an algorithm running in time
$O(n^{c-\varepsilon})$ for some $\varepsilon > 0$, where the best-known solution has time complexity $\tilde{O}(n^c)$, would violate
SETH. Despite this success, SETH has proven less effective at establishing tighter fine-grained
hardness results regarding how many logarithmic factors can be shaved. In fact, the impossibility
of proving such a hardness result via fine-grained reductions from CNF-SAT was proven in [2].
Overcoming this limitation by using Formula-SAT as a starting point, the authors of [3] established
conditional lower bounds of this form for Edit Distance and Longest Common Subsequence (LCS). In [2], the
results on LCS were further extended to show that an $O(n^2/\log^{7+\varepsilon} n)$ time solution for LCS would
imply major breakthroughs in circuit complexity. As a final example, work in [28] uses reductions
from Formula-SAT to analyze which regular expression matching problems can have super-polylog
factors shaved from their time complexity, and which cannot.
In this work, we will use Formula-SAT to establish hardness results similar to those listed above,
but for two additional fundamental problems, Pattern Matching on Labeled Graphs (PMLG) and
Subtree Isomorphism. We describe these problems next.
Pattern Matching on Labeled Graphs (PMLG). Given an alphabet Σ, a labeled graph G is
a triplet (V, E, L), where (V, E) corresponds to the vertices and edges of a graph, and $L : V \to \Sigma^+$
is a function that assigns a nonempty string (i.e., label) over Σ to each vertex in G. For any string S,
we use $S[..\ell]$ to denote its prefix ending at position ℓ and $S[\ell..]$ to denote its suffix starting at position ℓ. We say that a
pattern P occurs in G if there is a path $v_1, v_2, \ldots, v_m$ in G such that $L(v_1)[\ell..] \circ L(v_2) \circ \cdots \circ L(v_m)[..\ell']$
equals P for some ℓ, ℓ′. Given a labeled graph G and a pattern P, the PMLG problem is to decide
if there exists an occurrence of P in G.
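The definition translates directly into a dynamic program over pattern positions. The Python sketch below is our own illustration, not the algorithm of [24]; it assumes a hypothetical encoding of a directed labeled graph as a `labels` dictionary (vertex to nonempty label) plus an edge list, and conceptually expands every label into a chain of single-character nodes so that an occurrence may start and end in the middle of a label. It runs in $O(|P| \cdot (\sum_v |L(v)| + |E|))$ time and does not require the graph to be acyclic.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]  # (vertex, index of a character inside that vertex's label)


def occurs(labels: Dict[str, str], edges: List[Tuple[str, str]], pattern: str) -> bool:
    """Decide whether `pattern` occurs in the labeled graph (labels, edges).

    An occurrence may begin anywhere inside the first label (a suffix L(v1)[l..])
    and end anywhere inside the last label (a prefix L(vm)[..l']).
    """
    if not pattern:
        return True
    out: Dict[str, List[str]] = {v: [] for v in labels}
    for u, w in edges:
        out[u].append(w)

    def successors(node: Node) -> List[Node]:
        v, k = node
        if k + 1 < len(labels[v]):
            return [(v, k + 1)]            # next character of the same label
        return [(w, 0) for w in out[v]]    # first character of each out-neighbour's label

    # frontier = character nodes at which some match of the pattern prefix read so far ends
    frontier: Set[Node] = {
        (v, k) for v, s in labels.items() for k, c in enumerate(s) if c == pattern[0]
    }
    for c in pattern[1:]:
        frontier = {
            nxt for node in frontier for nxt in successors(node)
            if labels[nxt[0]][nxt[1]] == c
        }
    return bool(frontier)
```

For example, with `labels = {"u": "ab", "v": "cd"}` and the single edge `("u", "v")`, the pattern `"bcd"` occurs (starting inside L(u)), while `"abd"` does not.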
The PMLG problem began to be intensely studied roughly thirty years ago in the context
of alignment of strings (equivalent to approximate matching under edits, mismatches, etc.) in
hypertext. This line of work was initiated by Manber and Wu [19] and underwent several improvements
[4, 5, 21, 22]. In the case where changes are allowed in the pattern, but not in the graph, the best-known
algorithm, by Rautiainen and Marschall [24], runs in time O(|V| + |E||P|), matching the time complexity
of the dynamic programming solution for the exact problem. In the case where changes
are allowed in the graph as well, the problem is NP-complete [5], even for binary alphabets [15].
The work by Equi et al. [11] established SETH-based lower bounds for exact matching.
Subtree Isomorphism. Given two trees T1 and T2, is T1 contained in T2? This problem has
been the subject of extensive study [9, 17, 18, 25, 31, 33], with much of this research dating back several
decades. For general trees, both with at most n vertices, the currently best-known solution has a
time bound of $O(n^{\omega})$, where ω is the exponent of fast matrix multiplication [31]; for rooted,
constant maximum degree trees it is $O(n^2/\log n)$ [17]; and for ordered trees it is $O(n \log n)$ [10].
Here we will be considering rooted trees with constant maximum degree. In terms of lower bounds,
SETH-based quadratic lower bounds for this version of the problem have been established in [1],
even for binary rooted trees.
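To make the rooted version of the problem concrete, here is a minimal Python sketch, far simpler (and slower) than the algorithms cited above. It assumes a hypothetical encoding of a rooted tree as the tuple of its child subtrees (a leaf is `()`), and interprets "T1 is contained in T2" as an injective, parent-preserving map of T1 into T2 that sends the root of T1 to some vertex of T2. It tries every assignment of children, which is exponential only in the maximum degree and therefore polynomial for constant-degree trees.

```python
from functools import lru_cache
from itertools import permutations
from typing import Iterator, Tuple

Tree = Tuple["Tree", ...]  # a rooted tree is the tuple of its child subtrees; a leaf is ()


@lru_cache(maxsize=None)
def embeds_at_root(t1: Tree, t2: Tree) -> bool:
    """Can t1 be mapped into t2 with root -> root and children injectively -> children?"""
    if len(t1) > len(t2):
        return False
    # Try every injective assignment of t1's children to t2's children.
    return any(
        all(embeds_at_root(c1, c2) for c1, c2 in zip(t1, image))
        for image in permutations(t2, len(t1))
    )


def subtrees(t: Tree) -> Iterator[Tree]:
    """Every vertex of t together with its descendants."""
    yield t
    for child in t:
        yield from subtrees(child)


def contained(t1: Tree, t2: Tree) -> bool:
    """Is t1 isomorphic to a parent-preserving subgraph of t2 rooted at some vertex?"""
    return any(embeds_at_root(t1, s) for s in subtrees(t2))
```

For instance, a root with one leaf, `((),)`, is contained in a root with two leaves, `((), ())`, but not vice versa.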
Road Map. We will first describe the Formula-SAT problem and deMorgan Formulas in more
detail. Following this, we will state our results for PMLG and Subtree Isomorphism in terms of
their implications for solving Formula-SAT, along with the resulting corollaries. Section 2 provides
the reduction from Formula-SAT to PMLG. The reduction to Subtree Isomorphism is in Section
3. Finally, in Section 4 we discuss the similar themes and techniques that appear in both of these
reductions.
1.1 Formula-SAT
deMorgan Formulas. For our purposes, we define a deMorgan formula over n Boolean input
variables as a rooted binary tree where each leaf node represents an input variable or its negation,
and every internal node represents a logical operator from the set {∧,∨}. Leaf nodes will be called
input gates, and internal nodes will be called AND/OR gates. For a given bit assignment x, we
define F (x) as the binary value output at the root of F when the input bits are propagated from
the leaves to the root of F . The size of the formula, which we will denote as s, is defined as the
number of leaves in the tree.
Problem 1 (Formula-SAT). Given a deMorgan formula F of size s over n inputs, does there exist
an input $x \in \{0, 1\}^n$ such that F(x) = 1?
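A minimal Python sketch of these definitions, assuming a hypothetical nested-tuple encoding of deMorgan formulas: `("var", i)` and `("not", i)` are input gates reading bit $x_i$ or its negation, and `("and", f, g)` / `("or", f, g)` are AND/OR gates. `size` counts leaves, matching the definition of s, and `formula_sat` is the naïve exhaustive search over all $2^n$ assignments, taking $2^n \cdot O(s)$ time.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

# Hypothetical encoding: ("var", i) | ("not", i) | ("and", f, g) | ("or", f, g)
Formula = tuple


def evaluate(f: Formula, x: Sequence[int]) -> int:
    """Propagate the input bits x from the leaves to the root of f."""
    op = f[0]
    if op == "var":
        return x[f[1]]
    if op == "not":
        return 1 - x[f[1]]
    left, right = evaluate(f[1], x), evaluate(f[2], x)
    return left & right if op == "and" else left | right


def size(f: Formula) -> int:
    """Number of leaves (input gates) of f."""
    return 1 if f[0] in ("var", "not") else size(f[1]) + size(f[2])


def formula_sat(f: Formula, n: int) -> Optional[Tuple[int, ...]]:
    """Naive search; returns some x in {0,1}^n with F(x) = 1, or None."""
    for x in product((0, 1), repeat=n):
        if evaluate(f, x):
            return x
    return None
```

For example, `formula_sat(("and", ("or", ("var", 0), ("not", 1)), ("not", 0)), 2)` returns `(0, 0)`.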
The set of all Formula-SAT instances obviously contains within it all CNF-SAT instances. Un-
surprisingly, due to its generality, it appears harder to derive efficient solutions for Formula-SAT.
For CNF-SAT there exist ever-improving upper bounds [6, 12, 20, 23, 26, 29]. There also exist
upper bounds for more general circuits such as ours; however, these work by restricting some
parameter of the circuit, often some combination of the size, depth, and type of gates used within
it (see for example [7, 13, 14, 27, 30, 32]).
1.2 Our Results
Our reduction will create an instance of PMLG (or Subtree Isomorphism) from a given instance of
Formula-SAT. In doing so, we make explicit the roles that the size of the circuit s and the number
of inputs n play in determining the size of the resulting instance.
Theorem 1. A Formula-SAT instance of size s on n inputs can be reduced to an instance of PMLG
over a binary alphabet with a graph G = (V, E) and pattern P such that |P| is of size $O(2^{n/2} \cdot s)$
and |E| is of size $O(2^{n/2} \cdot s^2)$ in O(|E|) time, where G is a DAG with maximum total degree¹ three.
¹Total degree is in-degree plus out-degree.
Similarly, for Subtree Isomorphism we have the following theorem.
Theorem 2. A Formula-SAT instance of size s on n inputs can be reduced to an instance of Subtree
Isomorphism on two binary trees T1 and T2, where the size of T1 is $O(2^{n/2} \cdot s)$ and the size of T2
is $O(2^{n/2} \cdot s^2)$, in $O(|T_2|)$ time.
Combining Theorems 1 and 2 with observations made by Abboud et al. in [3] (and restated in
Appendix A), we obtain the following ‘breakthrough’ implications of a strongly subquadratic time
algorithm for PMLG or Subtree Isomorphism. Proofs are deferred to Appendix A.
Corollary 1. The existence of a strongly subquadratic time algorithm for PMLG (or Subtree Isomorphism)
would imply that the class $E^{NP}$ (1) does not have non-uniform $2^{o(n)}$-size Boolean formulas
and (2) does not have non-uniform o(n)-depth circuits of bounded fan-in. It also implies that
$\mathrm{NTIME}[2^{O(n)}]$ is not in non-uniform NC.
The second corollary gives the consequences of being able to shave arbitrarily many logarithmic
factors from the quadratic time complexity.
Corollary 2. If PMLG (or Subtree Isomorphism) can be solved in time $O(|E||P|/\log^c |E|)$ or
$O(|E||P|/\log^c |P|)$ (respectively $O(|T_1||T_2|/\log^c |T_1|)$ or $O(|T_1||T_2|/\log^c |T_2|)$) for all
c = Θ(1), then $\mathrm{NTIME}[2^{O(n)}]$ does not have non-uniform polynomial-size log-depth circuits.
In fact, we can give a particular constant c for which shaving a $\log^c n$ factor would yield
surprising new results in complexity theory. The following log-sensitive lower bounds leave a large gap
to the best-known upper bounds; we present these corollaries purely for instructive purposes.
Hardness of Shaving Log Factors. We work under the Word-RAM model and limit the set
of constant-time primitive operations to those operations which are robust to changes in word size.
Specifically, suppose we are given a word size of $w = \Theta(\log n)$ and an operation that can be
performed in O(1) time. We stipulate that we must be able to simulate this operation on words of
size $W = \Theta(2^w)$ in time $n^{1+o(1)}$. This is a reasonable assumption that is satisfied by many
constant-time operations such as addition, subtraction, multiplication, and division with remainder. See [2]
for a detailed discussion.
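As an illustration of such a simulation (our own example, not taken from [2]), addition on W-bit words can be carried out with w-bit primitives by splitting each operand into $\lceil W/w \rceil$ limbs and propagating a carry, using O(W/w) single-word operations. The Python sketch below assumes little-endian lists of limbs; Python integers are used only to model individual w-bit words, and the helper names are hypothetical.

```python
from typing import List, Tuple


def add_wide(x: List[int], y: List[int], w: int) -> Tuple[List[int], int]:
    """Add two W-bit numbers given as little-endian lists of w-bit limbs.

    Each iteration uses only single-word operations (add, mask, shift), so the
    total cost is O(W / w) word operations. Returns (sum limbs, final carry).
    """
    assert len(x) == len(y)
    mask = (1 << w) - 1
    out: List[int] = []
    carry = 0
    for a, b in zip(x, y):
        t = a + b + carry          # fits in w + 1 bits
        out.append(t & mask)       # low w bits become this limb of the sum
        carry = t >> w             # carry is 0 or 1
    return out, carry


def to_limbs(v: int, w: int, k: int) -> List[int]:
    """Split v into k little-endian w-bit limbs."""
    return [(v >> (w * i)) & ((1 << w) - 1) for i in range(k)]


def from_limbs(limbs: List[int], w: int) -> int:
    """Reassemble little-endian w-bit limbs into an integer."""
    return sum(limb << (w * i) for i, limb in enumerate(limbs))
```

With $W = \Theta(2^w)$, the loop above performs O(W/w) single-word operations; subtraction and comparison can be simulated the same way.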
The following hypothesis was suggested by Abboud and Bringmann in [2]. It reflects the fact
that the best-known algorithmic solutions to Formula-SAT² fail to provide a time complexity better
than the naïve solution on formulas of size $s = n^{3+\Omega(1)}$.
Hypothesis 1 ([2]). There is no algorithm that can solve SAT on deMorgan formulas of size
$s = n^{3+\Omega(1)}$ in $O(2^n / n^{\varepsilon})$ time for some ε > 0 in the Word-RAM model.
Corollary 3. Hypothesis 1 is false if PMLG (respectively Subtree Isomorphism) can be solved in
time $O(|E||P|/\log^{10+\varepsilon} |E|)$ or $O(|E||P|/\log^{10+\varepsilon} |P|)$ (respectively
$O(|T_1||T_2|/\log^{10+\varepsilon} |T_1|)$ or $O(|T_1||T_2|/\log^{10+\varepsilon} |T_2|)$) for any ε > 0.
Proof. We show the proof for PMLG; the proof for Subtree Isomorphism is identical. By Theorem
1, an $O(|E||P|/\log^{10+\varepsilon} |E|)$ algorithm for PMLG can be converted to yield an algorithm running in
$$n^{1+o(1)} \cdot \frac{(2^{n/2} \cdot s^2)(2^{n/2} \cdot s)}{\log^{10+\varepsilon}(2^{n/2} \cdot s^2)} = O\left(\frac{2^n \cdot s^3}{n^{9+\varepsilon}}\right)$$
time for Formula-SAT (note the $n^{1+o(1)}$ factor introduced when moving from a word size of Θ(log n)
to Θ(n), and that $\log(2^{n/2} \cdot s^2) = \Theta(n)$ when s is polynomial in n). If we choose $s = n^{3+\varepsilon/6}$,
then $s^3 = n^{9+\varepsilon/2}$ and this yields an algorithm for Formula-SAT of time $O(2^n / n^{\varepsilon/2})$, so
Hypothesis 1 is false.

²As observed by Williams in [34], for deMorgan formulas of size $n^{3-o(1)}$ there exists a randomized,
zero-error algorithm running in $2^{n - n^{\Omega(1)}}$ time, which can be obtained by applying results from [8] and [16].
Again thanks to results highlighted by Abboud et al. in [3], we can also say the following about
shaving a constant number of logarithmic factors from the quadratic time complexity. The proof
is deferred to Appendix A.
Corollary 4. $E^{NP}$ cannot be computed by non-uniform formulas of cubic size if PMLG (respectively
Subtree Isomorphism) can be solved in time $O(|E||P|/\log^{20+\varepsilon} |E|)$ or $O(|E||P|/\log^{20+\varepsilon} |P|)$
(respectively $O(|T_1||T_2|/\log^{20+\varepsilon} |T_1|)$ or $O(|T_1||T_2|/\log^{20+\varepsilon} |T_2|)$) for any ε > 0.
The same hardness results for PMLG apply to several more specific types of graphs (details will
be presented in the full version of this paper). These include the case when the graph G is a deterministic
DAG (for every vertex and every character, at most one outgoing edge leads to a label starting with that
character) of total degree at most 3, and the case when G is a directed or undirected planar graph of
degree at most 3.
2 Reduction from Formula-SAT to PMLG
2.1 Technical Overview
Our reduction from Formula-SAT to PMLG uses an intermediate problem called Formula-Pair.
Definition 1 (Formula-Pair). Given a deMorgan formula $F = F(x_1, \ldots, x_m, y_1, \ldots, y_m)$ of size
2m where each input is used exactly once, and two sets $A, B \subseteq \{0, 1\}^m$, each of size N, does there
exist $a \in A$ and $b \in B$ such that $F(a, b) = F(a_1, \ldots, a_m, b_1, \ldots, b_m) = 1$?
The role Formula-Pair plays in our reduction is analogous to the role of the Orthogonal Vectors
Problem in many SETH reductions. It was proven in [2] that an instance of Formula-SAT on a
formula of size s over n inputs can be reduced to an instance of Formula-Pair on two sets of size
$N = O(2^{n/2})$ and a formula of size O(s) in linear time (in particular, they reduce from a harder
problem they call F1-Formula-SAT). Note that we may assume that F contains no input gates with
negated variables: since each input is used exactly once, if variable $x_i$ is negated in F, we can remove
the negation and flip bit $a_i$ for all $a \in A$ (and symmetrically flip $b_i$ for all $b \in B$ if $y_i$ is negated).
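A minimal Python sketch of Formula-Pair and of the negation-removal step, assuming a hypothetical nested-tuple encoding in which input indices 0..m−1 refer to x (bits of a) and m..2m−1 refer to y (bits of b). Because every input appears exactly once, each variable occurs either only positively or only negated, so dropping the negation and flipping that bit in every vector preserves every value F(a, b). The brute-force `formula_pair` takes $O(N^2 \cdot s)$ time, the quadratic baseline that the reductions in this paper target.

```python
from typing import List, Sequence, Set, Tuple

Vec = Tuple[int, ...]
# Hypothetical encoding: ("var", i) | ("not", i) | ("and", f, g) | ("or", f, g)


def evaluate(f: tuple, x: Sequence[int]) -> int:
    op = f[0]
    if op == "var":
        return x[f[1]]
    if op == "not":
        return 1 - x[f[1]]
    left, right = evaluate(f[1], x), evaluate(f[2], x)
    return left & right if op == "and" else left | right


def formula_pair(f: tuple, A: List[Vec], B: List[Vec]) -> bool:
    """Brute force: is there a in A and b in B with F(a, b) = 1?"""
    return any(evaluate(f, a + b) for a in A for b in B)


def remove_negations(f: tuple, m: int, A: List[Vec], B: List[Vec]):
    """Return (f2, A2, B2) where f2 has no negated inputs and F2(a2, b2) = F(a, b)."""
    negated: Set[int] = set()

    def strip(g: tuple) -> tuple:
        if g[0] == "not":
            negated.add(g[1])
            return ("var", g[1])
        if g[0] == "var":
            return g
        return (g[0], strip(g[1]), strip(g[2]))

    f2 = strip(f)  # fills `negated` before any vector is flipped

    def flip(v: Vec, offset: int) -> Vec:
        return tuple(1 - bit if i + offset in negated else bit for i, bit in enumerate(v))

    return f2, [flip(a, 0) for a in A], [flip(b, m) for b in B]
```

Each vector is rewritten once, so the transformation takes O(N · m + s) time.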
We begin our reduction from Formula-Pair to PMLG by considering a formula F and some
input bit assignments a ∈ A and b ∈ B. We then construct a pattern P and labeled graph G such
that P occurs in G if and only if together a and b satisfy F . In this step, we must ensure that our
construction of P only relies on the input bit assignments of a, and our construction of G only relies
on the input bit assignments of b. This allows us to create patterns P1, P2, . . . , PN corresponding
to the N bit assignments in A, and graphs G1, G2, . . . , GN corresponding to the N bit assignments
in B. Then we will have that Pi occurs in Gj if and only if F (a, b) = 1, where a ∈ A is the bit
assignment corresponding to Pi, and b ∈ B is the bit assignment corresponding to Gj . Finally, we
combine these patterns and graphs into a product pattern P and a product graph G such that P
occurs in G if and only if some Pi occurs in some Gj . This will complete the reduction.
2.2 Reduction
Given a deMorgan formula F and a complete assignment of input bits (a, b) where a ∈ A and b ∈ B,
we will construct a corresponding pattern P and labeled DAG G over alphabet {0, 1, $} such that P
occurs in G if and only if the output of F is 1 on input (a, b). This pattern and graph will be built
recursively, starting with the input gates as a base case. For a gate g = (g1 ∗ g2) where ∗ ∈ {∨,∧},
we will construct a corresponding pattern and graph for gate g by merging the patterns and graphs
of subgates g1 and g2. At each step in this process, the pattern corresponding to gate g occurs in
the graph corresponding to gate g if and only if g evaluates to 1 on input (a, b).
Invariants. We will maintain the following invariants during this recursive procedure. Let g be a
gate of F with height h, and let P and G be the pattern and graph corresponding to gate g in our
construction.
1. Graph G will have a designated source vertex and sink vertex, both with label “1”. Every
maximal path in G will be of length |P | and start and end at the source and sink vertices of
G respectively.
2. The construction of pattern P is independent of the choice of bit assignment b ∈ B, and the
construction of graph G is independent of the choice of bit assignment a ∈ A.
3. Pattern P occurs in G if and only if g has output 1 on input (a, b).
Observe that by the first invariant, every occurrence of pattern P in graph G will start at the
source vertex of G and end at the sink vertex of G. If this is the case, we will say that G matches
P . We will also refer to the designated source and sink vertices of G as the start and end vertices
of G.
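Figure 1: From left to right: (a) the graph constructed for gate g = (g1 ∧ g2), a chain from a start
vertex labeled 1 through G1 and G2 to an end vertex labeled 1; (b) the graph constructed for gate
g = (g1 ∨ g2), with one branch through G1 followed by U(|P2|) and another through U(|P1|) followed
by G2, both between start and end vertices labeled 1; and (c) the Universal Subgraph U(x). Note that
Universal Subgraph U(x) has a series of x − 2 vertex pairs labeled 0 and 1 between its start and end
vertices (both labeled 1), so that its maximal path length is x.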
Input Gate. Each input gate g in F takes as input a binary variable z. We will design a graph
G and pattern P such that G matches P if and only if z has value 1 in bit assignment (a, b), and
hence g evaluates to 1. Our construction depends on whether z corresponds to an input bit in a or
b.
• Case 1. z corresponds to some $a_i \in a$. We let $P := 1a_i1$ and G be a path of length
three with all vertices labeled 1.
• Case 2. z corresponds to some $b_i \in b$. We let $P := 111$ and G be a path of length three
with the first and last vertex labeled 1 and the middle vertex labeled $b_i$.
The start vertex of G will be the first vertex in the path, and the end vertex of G will be the
third (last) vertex in the path. Then our graph G matches pattern P if and only if z = 1 and thus
the input gate evaluates to true. Additionally, the construction of P does not depend on b and the
construction of G does not depend on a. All invariants are satisfied.
AND Gate. Given a gate g = (g1 ∧ g2) and the graphs and patterns (G1, P1) and (G2, P2)
corresponding to gates g1 and g2 respectively, we must construct a product graph G and pattern P
such that G matches P if and only if G1 matches P1 and G2 matches P2. This is done rather easily.
Let P := 1P1P21. Now let our product graph G be defined as in Figure 1.a. Our start vertex is
labeled 1 and has an outgoing edge to the start vertex of subgraph G1. The end vertex of G1 in
turn has an outgoing edge to the start vertex of subgraph G2, whose own end vertex has an outgoing
edge to the final vertex of G. We now verify all invariants are satisfied.
• Invariant 1. We assume that every maximal path in G1 (respectively G2) is of length |P1|
(respectively |P2|). Then by the construction of P and G, every maximal path in G is of
length |P |. The invariant is maintained.
• Invariant 2. Assuming that the construction of P1 and P2 is independent of b, and the
construction of G1 and G2 is independent of a, it follows that the construction of pattern P
is independent of bit assignment b, and the construction of graph G is independent of bit
assignment a.
• Invariant 3. Since every occurrence of P in G starts at the start vertex of G and ends at
the end vertex, P occurs in G if and only if P1 occurs in G1 and P2 occurs in G2. Then by
the inductive hypothesis, P occurs in G if and only if g evaluates to 1 on input (a, b). The
invariant is preserved.
OR Gate. Given a gate g = (g1 ∨ g2) and the graphs and patterns (G1, P1) and (G2, P2) corre-
sponding to gates g1 and g2 respectively, we must construct a product graph G and pattern P such
that G matches P if and only if G1 matches P1 or G2 matches P2. As with our AND gate, we let
P := 1P1P21. Our product graph G (see Figure 1.b) splits into two branches. One branch checks
if G1 matches P1 and ignores P2, while the other branch checks if G2 matches P2 and ignores P1.
We are able to ignore P2 (respectively P1) by constructing a ‘universal’ subgraph that matches all
binary strings that start and end with 1 and are of length |P2| (respectively |P1|). We let U(x)
denote the universal subgraph for length x, and we depict our construction of U(x) in Figure 1.c.
Observe that graphs U(|P1|) and U(|P2|) match P1 and P2 respectively. We now check that all
invariants are satisfied.
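The universal subgraph is easy to build and test mechanically. The Python sketch below assumes our reading of Figure 1.c: a start vertex labeled 1, then x − 2 layers each holding one vertex labeled 0 and one labeled 1, with every vertex of a layer joined to both vertices of the next, and an end vertex labeled 1. The vertex names are hypothetical. With one character per vertex, a string matches exactly when it spells a start-to-end path, which a layer-by-layer frontier scan checks.

```python
from typing import Dict, List, Tuple

Graph = Tuple[Dict[str, str], List[Tuple[str, str]]]  # (vertex labels, directed edges)


def universal(x: int) -> Graph:
    """Build U(x) for x >= 2: start '1', x - 2 layers of {'0', '1'} vertices, end '1'."""
    labels: Dict[str, str] = {"s": "1", "t": "1"}
    edges: List[Tuple[str, str]] = []
    prev = ["s"]
    for i in range(x - 2):
        layer = [f"L{i}_0", f"L{i}_1"]
        labels[layer[0]], labels[layer[1]] = "0", "1"
        edges += [(u, v) for u in prev for v in layer]  # join every vertex to the next layer
        prev = layer
    edges += [(u, "t") for u in prev]
    return labels, edges


def matches(g: Graph, p: str, start: str = "s", end: str = "t") -> bool:
    """True if p spells a start-to-end path (one character per vertex)."""
    labels, edges = g
    frontier = {start} if p and labels[start] == p[0] else set()
    for c in p[1:]:
        frontier = {v for (u, v) in edges if u in frontier and labels[v] == c}
    return end in frontier
```

For x ≥ 3, U(x) has 2x − 2 vertices and 4x − 8 edges, i.e., O(x), which is consistent with each OR gate adding only O(|P|) vertices and edges.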
• Invariant 1. A similar argument as in the AND gate shows that every maximal path in G is
of length |P | and passes through the start and end vertices of G. The invariant is preserved.
• Invariant 2. Pattern P is independent of bit assignment b by a similar argument as with the
AND gate construction. However, for our graph G, we must verify that subgraphs U(|P1|)
and U(|P2|) of G do not depend on bit assignment a. This will follow from proving that
the lengths of patterns P1 and P2 do not depend on the bit assignment a. Note that in
each of the input, AND, and OR gate constructions, the length of the constructed pattern is
the same regardless of the bit assignment a. Thus we conclude that U(|P1|) and U(|P2|) are
independent of the bit assignment a, and therefore the construction of graph G is independent
of the bit assignment a.
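• Invariant 3. Since every occurrence of pattern P starts at the start vertex of G and ends
at the end vertex, G matches P if and only if either G1 matches P1 and U(|P2|) matches P2,
or U(|P1|) matches P1 and G2 matches P2. Every pattern built by our construction is a binary
string that starts and ends with 1, so U(|P1|) and U(|P2|) always match P1 and P2; hence G
matches P if and only if G1 matches P1 or G2 matches P2. By the inductive hypothesis, G matches
P if and only if gate g = (g1 ∨ g2) evaluates to 1 on input (a, b).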
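The three gadgets combine into one recursive procedure. The following Python sketch is a toy re-implementation under assumptions of our own: the nested-tuple formula encoding (indices 0..m−1 read a, indices m..2m−1 read b), positive inputs only, one-character vertex labels, integer vertex ids, and U(x) read as in Figure 1.c. `build_pattern` only looks at a and `build_graph` only looks at b, mirroring Invariant 2, and `gate_matches` tests whether G matches P from its start vertex to its end vertex.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: ("var", i) | ("and", f, g) | ("or", f, g), no negations.


def evaluate(f: tuple, x: Sequence[int]) -> int:
    if f[0] == "var":
        return x[f[1]]
    left, right = evaluate(f[1], x), evaluate(f[2], x)
    return left & right if f[0] == "and" else left | right


def pattern_length(f: tuple) -> int:
    """|P| depends only on the shape of f, never on a."""
    return 3 if f[0] == "var" else pattern_length(f[1]) + pattern_length(f[2]) + 2


def build_pattern(f: tuple, a: Sequence[int], m: int) -> str:
    """Pattern for f; reads only the bits of a."""
    if f[0] == "var":
        return f"1{a[f[1]]}1" if f[1] < m else "111"
    return "1" + build_pattern(f[1], a, m) + build_pattern(f[2], a, m) + "1"


class Graph:
    """Directed graph with one-character vertex labels."""

    def __init__(self) -> None:
        self.label: Dict[int, str] = {}
        self.out: Dict[int, List[int]] = {}

    def vertex(self, c: str) -> int:
        v = len(self.label)
        self.label[v], self.out[v] = c, []
        return v

    def edge(self, u: int, v: int) -> None:
        self.out[u].append(v)


def universal(g: Graph, x: int) -> Tuple[int, int]:
    """Add U(x): start 1, x - 2 fully linked {0, 1} layers, end 1."""
    s = g.vertex("1")
    prev = [s]
    for _ in range(x - 2):
        layer = [g.vertex("0"), g.vertex("1")]
        for u in prev:
            for v in layer:
                g.edge(u, v)
        prev = layer
    t = g.vertex("1")
    for u in prev:
        g.edge(u, t)
    return s, t


def build_graph(g: Graph, f: tuple, b: Sequence[int], m: int) -> Tuple[int, int]:
    """Add the gadget for f (reads only the bits of b); return (start, end)."""
    if f[0] == "var":
        mid = str(b[f[1] - m]) if f[1] >= m else "1"
        s, v, t = g.vertex("1"), g.vertex(mid), g.vertex("1")
        g.edge(s, v)
        g.edge(v, t)
        return s, t
    s, t = g.vertex("1"), g.vertex("1")
    g1 = build_graph(g, f[1], b, m)
    g2 = build_graph(g, f[2], b, m)
    if f[0] == "and":
        chains = [(g1, g2)]                                    # 1 -> G1 -> G2 -> 1
    else:
        chains = [(g1, universal(g, pattern_length(f[2]))),   # 1 -> G1 -> U(|P2|) -> 1
                  (universal(g, pattern_length(f[1])), g2)]   # 1 -> U(|P1|) -> G2 -> 1
    for (first_s, first_t), (second_s, second_t) in chains:
        g.edge(s, first_s)
        g.edge(first_t, second_s)
        g.edge(second_t, t)
    return s, t


def gate_matches(f: tuple, a: Sequence[int], b: Sequence[int], m: int) -> bool:
    """Does G (built from b) match P (built from a) from its start to its end vertex?"""
    g = Graph()
    start, end = build_graph(g, f, b, m)
    p = build_pattern(f, a, m)
    frontier = {start} if g.label[start] == p[0] else set()
    for c in p[1:]:
        frontier = {v for u in frontier for v in g.out[u] if g.label[v] == c}
    return end in frontier
```

Comparing `gate_matches(F, a, b, m)` with `evaluate(F, a + b)` over all (a, b) for a small formula exercises Invariant 3 directly.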
2.3 Completing the Reduction
Now corresponding to our formula F of size s and a complete assignment of input bits (a, b), we can
build a pattern P and a graph G such that G matches P if and only if assignment (a, b) satisfies
F . Note that we only add a constant number of symbols to our pattern P for each gate in F , and
there are fewer than 2s gates in F , so |P | = O(s). On the other hand, each OR gate in F can
contribute O(|P|) vertices and edges to our final graph G. It follows that G is of size $O(s^2)$.
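As a worked count (our own arithmetic for the construction as described, for a formula with s leaves and hence s − 1 AND/OR gates): every input gate yields a pattern of length 3 and every AND or OR gate adds two characters, while each OR gate adds two universal subgraphs with O(|P|) vertices and edges in total, and every other gate adds O(1).

```latex
|P| = 3s + 2(s-1) = 5s - 2 = O(s), \qquad
|V(G)| + |E(G)| \le O(s) + (s-1)\cdot O(|P|) = O(s^2).
```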
Using our construction, for every a ∈ A we may construct a corresponding pattern P , and
for every b ∈ B we may construct a corresponding graph G. We will denote these patterns and
graphs by P1, P2, . . . , PN and G1, G2, . . . , GN respectively. Note that each pattern Pj makes no
assumptions on the bit assignment b, and graph Gi makes no assumptions on the bit assignment a.
It follows that Gi matches Pj if and only if together the corresponding bit assignments a ∈ A and
b ∈ B satisfy F .
Next, we construct a final graph G and pattern P such that P occurs in G if and only if some
Gi matches some Pj . This will complete our reduction. We define our final pattern P as follows:
P := $$P1$P2$ · · · $PN$$. The structure of our final graph G is similar to the final graph presented
in [11]. We present this graph in Figure 2 and briefly explain the intuition behind it. Let µ = |Pi|
for any i. Then subgraph U(µ) will match any subpattern Pi in P . The graph G uses U(µ) to
match the subpatterns Pi in P that do not match with any Gj . Note that since pattern P has a
prefix of two $ symbols and a suffix of two $ symbols, P is forced to pass through the second row
of G. More specifically, the first row of G alone cannot match the $$ suffix of P , and the third
row of G alone cannot match the $$ prefix of P . Then it can be seen that P occurs in G only
if P passes through the second row of G, and hence some subgraph Gi matches some subpattern
Pj . Then by construction, P occurs in G if and only if there exists a ∈ A and b ∈ B such that
F(a, b) = 1. Furthermore, our final graph is a DAG of size $O(N \cdot s^2)$ and our final pattern P is of
length $O(N \cdot s)$. This completes our reduction from Formula-SAT to PMLG on DAGs.
Figure 2: Our final graph G, drawn in three rows: the first and third rows are chains of copies of U(µ)
(numbered up to 2N − 2) separated by vertices labeled $, and the middle row contains G1, ..., GN
separated by vertices labeled $. Here µ = |Pi|.
3 Reduction from Formula-SAT to Subtree Isomorphism
3.1 Technical Overview
We begin our reduction from Formula-Pair to Subtree Isomorphism by considering a formula F and
some input bit assignments a ∈ A and b ∈ B. We then construct trees Ta and Tb such that Ta is
contained in Tb if and only if together a and b satisfy F . In this step it is important that we ensure
that our construction of Ta only relies on the input bit assignments of a, and our construction of
Tb only relies on the input bit assignments of b. This allows us to create N Ta trees corresponding
to the N bit assignments a in A, and N Tb trees corresponding to the N bit assignments b in B.
Then we will have that some Ta tree is contained in some Tb tree if and only if the corresponding
bit assignments a ∈ A and b ∈ B satisfy F (a, b) = 1. Finally, we combine these trees into two final
trees TA and TB such that TA is contained in TB if and only if some Ta is contained in some Tb.
This will complete the reduction.
3.2 Reduction
Given a deMorgan formula F and a complete assignment of input bits (a, b) where a ∈ A and b ∈ B,
we will construct the corresponding rooted trees $T_a$ and $T_b$ such that $T_a$ is contained in $T_b$ if and
only if F(a, b) = 1. These trees will be constructed recursively, starting with the
input gates of F as a base case. For a gate $g = (g_1 * g_2)$ where $* \in \{\vee, \wedge\}$, we will construct the
corresponding trees $T^g_a$ and $T^g_b$ for gate g by merging the trees of subgates $g_1$ and $g_2$. At each step
in this process, $T^g_a$ will be contained in $T^g_b$ if and only if gate g has output 1 on input (a, b).
Invariants. We will maintain the following invariants throughout our construction. Let g be a
gate of F with height h.
1. The height of $T^g_a$ is equal to the height of $T^g_b$ and is at most 4h.
2. The construction of $T^g_a$ is independent of the choice of bit assignment b ∈ B, and the
construction of $T^g_b$ is independent of the choice of bit assignment a ∈ A.
3. Tree $T^g_a$ is contained in tree $T^g_b$ if and only if gate g has output 1 on input (a, b).
Figure 3: The trees $T^g_a$ (rooted at $v_a$) and $T^g_b$ (rooted at $v_b$) corresponding to input gate
$g = a_i$ or $g = b_j$, shown for each of the cases $a_i = 0$, $a_i = 1$, $b_j = 0$, and $b_j = 1$.
Figure 4: The trees $T^g_a$ (top, rooted at $v^0_a$, with vertices $v^1_a, \ldots, v^4_a$ and subtrees $T^1_a$ and $T^2_a$)
and $T^g_b$ (bottom, rooted at $v^0_b$, with vertices $v^1_b, \ldots, v^4_b$ and subtrees $T^1_b$ and $T^2_b$)
corresponding to AND gate $g = (g_1 \wedge g_2)$.
Input Gate. Given an input gate g corresponding to a bit value $a_i \in a$ (respectively, a bit value
$b_j \in b$), we will construct trees $T^g_a$ and $T^g_b$ so that $T^g_a$ is contained in $T^g_b$ if and only if $a_i = 1$
(respectively, $b_j = 1$). We construct $T^g_a$ and $T^g_b$ as in Figure 3. These trees are rooted at vertices
$v_a$ and $v_b$ respectively. We define input gates of F to have a height of one, so the trees in Figure 3
satisfy the first invariant. The remaining two invariants can be verified by examining every case of
Figure 3.
AND Gate. Given a gate $g = (g_1 \wedge g_2)$, and the trees $T^1_a, T^1_b$ and $T^2_a, T^2_b$ corresponding to
gates $g_1$ and $g_2$ respectively, we wish to construct trees $T^g_a$ and $T^g_b$ so that $T^g_a$ is contained in $T^g_b$ if
and only if gate g has output 1 on input (a, b). By our third invariant it suffices to ensure that $T^g_a$
is contained in $T^g_b$ if and only if $T^1_a$ is contained in $T^1_b$ AND $T^2_a$ is contained in $T^2_b$. We construct
trees $T^g_a$ and $T^g_b$ as in Figure 4. The trees are rooted at vertices $v^0_a$ and $v^0_b$ respectively. We now
verify that all invariants are satisfied.
Figure 5: The trees $T^g_a$ (left, rooted at $v^0_a$, with vertices $v^1_a, \ldots, v^4_a$ and subtrees $T^1_a$ and $T^2_a$)
and $T^g_b$ (right, rooted at $v^0_b$, with vertices $v^1_b, \ldots, v^6_b$, subtrees $T^1_b$ and $T^2_b$, and the universal
subtree $U_g$) corresponding to OR gate $g = (g_1 \vee g_2)$.
• Invariant 1. By our inductive hypothesis, tree $T^1_a$ has the same height as $T^1_b$ and $T^2_a$ has the
same height as $T^2_b$, so it follows from our construction that $T^g_a$ has the same height as $T^g_b$.
Now to see why the height of these trees is at most 4h, note that subtrees $T^1_a, T^1_b, T^2_a, T^2_b$ have
height at most 4(h − 1), and so trees $T^g_a$ and $T^g_b$ have height at most 4(h − 1) + 4 = 4h.
• Invariant 2. We assume that the construction of trees $T^1_a$ and $T^2_a$ is independent of b, and
the trees $T^1_b$ and $T^2_b$ are independent of a. Then it can be easily verified that tree $T^g_a$ does
not depend on b, and tree $T^g_b$ does not depend on a.
• Invariant 3. We must show that tree $T^g_a$ is contained in tree $T^g_b$ if and only if g evaluates to 1
on bit assignment (a, b). By our inductive hypothesis, it suffices to show that $T^g_a$ is contained
in $T^g_b$ if and only if $T^1_a$ is contained in $T^1_b$ AND $T^2_a$ is contained in $T^2_b$. The ‘if’ direction is
immediate from our construction: just map vertex $v^i_a$ in $T^g_a$ to vertex $v^i_b$ in $T^g_b$ for i ∈ [0, 4],
and map trees $T^1_a$ and $T^2_a$ to subtrees of $T^1_b$ and $T^2_b$ respectively.
For the ‘only if’ direction we must prove that subtree $T^1_a$ can only map to a subtree of $T^1_b$,
and subtree $T^2_a$ can only map to a subtree of $T^2_b$. First note that since trees $T^g_a$ and $T^g_b$ have
the same height, every isomorphism between $T^g_a$ and a subtree of $T^g_b$ must map the root vertex
$v^0_a$ of $T^g_a$ to the root vertex $v^0_b$ of $T^g_b$. Now suppose $T^1_a$ is mapped to $T^2_b$ in some isomorphism
between $T^g_a$ and a subtree of $T^g_b$. Then vertex $v^3_a$ would be mapped to vertex $v^4_b$, and the path
of length two hanging off $v^3_a$ would have nowhere to map to. It immediately follows that in
every valid subtree isomorphism, $T^1_a$ is mapped to $T^1_b$, and $T^2_a$ is mapped to $T^2_b$. Then $T^g_a$ is
contained in $T^g_b$ if and only if $T^1_a$ is contained in $T^1_b$ and $T^2_a$ is contained in $T^2_b$.
OR Gate. Given a gate $g = (g_1 \vee g_2)$, and the trees $T^1_a, T^1_b$ and $T^2_a, T^2_b$ corresponding to
gates $g_1$ and $g_2$ respectively, we will construct trees $T^g_a$ and $T^g_b$ so that $T^g_a$ is contained in $T^g_b$ if and
only if $T^1_a$ is contained in $T^1_b$ OR $T^2_a$ is contained in $T^2_b$. We construct trees $T^g_a$ and $T^g_b$ as in Figure
5. These trees are rooted at vertices $v^0_a$ and $v^0_b$ respectively. Tree $T^g_b$ contains a subtree $U_g$, which
we call a universal subtree. We design $U_g$ so that it contains both tree $T^1_a$ and tree $T^2_a$ for every bit
assignment a. This will allow either $T^1_a$ or $T^2_a$ to match with $U_g$, thus achieving the OR gate logic.
We now construct our universal subtree $U_g$. First, observe that for any gate g and any two bit
assignments $a, a' \in A$, the only difference between trees $T^g_a$ and $T^g_{a'}$ is in the input gate subtrees.
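Figure 6: The final $T_A$ (left), whose leaves 1 to N hold $T_{a_1}, \ldots, T_{a_N}$ and whose remaining leaves
up to $2^x$ hold copies of $T_{a_N}$, and $T_B$ (right), whose first $2^x - 1$ leaves hold copies of the universal
tree U and whose last leaf expands into a further tree holding $T_{b_1}, \ldots, T_{b_N}$.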
There are two different input gate subtrees in $T^g_a$: the $a_i = 0$ subtree composed of a root vertex and
two leaves, and the $a_i = 1$ subtree composed of a root vertex with a single leaf (see Figure 3). Note
that the $a_i = 0$ input subtree contains the $a_i = 1$ input subtree. Then if we define a bit assignment
$u = 0^m$, it follows that for every $a \in A$, the tree $T^g_a$ is contained within the tree $T^g_u$. Then for trees
$T^1_a$ and $T^2_a$ we construct trees $T^1_u$ and $T^2_u$ so that $T^1_a$ is contained in $T^1_u$ and $T^2_a$ is contained in $T^2_u$
for all $a \in A$. We define our universal subtree $U_g$ as the tree created by merging the root vertex of
$T^1_u$ with the root vertex of $T^2_u$. By construction, this tree $U_g$ contains $T^1_a$ and $T^2_a$ for all $a \in A$ as
intended. We now verify that all invariants are satisfied.
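The two facts used in this construction, that the $a_i = 0$ input subtree (a root with two leaves) contains the $a_i = 1$ input subtree (a root with one leaf), and that merging the roots of two trees yields a tree containing each of them, can be checked mechanically. The Python sketch below assumes a hypothetical encoding of rooted trees as tuples of child subtrees and a brute-force, parent-preserving containment test; the larger trees in the checks are small stand-ins, not the actual gadgets of Figures 3 to 5.

```python
from functools import lru_cache
from itertools import permutations
from typing import Iterator, Tuple

Tree = Tuple["Tree", ...]  # a rooted tree is the tuple of its child subtrees; a leaf is ()


@lru_cache(maxsize=None)
def embeds_at_root(t1: Tree, t2: Tree) -> bool:
    """Can t1 be mapped into t2 with root -> root and children injectively -> children?"""
    if len(t1) > len(t2):
        return False
    return any(
        all(embeds_at_root(c1, c2) for c1, c2 in zip(t1, image))
        for image in permutations(t2, len(t1))
    )


def subtrees(t: Tree) -> Iterator[Tree]:
    yield t
    for child in t:
        yield from subtrees(child)


def contained(t1: Tree, t2: Tree) -> bool:
    return any(embeds_at_root(t1, s) for s in subtrees(t2))


def merge_roots(t1: Tree, t2: Tree) -> Tree:
    """Identify the roots of t1 and t2: the new root receives the children of both."""
    return t1 + t2


INPUT_0: Tree = ((), ())  # a_i = 0 input subtree: root with two leaves
INPUT_1: Tree = ((),)     # a_i = 1 input subtree: root with one leaf
```

Because the merged root keeps every child of both original roots, any embedding into either tree is also an embedding into the merged tree.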
• Invariant 1. This invariant holds by an argument identical to that of the AND gate con-
struction.
• Invariant 2. A similar argument as with the AND gate will show that T ga does not depend
on bit assignment b. Likewise, tree T gb does not depend on bit assignment a; the construction
of universal subtree Ug is independent of a as detailed in its construction.
• Invariant 3. By our inductive hypothesis, it suffices to show that $T^g_a$ is contained in $T^g_b$ if and only if $T^1_a$ is contained in $T^1_b$ OR $T^2_a$ is contained in $T^2_b$. The ‘if’ direction can be seen by observing that if $T^1_a$ is contained in $T^1_b$, then we can align $T^1_a$ with $T^1_b$ and align $T^2_a$ with $U_g$, which is guaranteed to contain $T^2_a$; the case where $T^2_a$ is contained in $T^2_b$ is symmetric.
The ‘only if’ direction follows from an argument similar to the one given for the AND construction. First note that since the trees $T^g_a$ and $T^g_b$ have the same height, every subtree isomorphism must map the root vertex $v^0_a$ of $T^g_a$ to the root vertex $v^0_b$ of $T^g_b$. Additionally, it is immediate from the construction that exactly one of the subtrees $T^1_a$ or $T^2_a$ can be aligned with the universal subtree $U_g$. Then we only need to verify that there is no valid subtree isomorphism between $T^g_a$ and $T^g_b$ that maps $T^1_a$ to $T^2_b$ or $T^2_a$ to $T^1_b$. Suppose that $T^1_a$ were mapped to a subtree of $T^2_b$ (the other case is symmetric). Then vertex $v^3_a$ would map to vertex $v^5_b$, and the path of length two hanging off $v^3_a$ would have nowhere to map to. We conclude that in any subtree isomorphism from $T^g_a$ to $T^g_b$, subtree $T^1_a$ must map to subtree $T^1_b$ or subtree $T^2_a$ must map to subtree $T^2_b$. The invariant is maintained.
3.3 Completing the Reduction
The final trees are constructed using the technique provided in [1]. The construction is shown in
Figure 6 and described next.
• For the final tree $T_A$, start with a complete binary tree whose number of leaves is the smallest power of 2 that is greater than or equal to $N$, say $2^x$. From each of the $2^x$ leaves, attach a path of length $x$. Number the first $N$ leaves at the ends of these paths from 1 to $N$. For $1 \le i \le N$, replace leaf $i$ with the root of $T_{a_i}$. Replace each of the remaining $2^x - N$ leaves at the ends of paths with the root of a copy of $T_{a_N}$ ($2^x - N$ copies in total).
• For the final tree $T_B$, again start with a complete binary tree with $2^x$ leaves. From each of the first $2^x - 1$ leaves, attach a path of length $x$. Replace the end of each of these paths with the root of a universal tree $U$, which is $T_a$ with the input bit assignment $u = 0^m$. Replace the remaining leaf of the complete binary tree with the root of another complete binary tree, again with $2^x$ leaves. Number the first $N$ leaves of this second complete binary tree from 1 to $N$. For $1 \le i \le N$, replace leaf $i$ with the root of $T_{b_i}$.
To see why this works, consider that for $T_A$ to be isomorphic to a subtree of $T_B$, the root of $T_A$ must be mapped onto the root of $T_B$, since the two trees have the same height. Then, one of the $2^x$ paths of $T_A$ hanging from the leaves of its complete binary tree must traverse down the lower complete binary tree in $T_B$, while the remaining $2^x - 1$ paths map onto the paths ending in copies of $U$, each of which contains every $T_a$. Any of the $2^x$ paths may play the first role, since the two children of each internal vertex of a complete binary tree can be swapped. From here, the subtree rooted at the end of that path in $T_A$ must be isomorphic to one of the subtrees hanging from the leaves of the second binary tree in $T_B$. This is possible if and only if, for some $a \in A$ and $b \in B$, $T_a$ is isomorphic to a subtree of $T_b$. By the invariants proven above, such a pair exists iff the starting formula $F$ evaluates to true on some assignment $(a, b)$, i.e., iff $F$ is satisfiable.
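The argument can be exercised on toy inputs. The following is a minimal Python sketch, assuming rooted, unordered trees encoded as nested tuples (a vertex is the tuple of its children) and a root-to-root subtree-isomorphism test decided level by level by bipartite matching between children. All names (`embeds`, `path`, `complete_binary`, `build_TA`, `build_TB`) and the toy gadgets are hypothetical stand-ins, not the gadgets $T_a$, $T_b$ of the reduction. The builders follow the two bullet points literally, with one assumption the text leaves open: unused leaves of the lower binary tree in $T_B$ stay bare leaves. The toy $U$ merges the roots of all toy $T_a$, so it contains each of them, and all toy gadgets have the same height, mirroring the height property used above.

```python
from functools import lru_cache

# Hypothetical encoding: a rooted, unordered tree is a tuple of its child subtrees; a leaf is ().


@lru_cache(maxsize=None)
def embeds(small, big):
    """Root-to-root subtree isomorphism test via bipartite matching of children."""
    if len(small) > len(big):
        return False
    cand = [[j for j, cb in enumerate(big) if embeds(cs, cb)] for cs in small]
    owner = [-1] * len(big)

    def augment(i, seen):  # Kuhn's augmenting-path step
        for j in cand[i]:
            if not seen[j]:
                seen[j] = True
                if owner[j] == -1 or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    return all(augment(i, [False] * len(big)) for i in range(len(small)))


def path(length, end):
    """A path of `length` edges whose last vertex is the root of `end`."""
    for _ in range(length):
        end = (end,)
    return end


def complete_binary(leaves):
    """Complete binary tree whose leaves (a power-of-two count) are replaced by `leaves`."""
    level = list(leaves)
    while len(level) > 1:
        level = [(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def build_TA(ta):
    x = (len(ta) - 1).bit_length()               # smallest x with 2**x >= N
    padded = ta + [ta[-1]] * (2 ** x - len(ta))  # pad with copies of T_{a_N}
    return complete_binary([path(x, t) for t in padded])


def build_TB(tb, U):
    x = (len(tb) - 1).bit_length()
    lower = complete_binary(tb + [()] * (2 ** x - len(tb)))  # assumption: unused leaves stay bare
    return complete_binary([path(x, U)] * (2 ** x - 1) + [lower])


leaf = ()
A = [((leaf,), (leaf,)), ((leaf, leaf),)]   # toy T_{a_1}, T_{a_2}
U = A[0] + A[1]                             # toy universal tree: roots of all T_a merged
B_no = [((leaf,),), (leaf, (leaf,))]        # no toy T_b contains a toy T_a
B_yes = [((leaf,),), ((leaf, leaf), leaf)]  # toy T_{a_2} fits into toy T_{b_2}

for B in (B_no, B_yes):
    pair_exists = any(embeds(a, b) for a in A for b in B)
    print(pair_exists, embeds(build_TA(A), build_TB(B, U)))
# False False
# True True
```

In both runs the answer for the pair $(T_A, T_B)$ agrees with whether some toy pair with $T_a$ contained in $T_b$ exists.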
The final tree $T_A$ is of size $O(Ns)$. This is because $T_A$ contains $2^x < 2N$ trees $T_a$ (counting the padding copies), and each tree $T_a$ is of size $O(s)$. The upper bound on the size of $T_a$ follows from the fact that formula $F$ has $s$ gates, and each gate contributes constantly many vertices to $T_a$. The final tree $T_B$ is of size $O(Ns^2)$. To see this, fix a particular assignment $(a, b)$ and consider the tree $T_b$. Each AND gate contributes a constant number of vertices to $T_b$. Each OR gate appends to $T_b$ a universal subtree $U_g$ of size at most the size of $T_a$. Since the size of $T_a$ is $O(s)$ and there are $s$ gates in formula $F$, $T_b$ is of size $O(s^2)$. The $N$ trees $T_b$ therefore contribute $O(Ns^2)$ vertices, which dominates the $O(Ns)$ vertices of the $2^x - 1$ copies of $U$.
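The counting can be made exact. The sketch below uses hypothetical Python helpers `size_TA` and `size_TB` that count the vertices of $T_A$ and $T_B$ directly from the two bullet-point constructions, assuming every copy of $T_a$ has the same size, every $T_b$ has the same size, and unused leaves of the lower binary tree in $T_B$ stay bare leaves. A path of $x$ edges adds $x$ vertices, and its last vertex is identified with the root of the gadget hung there. Since $2^x < 2N$, the count for $T_A$ is $O(N(x + s))$; the path vertices are absorbed into $O(Ns)$ when $x = \lceil \log_2 N \rceil = O(s)$, which we assume holds for the formula sizes of interest.

```python
def size_TA(N, size_a):
    """Vertices of T_A when every T_a (including padding copies) has `size_a` vertices."""
    x = (N - 1).bit_length()  # smallest x with 2**x >= N
    L = 2 ** x                # number of leaves of the complete binary tree
    # complete binary tree (2L - 1 vertices); each leaf gets a path of x edges whose
    # last vertex is identified with the root of a T_a copy
    return (2 * L - 1) + L * (x - 1 + size_a)


def size_TB(N, size_u, size_b):
    """Vertices of T_B, assuming unused leaves of the lower binary tree stay bare leaves."""
    x = (N - 1).bit_length()
    L = 2 ** x
    upper = 2 * L - 1                     # top complete binary tree
    u_paths = (L - 1) * (x - 1 + size_u)  # L - 1 paths ending in copies of U
    lower = 2 * L - 2                     # second binary tree; its root is the last top leaf
    gadgets = N * (size_b - 1)            # N leaves become roots of T_b trees
    return upper + u_paths + lower + gadgets


print(size_TA(3, 3), size_TB(3, 3, 4))  # 23 34
```

For $N = 3$ we get $x = 2$ and four leaves; with 3-vertex gadgets, $T_A$ has $7 + 4 \cdot (1 + 3) = 23$ vertices.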
4 Discussion
The key property highlighted by the two reductions is that both target problems allow the construction of two independent objects $O_A$ and $O_B$, where $O_A$ is constructed independently of the partial input assignments in $B$, and $O_B$ is constructed independently of the partial input assignments in $A$.
In order to construct these objects, both reductions start by fixing an input assignment $(a, b)$. Then, for each gate $g$, two new objects are constructed from the objects for the subcircuits that feed into $g$. The aim of this construction is to maintain the invariant that the desired property (e.g., the pattern occurring in the graph, or one tree being isomorphic to a subtree of the other) holds iff $(a, b)$ satisfies the circuit with output gate $g$. This is accomplished by supposing that (i) we are adding the gate $g = g_1 * g_2$ where $* \in \{\wedge, \vee\}$, (ii) the objects $O^{g_1}_a$ and $O^{g_1}_b$ have the desired property iff the circuit with output gate $g_1$ evaluates to true on $(a, b)$, and (iii) the objects $O^{g_2}_a$ and $O^{g_2}_b$ have the desired property iff the circuit with output gate $g_2$ evaluates to true on $(a, b)$. The task is then to construct $O^g_a$ from only $O^{g_1}_a$ and $O^{g_2}_a$, and $O^g_b$ from only $O^{g_1}_b$ and $O^{g_2}_b$, such that $O^g_a$ and $O^g_b$ have the desired property iff $g = g_1 * g_2$ evaluates to true. By the invariant, when $* = \wedge$ this is equivalent to $O^{g_1}_a$ and $O^{g_1}_b$ having the desired property, and $O^{g_2}_a$ and $O^{g_2}_b$ having the desired property.
In the case of $* = \vee$, only one of the pairs $(O^{g_1}_a, O^{g_1}_b)$ or $(O^{g_2}_a, O^{g_2}_b)$ needs to have the property.
In the last step, the final objects $O_A$ and $O_B$ are constructed by combining all $O_{a_i}$, $1 \le i \le N$, to form $O_A$, and all $O_{b_j}$, $1 \le j \le N$, to form $O_B$. These final objects must allow for selection between different partial assignments. Additionally, the final objects satisfy the desired property iff at least one pair $O_{a_i}$ and $O_{b_j}$ together satisfies the desired property.
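The final "select a pair" step mirrors a brute-force view of Formula-SAT: split the $n$ variables between the two sides (with an even split, as assumed here, each side has $N = 2^{n/2}$ partial assignments) and ask whether some pair $(a, b)$ satisfies $F$. The following is a minimal Python sketch of that pair structure; the nested-tuple formula encoding and the names `evaluate` and `split_sat` are hypothetical. It enumerates all $N^2 = 2^n$ pairs explicitly, which is the quadratic search that $O_A$ and $O_B$ encode in a single instance of the target problem.

```python
from itertools import product

# Hypothetical formula encoding: ("var", i), ("not", f), ("and", f, g), ("or", f, g).


def evaluate(f, x):
    """Evaluate formula f on the full assignment x (a tuple of 0/1 values)."""
    op = f[0]
    if op == "var":
        return bool(x[f[1]])
    if op == "not":
        return not evaluate(f[1], x)
    if op == "and":
        return evaluate(f[1], x) and evaluate(f[2], x)
    if op == "or":
        return evaluate(f[1], x) or evaluate(f[2], x)
    raise ValueError(f"unknown gate {op!r}")


def split_sat(f, n):
    """Search all pairs (a, b) of half-assignments; return a satisfying pair or None."""
    half = n // 2
    A = list(product((0, 1), repeat=half))      # partial assignments to the first half
    B = list(product((0, 1), repeat=n - half))  # partial assignments to the second half
    for a in A:
        for b in B:
            if evaluate(f, a + b):              # F on the full assignment (a, b)
                return a, b
    return None


print(split_sat(("and", ("var", 0), ("not", ("var", 1))), 2))  # ((1,), (0,))
print(split_sat(("and", ("var", 0), ("not", ("var", 0))), 2))  # None
```

A reduction that produces $O_A$ and $O_B$ of size about $N \cdot \mathrm{poly}(s)$ replaces this double loop by a single query to the target problem, which is why a strongly subquadratic algorithm for that problem would beat $2^n$ time.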
The above outlines, at a high level, the approach used in the reductions from Formula-SAT to polynomial-time problems that appear here and in [2, 28]. The techniques presented in [3] instead start from the satisfiability of branching programs, but they work similarly in that they must model the logical gates AND and OR (this time connecting logical statements about reachability). The authors of [3] also take similar steps to build two independent objects based on a fixed input assignment $(a, b)$.
References
[1] A. Abboud, A. Backurs, T. D. Hansen, V. V. Williams, and O. Zamir. Subtree isomorphism
revisited. ACM Trans. Algorithms, 14(3):27:1–27:23, 2018.
[2] A. Abboud and K. Bringmann. Tighter connections between Formula-SAT and shaving logs.
In 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018,
July 9-13, 2018, Prague, Czech Republic, pages 8:1–8:18, 2018.
[3] A. Abboud, T. D. Hansen, V. V. Williams, and R. Williams. Simulating branching programs
with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of
the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge,
MA, USA, June 18-21, 2016, pages 375–388, 2016.
[4] T. Akutsu. A linear time pattern matching algorithm between a string and a tree. In Combi-
natorial Pattern Matching, 4th Annual Symposium, CPM 93, Padova, Italy, June 2-4, 1993,
Proceedings, pages 1–10, 1993.
[5] A. Amir, M. Lewenstein, and N. Lewenstein. Pattern matching in hypertext. J. Algorithms,
35(1):82–99, 2000.
[6] T. Brüggemann and W. Kern. An improved local search algorithm for 3-SAT. Electron. Notes
Discret. Math., 17:69–73, 2004.
[7] R. Chen. Satisfiability algorithms and lower bounds for boolean formulas over finite bases. In
Mathematical Foundations of Computer Science 2015 - 40th International Symposium, MFCS
2015, Milan, Italy, August 24-28, 2015, Proceedings, Part II, pages 223–234, 2015.
[8] R. Chen, V. Kabanets, A. Kolokolova, R. Shaltiel, and D. Zuckerman. Mining circuit lower
bound proofs for meta-algorithms. Comput. Complex., 24(2):333–392, 2015.
[9] M. Chung. O(n^2.5) time algorithms for the subgraph homeomorphism problem on trees.
J. Algorithms, 8(1):106–112, 1987.
[10] R. Cole and R. Hariharan. Tree pattern matching to subset matching in linear time. SIAM J.
Comput., 32(4):1056–1066, 2003.
[11] M. Equi, R. Grossi, V. Mäkinen, and A. I. Tomescu. On the complexity of string matching for
graphs. In C. Baier, I. Chatzigiannakis, P. Flocchini, and S. Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019,
Patras, Greece, volume 132 of LIPIcs, pages 55:1–55:15. Schloss Dagstuhl - Leibniz-Zentrum
für Informatik, 2019.
[12] T. D. Hansen, H. Kaplan, O. Zamir, and U. Zwick. Faster k-SAT algorithms using biased-PPSZ.
In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC
2019, Phoenix, AZ, USA, June 23-26, 2019, pages 578–589, 2019.
[13] R. Impagliazzo, W. Matthews, and R. Paturi. A satisfiability algorithm for AC0. In Proceedings
of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 961–972.
SIAM, 2012.
[14] R. Impagliazzo, R. Paturi, and S. Schneider. A satisfiability algorithm for sparse depth two
threshold circuits. In 54th Annual IEEE Symposium on Foundations of Computer Science,
FOCS 2013, 26-29 October, 2013, Berkeley, CA, USA, pages 479–488, 2013.
[15] C. Jain, H. Zhang, Y. Gao, and S. Aluru. On the complexity of sequence to graph alignment. In
L. J. Cowen, editor, Research in Computational Molecular Biology - 23rd Annual International
Conference, RECOMB 2019, Washington, DC, USA, May 5-8, 2019, Proceedings, volume
11467 of Lecture Notes in Computer Science, pages 85–100. Springer, 2019.
[16] I. Komargodski, R. Raz, and A. Tal. Improved average-case lower bounds for De Morgan
formula size. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS
2013, 26-29 October, 2013, Berkeley, CA, USA, pages 588–597, 2013.
[17] A. Lingas. An application of maximum bipartite c-matching to subtree isomorphism. In
CAAP’83, Trees in Algebra and Programming, 8th Colloquium, L’Aquila, Italy, March 9-11,
1983, Proceedings, pages 284–299, 1983.
[18] A. Lingas and M. Karpinski. Subtree isomorphism is NC reducible to bipartite perfect match-
ing. Inf. Process. Lett., 30(1):27–32, 1989.
[19] U. Manber and S. Wu. Approximate string matching with arbitrary costs for text and hy-
pertext. In Advances In Structural And Syntactic Pattern Recognition, pages 22–33. World
Scientific, 1992.
[20] B. Monien and E. Speckenmeyer. Solving satisfiability in less than 2^n steps. Discret. Appl.
Math., 10(3):287–295, 1985.
[21] G. Navarro. Improved approximate pattern matching on hypertext. Theor. Comput. Sci.,
237(1-2):455–463, 2000.
[22] K. Park and D. K. Kim. String matching in hypertext. In Combinatorial Pattern Matching,
6th Annual Symposium, CPM 95, Espoo, Finland, July 5-7, 1995, Proceedings, pages 318–329,
1995.
[23] R. Paturi, P. Pudlák, M. E. Saks, and F. Zane. An improved exponential-time algorithm for
k-SAT. J. ACM, 52(3):337–364, 2005.
[24] M. Rautiainen and T. Marschall. Aligning sequences to general graphs in O(V + mE) time.
bioRxiv, page 216127, 2017.
[25] S. W. Reyner. An analysis of a good algorithm for the subtree problem. SIAM J. Comput.,
6(4):730–732, 1977.
[26] R. Rodosek. A new approach on solving 3-satisfiability. In Artificial Intelligence and Symbolic
Mathematical Computation, International Conference AISMC-3, Steyr, Austria, September
23-25, 1996, Proceedings, pages 197–212, 1996.
[27] T. Sakai, K. Seto, S. Tamaki, and J. Teruyama. A satisfiability algorithm for depth-2 circuits
with a symmetric gate at the top and AND gates at the bottom. Electronic Colloquium on
Computational Complexity (ECCC), 22:136, 2015.
[28] P. Schepper. Fine-grained complexity of regular expression pattern matching and membership.
CoRR, abs/2008.02769, 2020.
[29] U. Schöning. A probabilistic algorithm for k-SAT based on limited local search and restart.
Algorithmica, 32(4):615–623, 2002.
[30] K. Seto and S. Tamaki. A satisfiability algorithm and average-case hardness for formulas over
the full binary basis. Comput. Complex., 22(2):245–274, 2013.
[31] R. Shamir and D. Tsur. Faster subtree isomorphism. J. Algorithms, 33(2):267–280, 1999.
[32] S. Tamaki. A satisfiability algorithm for depth two circuits with a sub-quadratic number of
symmetric and threshold gates. Electronic Colloquium on Computational Complexity (ECCC),
23:100, 2016.
[33] R. M. Verma and S. W. Reyner. An analysis of a good algorithm for the subtree problem,
corrected. SIAM J. Comput., 18(5):906–908, 1989.
[34] R. Williams. Algorithms for circuits and circuits for algorithms: Connecting the tractable and
intractable. In Proceedings of the International Congress of Mathematicians, pages 659–682,
2014.
A Proving the implications of logarithmically faster algorithms
for Subtree Isomorphism
Theorem 3 ([3]). Let $n \le S(n) \le 2^{o(n)}$ be time constructible and monotone non-decreasing. Let $\mathcal{C}$ be a class of circuits. Suppose there is a SAT algorithm for $n$-input circuits which are ANDs of $O(S(n))$ arbitrary functions of three $O(S(n))$-size circuits from $\mathcal{C}$, that runs in $O(2^n/n^{10})$ time. Then $E^{NP}$ does not have $S(n)$-size circuits.
Theorem 4 ([3]). Suppose there is a satisfiability algorithm for bounded fan-in formulas of size $n^k$ running in $O(2^n/n^k)$ time, for all constants $k > 0$. Then $NTIME[2^{O(n)}]$ is not contained in non-uniform $NC^1$.
Corollary 1. The existence of a strongly subquadratic time algorithm for PMLG (or Subtree Isomorphism) would imply that the class $E^{NP}$ (1) does not have non-uniform $2^{o(n)}$-size Boolean formulas and (2) does not have non-uniform $o(n)$-depth circuits of bounded fan-in. It also implies that $NTIME[2^{O(n)}]$ is not in non-uniform $NC^1$.
Proof. Note that the condition in Theorem 3, that the SAT algorithm works on $n$-input circuits which are ANDs of $O(S(n))$ arbitrary functions of three $O(S(n))$-size circuits, is trivially satisfied by a solver that works over Boolean formulas. By Theorem 1 (Theorem 2, resp.), for circuits (or equivalently formulas) of size $S(n) = 2^{o(n)}$, a strongly subquadratic time algorithm for PMLG (Subtree Isomorphism, resp.) would imply a SAT algorithm running in time
$$O\left(n^{1+o(1)} \cdot |E|\,|P|^{1-\varepsilon}\right) = O\left(n^{1+o(1)} \cdot 2^{n-\varepsilon n/2} S(n)^4\right),$$
which is $O(2^n/n^{10})$; the $n^{1+o(1)}$ factor is introduced when moving from a word size of $\Theta(\log n)$ to $\Theta(n)$. Thus, Theorem 3 implies (1). Part (2) is implied as well, since an $o(n)$-depth circuit of bounded fan-in can be expressed as a formula of size $S(n) = 2^{o(n)}$. The last statement follows from Theorem 4 and the fact that on circuits of size $n^k$, our subquadratic algorithm would run in time $O(n^{1+o(1)} \cdot 2^{n-\varepsilon n/2} n^{2k})$, which is $O(2^n/n^k)$.
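The step "which is $O(2^n/n^{10})$" can be spelled out as follows, writing $\log$ for $\log_2$ and using only $S(n) = 2^{o(n)}$ (so $4 \log S(n) = o(n)$) and $n^{1+o(1)} = 2^{o(n)}$; the last inequality holds once $o(n) \le \varepsilon n/4$ and $\varepsilon n/4 \ge 10 \log n$, i.e., for all sufficiently large $n$:

```latex
n^{1+o(1)} \cdot 2^{\,n-\varepsilon n/2}\, S(n)^4
  = 2^{\,n - \varepsilon n/2 + 4\log S(n) + (1+o(1))\log n}
  = 2^{\,n - \varepsilon n/2 + o(n)}
  \le 2^{\,n - \varepsilon n/4}
  \le \frac{2^n}{n^{10}}
```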
Corollary 2. If PMLG (or Subtree Isomorphism) can be solved in time $O(|E||P|/\log^c |E|)$ or $O(|E||P|/\log^c |P|)$ ($O(|T_1||T_2|/\log^c |T_1|)$ or $O(|T_1||T_2|/\log^c |T_2|)$, resp.) for all $c = \Theta(1)$, then $NTIME[2^{O(n)}]$ does not have non-uniform polynomial-size log-depth circuits.
Proof. We prove this for PMLG; the proof for Subtree Isomorphism is similar. By Theorem 4, it suffices to show that for all $k$, there exists an algorithm to check satisfiability of all bounded fan-in formulas of size $n^k$ running in time $O(2^n/n^k)$. Suppose that for all $c = \Theta(1)$, there exists an algorithm running in time $O(|E||P|/\log^c |P|)$ or $O(|E||P|/\log^c |E|)$. Then by Theorem 1, if we let $c > 4k+1$, we obtain an algorithm running in time
$$\frac{n^{1+o(1)} \cdot 2^n s^3}{\log^c\left(2^{n/2} s^2\right)} = \frac{n^{1+o(1)} \cdot 2^n n^{3k}}{\log^c\left(2^{n/2} n^{2k}\right)} \le \frac{n^{1+o(1)} \cdot 2^n n^{3k}}{(n/2)^c} = \frac{2^{n+c}}{n^{c-3k-1-o(1)}} = O\left(\frac{2^n}{n^k}\right),$$
where the inequality uses $\log(2^{n/2} n^{2k}) \ge n/2$ and the final bound uses $c - 3k - 1 > k$.
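Corollary 4. $E^{NP}$ cannot be computed by non-uniform formulas of cubic size if PMLG (or Subtree Isomorphism) can be solved in time $O\left(\frac{|E| \cdot |P|}{\log^{20+\varepsilon} |E|}\right)$ or $O\left(\frac{|E| \cdot |P|}{\log^{20+\varepsilon} |P|}\right)$ for $\varepsilon > 0$, where $G$ is a deterministic DAG of maximum degree three (or $O\left(\frac{|T_1| \cdot |T_2|}{\log^{20+\varepsilon} |T_1|}\right)$ or $O\left(\frac{|T_1| \cdot |T_2|}{\log^{20+\varepsilon} |T_2|}\right)$ for $\varepsilon > 0$, resp.).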
Proof. Theorem 3 as given in [3] says that solving Formula-SAT in time $O(2^n/n^{10})$ on formulas of size $s = O(n^{3+\varepsilon})$ implies that there is a function in the class $E^{NP}$ that cannot be computed by formulas of size $O(n^{3+\varepsilon})$. Then, via a proof identical to that of Corollary 3, we obtain the above result.
