332 research outputs found
A Nearly Quadratic Bound for the Decision Tree Complexity of k-SUM
We show that the k-SUM problem can be solved by a linear decision tree of depth O(n^2 log^2 n), improving the recent bound of O(n^3 log^3 n) by Cardinal et al. Our bound depends linearly on k, and allows us to conclude that the number of linear queries required to decide the n-dimensional Knapsack or SubsetSum problems is only O(n^3 log n), improving the best previously known bounds by a factor of n. Our algorithm extends to the RAM model, showing that the k-SUM problem can be solved in expected polynomial time, for any fixed k, with the above bound on the number of linear queries. Our approach relies on a new point-location mechanism, exploiting "epsilon-cuttings" that are based on vertical decompositions in high-dimensional hyperplane arrangements.
A major side result of the analysis in this paper is a sharper bound on the complexity of the vertical decomposition of such an arrangement (in terms of its dependence on the dimension). We hope that this study will reveal further structural properties of vertical decompositions in hyperplane arrangements.
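For concreteness, the k-SUM problem that the bounds above refer to admits a one-line brute-force check; a minimal Python sketch (function name and the exhaustive O(n^k) strategy are mine, purely a correctness reference, not the paper's decision-tree algorithm):

```python
from itertools import combinations

def k_sum(nums, k):
    """Decide whether some k elements of nums sum to zero.

    Naive O(n^k) enumeration -- a correctness reference only; the paper's
    linear decision trees resolve this with O(n^2 log^2 n) linear queries.
    """
    return any(sum(c) == 0 for c in combinations(nums, k))

# 3 + (-5) + 2 == 0, so a 3-subset summing to zero exists.
assert k_sum([3, -5, 2, 9], 3) is True
assert k_sum([1, 2, 4, 8], 3) is False
```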
Explicit model predictive control accuracy analysis
Model Predictive Control (MPC) can efficiently control constrained systems in real-time applications. The MPC feedback law for a linear system with linear inequality constraints can be computed explicitly off-line, which results in an off-line partition of the state space into non-overlapping convex regions, with an affine control law associated with each region of the partition. An actual implementation of this explicit MPC on low-cost micro-controllers requires the data to be "quantized", i.e. represented with a small number of memory bits. An aggressive quantization decreases the number of bits and the controller manufacturing costs, and may increase the speed of the controller, but reduces the accuracy of the control input computation. We derive upper bounds for the absolute error in the control depending on the number of quantization bits and system parameters. The bounds can be used to determine how many quantization bits are needed in order to guarantee a specific level of accuracy in the control input.
Comment: 6 pages, 7 figures. Accepted to IEEE CDC 201
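As a toy illustration of the trade-off described above (the uniform quantizer, coefficient values, and ranges below are hypothetical, not taken from the paper): quantize the coefficients of one affine region law u = Fx + g to b bits and measure the resulting error in the control input.

```python
def quantize(v, bits, lo=-4.0, hi=4.0):
    """Uniform b-bit quantization of a value clipped to [lo, hi]."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    v = min(max(v, lo), hi)
    return lo + round((v - lo) / step) * step

def control(F, g, x):
    """Affine region law of an explicit MPC partition: u = F x + g."""
    return sum(f * s for f, s in zip(F, x)) + g

F, g = [-0.8, 1.3], 0.25      # hypothetical coefficients of one region
x = [0.7, -1.1]               # a state inside that region

for bits in (4, 8, 12):
    Fq = [quantize(f, bits) for f in F]
    gq = quantize(g, bits)
    err = abs(control(F, g, x) - control(Fq, gq, x))
    print(f"{bits:2d} bits: |u - u_q| = {err:.2e}")
```

More bits shrink the quantization step and hence the control error, mirroring the paper's bits-versus-accuracy bounds.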
Threesomes, Degenerates, and Love Triangles
The 3SUM problem is to decide, given a set of n real numbers, whether any three of them sum to zero. It is widely conjectured that a trivial O(n^2)-time algorithm is optimal, and over the years the consequences of this conjecture have been revealed. This 3SUM conjecture implies lower bounds on numerous problems in computational geometry, and a variant of the conjecture implies strong lower bounds on triangle enumeration, dynamic graph algorithms, and string matching data structures.
In this paper we refute the 3SUM conjecture. We prove that the decision tree complexity of 3SUM is O(n^{3/2} sqrt(log n)) and give two subquadratic 3SUM algorithms, a deterministic one running in O(n^2 / (log n / log log n)^{2/3}) time and a randomized one running in O(n^2 (log log n)^2 / log n) time with high probability. Our results lead directly to improved bounds for k-variate linear degeneracy testing for all odd k >= 3. The problem is to decide, given a linear function f(x_1, ..., x_k) = a_0 + sum_i a_i x_i and a set A of n real numbers, whether 0 is in f(A^k). We show the decision tree complexity of this problem is O(n^{k/2} sqrt(log n)).
Finally, we give a subcubic algorithm for a generalization of the (min,+)-product over real-valued matrices and apply it to the problem of finding zero-weight triangles in weighted graphs. We give a depth-O(n^{5/2} sqrt(log n)) decision tree for this problem, as well as an algorithm running in O(n^3 (log log n)^2 / log n) time.
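For reference, the "trivial" quadratic algorithm that the 3SUM conjecture posits as optimal is the classic sort-plus-two-pointer scan; a minimal sketch (my own naming):

```python
def three_sum(nums):
    """Decide whether any three elements of nums sum to zero.

    The classic O(n^2) algorithm: sort once, then for each anchor element
    sweep two pointers inward over the remaining sorted suffix.
    """
    a = sorted(nums)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1      # total too small: raise the low end
            else:
                hi -= 1      # total too large: lower the high end
    return False

assert three_sum([8, -25, 4, 10, 15]) is True   # -25 + 10 + 15 == 0
assert three_sum([1, 2, 3, 4]) is False
```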
Near-optimal Linear Decision Trees for k-SUM and Related Problems
We construct near-optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k, we construct linear decision trees that solve the k-SUM problem on n elements using O(n log^2 n) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k-subsets; when viewed as linear queries, comparison queries are 2k-sparse and have only {-1, 0, 1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with an optimal number of queries, up to poly-logarithmic terms.
Our constructions are based on the notion of "inference dimension", recently introduced by the authors in the context of active classification with comparison queries. This can be viewed as another contribution to the fruitful link between machine learning and discrete geometry, which goes back to the discovery of the VC dimension.
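A comparison query as described above is easy to make concrete; a small sketch (illustrative, with hypothetical names) that compares the sums of two k-subsets and exhibits the corresponding 2k-sparse {-1, 0, 1} coefficient vector:

```python
def comparison_query(x, S, T):
    """Compare sum(x[i] for i in S) against sum(x[j] for j in T).

    Returns the sign of the linear form sum_{i in S} x_i - sum_{j in T} x_j,
    whose coefficient vector is 2k-sparse with entries in {-1, 0, 1}.
    """
    d = sum(x[i] for i in S) - sum(x[j] for j in T)
    return (d > 0) - (d < 0)   # sign: -1, 0, or +1

def coefficients(n, S, T):
    """The {-1, 0, 1} coefficient vector realizing the query above."""
    c = [0] * n
    for i in S:
        c[i] += 1
    for j in T:
        c[j] -= 1
    return c

x = [5.0, -2.0, 1.5, 0.5]
S, T = (0, 1), (2, 3)                       # two 2-subsets (k = 2)
assert comparison_query(x, S, T) == 1       # 5 - 2 = 3  >  1.5 + 0.5 = 2
assert coefficients(4, S, T) == [1, 1, -1, -1]
```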
Subquadratic Algorithms for Some 3Sum-Hard Geometric Problems in the Algebraic Decision Tree Model
We present subquadratic algorithms in the algebraic decision-tree model for several 3Sum-hard geometric problems, all of which can be reduced to the following question: Given two sets A, B, each consisting of n pairwise disjoint segments in the plane, and a set C of n triangles in the plane, we want to count, for each triangle ∆ in C, the number of intersection points between the segments of A and those of B that lie in ∆. The problems considered in this paper have been studied by Chan (2020), who gave algorithms that solve them, in the standard real-RAM model, in O((n^2 / log^2 n) log^{O(1)} log n) time. We present solutions in the algebraic decision-tree model whose cost is O(n^{60/31+eps}), for any eps > 0. Our approach is based on a primal-dual range searching mechanism, which exploits the multi-level polynomial partitioning machinery recently developed by Agarwal, Aronov, Ezra, and Zahl (2020). A key step in the procedure is a variant of point location in arrangements, say of lines in the plane, which is based solely on the order type of the lines, a "handicap" that turns out to be beneficial for speeding up our algorithm.
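The counting question underlying these reductions has a straightforward brute-force baseline in the real-RAM model; a sketch (my own naming and conventions, quadratic per triangle and nowhere near the paper's subquadratic decision-tree machinery):

```python
def _cross(ox, oy, ax, ay, bx, by):
    """2-D cross product of vectors (a - o) and (b - o)."""
    return (ax - ox) * (by - oy) - (ay - oy) * (bx - ox)

def segment_intersection(a, b):
    """Intersection point of segments a = (p, q) and b = (r, s), or None."""
    (px, py), (qx, qy) = a
    (rx, ry), (sx, sy) = b
    dx1, dy1 = qx - px, qy - py
    dx2, dy2 = sx - rx, sy - ry
    denom = dx1 * dy2 - dy1 * dx2
    if denom == 0:
        return None                       # parallel (overlaps ignored here)
    t = ((rx - px) * dy2 - (ry - py) * dx2) / denom
    u = ((rx - px) * dy1 - (ry - py) * dx1) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (px + t * dx1, py + t * dy1)
    return None

def in_triangle(p, tri):
    """True iff point p lies in the closed triangle tri = (v0, v1, v2)."""
    x, y = p
    (x0, y0), (x1, y1), (x2, y2) = tri
    d0 = _cross(x0, y0, x1, y1, x, y)
    d1 = _cross(x1, y1, x2, y2, x, y)
    d2 = _cross(x2, y2, x0, y0, x, y)
    return (d0 >= 0 and d1 >= 0 and d2 >= 0) or \
           (d0 <= 0 and d1 <= 0 and d2 <= 0)

def count_in_triangle(A, B, tri):
    """Count A-B intersection points inside tri: O(n^2) per triangle."""
    return sum(1 for a in A for b in B
               if (p := segment_intersection(a, b)) is not None
               and in_triangle(p, tri))

A = [((0, 0), (4, 4))]                       # one segment of A
B = [((0, 4), (4, 0)), ((3, 0), (3, 4))]     # two segments of B
tri = ((1, 1), (5, 1), (3, 5))
# A crosses the two B segments at (2, 2) and (3, 3); both lie in tri.
assert count_in_triangle(A, B, tri) == 2
```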
Point Location and Active Learning: Learning Halfspaces Almost Optimally
Given a finite set X in R^d and a binary linear classifier c: R^d -> {0, 1}, how many queries of the form c(x) are required to learn the label of every point in X? Known as point location, this problem has inspired over 35 years of research in the pursuit of an optimal algorithm. Building on the prior work of Kane, Lovett, and Moran (ICALP 2018), we provide the first nearly optimal solution, a randomized linear decision tree of depth ~O(d log^2 |X|), improving on the previous best of ~O(d^2 log |X|) from Ezra and Sharir (Discrete and Computational Geometry, 2019). As a corollary, we also provide the first nearly optimal algorithm for actively learning halfspaces in the membership query model. En route to these results, we prove a novel characterization of Barthe's Theorem (Inventiones Mathematicae, 1998) of independent interest. In particular, we show that X may be transformed into approximate isotropic position if and only if there exists no k-dimensional subspace with more than a k/d-fraction of X, and provide a similar characterization for exact isotropic position.
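The flavor of point location shows up already in one dimension, where a halfspace is a threshold; a toy sketch (hypothetical names, the d = 1 case only) that labels all n points with O(log n) sign queries instead of n:

```python
def learn_labels(points, classify):
    """Learn every label of a 1-D threshold classifier with O(log n) queries.

    Toy d = 1 shadow of point location: classify(x) answers True iff x lies
    above an unknown threshold; binary search over the sorted points finds
    the decision boundary, and every remaining label is inferred for free.
    """
    pts = sorted(points)
    queries = 0
    lo, hi = 0, len(pts)          # index of the first True label is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if classify(pts[mid]):
            hi = mid
        else:
            lo = mid + 1
    cut = pts[lo] if lo < len(pts) else None
    labels = {x: (cut is not None and x >= cut) for x in pts}
    return labels, queries

labels, q = learn_labels(range(8), lambda x: x > 2.5)
assert q == 3                     # log2(8) queries instead of 8
assert labels[2] is False and labels[3] is True
```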
Statistical Methods for Characterizing Genomic Heterogeneity in Mixed Samples
Recently, sequencing technologies have generated massive and heterogeneous data sets. However, interpretation of these data sets is a major barrier to understanding genomic heterogeneity in complex diseases. In this dissertation, we develop a Bayesian statistical method for single-nucleotide-level analysis and a global optimization method for gene-expression-level analysis to characterize genomic heterogeneity in mixed samples. The detection of rare single nucleotide variants (SNVs) is important for understanding genetic heterogeneity using next-generation sequencing (NGS) data. Various computational algorithms have been proposed to detect variants at the single nucleotide level in mixed samples. Yet, the noise inherent in the biological processes involved in NGS technology necessitates the development of statistically accurate methods to identify true rare variants. At the single nucleotide level, we propose a Bayesian probabilistic model and a variational expectation-maximization (EM) algorithm to estimate the non-reference allele frequency (NRAF) and identify SNVs in heterogeneous cell populations. We demonstrate that our variational EM algorithm has sensitivity and specificity comparable to those of a Markov chain Monte Carlo (MCMC) sampling inference algorithm, and is more computationally efficient in tests on relatively low-coverage (27x and 298x) data. Furthermore, we show that our model with a variational EM inference algorithm has higher specificity than many state-of-the-art algorithms. In an analysis of a directed-evolution longitudinal yeast data set, we are able to identify a time-series trend in non-reference allele frequency and detect novel variants that have not yet been reported. Our model also detects the emergence of a beneficial variant earlier than was previously shown, as well as a pair of concomitant variants.
Characterization of heterogeneity in gene expression data is a critical challenge for personalized treatment, since intra-tumor heterogeneity drives drug resistance. Mixed membership factorization has become popular for analyzing data sets that have within-sample heterogeneity. In recent years, several algorithms have been developed for mixed membership matrix factorization, but they only guarantee estimates from a local optimum. At the gene expression level, we derive a global optimization (GOP) algorithm that provides a guaranteed epsilon-global optimum for a sparse mixed membership matrix factorization problem for molecular subtype classification. We test the algorithm on simulated data and find that the algorithm always bounds the global optimum across random initializations and explores multiple modes efficiently. The GOP algorithm is well-suited for parallel computation in the key optimization steps.
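As a toy stand-in for the variational EM described above (the two-component binomial mixture, fixed per-site coverage, and all names below are my simplifications, not the dissertation's model): an EM loop that separates sequencing-error sites from true variant sites and estimates a shared NRAF.

```python
import math

def em_binomial_mixture(counts, depth, p_err=0.01, iters=100):
    """Two-component EM: sequencing error (known rate p_err) versus a true
    variant with unknown non-reference allele frequency (NRAF).

    counts[i] = non-reference reads at site i, each with coverage `depth`.
    Returns (estimated NRAF, prior probability that a site is a variant).
    """
    nraf, pi = 0.3, 0.5                     # initial guesses
    for _ in range(iters):
        resp = []
        for k in counts:                    # E-step: P(variant | data)
            lv = math.comb(depth, k) * nraf**k * (1 - nraf)**(depth - k)
            le = math.comb(depth, k) * p_err**k * (1 - p_err)**(depth - k)
            resp.append(pi * lv / (pi * lv + (1 - pi) * le))
        pi = sum(resp) / len(resp)          # M-step: reweighted estimates
        nraf = sum(r * k for r, k in zip(resp, counts)) / (depth * sum(resp))
    return nraf, pi

# Four sites with ~20% non-reference reads (true variants at NRAF ~0.2)
# mixed with four near-zero sites explained by sequencing error.
counts = [18, 22, 20, 1, 0, 2, 19, 1]
nraf, pi = em_binomial_mixture(counts, depth=100)
```

The variant sites get responsibilities near 1 and the error sites near 0, so the estimate converges to the pooled frequency (18+22+20+19)/400 = 0.1975 with pi near 0.5.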