Computing Bits of Algebraic Numbers
We initiate the complexity-theoretic study of the problem of computing the
bits of (real) algebraic numbers. This extends the work of Yap on computing the
bits of transcendental numbers like $\pi$ in Logspace.
Our main result is that computing a bit of a fixed real algebraic number is
in $C_{=}NC^1 \subseteq$ Logspace when the bit position has a verbose (unary)
representation, and in the counting hierarchy when it has a succinct (binary)
representation.
Our tools are drawn from elementary analysis and numerical analysis, and
include the Newton-Raphson method. The proof of our main result is entirely
elementary, relying on the elementary Liouville's theorem rather than the much
deeper Roth's theorem for algebraic numbers.
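As a concrete illustration of the Newton-Raphson tool (a minimal sketch, not the paper's algorithm, and carrying none of its complexity guarantees), the following Python snippet computes bits of the algebraic number $\sqrt{2}$, a root of $x^2 - 2$, using exact rational arithmetic; the function name and the iteration count are illustrative choices of ours.

```python
# Minimal sketch (not the paper's algorithm): Newton-Raphson on
# f(x) = x^2 - 2 with exact rationals, so low-order bits are reliable.
from fractions import Fraction

def sqrt2_bit(i, extra=8):
    """Return the i-th bit after the binary point of sqrt(2), i >= 1."""
    x = Fraction(3, 2)                      # initial guess near the root
    # Newton's method roughly doubles the correct bits each step,
    # so O(log i) iterations suffice for i + extra correct bits.
    for _ in range((i + extra).bit_length()):
        x -= (x * x - 2) / (2 * x)          # x <- x - f(x) / f'(x)
    return int(x * 2**i) & 1                # read the i-th fractional bit

print([sqrt2_bit(i) for i in range(1, 9)])  # sqrt(2) = 1.01101010... in binary
```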
We leave the possibility of proving non-trivial lower bounds for the problem
of computing the bits of an algebraic number, given the bit position in binary,
as our main open question. In this direction, we make very limited progress by
proving a lower bound for rationals.
Testing Uniformity of Stationary Distribution
A random walk on a directed graph gives a Markov chain on the vertices of the
graph. An important question that often arises in the context of Markov chains
is whether the uniform distribution on the vertices of the graph is a
stationary distribution of the Markov chain. The stationary distribution of a
Markov chain is a global property of the graph. In this paper, we prove that
for a regular directed graph, whether the uniform distribution on the vertices
of the graph is a stationary distribution depends on a local property of the
graph, namely: if (u,v) is a directed edge, then outdegree(u) is equal to
indegree(v).
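The local criterion is easy to check directly. A small illustration (variable and function names are ours) for the random walk that moves from a vertex to a uniformly random out-neighbour:

```python
# Check the paper's local criterion: for a regular directed graph, the
# uniform distribution is stationary for the random walk iff, for every
# directed edge (u, v), outdegree(u) == indegree(v).
from collections import defaultdict

def uniform_is_stationary(edges):
    edges = list(edges)
    outdeg, indeg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return all(outdeg[u] == indeg[v] for u, v in edges)

# A directed 3-cycle: every vertex has outdegree = indegree = 1.
print(uniform_is_stationary([(0, 1), (1, 2), (2, 0)]))  # True
```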
This result also has an application to the problem of testing whether a given
distribution is uniform or "far" from uniform, a well-studied problem in
property testing and statistics. If the distribution is the stationary
distribution of the lazy random walk on a directed graph and the graph is given
as input, how many bits of the input graph does one need to query in order to
decide whether the distribution is uniform or "far" from it? This is a problem
of graph property testing, and we consider it in the orientation model
(introduced by Halevy et al.). We reduce this problem to testing (in the
orientation model) whether a directed graph is Eulerian. Using a result of
Fischer et al. on the query complexity of testing (in the orientation model)
whether a graph is Eulerian, we obtain bounds on the query complexity of
testing whether the stationary distribution is uniform.
Efficient Compression Technique for Sparse Sets
Recent technological advancements have led to the generation of huge amounts
of data over the web, such as text, images, audio, and video. Most of this data
is high dimensional and sparse; e.g., the bag-of-words representation used for
representing text. In many applications, such as clustering, nearest neighbour
search, ranking, and indexing, an efficient search for similar data points
needs to be performed. Even though there have been significant increases in
computational power, a simple brute-force similarity search on such datasets is
inefficient and at times impossible. Thus, it is desirable to get a compressed
representation which preserves the similarity between data points. In this
work, we consider the data points as sets and use Jaccard similarity as the
similarity measure. Compression techniques are generally evaluated on the
following parameters: 1) the randomness required for compression, 2) the time
required for compression, 3) the dimension of the data after compression, and
4) the space required to store the compressed data. Ideally, the compressed
representation of the data should be such that the similarity between each pair
of data points is preserved, while keeping the time and the randomness required
for compression as low as possible.
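For reference, the similarity measure being preserved is the Jaccard similarity of sets, J(A, B) = |A ∩ B| / |A ∪ B|; a small Python illustration:

```python
# Jaccard similarity of two sets: |A intersection B| / |A union B|.
def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0

doc1 = {"the", "cat", "sat"}   # toy bag-of-words sets
doc2 = {"the", "cat", "ran"}
print(jaccard(doc1, doc2))     # 2 / 4 = 0.5
```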
We show that the compression technique suggested by Pratap and Kulkarni also
works well for Jaccard similarity. We present a theoretical proof of this and
complement it with rigorous experiments on synthetic as well as real-world
datasets. We also compare our results with the state-of-the-art "min-wise
independent permutation" and show that our compression algorithm achieves
almost equal accuracy while significantly reducing the compression time and the
randomness required.
Improved Outlier Robust Seeding for k-means
The $k$-means is a popular clustering objective, although it is inherently
non-robust and sensitive to outliers. Its popular seeding or initialization,
called $k$-means++, uses $D^2$ sampling and comes with a provable $O(\log k)$
approximation guarantee \cite{AV2007}. However, in the presence of adversarial
noise or outliers, $D^2$ sampling is more likely to pick centers from distant
outliers instead of inlier clusters, and therefore its approximation guarantee
\textit{w.r.t.} the $k$-means solution on the inliers does not hold.
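For context, here is a minimal sketch of the standard, non-robust $D^2$ seeding of $k$-means++ (the baseline that this paper's variant modifies, not the proposed robust procedure); sampling proportional to the squared distance from the current centers is precisely why distant outliers are over-sampled.

```python
# Standard k-means++ D^2 seeding (the non-robust baseline): each new
# center is sampled with probability proportional to the squared
# distance to the nearest center chosen so far.
import numpy as np

def d2_seeding(X, k, rng=np.random.default_rng(0)):
    centers = [X[rng.integers(len(X))]]              # first center: uniform
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```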
Assuming that the outliers constitute a constant fraction of the given data,
we propose a simple variant of the $D^2$ sampling distribution, which makes it
robust to the outliers. Our algorithm runs in time linear in the size of the
data, outputs $O(k)$ clusters, discards marginally more points than the optimal
number of outliers, and comes with a provable approximation guarantee.
Our algorithm can also be modified to output exactly $k$ clusters instead of
$O(k)$ clusters, while keeping its running time linear in $n$ and $d$. This is
an improvement over previous results for robust $k$-means based on LP
relaxation and rounding \cite{Charikar}, \cite{KrishnaswamyLS18} and
\textit{robust $k$-means++} \cite{DeshpandeKP20}. Our empirical results show
the advantage of our algorithm over $k$-means++~\cite{AV2007}, uniform random
seeding, greedy sampling for $k$-means~\cite{tkmeanspp}, and robust
$k$-means++~\cite{DeshpandeKP20}, on standard real-world and synthetic data
sets used in previous work. Our proposal is easily amenable to scalable,
faster, parallel implementations of $k$-means++ \cite{Bahmani,BachemL017} and
is of independent interest for coreset constructions in the presence of
outliers \cite{feldman2007ptas,langberg2010universal,feldman2011unified}.
Minwise-Independent Permutations with Insertion and Deletion of Features
In their seminal work, Broder \textit{et al.}~\citep{BroderCFM98} introduce
the minwise hashing (minhash) algorithm, which computes a low-dimensional
sketch of high-dimensional binary data that closely approximates pairwise
Jaccard similarity. Since its invention, minhash has been commonly used by
practitioners in various big data applications. Further, the data is dynamic in
many real-life scenarios, and feature sets evolve over time. We consider the
case when features are dynamically inserted and deleted in the dataset. We note
that a naive solution to this problem is to repeatedly recompute minhash with
respect to the updated dimension. However, this is an expensive task, as it
requires generating fresh random permutations. To the best of our knowledge, no
systematic study of minhash is recorded in the context of dynamic insertion and
deletion of features. In this work, we initiate this study and suggest
algorithms that make the minhash sketches adaptable to the dynamic insertion
and deletion of features. We give a rigorous theoretical analysis of our
algorithms and complement it with extensive experiments on several real-world
datasets. Empirically, we observe a significant speed-up in running time while
offering performance comparable to running minhash from scratch. Our proposal
is efficient, accurate, and easy to implement in practice.
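A hedged, simplified illustration of the static sketch that the paper makes dynamic: universal hash functions stand in for minwise-independent permutations, and the agreement rate of two sketches estimates Jaccard similarity. The function names and parameters below are ours, not the paper's.

```python
# Simplified minhash: random linear hash functions approximate random
# permutations; the fraction of agreeing sketch coordinates estimates
# the Jaccard similarity of the underlying sets.
import random

P = 2_147_483_647                      # a prime larger than the universe

def minhash_sketch(s, num_hashes=100, seed=42):
    rnd = random.Random(seed)
    params = [(rnd.randrange(1, P), rnd.randrange(P)) for _ in range(num_hashes)]
    return [min((a * x + b) % P for x in s) for a, b in params]

def estimate_jaccard(sig1, sig2):
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)

A, B = {1, 2, 3, 4}, {2, 3, 4, 5}      # true Jaccard = 3/5
print(estimate_jaccard(minhash_sketch(A), minhash_sketch(B)))
```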
Efficient Sketching Algorithm for Sparse Binary Data
Recent advancements in the WWW, IoT, social networks, e-commerce, etc. have
generated a large volume of data. These datasets are mostly high dimensional
and sparse. Many fundamental subroutines of common data-analytic tasks, such as
clustering, classification, ranking, and nearest neighbour search, scale poorly
with the dimension of the dataset. In this work, we address this problem and
propose a sketching (alternatively, dimensionality reduction) algorithm --
\binsketch (Binary Data Sketch) -- for sparse binary datasets. \binsketch
preserves the binary nature of the dataset after sketching and maintains
estimates for multiple similarity measures, such as Jaccard, Cosine, and
Inner-Product similarities and Hamming distance, on the same sketch. We present
a theoretical analysis of our algorithm and complement it with extensive
experiments on several real-world datasets. We compare the performance of our
algorithm with the state-of-the-art algorithms on mean-square-error and ranking
tasks. Our proposed algorithm offers comparable accuracy while achieving a
significant speedup in dimensionality reduction time with respect to the other
candidate algorithms. Our proposal is simple and easy to implement, and can
therefore be adopted in practice.
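As a rough illustration of how a sketch can stay binary (our reading of a bucket-and-OR idea; the actual \binsketch construction and its similarity estimators are specified in the paper), each coordinate is hashed to a bucket and each bucket stores the OR of the bits mapped to it:

```python
# Hedged sketch of bucket-and-OR binary compression: hash each of the
# d coordinates to one of n buckets; a bucket holds the OR of its bits,
# so the compressed vector is itself binary.
import random

def binary_sketch(nonzero_coords, d, n, seed=7):
    rnd = random.Random(seed)
    bucket = [rnd.randrange(n) for _ in range(d)]   # random map [d] -> [n]
    sketch = [0] * n
    for i in nonzero_coords:                        # sparse input: 1-bits only
        sketch[bucket[i]] = 1
    return sketch

print(binary_sketch({3, 10, 57}, d=100, n=16))
```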
One-Pass Additive-Error Subset Selection for ℓ_p Subspace Approximation
We consider the problem of subset selection for ℓ_p subspace approximation, that is, to efficiently find a small subset of data points such that solving the problem optimally for this subset gives a good approximation to solving the problem optimally for the original input. Previously known subset selection algorithms based on volume sampling and adaptive sampling [Deshpande and Varadarajan, 2007], for the general case of p ∈ [1, ∞), require multiple passes over the data. In this paper, we give a one-pass subset selection with an additive approximation guarantee for ℓ_p subspace approximation, for any p ∈ [1, ∞). Earlier subset selection algorithms that give a one-pass multiplicative (1+ε) approximation work only in special cases. Cohen et al. [Michael B. Cohen et al., 2017] give a one-pass subset selection that offers a multiplicative (1+ε) approximation guarantee for the special case of ℓ_∞ subspace approximation. Mahabadi et al. [Sepideh Mahabadi et al., 2020] give a one-pass noisy subset selection with a (1+ε) approximation guarantee for ℓ_p subspace approximation when p ∈ {1, 2}. Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any p ∈ [1, ∞).
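As a purely illustrative sketch of the flavour of such an algorithm (an assumption of ours, not the paper's procedure or analysis): one classical route to one-pass subset selection is to sample rows with probability proportional to ||a_i||_p^p via weighted reservoir sampling. All names and the sampling distribution below are hypothetical.

```python
# Hypothetical one-pass sampler in the spirit of additive-error subset
# selection: stream the rows once, keep s rows sampled with probability
# proportional to ||a_i||_p^p using Efraimidis-Spirakis reservoir keys.
# The paper's actual distribution and guarantees may differ.
import heapq
import numpy as np

def one_pass_subset(rows, s, p, rng=np.random.default_rng(0)):
    reservoir = []                               # min-heap of (key, index, row)
    for idx, a in enumerate(rows):               # single pass over the stream
        w = np.linalg.norm(a, ord=p) ** p        # sampling weight ||a||_p^p
        if w == 0:
            continue
        key = rng.random() ** (1.0 / w)          # larger key = more likely kept
        if len(reservoir) < s:
            heapq.heappush(reservoir, (key, idx, a))
        elif key > reservoir[0][0]:
            heapq.heapreplace(reservoir, (key, idx, a))
    return [a for _, _, a in reservoir]
```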
Optical coherence tomography and subclinical optic neuritis in longitudinally extensive transverse myelitis
Objective: The aim is to compare the retinal nerve fiber layer (RNFL) thickness of longitudinally extensive transverse myelitis (LETM) eyes without previous optic neuritis with that of healthy control subjects. Methods: Twenty LETM eyes and 20 normal control eyes were included in the study and subjected to optical coherence tomography to evaluate and compare the RNFL thickness. Results: Significant RNFL thinning was observed at the 8 o'clock position in LETM eyes as compared to the control eyes (P = 0.038). No significant differences were seen in other RNFL measurements. Conclusion: Even in the absence of previous optic neuritis, LETM can lead to subclinical axonal damage, causing focal RNFL thinning.