Deterministic algorithms for skewed matrix products
Recently, Pagh presented a randomized approximation algorithm for the
multiplication of real-valued matrices building upon work for detecting the
most frequent items in data streams. We continue this line of research and
present new deterministic matrix multiplication algorithms.
Motivated by applications in data mining, we first consider the case of
real-valued, nonnegative n-by-n input matrices A and B, and show how to
obtain a deterministic approximation of the weights of individual entries, as
well as the entrywise p-norm, of the product AB. The algorithm is simple,
space efficient and runs in one pass over the input matrices. For a user
defined the algorithm runs in time and space and returns an approximation of the
entries of within an additive factor of , where is the entrywise 1-norm of a matrix and
is the time required to sort real numbers in linear space.
Building upon a result by Berinde et al. we show that for skewed matrix
products (a common situation in many real-life applications) the algorithm is
more efficient and achieves better approximation guarantees than previously
known randomized algorithms.
When the input matrices are not restricted to nonnegative entries, we present
a new deterministic group testing algorithm detecting nonzero entries in the
matrix product with large absolute value. The algorithm is clearly outperformed
by randomized matrix multiplication algorithms, but as a byproduct we obtain
the first -time deterministic algorithm for matrix
products with nonzero entries
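One reason a single pass can suffice for nonnegative inputs is a simple identity: if A and B have no negative entries, the entrywise 1-norm of AB equals the sum, over the inner index k, of (sum of column k of A) times (sum of row k of B). The Python sketch below only checks this identity on dense lists. It is not the algorithm from the abstract, and the function names are hypothetical.

```python
def entrywise_one_norm_of_product(A, B):
    """Entrywise 1-norm of A @ B for nonnegative A and B, without forming A @ B."""
    inner = len(B)  # number of columns of A == number of rows of B
    col_sums = [sum(row[k] for row in A) for k in range(inner)]
    row_sums = [sum(B[k]) for k in range(inner)]
    return sum(c * r for c, r in zip(col_sums, row_sums))


def naive_product(A, B):
    """Plain cubic-time product, used only to cross-check the identity."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]
```

The identity relies on nonnegativity: with mixed signs, terms can cancel inside an entry, and the column and row sums no longer determine the norm.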
LoNe Sampler: Graph node embeddings by coordinated local neighborhood sampling
Local graph neighborhood sampling is a fundamental computational problem that
is at the heart of algorithms for node representation learning. Several works
have presented algorithms for learning discrete node embeddings where graph
nodes are represented by discrete features such as attributes of neighborhood
nodes. Discrete embeddings offer several advantages compared to continuous
word2vec-like node embeddings: ease of computation, scalability, and
interpretability. We present LoNe Sampler, a suite of algorithms for generating
discrete node embeddings by Local Neighborhood Sampling, and address two
shortcomings of previous work. First, our algorithms have rigorously understood
theoretical properties. Second, we show how to generate approximate explicit
vector maps that avoid the expensive computation of a Gram matrix for the
training of a kernel model. Experiments on benchmark datasets confirm the
theoretical findings and demonstrate the advantages of the proposed methods.
Comment: Accepted to AAAI 2023. arXiv admin note: substantial text overlap
with arXiv:2102.0477
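A common way to coordinate samples across nodes is min-wise sampling: every node receives one shared random rank, and each node samples the lowest-ranked node within a fixed number of hops. Nodes with overlapping neighborhoods then tend to pick the same sample. The Python sketch below illustrates that generic idea. Whether it matches any of the samplers in the paper is an assumption, and `coordinated_samples` and its parameters are hypothetical names.

```python
import random


def coordinated_samples(adj, hops, seed=0):
    """For each node, return the minimum-rank node within `hops` hops (itself included).

    adj: dict mapping node -> iterable of neighbor nodes (undirected graph).
    """
    rng = random.Random(seed)
    rank = {u: rng.random() for u in adj}  # one shared random rank per node
    sample = {u: u for u in adj}           # zero hops: every node samples itself
    for _ in range(hops):
        updated = {}
        for u in adj:
            # Each round extends the reach of every node's sample by one hop.
            candidates = [sample[u]] + [sample[v] for v in adj[u]]
            updated[u] = min(candidates, key=rank.__getitem__)
        sample = updated
    return sample
```

A discrete feature for a node could then be an attribute of its sampled node, and repeating the procedure with different seeds would give several coordinates.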
On Parallelizing Matrix Multiplication by the Column-Row Method
We consider the problem of sparse matrix multiplication by the column-row method in a distributed setting where the matrix product is not necessarily sparse. We present a surprisingly simple method for “consistent” parallel processing of sparse outer products (column-row vector products) over several processors, in a communication-avoiding setting where each processor has a copy of the input. The method is consistent in the sense that a given output entry is always assigned to the same processor independently of the specific structure of the outer product. We show guarantees on the work done by each processor, and achieve linear speedup down to the point where the cost is dominated by reading the input. Our method gives a way of distributing (or parallelizing) matrix product computations in settings where the main bottlenecks are storing the result matrix and inter-processor communication. Motivated by observations on real data that often the absolute values of the entries in the product adhere to a power law, we combine our approach with frequent items mining algorithms and show how to obtain a tight approximation of the weight of the heaviest entries in the product matrix. As a case study we present the application of our approach to frequent pair mining in transactional data streams, a problem that can be phrased in terms of sparse {0, 1} integer matrix multiplication by the column-row method. Experimental evaluation of the proposed method on real-life data supports the theoretical findings.
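The column-row method writes the product as a sum of outer products, one per inner index k: column k of the first matrix times row k of the second. The Python sketch below simulates this sequentially. The `owner` rule is a hypothetical stand-in for a consistent assignment, chosen only because it depends on the output coordinates alone. It is not the assignment scheme from the paper and comes with no load-balancing guarantee.

```python
from collections import defaultdict


def owner(i, j, num_procs):
    # Hypothetical rule: the processor depends only on the output position (i, j),
    # never on which outer product produced the contribution.
    return (i * 1_000_003 + j) % num_procs


def column_row_product(a_cols, b_rows, num_procs):
    """Sum of sparse outer products, bucketed by owning processor.

    a_cols[k]: dict row index -> value, the nonzeros of column k of A.
    b_rows[k]: dict column index -> value, the nonzeros of row k of B.
    Returns one dict per simulated processor mapping (i, j) -> partial sum.
    """
    partial = [defaultdict(float) for _ in range(num_procs)]
    for a_col, b_row in zip(a_cols, b_rows):
        for i, a in a_col.items():
            for j, b in b_row.items():
                partial[owner(i, j, num_procs)][(i, j)] += a * b
    return partial
```

Because every output entry has a single owner, the per-processor dictionaries are disjoint and can be combined without any summation across processors. In a real deployment each processor would skip the pairs it does not own instead of sharing one loop.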
Triangle Counting in Dynamic Graph Streams
Estimating the number of triangles in graph streams using a limited amount of
memory has become a popular topic in the last decade. Different variations of
the problem have been studied, depending on whether the graph edges are
provided in an arbitrary order or as incidence lists. However, with a few
exceptions, the algorithms have considered insert-only streams. We
present a new algorithm estimating the number of triangles in dynamic
graph streams where edges can be both inserted and deleted. We show that our
algorithm achieves better time and space complexity than previous solutions for
various graph classes, for example sparse graphs with a relatively small number
of triangles. Also, for graphs with constant transitivity coefficient, a common
situation in real graphs, this is the first algorithm achieving constant
processing time per edge. The result is achieved by a novel approach combining
sampling of vertex triples and sparsification of the input graph. In the course
of the analysis of the algorithm we present a lower bound on the number of
pairwise independent 2-paths in general graphs which might be of independent
interest. At the end of the paper we discuss lower bounds on the space
complexity of triangle counting algorithms that make no assumptions on the
structure of the graph.
Comment: New version of a SWAT 2014 paper with improved result
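To make the dynamic setting concrete, here is a small exact baseline in Python that keeps the triangle count current under edge insertions and deletions. It stores the whole graph, so it uses memory linear in the number of edges and is not the sampling-based streaming algorithm from the abstract. The class name is hypothetical. A baseline like this can be useful for checking an estimator on small inputs.

```python
from collections import defaultdict


class DynamicTriangleCounter:
    """Exact triangle count for an undirected simple graph under edge updates."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.triangles = 0

    def insert(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate insertions
        # Every common neighbor of u and v closes one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        if v not in self.adj[u]:
            return  # ignore deletions of absent edges
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Every common neighbor of u and v formed a triangle with the removed edge.
        self.triangles -= len(self.adj[u] & self.adj[v])
```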
An exact exponential time algorithm for counting bipartite cliques
We present a simple exact algorithm for counting bicliques of given size in a bipartite graph on n vertices. We achieve running time of O(1.2491^n), improving upon known exact algorithms for finding and counting bipartite cliques.
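As a point of comparison, the counting problem has a direct brute-force solution: enumerate every set of a left-side vertices, intersect their neighborhoods, and count the ways to choose b right-side vertices from the intersection. The Python sketch below does exactly that. It is a naive baseline, not the algorithm from the abstract, and the function name and input layout are assumptions.

```python
from itertools import combinations
from math import comb


def count_bicliques(left_adj, a, b):
    """Count complete bipartite subgraphs with `a` left and `b` right vertices.

    left_adj: dict mapping each left vertex -> set of its right-side neighbors.
    Requires a >= 1.
    """
    total = 0
    for subset in combinations(left_adj, a):
        # Right vertices adjacent to every left vertex in the chosen subset.
        common = set.intersection(*(set(left_adj[u]) for u in subset))
        total += comb(len(common), b)
    return total
```

The loop visits every a-subset of the left side, so the running time grows with the binomial coefficient of the left-side size and a, which is exponential in the worst case.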