Deterministic algorithms for skewed matrix products

Author: Kutzkov Konstantin
Publication date: 20/09/2012

Recently, Pagh presented a randomized approximation algorithm for the multiplication of real-valued matrices building upon work for detecting the most frequent items in data streams. We continue this line of research and present new deterministic matrix multiplication algorithms. Motivated by applications in data mining, we first consider the case of real-valued, nonnegative $n$-by-$n$ input matrices $A$ and $B$, and show how to obtain a deterministic approximation of the weights of individual entries, as well as the entrywise $p$-norm, of the product $AB$. The algorithm is simple, space efficient and runs in one pass over the input matrices. For a user-defined $b \in (0, n^2)$ the algorithm runs in time $O(nb + n\cdot\text{Sort}(n))$ and space $O(n + b)$ and returns an approximation of the entries of $AB$ within an additive error of $\|AB\|_{E1}/b$, where $\|C\|_{E1} = \sum_{i,j} |C_{ij}|$ is the entrywise 1-norm of a matrix $C$ and $\text{Sort}(n)$ is the time required to sort $n$ real numbers in linear space. Building upon a result by Berinde et al. we show that for skewed matrix products (a common situation in many real-life applications) the algorithm is more efficient and achieves better approximation guarantees than previously known randomized algorithms. When the input matrices are not restricted to nonnegative entries, we present a new deterministic group testing algorithm detecting nonzero entries in the matrix product with large absolute value. The algorithm is clearly outperformed by randomized matrix multiplication algorithms, but as a byproduct we obtain the first $O(n^{2 + \varepsilon})$-time deterministic algorithm for matrix products with $O(\sqrt{n})$ nonzero entries.
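To make the frequent-items connection concrete, here is a minimal Python sketch, assuming nonnegative inputs given as lists of lists. It streams every nonzero term $A_{ik}B_{kj}$ of the column-row (outer-product) expansion of $AB$ into a weighted Misra-Gries summary with at most $b$ counters, so each estimate undercounts the true entry by at most $\|AB\|_{E1}/(b+1)$. The helper names `mg_update` and `approx_product` are hypothetical, and this is not the paper's algorithm: it enumerates every nonzero term, so it does not achieve the stated $O(nb + n\cdot\text{Sort}(n))$ running time.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, int]

def mg_update(counters: Dict[Entry, float], key: Entry, w: float, b: int) -> None:
    """Weighted Misra-Gries update that keeps at most b counters (w > 0)."""
    if key in counters:
        counters[key] += w
        return
    if len(counters) < b:
        counters[key] = w
        return
    # All b counters are in use: subtract the smallest of the b + 1 values
    # (b counters plus the incoming weight) from each of them.
    m = min(min(counters.values()), w)
    for other in list(counters):
        counters[other] -= m
        if counters[other] <= 0:
            del counters[other]
    w -= m
    if w > 0:  # a counter was freed, so the remainder fits
        counters[key] = w

def approx_product(A: List[List[float]], B: List[List[float]], b: int) -> Dict[Entry, float]:
    """Underestimates of the entries of AB for nonnegative n-by-n A and B."""
    n = len(A)
    counters: Dict[Entry, float] = {}
    for k in range(n):  # outer product of column k of A and row k of B
        col = [(i, A[i][k]) for i in range(n) if A[i][k] > 0]
        row = [(j, B[k][j]) for j in range(n) if B[k][j] > 0]
        for i, a_ik in col:
            for j, b_kj in row:
                mg_update(counters, (i, j), a_ik * b_kj, b)
    return counters
```

Entries missing from the returned dictionary are estimated as 0. Because the total decrement is at most $\|AB\|_{E1}/(b+1)$, only entries heavier than that threshold are guaranteed to survive, which is why skewed products are the favorable case.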

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

The IT University of Copenhagen's Repository

LoNe Sampler: Graph node embeddings by coordinated local neighborhood sampling

Author: Kutzkov Konstantin
Publication date: 28/11/2022

Local graph neighborhood sampling is a fundamental computational problem that is at the heart of algorithms for node representation learning. Several works have presented algorithms for learning discrete node embeddings where graph nodes are represented by discrete features such as attributes of neighborhood nodes. Discrete embeddings offer several advantages compared to continuous word2vec-like node embeddings: ease of computation, scalability, and interpretability. We present LoNe Sampler, a suite of algorithms for generating discrete node embeddings by Local Neighborhood Sampling, and address two shortcomings of previous work. First, our algorithms have rigorously understood theoretical properties. Second, we show how to generate approximate explicit vector maps that avoid the expensive computation of a Gram matrix for the training of a kernel model. Experiments on benchmark datasets confirm the theoretical findings and demonstrate the advantages of the proposed methods. Comment: Accepted to AAAI 2023. arXiv admin note: substantial text overlap with arXiv:2102.0477
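Coordinated sampling is often implemented with min-wise hashing: every node uses the same hash functions, so nodes with overlapping neighborhoods tend to draw the same sample. The Python sketch below assumes that scheme and is not claimed to be LoNe Sampler's exact procedure. For each of `d` hash functions it finds the node with the smallest hash value within `hops` hops of every node and records that node's label as one discrete feature. The function names and the linear hash family are illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1 used by the toy hash family

def make_hash_params(d: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw d (a, c) pairs for hypothetical hashes h(u) = (a*u + c) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(d)]

def discrete_embeddings(adj: Dict[int, List[int]], labels: Dict[int, str],
                        hops: int, d: int, seed: int = 0) -> Dict[int, List[str]]:
    """For each node, return d labels sampled from its hops-hop neighborhood."""
    emb: Dict[int, List[str]] = {v: [] for v in adj}
    for a, c in make_hash_params(d, seed):
        # (hash value, node) pairs; ties on the hash are broken by node id.
        best = {v: ((a * v + c) % PRIME, v) for v in adj}
        # After r rounds, best[v] is the minimum over all nodes within r hops of v.
        for _ in range(hops):
            best = {v: min([best[v]] + [best[u] for u in adj[v]]) for v in adj}
        for v in adj:
            emb[v].append(labels[best[v][1]])
    return emb
```

Because all nodes share the hash functions, two nodes with identical `hops`-hop neighborhoods receive identical embeddings, and with idealized random hash functions the probability that a coordinate agrees is at least the Jaccard similarity of the two neighborhoods.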

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

On Parallelizing Matrix Multiplication by the Column-Row Method

Author: Andrea Campagna
Konstantin Kutzkov
Rasmus Pagh
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 19/11/2012

We consider the problem of sparse matrix multiplication by the column-row method in a distributed setting where the matrix product is not necessarily sparse. We present a surprisingly simple method for “consistent” parallel processing of sparse outer products (column-row vector products) over several processors, in a communication-avoiding setting where each processor has a copy of the input. The method is consistent in the sense that a given output entry is always assigned to the same processor independently of the specific structure of the outer product. We show guarantees on the work done by each processor, and achieve linear speedup down to the point where the cost is dominated by reading the input. Our method gives a way of distributing (or parallelizing) matrix product computations in settings where the main bottlenecks are storing the result matrix and inter-processor communication. Motivated by the observation that in real data the absolute values of the entries in the product often adhere to a power law, we combine our approach with frequent items mining algorithms and show how to obtain a tight approximation of the weight of the heaviest entries in the product matrix. As a case study we present the application of our approach to frequent pair mining in transactional data streams, a problem that can be phrased in terms of sparse {0, 1} integer matrix multiplication by the column-row method. Experimental evaluation of the proposed method on real-life data supports the theoretical findings.
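A minimal Python sketch of consistent assignment, assuming sparse inputs stored as `A_cols[k] = {i: A[i][k]}` and `B_rows[k] = {j: B[k][j]}`. Each output entry $(i, j)$ is owned by processor $(h_1(i) + h_2(j)) \bmod P$, which does not depend on which outer product $k$ is being processed, and bucketing each row of $B$ by $h_2$ lets a processor multiply only the pairs it owns. The additive hash split, the toy hash `h`, and its constants are illustrative assumptions, not necessarily the scheme used in the paper.

```python
from collections import defaultdict
from typing import Dict, Tuple

PRIME = 2_147_483_647

def h(x: int, a: int, c: int, P: int) -> int:
    """Toy hash of an index into {0, ..., P-1} (hypothetical constants a, c)."""
    return ((a * x + c) % PRIME) % P

def worker(q: int, A_cols: Dict[int, Dict[int, float]],
           B_rows: Dict[int, Dict[int, float]], P: int) -> Dict[Tuple[int, int], float]:
    """Accumulate exactly the entries (i, j) of AB owned by processor q."""
    out: Dict[Tuple[int, int], float] = defaultdict(float)
    for k, col in A_cols.items():
        # Group the nonzeros of row k of B by h2(j).
        buckets = defaultdict(list)
        for j, b_kj in B_rows.get(k, {}).items():
            buckets[h(j, 16807, 7, P)].append((j, b_kj))
        for i, a_ik in col.items():
            # (h1(i) + h2(j)) mod P == q  <=>  h2(j) == (q - h1(i)) mod P
            target = (q - h(i, 48271, 11, P)) % P
            for j, b_kj in buckets.get(target, []):
                out[(i, j)] += a_ik * b_kj
    return dict(out)
```

Since ownership depends only on $(i, j)$, every partial product for a given entry lands on the same processor, so the per-processor results are disjoint and need no merging step beyond collecting them.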

arXiv.org e-Print Archive

CiteSeerX

Crossref

The IT University of Copenhagen's Repository

Triangle Counting in Dynamic Graph Streams

Author: A Pagh
A Pavan
CE Tsourakakis
I Kremer
JW Berry
Konstantin Kutzkov
L Becchetti
Laurent Bulteau
LJ Carter
M Pǎtraşcu
M Thorup
MN Kolountzakis
N Alon
R Albert
R Pagh
Rasmus Pagh
S Muthukrishnan
Vincent Froese
Publication venue: Springer Science and Business Media LLC
Publication date: 14/07/2015

Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary order or as incidence lists. However, with a few exceptions, the algorithms have considered insert-only streams. We present a new algorithm estimating the number of triangles in dynamic graph streams where edges can be both inserted and deleted. We show that our algorithm achieves better time and space complexity than previous solutions for various graph classes, for example sparse graphs with a relatively small number of triangles. Also, for graphs with constant transitivity coefficient, a common situation in real graphs, this is the first algorithm achieving constant processing time per edge. The result is achieved by a novel approach combining sampling of vertex triples and sparsification of the input graph. In the course of the analysis of the algorithm we present a lower bound on the number of pairwise independent 2-paths in general graphs which might be of independent interest. At the end of the paper we discuss lower bounds on the space complexity of triangle counting algorithms that make no assumptions on the structure of the graph. Comment: New version of a SWAT 2014 paper with improved result.
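The paper combines vertex-triple sampling with sparsification; the Python sketch below illustrates only the sparsification half, to show why deletions are easy to support. Whether an edge is kept is decided by a hash of the edge itself (kept with probability `p`), so a deletion always meets the same decision as the earlier insertion, and under an idealized random hash the triangle count of the kept subgraph divided by $p^3$ is an unbiased estimate of the true count. It assumes a simple graph in which each edge is inserted at most once before being deleted. The class and method names are hypothetical, and this is not the algorithm from the paper.

```python
import hashlib
from itertools import combinations
from typing import Dict, Set

class SparsifiedTriangleCounter:
    """Hypothetical sketch: hash-based edge sparsification for a dynamic edge stream."""

    def __init__(self, p: float, seed: int = 0) -> None:
        self.p = p
        self.seed = seed
        self.adj: Dict[int, Set[int]] = {}

    def _keep(self, u: int, v: int) -> bool:
        # The decision depends only on the unordered edge and the seed, so an
        # insertion and a later deletion of the same edge always agree.
        a, b = min(u, v), max(u, v)
        digest = hashlib.sha256(f"{self.seed}:{a}:{b}".encode()).digest()
        # The top 53 bits give a float in [0, 1) without rounding up to 1.0.
        x = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
        return x < self.p

    def update(self, u: int, v: int, insert: bool = True) -> None:
        if not self._keep(u, v):
            return
        if insert:
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)
        else:
            self.adj.get(u, set()).discard(v)
            self.adj.get(v, set()).discard(u)

    def estimate(self) -> float:
        # Each triangle of the kept subgraph is seen once from each of its 3 corners.
        seen = 0
        for u, nbrs in self.adj.items():
            for v, w in combinations(nbrs, 2):
                if w in self.adj.get(v, set()):
                    seen += 1
        return (seen / 3) / self.p ** 3
```

The exhaustive count in `estimate` is only for illustration; the point is that the kept subgraph has about a $p$ fraction of the edges while still reflecting every insertion and deletion consistently.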

arXiv.org e-Print Archive

Crossref

The IT University of Copenhagen's Repository

STRIP: stream learning of influence probabilities

Author: Kutzkov Konstantin
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2013

The IT University of Copenhagen's Repository

An exact exponential time algorithm for counting bipartite cliques.

Author: Kutzkov Konstantin
Publication date: 01/01/2012

We present a simple exact algorithm for counting bicliques of given size in a bipartite graph on $n$ vertices. We achieve a running time of $O(1.2491^n)$, improving upon known exact algorithms for finding and counting bipartite cliques.
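For reference, a naive exact baseline (not the paper's $O(1.2491^n)$ algorithm) makes the counting problem precise. Assuming "given size" means complete bipartite subgraphs $K_{s,t}$ with $s$ vertices on the left side and $t$ on the right, it enumerates every $s$-subset of left vertices and counts the $t$-subsets of their common neighborhood with a binomial coefficient. The function name and input format are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Dict, Hashable, Set

def count_bicliques(adj: Dict[Hashable, Set[Hashable]], s: int, t: int) -> int:
    """Count copies of K_{s,t}: s left vertices, t right vertices, all edges present.

    adj maps each left vertex to its set of right neighbours; requires s >= 1.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    total = 0
    for S in combinations(adj, s):
        # Right vertices adjacent to every vertex in S.
        common = set.intersection(*(adj[u] for u in S))
        total += comb(len(common), t)  # comb returns 0 when t > len(common)
    return total
```

This baseline takes time exponential in the size of the left side, which is the kind of cost the paper's algorithm improves on.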

CiteSeerX

The IT University of Copenhagen's Repository

Improved counter based algorithms for frequent pairs mining in transactional data streams

Author: Kutzkov Konstantin
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2012

The IT University of Copenhagen's Repository