24 research outputs found

    Deterministic algorithms for skewed matrix products

    Get PDF
    Recently, Pagh presented a randomized approximation algorithm for the multiplication of real-valued matrices building upon work for detecting the most frequent items in data streams. We continue this line of research and present new {\em deterministic} matrix multiplication algorithms. Motivated by applications in data mining, we first consider the case of real-valued, nonnegative nn-by-nn input matrices AA and BB, and show how to obtain a deterministic approximation of the weights of individual entries, as well as the entrywise pp-norm, of the product ABAB. The algorithm is simple, space efficient and runs in one pass over the input matrices. For a user defined b(0,n2)b \in (0, n^2) the algorithm runs in time O(nb+nSort(n))O(nb + n\cdot\text{Sort}(n)) and space O(n+b)O(n + b) and returns an approximation of the entries of ABAB within an additive factor of ABE1/b\|AB\|_{E1}/b, where CE1=i,jCij\|C\|_{E1} = \sum_{i, j} |C_{ij}| is the entrywise 1-norm of a matrix CC and Sort(n)\text{Sort}(n) is the time required to sort nn real numbers in linear space. Building upon a result by Berinde et al. we show that for skewed matrix products (a common situation in many real-life applications) the algorithm is more efficient and achieves better approximation guarantees than previously known randomized algorithms. When the input matrices are not restricted to nonnegative entries, we present a new deterministic group testing algorithm detecting nonzero entries in the matrix product with large absolute value. The algorithm is clearly outperformed by randomized matrix multiplication algorithms, but as a byproduct we obtain the first O(n2+ε)O(n^{2 + \varepsilon})-time deterministic algorithm for matrix products with O(n)O(\sqrt{n}) nonzero entries

    LoNe Sampler: Graph node embeddings by coordinated local neighborhood sampling

    Full text link
    Local graph neighborhood sampling is a fundamental computational problem that is at the heart of algorithms for node representation learning. Several works have presented algorithms for learning discrete node embeddings where graph nodes are represented by discrete features such as attributes of neighborhood nodes. Discrete embeddings offer several advantages compared to continuous word2vec-like node embeddings: ease of computation, scalability, and interpretability. We present LoNe Sampler, a suite of algorithms for generating discrete node embeddings by Local Neighborhood Sampling, and address two shortcomings of previous work. First, our algorithms have rigorously understood theoretical properties. Second, we show how to generate approximate explicit vector maps that avoid the expensive computation of a Gram matrix for the training of a kernel model. Experiments on benchmark datasets confirm the theoretical findings and demonstrate the advantages of the proposed methods.Comment: Accepted to AAAI 2023. arXiv admin note: substantial text overlap with arXiv:2102.0477

    On Parallelizing Matrix Multiplication by the Column-Row Method

    Full text link
    We consider the problem of sparse matrix multiplication by the column row method in a distributed setting where the matrix product is not necessarily sparse. We present a surprisingly simple method for “consistent ” parallel processing of sparse outer products (column-row vector products) over several processors, in a communication-avoiding setting where each processor has a copy of the input. The method is consistent in the sense that a given output entry is always assigned to the same processor independently of the specific structure of the outer product. We show guarantees on the work done by each processor, and achieve linear speedup down to the point where the cost is dominated by reading the input. Our method gives a way of distributing (or parallelizing) matrix product computations in settings where the main bottlenecks are storing the result matrix, and inter-processor communication. Motivated by observations on real data that often the absolute values of the entries in the product adhere to a power law, we combine our approach with frequent items mining algorithms and show how to obtain a tight approximation of the weight of the heaviest entries in the product matrix. As a case study we present the application of our approach to frequent pair mining in transactional data streams, a problem that can be phrased in terms of sparse {0, 1}integer matrix multiplication by the column-row method. Experimental evaluation of the proposed method on real-life data supports the theoretical findings.

    Triangle Counting in Dynamic Graph Streams

    Get PDF
    Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary order or as incidence lists. However, with a few exceptions, the algorithms have considered {\em insert-only} streams. We present a new algorithm estimating the number of triangles in {\em dynamic} graph streams where edges can be both inserted and deleted. We show that our algorithm achieves better time and space complexity than previous solutions for various graph classes, for example sparse graphs with a relatively small number of triangles. Also, for graphs with constant transitivity coefficient, a common situation in real graphs, this is the first algorithm achieving constant processing time per edge. The result is achieved by a novel approach combining sampling of vertex triples and sparsification of the input graph. In the course of the analysis of the algorithm we present a lower bound on the number of pairwise independent 2-paths in general graphs which might be of independent interest. At the end of the paper we discuss lower bounds on the space complexity of triangle counting algorithms that make no assumptions on the structure of the graph.Comment: New version of a SWAT 2014 paper with improved result

    STRIP: stream learning of influence probabilities

    Full text link

    An exact exponential time algorithm for counting bipartite cliques.

    Full text link
    We present a simple exact algorithm for counting bicliques of given size in a bipartite graph on n vertices. We achieve running time of O(1.2491 n), improving upon known exact algorithms for finding and counting bipartite cliques
    corecore