Search CORE

18,681 research outputs found

Cache-Oblivious Selection in Sorted X+Y Matrices

Author: de Berg Mark
Thite Shripad
Publication venue
Publication date: 01/01/2008
Field of study

Let X[0..n-1] and Y[0..m-1] be two sorted arrays, and define the mxn matrix A by A[j][i]=X[i]+Y[j]. Frederickson and Johnson gave an efficient algorithm for selecting the k-th smallest element from A. We show how to make this algorithm IO-efficient. Our cache-oblivious algorithm performs O((m+n)/B) IOs, where B is the block size of memory transfers

arXiv.org e-Print Archive

CiteSeerX

Pure OAI Repository

Caltech Authors

Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation

Author: Curtin Ryan
Sanderson Conrad
Publication venue: 'MDPI AG'
Publication date: 22/07/2019
Field of study

Despite the importance of sparse matrices in numerous fields of science, software implementations remain difficult to use for non-expert users, generally requiring the understanding of underlying details of the chosen sparse matrix storage format. In addition, to achieve good performance, several formats may need to be used in one program, requiring explicit selection and conversion between the formats. This can be both tedious and error-prone, especially for non-expert users. Motivated by these issues, we present a user-friendly and open-source sparse matrix class for the C++ language, with a high-level application programming interface deliberately similar to the widely used MATLAB language. This facilitates prototyping directly in C++ and aids the conversion of research code into production environments. The class internally uses two main approaches to achieve efficient execution: (i) a hybrid storage framework, which automatically and seamlessly switches between three underlying storage formats (compressed sparse column, Red-Black tree, coordinate list) depending on which format is best suited and/or available for specific operations, and (ii) a template-based meta-programming framework to automatically detect and optimise execution of common expression patterns. Empirical evaluations on large sparse matrices with various densities of non-zero elements demonstrate the advantages of the hybrid storage framework and the expression optimisation mechanism.Comment: extended and revised version of an earlier conference paper arXiv:1805.0338

arXiv.org e-Print Archive

University of Queensland eSpace

Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation

Author: Nunez-Iglesias
Stroustrup
Abrahams
Vandevoorde
Saad
Eaton
Duff
Bai
Lehoucq
Lanckriet
Cormen
Anderson
Davis
St. Laurent
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref

VU Research Portal

EUR Research Repository

University of Melbourne Institutional Repository

University of Queensland eSpace

Provable Deterministic Leverage Score Sampling

Author: Boutsidis C.
Gittens A.
Hong Y. P.
Kunegis J.
Publication venue
Publication date: 02/06/2014
Field of study

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate". To obtain provable guarantees, previous work requires randomized sampling of the columns with probabilities proportional to their leverage scores. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such deterministic sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep power-law decay. We support this power-law assumption by providing empirical evidence that such decay laws are abundant in real-world data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which many times matches or outperforms the state-of-the-art techniques.Comment: 20th ACM SIGKDD Conference on Knowledge Discovery and Data Minin

arXiv.org e-Print Archive

Crossref

Visualizing dimensionality reduction of systems biology data

Author: A Hyvaerinen
A Hyvaerinen
A Inselberg
A Inselberg
A Saeed
Albert Pritzkau
Andreas Lehrmann
Aydin C. Polatkan
DH Jeong
DJ Lockhart
F Battke
F Battke
GH Golub
H Abdi
H Hotelling
HF Kaiser
J Shendure
JB Tenenbaum
K Pearson
Kay Nieselt
KQ Weinberger
LK Saul
M Fontes
M Harrower
M Schena
Michael Huber
P Mannfolk
R Karbauskaite
R Tarjan
S Roweis
Z Zhang
Ö Altug-Teber
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/06/2012
Field of study

One of the challenges in analyzing high-dimensional expression data is the detection of important biological signals. A common approach is to apply a dimension reduction method, such as principal component analysis. Typically, after application of such a method the data is projected and visualized in the new coordinate system, using scatter plots or profile plots. These methods provide good results if the data have certain properties which become visible in the new coordinate system and which were hard to detect in the original coordinate system. Often however, the application of only one method does not suffice to capture all important signals. Therefore several methods addressing different aspects of the data need to be applied. We have developed a framework for linear and non-linear dimension reduction methods within our visual analytics pipeline SpRay. This includes measures that assist the interpretation of the factorization result. Different visualizations of these measures can be combined with functional annotations that support the interpretation of the results. We show an application to high-resolution time series microarray data in the antibiotic-producing organism Streptomyces coelicolor as well as to microarray data measuring expression of cells with normal karyotype and cells with trisomies of human chromosomes 13 and 21

arXiv.org e-Print Archive

Crossref

Publikationsserver der Universität Tübingen

Attribute Value Reordering For Efficient Hybrid OLAP

Author: Kaser Owen
Lemire Daniel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

The normalization of a data cube is the ordering of the attribute values. For large multidimensional arrays where dense and sparse chunks are stored differently, proper normalization can lead to improved storage efficiency. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are nearly statistically independent, we show that dimension-wise attribute frequency sorting is an optimal normalization and takes time O(d n log(n)) for data cubes of size n^d. When dimensions are not independent, we propose and evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is already 19%-30% more efficient than ROLAP, but normalization can improve it further by 9%-13% for a total gain of 29%-44% over ROLAP

arXiv.org e-Print Archive

CiteSeerX

R-libre

Archipel - Université du Québec à Montréal