Improved balanced incomplete factorization
In this paper we improve the BIF algorithm, which computes simultaneously the LU
factors (direct factors) of a given matrix and their inverses (inverse factors). This algorithm was
introduced in [R. Bru, J. Marín, J. Mas, and M. Tůma, SIAM J. Sci. Comput., 30 (2008), pp. 2302–
2318]. The improvements are based on a deeper understanding of the inverse Sherman–Morrison
(ISM) decomposition, and they provide a new insight into the BIF decomposition. In particular,
it is shown that a slight algorithmic reformulation of the basic algorithm implies that the direct
and inverse factors numerically influence each other even without any dropping for incompleteness.
Algorithmically, the nonsymmetric version of the improved BIF algorithm is formulated. Numerical
experiments show very high robustness of the incomplete implementation of the algorithm used for
preconditioning nonsymmetric linear systems.

Received by the editors January 26, 2009; accepted for publication (in revised form) by V. Simoncini June 1, 2010; published electronically August 12, 2010. This work was supported by Spanish grant MTM 2007-64477, by project IAA100300802 of the Grant Agency of the Academy of Sciences of the Czech Republic, and partially also by the International Collaboration Support M100300902 of AS CR.

Bru García, R.; Marín Mateos-Aparicio, J.; Mas Marí, J.; Tůma, M. (2010). Improved balanced incomplete factorization. SIAM Journal on Matrix Analysis and Applications. 31(5):2431-2452. https://doi.org/10.1137/090747804
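The relationship between direct and inverse factors that the abstract refers to can be illustrated with a small NumPy sketch. This is plain Doolittle LU, not the BIF/ISM algorithm itself; the example matrix and the no-pivoting simplification are illustrative assumptions.

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU without pivoting (adequate for diagonally dominant A)."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]   # multiplier stored in L
            U[i, k:] -= L[i, k] * U[k, k:]  # eliminate below the pivot
    return L, U

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)  # shift makes pivoting safe
L, U = lu_nopivot(A)

# Direct factors: L, U with A = L U.
# Inverse factors: L^{-1}, U^{-1}, which give A^{-1} = U^{-1} L^{-1}.
assert np.allclose(L @ U, A)
assert np.allclose(np.linalg.inv(U) @ np.linalg.inv(L), np.linalg.inv(A))
```

In the incomplete setting, dropping small entries in both the direct and the inverse factors yields the preconditioner; the point of the paper is how those two sets of factors interact during the computation.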
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Minimizing Communication in Linear Algebra
In 1981 Hong and Kung proved a lower bound on the amount of communication
needed to perform dense matrix multiplication using the conventional
algorithm, where the input matrices were too large to fit in the small, fast
memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and
extended it to the parallel case. In both cases the lower bound may be
expressed as Ω(#arithmetic operations / √M), where M is the size
of the fast memory (or local memory in the parallel case). Here we generalize
these results to a much wider variety of algorithms, including LU
factorization, Cholesky factorization, LDL^T factorization, QR factorization,
algorithms for eigenvalues and singular values, i.e., essentially all direct
methods of linear algebra. The proof works for dense or sparse matrices, and
for sequential or parallel algorithms. In addition to lower bounds on the
amount of data moved (bandwidth) we get lower bounds on the number of messages
required to move it (latency). We illustrate how to extend our lower bound
technique to compositions of linear algebra operations (like computing powers
of a matrix), to decide whether it is enough to call a sequence of simpler
optimal algorithms (like matrix multiplication) to minimize communication, or
if we can do better. We give examples of both. We also show how to extend our
lower bounds to certain graph theoretic problems.
We point out recently designed algorithms for dense LU, Cholesky, QR,
eigenvalue and the SVD problems that attain these lower bounds; implementations
of LU and QR show large speedups over conventional linear algebra algorithms in
standard libraries like LAPACK and ScaLAPACK. Many open problems remain.
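The Ω(#arithmetic operations / √M) bound can be made concrete with a short worked example for conventional matrix multiplication. The values of n and M below are illustrative assumptions, not taken from the paper.

```python
import math

def bandwidth_lower_bound(flops, M):
    """Minimum words moved between slow and fast memory, up to a constant:
    Omega(#arithmetic operations / sqrt(M))."""
    return flops / math.sqrt(M)

def latency_lower_bound(flops, M):
    """Minimum number of messages: each message carries at most M words,
    so divide the bandwidth bound by M."""
    return bandwidth_lower_bound(flops, M) / M

n = 1024          # matrix dimension (assumed for illustration)
M = 1024          # words of fast (or per-processor local) memory
flops = 2 * n**3  # multiply-adds in conventional dense matmul

print(bandwidth_lower_bound(flops, M))  # 67108864.0 words moved
print(latency_lower_bound(flops, M))    # 65536.0 messages
```

Note how the bound improves with memory size: quadrupling M only halves the required data movement, which is why communication-avoiding algorithms aim to do Θ(√M) useful flops per word transferred.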
Principles for problem aggregation and assignment in medium scale multiprocessors
One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine-grained parallelism, and execution requirements that are either not predictable or too costly to predict. The main issues in mapping such a problem onto medium-scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared-memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
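The granularity tradeoff described above can be sketched with a toy simulation. This is a generic illustration of parameterized aggregation with cyclic assignment, not the paper's specific scheme; the task costs, chunk size g, and processor count P are all assumptions.

```python
import random

def imbalance(costs, g, P):
    """Group tasks into chunks of size g, assign chunks cyclically to P
    processors, and return max load / average load (1.0 = perfect balance)."""
    loads = [0.0] * P
    chunks = [costs[i:i + g] for i in range(0, len(costs), g)]
    for j, chunk in enumerate(chunks):
        loads[j % P] += sum(chunk)       # uniform cyclic mapping of chunks
    return max(loads) / (sum(costs) / P)

random.seed(1)
costs = [random.random() for _ in range(10000)]  # unknown per-task costs

# Finer granularity (small g) gives better balance but creates many more
# chunks, i.e. more communication/synchronization events per processor.
print("coarse (g=1000):", imbalance(costs, g=1000, P=4))
print("fine   (g=10):  ", imbalance(costs, g=10, P=4))
```

With 10 coarse chunks over 4 processors, two processors receive three chunks and two receive two, so the load ratio is roughly 1.2; with 1000 fine chunks the ratio is close to 1.0, at the cost of far more scheduling events.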