Improved balanced incomplete factorization
In this paper we improve the BIF algorithm, which computes simultaneously the LU
factors (direct factors) of a given matrix and their inverses (inverse factors). This algorithm was
introduced in [R. Bru, J. Marín, J. Mas, and M. Tůma, SIAM J. Sci. Comput., 30 (2008), pp. 2302–
2318]. The improvements are based on a deeper understanding of the inverse Sherman–Morrison
(ISM) decomposition, and they provide a new insight into the BIF decomposition. In particular,
it is shown that a slight algorithmic reformulation of the basic algorithm implies that the direct
and inverse factors numerically influence each other even without any dropping for incompleteness.
Algorithmically, the nonsymmetric version of the improved BIF algorithm is formulated. Numerical
experiments show very high robustness of the incomplete implementation of the algorithm used for
preconditioning nonsymmetric linear systems.

Received by the editors January 26, 2009; accepted for publication (in revised form) by V. Simoncini June 1, 2010; published electronically August 12, 2010. This work was supported by Spanish grant MTM 2007-64477, by project IAA100300802 of the Grant Agency of the Academy of Sciences of the Czech Republic, and partially also by the International Collaboration Support M100300902 of AS CR.

Bru García, R.; Marín Mateos-Aparicio, J.; Mas Marí, J.; Tůma, M. (2010). Improved balanced incomplete factorization. SIAM Journal on Matrix Analysis and Applications. 31(5):2431-2452. https://doi.org/10.1137/090747804
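The relationship between direct and inverse factors that the abstract refers to can be illustrated with a small NumPy sketch. This is plain Doolittle LU, not the BIF/ISM algorithm itself; the example matrix and the no-pivoting simplification are illustrative assumptions.

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU without pivoting (adequate for diagonally dominant A)."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]   # multiplier stored in L
            U[i, k:] -= L[i, k] * U[k, k:]  # eliminate below the pivot
    return L, U

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)  # shift makes pivoting safe
L, U = lu_nopivot(A)

# Direct factors: L, U with A = L U.
# Inverse factors: L^{-1}, U^{-1}, which give A^{-1} = U^{-1} L^{-1}.
assert np.allclose(L @ U, A)
assert np.allclose(np.linalg.inv(U) @ np.linalg.inv(L), np.linalg.inv(A))
```

In the incomplete setting, dropping small entries in both the direct and the inverse factors yields the preconditioner; the point of the paper is how those two sets of factors interact during the computation.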
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Minimizing Communication in Linear Algebra
In 1981 Hong and Kung proved a lower bound on the amount of communication
needed to perform dense matrix multiplication using the conventional
algorithm, where the input matrices were too large to fit in the small, fast
memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and
extended it to the parallel case. In both cases the lower bound may be
expressed as Ω(#arithmetic operations / √M), where M is the size
of the fast memory (or local memory in the parallel case). Here we generalize
these results to a much wider variety of algorithms, including LU
factorization, Cholesky factorization, LDL^T factorization, QR factorization,
algorithms for eigenvalues and singular values, i.e., essentially all direct
methods of linear algebra. The proof works for dense or sparse matrices, and
for sequential or parallel algorithms. In addition to lower bounds on the
amount of data moved (bandwidth) we get lower bounds on the number of messages
required to move it (latency). We illustrate how to extend our lower bound
technique to compositions of linear algebra operations (like computing powers
of a matrix), to decide whether it is enough to call a sequence of simpler
optimal algorithms (like matrix multiplication) to minimize communication, or
if we can do better. We give examples of both. We also show how to extend our
lower bounds to certain graph theoretic problems.
We point out recently designed algorithms for dense LU, Cholesky, QR,
eigenvalue and the SVD problems that attain these lower bounds; implementations
of LU and QR show large speedups over conventional linear algebra algorithms in
standard libraries like LAPACK and ScaLAPACK. Many open problems remain.
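The Ω(#arithmetic operations / √M) bound can be made concrete with a short worked example for conventional matrix multiplication. The values of n and M below are illustrative assumptions, not taken from the paper.

```python
import math

def bandwidth_lower_bound(flops, M):
    """Minimum words moved between slow and fast memory, up to a constant:
    Omega(#arithmetic operations / sqrt(M))."""
    return flops / math.sqrt(M)

def latency_lower_bound(flops, M):
    """Minimum number of messages: each message carries at most M words,
    so divide the bandwidth bound by M."""
    return bandwidth_lower_bound(flops, M) / M

n = 1024          # matrix dimension (assumed for illustration)
M = 1024          # words of fast (or per-processor local) memory
flops = 2 * n**3  # multiply-adds in conventional dense matmul

print(bandwidth_lower_bound(flops, M))  # 67108864.0 words moved
print(latency_lower_bound(flops, M))    # 65536.0 messages
```

Note how the bound improves with memory size: quadrupling M only halves the required data movement, which is why communication-avoiding algorithms aim to do Θ(√M) useful flops per word transferred.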
Principles for problem aggregation and assignment in medium scale multiprocessors
One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine-grained parallelism, and execution requirements that are either not predictable or too costly to predict. The main issues in mapping such a problem onto medium-scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared-memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
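The granularity tradeoff described above can be sketched with a toy simulation. This is a generic illustration of parameterized aggregation with cyclic assignment, not the paper's specific scheme; the task costs, chunk size g, and processor count P are all assumptions.

```python
import random

def imbalance(costs, g, P):
    """Group tasks into chunks of size g, assign chunks cyclically to P
    processors, and return max load / average load (1.0 = perfect balance)."""
    loads = [0.0] * P
    chunks = [costs[i:i + g] for i in range(0, len(costs), g)]
    for j, chunk in enumerate(chunks):
        loads[j % P] += sum(chunk)       # uniform cyclic mapping of chunks
    return max(loads) / (sum(costs) / P)

random.seed(1)
costs = [random.random() for _ in range(10000)]  # unknown per-task costs

# Finer granularity (small g) gives better balance but creates many more
# chunks, i.e. more communication/synchronization events per processor.
print("coarse (g=1000):", imbalance(costs, g=1000, P=4))
print("fine   (g=10):  ", imbalance(costs, g=10, P=4))
```

With 10 coarse chunks over 4 processors, two processors receive three chunks and two receive two, so the load ratio is roughly 1.2; with 1000 fine chunks the ratio is close to 1.0, at the cost of far more scheduling events.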