46 research outputs found

Combinatorial Preconditioners and Multilevel Solvers for Problems in Computer Vision and Image Processing

Author: A. Buades
B. Horn
E.G. Boman
J.W. Ruge
L. Grady
M. Bern
O. Axelsson
P. Bhat
P. Perona
R. Szeliski
R.R. Coifman
S. Yu
T. Chan
U. Trottenberg
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Large Scale Spectral Clustering Using Approximate Commute Time Embedding

Author: C. Fowlkes
D. Achlioptas
D. Mavroeidis
D.A. Spielman
F. Fouss
H. Qiu
I. Koutis
L. Wang
P.G. Doyle
U. von Luxburg
W.Y. Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Spectral clustering is a novel clustering method which can detect complex shapes of data clusters. However, it requires the eigendecomposition of the graph Laplacian matrix, which takes time proportional to $O(n^3)$ and thus is not suitable for large-scale systems. Recently, many methods have been proposed to reduce the computation time of spectral clustering. These approximate methods usually involve sampling techniques, by which much information in the original data may be lost. In this work, we propose a fast and accurate spectral clustering approach using an approximate commute time embedding, which is similar to the spectral embedding. The method requires neither sampling nor the computation of any eigenvector. Instead, it uses random projection and a linear-time solver to find the approximate embedding. Experiments on several synthetic and real datasets show that the proposed approach has better clustering quality and is faster than state-of-the-art approximate spectral clustering methods.
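To make the idea of an approximate commute time embedding concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the standard construction in which each embedding coordinate is obtained by solving one Laplacian system whose right-hand side is a random ±1 combination of weighted edge vectors. A textbook conjugate gradient routine stands in for the linear-time solver mentioned in the abstract, and the names `commute_time_embedding`, `solve_laplacian`, and `sq_dist` are hypothetical.

```python
import math
import random

def laplacian_matvec(n, edges, x):
    """Return L @ x for a weighted undirected graph given as (u, v, w) edges."""
    y = [0.0] * n
    for u, v, w in edges:
        d = w * (x[u] - x[v])
        y[u] += d
        y[v] -= d
    return y

def solve_laplacian(n, edges, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient for L x = b on a connected graph (b must sum to zero).

    Assumption: this simple solver is only a stand-in for a fast SDD solver.
    """
    x = [0.0] * n
    r = b[:]
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if math.sqrt(rs) < tol:
            break
        Lp = laplacian_matvec(n, edges, p)
        alpha = rs / sum(pi * lpi for pi, lpi in zip(p, Lp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * lpi for ri, lpi in zip(r, Lp)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def commute_time_embedding(n, edges, k, seed=0):
    """Map each node to a k-dimensional point whose squared distances
    approximate commute times (graph volume times effective resistance)."""
    rng = random.Random(seed)
    vol = 2.0 * sum(w for _, _, w in edges)
    scale = math.sqrt(vol / k)
    rows = []
    for _ in range(k):
        # Right-hand side: random +/-1 sign per edge, weighted by sqrt(w).
        y = [0.0] * n
        for u, v, w in edges:
            s = rng.choice((-1.0, 1.0)) * math.sqrt(w)
            y[u] += s
            y[v] -= s
        z = solve_laplacian(n, edges, y)
        rows.append([scale * zi for zi in z])
    # Transpose: one k-dimensional vector per node.
    return [[rows[i][j] for i in range(k)] for j in range(n)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Path graph 0 - 1 - 2 with unit weights: exact commute times are 4, 4 and 8.
path = [(0, 1, 1.0), (1, 2, 1.0)]
emb = commute_time_embedding(3, path, k=400)
print(sq_dist(emb[0], emb[1]), sq_dist(emb[0], emb[2]))
```

The rows of `emb` could then be passed to any standard clustering routine such as k-means. On this tiny path graph the estimate for adjacent nodes is exact, while the estimate for the two end nodes is only approximate and tightens as `k` grows.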

arXiv.org e-Print Archive

Crossref

NTX: An Energy-efficient Streaming Accelerator for Floating-point Generalized Reduction Workloads in 22 nm FD-SOI

Author: Benini L.
Schaffner M.
Schuiki F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Specialized coprocessors for Multiply-Accumulate (MAC) intensive workloads such as Deep Learning are becoming widespread in SoC platforms, from GPUs to mobile SoCs. In this paper we revisit NTX (an efficient accelerator developed for training Deep Neural Networks at scale) as a generalized MAC and reduction streaming engine. The architecture consists of a set of 32-bit floating-point streaming co-processors that are loosely coupled to a RISC-V core in charge of orchestrating data movement and computation. Post-layout results of a recent silicon implementation in 22 nm FD-SOI technology show the accelerator's capability to deliver up to 20 Gflop/s at 1.25 GHz and 168 mW. Based on these results we show that a version of NTX scaled down to 14 nm can achieve a 3× energy efficiency improvement over contemporary GPUs at 10.4× less silicon area, and a compute performance of 1.4 Tflop/s for training large state-of-the-art networks with full floating-point precision. An extended evaluation of MAC-intensive kernels shows that NTX can consistently achieve up to 87% of its peak performance across general reduction workloads beyond machine learning. Its modular architecture enables deployment at different scales, ranging from high-performance GPU-class to low-power embedded scenarios.
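The 22 nm figures quoted above imply an energy efficiency that the abstract does not state directly. A quick calculation, assuming the 20 Gflop/s and 168 mW figures refer to the same operating point (the variable names are illustrative):

```python
peak_gflops = 20.0    # "up to 20 Gflop/s" from the abstract
power_watts = 0.168   # 168 mW from the abstract

# Energy efficiency implied by the two figures above.
gflops_per_watt = peak_gflops / power_watts
print(f"{gflops_per_watt:.1f} Gflop/s per W")
```

Under that assumption the 22 nm implementation works out to roughly 119 Gflop/s per watt.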

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

A nearly-m log n time solver for SDD linear systems

Author: Koutis Ioannis
Miller Gary
Peng Richard
Publication date: 01/01/2011

We present an improved algorithm for solving symmetric diagonally dominant linear systems. On input of an $n \times n$ symmetric diagonally dominant matrix $A$ with $m$ non-zero entries and a vector $b$ such that $A\bar{x} = b$ for some (unknown) vector $\bar{x}$, our algorithm computes a vector $x$ such that $\|x - \bar{x}\|_A < \epsilon \|\bar{x}\|_A$ (where $\|\cdot\|_A$ denotes the A-norm) in time $\tilde{O}(m \log n \log(1/\epsilon))$.

The solver utilizes in a standard way a 'preconditioning' chain of progressively sparser graphs. To claim the faster running time we make a two-fold improvement in the algorithm for constructing the chain. The new chain exploits previously unknown properties of the graph sparsification algorithm given in [Koutis, Miller, Peng, FOCS 2010], allowing for stronger preconditioning properties. We also present an algorithm of independent interest that constructs nearly-tight low-stretch spanning trees in time $\tilde{O}(m \log n)$, a factor of $O(\log n)$ faster than the algorithm in [Abraham, Bartal, Neiman, FOCS 2008]. This speedup directly reflects on the construction time of the preconditioning chain.

Comment: to appear in FOCS1
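The accuracy guarantee in this abstract is stated in the A-norm, $\|v\|_A = \sqrt{v^T A v}$. The following sketch shows how that condition can be checked for a candidate solution on a small dense matrix. It illustrates the error criterion only, not the solver itself, and the helper names `a_norm`, `is_sdd`, and `meets_guarantee` are hypothetical.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def a_norm(A, v):
    """||v||_A = sqrt(v^T A v) for a symmetric positive semidefinite A."""
    return math.sqrt(sum(vi * avi for vi, avi in zip(v, matvec(A, v))))

def is_sdd(A):
    """Symmetric, with each diagonal entry at least the sum of the
    absolute values of the off-diagonal entries in its row."""
    n = len(A)
    symmetric = all(A[i][j] == A[j][i] for i in range(n) for j in range(n))
    dominant = all(
        A[i][i] >= sum(abs(A[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )
    return symmetric and dominant

def meets_guarantee(A, x, x_bar, eps):
    """Check ||x - x_bar||_A < eps * ||x_bar||_A."""
    err = [xi - bi for xi, bi in zip(x, x_bar)]
    return a_norm(A, err) < eps * a_norm(A, x_bar)

# Small SDD example: exact solution x_bar and a slightly perturbed candidate x.
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
x_bar = [1.0, 2.0, 3.0]
x = [1.01, 2.0, 3.0]

print(is_sdd(A))
print(meets_guarantee(A, x, x_bar, eps=0.01))
print(meets_guarantee(A, x, x_bar, eps=0.001))
```

Here the candidate satisfies the bound for $\epsilon = 0.01$ but not for $\epsilon = 0.001$. In practice $\bar{x}$ is unknown, so a check like this is only possible on test problems where the exact solution was constructed in advance.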

arXiv.org e-Print Archive

CiteSeerX

IC replacement using Gordian

Author: Στρουσίδου Τατιάνα Ν.
Publication date: 01/01/2015

University of Thessaly Institutional Repository