Search CORE

1,211,401 research outputs found

An Open Source C++ Implementation of Multi-Threaded Gaussian Mixture Models, k-Means and Expectation Maximisation

Author: Curtin Ryan
Sanderson Conrad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

Modelling of multivariate densities is a core component in many signal processing, pattern recognition and machine learning applications. The modelling is often done via Gaussian mixture models (GMMs), which use computationally expensive and potentially unstable training algorithms. We provide an overview of a fast and robust implementation of GMMs in the C++ language, employing multi-threaded versions of the Expectation Maximisation (EM) and k-means training algorithms. Multi-threading is achieved through reformulation of the EM and k-means algorithms into a MapReduce-like framework. Furthermore, the implementation uses several techniques to improve numerical stability and modelling accuracy. We demonstrate that the multi-threaded implementation achieves a speedup of an order of magnitude on a recent 16-core machine, and that it can achieve higher modelling accuracy than a previously well-established publicly accessible implementation. The multi-threaded implementation is included as a user-friendly class in recent releases of the open source Armadillo C++ linear algebra library. The library is provided under the permissive Apache 2.0 license, allowing unencumbered use in commercial products.
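The MapReduce-like reformulation mentioned in the abstract can be illustrated with a single k-means update. The following is a minimal Python sketch, not the Armadillo implementation: the function names `map_chunk` and `kmeans_step` and the `n_workers` parameter are hypothetical, and the choice to keep an empty cluster's old centroid is an assumption. Each worker accumulates per-cluster sums and counts for its own chunk of points (map), and the partial results are then combined into new centroids (reduce).

```python
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk, centroids):
    """Map step: per-cluster coordinate sums and counts for one chunk of points."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        # Index of the centroid with the smallest squared Euclidean distance.
        nearest = min(
            range(len(centroids)),
            key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def kmeans_step(points, centroids, n_workers=4):
    """One k-means update: map over chunks in parallel, then reduce."""
    chunks = [points[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda ch: map_chunk(ch, centroids), chunks))

    # Reduce step: merge the partial sums and counts into new centroids.
    dim = len(centroids[0])
    new_centroids = []
    for j, old in enumerate(centroids):
        total = sum(counts[j] for _, counts in partials)
        if total == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(old))
        else:
            new_centroids.append(
                [sum(sums[j][d] for sums, _ in partials) / total for d in range(dim)]
            )
    return new_centroids
```

Because the map step only reads the shared centroids and writes to its own accumulators, the workers need no locking. In CPython, threads mainly illustrate the structure here; a compiled implementation is where such a decomposition would be expected to yield real speedups.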

arXiv.org e-Print Archive

University of Queensland eSpace

A highly efficient multi-core algorithm for clustering extremely large datasets

Author: A Ben-Hur
A Bertoni
A Jain
AK Jain
AR Adl-Tabatabai
AWF Edwards
B Andreopoulos
B Chapman
C Herzeel
Consortium IH
D Lea
D Smirnov
DR Barr
E Levine
F Müller
G Dalgin
HA Kestler
Hans A Kestler
HW Kuhn
J Fridlyand
J Handl
J Larus
J MacQueen
Johann M Kraus
JW Sammon
K Fukunaga
L Hubert
L Kuncheva
M Anderson
M Ng
MK Kerr
N Shavit
P Jaccard
P Sham
PA Bernstein
R Development Core Team
R Duan
R Graham
R Jonker
R Rajwar
R Tibshirani
R Xu
RC Gentleman
S Monti
S Peyton-Jones
S Selim
T Kohonen
T Lange
U Drepper
W Feng
W Gropp
W Rand
WJ Conover
X Gao
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms rely largely on network communication protocols that connect, and therefore require, multiple computers. One answer to this problem is to utilize the intrinsic capabilities of current multi-core hardware to distribute the tasks among the different cores of one computer. Results: We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network-based parallelization. Conclusions: Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ck-NN: A Clustered k-Nearest Neighbours Approach for Large-Scale Classification

Author: Emaduddin S.m.
Khan Ayaz H.
Ullah Rafi
Publication venue: Ediciones Universidad de Salamanca (España)
Publication date: 14/08/2019

k-Nearest Neighbor (k-NN) is a non-parametric algorithm widely used for the estimation and classification of data points, especially when the dataset is distributed across several classes. It is considered a lazy machine learning algorithm because most of the computation is done during the testing phase rather than during training. It is therefore inefficient, and in practice infeasible, for processing huge datasets, i.e. Big Data. Clustering techniques (unsupervised learning), on the other hand, are strongly affected by normalization or standardization of the data, and the value of "k" is difficult to determine. In this paper, some novel techniques are proposed for use as a preliminary stage to the state-of-the-art k-NN classification algorithm. The proposed mechanism applies an unsupervised clustering algorithm to a large dataset before applying the k-NN algorithm to the resulting clusters, which may run on a single machine, on multiple machines, or on different nodes of a cluster in a distributed environment. Initially the dataset, possibly multi-dimensional, is passed through a clustering technique (K-Means) at the master node or controller to find a number of clusters equal to the number of nodes in the distributed system or the number of cores in the system. Each cluster is then assigned to exactly one node or core, which applies k-NN locally. Each core or node sends its best result, and the selector chooses the best and nearest possible class from all the options. We will be using one of the gold-standard distributed frameworks. We believe that the proposed mechanism can be applied to big data. We also believe that the architecture can be implemented on multiple GPUs or FPGAs to bring the benefits of k-NN to large or huge datasets where traditional k-NN is very slow.
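The partition, local k-NN and selector pipeline described in the abstract might look roughly like the Python sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the centroids are assumed to come from a prior K-Means run, all function names are hypothetical, and the selector rule (a majority vote over the k globally nearest candidates) is one plausible reading of "best and nearest possible class". The partitions are processed sequentially here; in the described architecture each would live on its own core or node.

```python
import heapq
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_by_centroid(samples, centroids):
    """Group labelled samples (features, label) by their nearest centroid."""
    parts = [[] for _ in centroids]
    for features, label in samples:
        j = min(range(len(centroids)), key=lambda i: sq_dist(features, centroids[i]))
        parts[j].append((features, label))
    return parts


def local_knn(part, query, k):
    """Local step: the k nearest (distance, label) pairs within one partition."""
    return heapq.nsmallest(k, ((sq_dist(f, query), lab) for f, lab in part))


def classify(parts, query, k):
    """Selector step (assumed rule): vote among the k best candidates overall."""
    candidates = [c for part in parts for c in local_knn(part, query, k)]
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With this selector, every partition contributes at most k candidates, so the selector inspects at most k times the number of partitions pairs regardless of the dataset size.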

Gestion del Repositorio Documental de la Universidad de Salamanca

Large-scale hierarchical k-means for heterogeneous many-core supercomputers

Author: Fu Haohuan
Li Lideng
Tan Li
Thomson John
Wang Chenyu
Yang Guangwen
Yu Teng
Zhao Wenlai
Publication venue: 'Test accounts'
Publication date: 11/11/2018

Funding: J. Thomson and T. Yu are supported by the EPSRC grants "Discovery" EP/P020631/1 and "ABC: Adaptive Brokerage for the Cloud" EP/R010528/1, and by the EU Horizon 2020 grant Team-Play: "Time, Energy and security Analysis for Multi/Many-core heterogenous PLAtforms" (ICT-779882, https://teamplay-h2020.eu). This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that not only partitions by dataflow and centroid, but also by dimension. Our multi-level (nkd) approach unlocks the potential of the hierarchical parallelism in the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer. Our design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 targeting centroids, while maintaining high performance and high scalability, significantly improving the capability of k-means over previous approaches. The evaluation shows our implementation achieves performance of less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids by applying 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.
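Partitioning by dimension is possible because the squared Euclidean distance is a sum over dimensions, so each slice of dimensions can be processed separately and the partial sums added afterwards. The Python sketch below illustrates only this idea; it runs sequentially, the function names are hypothetical, and it makes no claim about how the Sunway TaihuLight implementation distributes or reduces the slices.

```python
def dimension_slices(n_dims, n_parts):
    """Split range(n_dims) into n_parts contiguous (start, stop) slices."""
    base, extra = divmod(n_dims, n_parts)
    slices, start = [], 0
    for i in range(n_parts):
        # The first `extra` slices take one additional dimension each.
        stop = start + base + (1 if i < extra else 0)
        slices.append((start, stop))
        start = stop
    return slices


def partial_sq_dist(point, centroid, start, stop):
    """Contribution of dimensions [start, stop) to the squared distance."""
    return sum((point[d] - centroid[d]) ** 2 for d in range(start, stop))


def nearest_centroid(point, centroids, n_parts):
    """Assign a point by summing per-slice partial distances for each centroid."""
    slices = dimension_slices(len(point), n_parts)
    totals = [
        sum(partial_sq_dist(point, c, a, b) for a, b in slices)
        for c in centroids
    ]
    return min(range(len(centroids)), key=totals.__getitem__)
```

In a parallel setting each slice would be handled by a different worker, and only one partial sum per point, centroid and slice would need to be combined.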

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

Photonic integrated circuits employing multi-core fiber for broadband radio beamsteering (Invited)

Author: Koonen A.M.J. (Ton)
Llorente Roberto
Morant Maria
Tangdiongga Eduward
Trinidad Ailee M.
Publication date: 29/08/2020

This paper presents an optical beamforming network based on a photonic integrated circuit employing a weakly coupled multi-core fiber to connect the different antenna elements. The proposed beamformer enables centralized control of the resulting steering angle. By means of wavelength tuning, fast and dynamic configuration of the induced delay (and associated beam steering angle) is achieved remotely. The experimental results confirm high-throughput transmission (> 10 Gbps) with electrical data signals of up to 3 GHz bandwidth in the 24 GHz RF band (K-band). Wireless transmission of 16QAM-modulated, 1.5 GHz-wide signals is demonstrated in the laboratory from –26° to 33°, providing a scanning range of 59°.

Pure OAI Repository

Scientific Application Acceleration Utilizing Heterogeneous Architectures

Author: Weill Edwin
Publication venue: Clemson University Libraries
Publication date: 01/12/2014

Within the past decade, there have been substantial leaps in computer architectures to exploit the parallelism that is inherently present in many applications. The scientific community has benefited from the emergence of not only multi-core processors, but also other, less traditional architectures including general purpose graphical processing units (GPGPUs), field programmable gate arrays (FPGAs), and Intel's many integrated cores (MICs) architecture (i.e. Xeon Phi). The popularity of the GPGPU has increased rapidly because of its ability to perform massive amounts of parallel computation quickly and at low cost with ease of programmability. Also, with the addition of high-level programming interfaces for these devices, technical and non-technical individuals can interface with the device and rapidly obtain improved performance for many algorithms. Many applications can take advantage of the parallelism present in distributed computing and multithreading to achieve higher levels of performance for the computationally intensive parts of the application. The work presented in this thesis implements three applications for use in a performance study of the GPGPU architecture and multi-GPGPU systems. The first application study in this research is a K-Means clustering algorithm that assigns each data point to the closest cluster. The second algorithm implemented is a spiking neural network algorithm that is used as a computational model for machine learning. The third, and final, study is the longest common subsequences problem, which attempts to enumerate comparisons between sequences (namely, DNA sequences). The results for the aforementioned applications with varying problem sizes and architectural configurations are presented and discussed in this thesis. The K-Means clustering algorithm achieved approximately 97x speedup when utilizing an architecture consisting of 32 CPU/GPGPU pairs. To achieve this substantial speedup, up to 750,000 data points were used with up to 30,000 centroids (means). The spiking neural network algorithm resulted in speedups of about 33x for the entire algorithm and 160x for each iteration with a two-level network with 1000 total neurons (800 excitatory and 200 inhibitory neurons). The longest common subsequences problem achieved a speedup of greater than 10x with 100 random sequences up to 500 characters in length. The maximum speedup values for each application were achieved by utilizing the GPGPU as well as multi-core devices simultaneously. The computations were scattered over multiple CPU/GPGPU pairs with the computationally intensive pieces of the algorithms offloaded onto the GPGPU device. The research in this thesis illustrates the ability to scale a heterogeneous cluster (i.e. CPUs and GPUs working collaboratively) for large-scale scientific application performance improvements. Each algorithm demonstrates slightly different types of computations and communications, which can be compared to other algorithms to predict how they would perform on an accelerator. The results show that substantial speedups can be achieved for scientific applications when utilizing the GPGPU and multi-core architectures.

Clemson University: TigerPrints

Voronoi-based space partitioning for coordinated multi-robot exploration

Author: García García Miguel Ángel
Puig Valls Domenec
Solé Ribalta Albert
Wu Ling
Publication venue: 'Universidad de Alicante Servicio de Publicaciones'
Publication date: 01/01/2007

Recent multi-robot exploration algorithms usually rely on occupancy grids as their core world representation. However, those grids are not appropriate for environments that are very large or whose boundaries are not well delimited from the beginning of the exploration. In contrast, polygonal representations do not have such limitations. Previously, the authors have proposed a new exploration algorithm based on partitioning unknown space into as many regions as available robots by applying K-Means clustering to an occupancy grid representation, and have shown that this approach leads to higher robot dispersion than other approaches, which is potentially beneficial for quick coverage of wide areas. In this paper, the original K-Means clustering applied over grid cells, which is the most expensive stage of the aforementioned exploration algorithm, is replaced by a Voronoi-based partitioning algorithm applied to polygons. The computational cost of the exploration algorithm is thus significantly reduced for large maps. An empirical evaluation and comparison of both partitioning approaches is presented. This work is partially supported by the Government of Spain under MCYT DPI2004-07993-C03-03. Ling Wu is supported by an FPI scholarship from the Spanish Ministry of Education and Science.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Biblos-e Archivo