A PARTAN-Accelerated Frank-Wolfe Algorithm for Large-Scale SVM Classification
Frank-Wolfe algorithms have recently regained the attention of the Machine
Learning community. Their solid theoretical properties and sparsity guarantees
make them a suitable choice for a wide range of problems in this field. In
addition, several variants of the basic procedure exist that improve its
theoretical properties and practical performance. In this paper, we investigate
the application of some of these techniques to Machine Learning, focusing in
particular on a Parallel Tangent (PARTAN) variant of the FW algorithm that has
not been previously suggested or studied for this type of problem. We provide
experiments both in a standard setting and using a stochastic speed-up
technique, showing that the considered algorithms obtain promising results on
several medium- and large-scale benchmark datasets for SVM classification.
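To make the PARTAN variant concrete, the following minimal Python sketch runs a PARTAN-accelerated Frank-Wolfe loop on the unit simplex, a typical feasible set for SVM dual formulations; the toy objective, golden-section step-size search and feasibility bound are illustrative assumptions, not the authors' exact algorithm.

# Minimal sketch of PARTAN-accelerated Frank-Wolfe on the unit simplex;
# the objective and step-size search below are illustrative assumptions.
import numpy as np

def lmo_simplex(grad):
    # Linear minimization oracle over the unit simplex: pick the best vertex.
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def line_search(f, x, p, hi=1.0, iters=30):
    # Golden-section search for the step size in [0, hi] along direction p.
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = 0.0, hi
    for _ in range(iters):
        c1, c2 = b - phi * (b - a), a + phi * (b - a)
        if f(x + c1 * p) < f(x + c2 * p):
            b = c2
        else:
            a = c1
    return (a + b) / 2.0

def partan_frank_wolfe(f, grad_f, x0, n_iters=200):
    x_prev, x = x0.copy(), x0.copy()
    for k in range(n_iters):
        # Standard Frank-Wolfe step toward the LMO vertex.
        s = lmo_simplex(grad_f(x))
        z = x + line_search(f, x, s - x) * (s - x)
        if k == 0:
            x_prev, x = x, z
            continue
        # PARTAN extrapolation along the line through the previous iterate and
        # the intermediate FW point, bounded so the iterate stays feasible.
        d = z - x_prev
        neg = d < -1e-12
        beta_max = np.min(x_prev[neg] / -d[neg]) if neg.any() else 1.0
        beta = line_search(f, x_prev, d, hi=beta_max)
        x_prev, x = x, x_prev + beta * d
    return x

# Toy usage: minimize a convex quadratic over the simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
f = lambda x: 0.5 * np.sum((A @ x - 1.0) ** 2)
grad_f = lambda x: A.T @ (A @ x - 1.0)
x_star = partan_frank_wolfe(f, grad_f, np.ones(20) / 20)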
A survey of parallel hybrid applications to the permutation flow shop scheduling problem and similar problems
Parallel algorithms have attracted increased interest due to their advantages in computation time and solution quality when applied to industrial engineering problems. This communication is a survey and classification of work on hybrid algorithms implemented in parallel and applied to combinatorial optimization problems similar to the permutation flow shop problem with the objective of minimizing the makespan (Fm|prmu|Cmax in the Graham notation), the travelling salesman problem (TSP), the quadratic assignment problem (QAP) and, in general, those whose solution can be expressed as a permutation.
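For readers unfamiliar with the Graham notation above, the following small Python sketch (not from the survey) shows how the Fm|prmu|Cmax objective, the makespan of one job permutation, is computed from a processing-time matrix.

# Makespan C_max of a job permutation in an m-machine permutation flow shop,
# given processing times p[job][machine]; an illustrative example only.
def makespan(perm, p):
    m = len(p[0])
    completion = [0.0] * m  # completion time of the previous job on each machine
    for job in perm:
        for k in range(m):
            start = max(completion[k], completion[k - 1] if k > 0 else 0.0)
            completion[k] = start + p[job][k]
    return completion[-1]

# Example: 3 jobs, 2 machines; evaluate the permutation (2, 0, 1).
p = [[3, 2], [1, 4], [2, 2]]
print(makespan((2, 0, 1), p))  # C_max for this ordering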
Genetic Algorithm Modeling with GPU Parallel Computing Technology
We present a multi-purpose genetic algorithm, designed and implemented with
GPGPU / CUDA parallel computing technology. The model was derived from a
multi-core CPU serial implementation, named GAME, already successfully tested
and validated on astrophysical massive data classification problems through a
web application resource (DAMEWARE) specialized in data mining based on Machine
Learning paradigms. Since genetic algorithms are inherently parallel, the GPGPU
computing paradigm made it possible to exploit the internal training features
of the model, permitting a strong optimization in terms of processing
performance and scalability.
Comment: 11 pages, 2 figures, refereed proceedings; Neural Nets and
Surroundings, Proceedings of the 22nd Italian Workshop on Neural Nets, WIRN
2012; Smart Innovation, Systems and Technologies, Vol. 19, Springer
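As a rough illustration of why genetic algorithms are described as inherently parallel, the sketch below evaluates and evolves an entire population with vectorized operations; in a GPGPU setting such as the paper's, each individual would map to a CUDA thread or block. The toy fitness function and parameters are assumptions, not part of GAME.

# Vectorized genetic algorithm: selection, crossover and mutation are applied
# to the whole population at once, mirroring the data-parallel GPU layout.
import numpy as np

rng = np.random.default_rng(0)

def fitness(pop):
    # Placeholder objective: maximize the negative sphere function.
    return -np.sum(pop ** 2, axis=1)

def evolve(pop_size=256, n_genes=32, n_gen=100, p_mut=0.02):
    pop = rng.uniform(-1, 1, size=(pop_size, n_genes))
    for _ in range(n_gen):
        fit = fitness(pop)                      # data-parallel evaluation
        # Binary tournament selection for the whole population at once.
        rivals = rng.integers(pop_size, size=(pop_size, 2))
        winners = np.where(fit[rivals[:, 0]] > fit[rivals[:, 1]],
                           rivals[:, 0], rivals[:, 1])
        parents = pop[winners]
        # One-point crossover between consecutive parents.
        cut = rng.integers(1, n_genes, size=pop_size // 2)
        children = parents.copy()
        for i, c in enumerate(cut):
            children[2 * i, c:], children[2 * i + 1, c:] = (
                parents[2 * i + 1, c:].copy(), parents[2 * i, c:].copy())
        # Uniform mutation.
        mask = rng.random(children.shape) < p_mut
        children[mask] = rng.uniform(-1, 1, size=mask.sum())
        pop = children
    return pop[np.argmax(fitness(pop))]

best = evolve()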
Parallel Processing of Large Graphs
More and more large data collections are gathered worldwide in various IT
systems. Many of them are networked in nature and need to be processed and
analysed as graph structures. Due to their size, they very often require a
parallel paradigm for efficient computation. Three parallel techniques have
of parallel paradigm for efficient computation. Three parallel techniques have
been compared in the paper: MapReduce, its map-side join extension and Bulk
Synchronous Parallel (BSP). They are implemented for two different graph
problems: calculation of single source shortest paths (SSSP) and collective
classification of graph nodes by means of relational influence propagation
(RIP). The methods and algorithms are applied to several network datasets
differing in size and structural profile, originating from three domains:
telecommunication, multimedia and microblog. The results revealed that
iterative graph processing with the BSP implementation consistently and
significantly outperforms MapReduce, by up to a factor of 10, especially for
algorithms with many iterations and sparse communication. The MapReduce
extension based on map-side join also usually offers noticeably better
efficiency, although not as much as BSP. Nevertheless, MapReduce remains a good
alternative for enormous networks whose data structures do not fit in local
memory.
Comment: Preprint submitted to Future Generation Computer Systems
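The following single-process sketch illustrates the BSP (Pregel-style) pattern used for SSSP: active vertices exchange candidate distances in synchronized supersteps until no vertex changes. The graph representation and function names are illustrative assumptions, not the paper's implementation.

# BSP-style single-source shortest paths: one loop iteration = one superstep;
# messages are gathered first and applied together at the synchronization barrier.
import math
from collections import defaultdict

def bsp_sssp(edges, source):
    """edges: dict vertex -> list of (neighbor, weight)."""
    dist = defaultdict(lambda: math.inf)
    dist[source] = 0.0
    active = {source}                       # vertices updated in the last superstep
    while active:
        inbox = defaultdict(list)
        for u in active:                    # active vertices send candidate distances
            for v, w in edges.get(u, []):
                inbox[v].append(dist[u] + w)
        active = set()
        for v, msgs in inbox.items():       # apply messages after the barrier
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best
                active.add(v)
    return dict(dist)

# Example: shortest paths from vertex 'a'.
g = {'a': [('b', 1), ('c', 4)], 'b': [('c', 2), ('d', 5)], 'c': [('d', 1)]}
print(bsp_sssp(g, 'a'))   # {'a': 0, 'b': 1, 'c': 3, 'd': 4}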
Box Drawings for Learning with Imbalanced Data
The vast majority of real world classification problems are imbalanced,
meaning there are far fewer data from the class of interest (the positive
class) than from other classes. We propose two machine learning algorithms to
handle highly imbalanced classification problems. The classifiers constructed
by both methods are created as unions of axis-parallel rectangles around the
positive examples, and thus have the benefit of being interpretable. The first
algorithm uses mixed integer programming to optimize a weighted balance between
positive and negative class accuracies. Regularization is introduced to improve
generalization performance. The second method uses an approximation in order to
assist with scalability. Specifically, it follows a \textit{characterize then
discriminate} approach, where the positive class is characterized first by
boxes, and then each box boundary becomes a separate discriminative classifier.
This method has the computational advantages that it can be easily
parallelized, and considers only the relevant regions of feature space.
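A simplified sketch of such a union-of-boxes classifier is given below: positive examples are clustered, each cluster is characterized by a slightly expanded bounding box, and a point is labelled positive if it falls inside any box. The clustering, the expansion margin and the omission of the discriminative refinement of each box boundary against the negatives are illustrative simplifications, not the paper's exact procedure.

# Interpretable classifier built from a union of axis-parallel boxes around
# the positive class; a sketch, not the paper's mixed-integer formulation.
import numpy as np
from sklearn.cluster import KMeans

class BoxUnionClassifier:
    def __init__(self, n_boxes=3, expand=0.05):
        self.n_boxes, self.expand = n_boxes, expand

    def fit(self, X, y):
        pos = X[y == 1]
        labels = KMeans(n_clusters=self.n_boxes, n_init=10,
                        random_state=0).fit_predict(pos)
        self.boxes_ = []
        for k in range(self.n_boxes):
            cluster = pos[labels == k]
            lo, hi = cluster.min(axis=0), cluster.max(axis=0)
            margin = self.expand * (hi - lo + 1e-12)
            self.boxes_.append((lo - margin, hi + margin))   # one readable box
        return self

    def predict(self, X):
        inside_any = np.zeros(len(X), dtype=bool)
        for lo, hi in self.boxes_:
            inside_any |= np.all((X >= lo) & (X <= hi), axis=1)
        return inside_any.astype(int)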
Distributed Correlation-Based Feature Selection in Spark
CFS (Correlation-Based Feature Selection) is an FS algorithm that has been
successfully applied to classification problems in many domains. We describe
Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and
distributed version of the CFS algorithm, capable of dealing with the large
volumes of data typical of big data applications. Two versions of the algorithm
were implemented and compared using the Apache Spark cluster computing model,
which is currently gaining popularity due to its much faster processing times than
Hadoop's MapReduce model. We tested our algorithms on four publicly available
datasets, each consisting of a large number of instances and two also
consisting of a large number of features. The results show that our algorithms
were superior in terms of both time-efficiency and scalability. In leveraging a
computer cluster, they were able to handle larger datasets than the
non-distributed WEKA version while maintaining the quality of the results,
i.e., exactly the same features were returned by our algorithms as by the
original algorithm available in WEKA.
Comment: 25 pages, 5 figures
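For reference, the following non-distributed sketch shows the CFS merit that DiCFS parallelizes: a subset scores highly when its features correlate strongly with the class but weakly with each other. Pearson correlation stands in for the symmetrical uncertainty used by CFS on discretized data, and the greedy forward search is a simplification rather than the Spark implementation.

# CFS subset merit and a simple forward selection driven by it; an
# illustrative, single-machine stand-in for the distributed algorithm.
import numpy as np

def cfs_merit(subset, corr_fc, corr_ff):
    """Merit_S = k*mean(r_cf) / sqrt(k + k*(k-1)*mean(r_ff))."""
    k = len(subset)
    r_cf = np.mean([corr_fc[i] for i in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([corr_ff[i, j] for i in subset for j in subset if i != j])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y, max_features=10):
    n = X.shape[1]
    corr_fc = np.abs([np.corrcoef(X[:, i], y)[0, 1] for i in range(n)])
    corr_ff = np.abs(np.corrcoef(X, rowvar=False))
    selected, best = [], -np.inf
    while len(selected) < max_features:
        cand = [(cfs_merit(selected + [i], corr_fc, corr_ff), i)
                for i in range(n) if i not in selected]
        merit, i = max(cand)
        if merit <= best:          # stop when no candidate improves the merit
            break
        selected.append(i)
        best = merit
    return selected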
Multiobjective Feature Selection of Microarray Data via Distributed Parallel Algorithms
Many real-world problems are large scale and hence difficult to address. Due to the large number of features in microarray datasets, feature selection and classification are even more challenging. Although there are numerous features, not all of them contribute to the classification, and some are even detrimental. Through feature selection, a feature subset that contains only a small number of essential features is generated, which can increase the classification accuracy and significantly reduce the time consumption.
In this paper, we construct a multiobjective feature selection model that simultaneously considers the classification error, the number of features and the feature redundancy. For this model, we propose several distributed parallel algorithms based on different encodings and an adaptive strategy. Additionally, to reduce the time consumption, various tactics are employed, including a constraint on the number of features, distributed parallelism and sample-wise parallelism. On a batch of microarray datasets, the proposed algorithms are superior to several state-of-the-art multiobjective evolutionary algorithms in terms of both effectiveness and efficiency.
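As an illustration of the kind of objectives such a model trades off, the sketch below evaluates classification error, number of selected features and average pairwise redundancy for a candidate feature mask; the k-NN error estimate and Pearson-based redundancy measure are assumptions rather than the paper's exact definitions.

# Three objectives for a binary feature mask, all to be minimized; a sketch of
# the kind of multiobjective evaluation used in evolutionary feature selection.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def objectives(mask, X, y):
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 1.0, 0, 0.0                      # no features: maximal error
    Xs = X[:, idx]
    # Objective 1: cross-validated classification error of a simple k-NN model.
    error = 1.0 - cross_val_score(KNeighborsClassifier(n_neighbors=3),
                                  Xs, y, cv=5).mean()
    # Objective 2: number of selected features.
    n_features = idx.size
    # Objective 3: average absolute pairwise correlation among selected features.
    if idx.size == 1:
        redundancy = 0.0
    else:
        corr = np.abs(np.corrcoef(Xs, rowvar=False))
        redundancy = (corr.sum() - idx.size) / (idx.size * (idx.size - 1))
    return error, n_features, redundancy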