
Scalable CAIM Discretization on Multiple GPUs Using Concurrent Kernels

Authors: Cano, Alberto; Cios, Krzysztof J.; Ventura Soto, S.
Publication date: 01/01/2017

CAIM (Class-Attribute Interdependence Maximization) is one of the state-of-the-art algorithms for discretizing data for which classes are known. However, it may take a long time when run on high-dimensional, large-scale data with a large number of attributes and/or instances. This paper presents a solution to this problem by introducing a GPU-based implementation of the CAIM algorithm that significantly speeds up the discretization process on big, complex data sets. The GPU-based implementation is scalable to multiple GPU devices and exploits the concurrent kernel execution capabilities of modern GPUs. The GPU-based CAIM model is evaluated and compared with the original CAIM using single-threaded and multi-threaded parallel configurations on 40 data sets with different characteristics. The results show large speedups, up to 139 times faster using 4 GPUs, which makes the discretization of big data efficient and manageable. For example, the discretization time of one big data set is reduced from 2 hours to less than 2 minutes.

Repositorio Institucional de la Universidad de Córdoba
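For readers unfamiliar with CAIM, the following is a minimal sequential Python sketch of the criterion and its greedy boundary search, following the original CAIM formulation by Kurgan and Cios. It is not the GPU implementation described above. The helper names (`quanta_counts`, `caim`, `caim_discretize`) are hypothetical, and skipping empty intervals when averaging is an assumption made here to avoid division by zero. The criterion averages max_r^2 / M_r over the n intervals, where max_r is the count of the most frequent class in interval r and M_r is the number of values falling in that interval.

```python
from bisect import bisect_left


def quanta_counts(values, labels, boundaries):
    """Count class occurrences per interval (the CAIM quanta matrix).

    boundaries is sorted [d0, d1, ..., dn]; interval r covers (d_{r-1}, d_r],
    and the first interval also includes d0.
    """
    classes = sorted(set(labels))
    counts = [{c: 0 for c in classes} for _ in range(len(boundaries) - 1)]
    for v, y in zip(values, labels):
        # First boundary >= v, minus one, is the interval index; clamp v == d0 to 0.
        r = max(bisect_left(boundaries, v) - 1, 0)
        counts[r][y] += 1
    return counts


def caim(values, labels, boundaries):
    """CAIM = (1/n) * sum_r max_r^2 / M_r over the n intervals."""
    counts = quanta_counts(values, labels, boundaries)
    total = 0.0
    for row in counts:
        m_r = sum(row.values())
        if m_r:  # assumption: empty intervals contribute nothing
            total += max(row.values()) ** 2 / m_r
    return total / len(counts)


def caim_discretize(values, labels):
    """Greedy search: add the best cut while CAIM improves or k < #classes."""
    distinct = sorted(set(values))
    num_classes = len(set(labels))
    # Candidate cut points are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    boundaries = [distinct[0], distinct[-1]]
    global_caim = 0.0
    k = 1  # current number of intervals
    while candidates:
        best_score, best_cut = -1.0, None
        for cut in candidates:
            score = caim(values, labels, sorted(boundaries + [cut]))
            if score > best_score:
                best_score, best_cut = score, cut
        if best_score > global_caim or k < num_classes:
            boundaries = sorted(boundaries + [best_cut])
            candidates.remove(best_cut)
            global_caim = best_score
            k += 1
        else:
            break
    return boundaries


values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "a", "b", "b", "b"]
print(caim_discretize(values, labels))
```

On this toy attribute the search accepts the cut at 3.5, which separates the two classes perfectly, and then stops because no further cut raises CAIM above 3.0. Note that every iteration re-evaluates every remaining candidate over all values of the attribute, and a real data set repeats this for each attribute; that repeated work is what makes the sequential algorithm slow on large, high-dimensional data.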

Introductory Chapter: Data Streams and Online Learning in Social Media

Author: Cano, Alberto
Publication venue: IntechOpen
Publication date: 19/02/2020

IntechOpen

Crossref

LAIM discretization for multi-label data

Authors: Cano, Alberto; Gibaja, Eva; Luna, J.M.; Ventura Soto, S.
Publication venue: Elsevier BV
Publication date: 01/01/2017

Multi-label learning is a challenging data mining task that has attracted growing attention in recent years. Although many multi-label datasets have continuous features, no general algorithms specifically designed to transform multi-label datasets with continuous attribute values into a finite number of intervals have been proposed to date. Many classification algorithms require discrete values as input, and studies have shown that supervised discretization may improve classification performance. This paper presents a Label-Attribute Interdependence Maximization (LAIM) discretization method for multi-label data. LAIM is inspired by the discretization heuristic of CAIM for single-label classification. Maximizing the label-attribute interdependence is expected to improve label prediction in data separated into disjoint intervals. The main aim of this paper is to present a discretization method specifically designed to deal with multi-label data and to analyze whether it can improve the performance of multi-label learning methods. To this end, the experimental analysis evaluates the performance of 12 multi-label learning algorithms (transformation, adaptation, and ensemble-based) on 16 multi-label datasets with and without supervised and unsupervised discretization, showing that LAIM discretization improves performance for many algorithms and measures.

Repositorio Institucional de la Universidad de Córdoba