On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems
We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform. Since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves. Appeared in the 2018 IEEE International Congress on Big Data (BigData Congress).
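As a rough illustration of the two-step process, here is a minimal single-machine sketch in numpy: empirical q-quantiles approximate the CDF, equally spaced triangular sets form a Ruspini strong partition on [0, 1], and the quantile function maps the set cores back to the original space. Function names and the interpolation choice are illustrative, not taken from the paper's distributed implementation.

```python
import numpy as np

def fit_pit_partition(x, n_sets, n_quantiles=100):
    """Step 1: approximate the CDF of x by its q-quantiles; Step 2: place
    n_sets equally spaced triangular fuzzy sets on the uniform [0, 1] space
    and recover their cores in the original space via the quantile function."""
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    quantiles = np.quantile(x, probs)              # empirical CDF support points
    cores_uniform = np.linspace(0.0, 1.0, n_sets)  # Ruspini strong partition
    # Inverse CDF: interpolate from cumulative probability back to data values.
    return np.interp(cores_uniform, probs, quantiles)

def triangular_membership(x, cores, i):
    """Membership of x in the i-th triangular set; adjacent cores serve as
    the support endpoints, so memberships sum to 1 everywhere."""
    left = cores[max(i - 1, 0)]
    right = cores[min(i + 1, len(cores) - 1)]
    b = cores[i]
    if x < left or x > right:
        return 0.0
    if x == b:
        return 1.0
    return (x - left) / (b - left) if x < b else (right - x) / (right - b)

# Example: five fuzzy sets adapted to a skewed attribute distribution.
cores = fit_pit_partition(np.random.default_rng(0).exponential(size=10_000), 5)
```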
Class decomposition for GA-based classifier agents – A Pitt approach
Autonomous clustering using rough set theory
This paper proposes a clustering technique that minimises the need for subjective human intervention and is based on elements of rough set theory. The proposed algorithm is unified in its approach to clustering and makes use of both local and global data properties to obtain clustering solutions. It handles single-type and mixed-attribute data sets with ease, and results from three data sets of single and mixed attribute types are used to illustrate the technique and establish its efficiency.
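The abstract does not spell out the algorithm itself, but the rough-set machinery it builds on is compact. A minimal sketch (plain Python, names illustrative) of the classical lower and upper approximations, which separate objects that certainly belong to a concept from those that only possibly belong to it:

```python
from collections import defaultdict

def approximations(objects, concept):
    """Rough-set lower/upper approximations of a target concept.
    objects: dict mapping object id -> tuple of attribute values.
    concept: set of object ids. Objects with identical attribute values
    fall into the same indiscernibility class."""
    classes = defaultdict(set)
    for oid, values in objects.items():
        classes[values].add(oid)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= concept:   # class entirely inside the concept
            lower |= block
        if block & concept:    # class overlapping the concept
            upper |= block
    return lower, upper

objs = {1: ('a', 0), 2: ('a', 0), 3: ('b', 1)}
lo, up = approximations(objs, {1, 3})
# lo == {3}, up == {1, 2, 3}; the boundary {1, 2} is where cluster
# membership is ambiguous.
```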
Latent class analysis for segmenting preferences of investment bonds
Market segmentation is a key component of conjoint analysis which addresses consumer
preference heterogeneity. Members in a segment are assumed to be homogenous in their
views and preferences when worthing an item but distinctly heterogenous to members of other
segments. Latent class methodology is one of the several conjoint segmentation procedures
that overcome the limitations of aggregate analysis and a-priori segmentation. The main
benefit of Latent class models is that market segment membership and regression parameters
of each derived segment are estimated simultaneously. The Latent class model presented in
this paper uses mixtures of multivariate conditional normal distributions to analyze rating
data, where the likelihood is maximized using the EM algorithm. The application focuses on
customer preferences for investment bonds described by four attributes; currency, coupon
rate, redemption term and price. A number of demographic variables are used to generate
segments that are accessible and actionable.peer-reviewe
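A hedged stand-in for the estimation step: scikit-learn's GaussianMixture fits a mixture of multivariate normals by EM and returns soft segment memberships. This illustrates the segmentation idea only; the paper's model is a mixture of conditional normals that also regresses ratings on the bond attributes within each segment, and the data below are synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical data: each row is one respondent's ratings of the bond
# profiles shown in the conjoint design (columns = profiles).
rng = np.random.default_rng(0)
ratings = np.vstack([
    rng.normal(7, 1, size=(50, 8)),   # one segment rates profiles highly
    rng.normal(3, 1, size=(50, 8)),   # another rates them low
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
segments = gmm.fit_predict(ratings)     # EM: E-step posteriors, M-step updates
posterior = gmm.predict_proba(ratings)  # soft segment membership per respondent
print(np.bincount(segments), gmm.means_.mean(axis=1))
```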
Passively mode-locked laser using an entirely centred erbium-doped fiber
This paper describes the setup and experimental results for an entirely centred erbium-doped fiber laser with passively mode-locked output. The gain medium of the ring laser cavity configuration comprises a 3 m length of two-core optical fiber, wherein an undoped outer core region of 9.38 μm diameter surrounds a 4.00 μm diameter central core region doped with erbium ions at 400 ppm concentration. The generated stable soliton mode-locking output has a central wavelength of 1533 nm and pulses that yield an average output power of 0.33 mW with a pulse energy of 31.8 pJ. The pulse duration is 0.7 ps, and the measured output repetition rate of 10.37 MHz corresponds to a 96.4 ns pulse spacing in the pulse train.
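The reported figures are mutually consistent under the standard relations pulse energy = average power / repetition rate and pulse spacing = 1 / repetition rate; a quick check:

```python
avg_power = 0.33e-3   # W   (0.33 mW average output power)
rep_rate = 10.37e6    # Hz  (measured repetition rate)
print(avg_power / rep_rate * 1e12)  # ~31.8 pJ pulse energy
print(1.0 / rep_rate * 1e9)         # ~96.4 ns pulse spacing
```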
Computing fuzzy rough approximations in large scale information systems
Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (or, in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is, however, required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state-of-the-art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62 GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on the Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem.
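A minimal non-distributed numpy sketch of what is being computed, for a crisp decision concept: a gradual indiscernibility relation built from per-attribute similarities, then the fuzzy-rough lower and upper approximations. The Łukasiewicz implicator and t-norm are an assumption (the operators are typically configurable), and the O(n²) pairwise relation is exactly what makes the computation memory-hungry and worth distributing.

```python
import numpy as np

def fuzzy_rough_approximations(X, labels, target):
    """Lower/upper membership of each object in the fuzzy-rough
    approximations of the crisp concept `labels == target`.
    Per-attribute similarity: 1 - range-normalized distance,
    aggregated by minimum into a gradual indiscernibility relation."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0
    # Gradual indiscernibility relation R(i, j) in [0, 1] -- O(n^2) memory.
    diffs = np.abs(X[:, None, :] - X[None, :, :]) / ranges
    R = (1.0 - diffs).min(axis=2)
    concept = (labels == target).astype(float)
    # Lower: inf_j I(R(i,j), concept(j)); Lukasiewicz I(a,b) = min(1, 1-a+b)
    lower = np.minimum(1.0, 1.0 - R + concept[None, :]).min(axis=1)
    # Upper: sup_j T(R(i,j), concept(j)); Lukasiewicz T(a,b) = max(0, a+b-1)
    upper = np.maximum(0.0, R + concept[None, :] - 1.0).max(axis=1)
    return lower, upper
```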
An incremental approach to genetic algorithms based classification
Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research work explores incremental learning with statistical algorithms or neural networks, rather than evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as basic learning algorithms for incremental learning within one or more classifier agents in a multi-agent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an “integration” operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed.
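A toy sketch of the central idea, with illustrative names: the “integration” operation appends genes for newly arrived attributes to the already-evolved chromosomes, and a biased mutation perturbs the new genes more aggressively than the reinforced old ones, rather than retraining from scratch.

```python
import random

def integrate(population, n_new_attributes):
    """'Integration' step: keep evolved solutions and append randomly
    initialized genes for newly arrived attributes, so old knowledge
    seeds the next GA run instead of restarting from scratch."""
    return [chrom + [random.uniform(0, 1) for _ in range(n_new_attributes)]
            for chrom in population]

def biased_mutation(chrom, n_old, rate_old=0.01, rate_new=0.2):
    """Mutate the new genes far more aggressively than the old ones."""
    rates = [rate_old if i < n_old else rate_new for i in range(len(chrom))]
    return [random.uniform(0, 1) if random.random() < r else g
            for g, r in zip(chrom, rates)]

old_pop = [[random.uniform(0, 1) for _ in range(4)] for _ in range(10)]
new_pop = [biased_mutation(c, n_old=4) for c in integrate(old_pop, 2)]
```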
On Distributed Fuzzy Decision Trees for Big Data
Fuzzy decision trees (FDTs) have been shown to be an effective solution in the framework of fuzzy classification. However, the approaches to FDT learning proposed so far have generally neglected time and space requirements. In this paper, we propose a distributed FDT learning scheme shaped according to the MapReduce programming model for generating both binary and multiway FDTs from big data. The scheme relies on a novel distributed fuzzy discretizer that generates a strong fuzzy partition for each continuous attribute based on fuzzy information entropy. The fuzzy partitions are then used as input to the FDT learning algorithm, which employs fuzzy information gain for selecting the attributes at the decision nodes. We have implemented the FDT learning scheme on the Apache Spark framework. We have used ten real-world publicly available big datasets for evaluating the behavior of the scheme along three dimensions: 1) performance in terms of classification accuracy, model complexity, and execution time; 2) scalability varying the number of computing units; and 3) ability to efficiently accommodate an increasing dataset size. We have demonstrated that the proposed scheme turns out to be suitable for managing big datasets even with modest commodity hardware support. Finally, we have used the distributed decision tree learning algorithm implemented in the MLLib library and the Chi-FRBCS-BigData algorithm, a MapReduce distributed fuzzy rule-based classification system, for comparative analysis.
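A single-machine numpy sketch of the splitting criterion described here, with illustrative names: fuzzy information entropy weights each example by its membership degree in a fuzzy set, and the gain of a strong fuzzy partition is the parent entropy minus the membership-weighted child entropies. This illustrates the criterion only, not the Spark/MapReduce implementation.

```python
import numpy as np

def fuzzy_entropy(memberships, labels):
    """Fuzzy information entropy of a labeled set, with each example
    counted toward a class with its membership degree."""
    labels = np.asarray(labels)
    total = memberships.sum()
    if total == 0:
        return 0.0
    ent = 0.0
    for c in np.unique(labels):
        p = memberships[labels == c].sum() / total
        if p > 0:
            ent -= p * np.log2(p)
    return ent

def fuzzy_information_gain(partition, labels):
    """Gain of splitting on a strong fuzzy partition (rows = fuzzy sets,
    columns = examples, entries = membership degrees summing to 1)."""
    n = partition.shape[1]
    parent = fuzzy_entropy(np.ones(n), labels)   # crisp parent entropy
    weights = partition.sum(axis=1) / partition.sum()
    children = sum(w * fuzzy_entropy(mu, labels)
                   for w, mu in zip(weights, partition))
    return parent - children
```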