On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

Author: Bustince Humberto
Elkano Mikel
Galar Mikel
Uriz Mikel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 28/02/2019

We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform. Since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves. Comment: Appeared in 2018 IEEE International Congress on Big Data (BigData Congress). arXiv admin note: text overlap with arXiv:1902.0935
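The two-step process described in this abstract can be illustrated with a minimal sketch. It is not the authors' distributed implementation: the function names (`fit_quantiles`, `to_uniform`, `from_uniform`, `memberships`), the linear interpolation between quantiles, and the toy data are all assumptions made for illustration.

```python
from bisect import bisect_right


def fit_quantiles(values, q):
    """Return q + 1 empirical quantile cut points (min ... max) of the training values."""
    s = sorted(values)
    n = len(s)
    return [s[round(i * (n - 1) / q)] for i in range(q + 1)]


def to_uniform(x, cuts):
    """Approximate CDF(x) by linear interpolation between quantile cut points."""
    q = len(cuts) - 1
    if x <= cuts[0]:
        return 0.0
    if x >= cuts[-1]:
        return 1.0
    i = bisect_right(cuts, x) - 1  # cuts[i] <= x < cuts[i + 1]
    return (i + (x - cuts[i]) / (cuts[i + 1] - cuts[i])) / q


def from_uniform(u, cuts):
    """Approximate quantile function: map u in [0, 1] back to the original space."""
    q = len(cuts) - 1
    pos = min(max(u, 0.0), 1.0) * q
    i = min(int(pos), q - 1)
    return cuts[i] + (pos - i) * (cuts[i + 1] - cuts[i])


def memberships(u, k):
    """Degrees of u in k >= 2 evenly spaced triangular sets on [0, 1].

    Adjacent triangles cross at 0.5, so the degrees sum to 1 (a strong partition).
    """
    step = 1.0 / (k - 1)
    return [max(0.0, 1.0 - abs(u - j * step) / step) for j in range(k)]


# Toy, skewed training data (hypothetical, not from the paper).
values = [x * x for x in range(101)]
cuts = fit_quantiles(values, 4)
u = to_uniform(2500, cuts)
print(cuts, u, memberships(u, 3), from_uniform(0.5, cuts))
```

Under these assumptions, the peak of each triangular set in the original attribute space is `from_uniform(j / (k - 1), cuts)`, which mirrors the abstract's remark that the fuzzy sets can be recovered through the quantile function.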

arXiv.org e-Print Archive

Crossref

Automated construction of a hierarchy of self-organized neural network classifiers

Author: Ruda Harald
Snorrason Magnus
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/10/1996

This paper documents an effort to design and implement a neural network-based, automatic classification system which dynamically constructs and trains a decision tree. The system is a combination of neural network and decision tree technology. The decision tree is constructed to partition a large classification problem into smaller problems. The neural network modules then solve these smaller problems. We used a variant of the Fuzzy ARTMAP neural network which can be trained much more quickly than traditional neural networks. The research extends the concept of self-organization from within the neural network to the overall structure of the dynamically constructed decision hierarchy. The primary advantage is avoidance of manual tedium and subjective bias in constructing decision hierarchies. Additionally, removing the need for manual construction of the hierarchy opens up a large class of potential classification applications. When tested on data from real-world images, the automatically generated hierarchies performed slightly better than an intuitive (hand-built) hierarchy. Because the neural networks at the nodes of the decision hierarchy are solving smaller problems, generalization performance can be improved if the number of features used to solve these problems is reduced. Algorithms for automatically selecting which features to use for each individual classification module were also implemented. We were able to achieve the same level of performance as in previous manual efforts, but in an efficient, automatic manner. The technology developed has great potential in a number of commercial areas, including data mining, pattern recognition, and intelligent interfaces for personal computer applications. Sample applications include fraud detection, bankruptcy prediction, a data mining agent, a scalable object recognition system, an email agent, a resource librarian agent, and a decision aid agent.

Boston University Institutional Repository (OpenBU)

A Takagi-Sugeno fuzzy rule-based model for soil moisture retrieval from SAR under soil roughness uncertainty

Author: De Baets Bernard
Verhoest Niko
Vernieuwe Hilde
Publication venue: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication date: 01/01/2007

Ghent University Academic Bibliography

Autonomous clustering using rough set theory

Author: A. K. Jain
A. Skowron
B. J. F. Manly
B. S. Everitt
C. L. Bean
Chandra Kambhampati
Charlotte Bean
D. Dubois
E. W. Forgey
F. H. C. Marriott
F. Höppner
G. H. Ball
J. A. Hartigan
J. B. MacQueen
J. C. Bezdek
J. C. Dunn
J. H. Ward
J. Komorowski
J. S. R. Jang
M. R. Anderberg
M. S. Aldenderfer
M. S. Kamel
P. Sneath
R. C. Jancey
R. R. Sokal
R. R. Yegar
S. Sharma
S. Z. Selim
T. Okuzaki
T. Sorensen
Z. Pawlak
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

This paper proposes a clustering technique, based on elements of rough set theory, that minimises the need for subjective human intervention. The proposed algorithm is unified in its approach to clustering and makes use of both local and global data properties to obtain clustering solutions. It handles single-type and mixed attribute data sets with ease; results from three data sets of single and mixed attribute types are used to illustrate the technique and establish its efficiency.

Repository@Hull - Worktribe

Crossref

Warwick Research Archives Portal Repository

Exploratory Analysis of Multivariate Data (Unsupervised Image Segmentation and Data Driven Linear and Nonlinear Decomposition)

Author: Hilger Klaus Baggesen
Publication date: 01/03/2002

Online Research Database In Technology

Development of PancRISK, a urine biomarker-based risk score for stratified screening of pancreatic cancer patients

Author: Blyuss Oleg
Cherepanova Valeriia
Crnogorac-Jurcevic Tatjana
Duffy Stephen W
Kiseleva Elena M
Munblit Daniel
Prytomanova Olga M
Zaikin Alexey
Publication venue: Springer Science and Business Media LLC
Publication date: 20/12/2019

© The Author(s) 2019. Published by Springer Nature on behalf of Cancer Research UK. BACKGROUND: An accurate and simple risk prediction model that would facilitate earlier detection of pancreatic adenocarcinoma (PDAC) is not available at present. In this study, we compare different risk prediction algorithms in order to select the best one for constructing a biomarker-based risk score, PancRISK. METHODS: Three hundred and seventy-nine patients with available measurements of three urine biomarkers (LYVE1, REG1B and TFF1) from retrospectively collected samples, as well as creatinine and age, were randomly split into training and validation sets, following stratification into cases (PDAC) and controls (healthy patients). Several machine learning algorithms were used, and their performance characteristics were compared. These included AUC (area under the ROC curve) and sensitivity at clinically relevant specificity. RESULTS: None of the algorithms significantly outperformed all others. A logistic regression model, the easiest to interpret, was incorporated into a PancRISK score and subsequently evaluated on the whole data set. The PancRISK performance could be further improved when CA19-9, a commonly used PDAC biomarker, is added to the model. CONCLUSION: The PancRISK score enables easy interpretation of the biomarker panel data and is currently being tested to confirm that it can be used for stratification of patients at risk of developing pancreatic cancer completely non-invasively, using urine samples.
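The two performance measures named in this abstract, AUC and sensitivity at a fixed specificity, can be sketched as follows. This is a generic illustration, not the study's evaluation code: the function names, the "score at or above threshold means case" rule, and the toy scores and labels are assumptions and do not come from the PancRISK data.

```python
def auc(scores, labels):
    """AUC as the chance that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores, labels, target_spec):
    """Best sensitivity over thresholds whose specificity is at least target_spec."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    # A score >= t is called a case; float("inf") covers "call nobody a case".
    for t in sorted(set(scores)) + [float("inf")]:
        spec = sum(n < t for n in neg) / len(neg)   # controls correctly cleared
        sens = sum(p >= t for p in pos) / len(pos)  # cases correctly flagged
        if spec >= target_spec:
            best = max(best, sens)
    return best


# Hypothetical risk scores: label 1 = case, label 0 = control.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(auc(scores, labels))
print(sensitivity_at_specificity(scores, labels, 0.75))
print(sensitivity_at_specificity(scores, labels, 1.0))
```

Fixing the specificity first and then reading off the sensitivity reflects the screening setting the abstract describes, where the tolerated false-positive rate is set in advance.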

Crossref

UCL Discovery

Queen Mary Research Online

King's Research Portal

University of Hertfordshire Research Archive