13,874 research outputs found
On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems
We present a new distributed fuzzy partitioning method to reduce the
complexity of multi-way fuzzy decision trees in Big Data classification
problems. The proposed algorithm builds a fixed number of fuzzy sets for all
variables and adjusts their shape and position to the real distribution of
training data. A two-step process is applied : 1) transformation of the
original distribution into a standard uniform distribution by means of the
probability integral transform. Since the original distribution is generally
unknown, the cumulative distribution function is approximated by computing the
q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy
partition in the transformed attribute space using a fixed number of equally
distributed triangular membership functions. Despite the aforementioned
transformation, the definition of every fuzzy set in the original space can be
recovered by applying the inverse cumulative distribution function (also known
as quantile function). The experimental results reveal that the proposed
methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT)
induction algorithm to maintain classification accuracy with up to 6 million
fewer leaves.Comment: Appeared in 2018 IEEE International Congress on Big Data (BigData
Congress). arXiv admin note: text overlap with arXiv:1902.0935
Automated construction of a hierarchy of self-organized neural network classifiers
This paper documents an effort to design and implement a neural network-based, automatic classification system which dynamically constructs and trains a decision tree. The system is a combination of neural network and decision tree technology. The decision tree is constructed to partition a large classification problem into smaller problems. The neural network modules then solve these smaller problems. We used a variant of the Fuzzy ARTMAP neural network which can be trained much more quickly than traditional neural networks. The research extends the concept of self-organization from within the neural network to the overall structure of the dynamically constructed decision hierarchy. The primary advantage is avoidance of manual tedium and subjective bias in constructing decision hierarchies. Additionally, removing the need for manual construction of the hierarchy opens up a large class of potential classification applications. When tested on data from real-world images, the automatically generated hierarchies performed slightly better than an intuitive (handbuilt) hierarchy. Because the neural networks at the nodes of the decision hierarchy are solving smaller problems, generalization performance can really be improved if the number of features used to solve these problems is reduced. Algorithms for automatically selecting which features to use for each individual classification module were also implemented. We were able to achieve the same level of performance as in previous manual efforts, but in an efficient, automatic manner. The technology developed has great potential in a number of commercial areas, including data mining, pattern recognition, and intelligent interfaces for personal computer applications. Sample applications include: fraud detection, bankruptcy prediction, data mining agent, scalable object recognition system, email agent, resource librarian agent, and a decision aid agent
Autonomous clustering using rough set theory
This paper proposes a clustering technique that minimises the need for subjective
human intervention and is based on elements of rough set theory. The proposed algorithm is
unified in its approach to clustering and makes use of both local and global data properties to
obtain clustering solutions. It handles single-type and mixed attribute data sets with ease and
results from three data sets of single and mixed attribute types are used to illustrate the
technique and establish its efficiency
Development of PancRISK, a urine biomarker-based risk score for stratified screening of pancreatic cancer patients
© The Author(s) 2019. Published by Springer Nature on behalf of Cancer Research UK.BACKGROUND: An accurate and simple risk prediction model that would facilitate earlier detection of pancreatic adenocarcinoma (PDAC) is not available at present. In this study, we compare different algorithms of risk prediction in order to select the best one for constructing a biomarker-based risk score, PancRISK. METHODS: Three hundred and seventy-nine patients with available measurements of three urine biomarkers, (LYVE1, REG1B and TFF1) using retrospectively collected samples, as well as creatinine and age, were randomly split into training and validation sets, following stratification into cases (PDAC) and controls (healthy patients). Several machine learning algorithms were used, and their performance characteristics were compared. The latter included AUC (area under ROC curve) and sensitivity at clinically relevant specificity. RESULTS: None of the algorithms significantly outperformed all others. A logistic regression model, the easiest to interpret, was incorporated into a PancRISK score and subsequently evaluated on the whole data set. The PancRISK performance could be even further improved when CA19-9, commonly used PDAC biomarker, is added to the model. CONCLUSION: PancRISK score enables easy interpretation of the biomarker panel data and is currently being tested to confirm that it can be used for stratification of patients at risk of developing pancreatic cancer completely non-invasively, using urine samples.Peer reviewe
- …