Search CORE

5,287 research outputs found

On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

Author: Bustince Humberto
Elkano Mikel
Galar Mikel
Uriz Mikel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/02/2019
Field of study

We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of training data. A two-step process is applied : 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform. Since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.Comment: Appeared in 2018 IEEE International Congress on Big Data (BigData Congress). arXiv admin note: text overlap with arXiv:1902.0935

arXiv.org e-Print Archive

Crossref

An overview of recent distributed algorithms for learning fuzzy models in Big Data classification

Author: Francesco Marcelloni
Michela Fazzolari
Pietro Ducange
Publication venue
Publication date: 10/03/2020
Field of study

AbstractNowadays, a huge amount of data are generated, often in very short time intervals and in various formats, by a number of different heterogeneous sources such as social networks and media, mobile devices, internet transactions, networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular Vs features, such as Value, Veracity, Variety, Velocity and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in the last years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have been generally implemented by using ad-hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models are currently playing a significant role, thanks to their capability of handling vague and imprecise data and their innate characteristic to be interpretable. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. In particular, we first show some design and implementation details of these learning algorithms. Thereafter, we compare them in terms of accuracy and interpretability. Finally, we argue about their scalability

Open Access Repository

Proceedings of the 1st Computer Science Student Workshop: Koc University Istinye Campus, Istanbul, Turkey, February 21, 2010

Author
Publication venue: Sabancı University
Publication date: 01/01/2010
Field of study

Sabanci University Research Database

Learning nonlinear monotone classifiers using the Choquet Integral

Author: Fallah Tehrani Ali
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2014
Field of study

In der jüngeren Vergangenheit hat das Lernen von Vorhersagemodellen, die eine monotone Beziehung zwischen Ein- und Ausgabevariablen garantieren, wachsende Aufmerksamkeit im Bereich des maschinellen Lernens erlangt. Besonders für flexible nichtlineare Modelle stellt die Gewährleistung der Monotonie eine große Herausforderung für die Umsetzung dar. Die vorgelegte Arbeit nutzt das Choquet Integral als mathematische Grundlage für die Entwicklung neuer Modelle für nichtlineare Klassifikationsaufgaben. Neben den bekannten Einsatzgebieten des Choquet-Integrals als flexible Aggregationsfunktion in multi-kriteriellen Entscheidungsverfahren, findet der Formalismus damit Eingang als wichtiges Werkzeug für Modelle des maschinellen Lernens. Neben dem Vorteil, Monotonie und Flexibilität auf elegante Weise mathematisch vereinbar zu machen, bietet das Choquet-Integral Möglichkeiten zur Quantifizierung von Wechselwirkungen zwischen Gruppen von Attributen der Eingabedaten, wodurch interpretierbare Modelle gewonnen werden können. In der Arbeit werden konkrete Methoden für das Lernen mit dem Choquet Integral entwickelt, welche zwei unterschiedliche Ansätze nutzen, die Maximum-Likelihood-Schätzung und die strukturelle Risikominimierung. Während der erste Ansatz zu einer Verallgemeinerung der logistischen Regression führt, wird der zweite mit Hilfe von Support-Vektor-Maschinen realisiert. In beiden Fällen wird das Lernproblem imWesentlichen auf die Parameter-Identifikation von Fuzzy-Maßen für das Choquet Integral zurückgeführt. Die exponentielle Anzahl von Freiheitsgraden zur Modellierung aller Attribut-Teilmengen stellt dabei besondere Herausforderungen im Hinblick auf Laufzeitkomplexität und Generalisierungsleistung. Vor deren Hintergrund werden die beiden Ansätze praktisch bewertet und auch theoretisch analysiert. Zudem werden auch geeignete Verfahren zur Komplexitätsreduktion und Modellregularisierung vorgeschlagen und untersucht. Die experimentellen Ergebnisse sind auch für anspruchsvolle Referenzprobleme im Vergleich mit aktuellen Verfahren sehr gut und heben die Nützlichkeit der Kombination aus Monotonie und Flexibilität des Choquet Integrals in verschiedenen Ansätzen des maschinellen Lernens hervor

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg

CPS Data Streams Analytics based on Machine Learning for Cloud and Fog Computing: A Survey

Author: Chao Kuo-Ming
Fei Xiang
James Anne
Lewandowski Jacek
Sanchez-Anguix Victor
Shah Nazaraf
Usman Zahid
Verba Nandor
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019
Field of study

Cloud and Fog computing has emerged as a promising paradigm for the Internet of things (IoT) and cyber-physical systems (CPS). One characteristic of CPS is the reciprocal feedback loops between physical processes and cyber elements (computation, software and networking), which implies that data stream analytics is one of the core components of CPS. The reasons for this are: (i) it extracts the insights and the knowledge from the data streams generated by various sensors and other monitoring components embedded in the physical systems; (ii) it supports informed decision making; (iii) it enables feedback from the physical processes to the cyber counterparts; (iv) it eventually facilitates the integration of cyber and physical systems. There have been many successful applications of data streams analytics, powered by machine learning techniques, to CPS systems. Thus, it is necessary to have a survey on the particularities of the application of machine learning techniques to the CPS domain. In particular, we explore how machine learning methods should be deployed and integrated in cloud and fog architectures for better fulfilment of the requirements, e.g. mission criticality and time criticality, arising in CPS domains. To the best of our knowledge, this paper is the ﬁrst to systematically study machine learning techniques for CPS data stream analytics from various perspectives, especially from a perspective that leads to the discussion and guidance of how the CPS machine learning methods should be deployed in a cloud and fog architecture

Crossref

Nottingham Trent Institutional Repository (IRep)

Coventry University Pure Portal

An overview of decision table literature 1982-1995.

Author: Vanthienen Jan
Verhelle M
Publication venue
Publication date
Field of study

This report gives an overview of the literature on decision tables over the past 15 years. As much as possible, for each reference, an author supplied abstract, a number of keywords and a classification are provided. In some cases own comments are added. The purpose of these comments is to show where, how and why decision tables are used. The literature is classified according to application area, theoretical versus practical character, year of publication, country or origin (not necessarily country of publication) and the language of the document. After a description of the scope of the interview, classification results and the classification by topic are presented. The main body of the paper is the ordered list of publications with abstract, classification and comments.

Research Papers in Economics

Recommended from our members

Context-awareness for mobile sensing: a survey and future directions

Author: Leung Kin K
Leung Victor C M
Liu Chi Harold
Moreno Wilfrido
Sheng Zhengguo
Yurur Ozgur
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

The evolution of smartphones together with increasing computational power have empowered developers to create innovative context-aware applications for recognizing user related social and cognitive activities in any situation and at any location. The existence and awareness of the context provides the capability of being conscious of physical environments or situations around mobile device users. This allows network services to respond proactively and intelligently based on such awareness. The key idea behind context-aware applications is to encourage users to collect, analyze and share local sensory knowledge in the purpose for a large scale community use by creating a smart network. The desired network is capable of making autonomous logical decisions to actuate environmental objects, and also assist individuals. However, many open challenges remain, which are mostly arisen due to the middleware services provided in mobile devices have limited resources in terms of power, memory and bandwidth. Thus, it becomes critically important to study how the drawbacks can be elaborated and resolved, and at the same time better understand the opportunities for the research community to contribute to the context-awareness. To this end, this paper surveys the literature over the period of 1991-2014 from the emerging concepts to applications of context-awareness in mobile platforms by providing up-to-date research and future research directions. Moreover, it points out the challenges faced in this regard and enlighten them by proposing possible solutions

Sussex Research Online

Adaptive Feature Engineering Modeling for Ultrasound Image Classification for Decision Support

Author: Mugasa Hatwib
Publication venue: Louisiana Tech Digital Commons
Publication date: 01/10/2019
Field of study

Ultrasonography is considered a relatively safe option for the diagnosis of benign and malignant cancer lesions due to the low-energy sound waves used. However, the visual interpretation of the ultrasound images is time-consuming and usually has high false alerts due to speckle noise. Improved methods of collection image-based data have been proposed to reduce noise in the images; however, this has proved not to solve the problem due to the complex nature of images and the exponential growth of biomedical datasets. Secondly, the target class in real-world biomedical datasets, that is the focus of interest of a biopsy, is usually significantly underrepresented compared to the non-target class. This makes it difficult to train standard classification models like Support Vector Machine (SVM), Decision Trees, and Nearest Neighbor techniques on biomedical datasets because they assume an equal class distribution or an equal misclassification cost. Resampling techniques by either oversampling the minority class or under-sampling the majority class have been proposed to mitigate the class imbalance problem but with minimal success. We propose a method of resolving the class imbalance problem with the design of a novel data-adaptive feature engineering model for extracting, selecting, and transforming textural features into a feature space that is inherently relevant to the application domain. We hypothesize that by maximizing the variance and preserving as much variability in well-engineered features prior to applying a classifier model will boost the differentiation of the thyroid nodules (benign or malignant) through effective model building. Our proposed a hybrid approach of applying Regression and Rule-Based techniques to build our Feature Engineering and a Bayesian Classifier respectively. In the Feature Engineering model, we transformed images pixel intensity values into a high dimensional structured dataset and fitting a regression analysis model to estimate relevant kernel parameters to be applied to the proposed filter method. We adopted an Elastic Net Regularization path to control the maximum log-likelihood estimation of the Regression model. Finally, we applied a Bayesian network inference to estimate a subset for the textural features with a significant conditional dependency in the classification of the thyroid lesion. This is performed to establish the conditional influence on the textural feature to the random factors generated through our feature engineering model and to evaluate the success criterion of our approach. The proposed approach was tested and evaluated on a public dataset obtained from thyroid cancer ultrasound diagnostic data. The analyses of the results showed that the classification performance had a significant improvement overall for accuracy and area under the curve when then proposed feature engineering model was applied to the data. We show that a high performance of 96.00% accuracy with a sensitivity and specificity of 99.64%) and 90.23% respectively was achieved for a filter size of 13 × 13

Louisiana Tech Digital Commons

Analysis of Atrial Electrograms

Author: Schilling Christopher
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2012
Field of study

This work provides methods to measure and analyze features of atrial electrograms - especially complex fractionated atrial electrograms (CFAEs) - mathematically. Automated classification of CFAEs into clinical meaningful classes is applied and the newly gained electrogram information is visualized on patient specific 3D models of the atria. Clinical applications of the presented methods showed that quantitative measures of CFAEs reveal beneficial information about the underlying arrhythmia

KITopen

Directory of Open Access Books (DOAB)