
    Transductive Learning for Spatial Data Classification

    Learning classifiers of spatial data presents several issues, such as the heterogeneity of spatial objects, the implicit definition of spatial relationships among objects, spatial autocorrelation, and the abundance of unlabelled data which potentially conveys a large amount of information. The first three issues are due to the inherent structure of spatial units of analysis, which can be easily accommodated if a (multi-)relational data mining approach is considered. The fourth issue demands the adoption of a transductive setting, which aims to make predictions for a given set of unlabelled data. Transduction is also motivated by the closeness of the concept of positive autocorrelation, which typically affects spatial phenomena, to the smoothness assumption which characterizes the transductive setting. In this work, we investigate a relational approach to spatial classification in a transductive setting. Computational solutions to the main difficulties met in this approach are presented. In particular, a relational upgrade of the naïve Bayes classifier is proposed as the discriminative model, an iterative algorithm is designed for the transductive classification of unlabelled data, and a distance measure between relational descriptions of spatial objects is defined in order to determine the k-nearest neighbors of each example in the dataset. The computational solutions have been tested on two real-world spatial datasets. The transformation of spatial data into a multi-relational representation and experimental results are reported and discussed.
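
    As a rough sketch of the iterative transductive scheme described above (a simplified propositional version in Python, not the authors' relational algorithm; the function name and parameters are invented for illustration), unlabelled examples can be given an initial naive Bayes prediction and then repeatedly relabelled by their k nearest neighbors until the labelling stabilizes:

        # Simplified sketch of iterative transductive classification.
        # The paper works on multi-relational descriptions with a custom
        # distance measure; this propositional version uses plain
        # Euclidean k-NN and assumes integer class labels 0..C-1.
        import numpy as np
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import NearestNeighbors

        def transductive_knn_nb(X_lab, y_lab, X_unl, k=5, max_iter=10):
            y_unl = GaussianNB().fit(X_lab, y_lab).predict(X_unl)  # initial guess
            X_all = np.vstack([X_lab, X_unl])
            nn = NearestNeighbors(n_neighbors=k + 1).fit(X_all)
            _, idx = nn.kneighbors(X_unl)
            for _ in range(max_iter):
                y_all = np.concatenate([y_lab, y_unl])
                # relabel each unlabelled point by majority vote of its
                # k nearest neighbors (idx[:, 0] is the point itself)
                new = np.array([np.bincount(y_all[i[1:]]).argmax() for i in idx])
                if np.array_equal(new, y_unl):  # labelling stabilized
                    break
                y_unl = new
            return y_unl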

    Development of a Machine Learning-Based Financial Risk Control System

    With the gradual end of the COVID-19 outbreak and the gradual recovery of the economy, more and more individuals and businesses are in need of loans. This demand brings business opportunities to various financial institutions, but it also brings new risks. Traditional loan application review is mostly manual and relies on the business experience of the auditor, so it cannot process applications in large volumes and is inefficient. Since the traditional audit process is no longer suitable, financial institutions urgently need another way of reducing the rate of non-performing loans and detecting fraud in applications. In this project, a financial risk control model is built using various machine learning algorithms. The model replaces the traditional manual approach to reviewing loan applications, improving the speed of review as well as its accuracy and approval rate. Machine learning algorithms were also used in this project to create a loan user scorecard system that better reflects changes in user information than the credit scoring systems financial institutions use today. The project also explores the data imbalance problem and ways of improving model performance.
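
    A minimal sketch of the kind of model such a system might start from (the data, feature semantics, and score scaling below are invented for illustration) is a class-weighted logistic regression, a common basis for credit scorecards, with the class weighting addressing the data imbalance the project mentions:

        # Class-weighted logistic regression as a toy credit scoring model.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 4))             # e.g. income, debt ratio, age, history
        y = (rng.random(1000) < 0.05).astype(int)  # ~5% defaults: imbalanced classes

        # class_weight="balanced" upweights the rare default class
        clf = make_pipeline(StandardScaler(),
                            LogisticRegression(class_weight="balanced"))
        clf.fit(X, y)

        # map predicted default probability to a scorecard scale
        # (illustrative: base score 600, 50 points double the odds)
        p = clf.predict_proba(X[:5])[:, 1]
        score = 600 + 50 * np.log2((1 - p) / p)
        print(np.round(score))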

    Content-based Information Retrieval via Nearest Neighbor Search

    Content-based information retrieval (CBIR) has attracted significant interest in the past few years. When given a search query, the search engine compares the query with all the stored information in the database through nearest neighbor search and returns the most similar items. We contribute the following to CBIR research: firstly, Distance Metric Learning (DML) is studied to improve the retrieval accuracy of nearest neighbor search; additionally, Hash Function Learning (HFL) is considered to accelerate the retrieval process. On one hand, a new local metric learning framework is proposed: Reduced-Rank Local Metric Learning (R2LML). By considering a conical combination of Mahalanobis metrics, the proposed method is able to better capture information such as the data's similarity and location. A regularization term to suppress noise and avoid over-fitting is also incorporated into the formulation. Based on the different methods of inferring the weights of the local metrics, we considered two frameworks: Transductive Reduced-Rank Local Metric Learning (T-R2LML), which utilizes transductive learning, and Efficient Reduced-Rank Local Metric Learning (E-R2LML), which employs a simpler and faster approximation. We also study the convergence properties of the proposed block coordinate descent algorithms for both frameworks. Extensive experiments show the superiority of our approaches. On the other hand, *Supervised Hash Learning (*SHL), which can be used in supervised, semi-supervised, and unsupervised learning scenarios, is proposed in the dissertation. By considering several codewords which can be learned from the data, the proposed method naturally leads to several Support Vector Machine (SVM) problems. After providing an efficient training algorithm, we also study the theoretical generalization bound of the new hashing framework. In the final experiments, *SHL outperforms many other popular hash function learning methods. Additionally, in order to cope with large datasets, we also conducted experiments on big data using a parallel computing software package, namely LIBSKYLARK.
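
    To make the metric-learning side concrete, the sketch below (notation assumed here, not taken from the dissertation) retrieves nearest neighbors under a single Mahalanobis metric d(x, q) = sqrt((x - q)^T M (x - q)) with M = L^T L positive semi-definite; R2LML goes further by conically combining several such local metrics:

        # Nearest-neighbor retrieval under a learned Mahalanobis metric.
        # Writing M = L^T L, the map z = Lx turns the Mahalanobis distance
        # into ordinary Euclidean distance in the transformed space.
        import numpy as np

        def mahalanobis_knn(X, q, L, k=3):
            Z, zq = X @ L.T, L @ q
            d = np.linalg.norm(Z - zq, axis=1)
            return np.argsort(d)[:k]               # indices of the k nearest items

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 5))              # database items
        q = rng.normal(size=5)                     # query
        L = rng.normal(size=(5, 5))                # stand-in for a learned transform
        print(mahalanobis_knn(X, q, L))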

    Reliability of Extreme Learning Machines

    Neumann K. Reliability of Extreme Learning Machines. Bielefeld: Bielefeld University Library; 2014. The reliable application of machine learning methods becomes increasingly important in challenging engineering domains. In particular, the application of extreme learning machines (ELMs) seems promising because of their apparent simplicity and their capability of processing large and high-dimensional data sets very efficiently. However, the ELM paradigm is based on the concept of single hidden-layer neural networks with randomly initialized and fixed input weights and is thus inherently unreliable. This black-box character usually deters engineers from applying ELMs to potentially safety-critical tasks. The problem becomes even more severe since, in principle, only sparse and noisy data sets can be provided in such domains. The goal of this thesis is therefore to equip the ELM approach with the ability to perform reliably. This goal is approached in three respects: enhancing the robustness of ELMs to initialization, enabling ELMs to handle slow changes in the environment (i.e. input drifts), and allowing the incorporation of continuous constraints derived from prior knowledge. It is shown in several diverse scenarios that the novel ELM approach proposed in this thesis ensures safe and reliable application while simultaneously sustaining the full modeling power of data-driven methods.
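
    The basic ELM computation the thesis builds on can be sketched in a few lines (a plain regression ELM; the thesis's reliability mechanisms are not reproduced here): the input weights are drawn at random and kept fixed, and only the linear readout is solved for, here via ridge regression:

        # Minimal extreme learning machine: random fixed hidden layer
        # plus a ridge-regularized linear readout.
        import numpy as np

        class ELM:
            def __init__(self, n_hidden=100, alpha=1e-3, seed=0):
                self.n_hidden, self.alpha = n_hidden, alpha
                self.rng = np.random.default_rng(seed)

            def fit(self, X, y):
                d = X.shape[1]
                self.W = self.rng.normal(size=(d, self.n_hidden))  # fixed at random
                self.b = self.rng.normal(size=self.n_hidden)
                H = np.tanh(X @ self.W + self.b)                   # hidden activations
                # readout weights: beta = (H^T H + alpha I)^(-1) H^T y
                A = H.T @ H + self.alpha * np.eye(self.n_hidden)
                self.beta = np.linalg.solve(A, H.T @ y)
                return self

            def predict(self, X):
                return np.tanh(X @ self.W + self.b) @ self.beta

        # usage: fit a noisy sine curve
        X = np.linspace(-3, 3, 200)[:, None]
        y = np.sin(X[:, 0]) + 0.1 * np.random.default_rng(1).normal(size=200)
        print(ELM().fit(X, y).predict(X[:5]))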

    Proceedings. 16. Workshop Computational Intelligence, Dortmund, 29. Nov.-1. Dez. 2006

    These proceedings contain the papers of the 16th Workshop Computational Intelligence. It was organized by the Working Group 5.14 of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik (GMA) and the Working Group Fuzzy-Systems and Soft-Computing of the Gesellschaft für Informatik (GI).

    Histograms: An educational eye

    Many high-school students are not able to draw justified conclusions from statistical data in histograms. A literature review showed that most misinterpretations of histograms are related to difficulties with two statistical key concepts: data and distribution. The review also pointed to a lack of knowledge about students' strategies for solving histogram tasks. As the literature provided little guidance for the design of lesson materials, several preparatory studies were conducted. In a first study, five solution strategies were found through qualitative analysis of students' gazes when solving histogram and case-value plot tasks. Quantitative analysis of several histogram tasks through a mathematical model and a machine learning algorithm confirmed these results, which implied that these strategies could be identified reliably and automatically. The literature also suggested that dotplot tasks can support students' learning to interpret histograms. Therefore, gazes on histogram tasks were compared before and after students solved dotplot tasks. The "after" tasks contained more gazes associated with correct strategies and fewer gazes associated with incorrect strategies. Although answers did not improve significantly, students' verbal descriptions suggest that some students changed to a correct strategy. The newly designed materials therefore started with dotplot tasks. From the previous studies, we conjectured that students lacked embodied experiences with actions related to histograms. Designed from an embodied instrumentation perspective, the tested materials provide starting points for scaling up. Together, the studies address the knowledge gaps identified in the literature and contribute to knowledge about learning histograms and about the use of eye-tracking research, interpretable models and machine learning algorithms, and embodied instrumentation design in statistics education.