2,799 research outputs found
Re-considering the status quo: Improving calibration of land use change models through validation of transition potential predictions
The increasing complexity of the dynamics captured in Land Use and Land Cover (LULC) change modelling has made model behaviour less transparent and calibration more extensive. For cellular automata models in particular, this is compounded by the fact that validation is typically performed indirectly, using final simulated change maps, rather than by directly evaluating the probabilistic transition potential predictions. This study demonstrates that evaluating transition potential predictions provides insight into model behaviour and performance that cannot be obtained from simulated map comparison alone. This is illustrated by modelling LULC transitions in Switzerland using both Logistic Regression and Random Forests. The results emphasize the need for LULC modellers to explicitly evaluate the performance of each transition model independently to ensure robust predictions. Additionally, this study highlights the potential of predictor variable selection as a means to improve transition model generalizability and parsimony, which is beneficial for simulating future LULC change.
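The direct validation of probabilistic transition potentials that this abstract advocates can be sketched as below; the predictors, synthetic data, and choice of AUC as the evaluation metric are illustrative assumptions, not the study's actual setup.

```python
# Sketch: validate transition potential predictions directly (via held-out
# probabilistic scores) rather than only comparing final simulated maps.
# Predictors and data here are synthetic placeholders, not the Swiss dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))    # e.g. slope, distance to roads, ... (assumed)
latent = X[:, 0] - 0.5 * X[:, 1]  # latent transition potential
y = rng.binomial(1, 1 / (1 + np.exp(-latent)))  # observed cell transitions

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]  # transition potential per cell
    print(name, "AUC:", round(roc_auc_score(y_te, scores), 3))
```

Evaluating each transition model's probabilistic output this way exposes differences between the two learners that a comparison of final simulated maps would hide.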
A Comprehensive Analysis on Risk Prediction of Heart Disease using Machine Learning Models
Heart disease is a leading cause of death worldwide and a major source of morbidity. Many of these deaths are preventable: regular monitoring and early detection can greatly reduce the mortality rate. Diagnosing heart disease from clinical data, however, remains a challenging task, and advanced Machine Learning (ML) techniques offer a practical way to predict the disease accurately. The application of machine learning techniques is advancing significantly in the medical field. This work presents an ML approach to heart disease prediction that aims to detect the disease early, reduce its mortality and severity, and identify the features most important for prediction, which improves predictive accuracy. The model is trained using four classification algorithms: Decision Tree (DT), K-Nearest Neighbors (K-NN), Random Forest (RF), and Support Vector Machine (SVM). The performance of these algorithms is quantified in terms of accuracy, precision, recall, and specificity. Although the accuracy varies across cases, SVM provides the best performance in this study.
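The four-classifier comparison described in the abstract can be sketched as follows. The heart-disease dataset itself is not available here, so a synthetic binary dataset stands in for it; specificity is computed as recall of the negative class, and all hyperparameters are defaults rather than the study's.

```python
# Illustrative comparison of the four named classifiers on synthetic data
# (stand-in for the clinical dataset), reporting the abstract's four metrics.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=13, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

models = {"DT": DecisionTreeClassifier(random_state=42),
          "K-NN": KNeighborsClassifier(),
          "RF": RandomForestClassifier(random_state=42),
          "SVM": SVC()}

for name, m in models.items():
    pred = m.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, pred):.2f} "
          f"prec={precision_score(y_te, pred):.2f} "
          f"rec={recall_score(y_te, pred):.2f} "
          f"spec={recall_score(y_te, pred, pos_label=0):.2f}")  # specificity
```

On real clinical data the ranking can differ; the abstract's finding that SVM performs best is specific to its dataset and preprocessing.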
Breast cancer classification using machine learning techniques: a comparative study
Background: Breast cancer is the second deadliest disease affecting women worldwide, after lung cancer. Traditional approaches to breast cancer diagnosis are time-consuming and prone to human error in classification. To address these problems, many research works based on machine learning techniques have been proposed. These approaches have proven effective for data classification in many fields, especially in healthcare.
Methods: In this cross-sectional study, we conducted a practical comparison of the machine learning algorithms most used in the literature. We applied kernel and linear support vector machines, random forest, decision tree, multi-layer perceptron, logistic regression, and k-nearest neighbors to breast cancer tumor classification. The dataset used is the Wisconsin Diagnostic Breast Cancer dataset.
Results: After comparing the efficiency of the machine learning algorithms, we found that the multi-layer perceptron and logistic regression gave the best results, with an accuracy of 98% for breast cancer classification.
Conclusion: Machine learning approaches are extensively used in medical prediction and decision support systems. This study showed that the multi-layer perceptron and logistic regression algorithms perform well (good accuracy, specificity, and sensitivity) compared to the other evaluated algorithms.
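The study's two top performers can be reproduced in spirit on the same Wisconsin Diagnostic Breast Cancer data, which ships with scikit-learn; the hyperparameters and train/test split below are illustrative assumptions, not the study's, so the exact 98% figure need not be matched.

```python
# Sketch: the comparison's two best models (per the abstract) on the
# Wisconsin Diagnostic Breast Cancer dataset bundled with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [("logistic regression", LogisticRegression(max_iter=5000)),
                  ("multi-layer perceptron",
                   MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000,
                                 random_state=0))]:
    model = make_pipeline(StandardScaler(), clf)  # scaling helps both models
    print(name, "accuracy:", round(model.fit(X_tr, y_tr).score(X_te, y_te), 3))
```

Standardizing the features matters here: both logistic regression and the perceptron converge poorly on the raw, differently-scaled measurements.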
Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system
A number of representation schemes have been presented for use within
learning classifier systems, ranging from binary encodings to neural networks.
This paper presents results from an investigation into using discrete and fuzzy
dynamical system representations within the XCSF learning classifier system. In
particular, asynchronous random Boolean networks are used to represent the
traditional condition-action production system rules in the discrete case and
asynchronous fuzzy logic networks in the continuous-valued case. It is shown
that self-adaptive, open-ended evolution can be used to design an ensemble of
such dynamical systems within XCSF to solve a number of well-known test
problems.
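The discrete representation the abstract names can be illustrated with a minimal asynchronous random Boolean network; the network size, connectivity, and update scheme below are generic assumptions, not the paper's exact configuration.

```python
# Minimal sketch of an asynchronous random Boolean network (RBN): N nodes,
# each with K random inputs and a random Boolean lookup table, updated one
# randomly chosen node at a time (asynchronous dynamics).
import random

random.seed(1)
N, K = 8, 2
inputs = [[random.randrange(N) for _ in range(K)] for _ in range(N)]
# Each node's Boolean function is a lookup table over its 2**K input patterns.
tables = [[random.randint(0, 1) for _ in range(2 ** K)] for _ in range(N)]
state = [random.randint(0, 1) for _ in range(N)]

def async_step(state):
    """Update a single randomly chosen node in place."""
    i = random.randrange(N)
    idx = sum(state[src] << b for b, src in enumerate(inputs[i]))
    state[i] = tables[i][idx]
    return state

for _ in range(100):
    async_step(state)
print(state)
```

In the paper's setting, such networks (and their connectivity and tables) are what evolution acts on in place of traditional condition-action rules.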
Efficient transfer entropy analysis of non-stationary neural time series
Information theory allows us to investigate information processing in neural
systems in terms of information transfer, storage and modification. Especially
the measure of information transfer, transfer entropy, has seen a dramatic
surge of interest in neuroscience. Estimating transfer entropy from two
processes requires the observation of multiple realizations of these processes
to estimate associated probability density functions. To obtain these
observations, available estimators assume stationarity of processes to allow
pooling of observations over time. This assumption, however, is a major obstacle
to the application of these estimators in neuroscience as observed processes
are often non-stationary. As a solution, Gomez-Herrero and colleagues
theoretically showed that the stationarity assumption may be avoided by
estimating transfer entropy from an ensemble of realizations. Such an ensemble
is often readily available in neuroscience experiments in the form of
experimental trials. Thus, in this work we combine the ensemble method with a
recently proposed transfer entropy estimator to make transfer entropy
estimation applicable to non-stationary time series. We present an efficient
implementation of the approach that deals with the increased computational
demand of the ensemble method's practical application. In particular, we use a
massively parallel implementation for a graphics processing unit to handle the
computationally most heavy aspects of the ensemble method. We test the
performance and robustness of our implementation on data from simulated
stochastic processes and demonstrate the method's applicability to
magnetoencephalographic data. While we mainly evaluate the proposed method for
neuroscientific data, we expect it to be applicable in a variety of fields that
are concerned with the analysis of information transfer in complex biological,
social, and artificial systems.
Comment: 27 pages, 7 figures, submitted to PLOS ONE
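The core ensemble idea, pooling observations across trials at a fixed time point rather than over time, can be shown with a toy plug-in estimator on binary data; the paper's actual estimator (a Kraskov-type continuous estimator on a GPU) is far more sophisticated, and the simulated coupling here is an assumption for illustration.

```python
# Toy ensemble-method illustration: estimate transfer entropy TE(x -> y) at a
# single time point t by pooling (y_t, y_{t-1}, x_{t-1}) samples across many
# trials, avoiding the stationarity assumption needed for pooling over time.
from collections import Counter
from math import log2
import random

random.seed(0)

def simulate_trial(T=3):
    x = [random.randint(0, 1) for _ in range(T)]
    # y copies x with one step of delay, so TE(x -> y) should be positive
    y = [random.randint(0, 1)] + [x[t - 1] for t in range(1, T)]
    return x, y

trials = [simulate_trial() for _ in range(5000)]
t = 2  # time point at which TE is estimated, across the trial ensemble

triples = Counter((y[t], y[t - 1], x[t - 1]) for x, y in trials)
pairs = Counter((y[t - 1], x[t - 1]) for x, y in trials)
dest = Counter((y[t], y[t - 1]) for x, y in trials)
past = Counter(y[t - 1] for x, y in trials)
n = len(trials)

te = 0.0
for (yt, yp, xp), c in triples.items():
    p_joint = c / n
    p_cond_full = c / pairs[(yp, xp)]       # p(y_t | y_{t-1}, x_{t-1})
    p_cond_dest = dest[(yt, yp)] / past[yp]  # p(y_t | y_{t-1})
    te += p_joint * log2(p_cond_full / p_cond_dest)

print(f"TE(x -> y) ~ {te:.2f} bits")  # near 1 bit: the coupling is deterministic
```

Because each trial contributes one observation per variable at time t, non-stationarity over the course of a trial is irrelevant to the estimate, which is exactly the point of the ensemble method.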
JIDT: An information-theoretic toolkit for studying the dynamics of complex systems
Complex systems are increasingly being viewed as distributed information
processing systems, particularly in the domains of computational neuroscience,
bioinformatics and Artificial Life. This trend has resulted in a strong uptake
in the use of (Shannon) information-theoretic measures to analyse the dynamics
of complex systems in these fields. We introduce the Java Information Dynamics
Toolkit (JIDT): a Google code project which provides a standalone, (GNU GPL v3
licensed) open-source code implementation for empirical estimation of
information-theoretic measures from time-series data. While the toolkit
provides classic information-theoretic measures (e.g. entropy, mutual
information, conditional mutual information), it ultimately focusses on
implementing higher-level measures for information dynamics. That is, JIDT
focusses on quantifying information storage, transfer and modification, and the
dynamics of these operations in space and time. For this purpose, it includes
implementations of the transfer entropy and active information storage, their
multivariate extensions and local or pointwise variants. JIDT provides
implementations for both discrete and continuous-valued data for each measure,
including various types of estimator for continuous data (e.g. Gaussian,
box-kernel and Kraskov-Stoegbauer-Grassberger) which can be swapped at run-time
due to Java's object-oriented polymorphism. Furthermore, while written in Java,
the toolkit can be used directly in MATLAB, GNU Octave, Python and other
environments. We present the principles behind the code design, and provide
several examples to guide users.
Comment: 37 pages, 4 figures
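JIDT itself is a Java library (callable from Python, MATLAB, and Octave); as a plain-Python illustration of the kind of "classic" discrete measure it estimates, here is a plug-in mutual information computed from paired samples. The data and the identity I(X;Y) = H(X) + H(Y) - H(X,Y) are standard, not JIDT-specific code.

```python
# Plug-in estimate of mutual information for discrete data, the simplest of
# the classic measures a toolkit like JIDT provides.
from collections import Counter
from math import log2

def entropy(samples):
    """Empirical Shannon entropy (bits) of a list of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

x = [0, 1, 0, 1] * 250   # uniform binary source
y = list(x)              # y is a perfect copy of x
mi = entropy(x) + entropy(y) - entropy(list(zip(x, y)))
print(f"I(X;Y) = {mi:.1f} bits")  # prints 1.0: a perfect copy shares 1 bit
```

JIDT's value is in going well beyond this: continuous estimators (Gaussian, box-kernel, Kraskov-Stoegbauer-Grassberger), multivariate extensions, and local/pointwise variants of transfer entropy and active information storage.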
Active Learning for Reducing Labeling Effort in Text Classification Tasks
Labeling data can be an expensive task as it is usually performed manually by
domain experts. This is cumbersome for deep learning, as it is dependent on
large labeled datasets. Active learning (AL) is a paradigm that aims to reduce
labeling effort by only using the data which the used model deems most
informative. Little research has been done on AL in a text classification
setting and next to none has involved the more recent, state-of-the-art Natural
Language Processing (NLP) models. Here, we present an empirical study that
compares different uncertainty-based algorithms with BERT as the used
classifier. We evaluate the algorithms on two NLP classification datasets:
Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore
heuristics that aim to solve presupposed problems of uncertainty-based AL;
namely, that it is unscalable and that it is prone to selecting outliers.
Furthermore, we explore the influence of the query-pool size on the performance
of AL. While the proposed heuristics for AL did not improve its
performance, our results show that using uncertainty-based AL with BERT
outperforms random sampling of data. This difference in performance can
decrease as the query-pool size gets larger.
Comment: Accepted as a conference paper at the joint 33rd Benelux Conference
on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine
Learning (BNAIC/BENELEARN 2021). This camera-ready version, submitted to
BNAIC/BENELEARN, adds several improvements, including a more thorough
discussion of related work and an extended discussion section. 28 pages
including references and appendices.
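The uncertainty-based AL loop the paper evaluates can be sketched with least-confidence sampling; a logistic regression on synthetic data stands in for BERT on the text datasets, and the seed-set and query-pool sizes are illustrative assumptions.

```python
# Sketch of an uncertainty-based active learning loop (least-confidence
# sampling). A logistic regression stands in for the paper's BERT classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
labeled = list(range(20))                 # small seed set of labeled examples
pool = [i for i in range(len(y)) if i not in set(labeled)]
query_size = 20                           # the query-pool size under study

model = LogisticRegression(max_iter=1000)
for step in range(5):
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)   # least-confidence score
    picks = np.argsort(uncertainty)[-query_size:]  # most uncertain examples
    labeled += [pool[i] for i in picks]   # "send them to the annotator"
    labeled_set = set(labeled)
    pool = [i for i in pool if i not in labeled_set]

print("labeled:", len(labeled), "accuracy:", round(model.score(X, y), 3))
```

The paper's caveats apply to this loop directly: scoring the whole pool each round is what makes naive uncertainty sampling unscalable, and the most uncertain points can be outliers rather than informative examples.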