A support vector-based interval type-2 fuzzy system
In this paper, a new fuzzy regression model that is supported by support vector regression is presented. Type-2 fuzzy systems are able to tackle applications that have significant uncertainty. However, general type-2 fuzzy systems are more complex than type-1 fuzzy systems. Support vector machines are similar to fuzzy systems in that they can also model systems that are non-linear in nature. In the proposed model, the consequent parameters of type-2 fuzzy rules are learnt using support vector regression, and an efficient closed-form type reduction strategy is used to simplify the computations. Support vector regression improved the generalisation performance of the fuzzy rule-based system, in which the fuzzy rules were a set of interpretable IF-THEN rules. The performance of the proposed model was demonstrated by conducting case studies for non-linear system approximation and prediction of chaotic time series. The model yielded promising results, and the simulation results are compared with results published in the area.
Machine learning approaches for tomato crop yield prediction in precision agriculture
Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. The objective of this project was to apply ML techniques to predict processing tomato crop yield given information on soil properties, weather conditions, and applied fertilizers. Besides being robust enough for predicting tomato productivity, the model needed to be interpretable and transparent for the business. The models assessed were Decision Tree Regression, ensemble bagging models like Random Forest Regression, boosting techniques like Gradient Boosting Regression, and Support Vector Regression. Overall, Gradient Boosting and Support Vector models presented the best performance. To improve the predictive power, we combined the predictions of our two best models into a stacked approach with a Ridge Regression as the final model. The generalization error of the final chosen model on new data was 9.02 ton/ha for the MAE metric, 9.5% for the MAPE, and 13.5 ton/ha for the RMSE. This means that our model can predict tomato crop yield with an approximate error of 9 ton/ha. Even though our final model was complex and not intrinsically interpretable, we were able to apply model-agnostic interpretation methods like the SHAP summary plot, to better understand the feature importance and feature effects, and the Accumulated Local Effects (ALE) plot, to explain how features influence the outcome of the model on average. In general, the objectives of the project were accomplished and the company was satisfied with the result of the model and its interpretation.
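The three error metrics quoted in this abstract (MAE, MAPE, RMSE) can be illustrated with a minimal sketch. The function name `regression_errors` and the yield values are hypothetical and are not taken from the report; only the standard metric definitions are assumed.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MAPE in percent, RMSE) for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    # Mean absolute error, in the units of the target (here ton/ha).
    mae = sum(abs(r) for r in residuals) / n
    # Mean absolute percentage error; undefined if any true value is zero.
    mape = 100.0 * sum(abs(r) / abs(t) for r, t in zip(residuals, y_true)) / n
    # Root mean squared error; penalises large residuals more than MAE does.
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return mae, mape, rmse


# Hypothetical yields in ton/ha, for illustration only.
observed = [100.0, 80.0, 120.0]
predicted = [90.0, 88.0, 120.0]
mae, mape, rmse = regression_errors(observed, predicted)
```

RMSE is never smaller than MAE on the same data, which is consistent with the reported 13.5 ton/ha RMSE against 9.02 ton/ha MAE.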
Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
Deep Reinforcement Learning (DRL) has achieved impressive success in many
applications. A key component of many DRL models is a neural network
representing a Q function, to estimate the expected cumulative reward following
a state-action pair. The Q function neural network contains a lot of implicit
knowledge about the RL problems, but often remains unexamined and
uninterpreted. To our knowledge, this work develops the first mimic learning
framework for Q functions in DRL. We introduce Linear Model U-trees (LMUTs) to
approximate neural network predictions. An LMUT is learned using a novel
on-line algorithm that is well-suited for an active play setting, where the
mimic learner observes an ongoing interaction between the neural net and the
environment. Empirical evaluation shows that an LMUT mimics a Q function
substantially better than five baseline methods. The transparent tree structure
of an LMUT facilitates understanding the network's learned knowledge by
analyzing feature influence, extracting rules, and highlighting the
super-pixels in image inputs. Comment: This paper is accepted by ECML-PKDD 201
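The structure described in this abstract, a tree whose leaves hold linear models, can be sketched as follows. This shows prediction only and assumes simple threshold splits; it does not reproduce the paper's on-line learning algorithm, and the `Node` class, the `predict` function, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Internal node: route on x[feature] <= threshold.
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf node: a linear model over the input features (assumed form).
    weights: Optional[List[float]] = None
    bias: float = 0.0


def predict(node: Node, x: List[float]) -> float:
    """Walk from the root to a leaf, then evaluate that leaf's linear model."""
    while node.weights is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return sum(w * xi for w, xi in zip(node.weights, x)) + node.bias


# Hypothetical two-leaf tree approximating a Q value from two state features.
tree = Node(
    feature=0,
    threshold=0.5,
    left=Node(weights=[1.0, 2.0], bias=0.0),
    right=Node(weights=[0.0, -1.0], bias=3.0),
)
q_left = predict(tree, [0.2, 1.0])
q_right = predict(tree, [0.9, 1.0])
```

Because each leaf is a linear model, its weights can be read directly as per-feature influence within that region of the input space.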
Early hospital mortality prediction using vital signals
Early hospital mortality prediction is critical as intensivists strive to
make efficient medical decisions about the severely ill patients staying in
intensive care units. As a result, various methods have been developed to
address this problem based on clinical records. However, some of the laboratory
test results are time-consuming and need to be processed. In this paper, we
propose a novel method to predict mortality using features extracted from the
heart signals of patients within the first hour of ICU admission. In order to
predict the risk, quantitative features have been computed based on the heart
rate signals of ICU patients. Each signal is described in terms of 12
statistical and signal-based features. The extracted features are fed into
eight classifiers: decision tree, linear discriminant, logistic regression,
support vector machine (SVM), random forest, boosted trees, Gaussian SVM, and
K-nearest neighborhood (K-NN). To derive insight into the performance of the
proposed method, several experiments have been conducted using the well-known
clinical dataset named Medical Information Mart for Intensive Care III
(MIMIC-III). The experimental results demonstrate the capability of the
proposed method in terms of precision, recall, F1-score, and area under the
receiver operating characteristic curve (AUC). The decision tree classifier
satisfies both accuracy and interpretability better than the other classifiers,
producing an F1-score and AUC equal to 0.91 and 0.93, respectively. It
indicates that heart rate signals can be used for predicting mortality in
patients in the ICU, achieving a comparable performance with existing
predictions that rely on high dimensional features from clinical records which
need to be processed and may contain missing information. Comment: 11 pages, 5
figures, preprint of accepted paper in IEEE&ACM CHASE 2018 and published in
Smart Health journal
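As an illustration of describing a heart rate signal by a fixed set of statistical features, here is a minimal sketch. The abstract does not list its 12 features, so the six computed here are assumptions chosen for illustration, and the function name and sample values are hypothetical.

```python
import statistics


def heart_rate_features(signal):
    """Summarise a heart rate series (beats per minute) as a feature dict."""
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    # Successive differences capture short-term variability.
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    return {
        "mean": statistics.fmean(signal),
        "std": statistics.stdev(signal),
        "min": min(signal),
        "max": max(signal),
        "range": max(signal) - min(signal),
        "mean_abs_diff": statistics.fmean([abs(d) for d in diffs]),
    }


# Hypothetical samples, not MIMIC-III data.
features = heart_rate_features([70, 72, 71, 75])
```

A feature dictionary like this would be computed per patient and passed as one row to whichever classifier is being evaluated.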
Interpretable Categorization of Heterogeneous Time Series Data
Understanding heterogeneous multivariate time series data is important in
many applications ranging from smart homes to aviation. Learning models of
heterogeneous multivariate time series that are also human-interpretable is
challenging and not adequately addressed by the existing literature. We propose
grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs
extend decision trees with a grammar framework. Logical expressions derived
from a context-free grammar are used for branching in place of simple
thresholds on attributes. The added expressivity enables support for a wide
range of data types while retaining the interpretability of decision trees. In
particular, when a grammar based on temporal logic is used, we show that GBDTs
can be used for the interpretable classification of high-dimensional and
heterogeneous time series data. Furthermore, we show how GBDTs can also be used
for categorization, which is a combination of clustering and generating
interpretable explanations for each cluster. We apply GBDTs to analyze the
classic Australian Sign Language dataset as well as data on near mid-air
collisions (NMACs). The NMAC data comes from aircraft simulations used in the
development of the next-generation Airborne Collision Avoidance System (ACAS
X). Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data
Mining (SDM) 201
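The idea of branching on logical expressions over a whole time series, in place of a threshold on a single attribute, can be sketched as below. The operators `eventually` and `always` are simplified stand-ins for temporal-logic operators; the tree, labels, and data are hypothetical, and the paper's grammar and learning algorithm are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


def eventually(pred: Callable[[float], bool]) -> Callable[[Sequence[float]], bool]:
    """True if the predicate holds at some time step."""
    return lambda series: any(pred(v) for v in series)


def always(pred: Callable[[float], bool]) -> Callable[[Sequence[float]], bool]:
    """True if the predicate holds at every time step."""
    return lambda series: all(pred(v) for v in series)


@dataclass
class Node:
    # Internal node: a logical test over the whole series.
    test: Optional[Callable[[Sequence[float]], bool]] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    # Leaf node: a class label.
    label: Optional[str] = None


def classify(node: Node, series: Sequence[float]) -> str:
    """Follow the branch selected by each logical test until a leaf is reached."""
    while node.label is None:
        node = node.if_true if node.test(series) else node.if_false
    return node.label


# Hypothetical tree: "eventually above 5", else "always non-negative".
tree = Node(
    test=eventually(lambda v: v > 5),
    if_true=Node(label="spike"),
    if_false=Node(
        test=always(lambda v: v >= 0),
        if_true=Node(label="steady"),
        if_false=Node(label="dips"),
    ),
)
labels = [classify(tree, s) for s in ([1, 2, 9], [1, 2, 3], [-1, 2])]
```

Each root-to-leaf path reads as a conjunction of logical statements about the series, which is what keeps the resulting classes explainable.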