Pre-emphasizing Binarized Ensembles to Improve Classification Performance
14th International Work-Conference on Artificial Neural Networks, IWANN 2017
Machine ensembles are learning architectures that offer high expressive capacity and, consequently, remarkable performance, owing to their large number of trainable parameters. In this paper, we explore and discuss whether binarization techniques are effective at improving standard diversification methods, and whether a simple additional trick, weighting the training examples, yields better results. Experimental results on three selected classification problems show that binarization allows standard direct diversification methods (bagging, in particular) to achieve better results, with even more significant performance improvements when pre-emphasizing the training samples. Some research avenues that this finding opens are mentioned in the conclusions.
This work has been partly supported by research grants CASI-CAM-CM (S2013/ICE-2845, DGUI-CM and FEDER) and Macro-ADOBE (TEC2015-67719-P, MINECO)
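The combination described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: binarization is stood in for by scikit-learn's one-vs-rest wrapper, and the "pre-emphasis" weighting is a hypothetical scheme that upweights examples a preliminary model finds uncertain.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy 3-class problem (data invented for illustration).
X, y = make_classification(n_samples=300, n_classes=3, n_informative=6,
                           random_state=0)

# Binarization: decompose the multiclass task into one-vs-rest subproblems.
pre = OneVsRestClassifier(
    DecisionTreeClassifier(max_depth=3, random_state=0)).fit(X, y)
proba = pre.predict_proba(X)

# Hypothetical pre-emphasis: examples with a small decision margin
# (top-1 minus top-2 class probability) get larger training weight.
sorted_p = np.sort(proba, axis=1)
margin = sorted_p[:, -1] - sorted_p[:, -2]
weights = 1.0 + (1.0 - margin)

# Standard direct diversification: bagging over the emphasized sample.
ensemble = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                             n_estimators=25, random_state=0)
ensemble.fit(X, y, sample_weight=weights)
acc = ensemble.score(X, y)
```

The emphasis function itself (here a simple margin-based weight) is the component the paper experiments with; any weighting that stresses boundary examples fits the same slot.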
Navigation of brain networks
Understanding the mechanisms of neural communication in large-scale brain
networks remains a major goal in neuroscience. We investigated whether
navigation is a parsimonious routing model for connectomics. Navigating a
network involves progressing to the next node that is closest in distance to a
desired destination. We developed a measure to quantify navigation efficiency
and found that connectomes in a range of mammalian species (human, mouse and
macaque) can be successfully navigated with near-optimal efficiency (>80% of
optimal efficiency for typical connection densities). Rewiring network topology
or repositioning network nodes resulted in 45%-60% reductions in navigation
performance. Specifically, we found that brain networks cannot be progressively
rewired (randomized or clusterized) to result in topologies with significantly
improved navigation performance. Navigation was also found to: i) promote a
resource-efficient distribution of the information traffic load, potentially
relieving communication bottlenecks; and, ii) explain significant variation in
functional connectivity. Unlike prevalently studied communication strategies in
connectomics, navigation does not mandate biologically unrealistic assumptions
about global knowledge of network topology. We conclude that the wiring and
spatial embedding of brain networks is conducive to effective decentralized
communication. Graph-theoretic studies of the connectome should consider
measures of network efficiency and centrality that are consistent with
decentralized models of neural communication
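The navigation rule quantified above is purely local: each node forwards toward the neighbour geometrically closest to the destination. A minimal sketch on a toy spatially embedded graph (coordinates and edges invented for illustration; this strict-progress variant declares failure when no neighbour is closer to the target than the current node):

```python
import math

# Toy spatial embedding: node -> (x, y), and an undirected adjacency list.
coords = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (1, 1), 4: (2, 1)}
edges = {0: [1, 3], 1: [0, 2, 3], 2: [1, 4], 3: [0, 1, 4], 4: [2, 3]}

def dist(a, b):
    (x1, y1), (x2, y2) = coords[a], coords[b]
    return math.hypot(x1 - x2, y1 - y2)

def navigate(src, dst, max_hops=10):
    """Greedy navigation: hop to the neighbour closest to dst in space."""
    path, node = [src], src
    while node != dst and len(path) <= max_hops:
        nxt = min(edges[node], key=lambda n: dist(n, dst))
        if dist(nxt, dst) >= dist(node, dst):
            return None  # no spatial progress: navigation fails
        path.append(nxt)
        node = nxt
    return path if node == dst else None

path = navigate(0, 4)
```

Navigation efficiency, as measured in the study, compares the hop count of such greedy paths against shortest paths; the key property is that no node needs global knowledge of the topology, only its neighbours' positions.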
Assessing the reliability of species distribution projections in climate change research
Aim: Forecasting changes in species distribution under future scenarios is one of the most prolific areas of application for species distribution models (SDMs). However, no consensus yet exists on the reliability of such models for drawing conclusions on species' distribution response to changing climate. In this study, we provide an overview of common modelling practices in the field and assess the reliability of model predictions using a virtual species approach. Location: Global. Methods: We first review papers published between 2015 and 2019. Then, we use a virtual species approach and three commonly applied SDM algorithms (GLM, MaxEnt and random forest) to assess the estimated and actual predictive performance of models parameterized with different modelling settings and violations of modelling assumptions. Results: Most SDM papers relied on single models (65%) and small samples (N < 50, 62%), used presence-only data (85%), binarized models' output (74%) and used a split-sample validation (94%). Our simulation reveals that the split-sample validation tends to be over-optimistic compared to the real performance, whereas spatial block validation provides a more honest estimate, except when datasets are environmentally biased. The binarization of predicted probabilities of presence reduces models' predictive ability considerably. Sample size is one of the main predictors of the real model accuracy, but has little influence on estimated accuracy. Finally, the inclusion of ecologically irrelevant predictors and the violation of modelling assumptions increases estimated accuracy but decreases real accuracy of model projections, leading to biased estimates of range contraction and expansion. Main conclusions: Our ability to predict future species distribution is low on average, particularly when models' predictions are binarized.
A robust validation by spatially independent samples is required, but does not rule out inflation of model accuracy by assumption violation. Our findings call for caution in the application and interpretation of SDM projections under different climates
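The spatial block validation contrasted with split-sample validation above can be sketched with scikit-learn's grouped cross-validation: tile the study area into cells and require that test points come from cells never seen in training. The grid data, cell size, and model below are invented for illustration, not taken from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)

# Toy presence/absence data over a 10x10-degree study area (values invented).
lon = rng.uniform(0, 10, 400)
lat = rng.uniform(0, 10, 400)
X = np.column_stack([lon + rng.normal(0, 1, 400),
                     lat + rng.normal(0, 1, 400)])   # two "environmental" covariates
y = (lon + lat + rng.normal(0, 2, 400) > 10).astype(int)

# Spatial blocks: 2x2-degree cells; each fold holds out whole cells,
# so spatial autocorrelation cannot leak between train and test.
blocks = (lon // 2).astype(int) * 10 + (lat // 2).astype(int)
scores = cross_val_score(LogisticRegression(), X, y,
                         cv=GroupKFold(n_splits=5), groups=blocks)
```

A plain random split would typically report higher scores on such autocorrelated data, which is the over-optimism the study measures.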
Oversampling for Imbalanced Learning Based on K-Means and SMOTE
Learning from class-imbalanced data continues to be a common and challenging
problem in supervised learning as standard classification algorithms are
designed to handle balanced class distributions. While different strategies
exist to tackle this problem, methods which generate artificial data to achieve
a balanced class distribution are more versatile than modifications to the
classification algorithm. Such techniques, called oversamplers, modify the
training data, allowing any classifier to be used with class-imbalanced
datasets. Many algorithms have been proposed for this task, but most are
complex and tend to generate unnecessary noise. This work presents a simple and
effective oversampling method based on k-means clustering and SMOTE
oversampling, which avoids the generation of noise and effectively overcomes
imbalances between and within classes. Empirical results of extensive
experiments with 71 datasets show that training data oversampled with the
proposed method improves classification results. Moreover, k-means SMOTE
consistently outperforms other popular oversampling methods. An implementation
is made available in the Python programming language.
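The core idea, clustering first and then applying SMOTE-style interpolation only inside minority-dominated clusters so that no synthetic points land in majority regions, can be sketched as follows. This is an illustrative reimplementation with an assumed 50% "safe cluster" threshold, not the authors' released package.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

# Imbalanced toy problem (class 1 is the minority).
X, y = make_classification(n_samples=400, weights=[0.9, 0.1],
                           class_sep=2.0, random_state=0)
rng = np.random.default_rng(0)

# Step 1: cluster the whole feature space.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

# Step 2: oversample only inside "safe" clusters where minority points
# dominate, interpolating between random minority pairs (SMOTE-style).
synthetic = []
for c in np.unique(labels):
    Xc = X[(labels == c) & (y == 1)]
    if len(Xc) >= 2 and len(Xc) / (labels == c).sum() > 0.5:
        for _ in range(len(Xc)):
            a = Xc[rng.integers(len(Xc))]
            b = Xc[rng.integers(len(Xc))]
            synthetic.append(a + rng.random() * (b - a))

X_new = np.vstack([X, np.array(synthetic)]) if synthetic else X
```

Restricting interpolation to within-cluster minority pairs is what avoids the noise that plain SMOTE generates when it interpolates across majority territory, and allocating more synthetic points to sparse clusters (omitted here) addresses within-class imbalance.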
Prediction of Concurrent Hypertensive Disorders in Pregnancy and Gestational Diabetes Mellitus Using Machine Learning Techniques
Gestational diabetes mellitus and hypertensive disorders in pregnancy are serious maternal health conditions with immediate and lifelong mother-child health consequences. These obstetric pathologies have been widely investigated, but mostly in silos, while studies focusing on their simultaneous occurrence rarely exist. This is especially the case in the machine learning domain. This retrospective study sought to investigate, construct, evaluate, compare, and isolate a supervised machine learning predictive model for the binary classification of co-occurring gestational diabetes mellitus and hypertensive disorders in pregnancy in a cohort of otherwise healthy pregnant women. To accomplish the stated aims, this study analyzed an extract (n=4624, n_features=38) of a labelled maternal perinatal dataset (n=9967, n_fields=79) collected by the PeriData.Net® database from a participating community hospital in Southeast Wisconsin between 2013 and 2018. The datasets were named "WiseSample" and "WiseSubset" respectively in this study. Thirty-three models were constructed with the six supervised machine learning algorithms explored on the extracted dataset: logistic regression, random forest, decision tree, support vector machine, StackingClassifier, and KerasClassifier, which is a deep learning classification algorithm; all were evaluated using the StratifiedKFold cross-validation (k=10) method. The Synthetic Minority Oversampling Technique was applied to the training data to resolve the class imbalance that was noted in the sub-sample at the preprocessing phase. A wide range of evidence-based feature selection techniques were used to identify the best predictors of the comorbidity under investigation. Multiple model performance evaluation metrics were employed to quantitatively evaluate and compare model performance, including accuracy, F1, precision, recall, and the area under the receiver operating characteristic curve.
Support Vector Machine objectively emerged as the most generalizable model for identifying the gravidae in WiseSubset who may develop concurrent gestational diabetes mellitus and hypertensive disorders in pregnancy, scoring 100.00% (mean) in recall. The model consisted of 9 predictors extracted by recursive feature elimination with cross-validation with random forest. Findings from this study show that appropriate machine learning methods can reliably predict comorbid gestational diabetes and hypertensive disorders in pregnancy using readily available routine prenatal attributes. Six of the nine most predictive factors of the comorbidity were also in the top 6 selections of at least one other feature selection method examined. The six predictors are healthy weight prepregnancy BMI, mother's educational status, husband's educational status, husband's occupation in the year before the current pregnancy, mother's blood group, and mother's age range between 34 and 44 years. Insight from this analysis would support the clinical decision making of obstetric experts when caring for (1) nulliparous women, who have no obstetric history that could prompt their care providers toward feto-maternal medical surveillance, and (2) experienced mothers with no obstetric history suggestive of any of the diseases under study. Hence, among other benefits, the artificial-intelligence-backed tool designed in this research would likely improve maternal and child care quality outcomes
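The selection-plus-evaluation pipeline described above can be sketched with scikit-learn alone. The data below is a synthetic stand-in for WiseSubset (the real 38 features are not reproduced), and `class_weight="balanced"` stands in for the SMOTE rebalancing step to keep the sketch dependency-free; the structure, RFECV with a random forest feeding an SVC scored by stratified 10-fold recall, mirrors the stated method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in: 38 features, imbalanced outcome (values invented).
X, y = make_classification(n_samples=500, n_features=38, n_informative=9,
                           weights=[0.85, 0.15], random_state=0)

# Recursive feature elimination with cross-validation, ranked by a random forest.
selector = RFECV(RandomForestClassifier(n_estimators=50, random_state=0),
                 step=2, cv=StratifiedKFold(n_splits=5), scoring="recall")
X_sel = selector.fit_transform(X, y)

# Final model scored with stratified 10-fold CV on recall, as in the study;
# class_weight="balanced" approximates SMOTE-style imbalance handling here.
scores = cross_val_score(SVC(class_weight="balanced"), X_sel, y,
                         cv=StratifiedKFold(n_splits=10), scoring="recall")
```

Recall is the natural headline metric here because missing a true comorbidity case is the costly error in a screening setting.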
Automatic machine learning: methods, systems, challenges
This open access book presents the first comprehensive overview of general methods in Automatic Machine Learning (AutoML), collects descriptions of existing systems based on these methods, and discusses the first international challenge of AutoML systems. The book serves as a point of entry into this quickly-developing field for researchers and advanced students alike, as well as providing a reference for practitioners aiming to use AutoML in their work. The recent success of commercial ML applications and the rapid growth of the field have created a high demand for off-the-shelf ML methods that can be used easily and without expert knowledge. Many of the recent machine learning successes crucially rely on human experts, who select appropriate ML architectures (deep learning architectures or more traditional ML workflows) and their hyperparameters; however, the field of AutoML targets a progressive automation of machine learning, based on principles from optimization and machine learning itself
- …