
    Pre-emphasizing Binarized Ensembles to Improve Classification Performance

    14th International Work-Conference on Artificial Neural Networks, IWANN 2017. Machine ensembles are learning architectures that offer high expressive capacity and, consequently, remarkable performance, owing to their large number of trainable parameters. In this paper, we explore and discuss whether binarization techniques are effective at improving standard diversification methods, and whether a simple additional trick, weighting the training examples, yields further gains. Experimental results for three selected classification problems show that binarization enables standard direct diversification methods (bagging, in particular) to achieve better results, with even more significant performance improvements when the training samples are pre-emphasized. Some research avenues opened by this finding are mentioned in the conclusions. This work has been partly supported by research grants CASI-CAM-CM (S2013/ICE-2845, DGUI-CM and FEDER) and Macro-ADOBE (TEC2015-67719-P, MINECO).
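
    As a rough illustration of combining bagging with example weighting, the sketch below (assuming scikit-learn and NumPy) draws each bootstrap sample using emphasis weights before training the base learners; the emphasis rule shown, up-weighting examples misclassified by a pilot model, is an illustrative assumption, not the paper's exact pre-emphasis function.

```python
# Sketch of bagging with pre-emphasized (weighted) bootstrap sampling.
# The emphasis rule (up-weighting examples a pilot model gets wrong) is an
# illustrative assumption, not the paper's exact pre-emphasis scheme.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Pilot model flags hard examples, which then receive a larger sampling weight.
pilot = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
weights = np.where(pilot.predict(X) != y, 2.0, 1.0)
weights /= weights.sum()

# Bagging: each base learner is trained on a bootstrap sample drawn with the
# emphasis weights instead of uniform probabilities.
rng = np.random.default_rng(0)
ensemble = []
for _ in range(25):
    idx = rng.choice(len(X), size=len(X), replace=True, p=weights)
    ensemble.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Binary decision by majority vote across the ensemble.
votes = np.mean([member.predict(X) for member in ensemble], axis=0)
y_pred = (votes >= 0.5).astype(int)
print("training accuracy:", (y_pred == y).mean())
```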

    Navigation of brain networks

    Understanding the mechanisms of neural communication in large-scale brain networks remains a major goal in neuroscience. We investigated whether navigation is a parsimonious routing model for connectomics. Navigating a network involves progressing to the next node that is closest in distance to a desired destination. We developed a measure to quantify navigation efficiency and found that connectomes in a range of mammalian species (human, mouse and macaque) can be successfully navigated with near-optimal efficiency (>80% of optimal efficiency for typical connection densities). Rewiring network topology or repositioning network nodes resulted in 45-60% reductions in navigation performance. Specifically, we found that brain networks cannot be progressively rewired (randomized or clusterized) to produce topologies with significantly improved navigation performance. Navigation was also found to: i) promote a resource-efficient distribution of the information traffic load, potentially relieving communication bottlenecks; and ii) explain significant variation in functional connectivity. Unlike commonly studied communication strategies in connectomics, navigation does not mandate biologically unrealistic assumptions about global knowledge of network topology. We conclude that the wiring and spatial embedding of brain networks are conducive to effective decentralized communication. Graph-theoretic studies of the connectome should consider measures of network efficiency and centrality that are consistent with decentralized models of neural communication.
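
    To make the routing scheme concrete, here is a minimal sketch of greedy navigation on a spatially embedded graph, assuming NetworkX and NumPy; the efficiency ratio at the end (shortest-path hops divided by navigation hops) is one simple choice of measure, not necessarily the exact metric used in the study, and the stopping rule is a simplification.

```python
# Sketch of greedy navigation: at each step, hop to the neighbor that is
# spatially closest to the destination. The graph and efficiency measure
# are illustrative, not the connectome data or metric of the study.
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
pos = {i: rng.random(2) for i in range(60)}                 # node coordinates
G = nx.random_geometric_graph(60, radius=0.25, pos=pos, seed=0)

def navigate(G, pos, src, dst):
    """Greedy routing; stops if no neighbor gets closer to the target."""
    path, node = [src], src
    while node != dst:
        nbrs = list(G.neighbors(node))
        if not nbrs:
            return None
        nxt = min(nbrs, key=lambda n: np.linalg.norm(pos[n] - pos[dst]))
        if np.linalg.norm(pos[nxt] - pos[dst]) >= np.linalg.norm(pos[node] - pos[dst]):
            return None                                      # navigation failure
        path.append(nxt)
        node = nxt
    return path

src, dst = 0, 42
nav = navigate(G, pos, src, dst)
if nav is not None and nx.has_path(G, src, dst):
    sp = nx.shortest_path_length(G, src, dst)
    print("navigation hops:", len(nav) - 1, "efficiency:", sp / (len(nav) - 1))
```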

    Assessing the reliability of species distribution projections in climate change research

    Aim: Forecasting changes in species distribution under future scenarios is one of the most prolific areas of application for species distribution models (SDMs). However, no consensus yet exists on the reliability of such models for drawing conclusions on species’ distribution response to changing climate. In this study, we provide an overview of common modelling practices in the field and assess the reliability of model predictions using a virtual species approach. Location: Global. Methods: We first review papers published between 2015 and 2019. Then, we use a virtual species approach and three commonly applied SDM algorithms (GLM, MaxEnt and random forest) to assess the estimated and actual predictive performance of models parameterized with different modelling settings and violations of modelling assumptions. Results: Most SDM papers relied on single models (65%) and small samples (N < 50, 62%), used presence-only data (85%), binarized models’ output (74%) and used split-sample validation (94%). Our simulation reveals that split-sample validation tends to be over-optimistic compared to the real performance, whereas spatial block validation provides a more honest estimate, except when datasets are environmentally biased. The binarization of predicted probabilities of presence reduces models’ predictive ability considerably. Sample size is one of the main predictors of real model accuracy, but has little influence on estimated accuracy. Finally, the inclusion of ecologically irrelevant predictors and the violation of modelling assumptions increase estimated accuracy but decrease the real accuracy of model projections, leading to biased estimates of range contraction and expansion. Main conclusions: Our ability to predict future species distribution is low on average, particularly when models’ predictions are binarized. A robust validation by spatially independent samples is required, but does not rule out inflation of model accuracy by assumption violation. Our findings call for caution in the application and interpretation of SDM projections under different climates.
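
    A small sketch of the validation contrast discussed above, assuming scikit-learn and NumPy; the simulated presence data, block size and random forest settings are placeholders rather than the study's virtual species protocol.

```python
# Sketch contrasting random split-sample validation with spatial block
# validation; data, block size and model settings are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n = 600
coords = rng.uniform(0, 10, size=(n, 2))                    # x, y locations
env = np.column_stack([coords[:, 0] + rng.normal(0, 1, n),  # spatially structured predictor
                       rng.normal(0, 1, n)])                # noise predictor
presence = (env[:, 0] + rng.normal(0, 1, n) > 5).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)

# Random split-sample validation: spatial autocorrelation leaks between folds,
# which tends to make the estimate over-optimistic.
random_scores = cross_val_score(model, env, presence,
                                cv=KFold(5, shuffle=True, random_state=0))

# Spatial block validation: points are grouped into 2x2 blocks and each block
# stays entirely within a single fold.
blocks = (coords[:, 0] // 2).astype(int) * 10 + (coords[:, 1] // 2).astype(int)
block_scores = cross_val_score(model, env, presence,
                               groups=blocks, cv=GroupKFold(n_splits=5))

print("random split accuracy:", random_scores.mean())
print("spatial block accuracy:", block_scores.mean())
```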

    Oversampling for Imbalanced Learning Based on K-Means and SMOTE

    Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning, as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE oversampling, which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 71 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the Python programming language. Comment: 19 pages, 8 figures.
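
    A brief usage sketch, assuming the imbalanced-learn package, which ships a KMeansSMOTE implementation derived from this method; the synthetic data and the cluster balance threshold below are illustrative and may need adjusting for a real dataset.

```python
# Sketch of k-means SMOTE oversampling via imbalanced-learn's KMeansSMOTE.
# The data and cluster_balance_threshold are illustrative assumptions and
# typically need tuning for a real imbalanced dataset.
from collections import Counter

from imblearn.over_sampling import KMeansSMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
print("class counts before:", Counter(y))

# The input space is clustered with k-means and SMOTE is applied only inside
# clusters with enough minority samples, limiting the generation of noise.
sampler = KMeansSMOTE(cluster_balance_threshold=0.1, random_state=0)
X_res, y_res = sampler.fit_resample(X, y)
print("class counts after: ", Counter(y_res))
```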

    Prediction of Concurrent Hypertensive Disorders in Pregnancy and Gestational Diabetes Mellitus Using Machine Learning Techniques

    Gestational diabetes mellitus and hypertensive disorders in pregnancy are serious maternal health conditions with immediate and lifelong mother-child health consequences. These obstetric pathologies have been widely investigated, but mostly in silos, and studies focusing on their simultaneous occurrence rarely exist; this is especially the case in the machine learning domain. This retrospective study sought to construct, evaluate, compare, and isolate a supervised machine learning model for the binary classification of co-occurring gestational diabetes mellitus and hypertensive disorders in pregnancy in a cohort of otherwise healthy pregnant women. To accomplish the stated aims, this study analyzed an extract (n=4624, n_features=38) of a labelled maternal perinatal dataset (n=9967, n_fields=79) collected by the PeriData.Net® database from a participating community hospital in Southeast Wisconsin between 2013 and 2018; the datasets were named “WiseSample” and “WiseSubset”, respectively, in this study. Thirty-three models were constructed with the six supervised machine learning algorithms explored on the extracted dataset: logistic regression, random forest, decision tree, support vector machine, StackingClassifier, and KerasClassifier, a deep learning classification algorithm; all were evaluated using StratifiedKFold cross-validation (k=10). The Synthetic Minority Oversampling Technique (SMOTE) was applied to the training data to resolve the class imbalance noted in the sub-sample at the preprocessing phase. A wide range of evidence-based feature selection techniques was used to identify the best predictors of the comorbidity under investigation. Model performance was quantitatively evaluated and compared using accuracy, F1, precision, recall, and the area under the receiver operating characteristic curve. The support vector machine emerged as the most generalizable model for identifying the gravidae in WiseSubset who may develop concurrent gestational diabetes mellitus and hypertensive disorders in pregnancy, scoring 100.00% (mean) in recall. The model consisted of 9 predictors extracted by recursive feature elimination with cross-validation using random forest. Findings from this study show that appropriate machine learning methods can reliably predict comorbid gestational diabetes and hypertensive disorders in pregnancy using readily available routine prenatal attributes. Six of the nine most predictive factors of the comorbidity were also in the top 6 selections of at least one other feature selection method examined. The six predictors are healthy-weight pre-pregnancy BMI, mother’s educational status, husband’s educational status, husband’s occupation in the year before the current pregnancy, mother’s blood group, and mother’s age range between 34 and 44 years. Insight from this analysis would support the clinical decision making of obstetric experts when caring for (1) nulliparous women, who have no obstetric history that could prompt their care providers to begin feto-maternal medical surveillance, and (2) experienced mothers with no obstetric history suggestive of any of the diseases under study. Hence, among other benefits, the artificial-intelligence-backed tool designed in this research would likely improve maternal and child care quality outcomes.
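
    For readers who want to see the general shape of such a workflow, here is a minimal sketch, assuming scikit-learn and imbalanced-learn; the synthetic data stands in for WiseSubset, and the SMOTE, RFECV and SVM settings are illustrative assumptions rather than the study's exact configuration.

```python
# Sketch of an oversampling + feature-selection + SVM workflow evaluated with
# stratified 10-fold cross-validation; data and settings are placeholders,
# not the study's WiseSubset or tuned configuration.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=38, n_informative=9,
                           weights=[0.9, 0.1], random_state=0)

pipeline = Pipeline([
    # SMOTE runs only on the training portion of each fold, avoiding leakage.
    ("smote", SMOTE(random_state=0)),
    # Recursive feature elimination with cross-validation, guided by a random forest.
    ("rfecv", RFECV(RandomForestClassifier(n_estimators=50, random_state=0),
                    cv=3, scoring="recall")),
    # Final classifier; class_weight helps with any residual imbalance.
    ("svm", SVC(kernel="rbf", class_weight="balanced")),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
recall = cross_val_score(pipeline, X, y, cv=cv, scoring="recall")
print("mean recall:", recall.mean())
```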

    Convolutional Methods for Music Analysis


    Automatic machine learning: methods, systems, challenges

    This open access book presents the first comprehensive overview of general methods in Automatic Machine Learning (AutoML), collects descriptions of existing systems based on these methods, and discusses the first international challenge of AutoML systems. The book serves as a point of entry into this quickly developing field for researchers and advanced students alike, as well as providing a reference for practitioners aiming to use AutoML in their work. The recent success of commercial ML applications and the rapid growth of the field have created a high demand for off-the-shelf ML methods that can be used easily and without expert knowledge. Many of the recent machine learning successes crucially rely on human experts who select appropriate ML architectures (deep learning architectures or more traditional ML workflows) and their hyperparameters; however, the field of AutoML targets a progressive automation of machine learning, based on principles from optimization and machine learning itself.
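
    As a toy illustration of the automation the book is about (choosing a model configuration without manual tuning), here is a minimal sketch, assuming scikit-learn and SciPy; the search space, budget and dataset are illustrative and far simpler than what full AutoML systems handle.

```python
# Sketch of automated hyperparameter selection by random search; the search
# space and budget are illustrative assumptions, not a full AutoML system.
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
search_space = {
    "clf__C": loguniform(1e-2, 1e2),          # regularization strength
    "clf__gamma": loguniform(1e-4, 1e0),      # RBF kernel width
    "clf__kernel": ["rbf", "linear"],
}

# Randomly sample configurations and keep the best by cross-validated accuracy.
search = RandomizedSearchCV(pipe, search_space, n_iter=25, cv=5, random_state=0)
search.fit(X, y)
print("best config:", search.best_params_, "score:", round(search.best_score_, 3))
```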
    • 
