8 research outputs found

    Guided progressive sampling to stabilize the learning curve

    Get PDF
    National audience
    One of the challenges of machine learning is to cope with ever-growing volumes of data. Although it is generally accepted that the larger the training set, the better the results, there are limits to the amount of information a learning algorithm can handle. To address this problem, we propose to improve the progressive sampling method by guiding the construction of a reduced training set drawn from a large dataset. Learning from the reduced set should yield performance similar to learning from the full set. The guidance of the sampling relies on a priori knowledge that accelerates the convergence of the algorithm. This approach has three advantages: 1) the reduced training set consists of the most representative cases of the full set; 2) the learning curve is stabilized; 3) convergence detection is accelerated. Applying this method to standard datasets and to data from intensive care units shows that a training set can be reduced significantly without degrading learning performance.
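
    The guided progressive sampling described above can be illustrated schematically: grow the training set in increments, but add the highest-scoring (most representative) unused examples first rather than sampling uniformly, and stop once the learning curve flattens. A minimal sketch, assuming scikit-learn is available; the representativeness scores, the base learner, and the convergence tolerance are illustrative placeholders, not the authors' actual criteria.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.model_selection import cross_val_score

        def guided_progressive_sampling(X, y, score, step=500, tol=1e-3):
            """Grow a reduced training set, adding the most representative
            unused examples first, until the learning curve stops improving
            by more than `tol`. `score` holds a priori representativeness."""
            order = np.argsort(-np.asarray(score))  # highest score first
            accs, n = [], step
            while n <= len(order):
                idx = order[:n]
                acc = cross_val_score(DecisionTreeClassifier(),
                                      X[idx], y[idx], cv=3).mean()
                accs.append(acc)
                if len(accs) >= 2 and abs(accs[-1] - accs[-2]) < tol:
                    break  # learning curve has converged
                n += step
            return order[:min(n, len(order))], accs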

    Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

    Full text link
    Deep neural networks can achieve remarkable generalization performance while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a "double descent" - a mark of the beneficial role of overparametrization. In this work, we develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks, by considering the problem of learning a high-dimensional function with random features regression. We obtain a precise asymptotic expression for the bias-variance decomposition of the test error, and show that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant. We disentangle the variances stemming from the sampling of the dataset, from the additive noise corrupting the labels, and from the initialization of the weights. Following up on Geiger et al. (2019), we first show that the latter two contributions are the crux of the double descent: they lead to the overfitting peak at the interpolation threshold and to the decay of the test error upon overparametrization. We then quantify how they are suppressed by ensemble averaging the outputs of K independently initialized estimators. When K is sent to infinity, the test error remains constant beyond the interpolation threshold. We further compare the effects of overparametrizing, ensembling and regularizing. Finally, we present numerical experiments on classic deep learning setups to show that our results hold qualitatively in realistic lazy learning scenarios.
    Comment: 29 pages, 12 figures
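
    The ensembling effect the abstract describes can be sketched numerically: average the predictions of K random-features regressors that differ only in their random feature initialization, and the variance due to initialization shrinks with K. A minimal sketch, assuming NumPy; the ReLU feature map, the dimensions, and the ridge penalty are illustrative choices, not the paper's exact setup.

        import numpy as np

        def random_features_predict(X_tr, y_tr, X_te, p, lam=1e-6, seed=0):
            """Ridge regression on p random ReLU features, fixed random init."""
            rng = np.random.default_rng(seed)
            d = X_tr.shape[1]
            W = rng.normal(size=(d, p)) / np.sqrt(d)  # random first-layer weights
            Z, Zt = np.maximum(X_tr @ W, 0), np.maximum(X_te @ W, 0)
            a = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y_tr)
            return Zt @ a

        def ensemble_predict(X_tr, y_tr, X_te, p, K=10):
            """Average K independently initialized estimators: the variance
            from initialization decays with K, as the paper quantifies."""
            preds = [random_features_predict(X_tr, y_tr, X_te, p, seed=k)
                     for k in range(K)]
            return np.mean(preds, axis=0)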

    Decision tree learning for the online steering of detection algorithms on electrocardiograms

    Get PDF
    National audience
    The number of signal processing algorithms (compression, pattern recognition, etc.) keeps growing, which makes it increasingly difficult to choose the algorithm best suited to a particular task. This is especially true for the automatic analysis of electrocardiograms (ECG), notably for the detection of QRS complexes. Although every algorithm in the literature behaves satisfactorily in normal situations, there are contexts in which one algorithm is better suited than the others, in particular in the presence of noise. We propose a selection method that chooses, online, the algorithm best suited to the current context of the signal being processed. The selection rules are learned by decision tree from the performance results of 7 algorithms tested in 130 different contexts. The results show the superiority of the proposed approach over the algorithms used separately. Moreover, the performance of the learned selection rules is very close to that of rules acquired from human expertise, which supports our approach.
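
    The online selection scheme reduces to an ordinary supervised problem: each signal context becomes a feature vector, the label is whichever detector scored best in that context, and a decision tree learns the selection rules. A minimal sketch, assuming scikit-learn; the random data and the context features (noise level, heart rate, baseline drift) are hypothetical stand-ins for the paper's 130 contexts and their descriptors.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        # contexts[i]: feature vector describing context i
        # (e.g. noise level, heart rate, baseline drift -- hypothetical)
        contexts = rng.random((130, 3))
        # perf[i, j]: detection score of algorithm j in context i
        perf = rng.random((130, 7))

        best_algo = perf.argmax(axis=1)  # label: best of the 7 detectors
        selector = DecisionTreeClassifier(max_depth=4).fit(contexts, best_algo)

        # Online use: describe the current signal segment, pick a detector.
        current = contexts[0:1]
        chosen = selector.predict(current)[0]
        print(f"use detector {chosen} for this segment")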

    Machine learning ensemble method for discovering knowledge from big data

    Get PDF
    Big data, generated from various business, internet, and social media activities, poses a major challenge to researchers in machine learning and data mining, who must develop new methods and techniques for analysing it effectively and efficiently. Ensemble methods are an attractive approach to mining large datasets because of their accuracy and their ability to exploit the divide-and-conquer mechanism in parallel computing environments. This research proposes a machine learning ensemble framework and implements it in a high performance computing environment. It begins by identifying and categorising the effects of partitioned data subset size on ensemble accuracy when dealing with very large training datasets. An algorithm is then developed to ascertain the patterns of the relationship between ensemble accuracy and the size of partitioned data subsets. The research concludes with the development of a selective modelling algorithm, an efficient alternative to static model selection methods for big datasets. The results show that maximising the size of partitioned data subsets does not necessarily improve the performance of an ensemble of classifiers trained on large datasets. Identifying the patterns exhibited by the relationship between ensemble accuracy and partitioned data subset size makes it possible to determine the best subset size for partitioning huge training datasets. Finally, traditional model selection is inefficient when large datasets are involved.
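
    The core experiment, relating ensemble accuracy to partition size, is straightforward to reproduce in miniature: split the training data into disjoint subsets of a given size, train one classifier per subset, combine by majority vote, and sweep the subset size. A minimal sketch, assuming scikit-learn; the synthetic dataset, the base learner, and the sizes swept are illustrative, not the thesis's setup.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=20000, random_state=0)
        X_tr, y_tr = X[:16000], y[:16000]
        X_te, y_te = X[16000:], y[16000:]

        def ensemble_accuracy(subset_size):
            """One tree per disjoint subset; majority vote (binary labels)."""
            n_parts = len(X_tr) // subset_size
            votes = []
            for i in range(n_parts):
                s = slice(i * subset_size, (i + 1) * subset_size)
                tree = DecisionTreeClassifier(random_state=i)
                votes.append(tree.fit(X_tr[s], y_tr[s]).predict(X_te))
            majority = (np.mean(votes, axis=0) > 0.5).astype(int)
            return (majority == y_te).mean()

        for size in (500, 1000, 2000, 4000, 8000):
            # Larger subsets do not necessarily mean a better ensemble.
            print(size, round(ensemble_accuracy(size), 3))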

    Cooperative Training in Multiple Classifier Systems

    Get PDF
    Multiple classifier systems have been shown to be an effective technique for classification. The success of multiple classifiers does not depend entirely on the base classifiers and/or the aggregation technique. Other parameters, such as the training data, the feature attributes, and the correlation among the base classifiers, may also contribute to the success of multiple classifiers, and the interaction of these parameters with one another may affect their performance. In the present study, we examine some of these interactions and investigate their effects on the performance of classifier ensembles. The proposed research introduces a different direction in the field of multiple classifier systems: we attempt to understand and compare ensemble methods from the cooperation perspective. In this thesis, we narrow our focus to cooperation at the training level. We first develop measures to estimate the degree and type of cooperation among training data partitions. These evaluation measures enable us to evaluate the diversity and correlation among a set of disjoint and overlapped partitions. With the aid of properly selected measures and training information, we propose two new data partitioning approaches: Cluster, De-cluster, and Selection (CDS) and Cooperative Cluster, De-cluster, and Selection (CO-CDS). Finally, a comprehensive comparative study compares our proposed training approaches with several others in terms of robustness of usage, resultant classification accuracy, and classification stability. Experimental assessment of the CDS and CO-CDS training approaches validates their robustness relative to other training approaches. In addition, this study suggests that: 1) cooperation is generally beneficial, and 2) classifier ensembles that cooperate by sharing information have higher generalization ability than those that do not share training information.
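
    A partition-level measure of the kind the thesis develops can be approximated simply: train a classifier on each partition and compare how differently the resulting classifiers behave, e.g. via pairwise disagreement on held-out data. A minimal sketch, assuming scikit-learn; this generic diversity statistic is a stand-in for illustration, not the thesis's CDS/CO-CDS measures.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=3000, random_state=1)
        X_tr, y_tr, X_ho = X[:2000], y[:2000], X[2000:]

        # Two disjoint training partitions.
        p1, p2 = slice(0, 1000), slice(1000, 2000)
        c1 = DecisionTreeClassifier(random_state=0).fit(X_tr[p1], y_tr[p1])
        c2 = DecisionTreeClassifier(random_state=0).fit(X_tr[p2], y_tr[p2])

        # Pairwise disagreement on held-out data: one generic way to
        # quantify the diversity that partition-level cooperation controls.
        disagreement = np.mean(c1.predict(X_ho) != c2.predict(X_ho))
        print(f"disagreement = {disagreement:.3f}")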

    Distributed learning with bagging-like performance

    No full text
    Bagging forms a committee of classifiers by bootstrap aggregation of training sets from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments with decision tree and neural network classifiers on various datasets show that, given the same size partitions and bags, disjoint partitions result in performance equivalent to, or better than, bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve datasets that are too large to handle in the memory of the typical computer. Hence, bagging with samples the size of the data is impractical. Our results indicate that, in such applications, the simple approach of creating a committee of n classifiers from disjoint partitions, each of size 1/n (which will be memory resident during learning), in a distributed way results in a classifier which has a bagging-like performance gain. The use of distributed disjoint partitions in learning is significantly less complex and faster than bagging.
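
    The comparison in this abstract is easy to state in code: a bagged committee draws n bootstrap samples the size of the full training set, while the disjoint alternative simply splits the data into n memory-sized pieces; both aggregate by majority vote. A minimal sketch, assuming scikit-learn; the synthetic dataset and committee size are illustrative.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=12000, random_state=0)
        X_tr, y_tr = X[:9000], y[:9000]
        X_te, y_te = X[9000:], y[9000:]
        n = 9  # committee size

        def vote_accuracy(classifiers):
            """Majority vote of binary predictions on the test set."""
            votes = np.array([c.predict(X_te) for c in classifiers])
            return ((votes.mean(axis=0) > 0.5) == y_te).mean()

        rng = np.random.default_rng(0)
        # Bagging: n bootstrap samples, each the size of the full data.
        bags = [rng.integers(0, len(X_tr), len(X_tr)) for _ in range(n)]
        bag_acc = vote_accuracy(
            [DecisionTreeClassifier().fit(X_tr[b], y_tr[b]) for b in bags])

        # Disjoint partitions: n pieces of size 1/n, each memory resident.
        parts = np.array_split(np.arange(len(X_tr)), n)
        part_acc = vote_accuracy(
            [DecisionTreeClassifier().fit(X_tr[p], y_tr[p]) for p in parts])

        print(f"bagging {bag_acc:.3f} vs disjoint partitions {part_acc:.3f}")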