
    A Modern Take on the Bias-Variance Tradeoff in Neural Networks

    The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve. However, recent empirical results with over-parameterized neural networks are marked by a striking absence of the classic U-shaped test error curve: test error keeps decreasing in wider networks. This suggests that there might not be a bias-variance tradeoff in neural networks with respect to network width, contrary to what was originally claimed by, e.g., Geman et al. (1992). Motivated by the shaky evidence used to support this claim in neural networks, we measure bias and variance in the modern setting. We find that both bias and variance can decrease as the number of parameters grows. To better understand this, we introduce a new decomposition of the variance to disentangle the effects of optimization and data sampling. We also provide theoretical analysis in a simplified setting that is consistent with our empirical findings.
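
    The variance decomposition lends itself to a concrete reading. Below is a minimal sketch, not the authors' estimator: on a synthetic problem where the noise-free test targets f_test are known, the law of total variance splits prediction variance into a data-sampling part (variance of the seed-averaged predictor across resamples) and an optimization part (the remainder, driven by random initialization). The function name and the bootstrap-times-seeds grid are illustrative choices.

```python
# Sketch: estimate bias^2 and a two-part variance for an MLP by training
# one model per (bootstrap resample, random seed) pair. Assumes a
# synthetic setting where the noise-free targets f_test are available.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.utils import resample

def bias_variance_grid(X, y, X_test, f_test, n_datasets=10, n_seeds=5, width=64):
    preds = np.empty((n_datasets, n_seeds, len(X_test)))
    for d in range(n_datasets):
        Xd, yd = resample(X, y, random_state=d)      # vary the training sample
        for s in range(n_seeds):                     # vary the initialization
            model = MLPRegressor(hidden_layer_sizes=(width,),
                                 max_iter=2000, random_state=s)
            preds[d, s] = model.fit(Xd, yd).predict(X_test)
    mean_pred = preds.mean(axis=(0, 1))
    bias_sq = np.mean((mean_pred - f_test) ** 2)
    total_var = preds.var(axis=(0, 1)).mean()
    # Law of total variance: variance of the seed-averaged predictor
    # isolates data sampling; the remainder is optimization randomness.
    sampling_var = preds.mean(axis=1).var(axis=0).mean()
    return bias_sq, sampling_var, total_var - sampling_var
```

    Sweeping width and plotting the three returned terms reproduces the qualitative question the paper asks: whether variance really grows as networks get wider.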

    Academic achievement critical factors and the bias and variance decomposition: evidence from high school students’ grades

    Costa-Mendes, R., Cruz-Jesus, F., Oliveira, T., & Castelli, M. (2022). Academic achievement critical factors and the bias and variance decomposition: evidence from high school students’ grades. In Papers of 6th Canadian International Conference on Advances in Education, Teaching & Technology 2022: Papers proceedings (pp. 54-62). (International Multidisciplinary Research Journal; Vol. Special Issue, No. Conferences - Proceedings). Unique Conferences Canada. https://imrjournal.info/2022/EduTeach2022Proceedings1.pdf
    This study is centered on the sources of machine learning bias in the prediction of students’ grades. The dataset comprises 29,788 Portuguese high school teacher final grades corresponding to the academic paths of 10,364 public high school students (from the 10th to the 11th grade). We use an artificial neural network to perform the tasks. In the experimental phase, we undertake a bias and variance decomposition when predicting the 11th-year students’ grades. Two different implementations are used: a critical implementation that comprises only academic achievement (AA) critical factors, and a lagged implementation where the preceding teacher grade is appended. The critical implementation has a higher machine learning bias, notwithstanding the higher contribution of the critical factors. The lagged implementation, on the other hand, has a smaller bias but also a smaller critical factors’ contribution. It is possible for a machine learning model to have a reduced bias and simultaneously a small critical factors’ contribution, simply by accessing information about the historical values of the target variable. Education stakeholders should therefore be aware of the critical quality of the model in use. In defining policies and choosing the variables to influence, predictive models with low bias that are built upon the critical factors’ information are indispensable. A machine learning model based on the critical factors produces more consistent estimates of their effects on AA and is therefore suitable to assist in policymaking. On the other hand, if the goal is to obtain a simple set of predictions, the use of the target variable’s historical values is appropriate.
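
    As a rough illustration of the two designs (the column names below are hypothetical stand-ins, not the study's variables), the lagged implementation differs from the critical one only in appending the previous year's grade as a predictor:

```python
# Hypothetical sketch of the two feature designs; the column names are
# illustrative stand-ins, not the study's actual variables.
import pandas as pd

def build_designs(df: pd.DataFrame):
    critical = ["socioeconomic_status", "attendance", "school_support"]
    X_critical = df[critical]                         # critical factors only
    X_lagged = df[critical + ["grade_10th_year"]]     # plus the lagged grade
    y = df["grade_11th_year"]
    return X_critical, X_lagged, y
```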

    Lower bounds for the trade-off between bias and mean absolute deviation

    In nonparametric statistics, rate-optimal estimators typically balance bias and stochastic error. The recent work on overparametrization raises the question whether rate-optimal estimators exist that do not obey this trade-off. In this work we consider pointwise estimation in the Gaussian white noise model with regression function f in a class of β-Hölder smooth functions. Let ‘worst-case’ refer to the supremum over all functions f in the Hölder class. It is shown that any estimator with worst-case bias ≲ n^{−β/(2β+1)} =: ψ_n must necessarily also have a worst-case mean absolute deviation that is lower bounded by ≳ ψ_n. To derive the result, we establish abstract inequalities relating the change of expectation for two probability measures to the mean absolute deviation.
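
    Spelled out in display form (a paraphrase of the abstract's claim; the Hölder ball notation H(β, L) and the evaluation point x_0 are standard fill-ins not given explicitly above, and mean absolute deviation is read as deviation from the estimator's mean):

```latex
% Paraphrase of the main result; \hat{f} is any estimator, x_0 a fixed point.
\[
  \sup_{f \in \mathcal{H}(\beta, L)}
    \bigl| \mathbb{E}_f \hat{f}(x_0) - f(x_0) \bigr|
    \;\lesssim\; n^{-\beta/(2\beta+1)} =: \psi_n
  \quad\Longrightarrow\quad
  \sup_{f \in \mathcal{H}(\beta, L)}
    \mathbb{E}_f \bigl| \hat{f}(x_0) - \mathbb{E}_f \hat{f}(x_0) \bigr|
    \;\gtrsim\; \psi_n .
\]
```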

    Improved customer choice predictions using ensemble methods

    In this paper various ensemble learning methods from machine learning and statistics are considered and applied to the customer choice modeling problem. The application of ensemble learning usually improves the prediction quality of flexible models like decision trees and thus leads to improved predictions. We give experimental results for two real-life marketing datasets using decision trees, ensemble versions of decision trees and the logistic regression model, which is a standard approach for this problem. The ensemble models are found to improve upon individual decision trees and outperform logistic regression. Next, an additive decomposition of the prediction error of a model, the bias/variance decomposition, is considered. A model with a high bias lacks the flexibility to fit the data well. A high variance indicates that a model is unstable with respect to different datasets. Decision trees have a high variance component and a low bias component in the prediction error, whereas logistic regression has a high bias component and a low variance component. It is shown that ensemble methods aim at minimizing the variance component in the prediction error while leaving the bias component unaltered. Bias/variance decompositions for all models for both customer choice datasets are given to illustrate these concepts.
    Keywords: brand choice; data mining; boosting; choice models; bias/variance decomposition; bagging; CART; ensembles
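
    The variance claim can be checked cheaply. Below is a minimal sketch under assumed binary labels; the disagreement rate across resampled fits is only a rough proxy for the variance component of 0/1 error, not the paper's decomposition, and the helper name and model grid are illustrative:

```python
# Sketch: proxy for the variance component -- how often a model's test
# prediction flips across training resamples. Binary 0/1 labels assumed.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def prediction_variance(make_model, X, y, X_test, n_rounds=20):
    preds = np.array([
        make_model().fit(*resample(X, y, random_state=r)).predict(X_test)
        for r in range(n_rounds)
    ])
    majority = (preds.mean(axis=0) >= 0.5).astype(int)
    return (preds != majority).mean()  # mean disagreement with majority vote

models = {
    "tree": lambda: DecisionTreeClassifier(),                    # low bias, high variance
    "bagged_trees": lambda: BaggingClassifier(DecisionTreeClassifier(),
                                              n_estimators=50),  # variance reduced
    "logit": lambda: LogisticRegression(max_iter=1000),          # high bias, low variance
}
```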

    Automatic identification of the number of food items in a meal using clustering techniques based on the monitoring of swallowing and chewing

    The number of distinct foods consumed in a meal is of significant clinical concern in the study of obesity and other eating disorders. This paper proposes the use of information contained in chewing and swallowing sequences for meal segmentation by food types. Data collected from experiments with 17 volunteers were analyzed using two different clustering techniques. First, an unsupervised clustering technique, Affinity Propagation (AP), was used to automatically identify the number of segments within a meal. Second, performance of the unsupervised AP method was compared to a supervised learning approach based on Agglomerative Hierarchical Clustering (AHC). While the AP method was able to obtain 90% accuracy in predicting the number of food items, the AHC achieved an accuracy >95%. Experimental results suggest that the proposed models of automatic meal segmentation may be utilized as part of an integral application for objective Monitoring of Ingestive Behavior in free living conditions.
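
    Affinity Propagation is a natural fit here precisely because it selects the number of clusters itself. A sketch of that step follows; the feature matrix is a hypothetical stand-in for the chewing/swallowing measurements, not the study's data:

```python
# Sketch: the number of food items falls out of the fit, since Affinity
# Propagation chooses its own number of exemplars. One row per episode,
# with hypothetical features (e.g. chew rate, swallow interval, duration).
import numpy as np
from sklearn.cluster import AffinityPropagation

def count_food_items(features: np.ndarray) -> int:
    ap = AffinityPropagation(random_state=0).fit(features)
    return len(ap.cluster_centers_indices_)
```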

    Teaching a neural network modeling socio-economic development of the region

    The article is devoted to the formation of a data array for the construction of an artificial neural network designed to search for relationships between social and economic parameters of the development of regions of the Russian Federation. The relevance of research in this area is confirmed both by the large number of studies in the field of regional comparative studies and by the limited range of methods used in this kind of research, which is often restricted to descriptive methods and basic techniques of parametric statistics. Under these conditions, expanding the mathematical apparatus and introducing information technologies more actively (including Big Data analysis and the construction of predictive models based on artificial neural networks) can be a viable approach. At the same time, it should be noted that the resources of an individual research team may be (and most likely will be) insufficient to create their own software solution implementing machine learning algorithms from scratch. The use of third-party cloud-based software platforms (primarily IBM and Google infrastructures) makes it possible to bypass the research team’s lack of an expensive material and technical base; however, these platforms impose a number of limitations dictated by the requirements of the available machine learning algorithms and the specific architecture of the platforms. This confronts the research team with the need to prepare the accumulated dataset for processing: reducing its dimensionality, checking the data for compliance with the platform requirements, and eliminating potential problem areas such as “data leaks” and “learning distortions”. The paper was presented to the section “Sociology of Digital Society: Structures, Processes, Governance” of the International Conference Session “Public Administration and Development of Russia: National Goals and Institutions”.
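
    A hedged sketch of two of the preparation steps named above, dimensionality reduction and leakage avoidance; the PCA choice and all names are illustrative assumptions, not the authors' pipeline:

```python
# Sketch: reduce dimensionality before upload and keep the split
# leak-free by fitting the reducer on training data only. Illustrative
# choices throughout, not the article's actual pipeline.
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

def prepare(X, y, n_components=20):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    pca = PCA(n_components=n_components).fit(X_train)  # fit on train only
    return pca.transform(X_train), pca.transform(X_test), y_train, y_test
```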