46,421 research outputs found

    Deep Neural Networks for Estimation and Inference

    Full text link
    We study deep neural networks and their use in semiparametric inference. We establish novel rates of convergence for deep feedforward neural nets. Our new rates are sufficiently fast (in some cases minimax optimal) to allow us to establish valid second-step inference after first-step estimation with deep learning, a result also new to the literature. Our estimation rates and semiparametric inference results handle the current standard architecture: fully connected feedforward neural networks (multi-layer perceptrons), with the now-common rectified linear unit activation function and a depth explicitly diverging with the sample size. We discuss other architectures as well, including fixed-width, very deep networks. We establish nonasymptotic bounds for these deep nets for a general class of nonparametric regression-type loss functions, which includes as special cases least squares, logistic regression, and other generalized linear models. We then apply our theory to develop semiparametric inference, focusing on causal parameters for concreteness, such as treatment effects, expected welfare, and decomposition effects. Inference in many other semiparametric contexts can be readily obtained. We demonstrate the effectiveness of deep learning with a Monte Carlo analysis and an empirical application to direct mail marketing

    Deep Neural Networks for Choice Analysis: A Statistical Learning Theory Perspective

    Full text link
    While researchers increasingly use deep neural networks (DNN) to analyze individual choices, overfitting and interpretability issues remain as obstacles in theory and practice. By using statistical learning theory, this study presents a framework to examine the tradeoff between estimation and approximation errors, and between prediction and interpretation losses. It operationalizes the DNN interpretability in the choice analysis by formulating the metrics of interpretation loss as the difference between true and estimated choice probability functions. This study also uses the statistical learning theory to upper bound the estimation error of both prediction and interpretation losses in DNN, shedding light on why DNN does not have the overfitting issue. Three scenarios are then simulated to compare DNN to binary logit model (BNL). We found that DNN outperforms BNL in terms of both prediction and interpretation for most of the scenarios, and larger sample size unleashes the predictive power of DNN but not BNL. DNN is also used to analyze the choice of trip purposes and travel modes based on the National Household Travel Survey 2017 (NHTS2017) dataset. These experiments indicate that DNN can be used for choice analysis beyond the current practice of demand forecasting because it has the inherent utility interpretation, the flexibility of accommodating various information formats, and the power of automatically learning utility specification. DNN is both more predictive and interpretable than BNL unless the modelers have complete knowledge about the choice task, and the sample size is small. Overall, statistical learning theory can be a foundation for future studies in the non-asymptotic data regime or using high-dimensional statistical models in choice analysis, and the experiments show the feasibility and effectiveness of DNN for its wide applications to policy and behavioral analysis

    How Well Can Generative Adversarial Networks Learn Densities: A Nonparametric View

    Full text link
    We study in this paper the rate of convergence for learning densities under the Generative Adversarial Networks (GAN) framework, borrowing insights from nonparametric statistics. We introduce an improved GAN estimator that achieves a faster rate, through simultaneously leveraging the level of smoothness in the target density and the evaluation metric, which in theory remedies the mode collapse problem reported in the literature. A minimax lower bound is constructed to show that when the dimension is large, the exponent in the rate for the new GAN estimator is near optimal. One can view our results as answering in a quantitative way how well GAN learns a wide range of densities with different smoothness properties, under a hierarchy of evaluation metrics. As a byproduct, we also obtain improved generalization bounds for GAN with deeper ReLU discriminator network.Comment: 21 page

    Artificial Intelligence as Structural Estimation: Economic Interpretations of Deep Blue, Bonanza, and AlphaGo

    Full text link
    Artificial intelligence (AI) has achieved superhuman performance in a growing number of tasks, but understanding and explaining AI remain challenging. This paper clarifies the connections between machine-learning algorithms to develop AIs and the econometrics of dynamic structural models through the case studies of three famous game AIs. Chess-playing Deep Blue is a calibrated value function, whereas shogi-playing Bonanza is an estimated value function via Rust's (1987) nested fixed-point method. AlphaGo's "supervised-learning policy network" is a deep neural network implementation of Hotz and Miller's (1993) conditional choice probability estimation; its "reinforcement-learning value network" is equivalent to Hotz, Miller, Sanders, and Smith's (1994) conditional choice simulation method. Relaxing these AIs' implicit econometric assumptions would improve their structural interpretability

    Controlling Risk of Web Question Answering

    Full text link
    Web question answering (QA) has become an indispensable component in modern search systems, which can significantly improve users' search experience by providing a direct answer to users' information need. This could be achieved by applying machine reading comprehension (MRC) models over the retrieved passages to extract answers with respect to the search query. With the development of deep learning techniques, state-of-the-art MRC performances have been achieved by recent deep methods. However, existing studies on MRC seldom address the predictive uncertainty issue, i.e., how likely the prediction of an MRC model is wrong, leading to uncontrollable risks in real-world Web QA applications. In this work, we first conduct an in-depth investigation over the risk of Web QA. We then introduce a novel risk control framework, which consists of a qualify model for uncertainty estimation using the probe idea, and a decision model for selectively output. For evaluation, we introduce risk-related metrics, rather than the traditional EM and F1 in MRC, for the evaluation of risk-aware Web QA. The empirical results over both the real-world Web QA dataset and the academic MRC benchmark collection demonstrate the effectiveness of our approach.Comment: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieva

    Statistical Learning Theory for Location Fingerprinting in Wireless LANs

    Get PDF
    In this paper, techniques and algorithms developed in the framework of statistical learning theory are analyzed and applied to the problem of determining the location of a wireless device by measuring the signal strengths from a set of access points (location fingerprinting). Statistical Learning Theory provides a rich theoretical basis for the development of models starting from a set of examples. Signal strength measurement is part of the normal operating mode of wireless equipment, in particular Wi-Fi, so that no custom hardware is required. The proposed techniques, based on the Support Vector Machine paradigm, have been implemented and compared, on the same data set, with other approaches considered in the literature. Tests performed in a real-world environment show that results are comparable, with the advantage of a low algorithmic complexity in the normal operating phase. Moreover, the algorithm is particularly suitable for classification, where it outperforms the other techniques

    Data-driven Prognostics with Predictive Uncertainty Estimation using Ensemble of Deep Ordinal Regression Models

    Full text link
    Prognostics or Remaining Useful Life (RUL) Estimation from multi-sensor time series data is useful to enable condition-based maintenance and ensure high operational availability of equipment. We propose a novel deep learning based approach for Prognostics with Uncertainty Quantification that is useful in scenarios where: (i) access to labeled failure data is scarce due to rarity of failures (ii) future operational conditions are unobserved and (iii) inherent noise is present in the sensor readings. All three scenarios mentioned are unavoidable sources of uncertainty in the RUL estimation process often resulting in unreliable RUL estimates. To address (i), we formulate RUL estimation as an Ordinal Regression (OR) problem, and propose LSTM-OR: deep Long Short Term Memory (LSTM) network based approach to learn the OR function. We show that LSTM-OR naturally allows for incorporation of censored operational instances in training along with the failed instances, leading to more robust learning. To address (ii), we propose a simple yet effective approach to quantify predictive uncertainty in the RUL estimation models by training an ensemble of LSTM-OR models. Through empirical evaluation on C-MAPSS turbofan engine benchmark datasets, we demonstrate that LSTM-OR is significantly better than the commonly used deep metric regression based approaches for RUL estimation, especially when failed training instances are scarce. Further, our uncertainty quantification approach yields high quality predictive uncertainty estimates while also leading to improved RUL estimates compared to single best LSTM-OR models.Comment: Accepted at International Journal of Prognostics and Health Management (IJPHM), 201

    Smooth Pinball Neural Network for Probabilistic Forecasting of Wind Power

    Full text link
    Uncertainty analysis in the form of probabilistic forecasting can significantly improve decision making processes in the smart power grid for better integrating renewable energy sources such as wind. Whereas point forecasting provides a single expected value, probabilistic forecasts provide more information in the form of quantiles, prediction intervals, or full predictive densities. This paper analyzes the effectiveness of a novel approach for nonparametric probabilistic forecasting of wind power that combines a smooth approximation of the pinball loss function with a neural network architecture and a weighting initialization scheme to prevent the quantile cross over problem. A numerical case study is conducted using publicly available wind data from the Global Energy Forecasting Competition 2014. Multiple quantiles are estimated to form 10%, to 90% prediction intervals which are evaluated using a quantile score and reliability measures. Benchmark models such as the persistence and climatology distributions, multiple quantile regression, and support vector quantile regression are used for comparison where results demonstrate the proposed approach leads to improved performance while preventing the problem of overlapping quantile estimates

    Memorized Sparse Backpropagation

    Full text link
    Neural network learning is typically slow since backpropagation needs to compute full gradients and backpropagate them across multiple layers. Despite its success of existing work in accelerating propagation through sparseness, the relevant theoretical characteristics remain unexplored and we empirically find that they suffer from the loss of information contained in unpropagated gradients. To tackle these problems, in this work, we present a unified sparse backpropagation framework and provide a detailed analysis of its theoretical characteristics. Analysis reveals that when applied to a multilayer perceptron, our framework essentially performs gradient descent using an estimated gradient similar enough to the true gradient, resulting in convergence in probability under certain conditions. Furthermore, a simple yet effective algorithm named memorized sparse backpropagation (MSBP) is proposed to remedy the problem of information loss by storing unpropagated gradients in memory for the next learning. The experiments demonstrate that the proposed MSBP is able to effectively alleviate the information loss in traditional sparse backpropagation while achieving comparable acceleration

    A Nonparametric Ensemble Binary Classifier and its Statistical Properties

    Full text link
    In this work, we propose an ensemble of classification trees (CT) and artificial neural networks (ANN). Several statistical properties including universal consistency and upper bound of an important parameter of the proposed classifier are shown. Numerical evidence is also provided using various real life data sets to assess the performance of the model. Our proposed nonparametric ensemble classifier doesn't suffer from the `curse of dimensionality' and can be used in a wide variety of feature selection cum classification problems. Performance of the proposed model is quite better when compared to many other state-of-the-art models used for similar situations
    • …
    corecore