Deep Neural Networks for Estimation and Inference
We study deep neural networks and their use in semiparametric inference. We
establish novel rates of convergence for deep feedforward neural nets. Our new
rates are sufficiently fast (in some cases minimax optimal) to allow us to
establish valid second-step inference after first-step estimation with deep
learning, a result also new to the literature. Our estimation rates and
semiparametric inference results handle the current standard architecture:
fully connected feedforward neural networks (multi-layer perceptrons), with the
now-common rectified linear unit activation function and a depth explicitly
diverging with the sample size. We discuss other architectures as well,
including fixed-width, very deep networks. We establish nonasymptotic bounds
for these deep nets for a general class of nonparametric regression-type loss
functions, which includes as special cases least squares, logistic regression,
and other generalized linear models. We then apply our theory to develop
semiparametric inference, focusing on causal parameters for concreteness, such
as treatment effects, expected welfare, and decomposition effects. Inference in
many other semiparametric contexts can be readily obtained. We demonstrate the
effectiveness of deep learning with a Monte Carlo analysis and an empirical
application to direct mail marketing.
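The two-step pattern in this abstract (first-step nonparametric estimation with a ReLU network, second-step plug-in inference) can be sketched on toy data. The data-generating process, architecture, and learning rate below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonparametric regression: y = f(x) + noise (data-generating process assumed)
X = rng.uniform(-1, 1, size=(500, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

# One-hidden-layer fully connected ReLU network (a tiny multi-layer perceptron)
H = 32
W1 = rng.normal(scale=0.5, size=(2, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=H); b2 = 0.0

def forward(X):
    Z = np.maximum(X @ W1 + b1, 0.0)  # ReLU activations
    return Z, Z @ W2 + b2

_, pred0 = forward(X)
mse0 = np.mean((pred0 - y) ** 2)  # fit before training

# First step: least-squares training by full-batch gradient descent
lr = 0.05
for _ in range(2000):
    Z, pred = forward(X)
    err = pred - y
    gW2 = Z.T @ err / len(y); gb2 = err.mean()
    dZ = np.outer(err, W2) * (Z > 0)  # backprop through the ReLU
    gW1 = X.T @ dZ / len(y); gb1 = dZ.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(X)
mse = np.mean((pred - y) ** 2)

# Second step: a plug-in functional of the first-step fit
# (here simply the average of the fitted regression function)
plug_in = pred.mean()
```

The paper's point is that the first-step fit converges fast enough for the second-step functional to admit valid inference; this sketch only shows the mechanics of the plug-in.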
Deep Neural Networks for Choice Analysis: A Statistical Learning Theory Perspective
While researchers increasingly use deep neural networks (DNN) to analyze
individual choices, overfitting and interpretability remain obstacles in both
theory and practice. Using statistical learning theory, this study
presents a framework to examine the tradeoff between estimation and
approximation errors, and between prediction and interpretation losses. It
operationalizes the DNN interpretability in the choice analysis by formulating
the metrics of interpretation loss as the difference between true and estimated
choice probability functions. This study also uses the statistical learning
theory to upper bound the estimation error of both prediction and
interpretation losses in DNN, shedding light on why DNN does not have the
overfitting issue. Three scenarios are then simulated to compare DNN to binary
logit model (BNL). We find that DNN outperforms BNL in both prediction and
interpretation in most scenarios, and that a larger sample size unleashes the
predictive power of DNN but not of BNL. DNN is also used to analyze
the choice of trip purposes and travel modes based on the National Household
Travel Survey 2017 (NHTS2017) dataset. These experiments indicate that DNN can
be used for choice analysis beyond the current practice of demand forecasting
because it offers an inherent utility interpretation, the flexibility to
accommodate various information formats, and the ability to learn the utility
specification automatically. DNN is both more predictive and more interpretable
than BNL unless the modelers have complete knowledge of the choice task and
the sample size is small. Overall, statistical learning theory can be a
foundation for future studies in the non-asymptotic data regime or using
high-dimensional statistical models in choice analysis, and the experiments
show the feasibility and effectiveness of DNN for its wide applications to
policy and behavioral analysis.
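The interpretation loss described above, the gap between true and estimated choice probability functions, can be sketched for a binary logit case. The coefficients and the covariate grid below are illustrative assumptions.

```python
import numpy as np

def choice_prob(x, beta):
    """Binary-choice probability P(y=1|x) under a logit specification."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))

# True vs. estimated choice probability functions (coefficients are illustrative)
beta_true = np.array([0.5, 2.0])
beta_hat = np.array([0.4, 1.7])

# Interpretation loss: average absolute gap between the two probability
# functions over the covariate range (a uniform grid stands in for the
# covariate distribution)
x_grid = np.linspace(-3, 3, 601)
interp_loss = np.mean(np.abs(choice_prob(x_grid, beta_true)
                             - choice_prob(x_grid, beta_hat)))
```

A perfectly interpretable estimator drives this quantity to zero, regardless of how well it predicts individual labels, which is what separates interpretation loss from prediction loss.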
How Well Can Generative Adversarial Networks Learn Densities: A Nonparametric View
We study in this paper the rate of convergence for learning densities under
the Generative Adversarial Networks (GAN) framework, borrowing insights from
nonparametric statistics. We introduce an improved GAN estimator that achieves
a faster rate, through simultaneously leveraging the level of smoothness in the
target density and the evaluation metric, which in theory remedies the mode
collapse problem reported in the literature. A minimax lower bound is
constructed to show that when the dimension is large, the exponent in the rate
for the new GAN estimator is near optimal. One can view our results as
answering in a quantitative way how well GAN learns a wide range of densities
with different smoothness properties, under a hierarchy of evaluation metrics.
As a byproduct, we also obtain improved generalization bounds for GANs with
deeper ReLU discriminator networks.
Artificial Intelligence as Structural Estimation: Economic Interpretations of Deep Blue, Bonanza, and AlphaGo
Artificial intelligence (AI) has achieved superhuman performance in a growing
number of tasks, but understanding and explaining AI remain challenging. This
paper clarifies the connections between the machine-learning algorithms used to
develop AIs and the econometrics of dynamic structural models through case studies
of three famous game AIs. Chess-playing Deep Blue is a calibrated value
function, whereas shogi-playing Bonanza is an estimated value function via
Rust's (1987) nested fixed-point method. AlphaGo's "supervised-learning policy
network" is a deep neural network implementation of Hotz and Miller's (1993)
conditional choice probability estimation; its "reinforcement-learning value
network" is equivalent to Hotz, Miller, Sanders, and Smith's (1994) conditional
choice simulation method. Relaxing these AIs' implicit econometric assumptions
would improve their structural interpretability.
Controlling Risk of Web Question Answering
Web question answering (QA) has become an indispensable component in modern
search systems; it can significantly improve users' search experience by
providing direct answers to their information needs. This can be achieved by
applying machine reading comprehension (MRC) models over the retrieved passages
to extract answers with respect to the search query. With the development of
deep learning techniques, state-of-the-art MRC performances have been achieved
by recent deep methods. However, existing studies on MRC seldom address the
predictive uncertainty issue, i.e., how likely the prediction of an MRC model
is wrong, leading to uncontrollable risks in real-world Web QA applications. In
this work, we first conduct an in-depth investigation over the risk of Web QA.
We then introduce a novel risk control framework, which consists of a
qualification model for uncertainty estimation, based on the probing idea, and
a decision model for selective output. For evaluation, we introduce risk-related metrics, rather
than the traditional EM and F1 in MRC, for the evaluation of risk-aware Web QA.
Empirical results on both a real-world Web QA dataset and an academic MRC
benchmark collection demonstrate the effectiveness of our approach.
(42nd International ACM SIGIR Conference on Research and Development in Information Retrieval)
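The selective-output idea above can be sketched with two risk-related metrics: coverage (the fraction of queries answered) and risk (the error rate among answered queries). The confidence scores, threshold, and simple thresholding rule below are illustrative assumptions, not the paper's qualification and decision models.

```python
import numpy as np

# Model confidence for each extracted answer, plus whether it was correct
# (toy values for illustration)
confidence = np.array([0.95, 0.90, 0.80, 0.60, 0.40, 0.30])
correct    = np.array([1,    1,    1,    0,    1,    0])

def selective_metrics(confidence, correct, threshold):
    """Answer only when confidence >= threshold; report coverage and risk."""
    answered = confidence >= threshold
    coverage = answered.mean()
    risk = 1.0 - correct[answered].mean() if answered.any() else 0.0
    return coverage, risk

cov_all, risk_all = selective_metrics(confidence, correct, 0.0)   # answer everything
cov_sel, risk_sel = selective_metrics(confidence, correct, 0.75)  # abstain when unsure
```

Raising the threshold trades coverage for lower risk, which is exactly the axis that EM and F1 do not measure.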
Statistical Learning Theory for Location Fingerprinting in Wireless LANs
In this paper, techniques and algorithms developed in the framework of statistical learning theory are analyzed and applied to the problem of determining the location of a wireless device by measuring the signal strengths from a set of access points (location fingerprinting). Statistical Learning Theory provides a rich theoretical basis for the development of models starting from a set of examples. Signal strength measurement is part of the normal operating mode of wireless equipment, in particular Wi-Fi, so that no custom hardware is required. The proposed techniques, based on the Support Vector Machine paradigm, have been implemented and compared, on the same data set, with other approaches considered in the literature. Tests performed in a real-world environment show that the results are comparable, with the advantage of a low algorithmic complexity in the normal operating phase. Moreover, the algorithm is particularly suitable for classification, where it outperforms the other techniques.
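A minimal sketch of SVM-based fingerprinting, assuming synthetic signal-strength vectors from three access points measured in two rooms; a hinge-loss sub-gradient descent stands in for a full SVM solver, and all numeric values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic fingerprints: signal strengths (dBm) from 3 access points,
# measured in two rooms (labels -1 and +1); all values are illustrative
room_a = rng.normal([-40.0, -70.0, -60.0], 3.0, size=(100, 3))
room_b = rng.normal([-65.0, -45.0, -55.0], 3.0, size=(100, 3))
X = np.vstack([room_a, room_b])
y = np.array([-1] * 100 + [1] * 100)

# Linear SVM fit by sub-gradient descent on the regularized hinge loss
w = np.zeros(3); b = 0.0
lam, lr = 0.01, 0.01
for _ in range(500):
    margins = y * (X @ w + b)
    viol = margins < 1  # points inside or beyond the margin
    gw = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / len(y)
    gb = -y[viol].sum() / len(y)
    w -= lr * gw; b -= lr * gb

accuracy = np.mean(np.sign(X @ w + b) == y)
```

Once trained, localization reduces to one dot product per query, which matches the abstract's point about low complexity in the normal operating phase.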
Data-driven Prognostics with Predictive Uncertainty Estimation using Ensemble of Deep Ordinal Regression Models
Prognostics or Remaining Useful Life (RUL) Estimation from multi-sensor time
series data is useful to enable condition-based maintenance and ensure high
operational availability of equipment. We propose a novel deep learning based
approach for Prognostics with Uncertainty Quantification that is useful in
scenarios where (i) access to labeled failure data is scarce due to the rarity
of failures, (ii) future operational conditions are unobserved, and (iii)
inherent noise is present in the sensor readings. All three are unavoidable
sources of uncertainty in the RUL estimation process, often resulting in
unreliable RUL estimates. To address (i), we formulate RUL estimation as an
Ordinal Regression (OR) problem and propose LSTM-OR, a deep Long Short-Term
Memory (LSTM) network-based approach to learning the OR function.
We show that LSTM-OR naturally allows for incorporation of censored operational
instances in training along with the failed instances, leading to more robust
learning. To address (ii), we propose a simple yet effective approach to
quantify predictive uncertainty in the RUL estimation models by training an
ensemble of LSTM-OR models. Through empirical evaluation on C-MAPSS turbofan
engine benchmark datasets, we demonstrate that LSTM-OR is significantly better
than the commonly used deep metric regression based approaches for RUL
estimation, especially when failed training instances are scarce. Further, our
uncertainty quantification approach yields high quality predictive uncertainty
estimates while also leading to improved RUL estimates compared to single best
LSTM-OR models.
(Accepted at the International Journal of Prognostics and Health Management (IJPHM), 201)
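The ordinal-regression formulation above can be sketched as a target encoding: an RUL bucket label becomes a vector of cumulative binary targets, which an LSTM would be trained to predict. The bucket edges and probabilities below are illustrative assumptions, and the LSTM itself is omitted.

```python
import numpy as np

def rul_to_ordinal(rul, edges):
    """Encode an RUL value as cumulative binary targets t_j = 1 if the RUL
    bucket lies above threshold j, given K-1 bucket edges (K buckets)."""
    k = np.searchsorted(edges, rul)  # bucket index in 0..K-1
    return (np.arange(len(edges)) < k).astype(float)

def ordinal_to_bucket(probs, threshold=0.5):
    """Decode per-threshold probabilities back to a bucket index."""
    return int(np.sum(np.asarray(probs) > threshold))

edges = [50, 100, 150]                  # illustrative bucket boundaries (cycles)
targets = rul_to_ordinal(120.0, edges)  # targets for an RUL of 120 cycles
bucket = ordinal_to_bucket([0.9, 0.8, 0.2])
```

One common reading of how censored instances enter such a formulation is that a unit still running supervises only the thresholds it is already known to have exceeded, leaving the remaining targets unlabeled.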
Smooth Pinball Neural Network for Probabilistic Forecasting of Wind Power
Uncertainty analysis in the form of probabilistic forecasting can
significantly improve decision making processes in the smart power grid for
better integrating renewable energy sources such as wind. Whereas point
forecasting provides a single expected value, probabilistic forecasts provide
more information in the form of quantiles, prediction intervals, or full
predictive densities. This paper analyzes the effectiveness of a novel approach
for nonparametric probabilistic forecasting of wind power that combines a
smooth approximation of the pinball loss function with a neural network
architecture and a weight initialization scheme to prevent the quantile
crossover problem. A numerical case study is conducted using publicly
available wind data from the Global Energy Forecasting Competition 2014.
Multiple quantiles are estimated to form 10% to 90% prediction intervals, which
are evaluated using a quantile score and reliability measures. Benchmark models
such as the persistence and climatology distributions, multiple quantile
regression, and support vector quantile regression are used for comparison;
the results demonstrate that the proposed approach improves performance while
preventing overlapping quantile estimates.
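The smooth pinball idea can be sketched as a softplus smoothing of the exact quantile loss. This particular smoothing form and the parameter alpha are assumptions about the general approach, not necessarily the paper's exact formulation.

```python
import numpy as np

def pinball(u, tau):
    """Exact pinball (quantile) loss for residual u = y - q."""
    return np.maximum(tau * u, (tau - 1.0) * u)

def smooth_pinball(u, tau, alpha=0.01):
    """Softplus-smoothed pinball loss, differentiable everywhere
    (one common smoothing; the paper's exact form may differ)."""
    return tau * u + alpha * np.logaddexp(0.0, -u / alpha)

# The smoothed loss stays within alpha * log(2) of the exact loss,
# with the largest gap at u = 0 where the exact loss has its kink
u = np.linspace(-2, 2, 401)
gap = np.max(np.abs(smooth_pinball(u, 0.9) - pinball(u, 0.9)))
```

The smoothing removes the non-differentiable kink at zero residual, so standard gradient-based training applies directly to quantile estimation.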
Memorized Sparse Backpropagation
Neural network learning is typically slow since backpropagation needs to
compute full gradients and backpropagate them across multiple layers. Despite
the success of existing work in accelerating propagation through sparsity, its
theoretical characteristics remain unexplored, and we empirically find that
such methods suffer from the loss of information contained in unpropagated
gradients. To tackle these problems, we present a unified sparse
backpropagation framework and provide a detailed analysis of its theoretical
characteristics. Analysis reveals that when applied to a multilayer perceptron,
our framework essentially performs gradient descent using an estimated gradient
similar enough to the true gradient, resulting in convergence in probability
under certain conditions. Furthermore, a simple yet effective algorithm named
memorized sparse backpropagation (MSBP) is proposed to remedy the problem of
information loss by storing unpropagated gradients in memory for use in
subsequent learning steps. Experiments demonstrate that the proposed MSBP is able to
effectively alleviate the information loss in traditional sparse
backpropagation while achieving comparable acceleration.
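The memorization mechanism can be sketched as follows: at each step only the k largest-magnitude components of the accumulated gradient are propagated, and the remainder is stored for the next step. The top-k selection rule and vector shapes are illustrative assumptions.

```python
import numpy as np

def msbp_step(grad, memory, k):
    """Propagate only the k largest-magnitude components of (grad + memory);
    store the unpropagated remainder in memory for the next step."""
    total = grad + memory
    idx = np.argsort(np.abs(total))[-k:]  # indices of the top-k components
    propagated = np.zeros_like(total)
    propagated[idx] = total[idx]
    return propagated, total - propagated

grad = np.array([0.5, -0.1, 0.05, -0.8, 0.2])
memory = np.zeros_like(grad)
p1, memory = msbp_step(grad, memory, k=2)  # small components go to memory
p2, memory = msbp_step(grad, memory, k=2)  # memory is added back next step
# Nothing is lost: the propagated gradients plus the remaining memory
# together account for the full gradients of both steps
```

Plain sparse backpropagation would discard everything outside the top-k; keeping the remainder in memory is what prevents the information loss the abstract describes.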
A Nonparametric Ensemble Binary Classifier and its Statistical Properties
In this work, we propose an ensemble of classification trees (CT) and
artificial neural networks (ANN). Several statistical properties of the
proposed classifier, including universal consistency and an upper bound on an
important parameter, are established. Numerical evidence is also provided using
various real-life data sets to assess the performance of the model. Our
proposed nonparametric ensemble classifier does not suffer from the `curse of
dimensionality' and can be used in a wide variety of combined feature-selection
and classification problems. The proposed model performs considerably better
than many other state-of-the-art models used in similar situations.