15 research outputs found
Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator
We study a fixed mini-batch gradient descent (FMGD) algorithm for solving optimization problems with massive datasets. In FMGD, the whole sample is split into multiple non-overlapping partitions. Once formed, the partitions are fixed throughout the rest of the algorithm. For convenience, we refer to them as fixed mini-batches. In each iteration, the gradients are calculated sequentially on each fixed mini-batch. Because a fixed mini-batch is typically much smaller than the whole sample, its gradient can be computed easily. This greatly reduces the computation cost of each iteration and makes FMGD computationally efficient and practically more feasible. To demonstrate the theoretical properties of FMGD, we start with a linear regression model with a constant learning rate and study its numerical convergence and statistical efficiency. We find that a sufficiently small learning rate is required for both numerical convergence and statistical efficiency. Nevertheless, an extremely small learning rate might lead to painfully slow numerical convergence. To solve this problem, a diminishing learning rate scheduling strategy (Gitman et al., 2019) can be used. This leads to an FMGD estimator with faster numerical convergence and better statistical efficiency. Finally, FMGD algorithms with random shuffling and a general loss function are also studied.
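The procedure described in this abstract can be illustrated with a minimal sketch for linear regression with a constant learning rate. The function name `fmgd_linear_regression`, the use of a single random permutation to form the partitions, the averaged squared-error gradient, and all numeric settings are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def fmgd_linear_regression(X, y, n_batches, lr, n_epochs, seed=0):
    """Hypothetical FMGD sketch: least squares loss, constant learning rate."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Split the sample ONCE into non-overlapping partitions; they stay fixed.
    batches = np.array_split(rng.permutation(n), n_batches)
    theta = np.zeros(p)
    for _ in range(n_epochs):
        # Visit the fixed mini-batches sequentially, in the same order each time.
        for idx in batches:
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)  # averaged gradient
            theta = theta - lr * grad
    return theta

# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=2000)

theta_hat = fmgd_linear_regression(X, y, n_batches=10, lr=0.05, n_epochs=200)
ols = np.linalg.lstsq(X, y, rcond=None)[0]  # whole-sample least squares fit
print(theta_hat, ols)
```

Comparing `theta_hat` with `ols` for several values of `lr` is one way to see the trade-off the abstract mentions: a smaller constant learning rate brings the two closer but needs more passes over the mini-batches.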
Summary statistics of explanatory variables.
For each continuous variable, we show its mean, standard deviation, median, minimum, and maximum value, while for each categorical variable, we show the percentage of requests for each level of the variable. N = 15,339,333.
Estimated regression coefficients.
The dependent variable Y_ik is a binary variable that indicates whether driver k responded to request k.
Covariance Regression Analysis
This article introduces covariance regression analysis for a p-dimensional response vector. The proposed method explores the regression relationship between the p-dimensional covariance matrix and auxiliary information. We study three types of estimators: maximum likelihood, ordinary least squares, and feasible generalized least squares estimators. We then demonstrate that these regression estimators are consistent and asymptotically normal. Furthermore, we obtain the high dimensional and large sample properties of the corresponding covariance matrix estimators. Simulation experiments are presented to demonstrate the performance of both regression and covariance matrix estimates. An example from the Chinese stock market is analyzed to illustrate the usefulness of the proposed covariance regression model. Supplementary materials for this article are available online.
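The abstract does not state the model form. As a rough sketch of what an ordinary least squares estimator could look like, assume (purely for illustration) that the covariance matrix is a linear combination of known symmetric matrices built from the auxiliary information, with the identity as the first one. The function name `fit_cov_coefficients`, the path-graph matrix `W1`, and the coefficient values are all hypothetical.

```python
import numpy as np

def fit_cov_coefficients(S, W_list):
    """Least squares fit of S by sum_k beta_k * W_k in Frobenius norm.

    Assumption (not from the abstract): the covariance is linear in the
    known matrices W_k, so beta solves an ordinary least squares problem.
    """
    design = np.stack([W.ravel() for W in W_list], axis=1)  # (p*p, K)
    beta, *_ = np.linalg.lstsq(design, S.ravel(), rcond=None)
    return beta

p = 200
# Hypothetical auxiliary information: neighbours on a path graph.
W1 = np.zeros((p, p))
i = np.arange(p - 1)
W1[i, i + 1] = 1.0
W1[i + 1, i] = 1.0
W_list = [np.eye(p), W1]

# Assumed true coefficients; Sigma is positive definite because the
# eigenvalues of W1 lie strictly between -2 and 2.
beta_true = np.array([2.0, 0.5])
Sigma = beta_true[0] * W_list[0] + beta_true[1] * W_list[1]

rng = np.random.default_rng(0)
Y = rng.multivariate_normal(np.zeros(p), Sigma)  # one p-dimensional response
beta_hat = fit_cov_coefficients(np.outer(Y, Y), W_list)
print(beta_hat)
```

Here the outer product of the single response vector with itself stands in for the unknown covariance matrix, so the estimate is noisy for small p and is expected to improve as p grows.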
Response rate and spatio-temporal demand and supply intensities.
(a) The driver response rate declines steadily as the spatio-temporal demand intensity increases, (b) while it initially increases but then becomes stable as the spatio-temporal supply intensity increases.
Response rate over time.
The driver response rate is relatively low in the early peak hours (i.e., 06:00–10:00), the late peak hours (i.e., 16:00–18:00), and the evenings (i.e., 20:00–22:00), but considerably higher around midnight and during non-peak hours (e.g., 10:00–16:00).
Description of explanatory variables.
There are five sets of explanatory variables, corresponding to spatio-temporal supply-demand intensities, economic incentives, request characteristics, driver characteristics, and the time factor.
Response rate and economic incentives.
(a) Passenger premium seems to have a “U-shaped” relationship with the driver response rate. (b) The presence (vs. absence) of a firm subsidy has a large positive impact on the driver response rate, which stabilizes when the amount of subsidy further increases.
Response rate and request characteristics.
(a) The driver response rate increases steadily as the geographical distance rises. (b) Regarding the number of repeated submissions, the response rate is the highest when a request is submitted only once to the ride-hailing platform, whereas it is comparatively low for repeatedly submitted requests.
The ROC curve.
FPR represents the false positive rate, and TPR is the true positive rate. The ROC curve is close to the upper left corner, indicating that the model’s predictive ability is good.
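For readers unfamiliar with how FPR and TPR trace out an ROC curve, the following minimal sketch computes the curve from binary labels and predicted scores. The function name `roc_points` and the toy labels and scores are made up for illustration and are unrelated to the study's data; the sketch also assumes that all scores are distinct and that both classes are present.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays obtained by lowering the threshold step by step.

    Assumes distinct scores and at least one positive and one negative label.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    order = np.argsort(-scores)          # highest score first
    y_sorted = y_true[order]
    tp = np.cumsum(y_sorted == 1)        # true positives at each cut-off
    fp = np.cumsum(y_sorted == 0)        # false positives at each cut-off
    tpr = np.concatenate([[0.0], tp / tp[-1]])
    fpr = np.concatenate([[0.0], fp / fp[-1]])
    return fpr, tpr

# Toy example (hypothetical values).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
fpr, tpr = roc_points(y_true, scores)

# Area under the curve by the trapezoidal rule.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(auc)  # 8/9, about 0.889, for this toy example
```

A curve that hugs the upper left corner corresponds to an area close to 1, which is the sense in which the caption calls the predictive ability good.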