On Low-rank Trace Regression under General Sampling Distribution
A growing number of modern statistical learning problems involve estimating a
large number of parameters from a (smaller) number of noisy observations. In a
subset of these problems (matrix completion, matrix compressed sensing, and
multi-task learning) the unknown parameters form a high-dimensional matrix B*,
and two popular approaches for the estimation are convex relaxation of
rank-penalized regression and non-convex optimization. It is also known that
these estimators satisfy near optimal error bounds under assumptions on rank,
coherence, or spikiness of the unknown matrix.
In this paper, we introduce a unifying technique for analyzing all of these
problems via both estimators that leads to short proofs for the existing
results as well as new results. Specifically, first we introduce a general
notion of spikiness for B* and consider a general family of estimators and
prove non-asymptotic error bounds for their estimation error. Our approach
relies on a generic recipe to prove restricted strong convexity for the
sampling operator of the trace regression. Second, and most notably, we prove
similar error bounds when the regularization parameter is chosen via K-fold
cross-validation. This result is significant in that existing theory on
cross-validated estimators does not apply to our setting since our estimators are
not known to satisfy their required notion of stability. Third, we study
applications of our general results to four subproblems of (1) matrix
completion, (2) multi-task learning, (3) compressed sensing with Gaussian
ensembles, and (4) compressed sensing with factored measurements. For (1), (3),
and (4) we recover error bounds matching those found in the literature, and
for (2) we obtain (to the best of our knowledge) the first such error bound. We
also demonstrate how our framework applies to the exact recovery problem in
(3) and (4).
Comment: 32 pages, 1 figure
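For orientation, the setting described in this abstract can be written out in the standard trace regression form. This is only a sketch under assumptions: the symbols X_i, \varepsilon_i, n, \Lambda, and I_k do not appear in the abstract and are introduced here for illustration, and the nuclear-norm penalty is the usual convex relaxation of a rank penalty. The estimators analyzed in the paper may carry additional constraints (for example on spikiness) that are not shown.

```latex
% Observation model (assumed notation): n noisy linear measurements of B^*
y_i = \langle X_i, B^* \rangle + \varepsilon_i
    = \operatorname{tr}\!\left(X_i^{\top} B^*\right) + \varepsilon_i,
\qquad i = 1, \dots, n.

% Convex relaxation of rank-penalized regression (nuclear-norm penalty)
\widehat{B}(\lambda) \in \arg\min_{B}
  \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \langle X_i, B \rangle \bigr)^2
  + \lambda \, \| B \|_{*}.

% K-fold cross-validation over a candidate grid \Lambda, with held-out folds
% I_1, \dots, I_K and \widehat{B}^{(-k)}(\lambda) fitted without fold I_k
\widehat{\lambda} \in \arg\min_{\lambda \in \Lambda}
  \sum_{k=1}^{K} \sum_{i \in I_k}
  \bigl( y_i - \langle X_i, \widehat{B}^{(-k)}(\lambda) \rangle \bigr)^2.
```

In this notation, matrix completion corresponds to each X_i having a single nonzero entry, so that each observation reveals one noisy entry of B*.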
PAC-Bayes bounds for stable algorithms with instance-dependent priors
PAC-Bayes bounds have been proposed to get risk estimates based on a training
sample. In this paper the PAC-Bayes approach is combined with stability of the
hypothesis learned by a Hilbert space valued algorithm. The PAC-Bayes setting
is used with a Gaussian prior centered at the expected output. Thus a novelty
of our paper is using priors defined in terms of the data-generating
distribution. Our main result estimates the risk of the randomized algorithm in
terms of the hypothesis stability coefficients. We also provide a new bound for
the SVM classifier, which is compared to other known bounds experimentally.
Ours appears to be the first stability-based bound that evaluates to
non-trivial values.
Comment: 16 pages, discussion of theory and experiments in the main body,
detailed proofs and experimental details in the appendices
Investigation of Fluid Phase Behavior in Shale Reservoirs Using Equation of State, Molecular Simulation and Machine Learning
Fluid phase behavior in shale reservoirs differs significantly from phase behavior in conventional reservoirs due to the strong interactions between the fluid and the pore boundary in nanopores. In this study, we applied equation-of-state (EOS) modeling, machine learning (ML) techniques, and molecular simulation to investigate fluid phase behavior in shale reservoirs.
One common issue observed in liquid-rich shale (LRS) production is that oil recovery from LRS reservoirs is much lower than oil recovery from a conventional reservoir with the same drawdown. To understand this phenomenon, we developed an EOS model to analyze the fluid compositions in the bulk and confined regions. Our simulation results indicate that hydrocarbons distribute heterogeneously with respect to pore size on a nanoscale. The resulting leaner bulk composition leads to the reduction in oil recovery from LRS reservoirs.
Although EOS modeling can accurately simulate fluid phase behavior in shale reservoirs, the required simulation time is much longer than that for models of conventional reservoirs. To solve this problem, ML techniques were applied to accelerate the phase-equilibrium calculations in the EOS modeling. In contrast to previous models designed for a specific type of hydrocarbon, we have developed a generalized, ML-assisted phase-equilibrium calculation model that is suitable for shale reservoirs. The average CPU time required for phase-equilibrium calculation using the generalized ML-assisted model was reduced by more than two orders of magnitude while maintaining an accuracy of 97%.
With the development of shale oil and gas, depleted shale gas reservoirs may be attractive candidates for hydrogen (H2) storage. Molecular simulation was used to investigate the potential for H2 storage in depleted shale gas reservoirs. The results of the simulation suggest that a higher proportion of H2 exists in the bulk region. Because fluid is mainly produced from the bulk region, the high percentage of H2 in the bulk fluid would lead to high purity of H2 during the recovery process. This work contributes to the understanding and application of fluid phase behavior in shale reservoirs.