3 research outputs found

    On Low-rank Trace Regression under General Sampling Distribution

    A growing number of modern statistical learning problems involve estimating a large number of parameters from a (smaller) number of noisy observations. In a subset of these problems (matrix completion, matrix compressed sensing, and multi-task learning) the unknown parameters form a high-dimensional matrix B*, and two popular approaches to estimation are convex relaxation of rank-penalized regression and non-convex optimization. It is also known that these estimators satisfy near-optimal error bounds under assumptions on the rank, coherence, or spikiness of the unknown matrix. In this paper, we introduce a unifying technique for analyzing all of these problems via both estimators that leads to short proofs for the existing results as well as new results. Specifically, we first introduce a general notion of spikiness for B*, consider a general family of estimators, and prove non-asymptotic error bounds for their estimation error. Our approach relies on a generic recipe for proving restricted strong convexity for the sampling operator of the trace regression. Second, and most notably, we prove similar error bounds when the regularization parameter is chosen via K-fold cross-validation. This result is significant in that existing theory on cross-validated estimators does not apply to our setting, since our estimators are not known to satisfy the required notion of stability. Third, we study applications of our general results to four subproblems: (1) matrix completion, (2) multi-task learning, (3) compressed sensing with Gaussian ensembles, and (4) compressed sensing with factored measurements. For (1), (3), and (4) we recover error bounds matching those found in the literature, and for (2) we obtain (to the best of our knowledge) the first such error bound. We also demonstrate how our framework applies to the exact recovery problem in (3) and (4).
    Comment: 32 pages, 1 figure
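    A hypothetical, minimal sketch of the estimator family discussed above: nuclear-norm penalized least squares for the trace regression model y_i = <X_i, B*> + noise, solved by proximal gradient descent with singular value thresholding. This is a generic illustration, not the authors' code; the function names, step-size rule, toy data, and the value of the regularization parameter lam are assumptions (the paper's point is that lam can also be chosen by K-fold cross-validation).

        # Minimal sketch (assumed, not the authors' code): nuclear-norm penalized
        # trace regression solved by proximal gradient / singular value thresholding.
        import numpy as np

        def svt(M, tau):
            """Singular value thresholding: prox of tau * nuclear norm."""
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

        def trace_regression(X, y, lam, iters=500):
            """X: (n, d1, d2) measurement matrices, y: (n,) noisy responses."""
            n, d1, d2 = X.shape
            # step size from a crude Lipschitz estimate of the least-squares gradient
            step = n / np.linalg.norm(X.reshape(n, -1), 2) ** 2
            B = np.zeros((d1, d2))
            for _ in range(iters):
                resid = np.einsum('nij,ij->n', X, B) - y       # <X_i, B> - y_i
                grad = np.einsum('n,nij->ij', resid, X) / n    # gradient of 0.5 * mean residual^2
                B = svt(B - step * grad, step * lam)           # proximal step
            return B

        # toy check: rank-1 target observed through random Gaussian measurements
        rng = np.random.default_rng(0)
        Bstar = np.outer(rng.normal(size=8), rng.normal(size=6))
        X = rng.normal(size=(400, 8, 6))
        y = np.einsum('nij,ij->n', X, Bstar) + 0.1 * rng.normal(size=400)
        Bhat = trace_regression(X, y, lam=0.05)
        print(np.linalg.norm(Bhat - Bstar) / np.linalg.norm(Bstar))

    In the same spirit, a cross-validated version would rerun trace_regression over a grid of lam values on K training folds and keep the value with the smallest held-out prediction error.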

    PAC-Bayes bounds for stable algorithms with instance-dependent priors

    PAC-Bayes bounds have been proposed to obtain risk estimates based on a training sample. In this paper, the PAC-Bayes approach is combined with stability of the hypothesis learned by a Hilbert space valued algorithm. The PAC-Bayes setting is used with a Gaussian prior centered at the expected output. Thus, a novelty of our paper is the use of priors defined in terms of the data-generating distribution. Our main result estimates the risk of the randomized algorithm in terms of the hypothesis stability coefficients. We also provide a new bound for the SVM classifier, which is compared experimentally to other known bounds. Ours appears to be the first stability-based bound that evaluates to non-trivial values.
    Comment: 16 pages, discussion of theory and experiments in the main body, detailed proofs and experimental details in the appendices
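    For orientation, a generic PAC-Bayes-kl bound of the kind this line of work builds on, written in standard notation (not necessarily the paper's exact statement): for a prior P and posterior Q over hypotheses and an i.i.d. sample of size n, with probability at least 1 - delta,

        \mathrm{kl}\big(\hat{L}(Q) \,\|\, L(Q)\big) \le \frac{\mathrm{KL}(Q \| P) + \ln(2\sqrt{n}/\delta)}{n},
        \qquad
        \mathrm{KL}\big(\mathcal{N}(\mu_Q, \sigma^2 I) \,\|\, \mathcal{N}(\mu_P, \sigma^2 I)\big) = \frac{\|\mu_Q - \mu_P\|^2}{2\sigma^2}.

    With a Gaussian prior centered at the expected output (mu_P equal to the expectation of the learned hypothesis) and a Gaussian posterior centered at the hypothesis actually learned (mu_Q), the KL term reduces to the squared distance above, which is exactly the quantity that hypothesis stability coefficients control.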

    Investigation of Fluid Phase Behavior in Shale Reservoirs Using Equation of State, Molecular Simulation and Machine Learning

    Fluid phase behavior in shale reservoirs differs significantly from phase behavior in conventional reservoirs due to the strong interactions between fluid and boundary in nanopores. In this study, we applied equation-of-state (EOS) modeling, machine learning (ML) techniques, and molecular simulation to investigate fluid phase behavior in shale reservoirs. One common issue observed in liquid-rich shale (LRS) production is that oil recovery from LRS reservoirs is much lower than oil recovery from a conventional reservoir under the same drawdown. To understand this phenomenon, EOS modeling was developed to analyze the fluid compositions in the bulk and confined regions. Our simulation results indicate that hydrocarbons distribute heterogeneously with respect to pore size at the nanoscale. The leaner bulk composition leads to the reduction in oil recovery from LRS reservoirs. Although EOS modeling can accurately simulate fluid phase behavior in shale reservoirs, the required simulation time is much longer than that for models of conventional reservoirs. To address this problem, ML techniques were applied to accelerate the phase-equilibrium calculations in the EOS modeling. In contrast to previous models designed for a specific type of hydrocarbon, we have developed a generalized, ML-assisted phase-equilibrium calculation model that is suitable for shale reservoirs. Overall, the average CPU time required for a phase-equilibrium calculation using the generalized ML-assisted model was reduced by more than two orders of magnitude while maintaining an accuracy of 97%. With the development of shale oil and gas, depleted shale gas reservoirs may be attractive candidates for hydrogen (H2) storage. Molecular simulation was used to investigate the potential for H2 storage in depleted shale gas reservoirs. The simulation results suggest that a higher proportion of H2 exists in the bulk region. Because fluid is mainly produced from the bulk region, the high percentage of H2 in the bulk fluid would lead to high purity of H2 during the recovery process. This work contributes to the understanding and application of fluid phase behavior in shale reservoirs.
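    As a rough illustration of the ML-assisted idea (not the study's actual model), the sketch below trains a surrogate classifier on previously solved phase-stability results and falls back to the full EOS calculation only when the surrogate is unsure; the feature set, the RandomForestClassifier choice, and the expensive_flash placeholder are illustrative assumptions.

        # Hypothetical sketch: surrogate-accelerated phase-stability test.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def expensive_flash(p, T, z):
            # placeholder for a full EOS stability/flash calculation
            # (a toy rule here, so the example runs end to end)
            return int(p < 200.0 and z[0] > 0.5)

        rng = np.random.default_rng(1)
        # training data: pressure (bar), temperature (K), mole fraction of lightest component
        P = rng.uniform(50, 500, 2000)
        T = rng.uniform(300, 450, 2000)
        z1 = rng.uniform(0, 1, 2000)
        X = np.column_stack([P, T, z1])
        y = np.array([expensive_flash(p, t, [z, 1 - z]) for p, t, z in X])

        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

        def fast_stability(p, T, z, threshold=0.9):
            """Use the surrogate when it is confident; otherwise run the EOS."""
            proba = clf.predict_proba([[p, T, z[0]]])[0]
            if proba.max() >= threshold:
                return int(clf.classes_[proba.argmax()])
            return expensive_flash(p, T, z)

        print(fast_stability(150.0, 350.0, [0.7, 0.3]))

    In this generic setup, the speedup comes from replacing most of the expensive EOS calls with cheap surrogate predictions, while the fallback preserves accuracy on the cases the surrogate finds ambiguous.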