
    Rigid Transformations for Stabilized Lower Dimensional Space to Support Subsurface Uncertainty Quantification and Interpretation

    Full text link
    Subsurface datasets inherently possess big data characteristics such as vast volume, diverse features, and high sampling speeds, further compounded by the curse of dimensionality from various physical, engineering, and geological inputs. Among existing dimensionality reduction (DR) methods, nonlinear dimensionality reduction (NDR) methods, especially metric multidimensional scaling (MDS), are preferred for subsurface datasets due to their inherent complexity. While MDS retains the intrinsic data structure and quantifies uncertainty, its limitations include solutions that are not unique, being determined only up to Euclidean transformations, and the absence of an out-of-sample point (OOSP) extension. To enhance subsurface inferential and machine learning workflows, datasets must be transformed into stable, reduced-dimension representations that accommodate OOSP. Our solution employs rigid transformations to obtain a stabilized, Euclidean-invariant representation of the lower dimensional space (LDS). By computing an MDS input dissimilarity matrix and applying rigid transformations to multiple realizations, we ensure transformation invariance and integrate OOSP. This process leverages a convex hull algorithm and incorporates a loss function and normalized stress for distortion quantification. We validate our approach with synthetic data, varying distance metrics, and real-world wells from the Duvernay Formation. Results confirm our method's efficacy in achieving consistent LDS representations. Furthermore, our proposed "stress ratio" (SR) metric provides insight into uncertainty, beneficial for model adjustments and inferential analysis. Consequently, our workflow promises enhanced repeatability and comparability in NDR for subsurface energy resource engineering and associated big data workflows. Comment: 30 pages, 17 figures, submitted to the Computational Geosciences journal
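The rigid-transformation idea can be sketched in a few lines: compute classical MDS coordinates from a dissimilarity matrix, then align a second realization onto an anchor realization with an orthogonal Procrustes fit. This is a minimal illustration under toy data, not the authors' implementation; the function names are invented, and the convex-hull anchoring, OOSP, and stress-ratio steps are omitted.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS from a pairwise distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]           # top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

def rigid_align(X, Y):
    """Rotate/reflect realization Y (least squares) onto anchor X."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    U, _, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt                                 # optimal orthogonal map
    return Yc @ R + X.mean(0)

# Toy demo: a second realization that is a rotated copy of the first
rng = np.random.default_rng(0)
pts = rng.normal(size=(10, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
X = classical_mds(D)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ R                                      # rigidly transformed LDS
print(np.allclose(rigid_align(X, Y), X, atol=1e-8))  # → True
```

Because the Procrustes fit removes the rotation/reflection ambiguity, repeated MDS runs of the same dissimilarity matrix land in one consistent coordinate frame, which is the repeatability property the abstract targets.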

    Mitigation of Spatial Nonstationarity with Vision Transformers

    Full text link
    Spatial nonstationarity, the variation of features' statistical distributions with location, is ubiquitous in many natural settings. For example, in geological reservoirs rock matrix porosity varies vertically due to geomechanical compaction trends, in mineral deposits grades vary due to sedimentation and concentration processes, in hydrology rainfall varies due to interactions between the atmosphere and topography, and in metallurgy crystalline structures vary due to differential cooling. Conventional geostatistical modeling workflows rely on the assumption of stationarity to model spatial features for geostatistical inference. Nevertheless, this is often not a realistic assumption when dealing with nonstationary spatial data, and this has motivated a variety of nonstationary spatial modeling workflows such as trend and residual decomposition, cosimulation with secondary features, and spatial segmentation with independent modeling over stationary subdomains. The advent of deep learning technologies has enabled new workflows for modeling spatial relationships. However, there is a paucity of demonstrated best practice and general guidance on the mitigation of spatial nonstationarity with deep learning in the geospatial context. We demonstrate the impact of two common types of geostatistical spatial nonstationarity on deep learning model prediction performance and propose mitigating such impacts with self-attention (vision transformer) models. We demonstrate the utility of vision transformers for the mitigation of nonstationarity with relative errors as low as 10%, exceeding the performance of alternative deep learning methods such as convolutional neural networks. We establish best practice by demonstrating the ability of self-attention networks to model large-scale spatial relationships in the presence of commonly observed geospatial nonstationarity.
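The property the abstract appeals to, self-attention's global receptive field over image patches, can be illustrated with a minimal single-head scaled dot-product attention in NumPy. This is a sketch only; the paper's vision transformer architecture, patch embedding, and training are not reproduced.

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product attention over patch embeddings X (n, d)."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)             # pairwise patch affinities
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)         # row-wise softmax weights
    return w @ X                              # each patch mixes information from all others

# Toy demo: 9 patches (a 3x3 grid flattened), 4-dimensional embeddings
rng = np.random.default_rng(1)
patches = rng.normal(size=(9, 4))
out = self_attention(patches)
print(out.shape)  # → (9, 4)
```

Because the attention weights couple every patch with every other patch, a transformer layer can adapt its mixing to location-dependent statistics, whereas a convolutional kernel applies the same local filter everywhere; this is the intuition behind using self-attention to mitigate nonstationarity.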

    Optimal Placement of Public Electric Vehicle Charging Stations Using Deep Reinforcement Learning

    Full text link
    The placement of charging stations in areas with developing charging infrastructure is a critical component of the future success of electric vehicles (EVs). In Albany County in New York, the expected rise in the EV population requires additional charging stations to maintain a sufficient level of efficiency across the charging infrastructure. A novel application of Reinforcement Learning (RL) is able to find optimal locations for new charging stations given the predicted charging demand and current charging locations. The most important factors that influence charging demand prediction include the conterminous traffic density, EV registrations, and proximity to certain types of public buildings. The proposed RL framework can be refined and applied to cities across the world to optimize charging station placement. Comment: 25 pages with 12 figures. Shankar Padmanabhan and Aidan Petratos provided equal contributions
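The core loop of such an approach can be sketched as a one-step, bandit-style Q-learning agent that learns which zone to place a new station in from noisy demand feedback. All zone indices and demand values below are invented for illustration; the study's actual state, action, and reward design is not reproduced.

```python
import random

# Hypothetical predicted charging demand per candidate zone (illustrative values)
demand = [5, 1, 8, 2, 9, 3]
Q = [0.0] * len(demand)        # learned value of placing a station in each zone
eps, alpha = 0.2, 0.5          # exploration rate, learning rate
random.seed(0)

for _ in range(1000):
    # epsilon-greedy action: explore a random zone or exploit the best so far
    if random.random() < eps:
        a = random.randrange(len(demand))
    else:
        a = max(range(len(demand)), key=Q.__getitem__)
    r = demand[a] + random.gauss(0, 0.1)   # noisy observed demand served
    Q[a] += alpha * (r - Q[a])             # incremental Q-update

best = max(range(len(demand)), key=Q.__getitem__)
print(best)  # → 4 (the zone with the highest demand)
```

A full placement problem would extend this with a spatial state (existing stations, traffic density, building proximity) and a multi-step policy, but the exploration/exploitation trade-off shown here is the same mechanism.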

    Fair train-test split in machine learning: Mitigating spatial autocorrelation for improved prediction accuracy

    No full text
    Highlights
    • Our data split method handles spatial autocorrelation and imposes prediction fairness.
    • The sets impose fair algorithms with similar difficulty in all machine learning steps.
    • Kriging variance is a surrogate of spatial prediction difficulty.
    • The resulting training and test sets are compatible with any machine learning model.
    Machine learning supports prediction and inference in multivariate and complex datasets where observations are spatially related to one another. Frequently, these datasets exhibit spatial autocorrelation that violates the assumption of independently and identically distributed data. Overlooking this correlation results in overoptimistic models that fail to account for the geographical configuration of the data. Furthermore, although different data split methods account for spatial autocorrelation, these methods are inflexible, and the parameter training and hyperparameter tuning of the machine learning model are carried out at a prediction difficulty different from the planned real-world use of the model. In other words, it is an unfair training-testing process. We present a novel method that considers spatial autocorrelation and the planned real-world use of the spatial prediction model to design a fair train-test split. Demonstrations include two examples of the planned real-world use of the model using a realistic multivariate synthetic dataset and the analysis of 148 wells from an undisclosed Equinor play. First, the workflow applies the semivariogram model of the target to compute the simple kriging variance as a proxy of spatial estimation difficulty based on the spatial data configuration. Second, the workflow employs modified rejection sampling to generate a test set with prediction difficulty similar to the planned real-world use of the model. Third, we compare 100 realizations of the test set to the model's planned real-world use, using probability distributions and two divergence metrics: the Jensen-Shannon distance and the mean squared error. The analysis ranks the spatial fair train-test split method as the only one to replicate the difficulty (i.e., kriging variance) of the planned real-world use, compared with the validation set approach and spatial cross-validation. Moreover, the proposed method outperforms the validation set approach, yielding a smaller mean percentage error when predicting a target feature in an undisclosed Equinor play with a random forest model. The resulting outputs are training and test sets ready for model fitting and assessment with any machine learning algorithm. Thus, the proposed workflow offers spatially aware sets ready for predictive machine learning problems, with estimation difficulty similar to the planned real-world use of the model, and compatible with any spatial data analysis task.
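The first workflow step, simple kriging variance as a proxy for prediction difficulty, can be sketched as follows. This is an illustration under an assumed exponential covariance model with invented parameters, not the authors' code; the rejection-sampling step that matches test-set difficulty to the planned use is omitted.

```python
import numpy as np

def cov(h, sill=1.0, rng_=10.0):
    """Exponential covariance model (assumed sill and practical range)."""
    return sill * np.exp(-3.0 * h / rng_)

def sk_variance(train_xy, target_xy, sill=1.0, rng_=10.0):
    """Simple kriging variance at target_xy given the training-data configuration."""
    d_tt = np.linalg.norm(train_xy[:, None] - train_xy[None, :], axis=-1)
    d_t0 = np.linalg.norm(train_xy - target_xy, axis=-1)
    C = cov(d_tt, sill, rng_)          # data-to-data covariances
    c0 = cov(d_t0, sill, rng_)         # data-to-target covariances
    lam = np.linalg.solve(C, c0)       # simple kriging weights
    return sill - lam @ c0             # estimation variance (difficulty proxy)

# Toy configuration: 30 random wells in a 50 x 50 domain
rng = np.random.default_rng(0)
train = rng.uniform(0, 50, size=(30, 2))
near = sk_variance(train, train[0] + 0.5)            # close to a datum: easy
far = sk_variance(train, np.array([100.0, 100.0]))   # far from all data: hard
print(near < far)  # → True
```

Scoring every candidate location this way yields the difficulty distribution from which a rejection sampler can then draw a test set matching the planned real-world use.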

    Training Images from Process-Imitating Methods: An Application to the Lower Namoi Aquifer, Murray-Darling Basin, Australia

    No full text
    The lack of a suitable training image is one of the main limitations of the application of multiple-point statistics (MPS) to the characterization of heterogeneity in real case studies. Process-imitating facies modeling techniques can potentially provide training images. However, the parameterization of these process-imitating techniques is not straightforward. Moreover, reproducing the resulting heterogeneous patterns with standard MPS can be challenging. Here, the statistical properties of the paleoclimatic data set are used to select the best parameter sets for the process-imitating methods. The data set is composed of 278 lithological logs drilled in the lower Namoi catchment, New South Wales, Australia. A good understanding of the hydrogeological connectivity of this aquifer is needed to tackle groundwater management issues. The spatial variability of the facies within the lithological logs and the calculated models is measured using fractal dimension, transition probability, and vertical facies proportion. To accommodate the vertical proportion trend of the data set, four different training images are simulated. The grain size is simulated alongside the lithological codes and used as an auxiliary variable in the direct sampling implementation of MPS. In this way, one can obtain conditional MPS simulations that preserve the quality and realism of the training images simulated with the process-imitating method. The main outcome of this study is the possibility of obtaining MPS simulations that respect the statistical properties observed in the real data set and honor the observed conditioning data, while preserving the complex heterogeneity generated by the process-imitating method. In addition, it is demonstrated that a balanced fit across all the statistical properties of the data set should be considered when selecting a suitable set of parameters for the process-imitating simulations.
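The direct sampling principle used above can be illustrated with a toy one-dimensional version: for each unsimulated node, scan the training image along a random path for a location whose neighborhood matches the conditioning pattern, and copy its value. This sketch is illustrative only; real direct sampling implementations work in 2-D/3-D, accept partial pattern matches within a distance threshold, and handle auxiliary variables such as the grain size mentioned in the abstract.

```python
import random

# Toy training image: a 1-D sequence of facies codes (invented values)
ti = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
n, w = 30, 2                        # simulation length, pattern width
random.seed(0)
sim = [None] * n
sim[0], sim[1] = ti[0], ti[1]       # seed nodes acting as conditioning data

for i in range(2, n):
    pat = sim[i - w:i]              # conditioning pattern at the current node
    starts = list(range(len(ti) - w))
    random.shuffle(starts)          # random scan path through the training image
    for s in starts:
        if ti[s:s + w] == pat:      # exact pattern match found
            sim[i] = ti[s + w]      # copy the value that follows it
            break
    else:
        sim[i] = random.choice(ti)  # fallback: draw from the marginal
print(sim)
```

Even this toy version reproduces the short-scale transition statistics of the training image, which is why the quality of the process-imitating training images carries through to the MPS realizations.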