Rigid Transformations for Stabilized Lower Dimensional Space to Support Subsurface Uncertainty Quantification and Interpretation
Subsurface datasets inherently possess big data characteristics such as vast
volume, diverse features, and high sampling speeds, further compounded by the
curse of dimensionality from various physical, engineering, and geological
inputs. Among existing dimensionality reduction (DR) methods, nonlinear dimensionality reduction (NDR) methods, especially metric multidimensional scaling (MDS), are preferred for subsurface datasets due to their inherent complexity. While MDS retains the intrinsic data structure and quantifies uncertainty, it has two limitations: its solutions are neither stabilized nor unique, being determined only up to Euclidean transformations, and it lacks an out-of-sample point (OOSP) extension. To enhance subsurface inferential and machine learning workflows, datasets must be transformed into stable, reduced-dimension representations that accommodate OOSP.
Our solution employs rigid transformations to produce a stabilized, Euclidean-transformation-invariant representation of the lower dimensional space (LDS). By computing an MDS input dissimilarity matrix and applying rigid transformations to multiple realizations, we ensure transformation invariance and integrate OOSP. The process leverages a convex hull algorithm and uses a loss function and normalized stress to quantify distortion. We validate our approach with synthetic data, varying distance metrics, and real-world wells from the Duvernay Formation.
Results confirm our method's efficacy in achieving consistent LDS
representations. Furthermore, our proposed "stress ratio" (SR) metric provides
insight into uncertainty, beneficial for model adjustments and inferential
analysis. Consequently, our workflow promises enhanced repeatability and
comparability in NDR for subsurface energy resource engineering and associated
big data workflows.
Comment: 30 pages, 17 figures. Submitted to Computational Geosciences Journal.
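As a hedged illustration of the alignment idea only (not the authors' exact workflow; the convex hull step, OOSP handling, and the stress-ratio metric are omitted), independent MDS realizations can be brought into a common pose with an orthogonal Procrustes fit:

```python
# Hedged sketch: stabilizing MDS realizations with rigid (Procrustes) alignment.
# Function names and settings are illustrative, not the authors' exact workflow.
import numpy as np
from scipy.linalg import orthogonal_procrustes
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))   # stand-in for a high-dimensional subsurface dataset

def mds_realization(X, seed):
    """One 2-D metric-MDS embedding; each seed yields a different rigid pose."""
    return MDS(n_components=2, random_state=seed).fit_transform(X)

def rigid_align(Y, ref):
    """Align Y to ref by translation plus the optimal rotation/reflection."""
    Yc, refc = Y - Y.mean(0), ref - ref.mean(0)
    R, _ = orthogonal_procrustes(Yc, refc)   # orthogonal R minimizing ||Yc @ R - refc||
    return Yc @ R

ref = mds_realization(X, seed=0)
aligned = [rigid_align(mds_realization(X, seed=s), ref) for s in range(1, 5)]

# After alignment the realizations agree up to residual stress, giving a
# repeatable LDS on which uncertainty can be quantified.
spread = np.mean([np.linalg.norm(A - (ref - ref.mean(0))) for A in aligned])
print(f"mean residual across aligned realizations: {spread:.3f}")
```

Because the optimal orthogonal map can do no worse than the identity, alignment never increases the disagreement between realizations; it removes exactly the rotation, reflection, and translation ambiguity the abstract describes.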
Improved integration of information to reduce subsurface model bias
Subsurface modeling deals with data-related issues like cognitive and sampling biases, and model-related challenges including statistical assumptions, misspecification, and algorithmic biases. These challenges introduce four critical implications during subsurface modeling. Firstly, subsurface sampling is subject to sampling bias, which compromises statistical representativeness. Secondly, analog selection methodologies rely on multivariate statistics and expert judgment that overlook spatial information and data dimensionality. Thirdly, subsurface inferential workflows that utilize dimensionality reduction seldom provide repeatable frameworks that maintain model stability and are invariant to Euclidean transformations. Lastly, deep learning methods for dimensionality reduction, characterized as black-box models, lack interpretability and robust evaluation metrics, increasing susceptibility to algorithmic bias. Consequently, neglecting these challenges in subsurface modeling could lead to erroneous predictions, inconsistent inferences, diminished model reliability, and suboptimal decision-making that impacts project economics.
This dissertation integrates information within subsurface models to reduce model bias and significantly improve their accuracy, robustness, and generalizability. First, I create spatial declustering methods to debias spatial datasets with single and multiscale preferential sampling in stationary populations. Second, I introduce a novel geostatistics-based machine learning method for identifying subsurface resource analogs that integrate spatial information in subsurface datasets with high dimensionality. Next, I efficiently combine machine learning and computational geometry methods to stabilize lower dimensional spaces for uncertainty quantification and interpretation. Finally, I create a methodology to assess, evaluate, and interpret the stability of deep learning latent feature spaces.
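The first contribution above is spatial declustering. A minimal cell-declustering sketch (a standard geostatistical debiasing tool, not necessarily the dissertation's exact method; all names and the grid size are illustrative) shows the core idea, weighting each sample inversely to how crowded its grid cell is so that preferentially clustered samples do not dominate statistics:

```python
# Hypothetical cell-declustering sketch: downweight samples in crowded grid cells.
# Grid size and data are illustrative; the dissertation's methods are more general.
import numpy as np

rng = np.random.default_rng(3)
# Preferential sampling: many samples near a "sweet spot", few elsewhere.
cluster = rng.normal(loc=[0.2, 0.2], scale=0.05, size=(80, 2))
background = rng.uniform(size=(20, 2))
xy = np.vstack([cluster, background])
values = xy.sum(1) + 0.1 * rng.normal(size=len(xy))   # property correlated with location

def cell_decluster_weights(xy, cell_size):
    """weight_i = 1 / (n_occupied_cells * count in i's cell); weights sum to 1."""
    cells = np.floor(xy / cell_size).astype(int)
    _, inv, counts = np.unique(cells, axis=0, return_inverse=True, return_counts=True)
    inv = inv.ravel()
    return 1.0 / (counts.size * counts[inv])

w = cell_decluster_weights(xy, cell_size=0.25)
print("naive mean      :", values.mean())
print("declustered mean:", values @ w)   # each occupied cell contributes equally
```

With this weighting, the heavily sampled sweet-spot cell and each sparsely sampled background cell contribute equally to the mean, which is the sense in which representativeness is restored.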
These novel methodologies demonstrate the importance of improved techniques for information integration in subsurface modeling and show better results than naïve methods. This results in objective sampling debiasing in spatially stationary populations with single or multiple data scales, improving statistical representativeness. The results also show better generalization and accurate identification of spatial analogs in high-dimensional datasets. Moreover, the methods yield Euclidean-transformation-invariant lower-dimensional spaces, ensuring unique and repeatable solutions that improve model reliability and interpretability and enable rational comparisons. Finally, the results indicate that deep learning models for dimensionality reduction exhibit algorithmic biases and instabilities, including sample, structural, and inferential instability, affecting their reliability and interpretability. Together, these innovations reduce model bias and significantly improve subsurface modeling.
Petroleum and Geosystems Engineering
Data visualization methodology using spectral and divergence-based methods for interactive dimensionality reduction
Pattern recognition tasks apply methods that evolve alongside the growth of data, achieving efficient metrics in terms of optimization and computational performance applied to data exploration, selection, and representation. Nevertheless, the results produced by such methods and tools can be ambiguous and/or abstract to the user, making their application complex, all the more so when the user has no prior knowledge of the data. A priori knowledge generally ensures the correct choice of model, as well as of suitable algorithms and methods. However, for massive data, where such knowledge is scarce and hard to obtain, interpretation can be arduous for users, especially non-expert users. Consequently, pattern recognition faces several problems, the most important of which are dimensionality reduction, interaction with large volumes of information, and the interpretation and visualization of data. These issues involve notions of controllability and interaction, properties that are largely absent from typical research in the field of dimensionality reduction. This thesis presents a new approach to data visualization based on the interactive mixture of the results of dimensionality reduction (DR) methods. The mixture is a weighted sum whose weighting factors are defined by the user through an intuitive visual interface. In addition, the low-dimensional representation spaces produced by DR methods are displayed as scatter plots driven by a controlled, interactive data visualization. To this end, pairwise similarity distances are computed and used to define the graph represented in the scatter plot.
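A minimal sketch of the mixture idea, under the assumption that the user-weighted blend operates on pairwise-similarity kernels before a spectral embedding (the thesis's exact formulation may differ; the kernels and the weight below are illustrative):

```python
# Illustrative sketch, not the thesis's pipeline: blend two DR "views" by a
# user-chosen weighted sum of similarity kernels, then embed spectrally.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))

def rbf_kernel(X, gamma):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    return np.exp(-gamma * sq)

def linear_kernel(X):
    return X @ X.T

def blended_embedding(X, w, n_components=2):
    """w in [0, 1] interpolates between a linear (PCA-like) and an RBF kernel view."""
    K = w * linear_kernel(X) + (1 - w) * rbf_kernel(X, gamma=0.5)
    n = len(K)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    Kc = H @ K @ H                             # double-centered blended kernel
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

Y = blended_embedding(X, w=0.7)   # scatter-plot-ready 2-D coordinates
print(Y.shape)
```

Sliding `w` in a visual interface and redrawing the scatter plot is one plausible realization of the interactive, user-weighted mixture the abstract describes.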
Large-Scale Indexing, Discovery, and Ranking for the Internet of Things (IoT)
Network-enabled sensing and actuation devices are key enablers to connect real-world objects to the cyber world. The Internet of Things (IoT) consists of the network-enabled devices and communication technologies that allow connectivity and integration of physical objects (Things) into the digital world (Internet). Enormous amounts of dynamic IoT data are collected from Internet-connected devices. IoT data are usually multi-variant streams that are heterogeneous, sporadic, multi-modal, and spatio-temporal. IoT data can be disseminated with different granularities and have diverse structures, types, and qualities. Dealing with the data deluge from heterogeneous IoT resources and services imposes new challenges on indexing, discovery, and ranking mechanisms that will allow building applications that require on-line access and retrieval of ad-hoc IoT data. However, the existing IoT data indexing and discovery approaches are complex or centralised, which hinders their scalability. The primary objective of this article is to provide a holistic overview of the state of the art on indexing, discovery, and ranking of IoT data. The article aims to pave the way for researchers to design, develop, implement, and evaluate techniques and approaches for on-line large-scale distributed IoT applications and services.
Low dimension hierarchical subspace modelling of high dimensional data
Building models of high-dimensional data in a low dimensional space has become extremely popular in recent years. Motion tracking, facial animation, stock market tracking, digital libraries and many other models have been built and tuned to specific application domains. However, when the underlying structure of the original data is unknown, the modelling of such data is still an open question. The problem is of interest because capturing and storing large amounts of high dimensional data has become trivial, yet the capability to process, interpret, and use this data is limited. In this thesis, we introduce novel algorithms for modelling high dimensional data with an unknown structure, which allow us to represent the data accurately and compactly. This work presents a novel fully automated dynamic hierarchical algorithm, together with a novel automatic data partitioning method that works alongside existing specific models (talking head, human motion). Our algorithm is applicable to hierarchical data visualisation and classification, meaningful pattern extraction and recognition, and new data sequence generation. During this work we also investigated problems related to low dimensional data representation: automatic optimal input parameter estimation, and robustness against noise and outliers. We demonstrate our modelling on many data domains (talking head, motion, audio, etc.) and believe it has good potential to adapt to other domains.
Adaptive Regression Methods with Application to Streaming Financial Data
This thesis is concerned with the analysis of adaptive incremental regression algorithms for data streams. The development of these algorithms is motivated by issues pertaining to financial data streams: data that are very noisy, non-stationary, and exhibit high degrees of dependence. These incremental regression techniques are subsequently used to develop efficient and adaptive algorithms for portfolio allocation.
We develop a number of temporally incremental regression algorithms with the following attributes: efficiency, in that the algorithms are iterative; robustness, in that they have a built-in safeguard against outliers and/or use regularisation techniques to mitigate estimation error; and adaptiveness, in that their estimation adapts to the underlying streaming data. These algorithms build on known regression techniques: EWRLS (Exponentially Weighted Recursive Least Squares), TSVD (Truncated Singular Value Decomposition), and FLS (Flexible Least Squares). We focus particular attention on a proposed robust version of the EWRLS algorithm, denoted R-EWRLS, and assess its robustness using a purpose-built simulation engine. This simulation engine generates correlated data streams whose drift and correlation change over time, and can subject them to randomly generated outliers whose magnitudes and directions vary.
The R-EWRLS algorithm is developed further to allow for a self-tuned forgetting factor in the formulation. The forgetting factor is an important tool for handling non-stationarity in the data: through an exponential decay profile, it assigns more weight to the more recent data. The new algorithm is assessed against the R-EWRLS algorithm using various performance measures.
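For context, here is a minimal sketch of plain EWRLS with a fixed forgetting factor; the thesis's R-EWRLS adds robustness safeguards and the self-tuned factor on top of updates like these (the simulated data and all names are illustrative):

```python
# Minimal EWRLS sketch with a fixed forgetting factor lam (illustrative only;
# the thesis's R-EWRLS extends updates like these with robustness safeguards).
import numpy as np

def ewrls_step(theta, P, x, y, lam=0.99):
    """One recursive update; lam < 1 exponentially downweights older observations."""
    Px = P @ x
    k = Px / (lam + x @ Px)              # gain vector
    err = y - x @ theta                  # a-priori prediction error
    theta = theta + k * err
    P = (P - np.outer(k, Px)) / lam      # covariance update with exponential decay
    return theta, P

rng = np.random.default_rng(2)
d, n = 3, 500
true_beta = np.array([1.0, -2.0, 0.5])
theta, P = np.zeros(d), np.eye(d) * 100.0
for _ in range(n):
    x = rng.normal(size=d)
    y = x @ true_beta + 0.1 * rng.normal()
    theta, P = ewrls_step(theta, P, x, y)
print(theta)   # converges toward true_beta on this stationary stream
```

The cost per observation is O(d^2) with no matrix inversion, which is the iterative efficiency the thesis requires; choosing (or self-tuning) `lam` trades tracking speed against estimation variance under non-stationarity.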
A number of applications with real data from equities and foreign exchange are presented. Various measures are computed to compare our algorithms to established portfolio allocation techniques. The results are promising and in many cases outperform benchmark allocation techniques.
Nonlinear Manifold Learning for Data Stream
There has been a renewed interest in understanding the structure of high dimensional data sets based on manifold learning. Examples include the ISOMAP [25], LLE [20], and Laplacian Eigenmap [2] algorithms. Most of these algorithms operate in a "batch" mode and cannot be applied efficiently to a data stream. We propose an incremental version of ISOMAP. Our experiments not only demonstrate the accuracy and efficiency of the proposed algorithm, but also reveal interesting behavior of ISOMAP as the size of the available data increases.
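For contrast with the proposed incremental variant, a batch ISOMAP baseline can be run with scikit-learn's `Isomap` (the dataset and parameters below are illustrative); re-fitting a model like this from scratch on every new sample is exactly the cost an incremental version avoids:

```python
# Batch ISOMAP baseline via scikit-learn (illustrative; the paper's contribution
# is an incremental algorithm that avoids re-running this on each new sample).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=400, random_state=0)   # classic nonlinear manifold
emb = Isomap(n_neighbors=10, n_components=2).fit(X)     # geodesic distances + MDS step
Y_new = emb.transform(X[:5])                            # map further points into the LDS
print(emb.embedding_.shape, Y_new.shape)
```

Batch ISOMAP recomputes the neighborhood graph, all-pairs geodesic distances, and the eigendecomposition each time, so its cost grows quickly with the stream length, which is the inefficiency the abstract highlights.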