9 research outputs found

    Rigid Transformations for Stabilized Lower Dimensional Space to Support Subsurface Uncertainty Quantification and Interpretation

    Subsurface datasets inherently possess big data characteristics such as vast volume, diverse features, and high sampling speeds, further compounded by the curse of dimensionality from various physical, engineering, and geological inputs. Among the existing dimensionality reduction (DR) methods, nonlinear dimensionality reduction (NDR) methods, especially metric multidimensional scaling (MDS), are preferred for subsurface datasets due to their inherent complexity. While MDS retains intrinsic data structure and quantifies uncertainty, its limitations include unstabilized unique solutions invariant to Euclidean transformations and an absence of out-of-sample points (OOSP) extension. To enhance subsurface inferential and machine learning workflows, datasets must be transformed into stable, reduced-dimension representations that accommodate OOSP. Our solution employs rigid transformations for a stabilized Euclidean invariant representation for LDS. By computing an MDS input dissimilarity matrix, and applying rigid transformations on multiple realizations, we ensure transformation invariance and integrate OOSP. This process leverages a convex hull algorithm and incorporates loss function and normalized stress for distortion quantification. We validate our approach with synthetic data, varying distance metrics, and real-world wells from the Duvernay Formation. Results confirm our method's efficacy in achieving consistent LDS representations. Furthermore, our proposed "stress ratio" (SR) metric provides insight into uncertainty, beneficial for model adjustments and inferential analysis. Consequently, our workflow promises enhanced repeatability and comparability in NDR for subsurface energy resource engineering and associated big data workflows. Comment: 30 pages, 17 figures, Submitted to Computational Geosciences Journal
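As an illustration of the rigid-transformation idea in the abstract above, the sketch below aligns one low-dimensional embedding to a reference embedding using only a translation and an orthogonal transform (rotation or reflection), with no scaling. It is a minimal orthogonal Procrustes example written for this listing, not the authors' workflow: the function name `rigid_align` is hypothetical, and the convex hull step, out-of-sample extension, and stress ratio metric are not shown.

```python
import numpy as np


def rigid_align(reference, target):
    """Align `target` to `reference` with a translation and an orthogonal map.

    Both inputs are (n_points, n_dims) arrays with matching row order.
    Hypothetical helper for illustration only.
    """
    reference = np.asarray(reference, dtype=float)
    target = np.asarray(target, dtype=float)

    # Remove the translation by centering both point sets.
    ref_mean = reference.mean(axis=0)
    centered_ref = reference - ref_mean
    centered_tgt = target - target.mean(axis=0)

    # Orthogonal Procrustes: the orthogonal R minimizing
    # ||centered_tgt @ R - centered_ref||_F comes from the SVD of the
    # cross-covariance matrix.
    u, _, vt = np.linalg.svd(centered_tgt.T @ centered_ref)
    rotation = u @ vt

    # Apply the orthogonal map, then move back to the reference centroid.
    return centered_tgt @ rotation + ref_mean
```

Because pairwise distances are unchanged by such a transform, two MDS realizations of the same dissimilarity matrix that differ only by rotation, reflection, or translation would coincide after alignment.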

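    Data visualization methodology using spectral and divergence-based methods for interactive dimensionality reduction (original title: Metodología de visualización de datos utilizando métodos espectrales y basados en divergencias para la reducción interactiva de la dimensión)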

    Pattern recognition tasks apply methods that evolve in step with the growth of data, achieving efficient metrics in terms of optimization and computational performance applied to data exploration, selection, and representation. However, the results provided by these methods and tools can be ambiguous and/or abstract for the user, which makes them complex to apply, even more so when users have no prior knowledge of the data. Having a priori knowledge ensures, in most cases, the correct selection of the model as well as of suitable algorithms and methods. With massive data, however, where such knowledge is scarce and hard to obtain, interpretation can be arduous for users, especially non-expert users. Consequently, pattern recognition faces several problems, the most important being dimensionality reduction, interaction with large volumes of information, and the interpretation and visualization of data. These problems can be framed in terms of controllability and interaction, properties that are mostly absent from typical research in the field of dimensionality reduction. This thesis presents a new approach to data visualization based on the interactive mixture of the results of dimensionality reduction (DR) methods. The mixture is a weighted sum whose weighting factors are defined by the user through an intuitive visual interface. In addition, the low-dimensional representation space produced by DR methods is displayed graphically in scatter plots driven by a controlled, interactive data visualization. To that end, pairwise distances are computed by similarity and used to define the graph to be represented in the scatter plot.
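The weighted-sum mixture described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the function name `mix_embeddings` is hypothetical, the weights stand in for values a user would set through the interface, and the mixture is applied directly to embedding coordinates of equal shape, which may differ from what the thesis actually combines.

```python
import numpy as np


def mix_embeddings(embeddings, weights):
    """Return the normalized weighted sum of several (n_points, n_dims) embeddings.

    Hypothetical helper: `weights` plays the role of user-defined
    weighting factors, one per DR method.
    """
    weights = np.asarray(weights, dtype=float)
    if weights.ndim != 1 or len(weights) != len(embeddings):
        raise ValueError("need exactly one weight per embedding")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")

    # Normalize so the weights sum to one.
    weights = weights / weights.sum()

    # Stack to shape (n_methods, n_points, n_dims); np.stack raises if
    # the embeddings do not share a shape.
    stacked = np.stack([np.asarray(e, dtype=float) for e in embeddings])

    # Contract the method axis against the weights.
    return np.tensordot(weights, stacked, axes=1)
```

Changing one weight and recomputing the mixture is cheap, which is what makes interactive adjustment of a scatter plot plausible in this kind of scheme.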

    Large-Scale Indexing, Discovery, and Ranking for the Internet of Things (IoT)

    Network-enabled sensing and actuation devices are key enablers to connect real-world objects to the cyber world. The Internet of Things (IoT) consists of the network-enabled devices and communication technologies that allow connectivity and integration of physical objects (Things) into the digital world (Internet). Enormous amounts of dynamic IoT data are collected from Internet-connected devices. IoT data are usually multi-variant streams that are heterogeneous, sporadic, multi-modal, and spatio-temporal. IoT data can be disseminated with different granularities and have diverse structures, types, and qualities. Dealing with the data deluge from heterogeneous IoT resources and services imposes new challenges on indexing, discovery, and ranking mechanisms that will allow building applications that require on-line access and retrieval of ad-hoc IoT data. However, the existing IoT data indexing and discovery approaches are complex or centralised, which hinders their scalability. The primary objective of this article is to provide a holistic overview of the state-of-the-art on indexing, discovery, and ranking of IoT data. The article aims to pave the way for researchers to design, develop, implement, and evaluate techniques and approaches for on-line large-scale distributed IoT applications and services.

    Low dimension hierarchical subspace modelling of high dimensional data

    Building models of high-dimensional data in a low-dimensional space has become extremely popular in recent years. Motion tracking, facial animation, stock market tracking, digital libraries and many other different models have been built and tuned to specific application domains. However, when the underlying structure of the original data is unknown, the modelling of such data is still an open question. The problem is of interest as capturing and storing large amounts of high-dimensional data has become trivial, yet the capability to process, interpret, and use this data is limited. In this thesis, we introduce novel algorithms for modelling high-dimensional data with an unknown structure, which allow us to represent the data with good accuracy and in a compact manner. This work presents a novel fully automated dynamic hierarchical algorithm, together with a novel automatic data partitioning method to work alongside existing specific models (talking head, human motion). Our algorithm is applicable to hierarchical data visualisation and classification, meaningful pattern extraction and recognition, and new data sequence generation. During our work we also investigated problems related to low-dimensional data representation: automatic optimal input parameter estimation, and robustness against noise and outliers. We show the potential of our modelling with many data domains (talking head, motion, audio, etc.) and we believe that it has good potential for adapting to other domains.

    Adaptive Regression Methods with Application to Streaming Financial Data

    This thesis is concerned with the analysis of adaptive incremental regression algorithms for data streams. The development of these algorithms is motivated by issues pertaining to financial data streams, data which are very noisy, non-stationary and exhibit high degrees of dependence. These incremental regression techniques are subsequently used to develop efficient and adaptive algorithms for portfolio allocation. We develop a number of temporally incremental regression algorithms that have the following attributes: efficiency (the algorithms are iterative), robustness (the algorithms have a built-in safeguard against outliers and/or use regularisation techniques to mitigate estimation error), and adaptiveness (the algorithms' estimation adapts to the underlying streaming data). These algorithms make use of known regression techniques: EWRLS (Exponentially Weighted Recursive Least Squares), TSVD (Truncated Singular Value Decomposition) and FLS (Flexible Least Squares). We focus more of our attention on a proposed robust version of the EWRLS algorithm, denoted R-EWRLS, and assess its robustness using a purpose-built simulation engine. This simulation engine is able to generate correlated data streams whose drift and correlation change over time and can be subjected to randomly generated outliers whose magnitudes and directions vary. The R-EWRLS algorithm is developed further to allow for a self-tuned forgetting factor in the formulation. The forgetting factor is an important tool to account for non-stationarity in the data through an exponential decay profile which assigns more weight to the more recent data. The new algorithm is assessed against the R-EWRLS algorithm using various performance measures. A number of applications with real data from equities and foreign exchange are used. Various measures are computed to compare our algorithms to established portfolio allocation techniques. The results are promising and in many cases outperform benchmark allocation techniques.
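For readers unfamiliar with the forgetting factor mentioned in the abstract above, the sketch below shows the textbook exponentially weighted recursive least squares update with a fixed forgetting factor. It is a minimal illustration, not the thesis's R-EWRLS: the class name `EWRLS`, the default `forgetting=0.99`, and the initialisation constant `delta` are assumptions, and neither the outlier safeguard nor the self-tuned forgetting factor is included.

```python
import numpy as np


class EWRLS:
    """Textbook exponentially weighted recursive least squares (illustrative)."""

    def __init__(self, n_features, forgetting=0.99, delta=1e3):
        self.lam = forgetting                  # forgetting factor, 0 < lam <= 1
        self.w = np.zeros(n_features)          # current coefficient estimate
        self.P = delta * np.eye(n_features)    # inverse weighted covariance

    def update(self, x, y):
        """Process one observation (x, y) and return the a priori error."""
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)        # gain vector
        error = y - self.w @ x                 # prediction error before update
        self.w = self.w + gain * error
        # Dividing by lam inflates P, so older samples are discounted
        # geometrically: a sample k steps in the past carries weight lam**k.
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return error
```

A forgetting factor closer to 1 gives a longer effective memory; smaller values track non-stationary data faster at the cost of noisier estimates.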

    Nonlinear Manifold Learning for Data Stream

    There has been a renewed interest in understanding the structure of high dimensional data sets based on manifold learning. Examples include ISOMAP [25], LLE [20] and Laplacian Eigenmap [2] algorithms. Most of these algorithms operate in a "batch" mode and cannot be applied efficiently for a data stream. We propose an incremental version of ISOMAP. Our experiments not only demonstrate the accuracy and efficiency of the proposed algorithm, but also reveal interesting behavior of the ISOMAP as the size of available data increases.