26 research outputs found

    A Survey on Distributed Fibre Optic Sensor Data Modelling Techniques and Machine Learning Algorithms for Multiphase Fluid Flow Estimation

    Real-time monitoring of multiphase fluid flows with distributed fibre optic sensing has the potential to play a major role in industrial flow measurement applications. One such application is the optimization of hydrocarbon production to maximize short-term income and prolong the operational lifetime of production wells and the reservoir. While the measurement technology itself is well understood and developed, a key remaining challenge is the establishment of robust data analysis tools capable of converting enormous data quantities into actionable process indicators in real time. This paper provides a comprehensive technical review of data analysis techniques for distributed fibre optic technologies, with a particular focus on characterizing fluid flow in pipes. The review encompasses classical methods, such as speed-of-sound estimation and the Joule-Thomson coefficient, as well as their data-driven machine learning counterparts, such as Convolutional Neural Network (CNN), Support Vector Machine (SVM), and Ensemble Kalman Filter (EnKF) algorithms. The study aims to help end-users establish reliable, robust, and accurate solutions that can be deployed in a timely and effective way, and to pave the way for future developments in the field.
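    As an illustration of the classical speed-of-sound estimation the review mentions, a minimal sketch (not code from the paper; the function name, spacing, and signal values are illustrative) is to cross-correlate two sensing channels a known distance apart and convert the lag at the correlation peak into a propagation speed:

```python
import numpy as np

def estimate_speed_of_sound(sig_a, sig_b, sensor_spacing_m, fs_hz):
    """Estimate propagation speed from the time lag that maximises the
    cross-correlation between two channels a known distance apart."""
    n = len(sig_a)
    corr = np.correlate(sig_b, sig_a, mode="full")  # lags -(n-1) .. n-1
    lag_samples = np.argmax(corr) - (n - 1)
    if lag_samples == 0:
        raise ValueError("no measurable delay between channels")
    delay_s = lag_samples / fs_hz
    return sensor_spacing_m / delay_s

# Synthetic example: a pulse arriving 5 samples later on the second channel.
fs = 10_000.0  # 10 kHz sampling
pulse = np.exp(-0.5 * ((np.arange(200) - 100) / 5.0) ** 2)
ch_a = np.zeros(400); ch_a[50:250] = pulse
ch_b = np.zeros(400); ch_b[55:255] = pulse  # delayed by 5 samples
v = estimate_speed_of_sound(ch_a, ch_b, sensor_spacing_m=0.75, fs_hz=fs)
# delay = 5 / 10000 s = 0.5 ms, so v = 0.75 m / 0.0005 s = 1500 m/s
```

    Real deployments would refine the lag estimate (e.g., sub-sample interpolation of the correlation peak), but the delay-to-speed conversion is the core of the method.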

    Recurrent Neural Networks for Artifact Correction in HRV Data During Physical Exercise

    In this paper, we propose the use of recurrent neural networks (RNNs) for artifact correction and analysis of heart rate variability (HRV) data. HRV can be a valuable metric for assessing the function of the heart and the autonomic nervous system. When it is measured during exercise, motion artifacts present a significant challenge. Several methods for artifact correction have previously been proposed, none of them applying machine learning, and each presenting some limitations regarding an accurate representation of HRV metrics. RNNs offer the ability to capture patterns that might otherwise not be detected, yielding predictions that require no prior physiological assumptions. A hyperparameter search was carried out to determine the best network configuration and the most important hyperparameters. The approach was tested on two extensive multi-subject datasets, one from a recreational bicycle race and the other from a laboratory experiment. The results demonstrate that RNNs outperform existing methods by an order of magnitude with respect to the calculation of derived HRV metrics. However, they are not able to accurately fill in individual missing RR intervals in sequence. Future research should pursue improvements in the prediction of RR interval lengths and a reduction in the necessary training data.
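    For context on the derived HRV metrics such corrections are scored on, here is a minimal sketch of one standard short-term metric, RMSSD, and of how a single uncorrected artifact distorts it (the RR values are illustrative, not from the paper's datasets):

```python
import math

def rmssd(rr_ms):
    """Root mean square of successive differences of RR intervals (ms),
    a standard short-term HRV metric."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

clean = [800, 810, 790, 805, 795, 800]
noisy = [800, 810, 1600, 805, 795, 800]  # one missed beat -> near-doubled RR
# rmssd(noisy) comes out dozens of times larger than rmssd(clean),
# which is why artifact correction must precede HRV analysis.
```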

    Optimizing support vector machines and autoregressive integrated moving average methods for heart rate variability data correction

    Heart rate variability (HRV) is the variation in time between successive heartbeats and can be used as an indirect measure of autonomic nervous system (ANS) activity. During physical exercise, movement of the measuring device can cause artifacts in the HRV data, severely affecting its analysis. Current methods for data artifact correction perform insufficiently when HRV is measured during exercise. In this paper, we propose the use of autoregressive integrated moving average (ARIMA) and support vector regression (SVR) for HRV data artifact correction. Since both methods are trained only on previous data points, they can be applied not only for correction (i.e., gap filling) but also for prediction (i.e., forecasting future values). Our paper describes:
    • why HRV is difficult to predict and why ARIMA and SVR might be valuable options;
    • how to find the best hyperparameters for using ARIMA and SVR to correct HRV data, including which criterion to use for choosing the best model;
    • which correction method should be used given the data at hand.
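    A full ARIMA fit would normally use a library such as statsmodels; as a numpy-only sketch of the forecasting idea behind gap filling, a bare AR(p) model can be fit by least squares on lagged values and iterated forward. This is a stand-in for, not the paper's, implementation, and the toy series below is illustrative:

```python
import numpy as np

def ar_forecast(history, p, steps):
    """Iterated one-step forecasts from an AR(p) model whose coefficients are
    fit by ordinary least squares on lagged samples -- a bare-bones stand-in
    for a full ARIMA fit."""
    y = np.asarray(history, dtype=float)
    # Design matrix: row t holds the p values preceding y[t].
    X = np.column_stack([y[p - k - 1 : -k - 1] for k in range(p)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    out = list(y)
    for _ in range(steps):
        lags = np.array(out[-1 : -p - 1 : -1])  # most recent p values, newest first
        out.append(float(coef @ lags))
    return out[len(out) - steps :]

# Toy series that follows y[t] = y[t-1] - 0.5*y[t-2] exactly.
hist = [1.0, 1.0]
for _ in range(18):
    hist.append(hist[-1] - 0.5 * hist[-2])
gap_fill = ar_forecast(hist, p=2, steps=3)
```

    Because the model only looks backwards, the same routine serves both gap filling and forecasting, which is the property the paper exploits.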

    Extended approach to sum of absolute differences method for improved identification of periods in biomedical time series

    Time series are a common data type in biomedical applications; examples include heart rate, power output, and ECG. A typical analysis task is to determine the longest period a subject spent above a given heart rate threshold. While finding and measuring such periods might seem simple, biomedical data are often subject to significant noise and physiological artifacts, so simple threshold calculations might not provide correct or expected results. A common way to improve such calculations is to apply a moving average filter, with the window length often determined using the sum of absolute differences for various window sizes. However, for real-life biomedical data this approach can lead to extremely long windows that undesirably remove physiological information from the data. In this paper, we:
    • propose a new approach to finding the window length using zero-points of the third gradient (jerk) of the sum of absolute differences;
    • demonstrate how these points can be used to determine periods and the area over a given threshold, with and without uncertainty.
    We demonstrate the validity of this approach on the PAMAP2 Physical Activity Monitoring Data Set, an open dataset from the UCI Machine Learning Repository, as well as on the PhysioNet Simultaneous Physiological Measurements dataset. The first zero-point usually falls at a window length of around 8 and 5 seconds, respectively, while the second zero-point usually falls between 16–24 and 8–16 s, respectively. The first zero-point can remove simple measurement errors when data are recorded once every few seconds; the second corresponds well with what is known about the physiological response of the heart to changing load.
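    The window-selection idea can be sketched as follows. This is a hedged reconstruction from the abstract alone (signal, window range, and zero-crossing detection are illustrative choices, not the paper's exact procedure): compute the SAD between the signal and its moving average for each candidate window, then look for sign changes in the third difference of that curve.

```python
import numpy as np

def sad_curve(signal, window_sizes):
    """Sum of absolute differences between a signal and its moving average,
    evaluated over a range of window sizes."""
    signal = np.asarray(signal, dtype=float)
    sad = []
    for w in window_sizes:
        smoothed = np.convolve(signal, np.ones(w) / w, mode="same")
        sad.append(np.abs(signal - smoothed).sum())
    return np.array(sad)

def jerk_zero_points(sad):
    """Window-size indices where the third difference ("jerk") of the SAD
    curve changes sign -- candidate window lengths in the spirit of the
    paper's criterion."""
    jerk = np.diff(sad, n=3)
    return np.where(np.diff(np.sign(jerk)) != 0)[0]

# Illustrative noisy signal (not one of the paper's datasets).
rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0.0, 20.0 * np.pi, 2000)) + 0.3 * rng.standard_normal(2000)
windows = range(2, 40)
sad = sad_curve(sig, windows)
candidates = jerk_zero_points(sad)
```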

    Reference Dataset for Rate of Penetration Benchmarking

    In recent years, multiple papers have been published on rate of penetration prediction using machine learning, vastly outperforming analytical methods, with proposed models reportedly achieving R2 values as high as 0.996. Unfortunately, it is most often impossible to independently verify these claims, as the input data is rarely accessible to others. To solve this problem, this paper presents a database derived from Equinor's public Volve dataset that will serve as a benchmark for rate of penetration prediction methods. By providing a partially processed dataset with unambiguous testing scenarios, scientists can perform machine learning research on a level playing field. This in turn will both discourage publication of methods tested in a substandard manner and promote exploration of truly superior solutions. A set of seven wells with nearly 200,000 samples and twelve common attributes is proposed, together with reference results from common machine learning algorithms. Data and relevant source code are published on the pages of the University of Stavanger and GitHub.
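    For reference, the figure of merit such claims rest on is the coefficient of determination; a minimal sketch (synthetic numbers, not from the benchmark itself):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual sum of squares over
    total sum of squares around the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

    On a benchmark like this, R2 should be computed on held-out wells, not on rows randomly sampled from wells also seen in training; strong within-well correlation otherwise inflates scores, which is one way substandard testing produces near-perfect numbers.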

    Data-intensive systems: principles and fundamentals using Hadoop and Spark


    Methods for preprocessing time and distance series data from personal monitoring devices

    There is a need for more advanced tools to improve guidance on physical exercise, reducing the risk of adverse events and improving the benefits of exercise. Vast amounts of data that may be used to guide physical exercise are generated continuously by Personal Monitoring Devices (PMDs) from sports events, biomedical experiments, and fitness self-monitoring. Most of these data are sampled as time- or distance-series. However, the inherent high dimensionality of exercise data is a challenge during processing; as a result, current data analysis from PMDs seldom extends beyond aggregates. Common challenges are:
    • alterations in data density when comparing the time and the distance domain;
    • large intra- and inter-individual variations in the relationship between numerical data and physiological properties;
    • alterations in the temporal statistical properties of data derived from exercise of different durations.
    These challenges are currently unresolved, leading to suboptimal analytic models. In this paper, we present algorithms and approaches to address these problems, allowing the analysis of complete PMD datasets rather than having to rely on cumulative statistics. Our suggested approaches permit effective application of established Symbolic Aggregate Approximation modeling and newer deep learning models, such as LSTM.
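    The first challenge above, differing data density between domains, is typically handled by resampling onto a uniform grid before modelling. A minimal sketch (linear interpolation via numpy; the sample values are illustrative and the paper's actual algorithms may differ):

```python
import numpy as np

def resample_to_uniform_time(t, values, fs_hz):
    """Resample irregularly timed PMD samples onto a uniform time grid
    by linear interpolation."""
    t = np.asarray(t, dtype=float)
    grid = np.arange(t[0], t[-1], 1.0 / fs_hz)
    return grid, np.interp(grid, t, np.asarray(values, dtype=float))

# Samples recorded at 0 s, 1 s, and 3 s, resampled to 1 Hz.
grid, vals = resample_to_uniform_time([0.0, 1.0, 3.0], [0.0, 10.0, 30.0], fs_hz=1.0)
```

    The same routine applied with cumulative distance in place of `t` converts a time-series into a distance-series, which makes the two domains directly comparable.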

    Artifact Correction in Short-Term HRV during Strenuous Physical Exercise

    Heart rate variability (HRV) analysis can be a useful tool to detect underlying heart or even general health problems. Currently, such analysis is usually performed in controlled or semi-controlled conditions. Since many of the typical HRV measures are sensitive to data quality, manual artifact correction is common in the literature, both as an exclusive method and in addition to various filters. With the proliferation of Personal Monitoring Devices (PMDs) with continuous HRV analysis, an opportunity opens for HRV analysis in a new setting. However, current artifact-correction approaches have several limitations that hamper the analysis of real-life HRV data. To address this issue, we propose an algorithm for automated artifact correction that has a minimal impact on HRV measures but can handle more artifacts than existing solutions. We verify this algorithm on two datasets: one collected during a recreational bicycle race and another in a laboratory, both using a PMD in the form of a GPS watch. The data include direct measurement of electrical myocardial signals using chest straps and, in the case of the race dataset, direct measurements of power using a crank sensor, both paired with the watch. Early results suggest that the algorithm can correct more artifacts than existing solutions without the need for manual support or parameter tuning. At the same time, the error introduced to HRV measures for peak correction and shorter gaps is similar to the best existing solution (Kubios-inspired threshold-based cubic interpolation) and better than the commonly used median filter. For longer gaps, cubic interpolation can in some cases yield a lower error in HRV measures, but the shape of the curve it generates matches the ground truth worse than our algorithm's, which suggests that further development of the proposed algorithm may also improve these results.
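    The threshold-based baseline the paper compares against can be sketched in simplified form. This is not the proposed algorithm, and it uses a plain median threshold and linear (rather than cubic) interpolation to stay numpy-only; names and values are illustrative:

```python
import numpy as np

def detect_artifacts(rr_ms, threshold=0.25):
    """Flag RR intervals deviating from the series median by more than
    `threshold` (relative) -- a simplified threshold-based detector."""
    rr = np.asarray(rr_ms, dtype=float)
    med = np.median(rr)
    return np.abs(rr - med) / med > threshold

def correct_artifacts(rr_ms, bad):
    """Replace flagged intervals by interpolating over their neighbours."""
    rr = np.asarray(rr_ms, dtype=float)
    idx = np.arange(len(rr))
    rr[bad] = np.interp(idx[bad], idx[~bad], rr[~bad])
    return rr

# A missed beat typically shows up as a near-doubled RR interval.
rr = [800.0, 805.0, 1600.0, 795.0, 800.0]
bad = detect_artifacts(rr)
fixed = correct_artifacts(rr, bad)
```

    Interpolation works well for isolated artifacts and short gaps; the longer-gap regime is where curve shape starts to matter, as discussed above.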

    Impact of data pre-processing techniques on recurrent neural network performance in context of real-time drilling logs in an automated prediction framework

    Recurrent neural networks (RNNs), which are able to capture the temporal nature of a signal, are becoming more common in machine learning applied to petroleum engineering, particularly drilling. With this technology come requirements and caveats related to the input data that play a significant role in the resultant models. This paper explores how data pre-processing and attribute-selection techniques affect the performance of RNN models. Re-sampling and down-sampling methods are compared; imputation strategies, a problem generally omitted in published research, are explored, and a method to select either last observation carried forward or linear interpolation is introduced and evaluated in terms of model accuracy. Case studies are performed on real-time drilling logs from the open Volve dataset published by Equinor. For a realistic evaluation, a semi-automated process is proposed for data preparation, model training, and evaluation, employing a continuous learning approach to model updating in which the training dataset is built up continuously while the well is being drilled. This allows for accurate benchmarking of data pre-processing methods. Also included is a previously developed and updated branched custom neural network architecture that combines recurrent elements with row-wise regression elements. Source code for the implementation is published on GitHub.
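    The two imputation strategies named above can be sketched together with one plausible selection rule: hide a few known samples, impute them with both strategies, and keep whichever reconstructs them better. The selection rule is a hypothetical illustration, not necessarily the paper's method, and the series below are synthetic:

```python
import numpy as np

def locf(values, missing):
    """Last observation carried forward (first sample assumed observed)."""
    out = np.asarray(values, dtype=float).copy()
    for i in range(1, len(out)):
        if missing[i]:
            out[i] = out[i - 1]
    return out

def linear_fill(values, missing):
    """Linear interpolation between the nearest observed neighbours."""
    out = np.asarray(values, dtype=float).copy()
    idx = np.arange(len(out))
    out[missing] = np.interp(idx[missing], idx[~missing], out[~missing])
    return out

def pick_imputer(values, holdout):
    """Hypothetical selection rule: hide known samples, impute with both
    strategies, keep whichever reconstructs them with lower mean error."""
    truth = np.asarray(values, dtype=float)
    errs = {}
    for name, fn in (("locf", locf), ("linear", linear_fill)):
        errs[name] = np.abs(fn(truth, holdout)[holdout] - truth[holdout]).mean()
    return min(errs, key=errs.get)
```

    Intuitively, smoothly varying channels favour linear interpolation, while step-and-hold channels (e.g., a setpoint logged only on change) favour LOCF, which is why a per-attribute choice matters.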