A Survey on Distributed Fibre Optic Sensor Data Modelling Techniques and Machine Learning Algorithms for Multiphase Fluid Flow Estimation
Real-time monitoring of multiphase fluid flows with distributed fibre optic sensing has the potential to play a major role in industrial flow measurement applications. One such application is the optimization of hydrocarbon production to maximize short-term income, and prolong the operational lifetime of production wells and the reservoir. While the measurement technology itself is well understood and developed, a key remaining challenge is the establishment of robust data analysis tools that are capable of providing real-time conversion of enormous data quantities into actionable process indicators. This paper provides a comprehensive technical review of the data analysis techniques for distributed fibre optic technologies, with a particular focus on characterizing fluid flow in pipes. The review encompasses classical methods, such as the speed of sound estimation and Joule-Thomson coefficient, as well as their data-driven machine learning counterparts, such as Convolutional Neural Network (CNN), Support Vector Machine (SVM), and Ensemble Kalman Filter (EnKF) algorithms. The study aims to help end-users establish reliable, robust, and accurate solutions that can be deployed in a timely and effective way, and pave the way for future developments in the field.
Automatic analysis of X (Twitter) data for supporting depression diagnosis
Depression is an increasingly common problem that often goes undiagnosed. The aim of this paper was to determine whether an analysis of tweets can serve as a proxy for assessing depression levels in society. The work considered keyword-based sentiment analysis, which was enhanced to exclude informational tweets about depression or about recovery. The results demonstrated the words used in the posts most often and the emotional polarity of the tweets. A schedule of user activity was mapped out and trends related to daily activity of users were analyzed. It was observed that the identified X (Twitter) activity related to depression corresponded well with reports on persons with depression and statistics related to suicidal deaths. Therefore, it could be construed that people with undiagnosed depression express their feelings in social media more often, looking, in this way, for help with their emotional problems.
Recurrent Neural Networks for Artifact Correction in HRV Data During Physical Exercise
In this paper, we propose the use of recurrent neural networks (RNNs) for artifact correction and analysis of heart rate variability (HRV) data. HRV can be a valuable metric for determining the function of the heart and the autonomic nervous system. When measured during exercise, motion artifacts present a significant challenge. Several methods for artifact correction have previously been proposed, none of them applying machine learning, and each presenting some limitations regarding an accurate representation of HRV metrics. RNNs offer the ability to capture patterns that might otherwise not be detected, yielding predictions where no prior physiological assumptions are needed.
A hyperparameter search has been carried out to determine the best network configuration and the most important hyperparameters. The approach was tested on two extensive multi-subject data sets, one from a recreational bicycle race and the other from a laboratory experiment. The results demonstrate that RNNs outperform existing methods by an order of magnitude with respect to the calculation of derived HRV metrics. However, they are not able to accurately fill in individual missing RR intervals in sequence. Future research should pursue improvements in the prediction of RR interval lengths and reduction in necessary training data.
Optimizing support vector machines and autoregressive integrated moving average methods for heart rate variability data correction
Heart rate variability (HRV) is the variation in time between successive heartbeats and can be used as an indirect measure of autonomic nervous system (ANS) activity. During physical exercise, movement of the measuring device can cause artifacts in the HRV data, severely affecting the analysis of the HRV data. Current methods used for data artifact correction perform insufficiently when HRV is measured during exercise. In this paper we propose the use of autoregressive integrated moving average (ARIMA) and support vector regression (SVR) for HRV data artifact correction. Since both methods are only trained on previous data points, they can be applied not only for correction (i.e., gap filling), but also prediction (i.e., forecasting future values). Our paper describes:
• why HRV is difficult to predict and why ARIMA and SVR might be valuable options.
• finding the best hyperparameters for using ARIMA and SVR to correct HRV data, including which criterion to use for choosing the best model.
• which correction method should be used given the data at hand.
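The idea that a model trained only on previous data points can serve both gap filling and forecasting can be illustrated with a minimal sketch. It is not the paper's method: instead of a full ARIMA or SVR model it uses a first-order autoregressive model (the simplest special case, ARIMA(1,0,0)) fitted by least squares. The function names `fit_ar1` and `fill_gap` and the sample values are hypothetical.

```python
def fit_ar1(history):
    """Least-squares fit of x[t] = c + phi * x[t-1], using past samples only."""
    x_prev, x_next = history[:-1], history[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    phi = cov / var if var > 0 else 0.0  # constant history: fall back to the mean
    c = mean_next - phi * mean_prev
    return c, phi


def fill_gap(history, n_missing):
    """Predict n_missing values that follow `history` (assumed artifact-free)."""
    c, phi = fit_ar1(history)
    filled, last = [], history[-1]
    for _ in range(n_missing):
        last = c + phi * last  # each prediction feeds the next one
        filled.append(last)
    return filled


# Hypothetical RR intervals in milliseconds, followed by a gap of two beats.
rr_before_gap = [812.0, 805.0, 798.0, 802.0, 795.0, 790.0]
print(fill_gap(rr_before_gap, 2))
```

Because the model only looks backwards, the same call fills a gap in the middle of a recording or forecasts beyond its end; the difference lies only in whether later measurements exist to compare against.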
The relationship between workload and exercise-induced cardiac troponin elevations is influenced by non-obstructive coronary atherosclerosis
The relationship between exercise-induced troponin elevation and non-obstructive coronary artery disease (CAD) is unclear. This observational study assessed non-obstructive CAD's impact on exercise-induced cardiac Troponin I (cTnI) elevation in middle-aged recreational athletes. cTnI levels of 40 well-trained recreational athletes (73% males, 50 ± 9 years old) were assessed by a high-sensitive cTnI assay 24 h before, and at 3 and 24 h following two high-intensity exercises of different durations; a cardiopulmonary exercise test (CPET), and a 91-km mountain bike race. Workload was measured with power meters. Coronary computed tomography angiography was used to determine the presence or absence of non-obstructive (<50% obstruction) CAD. A total of 15 individuals had non-obstructive CAD (Atherosclerotic group), whereas 25 had no atherosclerosis (normal). There were higher post-exercise cTnI levels following the race compared with CPET, both at 3 h (77.0 (35.3–112.4) ng/L vs. 11.6 (6.4–22.5) ng/L, p < 0.001) and at 24 h (14.7 (6.7–16.3) vs. 5.0 (2.6–8.9) ng/L, p < 0.001). Absolute cTnI values did not differ among groups. Still, the association of cTnI response to power output was significantly stronger in the CAD versus Normal group both at 3 h post-exercise (Rho = 0.80, p < 0.001 vs. Rho = −0.20, p = 0.33) and 24-h post-exercise (Rho = 0.87, p < 0.001 vs. Rho = −0.13, p = 0.55). Exercise-induced cTnI elevation was strongly correlated with exercise workload in middle-aged athletes with non-obstructive CAD but not in individuals without CAD. This finding suggests that CAD influences the relationship between exercise workload and the cTnI response even without coronary artery obstruction.
Extended approach to sum of absolute differences method for improved identification of periods in biomedical time series
Time series are a common data type in biomedical applications. Examples include heart rate, power output, and ECG. One of the typical analysis methods is to determine the longest period a subject spent over a given heart rate threshold. While it might seem simple to find and measure such periods, biomedical data are often subject to significant noise and physiological artifacts. As a result, simple threshold calculations might not provide correct or expected results. A common way to improve such calculations is to use a moving average filter. The length of the window is often determined using the sum of absolute differences for various window sizes. However, for real-life biomedical data such an approach might lead to extremely long windows that undesirably remove physiological information from the data. In this paper, we:
• propose a new approach to finding the window length using zero-points of the third gradient (jerk) of the Sum of Absolute Differences method;
• demonstrate how these points can be used to determine periods and area over a given threshold with and without uncertainty.
We demonstrate the validity of this approach on the PAMAP2 Physical Activity Monitoring Data Set, an open dataset from the UCI Machine Learning Repository, as well as on the PhysioNet Simultaneous Physiological Measurements dataset. The results show that the first zero-point usually falls at a window length of around 8 and 5 seconds respectively, while the second zero-point usually falls between 16 and 24 s and between 8 and 16 s respectively. The value for the first zero-point can remove simple measurement errors when data are recorded once every few seconds. The value for the second zero-point corresponds well with what is known about the physiological response of the heart to changing load.
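The window-selection idea can be sketched roughly as follows. The abstract does not give the exact definitions, so this sketch assumes that the sum of absolute differences (SAD) for a window is the total absolute deviation between the raw signal and its trailing moving average, and that the jerk is the third finite difference of the SAD curve over window sizes. All function names are hypothetical, and the mapping from a sign change to a window size is approximate.

```python
def moving_average(x, w):
    """Trailing moving average; the first samples use a shorter window."""
    out = []
    for i in range(len(x)):
        chunk = x[max(0, i - w + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def sad(x, w):
    """Assumed SAD: total absolute deviation of the raw signal from its smoothed version."""
    return sum(abs(a - b) for a, b in zip(x, moving_average(x, w)))


def diff(v):
    """First finite difference of a sequence."""
    return [b - a for a, b in zip(v, v[1:])]


def jerk_zero_points(x, windows):
    """Window sizes near which the third difference of the SAD curve changes sign."""
    curve = [sad(x, w) for w in windows]
    jerk = diff(diff(diff(curve)))
    zeros = []
    for k in range(len(jerk) - 1):
        if jerk[k] * jerk[k + 1] < 0:      # strict sign change between neighbours
            zeros.append(windows[k + 2])   # approximate centre of the stencil
    return zeros


# Hypothetical heart-rate samples (beats per minute), one per second.
heart_rate = [92, 95, 91, 97, 110, 108, 112, 109, 115, 111,
              118, 114, 120, 117, 119, 116, 113, 115, 110, 112]
print(jerk_zero_points(heart_rate, list(range(1, 13))))
```

A candidate window returned by `jerk_zero_points` would then be passed to `moving_average` before thresholding, instead of the much longer window that minimising or saturating the SAD curve alone might suggest.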
Reference Dataset for Rate of Penetration Benchmarking
In recent years, multiple papers have been published on rate of penetration prediction using machine learning, reporting results that vastly outperform analytical methods. Some proposed models reportedly achieve R2 values as high as 0.996. Unfortunately, it is most often impossible to independently verify these claims as the input data is rarely accessible to others. To solve this problem, this paper presents a database derived from Equinor's public Volve dataset that will serve as a benchmark for rate of penetration prediction methods. By providing a partially processed dataset with unambiguous testing scenarios, scientists can perform machine learning research on a level playing field. This in turn will both discourage publication of methods tested in a substandard manner as well as promote exploration of truly superior solutions. A set of seven wells with nearly 200,000 samples and twelve common attributes is proposed together with reference results from common machine learning algorithms. Data and relevant source code are published on the pages of University of Stavanger and GitHub.
Methods for preprocessing time and distance series data from personal monitoring devices
There is a need to develop more advanced tools to improve guidance on physical exercise to reduce risk of adverse events and improve benefits of exercise. Vast amounts of data are generated continuously by Personal Monitoring Devices (PMDs) from sports events, biomedical experiments, and fitness self-monitoring that may be used to guide physical exercise. Most of these data are sampled as time- or distance-series. However, the inherent high-dimensionality of exercise data is a challenge during processing. As a result, current data analysis from PMDs seldom extends beyond aggregates.
Common challenges are:
• alterations in data density comparing the time- and the distance domain;
• large intra- and interindividual variations in the relationship between numerical data and physiological properties;
• alterations in temporal statistical properties of data derived from exercise of different durations.
These challenges are currently unresolved, leading to suboptimal analytic models. In this paper, we present algorithms and approaches to address these problems, allowing the analysis of complete PMD datasets, rather than having to rely on cumulative statistics. Our suggested approaches permit effective application of established Symbolic Aggregate Approximation modeling and newer deep learning models, such as LSTM.
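For readers unfamiliar with Symbolic Aggregate Approximation (SAX), the sketch below shows the standard three steps: z-normalisation, piecewise aggregate approximation (PAA), and discretisation against equiprobable Gaussian breakpoints. It illustrates the general technique only and does not reproduce the preprocessing proposed in the paper. The function name `sax`, the alphabet, and the sample data are assumptions; the series is assumed to hold at least `n_segments` samples.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of length n_segments."""
    # 1. z-normalise so that Gaussian breakpoints are meaningful
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in series]

    # 2. PAA: average the series over n_segments near-equal chunks
    n = len(z)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]

    # 3. Breakpoints that split the standard normal into equiprobable bins
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # The symbol index is the number of breakpoints below the segment mean
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


# Hypothetical power-output samples (watts) from a short interval session.
power = [150, 155, 160, 240, 250, 245, 160, 150, 148, 255, 260, 250]
print(sax(power, n_segments=4))
```

Reducing a long recording to a short word of this kind makes sessions of different length or sampling density comparable, which is one reason the resampling and normalisation issues listed above matter before SAX is applied.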