The Swiss army knife of time series data mining: ten useful things you can do with the matrix profile and ten lines of code
Relative Geologic Time By Dynamic Time Warping
This thesis addresses a core problem in seismic interpretation: producing an autonomously generated interpretation of seismic
data, known as a Relative Geologic Time (RGT) volume. The proposed method relies on Dynamic Time Warping, an established method
in signal processing. Dynamic Time Warping is thought to replicate the correlations an interpreter would make when interpreting
the subsurface. Applying Dynamic Time Warping to seismic data yields a fully
autonomous interpretation of the subsurface, completed in seconds to minutes. The
method is simple and can easily be extended. The workflow established during the thesis work successfully produces
an RGT volume. However, problems with the method must be addressed to
improve the outcome and reduce errors in the result.
Potential solutions to these problems are described in
detail in the discussion and appendix. Previous attempts
at solving for Relative Geologic Time volumes are also discussed. The research conducted on
Dynamic Time Warping is promising and shows potential for further research. LaTeX
setup by Gunn and Patel (2017)
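As a point of reference for the Dynamic Time Warping step this abstract relies on, the following is a minimal sketch of the textbook DTW recurrence with an absolute-difference cost. It is not the thesis workflow: the function name `dtw_distance` and the two toy traces are assumptions made purely for illustration.

```python
def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping distance.

    Illustrative only: uses |x - y| as the local cost and no window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical traces: the second has the same shape, delayed by one sample.
trace_a = [0.0, 1.0, 2.0, 1.0, 0.0]
trace_b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
print(dtw_distance(trace_a, trace_b))
```

Because the warping path can repeat the leading sample, the delayed trace aligns with the original at zero cost, which is the property that makes DTW attractive for correlating neighbouring seismic traces.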
Volcan de Fuego: A Machine Learning Approach in Understanding the Eruptive Cycles Using Precursory Tilt Signals
Volcan de Fuego is an active stratovolcano located in the Central Guatemalan segment of the 1100 km long Central America Volcanic Arc System (CAVAS). The Fuego-Acatenango massif consists of at least four major vents, of which the Fuego summit vent is the most active and the youngest member. The volcano exhibits primarily Strombolian and Vulcanian behavior along with occasional paroxysms and pyroclastic flows. Historically, Fuego has produced basaltic-andesitic rocks, with more recent eruptions progressively trending towards more mafic compositions. Several studies have used short-term deployments of broadband seismometers, infrasound, and long-term remote sensing techniques to characterize the mechanism of Fuego. In our study, we analyze the tilt derived from transient broadband seismometers and a tiltmeter stationed over several days during 2009, 2012, and 2015 near the summit crater using unsupervised learning.
Unsupervised learning has the potential to play a significant role in monitoring volcanoes dominated by large, unlabeled datasets. In our study, we make use of dynamic time warping distance measure along with unsupervised classification methods to identify precursory tilt signals. The unsupervised classification revealed two types of tilt signals with opposite polarity, one of which confirms features identified in previous studies while the other signal has been previously unknown. Template matching implemented with the known signal identified 268 events between October 1, 2015, and January 13, 2016, the duration of which varied between 7 and 39 minutes. The temporal distribution of these events as well as the maximum amplitude of inflation showed clustering activity accompanied by intra-cluster waxing and waning. We created subsets of temporal clusters and calculated repose times between successive events. Auto-correlation functions were calculated for each subset and probability density functions were fitted which support survival/failure processes. The long-term tilt records provided a useful tool to characterize the activity and revealed a near-continuous cyclicity
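The repose-time and auto-correlation steps mentioned in this abstract can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes repose time is measured from one event onset to the next, and the onset values (in minutes) and the function names `repose_times` and `autocorrelation` are hypothetical.

```python
def repose_times(onsets):
    """Time between successive event onsets (assumed definition of repose time)."""
    ordered = sorted(onsets)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at a given non-negative lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) * (v - mean) for v in values)
    if var == 0 or lag >= n:
        return 0.0  # undefined cases are reported as zero in this sketch
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / var


# Hypothetical onset times in minutes, not data from the study.
onsets = [0, 30, 50, 110, 135, 160]
gaps = repose_times(onsets)
print(gaps)
print(autocorrelation(gaps, 1))
```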
ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees
Existing systems dealing with the increasing volume of data series cannot
guarantee interactive response times, even for fundamental tasks such as
similarity search. Therefore, it is necessary to develop analytic approaches
that support exploration and decision making by providing progressive results,
before the final and exact ones have been computed. Prior works lack both
efficiency and accuracy when applied to large-scale data series collections. We
present and experimentally evaluate ProS, a new probabilistic learning-based
method that provides quality guarantees for progressive Nearest Neighbor (NN)
query answering. We develop our method for k-NN queries and demonstrate how it
can be applied with the two most popular distance measures, namely, Euclidean
and Dynamic Time Warping (DTW). We provide both initial and progressive
estimates of the final answer that improve during the similarity
search, as well as suitable stopping criteria for the progressive queries.
Moreover, we describe how this method can be used in order to develop a
progressive algorithm for data series classification (based on a k-NN
classifier), and we additionally propose a method designed specifically for the
classification task. Experiments with several and diverse synthetic and real
datasets demonstrate that our prediction methods constitute the first practical
solutions to the problem, significantly outperforming competing approaches.
This paper was published in the VLDB Journal (2022)
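To make the notion of a progressive answer concrete, the sketch below scans a collection and reports the best-so-far nearest neighbor each time it improves. It illustrates only that idea with Euclidean distance; it does not implement the probabilistic quality guarantees or stopping criteria of ProS, and the names `euclidean` and `progressive_nn` and the toy data are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def progressive_nn(query, collection):
    """Yield (items_scanned, best_index, best_distance) whenever the answer improves."""
    best_index, best_distance = None, float("inf")
    for index, series in enumerate(collection):
        distance = euclidean(query, series)
        if distance < best_distance:
            best_index, best_distance = index, distance
            # An interactive system could display this intermediate answer immediately.
            yield index + 1, best_index, best_distance


# Hypothetical two-point series.
collection = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [1.0, 0.0]]
for update in progressive_nn([1.0, 0.0], collection):
    print(update)
```

The last tuple yielded is the exact 1-NN answer; a method such as the one described above would additionally estimate, at each intermediate step, how likely the current answer is to be final.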
Classification and repeatability studies of transient electromagnetic measurements with respect to the development of CO2-monitoring techniques
The mitigation of greenhouse gases, such as CO2, is a challenging task for our society. One strategy to curb the constant emission of CO2 is carbon capture and storage, in which CO2 is sequestrated in subsurface reservoirs. However, these reservoirs harbor the risk of leakage, and appropriate geophysical monitoring methods are needed. A crucial aspect of monitoring is the assignment of measured data to specific events. Especially if changes in the measured data are small, suitable statistical methods are needed. In this thesis, a new statistical workflow based on cluster analysis is proposed to detect similar transient electromagnetic signals. The similarity criteria dynamic time warping, the autoregressive distance, and the normalized root-mean-square distance are investigated and evaluated with respect to the classic Euclidean norm. The optimal number of clusters is determined using the gap statistic and visualized with multidimensional scaling. To validate the clustering results, silhouette values are used. The statistical workflow is applied to a synthetic data set, a long-term monitoring data set and a repeat measurement at a pilot CO2-sequestration site in Brooks, Alberta
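The silhouette validation mentioned in this abstract can be sketched from a precomputed distance matrix, which is convenient when the distance is something other than Euclidean (for example dynamic time warping). This is a minimal illustration of the standard silhouette definition, not the thesis workflow; the function name `silhouette_values` and the toy one-dimensional data are assumptions.

```python
def silhouette_values(dist, labels):
    """Silhouette value per item from a symmetric distance matrix.

    Assumes at least two clusters; singleton clusters get a value of 0.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical data: two well-separated groups on a line.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
print(silhouette_values(dist, [0, 0, 1, 1]))
```

Values close to 1 indicate that an item sits well inside its cluster, values near 0 indicate a borderline assignment, and negative values suggest the item may belong to a neighbouring cluster.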
Statistical and deep learning methods for geoscience problems
Machine learning is the new frontier for technology development in geosciences and has developed extremely fast in the past decade. With the increased compute power provided by distributed computing and Graphics Processing Units (GPUs) and their exploitation provided by machine learning (ML) frameworks such as Keras, PyTorch, and TensorFlow, ML algorithms can now solve complex scientific problems. Although powerful, ML algorithms need to be applied to suitable problems conditioned for optimal results. For this reason, ML algorithms require not only a deep understanding of the problem but also of the algorithm’s ability. In this dissertation, I show that simple statistical techniques can often outperform ML-based models if applied correctly.
In this dissertation, I show the success of deep learning in addressing two difficult problems. In the first application I use deep learning to auto-detect the leaks in a carbon capture project using pressure field data acquired from the DOE Cranfield site in Mississippi. I use the history of pressure, rates, and cumulative injection volumes to detect leaks as pressure anomaly. I use a different deep learning workflow to forecast high-energy electrons in Earth’s outer radiation belt using in situ measurements of different space weather parameters such as solar wind density and pressure. I focus on predicting electron fluxes of 2 MeV and higher energy and introduce the ensemble of deep learning models to further improve the results as compared to using a single deep learning architecture.
I also show an example where a carefully constructed statistical approach, guided by the human interpreter, outperforms deep learning algorithms implemented by others. Here, the goal is to correlate multiple well logs across a survey area in order not only to map the thickness, but also to characterize the behavior of stacked gamma ray parasequence sets. Tools including maximum likelihood estimation (MLE) and dynamic time warping (DTW) provide a means of generating quantitative maps of upward fining and upward coarsening across the oil field. The ultimate goal is to link such extensive well control with the spectral attribute signature of 3D seismic data volumes to provide detailed maps of not only the depositional history, but also insight into lateral and vertical variation of mineralogy important to the effective completion of shale resource plays
Adversarial Attacks on Time Series
Time series classification models have been garnering significant importance
in the research community. However, not much research has been done on
generating adversarial samples for these models. These adversarial samples can
become a security concern. In this paper, we propose utilizing an adversarial
transformation network (ATN) on a distilled model to attack various time series
classification models. The proposed attack on the classification model utilizes
a distilled model as a surrogate that mimics the behavior of the attacked
classical time series classification models. Our proposed methodology is
applied onto 1-Nearest Neighbor Dynamic Time Warping (1-NN DTW), a Fully
Connected Network and a Fully Convolutional Network (FCN), all of which are
trained on 42 University of California Riverside (UCR) datasets. In this paper,
we show both models were susceptible to attacks on all 42 datasets. To the best
of our knowledge, such an attack on time series classification models has never
been done before. Finally, we recommend that future researchers who develop time
series classification models incorporate adversarial data samples into
their training data sets to improve resilience to adversarial samples and
consider model robustness as an evaluative metric.
Comment: 13 pages, 7 figures, 6 tables
Deep Time-Series Clustering: A Review
We present a comprehensive, detailed review of time-series data analysis, with emphasis on deep time-series clustering (DTSC), and a case study in the context of movement behavior clustering utilizing the deep clustering method. Specifically, we modified the DCAE architectures to suit time-series data at the time of our prior deep clustering work. Lately, several works have been carried out on deep clustering of time-series data. We also review these works and identify state-of-the-art, as well as present an outlook on this important field of DTSC from five important perspectives
Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression
Time Series Extrinsic Regression (TSER) involves using a set of training time
series to form a predictive model of a continuous response variable that is not
directly related to the regressor series. The TSER archive for comparing
algorithms was released in 2022 with 19 problems. We increase the size of this
archive to 63 problems and reproduce the previous comparison of baseline
algorithms. We then extend the comparison to include a wider range of standard
regressors and the latest versions of TSER models used in the previous study.
We show that none of the previously evaluated regressors can outperform a
regression adaptation of a standard classifier, rotation forest. We introduce
two new TSER algorithms developed from related work in time series
classification. FreshPRINCE is a pipeline estimator consisting of a transform
into a wide range of summary features followed by a rotation forest regressor.
DrCIF is a tree ensemble that creates features from summary statistics over
random intervals. Our study demonstrates that both algorithms, along with
InceptionTime, exhibit significantly better performance compared to the other
18 regressors tested. More importantly, these two proposals (DrCIF and
FreshPRINCE) are the only models that significantly outperform the
standard rotation forest regressor.
Comment: 19 pages, 21 figures, 6 tables. Appendix include
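The idea of building features from summary statistics over random intervals, which this abstract attributes to DrCIF, can be sketched as below. This is only an illustration of the interval-feature idea under simplifying assumptions: the choice of statistics (mean, standard deviation, minimum, maximum), the function names, and the parameters `n_intervals`, `min_len` and `seed` are hypothetical, and no tree ensemble is trained.

```python
import random
import statistics


def random_intervals(length, n_intervals, min_len=3, seed=0):
    """Draw random (start, end) index pairs with end - start >= min_len."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        start = rng.randint(0, length - min_len)    # inclusive bounds
        end = rng.randint(start + min_len, length)  # end index is exclusive when slicing
        intervals.append((start, end))
    return intervals


def interval_features(series, intervals):
    """Concatenate simple summary statistics computed on each interval."""
    features = []
    for start, end in intervals:
        window = series[start:end]
        features.extend([
            statistics.mean(window),
            statistics.pstdev(window),
            min(window),
            max(window),
        ])
    return features


# Hypothetical series; each interval contributes four features.
series = [float(v) for v in range(10)]
intervals = random_intervals(len(series), n_intervals=4)
print(interval_features(series, intervals))
```

In a full pipeline, the same intervals would be applied to every training series and the resulting feature vectors passed to a regressor, with different random intervals drawn for each tree of the ensemble.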