Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
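The reduction of scalar-on-function regression to an ordinary least-squares problem through a basis expansion, as described above, can be sketched as follows. This is a minimal illustration rather than any specific published method; the Fourier basis, sampling grid, ridge penalty, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)                 # discrete sampling grid
n = 200                                    # number of observed curves

# simulate smooth functional predictors x_i(t) with small measurement noise
X = np.array([np.sin(2 * np.pi * (t + rng.uniform()))
              + 0.1 * rng.standard_normal(t.size) for _ in range(n)])

beta_true = np.cos(2 * np.pi * t)          # true coefficient function
dt = t[1] - t[0]
# scalar responses: y_i = integral of x_i(t) * beta(t) dt + noise
y = X @ beta_true * dt + 0.05 * rng.standard_normal(n)

# expand beta(t) in a small Fourier basis: beta(t) ~ sum_k c_k B_k(t)
K = 5
B = np.column_stack([np.ones_like(t)] +
                    [f(2 * np.pi * (k + 1) * t)
                     for k in range(K // 2) for f in (np.sin, np.cos)])

# reduced design matrix Z[i, k] = integral of x_i(t) B_k(t) dt (Riemann sum)
Z = X @ B * dt
# ridge-regularised least squares: regularization controls roughness of beta
lam = 0.1
c_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(K), Z.T @ y)
beta_hat = B @ c_hat                       # estimated coefficient function
```

Because the functional model is projected onto a finite basis, replication across curves enters through `Z` while regularization enters through both the basis truncation and the ridge penalty.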
Seismic Ray Impedance Inversion
This thesis investigates a prestack seismic inversion scheme implemented in the ray
parameter domain. Conventionally, most prestack seismic inversion methods are
performed in the incidence-angle domain. However, inversion using the concept of
ray impedance, which honours ray-path variation following the elastic-parameter
variation according to Snell’s law, shows a greater capacity to discriminate
lithologies than conventional elastic impedance inversion.
The procedure starts with data transformation into the ray-parameter domain and then
implements the ray impedance inversion along constant ray-parameter profiles. With
different constant-ray-parameter profiles, mixed-phase wavelets are initially estimated
based on the high-order statistics of the data and further refined after a proper well-to-seismic
tie. With the estimated wavelets ready, a Cauchy inversion method is used to
invert for seismic reflectivity sequences, aiming at recovering seismic reflectivity
sequences for blocky impedance inversion. The impedance inversion from reflectivity
sequences adopts a standard generalised linear inversion scheme, whose results are
utilised to identify rock properties and facilitate quantitative interpretation. It has also
been demonstrated that we can further invert elastic parameters from ray impedance
values, without having to eliminate an extra density term or introduce Gardner’s
relation to absorb it.
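The generalised linear inversion scheme itself is not reproduced here, but the basic relation between a reflectivity sequence and an impedance profile that underlies blocky impedance inversion can be sketched as follows (the function names and numerical values are hypothetical):

```python
import numpy as np

# Standard interface relation: r_k = (Z_{k+1} - Z_k) / (Z_{k+1} + Z_k),
# which rearranges to the recursion Z_{k+1} = Z_k * (1 + r_k) / (1 - r_k).

def reflectivity_from_impedance(z):
    """Forward model: reflection coefficient at each layer interface."""
    z = np.asarray(z, dtype=float)
    return (z[1:] - z[:-1]) / (z[1:] + z[:-1])

def impedance_from_reflectivity(z0, r):
    """Recover an impedance profile from reflection coefficients,
    given the impedance z0 of the top layer."""
    z = [float(z0)]
    for rk in r:
        z.append(z[-1] * (1.0 + rk) / (1.0 - rk))
    return np.array(z)

# round trip on a blocky impedance model (hypothetical values)
z_true = np.array([4500.0, 5200.0, 4800.0, 6100.0])
r = reflectivity_from_impedance(z_true)
z_rec = impedance_from_reflectivity(z_true[0], r)
```

Note that the recursion needs the top-layer impedance `z0` as an absolute reference, which is one reason well ties matter for impedance inversion.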
Ray impedance inversion is extended to P-S converted waves by introducing the
definition of converted-wave ray impedance. This quantity shows advantages in
connecting prestack converted-wave data with well logs, compared with the
shear-wave elastic impedance derived from the Aki and Richards approximation to the
Zoeppritz equations. An analysis of P-P and P-S wave data under the framework of
ray impedance is conducted on a real multicomponent dataset, which reduces the
uncertainty in lithology identification.
Inversion is the key method in generating the examples throughout the thesis,
as we believe it can render robust solutions to geophysical problems. Apart from the
reflectivity sequence, ray impedance and elastic parameter inversion mentioned above,
inversion methods are also adopted in transforming the prestack data from the offset
domain to the ray-parameter domain, mixed-phase wavelet estimation, as well as the
registration of P-P and P-S waves for the joint analysis.
The ray impedance inversion methods are successfully applied to different types of
datasets. At each step towards the ray impedance inversion, the advantages,
disadvantages and limitations of the algorithms adopted are detailed. In
conclusion, the ray-impedance-based analyses demonstrated in this thesis compare
favourably with classical elastic impedance methods, and the author recommends
them for wider application.
Circulant singular spectrum analysis: a new automated procedure for signal extraction
Sometimes, it is of interest to single out the fluctuations associated with a given frequency. We propose a new variant of SSA, Circulant SSA (CiSSA), that allows extraction of the signal associated with any frequency specified beforehand. This is a novelty compared with other SSA procedures, which need to identify ex post the frequencies associated with the extracted signals. We prove that CiSSA is asymptotically equivalent to these alternative procedures, with the advantage of avoiding the subsequent frequency identification. We check its good performance and compare it with alternative SSA methods through several simulations for linear and nonlinear time series. We also prove its validity in the nonstationary case. We apply CiSSA in two different fields to show how it works with real data and find that it behaves successfully in both applications. Finally, we compare the performance of CiSSA with other state-of-the-art techniques used for nonlinear and nonstationary signals with amplitude and frequency varying in time. Financial support from the Spanish government, contract grants MINECO/FEDER ECO2015-70331-C2-1-R, ECO2015-66593-P, ECO2016-76818-C3-3-P, PID2019-107161GB-C32 and PID2019-108079GB-C22, is acknowledged.
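A minimal sketch of the circulant idea: because the eigenvectors of a circulant matrix are Fourier vectors, the component at a pre-specified frequency k/L can be reconstructed directly from the SSA trajectory matrix, with no ex-post frequency identification. This illustrates only the mechanism, not the authors' full procedure; the window length, test signal, and function names are assumptions.

```python
import numpy as np

def cissa_component(x, L, k):
    """Reconstruct the signal associated with frequency k/L cycles/sample."""
    x = np.asarray(x, dtype=float)
    N = x.size
    # SSA trajectory matrix: lagged windows of length L as columns
    X = np.column_stack([x[i:i + L] for i in range(N - L + 1)])
    j = np.arange(L)
    # real Fourier pair at frequency k/L (circulant eigenvectors)
    U = np.column_stack([np.cos(2 * np.pi * k * j / L),
                         np.sin(2 * np.pi * k * j / L)])
    Q, _ = np.linalg.qr(U)              # orthonormalise the pair
    Xk = Q @ (Q.T @ X)                  # project onto the frequency subspace
    # diagonal averaging (Hankelisation) back to a series of length N
    out = np.zeros(N)
    cnt = np.zeros(N)
    for c in range(X.shape[1]):
        out[c:c + L] += Xk[:, c]
        cnt[c:c + L] += 1.0
    return out / cnt

# two sinusoids; extract only the 1/20-cycles-per-sample component
n = 400
tt = np.arange(n)
x = np.sin(2 * np.pi * tt / 20) + 0.5 * np.sin(2 * np.pi * tt / 5)
s = cissa_component(x, L=100, k=5)      # k/L = 5/100 = 1/20
```

With L chosen so that both frequencies lie on the k/L grid, the projection separates the two sinusoids exactly; off-grid frequencies leak, which is where the asymptotic equivalence argument in the paper comes in.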
Mapping and monitoring forest remnants : a multiscale analysis of spatio-temporal data
KEYWORDS: Landsat, time series, machine learning, semideciduous Atlantic forest, Brazil, wavelet transforms, classification, change detection
Forests play a major role in important global matters such as the carbon cycle, climate change, and biodiversity. Forests also influence soil and water dynamics, with major consequences for ecological relations and decision-making. One basic requirement to quantify and model these processes is the availability of accurate maps of forest cover. Data acquisition and analysis at appropriate scales is the keystone to achieving the mapping accuracy needed for the development and reliable use of ecological models.
The current and upcoming production of high-resolution data sets, plus the ever-increasing time series that have been collected since the 1970s, must be effectively explored. Missing values and distortions further complicate the analysis of these data. Thus, integration and proper analysis are of utmost importance for environmental research. New conceptual models in environmental sciences, like the perception of multiple scales, require the development of effective implementation techniques.
This thesis presents new methodologies to map and monitor forests over large, highly fragmented areas with complex land use patterns. The use of temporal information is extensively explored to distinguish natural forests from other land cover types that are spectrally similar. In chapter 4, novel schemes based on multiscale wavelet analysis are introduced, which enabled effective preprocessing of long time series of Landsat data and improved their applicability to environmental assessment.
In chapter 5, the produced time series, as well as other information on spectral and spatial characteristics, were used to classify forested areas in an experiment relating a number of combinations of attribute features.
Feature sets were defined based on expert knowledge and on data mining techniques, to be input to traditional and machine learning algorithms for pattern recognition, viz. maximum likelihood, univariate and multivariate decision trees, and neural networks. The results showed that maximum likelihood classification using temporal texture descriptors extracted with wavelet transforms was the most accurate for classifying the semideciduous Atlantic forest in the study area.
In chapter 6, a multiscale approach to digital change detection was developed to deal with multisensor and noisy remotely sensed images. Changes were extracted according to size classes, minimising the effects of geometric and radiometric misregistration.
Finally, in chapter 7, an automated procedure for GIS updating based on feature extraction, segmentation and classification was developed to monitor the remnants of semideciduous Atlantic forest. The procedure showed significant improvements over post-classification comparison and direct multidate classification based on artificial neural networks.
On-Line Learning and Wavelet-Based Feature Extraction Methodology for Process Monitoring using High-Dimensional Functional Data
The recent advances in information technology, such as automatic data acquisition systems and sensor systems, have created tremendous opportunities for collecting valuable process data. The timely processing of such data for meaningful information remains a challenge. In this research, several data mining methodologies that aid the streaming of high-dimensional functional data are developed.
For on-line implementations, two weighting functions for updating support vector regression parameters were developed. The functions use parameters that can be easily set a priori with only slight knowledge of the data involved, and they have provision for lower and upper bounds on the parameters. The functions are applicable to time series predictions, on-line predictions, and batch predictions. In order to apply these functions to on-line predictions, a new on-line support vector regression algorithm that uses adaptive weighting parameters was presented. The new algorithm uses a varying, rather than fixed, regularization constant and accuracy parameter. The developed algorithm is more robust to the volume of data available for on-line training, as well as to the relative position of the available data in the training sequence. The algorithm improves prediction accuracy by reducing the uncertainty of using fixed values for the regression parameters. It also improves prediction accuracy by reducing the uncertainty of using regression values based on experts’ knowledge rather than on the characteristics of the incoming training data. The developed functions and algorithm were applied to feedwater flow rate data and two benchmark time series data sets. The results show that using adaptive regression parameters performs better than using fixed regression parameters.
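The specific weighting functions are not given in this summary; the following is only a generic sketch of the idea of a bounded, a priori parameterised weighting function for the regularization constant C, with the logistic form and all parameter values chosen purely for illustration:

```python
import numpy as np

def adaptive_C(i, n, c_min=1.0, c_max=100.0, steepness=10.0):
    """Hypothetical weighting function: later samples in the training
    sequence (larger i) receive a larger C, i.e. less regularization,
    via a logistic ramp clipped between a priori bounds c_min and c_max."""
    w = 1.0 / (1.0 + np.exp(-steepness * (i / n - 0.5)))
    return c_min + (c_max - c_min) * w

# weights for a 50-sample on-line training sequence
C = [adaptive_C(i, 50) for i in range(50)]
```

The bounds play the role described in the abstract: they let the user constrain the regularization constant a priori while the weighting adapts it to each sample's position in the sequence. An analogous bounded function could govern the epsilon accuracy parameter.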
In order to reduce the dimension of data with several hundred or thousands of predictors and enhance prediction accuracy, a wavelet-based feature extraction procedure, called the step-down thresholding procedure, was developed for identifying and extracting significant features from a single curve. The procedure involves transforming the original spectra into wavelet coefficients. It is based on a multiple hypothesis testing approach, and it controls the family-wise error rate in order to guard against selecting insignificant features, without any concern about the amount of noise that may be present in the data. Therefore, the procedure is applicable for data reduction and/or data denoising. The procedure was compared to six other data-reduction and data-denoising methods in the literature. The developed procedure is found to consistently perform better than most of the popular methods and to perform at the same level as the others.
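The step-down multiple-testing procedure itself is not reproduced here. As a point of reference, one of the standard denoising baselines such procedures are compared against, hard thresholding of Haar wavelet coefficients at the universal threshold sigma * sqrt(2 log n), can be sketched as follows (the signal, noise level, and names are assumptions):

```python
import numpy as np

def haar_forward(x):
    """Full Haar decomposition of a length-2^J signal; returns the final
    approximation plus detail coefficients ordered coarse to fine."""
    x = np.asarray(x, dtype=float)
    details = []
    while x.size > 1:
        a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation
        d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail
        details.append(d)
        x = a
    return x, details[::-1]

def haar_inverse(a, details):
    """Invert haar_forward given coarse-to-fine details."""
    x = a
    for d in details:
        up = np.empty(2 * x.size)
        up[0::2] = (x + d) / np.sqrt(2)
        up[1::2] = (x - d) / np.sqrt(2)
        x = up
    return x

rng = np.random.default_rng(1)
n = 256
clean = np.where(np.arange(n) < n // 2, 1.0, -1.0)   # blocky "spectrum"
noisy = clean + 0.1 * rng.standard_normal(n)

a, det = haar_forward(noisy)
thr = 0.1 * np.sqrt(2 * np.log(n))                   # assumes known sigma = 0.1
det = [np.where(np.abs(d) > thr, d, 0.0) for d in det]  # hard thresholding
denoised = haar_inverse(a, det)
```

Thresholding zeroes most coefficients, so the same operation performs data reduction and denoising at once; the step-down procedure replaces the fixed universal threshold with a family-wise-error-controlled sequence of tests.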
Many real-world data with high-dimensional explanatory variables also have multiple response variables; therefore, selecting the fewest explanatory variables that show high sensitivity to predicting the response variable(s) and low sensitivity to the noise in the data is important for better performance and reduced computational burden. In order to select the fewest explanatory variables that can predict each of the response variables well, a two-stage wavelet-based feature extraction procedure is proposed. The first stage uses the step-down procedure to extract significant features for each of the curves. Then, representative features are selected out of the extracted features for all curves using a voting selection strategy. Other selection strategies, such as union and intersection, were also described and implemented. The essence of the first stage is to reduce the dimension of the data without any consideration of whether or not the features can predict the response variables accurately. The second stage uses a Bayesian decision theory approach to select some of the extracted wavelet coefficients that can predict each of the response variables accurately. The two-stage procedure was implemented using near-infrared spectroscopy data and shaft misalignment data. The results show that the second stage further reduces the dimension, and the prediction results are encouraging.
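The union, intersection, and voting selection strategies described above can be sketched as follows, applied to hypothetical per-curve sets of selected wavelet-coefficient indices:

```python
def combine(selected_per_curve, strategy="vote", min_votes=None):
    """Combine per-curve feature index sets by union, intersection,
    or majority voting across curves."""
    sets = [set(s) for s in selected_per_curve]
    if strategy == "union":
        return set().union(*sets)
    if strategy == "intersection":
        return set.intersection(*sets)
    if min_votes is None:
        min_votes = len(sets) // 2 + 1          # simple majority
    counts = {}
    for s in sets:
        for f in s:
            counts[f] = counts.get(f, 0) + 1
    return {f for f, c in counts.items() if c >= min_votes}

# hypothetical wavelet-coefficient indices selected for three curves
per_curve = [{1, 4, 7}, {1, 4, 9}, {1, 7, 9}]
```

Voting sits between the two extremes: intersection keeps only features significant for every curve, union keeps any feature significant for at least one, and voting keeps those significant for at least a chosen fraction of curves.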