
    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, where the observed data consist of a sample of functions from some population, measured on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest-growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article focuses on functional regression, the area of FDA that has received the most attention in applications and methodological development. First comes an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar), and [3] function-on-function regression. For each, the role of replication and regularization is discussed and the methodological development described in roughly chronological order, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. The article closes with a brief discussion of potential areas of future development in this field.
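    The role of basis functions as building blocks for regularization can be sketched as follows (a minimal hypothetical example; the Fourier basis, grid, and all names are our assumptions, not the article's): expanding the coefficient function in a small basis reduces scalar-on-function regression to ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 100                      # grid points per curve, number of curves
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]

# Small Fourier basis used for regularization: beta(t) = sum_k c_k phi_k(t).
Phi = np.column_stack([np.ones(m),
                       np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                       np.sin(4 * np.pi * t), np.cos(4 * np.pi * t)])

# Smooth random predictor curves X_i(t) and scalar responses
# y_i = integral X_i(t) beta(t) dt + noise, with true beta(t) = sin(2*pi*t).
X = rng.normal(size=(n, Phi.shape[1])) @ Phi.T
beta_true = np.sin(2 * np.pi * t)
y = dt * X @ beta_true + rng.normal(scale=0.01, size=n)

# The basis expansion reduces the functional model to multiple regression:
# integral X_i beta ~ dt * (X_i @ Phi) @ c, so the design matrix is Z below.
Z = dt * (X @ Phi)
c, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_hat = Phi @ c                   # estimated coefficient function on the grid
```

    Because the true coefficient function lies in the span of the basis here, the least-squares fit recovers it closely; in practice the basis (and a roughness penalty) would be chosen to trade off flexibility against regularization.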

    Seismic Ray Impedance Inversion

    This thesis investigates a prestack seismic inversion scheme implemented in the ray-parameter domain. Conventionally, most prestack seismic inversion methods are performed in the incidence-angle domain. However, inversion using the concept of ray impedance, because it honours ray-path variation following the elastic-parameter variation according to Snell's law, shows a greater capacity to discriminate different lithologies than conventional elastic impedance inversion. The procedure starts with data transformation into the ray-parameter domain and then implements ray impedance inversion along constant-ray-parameter profiles. For each constant-ray-parameter profile, mixed-phase wavelets are initially estimated based on the high-order statistics of the data and further refined after a proper well-to-seismic tie. With the wavelets estimated, a Cauchy inversion method is used to invert for seismic reflectivity sequences, aiming at recovering reflectivity sequences for blocky impedance inversion. The impedance inversion from reflectivity sequences adopts a standard generalised linear inversion scheme, whose results are used to identify rock properties and facilitate quantitative interpretation. It is also demonstrated that elastic parameters can be further inverted from ray impedance values, without eliminating an extra density term or introducing a Gardner's relation to absorb this term. Ray impedance inversion is extended to P-S converted waves by introducing the definition of converted-wave ray impedance. This quantity shows some advantages in connecting prestack converted-wave data with well logs, compared with the shear-wave elastic impedance derived from the Aki and Richards approximation to the Zoeppritz equations.
    An analysis of P-P and P-S wave data under the framework of ray impedance is conducted on a real multicomponent dataset, which can reduce the uncertainty in lithology identification. Inversion is the key method in generating the examples throughout the thesis, as we believe it can render robust solutions to geophysical problems. Apart from the reflectivity sequence, ray impedance and elastic parameter inversion mentioned above, inversion methods are also adopted in transforming the prestack data from the offset domain to the ray-parameter domain, in mixed-phase wavelet estimation, and in the registration of P-P and P-S waves for joint analysis. The ray impedance inversion methods are successfully applied to different types of datasets. For each individual step towards the ray impedance inversion, the advantages, disadvantages and limitations of the algorithms adopted are detailed. In conclusion, the ray-impedance-based analyses demonstrated in this thesis are highly competitive with the classical elastic impedance methods, and the author recommends them for wider application.
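    The blocky impedance inversion step rests on the standard relation between reflection coefficients and impedance contrasts; a minimal sketch (not code from the thesis, and the numbers are purely illustrative) integrates a sparse reflectivity sequence into an impedance profile:

```python
# Standard recurrence: r_i = (Z_{i+1} - Z_i) / (Z_{i+1} + Z_i)
#                  =>  Z_{i+1} = Z_i * (1 + r_i) / (1 - r_i)

def impedance_from_reflectivity(z0, reflectivity):
    """Integrate reflection coefficients into an impedance profile,
    starting from an initial impedance z0 (e.g. from a well log)."""
    z = [z0]
    for r in reflectivity:
        z.append(z[-1] * (1.0 + r) / (1.0 - r))
    return z

# A sparse ("blocky") reflectivity series: two interfaces only.
refl = [0.0, 0.1, 0.0, -0.05, 0.0]
profile = impedance_from_reflectivity(5000.0, refl)
```

    A sparse reflectivity estimate (such as one produced by Cauchy inversion) therefore maps directly to a piecewise-constant, blocky impedance profile.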

    Circulant singular spectrum analysis: a new automated procedure for signal extraction

    Sometimes, it is of interest to single out the fluctuations associated with a given frequency. We propose a new variant of SSA, Circulant SSA (CiSSA), that allows the extraction of the signal associated with any frequency specified beforehand. This is a novelty compared with other SSA procedures, which need to identify ex post the frequencies associated with the extracted signals. We prove that CiSSA is asymptotically equivalent to these alternative procedures, with the advantage of avoiding the subsequent frequency identification. We check its good performance and compare it to alternative SSA methods through several simulations for linear and nonlinear time series. We also prove its validity in the nonstationary case. We apply CiSSA in two different fields to show how it works with real data and find that it behaves successfully in both applications. Finally, we compare the performance of CiSSA with other state-of-the-art techniques used for nonlinear and nonstationary signals with amplitude and frequency varying in time. Financial support from the Spanish government, contract grants MINECO/FEDER ECO2015-70331-C2-1-R, ECO2015-66593-P, ECO2016-76818-C3-3-P, PID2019-107161GB-C32 and PID2019-108079GB-C22, is acknowledged.
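    The core idea can be sketched as follows (a simplified illustration under our own assumptions, not the authors' code): circulant matrices are diagonalized by Fourier vectors, so the subseries at a frequency fixed in advance can be obtained by projecting the trajectory matrix onto the Fourier pair for that frequency and diagonal-averaging back to a series.

```python
import numpy as np

def extract_frequency(x, L, k):
    """Extract the component of x at frequency k/L cycles per sample
    (assumes 0 < k < L/2)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    # Trajectory (Hankel) matrix: lagged windows of length L as columns.
    X = np.column_stack([x[j:j + L] for j in range(K)])
    l = np.arange(L)
    # Fourier pair for frequency k/L: the eigenvector pair of a circulant.
    U = np.column_stack([np.cos(2 * np.pi * k * l / L),
                         np.sin(2 * np.pi * k * l / L)])
    # Orthogonal projection onto their span (each column has norm^2 = L/2).
    Xk = U @ (U.T @ X) * (2.0 / L)
    # Anti-diagonal averaging back to a series of length N.
    out = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        out[i:i + K] += Xk[i, :]
        counts[i:i + K] += 1
    return out / counts

n = np.arange(400)
x = np.sin(2 * np.pi * 0.05 * n) + 0.7 * np.sin(2 * np.pi * 0.2 * n)
low = extract_frequency(x, L=100, k=5)   # pick out the 0.05-cycle component
```

    Because both tones sit exactly on Fourier frequencies of the window here, the projection separates them cleanly; the paper's contribution is proving this kind of frequency-targeted extraction works asymptotically in general, without ex post identification.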


    Mapping and monitoring forest remnants: a multiscale analysis of spatio-temporal data

    KEYWORDS: Landsat, time series, machine learning, semideciduous Atlantic forest, Brazil, wavelet transforms, classification, change detection

    Forests play a major role in important global matters such as the carbon cycle, climate change, and biodiversity. Forests also influence soil and water dynamics, with major consequences for ecological relations and decision-making. One basic requirement to quantify and model these processes is the availability of accurate maps of forest cover. Data acquisition and analysis at appropriate scales is the keystone to achieving the mapping accuracy needed for the development and reliable use of ecological models. The current and upcoming production of high-resolution data sets, plus the ever-increasing time series that have been collected since the seventies, must be effectively explored. Missing values and distortions further complicate the analysis of these data. Thus, integration and proper analysis are of utmost importance for environmental research. New conceptual models in environmental sciences, like the perception of multiple scales, require the development of effective implementation techniques. This thesis presents new methodologies to map and monitor forests over large, highly fragmented areas with complex land-use patterns. The use of temporal information is extensively explored to distinguish natural forests from other land cover types that are spectrally similar. In chapter 4, novel schemes based on multiscale wavelet analysis are introduced, which enabled effective preprocessing of long time series of Landsat data and improved their applicability to environmental assessment. In chapter 5, the produced time series, as well as other information on spectral and spatial characteristics, were used to classify forested areas in an experiment comparing a number of combinations of attribute features.
    Feature sets were defined based on expert knowledge and on data mining techniques to be input to traditional and machine learning algorithms for pattern recognition, viz. maximum likelihood, univariate and multivariate decision trees, and neural networks. The results showed that maximum likelihood classification using temporal texture descriptors extracted with wavelet transforms was most accurate for classifying the semideciduous Atlantic forest in the study area. In chapter 6, a multiscale approach to digital change detection was developed to deal with multisensor and noisy remotely sensed images. Changes were extracted according to size classes, minimising the effects of geometric and radiometric misregistration. Finally, in chapter 7, an automated procedure for GIS updating based on feature extraction, segmentation and classification was developed to monitor the remnants of semideciduous Atlantic forest. The procedure showed significant improvements over post-classification comparison and direct multidate classification based on artificial neural networks.
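    As a very loose illustration of the multiscale wavelet preprocessing described above (a hypothetical sketch, not the thesis's implementation): a Haar wavelet decomposition of a time series yields per-scale detail energies that can serve as simple temporal texture descriptors, here flagging a step change at the coarsest scale only.

```python
def haar_step(x):
    """One Haar analysis step: (approximation, detail) at half resolution."""
    s2 = 2 ** 0.5
    approx = [(a + b) / s2 for a, b in zip(x[0::2], x[1::2])]
    detail = [(a - b) / s2 for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

def haar_energies(x, levels):
    """Energy of the detail coefficients at each scale (a texture descriptor).
    Assumes len(x) is divisible by 2**levels."""
    energies = []
    for _ in range(levels):
        x, d = haar_step(x)
        energies.append(sum(c * c for c in d))
    return energies

series = [1, 1, 1, 1, 5, 5, 5, 5]        # a step change mid-series
energies = haar_energies(series, 3)
```

    All the energy concentrates at the coarsest scale, where the step lives; a gradual seasonal cycle or fine-scale noise would instead load other scales, which is what makes such descriptors useful classification features.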

    On-Line Learning and Wavelet-Based Feature Extraction Methodology for Process Monitoring using High-Dimensional Functional Data

    The recent advances in information technology, such as automatic data acquisition systems and sensor systems, have created tremendous opportunities for collecting valuable process data. The timely processing of such data for meaningful information remains a challenge. In this research, several data mining methodologies that aid information streaming of high-dimensional functional data are developed. For on-line implementations, two weighting functions for updating support vector regression parameters were developed. The functions use parameters that can be easily set a priori with the slightest knowledge of the data involved, and have provision for lower and upper bounds on the parameters. The functions are applicable to time series predictions, on-line predictions, and batch predictions. To apply these functions to on-line prediction, a new on-line support vector regression algorithm that uses adaptive weighting parameters was presented. The new algorithm uses a varying rather than a fixed regularization constant and accuracy parameter. The developed algorithm is more robust to the volume of data available for on-line training as well as to the relative position of the available data in the training sequence. It improves prediction accuracy by reducing the uncertainty of using fixed values for the regression parameters, or values based on experts' knowledge rather than on the characteristics of the incoming training data. The developed functions and algorithm were applied to feedwater flow rate data and two benchmark time series data sets. The results show that using adaptive regression parameters performs better than using fixed regression parameters.
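    A weighting function of the bounded, a-priori-settable kind described above might look like the following (a hedged sketch; the sigmoid form, names, and default bounds are our assumptions, not the functions developed in this research):

```python
import math

def adaptive_c(i, n, c_min=1.0, c_max=10.0, steepness=6.0):
    """Per-sample regularization constant for on-line SVR: older samples
    (small i) get weights near c_min, recent samples near c_max, with
    both bounds set a priori by the user."""
    z = steepness * (i / (n - 1) - 0.5)    # position within the sequence
    w = 1.0 / (1.0 + math.exp(-z))         # sigmoid weight in (0, 1)
    return c_min + (c_max - c_min) * w

# Regularization constants for a training window of 20 samples.
weights = [adaptive_c(i, 20) for i in range(20)]
```

    The bounds keep the constant from drifting to extreme values regardless of window length, which is what makes such a function safe to set with minimal knowledge of the data.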
    To reduce the dimension of data with several hundreds or thousands of predictors and to enhance prediction accuracy, a wavelet-based feature extraction procedure, called the step-down thresholding procedure, for identifying and extracting significant features from a single curve was developed. The procedure involves transforming the original spectra into wavelet coefficients. It is based on a multiple hypothesis testing approach and controls the family-wise error rate in order to guard against selecting insignificant features, without any concern about the amount of noise that may be present in the data. Therefore, the procedure is applicable for data reduction and/or data denoising. The procedure was compared to six other data-reduction and data-denoising methods in the literature; it is found to consistently perform better than most of the popular methods and at the same level as the others. Many real-world data sets with high-dimensional explanatory variables also have multiple response variables; therefore, selecting the fewest explanatory variables that show high sensitivity to predicting the response variable(s) and low sensitivity to the noise in the data is important for better performance and reduced computational burden. To select the fewest explanatory variables that can predict each of the response variables well, a two-stage wavelet-based feature extraction procedure is proposed. The first stage uses the step-down procedure to extract significant features for each of the curves. Then, representative features are selected from the extracted features across all curves using a voting selection strategy; other strategies, such as union and intersection, were also described and implemented. The essence of the first stage is to reduce the dimension of the data without any consideration of whether or not the features can predict the response variables accurately.
    The second stage uses a Bayesian decision theory approach to select those extracted wavelet coefficients that can predict each of the response variables accurately. The two-stage procedure was implemented using near-infrared spectroscopy data and shaft misalignment data. The results show that the second stage further reduces the dimension, and the prediction results are encouraging.
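    The step-down idea above can be illustrated with Holm's classical step-down procedure standing in for the thesis's method (a hypothetical sketch; the p-value model and data are our assumptions): standardized wavelet coefficients receive two-sided normal p-values, and rejections proceed from the smallest p-value until the step-down threshold first fails, which controls the family-wise error rate at level alpha.

```python
import math

def step_down_select(z_scores, alpha=0.05):
    """Indices of coefficients declared significant by Holm's step-down rule."""
    m = len(z_scores)
    # Two-sided p-value under a standard normal null, paired with the index.
    pvals = sorted((math.erfc(abs(z) / math.sqrt(2)), i)
                   for i, z in enumerate(z_scores))
    selected = []
    for k, (p, i) in enumerate(pvals):
        if p > alpha / (m - k):        # step-down threshold alpha/(m-k)
            break                      # first non-rejection stops the procedure
        selected.append(i)
    return sorted(selected)

# Mostly noise-level coefficients plus two clearly significant ones.
z = [0.3, -8.0, 0.9, 1.1, 6.5, -0.2]
sig = step_down_select(z)
```

    Only the two large-magnitude coefficients survive, so the selection doubles as denoising: everything not retained is treated as noise and can be zeroed before reconstruction.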