
    Principal Component Regression Modelling with Variational Bayesian Approach to Overcome Multicollinearity at Various Levels of Missing Data Proportion

    This study aims to model Principal Component Regression (PCR) using Variational Bayesian Principal Component Analysis (VBPCA), with Ordinary Least Squares (OLS) as the method for estimating the regression parameters, to overcome multicollinearity at various proportions of missing data. The data used in this study are secondary data and simulated data contaminated with collinearity in the predictor variables, with missing-data proportions of 1%, 5%, and 10%. The secondary data are the Human Depth Index in Java in 2021, a complete dataset without missing values. The results indicate that multicollinearity in both the secondary and simulated data can be optimally overcome, as indicated by the smaller standard errors of the regression parameters under PCR with VBPCA and by relative efficiency values of less than 1. VBPCA can handle missing-data proportions of less than 10%. As the proportion of missing data grows, information from the original variables is lost, as evidenced by larger MAPE values and increasing bias in the parameter estimates. The cross-validation (Q^2) value and the coefficient of determination (adjusted R^2) also decrease as the proportion of missing data increases.
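As a rough illustration of the pipeline this abstract describes, the sketch below pairs an EM-style iterative PCA imputation (a simplified stand-in for full VBPCA, which additionally places priors on the loadings and noise) with OLS on the principal component scores. The simulated data, function names, and parameter choices are all hypothetical, not taken from the study.

```python
import numpy as np

def em_pca_impute(X, n_components, n_iter=100):
    """Fill missing entries of X by iterating a rank-k PCA reconstruction
    (EM-style). VBPCA would instead infer the subspace in a Bayesian model."""
    mask = np.isnan(X)
    X_filled = np.where(mask, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - mu, full_matrices=False)
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        X_filled[mask] = recon[mask]  # update only the missing cells
    return X_filled

def pcr_ols(X, y, n_components):
    """Principal component regression: project onto PCs, then fit OLS."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T                     # component scores
    Z1 = np.column_stack([np.ones(len(Z)), Z])       # add intercept
    beta, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return beta

# Example: collinear predictors with ~5% of entries missing at random.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=200), rng.normal(size=200)])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=200)
X[rng.random(X.shape) < 0.05] = np.nan
beta = pcr_ols(em_pca_impute(X, n_components=2), y, n_components=2)
```

Regressing on a reduced set of orthogonal component scores is what removes the multicollinearity; the imputation step only supplies complete data for the decomposition.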

    Extracting Common Mode Errors of Regional GNSS Position Time Series in the Presence of Missing Data by Variational Bayesian Principal Component Analysis

    Removal of the common mode error (CME) is very important for the investigation of global navigation satellite system (GNSS) errors and the estimation of an accurate GNSS velocity field for geodynamic applications. The commonly used spatiotemporal filtering methods normally process evenly spaced time series without missing data. In this article, we present variational Bayesian principal component analysis (VBPCA) to estimate and extract CME from incomplete GNSS position time series. The VBPCA method naturally handles missing data in the Bayesian framework and utilizes the variational expectation-maximization iterative algorithm to search each principal subspace. Moreover, it can automatically select the optimal number of principal components for data reconstruction and avoid the overfitting problem. To evaluate the performance of the VBPCA algorithm for extracting CME, 44 continuous GNSS stations located in Southern California were selected. Compared to previous approaches, VBPCA achieves better performance, with lower CME relative errors when more data are missing. Since the first principal component (PC) extracted by VBPCA is remarkably larger than the other components, and its corresponding spatial response is nearly uniform, we use only the first PC and its eigenvector to reconstruct the CME for each station. After filtering out CME, the interstation correlation coefficients are significantly reduced from 0.43, 0.46, and 0.38 to 0.11, 0.10, and 0.08 for the north, east, and up (NEU) components, respectively. The root mean square (RMS) values of the residual time series and the colored noise amplitudes for the NEU components are also greatly suppressed, with average reductions of 27.11%, 28.15%, and 23.28% for the former, and 49.90%, 54.56%, and 49.75% for the latter. Moreover, the velocity estimates are more reliable and precise after removing CME, with average uncertainty reductions of 51.95%, 57.31%, and 49.92% for the NEU components, respectively. All these results indicate that the VBPCA method is an alternative and efficient way to extract CME from regional GNSS position time series in the presence of missing data. Further work is still required to consider the effect of formal errors on the CME extraction during the VBPCA implementation.
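To make the first-PC filtering step concrete, here is a minimal sketch in which a plain SVD on mean-filled residuals stands in for the full variational Bayesian treatment; the array shapes and variable names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def extract_cme(residuals):
    """Estimate the common mode error from a (epochs x stations) matrix of
    GNSS position residuals using the first principal component. Missing
    epochs (NaN) are pre-filled with column means in this sketch; VBPCA
    instead handles them inside the Bayesian model."""
    mask = np.isnan(residuals)
    X = np.where(mask, np.nanmean(residuals, axis=0), residuals)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1 = U[:, 0] * s[0]       # temporal pattern of the first PC
    v1 = Vt[0]                 # spatial response: one weight per station
    return np.outer(pc1, v1)   # rank-1 CME estimate for every station

# Filtering: subtract the CME and check that interstation correlation drops.
rng = np.random.default_rng(1)
common = rng.normal(size=500)                        # shared regional signal
data = common[:, None] + 0.8 * rng.normal(size=(500, 10))
filtered = data - extract_cme(data)
corr_before = np.corrcoef(data.T)[0, 1]              # high
corr_after = np.corrcoef(filtered.T)[0, 1]           # near zero
```

Subtracting the rank-1 CME estimate is what drives the interstation correlations down, as reported above for the NEU components.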

    GPS Studies of Subtle Deformation Signals in the Western United States

    In the past fifteen years, the network of Global Positioning System (GPS) stations in the western United States has dramatically expanded, greatly improving the spatial resolution at which we can resolve geophysical signals. This is particularly important for areas such as the Basin and Range, where data limitations prevented substantial analysis in the past. In addition to improved network geometries, many robust data analysis techniques have been produced, and revised reference frames and data processing strategies have greatly improved data quality. While these advancements have expanded our understanding of long-term tectonics in the western United States, they also provide the opportunity to robustly investigate temporally variable, subtle deformation signals. Many of these signals were previously below the uncertainty levels of the data, or station coverage was too sparse. The research presented in this dissertation takes advantage of this progress to advance our understanding of the interaction of subtle deformation signals within the western United States, across a range of spatio-temporal scales. The first study investigates drought-induced deformation observed at GPS stations near the Great Salt Lake (GSL), in Utah, between 2012 and 2016. During this time, GPS time series show a subtle but distinct three-dimensional change in trend, with horizontal motion away from the lake and vertical uplift centered upon it. Concurrently, GSL lost a total of 1.89 m of surface elevation. Previous hydrologic studies have typically used only vertical GPS displacements to quantify load variation over broad, regional scales. Here, we find that at small spatial scales, three-dimensional GPS is sensitive not just to the unloading of the lake, but to the nearby groundwater as well. In our preferred model, the volume lost by GSL is equivalent to that observed, at 5.5 ± 1.0 km³, and the inferred groundwater loss is substantial at 10.9 ± 2.8 km³. Seismicity is modulated by the hydrologic cycle within the inferred load region, revealing increased earthquake rates during drier periods as stresses on faults under the loads are reduced. This study highlights the impact of subtle, multi-year drought signals on GPS time series, and indicates that, for robust regional analyses, small-scale hydrologic loading must be accounted for.

In the second study, we focus on correcting subtle deformation signals within the central Basin and Range, and produce the most robust interseismic velocity field of the region to date. Since deformation rates are low, the combined corrections produced in this study for postseismic deformation, hydrologic loading, and regional common mode error substantially alter the velocity field and the resulting strain rates. Station uncertainties are reduced by 62.1% and 53.8% in the east and north components, compared to the original velocity field. The Pahranagat Shear Zone is strongly affected by postseismic relaxation, which accounts for as much as half the shear along its western extent. We find that east–west extension across the Las Vegas Valley is substantially larger than previously estimated, at 0.5 – 0.6 mm/yr, and our preferred strain rates within the Las Vegas Valley are 8.5 ± 2.4 × 10⁻⁹ yr⁻¹, indicating that crustal deformation is active within the urban area of Las Vegas. These results show in detail the significant impact that subtle deformation signals have on regional analyses and their interpretation.
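For a sense of scale, loading signals like these follow from elastic Green's functions. The toy sketch below uses the homogeneous half-space (Boussinesq) point-load solution rather than the layered-Earth load models such studies typically employ, and the elastic constants and lumped-load geometry are assumed values, not the dissertation's.

```python
import numpy as np

# Boussinesq point-load response of a homogeneous elastic half-space:
# a crude stand-in for realistic layered-Earth load Green's functions.
E = 7.5e10    # Young's modulus, Pa (assumed crustal value)
nu = 0.25     # Poisson's ratio (assumed)

def surface_displacement(P, r):
    """Surface displacement at horizontal distance r (m) from a vertical
    point load P (N, positive downward). Returns (uz, ur): vertical
    displacement (positive down) and radial displacement (positive away
    from the load). Unloading (P < 0) gives uplift and outward motion."""
    uz = P * (1 - nu**2) / (np.pi * E * r)
    ur = -P * (1 + nu) * (1 - 2 * nu) / (2 * np.pi * E * r)
    return uz, ur

# Example: remove ~5.5 km^3 of water (the GSL-scale estimate), lumped into
# a single point load, and evaluate 50 km away. Crude, but sets the scale.
rho_water, g = 1000.0, 9.81
P = -5.5e9 * rho_water * g        # 5.5e9 m^3; negative = unloading
uz, ur = surface_displacement(P, 50e3)
print(f"uplift ~ {-uz*1e3:.1f} mm, outward motion ~ {ur*1e3:.1f} mm")
```

Even this point-load approximation yields millimeter-level uplift and outward horizontal motion for a GSL-sized unloading, consistent in sense with the three-dimensional trend change the abstract describes.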
Positioning errors present in five-minute GPS time series propagate subtly into the daily position of the station. In the final study, we perform a sensitivity analysis of the zenith tropospheric delay (ZTD) random walk constraint, and show that station vertical scatter can be greatly reduced by loosening its value. We find that large wavelike displacements of ~100 mm, which occurred along the coast of California during Winter Storm Ezekiel in 2019, are suppressed when using a random walk constraint of 24 mm/√(hr) (i.e., eight times looser than the default value). Global station RMS and repeatability show improvements of 4% – 9% and 10% – 21%, respectively, when using uniform random walk constraints of 6 – 12 mm/√(hr). Further improvement is attained when assigning characteristic random walk constraints to individual stations, with a 10% improvement in repeatability globally. A daily optimal random walk approach yields a 24% improvement in global station repeatability. These findings reveal an opportunity to greatly improve five-minute vertical positioning, not just for stations in the western United States during storms, but for the global GPS network as a whole, by loosening the ZTD random walk constraint to at least 6 mm/√(hr).
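The role of the random walk constraint can be seen in a scalar Kalman filter, where the constraint q sets how much state variance (q²·Δt) is added between epochs. The sketch below is a generic illustration with made-up numbers, not the actual interface of any GPS processing software.

```python
import numpy as np

def ztd_process_noise(q_mm_per_sqrt_hr, dt_hours):
    """Process-noise variance (mm^2) accumulated over dt for constraint q."""
    return q_mm_per_sqrt_hr**2 * dt_hours

def kalman_ztd(obs_mm, obs_var_mm2, q_mm_per_sqrt_hr, dt_hours):
    """Scalar Kalman filter tracking ZTD modeled as a random walk."""
    x, P = obs_mm[0], obs_var_mm2                 # init from first epoch
    track = [x]
    for z in obs_mm[1:]:
        P += ztd_process_noise(q_mm_per_sqrt_hr, dt_hours)  # predict
        K = P / (P + obs_var_mm2)                           # Kalman gain
        x += K * (z - x)                                    # measurement update
        P *= (1 - K)
        track.append(x)
    return np.array(track)

# A storm-like ZTD swing sampled at five-minute epochs: a tight constraint
# of 3 mm/sqrt(hr) lags the swing, while a loose 24 mm/sqrt(hr) follows it.
t = np.arange(0, 24, 1 / 12)                        # hours
truth = 2400 + 80 * np.exp(-((t - 12) / 2) ** 2)    # ZTD in mm
obs = truth + np.random.default_rng(2).normal(0, 5, t.size)
tight = kalman_ztd(obs, 25.0, 3.0, 1 / 12)
loose = kalman_ztd(obs, 25.0, 24.0, 1 / 12)
```

With the tight constraint the filter's gain is small and the ZTD estimate lags a storm-scale swing, which is the mechanism by which unmodeled delay leaks into the vertical position; the looser constraint lets the state track it.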