
    Choice of the ridge factor from the correlation matrix determinant

    Ridge regression is an alternative to ordinary least squares that is mostly applied when a multiple linear regression model presents a worrying degree of collinearity. A relevant topic in ridge regression is the selection of the ridge parameter, and different proposals have been presented in the scientific literature. Since the ridge estimator is biased, the ridge parameter is normally chosen by minimising the mean square error (MSE), without considering (to the best of our knowledge) whether the proposed value really mitigates the collinearity. With this goal, and supported by different simulations, this paper proposes estimating the ridge parameter from the determinant of the correlation matrix of the data, verifying that the resulting variance inflation factor (VIF) is lower than the traditionally established threshold. The possible relation between the VIF and the determinant of the correlation matrix is also analysed. Finally, the contribution is illustrated with three real examples.
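    A minimal numpy sketch of the idea, using the common generalisation VIF_j(k) = [(R + kI)^-1 R (R + kI)^-1]_jj for standardized predictors; the grid-search rule below is a simplified stand-in for the paper's determinant-based criterion, and the data are invented for illustration:

```python
import numpy as np

def vif_ridge(X, k=0.0):
    """VIFs of standardized predictors under a ridge factor k.

    Uses VIF_j(k) = [(R + kI)^-1 R (R + kI)^-1]_jj, which reduces to
    diag(R^-1), the ordinary VIF, when k = 0.
    """
    R = np.corrcoef(X, rowvar=False)              # correlation matrix of the predictors
    A = np.linalg.inv(R + k * np.eye(R.shape[0]))
    return np.diag(A @ R @ A)

def smallest_mitigating_k(X, threshold=10.0, grid=np.linspace(0.0, 1.0, 1001)):
    """Smallest ridge factor on a grid for which every VIF drops below the
    usual threshold of 10 (a simplified stand-in for the determinant-based rule)."""
    for k in grid:
        if vif_ridge(X, k).max() < threshold:
            return k
    return None

# toy collinear design: the second column is almost a copy of the first
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=200), rng.normal(size=200)])
print("det(R):", round(np.linalg.det(np.corrcoef(X, rowvar=False)), 4))
print("VIF at k = 0:", vif_ridge(X).round(1))
print("smallest k with all VIF < 10:", smallest_mitigating_k(X))
```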

    From Points to Probability Measures: Statistical Learning on Distributions with Kernel Mean Embedding

    The dissertation presents a novel learning framework on probability measures which has abundant real-world applications. In the classical setup, it is assumed that the data are points drawn independently and identically (i.i.d.) from some unknown distribution. In many scenarios, however, representing data as distributions may be preferable. For instance, when the measurement is noisy, we may tackle the uncertainty by treating the data themselves as distributions, which is often the case for microarray and astronomical data where the measurement process is imprecise and replication is often required. Distributions not only embody individual data points, but also constitute information about their interactions, which can be beneficial for structural learning in high-energy physics, cosmology, causality, and so on. Moreover, classical problems in statistics such as statistical estimation, hypothesis testing, and causal inference may be interpreted in a decision-theoretic sense as machine learning problems on empirical distributions. Rephrasing these problems as such leads to novel approaches for statistical inference and estimation. Hence, allowing learning algorithms to operate directly on distributions prompts a wide range of future applications. To work with distributions, the key methodology adopted in this thesis is the kernel mean embedding of distributions, which represents each distribution as a mean function in a reproducing kernel Hilbert space (RKHS). In particular, the kernel mean embedding has been applied successfully in two-sample testing, graphical models, and probabilistic inference. On the other hand, this thesis focuses mainly on predictive learning on distributions, i.e., when the observations are distributions and the goal is to make predictions about previously unseen distributions. More importantly, the thesis investigates kernel mean estimation, which is one of the most fundamental problems of kernel methods. Probability distributions, as opposed to data points, constitute information at a higher level, such as the aggregate behavior of data points, how the underlying process evolves over time and domains, and complex concepts that cannot be described merely by individual points. Intelligent organisms have the ability to recognize and exploit such information naturally. Thus, this work may shed light on the future development of intelligent machines and, most importantly, may provide clues on the true meaning of intelligence.
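    As a concrete illustration of the kernel mean embedding machinery (not taken from the thesis itself), the sketch below computes the empirical mean embeddings of two samples in a Gaussian RBF RKHS and their squared maximum mean discrepancy (MMD), the quantity behind the two-sample tests mentioned above; the data and bandwidth are made up for the example:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    """Squared MMD between the empirical kernel mean embeddings of two
    samples (biased V-statistic estimator)."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(300, 2))   # sample from P
Y = rng.normal(0.5, 1.0, size=(300, 2))   # sample from Q (shifted mean)
print("MMD^2(P, P') ~", mmd2(X, rng.normal(0.0, 1.0, size=(300, 2))))
print("MMD^2(P, Q)  ~", mmd2(X, Y))
```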

    Modeling spatial variability and transport processes in a glacial till soil

    The objective of this study was to investigate the spatial variability of different physical and chemical properties of soils, and their role in transport processes of chemicals to groundwater sources in a glacial till soil of central Iowa. Measurements were made to determine the in-situ saturated hydraulic conductivity (K_sat) of a glacial till soil at sixty-six sites in a tillage-established plot. One hundred thirty-two data points on K_sat, collected at two soil depths along two bisecting perpendicular transects, were used to develop semivariogram models in conjunction with a split-window median polish approach. A nested structure with an overall range of 60 m was found for K_sat at a depth of 30 cm below the soil surface. K_sat values at 15 cm depth were found to be structureless random noise. Another data set was collected on nitrate-nitrogen (NO3-N) concentration in soil water, soil moisture content, and soil profile NO3-N content in the same field under two different tillage practices using a different sampling pattern. Data on NO3-N concentration in the soil water, collected at 175 grid points arranged on a three-dimensional (3-D) grid, were compared for spatial distribution patterns as a function of the tillage system. Results of this study indicated a transitional spatial structure of the NO3-N distribution, both in the vertical and horizontal directions, under conventional tillage. In contrast, nugget and linear type semivariograms were observed for the no-tillage system in the vertical and horizontal directions, respectively. Data on soil moisture content, NO3-N concentration in soil water, and total NO3-N content in the soil profile, collected at five soil depths (30, 60, 90, 120, and 150 cm below the ground surface) in a tile-drained plot, were studied for coregionalization. This study indicated that well-structured cross-semivariograms existed between depths of 60 and 90 cm, and 90 and 120 cm, for NO3-N concentration and soil moisture content. A strong negative correlation between soil moisture content and NO3-N concentration resulted in negative cross-semivariograms at 90 and 120 cm depths. A deterministic simulation model was developed with an effective hydraulic conductivity parameter based on the spatial correlation length, in place of an average hydraulic conductivity parameter, to simulate the major water and nitrate transport processes for predicting NO3-N losses to subsurface drainage systems. (Abstract shortened by UMI.)
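    For readers unfamiliar with the geostatistical tooling, here is a minimal sketch of the classical (Matheron) experimental semivariogram along a one-dimensional transect; the transect, lag bins, and simulated field are hypothetical, and the split-window median polish step used in the study is omitted:

```python
import numpy as np

def empirical_semivariogram(coords, values, lags, tol):
    """Classical estimator: gamma(h) is the mean of 0.5 * (z_i - z_j)^2 over
    pairs whose separation falls within tol of each lag h."""
    d = np.abs(coords[:, None] - coords[None, :])        # pairwise distances (1-D transect)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2  # half squared differences
    gamma = []
    for h in lags:
        mask = (np.abs(d - h) <= tol) & (d > 0)
        gamma.append(sq[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

# hypothetical 1-D transect of a spatially correlated soil property
rng = np.random.default_rng(2)
x = np.arange(0.0, 300.0, 5.0)                    # sample locations (m)
field = 0.1 * np.cumsum(rng.normal(size=x.size))  # crude spatially correlated signal
lags = np.arange(5.0, 120.0, 10.0)
print(empirical_semivariogram(x, field, lags, tol=2.5).round(3))
```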

    Optimal Fusion Estimation with Multi-Step Random Delays and Losses in Transmission

    This paper is concerned with the optimal fusion estimation problem in networked stochastic systems with bounded random delays and packet dropouts, which unavoidably occur during data transmission in the network. The measured outputs from each sensor are perturbed by random parameter matrices and white additive noises, which are cross-correlated between the different sensors. Least-squares fusion linear estimators, including filter, predictor and fixed-point smoother, as well as the corresponding estimation error covariance matrices, are designed via the innovation analysis approach. The proposed recursive algorithms depend on the delay probabilities at each sampling time, but do not need to know whether a particular measurement is delayed or not. Moreover, knowledge of the signal evolution model is not required, as the algorithms need only the first and second order moments of the processes involved. Some of the practical situations covered by the proposed system model with random parameter matrices are analyzed, and the influence of the delays on the estimation accuracy is examined in a numerical example. This research is supported by the “Ministerio de Economía y Competitividad” and “Fondo Europeo de Desarrollo Regional” FEDER (Grant No. MTM2014-52291-P).
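    The estimators themselves are derived in the paper via the innovation approach; as a much simpler point of reference, the sketch below is a centralized-fusion Kalman filter in which stacked measurements from two sensors update one scalar random-walk state. It ignores the random delays, packet dropouts and random parameter matrices, and the measurement-noise covariance is taken diagonal (cross-correlation would go in its off-diagonal entries); all models and noise levels are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T, q = 100, 0.01                          # time steps, process-noise variance
R = np.diag([0.2, 0.5])                   # joint measurement-noise covariance of the two sensors
H = np.array([[1.0], [1.0]])              # both sensors observe the scalar signal directly

x_true = np.cumsum(np.sqrt(q) * rng.normal(size=T))        # random-walk signal
noise = rng.multivariate_normal(np.zeros(2), R, size=T).T  # sensor noise, shape (2, T)
Z = H @ x_true[None, :] + noise                            # stacked measurements

x_hat, P = 0.0, 1.0
for t in range(T):
    P = P + q                             # time update (random-walk model)
    S = H * P @ H.T + R                   # innovation covariance (2 x 2)
    K = P * H.T @ np.linalg.inv(S)        # fusion gain (1 x 2)
    innov = Z[:, t] - H[:, 0] * x_hat     # stacked innovation
    x_hat = x_hat + (K @ innov).item()    # measurement update
    P = (P - K @ H * P).item()
print("final estimation error:", abs(x_hat - x_true[-1]))
```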

    Improving the predictability of the oil–US stock nexus: The role of macroeconomic variables

    In this study, we revisit the oil–stock nexus by accounting for the role of macroeconomic variables and testing their in-sample and out-of-sample predictive powers. We follow the approaches of Lewellen (2004) and Westerlund and Narayan (2015), which were formulated into a linear multi-predictive form by Makin et al. (2014) and Salisu et al. (2018) and a nonlinear multi-predictive model by Salisu and Isah (2018). Thereafter, we extend the multi-predictive model to account for structural breaks and asymmetries. Our analyses are conducted on aggregate and sectoral stock price indexes for the US stock market. Our proposed predictive model, which accounts for macroeconomic variables, outperforms the oil-based single-factor variant in forecasting aggregate and sectoral US stocks for both in-sample and out-of-sample forecasts. We find that it is important to account for structural breaks in our proposed predictive model, although asymmetries do not seem to improve predictability. In addition, we show that it is important to pre-test the predictors for persistence, endogeneity, and conditional heteroscedasticity, particularly when modeling with high-frequency series. Our results are robust to different forecast measures and forecast horizons.
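    A minimal sketch of the kind of out-of-sample exercise described here, assuming a plain expanding-window OLS predictive regression rather than the bias-adjusted Westerlund-Narayan estimator; the oil and macro series are simulated, and the out-of-sample R² is computed against the oil-only single-factor benchmark:

```python
import numpy as np

def recursive_oos_forecasts(y, X, window):
    """One-step-ahead forecasts of y_{t+1} from an expanding-window OLS
    regression on the lagged predictors X_t (a plain benchmark, not the
    bias-adjusted FGLS estimator of Westerlund and Narayan, 2015)."""
    preds = []
    for t in range(window, len(y) - 1):
        A = np.column_stack([np.ones(t), X[:t]])               # design: X_0 .. X_{t-1}
        beta, *_ = np.linalg.lstsq(A, y[1:t + 1], rcond=None)  # targets: y_1 .. y_t
        preds.append(np.r_[1.0, X[t]] @ beta)                  # forecast y_{t+1}
    return np.array(preds), y[window + 1:]

rng = np.random.default_rng(4)
T = 400
oil = rng.normal(size=T)                  # hypothetical oil-return predictor
macro = rng.normal(size=(T, 2))           # hypothetical macro predictors
ret = np.zeros(T)                         # stock returns respond to lagged predictors
ret[1:] = 0.1 * oil[:-1] + 0.05 * macro[:-1, 0] + 0.5 * rng.normal(size=T - 1)

f_multi, actual = recursive_oos_forecasts(ret, np.column_stack([oil, macro]), window=200)
f_single, _ = recursive_oos_forecasts(ret, oil[:, None], window=200)
oos_r2 = 1.0 - np.sum((actual - f_multi) ** 2) / np.sum((actual - f_single) ** 2)
print("out-of-sample R^2 of the multi-predictor model vs the oil-only model:", round(oos_r2, 3))
```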

    Modeling and Optimization of Stochastic Process Parameters in Complex Engineering Systems

    For quality engineering researchers and practitioners, a wide range of statistical tools and techniques is available for use in the manufacturing industry. The objective in applying these tools has always been to improve or optimize a product or process in terms of efficiency, production cost, or product quality. While tremendous progress has been made in the design of quality optimization models, there remains a significant gap between existing research and the needs of the industrial community. Contemporary manufacturing processes are inherently more complex: they may involve multiple stages of production or require the assessment of multiple quality characteristics. New and emerging fields, such as nanoelectronics and molecular biometrics, demand degrees of precision and estimation that are not attainable with current tools and measures. And since most researchers focus on a specific type of characteristic or a given set of conditions, there are many critical industrial processes to which existing models are not applicable. Thus, the objective of this research is to improve existing techniques by expanding not only their range of applicability, but also their ability to model a given process more realistically. Several quality models are proposed that seek greater precision in the estimation of process parameters and the removal of assumptions that limit their breadth and scope. An extension is made to examine the effectiveness of these models both in non-standard conditions and in areas that have not been previously investigated. Upon completion of an in-depth literature review, various quality models are proposed, and numerical examples are used to validate the use of these methodologies.

    Study of the Kalman filter for arrhythmia detection with intracardiac electrograms

    Third-generation implantable antitachycardia devices offer tiered therapy to reverse ventricular fibrillation (VF) by defibrillation and ventricular tachycardia (VT) by low-energy cardioversion or antitachycardia pacing. The schemes for detecting cardiac arrhythmias often misclassify nonpathologic tachycardia as a serious arrhythmia and deliver false shocks. In this study, an arrhythmia classification technique has been developed using a Kalman filter applied to a cyclostationary autoregressive model. The new algorithm was developed with a training set of 24 arrhythmia passages and tested on a different data set of 29 arrhythmia passages. The algorithm provides 100% detection of VF on the test set. 77.8% of VTs were detected correctly, while 16.7% of VTs were diagnosed as sinus rhythm and 5.5% were detected as VF.
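    A minimal sketch of the underlying mechanic, assuming a Kalman filter that tracks time-varying AR(2) coefficients of a signal through a random-walk state model; the synthetic "electrogram", noise levels, and the classification step that the study builds on top of the coefficients are not taken from the original work:

```python
import numpy as np

def kalman_ar_track(y, p=2, q=1e-4, r=1e-2):
    """Track time-varying AR(p) coefficients a_t with a Kalman filter.

    State model:        a_t = a_{t-1} + w_t                      (var q)
    Observation model:  y_t = [y_{t-1} ... y_{t-p}] a_t + v_t    (var r)
    Returns one row of coefficient estimates per sample.
    """
    a, P = np.zeros(p), np.eye(p)
    coeffs = np.zeros((len(y), p))
    for t in range(p, len(y)):
        h = y[t - p:t][::-1]               # regressor of the p most recent samples
        P = P + q * np.eye(p)              # predict (random-walk coefficients)
        s = h @ P @ h + r                  # innovation variance
        k = P @ h / s                      # Kalman gain
        a = a + k * (y[t] - h @ a)         # update coefficient estimate
        P = P - np.outer(k, h) @ P
        coeffs[t] = a
    return coeffs

rng = np.random.default_rng(5)
n = np.arange(2000)
y = np.sin(2 * np.pi * n / 40) + 0.1 * rng.normal(size=n.size)   # quasi-periodic test signal
a_hat = kalman_ar_track(y)
# a noise-free sinusoid with period 40 has AR(2) coefficients about [1.98, -1.0]
print("final AR(2) coefficient estimates:", a_hat[-1].round(3))
```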

    Models for calculating confidence intervals for neural networks

    This research focused on coding and analyzing existing models to calculate confidence intervals on the results of neural networks. The three techniques for determining confidence intervals were non-linear regression, bootstrap estimation, and maximum likelihood estimation, and all three were coded in Visual Basic. The neural network used the backpropagation algorithm with an input layer, one hidden layer, and an output layer with one unit. The hidden layer had a logistic (binary sigmoidal) activation function and the output layer had a linear activation function. These techniques were tested on various data sets with and without additional noise. Out of the eight cases studied, non-linear regression and bootstrapping each had the four lowest values for the average coverage probability minus the nominal probability. Over the average coverage probabilities minus the nominal probabilities of all data sets, the bootstrap estimation obtained the lowest values. The ranges and standard deviations of the coverage probabilities over 15 simulations for the three techniques were computed, and it was observed that non-linear regression gave the most consistent results, with the smallest range and standard deviation, while bootstrapping had the largest ranges and standard deviations. The bootstrap estimation technique gave a slightly better average coverage probability (CP) minus nominal value than the non-linear regression method, but it had considerably more variation in individual simulations. The maximum likelihood estimation had the poorest results with respect to the average CP minus nominal values.
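    As an illustration of the bootstrap technique studied here (reimplemented from scratch in Python rather than ported from the original Visual Basic code), the sketch below trains a small one-hidden-layer network with logistic hidden units and a linear output, then forms percentile bootstrap prediction intervals; the data, network size, and number of resamples are arbitrary:

```python
import numpy as np

def fit_nn(x, y, hidden=5, lr=0.05, epochs=2000, rng=None):
    """One-hidden-layer network (logistic hidden units, linear output unit)
    trained by full-batch gradient descent on squared error; returns a
    prediction function."""
    rng = rng if rng is not None else np.random.default_rng()
    W1 = rng.normal(scale=0.5, size=(1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
    X = x[:, None]
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))    # hidden activations
        err = ((H @ W2 + b2).ravel() - y)[:, None]  # prediction error, N x 1
        gW2 = H.T @ err / len(y); gb2 = err.mean(0)
        dH = err @ W2.T * H * (1.0 - H)             # backpropagated error signal
        gW1 = X.T @ dH / len(y); gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return lambda xq: (1.0 / (1.0 + np.exp(-(xq[:, None] @ W1 + b1))) @ W2 + b2).ravel()

def bootstrap_interval(x, y, xq, B=50, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the network prediction at the query
    points xq: retrain on B resampled data sets and take the percentiles."""
    rng = np.random.default_rng(seed)
    preds = np.empty((B, len(xq)))
    for b in range(B):
        idx = rng.integers(0, len(x), len(x))       # resample with replacement
        preds[b] = fit_nn(x[idx], y[idx], rng=rng)(xq)
    return np.percentile(preds, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)

rng = np.random.default_rng(6)
x = np.linspace(-2.0, 2.0, 80)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)
lo, hi = bootstrap_interval(x, y, np.array([0.0, 1.5]))
print("95% bootstrap intervals at x = 0.0 and x = 1.5:",
      list(zip(lo.round(2), hi.round(2))))
```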

    Data-driven modeling and monitoring of fuel cell performance

    A mathematical framework that provides practical guidelines for user adoption is proposed for fuel cell performance evaluation. By leveraging this framework, two measures that describe the average and worst-case performance are presented. To facilitate the computation of the performance measures in a practical setting, we model the distribution of the voltages at different current points as a Gaussian process. The minimum number of samples needed to estimate the performance measures is then obtained using information-theoretic notions. Furthermore, we introduce a sensing algorithm that finds the current points that are maximally informative about the voltage. Observing the voltages at the points identified by the proposed algorithm enables the user to estimate the voltages at the unobserved points. The proposed performance measures and the corresponding results are validated on a fuel cell dataset provided by an industrial user, and the conclusions coincide with the judgement of the fuel cell manufacturer.
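    A minimal sketch of the modeling idea, assuming a zero-mean Gaussian process with an RBF kernel over current points and a greedy maximum-posterior-variance rule as a stand-in for the sensing algorithm; the polarisation curve, kernel hyperparameters, and noise level are invented for the example:

```python
import numpy as np

def rbf(a, b, ell=0.3, sf=1.0):
    """Squared-exponential kernel between 1-D current points."""
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def gp_posterior(x_obs, y_obs, x_star, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_star)
    mean = Ks.T @ np.linalg.solve(K, y_obs)
    var = np.diag(rbf(x_star, x_star) - Ks.T @ np.linalg.solve(K, Ks)) + noise
    return mean, var

def greedy_informative_points(candidates, n_pick, noise=1e-4):
    """Greedily pick current points of maximum posterior variance (an
    entropy-style heuristic standing in for the sensing algorithm)."""
    picked = [candidates[len(candidates) // 2]]   # start near the middle of the range
    for _ in range(n_pick - 1):
        _, var = gp_posterior(np.array(picked), np.zeros(len(picked)), candidates, noise)
        picked.append(candidates[int(np.argmax(var))])
    return np.array(picked)

# hypothetical polarisation curve: voltage drops as current density rises
currents = np.linspace(0.0, 1.0, 50)
true_v = 1.0 - 0.3 * currents - 0.1 * np.log1p(5.0 * currents)
sense_at = greedy_informative_points(currents, n_pick=5)
v_obs = (np.interp(sense_at, currents, true_v)
         + 0.005 * np.random.default_rng(7).normal(size=sense_at.size))
mean, var = gp_posterior(sense_at, v_obs, currents)
print("sensed currents:", sense_at.round(2))
print("max predictive std over the full curve:", round(float(np.sqrt(var).max()), 4))
```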

    An empirical comparison of the performance of alternative option pricing models

    Published as an article in: Investigaciones Economicas, 2005, vol. 29, issue 3, pages 483-523. This paper presents a comparison of alternative option pricing models based on neither jump-diffusion nor stochastic volatility data-generating processes. We assume either a smooth volatility function of some previously defined explanatory variables or a model in which discrete-based observations can be employed to estimate both path-dependent volatility and the negative correlation between volatility and underlying returns. Moreover, we also allow for liquidity frictions to recognize that underlying markets may not be fully integrated. The simplest models tend to present superior out-of-sample performance and better hedging ability, although the model with liquidity costs seems to display better in-sample behavior. However, none of the models seems able to capture the rapidly changing distribution of the underlying index return or the net buying pressure characterizing option markets. Eva Ferreira and Gonzalo Rubio acknowledge the financial support provided by Ministerio de Ciencia y Tecnología grant BEC2001-0636.
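    As a small illustration of the "smooth volatility function of previously defined explanatory variables" idea (an ad hoc deterministic-volatility sketch, not the exact specification or data of the paper), the code below fits implied volatility as a quadratic in moneyness and maturity and feeds the fitted value into the Black-Scholes formula:

```python
import numpy as np
from math import erf, exp, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes European call price."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def fit_vol_surface(moneyness, maturity, implied_vol):
    """Least-squares fit of a smooth (quadratic) volatility function of
    moneyness and maturity, in the spirit of ad hoc deterministic-volatility models."""
    A = np.column_stack([np.ones_like(moneyness), moneyness, moneyness**2,
                         maturity, moneyness * maturity])
    coef, *_ = np.linalg.lstsq(A, implied_vol, rcond=None)
    return lambda m, t: np.array([1.0, m, m**2, t, m * t]) @ coef

# hypothetical cross-section of observed implied volatilities (smile plus term slope)
rng = np.random.default_rng(8)
m = rng.uniform(0.85, 1.15, 200)              # moneyness K / S
t = rng.uniform(0.05, 1.0, 200)               # years to maturity
iv = 0.18 + 0.4 * (m - 1.0) ** 2 - 0.02 * t + 0.01 * rng.normal(size=200)
vol_fn = fit_vol_surface(m, t, iv)

S, r = 100.0, 0.02
K, T = 95.0, 0.25
sigma_hat = float(vol_fn(K / S, T))
print("fitted vol:", round(sigma_hat, 4), " call price:", round(bs_call(S, K, T, r, sigma_hat), 3))
```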