1,174 research outputs found

    Bandwidth Selection for Multivariate Kernel Density Estimation Using MCMC

    Kernel density estimation for multivariate data is an important technique with a wide range of applications in econometrics and finance. However, it has received significantly less attention than its univariate counterpart, mainly because deriving an optimal data-driven bandwidth becomes harder as the dimension of the data increases. We provide Markov chain Monte Carlo (MCMC) algorithms for estimating optimal bandwidth matrices for multivariate kernel density estimation. Our approach treats the elements of the bandwidth matrix as parameters whose posterior density can be obtained through the likelihood cross-validation criterion. Numerical studies for bivariate data show that the MCMC algorithm generally performs better than the plug-in algorithm under the Kullback-Leibler information criterion, and is as good as the plug-in algorithm under the mean integrated squared errors (MISE) criterion. Numerical studies for five-dimensional data show that our algorithm is superior to the normal reference rule. Our MCMC algorithm is the first data-driven bandwidth selector for kernel density estimation with more than two variables, and the sampling algorithm involves no increased difficulty as the dimension of the data increases. Keywords: bandwidth matrices; cross-validation; Kullback-Leibler information; mean integrated squared errors; sampling algorithms.
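The idea in the abstract above, treating bandwidths as parameters whose posterior is built from the likelihood cross-validation criterion, can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a diagonal bandwidth matrix, a Gaussian product kernel, a flat prior on the log-bandwidths, and a plain random-walk Metropolis sampler. The function names `loo_log_likelihood` and `sample_bandwidths` are hypothetical.

```python
import numpy as np


def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood of a Gaussian product-kernel KDE.

    data: array of shape (n, d); h: array of d positive bandwidths
    (a diagonal bandwidth matrix, which is a simplifying assumption).
    """
    n, d = data.shape
    # Pairwise differences scaled by the bandwidths, shape (n, n, d).
    z = (data[:, None, :] - data[None, :, :]) / h
    log_k = (-0.5 * np.sum(z ** 2, axis=2)
             - 0.5 * d * np.log(2.0 * np.pi)
             - np.sum(np.log(h)))
    # Exclude each point's own kernel from its density estimate.
    np.fill_diagonal(log_k, -np.inf)
    # Log of the average over the other n - 1 points (log-sum-exp for stability).
    m = np.max(log_k, axis=1, keepdims=True)
    log_f = m[:, 0] + np.log(np.sum(np.exp(log_k - m), axis=1)) - np.log(n - 1)
    return float(np.sum(log_f))


def sample_bandwidths(data, n_iter=2000, step=0.1, seed=0):
    """Random-walk Metropolis on log-bandwidths (flat prior assumed)."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # Start from the normal reference rule for a Gaussian kernel.
    h0 = np.std(data, axis=0) * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    log_h = np.log(h0)
    current = loo_log_likelihood(data, np.exp(log_h))
    draws = np.empty((n_iter, d))
    for i in range(n_iter):
        proposal = log_h + step * rng.normal(size=d)
        candidate = loo_log_likelihood(data, np.exp(proposal))
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate - current:
            log_h, current = proposal, candidate
        draws[i] = np.exp(log_h)
    return draws
```

A bandwidth estimate would then be a posterior summary such as `sample_bandwidths(data)[burn_in:].mean(axis=0)`, where `burn_in` is chosen by the user. Extending the sketch to a full bandwidth matrix, as the abstract describes, would require sampling the off-diagonal elements as well.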

    Bandwidth Selection for Multivariate Kernel Density Estimation Using MCMC

    We provide Markov chain Monte Carlo (MCMC) algorithms for computing the bandwidth matrix for multivariate kernel density estimation. Our approach is based on treating the elements of the bandwidth matrix as parameters to be estimated, which we do by optimizing the likelihood cross-validation criterion. Numerical results show that the resulting bandwidths are superior to all existing methods; for dimensions greater than two, our algorithm is the first practical method for estimating the optimal bandwidth matrix. Moreover, the MCMC algorithm for bandwidth selection for multivariate data has no increased difficulty as the dimension of data increases. Keywords: bandwidth selection; cross-validation; multivariate kernel density estimation; sampling algorithms.

    Local Linear Forecasts Using Cubic Smoothing Splines

    We show how cubic smoothing splines fitted to univariate time series data can be used to obtain local linear forecasts. Our approach is based on a stochastic state space model which allows the use of a likelihood approach for estimating the smoothing parameter, and which enables easy construction of prediction intervals. We show that our model is a special case of an ARIMA(0,2,2) model and we provide a simple upper bound for the smoothing parameter to ensure an invertible model. We also show that the spline model is not a special case of Holt's local linear trend method. Finally, we compare the spline forecasts with Holt's forecasts and those obtained from the full ARIMA(0,2,2) model, showing that the restricted parameter space does not impair forecast performance. Keywords: ARIMA models; exponential smoothing; Holt's local linear forecasts; maximum likelihood estimation; nonparametric regression; smoothing splines; state space model; stochastic trends.

    Organizational Chart Inference

    To facilitate communication and cooperation among employees, many companies have adopted a new family of online social networks called "enterprise social networks" (ESNs). ESNs can provide employees with various professional services to help them deal with daily work issues. Meanwhile, employees in companies are usually organized into different hierarchies according to the relative ranks of their positions. A company's internal management structure can be outlined visually with an organizational chart, which is normally kept confidential out of privacy and security concerns. In this paper, we study the IOC (Inference of Organizational Chart) problem: identifying a company's internal organizational chart from the heterogeneous online ESN launched in it. IOC is very challenging to address because, to guarantee smooth operations, the internal organizational charts of companies need to meet certain structural requirements (on depth and width). To solve the IOC problem, a novel unsupervised method, Create (ChArT REcovEr), is proposed in this paper. It consists of 3 steps: (1) social stratification of ESN users into different social classes, (2) supervision link inference from managers to subordinates, and (3) consecutive social classes matching to prune the redundant supervision links. Extensive experiments conducted on a real-world online ESN dataset demonstrate that Create can perform very well in addressing the IOC problem. Comment: 10 pages, 9 figures, 1 table. The paper is accepted by KDD 201

    Assessing the effects of exposure timing on biomarker expression using β-estradiol

    Temporal and spatial variability in estrogenicity has been documented for many treated wastewater effluents, with the consequences of this variability on the expression of biomarkers of endocrine disruption being largely unknown. Laboratory exposure studies usually utilize constant exposure concentrations, which may produce biological effects that differ from those observed in organisms exposed in natural environments. In this study, we investigated the effects of differential timing of exposures with 17β-estradiol (E2) on a range of fathead minnow biomarkers to simulate diverse environmentally relevant exposure profiles. Two 21-day, replicate experiments were performed exposing mature male fathead minnows to E2 at time-weighted mean concentrations (similar average exposure to the contaminant during the 21-day exposure period; 17 ng E2/L experiment 1; 12 ng E2/L experiment 2) comparable to E2 equivalency values (EEQ) reported for several anthropogenically altered environments. A comparable time-weighted mean concentration of E2 was applied to five treatments which varied in the daily application schema: E2 was either applied at a steady rate (ST), in a gradually decreasing concentration (HI), in a gradually increasing concentration (LO), intermittently (IN), or at a randomly varying concentration (VA). We assessed a range of widely used physiological (vitellogenin mRNA induction and plasma concentrations), anatomical (body and organ indices, secondary sex characteristics, and histopathology), and behavioral (nest holding) biomarkers reported to change following exposure to endocrine active compounds (EACs). All treatments responded with a rise in plasma vitellogenin concentration when compared with the ethanol carrier control. Predictably, vitellogenin mRNA induction, which tracked closely with plasma vitellogenin concentrations in most treatments, was not elevated in the HI treatment, presumably due to the lack of E2 exposure immediately prior to analysis. The ability of treatment male fish to hold nest sites in direct competition with control males was sensitive to E2 exposure and did yield statistically significant differences between treatments and carrier control. Other biological endpoints assessed in this study (organosomatic indices, secondary sex characteristics) varied little between treatments and controls. This study indicates that a broad suite of endpoints is necessary to fully assess the biological consequences of fish exposure to estrogens and that, at least for field studies, a combination of vitellogenin mRNA and plasma vitellogenin analysis is most promising in deciphering exposure histories of wild-caught and caged fishes.
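The design above depends on different daily dosing schedules sharing one time-weighted mean concentration. The short sketch below shows that arithmetic only. The daily values are hypothetical illustrations chosen to average 17 ng/L over 21 days; they are not the concentrations used in the study, and `time_weighted_mean` is a name introduced here.

```python
def time_weighted_mean(concentrations, durations):
    """Mean concentration weighted by how long each concentration was applied."""
    total_time = sum(durations)
    weighted_sum = sum(c * t for c, t in zip(concentrations, durations))
    return weighted_sum / total_time


DAYS = 21
one_day_each = [1] * DAYS

# Hypothetical profiles (ng E2/L per day), for illustration only.
steady = [17.0] * DAYS                                   # ST-like: constant dose
decreasing = [34.0 - 34.0 * i / 20 for i in range(DAYS)]  # HI-like: high start, ends at 0
increasing = list(reversed(decreasing))                   # LO-like: starts at 0, high end

for name, profile in [("steady", steady),
                      ("decreasing", decreasing),
                      ("increasing", increasing)]:
    print(name, round(time_weighted_mean(profile, one_day_each), 3))
```

All three hypothetical profiles give the same time-weighted mean even though the decreasing profile ends with no exposure on the final day, which is the kind of situation the abstract cites for the HI treatment.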

    Temperature time series forecasting in The Optimal Challenges in Irrigation (TO CHAIR)

    Forecasting weather time series has always been a difficult area of research, with slow progress over the years. The main challenge in this project, The Optimal Challenges in Irrigation (TO CHAIR), is to study how to manage irrigation problems as an optimal control problem: the daily irrigation problem of minimizing water consumption. This requires estimating and forecasting weather variables in real time in each monitored irrigation area. These time series present strong trends and high-frequency seasonality. How best to model and forecast these patterns has been a long-standing issue in time series analysis. This study compares the forecasting performance of TBATS (Trigonometric Seasonal, Box-Cox Transformation, ARMA errors, Trend and Seasonal Components) and regression with correlated errors models. These methods were chosen for their ability to model the trend and seasonal fluctuations present in weather data, particularly in time series with complex (multiple) seasonal patterns. The forecasting performance is demonstrated through a case study of a weather time series: minimum air temperature.

    Stochastic Feedback and the Regulation of Biological Rhythms

    We propose a general approach to the question of how biological rhythms spontaneously self-regulate, based on the concept of "stochastic feedback". We illustrate this approach by considering the neuroautonomic regulation of the heart rate. The model generates complex dynamics and successfully accounts for key characteristics of cardiac variability, including the 1/f power spectrum, the functional form and scaling of the distribution of variations, and correlations in the Fourier phases. Our results suggest that in healthy systems the control mechanisms operate to drive the system away from extreme values while not allowing it to settle down to a constant output. Comment: 15 pages, latex2e using rotate and epsf, with 4 ps figures. Submitted to PR

    The substantive and practical significance of citation impact differences between institutions: Guidelines for the analysis of percentiles using effect sizes and confidence intervals

    In our chapter we address the statistical analysis of percentiles: How should the citation impact of institutions be compared? In educational and psychological testing, percentiles are already used widely as a standard to evaluate an individual's test scores - intelligence tests for example - by comparing them with the percentiles of a calibrated sample. Percentiles, or percentile rank classes, are also a very suitable method for bibliometrics to normalize citations of publications in terms of the subject category and the publication year and, unlike the mean-based indicators (the relative citation rates), percentiles are scarcely affected by skewed distributions of citations. The percentile of a certain publication provides information about the citation impact this publication has achieved in comparison to other similar publications in the same subject category and publication year. Analyses of percentiles, however, have not always been presented in the most effective and meaningful way. New APA guidelines (American Psychological Association, 2010) suggest a lesser emphasis on significance tests and a greater emphasis on the substantive and practical significance of findings. Drawing on work by Cumming (2012) we show how examinations of effect sizes (e.g. Cohen's d statistic) and confidence intervals can lead to a clear understanding of citation impact differences
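As a minimal sketch of the effect-size reporting the abstract above recommends, the following computes Cohen's d for two groups with an approximate 95% confidence interval. Several things here are assumptions rather than the chapter's procedure: the function names are hypothetical, the interval uses a common large-sample approximation to the standard error of d, and the two lists stand in for percentile values of publications from two institutions.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def d_confidence_interval(d, na, nb, z=1.96):
    """Approximate interval for d from a large-sample standard error."""
    se = math.sqrt((na + nb) / (na * nb) + d ** 2 / (2 * (na + nb)))
    return d - z * se, d + z * se


# Hypothetical citation percentiles for two institutions (illustration only).
institution_a = [62.0, 71.5, 55.0, 80.2, 67.3, 74.8, 59.1, 69.4]
institution_b = [48.3, 66.0, 52.7, 61.9, 45.5, 70.1, 57.6, 50.8]

d = cohens_d(institution_a, institution_b)
low, high = d_confidence_interval(d, len(institution_a), len(institution_b))
print(f"d = {d:.2f}, approximate 95% CI [{low:.2f}, {high:.2f}]")
```

Reporting the interval alongside d shows how precisely the difference is estimated, which a significance test alone does not convey.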