275 research outputs found

    Bounds of star discrepancy for HSFC-based sampling

    In this paper, we focus on estimating probabilistic upper bounds on the star discrepancy of Hilbert space-filling curve (HSFC) sampling. The main idea is the stratified random sampling method, but the strict condition N = m^d on the number of samples required by jittered sampling is removed. We inherit the advantages of this sampling and get better results than Monte Carlo (MC) sampling.
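The restriction to N = m^d samples mentioned in this abstract comes from how classical jittered sampling is built: the unit cube is cut into m^d equal subcubes and one uniform point is drawn in each. The following is a minimal sketch of that classical scheme only, not of the paper's HSFC-based method; the names `jittered_sample` and `cell_of` are hypothetical helpers introduced for illustration.

```python
import itertools
import random


def jittered_sample(m, d, rng):
    """Classical jittered sampling: draw one uniform point in each of the
    m**d axis-aligned subcubes of side 1/m that partition [0, 1)^d."""
    points = []
    for cell in itertools.product(range(m), repeat=d):
        # Shift a uniform offset in [0, 1) into the cell, then rescale by 1/m.
        points.append(tuple((c + rng.random()) / m for c in cell))
    return points


def cell_of(point, m):
    """Index of the subcube that contains `point` (hypothetical helper)."""
    return tuple(min(int(x * m), m - 1) for x in point)


rng = random.Random(0)          # fixed seed so the run is repeatable
pts = jittered_sample(m=4, d=2, rng=rng)
cells = {cell_of(p, 4) for p in pts}
```

With m = 4 and d = 2 this always produces exactly 16 points, one per cell, which is why the sample size cannot be chosen freely in the classical construction.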

    Scalable visualisation methods for modern Generalized Additive Models

    In the last two decades the growth of computational resources has made it possible to handle Generalized Additive Models (GAMs) that formerly were too costly for serious applications. However, the growth in model complexity has not been matched by improved visualisations for model development and results presentation. Motivated by an industrial application in electricity load forecasting, we identify the areas where the lack of modern visualisation tools for GAMs is particularly severe, and we address the shortcomings of existing methods by proposing a set of visual tools that a) are fast enough for interactive use, b) exploit the additive structure of GAMs, c) scale to large data sets and d) can be used in conjunction with a wide range of response distributions. All the new visual methods proposed in this work are implemented by the mgcViz R package, which can be found on the Comprehensive R Archive Network

    A note on the spectral analysis of matrix sequences via GLT momentary symbols: from all-at-once solution of parabolic problems to distributed fractional order matrices

    The first focus of this paper is the characterization of the spectrum and the singular values of the coefficient matrix stemming from the discretization with a space-time grid of a parabolic diffusion problem and from the approximation of distributed-order fractional equations. For this purpose we use the classical GLT theory and the new concept of GLT momentary symbols. The former allows us to describe the asymptotic singular value or eigenvalue distribution of the sequence of coefficient matrices; the latter allows us to derive a function that describes the singular value or eigenvalue distribution of a single matrix of the sequence, even for small matrix sizes, but under given assumptions. The note concludes with a list of open problems, including the use of our machinery in the study of iteration matrices, especially those concerning multigrid-type techniques.

    A note on the spectral analysis of matrix sequences via GLT momentary symbols: from all-at-once solution of parabolic problems to distributed fractional order matrices

    The first focus of this paper is the characterization of the spectrum and the singular values of the coefficient matrix stemming from the discretization of a parabolic diffusion problem using a space-time grid and secondly from the approximation of distributed-order fractional equations. For this purpose we use the classical GLT theory and the new concept of GLT momentary symbols. The first permits us to describe the singular value or eigenvalue asymptotic distribution of the sequence of the coefficient matrices. The latter permits us to derive a function that describes the singular value or eigenvalue distribution of the matrix of the sequence, even for small matrix sizes, but under given assumptions. The paper is concluded with a list of open problems, including the use of our machinery in the study of iteration matrices, especially those concerning multigrid-type techniques

    The Effects of Additive Outliers and Measurement Errors when Testing for Structural Breaks in Variance

    This paper discusses the asymptotic and finite-sample properties of CUSUM-based tests for detecting structural breaks in volatility in the presence of stochastic contamination, such as additive outliers or measurement errors. This analysis is particularly relevant for financial data, on which these tests are commonly used to detect variance breaks. In particular, we focus on the tests by Inclán and Tiao [IT] (1994) and Kokoszka and Leipus [KL] (1998, 2000), which have been used intensively in the applied literature. Our results extend to related procedures. We show that the asymptotic distribution of the IT test can be strongly affected by sample contamination, whereas the distribution of the KL test remains invariant. Furthermore, the break-point estimator of the KL test renders consistent estimates. In spite of the good large-sample properties of this test, large additive outliers tend to generate power distortions or wrong break-date estimates in small samples.
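To make the CUSUM-of-squares idea concrete, the sketch below computes the standard form of the Inclán-Tiao statistic, sqrt(T/2) * max_k |C_k / C_T - k / T|, where C_k is the cumulative sum of squared observations of a zero-mean series. It is an illustration under that textbook definition, not the paper's code; the function name `inclan_tiao` and the toy series are assumptions.

```python
import math


def inclan_tiao(series):
    """Return (statistic, k_hat) for the CUSUM-of-squares test.

    `series` is assumed to be zero-mean. k_hat is the 1-based index at which
    |C_k / C_T - k / T| is largest, i.e. the candidate break date.
    """
    T = len(series)
    total = sum(x * x for x in series)      # C_T
    best_k, best_abs = 0, 0.0
    cum = 0.0                               # running C_k
    for k, x in enumerate(series, start=1):
        cum += x * x
        d_k = cum / total - k / T           # centred, normalised CUSUM
        if abs(d_k) > best_abs:
            best_k, best_abs = k, abs(d_k)
    return math.sqrt(T / 2.0) * best_abs, best_k


# Toy series (assumed data): constant variance vs. a variance shift at t = 50.
flat = [1.0, -1.0] * 50
shifted = [1.0, -1.0] * 25 + [3.0, -3.0] * 25

stat_flat, k_flat = inclan_tiao(flat)
stat_shift, k_shift = inclan_tiao(shifted)
```

Because the statistic is built from squared observations, a single very large value in `series` can dominate C_T, which gives some intuition for the sensitivity to additive outliers that the abstract describes.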

    Conditioned Export-Led Growth Hypothesis: A Panel Threshold Regressions Approach

    This paper proposes a reassessment of the export-led growth hypothesis, focusing on conditioning effects from countries' initial level of GDP per worker, human capital stock, and share of exports in GDP. For this purpose, a panel threshold regression technique was applied to selected cross-country panel data covering a broad sample of 72 countries and two sub-samples over the period from 1974 to 2003. Special attention was given to the 5-year data averaging procedure, using panel unit root tests, and to the choice of variable measures, for which a sensitivity analysis is proposed. Overall, the evidence reported favors the export-led growth hypothesis, although the relationship between exports and growth was shown to be less simple than linear specifications would indicate. Keywords: export-led growth, panel threshold regressions, trade and growth

    Prediction of Atomization Energy Using Graph Kernel and Active Learning

    Data-driven prediction of molecular properties presents unique challenges to the design of machine learning methods concerning data structure/dimensionality, symmetry adaptation, and confidence management. In this paper, we present a kernel-based pipeline that can learn and predict the atomization energy of molecules with high accuracy. The framework employs Gaussian process regression to perform predictions based on the similarity between molecules, which is computed using the marginalized graph kernel. To apply the marginalized graph kernel, a spatial adjacency rule is first employed to convert molecules into graphs whose vertices and edges are labeled by elements and interatomic distances, respectively. We then derive formulas for the efficient evaluation of the kernel. Specific functional components for the marginalized graph kernel are proposed, and the effects of the associated hyperparameters on accuracy and predictive confidence are examined. We show that the graph kernel is particularly suitable for predicting extensive properties because its convolutional structure coincides with that of the covariance formula between sums of random variables. Using an active learning procedure, we demonstrate that the proposed method can achieve a mean absolute error of 0.62 ± 0.01 kcal/mol using as few as 2000 training samples on the QM7 data set.

    Dynamic Graphics and Reporting for Statistics

    Statistics as a scientific discipline has a dynamic nature, which can be observed in many statistical algorithms and theories as well as in data analysis. For example, asymptotic theories in statistics are inherently dynamic: they describe how a statistic or an estimator behaves as the sample size increases. Data analysis is almost never a static process. Instead, it is an iterative process involving cleaning, describing, modeling, and re-cleaning the data. Reports may end up being re-written due to changes in the data and analysis. This thesis consists of three parts, addressing the dynamic aspects of statistics and data analysis. In the first part, we show how to explain the ideas behind some statistical methods using animations, followed by an introduction to the design and functionality of the animation package. In the second part, we discuss the design of an interactive statistical graphics system, with an emphasis on the reactive programming paradigm and its connection with the data infrastructure in R, as utilized in the cranvas package. In the third part, we provide a solution to statistical reporting, which is implemented in the knitr package, making use of literate programming. It frees us from the traditional approach of cut-and-paste, and provides a seamless integration of computing and reporting that enhances reproducible research. Demos and examples were given along with the discussion