Bounds of star discrepancy for HSFC-based sampling
In this paper, we focus on estimating the probabilistic upper bounds of star
discrepancy for Hilbert space filling curve (HSFC) sampling. The main idea is
the stratified random sampling method, but the strict condition for sampling
number of jittered sampling is removed. We inherit the advantages of
this sampling and get better results than Monte Carlo (MC) sampling. Comment: 10 pages
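The comparison between stratified and plain Monte Carlo sampling can be illustrated with a minimal sketch. It uses classical jittered sampling on an equal grid of subcubes, not the HSFC-based partition studied in the paper, and the function names (`jittered_sample`, `monte_carlo_sample`, `anchored_box_error`) are hypothetical. The error function evaluates a single anchored box, which only gives a lower bound on the star discrepancy (the supremum over all such boxes).

```python
import numpy as np

def jittered_sample(m, d, rng):
    """Draw one uniform point in each of the m**d equal subcubes of [0, 1)^d."""
    # Integer grid indices of the lower corners of all subcubes.
    grid = np.stack(np.meshgrid(*[np.arange(m)] * d, indexing="ij"), axis=-1)
    corners = grid.reshape(-1, d)
    # One uniform offset per subcube, scaled to the subcube side length 1/m.
    return (corners + rng.random(corners.shape)) / m

def monte_carlo_sample(n, d, rng):
    """Draw n i.i.d. uniform points in [0, 1)^d."""
    return rng.random((n, d))

def anchored_box_error(points, corner):
    """|fraction of points in [0, corner) - volume of [0, corner)| for one box."""
    inside = np.all(points < corner, axis=1).mean()
    return abs(inside - np.prod(corner))

rng = np.random.default_rng(0)
pts_jit = jittered_sample(16, 2, rng)      # 256 stratified points
pts_mc = monte_carlo_sample(256, 2, rng)   # 256 plain Monte Carlo points

corner = np.array([0.37, 0.81])            # an arbitrary test box
err_jit = anchored_box_error(pts_jit, corner)
err_mc = anchored_box_error(pts_mc, corner)
```

Note that jittered sampling forces the number of points to be `m**d`, which is the kind of restriction on the sampling number that the abstract says is removed for HSFC-based sampling.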
Scalable visualisation methods for modern Generalized Additive Models
In the last two decades the growth of computational resources has made it
possible to handle Generalized Additive Models (GAMs) that formerly were too
costly for serious applications. However, the growth in model complexity has
not been matched by improved visualisations for model development and results
presentation. Motivated by an industrial application in electricity load
forecasting, we identify the areas where the lack of modern visualisation tools
for GAMs is particularly severe, and we address the shortcomings of existing
methods by proposing a set of visual tools that a) are fast enough for
interactive use, b) exploit the additive structure of GAMs, c) scale to large
data sets and d) can be used in conjunction with a wide range of response
distributions. All the new visual methods proposed in this work are implemented
by the mgcViz R package, which can be found on the Comprehensive R Archive
Network
A note on the spectral analysis of matrix sequences via GLT momentary symbols: from all-at-once solution of parabolic problems to distributed fractional order matrices
The first focus of this paper is the characterization of the spectrum and the
singular values of the coefficient matrix stemming from the discretization with
a space-time grid for a parabolic diffusion problem and from the approximation of
distributed order fractional equations. For this purpose we will use the
classical GLT theory and the new concept of GLT momentary symbols. The first
permits us to describe the singular value or eigenvalue asymptotic distribution of
the sequence of the coefficient matrices; the latter permits us to derive a
function that describes the singular value or eigenvalue distribution of the
matrix of the sequence, even for small matrix sizes, but under given
assumptions. The note is concluded with a list of open problems, including the
use of our machinery in the study of iteration matrices, especially those
concerning multigrid-type techniques
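As a rough illustration of a symbol describing a spectrum even at small matrix sizes, the sketch below uses the textbook Toeplitz matrix tridiag(-1, 2, -1), whose generating function is f(θ) = 2 - 2 cos θ. This is a classical example chosen for illustration, not one of the space-time or fractional-order matrices treated in the paper, and the helper names are hypothetical.

```python
import numpy as np

def laplacian_1d(n):
    """n x n Toeplitz matrix tridiag(-1, 2, -1)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def symbol(theta):
    """Generating function f(theta) = 2 - 2 cos(theta) of the matrix above."""
    return 2.0 - 2.0 * np.cos(theta)

n = 8
eigs = np.linalg.eigvalsh(laplacian_1d(n))        # eigenvalues in ascending order
theta = np.arange(1, n + 1) * np.pi / (n + 1)     # uniform grid in (0, pi)
samples = symbol(theta)                           # f is increasing on (0, pi)

# For this particular matrix the eigenvalues are exact samples of the symbol.
matches = np.allclose(eigs, samples)
```

For this matrix the agreement is exact at every size; for more general sequences a symbol typically describes the spectrum only asymptotically or under additional assumptions.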
A note on the spectral analysis of matrix sequences via GLT momentary symbols: from all-at-once solution of parabolic problems to distributed fractional order matrices
The first focus of this paper is the characterization of the spectrum and the singular values of the coefficient matrix stemming from the discretization of a parabolic diffusion problem using a space-time grid and secondly from the approximation of distributed-order fractional equations. For this purpose we use the classical GLT theory and the new concept of GLT momentary symbols. The first permits us to describe the singular value or eigenvalue asymptotic distribution of the sequence of the coefficient matrices. The latter permits us to derive a function that describes the singular value or eigenvalue distribution of the matrix of the sequence, even for small matrix sizes, but under given assumptions. The paper is concluded with a list of open problems, including the use of our machinery in the study of iteration matrices, especially those concerning multigrid-type techniques
The Effects of Additive Outliers and Measurement Errors when Testing for Structural Breaks in Variance
This paper discusses the asymptotic and finite-sample properties of CUSUM-based tests for detecting structural breaks in volatility in the presence of stochastic contamination, such as additive outliers or measurement errors. This analysis is particularly relevant for financial data, on which these tests are commonly used to detect variance breaks. In particular, we focus on the tests by Inclán and Tiao [IT] (1994) and Kokoszka and Leipus [KL] (1998, 2000), which have been intensively used in the applied literature. Our results are extensible to related procedures. We show that the asymptotic distribution of the IT test can largely be affected by sample contamination, whereas the distribution of the KL test remains invariant. Furthermore, the break-point estimator of the KL test renders consistent estimates. In spite of the good large-sample properties of this test, large additive outliers tend to generate power distortions or wrong break-date estimates in small samples.
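A minimal sketch of the CUSUM-of-squares statistic in the form usually attributed to Inclán and Tiao may help fix ideas: with C_k the cumulative sum of squares and D_k = C_k / C_T - k / T, the statistic is sqrt(T/2) max_k |D_k|. The sketch assumes a zero-mean series; the function name and the synthetic data are hypothetical illustrations, not the paper's experiments, and the KL test is not implemented here.

```python
import numpy as np

def inclan_tiao_statistic(x):
    """Return (statistic, k_hat) for the CUSUM-of-squares test on a zero-mean series."""
    x = np.asarray(x, dtype=float)
    T = x.size
    C = np.cumsum(x ** 2)                    # C_k = sum of x_t^2 for t <= k
    k = np.arange(1, T + 1)
    D = C / C[-1] - k / T                    # centred cumulative sum of squares
    k_hat = int(np.argmax(np.abs(D))) + 1    # 1-based candidate break date
    return np.sqrt(T / 2.0) * np.abs(D).max(), k_hat

rng = np.random.default_rng(1)

# Synthetic series with a genuine break: variance moves from 1 to 9 at t = 300.
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(0, 3, 300)])
stat, k_hat = inclan_tiao_statistic(x)

# Constant-variance series contaminated by one large additive outlier.
y = rng.normal(0, 1, 600)
y[100] += 15.0
stat_outlier, k_outlier = inclan_tiao_statistic(y)
```

The statistic is usually compared with the asymptotic 5% critical value of about 1.358 for the supremum of a Brownian bridge. Running the second case lets the reader check how a single outlier moves the statistic even though the underlying variance never changes.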
Conditioned Export-Led Growth Hypothesis: A Panel Threshold Regressions Approach
This paper proposes a reassessment of the export-led growth hypothesis focusing on conditioning effects from countries' initial level of GDP per worker, human capital stock, and exports share in GDP. For this purpose a panel threshold regression technique was applied over selected cross-country panel data, covering a broad sample of 72 countries and two sub-samples over the period from 1974 to 2003. Special attention was given to the 5-year data averaging procedure, using panel unit root tests, and to the choice of variable measures, where a sensitivity analysis is proposed. Overall, the evidence reported favors the export-led growth hypothesis, where the relationship between exports and growth was shown to be not as trivial as linear specifications would indicate. Keywords: export-led growth, panel threshold regressions, trade and growth
Prediction of Atomization Energy Using Graph Kernel and Active Learning
Data-driven prediction of molecular properties presents unique challenges to
the design of machine learning methods concerning data
structure/dimensionality, symmetry adaption, and confidence management. In this
paper, we present a kernel-based pipeline that can learn and predict the
atomization energy of molecules with high accuracy. The framework employs
Gaussian process regression to perform predictions based on the similarity
between molecules, which is computed using the marginalized graph kernel. To
apply the marginalized graph kernel, a spatial adjacency rule is first employed
to convert molecules into graphs whose vertices and edges are labeled by
elements and interatomic distances, respectively. We then derive formulas for
the efficient evaluation of the kernel. Specific functional components for the
marginalized graph kernel are proposed, while the effect of the associated
hyperparameters on accuracy and predictive confidence is examined. We show
that the graph kernel is particularly suitable for predicting extensive
properties because its convolutional structure coincides with that of the
covariance formula between sums of random variables. Using an active learning
procedure, we demonstrate that the proposed method can achieve a mean absolute
error of 0.62 ± 0.01 kcal/mol using as few as 2000 training samples on the QM7
data set
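The remark about extensive properties can be made concrete with a short worked identity. The notation below is an assumption introduced for illustration (local contributions ε_v, a vertex-level base kernel κ, and a generic convolution-type graph kernel K), not the paper's exact definition of the marginalized graph kernel.

```latex
% Bilinearity of covariance for sums of random variables:
\operatorname{Cov}\Bigl(\sum_{i} X_i,\ \sum_{j} Y_j\Bigr)
  = \sum_{i}\sum_{j} \operatorname{Cov}(X_i, Y_j).

% Assumption: an extensive property is a sum of local contributions,
% E(G) = \sum_{v \in G} \varepsilon_v, with
% \operatorname{Cov}(\varepsilon_v, \varepsilon_{v'}) = \kappa(v, v').
% Then the covariance between two molecules G and G' is
\operatorname{Cov}\bigl(E(G),\, E(G')\bigr)
  = \sum_{v \in G}\ \sum_{v' \in G'} \kappa(v, v')
  =: K(G, G'),
% which has the same double-sum (convolutional) form as a kernel built
% by summing a base kernel over pairs of substructures.
```

Under these assumptions, a kernel with this double-sum structure is a natural prior covariance for a Gaussian process over properties that scale with system size, such as the atomization energy.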
Dynamic Graphics and Reporting for Statistics
Statistics as a scientific discipline has a dynamic nature, which can be
observed in many statistical algorithms and theories as well as in data
analysis. For example, asymptotic theories in statistics are inherently
dynamic: they describe how a statistic or an estimator behaves as the sample
size increases. Data analysis is almost never a static process. Instead, it
is an iterative process involving cleaning, describing, modeling, and
re-cleaning the data. Reports may end up being re-written due to changes in
the data and analysis.
This thesis consists of three parts, addressing the dynamic aspects of
statistics and data analysis. In the first part, we show how to explain the
ideas behind some statistical methods using animations, followed by an
introduction to the design and functionality of the animation package. In
the second part, we discuss the design of an interactive statistical
graphics system, with an emphasis on the reactive programming paradigm and
its connection with the data infrastructure in R, as utilized in the cranvas
package. In the third part, we provide a solution to statistical reporting,
which is implemented in the knitr package, making use of literate
programming. It frees us from the traditional approach of cut-and-paste, and
provides a seamless integration of computing and reporting that enhances
reproducible research. Demos and examples are given along with the
discussion