4 research outputs found
Conformalized Frequency Estimation from Sketched Data
A flexible conformal inference method is developed to construct confidence
intervals for the frequencies of queried objects in a very large data set,
based on the information contained in a much smaller sketch of those data. The
approach is completely data-adaptive and makes no use of any knowledge of the
population distribution or of the inner workings of the sketching algorithm;
instead, it constructs provably valid frequentist confidence intervals under
the sole assumption of data exchangeability. Although the proposed solution is
much more broadly applicable, this paper explicitly demonstrates its use in
combination with the famous count-min sketch algorithm and a non-linear
variation thereof to facilitate the exposition. The performance is compared to
that of existing frequentist and Bayesian alternatives through several
experiments with synthetic data as well as with real data sets consisting of
SARS-CoV-2 DNA sequences and classic English literature.
Comment: 29 pages, 20 figures, 2 tables
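As a rough illustration of the idea summarised in the abstract above, the following Python sketch builds one-sided conformal intervals around sketch estimates. It is a generic split-conformal construction, not the paper's algorithm. The function name `conformal_intervals` and its inputs are hypothetical. It assumes that the sketch estimate never falls below the true count (as with a count-min sketch), that a calibration set of items with exactly known counts is available, and that calibration items and queries are exchangeable.

```python
import math
from typing import List, Sequence, Tuple


def conformal_intervals(
    cal_upper: Sequence[int],
    cal_true: Sequence[int],
    query_upper: Sequence[int],
    alpha: float,
) -> List[Tuple[int, int]]:
    """Return a (lower, upper) frequency interval for each query.

    cal_upper   : sketch estimates for calibration items (assumed upper bounds)
    cal_true    : exactly known counts for the same calibration items
    query_upper : sketch estimates for the queried items
    alpha       : target miscoverage level, e.g. 0.1 for 90% intervals
    """
    if len(cal_upper) != len(cal_true) or len(cal_upper) == 0:
        raise ValueError("calibration inputs must be non-empty and aligned")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    # Nonconformity score: how far the sketch overestimates the true count.
    scores = sorted(u - t for u, t in zip(cal_upper, cal_true))
    n = len(scores)

    # Finite-sample conformal rank: the ceil((1 - alpha) * (n + 1))-th smallest score.
    k = math.ceil((1 - alpha) * (n + 1))

    intervals = []
    for u in query_upper:
        if k > n:
            # Too few calibration points for this alpha: fall back to the trivial bound.
            lower = 0
        else:
            lower = max(0, u - scores[k - 1])
        intervals.append((lower, u))
    return intervals


# Toy inputs, invented purely for illustration.
example = conformal_intervals(
    cal_upper=[10, 12, 7, 20, 5, 9, 15],
    cal_true=[9, 12, 5, 17, 5, 8, 15],
    query_upper=[4, 1],
    alpha=0.25,
)
```

The upper endpoint is simply the sketch estimate, which is valid only under the stated upper-bound assumption; the conformal step calibrates the lower endpoint alone.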
Conformal Frequency Estimation with Sketched Data under Relaxed Exchangeability
A flexible method is developed to construct a confidence interval for the
frequency of a queried object in a very large data set, based on a much smaller
sketch of the data. The approach requires no knowledge of the data distribution
or of the details of the sketching algorithm; instead, it constructs provably
valid frequentist confidence intervals for random queries using a conformal
inference approach. After achieving marginal coverage for random queries under
the assumption of data exchangeability, the proposed method is extended to
provide stronger inferences accounting for possibly heterogeneous frequencies
of different random queries, redundant queries, and distribution shifts. While
the presented methods are broadly applicable, this paper focuses on use cases
involving the count-min sketch algorithm and a non-linear variation thereof, to
facilitate comparison to prior work. In particular, the developed methods are
compared empirically to frequentist and Bayesian alternatives, through
simulations and experiments with data sets of SARS-CoV-2 DNA sequences and
classic English literature.
Comment: 56 pages, 31 figures, 2 tables. arXiv admin note: substantial text
overlap with arXiv:2204.0427
The risk-return relationship and volatility feedback in South Africa: a nonparametric Bayesian approach.
Masters Degree. University of KwaZulu-Natal, Durban.
The risk-return relationship is a fundamental concept in finance and economic theory
and is also known as the “first fundamental law” in finance. Traditionally, the risk-return
relationship is known to assist individuals in the construction of an efficient
portfolio where a desired risk and return profile is tailored to their needs. However, it
is a source of much more valuable information to various market participants such as
bankers, investors, policy makers and researchers alike. There are a number of
investment strategies, policy frameworks, theories and asset pricing models built on
the empirical result of the risk-return relationship. Hence, the topic of the risk-return
relationship is of broad importance. It has been widely investigated on an international
scale, especially in developed markets from as early as the 1950s, with the primary
motive being to help market participants optimise their chance to earn higher returns.
According to conventional economic theory, the relationship between risk and return
is a positive and linear relationship – the higher the risk, the higher the return.
However, many studies documented in the literature find a positive, a negative,
or no relationship at all. The volume of conflicting results over the years has
given rise to an international and local debate regarding the risk-return
relationship. International studies have explored a number of theories and models
in an attempt to resolve the inconclusive empirical backing of the risk-return
relationship. On the other hand, the methods employed by South African studies and
the volume of literature on the topic are relatively limited.
South Africa is becoming increasingly recognised, liberalised, interactive and
integrated into the international economy. Therefore, this study makes a significant
contribution from a South African market perspective. This study identifies volatility
feedback, a stronger measure of regular volatility, as an important source of
asymmetry to take into account when investigating the risk-return relationship. Given
that South Africa is an emerging market which is subject to higher levels of volatility,
one would expect the presence of this mechanism to be more pronounced. Thus, this
study investigates the risk-return relationship in the South African market once the
magnitude of volatility feedback is taken into account.
A valuable contribution of this study is the introduction of the novel concept of
“asymmetric returns exposure”, which refers to the risk that arises from the asymmetric
nature of returns. This measure has a certain level of uncertainty attached to it due to
its latent and stochastic nature. As a result, it may be ineffectively accounted for by
existing parametric methods such as regression analysis and GARCH type models
which are prone to model misspecification.
The results of this study are presented according to the robustness of the approaches
in the build up to the final result. First, the GARCH approach is employed and consists
of symmetric and asymmetric GARCH type models. The GARCH approach is treated
as a preliminary test to investigate the presence of the risk-return relationship and volatility
feedback, respectively. While the GARCH type models have the ability to take into
account the volatile nature of returns, asymmetries and nonlinearities remain
uncaptured by the probability distributions governing the model innovations. Thus, the
results of the GARCH type models are inconsistent and not statistically sound.
This motivates the use of a more robust method, namely, the Bayesian approach
which consists of a parametric and a nonparametric Bayesian model. The Bayesian
approach has the ability to average out sources of uncertainty and measurement
errors and thus effectively account for “asymmetric returns exposure”. The test results
of both the parametric and nonparametric Bayesian models find that volatility feedback
has an insignificant effect in the South African market. Consequently, the risk-return
relationship is estimated free from empirical distortions that result from volatility
feedback. The result of the parametric Bayesian model is a positive and linear
relationship, in line with traditional theoretical expectations.
However, it is noteworthy that, in the context of this study, the nonparametric
approach is highlighted over the parametric approach. The nonparametric approach
has the ability to adjust for model misspecifications and effectively account for
stochastic, asymmetric and latent properties. It has the ability to take into account an
infinite number of higher moment asymmetric forms of the risk-return relationship.
Thus, the nonparametric Bayesian model estimates the actual fundamental nature of
the data free from any predetermined assumptions or bias. According to the
nonparametric Bayesian model, the final result of this study is no relationship between
risk and return, in line with early South African studies.
A Bayesian Nonparametric View on Count-Min Sketch
The count-min sketch is a time- and memory-efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. The count-min sketch and related hash-based data structures are ubiquitous in systems that must track frequencies of data such as URLs, IP addresses, and language n-grams. We present a Bayesian view on the count-min sketch, using the same data structure, but providing a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation. In particular, we take a nonparametric approach and consider tokens generated from a Dirichlet process (DP) random measure, which allows for an unbounded number of unique tokens. Using properties of the DP, we show that it is possible to straightforwardly compute posterior marginals of the unknown true counts and that the modes of these marginals recover the count-min sketch estimator, inheriting the associated probabilistic guarantees. Using simulated data and text data, we investigate the properties of these estimators. Lastly, we also study a modified problem in which the observation stream consists of collections of tokens (i.e., documents) arising from a random measure drawn from a stable beta process, which allows for power law scaling behavior in the number of unique tokens.
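For readers unfamiliar with the data structure described in the abstract above, here is a minimal Python sketch of a plain count-min sketch, covering only the classical point estimator and not the Bayesian posterior. The class name `CountMinSketch`, its parameters `width` and `depth`, and the use of salted BLAKE2b as the family of hash functions are choices made for this illustration, not details taken from the paper.

```python
import hashlib
from typing import List


class CountMinSketch:
    """A depth x width table of counters, with one hash function per row."""

    def __init__(self, width: int, depth: int) -> None:
        if width <= 0 or depth <= 0:
            raise ValueError("width and depth must be positive")
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Assumption: a per-row salt stands in for independent hash functions.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(16, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Each item increments exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def query(self, item: str) -> int:
        # Collisions can only inflate counters, so the minimum over rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


# Toy stream, invented purely for illustration.
sketch = CountMinSketch(width=64, depth=4)
stream = ["example.org"] * 5 + ["10.0.0.1"] * 2 + ["the cat"]
for token in stream:
    sketch.add(token)
```

Because hash collisions only ever add to a counter, `query` never underestimates the true count; a wider table makes overestimates less likely at the cost of memory.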