Conformalized Frequency Estimation from Sketched Data

Authors: Stefano Favaro, Matteo Sesia
Publication date: 08/04/2022

A flexible conformal inference method is developed to construct confidence intervals for the frequencies of queried objects in a very large data set, based on the information contained in a much smaller sketch of those data. The approach is completely data-adaptive and makes no use of any knowledge of the population distribution or of the inner workings of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals under the sole assumption of data exchangeability. Although the proposed solution is much more broadly applicable, this paper explicitly demonstrates its use in combination with the famous count-min sketch algorithm and a non-linear variation thereof to facilitate the exposition. The performance is compared to that of existing frequentist and Bayesian alternatives through several experiments with synthetic data as well as with real data sets consisting of SARS-CoV-2 DNA sequences and classic English literature. Comment: 29 pages, 20 figures, 2 tables

arXiv.org e-Print Archive
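
The abstract above describes building confidence intervals for sketched frequencies under exchangeability. As a rough illustration only, and not the authors' procedure, the following is a minimal split-conformal sketch. It assumes a hypothetical calibration set of queries whose true counts are known, and it assumes the sketch estimate never underestimates the true count (as with count-min). The function names `conformal_quantile` and `conformal_interval` and the sample numbers are invented for this example.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, no finite threshold gives the requested
    coverage, so infinity is returned.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def conformal_interval(estimate, q):
    """Interval [max(0, estimate - q), estimate] for the true count.

    The upper end relies on the assumption that the sketch estimate is an
    upper bound on the true count; the lower end comes from the
    calibrated threshold q.
    """
    return max(0, estimate - q), estimate


# Hypothetical calibration data: sketch estimates and known true counts.
cal_estimates = [5, 7, 3, 10, 4, 6, 8, 9, 2, 12]
cal_true = [5, 5, 3, 7, 4, 6, 6, 9, 1, 10]

# Nonconformity score: how much the sketch overestimates each count.
scores = [e - t for e, t in zip(cal_estimates, cal_true)]

q = conformal_quantile(scores, alpha=0.2)
interval = conformal_interval(6, q)
```

If the new query is exchangeable with the calibration queries, the interval covers its true count with probability at least 1 - alpha, marginally over the random query. With only ten calibration points, a request such as alpha = 0.05 yields an infinite threshold and the trivial interval from 0 to the estimate.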

Conformal Frequency Estimation with Sketched Data under Relaxed Exchangeability

Authors: Edgar Dobriban, Stefano Favaro, Matteo Sesia
Publication date: 08/11/2022

A flexible method is developed to construct a confidence interval for the frequency of a queried object in a very large data set, based on a much smaller sketch of the data. The approach requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals for random queries using a conformal inference approach. After achieving marginal coverage for random queries under the assumption of data exchangeability, the proposed method is extended to provide stronger inferences accounting for possibly heterogeneous frequencies of different random queries, redundant queries, and distribution shifts. While the presented methods are broadly applicable, this paper focuses on use cases involving the count-min sketch algorithm and a non-linear variation thereof, to facilitate comparison to prior work. In particular, the developed methods are compared empirically to frequentist and Bayesian alternatives, through simulations and experiments with data sets of SARS-CoV-2 DNA sequences and classic English literature. Comment: 56 pages, 31 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2204.0427

arXiv.org e-Print Archive

The risk-return relationship and volatility feedback in South Africa: a nonparametric Bayesian approach.

Author: Nitesha Dwarika
Publication date: 01/01/2020

Masters Degree. University of KwaZulu-Natal, Durban. The risk-return relationship is a fundamental concept in finance and economic theory and is also known as the “first fundamental law” in finance. Traditionally, the risk-return relationship is known to help individuals construct an efficient portfolio where a desired risk and return profile is tailored to their needs. However, it is a source of much more valuable information to various market participants such as bankers, investors, policy makers and researchers alike. There are a number of investment strategies, policy frameworks, theories and asset pricing models built on the empirical result of the risk-return relationship. Hence, the topic of the risk-return relationship is of broad importance. It has been widely investigated on an international scale, especially by developed markets from as early as the 1950s, with the primary motive being to help market participants optimise their chance to earn higher returns. According to conventional economic theory, the relationship between risk and return is a positive and linear relationship – the higher the risk, the higher the return. However, there are many studies documented in the literature which show a positive or negative or no relationship at all. As a result, due to the magnitude of conflicting results over the years, this has caused an international and local debate to arise regarding the risk-return relationship. International studies have explored a number of theories and models to attempt resolving the inconclusive empirical backing of the risk-return relationship. On the other hand, the methods employed by South African studies and the volume of literature on the topic are relatively limited. South Africa is becoming increasingly recognised, liberalised, interactive and integrated into the international economy. Therefore, this study makes a significant contribution from a South African market perspective.
This study identifies volatility feedback, a stronger measure of regular volatility, as an important source of asymmetry to take into account when investigating the risk-return relationship. Given that South Africa is an emerging market which is subject to higher levels of volatility, one would expect the presence of this mechanism to be more pronounced. Thus, this study investigates the risk-return relationship once volatility feedback is taken into account by its magnitude in the South African market. A valuable contribution of this study is the introduction of the novel concept “asymmetric returns exposure” which refers to the risk that arises from the asymmetric nature of returns. This measure has a certain level of uncertainty attached to it due to its latent and stochastic nature. As a result, it may be ineffectively accounted for by existing parametric methods such as regression analysis and GARCH-type models, which are prone to model misspecification. The results of this study are presented according to the robustness of the approaches in the build-up to the final result. First, the GARCH approach is employed and consists of symmetric and asymmetric GARCH-type models. The GARCH approach is treated as a preliminary test to investigate the presence of the risk-return relationship and volatility feedback, respectively. While the GARCH-type models have the ability to take into account the volatile nature of returns, asymmetries and nonlinearities remain uncaptured by the probability distributions governing the model innovations. Thus, the results of the GARCH-type models are inconsistent and not statistically sound. This motivates the use of a more robust method, namely, the Bayesian approach, which consists of a parametric and a nonparametric Bayesian model. The Bayesian approach has the ability to average out sources of uncertainty and measurement errors and thus effectively account for “asymmetric returns exposure”.
The test results of both the parametric and nonparametric Bayesian models find that volatility feedback has an insignificant effect in the South African market. Consequently, the risk-return relationship is estimated free from empirical distortions that result from volatility feedback. The result of the parametric Bayesian model is a positive and linear relationship, in line with traditional theoretical expectations. However, it is noteworthy that, in the context of this study, the nonparametric approach is highlighted over the parametric approach. The nonparametric approach has the ability to adjust for model misspecifications and effectively account for stochastic, asymmetric and latent properties. It has the ability to take into account an infinite number of higher moment asymmetric forms of the risk-return relationship. Thus, the nonparametric Bayesian model estimates the actual fundamental nature of the data free from any predetermined assumptions or bias. According to the nonparametric Bayesian model, the final result of this study is no relationship between risk and return, in line with early South African studies.

ResearchSpace@UKZN

A Bayesian Nonparametric View on Count-Min Sketch

Authors: Ryan P. Adams, Diana Cai, Michael Mitzenmacher
Publication date: 01/01/2018

The count-min sketch is a time- and memory-efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. The count-min sketch and related hash-based data structures are ubiquitous in systems that must track frequencies of data such as URLs, IP addresses, and language n-grams. We present a Bayesian view on the count-min sketch, using the same data structure, but providing a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation. In particular, we take a nonparametric approach and consider tokens generated from a Dirichlet process (DP) random measure, which allows for an unbounded number of unique tokens. Using properties of the DP, we show that it is possible to straightforwardly compute posterior marginals of the unknown true counts and that the modes of these marginals recover the count-min sketch estimator, inheriting the associated probabilistic guarantees. Using simulated data and text data, we investigate the properties of these estimators. Lastly, we also study a modified problem in which the observation stream consists of collections of tokens (i.e., documents) arising from a random measure drawn from a stable beta process, which allows for power law scaling behavior in the number of unique tokens.

Princeton University Open Access Repository
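
For readers unfamiliar with the data structure that several of the abstracts above refer to, here is a minimal sketch of a plain count-min sketch, not the Bayesian or conformal variants. The class name, the default `width` and `depth`, and the use of salted BLAKE2b as the hash family are all assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """A depth x width table of counters, one hash function per row."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salt the hash with the row number so each row behaves like an
        # independent hash function (an illustrative choice).
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def query(self, item):
        # Collisions can only inflate counters, so the minimum over rows
        # is an upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


stream = ["a", "b", "a", "c", "a", "b"]
cms = CountMinSketch(width=64, depth=4)
for token in stream:
    cms.add(token)

estimate_a = cms.query("a")  # at least 3, the true count of "a"
```

The point estimate never falls below the true count, and the overestimate comes only from hash collisions. That one-sided error is the uncertainty the methods listed above aim to quantify.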