Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures
In this work we present a non-parametric online market regime detection
method for multidimensional data structures using a path-wise two-sample test
derived from a maximum mean discrepancy-based similarity metric on path space
that uses rough path signatures as a feature map. The latter similarity metric
has been developed and applied as a discriminator in recent generative models
for small data environments, and has been optimised here to the setting where
the size of new incoming data is particularly small, for faster reactivity.
On the same principles, we also present a path-wise method for regime
clustering which extends our previous work. The presented regime clustering
techniques were designed as ex-ante market analysis tools that can identify
periods of approximately similar market activity, but the new results also
apply to path-wise, high-dimensional, and non-Markovian settings as well as
to data structures that exhibit autocorrelation.
We demonstrate our clustering tools on easily verifiable synthetic datasets
of increasing complexity, and also show how the outlined regime detection
techniques can be used as fast on-line automatic regime change detectors or as
outlier detection tools, including a fully automated pipeline. Finally, we
apply the fine-tuned algorithms to real-world historical data including
high-dimensional baskets of equities and the recent price evolution of crypto
assets, and we show that our methodology swiftly and accurately indicated
historical periods of market turmoil.
Comment: 65 pages, 52 figures
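The idea of comparing two batches of paths through a signature feature map and a maximum mean discrepancy can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes a signature truncated at depth 2, a plain linear kernel on the features, and the biased MMD estimator. The names `signature_level2`, `mmd2_linear`, and `random_walk` are hypothetical helpers introduced only for this sketch.

```python
import random


def signature_level2(path):
    """Depth-2 truncated signature of a piecewise-linear path (list of d-dim points)."""
    d = len(path[0])
    level1 = [0.0] * d                       # running increment x_k - x_0
    level2 = [[0.0] * d for _ in range(d)]   # iterated integrals S^{ij}
    for prev_pt, pt in zip(path, path[1:]):
        dx = [b - a for a, b in zip(prev_pt, pt)]
        for i in range(d):
            for j in range(d):
                # Contribution of one linear segment to the second level
                level2[i][j] += level1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        for i in range(d):
            level1[i] += dx[i]
    return level1 + [v for row in level2 for v in row]


def mmd2_linear(paths_x, paths_y):
    """Biased squared MMD with a linear kernel on depth-2 signature features."""
    fx = [signature_level2(p) for p in paths_x]
    fy = [signature_level2(p) for p in paths_y]
    n_feat = len(fx[0])
    mean_x = [sum(f[k] for f in fx) / len(fx) for k in range(n_feat)]
    mean_y = [sum(f[k] for f in fy) / len(fy) for k in range(n_feat)]
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))


def random_walk(rng, steps, dim, sigma):
    """Gaussian random walk used as a toy 'regime' (assumption for the demo)."""
    point = [0.0] * dim
    path = [point]
    for _ in range(steps):
        point = [c + rng.gauss(0.0, sigma) for c in point]
        path.append(point)
    return path


rng = random.Random(0)
calm_a = [random_walk(rng, 50, 2, 1.0) for _ in range(200)]
calm_b = [random_walk(rng, 50, 2, 1.0) for _ in range(200)]
turbulent = [random_walk(rng, 50, 2, 3.0) for _ in range(200)]

same_regime = mmd2_linear(calm_a, calm_b)        # two samples from one regime
regime_change = mmd2_linear(calm_a, turbulent)   # low- vs high-volatility regime
```

In this toy setting the statistic for the low- versus high-volatility batches should be far larger than for two batches drawn from the same regime; a real detector would also need a calibrated rejection threshold, which this sketch does not provide.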
The Geometric Median and Applications to Robust Mean Estimation
This paper is devoted to the statistical and numerical properties of the
geometric median, and its applications to the problem of robust mean estimation
via the median of means principle. Our main theoretical results include (a) an
upper bound for the distance between the mean and the median for general
absolutely continuous distributions in R^d, and examples of specific classes of
distributions for which these bounds do not depend on the ambient dimension;
(b) exponential deviation inequalities for the distance between the sample
and the population versions of the geometric median, which again depend only on
the trace-type quantities and not on the ambient dimension. As a corollary, we
deduce improved bounds for the (geometric) median of means estimator that hold
for large classes of heavy-tailed distributions. Finally, we address the error
of numerical approximation, which is an important practical aspect of any
statistical estimation procedure. We demonstrate that the objective function
minimized by the geometric median satisfies a "local quadratic growth"
condition that allows one to translate suboptimality bounds for the objective
function to the corresponding bounds for the numerical approximation to the
median itself. As a corollary, we propose a simple stopping rule (applicable to
any optimization method) which yields explicit error guarantees. We conclude
with numerical experiments, including an application to the estimation of mean
values of log-returns for S&P 500 data.
Comment: 28 pages, 2 figures
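To make the median of means principle concrete, the following sketch computes a geometric median with Weiszfeld's fixed-point iteration and applies it to block means. It rests on several assumptions: Weiszfeld's iteration is one standard solver and is not claimed to be the one used in the paper, the step-size stopping test is a generic rule rather than the paper's stopping rule with explicit guarantees, and the function names are hypothetical.

```python
import math


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration for the point minimising the sum of Euclidean distances."""
    dim = len(points[0])
    # Start from the coordinate-wise mean
    estimate = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        # Inverse-distance weights; the floor guards against division by zero
        weights = [1.0 / max(math.dist(p, estimate), 1e-12) for p in points]
        total = sum(weights)
        new_estimate = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ]
        shift = math.dist(new_estimate, estimate)
        estimate = new_estimate
        if shift < tol:  # generic step-size test (assumption, not the paper's rule)
            break
    return estimate


def median_of_means(points, n_blocks):
    """Split the sample into equal blocks, average each, take the geometric median."""
    size = len(points) // n_blocks  # leftover points beyond n_blocks * size are dropped
    dim = len(points[0])
    block_means = []
    for b in range(n_blocks):
        block = points[b * size:(b + 1) * size]
        block_means.append([sum(p[k] for p in block) / len(block) for k in range(dim)])
    return geometric_median(block_means)


# Toy sample: twenty identical observations and one gross outlier
sample = [[1.0, 1.0]] * 20 + [[1000.0, 1000.0]]
robust_estimate = median_of_means(sample, n_blocks=7)
plain_mean = [sum(p[k] for p in sample) / len(sample) for k in range(2)]
```

On this toy sample the plain mean is pulled far towards the outlier, while the median of means estimate stays close to the bulk of the data, because only one of the seven block means is contaminated.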
Big in Reverse Mathematics: the uncountability of the real numbers
The uncountability of $\mathbb{R}$ is one of its most basic properties, known
far outside of mathematics. Cantor's 1874 proof of the uncountability of
$\mathbb{R}$ even appears in the very first paper on set theory, i.e. a
historical milestone. In this paper, we study the uncountability of
$\mathbb{R}$ in Kohlenbach's higher-order Reverse Mathematics (RM for short),
in the guise of the following principle: for a countable set
$A \subset \mathbb{R}$, there exists $y \in \mathbb{R} \setminus A$. An important
conceptual observation is that the usual definition of countable set -- based
on injections or bijections to $\mathbb{N}$ -- does not seem suitable for the
RM-study of mainstream mathematics; we also propose a suitable (equivalent over
strong systems) alternative definition of countable set, namely union over
$\mathbb{N}$ of finite sets; the latter is known from the literature and closer
to how countable sets occur 'in the wild'. We identify a considerable number of
theorems that are equivalent to the centred theorem based on our alternative
definition. Perhaps surprisingly, our equivalent theorems involve most basic
properties of the Riemann integral, regulated or bounded variation functions,
Blumberg's theorem, and Volterra's early work circa 1881. Our equivalences are
also robust, promoting the uncountability of $\mathbb{R}$ to the status of
'big' system in RM.
Comment: To appear in the Journal of Symbolic Logic; 28 pages plus technical
appendix. Same technical appendix as: arXiv:2102.0478
Optimality and Complexity in Measured Quantum-State Stochastic Processes
If an experimentalist observes a sequence of emitted quantum states via
either projective or positive-operator-valued measurements, the outcomes form a
time series. Individual time series are realizations of a stochastic process
over the measurements' classical outcomes. We recently showed that, in general,
the resulting stochastic process is highly complex in two specific senses: (i)
it is inherently unpredictable to varying degrees that depend on measurement
choice and (ii) optimal prediction requires using an infinite number of
temporal features. Here, we identify the mechanism underlying this
complicatedness as generator nonunifilarity -- the degeneracy between sequences
of generator states and sequences of measurement outcomes. This makes it
possible to quantitatively explore the influence that measurement choice has on
a quantum process' degrees of randomness and structural complexity using
recently introduced methods from ergodic theory. Progress in this, though,
requires quantitative measures of structure and memory in observed time series.
And, success requires accurate and efficient estimation algorithms that
overcome the requirement to explicitly represent an infinite set of predictive
features. We provide these metrics and associated algorithms, using them to
design informationally-optimal measurements of open quantum dynamical systems.
Comment: 31 pages, 6 appendices, 22 figures;
http://csc.ucdavis.edu/~cmg/compmech/pubs/qdic.ht
Moments of Dirichlet L-functions in Function Fields
In this thesis, we compute several moments and mean values of Dirichlet L-functions in function fields, in both the odd and even characteristic setting.
Leverhulme Trust
Development of simulator software on the topic "Normal algorithms" of the distance learning course "Theory of Algorithms"
The paper describes the design and development of a training simulator written in the Java programming language in the NetBeans integrated development environment. The simulator presents questions at three levels of difficulty, together with methodological recommendations and theoretical material on the topic. The developed software product is implemented in the corresponding distance learning course on the Moodle platform and is recommended for use in the educational process by students in the "Computer Science" specialty.
Abductive Reasoning with the GPT-4 Language Model: Case studies from criminal investigation, medical practice, scientific research
This study evaluates the GPT-4 Large Language Model's abductive reasoning in
complex fields like medical diagnostics, criminology, and cosmology. Using an
interactive interview format, the AI assistant demonstrated reliability in
generating and selecting hypotheses. It inferred plausible medical diagnoses
based on patient data and provided potential causes and explanations in
criminology and cosmology. The results highlight the potential of LLMs in
complex problem-solving and the need for further research to maximize their
practical applications.
Comment: The article is 12 pages long and has one figure. It also includes a
link to some ChatGPT dialogues that show the experiments that support the
article's findings. The article will be published in V. Bambini and C.
Barattieri di San Pietro (eds.), Sistemi Intelligenti, Special Section
"Multidisciplinary perspectives on ChatGPT and the family of Large Language
Models
Existence and stability of nonmonotone hydraulic shocks for the Saint Venant equations of inclined thin-film flow
Extending work of Yang-Zumbrun for the hydrodynamically stable case of Froude
number F < 2, we categorize completely the existence and convective stability
of hydraulic shock profiles of the Saint Venant equations of inclined thin-film
flow. Moreover, we confirm by numerical experiment that asymptotic dynamics for
general Riemann data is given in the hydrodynamic instability regime by either
stable hydraulic shock waves, or a pattern consisting of an invading roll wave
front separated by a finite terminating Lax shock from a constant state at plus
infinity. Notably, profiles, and existence and stability diagrams are all
rigorously obtained by mathematical analysis and explicit calculation.
A Revenue Function for Comparison-Based Hierarchical Clustering
Comparison-based learning addresses the problem of learning when, instead of
explicit features or pairwise similarities, one only has access to comparisons
of the form: \emph{Object $A$ is more similar to $B$ than to $C$.} Recently, it
has been shown that, in Hierarchical Clustering, single and complete linkage
can be directly implemented using only such comparisons while several
algorithms have been proposed to emulate the behaviour of average linkage.
Hence, finding hierarchies (or dendrograms) using only comparisons is a well
understood problem. However, evaluating their meaningfulness when no
ground-truth nor explicit similarities are available remains an open question.
In this paper, we bridge this gap by proposing a new revenue function that
allows one to measure the goodness of dendrograms using only comparisons. We
show that this function is closely related to Dasgupta's cost for hierarchical
clustering that uses pairwise similarities. On the theoretical side, we use the
proposed revenue function to resolve the open problem of whether one can
approximately recover a latent hierarchy using few triplet comparisons. On the
practical side, we present principled algorithms for comparison-based
hierarchical clustering based on the maximisation of the revenue and we
empirically compare them with existing methods.
Comment: 26 pages, 6 figures, 5 tables. Transactions on Machine Learning
Research (2023)
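The abstract relates the proposed revenue function to Dasgupta's cost, which can be computed as in the sketch below. This example illustrates only the classical similarity-based cost, in which each pair of objects is weighted by the number of leaves under its lowest common ancestor; it does not implement the paper's comparison-based revenue, whose definition is not given here. The nested-tuple tree encoding, the `frozenset`-keyed similarity table, and the toy similarity values are assumptions made for the illustration.

```python
def leaves(tree):
    """Return the leaf labels of a dendrogram encoded as nested tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def dasgupta_cost(tree, similarity):
    """Sum over pairs {i, j} of w_ij times the leaf count under their lowest common ancestor."""
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    n_here = sum(len(c) for c in child_leaves)
    cost = 0.0
    # Pairs separated at this node have this node as lowest common ancestor
    for a in range(len(child_leaves)):
        for b in range(a + 1, len(child_leaves)):
            for i in child_leaves[a]:
                for j in child_leaves[b]:
                    cost += similarity[frozenset((i, j))] * n_here
    return cost + sum(dasgupta_cost(child, similarity) for child in tree)


# Toy similarities (assumed): objects 0,1 are close, objects 2,3 are close
similarity = {frozenset((i, j)): 0.1 for i in range(4) for j in range(i + 1, 4)}
similarity[frozenset((0, 1))] = 1.0
similarity[frozenset((2, 3))] = 1.0

good_tree = ((0, 1), (2, 3))   # merges similar objects first
bad_tree = ((0, 2), (1, 3))    # merges dissimilar objects first
good_cost = dasgupta_cost(good_tree, similarity)
bad_cost = dasgupta_cost(bad_tree, similarity)
```

A lower cost indicates a better hierarchy under this criterion, so the tree that merges the similar pairs first should score lower than the one that separates them.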
Sturmian and infinitely desubstitutable words accepted by an $\omega$-automaton
Given an $\omega$-automaton and a set of substitutions, we look at which
accepted words can also be defined through these substitutions, and in
particular if there is at least one. We introduce a method using desubstitution
of $\omega$-automata to describe the structure of preimages of accepted words
under arbitrary sequences of homomorphisms: this takes the form of a
meta-$\omega$-automaton.
We decide the existence of an accepted purely substitutive word, as well as
the existence of an accepted fixed point. In the case of multiple substitutions
(non-erasing homomorphisms), we decide the existence of an accepted infinitely
desubstitutable word, with possibly some constraints on the sequence of
substitutions (e.g. Sturmian words or Arnoux-Rauzy words). As an application, we
decide when a set of finite words codes e.g. a Sturmian word. As another
application, we also show that if an $\omega$-automaton accepts a Sturmian
word, it accepts the image of the full shift under some Sturmian morphism.
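As a small illustration of a purely substitutive Sturmian word, the sketch below generates a prefix of the Fibonacci word, the fixed point of the substitution 0 -> 01, 1 -> 0, and counts its distinct factors. It does not implement the desubstitution of automata described in the abstract; the function names are hypothetical, and the complexity check is only carried out on a finite prefix.

```python
def apply_substitution(word, rules):
    """Apply a substitution (letter -> word) to every letter of a finite word."""
    return "".join(rules[letter] for letter in word)


def fixed_point_prefix(rules, seed, length):
    """Iterate the substitution from `seed` until the word has at least `length` letters."""
    word = seed
    while len(word) < length:
        word = apply_substitution(word, rules)
    return word[:length]


def factor_complexity(word, n):
    """Number of distinct factors (contiguous subwords) of length n in `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


FIBONACCI = {"0": "01", "1": "0"}  # Fibonacci substitution; its fixed point is Sturmian

prefix = fixed_point_prefix(FIBONACCI, "0", 2000)
# Sturmian words have exactly n + 1 factors of each length n
complexities = [factor_complexity(prefix, n) for n in range(1, 11)]
```

Because the image of 0 begins with 0, each iterate is a prefix of the next, so the loop yields a prefix of the infinite fixed point. On that prefix the factor counts for lengths 1 to 10 match the Sturmian value n + 1.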
- …