13,325 research outputs found

    Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures

    In this work we present a non-parametric online market regime detection method for multidimensional data structures using a path-wise two-sample test derived from a maximum mean discrepancy-based similarity metric on path space that uses rough path signatures as a feature map. This similarity metric has been developed and applied as a discriminator in recent generative models for small data environments, and has been optimised here for the setting where the size of new incoming data is particularly small, for faster reactivity. On the same principles, we also present a path-wise method for regime clustering which extends our previous work. The presented regime clustering techniques were designed as ex-ante market analysis tools that can identify periods of approximately similar market activity, but the new results also apply to path-wise, high-dimensional, and non-Markovian settings as well as to data structures that exhibit autocorrelation. We demonstrate our clustering tools on easily verifiable synthetic datasets of increasing complexity, and also show how the outlined regime detection techniques can be used as fast online automatic regime change detectors or as outlier detection tools, including a fully automated pipeline. Finally, we apply the fine-tuned algorithms to real-world historical data including high-dimensional baskets of equities and the recent price evolution of crypto assets, and we show that our methodology swiftly and accurately indicated historical periods of market turmoil. Comment: 65 pages, 52 figures
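
The two-sample idea in this abstract can be illustrated with a minimal sketch of a maximum mean discrepancy (MMD) permutation test. This is not the authors' method: the rough path signature feature map is replaced here by a plain Gaussian kernel on fixed-length vectors, and the names `rbf_kernel`, `mmd_squared`, `permutation_p_value`, and the `bandwidth` parameter are assumptions made for illustration only.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on plain vectors (assumed stand-in for a signature kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def permutation_p_value(xs, ys, n_permutations=200, bandwidth=1.0, seed=0):
    """Permutation p-value for the null hypothesis that xs and ys share a distribution."""
    rng = random.Random(seed)
    observed = mmd_squared(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = mmd_squared(pooled[:len(xs)], pooled[len(xs):], bandwidth)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)
```

In an online setting, one might compare a small window of recent observations against a reference window and flag a regime change when the p-value falls below a chosen threshold; the window sizes and threshold would be further assumptions.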

    The Geometric Median and Applications to Robust Mean Estimation

    This paper is devoted to the statistical and numerical properties of the geometric median, and its applications to the problem of robust mean estimation via the median of means principle. Our main theoretical results include (a) an upper bound for the distance between the mean and the median for general absolutely continuous distributions in $\mathbb{R}^d$, and examples of specific classes of distributions for which these bounds do not depend on the ambient dimension $d$; (b) exponential deviation inequalities for the distance between the sample and the population versions of the geometric median, which again depend only on trace-type quantities and not on the ambient dimension. As a corollary, we deduce improved bounds for the (geometric) median of means estimator that hold for large classes of heavy-tailed distributions. Finally, we address the error of numerical approximation, which is an important practical aspect of any statistical estimation procedure. We demonstrate that the objective function minimized by the geometric median satisfies a "local quadratic growth" condition that allows one to translate suboptimality bounds for the objective function to the corresponding bounds for the numerical approximation to the median itself. As a corollary, we propose a simple stopping rule (applicable to any optimization method) which yields explicit error guarantees. We conclude with numerical experiments, including an application to the estimation of mean values of log-returns for S&P 500 data. Comment: 28 pages, 2 figures

    Big in Reverse Mathematics: the uncountability of the real numbers

    The uncountability of $\mathbb{R}$ is one of its most basic properties, known far outside of mathematics. Cantor's 1874 proof of the uncountability of $\mathbb{R}$ even appears in the very first paper on set theory, i.e. a historical milestone. In this paper, we study the uncountability of $\mathbb{R}$ in Kohlenbach's higher-order Reverse Mathematics (RM for short), in the guise of the following principle: for a countable set $A\subset \mathbb{R}$, there exists $y\in \mathbb{R}\setminus A$. An important conceptual observation is that the usual definition of countable set -- based on injections or bijections to $\mathbb{N}$ -- does not seem suitable for the RM-study of mainstream mathematics; we also propose a suitable (equivalent over strong systems) alternative definition of countable set, namely union over $\mathbb{N}$ of finite sets; the latter is known from the literature and closer to how countable sets occur 'in the wild'. We identify a considerable number of theorems that are equivalent to the centred theorem based on our alternative definition. Perhaps surprisingly, our equivalent theorems involve most basic properties of the Riemann integral, regulated or bounded variation functions, Blumberg's theorem, and Volterra's early work circa 1881. Our equivalences are also robust, promoting the uncountability of $\mathbb{R}$ to the status of 'big' system in RM. Comment: To appear in the Journal of Symbolic Logic; 28 pages plus technical appendix. Same technical appendix as: arXiv:2102.0478

    Optimality and Complexity in Measured Quantum-State Stochastic Processes

    If an experimentalist observes a sequence of emitted quantum states via either projective or positive-operator-valued measurements, the outcomes form a time series. Individual time series are realizations of a stochastic process over the measurements' classical outcomes. We recently showed that, in general, the resulting stochastic process is highly complex in two specific senses: (i) it is inherently unpredictable to varying degrees that depend on measurement choice and (ii) optimal prediction requires using an infinite number of temporal features. Here, we identify the mechanism underlying this complicatedness as generator nonunifilarity -- the degeneracy between sequences of generator states and sequences of measurement outcomes. This makes it possible to quantitatively explore the influence that measurement choice has on a quantum process' degrees of randomness and structural complexity using recently introduced methods from ergodic theory. Progress in this, though, requires quantitative measures of structure and memory in observed time series. And, success requires accurate and efficient estimation algorithms that overcome the requirement to explicitly represent an infinite set of predictive features. We provide these metrics and associated algorithms, using them to design informationally-optimal measurements of open quantum dynamical systems. Comment: 31 pages, 6 appendices, 22 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/qdic.ht

    Moments of Dirichlet L-functions in Function Fields

    In this thesis, we compute several moments and mean values of Dirichlet L-functions in function fields, in both the odd and even characteristic setting. Leverhulme Trust

    Development of simulator software on the topic "Normal algorithms" of the distance learning course "Theory of Algorithms"

    The paper describes the design and development of a training simulator written in the Java programming language using the NetBeans integrated development environment. The simulator presents questions at three levels of complexity, together with methodological recommendations and theoretical material on the topic. The developed software product is implemented in the corresponding distance learning course on the Moodle platform and is recommended for use in the educational process by applicants in the "Computer Science" specialty.
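
For readers unfamiliar with the topic of the simulator, a normal (Markov) algorithm rewrites a word by repeatedly applying the first applicable rule to the leftmost occurrence of its pattern, stopping when a terminal rule fires or no rule applies. The sketch below is a minimal Python interpreter for that scheme; it is not the paper's Java simulator, and the function name, the rule encoding as `(pattern, replacement, is_terminal)` tuples, and the `max_steps` safeguard are assumptions.

```python
def run_normal_algorithm(rules, word, max_steps=10_000):
    """Apply an ordered list of (pattern, replacement, is_terminal) rules to `word`."""
    for _ in range(max_steps):
        for pattern, replacement, is_terminal in rules:
            index = word.find(pattern)  # leftmost occurrence; "" matches at position 0
            if index != -1:
                word = word[:index] + replacement + word[index + len(pattern):]
                if is_terminal:
                    return word
                break  # restart the scan from the first rule
        else:
            # No rule applies: the algorithm halts.
            return word
    raise RuntimeError("step limit exceeded; the rule set may not terminate")


# Classic example: convert a binary numeral into unary strokes.
BINARY_TO_UNARY = [
    ("|0", "0||", False),
    ("1", "0|", False),
    ("0", "", False),
]
```

Calling `run_normal_algorithm(BINARY_TO_UNARY, "101")` rewrites the binary numeral for five into five strokes.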

    Abductive Reasoning with the GPT-4 Language Model: Case studies from criminal investigation, medical practice, scientific research

    This study evaluates the GPT-4 Large Language Model's abductive reasoning in complex fields like medical diagnostics, criminology, and cosmology. Using an interactive interview format, the AI assistant demonstrated reliability in generating and selecting hypotheses. It inferred plausible medical diagnoses based on patient data and provided potential causes and explanations in criminology and cosmology. The results highlight the potential of LLMs in complex problem-solving and the need for further research to maximize their practical applications. Comment: The article is 12 pages long and has one figure. It also includes a link to some ChatGPT dialogues that show the experiments that support the article's findings. The article will be published in V. Bambini and C. Barattieri di San Pietro (eds.), Sistemi Intelligenti, Special Section "Multidisciplinary perspectives on ChatGPT and the family of Large Language Models"

    Existence and stability of nonmonotone hydraulic shocks for the Saint Venant equations of inclined thin-film flow

    Extending work of Yang-Zumbrun for the hydrodynamically stable case of Froude number F < 2, we categorize completely the existence and convective stability of hydraulic shock profiles of the Saint Venant equations of inclined thin-film flow. Moreover, we confirm by numerical experiment that the asymptotic dynamics for general Riemann data are given in the hydrodynamic instability regime by either stable hydraulic shock waves, or a pattern consisting of an invading roll wave front separated by a finite terminating Lax shock from a constant state at plus infinity. Notably, the profiles and the existence and stability diagrams are all rigorously obtained by mathematical analysis and explicit calculation.

    A Revenue Function for Comparison-Based Hierarchical Clustering

    Comparison-based learning addresses the problem of learning when, instead of explicit features or pairwise similarities, one only has access to comparisons of the form: "Object $A$ is more similar to $B$ than to $C$." Recently, it has been shown that, in Hierarchical Clustering, single and complete linkage can be directly implemented using only such comparisons while several algorithms have been proposed to emulate the behaviour of average linkage. Hence, finding hierarchies (or dendrograms) using only comparisons is a well understood problem. However, evaluating their meaningfulness when no ground-truth nor explicit similarities are available remains an open question. In this paper, we bridge this gap by proposing a new revenue function that allows one to measure the goodness of dendrograms using only comparisons. We show that this function is closely related to Dasgupta's cost for hierarchical clustering that uses pairwise similarities. On the theoretical side, we use the proposed revenue function to resolve the open problem of whether one can approximately recover a latent hierarchy using few triplet comparisons. On the practical side, we present principled algorithms for comparison-based hierarchical clustering based on the maximisation of the revenue and we empirically compare them with existing methods. Comment: 26 pages, 6 figures, 5 tables. Transactions on Machine Learning Research (2023)

    Sturmian and infinitely desubstitutable words accepted by an $\omega$-automaton

    Given an $\omega$-automaton and a set of substitutions, we look at which accepted words can also be defined through these substitutions, and in particular if there is at least one. We introduce a method using desubstitution of $\omega$-automata to describe the structure of preimages of accepted words under arbitrary sequences of homomorphisms: this takes the form of a meta-$\omega$-automaton. We decide the existence of an accepted purely substitutive word, as well as the existence of an accepted fixed point. In the case of multiple substitutions (non-erasing homomorphisms), we decide the existence of an accepted infinitely desubstitutable word, with possibly some constraints on the sequence of substitutions (e.g. Sturmian words or Arnoux-Rauzy words). As an application, we decide when a set of finite words codes e.g. a Sturmian word. As another application, we also show that if an $\omega$-automaton accepts a Sturmian word, it accepts the image of the full shift under some Sturmian morphism.