Enhancing Missing Data Imputation of Non-stationary Signals with Harmonic Decomposition
Dealing with time series with missing values, including those afflicted by
low quality or over-saturation, presents a significant signal processing
challenge. The task of recovering these missing values, known as imputation,
has led to the development of several algorithms. However, we have observed
that the efficacy of these algorithms tends to diminish when the time series
exhibit non-stationary oscillatory behavior. In this paper, we introduce a
novel algorithm, coined Harmonic Level Interpolation (HaLI), which enhances the
performance of existing imputation algorithms for oscillatory time series.
After running any chosen imputation algorithm, HaLI leverages the harmonic
decomposition based on the adaptive nonharmonic model of the initial imputation
to improve the imputation accuracy for oscillatory time series. Experimental
assessments conducted on synthetic and real signals consistently highlight that
HaLI enhances the performance of existing imputation algorithms. The algorithm
is made publicly available as readily employable Matlab code for other
researchers to use.
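The adaptive nonharmonic model underlying HaLI is detailed in the paper; as a rough, hypothetical sketch of the general two-step idea — fill gaps with any baseline imputer, then refine the gap values with a harmonic fit to the observed samples — one might write (the fixed, known fundamental frequency here is a simplification; the actual algorithm adapts to time-varying oscillations):

```python
import numpy as np

def harmonic_refine(t, x, missing, freq, n_harmonics=3):
    """Refine an initial imputation with a small harmonic model.

    t        : sample times
    x        : series with an initial (baseline) imputation filled in
    missing  : boolean mask of the originally missing samples
    freq     : assumed fundamental frequency (a simplification; the
               paper's model handles non-stationary oscillations)
    """
    obs = ~missing
    # Design matrix of harmonics cos/sin(2*pi*k*f*t), k = 1..n_harmonics
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * freq * t))
        cols.append(np.sin(2 * np.pi * k * freq * t))
    A = np.column_stack(cols)
    # Least-squares fit on the observed samples only
    coef, *_ = np.linalg.lstsq(A[obs], x[obs], rcond=None)
    refined = x.copy()
    refined[missing] = A[missing] @ coef   # overwrite the gap values
    return refined

# Toy example: a sinusoid with a gap, first filled by linear interpolation
t = np.linspace(0, 10, 500)
clean = np.sin(2 * np.pi * 1.0 * t)
missing = (t > 4) & (t < 5)
x = clean.copy()
x[missing] = np.interp(t[missing], t[~missing], clean[~missing])
refined = harmonic_refine(t, x, missing, freq=1.0)
```

On this toy signal the linear baseline flattens the oscillation inside the gap, while the harmonic refinement recovers it almost exactly — the same failure mode of generic imputers on oscillatory data that motivates the paper.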
Predictive PAC Learning and Process Decompositions
We informally call a stochastic process learnable if it admits a
generalization error approaching zero in probability for any concept class with
finite VC-dimension (IID processes are the simplest example). A mixture of
learnable processes need not be learnable itself, and certainly its
generalization error need not decay at the same rate. In this paper, we argue
that it is natural in predictive PAC to condition not on the past observations
but on the mixture component of the sample path. This definition not only
matches what a realistic learner might demand, but also allows us to sidestep
several otherwise grave problems in learning from dependent data. In
particular, we give a novel PAC generalization bound for mixtures of learnable
processes with a generalization error that is not worse than that of each
mixture component. We also provide a characterization of mixtures of absolutely
regular (β-mixing) processes, of independent probability-theoretic
interest. Comment: 9 pages, accepted at NIPS 201
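The paper's bound is theoretical, but the setting is easy to illustrate. In the following toy simulation (an illustration of the setup, not of the bound itself), nature draws a latent mixture component and then an IID sample path from it; on a single path the empirical mean concentrates around the component's parameter, not around the marginal mean — which is why conditioning on the mixture component rather than on past observations is the natural move:

```python
import numpy as np

rng = np.random.default_rng(0)

# A mixture of two IID Bernoulli processes: nature first draws a
# component (p = 0.2 or p = 0.8), then the whole sample path is IID
# from that component. The marginal process is exchangeable, not IID.
def sample_path(n, rng):
    p = rng.choice([0.2, 0.8])          # latent mixture component
    return p, rng.random(n) < p

n = 10_000
p, path = sample_path(n, rng)

# Conditional on the component, the empirical mean obeys ordinary
# IID concentration -- learning works as for a learnable process.
cond_error = abs(path.mean() - p)

# Unconditionally, a single path's running mean converges to the
# latent p, NOT to the marginal mean 0.5 of the mixture.
marginal_error = abs(path.mean() - 0.5)
```

This is the sense in which a mixture of learnable processes need not be learnable "marginally", while each component remains perfectly well behaved.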
Econometrics of Machine Learning Methods in Economic Forecasting
This paper surveys the recent advances in machine learning methods for
economic forecasting. The survey covers the following topics: nowcasting,
textual data, panel and tensor data, high-dimensional Granger causality tests,
time series cross-validation, and classification with economic losses.
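One of the surveyed topics, time series cross-validation, differs from random K-fold CV in that validation data must lie strictly after the training window so no future information leaks into the fit. A minimal expanding-window splitter (an illustrative sketch, not code from the survey) looks like:

```python
import numpy as np

def expanding_window_splits(n, n_folds, min_train):
    """Expanding-window splits for time series cross-validation.

    Unlike shuffled K-fold CV, each validation block lies strictly
    after its training window, so no future data leaks into the fit.
    """
    fold = (n - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold
        yield np.arange(train_end), np.arange(train_end, train_end + fold)

# Example: 5 folds over 100 observations, at least 50 for training
for train, val in expanding_window_splits(100, 5, 50):
    assert train.max() < val.min()   # validation always follows training
```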
Incentivizing Data Sharing for Energy Forecasting: Analytics Markets with Correlated Data
Reliably forecasting uncertain power production is beneficial for the social
welfare of electricity markets by reducing the need for balancing resources.
Describing such forecasting as an analytics task, the current literature
proposes analytics markets as an incentive for data sharing to improve
accuracy, for instance by leveraging spatio-temporal correlations. The
challenge is that, when used as input features for forecasting, correlated data
complicates the market design with respect to the revenue allocation, as the
value of overlapping information is inherently combinatorial. We develop a
correlation-aware analytics market for a wind power forecasting application. To
allocate revenue, we adopt a Shapley value-based attribution policy, framing
the features of agents as players and their interactions as a characteristic
function game. We illustrate that there are multiple options to describe such a
game, each having causal nuances that influence market behavior when features
are correlated. We argue that no option is correct in a general sense, but that
the decision hinges on whether the market should address correlations from a
data-centric or model-centric perspective, a choice that can yield
counter-intuitive allocations if not considered carefully by the market
designer. Comment: 15 pages, 9 figures, 1 table
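The Shapley value-based attribution can be sketched concretely. Below is a minimal exact Shapley computation over a characteristic-function game (an illustrative toy, not the paper's market mechanism; the characteristic function and its values are hypothetical). The two-player game with perfectly correlated features shows the combinatorial issue in miniature: either feature alone achieves the full gain, and the Shapley value splits it symmetrically:

```python
from itertools import combinations
from math import factorial

def shapley(players, value):
    """Exact Shapley values for a characteristic-function game.

    players : list of player ids (here: agents contributing features)
    value   : maps a frozenset of players to its worth, e.g. the
              forecast-accuracy improvement their features provide.
    Enumerates all coalitions -- fine for a handful of agents.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            for S in combinations(others, r):
                S = frozenset(S)
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                phi[p] += w * (value(S | {p}) - value(S))
    return phi

# Toy game with two perfectly correlated features a and b: either one
# alone yields the full accuracy gain of 1.0 (values are illustrative).
v = lambda S: 1.0 if S else 0.0
print(shapley(["a", "b"], v))   # symmetric split: {'a': 0.5, 'b': 0.5}
```

Whether such a symmetric split is the "right" allocation is exactly the paper's point: it depends on how the characteristic function is defined, i.e. on a data-centric versus model-centric treatment of the correlation.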
Path Signatures for Seizure Forecasting
Forecasting the state of a system from an observed time series is the subject
of research in many domains, such as computational neuroscience. Here, the
prediction of epileptic seizures from brain measurements is an unresolved
problem. There are neither complete models describing underlying brain
dynamics, nor do individual patients exhibit a single seizure onset pattern,
which complicates the development of a 'one-size-fits-all' solution. Based on a
longitudinal patient data set, we address the automated discovery and
quantification of statistical features (biomarkers) that can be used to
forecast seizures in a patient-specific way. We use existing and novel feature
extraction algorithms, in particular the path signature, a recent development
in time series analysis. Of particular interest is how this set of complex,
nonlinear features performs compared to simpler, linear features on this task.
Our inference is based on statistical classification algorithms with in-built
subset selection to discern time series with and without an impending seizure
while selecting only a small number of relevant features. This study may be
seen as a step towards a generalisable pattern recognition pipeline for time
series in a broader context.
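The path signature mentioned above is the sequence of iterated integrals of a path; for piecewise-linear paths the low-order terms have a closed form. A small sketch (standard signature definitions, not the paper's feature pipeline) computing levels 1 and 2, whose antisymmetric part is the signed (Lévy) area — a genuinely nonlinear feature invisible to linear summaries:

```python
import numpy as np

def signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    path : (T, d) array of points. Level 1 is the total increment;
    level 2 collects the iterated integrals over dX^i dX^j, computed
    in closed form for linear segments.
    """
    inc = np.diff(path, axis=0)               # segment increments, (T-1, d)
    s1 = inc.sum(axis=0)                      # level 1: total displacement
    # level 2: ordered products of increments across earlier segments
    # plus each segment's own delta (x) delta / 2 contribution
    csum = np.cumsum(inc, axis=0) - inc       # increments strictly before k
    s2 = (np.einsum('ki,kj->ij', csum, inc)
          + 0.5 * np.einsum('ki,kj->ij', inc, inc))
    return s1, s2

# Example: for a closed unit circle, level 1 vanishes entirely, while
# the antisymmetric level-2 part recovers the enclosed (Levy) area.
theta = np.linspace(0, 2 * np.pi, 200)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
s1, s2 = signature_level2(circle)
levy_area = 0.5 * (s2[0, 1] - s2[1, 0])      # ~ pi for a unit circle
```

The circle example illustrates why signatures are attractive for this task: two time series can share all linear statistics yet differ in their signature terms.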
SAMoSSA: Multivariate Singular Spectrum Analysis with Stochastic Autoregressive Noise
The well-established practice of time series analysis involves estimating
deterministic, non-stationary trend and seasonality components followed by
learning the residual stochastic, stationary components. Recently, it has been
shown that one can learn the deterministic non-stationary components accurately
using multivariate Singular Spectrum Analysis (mSSA) in the absence of a
correlated stationary component; meanwhile, in the absence of deterministic
non-stationary components, the Autoregressive (AR) stationary component can
also be learnt readily, e.g. via Ordinary Least Squares (OLS). However, a
theoretical underpinning of multi-stage learning algorithms involving both
deterministic and stationary components has been absent in the literature
despite its pervasiveness. We resolve this open question by establishing
desirable theoretical guarantees for a natural two-stage algorithm, where mSSA
is first applied to estimate the non-stationary components despite the presence
of a correlated stationary AR component, which is subsequently learned from the
residual time series. We provide a finite-sample forecasting consistency bound
for the proposed algorithm, SAMoSSA, which is data-driven and thus requires
minimal parameter tuning. To establish theoretical guarantees, we overcome
three hurdles: (i) we characterize the spectra of Page matrices of stable AR
processes, thus extending the analysis of mSSA; (ii) we extend the analysis of
AR process identification in the presence of arbitrary bounded perturbations;
(iii) we characterize the out-of-sample or forecasting error, as opposed to
solely considering model identification. Through representative empirical
studies, we validate the superior performance of SAMoSSA compared to existing
baselines. Notably, SAMoSSA's ability to account for AR noise structure yields
improvements ranging from 5% to 37% across various benchmark datasets.
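The two-stage structure described above can be sketched in a few lines. The following is a simplified, univariate illustration in the spirit of SAMoSSA — Page matrix plus truncated SVD for the non-stationary component, then OLS for the AR coefficients on the residual — with all parameter choices (window length, rank, AR order) hypothetical rather than the paper's data-driven ones:

```python
import numpy as np

def two_stage(x, L, rank, ar_order):
    """Two-stage sketch: (1) stack the series into an L x (n/L) Page
    matrix and keep its top singular directions as the deterministic
    non-stationary estimate; (2) fit an AR model to the residual by
    ordinary least squares."""
    n = (len(x) // L) * L
    page = x[:n].reshape(-1, L).T                 # Page matrix, L x (n/L)
    U, s, Vt = np.linalg.svd(page, full_matrices=False)
    det = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # low-rank estimate
    det_hat = det.T.ravel()                       # back to a series
    resid = x[:n] - det_hat
    # OLS for AR coefficients: resid_t ~ sum_k a_k * resid_{t-k}
    A = np.column_stack([resid[ar_order - k - 1 : n - k - 1]
                         for k in range(ar_order)])
    a, *_ = np.linalg.lstsq(A, resid[ar_order:], rcond=None)
    return det_hat, a

# Toy series: a slow sinusoidal trend plus AR(1) noise with a = 0.6
rng = np.random.default_rng(1)
n = 4000
t = np.arange(n)
trend = 3 * np.sin(2 * np.pi * t / 400)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + rng.standard_normal()
det_hat, a = two_stage(trend + noise, L=100, rank=2, ar_order=1)
```

Even in this toy setting the first stage recovers the sinusoidal trend despite the correlated AR noise, and the second stage recovers the AR coefficient from the residual — the interplay whose finite-sample guarantees the paper establishes.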