Enhancing Missing Data Imputation of Non-stationary Signals with Harmonic Decomposition
Dealing with time series with missing values, including those afflicted by
low quality or over-saturation, presents a significant signal processing
challenge. The task of recovering these missing values, known as imputation,
has led to the development of several algorithms. However, we have observed
that the efficacy of these algorithms tends to diminish when the time series
exhibit non-stationary oscillatory behavior. In this paper, we introduce a
novel algorithm, coined Harmonic Level Interpolation (HaLI), which enhances the
performance of existing imputation algorithms for oscillatory time series.
After running any chosen imputation algorithm, HaLI leverages the harmonic
decomposition based on the adaptive nonharmonic model of the initial imputation
to improve the imputation accuracy for oscillatory time series. Experimental
assessments conducted on synthetic and real signals consistently highlight that
HaLI enhances the performance of existing imputation algorithms. The algorithm
is made publicly available as readily employable Matlab code for other
researchers to use.
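The adaptive nonharmonic model underlying HaLI is detailed in the paper; as a rough, hypothetical sketch of the general two-step idea — fill gaps with any baseline imputer, then refine the gap values with a harmonic fit to the observed samples — one might write (the fixed, known fundamental frequency here is a simplification; the actual algorithm adapts to time-varying oscillations):

```python
import numpy as np

def harmonic_refine(t, x, missing, freq, n_harmonics=3):
    """Refine an initial imputation with a small harmonic model.

    t        : sample times
    x        : series with an initial (baseline) imputation filled in
    missing  : boolean mask of the originally missing samples
    freq     : assumed fundamental frequency (a simplification; the
               paper's model handles non-stationary oscillations)
    """
    obs = ~missing
    # Design matrix of harmonics cos/sin(2*pi*k*f*t), k = 1..n_harmonics
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * freq * t))
        cols.append(np.sin(2 * np.pi * k * freq * t))
    A = np.column_stack(cols)
    # Least-squares fit on the observed samples only
    coef, *_ = np.linalg.lstsq(A[obs], x[obs], rcond=None)
    refined = x.copy()
    refined[missing] = A[missing] @ coef   # overwrite the gap values
    return refined

# Toy example: a sinusoid with a gap, first filled by linear interpolation
t = np.linspace(0, 10, 500)
clean = np.sin(2 * np.pi * 1.0 * t)
missing = (t > 4) & (t < 5)
x = clean.copy()
x[missing] = np.interp(t[missing], t[~missing], clean[~missing])
refined = harmonic_refine(t, x, missing, freq=1.0)
```

On this toy signal the linear baseline flattens the oscillation inside the gap, while the harmonic refinement recovers it almost exactly — the same failure mode of generic imputers on oscillatory data that motivates the paper.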
Predictive PAC Learning and Process Decompositions
We informally call a stochastic process learnable if it admits a
generalization error approaching zero in probability for any concept class with
finite VC-dimension (IID processes are the simplest example). A mixture of
learnable processes need not be learnable itself, and certainly its
generalization error need not decay at the same rate. In this paper, we argue
that it is natural in predictive PAC to condition not on the past observations
but on the mixture component of the sample path. This definition not only
matches what a realistic learner might demand, but also allows us to sidestep
several otherwise grave problems in learning from dependent data. In
particular, we give a novel PAC generalization bound for mixtures of learnable
processes with a generalization error that is not worse than that of each
mixture component. We also provide a characterization of mixtures of absolutely
regular (β-mixing) processes, of independent probability-theoretic
interest. Comment: 9 pages, accepted at NIPS 201
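The paper's bound is theoretical, but the setting is easy to illustrate. In the following toy simulation (an illustration of the setup, not of the bound itself), nature draws a latent mixture component and then an IID sample path from it; on a single path the empirical mean concentrates around the component's parameter, not around the marginal mean — which is why conditioning on the mixture component rather than on past observations is the natural move:

```python
import numpy as np

rng = np.random.default_rng(0)

# A mixture of two IID Bernoulli processes: nature first draws a
# component (p = 0.2 or p = 0.8), then the whole sample path is IID
# from that component. The marginal process is exchangeable, not IID.
def sample_path(n, rng):
    p = rng.choice([0.2, 0.8])          # latent mixture component
    return p, rng.random(n) < p

n = 10_000
p, path = sample_path(n, rng)

# Conditional on the component, the empirical mean obeys ordinary
# IID concentration -- learning works as for a learnable process.
cond_error = abs(path.mean() - p)

# Unconditionally, a single path's running mean converges to the
# latent p, NOT to the marginal mean 0.5 of the mixture.
marginal_error = abs(path.mean() - 0.5)
```

This is the sense in which a mixture of learnable processes need not be learnable "marginally", while each component remains perfectly well behaved.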
Econometrics of Machine Learning Methods in Economic Forecasting
This paper surveys the recent advances in machine learning methods for
economic forecasting. The survey covers the following topics: nowcasting,
textual data, panel and tensor data, high-dimensional Granger causality tests,
time series cross-validation, and classification with economic losses.
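One of the surveyed topics, time series cross-validation, differs from random K-fold CV in that validation data must lie strictly after the training window so no future information leaks into the fit. A minimal expanding-window splitter (an illustrative sketch, not code from the survey) looks like:

```python
import numpy as np

def expanding_window_splits(n, n_folds, min_train):
    """Expanding-window splits for time series cross-validation.

    Unlike shuffled K-fold CV, each validation block lies strictly
    after its training window, so no future data leaks into the fit.
    """
    fold = (n - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold
        yield np.arange(train_end), np.arange(train_end, train_end + fold)

# Example: 5 folds over 100 observations, at least 50 for training
for train, val in expanding_window_splits(100, 5, 50):
    assert train.max() < val.min()   # validation always follows training
```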
Incentivizing Data Sharing for Energy Forecasting: Analytics Markets with Correlated Data
Reliably forecasting uncertain power production is beneficial for the social
welfare of electricity markets by reducing the need for balancing resources.
Describing such forecasting as an analytics task, the current literature
proposes analytics markets as an incentive for data sharing to improve
accuracy, for instance by leveraging spatio-temporal correlations. The
challenge is that, when used as input features for forecasting, correlated data
complicates the market design with respect to the revenue allocation, as the
value of overlapping information is inherently combinatorial. We develop a
correlation-aware analytics market for a wind power forecasting application. To
allocate revenue, we adopt a Shapley value-based attribution policy, framing
the features of agents as players and their interactions as a characteristic
function game. We illustrate that there are multiple options to describe such a
game, each having causal nuances that influence market behavior when features
are correlated. We argue that no option is correct in a general sense, but that
the decision hinges on whether the market should address correlations from a
data-centric or model-centric perspective, a choice that can yield
counter-intuitive allocations if not considered carefully by the market
designer. Comment: 15 pages, 9 figures, 1 table
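The Shapley value-based attribution can be sketched concretely. Below is a minimal exact Shapley computation over a characteristic-function game (an illustrative toy, not the paper's market mechanism; the characteristic function and its values are hypothetical). The two-player game with perfectly correlated features shows the combinatorial issue in miniature: either feature alone achieves the full gain, and the Shapley value splits it symmetrically:

```python
from itertools import combinations
from math import factorial

def shapley(players, value):
    """Exact Shapley values for a characteristic-function game.

    players : list of player ids (here: agents contributing features)
    value   : maps a frozenset of players to its worth, e.g. the
              forecast-accuracy improvement their features provide.
    Enumerates all coalitions -- fine for a handful of agents.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            for S in combinations(others, r):
                S = frozenset(S)
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                phi[p] += w * (value(S | {p}) - value(S))
    return phi

# Toy game with two perfectly correlated features a and b: either one
# alone yields the full accuracy gain of 1.0 (values are illustrative).
v = lambda S: 1.0 if S else 0.0
print(shapley(["a", "b"], v))   # symmetric split: {'a': 0.5, 'b': 0.5}
```

Whether such a symmetric split is the "right" allocation is exactly the paper's point: it depends on how the characteristic function is defined, i.e. on a data-centric versus model-centric treatment of the correlation.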
Path Signatures for Seizure Forecasting
Forecasting the state of a system from an observed time series is the subject
of research in many domains, such as computational neuroscience. Here, the
prediction of epileptic seizures from brain measurements is an unresolved
problem. There are neither complete models describing underlying brain
dynamics, nor do individual patients exhibit a single seizure onset pattern,
which complicates the development of a 'one-size-fits-all' solution. Based on a
longitudinal patient data set, we address the automated discovery and
quantification of statistical features (biomarkers) that can be used to
forecast seizures in a patient-specific way. We use existing and novel feature
extraction algorithms, in particular the path signature, a recent development
in time series analysis. Of particular interest is how this set of complex,
nonlinear features performs compared to simpler, linear features on this task.
Our inference is based on statistical classification algorithms with in-built
subset selection to discern time series with and without an impending seizure
while selecting only a small number of relevant features. This study may be
seen as a step towards a generalisable pattern recognition pipeline for time
series in a broader context.
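The path signature mentioned above is the sequence of iterated integrals of a path; for piecewise-linear paths the low-order terms have a closed form. A small sketch (standard signature definitions, not the paper's feature pipeline) computing levels 1 and 2, whose antisymmetric part is the signed (Lévy) area — a genuinely nonlinear feature invisible to linear summaries:

```python
import numpy as np

def signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    path : (T, d) array of points. Level 1 is the total increment;
    level 2 collects the iterated integrals over dX^i dX^j, computed
    in closed form for linear segments.
    """
    inc = np.diff(path, axis=0)               # segment increments, (T-1, d)
    s1 = inc.sum(axis=0)                      # level 1: total displacement
    # level 2: ordered products of increments across earlier segments
    # plus each segment's own delta (x) delta / 2 contribution
    csum = np.cumsum(inc, axis=0) - inc       # increments strictly before k
    s2 = (np.einsum('ki,kj->ij', csum, inc)
          + 0.5 * np.einsum('ki,kj->ij', inc, inc))
    return s1, s2

# Example: for a closed unit circle, level 1 vanishes entirely, while
# the antisymmetric level-2 part recovers the enclosed (Levy) area.
theta = np.linspace(0, 2 * np.pi, 200)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
s1, s2 = signature_level2(circle)
levy_area = 0.5 * (s2[0, 1] - s2[1, 0])      # ~ pi for a unit circle
```

The circle example illustrates why signatures are attractive for this task: two time series can share all linear statistics yet differ in their signature terms.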
SAMoSSA: Multivariate Singular Spectrum Analysis with Stochastic Autoregressive Noise
The well-established practice of time series analysis involves estimating
deterministic, non-stationary trend and seasonality components followed by
learning the residual stochastic, stationary components. Recently, it has been
shown that one can learn the deterministic non-stationary components accurately
using multivariate Singular Spectrum Analysis (mSSA) in the absence of a
correlated stationary component; meanwhile, in the absence of deterministic
non-stationary components, the Autoregressive (AR) stationary component can
also be learnt readily, e.g. via Ordinary Least Squares (OLS). However, a
theoretical underpinning of multi-stage learning algorithms involving both
deterministic and stationary components has been absent in the literature
despite its pervasiveness. We resolve this open question by establishing
desirable theoretical guarantees for a natural two-stage algorithm, where mSSA
is first applied to estimate the non-stationary components despite the presence
of a correlated stationary AR component, which is subsequently learned from the
residual time series. We provide a finite-sample forecasting consistency bound
for the proposed algorithm, SAMoSSA, which is data-driven and thus requires
minimal parameter tuning. To establish theoretical guarantees, we overcome
three hurdles: (i) we characterize the spectra of Page matrices of stable AR
processes, thus extending the analysis of mSSA; (ii) we extend the analysis of
AR process identification in the presence of arbitrary bounded perturbations;
(iii) we characterize the out-of-sample or forecasting error, as opposed to
solely considering model identification. Through representative empirical
studies, we validate the superior performance of SAMoSSA compared to existing
baselines. Notably, SAMoSSA's ability to account for AR noise structure yields
improvements ranging from 5% to 37% across various benchmark datasets.
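The two-stage structure described above can be sketched in a few lines. The following is a simplified, univariate illustration in the spirit of SAMoSSA — Page matrix plus truncated SVD for the non-stationary component, then OLS for the AR coefficients on the residual — with all parameter choices (window length, rank, AR order) hypothetical rather than the paper's data-driven ones:

```python
import numpy as np

def two_stage(x, L, rank, ar_order):
    """Two-stage sketch: (1) stack the series into an L x (n/L) Page
    matrix and keep its top singular directions as the deterministic
    non-stationary estimate; (2) fit an AR model to the residual by
    ordinary least squares."""
    n = (len(x) // L) * L
    page = x[:n].reshape(-1, L).T                 # Page matrix, L x (n/L)
    U, s, Vt = np.linalg.svd(page, full_matrices=False)
    det = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # low-rank estimate
    det_hat = det.T.ravel()                       # back to a series
    resid = x[:n] - det_hat
    # OLS for AR coefficients: resid_t ~ sum_k a_k * resid_{t-k}
    A = np.column_stack([resid[ar_order - k - 1 : n - k - 1]
                         for k in range(ar_order)])
    a, *_ = np.linalg.lstsq(A, resid[ar_order:], rcond=None)
    return det_hat, a

# Toy series: a slow sinusoidal trend plus AR(1) noise with a = 0.6
rng = np.random.default_rng(1)
n = 4000
t = np.arange(n)
trend = 3 * np.sin(2 * np.pi * t / 400)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + rng.standard_normal()
det_hat, a = two_stage(trend + noise, L=100, rank=2, ar_order=1)
```

Even in this toy setting the first stage recovers the sinusoidal trend despite the correlated AR noise, and the second stage recovers the AR coefficient from the residual — the interplay whose finite-sample guarantees the paper establishes.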