Econometrics of Machine Learning Methods in Economic Forecasting
This paper surveys recent advances in machine learning methods for
economic forecasting. The survey covers the following topics: nowcasting,
textual data, panel and tensor data, high-dimensional Granger causality tests,
time series cross-validation, and classification with economic losses.
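Time series cross-validation, one of the survey topics, differs from ordinary k-fold cross-validation because the training data must always precede the test data in time. Below is a minimal Python sketch of one common scheme, expanding-window (rolling-origin) evaluation; the function name and its parameters are illustrative assumptions, not taken from the survey.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_obs: int, min_train: int, horizon: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs in which training always precedes testing.

    Unlike shuffled k-fold CV, no future observation ever enters the training set.
    """
    if min_train < 1 or horizon < 1:
        raise ValueError("min_train and horizon must be positive")
    # The forecast origin moves forward one step at a time; the training window grows.
    for origin in range(min_train, n_obs - horizon + 1):
        train_idx = list(range(0, origin))
        test_idx = list(range(origin, origin + horizon))
        yield train_idx, test_idx


for train, test in expanding_window_splits(n_obs=6, min_train=3):
    print(train, test)
```

A fixed-length rolling window is the usual alternative when older observations are believed to be less relevant; the expanding window keeps all past data.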
POP: Mining POtential Performance of new fashion products via webly cross-modal query expansion
We propose a data-centric pipeline able to generate exogenous observation
data for the New Fashion Product Performance Forecasting (NFPPF) problem, i.e.,
predicting the performance of a brand-new clothing probe with no available past
observations. Our pipeline manufactures the missing past starting from a
single, available image of the clothing probe. It starts by expanding textual
tags associated with the image, querying related fashionable or unfashionable
images uploaded on the web at a specific time in the past. A binary classifier
is robustly trained on these web images via confident learning, to learn what
was fashionable in the past and how much the probe image conforms to this
notion of fashionability. This compliance produces the POtential Performance
(POP) time series, indicating how well the probe would have performed had it
been available earlier. POP proves to be highly predictive of the probe's
future performance, improving the sales forecasts of all state-of-the-art
models on the recent VISUELLE fast-fashion dataset. We also show that POP
reflects the ground-truth popularity of new styles (ensembles of clothing
items) on the Fashion Forward benchmark, demonstrating that our webly-learned
signal is a truthful expression of popularity, accessible by everyone and
generalizable to any time of analysis. Forecasting code, data and the POP time
series are available at:
https://github.com/HumaticsLAB/POP-Mining-POtential-Performance
Comment: ECCV 202
Detecting and Explaining Causes From Text For a Time Series Event
Explaining the underlying causes or effects of events is a challenging but
valuable task. We define a novel problem of generating explanations of a time
series event by (1) searching for cause-and-effect relationships between the
time series and textual data and (2) constructing a connecting chain between
them to generate an explanation. To detect causal features from text, we propose a
novel method based on Granger causality between the time series and features
extracted from text, such as N-grams, topics, sentiments, and their composition.
The generation of the sequence of causal entities requires a commonsense
causative knowledge base with efficient reasoning. To ensure good
interpretability and appropriate lexical usage we combine symbolic and neural
representations, using a neural reasoning algorithm trained on commonsense
causal tuples to predict the next cause step. Our quantitative and human
analyses provide empirical evidence that our method successfully extracts
meaningful causal relationships between time series and textual features
and generates appropriate explanations between them.
Comment: Accepted at EMNLP 201
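The Granger causality step can be illustrated with the standard bivariate F-test: regress the target series on its own lags, then add lags of a candidate text-feature series and check whether the residual sum of squares drops significantly. This is a minimal NumPy sketch, assuming a fixed lag order `p` and plain OLS; the function name `granger_f_stat` is hypothetical, and the sketch does not reproduce the authors' feature extraction or causal-chain reasoning.

```python
import numpy as np


def granger_f_stat(y, x, p: int) -> float:
    """F statistic for H0: lags of x add no predictive power for y.

    Restricted:   y_t = c + sum_{i=1..p} a_i * y_{t-i} + e_t
    Unrestricted: y_t = c + sum_{i=1..p} a_i * y_{t-i} + sum_{i=1..p} b_i * x_{t-i} + u_t
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    if p < 1 or len(x) != n or n - 3 * p - 1 <= 0:
        raise ValueError("need p >= 1, equal-length series, and n > 3p + 1")
    target = y[p:]
    # Column i holds the series lagged by i + 1 steps, aligned with `target`.
    y_lags = np.column_stack([y[p - i - 1 : n - i - 1] for i in range(p)])
    x_lags = np.column_stack([x[p - i - 1 : n - i - 1] for i in range(p)])
    ones = np.ones((n - p, 1))
    design_r = np.hstack([ones, y_lags])
    design_u = np.hstack([ones, y_lags, x_lags])

    def rss(design: np.ndarray) -> float:
        # Ordinary least squares fit; return the residual sum of squares.
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(design_r), rss(design_u)
    df_den = (n - p) - design_u.shape[1]  # observations minus 2p + 1 parameters
    return ((rss_r - rss_u) / p) / (rss_u / df_den)
```

The statistic is compared with an F(p, n - 3p - 1) distribution, for example via `scipy.stats.f.sf(F, p, n - 3 * p - 1)` to obtain a p-value. With many candidate text features, as in the abstract, some multiple-testing control would typically be needed.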
Econometrics meets sentiment: an overview of methodology and applications
The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate them with empirical results, and discuss useful software.
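To make the step from qualitative text to a quantitative sentiment variable concrete, here is a minimal lexicon-based sketch. Each document is scored by averaging word polarities, and the document scores are then averaged within each period to form a sentiment time series. The toy lexicon and the function names are hypothetical; the survey covers richer weighting and aggregation schemes (and dedicated software) that this sketch does not reproduce.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; real applications use curated domain-specific lexicons.
LEXICON: Dict[str, float] = {
    "growth": 1.0, "strong": 1.0, "recovery": 1.0,
    "recession": -1.0, "weak": -1.0, "losses": -1.0,
}


def document_sentiment(text: str, lexicon: Dict[str, float] = LEXICON) -> float:
    """Average lexicon score over the words of one document (0.0 if it has no words)."""
    tokens = [tok.strip(".,;:!?").lower() for tok in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    # Words missing from the lexicon count as neutral (score 0).
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)


def sentiment_index(docs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Aggregate (period, text) pairs into one mean sentiment value per period."""
    by_period: Dict[str, List[float]] = defaultdict(list)
    for period, text in docs:
        by_period[period].append(document_sentiment(text))
    return {p: sum(v) / len(v) for p, v in sorted(by_period.items())}
```

The resulting per-period values can then enter an econometric model like any other regressor, for instance as an explanatory variable in a forecasting regression.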
Essays on asynchronous time series and related multidimensional data
This thesis focuses on asynchronous time series and related multidimensional data: time-dependent measurements with varying publication delays. This class of data exists in a broad range of fields. In the social sciences, most official time series and repeated surveys are asynchronous in nature, since statistical offices need time to collect and aggregate raw data. In STEM fields, statistical offices are generally less relevant, and most publication delays are caused by more exotic factors. For instance, series derived from technological networks are usually generated by a direct (digital or textual) reference to the past (e.g., publishing pictures of a trip taken a week ago that a friend had also photographed and posted in real time). As a result, studying data releases is key to developing accurate real-time models, with applications in forecasting, policy, and risk management.
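Handling publication delays in practice usually means reconstructing what was actually known at each point in time (a data vintage). The sketch below assumes a hypothetical `Release` record holding a reference period, a release date, and a value. It keeps only values published on or before a given date and lets later revisions overwrite earlier estimates. It illustrates the general real-time data idea and is not the method developed in the thesis.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Release:
    """One published value: the period it measures, when it became public, and its value."""
    reference_period: date
    release_date: date
    value: float


def vintage(releases: List[Release], as_of: date) -> Dict[date, float]:
    """Return the data a forecaster could actually see on `as_of`.

    Releases published after `as_of` are ignored; if a period was revised,
    the most recent release on or before `as_of` wins.
    """
    latest: Dict[date, Release] = {}
    for r in releases:
        if r.release_date > as_of:
            continue  # not yet published: using it would leak future information
        current = latest.get(r.reference_period)
        if current is None or r.release_date > current.release_date:
            latest[r.reference_period] = r
    return {p: latest[p].value for p in sorted(latest)}
```

Evaluating a model on successive vintages, rather than on the final revised series, is what makes a forecasting exercise genuinely real-time.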