102 research outputs found

    Statistical Methods for Analyzing Time Series Data Drawn from Complex Social Systems

    The rise of human interaction in digital environments has led to an abundance of behavioral traces. These traces allow for model-based investigation of human-human and human-machine interaction "in the wild." Stochastic models allow us to both predict and understand human behavior. In this thesis, we present statistical procedures for learning such models from the behavioral traces left in digital environments. First, we develop a non-parametric method for smoothing time series data corrupted by serially correlated noise. The method determines the simplest smoothing of the data that simultaneously gives the simplest residuals, where simplicity of the residuals is measured by their statistical complexity. We find that complexity-regularized regression outperforms generalized cross-validation in the presence of serially correlated noise. Next, we cast the task of modeling individual-level user behavior on social media into a predictive framework. We demonstrate the performance of two contrasting approaches, computational mechanics and echo state networks, on a heterogeneous data set drawn from user behavior on Twitter. We demonstrate that the behavior of users can be well modeled as processes with self-feedback. We find that the two modeling approaches perform very similarly for most users, but the users for whom the two methods differ in performance highlight the challenges faced in applying predictive models to dynamic social data. We then expand the predictive problem of the previous work to modeling the aggregate behavior of large collections of users. We use three models, corresponding to seasonal, aggregate autoregressive, and aggregation-of-individual approaches, and find that the performance of the methods at predicting times of high activity depends strongly on the tradeoff between true and false positives, with no method dominating. Our results highlight the challenges and opportunities involved in modeling complex social systems, and demonstrate how influencers interested in forecasting potential user engagement can use complexity modeling to make better decisions. Finally, we turn from a predictive to a descriptive framework, and investigate how well user behavior can be attributed to time of day, self-memory, and social inputs. The models allow us to describe how a user processes their past behavior and their social inputs. We find that despite the diversity of observed user behavior, most of the inferred models fall into a small subclass of all possible finitary processes. Thus, our work demonstrates that user behavior, while quite complex, arises from simple underlying computational structures.
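
    As a toy illustration of the aggregate-prediction part, the Python sketch below uses a purely seasonal (hour-of-day) baseline to flag high-activity hours in a simulated activity series and sweeps the decision threshold to trace out the true-positive/false-positive trade-off discussed above. The Poisson data-generating process and the seasonal-baseline predictor are illustrative assumptions, not the models used in the thesis.

        import numpy as np

        rng = np.random.default_rng(0)
        hours = np.arange(24 * 200)                          # ~200 days of hourly activity counts
        season = 5 + 4 * np.sin(2 * np.pi * (hours % 24) / 24)
        counts = rng.poisson(season * rng.gamma(2.0, 0.5, size=hours.size))
        high = counts > np.quantile(counts, 0.9)             # "high activity" = top decile of counts

        train, test = slice(0, 24 * 150), slice(24 * 150, None)
        hourly_mean = np.array([counts[train][(hours[train] % 24) == h].mean()
                                for h in range(24)])
        score = hourly_mean[hours[test] % 24]                 # seasonal predictor for test hours

        # Sweep the alarm threshold to trace the true/false-positive trade-off.
        for q in (0.5, 0.7, 0.9):
            alarm = score > np.quantile(score, q)
            tpr = np.mean(alarm[high[test]])                  # fraction of high-activity hours flagged
            fpr = np.mean(alarm[~high[test]])                 # fraction of normal hours flagged
            print(f"threshold quantile {q:.1f}: TPR={tpr:.2f}, FPR={fpr:.2f}")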

    Sparse Identification and Estimation of Large-Scale Vector AutoRegressive Moving Averages

    The Vector AutoRegressive Moving Average (VARMA) model is fundamental to the theory of multivariate time series; however, in practice, identifiability issues have led many authors to abandon VARMA modeling in favor of the simpler Vector AutoRegressive (VAR) model. Such a practice is unfortunate since even very simple VARMA models can have quite complicated VAR representations. We narrow this gap with a new optimization-based approach to VARMA identification that is built upon the principle of parsimony. Among all equivalent data-generating models, we seek the parameterization that is "simplest" in a certain sense. A user-specified strongly convex penalty is used to measure model simplicity, and that same penalty is then used to define an estimator that can be efficiently computed. We show that our estimator converges to a parsimonious element in the set of all equivalent data-generating models, in a double asymptotic regime where the number of component time series is allowed to grow with sample size. Further, we derive non-asymptotic upper bounds on the estimation error of our method relative to our specially identified target. Novel theoretical machinery includes non-asymptotic analysis of infinite-order VAR, elastic net estimation under a singular covariance structure of regressors, and new concentration inequalities for quadratic forms of random variables from Gaussian time series. We illustrate the competitive performance of our methods in simulation and several application domains, including macro-economic forecasting, demand forecasting, and volatility forecasting
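
    To see why abandoning VARMA for VAR is costly, the short sketch below (with an arbitrary, illustrative parameterization, not one taken from the paper) computes the VAR(infinity) representation implied by a bivariate VARMA(1,1). The autoregressive coefficient matrices Pi_j = (-Theta)^(j-1) (Phi + Theta) decay only geometrically, so a model with two small parameter matrices already requires a long, dense VAR approximation.

        import numpy as np

        # An illustrative bivariate VARMA(1,1): y_t = Phi y_{t-1} + eps_t + Theta eps_{t-1}
        Phi = np.array([[0.5, 0.1],
                        [0.0, 0.4]])
        Theta = np.array([[0.6, 0.2],
                          [0.3, 0.5]])

        # Implied VAR(infinity) representation: y_t = sum_j Pi_j y_{t-j} + eps_t,
        # with Pi_j = (-Theta)^(j-1) (Phi + Theta). The norms below decay only
        # geometrically, so truncating to a short VAR discards real structure.
        for j in range(1, 13):
            Pi_j = np.linalg.matrix_power(-Theta, j - 1) @ (Phi + Theta)
            print(f"||Pi_{j}||_F = {np.linalg.norm(Pi_j):.4f}")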

    Teollisuuden tuottajahintaindeksin ennustaminen suuriulotteisen aineiston avulla (Forecasting the producer price index for manufactured goods using high-dimensional data)

    Producing timely information regarding the current and future state of the economy is important for the practice of economic policy: the delay between the implementation of policy measures and the emergence of their effects is typically considerable, which creates a need to anticipate developments in macroeconomic variables. The producer price index is one such variable: producer price indices are used to track changes in the general price level of goods produced within an economy from the point of view of producers, which makes them prominent indicators of inflationary pressures and business cycle conditions. The principal objective of this thesis is to investigate whether the Finnish Producer Price Index for Manufactured Goods can be reliably forecast in the short run using large sets of external predictors. Increasing the number of predictors exposes standard forecasting methods to inaccuracies and makes their application outright infeasible once the number of variables exceeds the number of observations available for estimating the forecasting model. Various alternative methods have been proposed to counter this issue, and this thesis provides a broad overview of them as well as of other issues relevant to forecasting macroeconomic variables. Given that no single framework has proven to dominate the others in practical applications, a selection of methods representing two different approaches to high-dimensional forecasting is applied in the empirical section: dynamic factor models and penalized regressions. The effectiveness of dynamic factor models rests on the assumption that the relevant information contained in high-dimensional data can be summarized by a much smaller number of underlying factors, the estimates of which can, in turn, be used for forecasting. The solution offered by penalized regressions, on the other hand, is based on striking a balance between the bias and variance of the forecasts. Out of the broader class of penalized methods, four variants are used in this thesis: ridge, lasso, elastic net, and adaptive lasso. The empirical performance of the methods is assessed in a simulated out-of-sample forecasting experiment, in which a series of consecutive forecasts is estimated for the target variable using historical data and compared with the realized values over the same period. The objective of the experimental design is to produce representative information about the accuracy of the respective forecasting models by emulating the circumstances of real-time forecasting: only information that would have been available at the time is used to produce each forecast. The set of predictors used in the experiment consists of monthly economic time series collected from a variety of sources. Based on the forecasting experiment, the advantage of the high-dimensional models in average forecasting accuracy over a univariate autoregressive benchmark turns out to be only marginal at the one-, two-, and three-month horizons, and no significant differences in accuracy are found among the high-dimensional methods themselves. More favorable results are achieved, however, by using relatively promptly published market-based variables to predict the concurrent rather than strictly future values of the index; in this case the penalized models perform particularly well. The results suggest that the most promising avenue for anticipating the producer price index lies in exploiting the publication-lag advantage of external predictors for contemporaneous prediction, or nowcasting.
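
    The sketch below mimics, on simulated data, the kind of pseudo-out-of-sample design described above: at each forecast origin only the data available up to that point are used, a lasso regression on a large predictor set is re-estimated, and its one-step-ahead errors are compared with those of a simple autoregressive benchmark. The data-generating process, the predictor count and the LassoCV settings are illustrative assumptions, not the thesis's actual data or specifications.

        import numpy as np
        from sklearn.linear_model import LassoCV

        rng = np.random.default_rng(0)
        T, p = 240, 60                            # "monthly" sample with many candidate predictors
        X = rng.normal(size=(T, p))
        beta = np.zeros(p)
        beta[:5] = 0.4                            # only a handful of predictors actually matter
        y = np.empty(T)
        y[0] = rng.normal()
        y[1:] = X[:-1] @ beta + rng.normal(size=T - 1)

        lasso_err, ar_err = [], []
        for origin in range(120, T - 1):          # expanding estimation window
            X_tr, y_tr = X[:origin], y[:origin]
            lasso = LassoCV(cv=5).fit(X_tr[:-1], y_tr[1:])              # predictors lagged one period
            lasso_err.append(y[origin + 1] - lasso.predict(X[origin:origin + 1])[0])
            phi = np.polyfit(y_tr[:-1], y_tr[1:], 1)[0]                  # crude AR(1) benchmark
            ar_err.append(y[origin + 1] - phi * y[origin])

        rmse = lambda e: float(np.sqrt(np.mean(np.square(e))))
        print("RMSE lasso:", rmse(lasso_err), " RMSE AR(1):", rmse(ar_err))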

    M-GARCH Hedge Ratios And Hedging Effectiveness In Australian Futures Markets

    This study deals with the estimation of optimal hedge ratios using various econometric models. Most recent papers have demonstrated that the conventional ordinary least squares (OLS) method of estimating constant hedge ratios is inappropriate, although other, more complicated models do not appear to produce more efficient hedge ratios. Using daily data on the AOI and SPI futures in the Australian market, optimal hedge ratios are calculated from four different models: the OLS regression model, the bivariate vector autoregressive model (BVAR), the error-correction model (ECM) and the multivariate diagonal VECH GARCH model. The performance of each hedge ratio is then compared. Hedging effectiveness is measured in terms of the ex-post and ex-ante risk-return trade-off at various forecasting horizons. It is generally found that the GARCH time-varying hedge ratios provide the greatest portfolio risk reduction, particularly for longer hedging horizons, but they do not generate the highest portfolio returns.
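
    For reference, the constant minimum-variance hedge ratio that the OLS approach estimates is h* = Cov(ΔS, ΔF) / Var(ΔF), the slope of a regression of spot price changes on futures price changes. The sketch below computes it on simulated spot and futures changes and reports the resulting variance reduction; the numbers are illustrative, not estimates for the AOI/SPI data used in the study.

        import numpy as np

        rng = np.random.default_rng(3)
        n = 1000
        dF = rng.normal(scale=1.0, size=n)                 # futures price changes
        dS = 0.9 * dF + rng.normal(scale=0.4, size=n)      # spot changes, imperfectly correlated

        # Constant minimum-variance hedge ratio: slope of an OLS regression of dS on dF.
        h_ols = np.cov(dS, dF, ddof=1)[0, 1] / np.var(dF, ddof=1)
        hedged = dS - h_ols * dF                           # changes in value of the hedged position
        print("OLS hedge ratio:", round(float(h_ols), 3))
        print("variance reduction vs. unhedged:", round(float(1 - hedged.var() / dS.var()), 3))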

    On the predictability of U.S. stock market using machine learning and deep learning techniques

    Conventional market theories are increasingly considered an inconsistent approach in modern financial analysis. This thesis focuses on the application of sophisticated machine learning and deep learning techniques to stock market predictability, both statistically and in terms of economic significance, relative to the benchmark efficient market hypothesis and conventional econometric models. The thesis comprises five chapters and three publishable papers, and each chapter is developed to address specific identifiable problems. Chapter one gives the general introduction of the thesis: it presents the research problems identified in the relevant literature, the objectives of the study and its significance. Chapter two applies a range of machine learning techniques to forecast the direction of the U.S. stock market; sophisticated techniques such as regularization, discriminant analysis, classification trees, Bayesian classifiers and neural networks are employed. The empirical findings reveal that the discriminant analysis classifiers, classification trees, Bayesian classifiers and penalized binary probit models significantly outperform the plain binary probit models both statistically and economically, providing significant alternatives for portfolio managers. Chapter three focuses on the application of regression training (RT) techniques to forecast the U.S. equity premium. The RT models show significant evidence of equity premium predictability, both statistically and economically, relative to the benchmark historical average, delivering significant utility gains. Chapter four investigates the statistical predictive power and economic significance of deep learning techniques applied to financial stock market data; these techniques prove robust both statistically and economically when forecasting the equity premium out-of-sample with a recursive window method. Chapter five gives the summary and conclusions and presents areas of further research. Overall, the deep learning techniques produce the best results in this thesis, providing meaningful economic information on mean-variance portfolio investment for investors who time the market to earn future gains at minimal risk.
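
    As a minimal illustration of the direction-forecasting exercise in Chapter two, the sketch below trains an L1-penalized binary logit (a readily available stand-in for the penalized probit classifiers mentioned above) to predict whether the next-period return is positive, using simulated predictors. None of the data or settings correspond to the thesis's actual specification.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(7)
        T, p = 600, 20
        X = rng.normal(size=(T, p))                            # candidate predictors
        ret = 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(size=T)
        up = (ret > 0).astype(int)                             # 1 = positive ("up") return

        split = 400                                            # simple train/test split
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        clf.fit(X[:split], up[:split])
        hit_rate = (clf.predict(X[split:]) == up[split:]).mean()
        print("out-of-sample hit rate:", round(float(hit_rate), 3))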

    Essays in high-dimensional nonlinear time series analysis

    In this thesis, I study high-dimensional nonlinear time series analysis and its applications in financial forecasting and in identifying risk in highly interconnected financial networks. The first chapter is devoted to testing for nonlinearity in financial time series. I present a tentative classification of the various linearity tests that have been proposed in the literature, and then investigate nonlinear features of real financial series to determine whether the data justify the use of nonlinear techniques, such as those inspired by machine learning. In Chapters 3 and 5, I develop forecasting strategies that use a high-dimensional panel of predictors while allowing for nonlinear dynamics; combining these two elements is a developing area of research. In the third chapter, I propose a nonlinear generalization of statistical factor models. In the first step, factor estimation, I employ an auto-associative neural network (autoencoder) to estimate nonlinear factors from the predictors. In the second step, the forecasting equation, I apply a nonlinear function (a feedforward neural network) to the estimated factors for prediction. I show that these features can go beyond covariance analysis and enhance forecast accuracy. I apply this approach to forecasting equity returns and show that capturing nonlinear dynamics between equities significantly improves the quality of forecasts over current univariate and multivariate factor models. In Chapter 5, I propose a high-dimensional learning method based on shrinkage estimation of a backpropagation algorithm for skip-layer neural networks. This thesis emphasizes that linear models can be represented as special cases of the two aforementioned models, which means that if there is no nonlinearity between series, the proposed models reduce to linear models. The thesis also includes a chapter (Chapter 4, with Negar Kiyavash and Seyedjalal Etesami) in which we propose a new approach for identifying and measuring systemic risk in financial networks by introducing a nonlinearly modified Granger-causality network based on directed information graphs. The suggested method allows for nonlinearity and has predictive power over future economic activity through a time-varying network of interconnections. We apply the method to the daily returns of U.S. financial institutions, including banks, brokers and insurance companies, to identify the level of systemic risk in the financial sector and the contribution of each financial institution.
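
    The two-step procedure of the third chapter can be sketched with off-the-shelf tools: first an auto-associative network (autoencoder) with a low-dimensional bottleneck is fit to reconstruct the predictor panel, and its hidden-layer activations then serve as nonlinear factor estimates feeding a small feedforward forecasting network. The sketch below uses scikit-learn on simulated data purely to illustrate the mechanics; the architectures, activations and data-generating process are assumptions, not those of the thesis.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        T, N, k = 300, 40, 3
        F = rng.normal(size=(T, k))                            # latent factors
        X = np.tanh(F @ rng.normal(size=(k, N))) + 0.1 * rng.normal(size=(T, N))
        y = np.roll(F[:, 0] ** 2, -1)                          # next-period target, nonlinear in a factor

        # Step 1: auto-associative network with a k-unit bottleneck, trained to reconstruct X.
        ae = MLPRegressor(hidden_layer_sizes=(k,), activation="tanh",
                          max_iter=5000, random_state=0).fit(X, X)
        W1, b1 = ae.coefs_[0], ae.intercepts_[0]
        factors = np.tanh(X @ W1 + b1)                         # hidden activations = nonlinear factor estimates

        # Step 2: feedforward network mapping estimated factors to the forecast target.
        fc = MLPRegressor(hidden_layer_sizes=(8,), activation="tanh",
                          max_iter=5000, random_state=0).fit(factors[:-1], y[:-1])
        print("in-sample R^2 of the factor-based forecast:", round(float(fc.score(factors[:-1], y[:-1])), 3))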

    Boosting functional regression models

    In functional data analysis, the data consist of functions that are defined on a continuous domain. In practice, functional variables are observed on some discrete grid. Regression models are important tools to capture the impact of explanatory variables on the response and are challenging in the case of functional data. In this thesis, a generic framework is proposed that includes scalar-on-function, function-on-scalar and function-on-function regression models. Within this framework, quantile regression models, generalized additive models and generalized additive models for location, scale and shape can be derived by optimizing the corresponding loss functions. The additive predictors can contain a variety of covariate effects, for example linear, smooth and interaction effects of scalar and functional covariates. In the first part, the functional linear array model is introduced. This model is suited for responses observed on a common grid and covariates that do not vary over the domain of the response. Array models achieve computational efficiency by taking advantage of the Kronecker product in the design matrix. In the second part, the focus is on models without array structure, which are capable of capturing situations with responses observed on irregular grids and/or time-varying covariates. This includes in particular models with historical functional effects. For situations in which the functional response and covariate are both observed over the same time domain, a historical functional effect induces an association between response and covariate such that only past values of the covariate influence the current value of the response. In this model class, effects with more general integration limits, like lag and lead effects, can be specified. In the third part, the framework is extended to generalized additive models for location, scale and shape, where all parameters of the conditional response distribution can depend on covariate effects. The conditional response distribution can be modeled very flexibly by relating each distribution parameter to a linear predictor via a link function. For all parts, estimation is conducted by a component-wise gradient boosting algorithm. Boosting is an ensemble method that pursues a divide-and-conquer strategy for optimizing an expected loss criterion. This provides great flexibility for the regression models: for example, minimizing the check function yields quantile regression, and minimizing the negative log-likelihood yields generalized additive models for location, scale and shape. The estimator is updated iteratively to minimize the loss criterion along the steepest gradient descent. The model is represented as a sum of simple (penalized) regression models, the so-called base-learners, which separately fit the negative gradient in each step, with only the best-fitting base-learner being updated. Component-wise boosting allows for high-dimensional data settings and for automatic, data-driven variable selection. To adapt boosting for regression with functional data, the loss is integrated over the domain of the response and base-learners suited to functional effects are implemented. To enhance the availability of functional regression models for practitioners, a comprehensive implementation of the methods is provided in the R add-on package FDboost. The flexibility of the regression framework is highlighted by several applications from different fields.
Some features of the functional linear array model are illustrated using data on curing resin for car production, heat values of fossil fuels and Canadian climate data. These require function-on-scalar, scalar-on-function and function-on-function regression models, respectively. The methodological developments for non-array models are motivated by biotechnological data on fermentations, modeling a key process variable by a historical functional model. The motivating application for functional generalized additive models for location, scale and shape is a time series on stock returns where expectation and standard deviation are modeled depending on scalar and functional covariates
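
    The core of the fitting procedure described above can be condensed into a few lines for the simplest case of squared-error loss and univariate linear base-learners: at every step each base-learner is fit to the current negative gradient (here just the residuals) and only the best-fitting one receives a small, shrunken update. The sketch below is a generic component-wise L2-boosting toy in Python, not the functional-data machinery implemented in FDboost.

        import numpy as np

        def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
            """Component-wise L2 boosting with univariate linear base-learners."""
            n, p = X.shape
            coef = np.zeros(p)
            intercept = y.mean()
            resid = y - intercept                       # negative gradient of squared-error loss
            for _ in range(n_steps):
                best_j, best_b, best_rss = 0, 0.0, np.inf
                for j in range(p):                      # fit each base-learner to the residuals
                    xj = X[:, j]
                    b = xj @ resid / (xj @ xj)
                    rss = np.sum((resid - b * xj) ** 2)
                    if rss < best_rss:
                        best_j, best_b, best_rss = j, b, rss
                coef[best_j] += nu * best_b             # update only the best-fitting component
                resid -= nu * best_b * X[:, best_j]
            return intercept, coef

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 50))
        y = 2 * X[:, 0] - X[:, 3] + rng.normal(size=200)
        a, beta = componentwise_l2_boost(X, y)
        print("selected (nonzero) coefficients:", np.flatnonzero(np.abs(beta) > 1e-8))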

    Modeling and Estimation of High-dimensional Vector Autoregressions.

    Vector Autoregression (VAR) represents a popular class of time series models in applied macroeconomics and finance, widely used for structural analysis and simultaneous forecasting of a number of temporally observed variables. Over the years it has gained popularity in the fields of control theory, statistics, economics, finance, genetics and neuroscience. In addition to the "curse of dimensionality" introduced by a quadratically growing dimension of the parameter space, VAR estimation poses considerable challenges due to the temporal and cross-sectional dependence in the data. In the first part of this thesis, we discuss modeling and estimation of high-dimensional VAR from short panels of time series, with applications to the reconstruction of gene regulatory networks from time-course gene expression data. We investigate adaptively thresholded lasso-regularized estimation of VAR models and propose a thresholded group lasso regularization framework to incorporate a priori available pathway information into the model. The properties of the proposed methods are assessed both theoretically and via numerical experiments, and the study is illustrated on two motivating examples from functional genomics and financial econometrics. The second part of this thesis focuses on modeling and estimation of high-dimensional VAR in the traditional time series setting, where one observes a single replicate of a long, stationary time series. We investigate the theoretical properties of l1-regularized and thresholded estimators in high-dimensional VAR, stochastic regression and covariance estimation problems in a non-asymptotic framework. We establish consistency of the estimators under high-dimensional scaling and propose a measure of stability that provides insight into the effect of temporal and cross-sectional dependence on the accuracy of the regularized estimates. We also propose a low-rank plus sparse modeling strategy for high-dimensional VAR in the presence of latent variables. We study the theoretical properties of the proposed estimator in a non-asymptotic framework, establish its estimation consistency under high-dimensional scaling and compare its performance with existing methods via extensive simulation studies. PhD thesis, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/109029/1/sumbose_1.pd
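
    An equation-by-equation l1-regularized VAR fit of the kind analyzed in the second part can be sketched in a few lines: stack the lagged observations into a design matrix and run a lasso regression for each component series. The helper below is a generic illustration on simulated data; the penalty level, lag order and data are assumptions, and the thesis's thresholding and low-rank-plus-sparse refinements are not included.

        import numpy as np
        from sklearn.linear_model import Lasso

        def lasso_var(Y, p=2, alpha=0.1):
            """Equation-by-equation l1-regularized estimation of a VAR(p).
            Y: (T, k) array; returns a (k, k*p) matrix of stacked lag coefficients."""
            T, k = Y.shape
            X = np.hstack([Y[p - l:T - l] for l in range(1, p + 1)])   # [y_{t-1}, ..., y_{t-p}]
            Z = Y[p:]                                                   # targets y_t
            B = np.zeros((k, k * p))
            for i in range(k):
                B[i] = Lasso(alpha=alpha, max_iter=10000).fit(X, Z[:, i]).coef_
            return B

        rng = np.random.default_rng(0)
        k, T = 10, 400
        A = np.zeros((k, k))
        np.fill_diagonal(A, 0.5)
        A[0, 1] = 0.3                                                   # sparse true VAR(1) transition matrix
        Y = np.zeros((T, k))
        for t in range(1, T):
            Y[t] = Y[t - 1] @ A.T + rng.normal(scale=0.5, size=k)
        B_hat = lasso_var(Y, p=2, alpha=0.05)
        print("estimated nonzeros in the lag-1 block:", int(np.sum(np.abs(B_hat[:, :k]) > 1e-3)))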

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling the main causes of inefficiency to be investigated. Several suggestions for efficiency improvement are offered for each hotel studied.
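
    For readers unfamiliar with the method, a standard cross-sectional stochastic frontier specification (given here only as a generic baseline, not necessarily the exact model estimated for the Teixeira Duarte hotels) is y_i = x_i'β + v_i − u_i, where v_i ~ N(0, σ_v²) captures measurement error and statistical noise, and u_i ≥ 0 (for example half-normal, u_i ~ |N(0, σ_u²)|) captures systematic inefficiency. Technical efficiency of unit i is then recovered as TE_i = exp(−u_i), which is the kind of quantity used to rank the hotels.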