    Model selection for time series of count data

    This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record. Selecting between competing statistical models is a challenging problem, especially when the competing models are non-nested. An effective algorithm is developed in a Bayesian framework for selecting between a parameter-driven autoregressive Poisson regression model and an observation-driven integer-valued autoregressive model when modeling time series count data. To achieve this, a particle MCMC algorithm for the autoregressive Poisson regression model is introduced. The particle filter underpinning the particle MCMC algorithm plays a key role in estimating the marginal likelihood of the autoregressive Poisson regression model via importance sampling and is also utilised to estimate the DIC. The performance of the model selection algorithms is assessed via a simulation study. Two real-life data sets, monthly US polio cases (1970-1983) and monthly benefit claims from the logging industry to the British Columbia Workers Compensation Board (1985-1994), are successfully analysed.
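To make the role of the particle filter concrete, here is a minimal sketch of a bootstrap particle filter that estimates the log-likelihood of a parameter-driven Poisson model for fixed parameters. It is not the authors' algorithm: the latent AR(1) form, the log link, and the names `mu`, `phi`, `sigma` and `n_particles` are all assumptions made for illustration, and the abstract's marginal likelihood additionally integrates over the parameters.

```python
import numpy as np
from math import lgamma


def simulate_poisson_ar1(n, mu, phi, sigma, rng):
    """Simulate counts y_t ~ Poisson(exp(mu + x_t)) with latent AR(1) x_t (assumed model)."""
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))  # stationary start
    for t in range(1, n):
        x[t] = phi * x[t - 1] + sigma * rng.normal()
    return rng.poisson(np.exp(mu + x))


def bootstrap_pf_loglik(y, mu, phi, sigma, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y | mu, phi, sigma)."""
    x = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2), size=n_particles)
    loglik = 0.0
    for t, y_t in enumerate(y):
        if t > 0:
            # Propagate particles through the latent AR(1) transition.
            x = phi * x + sigma * rng.normal(size=n_particles)
        log_rate = mu + x
        # Poisson log-density of the observation under each particle.
        logw = y_t * log_rate - np.exp(log_rate) - lgamma(float(y_t) + 1.0)
        m = logw.max()
        w = np.exp(logw - m)
        # The average weight estimates p(y_t | y_1..y_{t-1}).
        loglik += m + np.log(w.mean())
        # Multinomial resampling proportional to the weights.
        x = x[rng.choice(n_particles, size=n_particles, p=w / w.sum())]
    return loglik


rng = np.random.default_rng(0)
y = simulate_poisson_ar1(100, mu=1.0, phi=0.7, sigma=0.3, rng=rng)
ll = bootstrap_pf_loglik(y, mu=1.0, phi=0.7, sigma=0.3, n_particles=500, rng=rng)
print(f"estimated log-likelihood: {ll:.2f}")
```

Inside a particle MCMC scheme, an estimate of this kind would stand in for the intractable likelihood in the acceptance ratio; the resampling scheme and particle count are tuning choices not specified in the abstract.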

    Vocational Training and Innovation.

    Human capital is considered one of the main inputs to economic growth. Human capital can generate endogenous growth thanks to a continuous process of knowledge and externalities accumulation (Aghion and Howitt, 1998). In that context, this paper explores the relationship between innovation and vocational training. Our methodological approach allows us to contribute to the literature in three ways. First, we propose different indicators of vocational training. Second, we build a count data panel with a long time series. This deals with the issue of non-random selection and potentially with measurement error from short panels. Finally, we explicitly allow for endogeneity and fixed effects using GMM techniques. Estimations are made on a panel data set of French industrial firms over the period 1986-1992. Our results indicate that, whatever the indicator, vocational training has a positive impact on technological innovation. Keywords: R&D; count panel data; training; patents; linear feedback model.

    Modeling count time series following generalized linear models

    Count time series are found in many different applications, e.g. from medicine, finance or industry, and have received increasing attention in the last two decades. The class of count time series following generalized linear models is very flexible and can describe serial correlation in a parsimonious way. The conditional mean of the observed process is linked to its past values, to past observations and to potential covariate effects. In this thesis we give a comprehensive formulation of this model class. We consider models with the identity and with the logarithmic link function. The conditional distribution can be Poisson or Negative Binomial. An important special case of this class is the so-called INGARCH model and its log-linear extension. A key contribution of this thesis is the R package tscount which provides likelihood-based estimation methods for analysis and modeling of count time series based on generalized linear models. The package includes methods for model fitting and assessment, prediction and intervention analysis. This thesis summarizes the theoretical background of these methods. It gives details on the implementation of the package and provides simulation results for models which have not been studied theoretically before. The usage of the package is illustrated by two data examples. Additionally, we provide a review of R packages which can be used for count time series analysis. A detailed comparison of tscount to those packages demonstrates that tscount is an important contribution which extends and complements existing software. A thematic focus of this thesis is the treatment of all kinds of unusual effects influencing the ordinary pattern of the data. This includes structural changes and different forms of outliers one is faced with in many time series. Our first study on this topic is concerned with retrospective detection of such changes. 
We analyze different approaches for modeling such intervention effects in count time series based on INGARCH models. Other authors treated a model where an intervention affects the non-observable underlying mean process at the time point of its occurrence and additionally the whole process thereafter via its dynamics. As an alternative, we consider a model where an intervention directly affects the observation at its occurrence, but not the underlying mean, and then also enters the dynamics of the process. While the former definition describes an internal change of the system, the latter can be understood as an external effect on the observations due to e.g. immigration. For our alternative model we develop conditional likelihood estimation and, based on this, develop tests and detection procedures for intervention effects. Both models are compared analytically and using simulated and real data examples. The procedures for our new model work reliably and we find some robustness against misspecification of the intervention model. The aforementioned methods are applied after the complete time series has been observed. In another study we investigate the prospective detection of structural changes, i.e. in real time. For example in public health, surveillance of infectious diseases aims at recognizing outbreaks of epidemics with only short time delays in order to take adequate action promptly. We point out that serial dependence is present in many infectious disease time series. Nevertheless it is still ignored by many procedures used for infectious disease surveillance. Using historical data, we design a prediction-based monitoring procedure for count time series following generalized linear models. We illustrate benefits but also pitfalls of using dependence models for monitoring. Moreover, we briefly review the literature on model selection, robust estimation and robust prediction for count time series. 
We also make a first study on robust model identification using robust estimators of the (partial) autocorrelation.
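As an illustration of the model class described in this abstract, the following sketch simulates a Poisson INGARCH(1,1) process with identity link and evaluates its conditional log-likelihood. It is written in Python rather than with the tscount package, and the parameter names `beta0`, `beta1`, `alpha1` and the initialisation at the stationary mean are assumptions made for this example only.

```python
import numpy as np
from math import lgamma


def simulate_ingarch11(n, beta0, beta1, alpha1, rng):
    """Simulate y_t | past ~ Poisson(lam_t), lam_t = beta0 + beta1*y_{t-1} + alpha1*lam_{t-1}."""
    lam = np.empty(n)
    y = np.empty(n, dtype=np.int64)
    lam[0] = beta0 / (1.0 - beta1 - alpha1)  # stationary mean, needs beta1 + alpha1 < 1
    y[0] = rng.poisson(lam[0])
    for t in range(1, n):
        lam[t] = beta0 + beta1 * y[t - 1] + alpha1 * lam[t - 1]
        y[t] = rng.poisson(lam[t])
    return y, lam


def ingarch11_loglik(y, beta0, beta1, alpha1):
    """Conditional Poisson log-likelihood, started at the stationary mean (an assumption)."""
    lam = beta0 / (1.0 - beta1 - alpha1)
    ll = 0.0
    for t in range(len(y)):
        if t > 0:
            lam = beta0 + beta1 * y[t - 1] + alpha1 * lam
        ll += y[t] * np.log(lam) - lam - lgamma(float(y[t]) + 1.0)
    return ll


rng = np.random.default_rng(42)
y, lam = simulate_ingarch11(500, beta0=2.0, beta1=0.3, alpha1=0.4, rng=rng)
print(f"sample mean: {y.mean():.2f}, log-likelihood: {ingarch11_loglik(y, 2.0, 0.3, 0.4):.2f}")
```

Maximising `ingarch11_loglik` over the three parameters would give a conditional likelihood fit in the spirit of the abstract; covariates, the logarithmic link and the Negative Binomial case are left out to keep the sketch short.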

    Analysis of Financial Data using a Difference-Poisson Autoregressive Model

    Box and Jenkins methodologies have massively contributed to the analysis of time series data. However, the assumptions used in these methods impose constraints on the type of the data. As a result, difficulties arise when we apply those tools to a more general type of data (e.g. count, categorical or integer-valued data) rather than the classical continuous or, more specifically, Gaussian type. Papers in the literature have proposed alternative methods to model discrete-valued time series data; among these is Pegram's operator (1980). We use this operator to build an AR(p) model for integer-valued time series (including both positive and negative integers). The innovations follow the differenced Poisson distribution, or Skellam distribution. While the model includes the usual AR(p) correlation structure, it can be made more general. In fact, the operator can be extended so that some components contribute to positive correlation while others contribute to negative correlation. As an illustration, the process is used to model the change in a stock’s price, where three variations are presented: Variation I, Variation II and Variation III. The first model disregards outliers; however, the second and third include large price changes associated with the effect of large volume trades and market openings. Parameters of the model are estimated using Maximum Likelihood methods. We use several model selection criteria to select the best order for each variation of the model as well as to determine which is the best variation of the model. The most adequate order for all the variations of the model is AR(3). While the best fit for the data is Variation II, residuals' diagnostic plots suggest that Variation III represents a better correlation structure for the model.
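A rough sketch of the construction this abstract describes, under our reading of Pegram's operator as a random mixture: with probability `phi` the series repeats its previous value, otherwise it takes a fresh innovation. The innovation is drawn as the difference of two independent Poisson variables, which is Skellam distributed. The function name, the AR(1) restriction and the parameters `lam1`, `lam2` are assumptions made for illustration, not the paper's specification.

```python
import numpy as np


def simulate_pegram_ar1(n, phi, lam1, lam2, rng):
    """Integer-valued AR(1) via a Pegram-type mixture with Skellam innovations (assumed form)."""
    # Difference of independent Poissons: Skellam with mean lam1 - lam2, variance lam1 + lam2.
    eps = rng.poisson(lam1, size=n) - rng.poisson(lam2, size=n)
    keep = rng.random(n) < phi  # True: copy the previous value
    x = np.empty(n, dtype=np.int64)
    x[0] = eps[0]
    for t in range(1, n):
        x[t] = x[t - 1] if keep[t] else eps[t]
    return x


rng = np.random.default_rng(7)
x = simulate_pegram_ar1(2000, phi=0.4, lam1=1.5, lam2=1.5, rng=rng)
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"min={x.min()}, max={x.max()}, lag-1 autocorrelation={lag1:.2f}")
```

Under this mixture the marginal distribution stays Skellam and the lag-1 autocorrelation equals `phi`, so the sample value printed above should be in that neighbourhood; an AR(p) version would copy from lag i with probability `phi_i`.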

    Observation-driven models for discrete-valued time series

    Statistical inference for discrete-valued time series has not been developed like traditional methods for time series generated by continuous random variables. Some relevant models exist, but the lack of a homogeneous framework raises some critical issues. For instance, it is not trivial to explore whether models are nested and it is quite arduous to derive stochastic properties which simultaneously hold across different specifications. In this paper, inference for a general class of first order observation-driven models for discrete-valued processes is developed. Stochastic properties such as stationarity and ergodicity are derived under easy-to-check conditions, which can be directly applied to all the models encompassed in the class and for every distribution which satisfies mild moment conditions. Consistency and asymptotic normality of quasi-maximum likelihood estimators are established, with the focus on the exponential family. Finite sample properties and the use of information criteria for model selection are investigated through Monte Carlo studies. An empirical application to count data is discussed, concerning a test-bed time series on the spread of an infection.

    Feature selection and modelling methods for microarray data from acute coronary syndrome

    Acute coronary syndrome (ACS) represents a leading cause of mortality and morbidity worldwide. Providing better diagnostic solutions and developing therapeutic strategies customized to the individual patient represent societal and economical urgencies. Progressive improvement in diagnosis and treatment procedures requires a thorough understanding of the underlying genetic mechanisms of the disease. Recent advances in microarray technologies together with the decreasing costs of the specialized equipment enabled affordable harvesting of time-course gene expression data. The high-dimensional data generated demands computational tools able to extract the underlying biological knowledge. This thesis is concerned with developing new methods for analysing time-course gene expression data, focused on identifying differentially expressed genes, deconvolving heterogeneous gene expression measurements and inferring dynamic gene regulatory interactions. The main contributions include: a novel multi-stage feature selection method, a new deconvolution approach for estimating cell-type specific signatures and quantifying the contribution of each cell type to the variance of the gene expression patterns, a novel approach to identify the cellular sources of differential gene expression, a new approach to model gene expression dynamics using sums of exponentials and a novel method to estimate stable linear dynamical systems from noisy and unequally spaced time series data. The performance of the proposed methods was demonstrated on a time-course dataset consisting of microarray gene expression levels collected from the blood samples of patients with ACS and associated blood count measurements. The results of the feature selection study are of significant biological relevance. For the first time, high diagnostic performance for the ACS subtypes was reported up to three months after hospital admission. 
The deconvolution study exposed features of within and between groups variation in expression measurements and identified potential cell type markers and cellular sources of differential gene expression. It was shown that the dynamics of post-admission gene expression data can be accurately modelled using sums of exponentials, suggesting that gene expression levels undergo a transient response to the ACS events before returning to equilibrium. The linear dynamical models capturing the gene regulatory interactions exhibit high predictive performance and can serve as platforms for system-level analysis, numerical simulations and intervention studies.

    Clozapine, neutropenia and Covid-19: should clinicians be concerned? 3 months report

    © 2021 Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Background: Clozapine is among the most effective antipsychotics used for treatment-resistant schizophrenia. Adverse reactions to clozapine include neutropenia. In March 2020, at the start of the COVID-19 pandemic, clinicians raised concerns regarding continuation of antipsychotic treatment, and specifically of clozapine, in patients with coronavirus disease. We aimed to provide a short report focusing on the association between neutropenia and clozapine in a case series of psychiatric inpatients diagnosed with COVID-19. Patients & methods: We retrospectively inspected data of 10 patients on clozapine, admitted to Highgate Mental Health Centre, Camden & Islington NHS Foundation Trust, between March and July 2020; selection was based on their COVID-19 positive PCR test. We used a linear regression model to estimate whether there was a significant drop in the neutrophil count during SARS-CoV-2 infection. The analysis was done in R using a linear regression through the origin. Results: Data were collected on 10 patients, of whom 7 were male. During COVID-19 infection, the absolute neutrophil count (ANC) was 4.13 × 10^9/l (SD = 2.70), which constituted a significant drop from a baseline value of 5.2 × 10^9/l (SD = 2.24). The mean relative reduction in ANC was -0.2729 (SD = 0.1666). The beta value of 0.8377 obtained with the linear regression showed that ANC values during SARS-CoV-2 infection were 83.77% of the baseline ANC, showing that between the two time points there was a decrease of 16.23%. The linear regression had a p-value = 8.96 × 10^-8 and an adjusted R^2 of 95.94%, which shows that the variability of the data is very well explained by the model. 
We also compared baseline ANC with ANC values approximately a month after resolution of the infection, and results indicate that ANC values return to 95% of baseline. Conclusions: Clinicians should bear in mind that a significant drop in neutrophil count may occur in patients taking clozapine and affected by a SARS-CoV-2 infection, and that this drop is only transitory. Peer reviewed.
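The regression through the origin mentioned in this abstract has a closed-form slope, beta = sum(x*y) / sum(x^2), which is read as the fraction of baseline ANC retained during infection. The sketch below shows that calculation in Python; the function name is ours, and the input numbers are hypothetical placeholders, not the study's patient data. In R, the usual way to fit this is an `lm` formula without an intercept, such as `y ~ 0 + x`.

```python
import numpy as np


def slope_through_origin(baseline, followup):
    """Least-squares slope of followup = beta * baseline, with no intercept."""
    x = np.asarray(baseline, dtype=float)
    y = np.asarray(followup, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))


# Hypothetical ANC values (x 10^9/l), invented for illustration only.
baseline_anc = [5.0, 3.2, 7.5, 4.1]
infection_anc = [4.1, 2.9, 6.0, 3.3]

beta = slope_through_origin(baseline_anc, infection_anc)
print(f"slope = {beta:.4f}, implied change = {100 * (beta - 1):.2f}%")
```

A slope below 1 corresponds to a drop relative to baseline, which is how the abstract's beta of 0.8377 translates into a 16.23% decrease.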

    Modeling Multiple-Subject and Discrete-Valued High-Dimensional Time Series

    This thesis focuses on two separate topics in modeling of high-dimensional time series (HDTS) with several structures and their various applications. The first topic is on modeling HDTS from multiple subjects. Here, the structures of interest include model components that are shared by all subjects and that are individual to subjects or their groups. A running theme in this modeling is the heterogeneity of subjects. Dealing with heterogeneous data has been of particular interest recently in social, health, behavioral, and other sciences. The second topic is on modeling HDTS that are discrete-valued, including binary, categorical, and non-negative count observations. Compared with continuous time series modeling where autoregressive-type models dominate, there are no generally preferred models in the discrete setting. The models considered in this thesis are based on latent Gaussian processes, which drive the dynamics of the observed discrete-valued series. The models have the advantages of allowing negative autocorrelations, and flexible choices of marginal distributions of discrete observations. The thesis consists of four projects, with two on each topic. The first project proposes a stratified Lasso (multi-task learning) formulation for vector autoregressive (VAR) models from multiple subjects. The VAR transition matrices are decomposed additively into the common components shared across all subjects and individual components specific to each subject. An efficient estimation procedure combined with cross-validation for several tuning parameters is designed. The simulation study shows that the approach performs well in the presence of heterogeneity across individual dynamics for the different levels of sparsity. The model is applied to intensive longitudinal data of the emotional states to reveal common and individual temporal dependences of daily emotions across study participants. 
The proposed model enhances interpretability and forecasting performance, which are expected to be beneficial in assessing conflicting evidence from empirical studies and establishing universal explanations of the studied phenomenon. The second project develops integrative dynamic factor models (DFMs) for multiple subjects in several groups. The models have components that allow one to explore the inter-differences across subjects (and groups). At the same time, the intra-differences can be investigated by reconstructing the individual temporal dynamics of different subjects. A flexible identifiability condition on the factor covariance is adopted, which expands the scope of heterogeneity and contributes to better model interpretation and forecasting results. From a methodological standpoint, a novel algorithm that combines non-iterative block segmentation, efficient rank selection, and variants of PCA for multiple subjects, is suggested. Simulations under various scenarios and analysis of resting-state functional MRI data collected from multiple subjects are conducted. The third project concerns latent Gaussian DFMs for count HDTS. The proposed estimation procedure combines the classical PCA, Yule-Walker equations, and link functions, which are pairwise mappings of the second-order properties of the latent and observed time series. The forecasting is carried out through a particle-based sequential Monte Carlo method, which approximates predictions of counts, driven by the latent DFM generated through Kalman recursions. Simulation results reveal that the estimation approach performs similarly to the usual DFMs, and the model provides better forecasting results than the considered benchmarks. The model is applied to item response data from psychology, where the existence of latent factors has been verified but their temporal dependence has not been studied yet. 
The fourth project considers the analogous models for count HDTS but where the latent Gaussian time series follows a sparse VAR. A penalized estimation procedure based on Lasso and its adaptive form is explored for latent Gaussian VAR. An alternative proposed formulation leverages the second-order properties of the latent process directly. Along with the estimation of link functions, we suggest a data-splitting strategy, which can select tuning parameters for penalization. Simulations under various marginal count distributions and patterns of transition matrices are performed. A data example of major depressive disorder in psychiatry is considered to illustrate the modeling approach. Doctor of Philosophy