3 research outputs found

    Contribution to Data Science: Time Series, Uncertainty Quantification and Applications

    Time series analysis is an essential tool in modern statistics: many real-data problems have temporal components that must be studied to understand the dependence structure of the data. In the stock market, for example, identifying the ups and downs of stock prices is of significant importance, and time series analysis is crucial for this task. Most of the existing time series literature deals with linear time series or assumes Gaussianity. However, in many instances the time series exhibits nonlinear trends or the underlying error structure is non-Gaussian. In such cases nonlinear time series analysis is essential, and it can be carried out either with a nonlinear parametric structure or with nonparametric approaches.
    In Chapter 2, we propose a quadratic prediction procedure that improves prediction accuracy when the time series is nonlinear or non-Gaussian, together with a quantification of the prediction gain achieved by quadratic prediction. We also characterize, in terms of the bispectrum of the underlying process, the processes for which quadratic prediction always outperforms linear prediction. Extensive simulation studies and two real-data analyses substantiate the theoretical results.
    Chapter 3 deals with polyspectral means, a higher-order version of spectral means, which provide important insights into a time series in the presence of nonlinearity. We propose an estimator of the polyspectral mean and derive its asymptotic distribution. Based on the resulting asymptotic normality, we also propose a test for linearity. Finally, a simulation study and a real-world data analysis illustrate possible applications of polyspectral means. The next part of the thesis deals with real data analysis.
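As a rough illustration of the polyspectral means in Chapter 3, which generalize spectral means of the form ∫ g(λ) f(λ) dλ to higher-order spectra, the Python sketch below computes a naive plug-in estimate of a bispectral (third-order) mean by summing a weight function g against the bi-periodogram on the grid of Fourier frequencies. This is a hypothetical sketch, not the thesis's estimator: the function name `bispectral_mean`, the bi-periodogram normalization, and the integration domain [-π, π)² are assumptions, and the treatment of degenerate frequencies and the asymptotic variance derived in the thesis are not reproduced.

```python
import numpy as np


def bispectral_mean(x, g):
    """Hypothetical plug-in estimate of the bispectral mean
        M(g) = integral over [-pi, pi)^2 of g(l1, l2) f(l1, l2) dl1 dl2,
    computed as a Riemann sum of g times the bi-periodogram over the grid of
    Fourier frequencies. Memory is O(n^2), so this is only for small n."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()                      # centre the data
    d = np.fft.fft(xc)                     # d[j] = sum_t xc[t] * exp(-2*pi*i*j*t/n)
    j = np.arange(n)
    J, K = np.meshgrid(j, j, indexing="ij")
    L = (-J - K) % n                       # index of the frequency -(w_j + w_k) modulo 2*pi
    # Assumed normalisation: I(w_j, w_k) = d_j * d_k * d_{-j-k} / ((2*pi)^2 * n)
    bi_periodogram = d[J] * d[K] * d[L] / ((2 * np.pi) ** 2 * n)
    w = 2 * np.pi * np.fft.fftfreq(n)      # Fourier frequencies mapped into [-pi, pi)
    W1, W2 = np.meshgrid(w, w, indexing="ij")
    cell = (2 * np.pi / n) ** 2            # area of one grid cell
    return cell * np.sum(g(W1, W2) * bi_periodogram)


rng = np.random.default_rng(0)
x = rng.exponential(size=128)              # skewed, non-Gaussian toy series
xc = x - x.mean()

# With g = 1 the estimate reduces to the sample third central moment.
m_one = bispectral_mean(x, lambda l1, l2: np.ones_like(l1))
# With g = exp(i*l1) it reduces to a circular lag-(1, 0) third-order sample moment.
m_lag = bispectral_mean(x, lambda l1, l2: np.exp(1j * l1))
print(m_one.real, np.mean(xc ** 3))
print(m_lag.real, np.mean(np.roll(xc, -1) * xc ** 2))
```

Choosing g ≡ 1 gives a convenient sanity check: under this normalization, integrating the bispectrum over the whole frequency square returns the third cumulant at lag (0, 0), and the plug-in sum collapses exactly to the sample third central moment. Other weight functions pick out other weighted combinations of third-order sample cumulants.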
    Chapter 4 is devoted to an election-prediction algorithm that uses hashtag information and the dynamic network structure of social media data, together with opinion polls. We propose two algorithms, one that uses the network structure (THANOS) and one that does not (THOS). Both methods performed better than existing election-prediction algorithms, and for closely fought elections the network-based method gave much closer predictions than the one without.
    Chapter 5 proposes a bot-detection algorithm for social media data. Inorganic accounts, commonly known as bots, are used extensively to spread malicious information and false propaganda, so identifying them as quickly as possible is of significant importance. We extract several temporal and semantic features and use established machine learning algorithms to identify inorganic accounts.
    The final chapter deals with the bootstrap in extreme value analysis. Efron’s bootstrap is known to be inconsistent in this setting (for example, for the sample maximum). The m out of n bootstrap is known to work here when m → ∞ and m = o(n), but little work has been done on the optimal choice of m. In Chapter 6, we propose an optimal choice of m that optimizes the rate of convergence of the bootstrap. We also present a real-world data analysis using the AQI (Air Quality Index) levels of several cities around the world.
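Why Efron's bootstrap fails for the sample maximum, and why taking m = o(n) helps, can be seen in a small simulation. The Python sketch below is a hypothetical illustration: the helper `bootstrap_maxima`, the exponential data, and the placeholder m = ⌊√n⌋ are assumptions, not the optimal m proposed in Chapter 6. It counts how often a bootstrap maximum coincides with the sample maximum. With m = n this happens with probability 1 - (1 - 1/n)^n, which tends to 1 - 1/e ≈ 0.632, an atom that the continuous limiting extreme value distribution does not have; when m/n → 0 the atom vanishes.

```python
import numpy as np


def bootstrap_maxima(x, m, B, rng):
    """Return B bootstrap replicates of the maximum of m draws, with replacement, from x."""
    x = np.asarray(x, dtype=float)
    idx = rng.integers(0, x.size, size=(B, m))   # B resamples of size m
    return x[idx].max(axis=1)


rng = np.random.default_rng(42)
n, B = 500, 4000
x = rng.exponential(size=n)   # continuous data, so the sample maximum is almost surely unique

full = bootstrap_maxima(x, n, B, rng)                # Efron's bootstrap: m = n
m = int(np.sqrt(n))                                  # placeholder choice with m -> inf, m/n -> 0
sub = bootstrap_maxima(x, m, B, rng)                 # m out of n bootstrap

# Proportion of replicates that reproduce the sample maximum exactly. Its expected
# value is 1 - (1 - 1/n)^m: about 0.632 when m = n = 500 and about 0.043 when m = 22.
atom_full = np.mean(full == x.max())
atom_sub = np.mean(sub == x.max())
print(f"m = n:  {atom_full:.3f}")
print(f"m = {m}: {atom_sub:.3f}")
```

The simulation only shows the qualitative effect of shrinking m; choosing how fast m should grow with n is exactly the question Chapter 6 addresses.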

    Contributions to the analysis of discrete-valued time series

    Doctoral dissertation in Applied Mathematics presented to the Faculdade de Ciências da Universidade do Porto.
    Count, categorical, or binary time series are examples of discrete-valued time series that frequently arise in practice. However, because these series take values in finite or countably infinite sets, the traditional methods of time series analysis are not appropriate.
    Several models for stationary processes with discrete marginal distributions have been proposed. One such model, used particularly for count series, is the INteger-valued AutoRegressive process of order p, denoted INAR(p). In the first part of this thesis, INAR processes are studied both for a single time series and for replicates of the same time series. The main characteristics of these models are presented and explored in order to obtain more robust estimation methods. For example, higher-order statistics provide additional information about INAR processes because these processes are nonlinear. Accordingly, two estimation methods based on third-order moments and cumulants are proposed. In addition, the use of the Whittle criterion as an estimation method is justified through the mixing property of INAR processes. Furthermore, the automatic order-selection criterion based on the corrected version of the Akaike Information Criterion (AICC) is established for INAR processes. Extensive simulation studies investigate and compare the performance of the proposed estimators and of the order-selection criterion.
    Real count data (two time series associated with medical applications and a set of replicates related to astronomy) are analysed using the methodology underlying INAR models. It is found that the class of INAR models is adequate for describing the data.
    There are time series that exhibit abrupt changes or discontinuous shapes, or whose values belong to a finite discrete set. In these cases, ..
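The INAR(p) models described in the abstract above replace the multiplication in a classical autoregression by binomial thinning, α ∘ X = B_1 + ... + B_X with B_i i.i.d. Bernoulli(α), so that the process stays integer-valued. The Python sketch below simulates an INAR(1) process with Poisson innovations and recovers its parameters with the classical lag-1 method-of-moments (Yule-Walker type) estimator. It is a hypothetical illustration under assumed parameter values (α = 0.5, λ = 1, n = 5000), and the helper names are illustrative; it does not reproduce the third-order moment, cumulant, or Whittle estimators proposed in the thesis.

```python
import numpy as np


def simulate_inar1(n, alpha, lam, rng):
    """Simulate X_t = alpha o X_{t-1} + e_t, where 'o' is binomial thinning
    and e_t ~ Poisson(lam) are i.i.d. innovations (hypothetical helper)."""
    x = np.empty(n, dtype=np.int64)
    # The stationary marginal of a Poisson INAR(1) is Poisson(lam / (1 - alpha)),
    # so starting there avoids the need for a burn-in period.
    x[0] = rng.poisson(lam / (1 - alpha))
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)   # alpha o X_{t-1}: each count survives w.p. alpha
        x[t] = survivors + rng.poisson(lam)         # add new arrivals
    return x


def moment_estimates(x):
    """Lag-1 method-of-moments estimates (alpha_hat, lam_hat) for a Poisson INAR(1)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    alpha_hat = np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2)   # ACF of INAR(1) at lag k is alpha^k
    lam_hat = x.mean() * (1 - alpha_hat)                      # E[X_t] = lam / (1 - alpha)
    return alpha_hat, lam_hat


rng = np.random.default_rng(1)
x = simulate_inar1(5000, alpha=0.5, lam=1.0, rng=rng)
alpha_hat, lam_hat = moment_estimates(x)
print(f"alpha_hat = {alpha_hat:.3f}, lam_hat = {lam_hat:.3f}")
```

Because INAR processes are nonlinear, this second-order estimator ignores information carried by higher-order moments, which is the motivation the abstract gives for estimators based on third-order moments and cumulants.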