460 research outputs found

    Feature selection and extraction in spatiotemporal traffic forecasting: a systematic literature review

    A spatiotemporal approach that simultaneously utilises both spatial and temporal relationships is gaining scientific interest in the field of traffic flow forecasting. Accurate identification of the spatiotemporal structure (dependencies amongst traffic flows in space and time) plays a critical role in modern traffic forecasting methodologies, and recent developments in data-driven feature selection and extraction methods allow the identification of complex relationships. This paper systematically reviews studies that apply feature selection and extraction methods for spatiotemporal traffic forecasting. The reviewed bibliographic database includes 211 publications and covers the period from early 1984 to March 2018. A synthesis of bibliographic sources clarifies the advantages and disadvantages of different feature selection and extraction methods for learning the spatiotemporal structure and identifies trends in their applications. We conclude that there is a clear need for the development of comprehensive guidelines for selecting appropriate spatiotemporal feature selection and extraction methods for urban traffic forecasting. Document type: Article

    Effect of traffic dataset on various machine-learning algorithms when forecasting air quality

    © Emerald Publishing Limited. This is the accepted manuscript version of an article which has been published in final form at https://10.1108/JEDT-10-2021-0554. Purpose: Road traffic emissions are generally believed to contribute immensely to air pollution, but the effect of road traffic datasets on air quality predictions has not been clearly investigated. This research investigates the effect that traffic datasets have on the performance of Machine Learning (ML) predictive models in air quality prediction. Design/methodology/approach: To achieve this, we set up an experiment in which the control dataset contains only the Air Quality (AQ) dataset and the Meteorological (Met) dataset, while the experimental dataset is made up of the AQ dataset, the Met dataset and the Traffic dataset. Several ML models (such as Extra Trees Regressor, eXtreme Gradient Boosting Regressor, Random Forest Regressor, K-Neighbors Regressor, and five others) were trained, tested, and compared on these combinations of datasets to predict the volume of PM2.5, PM10, NO2, and O3 in the atmosphere at various times of the day. Findings: The results showed that the ML algorithms react differently to the traffic dataset, although it improved the performance of all the ML algorithms considered in this study by at least 20% and reduced error by at least 18.97%. Research limitations/implications: This research is limited in terms of the study area, and the result cannot be generalized outside of the UK as many conditions may not be similar elsewhere. Additionally, only the ML algorithms commonly used in the literature are considered in this research, which leaves out a few other ML algorithms. Practical implications: This study reinforces the belief that the traffic dataset has a significant effect on improving the performance of air pollution ML prediction models. Hence, there is an indication that ML algorithms behave differently when trained with a form of traffic dataset in the development of an air quality prediction model. This implies that developers and researchers in air quality prediction need to identify the ML algorithms that behave in their best interest before implementation. Originality/value: This will enable researchers to focus more on algorithms of benefit when using traffic datasets in air quality prediction. Peer reviewed
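
The control-versus-experimental design described in this abstract can be sketched in a few lines. The example below is only an illustration under loud assumptions: the data are synthetic (not the study's AQ, Met or Traffic datasets), the control feature set is reduced to a single meteorological value, the pollutant formula is invented for the demonstration, and a hand-written k-nearest-neighbours regressor stands in for the library models named above. All function names are hypothetical.

```python
import random


def knn_predict(train_x, train_y, query, k=5):
    """Average the targets of the k training rows closest to `query`."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return sum(train_y[i] for i in ranked[:k]) / k


def mean_absolute_error(train_x, train_y, test_x, test_y, k=5):
    """Mean absolute error of the k-NN regressor on the held-out rows."""
    errors = [
        abs(knn_predict(train_x, train_y, x, k) - y)
        for x, y in zip(test_x, test_y)
    ]
    return sum(errors) / len(errors)


def make_synthetic_rows(n, rng):
    """Synthetic rows (met, traffic, pollutant); the formula is an assumption."""
    rows = []
    for _ in range(n):
        met = rng.random()
        traffic = rng.random()
        pollutant = 2.0 * met + 5.0 * traffic + rng.gauss(0.0, 0.1)
        rows.append((met, traffic, pollutant))
    return rows


def compare_feature_sets(seed=0):
    """Return (control_mae, experimental_mae) for the two feature sets."""
    rng = random.Random(seed)
    train = make_synthetic_rows(200, rng)
    test = make_synthetic_rows(50, rng)
    train_y = [row[2] for row in train]
    test_y = [row[2] for row in test]

    # Control: meteorological feature only.
    control = mean_absolute_error(
        [(row[0],) for row in train], train_y,
        [(row[0],) for row in test], test_y,
    )
    # Experimental: meteorological feature plus the traffic feature.
    experimental = mean_absolute_error(
        [(row[0], row[1]) for row in train], train_y,
        [(row[0], row[1]) for row in test], test_y,
    )
    return control, experimental


if __name__ == "__main__":
    control_mae, experimental_mae = compare_feature_sets()
    print(f"control MAE (Met only):           {control_mae:.3f}")
    print(f"experimental MAE (Met + Traffic): {experimental_mae:.3f}")
```

Because the synthetic target is built to depend on the traffic feature, the experimental error is lower here by construction. The sketch shows the shape of the comparison and does not reproduce the study's reported figures.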

    Deep Sequence Learning with Auxiliary Information for Traffic Prediction

    Predicting traffic conditions from online route queries is a challenging task, as many complicated interactions between roads and crowds are involved. In this paper, we intend to improve traffic prediction by appropriately integrating three kinds of implicit but essential factors encoded in auxiliary information. We do this within an encoder-decoder sequence learning framework that integrates the following data: 1) offline geographical and social attributes, for example the geographical structure of roads or public social events such as national celebrations; 2) road intersection information, since traffic congestion generally occurs at major junctions; 3) online crowd queries, for example when many online queries are issued for the same destination because of a public performance, the traffic around that destination is likely to become heavier after a while. Qualitative and quantitative experiments on a real-world dataset from Baidu have demonstrated the effectiveness of our framework. Comment: KDD 2018. The first two authors share equal contribution

    A Framework for Credit Risk Prediction Using the Optimized-FKSVR Machine Learning Classifier

    Transparency is influenced by several crucial factors, such as credit risk (CR) predictions, model reliability, and efficient loan processing. The emergence of machine learning (ML) techniques provides a promising solution to these challenges. However, it is the responsibility of banking and nonbanking organizations to control how they incorporate this methodology in order to mitigate human preferences in loan decision-making. This research article presents the Optimized-Feature based Kernel Support Vector Regression (O-FKSVR) model, an ML-based CR analysis model for digital banking. The proposal compares several ML methods to identify a precise model for CR assessment using real credit database information. The goal is to introduce a classification model that uses a hybrid of Stochastic Gradient Descent (SGD) and firefly optimization (FFO) methods with Support Vector Regression (SVR) to predict credit risks in the form of probability of default, loss given default, and exposure at default. The proposed O-FKSVR model extracts features and predicts outcomes based on data gathered from online credit analysis. The experimental study is conducted in Python, and the results demonstrate improvements in accuracy and precision and reduced error rates compared to previous ML methods. The proposed O-FKSVR model achieved a maximum accuracy rate value of 0.955%, precision value of 0.96%, recall value of 0.952%, and error rate value of 4.4 when compared with existing models such as SVR, DT, RF, and AdaBoost.

    Mecanismos para controlo e gestão de redes 5G: redes de operador (Mechanisms for control and management of 5G networks: operator networks)

    In 5G networks, time-series data will be omnipresent for the monitoring of network metrics. With the increase in the number of Internet of Things (IoT) devices in the coming years, the number of real-time time-series data streams is expected to grow at a fast pace. To monitor those streams, and to test and correlate different algorithms and metrics simultaneously and seamlessly, time-series forecasting is becoming essential for the successful proactive management of the network. The objective of this dissertation is to design, implement and test a prediction system in a communication network that allows the integration of various networks, such as a vehicular network and a 4G operator network, to improve network reliability and Quality-of-Service (QoS). To do that, the dissertation has three main goals: (1) the analysis of different network datasets and the implementation of different approaches to forecast network metrics, in order to test different techniques; (2) the design and implementation of a real-time distributed time-series forecasting architecture, to enable the network operator to make predictions about the network metrics; and lastly, (3) the use of the previously built forecasting models to improve network performance through resource management policies. The tests done with two different datasets, addressing the use cases of congestion management and resource splitting in a network with a limited number of resources, show that network performance can be improved with proactive management by a real-time system able to predict the network metrics and act on the network accordingly. A study is also conducted of which network metrics can cause reduced accessibility in 4G networks, so that the network operator can act more efficiently and proactively to avoid such events. Master's in Computer and Telematics Engineering

    Representation learning in finance

    Finance studies often employ heterogeneous datasets from different sources with different structures and frequencies. Some data are noisy, sparse, and unbalanced with missing values; some are unstructured, containing text or networks. Traditional techniques often struggle to combine and effectively extract information from these datasets. This work explores representation learning as a proven machine learning technique in learning informative embedding from complex, noisy, and dynamic financial data. This dissertation proposes novel factorization algorithms and network modeling techniques to learn the local and global representation of data in two specific financial applications: analysts’ earnings forecasts and asset pricing. Financial analysts’ earnings forecast is one of the most critical inputs for security valuation and investment decisions. However, it is challenging to fully utilize this type of data due to the missing values. This work proposes one matrix-based algorithm, “Coupled Matrix Factorization,” and one tensor-based algorithm, “Nonlinear Tensor Coupling and Completion Framework,” to impute missing values in analysts’ earnings forecasts and then use the imputed data to predict firms’ future earnings. Experimental analysis shows that missing value imputation and representation learning by coupled matrix/tensor factorization from the observed entries improve the accuracy of firm earnings prediction. The results confirm that representing financial time-series in their natural third-order tensor form improves the latent representation of the data. It learns high-quality embedding by overcoming information loss of flattening data in spatial or temporal dimensions. Traditional asset pricing models focus on linear relationships among asset pricing factors and often ignore nonlinear interaction among firms and factors. 
This dissertation formulates novel methods to identify nonlinear asset pricing factors and develops asset pricing models that capture global and local properties of data. First, this work proposes an artificial neural network "autoencoder" based model to capture the latent asset pricing factors from the global representation of an equity index. It also shows that the autoencoder effectively identifies communal and non-communal assets in an index to facilitate portfolio optimization. Second, the global representation is augmented by propagating information from local communities, where the network determines the strength of this information propagation. Based on the Laplacian spectrum of the equity market network, a network factor "Z-score" is proposed to facilitate pertinent information propagation and capture dynamic changes in network structures. Finally, a "Dynamic Graph Learning Framework for Asset Pricing" is proposed to combine both global and local representations of data into one end-to-end asset pricing model. Using a graph attention mechanism and an information diffusion function, the proposed model learns new connections for implicit networks and refines connections of explicit networks. Experimental analysis shows that the proposed model incorporates information from negative and positive connections, captures the network evolution of the equity market over time, and outperforms other state-of-the-art asset pricing and predictive machine learning models in stock return prediction. In a broader context, this is a pioneering work in FinTech, particularly in understanding complex financial market structures and developing explainable artificial intelligence models for finance applications. This work effectively demonstrates the application of machine learning to model financial networks, capture nonlinear interactions in data, and provide investors with powerful data-driven techniques for informed decision-making.
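
The missing-value imputation step mentioned in this abstract can be illustrated with plain matrix factorization. This is a deliberately simpler stand-in, not the dissertation's "Coupled Matrix Factorization" or its tensor framework: it fits low-rank factors to the observed cells by stochastic gradient descent and reads predictions off the missing cells. The toy analyst-by-firm matrix, the hyperparameters, and every function name are assumptions made for illustration.

```python
import random


def predict(row_factors, col_factors, i, j):
    """Dot product of the latent vectors for row i and column j."""
    return sum(p * q for p, q in zip(row_factors[i], col_factors[j]))


def observed_rmse(observed, row_factors, col_factors):
    """Root-mean-square error over the observed cells only."""
    squared = [
        (value - predict(row_factors, col_factors, i, j)) ** 2
        for (i, j), value in observed.items()
    ]
    return (sum(squared) / len(squared)) ** 0.5


def factorize(observed, n_rows, n_cols, rank=2, epochs=500,
              lr=0.01, reg=0.01, seed=0):
    """Fit low-rank factors to the observed cells with regularized SGD."""
    rng = random.Random(seed)
    row_factors = [[rng.uniform(0.1, 0.5) for _ in range(rank)]
                   for _ in range(n_rows)]
    col_factors = [[rng.uniform(0.1, 0.5) for _ in range(rank)]
                   for _ in range(n_cols)]
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (i, j), value in cells:
            err = value - predict(row_factors, col_factors, i, j)
            for f in range(rank):
                # Read both old values before updating either factor.
                p, q = row_factors[i][f], col_factors[j][f]
                row_factors[i][f] += lr * (err * q - reg * p)
                col_factors[j][f] += lr * (err * p - reg * q)
    return row_factors, col_factors


def toy_forecasts():
    """Hypothetical 4-analyst by 3-firm rank-1 matrix with two hidden cells."""
    analyst_scale = [1.0, 2.0, 3.0, 4.0]
    firm_scale = [1.0, 0.5, 2.0]
    full = {(i, j): a * f
            for i, a in enumerate(analyst_scale)
            for j, f in enumerate(firm_scale)}
    missing = {(0, 2), (3, 1)}
    observed = {cell: v for cell, v in full.items() if cell not in missing}
    return observed, sorted(missing)


if __name__ == "__main__":
    observed, missing = toy_forecasts()
    rows, cols = factorize(observed, n_rows=4, n_cols=3)
    print("RMSE on observed cells:", observed_rmse(observed, rows, cols))
    for i, j in missing:
        print(f"imputed value for cell ({i}, {j}):", predict(rows, cols, i, j))
```

A coupled variant would, roughly speaking, share some of these latent factors with a second data matrix so that both sources inform the imputation; that extension is not shown here.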