Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods
Dynamic data (including environmental, traffic, and sensor data) have recently been recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors caused by, e.g., sensor failures and network faults. This paper explores the quality of dynamic OGD. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach assesses data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and the unsupervised Isolation Forest (iForest) algorithm are used to detect anomalies. iForest anomalies are classified as sensor faults or unusual traffic conditions. The iForest algorithm is also trained on additional features, and the resulting model is explained using explainable artificial intelligence. Overall, 20.16% of the traffic observations are missing, and 50% of the sensors have 15.5% to 33.43% missing values. The average percentage of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors' knowledge, this is the first study to explore the quality of dynamic OGD.
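The abstract names seasonal-trend decomposition and Isolation Forest as the two anomaly detectors. A minimal sketch of how these two steps might look for a single sensor's series, assuming pandas-style flow and speed columns; the threshold, contamination level, and synthetic data are illustrative, not settings from the paper:

```python
# Sketch of the two anomaly-detection steps on one sensor's 15-minute series.
# Column names, k, and contamination are illustrative assumptions.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL
from sklearn.ensemble import IsolationForest

def stl_anomalies(flow: pd.Series, period: int = 96, k: float = 3.0) -> pd.Series:
    """Flag points whose STL residual lies more than k sigma from the mean."""
    resid = STL(flow, period=period, robust=True).fit().resid
    return (resid - resid.mean()).abs() > k * resid.std()

def iforest_anomalies(df: pd.DataFrame, contamination: float = 0.1) -> np.ndarray:
    """Unsupervised Isolation Forest over flow/speed features; -1 marks anomalies."""
    model = IsolationForest(contamination=contamination, random_state=0)
    return model.fit_predict(df[["flow", "speed"]]) == -1

# Synthetic data standing in for two weeks of one sensor's measurements.
idx = pd.date_range("2022-01-01", periods=96 * 14, freq="15min")
df = pd.DataFrame({"flow": np.random.poisson(200, len(idx)).astype(float),
                   "speed": np.random.normal(60, 5, len(idx))}, index=idx)
print(stl_anomalies(df["flow"]).sum(), iforest_anomalies(df).sum())
```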
On Predicting Election Results using Twitter and Linked Open Data: The Case of the UK 2010 Election
The analysis of Social Media data enables eliciting public behaviour and opinion. In this context, a number of studies have recently explored Social Media's capability to predict the outcome of real-world phenomena. The results of these studies are controversial, with elections being the most disputed phenomenon. The objective of this paper is to present a case of predicting the results of the UK 2010 election through Twitter. In particular, we study to what extent it is possible to use Twitter data to accurately predict the percentage of votes of the three most prominent political parties, namely the Conservative Party, the Liberal Democrats, and the Labour Party. The approach we follow capitalises on (a) a theoretical Social Media data analysis framework for predictions and (b) Linked Open Data to enrich Twitter data. We extensively discuss each step of the framework to emphasise the details that could affect prediction accuracy. We anticipate that this paper will contribute to the ongoing discussion of understanding to what extent and under which circumstances election results are predictable through Social Media.
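The abstract does not detail the framework's computation, so as a rough illustration only: the simplest volume-based approach such studies use is share of voice over tweets already tagged with the party they mention (here a hypothetical enrichment output, e.g. via Linked Open Data):

```python
# Naive share-of-voice sketch; tweets and party tags are illustrative,
# not data or a method taken from the paper.
from collections import Counter

tweets = [
    {"text": "Voting Labour tomorrow", "party": "Labour"},
    {"text": "Clegg won the debate", "party": "Liberal Democrats"},
    {"text": "Cameron for PM", "party": "Conservative"},
    {"text": "Labour all the way", "party": "Labour"},
]

counts = Counter(t["party"] for t in tweets)
total = sum(counts.values())
predicted_share = {party: 100 * n / total for party, n in counts.items()}
print(predicted_share)  # mention share as a crude proxy for vote share
```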
Open Statistics: The Rise of a New Era for Open Data?
A large part of open data concerns statistics, such as demographic, economic, and social data (henceforth referred to as Open Statistical Data, OSD). In this paper, we start by introducing open data fragmentation as a major obstacle to OSD reuse. We proceed by outlining the data cube as a logical model for structuring OSD. We then introduce Open Statistics as a new area aiming to systematically study OSD. Open Statistics reuses and extends methods from diverse fields such as Open Data, Statistics, Data Warehouses, and the Semantic Web. We focus on the benefits and challenges of Open Statistics. The results suggest that Open Statistics provides benefits not present in any of these fields alone. We conclude that in certain cases OSD can realise the potential of open data.
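To make the data cube logical model concrete: an OSD dataset can be viewed as a table of observations indexed by dimensions, with one or more measures attached to each observation. A minimal sketch with illustrative dimension names and values, not taken from the paper:

```python
# Data cube sketch: dimensions (area, year, sex) index one measure (population).
# All names and figures are illustrative.
import pandas as pd

observations = pd.DataFrame([
    {"area": "Athens",       "year": 2020, "sex": "F", "population": 340_000},
    {"area": "Athens",       "year": 2020, "sex": "M", "population": 325_000},
    {"area": "Thessaloniki", "year": 2020, "sex": "F", "population": 170_000},
    {"area": "Thessaloniki", "year": 2020, "sex": "M", "population": 155_000},
])

# Slicing the cube: fix some dimensions, aggregate the measure over the rest.
print(observations.pivot_table(index="area", columns="sex",
                               values="population", aggfunc="sum"))
```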
Applying Brand Equity Theory to Understand Consumer Opinion in Social Media
Billions of people use Social Media (SM), such as Facebook and Twitter, every day to express their opinions and experiences with brands. Companies are highly interested in understanding such SM brand-related content. Consequently, many studies have been conducted and many applications have been developed to analyse this content. For analysis purposes, the main SM metrics used are volume and sentiment. Interestingly, however, brand equity theory proposes different metrics for assessing brand reputation, including brand image, brand satisfaction, and purchase intention (henceforth referred to as marketing metrics). The objective of this paper is to explore the feasibility of applying marketing metrics to Twitter brand-related content. For this purpose, we collect, study, and analyse tweets that mention two brands, namely IKEA and Gatorade. The manual analysis suggests that a significant share of brand tweets relates to brand image, brand satisfaction, and purchase intention. We thereafter design an algorithm that classifies tweets into the relevant categories to enable automatic computation of the marketing metrics. We implement the algorithm using statistical learning approaches and show that its classification accuracy is good. We anticipate that this article will motivate other studies, as well as application designers, to adopt marketing theories when evaluating brand reputation through SM content.
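As an illustration of the classification step, a minimal sketch that maps tweets to the three marketing metrics; the tiny training set and the model choice (TF-IDF features plus logistic regression) are assumptions, since the abstract says only "statistical learning approaches":

```python
# Sketch of classifying brand tweets into marketing-metric categories.
# Training examples and model choice are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["IKEA furniture looks so stylish",       # brand image
          "my new Gatorade flavour tastes great",  # brand satisfaction
          "thinking of buying an IKEA sofa",       # purchase intention
          "love the IKEA store design",
          "really happy with this Gatorade",
          "might purchase Gatorade for the team"]
labels = ["image", "satisfaction", "intention",
          "image", "satisfaction", "intention"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(tweets, labels)
print(clf.predict(["I want to buy a Gatorade six-pack"]))
```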
Graph Neural Networks and Open-Government Data to Forecast Traffic Flow
Traffic forecasting has been an important area of research for several decades, with significant implications for urban traffic planning, management, and control. In recent years, deep-learning models such as graph neural networks (GNNs) have shown great promise in traffic forecasting due to their ability to capture complex spatio-temporal dependencies within traffic networks. Additionally, public authorities around the world have started providing real-time traffic data as open-government data (OGD). This large volume of dynamic and high-value data can open new avenues for creating innovative algorithms, services, and applications. In this paper, we investigate the use of traffic OGD with advanced deep-learning algorithms. Specifically, we deploy two GNN models, the Temporal Graph Convolutional Network (T-GCN) and the Diffusion Convolutional Recurrent Neural Network (DCRNN), to predict traffic flow based on real-time traffic OGD. Our evaluation shows that both GNN models outperform two baseline models, Historical Average and Autoregressive Integrated Moving Average, in prediction performance. We anticipate that the exploitation of OGD in deep-learning scenarios will contribute to the development of more robust and reliable traffic-forecasting algorithms, as well as provide innovative and efficient public services for citizens and businesses.
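Two of the named ingredients can be sketched compactly: the random-walk diffusion convolution at the core of DCRNN, and the Historical Average baseline. The road graph, features, and diffusion weights below are illustrative, not the paper's setup:

```python
# Sketch of a K-step diffusion convolution (DCRNN's core) and the
# Historical Average baseline. All inputs are illustrative.
import numpy as np

def diffusion_conv(A: np.ndarray, X: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Compute sum_k theta[k] * (D^-1 A)^k X, a K-step random-walk diffusion."""
    P = A / A.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    out, Pk = np.zeros_like(X), np.eye(len(A))
    for t in theta:
        out += t * (Pk @ X)
        Pk = Pk @ P
    return out

def historical_average(series: np.ndarray, period: int = 96) -> np.ndarray:
    """Predict each time slot as the mean of that slot over previous days."""
    return series.reshape(-1, period).mean(axis=0)

A = np.array([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])  # 3-sensor road graph
X = np.random.rand(3, 1)                                   # current flow per sensor
print(diffusion_conv(A, X, theta=np.array([0.5, 0.3, 0.2])))
flows = np.random.rand(96 * 7)  # one week of 15-minute flow readings
print(historical_average(flows)[:4])
```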
Can Large Language Models Revolutionalize Open Government Data Portals? A Case of Using ChatGPT in statistics.gov.scot
Large language models possess tremendous natural language understanding and generation abilities. However, they often lack the ability to discern fact from fiction, leading to factually incorrect responses. Open Government Data portals are repositories of information, oftentimes linked, that is freely available to everyone. By combining these two technologies in a proof-of-concept application built on the OpenAI GPT-3.5 model and the Scottish open statistics portal, we show that it is possible to improve the factual accuracy of the large language model's responses, and we propose a novel way to effectively access and retrieve statistical information from the data portal through natural language querying alone. We anticipate that this paper will trigger a discussion regarding the transformation of Open Government Data portals through large language models.
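The pattern the abstract describes (natural-language question, LLM-generated query, portal lookup) might be sketched as follows; the endpoint URL, model name, and prompt are assumptions rather than details from the paper:

```python
# Sketch: translate a question into SPARQL with an LLM, then query the portal.
# Endpoint, model, and prompt are assumptions, not the paper's implementation.
import requests
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "What was the population of Glasgow in 2020?"
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system",
               "content": "Translate the user's question into a SPARQL query "
                          "for statistics.gov.scot. Return only the query."},
              {"role": "user", "content": question}])
sparql = resp.choices[0].message.content

# Run the generated query against the portal's public SPARQL endpoint.
result = requests.get("https://statistics.gov.scot/sparql",
                      params={"query": sparql},
                      headers={"Accept": "application/sparql-results+json"})
print(result.json())
```

Grounding the model's answers in query results over the portal's linked data, rather than in the model's parametric memory, is what lets the application return verifiable statistics instead of plausible fabrications.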