Harnessing data flow and modelling potentials for sustainable development
Tackling some of the global challenges relating to health, poverty, business and the environment is known to be heavily dependent on the flow and utilisation of data. However, while enhancements in data generation, storage, modelling, dissemination and the related integration of global economies and societies are fast transforming the way we live and interact, the resulting dynamic, globalised information society remains digitally divided. On the African continent in particular, this division has resulted in a gap between knowledge generation and its transformation into tangible products and services, which Kirsop and Chan (2005) attribute to a broken information flow. This paper proposes some fundamental approaches for a sustainable transformation of data into knowledge for the purpose of improving people's quality of life. Its main strategy is based on a generic data-sharing model providing access to data-utilising and data-generating entities in a multidisciplinary environment. It highlights the great potential of unsupervised and supervised modelling in tackling the typically predictive-in-nature challenges we face. Using both simulated and real data, the paper demonstrates how some of the key parameters may be generated and embedded in models to enhance their predictive power and reliability.
Its main outcomes include a proposed implementation framework setting the scene for the creation of decision support systems capable of addressing the key issues in society. It is expected that a sustainable data flow will forge synergies between the private sector and academic and research institutions, within and between countries. It is also expected that the paper's findings will help in the design and development of knowledge extraction from data in the wake of cloud computing and, hence, contribute towards improving people's overall quality of life. To avoid high implementation costs, selected open-source tools are recommended for developing and sustaining the system.
Key words: Cloud Computing, Data Mining, Digital Divide, Globalisation, Grid Computing, Information Society, KTP, Predictive Modelling, STI
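As an illustration of the modelling strategy described above, here is a minimal sketch in Python; it is not the paper's actual pipeline, and the simulated data, model choices and parameters are all assumptions. It derives parameters from unsupervised structure (distances to cluster centroids) and embeds them as features in a supervised predictive model:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated data with two latent groups, echoing the paper's use of
# simulated alongside real data (all values here are illustrative).
X = np.vstack([rng.normal(0, 1, (500, 5)), rng.normal(2, 1, (500, 5))])
y = np.repeat([0, 1], 500)

# Unsupervised step: estimate structure without using the labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
X_aug = np.hstack([X, km.transform(X)])  # embed distances to centroids

# Supervised step: predictive model trained on the augmented features.
X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(f"hold-out accuracy: {clf.score(X_te, y_te):.3f}")
```

Embedding the centroid distances gives the classifier access to structure found without labels, which is one plausible reading of "generating and embedding key parameters" to enhance predictive power.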
Monitoring Sustainable Development Goals Amidst COVID-19 Through Big Data, Deep Learning and Interdisciplinarity
As the coronavirus disease 2019 (COVID-19) ravaged across the globe in the first half of 2020, the world was once again reminded of the huge gaps in our knowledge, despite our current scientific and technological capacities. The pandemic has had a severe impact on our ways of life and, despite its devastating effects, it has presented us with an opportunity to pay greater attention to the challenges we face. It is in that context that we associate the fight against COVID-19 with monitoring the Sustainable Development Goals (SDGs). Considering each SDG as a source of Big Data, we present a generic framework for combining Big Data, machine learning and interdisciplinarity to address global challenges. The work delivers descriptive and prescriptive findings, using data visualisation and animation techniques, on the one hand, and predictive results, based on convolutional neural networks, on the other. The former are based on structured data on COVID-19 cases and deaths obtained from the European Centre for Disease Prevention and Control (ECDC), and on data on the impact of the pandemic on various aspects of life obtained from the UK Office for National Statistics. The predictive findings are based on unstructured data: a large COVID-19 X-ray dataset of 3,181 image files obtained from GitHub and Kaggle. The results from both sets are presented in a form that resonates with cross-disciplinary discussions, opening novel paths for interdisciplinary research in tackling global challenges.
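As a pointer for readers to the predictive component, here is a minimal sketch of a convolutional network of the kind described; the 224x224 grayscale input size, binary label and layer sizes are assumptions, since the paper's actual architecture and preprocessing are not reproduced here:

```python
import torch
import torch.nn as nn

class XRayCNN(nn.Module):
    """Small CNN for X-ray classification (illustrative architecture only)."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # three conv/pool stages: 224 -> 112 -> 56 -> 28 spatial resolution
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 28 * 28, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = XRayCNN()
dummy = torch.randn(4, 1, 224, 224)  # a batch of four fake X-ray images
print(model(dummy).shape)            # torch.Size([4, 2]) class logits
```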
A Repeated Sampling and Clustering Method for Intrusion Detection
Various tools, methods and techniques have been developed in recent years to deal with intrusion detection and ensure network security. However, despite all these efforts, gaps remain, apparently due to insufficient data sources on attacks on which to train and test intrusion detection algorithms. We propose a data-flow adaptive method for intrusion detection based on searching through a high-dimensional dataset for naturally arising structures. The algorithm is trained on a subset of 82,332 observations on 25 numeric variables and one cyber-attack label, and tested on another large subset of similar structure. Its novelty derives from the iterative estimation of cluster centroids, variability and proportions based on repeated sampling. Data visualisation and numerical results provide a clear separation of a set of variables associated with two types of attacks. We highlight the algorithm's potential extensions: its appeal for predictive modelling and its adaptation to other dimension-reduction techniques.
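A minimal sketch of the repeated-sampling idea, reconstructed from the abstract rather than from the published algorithm; the sample sizes, the use of KMeans and the label-alignment rule are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def repeated_sampling_clusters(X, k=2, n_iter=50, sample_frac=0.2, seed=0):
    """Iteratively estimate centroids, proportions and variability
    by clustering repeated random samples (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    centroids, proportions, spreads = [], [], []
    for _ in range(n_iter):
        idx = rng.choice(n, size=int(sample_frac * n), replace=False)
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[idx])
        order = np.argsort(km.cluster_centers_[:, 0])  # align labels across runs
        centroids.append(km.cluster_centers_[order])
        proportions.append(np.bincount(km.labels_, minlength=k)[order] / len(idx))
        spreads.append([X[idx][km.labels_ == c].std() for c in order])
    return (np.mean(centroids, axis=0),
            np.mean(proportions, axis=0),
            np.mean(spreads, axis=0))

# Synthetic bi-modal data standing in for the 82,332 x 25 network flows.
X = np.vstack([np.random.normal(0, 1, (400, 25)),
               np.random.normal(3, 1, (400, 25))])
centers, props, spread = repeated_sampling_clusters(X)
print(centers.shape, props, spread)
```

Averaging centroids, proportions and spreads over many subsamples stabilises the estimates against the randomness of any single sample.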
An iterative multiple sampling method for intrusion detection
Threats to network security increase with the growing volume and velocity of data across networks, and they present challenges not only to law enforcement agencies but also to businesses, families and individuals. The volume, velocity and veracity of shared data across networks call for accurate and reliable automated tools for filtering useful data from malicious, noisy or irrelevant data. While data mining and machine learning techniques have been widely adopted within the network security community, challenges and gaps in knowledge extraction from data have remained, due to insufficient data sources on attacks on which to test the algorithms' accuracy and reliability. We propose a data-flow adaptive approach to intrusion detection based on high-dimensional cyber-attack data. The algorithm repeatedly takes random samples from an inherently bi-modal, high-dimensional dataset of 82,332 observations on 25 numeric and two categorical variables. Its main idea is to capture the subtle information that emerges from reducing the dimension of a large number of malicious flows, by iteratively estimating the roles played by individual variables in the construction of key components. Data visualisation and numerical results provide a clear separation of a set of variables associated with attack types and show that component-dominating parameters are crucial in monitoring future attacks.
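One plausible reading of the component-based step is sketched below, with assumed choices: PCA for dimension reduction and variance-weighted absolute loadings as each variable's "role". This is an illustration, not the authors' code:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def variable_roles(X, n_components=3, n_iter=100, sample_frac=0.1, seed=0):
    """Average, over repeated samples, each variable's contribution to the
    leading principal components (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    loadings = np.zeros(p)
    for _ in range(n_iter):
        idx = rng.choice(n, size=int(sample_frac * n), replace=False)
        Z = StandardScaler().fit_transform(X[idx])
        pca = PCA(n_components=n_components).fit(Z)
        # weight each component's absolute loadings by its explained variance
        loadings += np.abs(pca.components_).T @ pca.explained_variance_ratio_
    return loadings / n_iter

X = np.random.rand(5000, 25)         # stand-in for the 82,332 x 25 numeric flows
roles = variable_roles(X)
print(np.argsort(roles)[::-1][:5])   # indices of the component-dominating variables
```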
Detection of natural structures and classification of HCI-HPR data using robust forward search algorithm
Purpose – The purpose of this paper is to propose a forward search algorithm for detecting and identifying natural structures arising in human-computer interaction (HCI) and human physiological response (HPR) data.
Design/methodology/approach – The paper presents aspects that are essential to modelling and to precision in detection. The method involves a developed algorithm for detecting outliers in data in order to recognise natural patterns in continuous data streams such as HCI-HPR data. The detected categorical data are simultaneously labelled based on the data's reliance on the parametric rules underlying the predictive models used in classification algorithms. Data were also simulated from a multivariate normal distribution and used to compare against and validate the original data.
Findings – Results show that the forward search method provides robust features that are capable of resisting over-fitting in physiological and eye-movement data.
Research limitations/implications – One limitation of the robust forward search algorithm is that when residual values grow too large for the numeric range available (overflow), it normally raises an error; to counter this, the data sets are standardised by taking the logarithm of the model variables before running the algorithm.
Practical implications – The authors conducted some of the experiments at individual residences, which may have affected control over environmental conditions.
Originality/value – The novelty of the method lies in the detection of outliers in HCI and HPR data sets based on Mahalanobis distances, and it can also handle large data sets with p possible parameters. The improvement made to the algorithm is the application of richer graphical displays and the rendering of the residual plot.
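A minimal sketch of a Mahalanobis-based forward search of the kind described; the initial-subset rule, step size and monitoring statistic are assumptions rather than the authors' implementation:

```python
import numpy as np

def forward_search(X, m0=None):
    """Grow a clean subset one observation at a time by squared Mahalanobis
    distance; monitor the minimum distance outside the subset (sketch)."""
    n, p = X.shape
    m0 = m0 or p + 1
    # initial subset: points closest to the coordinate-wise median (robust start)
    d0 = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    subset = list(np.argsort(d0)[:m0])
    trace = []
    while len(subset) < n:
        S = X[subset]
        mu = S.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(S, rowvar=False))
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)  # squared Mahalanobis
        outside = np.setdiff1d(np.arange(n), subset)
        nxt = outside[np.argmin(d2[outside])]
        trace.append(d2[nxt])
        subset.append(nxt)
    return np.array(trace)

X = np.vstack([np.random.normal(0, 1, (200, 4)),
               np.random.normal(6, 1, (5, 4))])  # 5 planted outliers
print(forward_search(X)[-8:])                    # distances jump near the end
```

Sharp jumps in the monitored distance as the search approaches the full sample are the classic signature of outliers entering the subset.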
An Ensemble Method for Intrusion Detection with Conformity to Data Variability
The high volume of traffic across modern networks entails the use of accurate and reliable automated tools for intrusion detection. The capacity of data mining and machine learning algorithms to learn rules from data is typically constrained by the random nature of training and test data, by the diversity and disparity of models and related parameters, and by limitations in data sharing. We propose an ensemble method for intrusion detection which conforms to variability in data. It is trained on a high-dimensional cyber-attack dataset of 82,332 observations on 27 attributes, for classification by Decision Trees (DT). Its novelty derives from iteratively training and testing several DT models on multiple high-dimensional samples aimed at separating the types of attacks. Unlike in Random Forests, the number of variables, p, is not altered, which enables identification of the importance of predictor variables. The method also minimises the influence of multicollinearity and of the strength of individual trees. Results show that the ensemble model conforms to data variability and yields more insightful predictions on multinomial targets.
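A minimal sketch of the ensemble idea under stated assumptions: the sample fraction, tree settings and majority-vote rule are illustrative, and the data are synthetic stand-ins for the cyber-attack set.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def train_ensemble(X, y, n_trees=25, sample_frac=0.3, seed=0):
    """Train several decision trees on repeated random samples; every tree
    keeps all p predictors (unlike Random Forests)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(len(X), size=int(sample_frac * len(X)), replace=False)
        trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return trees

def predict_ensemble(trees, X):
    """Combine the trees by majority vote over the multinomial target."""
    votes = np.array([t.predict(X) for t in trees])
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

# Synthetic stand-in: 26 numeric features, 3 attack types.
X = np.random.rand(3000, 26)
y = np.random.randint(0, 3, 3000)
trees = train_ensemble(X, y)
print(predict_ensemble(trees, X[:5]))
```

Keeping all p predictors in every tree is what distinguishes this from Random Forests' random feature subsetting, so variable importances remain comparable across trees.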
Statistical analysis of particulate matter data in Doha, Qatar
Pollution in Doha is measured using passive, active and automatic sampling. In this paper we consider automatically sampled data, in which various pollutants were continually collected and analysed every hour. At each station the sample is analysed on-line and in real time, and the data are stored within the analyser or a separate logger so that they can be downloaded remotely by modem. The accuracy produced enables pollution episodes to be analysed in detail and related to traffic flows, meteorology and other variables. Data have been collected hourly over more than six years at three different locations, with measurements available for various pollutants – for example, ozone, nitrogen oxides, sulphur dioxide, carbon monoxide, total hydrocarbons (THC), methane and particulate matter (PM1.0, PM2.5 and PM10) – as well as meteorological data such as humidity, temperature, and wind speed and direction. Despite much care in the data collection process, the resultant data contain long stretches of missing values, arising when the equipment has malfunctioned, often as a result of more extreme conditions. Our analysis is twofold. First, we consider ways to "clean" the data by imputing missing values, including identified outliers. Second, we consider prediction of each particulate (PM1.0, PM2.5 and PM10) 24 hours ahead, using current (and previous) pollution and meteorological data. In this case, we use vector autoregressive models, compare them with decision trees, and propose variable selection criteria which explicitly adapt to missing data. Our results show that the regression tree models, with no variable transformations, perform best, and that attempts to impute missing values are hampered by non-random missingness.
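To make the winning approach concrete, here is a minimal sketch of a regression tree predicting a particulate 24 hours ahead from current and lagged readings; the column names, the diurnal synthetic data and the lag choices are hypothetical, not the study's data:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 24 * 365
hours = np.arange(n)

# Hypothetical hourly series with a diurnal PM cycle plus noise.
df = pd.DataFrame({
    "pm10": 60 + 25 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, n),
    "temp": 30 + 8 * np.sin(2 * np.pi * hours / (24 * 365)) + rng.normal(0, 2, n),
    "wind": rng.gamma(2.0, 2.0, n),
}, index=pd.date_range("2020-01-01", periods=n, freq="h"))

# Target: PM10 24 hours ahead; features: current values plus short lags.
df["target"] = df["pm10"].shift(-24)
for lag in (1, 2, 3):
    df[f"pm10_lag{lag}"] = df["pm10"].shift(lag)
data = df.dropna()  # this sketch still needs complete rows

X, y = data.drop(columns="target"), data["target"]
split = int(0.8 * len(data))  # time-ordered train/test split
tree = DecisionTreeRegressor(max_depth=6, random_state=0)
tree.fit(X.iloc[:split], y.iloc[:split])
print(f"R^2 on held-out period: {tree.score(X.iloc[split:], y.iloc[split:]):.3f}")
```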
A Control System for Detecting Emotions on Visual Interphase Stimulus
Complex dynamic contents of visual stimuli induce implicit reactions in a user. These lead to changes in the physiological processes of the user, which are referred to as stress. Our goal is to model and produce a system that represents the mechanical interactions of the body and eye movement behavior. We are particularly concerned with the skin conductance response (SCR) and eye fixations to visual stimuli, and we build a dynamic system that detects stress and correlates it to visual widgets. The process consists of the following modules: (1) a hypothesis generator for suggesting possible structural changes that result from direct interaction with a visual stimulus, (2) an information source for responding to operator queries about users' interactive and physiological processes, and (3) a continuous system simulator for simulating and illustrating physiological reactions during interaction. This model serves as an infrastructure for modeling physiological processes and could benefit usability laboratories, web developers, and designers of interactive systems, enabling evaluators to visualize interfaces and better identify areas that cause stress to users.
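A minimal sketch of how SCR and fixation streams might be aligned to flag stress-inducing widgets, in the spirit of the system described; the thresholds, the 2-second window and the data layout are assumptions:

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

# Fixations: one row per fixation, with its timestamp and the widget under gaze.
fixations = pd.DataFrame({
    "t": [1.2, 3.5, 5.1, 8.4, 9.9],
    "widget": ["menu", "banner", "form", "form", "menu"],
})

# Simulated SCR trace sampled at 10 Hz, with two induced response peaks.
t = np.arange(0, 12, 0.1)
scr = 0.05 * np.random.rand(len(t))
scr += np.exp(-((t - 5.3) ** 2)) + np.exp(-((t - 8.6) ** 2))

peaks, _ = find_peaks(scr, height=0.5)
peak_times = t[peaks]

# Flag a fixation as stress-related if an SCR peak follows within ~2 seconds.
def stressed(row, window=2.0):
    return bool(np.any((peak_times >= row.t) & (peak_times <= row.t + window)))

fixations["stress"] = fixations.apply(stressed, axis=1)
print(fixations.groupby("widget")["stress"].mean())  # per-widget stress rate
```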
- …