SLDPC: Towards Second Order Learning for Detecting Persistent Clusters in Data Streams
Research on data stream clustering has so far focused mainly on adapting algorithms for static datasets to data streams and on improving the adapted algorithms. Such algorithms perform first-order learning, from data to clusters. This paper raises a new question of second-order learning of cluster models from data streams and presents a learning algorithm that detects persistent clusters from consecutive clustering snapshots in data streams. We first collect a sequence of cluster snapshots, taken as the output clusters at selected query points, and then identify the clusters that persist within a given timeframe. The algorithm is evaluated on collections of synthetic datasets, and the experimental results demonstrate its effectiveness in detecting such persistent clusters.
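As a rough illustration of the second-order idea, and not the SLDPC algorithm itself, the sketch below treats each snapshot as a list of cluster centroids and calls a cluster persistent once a centroid has been matched across `min_snapshots` consecutive snapshots. The function name, the nearest-centroid matching rule and the parameters `tol` and `min_snapshots` are all assumptions made for the example.

```python
from math import dist

def persistent_clusters(snapshots, tol=1.0, min_snapshots=3):
    """Return centroids that were matched across `min_snapshots` consecutive snapshots.

    snapshots: list of snapshots, each a list of centroid tuples.
    tol: hypothetical maximum distance for two centroids to count as the same cluster.
    min_snapshots: hypothetical persistence threshold (must be >= 2).
    """
    tracks = []       # each track: (latest centroid, consecutive snapshots seen)
    persistent = []
    for centroids in snapshots:
        unmatched = list(centroids)
        next_tracks = []
        for last, run in tracks:
            # Greedily match the track to its nearest centroid in this snapshot.
            match = min(unmatched, key=lambda c: dist(c, last), default=None)
            if match is not None and dist(match, last) <= tol:
                unmatched.remove(match)
                next_tracks.append((match, run + 1))
                if run + 1 == min_snapshots:
                    persistent.append(match)  # reported once, when the threshold is reached
            # A track without a match simply ends here.
        # Centroids that matched no track start new tracks.
        next_tracks.extend((c, 1) for c in unmatched)
        tracks = next_tracks
    return persistent

snapshots = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(0.2, 0.1), (9.0, 9.0)],
    [(0.3, 0.3), (2.0, 7.0)],
]
result = persistent_clusters(snapshots)
```

Here only the cluster near the origin survives all three snapshots; the other centroids appear once and are never matched again.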
A metaheuristic optimization approach for energy efficiency in the IoT networks
© 2020 John Wiley & Sons, Ltd. The Internet of Things (IoT) is now used in several fields, such as smart cities, agriculture, weather forecasting, smart grids and waste management. Although IoT has huge potential in these applications, some areas still need improvement. This work concentrates on minimizing the energy consumption of sensors in the IoT network, which increases the network lifetime. To optimize energy consumption, the most appropriate Cluster Head (CH) is chosen in the IoT network. The proposed work uses a hybrid metaheuristic algorithm that combines the Whale Optimization Algorithm (WOA) with Simulated Annealing (SA). To select the optimal CH in the clusters of the IoT network, several performance metrics are used: the number of alive nodes, load, temperature, residual energy and a cost function. The proposed approach is compared with several state-of-the-art optimization algorithms, namely the Artificial Bee Colony algorithm, the Genetic Algorithm, the Adaptive Gravitational Search algorithm and WOA. The results show that the proposed hybrid approach outperforms the existing approaches.
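To make the cluster head selection step more concrete, the sketch below runs plain simulated annealing over the nodes of one cluster. It covers only the SA half of the idea, not the hybrid WOA and SA method of the paper, and the cost function (mean distance to the head divided by the head's residual energy) is a hypothetical stand-in for the metrics listed in the abstract.

```python
import math
import random

def ch_cost(head, nodes):
    """Hypothetical cost: mean distance of all nodes to the head, divided by the
    head's residual energy. nodes is a list of (x, y, residual_energy), energy > 0."""
    hx, hy, energy = nodes[head]
    mean_dist = sum(math.hypot(x - hx, y - hy) for x, y, _ in nodes) / len(nodes)
    return mean_dist / energy

def select_cluster_head(nodes, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Simulated annealing over node indices; returns the best head visited."""
    rng = random.Random(seed)
    current = rng.randrange(len(nodes))
    best = current
    temp = t0
    for _ in range(steps):
        candidate = rng.randrange(len(nodes))
        delta = ch_cost(candidate, nodes) - ch_cost(current, nodes)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
        if ch_cost(current, nodes) < ch_cost(best, nodes):
            best = current
        temp *= cooling  # geometric cooling schedule
    return best

# Illustrative cluster: (x, y, residual energy)
nodes = [(0, 0, 0.4), (1, 1, 0.9), (2, 0, 0.5), (5, 5, 1.0), (1, 0, 0.8)]
head = select_cluster_head(nodes)
```

With so few nodes an exhaustive search would do; annealing only pays off when the search space is large, for example when heads for many clusters are chosen jointly.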
Distributed Fog computing for Internet of Things (IoT) based Ambient Data Processing and Analysis
Urban centers across the globe are under immense environmental distress due to increasing air pollution, industrialization and elevated living standards. The unmanaged, rapid growth of industries and an exponential rise in population have made the increase in air pollution intractable. Solutions based on recent technologies such as the Internet of Things (IoT) and Artificial Intelligence (AI) are therefore becoming increasingly popular: they can monitor the extent and scale of air contaminants, which in turn helps to contain them. Centralized cloud-based IoT platforms facilitate ubiquitous, continuous air quality monitoring and data processing for identifying air pollution hot spots. However, owing to inherent characteristics of the cloud, such as large end-to-end delay and bandwidth constraints, handling the high velocity and large volume of data generated by distributed IoT sensors would not be feasible in the long run. Fog computing is a powerful paradigm for addressing these issues: data are processed and filtered close to the IoT nodes, which improves the quality of service (QoS) of the IoT network. To further improve the QoS, a conceptual model of distributed fog computing and a machine learning based data processing and analysis model are proposed for the optimal utilization of cloud resources. The proposed model achieves a classification accuracy of 99% with a Support Vector Machine (SVM) classifier. The model is also simulated in the iFogSim toolkit. It offers several advantages, such as a reduced load on the central server, because data are processed and air quality is reported locally. Additionally, it makes the system scalable, since more air quality monitoring nodes can be integrated into the IoT network.
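The fog-side filtering described above can be pictured with a small sketch: a fog node summarises raw sensor readings in fixed windows and forwards only the windows whose mean exceeds a threshold, so the cloud receives far fewer records. The function name, window size and threshold are assumptions for illustration and do not come from the paper, whose model uses an SVM classifier rather than a fixed threshold.

```python
from statistics import mean

def fog_filter(readings, window=5, threshold=100.0):
    """Summarise readings per window at the fog node and return only the
    summaries that should be forwarded to the cloud (window mean > threshold)."""
    forwarded = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        avg = mean(chunk)
        if avg > threshold:
            forwarded.append({"start": start, "mean": avg, "max": max(chunk)})
    return forwarded

# Twelve hypothetical pollutant readings; only the middle window is elevated.
readings = [40, 50, 45, 60, 55, 120, 130, 110, 150, 140, 30, 20]
to_cloud = fog_filter(readings)
```

In this example twelve raw readings shrink to a single forwarded summary, which is the kind of load reduction on the central server that the abstract refers to.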
Data Stream Clustering: A Review
The number of connected devices is steadily increasing and these devices
continuously generate data streams. Real-time processing of data streams is
arousing interest despite many challenges. Clustering is one of the most
suitable methods for real-time data stream processing, because it can be
applied with less prior information about the data and it does not need labeled
instances. However, data stream clustering differs from traditional clustering
in many aspects and it has several challenging issues. Here, we provide
information regarding the concepts and common characteristics of data streams,
such as concept drift, data structures for data streams, time window models and
outlier detection. We comprehensively review recent data stream clustering
algorithms and analyze them in terms of the base clustering technique,
computational complexity and clustering accuracy. A comparison of these
algorithms is given along with still open problems. We indicate popular data
stream repositories and datasets, stream processing tools and platforms. Open
problems about data stream clustering are also discussed. Comment: Has been accepted for publication in Artificial Intelligence Review
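As one example of the kind of stream data structure and time window model such a review covers, the sketch below keeps a cluster feature summary (count, linear sum, squared sum) and fades it exponentially over time, in the spirit of a damped window. The class name, the base-2 decay and the `decay` parameter are assumptions for illustration, not a structure taken from the review.

```python
class MicroCluster:
    """Cluster feature summary (N, LS, SS) with exponential fading."""

    def __init__(self, dim, decay=0.01):
        self.n = 0.0              # faded number of points
        self.ls = [0.0] * dim     # faded linear sum per dimension
        self.ss = [0.0] * dim     # faded squared sum per dimension
        self.decay = decay        # hypothetical decay rate
        self.last_t = 0.0

    def _fade(self, t):
        # Down-weight the old summary by 2^(-decay * elapsed time).
        w = 2 ** (-self.decay * (t - self.last_t))
        self.n *= w
        self.ls = [v * w for v in self.ls]
        self.ss = [v * w for v in self.ss]
        self.last_t = t

    def insert(self, point, t):
        self._fade(t)
        self.n += 1.0
        self.ls = [a + x for a, x in zip(self.ls, point)]
        self.ss = [a + x * x for a, x in zip(self.ss, point)]

    def centroid(self):
        return [v / self.n for v in self.ls]

mc = MicroCluster(dim=2, decay=1.0)
mc.insert((0.0, 0.0), t=0.0)
mc.insert((2.0, 2.0), t=1.0)
center = mc.centroid()
```

With `decay=1.0` the first point has half the weight of the second by the time the centroid is read, so the centroid sits closer to the newer point; with `decay=0` the summary reduces to an ordinary running mean.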
Enhanced non-parametric sequence learning scheme for internet of things sensory data in cloud infrastructure
The Internet of Things (IoT) Cloud is an emerging technology that enables machine-to-machine, human-to-machine and human-to-human interaction through the Internet. IoT sensor devices generate sensory data that are dynamic and heterogeneous in nature, which makes the data difficult for the sensor devices to manage given their limited computation power and storage space. Cloud Infrastructure as a Service (IaaS) compensates for these limitations by making its computation power and storage resources available to process IoT sensory data. In IoT-Cloud IaaS, resource allocation is the process of distributing optimal resources to execute data request tasks that comprise data filtering operations. Recently, machine learning, non-heuristic, multi-objective and hybrid algorithms have been applied for efficient resource allocation to execute IoT sensory data filtering request tasks in IoT-enabled Cloud IaaS. However, the filtering task still faces several challenges: global search entrapment in event and error outlier detection as the dimension of the dataset increases, the inability to recover missing data for effective redundant data elimination, and local search entrapment that leads to unbalanced workloads on the resources available for task execution. In this thesis, enhancements of the Non-Parametric Sequence Learning (NPSL), Perceptually Important Point (PIP) and Efficient Energy Resource Ranking-Virtual Machine Selection (ERVS) algorithms are proposed. First, the Non-Parametric Sequence-based Agglomerative Gaussian Mixture Model (NPSAGMM) technique is used to improve the detection of event and error outliers in the global space as the dimension of the dataset increases. Then, the Perceptually Important Points K-means-enabled Cosine and Manhattan (PIP-KCM) technique is employed to recover missing data and so improve the elimination of duplicate sensed data records. Finally, an Efficient Resource Balance Ranking-based Glowworm Swarm Optimization (ERBV-GSO) technique is used to resolve the local search entrapment for near-optimal solutions and to reduce workload imbalance on available resources for task execution in the IoT-Cloud IaaS platform. Experiments were carried out using the NetworkX simulator, and the results of the NPSAGMM, PIP-KCM and ERBV-GSO techniques were compared with those of the NPSL, PIP, ERVS and Resource Fragmentation Aware (RF-Aware) algorithms. The experimental results showed that the proposed NPSAGMM, PIP-KCM and ERBV-GSO techniques produced performance improvement rates of 3.602%/6.74% Precision, 9.724%/8.77% Recall and 5.350%/4.42% Area under Curve for the detection of event and error outliers. Furthermore, the results indicated an improvement rate of 94.273% F1-score, a Reduction Ratio of 0.143 and a minimum Root Mean Squared Error of 0.149% for redundant data elimination, as well as a minimum of 608 Virtual Machine migrations, 47.62% Resource Utilization and 41.13% load balancing degree for the allocation of the resources deployed to execute sensory data filtering tasks. The proposed techniques are therefore effective for improving load balancing when allocating resources to execute efficient outlier (event and error) detection and to eliminate redundant data records in IoT-based Cloud IaaS.
Incremental algorithm for Decision Rule generation in data stream contexts
Nowadays, data science is gaining a lot of attention in many different sectors.
In industry in particular, many applications can be considered, and using data
science techniques in the decision-making process is one that can add value.
In addition, the growth of data availability and the appearance of continuous
data flows in the form of data streams raise new challenges when dealing with
changing data. This work presents a novel algorithm, the Incremental Decision
Rules Algorithm (IDRA), which incrementally generates and modifies decision
rules in data stream contexts to incorporate the changes that may appear over
time. The method proposes a new rule structure that aims to improve the
decision-making process by providing a descriptive and transparent knowledge
base that can be integrated into a decision tool. This work describes the logic
underlying IDRA, in all its versions, and proposes a variety of experiments to
compare them with a classical method (CREA) and an adaptive method (VFDR). Real
datasets, together with simulated scenarios with different error types and
rates, are used to compare these algorithms. The study shows that IDRA,
specifically the reactive version of IDRA (RIDRA), improves on the accuracy of
VFDR and CREA in all the studied scenarios, both real and simulated, at the
cost of more time.
Feature Papers "Age-Friendly Cities & Communities: State of the Art and Future Perspectives"
The "Age-Friendly Cities & Communities: State of the Art and Future Perspectives" publication presents contemporary, innovative, and insightful narratives, debates, and frameworks based on an international collection of papers from scholars spanning the fields of gerontology, social sciences, architecture, computer science, and gerontechnology. This extensive collection of papers aims to move the narrative and debates forward in the interdisciplinary field of age-friendly cities and communities.
Adaptive Clustering for Dynamic IoT Data Streams
The emergence of the Internet of Things (IoT) has led to the production of huge volumes of real-world streaming data. Effective techniques are needed to process IoT data streams and to gain insights and actionable information from real-world observations and measurements. Most existing approaches are application or domain dependent. We propose a method that determines how many different clusters can be found in a stream based on the data distribution. After selecting the number of clusters, we use an online clustering mechanism to cluster the incoming data from the streams. Our approach remains adaptive to drifts by adjusting itself as the data changes. We benchmark our approach against state-of-the-art stream clustering algorithms on data streams with data drift, and show how the method can be applied in a use case scenario involving near real-time traffic data. Our results allow IoT data streams to be clustered, labelled and interpreted dynamically according to the data distribution, which makes it possible to process large volumes of dynamic data online, adaptively, based on the current situation. We show how our method adapts itself to changes and demonstrate how the number of clusters in a real-world data stream can be determined by analysing the data distributions.
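The two stages described above, choosing the number of clusters from the data distribution and then clustering online, can be sketched for one-dimensional data as follows. This is a simplified stand-in and not the authors' method: counting histogram peaks is a crude assumption for reading the number of clusters off the distribution, the online step is plain sequential k-means, and drift handling is left out. All names and parameters are hypothetical.

```python
def estimate_k(values, bins=10):
    """Crude estimate of the number of clusters: count local maxima in a histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, bins + 1)
        if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]
    )

def online_kmeans(stream, centroids):
    """Sequential k-means: each arriving value pulls its nearest centroid towards it."""
    centroids = list(centroids)
    counts = [1] * len(centroids)     # each initial centroid counts as one point
    for x in stream:
        j = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        counts[j] += 1
        centroids[j] += (x - centroids[j]) / counts[j]   # running-mean update
    return centroids

values = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0]
k = estimate_k(values)
final = online_kmeans(values, [0.0, 6.0])
```

To follow drift, one option would be to re-run the estimate on a recent window of the stream and re-initialise the centroids whenever the estimated number of clusters changes.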