4 research outputs found
Efficient Parallel Processing of k-Nearest Neighbor Queries by Using a Centroid-based and Hierarchical Clustering Algorithm
The k-Nearest Neighbor method is one of the most popular techniques for both classification and regression. Because every prediction requires comparing a query against the training instances, the method may be impractical for problems with many instances, particularly when run time is a consideration. However, classifying large amounts of data has become a fundamental task in many real-world applications, so it is natural to scale the k-Nearest Neighbor method to large datasets. This paper proposes a new k-Nearest Neighbor classification method (KNN-CCL) that uses a parallel centroid-based and hierarchical clustering algorithm to partition the training dataset into multiple parts. The introduced clustering algorithm uses four stages of successive refinement and generates high-quality clusters, which the k-Nearest Neighbor approach then uses to classify the test datasets. Finally, sets of experiments are conducted on UCI datasets. The experimental results confirm that the proposed k-Nearest Neighbor classification method performs well with regard to classification accuracy and performance.
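The paper's four-stage clustering algorithm is its own contribution, but the core routing idea — partition the training set with a centroid-based clusterer, then answer each query only inside its nearest cluster — can be sketched in plain Python. The single-pass k-means below is a simplified stand-in for the paper's refinement stages, and the function names and toy data are illustrative, not from the paper:

```python
import math
import random
from collections import Counter, defaultdict

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; a simplified stand-in for the paper's four-stage clustering."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[min(range(k), key=lambda c: math.dist(p, centroids[c]))].append(p)
        centroids = [
            tuple(sum(col) / len(groups[c]) for col in zip(*groups[c]))
            if groups[c] else centroids[c]          # keep an empty cluster's old centroid
            for c in range(k)
        ]
    return centroids

def fit(train, n_clusters):
    """Bucket labeled training pairs (point, label) by their nearest centroid."""
    centroids = kmeans([p for p, _ in train], n_clusters)
    buckets = defaultdict(list)
    for p, y in train:
        buckets[min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))].append((p, y))
    return centroids, buckets

def predict(centroids, buckets, query, k=3):
    """kNN majority vote restricted to the query's nearest cluster."""
    c = min(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    nearest = sorted(buckets[c], key=lambda py: math.dist(query, py[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

# toy data: two well-separated classes
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.9, 5.2), "b")]
centroids, buckets = fit(train, n_clusters=2)
print(predict(centroids, buckets, (0.15, 0.1)))  # a
print(predict(centroids, buckets, (5.05, 5.0)))  # b
```

The speedup comes from `predict` searching only one bucket instead of the whole training set; the trade-off is that a query near a cluster boundary may miss true neighbors sitting in an adjacent bucket.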
"Estimación de la curva de la demanda a corto plazo en función de una onda madre"
This article determines the mother (pattern) curve from a historical database, which allows estimating the short-term behavior of consumer demand in an electrical power system, through the application of the MapReduce methodology (data mining) using the Matlab program, which enables proper handling of historical data. Accordingly, the development of tools that can forecast the growth and behavior of the demand of an electrical system becomes essential, especially with the entry of distributed intermittent generation and the various industrial and special loads that may be connected to distribution systems. These tools must provide for the proper handling of a large amount of information, supporting the development of complementary programs that allow electricity companies or system operators to predict the generation necessary to meet the conditions of reliability and continuity of the electricity supply to the final user.
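The abstract describes a MapReduce workflow in Matlab; the pattern itself — distilling many historical load curves into one representative "mother" curve per key — can be illustrated independently of that toolchain. The Python sketch below uses invented toy data and a made-up day-type key: the map phase emits (key, curve) pairs and the reduce phase averages all curves sharing a key, hour by hour:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (day_type, hourly load curve) pair for every historical day."""
    for day_type, curve in records:
        yield day_type, curve

def reduce_phase(pairs):
    """Reduce: average all curves sharing a key into one pattern ('mother') curve."""
    groups = defaultdict(list)
    for key, curve in pairs:
        groups[key].append(curve)
    return {
        key: [sum(hour) / len(curves) for hour in zip(*curves)]
        for key, curves in groups.items()
    }

# toy history: 4-point "daily" demand curves in MW (real data would use 24 or 96 points)
history = [
    ("weekday", [100, 120, 150, 130]),
    ("weekday", [110, 118, 148, 132]),
    ("weekend", [80, 90, 100, 95]),
    ("weekend", [82, 92, 98, 93]),
]
patterns = reduce_phase(map_phase(history))
print(patterns["weekday"])  # [105.0, 119.0, 149.0, 131.0]
```

In a real deployment the map and reduce phases would run in parallel over partitions of the historical database, which is what makes the approach viable for the large data volumes the article emphasizes.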
Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework
Numerous modern applications, from social networking to astronomy, need efficient answering of queries on spatial data. One such query is the All k-Nearest-Neighbor Query, or k-Nearest-Neighbor Join, which takes two datasets as input and, for each object of the first one, returns the k nearest neighbors from the second one. It is a combination of the k-nearest-neighbor and join queries and is computationally demanding. In particular, when the datasets involved fall into the category of Big Data, a single machine cannot process them efficiently. Only in the last few years have papers proposing solutions for distributed computing environments appeared in the literature. In this paper, we focus on parallel and distributed algorithms using the Apache Hadoop framework. More specifically, we focus on an algorithm that was recently presented in the literature and propose improvements to tackle three major challenges that distributed processing faces: improvement of load balancing (we implement an adaptive partitioning scheme based on Quadtrees), acceleration of local processing (we prune points during calculations by utilizing plane-sweep processing), and reduction of network traffic (we restructure and reduce the output size of the most demanding phase of computation). Moreover, by using real 2D and 3D datasets, we experimentally study the effect of each improvement, and of their combinations, on the performance of that algorithm. Experiments show that by carefully addressing the three aforementioned issues, one can achieve significantly better performance. We thereby arrive at a new scalable algorithm that adapts to the data distribution and significantly outperforms its predecessor. Moreover, we present an experimental comparison of our algorithm against other well-known MapReduce algorithms for the same query and show that these algorithms are also significantly outperformed. © 2019 Elsevier B.V.
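The plane-sweep pruning mentioned in the abstract can be illustrated outside of Hadoop. The single-machine sketch below is not the paper's algorithm; the function name and toy data are illustrative. With the inner dataset sorted on x, candidates are scanned outward from each query point's x position, and a direction is abandoned once its x-gap alone exceeds the current k-th best distance, since no point farther along that axis can be closer:

```python
import heapq
import math
from bisect import bisect_left

def knn_join_planesweep(queries, data, k):
    """For each query point, find its k nearest data points (Euclidean),
    pruning the sorted-by-x scan once the x-gap exceeds the k-th best distance."""
    data = sorted(data)
    xs = [p[0] for p in data]
    result = {}
    for q in queries:
        heap = []  # max-heap of (-dist, point), holding at most k entries
        start = bisect_left(xs, q[0])
        left, right = start - 1, start
        while left >= 0 or right < len(data):
            kth = -heap[0][0] if len(heap) == k else math.inf
            lgap = q[0] - data[left][0] if left >= 0 else math.inf
            rgap = data[right][0] - q[0] if right < len(data) else math.inf
            if min(lgap, rgap) > kth:
                break  # neither direction can still contain a closer point
            if lgap <= rgap:  # advance on the side with the smaller x-gap
                p, left = data[left], left - 1
            else:
                p, right = data[right], right + 1
            d = math.dist(q, p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, p))
        result[q] = sorted(p for _, p in heap)
    return result

queries = [(0.0, 0.0)]
data = [(1.0, 0.0), (2.0, 0.0), (-1.5, 0.0), (5.0, 5.0)]
print(knn_join_planesweep(queries, data, 2)[(0.0, 0.0)])  # [(-1.5, 0.0), (1.0, 0.0)]
```

In the distributed setting this local routine would run inside each partition produced by the Quadtree-based scheme, which is why tighter pruning directly accelerates the "local processing" phase the abstract identifies.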