
    Optimasi Algoritma K-Means Clustering dengan Parallel Processing menggunakan Framework R

    Parallel processing is often used to optimize the execution time of data mining algorithms. In this study, parallel processing is used to optimize the K-Means clustering algorithm. The K-Means implementation uses packages available in the R framework, and the algorithm is run both serially and in parallel. To obtain the optimization percentage, the execution time under parallel processing is compared with the execution time under serial processing. The study uses the Boston Housing dataset, which is commonly used in data mining. Test scenarios are differentiated by the number of cores and the number of centroids. The results show that parallel processing achieves a shorter execution time than serial processing in every scenario. The resulting optimization is significant, ranging from 20% to 52%, with the highest optimization obtained at the largest number of cores and the largest number of centroids.
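    The paper's implementation relies on R packages, which are not reproduced here. As a language-neutral illustration of the measurement itself, the following Python sketch times a hand-rolled K-Means serially and with the assignment step split across worker processes, then reports the same optimization percentage, (t_serial - t_parallel) / t_serial. The dataset stand-in, worker count, and function names are all assumptions, not the paper's code.

```python
import time
import numpy as np
from multiprocessing import Pool

def assign_chunk(args):
    """Nearest-centroid assignment for one chunk of rows -- the step
    that parallelizes naturally across cores."""
    chunk, centroids = args
    d2 = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=20, pool=None, n_workers=1):
    rng = np.random.default_rng(0)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        if pool is None:   # serial assignment step
            labels = assign_chunk((X, centroids))
        else:              # parallel: split rows across worker processes
            chunks = np.array_split(X, n_workers)
            labels = np.concatenate(
                pool.map(assign_chunk, [(c, centroids) for c in chunks]))
        for j in range(k):  # centroid update step
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

if __name__ == "__main__":
    # Stand-in data with Boston Housing's 13 features, but more rows
    # so the parallel speedup is measurable.
    X = np.random.rand(50_000, 13)
    for k in (3, 5, 8):              # vary the number of centroids
        t0 = time.perf_counter()
        kmeans(X, k)
        t_serial = time.perf_counter() - t0
        with Pool(4) as pool:        # vary the number of cores here
            t0 = time.perf_counter()
            kmeans(X, k, pool=pool, n_workers=4)
            t_parallel = time.perf_counter() - t0
        print(f"k={k}: optimization = "
              f"{100 * (t_serial - t_parallel) / t_serial:.1f}%")
```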

    Computing Fast and Scalable Table Cartograms for Large Tables

    Given an m x n table T of positive weights and a rectangle R with an area equal to the sum of the weights, a table cartogram computes a partition of R into m x n convex quadrilateral faces such that each face has the same adjacencies as its corresponding cell in T and has an area equal to the cell's weight. In this thesis, we explored different table cartogram algorithms for a large table with thousands of cells and investigated the potential applications of large table cartograms. We implemented Evans et al.'s table cartogram algorithm, which guarantees zero area error, and adapted a diffusion-based cartographic transformation approach, FastFlow, to produce large table cartograms. We introduced a constraint optimization-based table cartogram generation technique, TCarto, leveraging the concept of force-directed layout. We implemented TCarto with column-based and quadtree-based parallelization to compute table cartograms for tables with thousands of cells. We presented several potential applications of large table cartograms for creating diagrammatic representations in various real-life scenarios, e.g., analyzing spatial correlations between geospatial variables, understanding clusters and densities in scatterplots, and creating visual effects in images (e.g., expanding illumination, mosaic art effects). We presented an empirical comparison among these three table cartogram techniques on two real-life datasets: a meteorological weather dataset and a US State-to-State migration flow dataset. FastFlow and TCarto both performed well on the weather data table. However, for the US State-to-State migration flow data, where the table contained many local optima with high value differences among adjacent cells, FastFlow generated concave quadrilateral faces. We also investigated potential relationships among different measurement metrics such as cartographic error (accuracy), average aspect ratio (the readability of the visualization), computational speed, and the grid size of the table. Furthermore, we augmented our proposed TCarto with an angle constraint to enhance the readability of the visualization, conceding some cartographic error, and inspected the potential relationship of the restricted angles with the accuracy and readability of the visualization. In the output of the angle-constrained TCarto algorithm on the US State-to-State migration dataset, it was difficult to identify the rows and columns of a cell up to a 20-degree angle constraint, but they appeared identifiable for angle constraints above 40 degrees.
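    The abstract does not spell out the error formula. As an illustration of how a per-cell cartographic error can be measured on the quadrilateral faces, the following Python sketch computes face areas with the shoelace formula; the normalization by max(A_i, w_i) is one common choice and not necessarily the thesis's exact metric.

```python
import numpy as np

def quad_area(quad):
    """Area of a quadrilateral via the shoelace formula;
    quad is a (4, 2) array of vertices in cyclic order."""
    x, y = quad[:, 0], quad[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def mean_cartographic_error(faces, weights):
    """Average per-cell area error: |A_i - w_i| / max(A_i, w_i).
    An illustrative stand-in for the thesis's accuracy metric."""
    errors = []
    for quad, w in zip(faces, weights):
        a = quad_area(np.asarray(quad, dtype=float))
        errors.append(abs(a - w) / max(a, w))
    return sum(errors) / len(errors)

# Toy 1x2 table: two unit-square faces, target weights 1.0 and 1.2.
faces = [[(0, 0), (1, 0), (1, 1), (0, 1)],
         [(1, 0), (2, 0), (2, 1), (1, 1)]]
print(mean_cartographic_error(faces, [1.0, 1.2]))  # second cell is off
```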

    A microservice architecture for the processing of large geospatial data in the Cloud

    With the growing number of devices that can collect spatiotemporal information, as well as the improving quality of sensors, the geospatial data volume increases constantly. Before the raw collected data can be used, it has to be processed. Currently, expert users are still relying on desktop-based Geographic Information Systems to perform processing workflows. However, the volume of geospatial data and the complexity of processing algorithms exceed the capacities of their workstations. There is a paradigm shift from desktop solutions towards the Cloud, which offers virtually unlimited storage space and computational power, but developers of processing algorithms often have no background in computer science and hence no expertise in Cloud Computing. Our research hypothesis is that a microservice architecture and Domain-Specific Languages can be used to orchestrate existing geospatial processing algorithms, and to compose and execute geospatial workflows in a Cloud environment for efficient application development and an enhanced stakeholder experience. We present a software architecture that contains extension points for processing algorithms (or microservices), a workflow management component for distributed service orchestration, and a workflow editor based on a Domain-Specific Language. The main aim is to provide both users and developers with the means to leverage the possibilities of the Cloud, without requiring them to have deep knowledge of distributed computing. In order to conduct our research, we follow the Design Science Research Methodology. We perform an analysis of the problem domain and collect requirements as well as quality attributes for our architecture. To meet our research objectives, we design the architecture and develop approaches to workflow management and workflow modelling. We demonstrate the utility of our solution by applying it to two real-world use cases and evaluate the quality of our architecture based on defined scenarios. Finally, we critically discuss our results. Our contributions to the scientific community can be classified into three pillars. We present a scalable and modifiable microservice architecture for geospatial processing that supports distributed development and offers high availability. Further, we present novel approaches to service integration and orchestration in the Cloud as well as rule-based and dynamic workflow management without a priori design-time knowledge. For workflow modelling, we create a Domain-Specific Language based on a novel language design method.
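    The thesis's actual DSL and Cloud components are not reproduced here. As a purely hypothetical sketch of the architectural idea, the following Python snippet registers processing steps as named services behind an extension point and lets a tiny manager chain them into a workflow; every service name and step below is illustrative.

```python
from typing import Callable, Dict, List

# Registry standing in for the architecture's extension points.
SERVICES: Dict[str, Callable] = {}

def service(name: str):
    """Register a processing step as a named service."""
    def register(fn: Callable) -> Callable:
        SERVICES[name] = fn
        return fn
    return register

@service("load")
def load(path: str) -> list:
    return [1.0, 2.0, 3.0]          # stand-in for reading raw point data

@service("filter_outliers")
def filter_outliers(points: list) -> list:
    return [p for p in points if p < 3.0]

@service("summarize")
def summarize(points: list) -> dict:
    return {"count": len(points), "mean": sum(points) / len(points)}

def run_workflow(steps: List[str], source: str):
    """A toy manager that resolves each named step and chains the
    results -- standing in for distributed orchestration in the Cloud."""
    data = SERVICES[steps[0]](source)
    for name in steps[1:]:
        data = SERVICES[name](data)
    return data

print(run_workflow(["load", "filter_outliers", "summarize"], "points.xyz"))
```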

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought forth challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while also demonstrating opportunities for using big data in geospatial applications. Crucial to the advancements highlighted in this book are the integration of computational thinking and spatial thinking and the transformation of abstract ideas and models into concrete data structures and algorithms.

    Methods and Tools for Management of Distributed Event Processing Applications

    Capturing and processing events from cyber-physical systems allows users to be continuously informed about performance data and emerging problems (situational awareness) or to optimize maintenance processes based on equipment condition (condition-based maintenance). Because of the volume and frequency of the data, as well as the requirement for near-real-time evaluation, such scenarios demand suitable technologies. Under the name Event Processing, technologies have become established that can process data streams in real time and detect complex event patterns based on spatial, temporal, or causal relationships. At the same time, the systems available in this area today are still characterized by the high technical complexity of the underlying declarative languages, which, because of the technical expertise required, leads to slow development cycles when building real-time applications. Yet precisely these applications are often highly dynamic, both in the requirements of the situations to be detected and in the syntax and semantics of the underlying sensor data. The primary contribution of this thesis enables domain experts, by abstracting away technical details, to independently create, edit, and execute distributed real-time applications in the form of so-called real-time processing pipelines. The contributions of this work can be summarized as follows: 1. A methodology for developing real-time applications that accounts for extensibility as well as accessibility for domain experts. 2. Models for the semantic description of the characteristics of event producers, event processing units, and event consumers. 3. A system for executing processing pipelines consisting of geographically distributed event processing units. 4. A software artifact for graphically modelling processing pipelines and executing them automatically. The contributions are presented, applied, and evaluated in several scenarios from the production and logistics domains.
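    As an illustrative sketch of the producer/processing-unit/consumer pipeline structure described above (not the thesis's system), the following Python snippet detects a simple temporal pattern over an event stream; the threshold, window size, and event fields are assumptions.

```python
import time
from typing import Dict, Iterator

def producer() -> Iterator[Dict]:
    """Stand-in for a sensor emitting timestamped temperature readings."""
    for temp in [61, 72, 78, 83, 75, 91]:
        yield {"ts": time.time(), "temp": temp}

def detect_overheat(events: Iterator[Dict], threshold: float,
                    window: int) -> Iterator[Dict]:
    """Processing unit: emit a complex event when `window` consecutive
    readings exceed the threshold -- a simple temporal pattern."""
    streak = []
    for e in events:
        streak = streak + [e] if e["temp"] > threshold else []
        if len(streak) == window:
            yield {"pattern": "overheat", "first_ts": streak[0]["ts"]}
            streak = []   # reset after emitting the complex event

def consumer(alerts: Iterator[Dict]) -> None:
    for alert in alerts:
        print("ALERT:", alert)   # stand-in for notifying an operator

# Wire the pipeline: producer -> pattern detector -> consumer.
consumer(detect_overheat(producer(), threshold=70, window=3))
```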

    Technologies and Applications for Big Data Value

    This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications in the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks, which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part, “Technologies and Methods”, contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part, “Processes and Applications”, details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI; and second, practitioners and industry experts engaged in data-driven systems, software design, and deployment projects who are interested in employing these advanced methods to address real-world problems.

    Virtual Satellite Network Simulator (VSNeS) - A novel engine to evaluate satellite networks over virtual infrastructure and networks

    Space has been populated by a wide range of satellite systems from governmental and private space entities. Monolithic satellites have dominated it, each providing a custom design that accomplishes a specific mission. Nevertheless, new user demands have emerged that require global coverage, low revisit times, and ubiquitous service. The possibility of integrating in-orbit infrastructure to support current communications systems has been discussed persistently in recent years. Specifically, the concept of deploying networks composed of aircraft and spacecraft (creating the so-called Non-Terrestrial Networks) has emerged as a potential architecture to satisfy this new demand. This concept has enabled the investigation of mobile technologies on space infrastructure; this is the case, for example, of the Software-Defined Satellite, which aims to manage in-orbit infrastructure using Software-Defined Network techniques. These novel concepts pose multiple challenges that dedicated developments must address, supported by specific equipment and simulation environments. Currently, open-source satellite network emulators have certain limitations or are not easily accessible. This project presents the Virtual Satellite Network Simulator (VSNeS), a novel simulation engine capable of representing satellites as well as ground nodes in virtual machines and of deploying a virtual network that reproduces the channel effects and dynamics. VSNeS is built from several modules that jointly produce the virtualized scenario. First, a Python3 program was developed that works as a manager and is responsible for running the rest of the modules according to the virtualized scenario. Furthermore, the Kernel-based Virtual Machine is used for the execution of the virtual machines. Channel management is done with the NetEm emulator. Finally, a graphical user interface is delivered by Cesium. This dissertation formally presents a preliminary design with the essential steps to select each technology, and the networking design is then discussed. Different tests are shown to verify the correct functioning of the tool, and performance tests of the final release have also been carried out. The program has been tested with the TCP, UDP, and ICMP protocols in different realistic scenarios, which allowed us to verify its correct operation by checking the delays and channel losses. Moreover, it is empirically demonstrated that some protocols are not functional over geostationary satellites, due to the long latency caused by the large distances involved.
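    The latency claim can be checked with a back-of-the-envelope calculation. The sketch below (not taken from the dissertation) derives the minimum geostationary round-trip time from the orbit altitude and the speed of light, and prints the kind of NetEm delay command such an emulator can apply; the interface name is illustrative.

```python
# Standard physical constants: GEO altitude and speed of light.
GEO_ALTITUDE_KM = 35_786
C_KM_PER_S = 299_792.458

one_way_s = GEO_ALTITUDE_KM / C_KM_PER_S   # ground -> satellite
hop_s = 2 * one_way_s                      # ground -> sat -> ground
rtt_ms = 2 * hop_s * 1000                  # request + response, minimum

print(f"one-way ground-to-satellite: {one_way_s * 1000:.0f} ms")  # ~119 ms
print(f"single hop (up + down):      {hop_s * 1000:.0f} ms")      # ~239 ms
print(f"minimum RTT:                 {rtt_ms:.0f} ms")            # ~477 ms

# NetEm (the Linux qdisc named in the abstract) can impose such a
# delay on a virtual interface; 'eth0' here is illustrative:
print(f"sudo tc qdisc add dev eth0 root netem delay {hop_s * 1000:.0f}ms")
```

    A minimum round trip near half a second, before any processing or queuing, is why latency-sensitive protocols degrade over geostationary links.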

    Bioinspired metaheuristic algorithms for global optimization

    This paper presents a concise comparison study of newly developed bioinspired algorithms for global optimization problems. Three different metaheuristic techniques, namely Accelerated Particle Swarm Optimization (APSO), the Firefly Algorithm (FA), and the Grey Wolf Optimizer (GWO), are investigated and implemented in the Matlab environment. These methods are compared on four unimodal and multimodal nonlinear functions in order to find global optimum values. Computational results indicate that GWO outperforms the other intelligent techniques, and that all of the aforementioned algorithms can be successfully used for the optimization of continuous functions.
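    As a minimal sketch of one of the compared methods, the following Python snippet implements the standard Grey Wolf Optimizer update (Mirjalili et al.) on the sphere function. The paper itself works in Matlab, and the population size, iteration count, and bounds here are arbitrary choices.

```python
import numpy as np

def gwo(f, dim=10, n_wolves=20, n_iter=200, lb=-10.0, ub=10.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_wolves, dim))   # wolf positions
    for t in range(n_iter):
        fitness = np.apply_along_axis(f, 1, X)
        order = np.argsort(fitness)
        alpha, beta, delta = X[order[:3]]           # three best wolves lead
        a = 2.0 - 2.0 * t / n_iter                  # a decreases 2 -> 0
        new_X = np.empty_like(X)
        for i, x in enumerate(X):
            cand = []
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                D = np.abs(C * leader - x)          # encircling distance
                cand.append(leader - A * D)
            new_X[i] = np.mean(cand, axis=0)        # average of three pulls
        X = np.clip(new_X, lb, ub)
    best = X[np.argmin(np.apply_along_axis(f, 1, X))]
    return best, f(best)

sphere = lambda x: float(np.sum(x ** 2))            # unimodal test function
best, val = gwo(sphere)
print(f"best value found: {val:.3e}")               # should approach 0
```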

    Experimental Evaluation of Growing and Pruning Hyper Basis Function Neural Networks Trained with Extended Information Filter

    In this paper we test the Extended Information Filter (EIF) for sequential training of Hyper Basis Function neural networks with growing and pruning ability (HBF-GP). The HBF neuron allows different scaling of the input dimensions, providing better generalization when dealing with complex nonlinear problems in engineering practice. The main intuition behind HBF is a generalization of the Gaussian type of neuron that applies a Mahalanobis-like distance as the distance metric between an input training sample and the prototype vector. We exploit the concept of a neuron's significance and allow growing and pruning of HBF neurons during the sequential learning process. From an engineer's perspective, EIF is attractive for training neural networks because it allows a designer to have scarce initial knowledge of the system or problem. An extensive experimental study shows that an HBF neural network trained with EIF achieves the same prediction error and compactness of network topology as one trained with the EKF, but without the need to know the initial state uncertainty, which is its main advantage over the EKF.
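    A minimal sketch of the HBF activation described above: a Gaussian-type unit whose Mahalanobis-like distance scales each input dimension separately. Here the scaling matrix is diagonal for simplicity; the paper's full HBF formulation and the EIF training loop are not reproduced.

```python
import numpy as np

def hbf_activation(x, center, scales):
    """phi(x) = exp(-(x - c)^T diag(scales) (x - c)).
    With all scales equal this reduces to an ordinary Gaussian RBF."""
    d = x - center
    return np.exp(-np.dot(d * scales, d))

x = np.array([1.0, 0.5])
center = np.array([0.0, 0.0])
print(hbf_activation(x, center, np.array([1.0, 1.0])))  # isotropic Gaussian
print(hbf_activation(x, center, np.array([0.1, 5.0])))  # per-dimension scaling
```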