1,042 research outputs found

Map-Reduce for Processing GPS Data from Public Transport in Montevideo, Uruguay

Author: Massobrio Renzo
Nesmachnow Sergio
Pías Andrés
Vázquez Nicolás
Publication date: 17/11/2016

This article addresses the problem of processing large volumes of historical GPS data from buses to compute quality-of-service metrics for urban transportation systems. We designed and implemented a solution that distributes the data processing across multiple processing units in a distributed computing infrastructure. For the experimental analysis we used historical data from Montevideo, Uruguay. The proposed solution scales properly when processing large volumes of input data, achieving a speedup of up to 22× when using 24 computing resources. As case studies, we used the historical data to compute the average speed of bus lines in Montevideo and to identify troublesome locations, according to the delay and deviation of the times to reach each bus stop. Similar studies can be used by control authorities and policy makers to gain insight into the transportation system and improve the quality of service.

Sociedad Argentina de Informática e Investigación Operativa (SADIO)
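
The average-speed case study and the reported scaling figures can be illustrated with a minimal map-reduce sketch. This is not the authors' implementation: the record layout `(bus_line, distance_km, elapsed_hours)`, the sample values, and all function names are assumptions made for illustration, and the phases run sequentially here rather than on a cluster.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical GPS-derived records: (bus_line, distance_km, elapsed_hours).
records = [
    ("L1", 10.0, 0.5),
    ("L2", 6.0, 0.5),
    ("L1", 5.0, 0.25),
    ("L2", 9.0, 0.5),
]


def map_record(record):
    """Map phase: emit a (key, value) pair keyed by bus line."""
    line, distance_km, elapsed_hours = record
    return line, (distance_km, elapsed_hours)


def combine(a, b):
    """Reduce phase: sum distances and times for one bus line."""
    return a[0] + b[0], a[1] + b[1]


def average_speed_by_line(records):
    """Group mapped pairs by key, reduce each group, return km/h per line."""
    grouped = defaultdict(list)
    for key, value in map(map_record, records):
        grouped[key].append(value)
    speeds = {}
    for line, values in grouped.items():
        total_km, total_hours = reduce(combine, values)
        speeds[line] = total_km / total_hours
    return speeds


def parallel_efficiency(speedup, resources):
    """Efficiency is the speedup divided by the number of resources used."""
    return speedup / resources


speeds = average_speed_by_line(records)
# Figures quoted in the abstract: 22x speedup on 24 computing resources.
efficiency = parallel_efficiency(22, 24)
print(speeds, round(efficiency, 3))
```

With the figures quoted in the abstract, a 22× speedup on 24 resources corresponds to a parallel efficiency of 22/24, roughly 0.92.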

SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates

Author: Beckman Pete
Bicer Tekin
Iskra Kamil
Jin Sian
Sun Baixi
Tao Dingwen
Tian Jiannan
Yu Xiaodong
Zhang Chengming
Zhou Tao
Publication date: 03/11/2022

CNN-based surrogates have become prevalent in scientific applications to replace conventional time-consuming physical approaches. Although these surrogates can yield satisfactory results with significantly lower computation costs over small training datasets, our benchmarking results show that data-loading overhead becomes the major performance bottleneck when training surrogates with large datasets. In practice, surrogates are usually trained with high-resolution scientific data, which can easily reach the terabyte scale. Several state-of-the-art data loaders have been proposed to improve the loading throughput in general CNN training; however, they are sub-optimal when applied to surrogate training. In this work, we propose SOLAR, a surrogate data loader that can ultimately increase loading throughput during training. It leverages three key observations from our benchmarking and contains three novel designs. Specifically, SOLAR first generates a pre-determined shuffled index list and accordingly optimizes the global access order and the buffer eviction scheme to maximize data reuse and the buffer hit rate. It then proposes a tradeoff between lightweight computational imbalance and heavyweight loading workload imbalance to speed up the overall training. It finally optimizes its data access pattern with HDF5 to achieve better parallel I/O throughput. Our evaluation with three scientific surrogates and 32 GPUs illustrates that SOLAR can achieve up to 24.4X speedup over PyTorch Data Loader and 3.52X speedup over state-of-the-art data loaders.

Comment: 14 pages, 15 figures, 5 tables, submitted to VLDB '2
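
To illustrate why a pre-determined shuffled index list helps the buffer hit rate, here is a minimal sketch. The abstract does not describe SOLAR's actual eviction scheme, so this example swaps in a well-known policy as an assumption: Belady-style eviction, which drops the buffered sample whose next use lies farthest in the future, compared against LRU. All function names and parameters are hypothetical.

```python
import random
from collections import OrderedDict


def shuffled_epochs(num_samples, num_epochs, seed):
    """Build the full access order up front: one seeded shuffle per epoch."""
    rng = random.Random(seed)
    order = []
    for _ in range(num_epochs):
        epoch = list(range(num_samples))
        rng.shuffle(epoch)
        order.extend(epoch)
    return order


def hits_farthest_next_use(order, capacity):
    """Count buffer hits when evicting the sample reused farthest ahead.

    Assumes capacity >= 1. Requires the whole access order in advance.
    """
    # next_use[pos] is the next position at which order[pos] appears again.
    next_use = [0.0] * len(order)
    upcoming = {}
    for pos in range(len(order) - 1, -1, -1):
        next_use[pos] = upcoming.get(order[pos], float("inf"))
        upcoming[order[pos]] = pos

    buffer = {}  # sample id -> position of its next use
    hits = 0
    for pos, sample in enumerate(order):
        if sample in buffer:
            hits += 1
        elif len(buffer) >= capacity:
            victim = max(buffer, key=buffer.get)
            del buffer[victim]
        buffer[sample] = next_use[pos]
    return hits


def hits_lru(order, capacity):
    """Count buffer hits with least-recently-used eviction (baseline)."""
    buffer = OrderedDict()
    hits = 0
    for sample in order:
        if sample in buffer:
            hits += 1
            buffer.move_to_end(sample)
        else:
            if len(buffer) >= capacity:
                buffer.popitem(last=False)
            buffer[sample] = True
    return hits


order = shuffled_epochs(num_samples=100, num_epochs=3, seed=0)
print(hits_farthest_next_use(order, 20), hits_lru(order, 20))
```

The point of the sketch is only that an eviction decision based on future accesses is possible once the shuffle is fixed ahead of time; a loader that shuffles on the fly cannot make it.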

arXiv.org e-Print Archive

Classification algorithms for Big Data with applications in the urban security domain

Author: Venturini Luca
Publication venue: Politecnico di Torino

A classification algorithm is a versatile tool that can serve as a predictor for the future or as an analytical tool to understand the past. Several obstacles prevent classification from scaling to a large Volume, Velocity, Variety or Value. The aim of this thesis is to scale distributed classification algorithms beyond current limits, assess the state-of-practice of Big Data machine learning frameworks and validate the effectiveness of a data science process in improving urban safety. We found that massive datasets with a number of large-domain categorical features pose a difficult challenge for existing classification algorithms. We propose associative classification as a possible answer, and develop several novel techniques to distribute the training of an associative classifier among parallel workers and improve the final quality of the model. The experiments, run on a real large-scale dataset with more than 4 billion records, confirmed the quality of the approach. To assess the state-of-practice of Big Data machine learning frameworks and streamline the process of integration and fine-tuning of the building blocks, we developed a generic, self-tuning tool to extract knowledge from network traffic measurements. The result is a system that offers human-readable models of the data with minimal user intervention, validated by experiments on large collections of real-world passive network measurements. A good portion of this dissertation is dedicated to the study of a data science process to improve urban safety. First, we shed some light on the feasibility of a system to monitor social messages from a city for emergency relief. We then propose a methodology to mine temporal patterns in social issues, such as crimes. Finally, we propose a system to integrate the findings of Data Science on the citizenry’s perception of safety and communicate its results to decision makers in a timely manner. We applied and tested the system in a real Smart City scenario, set in Turin, Italy.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Spatiotemporal data partitioning for distributed random forest algorithm: Air quality prediction using imbalanced big spatiotemporal data on Spark distributed framework

Author: Asgari Marjan
Farnaghi M.
Yang Wanhong
Publication date: 01/08/2022

University of Twente Research Information

Optimum Parallel Processing Schemes to Improve the Computation Speed for Renewable Energy Allocation and Sizing Problems

Author: Ahmadi Bahman
Ceylan Oguzhan
Ozdemir Aydogan
Younesi Soheil
Publication date: 01/12/2022

The optimum penetration of distributed generation into the distribution grid provides several technical and economic benefits. However, the computational time required to solve the constrained optimization problems increases with the network scale and may be too long for online implementations. This paper presents a parallel solution of a multi-objective distributed generation (DG) allocation and sizing problem to handle a large number of computations. The aim is to find the optimum number of processors in addition to minimizing energy loss and DG cost. The proposed formulation is applied to a 33-bus test system, and the results are compared with one another and with the base case operating conditions using the optimal values and three popular multi-objective optimization metrics. The results show that comparable solutions with high-efficiency values can be obtained up to a certain number of processors.

Directory of Open Access Journals

University of Twente Research Information

A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics

Author: Andrew Matthew
Blumers Ansel
Deo Milind
Goral Jan
Huang Hai
Kane Joshua
Li Zhen
Luo Lixiang
Tang Yu-Hang
Xia Yidong
Publication venue: Elsevier BV
Publication date: 25/03/2019

Mesoscopic simulations of hydrocarbon flow in source shales are challenging, in part due to the heterogeneous shale pores with sizes ranging from a few nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid and fluid-solid interactions in nano- to micro-scale shale pores, which are physically and chemically sophisticated, must be captured. To address those challenges, we present a GPU-accelerated package for simulation of flow in nano- to micro-pore networks with a many-body dissipative particle dynamics (mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the code offloads all intensive workloads to GPUs. Other advancements, such as smart particle packing and a no-slip boundary condition in complex pore geometries, are also implemented for the construction and simulation of realistic shale pores from 3D nanometer-resolution stack images. Our code is validated for accuracy and compared against the CPU counterpart for speedup. In our benchmark tests, the code delivers nearly perfect strong scaling and weak scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU benchmark on ORNL's SummitDev and IBM's AC922 suggests that the host-to-device NVLink can boost performance over PCIe by a remarkable 40%. Lastly, we demonstrate, through a flow simulation in realistic shale pores, that the CPU counterpart requires 840 Power9 cores to rival the performance delivered by our package with four V100 GPUs on ORNL's Summit architecture. This simulation package enables quick-turnaround and high-throughput mesoscopic numerical simulations for investigating complex flow phenomena in nano- to micro-porous rocks with realistic pore geometries.

arXiv.org e-Print Archive

Clemson University: TigerPrints

Quality of Service Aware Data Stream Processing for Highly Dynamic and Scalable Applications

Author: Al Jawarneh Isam Mashhour Hasan
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 02/04/2020

Huge amounts of georeferenced data streams arrive daily at data stream management systems that are deployed to serve highly scalable and dynamic applications. There are innumerable ways in which those loads can be exploited to gain deep insights in various domains. Decision makers require an interactive visualization of such data in the form of maps and dashboards for decision making and strategic planning. Data streams normally exhibit fluctuation and oscillation in arrival rates and skewness. Those are the two predominant factors that greatly impact the overall quality of service. This requires data stream management systems to be attuned to those factors in addition to the spatial shape of the data that may exaggerate the negative impact of those factors. Current systems do not natively support services with quality guarantees for dynamic scenarios, leaving the handling of those logistics to the user, which is challenging and cumbersome. Three workloads are predominant for any data stream: batch processing, scalable storage and stream processing. In this thesis, we have designed a quality of service aware system, SpatialDSMS, that comprises several subsystems covering those loads and any mixed load that results from intermixing them. Most importantly, we have natively incorporated quality of service optimizations for processing avalanches of geo-referenced data streams in highly dynamic application scenarios. This has been achieved transparently on top of the codebases of emerging de facto standard best-in-class representatives, thus relieving users in the presentation layer from having to reason about those services. Instead, users express their queries with quality goals, and our system optimizers compile them down into query plans with an embedded quality guarantee and leave logistic handling to the underlying layers. We have developed standard-compliant prototypes for all the subsystems that constitute SpatialDSMS.

AMS Tesi di Dottorato

MobileTrust: Secure Knowledge Integration in VANETs

Author: Demetriou G.
Hatzivasilis G.
Ioannidis S.
Katos V.
Soultatos O.
Spanoudakis G.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/03/2020

Vehicular Ad hoc NETworks (VANETs) are becoming popular due to the emergence of the Internet of Things and ambient intelligence applications. In such networks, secure resource sharing functionality is accomplished by incorporating trust schemes. Current solutions adopt peer-to-peer technologies that can cover the large operational area. However, these systems fail to capture some inherent properties of VANETs, such as fast and ephemeral interaction, making robust trust evaluation of crowdsourcing challenging. In this article, we propose MobileTrust, a hybrid trust-based system for secure resource sharing in VANETs. The proposal is a breakthrough in centralized trust computing that utilizes cloud and upcoming 5G technologies to provide robust trust establishment with global scalability. The ad hoc communication is energy-efficient and protects the system against threats that are not countered by the current settings. To evaluate its performance and effectiveness, MobileTrust is modelled in the SUMO simulator and tested on the traffic features of the small-size German city of Eichstatt. Similar schemes are implemented in the same platform to provide a fair comparison. Moreover, MobileTrust is deployed on a typical embedded system platform and applied on a real smart car installation for monitoring traffic and road-state parameters of an urban application. The proposed system is developed under the EU-funded THREAT-ARREST project, to provide security, privacy, and trust in an intelligent and energy-aware transportation scenario, bringing closer the vision of a sustainable circular economy.

City Research Online

Bournemouth University Research Online