Differentiated Multiple Aggregations in Multidimensional Databases
Many models have been proposed for modeling multidimensional data warehouses, and most consider a single function to determine how measure values are aggregated across different levels of data detail. We provide a conceptual model that supports (1) multiple aggregations, associating with the same measure a different aggregation function according to analysis axes or hierarchies, and (2) differentiated aggregation, allowing specific aggregations at each detail level. Our model is based on a graphical formalism that allows controlling the validity of aggregation functions (distributive, algebraic, or holistic). We also show how conceptual modeling can be used, in an R-OLAP environment, for building lattices of pre-computed aggregates.
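As a rough illustration of the multiple-aggregation idea (not the paper's graphical formalism), the following Python/pandas sketch attaches a different aggregation function to the same measure depending on which analysis axis is rolled up; the data, column names, and aggregation map are all hypothetical.

# Illustrative sketch only: one measure ("stock_level"), with a different
# aggregation function declared for each axis that can be rolled up.
import pandas as pd

sales = pd.DataFrame({
    "store": ["S1", "S1", "S2", "S2"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "stock_level": [10, 12, 7, 9],   # the measure
})

# Hypothetical per-axis aggregation map: summing stock over stores is valid,
# but rolling up over time should average (an algebraic function), not sum.
AGG_BY_AXIS = {"store": "sum", "month": "mean"}

def roll_up(df, drop_axis, keep_axis, measure="stock_level"):
    """Remove drop_axis, aggregating the measure with the function declared for that axis."""
    return df.groupby(keep_axis)[measure].agg(AGG_BY_AXIS[drop_axis])

print(roll_up(sales, drop_axis="month", keep_axis="store"))  # average stock per store
print(roll_up(sales, drop_axis="store", keep_axis="month"))  # total stock per month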
An Integrated Big and Fast Data Analytics Platform for Smart Urban Transportation Management
Smart urban transportation management can be considered a multifaceted big data challenge. It relies strongly on the information collected from multiple, widespread, and heterogeneous data sources, as well as on the ability to extract actionable insights from them. Besides data, full-stack (from platform to services and applications) Information and Communications Technology (ICT) solutions need to be specifically adopted to address smart city challenges. Smart urban transportation management is one of the key use cases addressed in the context of the EUBra-BIGSEA (Europe-Brazil Collaboration of Big Data Scientific Research through Cloud-Centric Applications) project. This paper specifically focuses on the City Administration Dashboard, a public transport analytics application that has been developed on top of the EUBra-BIGSEA platform and used by stakeholders of the Municipality of Curitiba, Brazil, to tackle urban traffic data analysis and planning challenges. The solution proposed in this paper joins together a scalable big and fast data analytics platform, a flexible and dynamic cloud infrastructure, data quality and entity matching algorithms, as well as security and privacy techniques. By exploiting an interoperable programming framework based on a Python Application Programming Interface (API), it allows easy, rapid, and transparent development of smart city applications.
This work was supported by the European Commission through the Cooperation Programme under EUBra-BIGSEA Horizon 2020 Grant 690116 [this project results from the 3rd BR-EU Coordinated Call in Information and Communication Technologies (ICT), announced by the Ministry of Science, Technology and Innovation (MCTI)].
Fiore, S.; Elia, D.; Pires, C. E.; Mestre, D. G.; Cappiello, C.; Vitali, M.; Andrade, N.; et al. (2019). An Integrated Big and Fast Data Analytics Platform for Smart Urban Transportation Management. IEEE Access, 7:117652-117677. https://doi.org/10.1109/ACCESS.2019.2936941
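The abstract mentions a Python-API-based programming framework; the following is a generic PySpark-style sketch of the kind of transport analytics such a platform supports. The dataset path and column names are hypothetical, and the actual EUBra-BIGSEA API is not shown.

# Generic PySpark sketch (hypothetical data and column names).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bus-delay-analytics").getOrCreate()

# Hypothetical GPS/ticketing feed for city buses.
trips = spark.read.parquet("s3://example-bucket/curitiba/bus_trips/")

# Average departure delay (seconds) per route and hour of day.
avg_delay = (trips
    .groupBy("route_id", F.hour("scheduled_departure").alias("hour"))
    .agg(F.avg(F.col("actual_departure").cast("long")
               - F.col("scheduled_departure").cast("long")).alias("avg_delay_s")))

avg_delay.orderBy(F.desc("avg_delay_s")).show(10)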
Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams
Today's large-scale services (e.g., video streaming platforms, data centers, sensor grids) need diverse real-time summary statistics across multiple subpopulations of multidimensional datasets. However, state-of-the-art frameworks do not offer general and accurate analytics in real time at reasonable costs. The root cause is the combinatorial explosion of data subpopulations and the diversity of summary statistics we need to monitor simultaneously. We present Hydra, an efficient framework for multidimensional analytics based on a novel combination of a "sketch of sketches", which avoids the overhead of monitoring exponentially many subpopulations, and universal sketching, which ensures accurate estimates for multiple statistics. We build Hydra as an Apache Spark plugin and address practical system challenges to minimize overheads at scale. Across multiple real-world and synthetic multidimensional datasets, we show that Hydra achieves robust error bounds and is an order of magnitude more efficient in terms of operational cost and memory footprint than existing frameworks (e.g., Spark, Druid), while ensuring interactive estimation times.
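A toy illustration of the "sketch of sketches" idea described above (not Hydra's actual implementation): a fixed grid of buckets is indexed by hashing the subpopulation key, and each bucket holds its own inner sketch, so memory stays bounded no matter how many subpopulations appear; colliding subpopulations share (and slightly pollute) an inner sketch. The inner sketch here is a plain Count-Min rather than the universal sketch Hydra relies on, and all names are illustrative.

# Simplified Python sketch; not Hydra's code.
import hashlib

class CountMin:
    def __init__(self, rows=4, cols=512):
        self.rows, self.cols = rows, cols
        self.table = [[0] * cols for _ in range(rows)]
    def _idx(self, item, r):
        h = hashlib.blake2b(f"{r}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "little") % self.cols
    def add(self, item, count=1):
        for r in range(self.rows):
            self.table[r][self._idx(item, r)] += count
    def estimate(self, item):
        return min(self.table[r][self._idx(item, r)] for r in range(self.rows))

class SketchOfSketches:
    """Outer grid of buckets, each holding an inner sketch for the subpopulations hashed to it."""
    def __init__(self, rows=2, cols=64):
        self.rows, self.cols = rows, cols
        self.grid = [[CountMin() for _ in range(cols)] for _ in range(rows)]
    def _idx(self, key, r):
        h = hashlib.blake2b(f"{r}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "little") % self.cols
    def add(self, subpop_key, item):
        for r in range(self.rows):
            self.grid[r][self._idx(subpop_key, r)].add(item)
    def estimate(self, subpop_key, item):
        return min(self.grid[r][self._idx(subpop_key, r)].estimate(item)
                   for r in range(self.rows))

# Usage: per-item frequency estimate within the ("us-east", "video") subpopulation.
s = SketchOfSketches()
s.add(("us-east", "video"), "user42")
print(s.estimate(("us-east", "video"), "user42"))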
An Artificial Intelligence Framework for Supporting Coarse-Grained Workload Classification in Complex Virtual Environments
Cloud-based machine learning tools for enhanced Big Data applications: the main idea is that of predicting the "next" workload occurring against the target Cloud infrastructure via an innovative ensemble-based approach that combines the effectiveness of different well-known classifiers in order to enhance the overall accuracy of the final classification, which is very relevant now in the specific context of Big Data. The so-called workload categorization problem plays a critical role in improving the efficiency and reliability of Cloud-based big data applications. Implementation-wise, our method proposes deploying the Cloud entities that participate in the distributed classification approach on top of virtual machines, which represent classical "commodity" settings for Cloud-based big data applications. Given a number of known reference workloads and an unknown workload, in this paper we deal with the problem of finding the reference workload which is most similar to the unknown one. The depicted scenario turns out to be useful in a plethora of modern information system applications. We name this problem coarse-grained workload classification because, instead of characterizing the unknown workload in terms of finer behaviors, such as CPU-, memory-, disk-, or network-intensive patterns, we classify the whole unknown workload as one of the (possible) reference workloads. Reference workloads represent a category of workloads that are relevant in a given applicative environment. In particular, we focus our attention on the classification problem described above in the special case represented by virtualized environments. Today, Virtual Machines (VMs) have become very popular because they offer important advantages to modern computing environments such as cloud computing or server farms. In virtualization frameworks, workload classification is very useful for accounting, security reasons, or user profiling. Hence, our research makes particular sense in such environments, and it turns out to be very useful in a special context like Cloud Computing, which is emerging now. In this respect, our approach consists of running several machine learning-based classifiers on different workload models, and then deriving the best classifier produced by Dempster-Shafer Fusion, in order to magnify the accuracy of the final classification. Experimental assessment and analysis clearly confirm the benefits derived from our classification framework. The running programs which produce the unknown workloads to be classified are treated in a similar way. A fundamental aspect of this paper concerns the successful use of data fusion in workload classification. Different types of metrics are in fact fused together using the Dempster-Shafer theory of evidence combination, giving a classification accuracy of slightly less than …. The acquisition of data from the running process, the pre-processing algorithms, and the workload classification are described in detail. Various classical algorithms have been used to classify the workloads, and the results are compared.
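As a minimal sketch of how Dempster-Shafer evidence combination can fuse classifier outputs (not the paper's implementation), the following assumes each classifier assigns mass only to singleton classes, in which case Dempster's rule reduces to a normalized product of the agreeing masses; the reference-workload names and mass values are hypothetical.

# Minimal Dempster-Shafer fusion sketch (singleton-mass special case).
def dempster_combine(m1, m2):
    """Combine two mass functions defined on the same singleton classes."""
    combined = {c: m1[c] * m2[c] for c in m1}   # mass on agreeing hypotheses
    conflict = 1.0 - sum(combined.values())     # mass lost to conflicting evidence
    if conflict >= 1.0:
        raise ValueError("Total conflict: sources are incompatible")
    return {c: v / (1.0 - conflict) for c, v in combined.items()}

# Hypothetical per-reference-workload masses from two base classifiers.
clf_a = {"web-server": 0.6, "batch-analytics": 0.3, "db-oltp": 0.1}
clf_b = {"web-server": 0.5, "batch-analytics": 0.4, "db-oltp": 0.1}

fused = dempster_combine(clf_a, clf_b)
print(max(fused, key=fused.get), fused)   # fused decision and its normalized masses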