Context-aware Data Quality Assessment for Big Data
Big data has changed the way we collect and analyze data. In particular, the amount of available information is constantly growing, and organizations rely more and more on data analysis to achieve a competitive advantage. However, such amounts of data create real value only when combined with quality: good decisions and actions are the result of correct, reliable and complete data. In this scenario, methods and techniques for data quality assessment can support the identification of suitable data to process. While numerous assessment methods have been proposed for traditional databases, in the big data scenario new algorithms have to be designed to deal with novel requirements related to variety, volume and velocity. In particular, in this paper we highlight that dealing with heterogeneous sources requires an adaptive approach able to trigger the suitable quality assessment methods on the basis of the data type and the context in which the data are to be used. Furthermore, we show that in some situations it is not possible to evaluate the quality of the entire dataset due to performance and time constraints. For this reason, we suggest focusing the data quality assessment on only a portion of the dataset and accounting for the consequent loss of accuracy by introducing a confidence factor as a measure of the reliability of the quality assessment procedure. We propose a methodology to build a data quality adapter module, which selects the best configuration for the data quality assessment based on the user's main requirements: time minimization, confidence maximization, and budget minimization. Experiments are performed on real data gathered from a smart city case study.
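As an illustration of the sampling-and-confidence idea described above, the following Python sketch estimates a completeness score on a random sample of records and reports a confidence factor; the field names, the sampling scheme and the confidence definition are assumptions made for illustration, not the paper's actual procedure.

```python
import random

def assess_completeness(records, required_fields, sample_fraction=0.1, seed=42):
    """Estimate completeness on a random sample and report a confidence factor.

    The confidence factor is modelled here simply as the sampled fraction of the
    dataset; the paper's actual definition may differ.
    """
    rng = random.Random(seed)
    sample_size = max(1, int(len(records) * sample_fraction))
    sample = rng.sample(records, sample_size)

    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in sample
    )
    completeness = complete / sample_size
    confidence = sample_size / len(records)  # illustrative proxy only
    return completeness, confidence


if __name__ == "__main__":
    # Synthetic smart-city-style records with occasional missing readings.
    data = [{"id": i, "temp": 21.5 if i % 7 else None} for i in range(10_000)]
    quality, conf = assess_completeness(data, ["id", "temp"], sample_fraction=0.05)
    print(f"estimated completeness={quality:.2f}, confidence factor={conf:.3f}")
```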
Novel processes for smart grid information exchange and knowledge representation using the IEC common information model
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The IEC Common Information Model (CIM) is of central importance in enabling smart grid interoperability. Its continual development aims to meet the needs of the smart grid for semantic understanding and knowledge representation across a widening domain of resources and processes. With smart grid evolution, the importance of information and data management has become an increasingly pressing issue, not only because far more data is being generated by modern sensing, control and measuring devices, but also because information is now recognised as the 'integral component' that facilitates the optimal flexibility required of the smart grid. This thesis looks at the impacts of CIM implementation upon the landscape of smart grid issues and presents research from within National Grid contributing to three key areas in support of further CIM deployment. Taking the issue of Enterprise Information Management first, an information management framework is presented for CIM deployment at National Grid. Following this, the development and demonstration of a novel secure cloud computing platform to handle such information is described. Power system application (PSA) models of the grid are partial knowledge representations of a shared reality; to develop the completeness of our understanding of this reality, it is necessary to combine these representations. The second research contribution reports on a novel methodology for a CIM-based model repository to align PSA representations and provide a knowledge resource for building utility business intelligence of the grid. The third contribution addresses the need for greater integration of information relating to energy storage, an essential aspect of smart energy management. It presents the strategic rationale for integrated energy modeling and a novel extension to the existing CIM standards for modeling grid-scale energy storage. Significantly, this work has already contributed to a larger body of work on modeling Distributed Energy Resources currently under development at the Electric Power Research Institute (EPRI) in the USA. This work was supported by Dr. Martin Bradley on behalf of National Grid Plc. and by the Engineering and Physical Sciences Research Council (EPSRC).
Intelligent decision support for maintenance: an overview and future trends
The changing nature of manufacturing in recent years is evident in industry's willingness to adopt network-connected intelligent machines in their factory development plans. A number of joint corporate/government initiatives also describe and encourage the adoption of Artificial Intelligence (AI) in the operation and management of production lines. Machine learning will have a significant role to play in the delivery of automated and intelligently supported maintenance decision-making systems. While e-maintenance practice provides a framework for the internet-connected operation of maintenance practice, the advent of IoT has changed the scale of internetworking, and new architectures and tools are needed. While advances in sensors and sensor fusion techniques have been significant in recent years, the possibilities brought by IoT create new challenges in the scale of data and its analysis. The development of audit trail style practice for the collection of data, and the provision of a comprehensive framework for its processing, analysis and use, should be a valuable contribution in addressing the new data analytics challenges for maintenance created by internet-connected devices. This paper proposes that further research should be conducted into audit trail collection of maintenance data, allowing future systems to enable 'Human in the loop' interactions.
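As a rough illustration of audit-trail style collection of maintenance data, the Python sketch below appends structured maintenance events to an append-only log; the event fields and storage format are illustrative assumptions, not the design proposed in the paper.

```python
import json
import time
from dataclasses import dataclass, asdict, field

@dataclass
class MaintenanceEvent:
    """One append-only audit record for a maintenance action (fields are illustrative)."""
    machine_id: str
    sensor_readings: dict
    action: str        # e.g. "bearing_replaced"
    operator: str      # the "human in the loop"
    timestamp: float = field(default_factory=time.time)

class AuditTrail:
    """Append-only log of maintenance events, persisted as JSON lines."""
    def __init__(self, path="maintenance_audit.jsonl"):
        self.path = path

    def append(self, event: MaintenanceEvent) -> None:
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(asdict(event)) + "\n")

trail = AuditTrail()
trail.append(MaintenanceEvent("press-07", {"vibration_rms": 4.2}, "bearing_replaced", "j.smith"))
```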
Adding value to Linked Open Data using a multidimensional model approach based on the RDF Data Cube vocabulary
Most organisations using Open Data currently focus on data processing and analysis. However, although Open Data may be available online, these data are generally of poor quality, thus discouraging others from contributing to and reusing them. This paper describes an approach to publish statistical data from public repositories by using Semantic Web standards published by the W3C, such as RDF and SPARQL, in order to facilitate the analysis of multidimensional models. We have defined a framework based on the entire lifecycle of data publication, including a novel step of Linked Open Data assessment and the use of external repositories as a knowledge base for data enrichment. As a result, users are able to interact with the data generated according to the RDF Data Cube vocabulary, which makes it possible for general users to avoid the complexity of SPARQL when analysing data. The use case was applied to the Barcelona Open Data platform and revealed the benefits of our approach, such as helping in the decision-making process. This work was supported in part by the Spanish Ministry of Science, Innovation and Universities through the Project ECLIPSE-UA under grant RTI2018-094283-B-C32.
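To make the publication step concrete, the following Python sketch (using rdflib) builds a single observation according to the RDF Data Cube vocabulary; the namespace, dimensions and values are placeholders and do not reflect the actual Barcelona Open Data datasets.

```python
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import XSD

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/barcelona/")  # placeholder namespace, not the paper's

g = Graph()
g.bind("qb", QB)
g.bind("ex", EX)

# One qb:Observation with an area dimension, a period dimension and a measure.
obs = EX["obs/2019-population-eixample"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX["dataset/population"]))
g.add((obs, EX.refArea, EX["district/eixample"]))                    # dimension (illustrative)
g.add((obs, EX.refPeriod, Literal("2019", datatype=XSD.gYear)))      # dimension (illustrative)
g.add((obs, EX.population, Literal(266000, datatype=XSD.integer)))   # measure (invented value)

print(g.serialize(format="turtle"))
```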
Simulation of the performance of complex data-intensive workflows
PhD Thesis. Recently, cloud computing has been used for analytical and data-intensive processes as it offers many attractive features, including resource pooling, on-demand capability and rapid elasticity. Scientific workflows use these features to tackle the problems of complex data-intensive applications. Data-intensive workflows are composed of many tasks that may involve large input data sets and produce large amounts of data as output, and they typically run in highly dynamic environments. However, resources should be allocated dynamically depending on the changing demands of the workflow, as over-provisioning increases the cost and under-provisioning causes Service Level Agreement (SLA) violations and poor Quality of Service (QoS). Performance prediction of complex workflows is a necessary step prior to the deployment of the workflow. Performance analysis of complex data-intensive workflows is a challenging task due to the complexity of their structure, the diversity of big data and data dependencies, and the examination required of the performance and challenges associated with running such workflows in a real cloud.

In this thesis, a solution is explored to address these challenges, using a Next Generation Sequencing (NGS) workflow pipeline as a case study, which may require hundreds or thousands of CPU hours to process a terabyte of data. We propose a methodology to model, simulate and predict the runtime and the number of resources used by complex data-intensive workflows. One contribution of our simulation methodology is that it provides the ability to extract the simulation parameters (e.g., MIPS and BW values) required for constructing a training set and a fairly accurate prediction of the runtime for cluster sizes much larger than those used in training the prediction model. The proposed methodology permits the derivation of runtime predictions based on historical data from the provenance files. We present the runtime prediction of the complex workflow by considering different cases of its running in the cloud, such as execution failure and library deployment time. In the case of failure, the framework can apply the prediction only partially, considering the successful parts of the pipeline; in the other case, the framework can predict with or without considering the time to deploy libraries. To further improve the accuracy of prediction, we propose a simulation model that handles I/O contention.
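As a simplified illustration of runtime prediction from historical provenance data, the Python sketch below fits a regression model on invented records and extrapolates to a cluster size larger than any seen in training; the features, values and model choice are assumptions for illustration, not the methodology of the thesis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical provenance records (values invented for illustration):
# each row is (input size in GB, number of VMs); target is runtime in hours.
runs = np.array([[100, 4], [100, 8], [500, 8], [500, 16], [1000, 16], [1000, 32]])
runtimes = np.array([12.0, 6.5, 31.0, 16.5, 34.0, 18.0])

# Use data volume per node as the feature, so the model can extrapolate to
# cluster sizes larger than any seen during training.
X = (runs[:, 0] / runs[:, 1]).reshape(-1, 1)
model = LinearRegression().fit(X, runtimes)

predicted = model.predict(np.array([[1000 / 64]]))  # 1 TB on a 64-VM cluster
print(f"predicted runtime: {predicted[0]:.1f} hours")
```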
Blockchain-based Digital Twins: Research Trends, Issues, and Future Challenges
Industrial processes rely on sensory data for decision-making processes, risk assessment, and performance evaluation. Extracting actionable insights from the collected data calls for an infrastructure that can ensure the dissemination of trustworthy data. For the physical data to be trustworthy, it needs to be cross-validated through multiple sensor sources with overlapping fields of view. Cross-validated data can then be stored on the blockchain, to maintain its integrity and trustworthiness. Once trustworthy data is recorded on the blockchain, product lifecycle events can be fed into data-driven systems for process monitoring, diagnostics, and optimized control. In this regard, digital twins (DTs) can be leveraged to draw intelligent conclusions from data by identifying the faults and recommending precautionary measures ahead of critical events. Empowering DTs with blockchain in industrial use cases targets key challenges of disparate data repositories, untrustworthy data dissemination, and the need for predictive maintenance. In this survey, while highlighting the key benefits of using blockchain-based DTs, we present a comprehensive review of the state-of-the-art research results for blockchain-based DTs. Based on the current research trends, we discuss a trustworthy blockchain-based DTs framework. We also highlight the role of artificial intelligence in blockchain-based DTs. Furthermore, we discuss the current and future research and deployment challenges of blockchain-supported DTs that require further investigation.
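To illustrate the cross-validate-then-commit idea, the Python sketch below checks that overlapping sensor readings agree before appending a hash-chained record to a toy ledger standing in for a blockchain; the names, the tolerance and the ledger design are illustrative assumptions, not a framework from the survey.

```python
import hashlib
import json
import statistics
import time

def cross_validate(readings, tolerance=0.05):
    """Accept overlapping sensor readings if they agree within a relative tolerance."""
    median = statistics.median(readings.values())
    return all(abs(v - median) <= tolerance * abs(median) for v in readings.values())

class Ledger:
    """Hash-chained append-only log standing in for a blockchain backend."""
    def __init__(self):
        self.blocks = []

    def commit(self, payload: dict) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = json.dumps({"prev": prev, "payload": payload, "ts": time.time()}, sort_keys=True)
        block = {"hash": hashlib.sha256(body.encode()).hexdigest(), "body": body}
        self.blocks.append(block)
        return block["hash"]

ledger = Ledger()
readings = {"cam_1": 71.8, "cam_2": 72.1, "lidar_1": 72.0}  # overlapping views of the same asset
if cross_validate(readings):
    print("committed block", ledger.commit({"asset": "pump-3", "value": 72.0}))
```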
Linked Vocabulary Recommendation Tools for Internet of Things: A Survey
The Semantic Web emerged with the vision of eased integration of heterogeneous, distributed data on the Web. The approach fundamentally relies on the linkage between and reuse of previously published vocabularies to facilitate semantic interoperability. In recent years, the Semantic Web has been perceived as a potential enabling technology to overcome interoperability issues in the Internet of Things (IoT), especially for service discovery and composition. Despite the importance of making vocabulary terms discoverable and selecting the most suitable ones in forthcoming IoT applications, no state-of-the-art survey of tools achieving such recommendation tasks exists to date. This survey covers this gap by specifying an extensive evaluation framework and assessing linked vocabulary recommendation tools. Furthermore, we discuss challenges and opportunities of vocabulary recommendation and related tools in the context of emerging IoT ecosystems. Overall, 40 recommendation tools for linked vocabularies were evaluated, both empirically and experimentally. Some of the key findings include that (i) many tools neglect to thoroughly address both the curation of a vocabulary collection and effective selection mechanisms; (ii) modern information retrieval techniques are underrepresented; and (iii) the reviewed tools that emerged from Semantic Web use cases are not yet sufficiently extended to fit today's IoT projects.
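As a minimal example of the information-retrieval style of term recommendation discussed in the survey, the Python sketch below ranks a toy set of vocabulary terms against a query with TF-IDF; the vocabulary snippets and scoring choices are illustrative only and are not taken from any of the reviewed tools.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy collection of vocabulary term labels/descriptions (examples only).
terms = {
    "sosa:Sensor": "device that observes a property and produces observations",
    "sosa:Observation": "act of estimating the value of a property of a feature of interest",
    "saref:TemperatureSensor": "sensor measuring temperature",
    "foaf:Person": "a person",
}

query = "temperature measurement device"

# Vectorize the term descriptions together with the query, then rank by cosine similarity.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(terms.values()) + [query])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

for term, score in sorted(zip(terms, scores), key=lambda t: t[1], reverse=True):
    print(f"{score:.2f}  {term}")
```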
Contribution to the convergence of infrastructures between high-performance computing and large-scale data processing
The amount of data produced, whether in the scientific community or the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are traditionally used for the execution of compute-intensive workloads. However, the HPC community is also facing an increasing need to process large amounts of data derived from high-definition sensors and large physics apparatus. The convergence of the two fields, HPC and Big Data, is currently taking place. In fact, the HPC community already uses Big Data tools, which are not always integrated correctly, especially at the level of the file system and the Resource and Job Management System (RJMS). In order to understand how we can leverage HPC clusters for Big Data usage, and what the challenges are for HPC infrastructures, we have studied multiple aspects of the convergence. We initially provide a survey of software provisioning methods, with a focus on data-intensive applications. We contribute a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code whereas similar solutions use at least 1000 times more. We evaluate this mechanism under real conditions and in a simulated environment with our simulator Batsim. Furthermore, we provide extensions to Batsim to support I/O, and showcase the development of a generic file system model along with a Big Data application model. This allows us to complement the BeBiDa real-condition experiments with simulations while enabling us to study file system dimensioning and trade-offs. All the experiments and analysis of this work have been done with reproducibility in mind. Based on this experience, we propose to integrate the development workflow and data analysis into the reproducibility mindset, and give feedback on our experience with a list of best practices.
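The abstract does not describe how BeBiDa couples the two resource managers; purely as an illustration of what an RJMS collaboration could look like, the Python sketch below hands nodes between an HPC job and a Big Data cluster through prolog/epilog hooks. Every function name here is invented and is not taken from the thesis.

```python
# Hypothetical helpers standing in for the site's Big Data resource manager API;
# neither the names nor the mechanism are taken from the thesis.
def remove_from_bigdata_cluster(node: str) -> None:
    print(f"stopping Big Data worker on {node}")

def add_to_bigdata_cluster(node: str) -> None:
    print(f"restarting Big Data worker on {node}")

def hpc_job_prolog(job_nodes: list[str]) -> None:
    """Runs before an HPC job starts: reclaim its nodes from the Big Data cluster."""
    for node in job_nodes:
        remove_from_bigdata_cluster(node)

def hpc_job_epilog(job_nodes: list[str]) -> None:
    """Runs after the HPC job ends: hand the nodes back for Big Data workloads."""
    for node in job_nodes:
        add_to_bigdata_cluster(node)

hpc_job_prolog(["node-12", "node-13"])
hpc_job_epilog(["node-12", "node-13"])
```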
- âŠ