
    Replacing missing values using trustworthy data values from web data sources

    In practice, collected data are usually incomplete and contain missing values. Existing approaches to managing missing values overlook the importance of trustworthy data values as replacements. Given that trusted, complete data are essential for data analysis, we propose a framework for replacing missing values with trustworthy data values drawn from web data sources. The framework adopts an ontology to map data values from web data sources to the incomplete dataset. Because data from the web conflict with one another, we propose a trust score measurement based on data accuracy and data reliability. The trust score is then used to select trustworthy data values from web data sources for missing-value replacement. We implemented the proposed framework on a financial dataset and present the findings in this paper. Our experiments show that replacing missing values with trustworthy data values is important for solving the missing-values problem, especially when sources conflict.
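The trust-score-based replacement described in this abstract can be sketched as follows. The weighted combination of accuracy and reliability, the weights, and the field names are illustrative assumptions, not the authors' exact formulation:

```python
def trust_score(accuracy, reliability, w_acc=0.6, w_rel=0.4):
    """Combine a source's accuracy and reliability into one trust score.
    The weights are illustrative; the paper's exact measure may differ."""
    return w_acc * accuracy + w_rel * reliability

def fill_missing(record, field, candidates):
    """Replace a missing field with the value from the most trusted source.

    candidates: list of (value, accuracy, reliability) tuples gathered from
    web data sources mapped to the dataset (via an ontology, in the paper).
    """
    if record.get(field) is not None:
        return record  # value present, nothing to replace
    best = max(candidates, key=lambda c: trust_score(c[1], c[2]))
    record[field] = best[0]
    return record

# Example: three web sources report conflicting prices for the same record
row = {"ticker": "ABC", "price": None}
sources = [(101.2, 0.70, 0.60), (99.8, 0.95, 0.90), (100.5, 0.80, 0.85)]
filled = fill_missing(row, "price", sources)  # picks the highest-trust value
```

Here the second source wins because its combined accuracy/reliability score (0.93) exceeds the others, so its value is used to fill the gap.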

    A semantic-enhanced quality-based approach to handling data sources in enterprise service bus

    Data quality plays an important role in the success of organizations. Poor data quality can significantly affect an organization's business, since wrong decisions may be made on the basis of low-quality data. It is therefore necessary to make data quality information available to data users and allow them to select data sources based on their requirements. An Enterprise Service Bus (ESB) can be used to tackle data integration issues. However, data sources are maintained outside the ESB's control, which leaves users with the problem of selecting the most suitable data source among those available. In this article, we present an approach to handling data sources in an ESB based on data quality and semantic technology. It introduces a new level of abstraction that can improve the process of data quality handling with the help of semantic technologies. We evaluate our work using three different scenarios within the wind energy domain.
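Quality-based source selection of the kind this abstract describes can be sketched as below. The source names, quality dimensions, and ranking rule are hypothetical stand-ins for whatever metadata the ESB registry actually holds:

```python
# Hypothetical registry of data sources with per-dimension quality scores
sources = [
    {"name": "scada_feed",  "accuracy": 0.98, "timeliness": 0.70},
    {"name": "weather_api", "accuracy": 0.85, "timeliness": 0.95},
    {"name": "archive_db",  "accuracy": 0.99, "timeliness": 0.40},
]

def select_source(sources, requirements):
    """Return the source that meets every minimum quality requirement,
    ranked by the sum of its quality scores; None if none qualify."""
    eligible = [s for s in sources
                if all(s.get(dim, 0) >= minimum
                       for dim, minimum in requirements.items())]
    if not eligible:
        return None
    return max(eligible,
               key=lambda s: sum(v for k, v in s.items() if k != "name"))

# A user who needs accuracy >= 0.8 and timeliness >= 0.6
best = select_source(sources, {"accuracy": 0.8, "timeliness": 0.6})
```

The design choice here is to filter on hard minimums first and rank only the survivors, so a source cannot compensate for failing one requirement by excelling at another.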

    Investigating the attainment of optimum data quality for EHR Big Data: proposing a new methodological approach

    The value derivable from the use of data has been increasing continuously in recent years. Both commercial and non-commercial organisations have realised the immense benefits that might be derived if all the data at their disposal could be analysed and form the basis of decision making. The technological tools required to produce, capture, store, transmit and analyse huge amounts of data form the background to the development of the phenomenon of Big Data. With Big Data, the aim is to generate value from huge amounts of data, often in non-structured formats and produced extremely frequently. However, the potential value derivable depends on the general level of data governance, and more precisely on the quality of the data. The field of data quality is well researched for traditional data uses but is still in its infancy in the Big Data context. This dissertation investigates effective methods to enhance data quality for Big Data. Its principal deliverable is a methodological approach that can be used to optimize the level of data quality in the Big Data context. Since data quality is contextual (that is, non-generalizable), this study applies the methodological approach to one use case: Electronic Health Records (EHR). The first main contribution to knowledge is a systematic investigation of which data quality dimensions (DQDs) are most important for EHR Big Data. The two most important dimensions, as ascertained by the research methods applied in this study, are accuracy and completeness. These are two well-known dimensions, and this study confirms that they are also very important for EHR Big Data.
The second important contribution to knowledge is an investigation into whether Artificial Intelligence, with a special focus on machine learning, can improve the detection of dirty data with respect to the two data quality dimensions of accuracy and completeness. Based on the experiments carried out, regression and clustering algorithms proved most adequate for accuracy-related and completeness-related issues, respectively. However, the limits of implementing and using machine learning algorithms to detect data quality issues in Big Data are also revealed and discussed. It can safely be deduced from this part of the study that using machine learning to enhance the detection of data quality issues is a promising area, but not yet a panacea that automates the entire process. The third important contribution is a proposed guideline for undertaking data repairs most efficiently for Big Data; this involved surveying and comparing existing data cleansing algorithms against a prototype developed for data reparation. Weaknesses of the existing algorithms are highlighted and identified as areas on which efficient data reparation algorithms must focus. These three contributions form the nucleus of a new data quality methodological approach for optimizing Big Data quality, applied here in the context of EHR. Many of the activities and techniques discussed in the proposed methodological approach can be transposed to other industries and use cases.
The proposed data quality methodological approach can be used by practitioners of Big Data quality who follow a data-driven strategy. Compared with existing Big Data quality frameworks, it has the advantage of being more precise and specific: it gives clear and proven methods for undertaking the main identified stages of a Big Data quality lifecycle and can therefore be applied by practitioners in the area. This research study provides promising results and deliverables, and it paves the way for further research. The technologies underpinning Big Data are evolving rapidly, and future research should focus on new representations of Big Data, the real-time streaming aspect, and replicating the research methods used here on new technologies to validate the current results.
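The abstract's finding that regression suits accuracy-related issues can be illustrated with a small sketch: fit a model to the data, then flag records whose residual is far larger than is typical. The data, the injected error, and the robust threshold are all illustrative assumptions, not the dissertation's actual experiment:

```python
import numpy as np

# Synthetic column with a known linear relationship plus one accuracy error
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 20)
y[5] = 60.0  # injected "dirty" value (true value would be near 11)

# Fit a least-squares line and compute absolute residuals
coeffs = np.polyfit(x, y, 1)
residuals = np.abs(y - np.polyval(coeffs, x))

# Flag points whose residual exceeds a robust (median + 5*MAD) threshold
mad = np.median(np.abs(residuals - np.median(residuals)))
flags = residuals > np.median(residuals) + 5 * mad
suspect_indices = np.flatnonzero(flags)  # indices of likely accuracy errors
```

A median/MAD threshold is used rather than mean/standard deviation because the outlier itself would inflate a non-robust cutoff; for completeness-related issues the dissertation points to clustering instead, which would group records and flag sparse or empty feature patterns.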

    Offshore Wind Data Integration

    Doctoral dissertation in information and communication technology, University of Agder, Grimstad, 2014.
    Using renewable energy to meet future electricity consumption and to reduce environmental impact is a major target for many countries around the world. Wind power is one of the most promising renewable energy technologies. In particular, the development of offshore wind power is increasing rapidly thanks to large areas of wind resources. However, offshore wind faces significant challenges, such as the effective use of wind power plants and reducing the costs of installation and of operation and maintenance (O&M). Improved O&M is likely to reduce the hazard exposure of employees, increase income, and support offshore activities more efficiently. To optimize O&M, the importance of data exchange and knowledge sharing within the offshore wind industry must be recognised. With more data available and accessible, it is possible to make better decisions, and thereby improve recovery rates and reduce operational costs. This dissertation proposes a holistic way of improving remote operations for offshore wind farms through data integration, investigating in particular its semantic and integration aspects. The research addresses both theoretical foundations and practical implementations. As its outcome, a framework for data integration of offshore wind farms has been developed. The framework consists of three main components: the semantic model, data source handling, and information provisioning. In particular, an offshore wind ontology is proposed to capture the semantics of wind data and enable knowledge sharing and data exchange. The ontology is aligned with the Semantic Sensor Network ontology to support the management of metadata in smart grids; the ontology-based approach has thus proven useful for managing data and metadata in offshore wind and in smart grids.
    A quality-based approach is proposed to manage, select, and provide the most suitable data source for users based on their quality requirements, and an approach to formally describing derived data in ontologies is investigated.
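The semantic-model component this abstract describes maps heterogeneous raw data onto shared ontology terms so that different sources can be queried uniformly. A minimal sketch follows; the field names, the `wind:`/`turbine:` prefixes, and the mapping table are hypothetical, not the dissertation's actual ontology:

```python
# Hypothetical mapping from raw SCADA field names to ontology properties
ONTOLOGY_MAP = {
    "ws_avg":  "wind:WindSpeed",
    "gen_rpm": "wind:GeneratorSpeed",
    "t_nac":   "wind:NacelleTemperature",
}

def to_semantic_observations(turbine_id, raw_record):
    """Turn one raw record into (subject, property, value) triples,
    keeping only fields the ontology covers."""
    triples = []
    for field, value in raw_record.items():
        prop = ONTOLOGY_MAP.get(field)
        if prop is not None:  # unmapped fields are skipped
            triples.append((f"turbine:{turbine_id}", prop, value))
    return triples

# Two mapped fields survive; the unknown field is dropped
triples = to_semantic_observations(
    "WT07", {"ws_avg": 11.4, "gen_rpm": 1520, "unknown": 1})
```

Once data from every source is expressed as such triples against one ontology, cross-source queries and the quality-based source selection described above can operate on a uniform representation.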