26 research outputs found

    Investigating the attainment of optimum data quality for EHR Big Data: proposing a new methodological approach

    The value derivable from data has been increasing continuously for some years. Both commercial and non-commercial organisations have realised the immense benefits that might be derived if all the data at their disposal could be analysed and used as a basis for decision making. The technological tools required to produce, capture, store, transmit and analyse huge amounts of data form the background to the development of the phenomenon of Big Data. With Big Data, the aim is to generate value from huge amounts of data, often unstructured and produced at high velocity. However, the potential value derivable depends on the general level of data governance, and more precisely on the quality of the data. The field of data quality is well researched for traditional data uses but is still in its infancy in the Big Data context. This dissertation investigated effective methods to enhance data quality for Big Data. Its principal deliverable is a methodological approach which can be used to optimise the level of data quality in the Big Data context. Since data quality is contextual (that is, a non-generalisable field), the study applies the methodological approach to one use case, namely Electronic Health Records (EHR).

    The first main contribution to knowledge is a systematic investigation of which data quality dimensions (DQDs) are most important for EHR Big Data. The two most important dimensions ascertained by the research methods applied in this study are accuracy and completeness. These are two well-known dimensions, and this study confirms that they are also very important for EHR Big Data. The second contribution is an investigation into whether Artificial Intelligence, with a special focus on machine learning, could improve the detection of dirty data with respect to these two dimensions. Based on the experiments carried out, regression algorithms proved most adequate for completeness-related issues (imputing missing values) and clustering algorithms for accuracy-related issues (detecting outliers). However, the limits of implementing and using machine learning algorithms to detect data quality issues in Big Data were also revealed and discussed. It can safely be deduced that machine learning is a promising means of enhancing the detection of data quality issues, but not yet a panacea that automates the entire process. The third contribution is a proposed guideline for undertaking data repairs most efficiently for Big Data; this involved surveying and comparing existing data cleansing algorithms against a prototype developed for data reparation. Weaknesses of existing algorithms are highlighted and identified as areas which efficient data reparation algorithms must address.

    These three contributions form the nucleus of a new data quality methodological approach for optimising Big Data quality, as applied in the context of EHR. Many of the activities and techniques discussed in the proposed approach can be transposed to other industries and use cases. The approach can be used by practitioners of Big Data quality who follow a data-driven strategy. Unlike existing Big Data quality frameworks, it has the advantage of being more precise and specific: it gives clear, proven methods for the main identified stages of a Big Data quality lifecycle and can therefore be applied directly by practitioners. This research study provides promising results and deliverables, and it paves the way for further research. The technologies underpinning Big Data are evolving rapidly, and future research should focus on new representations of Big Data, the real-time streaming aspect, and replicating the research methods used in this study on new technologies to validate the current results.
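    As a concrete illustration of the two dimensions the study singles out, the sketch below scores completeness and accuracy for a toy tabular extract with pandas; the column names, records and gold-standard reference are hypothetical, not taken from the dissertation.

```python
# Minimal sketch of scoring the two DQDs the study found most important:
# completeness (share of non-missing values) and accuracy (agreement with
# a trusted reference). Column names and the reference frame are made up.
import pandas as pd

def completeness(df: pd.DataFrame) -> pd.Series:
    """Fraction of non-null cells per column."""
    return df.notna().mean()

def accuracy(df: pd.DataFrame, reference: pd.DataFrame, key: str) -> pd.Series:
    """Fraction of values that match a trusted reference, joined on a key."""
    merged = df.merge(reference, on=key, suffixes=("", "_ref"))
    cols = [c for c in df.columns if c != key]
    return pd.Series({c: (merged[c] == merged[f"{c}_ref"]).mean() for c in cols})

ehr = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "systolic_bp": [120, None, 135],
    "blood_type": ["A+", "O-", "B+"],
})
gold = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "systolic_bp": [120, 118, 130],
    "blood_type": ["A+", "O-", "B+"],
})
print(completeness(ehr))               # systolic_bp ~ 0.67, blood_type = 1.0
print(accuracy(ehr, gold, "patient_id"))
```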

    A qualitative assessment of machine learning support for detecting data completeness and accuracy issues to improve data analytics in big data for the healthcare industry

    Tackling data quality issues as part of Big Data can be challenging. Manual methods are not efficient for data cleansing activities because of the potentially very large amount of data involved. This paper qualitatively assesses the possibilities for using machine learning to detect data incompleteness and inaccuracy, since a previous research study by the authors found these two data quality dimensions to be the most significant. A review of existing literature concludes that there is no single machine learning algorithm best suited to dealing with both incompleteness and inaccuracy. Various algorithms were therefore selected from existing studies and applied to a representative big (healthcare) dataset. The experiments also revealed that implementing machine learning algorithms in this context faces several challenges, related both to the amount of data particular algorithms can scale to and to the data type restrictions some algorithms impose. The study concludes that (1) data imputation works better with linear regression models, and (2) clustering models are more effective at detecting outliers, but fully automated systems may not be realistic in this context; a certain level of human judgement is still needed.
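    A minimal sketch of the paper's two conclusions, assuming scikit-learn and a synthetic two-attribute dataset; the data, model choices and DBSCAN parameters are illustrative assumptions, not the paper's setup:

```python
# Sketch of the two findings: (1) impute missing values with a linear
# regression model, (2) flag candidate inaccuracies as clustering outliers.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
age = rng.uniform(20, 80, 200)
weight = 0.5 * age + 40 + rng.normal(0, 3, 200)   # correlated attribute
weight[:10] = np.nan                               # simulate incompleteness
weight[10] = 400.0                                 # simulate an inaccurate entry

# (1) Completeness: regress weight on age using complete rows, then impute.
mask = ~np.isnan(weight)
reg = LinearRegression().fit(age[mask].reshape(-1, 1), weight[mask])
weight[~mask] = reg.predict(age[~mask].reshape(-1, 1))

# (2) Accuracy: density-based clustering; points labelled -1 are outliers.
X = np.column_stack([age, weight])
labels = DBSCAN(eps=8.0, min_samples=5).fit_predict(X)
print("suspected inaccurate rows:", np.where(labels == -1)[0])
```

    Note that even this toy pipeline illustrates the paper's caveat: the analyst still chooses the regressors, the clustering parameters, and what to do with each flagged row, so the process is assisted rather than automated.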

    Discovering the most important data quality dimensions in health big data using latent semantic analysis

    Big Data quality is an emerging field. Many authors now agree that data quality remains highly relevant, even for Big Data uses. However, there is a lack of frameworks or guidelines on how to carry out Big Data quality initiatives. The starting point of any data quality work is to determine the properties of data quality, termed ‘data quality dimensions’ (DQDs). Even these dimensions lack rigorous definition in the existing literature. This research aims to identify the most important DQDs for Big Data in the health industry. It continues previous work which, using relevant literature, identified five DQDs (accuracy, completeness, consistency, reliability and timeliness) as the most important in health datasets. That previous work used a human-judgement-based research method known as the inner hermeneutic cycle (IHC). To remove the potential bias arising from human judgement, this study applied a statistical method for extracting knowledge from a set of documents, latent semantic analysis (LSA), to the same set of literature. The LSA results showed that accuracy and completeness were the only DQDs classed as the most important in health Big Data by both IHC and LSA.
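    For readers unfamiliar with the technique, the following is a toy sketch of the LSA step using scikit-learn: TF-IDF vectors of a made-up three-document corpus are projected onto latent topics with a truncated SVD, and the top-loading terms per topic are listed. The corpus and component count are illustrative only, not the study's document set.

```python
# Toy LSA: TF-IDF + truncated SVD, then inspect which quality-dimension
# terms load most heavily on each latent topic.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "accuracy and completeness drive health record quality",
    "completeness and timeliness matter for clinical data",
    "accuracy of coded diagnoses affects reliability",
]
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)
svd = TruncatedSVD(n_components=2, random_state=0).fit(X)

terms = tfidf.get_feature_names_out()
for i, comp in enumerate(svd.components_):
    top = comp.argsort()[::-1][:3]
    print(f"topic {i}:", [terms[j] for j in top])
```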

    Exploring the application and usability of NFC for promoting self-learning on energy consumption of household electronic appliances

    During the past decade, the significant increase in the adoption of consumer electronics has caused a rise in energy demand within the residential and household sectors globally. Since these electronics depend on electricity, the environmental impact of these sectors is also worsening, and it has become important to take remedial action. Various websites and mobile applications have therefore emerged that provide household users with information on the energy consumption of devices as well as on reduction mechanisms. However, since these platforms are limited in various ways in their endeavour to promote self-learning on energy consumption reduction, awareness remains an important barrier, giving rise to the need for further investigation of innovative technologies and platforms. Even though Near Field Communication (NFC) could potentially be used, limited work has been conducted in relation to the energy consumption of consumer electronics. As such, this paper delves into the application and usability of NFC for promoting self-learning on the energy consumption of household electronic appliances through an Android-based application called NFC Energy Tracker (NET).
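    The core self-learning content such an application surfaces is simple energy arithmetic. A minimal sketch with made-up appliance figures and tariff; nothing here is taken from NET itself:

```python
# Back-of-envelope calculation an app like NET might show after a tag scan:
# rated power x usage hours -> kWh -> cost. All figures are invented.
def monthly_cost(power_watts: float, hours_per_day: float,
                 tariff_per_kwh: float, days: int = 30) -> float:
    kwh = power_watts * hours_per_day * days / 1000.0
    return kwh * tariff_per_kwh

# e.g. a 150 W television used 4 h/day at an assumed tariff of 0.20/kWh
print(f"{monthly_cost(150, 4, 0.20):.2f}")   # -> 3.60
```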

    JarPi: a low-cost raspberry pi based personal assistant for small-scale fishermen

    Small-scale fishermen face various occupational safety hazards owing to the unavailability of real-time weather information during fishing activities at sea. Whilst provision of such information could greatly reduce these risks, few personal assistants exist that could support small-scale fishermen at sea with real-time details on wind speed and direction, rainfall, humidity, geographical location and distance from shore, among others, and large-scale solutions are too expensive for this category of fishermen to afford. Even though the recent emergence of the Raspberry Pi has significantly decreased the cost of computational systems, the application of this technology to build solutions for small-scale fishermen is yet to be investigated. As such, this paper investigates the implementation and deployment of a low-cost Raspberry Pi based personal assistant for small-scale fishermen, through a proposed device named JarPi.
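    One ingredient a device like JarPi plausibly needs is the distance from shore, computable from two GPS fixes with the standard haversine formula. A minimal sketch; the coordinates are arbitrary illustrations, not values from the paper:

```python
# Haversine great-circle distance between two GPS fixes, the usual way to
# estimate distance from shore on a small device. Coordinates are made up.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Distance in km between two (lat, lon) points, Earth radius 6371 km."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# boat fix vs. a shore landmark (illustrative coordinates near Mauritius)
print(f"{haversine_km(-20.16, 57.50, -20.30, 57.70):.1f} km")
```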

    Investigating data repair steps for EHR Big Data

    This paper builds on previous research aimed at optimising data quality methodologies for Big Data systems, with a focus on Electronic Health Records, for organisations following a data-centric data quality strategy. One of the most important stages of a data quality lifecycle is correcting the dirty data that has been detected, yet little is known about how existing data repair algorithms and tools perform in a Big Data context. This study performs a systematic review of data repair algorithms and tools, then takes an experiment-based approach to evaluate them against a prototype built on the results of a previous study. While some algorithms and tools proved marginally better than others, none was found to be fully adequate in the Big Data context. Recommendations are therefore given for the improvements that data repair algorithms and tools need for Big Data.
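    As an illustration of the kind of step such tools perform, the sketch below enforces a functional dependency (zip determines city) by majority vote within each group, a common repair primitive; the toy data and rule are assumptions, not the paper's prototype:

```python
# Repair primitive: enforce the functional dependency zip -> city by
# overwriting minority values with each group's most frequent value.
import pandas as pd

df = pd.DataFrame({
    "zip":  ["742", "742", "742", "901"],
    "city": ["Floreal", "Floreal", "Floral", "Curepipe"],  # "Floral" is dirty
})

def repair_fd(df: pd.DataFrame, lhs: str, rhs: str) -> pd.DataFrame:
    """For each lhs group, overwrite rhs with its modal (majority) value."""
    mode = df.groupby(lhs)[rhs].agg(lambda s: s.mode().iloc[0])
    out = df.copy()
    out[rhs] = out[lhs].map(mode)
    return out

print(repair_fd(df, "zip", "city"))   # "Floral" repaired to "Floreal"
```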

    An Impact Investment Strategy

    Impact investing is based on using the ESG framework as a tool to evaluate firms that engage in generating positive impact. Most impact investors and fund managers now integrate the ESG framework into their investment and stock-picking process. However, owing to the lack of standardisation in ESG reporting, it remains a challenge for investors and the public to identify the truly sustainable companies. We propose tax avoidance as an additional measure for identifying firms that are socially responsible: when firms indulge in excessive tax avoidance behaviour, this may be viewed as unethical or socially irresponsible. We integrate the empirical association between corporate social responsibility (CSR) and tax avoidance into an impact-based investment strategy built on firm-level ESG ratings and tax avoidance practices. In a pure impact investment strategy based on ESG and tax avoidance, we find that investing in high-ESG-rated firms and low-tax-avoidance firms yields a buy-and-hold abnormal return of 3.4% per annum and 11.4% over a three-year investment horizon. Next, if impact investors combine traditional risk-based investment strategies with impact measures, portfolios of high-ESG and high price-to-book-ratio firms earn a buy-and-hold abnormal return of 21.2%, while a portfolio of low-tax-avoidance and high price-to-book firms earns 29.8% in the long run. Collectively, our results suggest that, whilst impact investing does provide investors with a return, it does not necessarily outperform traditional investment strategies. Our results are robust to other risk factors and to the sector of the firm.
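    A rough pandas sketch of the double screen behind the pure impact strategy (high ESG rating, low tax avoidance proxied by a high effective tax rate), followed by an equal-weighted buy-and-hold return. All firms and numbers are invented and no benchmark adjustment is made, so this is not the paper's abnormal-return calculation:

```python
# Double sort: keep firms with above-median ESG ratings AND above-median
# effective tax rates (i.e. low tax avoidance), then average returns.
import pandas as pd

firms = pd.DataFrame({
    "firm":   ["A", "B", "C", "D"],
    "esg":    [82, 45, 77, 30],            # ESG rating
    "etr":    [0.28, 0.05, 0.25, 0.02],    # effective tax rate
    "ret_3y": [0.35, 0.10, 0.28, 0.05],    # 3-year total return
})

selected = firms[
    (firms["esg"] >= firms["esg"].median())
    & (firms["etr"] >= firms["etr"].median())
]
bh = selected["ret_3y"].mean()   # equal-weighted buy-and-hold return
print(selected["firm"].tolist(), f"{bh:.1%}")
```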

    Analyzing the prospects and acceptance of mobile-based marine debris tracking

    Marine litter is a growing concern in coastal areas around the world, and the issue has drawn responses from various stakeholders, including international regulatory bodies and governmental institutions, among others. Amongst the different technologies being promoted by key stakeholders, mobile-based marine debris tracking stands out because of the widespread use of mobile devices. However, although a few mobile-based marine debris reporting and tracking tools have emerged, limited research has been undertaken on the acceptance of such solutions by end users. Assessing the acceptance of this technology is important in order to understand the aspects that affect future adoption. To address this gap, this paper investigates and analyses the acceptance of mobile-based marine debris tracking. For this purpose, an application called “Mau Marine-Litter Watch” was developed and assessed through application of the Technology Acceptance Model.
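    For context, Technology Acceptance Model questionnaires are typically scored by averaging the Likert items that make up each construct. A minimal sketch with invented items and responses, not the study's actual instrument:

```python
# Score TAM's two core constructs (perceived usefulness, perceived ease of
# use) by averaging each respondent's Likert items. Data here is invented.
import pandas as pd

responses = pd.DataFrame({
    "pu1": [5, 4, 3], "pu2": [4, 4, 2],      # perceived-usefulness items
    "peou1": [5, 3, 4], "peou2": [4, 3, 5],  # perceived-ease-of-use items
})
scores = pd.DataFrame({
    "perceived_usefulness":  responses[["pu1", "pu2"]].mean(axis=1),
    "perceived_ease_of_use": responses[["peou1", "peou2"]].mean(axis=1),
})
print(scores.mean())   # construct means across respondents
```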