
    AUTO-CDD: automatic cleaning dirty data using machine learning techniques

    Cleaning dirty data has been of critical importance for many years, especially in the medical sector, which is why research in this area continues to widen. To motivate the work, a comparison between currently used methods of handling missing values and Auto-CDD is presented. The developed system aims, first, to avoid the unwanted outcomes that dirty data introduces into the analytical process and, second, to improve overall data processing. Our motivation is to create an intelligent tool that automatically predicts missing data. Features are first selected using Random Forest Gini index values; models are then trained under three machine learning paradigms and evaluated on two UCI datasets (Diabetes and Student Performance). The evaluation shows that the Random Forest classifier and Logistic Regression give consistent accuracy of around 90%. Finally, we conclude that this process helps to obtain clean data for further analysis.
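    The abstract does not include code; the following is a minimal sketch of the described pipeline, assuming scikit-learn and pandas. The file name, target column, and importance threshold are illustrative assumptions, not details taken from the paper.

```python
# Sketch: rank features by Random Forest Gini importance, then train
# classifiers to predict a column with missing values (model-based imputation).
# Dataset, column names and thresholds are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("diabetes.csv")   # hypothetical UCI-style dataset
target = "Outcome"                 # column assumed to contain missing values

# Split rows with and without missing target values.
known = df[df[target].notna()]
unknown = df[df[target].isna()]
X, y = known.drop(columns=[target]), known[target]

# 1) Feature selection via Gini importance (mean decrease in impurity).
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns)
selected = importances[importances > importances.mean()].index.tolist()

# 2) Evaluate candidate imputation models on the selected features.
for name, model in [("RandomForest", RandomForestClassifier(random_state=0)),
                    ("LogisticRegression", LogisticRegression(max_iter=1000))]:
    acc = cross_val_score(model, X[selected], y, cv=5).mean()
    print(f"{name}: {acc:.3f}")

# 3) Fill the missing entries with the better-performing model.
if not unknown.empty:
    best = RandomForestClassifier(random_state=0).fit(X[selected], y)
    df.loc[df[target].isna(), target] = best.predict(unknown[selected])
```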

    A Machine Learning Classification Framework for Early Prediction of Alzheimer’s Disease

    People today, in addition to their concerns about growing old and watching themselves become weak and frail, face an increasing fear of dementia. Around 47 million people are affected by dementia worldwide, and the cost of providing them with health and social care support is estimated to reach US$2 trillion by 2030, almost equivalent to the 18th largest economy in the world. The most common form of dementia, and the one with the highest health and social care costs, is Alzheimer’s disease, which gradually kills neurons and causes patients to lose cherished memories, the ability to recognise family members, childhood memories, and even the ability to follow simple instructions. Alzheimer’s disease is irreversible, unstoppable and has no known cure. Besides being a calamity for affected patients, it is a great financial burden on health providers. Healthcare providers also face a challenge in diagnosing the disease, as current diagnostic methods rely on manual evaluation of a patient’s medical history and mental examinations such as the Mini-Mental State Examination. These methods often give a false diagnosis and were designed to identify Alzheimer’s after stage two, when most of the symptoms are already evident. The problem is that clinicians are unable to stop or control the progression of Alzheimer’s disease because of a lack of knowledge of the patterns that trigger its development. In this thesis, we explored Alzheimer’s disease from a computational perspective to uncover different risk factors and present a strategic framework, the Early Prediction of Alzheimer’s Disease Framework (EPADf), for future prediction of early-onset Alzheimer’s disease. Following extensive background research, which resulted in the formalisation of the framework concept, the prediction approaches, and a method for ranking the risk factors based on clinical instinct, knowledge and experience using mathematical reasoning, we carried out experiments to gain further insight into the disease using machine learning models. We conducted two classification experiments for early prediction of Alzheimer’s disease and one ranking experiment to order its risk factors by importance. Besides these experiments, we also present two logical approaches for searching for patterns in an Alzheimer’s dataset and a ranking algorithm that orders Alzheimer’s disease risk factors based on clinical evaluation. For the classification experiments we used five different machine learning models: Random Forest (RF), Random Oracle Model (ROM), a hybrid model (H2) combining a Levenberg-Marquardt neural network and a Random Forest using Fisher discriminant analysis, Linear Neural Network (LNN), and Multi-Layer Perceptron (MLP). These models were applied to de-identified multivariate patient data provided by the ADNI (Alzheimer’s Disease Neuroimaging Initiative) to illustrate the effective use of data analysis in investigating the biological and behavioural risk factors of Alzheimer’s disease. We found that the continuous enrichment of patient data and the use of combined machine learning models can provide an early, cost-effective prediction of Alzheimer’s disease and help extract insightful information on its risk factors. Based on this work and its findings, we developed the strategic framework (EPADf), which is discussed in more depth in this thesis.
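    As an illustration of the risk-factor ranking step, the sketch below fits a Random Forest on ADNI-style tabular data and ranks predictors by permutation importance. This is an assumed, generic approach; the file name and column names are placeholders, not the thesis’s actual data or ranking algorithm.

```python
# Sketch: classify ADNI-style records and rank risk factors by permutation
# importance. File path and column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = pd.read_csv("adni_subset.csv")      # hypothetical de-identified extract
X = data.drop(columns=["diagnosis"])       # risk factors (age, APOE4, MMSE, ...)
y = data["diagnosis"]                      # e.g. cognitively normal vs. AD

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))

# Permutation importance on held-out data gives a model-agnostic ranking of
# how much each risk factor contributes to the prediction.
result = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)
ranking = pd.Series(result.importances_mean, index=X.columns)
print(ranking.sort_values(ascending=False))
```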

    Analysis of building performance data

    In recent years, the global trend towards digitalisation has also reached buildings and facility management. Due to the roll-out of smart meters and the retrofitting of buildings with meters and sensors, the amount of data available for a single building has increased significantly. In addition to the data sets collected by measurement devices, Building Information Modelling has recently seen a strong rise in adoption. By maintaining a building model through the whole building life-cycle, the model becomes rich in information describing all major aspects of a building. This work aims to combine these data sources to gain further valuable information from data analysis. Better knowledge of a building’s behaviour, based on the high-quality data available, leads to more efficient building operation. Eventually, this may result in a reduction of energy use and therefore lower operational costs. In this thesis, a concept for holistic data acquisition from smart meters and a methodology for integrating further meters into the measurement concept are introduced and validated. Secondly, this thesis presents a novel algorithm for cleansing and interpolating faulty data. Descriptive data is extracted from an open metadata model for buildings and used to further enrich the metered data. Additionally, this thesis presents a methodology for designing and managing all information in a unified Data Warehouse schema. The Data Warehouse developed here maintains compatibility with the open metadata model by adopting the model’s specification into its data schema. It features the application of building-specific Key Performance Indicators (KPIs) to measure building performance. In addition, a machine-learning-based clustering algorithm is developed to identify behavioural patterns of buildings and their frequency of occurrence, as sketched below. All methodologies introduced in this work are evaluated through installations and data from three pilot buildings. The pilot buildings were selected to be of diverse types to prove the generic applicability of the above concepts. The outcome of this work successfully demonstrates that combining the data sources available for buildings enables advanced data analysis. This greatly increases the understanding of buildings and their behavioural patterns. More efficient building operation and a reduction in energy usage can be achieved with this knowledge.
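    A minimal sketch of two of the steps described above, assuming pandas and scikit-learn: interpolating faulty meter readings and clustering daily load profiles with k-means to find recurring behavioural patterns. The file layout, gap limit, and number of clusters are illustrative assumptions, not the thesis’s actual algorithm.

```python
# Sketch: fill short gaps in smart-meter readings, then cluster daily load
# profiles to identify typical behavioural patterns and how often they occur.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hourly smart-meter readings with columns "timestamp" and "kwh" (assumed layout).
meter = (pd.read_csv("meter_readings.csv", parse_dates=["timestamp"])
           .set_index("timestamp")
           .resample("1H").mean())

# Simple gap handling: linearly interpolate gaps of up to six hours.
meter["kwh"] = meter["kwh"].interpolate(limit=6)

# Reshape to one row per day with 24 hourly values (the daily load profile).
profiles = (meter.assign(date=meter.index.date, hour=meter.index.hour)
                 .pivot_table(index="date", columns="hour", values="kwh")
                 .dropna())

# Cluster normalised profiles; each cluster centre is a typical daily pattern,
# and the cluster sizes give the frequency of occurrence of each pattern.
X = StandardScaler().fit_transform(profiles)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(pd.Series(km.labels_).value_counts())
```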