8 research outputs found

    A Machine Learning Approach for Detecting Unemployment using the Smart Metering Infrastructure

    Technological advancements in the field of electrical energy distribution and utilization are revolutionizing the way consumers and utility providers interact. Smart meters let utility companies monitor the status of their network independently and autonomously. The data they collect, as part of the wider advanced metering infrastructure, can also be valuable to third parties such as government authorities. The availability of the information, the granularity of the data and the real-time nature of the smart meter mean that predictive analytics can be used to profile consumers with high accuracy. For example, analytics can approximate the number of individuals living in a house, the type of appliances being used or the duration of occupancy, to name but a few applications. This paper presents a comparison of machine learning models for predicting the unemployment of occupants of single-occupant households, based on features extracted from smart meter electricity readings. A number of nonlinear classifiers are compared and benchmarked against a generalized linear model, and the results are presented. To ensure the robustness of the classifiers, we use repeated cross-validation. The results reveal that it is possible to predict employment status with Area Under Curve (AUC) = 74%, Sensitivity (SE) = 54% and Specificity (SP) = 83% using a multilayer perceptron neural network with dropout. A distance weighted discrimination model with a polynomial kernel performed almost as well. This shows the potential of using the smart metering infrastructure to provide governments with additional autonomous services, such as unemployment detection, using data collected from an advanced and distributed Internet of Things (IoT) sensor network.
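
The abstract reports AUC, sensitivity and specificity estimated with repeated cross-validation. Below is a minimal sketch of those pieces using only the standard library. It is not the authors' evaluation pipeline. The function names and the default numbers of folds and repeats are illustrative assumptions, and the folds are not stratified.

```python
import random

def repeated_kfold(n, k=10, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` rounds of k-fold cross-validation.

    Indices are reshuffled each round, so every sample lands in exactly one
    test fold per round. Folds are not stratified in this sketch.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test

def sensitivity_specificity(y_true, y_pred):
    """Return (SE, SP) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Consider labels [1, 1, 0, 0] and scores [0.9, 0.4, 0.6, 0.1]. With a 0.5 threshold the sketch gives SE = SP = 0.5, and the AUC is 0.75. In a repeated cross-validation run, these metrics would be computed on each test fold and then averaged.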

    Machine Learning Approaches for the Prediction of Obesity using Publicly Available Genetic Profiles

    This paper presents a novel approach based on the analysis of genetic variants from publicly available genetic profiles and a manually curated database, the National Human Genome Research Institute (NHGRI) Catalog. Using data science techniques, genetic variants are identified in the collected participant profiles and then indexed against the risk variants listed in the NHGRI Catalog. The indexed genetic variants, or single nucleotide polymorphisms (SNPs), are used as inputs to various machine learning algorithms for the prediction of obesity. The body mass index (BMI) status of participants is divided into two classes: Normal Class and Risk Class. Dimensionality reduction is performed to generate a set of principal variables (13 SNPs) for the application of various machine learning methods. The models are evaluated using receiver operating characteristic (ROC) curves and the area under the curve (AUC). The techniques compared are gradient boosting, generalized linear models, classification and regression trees, K-nearest neighbours, support vector machines, random forests and multilayer neural networks. They are assessed on two abilities: identifying the most important factors among the initial 6622 variables (genetic variants, age and gender), and classifying a subject into one of the BMI-related classes defined in this study. Our simulation results indicate that the support vector machine achieved a high accuracy of 90.5%.
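
The indexing and labelling steps can be sketched as follows. The sketch assumes that genotypes are two-letter strings keyed by rsID. It also assumes that each indexed SNP is encoded additively, as its count of catalog risk alleles. The catalog entries, rsIDs and BMI cut-off are hypothetical placeholders, because the abstract does not state the boundary between the Normal and Risk classes.

```python
# Hypothetical catalog excerpt: rsID -> risk allele (placeholder IDs, not real catalog data).
CATALOG_RISK_ALLELES = {"rs_example_1": "A", "rs_example_2": "T"}

# Assumption: the study's exact Normal/Risk boundary is not given in the abstract.
BMI_RISK_CUTOFF = 25.0

def index_profile(genotypes, catalog):
    """Keep only SNPs present in the catalog, encoded as risk-allele counts (0, 1 or 2)."""
    features = {}
    for rsid, genotype in genotypes.items():
        risk = catalog.get(rsid)
        if risk is not None:  # SNPs absent from the catalog are dropped
            features[rsid] = genotype.count(risk)
    return features

def bmi_class(bmi, cutoff=BMI_RISK_CUTOFF):
    """Assign the binary class label used as the prediction target."""
    return "Risk" if bmi >= cutoff else "Normal"
```

The resulting dictionaries, one per participant, would form the feature matrix that dimensionality reduction and the classifiers operate on.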

    Deep Learning Classification of Polygenic Obesity using Genome Wide Association Study SNPs

    In this paper, association results from genome-wide association studies (GWAS) are combined with a deep learning framework. The aim is to test the predictive capacity of statistically significant single nucleotide polymorphisms (SNPs) associated with the obesity phenotype. Our approach demonstrates the potential of deep learning as a powerful framework for GWAS analysis that can capture information about SNPs and the important interactions between them. Basic statistical methods and techniques for the analysis of genetic SNP data from population-based genome-wide studies have been considered. Statistical association testing between individual SNPs and obesity was conducted under an additive model using logistic regression. After quality control (QC) and association analysis, four subsets of loci were selected, with P-values lower than 1×10⁻⁵ (5 SNPs), 1×10⁻⁴ (32 SNPs), 1×10⁻³ (248 SNPs) and 1×10⁻² (2465 SNPs). A deep learning classifier is initialised using these sets of SNPs and fine-tuned to classify obese and non-obese observations. Using the deep learning classifier with the genetic variants with P-value < 1×10⁻² (2465 SNPs), it was possible to obtain SE = 0.9604, SP = 0.9712, Gini = 0.9817, LogLoss = 0.1150, AUC = 0.9908 and MSE = 0.0300. As the P-value threshold was made more stringent, and fewer SNPs were retained, performance deteriorated markedly. These results demonstrate that single-SNP analysis fails to capture the cumulative effect of less significant variants and their overall contribution to the outcome in disease prediction. A deep learning framework is able to capture this effect.
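
The nested SNP subsets and two of the reported error measures can be illustrated with a small sketch. The SNP identifiers and P-values in the usage are made up. Computing MSE on predicted probabilities (the Brier score) is an assumption about how the reported MSE was obtained.

```python
import math

# The four P-value cut-offs named in the abstract.
THRESHOLDS = (1e-5, 1e-4, 1e-3, 1e-2)

def subsets_by_threshold(pvalues, thresholds=THRESHOLDS):
    """Map each threshold to the sorted SNP IDs whose P-value falls strictly below it.

    Looser thresholds always contain the stricter subsets (nested subsets).
    """
    return {t: sorted(snp for snp, p in pvalues.items() if p < t) for t in thresholds}

def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def mse(y_true, probs):
    """Mean squared error between 0/1 labels and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(y_true, probs)) / len(y_true)
```

Because the subsets are nested, each looser threshold adds less significant SNPs on top of the stricter set. That nesting is what allows the study to compare classifier performance as less significant variants are included.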

    Evaluation of Phenotype Classification Methods for Obesity using Direct to Consumer Genetic Data

    Direct-to-consumer genetic testing services are becoming increasingly widespread. Consumers of such services are sharing their genetic and clinical information with the research community to facilitate the extraction of knowledge about different conditions. In this paper, we build on these services to analyse the genetic data of people with different BMI levels and determine the immediate and long-term risk factors associated with obesity. Using web scraping techniques, we created a dataset containing publicly available information about 230 participants from the Personal Genome Project. The dataset is then analysed to identify genetic variants associated with high BMI levels, following standard quality control and association analysis protocols for genome-wide association analysis. Finally, we apply Recursive Feature Elimination (RFE) for feature selection, combined with a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, to the filtered dataset. Using a robust data science methodology, our approach identifies obesity-related genetic variants to be used as features when predicting individual susceptibility to obesity. The results reveal that the subset of features selected by RFE does not improve the performance of the classifier compared with using all of the genetic variants identified by logistic regression.
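
Recursive Feature Elimination repeatedly ranks the remaining features and drops the weakest one. An RBF-kernel SVM has no per-feature weights, so RFE needs a model-agnostic importance measure. The sketch below scores each feature by the accuracy lost when it is removed. So that it runs with the standard library only, it swaps in a nearest-centroid classifier as a hypothetical stand-in for the SVM. It illustrates the RFE loop, not the study's configuration.

```python
def centroid_accuracy(X, y, cols):
    """Training accuracy of a nearest-centroid classifier on the given columns (stand-in model)."""
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    correct = 0
    for x, t in zip(X, y):
        # Squared Euclidean distance from this row to each class centroid.
        dist = {lab: sum((x[c] - m) ** 2 for c, m in zip(cols, cen))
                for lab, cen in centroids.items()}
        correct += min(dist, key=dist.get) == t
    return correct / len(y)

def make_importance(X, y):
    """Importance of feature f = accuracy lost when f is dropped from the remaining set."""
    def importance(remaining):
        base = centroid_accuracy(X, y, remaining)
        return {f: base - centroid_accuracy(X, y, [c for c in remaining if c != f])
                for f in remaining}
    return importance

def rfe(features, importance, n_keep):
    """Recursive Feature Elimination: re-score, drop the least important feature, repeat."""
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance(remaining)  # recomputed on the current subset each round
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
    return remaining
```

Training accuracy keeps the sketch short. In practice, the importance scores should be computed on held-out folds so that the selected subset is not overfitted.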

    An Ensemble Detection Model using Multinomial Classification of Stochastic Gas Smart Meter Data to Improve Wellbeing Monitoring in Smart Cities

    Fuel poverty has a negative impact on the wellbeing of individuals within a household. It affects not only comfort levels but also contributes to increased seasonal mortality. Wellbeing solutions within this sector are moving towards identifying how the needs of people in vulnerable situations can be addressed or monitored through existing supply networks and public institutions. This research therefore focuses on a wellbeing monitoring solution based on the analysis of gas smart meter data. Gas smart meters replace the traditional analogue electro-mechanical and diaphragm-based meters that required regular reading. They have gained widespread popularity over the last 10 years. This is primarily because the technology allows customers to adapt their consumption behaviour based on real-time information provided by In-Home Devices. The granular nature of the resulting datasets also makes this technology well suited to scalable wellbeing monitoring applications. For example, households at risk of energy poverty can be detected autonomously. Such detection is of growing importance for addressing the impact of fuel poverty on the quality of life and wellbeing of people in low-income housing. However, despite the popularity of smart meters, the analysis of gas smart meter data has been neglected. In this paper, an ensemble model is proposed for autonomous detection, supported by four key measures derived from gas usage patterns: i) tariff detection, ii) temporally-aware tariff detection, iii) routine consumption detection and iv) age-group detection. Using a cloud-based machine learning platform, the proposed approach yielded promising classification results. With the Synthetic Minority Over-sampling Technique (SMOTE), it reached an Area Under Curve (AUC) of up to 84.1%.
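
SMOTE balances classes by synthesising new minority-class points rather than duplicating existing ones. Below is a minimal sketch of the basic algorithm. Each synthetic point is interpolated between a minority sample and one of its k nearest minority neighbours. The feature vectors are assumed to be numeric gas-usage features. This is not the cloud platform's implementation.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples with basic SMOTE.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda m: sqdist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the base -> neighbour segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

Oversampling should be applied only to the training folds, so that synthetic points never leak into the data used to compute the reported AUC.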

    Video Analysis for the Detection of Animals Using Convolutional Neural Networks and Consumer-Grade Drones

    Determining animal distribution and density is important in conservation, but the process is both time-consuming and labour-intensive. Drones have been used to reduce the human effort involved by covering large geographical areas over a much shorter timescale. In this paper, we investigate this idea further using a proof of concept that detects rhinos and cars in drone footage. The proof of concept uses off-the-shelf technology and consumer-grade drone hardware. The study demonstrates the feasibility of using machine learning (ML) to automate routine conservation tasks such as animal detection and tracking. The prototype was developed using a DJI Mavic 2 Pro and tested over a Global System for Mobile Communications (GSM) network. The Faster R-CNN ResNet-101 architecture is used for transfer learning. Inference is performed with a frame sampling technique that balances precision, processing speed and synchronisation with the live video feed. Inference models are hosted on a web platform. Video streams from the drone (using OcuSync) are transmitted to a Real-Time Messaging Protocol (RTMP) server for subsequent classification. During training, the best model achieves a mean Average Precision (mAP) of 0.83 at an Intersection over Union (IoU) threshold of 0.50, and 0.69 at an IoU threshold of 0.75. When the system was tested at Knowsley Safari, the prototype achieved the following results for rhinos:
    Sensitivity (Sen): 0.91 (0.869, 0.94); Specificity (Spec): 0.78 (0.74, 0.82); Accuracy (ACC): 0.84 (0.81, 0.87).
    For cars, it achieved Sen: 1.00 (1.00, 1.00); Spec: 1.00 (1.00, 1.00); ACC: 1.00 (1.00, 1.00). © Canadian Science Publishing
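
The IoU thresholds in the reported mAP figures decide whether a predicted bounding box counts as a correct detection. Frame sampling simply skips frames between inference calls. The sketch below assumes boxes given as (x1, y1, x2, y2) pixel corners and a hypothetical fixed sampling interval. The abstract does not state the interval actually used.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection matches the ground truth when IoU reaches the threshold (e.g. 0.50 or 0.75)."""
    return iou(pred_box, gt_box) >= threshold

def sampled_frames(n_frames, every=10):
    """Indices of frames sent to the detector when only every `every`-th frame is classified.

    `every` is a hypothetical interval; larger values trade detection recall for speed.
    """
    return list(range(0, n_frames, every))
```

A stricter IoU threshold rejects loosely placed boxes. That is why the same model scores a lower mAP at IoU 0.75 than at 0.50.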

    An Investigation into Healthcare-Data Patterns

    Visualising complex data provides a more comprehensive means of conveying knowledge. Within the medical data domain, there is an increasing requirement for valuable and accurate information. Patients need to be confident that their data is being stored safely and securely. As such, it is becoming necessary to visualise data patterns and trends in real time to identify erratic and anomalous network access behaviours. In this paper, we present an investigation into modelling data flow within healthcare infrastructures, using a dataset from a hospital in Liverpool (UK) as a case study. Specifically, we present a visualisation of Transmission Control Protocol (TCP) socket connections to investigate the data complexity and user interaction events within healthcare networks. In addition, a filtering algorithm is proposed for noise reduction in the TCP dataset. Visual inspection shows positive results from this algorithm, with noise reduced by up to 89.84%.
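
The abstract does not describe the filtering algorithm. The following is only a hypothetical illustration of the kind of computation involved. It treats connections between rarely seen (source, destination) pairs as noise. It also assumes that the reported reduction is the percentage of records removed. The record fields `src` and `dst` and the `min_count` parameter are assumptions.

```python
from collections import Counter

def filter_rare_pairs(connections, min_count=3):
    """Hypothetical noise filter: keep TCP connections whose (src, dst) pair
    occurs at least `min_count` times; rarer pairs are treated as noise."""
    counts = Counter((c["src"], c["dst"]) for c in connections)
    return [c for c in connections if counts[(c["src"], c["dst"])] >= min_count]

def noise_reduction_pct(before, after):
    """Percentage of records removed by the filter (assumed definition of noise reduction)."""
    return 100.0 * (before - after) / before
```

With three A-to-B connections plus one A-to-C and one D-to-B connection, a `min_count` of 3 keeps the three A-to-B records. That is a 40% reduction.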

    Analysis of Extremely Obese Individuals Using Deep Learning Stacked Autoencoders and Genome-Wide Genetic Data

    The aetiology of polygenic obesity is multifactorial, which indicates that lifestyle and environmental factors may influence multiple genes to aggravate this disorder. Several low-risk single nucleotide polymorphisms (SNPs) have been associated with BMI. However, the identified loci explain only a small proportion of the variation observed for this phenotype. Genome-wide association studies (GWAS) are used to identify associations between genetic variants and the phenotype. Because they are linear, they have had limited success in explaining the heritable variation in BMI and have shown low predictive capacity in classification studies. GWAS ignore the epistatic interactions that less significant variants have on the phenotypic outcome. In this paper, we use a novel deep learning-based methodology to reduce the high-dimensional space in GWAS and to find epistatic interactions between SNPs for classification purposes. SNPs were filtered based on their association with BMI. Bonferroni adjustment for multiple testing is highly conservative, so applying it would discard an important proportion of the SNPs involved in SNP-SNP interactions. Therefore, only SNPs with p-values < 1×10⁻² were considered for subsequent epistasis analysis using stacked autoencoders (SAE). The SAE's progressively smaller hidden layers can capture the nonlinearity present in SNP-SNP interactions. These layers are then used to initialise a multi-layer feedforward artificial neural network (ANN) classifier. The classifier is fine-tuned to classify extremely obese and non-obese individuals. The best results were obtained with 2000 compressed units (SE = 0.949153, SP = 0.933014, Gini = 0.949936, Logloss = 0.1956, AUC = 0.97497 and MSE = 0.054057). Using 50 compressed units, it was possible to achieve SE = 0.785311, SP = 0.799043, Gini = 0.703566, Logloss = 0.476864, AUC = 0.85178 and MSE = 0.156315.
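
Two pieces of arithmetic underpin this abstract. First, Bonferroni correction divides the significance level by the number of tests. For genome-wide data, this gives a far stricter threshold than the 1×10⁻² cut-off used here. The one million tests in the example below are a hypothetical figure, not the study's SNP count. Second, the reported Gini values are consistent with the relation Gini = 2 × AUC - 1, which the sketch can be used to check.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under Bonferroni correction for n_tests tests."""
    return alpha / n_tests

def gini_from_auc(auc):
    """Gini coefficient of a binary classifier, Gini = 2 * AUC - 1."""
    return 2 * auc - 1
```

For example, `bonferroni_threshold(0.05, 1_000_000)` is about 5×10⁻⁸, roughly five orders of magnitude below 1×10⁻². Likewise, `gini_from_auc(0.97497)` gives about 0.94994, matching the reported Gini of 0.949936 to within rounding.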