31 research outputs found

    Deep learning for the early detection of harmful algal blooms and improving water quality monitoring

    Climate change will affect how water sources are managed and monitored. The frequency of algal blooms will increase with climate change, as it presents favourable conditions for the reproduction of phytoplankton. During monitoring, sensor failures in monitoring systems can result in partially filled data, which may affect critical systems. Imputation therefore becomes necessary to decrease error and increase data quality. This work investigates two issues in water quality data analysis: improving data quality and anomaly detection. It consists of three main topics: data imputation, early algal bloom detection using in-situ data, and early algal bloom detection using multiple modalities. The data imputation problem is addressed by experimenting with various methods on a water quality dataset that includes four locations around the North Sea and the Irish Sea with different characteristics and high miss rates, which tests model generalisability. A novel neural network architecture with self-attention is proposed in which imputation is done in a single pass, reducing execution time. The self-attention components increase the interpretability of the imputation process at each stage of the network, providing knowledge to domain experts. After data curation, algal activity is predicted 1 to 7 days ahead using transformer networks, and the importance of the input with regard to the output of the prediction model is explained using SHAP, aiming to explain model behaviour to domain experts, an aspect overlooked in previous approaches. The prediction model improves bloom detection performance by 5% on average, and the explanation summarizes the complex structure of the model as input-output relationships. The initial unimodal bloom detection model is then improved by incorporating multiple modalities, previously used only for validation, into the detection process. The problem of missing data is also tackled by using coordinated representations, replacing low-quality in-situ data with satellite data and vice versa, instead of imputation, which may produce biased results.

    Identification of prognostic indicators of healthy and unhealthy conditions with a machine learning-based systems biology approach using gut microbiome data

    Inflammatory bowel disease (IBD) is associated with alterations in the intestinal microbiome. However, the precise nature of these microbial changes remains unclear. With billions of microbes within the gut, novel and powerful computational techniques are required to identify the relevant shifts in the microbiota that contribute to healthy and unhealthy conditions. Machine learning (ML) allows a data-driven approach to identify these discrete dynamic changes. However, the interpretation and biological validation of the findings from ML algorithms remain a challenge. By combining ML and Systems Biology (SB) approaches, this thesis aims to characterise key microbial factors in IBD pathogenesis by extracting prognostic indicators from the human gut microbiome. The causal relationship between the changes in the gut microbiome and IBD is difficult to establish. Data from cross-sectional studies are plagued by confounding factors and inconsistencies between cohorts. Rich longitudinal datasets and integrated metagenomic, multi-omic, and electronic healthcare records can be used to overcome these limitations. In this PhD thesis, I have developed an integrated ML-based microbiome analysis pipeline to identify prognostic indicators for IBD from longitudinal microbiome data. Furthermore, using a variety of SB approaches, the interplay between the host and the microbiome has been explored to provide insights into the mechanisms during healthy and unhealthy conditions

    Proceedings of the 38th International Workshop on Statistical Modelling


    Computer vision based classification of fruits and vegetables for self-checkout at supermarkets

    The field of machine learning, and, in particular, methods to improve the capability of machines to perform a wider variety of generalised tasks are among the most rapidly growing research areas in today’s world. The current applications of machine learning and artificial intelligence can be divided into many significant fields namely computer vision, data sciences, real time analytics and Natural Language Processing (NLP). All these applications are being used to help computer based systems to operate more usefully in everyday contexts. Computer vision research is currently active in a wide range of areas such as the development of autonomous vehicles, object recognition, Content Based Image Retrieval (CBIR), image segmentation and terrestrial analysis from space (i.e. crop estimation). Despite significant prior research, the area of object recognition still has many topics to be explored. This PhD thesis focuses on using advanced machine learning approaches to enable the automated recognition of fresh produce (i.e. fruits and vegetables) at supermarket self-checkouts. This type of complex classification task is one of the most recently emerging applications of advanced computer vision approaches and is a productive research topic in this field due to the limited means of representing the features and machine learning techniques for classification. Fruits and vegetables offer significant inter and intra class variance in weight, shape, size, colour and texture which makes the classification challenging. The applications of effective fruit and vegetable classification have significant importance in daily life e.g. crop estimation, fruit classification, robotic harvesting, fruit quality assessment, etc. One potential application for this fruit and vegetable classification capability is for supermarket self-checkouts. Increasingly, supermarkets are introducing self-checkouts in stores to make the checkout process easier and faster. 
However, there are a number of challenges with this, as not all goods can readily be sold with packaging and barcodes, for instance loose fresh items (e.g. fruits and vegetables). Adding barcodes to these types of items individually is impractical, and pre-packaging limits the freedom of choice when selecting fruits and vegetables and creates additional waste, hence reducing customer satisfaction. The current situation, which relies on customers correctly identifying produce themselves, leaves open the potential for incorrect billing, either due to inadvertent error or due to intentional fraudulent misclassification, resulting in financial losses for the store. To address this identified problem, the main goals of this PhD work are: (a) exploring the types of visual and non-visual sensors that could be incorporated into a self-checkout system for classification of fruits and vegetables, (b) determining a suitable feature representation method for fresh produce items available at supermarkets, (c) identifying optimal machine learning techniques for classification within this context and (d) evaluating our work relative to the state-of-the-art object classification results presented in the literature. An in-depth analysis of related computer vision literature and techniques is performed to identify and implement the possible solutions. A progressive process distribution approach is used for this project where the task of computer vision based fruit and vegetables classification is divided into pre-processing and classification techniques. Different classification techniques have been implemented and evaluated as possible solutions for this problem. Both visual and non-visual features of fruit and vegetables are exploited to perform the classification. Novel classification techniques have been carefully developed to deal with the complex and highly variant physical features of fruit and vegetables while taking advantage of both visual and non-visual features. 
The capability of the classification techniques is tested individually and in ensembles to achieve higher effectiveness. Significant results have been obtained, from which it can be concluded that fruit and vegetable classification is a complex task with many challenges involved. It is also observed that a larger dataset can better capture the complex, variant features of fruit and vegetables. Complex multidimensional features can be extracted from larger datasets to generalise to a higher number of classes. However, development of a larger multiclass dataset is an expensive and time-consuming process. The effectiveness of classification techniques can be significantly improved by subtracting background occlusions and complexities. It is also worth mentioning that an ensemble of simple, less complicated classification techniques can achieve effective results even when applied to fewer features for a smaller number of classes. The combination of visual and non-visual features can reduce the difficulty a classification technique has in dealing with a higher number of classes with similar physical features. Classification of fruit and vegetables with similar physical features (i.e. colour and texture) needs careful estimation and hyper-dimensional embedding of visual features. Implementing rigorous classification penalties as a loss function can achieve this goal at the cost of time and computational requirements. There is a significant need to develop larger datasets for different fruit- and vegetable-related computer vision applications. Considering more sophisticated loss function penalties and discriminative hyper-dimensional feature embedding techniques can significantly improve the effectiveness of classification techniques for fruit and vegetable applications.

    Infrastructure planning for electrified transportation

    Due to the climate crisis, the importance of reducing greenhouse gas (GHG) emissions has been recognized by governments, private companies and the general public alike. Yet carbon capture-based approaches are difficult to integrate with transportation, which is one of the largest GHG-producing sectors. Therefore, electrification is the only viable approach to reduce emissions from transportation, by greatly increasing the market share of electric vehicles (EVs). However, the mass adoption of either (or both) of battery EVs (BEVs) and fuel cell EVs (FCEVs) requires a large amount of supporting infrastructure, particularly the construction of EV charging stations (EVCSs) for BEVs and hydrogen refuelling stations (HRSs) for FCEVs. The goal of this study is to provide effective approaches for the sizing and siting of EVCSs and HRSs to facilitate the deployment of BEVs and FCEVs. The background and an overview of the thesis are provided in Chapter 1, where the gaps in the current research are pointed out and the objectives of the thesis are formulated. Chapter 2 reviews the current state of technologies regarding the hydrogen life cycle as well as the popular planning models for EVCSs and HRSs. In Chapter 3, to achieve a competitive strategy from the perspective of private companies, a market-based framework is proposed for the problem of EVCS planning by leveraging a Graph Convolutional Network (GCN) and game theory. In Chapter 4, a multi-objective planning model is developed for EVCSs and the expansion of a distribution network with significant renewable components while considering uncertainties in EV charging behaviour. Additionally, in Chapter 5, a planning model for HRSs is developed that maximises the long-term profit while considering different practical constraints. The HRS planning model also addresses short-term demand uncertainty via redistribution. The models developed in this study are validated using either synthetic or real-world case studies, and the simulation results show the effectiveness of the proposed models. Finally, Chapter 6 summarises the major achievements of the thesis and provides directions for further research.

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
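The interpretability point in the abstract above can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes the common "proper CAR" parameterisation with precision matrix Q = τ(D − ρW), and the four-site line graph and the values of `rho` and `tau` are hypothetical. It computes the neighbour-pair correlations implied by the CAR prior, which in general need not coincide with ρ.

```python
import numpy as np

# Hypothetical toy map: 4 sites on a line; W[i, j] = 1 when sites i and j are neighbours.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
D = np.diag(W.sum(axis=1))  # diagonal matrix of neighbour counts

rho, tau = 0.5, 1.0  # assumed spatial dependence and precision parameters

# Proper CAR precision matrix (assumed parameterisation, positive definite for |rho| < 1).
Q = tau * (D - rho * W)

# Implied covariance and correlation of the latent spatial effect.
cov = np.linalg.inv(Q)
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

# Average correlation over neighbouring pairs, to compare against rho.
avg_neighbour_corr = corr[W > 0].mean()
print(f"rho = {rho}, average neighbour-pair correlation = {avg_neighbour_corr:.3f}")
```

Comparing the printed average with `rho` shows why a parameter that directly equals the neighbour-pair correlation, as claimed for DAGAR, is easier to interpret.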

    A Statistical Approach to the Alignment of fMRI Data

    Multi-subject functional Magnetic Resonance Imaging (fMRI) studies are critical. Because anatomical and functional structure varies across subjects, image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter, anatomical information is embedded in the estimation of the parameters, i.e., by penalizing the combination of spatially distant voxels. Real applications show an improvement in classification and in the interpretability of the results compared to various functional alignment methods.
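As a rough illustration of the kind of estimation described above, the sketch below solves an orthogonal Procrustes problem with an optional prior term. It is an assumption-laden sketch, not the authors' implementation: the function name `procrustes_rotation`, the `prior` location matrix and the concentration `k` are hypothetical, and adding `k * prior` to the cross-product matrix before the SVD is assumed here as the way a matrix von Mises-Fisher prior would enter the estimate.

```python
import numpy as np

def procrustes_rotation(X, Y, prior=None, k=0.0):
    """Orthogonal R minimising ||X @ R - Y||_F, optionally pulled toward a prior.

    X, Y  : (time points x voxels) arrays for two subjects.
    prior : hypothetical (voxels x voxels) location matrix, e.g. larger for
            spatially close voxel pairs; k is its assumed concentration.
    """
    M = X.T @ Y
    if prior is not None:
        M = M + k * prior  # assumed form of the prior contribution
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt  # closest orthogonal matrix in the Procrustes sense

# Synthetic check: Y is an exactly rotated copy of X, so R_true should be recovered.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
R_true, _ = np.linalg.qr(rng.standard_normal((5, 5)))
Y = X @ R_true
R_hat = procrustes_rotation(X, Y)
print(np.allclose(R_hat, R_true))
```

With `prior=None` this reduces to the classical Procrustes solution; a prior that favours nearby voxels would discourage the transformation from mixing spatially distant ones.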