52 research outputs found

    Population Subset Selection for the Use of a Validation Dataset for Overfitting Control in Genetic Programming

    Genetic Programming (GP) is a technique that can solve different problems through the evolution of mathematical expressions. However, its tendency to overfit the data is one of the main obstacles to applying it. The use of a validation dataset is a common way to prevent overfitting in many Machine Learning (ML) techniques, including GP. But there is one key point that differentiates GP from other ML techniques: instead of training a single model, GP evolves a population of models. Therefore, the validation dataset can be used in several ways, because any of those evolved models could be evaluated. This work explores the possibility of using the validation dataset not only on the training-best individual but also on a subset containing the training-best individuals of the population. The study has been conducted with 5 well-known databases for regression or classification tasks. In most cases, the results point to an improvement when the validation dataset is used on a subset of the population instead of only on the training-best individual, which also leads to a reduction in the number of nodes and, consequently, lower complexity of the expressions.
    Funding: Xunta de Galicia (ED431G/01, ED431D 2017/16, ED431C 2018/49, ED431D 2017/23); Instituto de Salud Carlos III (PI17/0182).
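The selection idea described in this abstract can be illustrated with a minimal sketch. It assumes that each individual already carries a training error and a validation error, and that the subset is simply the k individuals with the lowest training error; the abstract does not state how the subset is formed or how large it is, so the `Individual` class, the `select_by_validation` function, the parameter `k` and the toy error values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Individual:
    """A hypothetical evolved model with precomputed errors (lower is better)."""
    expression: str
    train_error: float
    val_error: float


def select_by_validation(population: Sequence[Individual], k: int) -> Individual:
    """Pick the validation-best individual among the k training-best ones.

    With k=1 this reduces to checking only the training-best individual.
    """
    if not population:
        raise ValueError("population must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Shortlist: the k individuals with the lowest training error.
    shortlist = sorted(population, key=lambda ind: ind.train_error)[:k]
    # Final choice: the shortlisted individual with the lowest validation error.
    return min(shortlist, key=lambda ind: ind.val_error)


# Toy population; the numbers are invented for illustration only.
population = [
    Individual("x*x + 3*x", train_error=0.10, val_error=0.40),
    Individual("x*x", train_error=0.12, val_error=0.15),
    Individual("x + 1", train_error=0.30, val_error=0.28),
    Individual("sin(x)", train_error=0.50, val_error=0.05),
]

best_k1 = select_by_validation(population, k=1)
best_k3 = select_by_validation(population, k=3)
```

In this toy case, looking only at the training-best individual keeps an expression that validates poorly, while a shortlist of three surfaces a simpler expression with a lower validation error.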

    Computer aided classification of histopathological damage in images of haematoxylin and eosin stained human skin

    EngD Thesis. Excised human skin can be used as a model to assess the potency, immunogenicity and contact sensitivity of potential therapeutics or cosmetics via the assessment of histological damage. The current method of assessing the damage uses traditional manual histological assessment, which is inherently subjective, time-consuming and prone to intra-observer variability. Computer aided analysis has the potential to address issues surrounding traditional histological techniques through the application of quantitative analysis. This thesis describes the development of a computer aided process to assess the immune-mediated structural breakdown of human skin tissue. Research presented includes assessment and optimisation of image acquisition methodologies, development of an image processing and segmentation algorithm, identification and extraction of a novel set of descriptive image features and the evaluation of a selected subset of these features in a classification model. A new segmentation method is presented to identify epidermis tissue from skin with varying degrees of histopathological damage. Combining enhanced colour information with general image intensity information, the fully automated methodology segments the epidermis with a mean specificity of 97.7%, a mean sensitivity of 89.4% and a mean accuracy of 96.5%, and segments effectively for different severities of tissue damage. A set of 140 feature measurements containing information about the tissue changes associated with different grades of histopathological skin damage was identified, and a wrapper algorithm was employed to select a subset of the extracted features, evaluating feature subsets based on their prediction error for an independent test set in a Naïve Bayes classifier. The final classification algorithm classified a 169-image set with an accuracy of 94.1%; of these images, 20 formed an unseen validation set, for which the accuracy was 85.0%. 
The final classification method has a comparable accuracy to the existing manual method, improved repeatability and reproducibility, and does not require an experienced histopathologist.
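The specificity, sensitivity and accuracy figures quoted for the segmentation follow the standard confusion-matrix definitions. The sketch below shows how such figures could be computed for one binary mask, assuming a flattened predicted mask and a flattened ground-truth mask with 1 marking epidermis pixels; the function name and the toy masks are hypothetical and not taken from the thesis.

```python
from typing import Dict, Sequence


def segmentation_metrics(predicted: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy for two flattened binary masks."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    if len(truth) == 0:
        raise ValueError("masks must not be empty")

    tp = fp = tn = fn = 0
    for p, t in zip(predicted, truth):
        if p and t:
            tp += 1  # foreground pixel correctly detected
        elif p and not t:
            fp += 1  # background pixel wrongly marked as foreground
        elif not p and t:
            fn += 1  # foreground pixel missed
        else:
            tn += 1  # background pixel correctly left out

    # Guard against masks that contain no foreground or no background pixels.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(truth)
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


# Toy 8-pixel masks, invented for illustration only.
predicted_mask = [1, 1, 0, 0, 0, 1, 0, 0]
truth_mask = [1, 1, 1, 0, 0, 0, 0, 0]
metrics = segmentation_metrics(predicted_mask, truth_mask)
```

A mean value such as those reported in the abstract would then be the average of each metric over all evaluated images.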

    Aplicada à previsão de parâmetros farmacocinéticos

    Genetic Programming (GP) is a Machine Learning (ML) technique applied to optimisation problems in which the goal is to find the best solution within a set of possible solutions. GP belongs to the paradigm known as Evolutionary Computation (EC), which takes its inspiration from the theory of the natural evolution of species to guide the search for solutions. This work evaluates the performance of GP on the problem of predicting pharmacokinetic parameters used in the drug development process. This is an optimisation problem in which, given a set of molecular descriptors of drugs and the corresponding values of their pharmacokinetic parameters or molecular activity, GP is used to build a mathematical function that estimates those values. To this end, data on drugs with known values for some pharmacokinetic parameters were used. To assess the performance of GP on this problem, different GP models were implemented with different fitness functions and configurations. The results obtained by the different models were compared with results currently published in the literature, and they confirm that GP is a promising technique in terms of the accuracy of the solutions found, the ability to generalise, and the correlation between predicted and actual values.

    Biological investigation and predictive modelling of foaming in anaerobic digester

    Anaerobic digestion (AD) of waste has been identified as a leading technology for greener renewable energy generation as an alternative to fossil fuel. AD reduces waste through biochemical processes, converting it to biogas, which can be used as a source of renewable energy, while the residual bio-solids can be used to enrich the soil. A problem with AD, though, is foaming and the associated biogas loss. Tackling this problem effectively requires identifying and controlling the factors that trigger and promote foaming. In this research, laboratory experiments were initially carried out to differentiate between factors that cause foaming and factors that exacerbate it. Then the impact of the identified causal factors (organic loading rate, OLR, and volatile fatty acid, VFA) on foaming occurrence was monitored and recorded. Further analysis of foaming and non-foaming sludge samples by metabolomics techniques confirmed that the OLR and VFA are the prime causes of foaming occurrence in AD. In addition, the metagenomics analysis showed that the phyla Bacteroidetes and Proteobacteria were predominant, with relative abundances of 30% and 29% respectively, while the phylum Actinobacteria, which includes the most prominent filamentous foam-causing bacteria such as Nocardia amarae and Microthrix parvicella, had a very low and consistent relative abundance of 0.9%, indicating that the foaming occurrence in the AD studied was not triggered by the presence of filamentous bacteria. Consequently, data-driven models to predict foam formation were developed based on experimental data, with inputs (OLR and VFA in the feed) and output (foaming occurrence). The models were extensively validated and assessed based on the mean squared error (MSE), root mean squared error (RMSE), R² and mean absolute error (MAE). The Levenberg-Marquardt neural network model proved to be the best model for foaming prediction in AD, with RMSE = 5.49, MSE = 30.19 and R² = 0.9435. 
The significance of this study is the development of a parsimonious and effective modelling tool that enables AD operators to proactively avert foaming occurrence, as the two model input variables (OLR and VFA) can be easily adjusted through a simple programmable logic controller.
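The four error measures named in this abstract have standard definitions, sketched below for a pair of observed and predicted value lists. The function name and the toy data are hypothetical, and nothing here reproduces the study's models or data. The last line only checks that the reported RMSE and MSE agree with each other, since RMSE is the square root of MSE.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MSE, RMSE, MAE and R^2 (coefficient of determination) for paired values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
        # R^2 is undefined when the observed values are all identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
    }


# Toy values, invented for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
scores = regression_metrics(observed, predicted)

# Consistency check on the figures quoted in the abstract.
rmse_from_reported_mse = round(math.sqrt(30.19), 2)
```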

    Subgroup discovery for structured target concepts

    The main object of study in this thesis is subgroup discovery, a theoretical framework for finding subgroups in data (i.e., named sub-populations) whose behaviour with respect to a specified target concept is exceptional when compared to the rest of the dataset. This is a powerful tool that conveys crucial information to a human audience, but despite past advances it has been limited to simple target concepts. In this work we propose algorithms that bring this framework to novel application domains. We introduce the concept of representative subgroups, which we use not only to ensure the fairness of a sub-population with regard to a sensitive trait, such as race or gender, but also to go beyond known trends in the data. For entities with additional relational information that can be encoded as a graph, we introduce a novel measure of robust connectedness which improves on established alternative measures of density; we then provide a method that uses this measure to discover which named sub-populations are better connected. Our contributions within subgroup discovery culminate in the introduction of kernelised subgroup discovery: a novel framework that enables the discovery of subgroups on i.i.d. target concepts with virtually any kind of structure. Importantly, our framework additionally provides a concrete and efficient tool that works out-of-the-box without any modification, apart from specifying the Gramian of a positive definite kernel. For use within kernelised subgroup discovery, but also in any other kind of kernel method, we additionally introduce a novel random walk graph kernel. Our kernel allows fine-tuning of the alignment between the vertices of the two compared graphs during the counting of the random walks, and we also propose meaningful structure-aware vertex labels to utilise this new capability. 
With these contributions we thoroughly extend the applicability of subgroup discovery and ultimately re-define it as a kernel method.

    The development, validation and demonstration of an automated rodent tracker and whisker detector

    Quantitatively assessing animal behaviour and motor control is challenging because there is a lack of unobtrusive behavioural models. Some studies have suggested that measuring whisker movements might be a good, quantitative behavioural model. However, whiskers are very thin and small and move very fast, and there is not yet an automated program that can detect whiskers in a fully intact, freely moving animal. Therefore, this thesis develops, validates and demonstrates a novel, fully automated rodent tracker and whisker annotator that simultaneously measures locomotion and exploration behaviours as well as whisker movements. The first step in designing an automated rodent tracker and whisker detector is to extract a reliable ground truth against which to compare any tracked points. Therefore, the Manual Whisker Annotator (MWA) was designed as a validator and calibrator for the subsequent trackers and detectors. The second step is to provide a reliable body and head contour. Therefore, the Automated Rodent Tracker (ART) was developed and validated by comparison with a semi-automated tracker (Ethovision) and the MWA. Finally, a fully automated whisker detector (FAWD) was designed and validated using two existing semi-automatic whisker trackers (BWTT and Whisk) and the MWA. FAWD incorporates a variety of image-processing algorithms, including super sampling, dilation and subtraction, and Frangi filtering, to reliably detect whiskers. Both ART and FAWD were also successfully demonstrated on videos collected from SOD1 mice, a model of Amyotrophic Lateral Sclerosis, from day 30 to 120. The development of this software enables whisker movements and locomotion to be tracked in a repeatable fashion, and the fully automated nature of the software means that many videos can be collected and quickly processed with minimal user input. 
This thesis develops and validates a suite of behavioural software that provides robust and quantitative measures of rodent behaviour for basic research or drug discovery. Future work will be to demonstrate this software on a larger range of rodent models of neurodegeneration, to further showcase the flexibility and quantitative nature of this behavioural model.

    Advancements and Breakthroughs in Ultrasound Imaging

    Ultrasonic imaging is a powerful diagnostic tool available to medical practitioners, engineers and researchers today. Due to its relative safety and non-invasive nature, ultrasonic imaging has become one of the most rapidly advancing technologies. These rapid advances are directly related to parallel advancements in electronics, computing and transducer technology, together with sophisticated signal processing techniques. This book focuses on state-of-the-art developments in ultrasonic imaging applications and underlying technologies, presented by leading practitioners and researchers from many parts of the world.

    Development of an image processing method for automated, non-invasive and scale-independent monitoring of adherent cell cultures

    Adherent cell culture is a key experimental method for biological investigations in diverse areas such as developmental biology, drug discovery and biotechnology. Light microscopy-based methods, for example phase contrast microscopy (PCM), are routinely used for visual inspection of adherent cells cultured in transparent polymeric vessels. However, the outcome of such inspections is qualitative and highly subjective. Analytical methods that produce quantitative results can be used, but often at the expense of culture integrity or viability. In this work, an imaging-based strategy for monitoring adherent cell cultures was investigated. Automated image processing and analysis of PCM images enabled quantitative measurements of key cell culture characteristics. Two types of segmentation algorithms for the detection of cellular objects on PCM images were evaluated. The first one, based on contrast filters and dynamic programming, was quick (<1s per 1280×960 image) and performed well for different cell lines over a wide range of imaging conditions. The second approach, termed ‘trainable segmentation’, was based on machine learning using a variety of image features such as local structures and symmetries. It accommodated complex segmentation tasks while maintaining low processing times (<5s per 1280×960 image). Based on the output from these segmentation algorithms, imaging-based monitoring of a large palette of cell responses was demonstrated, including proliferation, growth arrest, differentiation, and cell death. This approach is non-invasive and applicable to any transparent culture vessel, including microfabricated culture devices, where a lack of suitable analytical methods often limits their applicability. This work was a significant contribution towards the establishment of robust, standardised, and affordable monitoring methods for adherent cell cultures. Finally, automated image processing was combined with computer-controlled cultures in small-scale devices. 
This provided a first demonstration of how adaptive culture protocols could be established, i.e. culture protocols that are based on cellular response instead of arbitrary time points.