342 research outputs found

    Performance and Competitiveness of Tree-Based Pipeline Optimization Tool

    Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. Automated machine learning (AutoML) is the process of automating the entire machine learning workflow when applied to real-world problems. AutoML can increase data science productivity while keeping the same performance and accuracy, allowing non-experts to use complex machine learning methods. Tree-based Pipeline Optimization Tool (TPOT) was one of the first AutoML methods created by data scientists and aims to optimize machine learning pipelines using genetic programming. While still under active development, TPOT is a very promising AutoML tool. This thesis aims to explore the algorithm and analyse its performance using real-world data. Results show that evolution-based optimization is at least as accurate as TPOT initialization. The effectiveness of genetic operators, however, depends on the nature of the test case.

    Design of new algorithms for gene network reconstruction applied to in silico modeling of biomedical data

    Doctoral Program in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Program code: DBI. Line code: 111. The root causes of disease are still poorly understood. The success of current therapies is limited because persistent diseases are frequently treated based on their symptoms rather than the underlying cause of the disease. Therefore, biomedical research is experiencing a technology-driven shift to data-driven holistic approaches to better characterize the molecular mechanisms causing disease. Using omics data as an input, emerging disciplines like network biology attempt to model the relationships between biomolecules. To this effect, gene co-expression networks arise as a promising tool for deciphering the relationships between genes in large transcriptomic datasets. However, because of their low specificity and high false positive rate, they demonstrate a limited capacity to retrieve the disrupted mechanisms that lead to disease onset, progression, and maintenance. Within the context of statistical modeling, we dove deeper into the reconstruction of gene co-expression networks with the specific goal of discovering disease-specific features directly from expression data. Using ensemble techniques, which combine the results of various metrics, we were able to more precisely capture biologically significant relationships between genes. We were able to find de novo potential disease-specific features with the help of prior biological knowledge and the development of new network inference techniques. Through our different approaches, we analyzed large gene sets across multiple samples and used gene expression as a surrogate marker for the inherent biological processes, reconstructing robust gene co-expression networks that are simple to explore. By mining disease-specific gene co-expression networks, we arrive at a useful framework for identifying new omics-phenotype associations from conditional expression datasets. In this sense, understanding diseases from the perspective of biological network perturbations will improve personalized medicine, impacting rational biomarker discovery, patient stratification and drug design, and ultimately leading to more targeted therapies. Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática.

    Discovering Causal Relations and Equations from Data

    Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws and principles that are invariant, robust and causal explanations of the world has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventional studies in the system under study. With the advent of big data and the use of data-driven methods, causal and equation discovery fields have grown and made progress in computer science, physics, statistics, philosophy, and many applied fields. All these domains are intertwined and can be used to discover causal relations, physical laws, and equations from observational data. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is being revolutionised with the efficient exploitation of observational data, modern machine learning algorithms and the interaction with domain knowledge. Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems. Comment: 137 pages.

    Leveraging audio-visual speech effectively via deep learning

    The rising popularity of neural networks, combined with the recent proliferation of online audio-visual media, has led to a revolution in the way machines encode, recognize, and generate acoustic and visual speech. Despite the ubiquity of naturally paired audio-visual data, only a limited number of works have applied recent advances in deep learning to leverage the duality between audio and video within this domain. This thesis considers the use of neural networks to learn from large unlabelled datasets of audio-visual speech to enable new practical applications. We begin by training a visual speech encoder that predicts latent features extracted from the corresponding audio on a large unlabelled audio-visual corpus. We apply the trained visual encoder to improve performance on lip reading in real-world scenarios. Following this, we extend the idea of video learning from audio by training a model to synthesize raw speech directly from raw video, without the need for text transcriptions. Remarkably, we find that this framework is capable of reconstructing intelligible audio from videos of new, previously unseen speakers. We also experiment with a separate speech reconstruction framework, which leverages recent advances in sequence modeling and spectrogram inversion to improve the realism of the generated speech. We then apply our research in video-to-speech synthesis to advance the state-of-the-art in audio-visual speech enhancement, by proposing a new vocoder-based model that performs particularly well under extremely noisy scenarios. Lastly, we aim to fully realize the potential of paired audio-visual data by proposing two novel frameworks that leverage acoustic and visual speech to train two encoders that learn from each other simultaneously. We leverage these pre-trained encoders for deepfake detection, speech recognition, and lip reading, and find that they consistently yield improvements over training from scratch. Open Access.

    Harnessing Evolution in-Materio as an Unconventional Computing Resource

    This thesis illustrates the use and development of physical conductive analogue systems for unconventional computing using the Evolution in-Materio (EiM) paradigm. EiM uses an Evolutionary Algorithm to configure and exploit a physical material (or medium) for computation. While EiM processors show promise, fundamental questions and scaling issues remain. Additionally, their development is hindered by slow manufacturing and physical experimentation. This work addressed these issues by implementing simulated models to speed up research efforts, followed by investigations of physically implemented novel in-materio devices. Initial work leveraged simulated conductive networks as single substrate ‘monolithic’ EiM processors, performing classification by formulating the system as an optimisation problem, solved using Differential Evolution. Different material properties and algorithm parameters were isolated and investigated, which explained the capabilities of configurable parameters and showed that the ideal nanomaterial choice depended upon problem complexity. Subsequently, drawing from concepts in the wider Machine Learning field, several enhancements to monolithic EiM processors were proposed and investigated. These ensured more efficient use of training data, better classification decision boundary placement, an independently optimised readout layer, and a smoother search space. Finally, scalability and performance issues were addressed by constructing in-Materio Neural Networks (iM-NNs), where several EiM processors were stacked in parallel and operated as physical realisations of Hidden Layer neurons. Greater flexibility in system implementation was achieved by re-using a single physical substrate recursively as several virtual neurons, but this sacrificed faster parallelised execution.
    These novel iM-NNs were first implemented using Simulated in-Materio neurons, and trained for classification as Extreme Learning Machines, which were found to outperform artificial networks of a similar size. Physical iM-NNs were then implemented using a Raspberry Pi, custom Hardware Interface and Lambda Diode based Physical in-Materio neurons, which were trained successfully with neuroevolution. A more complex AutoEncoder structure was then proposed and implemented physically to perform dimensionality reduction on a handwritten digits dataset, outperforming both Principal Component Analysis and artificial AutoEncoders. This work presents an approach to exploit systems with interesting physical dynamics, and leverage them as a computational resource. Such systems could become low power, high speed, unconventional computing assets in the future.

    Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions

    As systems based on opaque Artificial Intelligence (AI) continue to flourish in diverse real-world applications, understanding these black box models has become paramount. In response, Explainable AI (XAI) has emerged as a field of research with practical and ethical benefits across various domains. This paper not only highlights the advancements in XAI and its application in real-world scenarios but also addresses the ongoing challenges within XAI, emphasizing the need for broader perspectives and collaborative efforts. We bring together experts from diverse fields to identify open problems, striving to synchronize research agendas and accelerate XAI in practical applications. By fostering collaborative discussion and interdisciplinary cooperation, we aim to propel XAI forward, contributing to its continued success. Our goal is to put forward a comprehensive proposal for advancing XAI. To achieve this goal, we present a manifesto of 27 open problems grouped into nine categories. These challenges encapsulate the complexities and nuances of XAI and offer a road map for future research. For each problem, we provide promising research directions in the hope of harnessing the collective intelligence of interested stakeholders.

    Evolution from the ground up with Amee – From basic concepts to explorative modeling

    Evolutionary theory has been the foundation of biological research for about a century now, yet over the past few decades, new discoveries and theoretical advances have rapidly transformed our understanding of the evolutionary process. Foremost among them are evolutionary developmental biology, epigenetic inheritance, and various forms of evolutionarily relevant phenotypic plasticity, as well as cultural evolution, which ultimately led to the conceptualization of an extended evolutionary synthesis. Starting from abstract principles rooted in complexity theory, this thesis aims to provide a unified conceptual understanding of any kind of evolution, biological or otherwise. This is used in the second part to develop Amee, an agent-based model that unifies development, niche construction, and phenotypic plasticity with natural selection based on a simulated ecology. Amee is implemented in Utopia, which allows performant, integrated implementation and simulation of arbitrary agent-based models. A phenomenological overview of Amee’s capabilities is provided, ranging from the evolution of ecospecies down to the evolution of metabolic networks and up to beyond-species-level biological organization, all of which emerges autonomously from the basic dynamics. The interaction of development, plasticity, and niche construction has been investigated, and it has been shown that while expected natural phenomena can, in principle, arise, the accessible simulation time and system size are too small to produce natural evo-devo phenomena and structures. Amee thus can be used to simulate the evolution of a wide variety of processes.

    Build your own closed loop: Graph-based proof of concept in closed loop for autonomous networks

    Next Generation Networks (NGNs) are expected to handle heterogeneous technologies, services, verticals and devices of increasing complexity. It is essential to devise an innovative approach to automatically and efficiently manage NGNs to deliver an adequate end-to-end Quality of Experience (QoE) while reducing operational expenses. An Autonomous Network (AN) using a closed loop can self-monitor, self-evaluate and self-heal, making it a potential solution for managing the NGN dynamically. This study describes the major results of building a closed-loop Proof of Concept (PoC) for various AN use cases organized by the International Telecommunication Union Focus Group on Autonomous Networks (ITU FG-AN). The scope of this PoC includes the representation of closed-loop use cases in a graph format, the development of evolution/exploration mechanisms to create new closed loops based on the graph representations, and the implementation of a reference orchestrator to demonstrate the parsing and validation of the closed loops. The main conclusions and future directions are summarized here, including observations and limitations of the PoC.

    Understanding Optimisation Processes with Biologically-Inspired Visualisations

    Evolutionary algorithms (EAs) constitute a branch of artificial intelligence utilised to evolve solutions to the optimisation problems that abound in industry and research. EAs often generate many solutions, and visualisation has been a primary strategy to display EA solutions, given that visualisation is a well-evaluated, multi-domain medium for comprehending extensive data. Visualising solutions comes with inherent challenges resulting from high-dimensional phenomena and the large number of solutions to display. Recently, scholars have produced methods to mitigate some of these known issues when illustrating solutions. However, one key consideration is that displaying the final subset of solutions exclusively (rather than the whole population) discards most of the informativeness of the search, creating inadequate insight into the black-box EA. There is an unequivocal knowledge gap and requirement for methods which can visualise the whole population of solutions from an optimiser and overcome the high-dimensional problems and scaling issues to create interpretability of the EA search process. Furthermore, the evolutionary computing community has demanded explainability in evolutionary computing, which could take the form of visualisations, to support EA comprehension much like the support explainable artificial intelligence has brought to artificial intelligence. In this thesis, we report novel visualisation methods that can be used to visualise large and high-dimensional optimiser populations with the aim of creating greater interpretability during a search. We consider the nascent intersection of visualisation and explainability in evolutionary computing.
    The potential high informativeness of a visualisation method from an early chapter of this work forms an effective platform to develop an explainability visualisation method, namely the population dynamics plot, to attempt to inject explainability into the inner workings of the search process. We further support the visualisation of populations using machine learning to construct models which can capture the characteristics of an EA search and develop intelligent visualisations which use artificial intelligence to potentially enhance and support visualisation for a more informative search process. The methods developed in this thesis are evaluated both quantitatively and qualitatively. We use multi-feature benchmark problems to show the method’s ability to reveal specific problem characteristics such as disconnected fronts, local optima and bias, as well as potentially creating a better understanding of the problem landscape and optimiser search for evaluating and comparing algorithm performance (we show the visualisation method to be more insightful than conventional metrics like hypervolume alone). One of the most insightful methods developed in this thesis can produce a visualisation requiring less than 1% of the time and memory necessary to produce a visualisation of the same objective space solutions using existing methods. This allows for greater scalability and use in short compile time applications such as online visualisations.
    Building on an existing visualisation method in this thesis, we then develop and apply an explainability method to a real-world problem and evaluate it to show the method to be highly effective at explaining the search via solutions in the objective spaces, solution lineage and solution variation operators to compactly comprehend, evaluate and communicate the search of an optimiser, although we note the explainability properties are only evaluated against the author’s ability and could be evaluated further in future work with a usability study. The work is then supported by the development of intelligent visualisation models that may allow one to predict solutions in optima (importantly local optima) in unseen problems by using a machine learning model. The results are effective, with some models able to predict and visualise solution optima with a balanced F1 accuracy metric of 96%. The results of this thesis provide a suite of visualisations which aim to provide greater informativeness of the search and scalability than the existing literature. The work develops one of the first explainability methods aiming to create greater insight into the search space, solution lineage and reproductive operators. The work applies machine learning to potentially enhance EA understanding via visualisation. These models could also be used for a number of applications outside visualisation. Ultimately, the work provides novel methods for all EA stakeholders which aim to support understanding, evaluation and communication of EA processes with visualisation.

    Target evaluation and low-thrust trajectory planning for near-Earth asteroid mining

    Near-Earth Asteroids (NEAs) are abundant with minerals that would be undoubtedly beneficial for future space exploration, as the utilization of these in-space resources could enable otherwise unaffordable missions. This thesis aims to address several remaining issues in asteroid mining mission planning, including target selection and ranking, multi-return low-thrust trajectory design, NEA mining season determination, asteroid mining campaign designs, and other considerations. This study starts with a comprehensive asteroid resource investigation and an impulsive roundtrip accessibility analysis for the 29,266 known NEAs, of which 46% are found to be accessible. By combining the two studies, a NEA resource map is created, providing key knowledge of resource locations, types, reserves, and minimum delta-V requirements to retrieve the resources. Mining missions are then preliminarily constructed using impulsive trajectories for 13,481 NEAs, and a series of Figures of Merit (FoMs) are proposed. In total, over 900 accessible and known targets for mining water, Platinum Group Metals (PGMs) and silicates are ranked. Low-thrust mining missions are then studied. New Deep Neural Network (DNN) based models are constructed as surrogates for the conventional optimization process. The new method reduces the low-thrust trajectory design time by 99.94%. Typical Solar Electric Propulsion (SEP) spacecraft configurations are used to design trajectories for supply delivery and resource transportation. The transportation capabilities of different spacecraft configurations are quantified. An asteroid mining campaign design framework is proposed, which integrates all the developed models, algorithms, and asteroid data. An example mining campaign on Bennu is presented, and an economic analysis is performed. The sensitivity analysis shows that low-thrust mining missions are more resistant to changing economic parameters.
    Campaigns are then numerically designed and optimized for 76 known water-bearing and 58 potential PGM-bearing targets, using both impulsive and low-thrust trajectories. The “NEA mining season”, which was an abstract concept, is validated. The mining seasons are categorized into three major types based on their feasibility for mining. Two 35-year water mining and PGM mining plans are generated. It is found that the currently known targets can form a $21,000 billion PGM mining industry and a $13,000 billion water mining industry. It is found that low-thrust mining is the key to a successful mining campaign, and that it may increase the profit by 2.8 to 8.7 times.