Performance and Competitiveness of Tree-Based Pipeline Optimization Tool
Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
Automated machine learning (AutoML) is the process of automating the entire machine learning workflow when applied to real-world problems. AutoML can increase data science productivity while maintaining performance and accuracy, allowing non-experts to use complex machine learning methods. The Tree-based Pipeline Optimization Tool (TPOT) was one of the first AutoML methods created by data scientists and targets the optimization of machine learning pipelines using genetic programming. While still under active development, TPOT is a very promising AutoML tool. This thesis aims to explore the algorithm and analyse its performance using real-world data. Results show that evolution-based optimization is at least as accurate as TPOT initialization. The effectiveness of genetic operators, however, depends on the nature of the test case.
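The evolutionary loop underlying pipeline optimizers like TPOT can be illustrated with a self-contained toy sketch. This is not TPOT's implementation: the pipeline encoding, the mock fitness function, and all names below are simplified stand-ins for cross-validated pipeline scoring.

```python
import random

# Toy "search space": candidate pipeline steps, and a mock fitness that
# rewards matching a hypothetical best pipeline (a stand-in for
# cross-validated accuracy, which TPOT would measure on real data).
STEPS = ["scaler", "pca", "select_k", "tree", "knn", "logreg"]
TARGET = ["scaler", "pca", "tree"]  # hypothetical best 3-step pipeline

def fitness(pipeline):
    # Fraction of positions that match the (hidden) best pipeline.
    return sum(a == b for a, b in zip(pipeline, TARGET)) / len(TARGET)

def mutate(pipeline, rng):
    # Point mutation: replace one step with a random alternative.
    child = pipeline[:]
    child[rng.randrange(len(child))] = rng.choice(STEPS)
    return child

def evolve(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(STEPS) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=fitness)

best = evolve()
```

TPOT itself evolves full scikit-learn pipelines (trees of operators with hyperparameters) rather than fixed-length step lists, but the generate/score/select/mutate cycle is the same.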
Design of new algorithms for gene network reconstruction applied to in silico modeling of biomedical data
Doctoral Programme in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Programme code: DBI. Line code: 111.
The root causes of disease are still poorly understood. The success of current therapies is limited because persistent diseases are frequently treated based on their symptoms rather than the underlying cause of the disease. Therefore, biomedical research is experiencing a technology-driven shift to data-driven holistic approaches to better characterize the molecular mechanisms causing disease. Using omics data as an input, emerging disciplines like network biology attempt to model the relationships between biomolecules. To this effect, gene co-expression networks arise as a promising tool for deciphering the relationships between genes in large transcriptomic datasets. However, because of their low specificity and high false positive rate, they demonstrate a limited capacity to retrieve the disrupted mechanisms that lead to disease onset, progression, and maintenance. Within the context of statistical modeling, we dove deeper into the reconstruction of gene co-expression networks with the specific goal of discovering disease-specific features directly from expression data. Using ensemble techniques, which combine the results of various metrics, we were able to more precisely capture biologically significant relationships between genes. We were able to find de novo potential disease-specific features with the help of prior biological knowledge and the development of new network inference techniques.
Through our different approaches, we analyzed large gene sets across multiple samples and used gene expression as a surrogate marker for the inherent biological processes, reconstructing robust gene co-expression networks that are simple to explore. By mining disease-specific gene co-expression networks, we propose a useful framework for identifying new omics-phenotype associations from conditional expression datasets. In this sense, understanding diseases from the perspective of biological network perturbations will improve personalized medicine, impacting rational biomarker discovery, patient stratification and drug design, and ultimately leading to more targeted therapies.
Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática.
Discovering Causal Relations and Equations from Data
Physics is a field of science that has traditionally used the scientific
method to answer questions about why natural phenomena occur and to make
testable models that explain the phenomena. Discovering equations, laws and
principles that are invariant, robust and causal explanations of the world has
been fundamental in physical sciences throughout the centuries. Discoveries
emerge from observing the world and, when possible, performing interventional
studies in the system under study. With the advent of big data and the use of
data-driven methods, causal and equation discovery fields have grown and made
progress in computer science, physics, statistics, philosophy, and many applied
fields. All these domains are intertwined and can be used to discover causal
relations, physical laws, and equations from observational data. This paper
reviews the concepts, methods, and relevant works on causal and equation
discovery in the broad field of Physics and outlines the most important
challenges and promising future lines of research. We also provide a taxonomy
for observational causal and equation discovery, point out connections, and
showcase a complete set of case studies in Earth and climate sciences, fluid
dynamics and mechanics, and the neurosciences. This review demonstrates that
discovering fundamental laws and causal relations by observing natural
phenomena is being revolutionised with the efficient exploitation of
observational data, modern machine learning algorithms and the interaction with
domain knowledge. Exciting times are ahead with many challenges and
opportunities to improve our understanding of complex systems.
Comment: 137 pages
Leveraging audio-visual speech effectively via deep learning
The rising popularity of neural networks, combined with the recent proliferation of online audio-visual media, has led to a revolution in the way machines encode, recognize, and generate acoustic and visual speech. Despite the ubiquity of naturally paired audio-visual data, only a limited number of works have applied recent advances in deep learning to leverage the duality between audio and video within this domain. This thesis considers the use of neural networks to learn from large unlabelled datasets of audio-visual speech to enable new practical applications. We begin by training a visual speech encoder that predicts latent features extracted from the corresponding audio on a large unlabelled audio-visual corpus. We apply the trained visual encoder to improve performance on lip reading in real-world scenarios. Following this, we extend the idea of video learning from audio by training a model to synthesize raw speech directly from raw video, without the need for text transcriptions. Remarkably, we find that this framework is capable of reconstructing intelligible audio from videos of new, previously unseen speakers. We also experiment with a separate speech reconstruction framework, which leverages recent advances in sequence modeling and spectrogram inversion to improve the realism of the generated speech. We then apply our research in video-to-speech synthesis to advance the state-of-the-art in audio-visual speech enhancement, by proposing a new vocoder-based model that performs particularly well under extremely noisy scenarios. Lastly, we aim to fully realize the potential of paired audio-visual data by proposing two novel frameworks that leverage acoustic and visual speech to train two encoders that learn from each other simultaneously. We leverage these pre-trained encoders for deepfake detection, speech recognition, and lip reading, and find that they consistently yield improvements over training from scratch.
Harnessing Evolution in-Materio as an Unconventional Computing Resource
This thesis illustrates the use and development of physical conductive analogue systems for unconventional computing using the Evolution in-Materio (EiM) paradigm. EiM uses an Evolutionary Algorithm to configure and exploit a physical material (or medium) for computation. While EiM processors show promise, fundamental questions and scaling issues remain. Additionally, their development is hindered by slow manufacturing and physical experimentation. This work addressed these issues by implementing simulated models to speed up research efforts, followed by investigations of physically implemented novel in-materio devices.
Initial work leveraged simulated conductive networks as single-substrate "monolithic" EiM processors, performing classification by formulating the system as an optimisation problem solved using Differential Evolution. Different material properties and algorithm parameters were isolated and investigated, which explained the capabilities of configurable parameters and showed that the ideal nanomaterial choice depends on problem complexity. Subsequently, drawing from concepts in the wider Machine Learning field, several enhancements to monolithic EiM processors were proposed and investigated. These ensured more efficient use of training data, better classification decision boundary placement, an independently optimised readout layer, and a smoother search space. Finally, scalability and performance issues were addressed by constructing in-Materio Neural Networks (iM-NNs), where several EiM processors were stacked in parallel and operated as physical realisations of hidden-layer neurons. Greater flexibility in system implementation was achieved by re-using a single physical substrate recursively as several virtual neurons, but this sacrificed faster parallelised execution. These novel iM-NNs were first implemented using simulated in-materio neurons and trained for classification as Extreme Learning Machines, which were found to outperform artificial networks of a similar size. Physical iM-NNs were then implemented using a Raspberry Pi, a custom hardware interface, and Lambda-diode-based physical in-materio neurons, which were trained successfully with neuroevolution. A more complex AutoEncoder structure was then proposed and implemented physically to perform dimensionality reduction on a handwritten digits dataset, outperforming both Principal Component Analysis and artificial AutoEncoders.
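The optimisation step used here, Differential Evolution, can be sketched as a minimal DE/rand/1/bin loop. The objective below is a generic stand-in, not the thesis's material model: in EiM it would be the material's classification error under a vector of configuration voltages.

```python
import random

def sphere(x):
    # Stand-in objective; in EiM this would be the classification error
    # of the physical material under configuration parameters x.
    return sum(v * v for v in x)

def differential_evolution(obj, dim=5, pop_size=20, gens=100,
                           F=0.5, CR=0.9, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(gens):
        for i, target in enumerate(pop):
            # DE/rand/1: difference vector from three distinct other members.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            mutant = [a[k] + F * (b[k] - c[k]) for k in range(dim)]
            # Binomial crossover with one guaranteed mutant gene.
            jrand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < CR or k == jrand)
                     else target[k] for k in range(dim)]
            if obj(trial) <= obj(target):   # greedy one-to-one selection
                pop[i] = trial
    return min(pop, key=obj)

best = differential_evolution(sphere)
```

In an EiM processor, each call to `obj` is a physical (or simulated) experiment, which is why the thesis's simulated models matter: they make the thousands of evaluations in this loop cheap.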
This work presents an approach to exploit systems with interesting physical dynamics and leverage them as a computational resource. Such systems could become low-power, high-speed, unconventional computing assets in the future.
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions
As systems based on opaque Artificial Intelligence (AI) continue to flourish
in diverse real-world applications, understanding these black box models has
become paramount. In response, Explainable AI (XAI) has emerged as a field of
research with practical and ethical benefits across various domains. This paper
not only highlights the advancements in XAI and its application in real-world
scenarios but also addresses the ongoing challenges within XAI, emphasizing the
need for broader perspectives and collaborative efforts. We bring together
experts from diverse fields to identify open problems, striving to synchronize
research agendas and accelerate XAI in practical applications. By fostering
collaborative discussion and interdisciplinary cooperation, we aim to propel
XAI forward, contributing to its continued success. Our goal is to put forward
a comprehensive proposal for advancing XAI. To achieve this goal, we present a
manifesto of 27 open problems categorized into nine categories. These
challenges encapsulate the complexities and nuances of XAI and offer a road map
for future research. For each problem, we provide promising research directions
in the hope of harnessing the collective intelligence of interested
stakeholders.
Evolution from the ground up with Amee – From basic concepts to explorative modeling
Evolutionary theory has been the foundation of biological research for about a century
now, yet over the past few decades, new discoveries and theoretical advances have rapidly
transformed our understanding of the evolutionary process. Foremost among them are
evolutionary developmental biology, epigenetic inheritance, and various forms of
evolutionarily relevant phenotypic plasticity, as well as cultural evolution, which ultimately led
to the conceptualization of an extended evolutionary synthesis. Starting from abstract
principles rooted in complexity theory, this thesis aims to provide a unified conceptual
understanding of any kind of evolution, biological or otherwise. This is used in the second
part to develop Amee, an agent-based model that unifies development, niche construction,
and phenotypic plasticity with natural selection based on a simulated ecology. Amee
is implemented in Utopia, which allows performant, integrated implementation and
simulation of arbitrary agent-based models. A phenomenological overview of Amee's
capabilities is provided, ranging from the evolution of ecospecies down to the evolution
of metabolic networks and up to beyond-species-level biological organization, all of
which emerges autonomously from the basic dynamics. The interaction of development,
plasticity, and niche construction has been investigated, and it has been shown that while
expected natural phenomena can, in principle, arise, the accessible simulation time and
system size are too small to produce natural evo-devo phenomena and structures. Amee can thus be used to simulate the evolution of a wide variety of processes.
Build your own closed loop: Graph-based proof of concept in closed loop for autonomous networks
Next Generation Networks (NGNs) are expected to handle heterogeneous technologies, services, verticals and devices of increasing complexity. It is essential to devise an innovative approach to automatically and efficiently manage NGNs to deliver an adequate end-to-end Quality of Experience (QoE) while reducing operational expenses. An Autonomous Network (AN) using a closed loop can self-monitor, self-evaluate and self-heal, making it a potential solution for managing the NGN dynamically. This study describes the major results of building a closed-loop Proof of Concept (PoC) for various AN use cases organized by the International Telecommunication Union Focus Group on Autonomous Networks (ITU FG-AN). The scope of this PoC includes the representation of closed-loop use cases in a graph format, the development of evolution/exploration mechanisms to create new closed loops based on the graph representations, and the implementation of a reference orchestrator to demonstrate the parsing and validation of the closed loops. The main conclusions and future directions are summarized here, including observations and limitations of the PoC.
Understanding Optimisation Processes with Biologically-Inspired Visualisations
Evolutionary algorithms (EAs) constitute a branch of artificial intelligence used to evolve solutions to the optimisation problems that abound in industry and research. EAs often generate many solutions, and visualisation has been a primary strategy for displaying them, given that visualisation is a well-evaluated medium across many domains for comprehending extensive data. Visualising solutions inherently presents challenges arising from high-dimensional phenomena and the large number of solutions to display. Recently, scholars have produced methods to mitigate some of these known issues when illustrating solutions. However, one key consideration is that displaying only the final subset of solutions (rather than the whole population) discards most of the informativeness of the search, creating inadequate insight into the black-box EA. There is an unequivocal knowledge gap and a requirement for methods which can visualise the whole population of solutions from an optimiser and overcome the high-dimensionality and scaling issues, to make the EA search process interpretable. Furthermore, the evolutionary computing community has called for explainability in evolutionary computing, which could take the form of visualisations, to support EA comprehension much as explainable artificial intelligence has supported artificial intelligence. In this thesis, we report novel visualisation methods that can be used to visualise large and high-dimensional optimiser populations with the aim of creating greater interpretability during a search. We consider the nascent intersection of visualisation and explainability in evolutionary computing.
The potential high informativeness of a visualisation method from an early chapter of this work forms an effective platform to develop an explainability visualisation method, namely the population dynamics plot, to attempt to inject explainability into the inner workings of the search process. We further support the visualisation of populations using machine learning to construct models which can capture the characteristics of an EA search and develop intelligent visualisations which use artificial intelligence to potentially enhance and support visualisation for a more informative search process. The methods developed in this thesis are evaluated both quantitatively and qualitatively. We use multi-feature benchmark problems to show the method's ability to reveal specific problem characteristics such as disconnected fronts, local optima and bias, as well as potentially creating a better understanding of the problem landscape and optimiser search for evaluating and comparing algorithm performance (we show the visualisation method to be more insightful than conventional metrics like hypervolume alone). One of the most insightful methods developed in this thesis can produce a visualisation requiring less than 1% of the time and memory necessary to produce a visualisation of the same objective space solutions using existing methods. This allows for greater scalability and use in short compile-time applications such as online visualisations.
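The premise of visualising the whole population over a search, rather than only the final solutions, amounts to logging every individual at every generation. A minimal sketch follows; the `PopulationLog` class and the toy elitist EA are hypothetical illustrations, not the thesis's population dynamics plot implementation.

```python
import random

class PopulationLog:
    """Records every individual at every generation, so a population
    dynamics plot can later show how the search moved through the
    space, not just where it ended."""
    def __init__(self):
        self.history = []  # one list of (individual, fitness) per generation

    def record(self, population, fitness):
        self.history.append([(ind[:], fitness(ind)) for ind in population])

def fitness(ind):
    return -sum(v * v for v in ind)  # maximise: peak at the origin

rng = random.Random(0)
log = PopulationLog()
pop = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(10)]
for gen in range(20):
    log.record(pop, fitness)
    pop.sort(key=fitness, reverse=True)
    # Elitism plus Gaussian mutation of the elites.
    pop = pop[:5] + [[v + rng.gauss(0, 0.1) for v in p] for p in pop[:5]]

# log.history now holds 20 generations x 10 individuals: the raw
# material a population dynamics plot would draw, coloured by generation.
```

The point of the log is exactly the gap described above: the final `pop` alone cannot show lineage, stagnation, or the path taken through the search space, while `log.history` can.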
Building on an existing visualisation method from this thesis, we then develop and apply an explainability method to a real-world problem and evaluate it, showing the method to be highly effective at explaining the search via solutions in the objective spaces, solution lineage and solution variation operators, enabling compact comprehension, evaluation and communication of an optimiser's search; we note, however, that the explainability properties are only evaluated against the author's ability and could be evaluated further in future work with a usability study. The work is then supported by the development of intelligent visualisation models that may allow one to predict solutions in optima (importantly, local optima) in unseen problems by using a machine learning model. The results are effective, with some models able to predict and visualise solution optima with a balanced F1 accuracy metric of 96%. The results of this thesis provide a suite of visualisations which aims to provide greater informativeness of the search and greater scalability than the previously existing literature. The work develops one of the first explainability methods aiming to create greater insight into the search space, solution lineage and reproductive operators. The work applies machine learning to potentially enhance EA understanding via visualisation. These models could also be used for a number of applications outside visualisation. Ultimately, the work provides novel methods for all EA stakeholders which aim to support the understanding, evaluation and communication of EA processes with visualisation.
Target evaluation and low-thrust trajectory planning for near-Earth asteroid mining
Near-Earth Asteroids (NEAs) are abundant with minerals that would be undoubtedly beneficial for future space exploration, as the utilization of these in-space resources could enable otherwise unaffordable missions.
This thesis aims to address several remaining issues in asteroid mining mission planning, including target selection and ranking, multi-return low thrust trajectory design, NEA mining season determination, asteroid mining campaign designs, and other considerations.
This study starts with a comprehensive asteroid resource investigation and an impulsive round-trip accessibility analysis for the 29,266 known NEAs, 46% of which are found to be accessible. By combining the two studies, a NEA resource map is created, providing key knowledge of resource locations, types, reserves, and minimum delta-V requirements to retrieve the resources.
Mining missions are then preliminarily constructed using impulsive trajectories for 13,481 NEAs, and a series of Figures of Merit (FoMs) are proposed. In total, over 900 accessible and known targets for mining water, Platinum Group Metals (PGMs) and silicates are ranked.
Low-thrust mining missions are then studied. New Deep Neural Network (DNN) based models are constructed as surrogates for the conventional optimization process. The new method reduces low-thrust trajectory design time by 99.94%. Typical Solar Electric Propulsion (SEP) spacecraft configurations are used to design trajectories for supply delivery and resource transportation. The transportation capabilities of different spacecraft configurations are quantified.
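The surrogate idea, replacing an expensive trajectory optimisation with a fast learned approximation that is trained once and queried many times, can be shown in miniature. This is a toy 1-D stand-in, not the thesis's DNN models: the "expensive" solver, the training grid, and all names are hypothetical.

```python
import bisect
import math

def expensive_delta_v(t):
    # Stand-in for a costly low-thrust trajectory optimisation:
    # delta-V as a (made-up) smooth function of departure epoch t.
    return 3.0 + 2.0 * math.sin(t) + 0.5 * math.cos(3 * t)

# "Training": evaluate the expensive model once on a coarse grid.
xs = [i * 0.05 for i in range(201)]           # epochs 0.0 .. 10.0
ys = [expensive_delta_v(x) for x in xs]

def surrogate_delta_v(t):
    # Cheap piecewise-linear surrogate queried instead of the optimiser;
    # a DNN surrogate plays the same role with far higher capacity.
    i = min(max(bisect.bisect_right(xs, t) - 1, 0), len(xs) - 2)
    w = (t - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] * (1 - w) + ys[i + 1] * w

# Worst-case surrogate error over a fine sweep of query epochs.
err = max(abs(surrogate_delta_v(t / 100) - expensive_delta_v(t / 100))
          for t in range(1000))
```

The economics are the same as in the thesis: the expensive model is evaluated only on the training grid, after which every campaign-design query hits the cheap surrogate, which is what makes the reported 99.94% design-time reduction plausible at scale.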
An asteroid mining campaign design framework is proposed, which integrates all the developed models, algorithms, and asteroid data. An example mining campaign on Bennu is presented, and an economic analysis is performed. The sensitivity analysis shows that low-thrust mining missions are more robust to changing economic parameters. Campaigns are then numerically designed and optimized for 76 known water-bearing and 58 potential PGM-bearing targets, using both impulsive and low-thrust trajectories. The "NEA mining season", previously an abstract concept, is validated. The mining seasons are categorized into three major types based on their feasibility for mining. Two 35-year water mining and PGM mining plans are generated. It is found that the currently known targets can form a 13,000 billion water mining industry. It is found that low-thrust-based mining is the key to a successful mining campaign, and that it may increase profit by 2.8 to 8.7 times.