
    Faster inference from state space models via GPU computing

    Inexpensive Graphics Processing Units (GPUs) offer the potential to greatly speed up computation by employing their massively parallel architecture to perform arithmetic operations more efficiently. Population dynamics models are important tools in ecology and conservation. Modern Bayesian approaches allow biologically realistic models to be constructed and fitted to multiple data sources in an integrated modelling framework based on a class of statistical models called state space models. However, model fitting is often slow, requiring hours to weeks of computation. We demonstrate the benefits of GPU computing using a model for the population dynamics of British grey seals, fitted with a particle Markov chain Monte Carlo algorithm. Speed-ups of two orders of magnitude were obtained for estimates of the log-likelihood, compared to a traditional ‘CPU-only’ implementation, allowing an accurate method of inference to be used where it was previously too computationally expensive to be viable. GPU computing has enormous potential, but one barrier to further adoption is a steep learning curve, due to GPUs' unique hardware architecture. We provide a detailed description of hardware and software setup, and our case study provides a template for other similar applications. We also provide a detailed tutorial-style description of GPU hardware architectures, and examples of important GPU-specific programming practices. Funding: C.F.-J. is funded via a doctoral scholarship from the University of St Andrews, School of Mathematics and Statistics. Publisher PDF. Peer reviewed.
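    As a rough illustration of the log-likelihood estimation that a particle Markov chain Monte Carlo algorithm relies on, the following is a minimal sketch of a bootstrap particle filter. It is not the grey seal model or the authors' implementation: the toy linear Gaussian state space model, the function names (`simulate`, `particle_loglik`) and all parameter values are assumptions chosen for illustration, and the code runs on the CPU using only the Python standard library. The propagation and weighting steps treat each particle independently, which is the kind of work a massively parallel architecture can spread across many threads.

```python
import math
import random


def normal_logpdf(y, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def simulate(n_steps, a, q, r, rng):
    """Simulate observations from the assumed toy model:
    x_t = a * x_{t-1} + N(0, q^2),  y_t = x_t + N(0, r^2),  x_0 = 0."""
    x, ys = 0.0, []
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, q)
        ys.append(x + rng.gauss(0.0, r))
    return ys


def particle_loglik(ys, a, q, r, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y_1, ..., y_T | a, q, r)."""
    particles = [0.0] * n_particles
    loglik = 0.0
    for y in ys:
        # Propagate each particle through the state equation (independent per particle).
        particles = [a * x + rng.gauss(0.0, q) for x in particles]
        # Weight each particle by the observation density (also independent per particle).
        logw = [normal_logpdf(y, x, r) for x in particles]
        # Subtract the maximum log-weight before exponentiating to avoid underflow.
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        # The log of the mean weight is this time step's likelihood increment.
        loglik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik


rng = random.Random(1)
ys = simulate(50, a=0.9, q=0.5, r=0.5, rng=rng)
ll_true = particle_loglik(ys, 0.9, 0.5, 0.5, n_particles=500, rng=rng)
ll_wrong = particle_loglik(ys, -0.9, 0.5, 0.5, n_particles=500, rng=rng)
print(ll_true, ll_wrong)
```

    In a particle MCMC sampler, an estimate like `ll_true` would stand in for the exact log-likelihood inside the acceptance ratio, so the filter is rerun for every proposed parameter value; this repetition is why the cost of a single estimate matters so much.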

    Computational and experimental studies on the reaction mechanism of bio-oil components with additives for increased stability and fuel quality

    As one of the world’s largest palm oil producers, Malaysia faces a major disposal problem because vast amounts of oil palm biomass waste are produced. To overcome this problem, these biomass wastes can be liquefied into biofuel with fast pyrolysis technology. However, further upgrading of fast pyrolysis bio-oil via direct solvent addition is required to overcome its undesirable attributes. In addition, the high production cost of biofuels often hinders their commercialisation. Thus, the designed solvent-oil blend needs to achieve both fuel functionality and economic targets to be competitive with conventional diesel fuel. In this thesis, a multi-stage computer-aided molecular design (CAMD) framework was employed for bio-oil solvent design. In the design problem, molecular signature descriptors were applied to accommodate different classes of property prediction models. However, the complexity of the CAMD problem increases as the height of the signature increases, due to the combinatorial nature of higher-order signatures. Thus, a consistency rule was developed to reduce the size of the CAMD problem. The CAMD problem was then further extended to address the economic aspects via a fuzzy multi-objective optimisation approach. Next, a rough-set based machine learning (RSML) model was proposed to correlate the feedstock characterisation and pyrolysis conditions with the pyrolysis bio-oil properties by generating decision rules. The generated decision rules were analysed from a scientific standpoint to identify the underlying patterns, while ensuring the rules were logical. The decision rules generated can be used to select the optimal feedstock composition and pyrolysis conditions to produce pyrolysis bio-oil with targeted fuel properties. Next, the results obtained from the computational approaches were verified through experimental study. The generated pyrolysis bio-oils were blended with the identified solvents at various mixing ratios. In addition, emulsification of the solvent-oil blend in diesel was conducted with the help of surfactants. Lastly, potential extensions and prospective work for this study are discussed in the later part of this thesis. To conclude, this thesis presents a combination of computational and experimental approaches for upgrading the fuel properties of pyrolysis bio-oil. As a result, high-quality biofuel can be generated as a cleaner-burning replacement for conventional diesel fuel.

    Data-assisted modeling of complex chemical and biological systems

    Complex systems are abundant in chemistry and biology; they can be multiscale, possibly high-dimensional or stochastic, with nonlinear dynamics and interacting components. It is often nontrivial (and sometimes impossible) to determine and study the macroscopic quantities of interest and the equations they obey. One can only (judiciously or randomly) probe the system, gather observations and study trends. In this thesis, Machine Learning is used as a complement to traditional modeling and numerical methods to enable data-assisted (or data-driven) dynamical systems. As case studies, three complex systems are sourced from diverse fields. The first is a high-dimensional computational neuroscience model of the Suprachiasmatic Nucleus of the human brain, where bifurcation analysis is performed by simply probing the system. Then, manifold learning is employed to discover a latent space of neuronal heterogeneity. Second, Machine Learning surrogate models are used to optimize dynamically operated catalytic reactors. An algorithmic pipeline is presented through which it is possible to program catalysts with active learning. Third, Machine Learning is employed to extract laws of Partial Differential Equations describing bacterial Chemotaxis. It is demonstrated how Machine Learning manages to capture the rules of bacterial motility at the macroscopic level, starting from diverse data sources (including real-world experimental data). More importantly, a framework is constructed through which already existing, partial knowledge of the system can be exploited. These applications showcase how Machine Learning can be used synergistically with traditional simulations in different scenarios: (i) equations are available but the overall system is so high-dimensional that efficiency and explainability suffer, (ii) equations are available but lead to highly nonlinear black-box responses, (iii) only data are available (of varying source and quality) and equations need to be discovered. For such data-assisted dynamical systems, we can perform fundamental tasks, such as integration, steady-state location, continuation and optimization. This work aims to unify traditional scientific computing and Machine Learning, in an efficient, data-economical, generalizable way, where both the physical system and the algorithm matter.

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state of the art on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Study of the Soil Water Movement in Irrigated Agriculture

    In irrigated agriculture, it is necessary to study the various ways in which water infiltrates the soil. In this respect, soil hydraulic properties, such as the soil moisture retention curve, diffusivity, and hydraulic conductivity functions, play a crucial role, as they control the infiltration process and the movement of soil water and solutes. This Special Issue presents recent developments in the various aspects of soil water movement in irrigated agriculture through a number of research topics that tackle one or more of the following challenges: irrigation systems and one-, two-, and three-dimensional soil water movement; one-, two-, and three-dimensional infiltration analysis from a disc infiltrometer; dielectric devices for monitoring soil water content and methods for assessment of soil water pressure head; soil hydraulic properties and their temporal and spatial variability under irrigation; saturated–unsaturated flow models in irrigated soils; soil water redistribution and the role of hysteresis; soil water movement and drainage in irrigated agriculture; salt accumulation, soil salinization, and soil salinity assessment; effect of salts on hydraulic conductivity; and soil conditioners and mulches that change the upper soil hydraulic properties and their effect on soil water movement.

    Insect neuroethology of reinforcement learning

    Historically, reinforcement learning is a branch of machine learning founded on observations of how animals learn. This involved collaboration between the fields of biology and artificial intelligence that was beneficial to both fields, creating smarter artificial agents and improving the understanding of how biological systems function. The evolution of reinforcement learning during the past few years was rapid but substantially diverged from providing insights into how biological systems work, opening a gap between reinforcement learning and biology. In an attempt to close this gap, this thesis studied the insect neuroethology of reinforcement learning, that is, the neural circuits that underlie reinforcement-learning-related behaviours in insects. The goal was to extract a biologically plausible plasticity function from insect-neuronal data, use this to explain biological findings and compare it to more standard reinforcement learning models. Consequently, a novel dopaminergic plasticity rule was developed to approximate the function of dopamine as the plasticity mechanism between neurons in the insect brain. This allowed a range of observed learning phenomena to happen in parallel, like memory depression, potentiation, recovery, and saturation. In addition, by using anatomical data of connections between neurons in the mushroom body neuropils of the insect brain, the neural incentive circuit of dopaminergic and output neurons was also explored. This, together with the dopaminergic plasticity rule, allowed for dynamic collaboration amongst parallel memory functions, such as acquisition, transfer, and forgetting. When tested on olfactory conditioning paradigms, the model reproduced the observed changes in the activity of the identified neurons in fruit flies. It also replicated the observed behaviour of the animals and it allowed for flexible behavioural control. 
    Inspired by the visual navigation system of desert ants, the model was further challenged in a visual place recognition task. Although a relatively simple encoding of the olfactory information was sufficient to explain odour learning, a more sophisticated encoding of the visual input was required to increase the separability among the visual inputs and enable visual place recognition. Signal whitening and sparse combinatorial encoding were sufficient to boost the performance of the system in this task. The incentive circuit enabled the encoding of increasing familiarity along a known route, which dropped proportionally to the distance of the animal from that route. Finally, the proposed model was challenged in delayed reinforcement tasks, suggesting that it might take the role of an adaptive critic in the context of reinforcement learning.

    Computational approaches to Explainable Artificial Intelligence: Advances in theory, applications and trends

    Deep Learning (DL), a groundbreaking branch of Machine Learning (ML), has emerged as a driving force in both theoretical and applied Artificial Intelligence (AI). DL algorithms, rooted in complex and non-linear artificial neural systems, excel at extracting high-level features from data. DL has demonstrated human-level performance in real-world tasks, including clinical diagnostics, and has unlocked solutions to previously intractable problems in virtual agent design, robotics, genomics, neuroimaging, computer vision, and industrial automation. In this paper, the most relevant advances in AI from the last few years, together with several applications to neuroscience, neuroimaging, computer vision, and robotics, are presented, reviewed and discussed. In this way, we summarize the state of the art in AI methods, models and applications within a collection of works presented at the 9th International Conference on the Interplay between Natural and Artificial Computation (IWINAC). The works presented in this paper are excellent examples of new scientific discoveries made in laboratories that have successfully transitioned to real-life applications.

    Statistical analysis and modelling of proteomic and genetic network data illuminate hidden roles of proteins and their connections

    While many stable protein complexes are known, the dynamic interactome is still underexplored. Experimental techniques such as single-tag affinity purification aim to close the gap and identify transient interactions, but need better filtering tools to discriminate between true interactors and noise. This thesis develops and contrasts two complementary approaches to the analysis of protein-protein interaction (PPI) networks derived from noisy experiments. The majority of the data used for the analysis come from in vitro experiments aggregated from known databases (IntAct, BioGRID, BioPlex), complemented by experiments done by our collaborators from the Ueffing group at the Institute of Ophthalmic Research, Tübingen University (Germany). Chapter 3 presents the statistical approach to the data analysis. It focuses on the case of a single dataset with target and control data in order to determine the significant interactions for the target. The procedure follows an expected trajectory of preprocessing, quality control, statistical testing, correction and discussion of results. The approach is tailored to the specific dataset, experiment design and related assumptions. This is specifically relevant for missing value imputation, where multiple approaches are discussed and a new method, building upon a previous method, is proposed and validated. Chapter 4 presents a different approach for the filtering of experimental results for PPIs. It introduces a statistic, WeSA (weighted socio-affinity), which improves upon previous methods of scoring PPIs from affinity proteomics data. It uses network analysis techniques to analyse the full PPI network without the need for controls. WeSA is tested on protein-protein networks of varying accuracy, including the curated IntAct dataset, the unfiltered records in BioGRID, and the large BioPlex dataset. The model is also compared against the previous method designed for the same goal. While the function itself proves superior, another major advantage is that it can efficiently combine and compare observations across studies and can therefore be used to aggregate and clean results from incoming experiments in the context of all already available data. In the final part, uses of WeSA beyond wild-type PPI networks are analysed. The framework is proposed as a novel way to effectively compare mechanistic differences between variants of the same protein (e.g. mutant vs wild type). I also explore the use of WeSA to study other biological and non-biological networks such as genome-wide association studies (GWAS) and gene-phenotype associations, with encouraging results. In conclusion, this work presents and compares a variety of mathematical, statistical and computational approaches adapted, combined and/or developed specifically for the task of obtaining a better overview of protein-protein interaction networks. The performance of the novel methods is validated and WeSA, specifically, is extensively tested and analysed, including beyond the field of PPI networks.

    Single molecule MATAC-seq reveals key determinants for DNA replication efficiency

    The stochastic nature of origin activation results in significant variability in the way genome replication is carried out from cell to cell. The reason for the diversity in efficiency and timing of individual origins has remained an unresolved issue for a long time. Cell-to-cell variability has been demonstrated to play a crucial role in cellular plasticity and cancer in mammalian cells. Although population-based methods have provided valuable insight into biological processes, it is necessary to use single molecule techniques to uncover events that are hidden by the population average. Many biological processes, such as DNA replication, transcription, and gene expression, are closely linked to the local chromatin structure. In yeast, although DNA replication origins have conserved DNA sequences, they display remarkable differences in timing and efficiency. Some origins initiate replication earlier during S-phase or more frequently than others, resulting in a high degree of heterogeneity among the cells in a population, with no two cells having the exact same replication profile. Our hypothesis is that the local nucleosomal structure may affect the DNA replication profile of individual origins. To explore this relationship, we have developed Methylation Accessibility of Targeted Chromatin domain Sequencing (MATAC-Seq) to determine single-molecule chromatin accessibility maps of specific genomic locations after targeted purification in their native chromatin context. Our analysis of selected early-efficient (EE) and late-inefficient (LI) replication origins in Saccharomyces cerevisiae using MATAC-Seq revealed significant cell-to-cell heterogeneity in their chromatin states. Disrupting the INO80 or ISW2 chromatin remodeling complexes led to changes at individual nucleosomal positions that correspond to changes in replication efficiency. 
Our results show that a chromatin state with a narrow size of accessible origin DNA in combination with well-positioned surrounding nucleosomes and an open +2 linker region was a strong predictor for efficient origin activation. MATAC-Seq provides a single-molecule assay for chromatin accessibility that reveals the large spectrum of alternative chromatin states that coexist at a given locus, which was previously masked in population-based experiments. This provides a mechanistic basis for origin activation heterogeneity that occurs during DNA replication in eukaryotic cells. As a result, our single-molecule assay for chromatin accessibility will be ideal for defining single-molecule heterogeneity across many biological processes, such as transcription, replication, or DNA repair in vitro and ex vivo.