1,181 research outputs found

    Symbolic Regression in Materials Science: Discovering Interatomic Potentials from Data

    Particle-based modeling of materials at the atomic scale plays an important role in the development of new materials and the understanding of their properties. The accuracy of particle simulations is determined by interatomic potentials, which allow the potential energy of an atomic system to be calculated as a function of atomic coordinates and potentially other properties. First-principles-based ab initio potentials can reach arbitrary levels of accuracy; however, their applicability is limited by their high computational cost. Machine learning (ML) has recently emerged as an effective way to offset the high computational costs of ab initio atomic potentials by replacing expensive models with highly efficient surrogates trained on electronic structure data. Among a plethora of current methods, symbolic regression (SR) is gaining traction as a powerful "white-box" approach for discovering functional forms of interatomic potentials. This contribution discusses the role of symbolic regression in Materials Science (MS) and offers a comprehensive overview of current methodological challenges and state-of-the-art results. A genetic programming-based approach for modeling atomic potentials from raw data (consisting of snapshots of atomic positions and associated potential energy) is presented and empirically validated on ab initio electronic structure data. Comment: Submitted to the GPTP XIX Workshop, June 2-4 2022, University of Michigan, Ann Arbor, Michigan.
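
To make the phrase "potential energy of an atomic system as a function of atomic coordinates" concrete, here is a minimal sketch of a closed-form pair potential of the kind a symbolic-regression search might explore. The Lennard-Jones form, the function name `lennard_jones_energy`, and the parameters `epsilon` and `sigma` are illustrative assumptions; they are not the potential or the code presented in the paper.

```python
import itertools
import math


def lennard_jones_energy(positions, epsilon=1.0, sigma=1.0):
    """Total energy of a snapshot as a sum of pairwise terms.

    `positions` is a list of (x, y, z) tuples. The Lennard-Jones form is a
    textbook stand-in (an assumption), not a potential taken from the paper.
    """
    energy = 0.0
    # Visit every unordered pair of atoms exactly once.
    for a, b in itertools.combinations(positions, 2):
        r = math.dist(a, b)              # interatomic distance
        sr6 = (sigma / r) ** 6           # attractive (sigma/r)^6 term
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


# One snapshot: two atoms separated by the pair-energy minimum distance.
r_min = 2.0 ** (1.0 / 6.0)
snapshot = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
print(lennard_jones_energy(snapshot))
```

A data-driven approach would fit or discover such an expression from many (snapshot, energy) pairs instead of fixing its form in advance.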

    New Trends in Artificial Intelligence: Applications of Particle Swarm Optimization in Biomedical Problems

    Optimization is the process of finding the most effective element or solution from a set of all possible resources or solutions. Many biological problems, ranging from biomolecule structure prediction to drug discovery, can benefit from adopting a standard optimization protocol. Particle swarm optimization (PSO), proposed by Dr. Eberhart and Dr. Kennedy in 1995, is a population-based stochastic optimization technique. The researchers designed the method after being inspired by the social behavior of flocking birds and schooling fish. It shares numerous similarities with evolutionary computation procedures such as genetic algorithms (GA). Because PSO is easy to implement and requires adjusting only a few parameters, it has gained more attention than other population-based algorithms. Hence, PSO is widely used in research fields ranging from artificial neural network training to other areas where GA can be applied.
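
The population-based update that this abstract describes can be sketched in a few lines. The version below is a common global-best variant with an inertia weight, which is a later refinement and not necessarily the 1995 formulation. The function name `pso_minimize` and all default parameter values are assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_social=1.5, seed=0):
    """Minimize `objective` over a box given as a list of (low, high) pairs.

    Global-best PSO with an inertia weight; the defaults are illustrative
    assumptions, not values taken from the abstract.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # per-particle best position
    best_val = [objective(p) for p in pos]         # per-particle best value
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move, then clamp the coordinate back into the box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_f)
```

Only the inertia weight and the two acceleration coefficients need tuning, which is the "few parameters" property the abstract credits for the method's popularity.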

    First-principles calculation of DNA looping in tethered particle experiments

    We calculate the probability of DNA loop formation mediated by regulatory proteins such as Lac repressor (LacI), using a mathematical model of DNA elasticity. Our model is adapted to calculating quantities directly observable in Tethered Particle Motion (TPM) experiments, and it accounts for all the entropic forces present in such experiments. Our model has no free parameters; it characterizes DNA elasticity using information obtained in other kinds of experiments. [...] We show how to compute both the "looping J factor" (or equivalently, the looping free energy) for various DNA construct geometries and LacI concentrations, as well as the detailed probability density function of bead excursions. We also show how to extract the same quantities from recent experimental data on tethered particle motion, and then compare to our model's predictions. [...] Our model successfully reproduces the detailed distributions of bead excursion, including their surprising three-peak structure, without any fit parameters and without invoking any alternative conformation of the LacI tetramer. Indeed, the model qualitatively reproduces the observed dependence of these distributions on tether length (e.g., phasing) and on LacI concentration (titration). However, for short DNA loops (around 95 basepairs) the experiments show more looping than is predicted by the harmonic-elasticity model, echoing other recent experimental results. Because the experiments we study are done in vitro, this anomalously high looping cannot be rationalized as resulting from the presence of DNA-bending proteins or other cellular machinery. We also show that it is unlikely to be the result of a hypothetical "open" conformation of the LacI tetramer. Comment: See the supplement at http://www.physics.upenn.edu/~pcn/Ms/TowlesEtalSuppl.pdf . This revised version accepted for publication at Physical Biology.

    Development of novel orthogonal genetic circuits, based on extracytoplasmic function (ECF) σ factors

    The synthetic biology field aims to apply the engineering 'design-build-test-learn' cycle for the implementation of synthetic genetic circuits modifying the behavior of biological systems. In order to reach this goal, synthetic biology projects use a set of fully characterized biological parts that subsequently are assembled into complex synthetic circuits following a rational, model-driven design. However, even though the bottom-up design approach represents an optimal starting point to assay the behavior of the synthetic circuits under defined conditions, the rational design of such circuits is often restricted by the limited number of available DNA building blocks. These usually consist only of a handful of transcriptional regulators that additionally are often borrowed from natural biological systems. This, in turn, can lead to cross-reactions between the synthetic circuit and the host cell and eventually to loss of the original circuit function. Thus, one of the challenges in synthetic biology is to design synthetic circuits that perform the designated functions with minor cross-reactions (orthogonality). To overcome the restrictions of the widely used transcriptional regulators, this project aims to apply extracytoplasmic function (ECF) σ factors in the design of novel orthogonal synthetic circuits. ECFs are the smallest and simplest alternative σ factors that recognize highly specific promoters. ECFs represent one of the most important mechanisms of signal transduction in bacteria; indeed, their activity is often controlled by anti-σ factors. Even though it was shown that the overexpression of heterologous anti-σ factors can generate an adverse effect on cell growth, they represent an attractive solution to control ECF activity. Finally, to date, we know thousands of ECF σ factors, widespread among different bacterial phyla, that are identifiable together with the cognate promoters and anti-σ factors, using bioinformatic approaches.
All the above-mentioned features make ECF σ factors optimal candidates as core orthogonal regulators for the design of novel synthetic circuits. In this project, in order to establish ECF σ factors as standard building blocks in the synthetic biology field, we first established a high throughput experimental setup. This relies on microplate reader experiments performed using a highly sensitive luminescent reporter system. Luminescent reporters have a superior signal-to-noise ratio when compared to fluorescent reporters since they do not suffer from the high auto-fluorescence background of the bacterial cell. However, they also have a drawback represented by the constant light emission that can generate undesired cross-talk between neighboring wells on a microplate. To overcome this limitation, we developed a computational algorithm that corrects for luminescence bleed-through and estimates the “true” luminescence activity for each well of a microplate. We show that the correcting algorithm preserves low-level signals close to the background and that it is universally applicable to different experimental conditions. In order to simplify the assembly of large ECF-based synthetic circuits, we designed an ECF toolbox in E. coli. The toolbox allows for the combinatorial assembly of circuits into expression vectors, using a library of reusable genetic parts. Moreover, it also offers the possibility of integrating the newly generated synthetic circuits into four different phage attachment (att) sites present in the genome of E. coli. This allows for a flawless transition between plasmid-encoded and chromosomally integrated genetic circuits, expanding the possible genetic configurations of a given synthetic construct. Moreover, our results demonstrate that the four att sites are orthogonal in terms of the gene expression levels of the synthetic circuits. 
To rationally design ECF-based synthetic circuits, and taking advantage of the ECF toolbox, we characterized the dynamic behavior of a set of 15 ECF σ factors, their cognate promoters, and relative anti-σs. Overall, we found that ECFs are non-toxic and functional and that they display different binding affinities for the cognate target promoters. Moreover, our results show that it is possible to optimize the output dynamic range of the ECF-based switches by changing the copy number of the ECFs and target promoters, thus tuning the input/output signal ratio. Next, by combining up to three ECF-switches, we generated a set of “genetic-timer circuits”, the first synthetic circuits harboring more than one ECF. ECF-based timer circuits sequentially activate a series of target genes with increasing time delays; moreover, the behavior of the circuits can be predicted by a set of mathematical models. In order to improve the dynamic response of the ECF-based constructs, we introduced anti-σ factors in our synthetic circuits. By doing so we first confirmed that anti-σ factors can exert an adverse effect on the growth of E. coli; thus, we explored possible solutions. Our results demonstrate that anti-σ factor toxicity can be partially alleviated by generating truncated, soluble variants of the anti-σ factors and, eventually, completely abolished via chromosomal integration of the anti-σ factor-based circuits. Finally, after demonstrating that anti-σ factors can be used to generate a tunable time delay between ECF expression and target promoter activation, we designed ECF/AS-suicide circuits. Such circuits allow for the time-delayed cell-death of E. coli and will serve as a prototype for the further development of ECF/AS-based lysis circuits.
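
The luminescence bleed-through correction mentioned in this abstract is not specified in detail here, so the following is only a sketch under a strong assumption: each well's reading equals its true signal plus a fixed fraction of the true signal of its four directly adjacent wells. The names `simulate_bleed_through` and `correct_bleed_through`, the `fraction` value, and the neighbourhood model are all hypothetical and are not the authors' algorithm.

```python
def _neighbours(r, c, rows, cols):
    """Yield the in-bounds wells directly above, below, left and right."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc


def simulate_bleed_through(true_signal, fraction=0.01):
    """Forward model (assumed): reading = own signal + fraction * neighbours."""
    rows, cols = len(true_signal), len(true_signal[0])
    return [[true_signal[r][c]
             + fraction * sum(true_signal[rr][cc]
                              for rr, cc in _neighbours(r, c, rows, cols))
             for c in range(cols)] for r in range(rows)]


def correct_bleed_through(observed, fraction=0.01, n_iters=50):
    """Invert the assumed forward model by fixed-point (Jacobi) iteration.

    Each pass re-estimates every well as its reading minus the leak predicted
    from the previous estimate of its neighbours. The iteration converges
    when 4 * fraction < 1.
    """
    rows, cols = len(observed), len(observed[0])
    estimate = [row[:] for row in observed]
    for _ in range(n_iters):
        estimate = [[observed[r][c]
                     - fraction * sum(estimate[rr][cc]
                                      for rr, cc in _neighbours(r, c, rows, cols))
                     for c in range(cols)] for r in range(rows)]
    return estimate


# A bright well next to dark wells and one weak signal near background.
plate = [[0.0, 0.0, 0.0],
         [0.0, 1.0e6, 0.0],
         [0.0, 0.0, 50.0]]
readings = simulate_bleed_through(plate)
print(correct_bleed_through(readings))
```

In this toy model the weak 50-unit well is recovered even though its bright neighbourhood inflates the dark wells around it, which mirrors the stated goal of preserving low-level signals close to background.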

    Computing with bacterial constituents, cells and populations: from bioputing to bactoputing

    The relevance of biological materials and processes to computing, alias bioputing, has been explored for decades. These materials include DNA, RNA and proteins, while the processes include transcription, translation, signal transduction and regulation. Recently, the use of bacteria themselves as living computers has been explored but this use generally falls within the classical paradigm of computing. Computer scientists, however, have a variety of problems to which they seek solutions, while microbiologists are having new insights into the problems bacteria are solving and how they are solving them. Here, we envisage that bacteria might be used for new sorts of computing. These could be based on the capacity of bacteria to grow, move and adapt to a myriad different fickle environments both as individuals and as populations of bacteria plus bacteriophage. New principles might be based on the way that bacteria explore phenotype space via hyperstructure dynamics and the fundamental nature of the cell cycle. This computing might even extend to developing a high level language appropriate to using populations of bacteria and bacteriophage. Here, we offer a speculative tour of what we term bactoputing, namely the use of the natural behaviour of bacteria for calculating.

    Targeted Proteomics in Characterizing Iron-Deprived Cyanobacteria: Insights Into The Regulation of Iron-Sulfur Cluster Biogenesis

    Cyanobacteria comprise a diverse group of widely distributed gram‐negative bacteria, which have the unique capacity amongst prokaryotes to perform oxygenic photosynthesis. Acclimation to iron deprivation, a key theme in this Thesis, involves several metabolic changes, which are closely interlinked with the high demand for iron cofactors in the photosynthetic electron transfer chains of these organisms. In order to gain in‐depth information on the protein-level changes occurring under iron deprivation, a targeted MS‐based proteomics method, selected reaction monitoring (SRM), was developed for the cyanobacterial model species Synechocystis sp. PCC 6803. Altogether 106 proteins were selected as SRM‐target proteins, representing various key metabolic pathways and possible metabolic nodes linked with iron‐dependent enzymatic reactions. The SRM analysis resulted in a high‐quality dataset, which verified several responses to iron deprivation, such as those related to remodeling of the photosynthetic apparatus and induction of several iron acquisition proteins. New information was obtained, for example, on the elevated levels of proteins acting as electron sinks to alleviate the overreduction of the photosynthetic electron transfer chain caused by the increase in the Photosystem II to Photosystem I ratio under iron deprivation. As a demonstration of the sensitivity of the system, 64 of the quantified target proteins had not been detected in earlier discovery‐based proteomics assays. The validated SRM method was subsequently applied to study the regulation of iron‐sulfur biogenesis in Synechocystis sufR and isaR1 mutant strains. SufR has been characterized as a transcriptional repressor of the sufBCDS operon, which is responsible for Fe‐S cluster biogenesis in cyanobacteria. Deletion of sufR caused drastic induction of the sufBCDS operon proteins under Fe‐sufficiency, while the proteins carrying Fe‐S cofactors were downregulated.
Under extended Fe‐depletion, increased expression of the apo‐form Fe‐S proteins was observed in comparison to the wild type strain under the same conditions. IsaR1, a small regulatory RNA, was found to be responsible for the repression of several genes for Fe and Fe‐S proteins as well as tetrapyrrole biosynthesis genes under Fe‐deprivation. Hence, IsaR1 affects the photosynthetic apparatus both directly and indirectly upon acclimation to iron deprivation. Importantly, both SufR (transcriptional repressor protein) and IsaR1 (small regulatory RNA) were identified to repress the sufBCDS operon; SufR under Fe‐sufficiency and IsaR1 under Fe‐deprivation. Such a complex mixed regulatory circuit highlights the importance of tight control over the biogenesis of Fe‐S clusters as part of the metabolic acclimation to varying iron conditions.

    Assessment of Next Generation Sequencing Technologies for De novo and Hybrid Assemblies of Challenging Bacterial Genomes

    In the past decade, tremendous progress has been made in DNA sequencing methodologies in terms of throughput, speed, and read lengths, along with a sharp decrease in per-base cost. These technologies, commonly referred to as next-generation sequencing (NGS), are complemented by the development of hybrid assembly approaches that can utilize multiple NGS platforms. In the first part of my dissertation I performed systematic evaluations and optimizations of nine de novo and hybrid assembly protocols across four novel microbial genomes. While each had strengths and weaknesses, via optimization using multiple strategies I obtained dramatic improvements in overall assembly size and quality. To select the best assembly, I also proposed the novel rDNA operon validation approach to evaluate assembly accuracy. Additionally, I investigated the capabilities of the third-generation PacBio sequencing platform and achieved automated finishing of Clostridium autoethanogenum without any accessory data. These complete genome sequences facilitated comparisons which revealed rDNA operons as a major limitation for short read technologies, and also enabled comparative and functional genomics analysis. To facilitate future assessment and algorithm development for NGS technologies, we publicly released the sequence datasets for C. autoethanogenum, which span three generations of sequencing technologies and contain six types of data from four NGS platforms. To assess limitations of NGS technologies, unassembled regions within Illumina and PacBio assemblies were assessed using eight microbial genomes. This analysis confirmed rDNA operons as major breakpoints within Illumina assemblies, while gaps within PacBio assemblies appear to be unaccounted-for events, and assembly quality is the cumulative effect of read depth, read quality, sample DNA quality and the presence of phage DNA or mobile genetic elements.
In a final collaborative study, an enrichment protocol was applied to isolate live endophytic bacteria from roots of the tree Populus deltoides. This protocol achieved a significant reduction in contaminating plant DNA and enabled the use of these samples for single-cell genomics analysis for the first time. Whole genome sequencing of selected single-cell genomes was performed, and assembly and contamination removal were optimized, followed by bioinformatics, phylogenetic and comparative genomics analyses to identify unique characteristics of these uncultured microorganisms.