103 research outputs found

    #COVIDisAirborne: AI-enabled multiscale computational microscopy of delta SARS-CoV-2 in a respiratory aerosol

    We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.

    A novel computational framework for fast, distributed computing and knowledge integration for microarray gene expression data analysis

    The healthcare burden and suffering due to life-threatening diseases such as cancer would be significantly reduced by the design and refinement of computational interpretation of micro-molecular data collected by bioinformaticians. Rapid technological advancements in the field of microarray analysis, an important component in the design of in-silico molecular medicine methods, have generated enormous amounts of such data, a trend that has been increasing exponentially over the last few years. However, the analysis and handling of these data have become one of the major bottlenecks in the utilization of the technology. The rate of collection of these data has far surpassed our ability to analyze the data for novel, non-trivial, and important knowledge. High-performance computing platforms, and algorithms that utilize their embedded computing capacity, have emerged as a leading technology that can handle such data-intensive knowledge discovery applications. In this dissertation, we present a novel framework to achieve fast, robust, and accurate (biologically significant) multi-class classification of gene expression data using distributed knowledge discovery and integration computational routines, specifically for cancer genomics applications. The research presents a unique computational paradigm for the rapid, accurate, and efficient selection of relevant marker genes, while providing parametric controls to ensure flexibility of its application.
The proposed paradigm consists of the following key computational steps: (a) preprocess and normalize the gene expression data; (b) discretize the data for knowledge mining application; (c) partition the data using two proposed methods: partitioning with overlapped windows and adaptive selection; (d) perform knowledge discovery on the partitioned data-spaces for association rule discovery; (e) integrate association rules from partitioned data and knowledge spaces on distributed processor nodes using a novel knowledge integration algorithm; and (f) perform post-analysis and functional elucidation of the discovered gene rule sets. The framework is implemented on a shared-memory multiprocessor supercomputing environment, and several experimental results are demonstrated to evaluate the algorithms. We conclude with a functional interpretation of the computational discovery routines for enhanced biological physiological discovery from cancer genomics datasets, while suggesting some directions for future research.
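
Steps (b) and (c) of the paradigm above can be illustrated with a minimal Python sketch. The abstract does not state the discretization rule or the window parameters, so the cut points `low` and `high`, the three labels, and the `size` and `overlap` arguments are all assumptions, and the function names are hypothetical rather than taken from the dissertation.

```python
from typing import List, Sequence


def discretize(values: Sequence[float], low: float, high: float) -> List[str]:
    """Map each expression value to 'down', 'flat', or 'up'.

    The two cut points are illustrative assumptions, not the
    dissertation's actual discretization scheme.
    """
    labels = []
    for v in values:
        if v < low:
            labels.append("down")
        elif v > high:
            labels.append("up")
        else:
            labels.append("flat")
    return labels


def overlapped_windows(n_genes: int, size: int, overlap: int) -> List[range]:
    """Split gene indices 0..n_genes-1 into windows of at most `size`
    indices, where consecutive windows share `overlap` indices."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if n_genes <= 0:
        return []
    step = size - overlap
    windows = []
    start = 0
    while True:
        end = min(start + size, n_genes)
        windows.append(range(start, end))
        if end == n_genes:  # last window reaches the final gene
            break
        start += step
    return windows
```

Each window could then be handed to a separate processor node for rule mining; the overlap keeps genes near a boundary visible to both neighbouring partitions.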

    The Silent Arms Race: The Role of the Supercomputer During the Cold War, 1947-1963

    One of the central features of the Cold War is the Arms Race. The United States and the Union of Soviet Socialist Republics vied for supremacy over the globe for a fifty-year period in which there were several arms races: atomic weapons, thermonuclear weapons, and various kinds of conventional weapons. However, there is another arms race that goes unsung during this period of history, and that is in the area of supercomputing. The other types of arms races are taken for granted by historians and others, but the technological competition between the superpowers would have been impossible without the historically silent arms race in the area of supercomputers. The construction of missiles and jets, as well as the testing of nuclear weapons, had serious implications for international relations. Often perception is more important than fact. Perceived power maintained a deterrent effect on the two superpowers. If one superpower suspected that it, in fact, had an advantage over the other, then the balance of power would be upset and more aggressive measures might have been taken on various fronts of the conflict, perhaps leading to war. Due to this, it was necessary to maintain a balance of power not only in weapons but in supercomputing as well. Considering the role that the computers played, it is time for closer historical scrutiny.

    Computational Discovery of Structured Non-coding RNA Motifs in Bacteria

    This dissertation describes a range of computational efforts to discover novel structured non-coding RNA (ncRNA) motifs in bacteria and generate hypotheses regarding their potential functions. This includes an introductory description of key advances in comparative genomics and RNA structure prediction as well as some of the most commonly found ncRNA candidates. Beyond that, I describe efforts for the comprehensive discovery of ncRNA candidates in 25 bacterial genomes and a catalog of the various functions hypothesized for these new motifs. Finally, I describe the Discovery of Intergenic Motifs PipeLine (DIMPL), a new computational toolset that harnesses the power of support vector machine (SVM) classifiers to identify bacterial intergenic regions most likely to contain novel structured ncRNA and automates the bulk of the subsequent analysis steps required to predict function. In totality, the body of work will enable the large-scale discovery of novel structured ncRNA motifs at a far greater pace than was previously possible.

    MACRONUTRIENTS SHAPE MICROBIAL COMMUNITIES, GENE EXPRESSION AND PROTEIN EVOLUTION

    Limitation of the principal macronutrients carbon, nitrogen, and phosphorus is known to influence community structure and the success of individual species, and over long enough time could, in theory, shape the evolution of the proteins organisms use to cope with nutrient stress. This dissertation explores macronutrient incorporation into bacterial communities, how organisms modulate gene expression to cope with periodic nutrient stress, and how long-term limitation might shape cellular stoichiometry to reduce biochemical nutrient demand. In the first chapter, natural Arctic microbial communities are investigated, and a strong seasonal shift of bacterial and archaeal N utilization from ammonium during the summer to urea during the winter is demonstrated via 15N-based stable isotope probing (SIP). In combination with collaborative 13C-bicarbonate-based SIP studies, these data point to the potential for urea-fueled nitrification as an important source of primary production during the Arctic winter. The second chapter examines the nutrient-limited transcriptome of a harmful bloom-forming alga, Scrippsiella trochoidea CCMP 3099, to investigate its cellular response to nitrogen or phosphorus stress. Transcriptome data indicate that under N limitation S. trochoidea modulates gene expression to compensate for oxidative stress and appears to switch from inorganic nitrate metabolism to dissolved organic sources. The third chapter aims to understand how, over the long time scales of phytoplankton and protist evolution, N limitation might alter the stoichiometry of the proteome to reduce overall nutrient utilization. It was tested whether the nutritional mode (autotrophy, mixotrophy, or heterotrophy) might be a predictor of the overall balance of macroelements in predicted protein products. The hypothesis that organisms living in more N-limiting environments produce N-deplete protein products (based on side-chain chemistry) is rejected.
Conversely, predicted proteins in the transcriptomes of mixotrophs appear enriched in amino acids with greater C content. The stoichiometry of the in silico translated proteome has a weak correlation to environmental nutrients (not significant for nitrate, but significant for phosphate). The last chapter is an extension of the primary research goals with respect to algal transcriptomes put forth in this dissertation. The chapter’s aim is to integrate scholarship and teaching by introducing cutting-edge research results into a case study designed for introductory biology students to teach the central dogma of molecular biology in terms of genomics technologies. This chapter incorporates “Central Dogma of Molecular Biology”, “big data”, “cells as systems”, and the “flow of information” with societal issues and problem solving of the harmful bloom-forming dinoflagellate Karenia brevis.
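
One way to score the nitrogen content of predicted proteins "based on side-chain chemistry" is to count the nitrogen atoms in each residue's side chain. The Python sketch below is an illustration only: the abstract does not give the dissertation's actual metric, and `mean_side_chain_nitrogen` is a hypothetical name. The per-residue counts are standard amino acid chemistry.

```python
# Nitrogen atoms in the side chain of each standard amino acid
# (one-letter codes). Residues not listed have no side-chain nitrogen.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}


def mean_side_chain_nitrogen(protein: str) -> float:
    """Average number of side-chain N atoms per residue.

    Non-letter characters such as a trailing '*' stop symbol are ignored.
    """
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty protein sequence")
    return sum(SIDE_CHAIN_N.get(aa, 0) for aa in residues) / len(residues)
```

Averaging this score over every predicted protein in a transcriptome would give one number per organism, which could then be compared across nutritional modes.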

    Investigating the Molecular Mechanisms of Splicing-Perturbing Small Molecules with Massively Parallel Sequencing in a Myotonic Dystrophy 1 Model

    52 pages. A thesis presented to the Department of Chemistry and Biochemistry, and the Clark Honors College of the University of Oregon in partial fulfillment of the requirements for the degree of Bachelor of Science, Spring 2014. Myotonic dystrophy is the most common form of adult-onset muscular dystrophy and appears in two forms: myotonic dystrophy 1 (DM1) and 2 (DM2). Both diseases are characterized by progressive muscle degeneration, myotonia, iridescent cataracts, and in severe cases neurodegeneration and cardiac dysfunction. Both forms of myotonic dystrophy are caused by an expansion of repeat DNA in distinct loci in the genome. In DM1, a CTG repeat is expanded from less than 50 repeats in normal individuals to up to several thousand repeats in DM1 patients. The molecular basis of this disease relies on the transcription of these repeats from DNA into RNA. Small molecules that can specifically target the repeats at the DNA level and inhibit their transcription - and thus alleviate the disease symptoms - represent a prime target for the development of therapeutics. This study investigates the capacity of two small molecules, pentamidine and actinomycin D, to reverse the molecular symptoms of DM1 through transcriptional inhibition and provides insight into their DNA target specificity and potential mechanisms of action.

    An island-based approach for RNA-SEQ differential expression analysis.

    High-throughput mRNA sequencing (also known as RNA-Seq) promises to be the technique of choice for studying transcriptome profiles, offering several advantages over older techniques such as microarrays. This technique provides the ability to develop precise methodologies for a variety of RNA-Seq applications including gene expression quantification, novel transcript and exon discovery, differential expression (DE), and splice variant detection. The detection of features (e.g. genes, transcript isoforms, exons) whose expression changes significantly across biological samples is a primary application of RNA-Seq. Uncovering which features are significantly differentially expressed between samples can provide insight into their functions. One major limitation of the majority of recently developed methods for RNA-Seq differential expression is the dependency on annotated biological features to detect expression differences across samples. This restricts the identification of expression levels and the detection of significant changes to known genomic regions. Thus, any significant changes occurring in unannotated regions will not be captured. To overcome this limitation, we developed a novel segmentation approach, Island-Based (IBSeq), for analyzing differential expression in RNA-Seq and targeted sequencing (exome capture) data without specific knowledge of an isoform. IBSeq segmentation determines individual islands of expression based on windowed read counts that can be compared across experimental conditions to determine differential island expression. In order to detect differentially expressed features, the significance values of the DE islands corresponding to each feature are combined using combined p-value methods. We evaluated the performance of our approach by comparing it to a number of existing gene DE methods using several benchmark MAQC RNA-Seq datasets.
Using the area under the ROC curve (auROC) as a performance metric, results show that IBSeq clearly outperforms all other methods compared.
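
The two ideas described in this abstract, islands built from windowed read counts and combining per-island p-values into one feature-level p-value, can be sketched in Python as follows. This is not the IBSeq implementation: the thresholding rule and the `min_count` parameter are assumptions, and Fisher's method is used here only as one well-known combined p-value method, since the abstract does not say which ones IBSeq uses.

```python
import math
from typing import List, Sequence, Tuple


def find_islands(window_counts: Sequence[int], min_count: int) -> List[Tuple[int, int]]:
    """Return (start, end) window-index pairs, end exclusive, for each
    maximal run of consecutive windows whose count is >= min_count."""
    islands = []
    start = None
    for i, count in enumerate(window_counts):
        if count >= min_count:
            if start is None:
                start = i  # open a new island
        elif start is not None:
            islands.append((start, i))  # close the current island
            start = None
    if start is not None:
        islands.append((start, len(window_counts)))
    return islands


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null; for even degrees of freedom
    the upper tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    term = 1.0
    total = 1.0
    for i in range(1, len(pvalues)):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

Fisher's method assumes the island-level p-values are independent, which neighbouring islands within one gene may not be, so a real pipeline might prefer a method that tolerates dependence.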

    Developing a bioinformatics framework for proteogenomics

    In the last 15 years, since the human genome was first sequenced, genome sequencing and annotation have continued to improve. However, genome annotation has not kept up with the accelerating rate of genome sequencing and as a result there is now a large backlog of genomic data waiting to be interpreted both quickly and accurately. Through advances in proteomics a new field has emerged to help improve genome annotation, termed proteogenomics, which uses peptide mass spectrometry data, enabling the discovery of novel protein coding genes, as well as the refinement and validation of known and putative protein-coding genes. The annotation of genomes relies heavily on ab initio gene prediction programs and/or mapping of a range of RNA transcripts. Although this method provides insights into the gene content of genomes it is unable to distinguish protein-coding genes from putative non-coding RNA genes. This problem is further confounded by the fact that only 5% of the public protein sequence repository at UniProt/SwissProt has been curated and derived from actual protein evidence. This thesis contends that it is critically important to incorporate proteomics data into genome annotation pipelines to provide experimental protein-coding evidence. Although there have been major improvements in proteogenomics over the last decade there are still numerous challenges to overcome. These key challenges include the loss of sensitivity when using inflated search spaces of putative sequences, how best to interpret novel identifications and how best to control for false discoveries. This thesis addresses the existing gap between the use of genomic and proteomic sources for accurate genome annotation by applying a proteogenomics approach with a customised methodology. This new approach was applied within four case studies: a prokaryote bacterium; a monocotyledonous wheat plant; a dicotyledonous grape plant; and human. 
The key contributions of this thesis are: a new methodology for proteogenomics analysis; 145 suggested gene refinements in Bradyrhizobium diazoefficiens (nitrogen-fixing bacteria); 55 new gene predictions (57 protein isoforms) in Vitis vinifera (grape); 49 new gene predictions (52 protein isoforms) in Homo sapiens (human); and 67 new gene predictions (70 protein isoforms) in Triticum aestivum (bread wheat). Lastly, a number of possible improvements for the studies conducted in this thesis and proteogenomics as a whole have been identified and discussed.

    Molecules to marinescapes: the characterization of microbial life in the Arctic Ocean

    Thesis (Ph.D.) University of Alaska Fairbanks, 2016. Microbes are the base of all marine food webs and comprise >90% of all living biomass in the world’s oceans. Microbial life and functioning in high-latitude seas is characterized by the predominance of unknown species that encode uncharacterized genes, replenish nutrients, and modulate ecosystem health by interfacing with disease processes. This research elucidates eukaryotic microbial diversity and functionality in Arctic and sub-Arctic marine environments by describing the culturable and genetic diversity of eukaryotic microbes and the life histories of marine fungi belonging to the Chytridiomycota. This work includes the description of two new mesomycetozoean species, the assembled and annotated genome of Sphaeroforma sirkka, the first description of a cryptic carbon cycle (the mycoloop) mediated by fungi from any marine environment, and the description of large-scale eukaryotic microbial diversity patterns driven by temperature and latitude in the eastern Bering Sea. These results help establish a valuable baseline of microbial diversity in high-latitude seas.