Comparative Analyses of De Novo Transcriptome Assembly Pipelines for Diploid Wheat
Gene expression and transcriptome analysis are currently among the main research focuses for a great number of scientists. However, the assembly of raw sequence data to obtain a draft transcriptome of an organism is a complex multi-stage process usually composed of pre-processing, assembling, and post-processing. Each of these stages includes multiple steps, such as data cleaning, error correction, and assembly validation. Different combinations of steps, as well as different computational methods for the same step, generate transcriptome assemblies of different accuracy. Thus, using a combination that generates more accurate assemblies is crucial for any novel biological discovery. Implementing accurate transcriptome assembly requires deep knowledge of the different algorithms, bioinformatics tools, and software that can be used in an analysis pipeline. Many pipelines can be represented as automated, scalable scientific workflows that can run simultaneously on powerful distributed computational resources, such as campus clusters, grids, and clouds, and thereby speed up the analyses.
In this thesis, we 1) compared and optimized de novo transcriptome assembly pipelines for diploid wheat; 2) investigated the impact of a few key parameters for generating accurate transcriptome assemblies, such as digital normalization and error correction methods, de novo assemblers and k-mer length strategies; 3) built distributed and scalable scientific workflow for blast2cap3, a step from the transcriptome assembly pipeline for protein-guided assembly, using the Pegasus Workflow Management System (WMS); and 4) deployed and examined the scientific workflow for blast2cap3 on two different computational platforms.
Based on the analysis performed in this thesis, we conclude that the best transcriptome assembly is produced when error correction is combined with the Velvet/Oases assembler and the “multi-k” strategy. Moreover, the performed experiments show that the Pegasus WMS implementation of blast2cap3 reduces the running time by more than 95% compared to the current serial implementation. The results presented in this thesis provide valuable insight for designing a good de novo transcriptome assembly pipeline and show the importance of using scientific workflows for executing computationally demanding pipelines.
Advisor: Jitender S. Deogun
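The “multi-k” strategy mentioned above can be illustrated with a toy de Bruijn assembler. This is a minimal sketch, not the actual Velvet/Oases pipeline: the reads, the k values, the greedy contig walk, and the longest-contig merge rule are all illustrative assumptions (real multi-k pipelines merge whole assemblies, e.g. with CD-HIT or Oases' own merge step).

```python
from collections import defaultdict

def assemble(reads, k):
    """Toy de Bruijn assembly: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    # In-degree of each node, counting unique edges only.
    indeg = defaultdict(int)
    for node in graph:
        for succ in graph[node]:
            indeg[succ] += 1
    # Walk greedily from source nodes through unambiguous extensions;
    # a branch (repeat longer than k-1) terminates the contig.
    contigs = []
    for node in [n for n in graph if indeg[n] == 0]:
        contig = node
        while len(graph[node]) == 1:
            node = next(iter(graph[node]))
            contig += node[-1]
        contigs.append(contig)
    return contigs

def multi_k_assemble(reads, ks):
    """'Multi-k' idea: assemble at several k values and keep the best
    result -- here simply the longest contig, as a stand-in for a
    real assembly-merging step."""
    return max((c for k in ks for c in assemble(reads, k)), key=len)
```

With a repeat of length 3 in the underlying sequence, k=4 stops at the branch while k=6 resolves it, which is exactly why trying several k values helps:

```python
reads = ["ATGCGTAC", "GCGTACGT", "GTACGTTA", "ACGTTAGC"]
multi_k_assemble(reads, [4, 6])  # k=4 stalls at "ATGCGT"; k=6 recovers the full sequence
```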
Coupling streaming AI and HPC ensembles to achieve 100-1000x faster biomolecular simulations
Machine learning (ML)-based steering can improve the performance of
ensemble-based simulations by allowing for online selection of more
scientifically meaningful computations. We present DeepDriveMD, a framework for
ML-driven steering of scientific simulations that we have used to achieve
orders-of-magnitude improvements in molecular dynamics (MD) performance via
effective coupling of ML and HPC on large parallel computers. We discuss the
design of DeepDriveMD and characterize its performance. We demonstrate that
DeepDriveMD can achieve 100-1000x acceleration for protein folding
simulations relative to other methods, as measured by the amount of simulated
time performed, while covering the same conformational landscape as quantified
by the states sampled during a simulation. Experiments are performed on
leadership-class platforms on up to 1020 nodes. The results establish
DeepDriveMD as a high-performance framework for ML-driven HPC simulation
scenarios, one that supports diverse MD simulation and ML back-ends and
enables new scientific insights by improving the length and time scales
accessible with current computing capacity.
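The steering idea described above can be sketched with a toy novelty criterion: a minimal stand-in for DeepDriveMD's ML-based selection (which in practice uses learned models such as autoencoders), not the framework's actual API. The discretization, scoring, and restart-selection logic here are illustrative assumptions.

```python
from collections import Counter

def novelty_scores(trajectories, bin_width=1.0):
    # Count visits to each discretized state across the whole ensemble;
    # a trajectory whose end state is rarely visited scores higher.
    visits = Counter()
    for traj in trajectories:
        for x in traj:
            visits[int(x // bin_width)] += 1
    return [1.0 / visits[int(t[-1] // bin_width)] for t in trajectories]

def select_restarts(trajectories, n):
    # Steering step: launch the next simulation round from the n most
    # novel (least-visited) end states, rather than continuing every
    # trajectory blindly -- this is what biases simulated time toward
    # unexplored regions of the conformational landscape.
    scores = novelty_scores(trajectories)
    ranked = sorted(range(len(trajectories)),
                    key=lambda i: scores[i], reverse=True)
    return [trajectories[i][-1] for i in ranked[:n]]
```

In a real ensemble workflow this select step would run online, between batches of MD simulations, with the restart states fed back to the simulation back-end.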
Citizen-led Work using Social Computing and Procedural Guidance
Online platforms enable people to interact with friends, family, and the world at large. How might people go beyond sharing stories and ideas to building and testing theories in the real world? While many are motivated to dig deeper into their lived experience, limited expertise and lack of platform support make complex activities like experimentation dauntingly hard. Novices benefit greatly from expert guidance: this thesis advocates baking the guidance into the interface itself. This dissertation introduces procedural guidance to build just-in-time expertise for difficult tasks. Procedural guidance has multiple advantages: it is minimal, leverages teachable moments, and can be ability-specific. This dissertation instantiates this insight through a sequence of increasingly complex social computing systems: Gut Instinct for curating ideas, Docent for generating hypotheses, and Galileo for citizen-led experiments. Gut Instinct hosts online learning materials and enables people to collaboratively brainstorm potential influences on people’s microbiome. Docent explicitly teaches people to create hypotheses by combining personal insights and online learning with task-specific scaffolding. Finally, Galileo reifies experimentation in the software, provides multiple roles for contribution, and automatically manages interdependencies. Multiple evaluations—controlled experiments and field deployments with online communities including American Gut participants—demonstrate that procedural guidance enables people to transform intuitions into hypotheses and structurally sound experiments. By enabling people to draw on lived experience, this dissertation heralds a future where people can convert their intuitions into actionable plans and implement those plans with online communities. It concludes by discussing opportunities for complex work on social computing platforms.
Essential oil phytocomplex activity, a review with a focus on multivariate analysis for a network pharmacology-informed phytogenomic approach
Thanks to omic disciplines and a systems biology approach, the study of essential oils and phytocomplexes has lately been moving on a faster track. While metabolomic fingerprinting can provide an effective strategy for characterizing essential oil contents, network pharmacology is emerging as an adequate, holistic platform for studying the collective effects of herbal products and their multi-component, multi-target mechanisms. Multivariate analysis can be applied to analyze the effects of essential oils, possibly overcoming the reductionist limits of bioactivity-guided fractionation and purification of single components. Thanks to the fast evolution of bioinformatics and database availability, disease-target networks relevant to a growing number of phytocomplexes are being developed. With the same potential actionability as pharmacogenomic data, phytogenomics could be performed on the basis of relevant disease-target networks to inform and personalize phytocomplex therapeutic application.
Integrative biological simulation praxis: Considerations from physics, philosophy, and data/model curation practices
Integrative biological simulations have a varied and controversial history in
the biological sciences. From computational models of organelles, cells, and
simple organisms, to physiological models of tissues, organ systems, and
ecosystems, a diverse array of biological systems have been the target of
large-scale computational modeling efforts. Nonetheless, these research agendas
have yet to prove decisively their value among the broader community of
theoretical and experimental biologists. In this commentary, we examine a range
of philosophical and practical issues relevant to understanding the potential
of integrative simulations. We discuss the role of theory and modeling in
different areas of physics and suggest that certain sub-disciplines of physics
provide useful cultural analogies for imagining the future role of simulations
in biological research. We examine philosophical issues related to modeling
which consistently arise in discussions about integrative simulations and
suggest a pragmatic viewpoint that balances a belief in philosophy with the
recognition of the relative infancy of our state of philosophical
understanding. Finally, we discuss community workflow and publication practices
to allow research to be readily discoverable and amenable to incorporation into
simulations. We argue that there are aligned incentives in widespread adoption
of practices which will both advance the needs of integrative simulation
efforts as well as other contemporary trends in the biological sciences,
ranging from open science and data sharing to improving reproducibility.
Enabling Data-Guided Evaluation of Bioinformatics Workflow Quality
Bioinformatics can be divided into two phases: the first is the conversion of raw data into processed data, and the second is the use of processed data to obtain scientific results. It is important to consider the first, “workflow” phase carefully, as there are many paths on the way to a final processed dataset. Some workflow paths may differ enough to influence the second phase, thereby leading to ambiguity in the scientific literature. Workflow evaluation in bioinformatics enables the investigator to plan carefully how to process their data. A system that uses real data to determine the quality of a workflow can be based on the inherent biological relationships in the data itself. To our knowledge, no general software framework that performs real-data-driven evaluation of bioinformatics workflows exists.
The Evaluation and Utility of workFLOW (EUFLOW) decision-theoretic framework, developed and tested on gene expression data, enables users of bioinformatics workflows to evaluate alternative workflow paths using inherent biological relationships. EUFLOW is implemented as an R package. The framework also permits user-specified utility and loss functions, which allows the type of analysis to be considered in the workflow path decision. It was originally developed to assess the quality of identifier mapping services between UniProt accessions and Affymetrix probesets to facilitate integrated analysis [1]. An extension of the framework evaluates Affymetrix probeset filtering methods on real data from endometrial cancer and TCGA ovarian serous carcinoma samples [2]. Further evaluation of RNA-Seq workflow paths demonstrates the generalizability of the EUFLOW framework. Three separate evaluations are performed: 1) identifier filtering of features with biological attributes, 2) threshold parameter selection for low gene count features, and 3) commonly utilized RNA-Seq data workflow paths on The Cancer Genome Atlas data.
The EUFLOW decision-theoretic framework developed and tested in my dissertation enables users of bioinformatics workflows to evaluate alternative workflow paths guided by inherent biological relationships and user utility.
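The decision-theoretic idea can be roughly illustrated as follows. This is a sketch only: EUFLOW's actual R implementation and its utility/loss interface are not reproduced here, the boolean "relationship preserved" encoding is a simplifying assumption, and the path names are hypothetical.

```python
def expected_utility(outcomes, utility=1.0, loss=1.0):
    # outcomes: for each inherent biological relationship checked
    # (e.g., a pair of probesets mapping to the same gene that should
    # correlate), True if the workflow path's output preserved it.
    # User-supplied utility/loss weight the cost of each outcome.
    return sum(utility if ok else -loss for ok in outcomes) / len(outcomes)

def best_workflow_path(paths, utility=1.0, loss=1.0):
    # Pick the workflow path whose output maximizes expected utility
    # over the evaluated biological relationships.
    return max(paths, key=lambda p: expected_utility(paths[p], utility, loss))
```

Raising `loss` relative to `utility` models an analysis where a spurious relationship is costlier than a missed one, which is how a user-guided loss function can change which workflow path wins.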