BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

Author: Barbosa Helio J. C.
Foster Ian
Gadelha Jr Luiz M. R.
Katz Daniel S.
Loss Guilherme
Magalhães Thiago
Mattoso Marta
Mondelli Maria Luiza
Ocaña Kary
Vasconcelos Ana Tereza R.
Wilde Michael
Publication venue: 'PeerJ'
Publication date: 11/01/2018

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.

arXiv.org e-Print Archive

Directory of Open Access Journals

Challenges in identifying cancer genes by analysis of exome sequencing data.

Author: Bandyopadhyay Sourav
Carter Hannah
Friend Stephen
Hofree Matan
Ideker Trey
Kreisberg Jason F
Mischel Paul S
Publication venue: eScholarship, University of California
Publication date: 01/07/2016

Massively parallel sequencing has permitted an unprecedented examination of the cancer exome, leading to predictions that all genes important to cancer will soon be identified by genetic analysis of tumours. To examine this potential, here we evaluate the ability of state-of-the-art sequence analysis methods to specifically recover known cancer genes. While some cancer genes are identified by analysis of recurrence, spatial clustering or predicted impact of somatic mutations, many remain undetected due to lack of power to discriminate driver mutations from the background mutational load (13-60% recall of cancer genes impacted by somatic single-nucleotide variants, depending on the method). Cancer genes not detected by mutation recurrence also tend to be missed by all types of exome analysis. Nonetheless, these genes are implicated by other experiments such as functional genetic screens and expression profiling. These challenges are only partially addressed by increasing sample size and will likely hold even as greater numbers of tumours are analysed.

PubMed Central

eScholarship - University of California

Development and Validation of Clinical Whole-Exome and Whole-Genome Sequencing for Detection of Germline Variants in Inherited Disease

Author: Ferreira-Gonzalez Andrea
Hegde Madhuri
Mao Rong
Santani Avni
Voelkerding Karl V.
Weck Karen E.
Publication venue: VCU Scholars Compass
Publication date: 01/01/2017

Context: With the decrease in the cost of sequencing, the clinical testing paradigm has shifted from single gene to gene panel and now whole-exome and whole-genome sequencing. Clinical laboratories are rapidly implementing next-generation sequencing-based whole-exome and whole-genome sequencing. Because a large number of targets are covered by whole-exome and whole-genome sequencing, it is critical that a laboratory perform appropriate validation studies, develop a quality assurance and quality control program, and participate in proficiency testing. Objective: To provide recommendations for whole-exome and whole-genome sequencing assay design, validation, and implementation for the detection of germline variants associated with inherited disorders. Data Sources: An example of trio sequencing, filtration and annotation of variants, and phenotypic consideration to arrive at clinical diagnosis is discussed. Conclusions: It is critical that clinical laboratories planning to implement whole-exome and whole-genome sequencing design and validate the assay to specifications and ensure adequate performance prior to implementation. Test design specifications, including variant filtering and annotation, phenotypic consideration, guidance on consenting options, and reporting of incidental findings, are provided. These are important steps a laboratory must take to validate and implement whole-exome and whole-genome sequencing in a clinical setting for germline variants in inherited disorders.

Carolina Digital Repository

VCU Scholars Compass

Identification of rare variants in Alzheimer's disease

Author: Cruchaga Carlos
Lord Jenny
Lu Alexander J
Publication venue: Digital Commons@Becker
Publication date: 01/01/2014

Digital Commons@Becker

A Path to Implement Precision Child Health Cardiovascular Medicine.

Author: Brian Reemtsen
J. Paul Finn
Juan Alejos
Marlin Touma
Nancy Halnon
Stanley F. Nelson
Yibin Wang
Publication venue: eScholarship, University of California
Publication date: 01/01/2017

Congenital heart defects (CHDs) affect approximately 1% of live births and are a major source of childhood morbidity and mortality even in countries with advanced healthcare systems. Along with phenotypic heterogeneity, the underlying etiology of CHDs is multifactorial, involving genetic, epigenetic, and/or environmental contributors. Clear dissection of the underlying mechanism is a powerful step to establish individualized therapies. However, the majority of CHDs are yet to be clearly diagnosed for the underlying genetic and environmental factors, and even less with effective therapies. Although the survival rate for CHDs is steadily improving, there is still a significant unmet need for refining diagnostic precision and establishing targeted therapies to optimize life quality and to minimize future complications. In particular, proper identification of disease associated genetic variants in humans has been challenging, and this greatly impedes our ability to delineate gene-environment interactions that contribute to the pathogenesis of CHDs. Implementing a systematic multileveled approach can establish a continuum from phenotypic characterization in the clinic to molecular dissection using combined next-generation sequencing platforms and validation studies in suitable models at the bench. Key elements necessary to advance the field are: first, proper delineation of the phenotypic spectrum of CHDs; second, defining the molecular genotype/phenotype by combining whole-exome sequencing and transcriptome analysis; third, integration of phenotypic, genotypic, and molecular datasets to identify molecular network contributing to CHDs; fourth, generation of relevant disease models and multileveled experimental investigations. In order to achieve all these goals, access to high-quality biological specimens from well-defined patient cohorts is a crucial step. 
Therefore, establishing a CHD BioCore is an essential piece of infrastructure and a critical step on the path toward precision child health cardiovascular medicine.

Directory of Open Access Journals

eScholarship - University of California

Mutational Analysis of Uterine Cervical Cancer That Survived Multiple Rounds of Radiotherapy

Author: Endang Nuryadi
Publication venue: Gunma University Faculty of Medicine
Publication date: 22/03/2019

Degree certificate number: 医博甲175 (Doctor of Medicine, Kō series, No. 175)

Gunma University Academic Information Repository

In Silico Derivation of HLA-Specific Alloreactivity Potential from Whole Exome Sequencing of Stem Cell Transplant Donors and Recipients: Understanding the Quantitative Immuno-biology of Allogeneic Transplantation

Author: Batalo Michael
Buck Gregory A.
Griffith Phil
Hess Michael L.
Jameson-Lee Max
Khalid Haniya
Koparde Vishal
Manjili Masoud H.
Neale Michael C.
Roberts Catherine H.
Sampson Juliana K.
Scalora Allison F.
Serrano Myrna G.
Sheth Nihar U.
Toor Amir A.
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2014

Donor T cell mediated graft vs. host effects may result from the aggregate alloreactivity to minor histocompatibility antigens (mHA) presented by the HLA in each donor-recipient pair (DRP) undergoing stem cell transplantation (SCT). Whole exome sequencing has demonstrated extensive nucleotide sequence variation in HLA-matched DRP. Non-synonymous single nucleotide polymorphisms (nsSNPs) in the GVH direction (polymorphisms present in recipient and absent in donor) were identified in 4 HLA-matched related and 5 unrelated DRP. The nucleotide sequence flanking each SNP was obtained utilizing the ANNOVAR software package. All possible nonameric peptides encoded by the non-synonymous SNP were then interrogated in silico for their likelihood to be presented by the HLA class I molecules in individual DRP, using the Immune Epitope Database (IEDB) SMM algorithm. The IEDB-SMM algorithm predicted a median of 18,396 peptides/DRP which bound HLA with an IC50 of <500 nM, and 2,254 peptides/DRP with an IC50 of <50 nM. Unrelated donors generally had higher numbers of peptides presented by the HLA. A similarly large library of presented peptides was identified when the data was interrogated using the NetMHCpan algorithm. These peptides were uniformly distributed in the various organ systems. The bioinformatic algorithm presented here demonstrates that there may be a high level of minor histocompatibility antigen variation in HLA-matched individuals, constituting an HLA-specific alloreactivity potential. These data provide a possible explanation for how relatively minor adjustments in GVHD prophylaxis yield relatively similar outcomes in HLA matched and mismatched SCT recipients.
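
The peptide enumeration step mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the variant has already been mapped to a 0-based residue index in the recipient's protein sequence; the function name `nonamers_spanning` and the toy sequence are hypothetical. This is not the authors' pipeline (which used ANNOVAR for flanking sequence and IEDB for binding prediction); it only shows how every 9-residue window covering one changed residue could be listed before being passed to a binding predictor.

```python
from typing import List


def nonamers_spanning(protein: str, pos: int, k: int = 9) -> List[str]:
    """Return every k-residue window of `protein` that covers index `pos`.

    `pos` is a 0-based index of the residue changed by the variant
    (an assumption of this sketch, not something stated in the abstract).
    """
    n = len(protein)
    if not 0 <= pos < n:
        raise ValueError("pos must be a valid index into protein")
    # Earliest window start that still covers pos, clipped to the sequence start.
    first = max(0, pos - k + 1)
    # Latest window start that covers pos and still fits inside the sequence.
    last = min(pos, n - k)
    return [protein[s:s + k] for s in range(first, last + 1)]


if __name__ == "__main__":
    toy = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical 20-residue sequence
    for peptide in nonamers_spanning(toy, 10):
        print(peptide)
```

A residue far enough from both ends of the protein is covered by at most nine such windows, so the number of candidate peptides per variant is small; the large per-pair totals reported in the abstract come from the number of variants and HLA molecules considered.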

arXiv.org e-Print Archive

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Lehigh Valley Health Network: LVHN Scholarly Works

Deep-coverage whole genome sequences and blood lipids among 16,324 individuals.

Author: Abecasis Goncalo
Alver Maris
Bloom Jonathan M
Chaffin Mark
Correa Adolfo
Cupples L Adrienne
Engreitz Jesse M
Ernst Jason
Esko Tonu
Ganna Andrea
Johnson W Craig
Kathiresan Sekar
Kellis Manolis
Khera Amit V
Lander Eric S
Manichaikul Ani
Mitchell Braxton
Montasser May
Natarajan Pradeep
Neale Benjamin M
NHLBI TOPMed Lipids Working Group
O'Connell Jeffrey R
Peloso Gina M
Perry James A
Poterba Timothy
Rich Stephen S
Ripatti Samuli
Rotter Jerome I
Ruotsalainen Sanni E
Salomaa Veikko
Seed Cotton
Surakka Ida L
Vasan Ramachandran S
Willer Cristen J
Wilson James G
Zekavat Seyedeh Maryam
Zhou Wei
Publication venue: eScholarship, University of California
Publication date: 01/08/2018

Large-scale deep-coverage whole-genome sequencing (WGS) is now feasible and offers potential advantages for locus discovery. We perform WGS in 16,324 participants from four ancestries at mean depth >29X and analyze genotypes with four quantitative traits: plasma total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol, and triglycerides. Common variant association yields known loci except for a few variants previously poorly imputed. Rare coding variant association yields known Mendelian dyslipidemia genes, but rare non-coding variant association detects no signals. A high 2M-SNP LDL-C polygenic score (top 5th percentile) confers similar effect size to a monogenic mutation (~30 mg/dl higher for each); however, among those with severe hypercholesterolemia, 23% have a high polygenic score and only 2% carry a monogenic mutation. At these sample sizes and for these phenotypes, the incremental value of WGS for discovery is limited, but WGS permits simultaneous assessment of monogenic and polygenic models of severe hypercholesterolemia.

DSpace@MIT

Directory of Open Access Journals

eScholarship - University of California

George Washington University: Health Sciences Research Commons (HSRC)