8 research outputs found

Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases

Author: Baldridge Dustin
Kobren Shilpa Nadimpalli
Paul Alexander J
Wegner Daniel J
et al.
Publication venue: Digital Commons@Becker
Publication date: 01/06/2021

PURPOSE: Genomic sequencing has become an increasingly powerful and relevant tool to be leveraged for the discovery of genetic aberrations underlying rare, Mendelian conditions. Although the computational tools incorporated into diagnostic workflows for this task are continually evolving and improving, we nevertheless sought to investigate commonalities across sequencing processing workflows to reveal consensus and standard practice tools and highlight exploratory analyses where technical and theoretical method improvements would be most impactful. METHODS: We collected details regarding the computational approaches used by a genetic testing laboratory and 11 clinical research sites in the United States participating in the Undiagnosed Diseases Network via meetings with bioinformaticians, online survey forms, and analyses of internal protocols. RESULTS: We found that tools for processing genomic sequencing data can be grouped into four distinct categories. Whereas well-established practices exist for initial variant calling and quality control steps, there is substantial divergence across sites in later stages for variant prioritization and multimodal data integration, demonstrating a diversity of approaches for solving the most mysterious undiagnosed cases. CONCLUSION: The largest differences across diagnostic workflows suggest that advances in structural variant detection, noncoding variant interpretation, and integration of additional biomedical data may be especially promising for solving chronically undiagnosed cases

Digital Commons@Becker

Detecting and Analyzing Variation in Protein Interactions

Author: Kobren Shilpa Nadimpalli
Publication venue: Princeton, NJ : Princeton University
Publication date: 01/01/2018

Proteins carry out a dazzling multitude of functions by interacting with DNA, RNA, other proteins and various other molecules within our cells. Together these interactions comprise complex networks that differ naturally across cells within an organism, across individuals in a population, and across species. Although such variation is critical for normal organismal functioning, mutations affecting protein interactions are also known to underlie a wide range of human diseases. In this dissertation, I introduce novel computational approaches that explore the extent to which specific protein interactions vary across species, across healthy individuals, and across individuals with cancer. To start, I focus on interaction variation across species. It is well established that changes in protein-DNA interactions underlie a wide range of observable differences across species. These differences are primarily thought to stem from changes in the DNA sites that transcription factor (TF) proteins bind to, although changes in the binding properties of TFs themselves have also been observed. Determining the prevalence of such TF changes, however, remains infeasible using current experimental approaches. Here, I develop and apply a comparative genomics framework to systematically quantify changes in the DNA-binding properties of orthologous TFs across species spanning ~45 million years of evolutionary divergence. I demonstrate that, contrary to expectation, cross-species regulatory network divergence resulting from changes in non-duplicated DNA-binding proteins is pervasive. These findings reveal a widespread yet largely unstudied source of divergence across transcriptional regulatory programs in animals. Next, I turn my attention to interaction variation across individuals. 
In order to comprehensively quantify this, I first combine large-scale sequence, domain and structure information to pinpoint sites within protein domains (the fundamental structural units in proteins) that are involved in binding DNA, RNA, peptides, ions, metabolites, or other small molecules. This domain-based approach enables us to identify putative interaction sites in over 60% of human genes, representing a 2.4-fold improvement over comparable state-of-the-art approaches for this task. I next demonstrate that whereas domain-inferred interaction sites are significantly depleted of natural variants across ~60,000 healthy individuals, these same sites are significantly enriched for cancer mutations across ~11,000 tumor samples. My analysis demonstrates that the cellular network variation that occurs across healthy individuals is unlikely to be due to changes within proteins; in contrast, mutations acquired in cancers appear to preferentially alter cellular networks by perturbing the proteins themselves. Finally, I show how we can leverage an interaction-based viewpoint to uncover mutated genes that play causal roles in human cancers. In particular, I aim to uncover genes whose interaction interfaces are significantly altered in tumors. Towards this end, I develop a robust computational framework that integrates my per-domain-position binding propensities with additional sources of biological data regarding protein functionality. I demonstrate that by analytically computing the significance of patterns of mutations, my approach is able to achieve a dramatic improvement in runtime over a typical empirical permutation test for this task. Moreover, my interaction-based method not only recapitulates known cancer driver genes faster and with greater precision than previous methods, but it also uncovers relatively rarely-mutated genes with likely roles in cancer.
Through focusing on the somatic alteration of protein interaction interfaces in tumors, my method can inform the perturbed molecular mechanisms across known and putative cancer genes, thereby enabling valuable insights that may help guide personalized cancer treatments

Dataspace

Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases.

Author: Kobren Shilpa Nadimpalli
Publication date: 26/07/2021

Ezid

Systematic domain-based aggregation of protein structures highlights DNA-, RNA- and other ligand-binding positions.

Author: Kobren Shilpa Nadimpalli
Singh Mona
Publication date: 01/01/2019

Domains are fundamental subunits of proteins, and while they play major roles in facilitating protein-DNA, protein-RNA and other protein-ligand interactions, a systematic assessment of their various interaction modes is still lacking. A comprehensive resource identifying positions within domains that tend to interact with nucleic acids, small molecules and other ligands would expand our knowledge of domain functionality as well as aid in detecting ligand-binding sites within structurally uncharacterized proteins. Here, we introduce an approach to identify per-domain-position interaction 'frequencies' by aggregating protein co-complex structures by domain and ascertaining how often residues mapping to each domain position interact with ligands. We perform this domain-based analysis on ∼91000 co-complex structures, and infer positions involved in binding DNA, RNA, peptides, ions or small molecules across 4128 domains, which we refer to collectively as the InteracDome. Cross-validation testing reveals that ligand-binding positions for 2152 domains are highly consistent and can be used to identify residues facilitating interactions in ∼63-69% of human genes. Our resource of domain-inferred ligand-binding sites should be a great aid in understanding disease etiology: whereas these sites are enriched in Mendelian-associated and cancer somatic mutations, they are depleted in polymorphisms observed across healthy populations. The InteracDome is available at http://interacdome.princeton.edu
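
The aggregation idea in this abstract (counting how often residues at each domain position contact a ligand across many structures) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the input format, the function name `binding_frequencies`, and the identifier `DOMAIN_A` are hypothetical, and the sketch uses a plain unweighted fraction with no redundancy correction or cross-validation.

```python
from collections import defaultdict


def binding_frequencies(instances):
    """Estimate per-domain-position ligand-contact frequencies.

    `instances` is an iterable of (domain_id, contacts) pairs, one per
    domain occurrence in a co-complex structure, where `contacts` maps a
    domain position to True if the residue there contacts the ligand.
    (Hypothetical input format, assumed for illustration only.)
    """
    # (domain_id, position) -> [number of contacts, number of observations]
    counts = defaultdict(lambda: [0, 0])
    for domain_id, contacts in instances:
        for position, in_contact in contacts.items():
            entry = counts[(domain_id, position)]
            entry[0] += int(in_contact)
            entry[1] += 1
    # Frequency = fraction of observed instances in which the position binds.
    return {key: hit / total for key, (hit, total) in counts.items()}


# Toy data: three occurrences of one hypothetical domain, two positions each.
toy_instances = [
    ("DOMAIN_A", {1: True, 2: False}),
    ("DOMAIN_A", {1: True, 2: True}),
    ("DOMAIN_A", {1: False, 2: False}),
]
freqs = binding_frequencies(toy_instances)
```

In this toy input, position 1 contacts the ligand in two of three instances and position 2 in one of three, so position 1 would rank as the more likely binding position.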

Princeton University Open Access Repository

An Integrative Approach Uncovers Genes with Perturbed Interactions in Cancer

Author: Bernard Chazelle
Mona Singh
Shilpa Nadimpalli Kobren
Publication venue: Elsevier BV
Publication date: 01/01/2019

Crossref

RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci

Author: Aguiar-Pulido Vanessa
Danzi Matt C
Dolzhenko Egor
Fazal Sarah
Kobren Shilpa Nadimpalli
Lucas Francesca
Marwaha Shruti
Reuter Chloe
Sunyaev Shamil
Tekin Mustafa
Wheeler Matthew
Wuchty Stefan
Xu Isaac
Züchner Stephan
Publication date: 31/01/2024

Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies
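
For readers unfamiliar with the two metrics quoted in this abstract, the following Python sketch shows how precision and recall are computed from classification counts. The counts are invented for illustration and are not taken from the RExPRT study; the function name `precision_recall` is likewise hypothetical.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for a binary classifier.

    precision = TP / (TP + FP): share of predicted-pathogenic loci that
                truly are pathogenic.
    recall    = TP / (TP + FN): share of truly pathogenic loci that the
                classifier recovers.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
precision, recall = precision_recall(true_pos=8, false_pos=2, false_neg=4)
```

With these made-up counts, precision is 8/10 and recall is 8/12. High precision matters for candidate prioritization because few of the flagged loci would be false leads.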

University of Miami: Scholarship Miami

Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases.

Author: Alkelai Anna
Baldridge Dustin
Bastarache Lisa
Bican Anna
Blue Elizabeth
Cogan Joy
Esteves Cecilia
Huang Alden
Kobren Shilpa Nadimpalli
Kohane Isaac S
Krier Joel B
LeBlanc Kimberly
Lee Hane
Liu Pengfei
Marwaha Shruti
Murdock David R
Paul Alexander J
Pusey Barbara N
Sunyaev Shamil R
Undiagnosed Diseases Network
Velinder Matt
Wegner Daniel J
Züchner Stephan
Publication venue: eScholarship, University of California
Publication date: 01/01/2021

Purpose: Genomic sequencing has become an increasingly powerful and relevant tool to be leveraged for the discovery of genetic aberrations underlying rare, Mendelian conditions. Although the computational tools incorporated into diagnostic workflows for this task are continually evolving and improving, we nevertheless sought to investigate commonalities across sequencing processing workflows to reveal consensus and standard practice tools and highlight exploratory analyses where technical and theoretical method improvements would be most impactful. Methods: We collected details regarding the computational approaches used by a genetic testing laboratory and 11 clinical research sites in the United States participating in the Undiagnosed Diseases Network via meetings with bioinformaticians, online survey forms, and analyses of internal protocols. Results: We found that tools for processing genomic sequencing data can be grouped into four distinct categories. Whereas well-established practices exist for initial variant calling and quality control steps, there is substantial divergence across sites in later stages for variant prioritization and multimodal data integration, demonstrating a diversity of approaches for solving the most mysterious undiagnosed cases. Conclusion: The largest differences across diagnostic workflows suggest that advances in structural variant detection, noncoding variant interpretation, and integration of additional biomedical data may be especially promising for solving chronically undiagnosed cases.

PubMed Central

eScholarship - University of California

University of Miami: Scholarship Miami

Systematic domain-based aggregation of protein structures highlights DNA-, RNA- and other ligand-binding positions

Author: Shilpa Nadimpalli Kobren
Mona Singh
Publication venue: Oxford University Press (OUP)

Crossref