4 research outputs found

Pheno2Geno - High-throughput generation of genetic markers and maps from molecular phenotypes for crosses between inbred strains

Author: CA Hackett, D Arends, Danny Arends, ES Lander, G Gort, H Tae, HJ Westra, J Quackenbush, Joeri K van der Velde, Konrad Zych, KW Broman, LB Snoek, MA West, MJ Truco, O Loudet, O Trelles, R Alberts, R Core Team, RC Jansen, RE Voorrips, Ritsert C Jansen, Ronny VL Joosen, RV Joosen, T Benaglia, Wilco Ligterink, Y Li, Yang Li
Publication venue: Springer Science and Business Media LLC

Crossref

Exceptionally long-range haplotypes in chromosome 6 maintained in an endemic African population

Publication venue: BioMed Central

Springer - Publisher Connector

A Unified Framework for the Prioritization of Variants of Uncertain Significance in Hereditary Breast and Ovarian Cancer Patients

Author: Caminsky Natasha G
Publication venue: Scholarship@Western
Publication date: 21/09/2015

A significant proportion of hereditary breast and ovarian cancer (HBOC) patients receive uninformative genetic testing results, an issue exacerbated by the overwhelming quantity of variants of uncertain significance identified. This thesis describes a framework in which, aside from protein-coding changes, information theory (IT)-based sequence analysis identifies and prioritizes pathogenic variants occurring within sequence elements predicted to be recognized by proteins involved in mRNA splicing, transcription, and untranslated region binding and structure. To support the use of IT analysis, we established the accuracy of IT-based variant interpretation by performing a comprehensive review of mutations altering mRNA splicing in rare and common diseases. Sequencing with custom probes targeting 20 complete HBOC genes in 379 BRCA-uninformative patients identified 47,501 unique variants, and we prioritized 429 variants in both BRCA and non-BRCA genes. Our approach focuses attention on a limited set of variants from a spectrum of functional mutation types for downstream functional and co-segregation analysis.
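
As a rough illustration of what information theory-based sequence analysis measures, the sketch below computes the information content of each position in a set of aligned binding sites and the individual information of a single sequence, both in bits. It is a minimal sketch under stated assumptions and not the method used in the thesis: the function names are hypothetical, the small-sample correction is omitted, and the background is assumed to be uniform over the four bases.

```python
import math
from collections import Counter

BASES = "ACGT"

def frequency_matrix(sites, pseudocount=0.0):
    """Per-position base frequencies from equal-length aligned sites.

    `pseudocount` is an illustrative smoothing parameter (assumption).
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix

def position_information(freqs):
    """Information at one position: 2 bits minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy

def individual_information(seq, matrix):
    """Sum of per-base weights 2 + log2(f) for one sequence, in bits.

    A base never seen at a position scores minus infinity unless a
    pseudocount was used when building the matrix.
    """
    score = 0.0
    for base, column in zip(seq, matrix):
        p = column[base]
        score += 2.0 + math.log2(p) if p > 0 else float("-inf")
    return score
```

Under these assumptions, a variant's effect could be approximated as the difference in individual information between the reference and the variant sequence at the same site.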

Scholarship@Western

Computational Modelling of Human Transcriptional Regulation by an Information Theory-based Approach

Author: Lu Ruipeng
Publication venue: Scholarship@Western
Publication date: 12/04/2018

ChIP-seq experiments can identify the genome-wide binding site motifs of a transcription factor (TF) and determine its sequence specificity. Multiple algorithms have been developed to derive TF binding site (TFBS) motifs from ChIP-seq data, including the entropy minimization-based Bipad, which can derive both contiguous and bipartite motifs. Prior studies applying these algorithms to ChIP-seq data analyzed only a small number of top peaks with the highest signal strengths, biasing the resulting position weight matrices (PWMs) towards consensus-like, strong binding sites; nor did they derive bipartite motifs, which prevented accurate modelling of the binding behavior of dimeric TFs. This thesis presents a novel motif discovery pipeline that adds recursive masking and thresholding functionalities to Bipad to improve detection of primary binding motifs. Analyzing 765 ENCODE ChIP-seq datasets with this pipeline generated contiguous and bipartite information theory-based PWMs (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The accuracy of these iPWMs was determined via four independent validation methods: detection of experimentally proven TFBSs, explanation of the effects of characterized SNPs, comparison with previously published motifs, and statistical analyses. Novel cofactor motifs supported previously unreported TF coregulatory interactions. This thesis further presents a unified framework to identify variants in hereditary breast and ovarian cancer (HBOC), successfully applying these iPWMs to prioritize TFBS variants in 20 complete genes of HBOC patients. The spatial distribution and information composition of cis-regulatory modules (e.g. TFBS clusters) in promoters substantially determine gene expression patterns and TF target genes.
Multiple algorithms have been developed to detect TFBS clusters, including the information density-based clustering (IDBC) algorithm, which simultaneously considers the spatial and information densities of TFBSs. Prior studies predicting tissue-specific gene expression levels and differentially expressed (DE) TF targets used log likelihood ratios to quantify TFBS strengths and merged adjacent TFBSs into clusters. This thesis presents a machine learning framework that uses the Bray-Curtis function to quantify the similarity between tissue-wide expression profiles of genes, and uses IDBC-identified clusters from iPWM-detected TFBSs to predict gene expression profiles and DE direct TF targets. Multiple clusters enable gene expression to be robust against TFBS mutations.
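
To make the Bray-Curtis comparison of expression profiles concrete, the sketch below computes a similarity score between two genes' tissue-wide expression vectors as one minus the standard Bray-Curtis dissimilarity. It is a minimal sketch, not the thesis implementation: the function name is hypothetical, the profiles are assumed to be non-negative and ordered by the same tissues, and treating two all-zero profiles as identical is an assumption made here for convenience.

```python
def bray_curtis_similarity(profile_a, profile_b):
    """Return 1 - sum|a_i - b_i| / sum(a_i + b_i) for two expression profiles.

    Both profiles must list non-negative expression values for the same
    tissues in the same order. The result lies between 0 and 1.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same tissues")
    if any(value < 0 for value in list(profile_a) + list(profile_b)):
        raise ValueError("expression values must be non-negative")
    total = sum(profile_a) + sum(profile_b)
    if total == 0:
        # Assumption: two all-zero profiles are treated as identical.
        return 1.0
    difference = sum(abs(a - b) for a, b in zip(profile_a, profile_b))
    return 1.0 - difference / total
```

For example, the profiles `[10, 0, 5]` and `[5, 0, 5]` differ by 5 units out of a combined 25, which gives a similarity of 0.8 under this definition.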

Scholarship@Western