Repeat Detector: versatile sizing of expanded tandem repeats and identification of interrupted alleles from targeted DNA sequencing
Targeted DNA sequencing approaches will improve how the size of short tandem repeats is measured for diagnostic tests and preclinical studies. The expansion of these sequences causes dozens of disorders, with longer tracts generally leading to a more severe disease. Interrupted alleles are sometimes present within repeats and can alter disease manifestation. Determining repeat size mosaicism and identifying interruptions in targeted sequencing datasets remains a major challenge. This is in part because standard alignment tools are ill-suited for repetitive and unstable sequences. To address this, we have developed Repeat Detector (RD), a deterministic profile weighting algorithm for counting repeats in targeted sequencing data. We tested RD using blood-derived DNA samples from Huntington's disease and Fuchs endothelial corneal dystrophy patients sequenced using either Illumina MiSeq or Pacific Biosciences single-molecule, real-time sequencing platforms. RD was highly accurate in determining repeat sizes of 609 blood-derived samples from Huntington's disease individuals and did not require prior knowledge of the flanking sequences. Furthermore, RD can be used to identify alleles with interruptions and provide a measure of repeat instability within an individual. RD is therefore highly versatile and may find applications in the diagnosis of expanded repeat disorders and in the development of novel therapies.
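To make the idea of repeat sizing and interrupted alleles concrete, here is a minimal Python sketch. It is not RD's profile weighting algorithm, which the abstract does not describe in detail; it is a plain regular-expression scan, and the function names `repeat_runs` and `longest_repeat_run`, the default `CAG` unit, and the example read are all illustrative assumptions.

```python
import re
from typing import List


def repeat_runs(read: str, unit: str = "CAG") -> List[int]:
    """Return the copy number of every uninterrupted run of `unit` in `read`.

    Illustrative only: a simple regex scan, NOT the profile weighting
    approach used by Repeat Detector.
    """
    pattern = re.compile("(?:" + re.escape(unit) + ")+")
    return [len(m.group(0)) // len(unit) for m in pattern.finditer(read)]


def longest_repeat_run(read: str, unit: str = "CAG") -> int:
    """Return the largest uninterrupted copy number of `unit`, or 0 if absent."""
    return max(repeat_runs(read, unit), default=0)


# Hypothetical read: five CAG units, one CAA interruption, two more CAG units.
example_read = "TTT" + "CAG" * 5 + "CAA" + "CAG" * 2 + "GG"
runs = repeat_runs(example_read)
longest = longest_repeat_run(example_read)
```

In this toy read, an interruption shows up as the tract being split into more than one run. A real tool would also need to handle sequencing errors and repeat units that occur out of frame, which this sketch ignores.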
A systems genetics approach for sleep regulation
Sleep is a daily behavior important for health. Many researchers have studied sleep with more or less sophisticated technologies over time, yet it has not revealed all its mysteries. To help uncover the molecular consequences of sleep deprivation, the Franken group has assembled a systems genetics resource interrogating the BXD mouse panel. The genotypes and sleep-wake phenome were characterized, along with intermediate phenotypes: the transcriptome in brain and liver, and the targeted metabolome in blood plasma. I used this rich multi-omics BXD dataset for computational investigation and for the development of analytical methods for data and knowledge integration, with the aim of expanding the current understanding of sleep regulation. First, in collaboration with Maxime Jan, we used this real-world example of data and bioinformatic analysis management to highlight multi-omics challenges and the solutions used to support internal and external reusability. These include more details on quality checks and method validation, the use of Rmarkdown reports for the higher-level parts of the analyses, a metadata workflow document illustrating and referencing the different code and data files, and a web site for exploring the results. The robustness of the results was also assessed by switching to the newest version of the mouse genome reference assembly. Second, the classical pipeline for analysing RNA-sequencing reads uses one mouse reference for all samples, irrespective of their strain, which potentially creates a reference bias. Therefore, to improve the genetic specificity of the read mapping, I customized the standard assembly, which is based on one parental strain, with variants from the BXD population. An important step was adding a tailored imputation of the population genetic variants using haplotype blocks/regions to achieve sufficient resolution for each line-specific reference. This strategy alleviated the reference bias and detected proportionally more eQTLs with the custom BXD-specific references than with the standard reference. Lastly, I assembled a multi-layer prior knowledge network and integrated the sleep-specific gene expression onto it. This integration of data-driven and knowledge-driven approaches lays the basis for generating hypotheses based on multiple genes to explain the genetic and environmental interactions culminating in the different sleep phenotypes.
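The reference customization step can be illustrated with a small Python sketch. It assumes the simplest possible case: single-nucleotide variants only, given as 0-based positions with their expected reference base and the strain's alternative base. The function name `apply_snvs` and the toy sequence are hypothetical. The thesis pipeline, including its haplotype-based imputation and any handling of indels, is not reproduced here.

```python
from typing import Dict, Tuple


def apply_snvs(reference: str, snvs: Dict[int, Tuple[str, str]]) -> str:
    """Return a copy of `reference` with strain-specific SNVs substituted.

    `snvs` maps a 0-based position to (reference_base, alternative_base).
    Assumption: SNVs only; indels and imputation are out of scope.
    """
    seq = list(reference)
    for pos, (ref_base, alt_base) in snvs.items():
        # Guard against variants that do not match the assembly in use.
        if seq[pos] != ref_base:
            raise ValueError(
                f"reference mismatch at {pos}: expected {ref_base}, found {seq[pos]}"
            )
        seq[pos] = alt_base
    return "".join(seq)


# Toy example: two variants carried by a hypothetical BXD line.
strain_reference = apply_snvs("ACGTACGT", {1: ("C", "T"), 6: ("G", "A")})
```

Checking the expected reference base before substituting is a cheap way to catch variant calls made against a different assembly version, which matters when the assembly is updated as described above.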
Evaluating mapping parameters.
A. The performance on local eQTLs of selected mapping settings on cortex samples (average of the NSD and SD conditions) is measured by the percentage of expressed genes that have a significant local eQTL. The BXD-specific references were used. C. As in A but for liver samples.
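The metric in panel A can be sketched as follows. This is a minimal illustration that assumes each expressed gene has one q-value for its best local eQTL and that significance means a q-value below 0.05. Neither the threshold nor the function names `percent_with_local_eqtl` and `average_over_conditions` come from the source.

```python
from typing import Dict, Iterable


def percent_with_local_eqtl(qvalues: Dict[str, float], threshold: float = 0.05) -> float:
    """Percentage of expressed genes whose local eQTL q-value is below `threshold`.

    `qvalues` maps gene identifier -> q-value; the 0.05 cutoff is an assumption.
    """
    if not qvalues:
        return 0.0
    significant = sum(1 for q in qvalues.values() if q < threshold)
    return 100.0 * significant / len(qvalues)


def average_over_conditions(percentages: Iterable[float]) -> float:
    """Mean of per-condition percentages (e.g. NSD and SD)."""
    values = list(percentages)
    return sum(values) / len(values)


# Made-up q-values for four expressed genes in two conditions.
nsd = {"g1": 0.01, "g2": 0.20, "g3": 0.04, "g4": 0.50}
sd = {"g1": 0.01, "g2": 0.01, "g3": 0.04, "g4": 0.50}
nsd_pct = percent_with_local_eqtl(nsd)
sd_pct = percent_with_local_eqtl(sd)
score = average_over_conditions([nsd_pct, sd_pct])
```

Using the percentage of expressed genes rather than a raw eQTL count keeps settings comparable when they differ in how many genes pass the expression filter.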