4 research outputs found

    Repeat Detector: versatile sizing of expanded tandem repeats and identification of interrupted alleles from targeted DNA sequencing

    Targeted DNA sequencing approaches will improve how the size of short tandem repeats is measured for diagnostic tests and preclinical studies. The expansion of these sequences causes dozens of disorders, with longer tracts generally leading to a more severe disease. Interrupted alleles are sometimes present within repeats and can alter disease manifestation. Determining repeat size mosaicism and identifying interruptions in targeted sequencing datasets remains a major challenge. This is in part because standard alignment tools are ill-suited for repetitive and unstable sequences. To address this, we have developed Repeat Detector (RD), a deterministic profile weighting algorithm for counting repeats in targeted sequencing data. We tested RD using blood-derived DNA samples from Huntington’s disease and Fuchs endothelial corneal dystrophy patients sequenced using either Illumina MiSeq or Pacific Biosciences single-molecule, real-time sequencing platforms. RD was highly accurate in determining repeat sizes of 609 blood-derived samples from Huntington’s disease individuals and did not require prior knowledge of the flanking sequences. Furthermore, RD can be used to identify alleles with interruptions and provide a measure of repeat instability within an individual. RD is therefore highly versatile and may find applications in the diagnosis of expanded repeat disorders and in the development of novel therapies.
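To make the idea of repeat sizing concrete, the following minimal sketch counts uninterrupted runs of a motif in a single read. It is not the profile weighting algorithm used by RD, which the abstract does not detail; it is a plain regular-expression scan, and the function name `repeat_runs`, the default motif `CAG`, and the example read are assumptions made for illustration only.

```python
import re


def repeat_runs(read: str, motif: str = "CAG") -> list[int]:
    """Return the copy number of each uninterrupted tandem run of `motif`.

    Hypothetical helper for illustration; NOT Repeat Detector's algorithm.
    """
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    return [(m.end() - m.start()) // len(motif) for m in pattern.finditer(read)]


def longest_run(read: str, motif: str = "CAG") -> int:
    """Size of the longest uninterrupted run, or 0 if the motif is absent."""
    return max(repeat_runs(read, motif), default=0)


# Made-up read: three CAG copies, a CAA interruption, then two more CAG copies.
example_read = "TTCAGCAGCAGCAACAGCAGTT"
runs = repeat_runs(example_read)
longest = longest_run(example_read)
```

In this toy read, more than one run in the output hints at an interruption within the repeat tract. A real tool would also need to tolerate sequencing errors, which this sketch does not attempt.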

    A systems genetics approach for sleep regulation

    Sleep is a daily behavior important for health. Many people have studied sleep with more or less sophisticated technologies over time, and yet it has not revealed all its mysteries. To help uncover the molecular consequences of sleep deprivation, the Franken group has assembled a systems genetics resource interrogating the BXD mouse panel. The genotypes and sleep-wake phenome were characterized, along with intermediate phenotypes: the transcriptome in brain and liver, and the targeted metabolome in blood plasma. I have used this rich multi-omics BXD dataset for computational investigation and for the development of analytical methods for data and knowledge integration, to expand the current understanding of sleep regulation. First, in collaboration with Maxime Jan, we used this real-world example of data and bioinformatic analysis management to highlight multi-omics challenges and the solutions used to support internal and external reusability. These include more details on the quality checks and validations of the methods, the use of Rmarkdown reports for the higher-level parts of the analyses, a metadata workflow document illustrating and referencing the different code and data files, and a web site for exploration of the results. The robustness of the results was also assessed by switching to the newest version of the mouse genome reference assembly. Second, the classical pipeline for analysing RNA-sequencing reads uses one mouse reference for all samples, irrespective of their strain, which potentially creates a reference bias. Therefore, to improve the genetic specificity of the read mapping, I customized the standard assembly, which is based on one parental strain, with variants from the BXD population. An important step was adding a tailored imputation of the population genetic variants using haplotype blocks/regions to achieve a sufficient resolution for each line-specific reference. This strategy alleviated the reference bias and allowed the detection of proportionally more eQTLs with the custom BXD-specific references than with the standard reference. Lastly, I assembled a multi-layer prior knowledge network and integrated the sleep-specific gene expression onto it. This integration of data-driven and knowledge-driven approaches sets the basis for generating hypotheses based on multiple genes to explain the genetic and environmental interactions culminating in the different sleep phenotypes.
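The reference customization described in this abstract can be pictured with a small sketch that substitutes strain-specific single-nucleotide variants into a reference sequence. This is an assumption-laden illustration, not the thesis pipeline: the function `apply_snvs`, the 0-based coordinate convention, and the toy sequence are hypothetical, and indels and the haplotype-based imputation step are left out entirely.

```python
def apply_snvs(reference: str, snvs: dict[int, tuple[str, str]]) -> str:
    """Return a copy of `reference` with single-nucleotide variants applied.

    `snvs` maps a 0-based position to (expected_ref_base, alt_base).
    Hypothetical helper for illustration; handles substitutions only.
    """
    seq = list(reference)
    for pos, (ref_base, alt_base) in snvs.items():
        if seq[pos] != ref_base:
            # Guard against variants called on a different assembly version.
            raise ValueError(
                f"position {pos}: expected {ref_base!r}, found {seq[pos]!r}"
            )
        seq[pos] = alt_base
    return "".join(seq)


# Toy parental reference and two made-up variants for one line.
standard_reference = "ACGTACGT"
line_variants = {1: ("C", "T"), 6: ("G", "A")}
line_reference = apply_snvs(standard_reference, line_variants)
```

Checking the expected reference base before substituting is a cheap way to catch coordinate mismatches, which matter when the assembly version changes.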

    Evaluating mapping parameters.

    A. The performance on local eQTLs of selected mapping settings on cortex samples (average of the NSD and SD conditions) is measured by the percentage of expressed genes that have a significant local eQTL. The BXD-specific references were used. C. As in A but for liver samples.
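The metric in this caption, the percentage of expressed genes with a significant local eQTL averaged over the NSD and SD conditions, could be computed along the following lines. The significance threshold of 0.05, the use of per-gene q-values, and all gene names and values below are assumptions for illustration; the caption does not state how significance was defined.

```python
def percent_with_local_eqtl(qvalues: dict[str, float], alpha: float = 0.05) -> float:
    """Percentage of expressed genes whose local eQTL q-value is below `alpha`.

    `qvalues` maps each expressed gene to its best local eQTL q-value.
    The threshold `alpha` is an assumed value, not taken from the source.
    """
    if not qvalues:
        return 0.0
    significant = sum(1 for q in qvalues.values() if q < alpha)
    return 100.0 * significant / len(qvalues)


# Made-up q-values for four expressed genes in each condition.
nsd = {"gene1": 0.01, "gene2": 0.20, "gene3": 0.04, "gene4": 0.50}
sd = {"gene1": 0.01, "gene2": 0.03, "gene3": 0.04, "gene4": 0.50}

pct_nsd = percent_with_local_eqtl(nsd)
pct_sd = percent_with_local_eqtl(sd)
# Average of the two conditions, as described in the caption.
average_pct = (pct_nsd + pct_sd) / 2
```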