4 research outputs found

    A method for multiplexed full-length single-molecule sequencing of the human mitochondrial genome

    Data processing; DNA sequencing; Genomic analysis
    Methods to reconstruct the mitochondrial DNA (mtDNA) sequence using short-read sequencing come with an inherent bias due to amplification and mapping. They can fail to determine the phase of variants, to capture multiple deletions and to cover the mitochondrial genome evenly. Here we describe a method to target, multiplex and sequence full-length human mitochondrial genomes at high coverage as native single molecules, utilizing the RNA-guided DNA endonuclease Cas9. Combining Cas9-induced breaks, which define the beginning and end of the mtDNA sequencing reads, as barcodes, we achieve high demultiplexing specificity and delineation of the full length of the mtDNA, regardless of the structural variant pattern. The long-read sequencing data is analysed with a pipeline in which our custom-developed software, baldur, efficiently detects single-nucleotide heteroplasmy to below 1%, physically determines phase and can accurately disentangle complex deletions. Our workflow is a tool for studying mtDNA variation and will accelerate mitochondrial research.
    This research has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824110 – EASI-Genomics (I.G.G.) and the ERC Synergy project BCLL@las under grant agreement No 810287 (I.G.G.). Institutional support was from the Spanish Instituto de Salud Carlos III, Fondo de Investigaciones Sanitarias and cofunded with ERDF funds (PI19/01772). We acknowledge the institutional support of the Spanish Ministry of Science and Innovation through the Instituto de Salud Carlos III and the 2014–2020 Smart Growth Operating Program, to the EMBL partnership and institutional co-financing with the European Regional Development Fund (MINECO/FEDER, BIO2015-71792-P). We also acknowledge the support of the Centro de Excelencia Severo Ochoa, and the Generalitat de Catalunya through the Departament de Salut, Departament d’Empresa i Coneixement and the CERCA Programme to the institute.

    Deep Molecular Characterization of Milder Spinal Muscular Atrophy Patients Carrying the c.859G>C Variant in SMN2

    Next-generation sequencing; Phenotype–genotype correlations; Spinal muscular atrophy
    Spinal muscular atrophy (SMA) is a severe neuromuscular disorder caused by biallelic loss or pathogenic variants in the SMN1 gene. Copy number and modifier intragenic variants in SMN2, an almost identical paralog of SMN1, are known to influence the amount of complete SMN protein. Therefore, SMN2 is considered the main phenotypic modifier of SMA, although the genotype–phenotype correlation is not absolute. We present eleven unrelated SMA patients with milder phenotypes carrying the c.859G>C-positive modifier variant in SMN2. All were studied by a specific NGS method to allow a deep characterization of the entire SMN region. Analysis of two cases homozygous for the variant allowed us to identify a specific haplotype, Smn2-859C.1, in association with c.859G>C. Two other cases with the c.859G>C variant in their two SMN2 copies showed a second haplotype, Smn2-859C.2, in cis with Smn2-859C.1, assembling a more complex allele. We also identified a previously unreported variant in intron 2a exclusively linked to the Smn2-859C.1 haplotype (c.154-1141G>A), further suggesting that this region has been ancestrally conserved. The deep molecular characterization of SMN2 in our cohort highlights the importance of testing c.859G>C, as well as accurately assessing the SMN2 region in SMA patients to gain insight into the complex genotype–phenotype correlations and improve prognostic outcomes.
    This research was funded by grants from Biogen (ESP-SMG-17-11256), Roche, GaliciAME and the Spanish Instituto de Salud Carlos III, Fondo de Investigaciones Sanitarias and co-funded with ERDF funds (grant no. FIS PI18/000687). E.B. and L.T. acknowledge a grant from Horizon 2020 IMI2 Screen4Care. E.F.T., R.J., J.S., L.C.-C., F.M., E.B., and L.T. are members of the ERN NMD Network for Rare Diseases. E.F.T. is a member of the ERN ITHACA Network for Rare Diseases.

    Effect of domestication in the pig genome

    Animal domestication is an important process in human history in which different traits of the animals were selected, such as faster growth or greater docility. To study domestication at the genetic level, it is necessary to identify the markers related to this evolutionary process. Advances in sequencing technologies have improved the investigation of the genomics of domestication, making it possible to determine the genetic changes that cause this transformation from wild to domestic species. The main goal of this thesis is to evaluate the effect of domestication on the pig genome through the analysis of genetic diversity in domestic and wild populations. In the first part, analyses of differentiation and linkage disequilibrium were performed to detect differences between domestic and wild pigs, using the pathway as the unit of analysis. Through the study of differentiation, using the Fst statistic, we obtained significant pathways related to behavior and development, which were some of the first traits selected in pigs. In contrast, the disequilibrium analysis, using the nSL statistic, detected differences in pathways related to reproduction, a recently selected trait. In addition, we built a co-association network from all pathways that differ significantly between domestic and wild pigs, obtaining three distinct clusters: one related to growth and hormonal regulation, another to the sympathetic nervous system and the last to reproduction. In the second part, we analyzed the strength of selection at the genome level in domestic and wild pigs, using two very different domestic populations, Iberian and Large White. Iberian is an autochthonous breed that has recently suffered a strong reduction in effective population size, whereas Large White is an international commercial breed that has been artificially improved and introgressed with Asian pigs. To analyze the strength of selection we use the parameter α, which estimates the proportion of non-synonymous substitutions that are adaptive, using four different estimators of variability, each focused on a part of the frequency spectrum: Fu&Li (only singletons), Watterson (whole spectrum, giving more weight to low frequencies), Tajima (whole spectrum, weighted uniformly) and Fay&Wu (weight increasing proportionally with frequency). However, when analyzing the selection patterns, we did not find more common signals between the two domestic breeds than between the domestic and wild ones. Instead, we found a larger effect of demography on selection: Iberian has very low variability due to its small population size, which is reflected in selection patterns that resemble a population reduction, while Large White has higher variability, possibly due to the presence of Asian alleles in its genome, with patterns that can be explained by the presence of both deleterious and beneficial mutations, together with a population expansion and/or migration. Finally, we have developed a web-based application to analyze VCF files, which can help identify possible errors or biases, mainly related to SNP coverage.
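
As a rough illustration of the four variability estimators named in the abstract above, the following sketch computes them from an unfolded site frequency spectrum using their common textbook forms. The function names, the input layout (`sfs[i-1]` is the number of sites whose derived allele appears `i` times) and the McDonald-Kreitman-style form of α are assumptions made for this example; the thesis may define or weight them differently.

```python
from math import comb


def theta_estimators(sfs):
    """Return four estimates of theta from an unfolded site frequency spectrum.

    Assumed layout (hypothetical, for illustration): sfs[i-1] is the number of
    sites whose derived allele is observed i times, for i = 1 .. n-1.
    """
    n = len(sfs) + 1                          # sample size (number of sequences)
    pairs = comb(n, 2)                        # number of pairwise comparisons
    a_n = sum(1.0 / i for i in range(1, n))   # harmonic number used by Watterson
    return {
        # Fu & Li: uses derived singletons only
        "fu_li": float(sfs[0]),
        # Watterson: total segregating sites scaled by a_n
        "watterson": sum(sfs) / a_n,
        # Tajima: mean number of pairwise differences
        "tajima": sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs,
        # Fay & Wu: emphasises high-frequency derived alleles
        "fay_wu": sum(i * i * x for i, x in enumerate(sfs, start=1)) / pairs,
    }


def alpha_mk(dn, ds, pn, ps):
    """Classical McDonald-Kreitman-style estimate of alpha (assumed form).

    dn, ds: non-synonymous and synonymous divergence counts.
    pn, ps: non-synonymous and synonymous polymorphism counts (or theta values).
    """
    return 1.0 - (ds * pn) / (dn * ps)
```

Under the standard neutral expectation, where the count in class `i` is proportional to `1/i`, all four estimators return the same value, so disagreement between them reflects a skew of the spectrum toward low or high frequencies. Passing per-class variability estimates as `pn` and `ps` to `alpha_mk` is one plausible way to obtain an α per estimator, though that coupling is an assumption here.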

    Complex SMN Hybrids Detected in a Cohort of 31 Patients With Spinal Muscular Atrophy

    Spinal muscular atrophy
    Background and Objectives: Spinal muscular atrophy (SMA) is a recessive neuromuscular disorder caused by the loss of, or the presence of pathogenic point variants in, the SMN1 gene. The main positive modifier of the SMA phenotype is the number of copies of the SMN2 gene, a paralog of SMN1, which only produces around 10%–15% of functional SMN protein. The SMN2 copy number is inversely correlated with phenotype severity; however, discrepancies between the SMA type and the SMN2 copy number have been reported. The presence of SMN2-SMN1 hybrids has been proposed as a possible modifier of SMA disease.
    Methods: We studied 31 patients with SMA, followed at a single center and molecularly diagnosed by multiplex ligation-dependent probe amplification (MLPA), with a specific next-generation sequencing protocol to investigate their SMN2 genes in depth. Hybrid characterization also included bioinformatic haplotype phasing and specific PCRs to resolve each SMN2-SMN1 hybrid structure.
    Results: We detected SMN2-SMN1 hybrid genes in 45.2% of the patients (14/31), the highest rate reported to date. This represents a total of 25 hybrid alleles, with 9 different structures, of which only 4 are detectable by MLPA. Of particular interest were 2 patients who presented 4 SMN2-SMN1 hybrid copies each and no pure SMN2 copies, an event reported here for the first time. No clear trend between the presence of hybrids and a milder phenotype was observed, although 5 of the patients with hybrid copies showed a better-than-expected phenotype. The higher hybrid detection rate in our cohort may be due both to the methodology applied, which allows an in-depth characterization of the SMN genes, and to the ethnicity of the patients, mainly of African origin.
    Discussion: Although hybrid genes have been proposed to be beneficial for patients with SMA, our work revealed great complexity and variability between hybrid structures; therefore, each hybrid structure should be studied independently to determine its contribution to the SMA phenotype. Large-scale studies are needed to gain a better understanding of the function and implications of SMN2-SMN1 hybrid copies, improving genotype-phenotype correlations and prediction of the evolution of patients with SMA.
    This work was partially funded by grants from Biogen (ESP-SMG-17-11256), Roche, GaliciAME and the Spanish Instituto de Salud Carlos III, Fondo de Investigaciones Sanitarias and co-funded with ERDF funds (FIS PI18/000687).