22 research outputs found

    Bioinformatics in the Service of Genomics (La bioinformática al servicio de la genómica)

    This thesis addresses several areas in which bioinformatics techniques are applied to problems arising from the handling, analysis, storage and querying of large volumes of genomic data. The main challenges this thesis has sought to address are the following:
    - Processing the most basic information produced by high-throughput genotyping technologies, so that a set of basic parameters and statistics characterising an experiment can be obtained quickly and easily, regardless of the technology chosen.
    - Facilitating the publication and querying of low- and medium-scale genotyping results, for both SNPs and STRs, as well as their interaction with publicly accessible variability repositories.
    - Studying the feasibility of locally managing an in-house repository of human variability based on the available resources, both external information and internal infrastructure.
    - Transferring the knowledge obtained. Contributing existing tools or ad hoc solutions to the problems that genetic research may present in the field of computational biology.

    Performance evaluation of deleteriousness prediction methods for intronic SNVs in next generation sequences

    Introduction: Alterations in splicing sites (ss) are estimated to explain approximately 10% of human disease causal variants. Mutations outside the ss but affecting "regulatory elements" can account for up to 25%. Accurate deleteriousness prediction for intronic variants is crucial for diagnostic purposes. Many deleteriousness prediction methods have been developed, but their relative value in practical applications is still unclear. We comprehensively evaluated the predictive performance of two complementary deleteriousness-scoring methods using information from real patients. Material and Methods: We selected the dbscSNV (both ADA and RF scores) and SPIDEX algorithms, which study variants in splicing consensus regions and in regulatory regions, respectively. The tools, either alone or in combination, were tested on 29294 gene intronic SNVs that have previously been characterised by ClinVar as either "pathogenic" (430) or "benign" (28864). The sensitivity, specificity and positive and negative predictive values were calculated. Moreover, we applied the algorithms to WES data from undiagnosed patients, and we analysed the mRNA sequence from genes that fitted the patient's phenotype. Results: The highest sensitivity corresponds to dbscSNV with 96.55%, while the best specificity is for SPIDEX with 95.78%. When considering the 3 scores (SPIDEX, dbscSNV ADA and RF Score), the sensitivity and specificity values were 60.7% and 94.6%. The Positive and Negative Predictive Values were 14.45% and 99.39%. The results for 20 undiagnosed cases are presented. Conclusions: Despite the low positive predictive value, the combination of both algorithms leads to less than 1% of false negatives, so their routine use can be recommended for diagnostic purposes.
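The relationship between the reported sensitivity, specificity and predictive values can be checked with a short calculation. The following is a minimal sketch, not the authors' code: the function name `predictive_values` is hypothetical, and the inputs are the rounded figures quoted in the abstract (60.7% sensitivity, 94.6% specificity, 430 pathogenic and 28864 benign variants), so the output only approximates the published values.

```python
def predictive_values(sensitivity, specificity, n_pathogenic, n_benign):
    """Return (PPV, NPV) implied by sensitivity, specificity and class sizes."""
    tp = sensitivity * n_pathogenic   # expected true positives
    fn = n_pathogenic - tp            # expected false negatives
    tn = specificity * n_benign       # expected true negatives
    fp = n_benign - tn                # expected false positives
    ppv = tp / (tp + fp)              # fraction of positive calls that are pathogenic
    npv = tn / (tn + fn)              # fraction of negative calls that are benign
    return ppv, npv


# Rounded figures quoted in the abstract for the three combined scores.
ppv, npv = predictive_values(0.607, 0.946, 430, 28864)
print(f"PPV = {ppv:.2%}, NPV = {npv:.2%}")
```

With these rounded inputs the sketch gives a PPV of roughly 14% and an NPV above 99%, close to the reported 14.45% and 99.39%. The low PPV follows mainly from the class imbalance: pathogenic variants make up only about 1.5% of the test set, so even a small false-positive rate among the benign variants outnumbers the true positives.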

    SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access

    Background: In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions may be extracted by querying databases individually, the lack of flexibility for combining data from within and between each database does not allow the calculation of key population variability statistics. Results: We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference indices, and stored into a relational database that currently handles as many as 4 × 10^9 genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the browsing of underlying data indexed by population and the combining of populations, allowing intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are already pre-processed in the data mart to speed up the data browsing and any computational treatment requested. Conclusion: In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through visual inspection of allele frequencies outlined in standard pie-chart format. 
In addition, a full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In. The grants from the Xunta de Galicia (PGIDIT06PXIB208079PR) and Fundación de Investigación Médica Mutua Madrileña awarded to AS partially supported this project.
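To make the summary statistics mentioned above concrete, here is a minimal sketch of how expected heterozygosity and a simple Fst could be computed from per-population allele counts for one SNP, including the pooling of populations into a user-defined group. It is an illustration only and does not reproduce SPSmart's pipeline: all function names are hypothetical, the Fst shown is the basic (H_T - H_S) / H_T form with equal population weights, the In statistic is not covered, and each population is assumed to have at least one observed allele.

```python
from collections import Counter


def allele_frequencies(counts):
    """Convert allele counts, e.g. {"A": 30, "G": 70}, into frequencies."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(counts):
    """Expected heterozygosity H = 1 - sum(p_i^2) for one population."""
    return 1.0 - sum(p * p for p in allele_frequencies(counts).values())


def combine_populations(populations):
    """Pool allele counts of several populations into one user-defined group."""
    pooled = Counter()
    for counts in populations:
        pooled.update(counts)
    return dict(pooled)


def fst(populations):
    """Simple Fst = (H_T - H_S) / H_T with equally weighted populations."""
    k = len(populations)
    # H_S: mean within-population expected heterozygosity.
    h_s = sum(expected_heterozygosity(c) for c in populations) / k
    # H_T: expected heterozygosity of the mean allele frequencies.
    alleles = set().union(*populations)
    mean_freqs = [
        sum(allele_frequencies(c).get(a, 0.0) for c in populations) / k
        for a in alleles
    ]
    h_t = 1.0 - sum(p * p for p in mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Made-up counts for illustration only.
pop_a = {"A": 80, "G": 20}
pop_b = {"A": 30, "G": 70}
print(expected_heterozygosity(combine_populations([pop_a, pop_b])))
print(fst([pop_a, pop_b]))
```

Pre-computing statistics of this kind per SNP and per population is one way a data mart can answer web queries without touching raw genotypes, which is the design the abstract describes.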

    A Generalized Model to Estimate the Statistical Power in Mitochondrial Disease Studies Involving 2×k Tables

    Mitochondrial DNA (mtDNA) variation (i.e. haplogroups) has been analyzed with regard to a number of multifactorial diseases. The statistical power of a case-control study determines the a priori probability of rejecting the null hypothesis of homogeneity between cases and controls. The research leading to these results has received funding from the “Ministerio de Ciencia e Innovación” (SAF2008-02971) and from the Plan Galego IDT, Xunta de Galicia (EM 2012/045) given to A.S.
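The notion of power for a 2×k case-control table can be illustrated with a small Monte Carlo simulation. This is a sketch under stated assumptions and not the generalized model of the paper: the function names are hypothetical, the haplogroup frequencies are supplied by the user, the test statistic is the Pearson chi-square, and the critical value is taken from a simulated null distribution in which cases and controls share the control frequencies.

```python
import random


def chi_square_2xk(cases, controls):
    """Pearson chi-square statistic for a 2×k table of haplogroup counts."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    stat = 0.0
    for a, b in zip(cases, controls):
        col = a + b
        if col == 0:
            continue  # an empty haplogroup column contributes nothing
        for observed, row_total in ((a, n_cases), (b, n_controls)):
            expected = row_total * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


def sample_counts(rng, freqs, n):
    """Draw n individuals and count how many fall in each haplogroup."""
    counts = [0] * len(freqs)
    for idx in rng.choices(range(len(freqs)), weights=freqs, k=n):
        counts[idx] += 1
    return counts


def simulated_power(case_freqs, control_freqs, n_cases, n_controls,
                    alpha=0.05, n_sim=1000, seed=1):
    """Estimate the probability of rejecting homogeneity at level alpha."""
    rng = random.Random(seed)
    # Null distribution: both groups drawn from the control frequencies.
    null_stats = sorted(
        chi_square_2xk(sample_counts(rng, control_freqs, n_cases),
                       sample_counts(rng, control_freqs, n_controls))
        for _ in range(n_sim)
    )
    critical = null_stats[min(n_sim - 1, round((1 - alpha) * n_sim))]
    # Alternative: cases follow their own haplogroup frequencies.
    rejections = sum(
        chi_square_2xk(sample_counts(rng, case_freqs, n_cases),
                       sample_counts(rng, control_freqs, n_controls)) > critical
        for _ in range(n_sim)
    )
    return rejections / n_sim


# Made-up frequencies for four haplogroups, for illustration only.
print(simulated_power([0.2, 0.3, 0.2, 0.3], [0.4, 0.3, 0.2, 0.1], 300, 300))
```

Repeating the call over a range of sample sizes gives a rough idea of how many cases and controls would be needed to detect a given frequency difference.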

    SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data

    Next-generation sequencing (NGS) technologies have led to a huge amount of genomic data that need to be analyzed and interpreted. This fact has a huge impact on the DNA sequence alignment process, which nowadays requires the mapping of billions of small DNA sequences onto a reference genome. As a result, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state-of-the-art aligners take advantage of parallelization strategies. However, the existing solutions show limited scalability and have a complex implementation. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology such as Spark to boost the performance of one of the most widely adopted aligners, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers in such a way that no modifications to the original BWA source code are required, which assures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios showing noticeable results in terms of performance and scalability. A comparison to other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA, with a GPL3 license. This work was supported by Ministerio de Economía y Competitividad (Spain) (http://www.mineco.gob.es) grants TIN2013-41129-P and TIN2014-54565-JIN. There was no additional external funding received for this study.

    A Genome-Wide Study of Modern-Day Tuscans: Revisiting Herodotus's Theory on the Origin of the Etruscans

    Background: The origin of the Etruscan civilization (Etruria, Central Italy) is a long-standing subject of debate among scholars from different disciplines. The bulk of the information has been reconstructed from ancient texts and archaeological findings and, in the last few years, through the analysis of uniparental genetic markers. Methods: By meta-analyzing genome-wide data from The 1000 Genomes Project and the literature, we were able to compare the genomic patterns (>540,000 SNPs) of present day Tuscans (N = 98) with other population groups from the main hypothetical source populations, namely, Europe and the Middle East. Results: Admixture analysis indicates the presence of a 25–34% Middle Eastern component in modern Tuscans. Different analyses carried out using identity-by-state (IBS) values and genetic distances point to Eastern Anatolia/Southern Caucasus as the most likely geographic origin of the main Middle Eastern genetic component observed in the genome of modern Tuscans. Conclusions: The data indicate that the admixture event between local Tuscans and Middle Easterners could have occurred in Central Italy about 2,600–3,100 years ago (y.a.). On the whole, the results validate the theory of the ancient historian Herodotus on the origin of Etruscans. The research leading to these results has received funding from the "Ministerio de Ciencia e Innovación" (SAF2011-26983) and from the Plan Galego IDT, Xunta de Galicia (EM 2012/045) (A.S.) and Consellería de Sanidade/Xunta de Galicia (RHI07/2-intensificación de la actividad investigadora and 10PXIB918184PR), Instituto Carlos III (Intensificación de la actividad investigadora) and Fondo de Investigación Sanitaria (FIS; PI07/0069, PI10/00540 and PI13/02382) of the Plan Nacional de I+D+I and 'fondos FEDER' (F.M.T.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
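As an illustration of the identity-by-state (IBS) values mentioned in the abstract, the sketch below computes the mean proportion of alleles shared IBS between two individuals. It is not the authors' pipeline: the function name `ibs_similarity` is hypothetical, and genotypes are assumed to be coded as 0, 1 or 2 copies of a reference allele per biallelic SNP, with `None` marking a missing call.

```python
def ibs_similarity(genotypes_a, genotypes_b):
    """Mean proportion of alleles shared identical-by-state across SNPs.

    Genotypes are coded as 0, 1 or 2 copies of one allele; None is missing.
    Per SNP, the number of shared alleles is 2 - |a - b|.
    """
    shared = 0
    compared = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue  # skip SNPs with a missing call in either individual
        shared += 2 - abs(a - b)
        compared += 1
    if compared == 0:
        raise ValueError("no SNPs with calls in both individuals")
    return shared / (2 * compared)


# Made-up genotypes for illustration only.
print(ibs_similarity([0, 1, 2, None], [1, 1, 0, 2]))  # prints 0.5
```

Averaging such pairwise values between individuals of two populations gives a simple similarity measure that can then be compared across candidate source regions.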

    Longitudinal analysis on parasite diversity in honeybee colonies: new taxa, high frequency of mixed infections and seasonal patterns of variation

    To evaluate the influence that parasites have on the losses of Apis mellifera it is essential to monitor their presence in the colonies over time. Here we analysed the occurrence of nosematids, trypanosomatids and neogregarines in five homogeneous colonies for up to 21 months until they collapsed. The study, which combined the use of several molecular markers with the application of a massive parallel sequencing technology, provided valuable insights into the epidemiology of these parasites: (I) it enabled the detection of parasite species rarely reported in honeybees (Nosema thomsoni, Crithidia bombi, Crithidia acanthocephali) and the identification of two novel taxa; (II) it revealed the existence of a high rate of co-infections (80% of the samples harboured more than one parasite species); (III) it uncovered an identical pattern of seasonal variation for nosematids and trypanosomatids, which was different from that of neogregarines; (IV) it showed that there were no significant differences in the fraction of positive samples, nor in the levels of species diversity, between interior and exterior bees; and (V) it unveiled that the variation in the number of parasite species was not directly linked with the failure of the colonies.

    Building a custom large-scale panel of novel microhaplotypes for forensic identification using MiSeq and Ion S5 massively parallel sequencing systems

    A large number of new microhaplotype loci were identified in the human genome by applying a directed search with selection criteria emphasizing short haplotype length (<120 nucleotides) and maximum levels of polymorphism in the composite SNPs. From these searches, 107 autosomal microhaplotypes and 11 X chromosome microhaplotypes were selected, with well-spaced autosomal positions to ensure their independence in relationship tests. The 118 microhaplotypes were assembled into a single multiplex assay for the analysis of forensic DNA with massively parallel sequencing (MPS). A single AmpliSeq-adapted primer set was made for Illumina MiSeq and Thermo Fisher Ion S5 MPS platforms and the performance of the assay was comprehensively evaluated in both systems. Five microhaplotypes showed critical sequencing failures in both MPS platforms and were removed, while a further 13 required manual checks and the application of sequence quality thresholds in one or both systems to ensure the successful analysis of low-level DNA in these loci. The targeting of short microhaplotype spans during marker selection, with an average length of 51 nucleotides in the 118 loci, led to a high level of sensitivity for the panel when sequencing the very degraded DNA typically encountered in forensic casework and the identification of missing persons. The studies reported and authors MdlP, CP, MVL are supported by MAPA: Multiple Allele Polymorphism Analysis (BIO2016-78525-R), a research project funded by the Spanish Research State Agency (AEI), and co-financed with ERDF funds. MdlP is supported by a postdoctoral fellowship awarded by the Consellería de Cultura, Educación e Ordenación Universitaria and the Consellería de Economía, Emprego e Industria of the Xunta de Galicia (ED481B 2017/088). The studies reported have received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 740580 within the framework of the Visible Attributes Through Genomics (VISAGE) Project and Consortium.

    Genetic Analysis of Arrhythmogenic Diseases in the Era of NGS: The Complexity of Clinical Decision-Making in Brugada Syndrome

    BACKGROUND: The use of next-generation sequencing enables a rapid analysis of many genes associated with sudden cardiac death in diseases like Brugada Syndrome. Genetic variation is identified and associated with 30-35% of cases of Brugada Syndrome, with nearly 20-25% attributable to variants in SCN5A, meaning many cases remain undiagnosed genetically. To evaluate the role of genetic variants in arrhythmogenic diseases and the utility of next-generation sequencing, we applied this technology to resequence 28 main genes associated with arrhythmogenic disorders. MATERIALS AND METHODS: A cohort of 45 clinically diagnosed Brugada Syndrome patients classified as SCN5A-negative was analyzed using next-generation sequencing. Twenty-eight genes were resequenced: AKAP9, ANK2, CACNA1C, CACNB2, CASQ2, CAV3, DSC2, DSG2, DSP, GPD1L, HCN4, JUP, KCNE1, KCNE2, KCNE3, KCNH2, KCNJ2, KCNJ5, KCNQ1, NOS1AP, PKP2, RYR2, SCN1B, SCN3B, SCN4B, SCN5A, SNTA1, and TMEM43. A total of 85 clinically evaluated relatives were also genetically analyzed to ascertain familial segregation. RESULTS AND DISCUSSION: Twenty-two patients carried 30 rare genetic variants in 12 genes, only 4 of which were previously associated with Brugada Syndrome. Neither insertions/deletions nor copy number variations were detected. We identified genetic variants in novel candidate genes potentially associated with Brugada Syndrome. These include: 4 genetic variations in AKAP9, including a de novo genetic variation, in 3 positive cases; 5 genetic variations in ANK2 detected in 4 cases; variations in KCNJ2 together with CASQ2 in 1 case; and genetic variations in RYR2, including a de novo genetic variation, and in genes encoding desmosomal proteins, including DSG2, DSP and JUP, detected in 3 of the cases. Larger gene panels or whole exome sequencing should be considered to identify novel genes associated with Brugada Syndrome. However, approaches such as whole exome sequencing would complicate interpretation for clinical purposes because of the large amount of data generated. The identification of these genetic variants opens new perspectives on the implications of genetic background in the arrhythmogenic substrate for research purposes. CONCLUSIONS: As a paradigm for other arrhythmogenic diseases and for unexplained sudden death, our data show that clinical genetic diagnosis is justified in a family perspective for confirmation of genetic causality. In the era of personalized medicine using high-throughput tools, clinical decision-making is increasingly complex.

    Exploring the biological role of postzygotic and germinal de novo mutations in ASD

    De novo mutations (DNMs), including germinal and postzygotic mutations (PZMs), are a strong source of causality for Autism Spectrum Disorder (ASD). However, the biological processes behind them remain unexplored. Our aim was to detect DNMs (germinal and PZMs) in a Spanish ASD cohort (360 trios) and to explore their role across different biological hierarchies (gene, biological pathway, cell and brain areas) using bioinformatic approaches. For the majority of the analyses, a combined ASD cohort (N = 2171 trios) was created using data previously published by the Autism Sequencing Consortium (ASC). New plausible candidate genes for ASD such as FMR1 and NFIA were found. In addition, genes harboring PZMs were significantly enriched for miR-137 targets, in comparison with germinal DNMs, which were enriched in GO terms related to synaptic transmission. The expression pattern of genes with PZMs was restricted to early mid-fetal cortex. In contrast, the analysis of genes with germinal DNMs revealed a spatio-temporal window from early to mid-fetal development stages, with expression in the amygdala, cerebellum, cortex and striatum. These results provide evidence of the pathogenic role of PZMs and suggest the existence of distinct mechanisms between PZMs and germinal DNMs that are influencing ASD risk. AA-G was supported by Fundación María José Jove. CR-F was supported by a contract from the FEDER. Instituto de Salud Carlos III/PI1900809/Cofinanciado FEDER supported this study.