LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons
Background: Transposable elements are abundant in eukaryotic genomes, and they are believed to have a significant impact on the evolution of gene and chromosome structure. While there are several completed eukaryotic genome projects, there are only a few high-quality genome-wide annotations of transposable elements. Therefore, there is considerable demand for computational identification of transposable elements. LTR retrotransposons, an important subclass of transposable elements, are well suited for computational identification, as they contain long terminal repeats (LTRs). Results: We have developed a software tool, LTRharvest, for the de novo detection of full-length LTR retrotransposons in large sequence sets. LTRharvest efficiently delivers high-quality annotations based on known LTR transposon features such as length, distance, and sequence motifs. A quality validation of LTRharvest against a gold standard annotation for Saccharomyces cerevisiae and Drosophila melanogaster shows a sensitivity of up to 90% and 97% and a specificity of 100% and 72%, respectively. This is comparable to or slightly better than the annotations produced by previous software tools. The main advantages of LTRharvest over previous tools are (a) its ability to efficiently handle large datasets from finished or unfinished genome projects, (b) its flexibility in incorporating known sequence features into the prediction, and (c) its availability as open source software. Conclusion: LTRharvest is an efficient software tool delivering high-quality annotation of LTR retrotransposons. It can, for example, process the largest human chromosome in approximately 8 minutes on a Linux PC with 4 GB of memory. Its flexibility and small space and run-time requirements make LTRharvest a very competitive candidate for future LTR retrotransposon annotation projects. Moreover, its structured design and implementation and its availability as open source provide an excellent base for incorporating novel concepts to further improve the prediction of LTR retrotransposons.
FPGA-based Acceleration of Detecting Statistical Epistasis in GWAS
Abstract: Genotype-by-genotype interactions (epistasis) are believed to be a significant source of unexplained genetic variation causing complex chronic diseases, but they have been ignored in genome-wide association studies (GWAS) due to the computational burden of analysis. In this work we show how to benefit from FPGA technology for highly parallel creation of contingency tables in a systolic chain with a subsequent statistical test. We present the implementation for the FPGA-based hardware platform RIVYERA S6-LX150 containing 128 Xilinx Spartan6-LX150 FPGAs. For performance evaluation we compare against the method iLOCi [9]. iLOCi claims to outperform other available tools in terms of accuracy. However, analysis of a dataset from the Wellcome Trust Case Control Consortium (WTCCC) with about 500,000 SNPs and 5,000 samples still takes about 19 hours on a MacPro workstation with two Intel Xeon quad-core CPUs, while our FPGA-based implementation requires only 4 minutes.
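To make the contingency-table step concrete, the following is a minimal software sketch, not the authors' FPGA design. It assumes genotypes are coded 0/1/2 and phenotype 0 (control) or 1 (case), and it uses a Pearson chi-square statistic as a stand-in for the "subsequent statistical test", which the abstract does not name. The function names `contingency_table` and `chi_square` are hypothetical.

```python
from itertools import product

GENOTYPES = range(3)   # assumed coding: 0, 1, 2 copies of the minor allele
STATUSES = range(2)    # assumed coding: 0 = control, 1 = case


def contingency_table(snp_a, snp_b, phenotype):
    """Count samples in each (genotype_a, genotype_b, status) cell."""
    table = {cell: 0 for cell in product(GENOTYPES, GENOTYPES, STATUSES)}
    for ga, gb, status in zip(snp_a, snp_b, phenotype):
        table[(ga, gb, status)] += 1
    return table


def chi_square(table):
    """Pearson chi-square over 9 genotype combinations x 2 phenotype classes.

    Illustrative choice of test only; empty genotype combinations are skipped.
    """
    total = sum(table.values())
    col = [sum(table[(a, b, s)] for a, b in product(GENOTYPES, GENOTYPES))
           for s in STATUSES]
    stat = 0.0
    for a, b in product(GENOTYPES, GENOTYPES):
        row = table[(a, b, 0)] + table[(a, b, 1)]
        for s in STATUSES:
            expected = row * col[s] / total
            if expected > 0:
                stat += (table[(a, b, s)] - expected) ** 2 / expected
    return stat


# Toy data: four samples, two SNPs, genotype pair perfectly tracks case status.
table = contingency_table([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1])
print(chi_square(table))
```

In a genome-wide scan this table is rebuilt for every SNP pair, which is why the counting step dominates the runtime and is a natural target for hardware parallelism.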
Parallelizing Epistasis Detection in GWAS on FPGA and GPU-Accelerated Computing Systems
This is a post-peer-review, pre-copyedit version of an article published in IEEE/ACM Transactions on Computational Biology and Bioinformatics. The final authenticated version is available online at: http://dx.doi.org/10.1109/TCBB.2015.2389958. Abstract: High-throughput genotyping technologies (such as SNP arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in genome-wide association studies is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively long runtimes; e.g., processing a moderately sized dataset consisting of about 500,000 SNPs and 5,000 samples requires several days using state-of-the-art tools on a standard 3 GHz CPU. In this paper, we demonstrate how this task can be accelerated using a combination of fine-grained and coarse-grained parallelism on two different computing systems. The first architecture is based on reconfigurable hardware (FPGAs), while the second architecture uses multiple GPUs connected to the same host. We show that both systems can achieve speedups of around four orders of magnitude compared to the sequential implementation. This significantly reduces the runtimes for detecting epistasis to only a few minutes for moderately sized datasets and to a few hours for large-scale datasets. Funding: London. Wellcome Trust; 076113. London. Wellcome Trust; 08547
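The runtime problem follows from simple arithmetic: n markers give n(n-1)/2 pairs. The sketch below counts the pairs for the dataset size quoted in the abstract and shows one possible way to spread the work over several devices. The round-robin row assignment and the names `pair_count` and `pairs_per_worker` are illustrative assumptions, not the partitioning scheme used in the paper.

```python
from math import comb


def pair_count(num_snps):
    """Number of unordered SNP pairs that must be tested."""
    return comb(num_snps, 2)


def pairs_per_worker(num_snps, num_workers):
    """Pairs handled by each worker under round-robin row assignment.

    Row i stands for all pairs (i, j) with j > i, i.e. num_snps - 1 - i pairs.
    Dealing rows out in turn keeps the load roughly balanced, because long
    and short rows are interleaved across workers.
    """
    loads = [0] * num_workers
    for i in range(num_snps):
        loads[i % num_workers] += num_snps - 1 - i
    return loads


print(pair_count(500_000))        # 124999750000 pairs for 500,000 SNPs
print(pairs_per_worker(4, 2))     # [4, 2]
```

At roughly 1.25 × 10^11 pairs, even a very cheap per-pair test adds up to days of sequential CPU time, which is consistent with the runtimes the abstract reports.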
Replication study of ulcerative colitis risk loci in a Lithuanian-Latvian case control sample
Background: Differences between populations might be reflected in their different genetic risk maps for complex diseases, for example, inflammatory bowel disease. We here investigated the role of known inflammatory bowel disease associated single nucleotide polymorphisms (SNPs) in a subset of patients with ulcerative colitis (UC) from the Northeastern European countries Lithuania and Latvia and evaluated possible epistatic interactions between these genetic variants. Methods: We investigated 77 SNPs derived from 5 previously published genome-wide association studies for Crohn's disease and UC. Our study panel comprised 444 Lithuanian and Latvian patients with UC and 1154 healthy controls. Single-marker case-control association and SNP-SNP epistasis analyses were performed. Results: We found 14 SNPs tagging 9 loci, including 21q21.1, NKX2-3, MST1, the HLA region, 1p36.13, IL10, JAK2, ORMDL3, and IL23R, to be associated with UC. Interestingly, the association of UC with previously identified variants in the HLA region was not the strongest association in our study (P = 4.34 × 10^-3, odds ratio [OR] = 1.25), which is in contrast to all previously published studies. No association with any disease subphenotype was found. SNP-SNP interaction analysis showed significant epistasis between SNPs in the PTPN22 (rs2476601) and C13orf31 (rs3764147) genes and increased risk for UC (P = 1.64 × 10^-6, OR = 2.44). The association has been confirmed in the Danish study group (P = 0.04, OR = 3.25). Conclusions: We confirmed the association of the 9 loci (21q21.1, 1p36.13, NKX2-3, MST1, the HLA region, IL10, JAK2, ORMDL3, and IL23R) with UC in the Lithuanian-Latvian population. SNP-SNP interaction analyses showed that the combination of SNPs in the PTPN22 (rs2476601) and C13orf31 (rs3764147) genes increases the risk for UC.
Response to Comment on "ApoE e4e4 Genotype and Mortality With COVID-19 in UK Biobank" by Kuo et al
This article is freely available via Open Access. C.L.K. and D.M. are supported by an R21 grant (R21AG060018) funded by the National Institute on Aging, National Institutes of Health, USA. D.M. is also supported by the University of Connecticut School of Medicine.
Local genetic variation of inflammatory bowel disease in Basque population and its effect in risk prediction
Inflammatory bowel disease (IBD) is characterised by chronic inflammation of the gastrointestinal tract. Although its aetiology remains unknown, environmental and genetic factors are involved in its development. Regarding genetics, more than 200 loci have been associated with IBD, but the transferability of those signals to the Basque population living in Northern Spain, a population with a distinctive genetic background, remains unknown. We have analysed 5,411,568 SNPs in 498 IBD cases and 935 controls from the Basque population. We found 33 suggestive loci (p 0.68. In conclusion, we report on the genetic architecture of IBD in the Basque population, and explore the performance of European-descent genetic risk scores in this population. Samples and data used in the present work were provided by the Basque Biobank (http://www.biobancovasco.org). We want to thank Miguel Angel Vesga from the Basque Centre of Transfusion and Human Tissues for providing access to control samples. This work was funded to MD by Gipuzkoako Foru Aldundia/Diputacion Foral de Gipuzkoa. The project that gave rise to these results rece
From Next-Generation Sequencing Alignments to Accurate Comparison and Validation of Single-Nucleotide Variants: The Pibase Software
Scientists working with single-nucleotide variants (SNVs), inferred by next-generation sequencing software, often need further information regarding true variants, artifacts and sequence coverage gaps. In clinical diagnostics, for example, SNVs must usually be validated by visual inspection or by several independent SNV callers. We here demonstrate that 0.5–60% of relevant SNVs might not be detected due to coverage gaps, or might be misidentified. Even low error rates can overwhelm the true biological signal, especially in clinical diagnostics, in research comparing healthy with affected cells, in archaeogenetic dating or in forensics. For these reasons, we have developed a package called pibase, which is applicable to diploid and haploid genome, exome or targeted enrichment data. pibase extracts details on nucleotides from alignment files at user-specified coordinates and identifies reproducible genotypes, if present. In test cases pibase identifies genotypes at 99.98% specificity, 10-fold better than other tools. pibase also provides pair-wise comparisons between healthy and affected cells using nucleotide signals (10-fold more accurately than a genotype-based approach, as we show in our case study of monozygotic twins). This comparison tool also solves the problem of detecting allelic imbalance within heterozygous SNVs in copy number variation loci, or in heterogeneous tumor sequences.
Improving efficiency in epistasis detection with a gene-based analysis using functional filters
Detailed stratified GWAS analysis for severe COVID-19 in four European populations
Publisher Copyright: © The Author(s) 2022. Given the highly variable clinical phenotype of Coronavirus disease 2019 (COVID-19), a deeper analysis of the host genetic contribution to severe COVID-19 is important to improve our understanding of underlying disease mechanisms. Here, we describe an extended genome-wide association meta-analysis of a well-characterized cohort of 3255 COVID-19 patients with respiratory failure and 12 488 population controls from Italy, Spain, Norway and Germany/Austria, including stratified analyses based on age, sex and disease severity, as well as targeted analyses of chromosome Y haplotypes, the human leukocyte antigen region and the SARS-CoV-2 peptidome. By inversion imputation, we traced a reported association at 17q21.31 to a ∼0.9-Mb inversion polymorphism that creates two highly differentiated haplotypes and characterized the potential effects of the inversion in detail. Our data, together with the 5th release of summary statistics from the COVID-19 Host Genetics Initiative including non-Caucasian individuals, also identified a new locus at 19q13.33, including NAPSA, a gene which is expressed primarily in alveolar cells responsible for gas exchange in the lung.
HLA-DPA1*02:01~B1*01:01 is a risk haplotype for primary sclerosing cholangitis mediating activation of NKp44+ NK cells
Objective: Primary sclerosing cholangitis (PSC) is characterised by bile duct strictures and progressive liver disease, eventually requiring liver transplantation. Although the pathogenesis of PSC remains incompletely understood, strong associations with HLA class II haplotypes have been described. As specific HLA-DP molecules can bind the activating NK-cell receptor NKp44, we investigated the role of HLA-DP/NKp44 interactions in PSC. Design: Liver tissue, intrahepatic and peripheral blood lymphocytes of individuals with PSC and control individuals were characterised using flow cytometry, immunohistochemical and immunofluorescence analyses. HLA-DPA1 and HLA-DPB1 imputation and association analyses were performed in 3408 individuals with PSC and 34 213 controls. NK cell activation on NKp44/HLA-DP interactions was assessed in vitro using plate-bound HLA-DP molecules and HLA-DPB wildtype versus knock-out human cholangiocyte organoids. Results: NKp44+ NK cells were enriched in livers, and intrahepatic bile ducts of individuals with PSC showed higher expression of HLA-DP. HLA-DP haplotype analysis revealed a highly elevated PSC risk for HLA-DPA1*02:01~B1*01:01 (OR 1.99, p = 6.7 × 10^-50). Primary NKp44+ NK cells exhibited significantly higher degranulation in response to plate-bound HLA-DPA1*02:01-DPB1*01:01 compared with control HLA-DP molecules, which was inhibited by anti-NKp44 blocking. Human cholangiocyte organoids expressing HLA-DPA1*02:01-DPB1*01:01 after IFN-γ exposure demonstrated significantly increased binding to NKp44-Fc constructs compared with unstimulated controls. Importantly, HLA-DPA1*02:01-DPB1*01:01-expressing organoids increased degranulation of NKp44+ NK cells compared with HLA-DPB1-KO organoids. Conclusion: Our studies identify a novel PSC risk haplotype, HLA-DPA1*02:01~DPB1*01:01, and provide clinical and functional data implicating NKp44+ NK cells that recognise HLA-DPA1*02:01-DPB1*01:01 expressed on cholangiocytes in PSC pathogenesis.