45 research outputs found

    Accelerating epistasis analysis in human genetics with consumer graphics hardware

    BACKGROUND: Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. FINDINGS: We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. 
Furthermore, this GPU system provides extremely cost-effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000, while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500. CONCLUSION: Graphics hardware based computing provides a cost-effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.

    Pan-cancer analysis of whole genomes

    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale(1-3). Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter(4); identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation(5,6); analyses timings and patterns of tumour evolution(7); describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity(8,9); and evaluates a range of more-specialized features of cancer genomes(8,10-18).

    Novel Approach Identifies SNPs in SLC2A10 and KCNK9 with Evidence for Parent-of-Origin Effect on Body Mass Index

    Marja-Liisa Lokki is a member of the working groups Generation Scotland Consortium, LifeLines Cohort Study and GIANT Consortium.

    Nanocomposites: synthesis, structure, properties and new application opportunities

    Genomic reconstruction of the SARS-CoV-2 epidemic in England.

    The evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus leads to new variants that warrant timely epidemiological characterization. Here we use the dense genomic surveillance data generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 71 different lineages in each of 315 English local authorities between September 2020 and June 2021. This analysis reveals a series of subepidemics that peaked in early autumn 2020, followed by a jump in transmissibility of the B.1.1.7/Alpha lineage. The Alpha variant grew when other lineages declined during the second national lockdown and regionally tiered restrictions between November and December 2020. A third more stringent national lockdown suppressed the Alpha variant and eliminated nearly all other lineages in early 2021. Yet a series of variants (most of which contained the spike E484K mutation) defied these trends and persisted at moderately increasing proportions. However, by accounting for sustained introductions, we found that the transmissibility of these variants is unlikely to have exceeded the transmissibility of the Alpha variant. Finally, B.1.617.2/Delta was repeatedly introduced in England and grew rapidly in early summer 2021, constituting approximately 98% of sampled SARS-CoV-2 genomes on 26 June 2021.

    Mechanistic evidence for a front-side, SNi-type reaction in a retaining glycosyltransferase

    A previously determined crystal structure of the ternary complex of trehalose-6-phosphate synthase identified a putative transition state–like arrangement based on validoxylamine A 6′-O-phosphate and uridine diphosphate in the active site. Here linear free energy relationships confirm that these inhibitors are synergistic transition state mimics, supporting front-face nucleophilic attack involving hydrogen bonding between leaving group and nucleophile. Kinetic isotope effects indicate a highly dissociative oxocarbenium ion–like transition state. Leaving group 18O effects identified isotopically sensitive bond cleavages and support the existence of a hydrogen bond between the nucleophile and departing group. Brønsted analysis of nucleophiles and Taft analysis highlight participation of the nucleophile in the transition state, also consistent with a front-face mechanism. Together, these comprehensive, quantitative data substantiate this unusual enzymatic reaction mechanism. Its discovery should prompt useful reassessment of many biocatalysts and their substrates and inhibitors.

    A novel electronic data collection system for large-scale surveys of neglected tropical diseases

    BACKGROUND: Large cross-sectional household surveys are common for measuring indicators of neglected tropical disease control programs. As an alternative to standard paper-based data collection, we utilized novel paperless technology to collect data electronically from over 12,000 households in Ethiopia. METHODOLOGY: We conducted a needs assessment to design an Android-based electronic data collection and management system. We then evaluated the system by reporting results of a pilot trial and from comparisons of two large-scale surveys, one with traditional paper questionnaires and the other with tablet computers, including accuracy, person-time days, and costs incurred. PRINCIPAL FINDINGS: The electronic data collection system met core functions in household surveys and overcame constraints identified in the needs assessment. Pilot data recorders took 264 sec (standard deviation (SD) 152 sec) and 260 sec (SD 122 sec) per person registered to complete household surveys using paper and tablets, respectively (P = 0.77). Data recorders felt a lack of connection with the interviewee during the first days using electronic devices, but preferred to collect data electronically in future surveys. Electronic data collection saved time by giving results immediately, obviating the need for double data entry and cross-correcting. The proportion of identified data entry errors in disease classification did not differ between the two data collection methods. Geographic coordinates collected using the tablets were more accurate than coordinates transcribed on a paper form. The cost of the equipment required for electronic data collection was approximately the same as the cost incurred for data entry of questionnaires, whereas repeated use of the electronic equipment may increase cost savings. CONCLUSIONS/SIGNIFICANCE: Conducting a needs assessment and pilot testing allowed the design to specifically match the functionality required for surveys.
Electronic data collection using an Android-based technology was suitable for a large-scale health survey, saved time, provided more accurate geo-coordinates, and was preferred by recorders over standard paper-based questionnaires