Search CORE

27 research outputs found

Genetic Sequence Matching Using D4M Big Data Approaches

Author: Dodson Stephanie
Kepner Jeremy
Ricke Darrell O.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 31/07/2014

Recent technological advances in Next Generation Sequencing tools have increased the speed of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new opportunities to efficiently handle the increasing workload. We propose a new method of fast genetic sequence analysis using the Dynamic Distributed Dimensional Data Model (D4M), an associative array environment for MATLAB developed at MIT Lincoln Laboratory. Based on mathematical and statistical properties, the method leverages big data techniques and an Apache Accumulo database to accelerate computations one-hundred-fold over other methods. Comparisons of the D4M method with the current gold standard for sequence analysis, BLAST, show that the two find comparable alignments. This paper presents an overview of the D4M genetic sequence algorithm and statistical comparisons with BLAST. Comment: 6 pages; to appear in IEEE High Performance Extreme Computing (HPEC) 201
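The abstract does not spell out the matching step. One common way to express k-mer matching in associative-array (sparse matrix) terms is to build a sequence-by-k-mer matrix for the samples and another for the references, then multiply one by the transpose of the other, so that each output entry counts the k-mers two sequences share. The Python sketch below illustrates that idea with plain dictionaries; it is not the D4M API or Accumulo, and the function names, the choice of k, and the toy sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def to_assoc(seqs: dict[str, str], k: int) -> dict[str, dict[str, int]]:
    """Sparse 'associative array': row key = sequence ID, column key = k-mer, value = 1."""
    return {name: {km: 1 for km in kmers(s, k)} for name, s in seqs.items()}


def multiply_transpose(a, b):
    """C = A * B^T on sparse dict-of-dicts: C[r1][r2] sums A[r1][col] * B[r2][col] over shared columns."""
    # Index B by column so each k-mer in A can find the B rows that contain it.
    rows_by_col = defaultdict(list)
    for row, cols in b.items():
        for col, val in cols.items():
            rows_by_col[col].append((row, val))
    c = defaultdict(dict)
    for r1, cols in a.items():
        for col, v1 in cols.items():
            for r2, v2 in rows_by_col.get(col, ()):
                c[r1][r2] = c[r1].get(r2, 0) + v1 * v2
    return dict(c)


# Hypothetical toy data: one read, two reference sequences, k = 4.
samples = {"read1": "ACGTACGTGG"}
refs = {"refA": "TTACGTACGTGGAA", "refB": "GGGGCCCC"}
hits = multiply_transpose(to_assoc(samples, 4), to_assoc(refs, 4))
print(hits)  # {'read1': {'refA': 6}}
```

Only nonzero entries are stored, so references that share no k-mers with a read (here `refB`) never appear in the result, which is what keeps the product cheap on sparse data.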

arXiv.org e-Print Archive

Crossref

Rapid Sequence Identification of Potential Pathogens Using Techniques from Sparse Linear Algebra

Author: Chiu Nelson
Dodson Stephanie
Kepner Jeremy
Ricke Darrell O.
Shcherbina Anna
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/01/2015

The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing have rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D4RAGenS, a genetic sequence identification algorithm that exhibits the Big Data handling and computational power of the Dynamic Distributed Dimensional Data Model (D4M). The method leverages linear algebra and statistical properties to increase computational performance while retaining accuracy by subsampling the data. Two run modes, Fast and Wise, offer trade-offs between speed and precision, with applications in biodefense and medical diagnostics. The D4RAGenS analysis algorithm is tested on several datasets, including three used for the Defense Threat Reduction Agency (DTRA) metagenomic algorithm contest.
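One simple way to trade accuracy for speed by subsampling, in the spirit of the abstract, is to test only a random fraction of each read's k-mers against a reference k-mer set. The sketch below assumes uniform random subsampling controlled by a hypothetical `rate` parameter; the abstract does not say how D4RAGenS subsamples or how the Fast and Wise modes differ, so treat this as an illustration only.

```python
import random


def kmers(seq, k):
    """All length-k substrings of seq, in order (duplicates kept)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def estimate_match_fraction(reads, reference_kmers, k, rate, seed=0):
    """Fraction of sampled read k-mers that occur in reference_kmers.

    rate is a hypothetical knob: the probability of keeping each k-mer.
    Lower rates test fewer k-mers (faster) but give a noisier estimate.
    """
    rng = random.Random(seed)
    sampled = [km for read in reads for km in kmers(read, k) if rng.random() < rate]
    if not sampled:
        return 0.0
    return sum(km in reference_kmers for km in sampled) / len(sampled)


reference = set(kmers("TTACGTACGTGGAA", 4))
reads = ["ACGTACGTGG", "GGGGCCCC"]
print(estimate_match_fraction(reads, reference, k=4, rate=1.0))  # 7 of 12 k-mers match, about 0.583
print(estimate_match_fraction(reads, reference, k=4, rate=0.3))  # estimate from a random ~30% subsample
```

With `rate=1.0` every k-mer is tested and the result is exact; smaller rates scale the work down roughly in proportion while the fraction stays an unbiased estimate.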

arXiv.org e-Print Archive

Crossref

A Linear Algebra Approach to Fast DNA Mixture Analysis Using GPUs

Author: Helfer Brian
Kepner Jeremy
Reuther Albert
Ricke Darrell O.
Samsi Siddharth
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 03/07/2017

Analysis of DNA samples is an important step in forensics, and the speed of analysis can impact investigations. Comparison of DNA sequences is based on the analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5 base pairs. Current forensics approaches use 20 STR loci for analysis. Single nucleotide polymorphisms (SNPs) are useful for the analysis of complex DNA mixtures. However, using tens of thousands of SNP loci poses significant computational challenges, because the cost of forensic analysis scales with the product of the number of loci and the number of DNA samples to be analyzed. In this paper, we discuss the implementation of a DNA sequence comparison algorithm by recasting it in terms of linear algebra primitives. By developing an overloaded matrix multiplication approach to DNA comparisons, we can leverage advances in GPU hardware and algorithms for double-precision general matrix multiplication (DGEMM) to speed up DNA sample comparisons. We show that it is possible to compare 2048 unknown DNA samples with 20 million known samples in under 6 seconds using an NVIDIA K80 GPU. Comment: Accepted for publication at the 2017 IEEE High Performance Extreme Computing conference
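An "overloaded" matrix product keeps the loop structure of ordinary matrix multiplication but swaps in user-chosen combine and reduce operations. The sketch below, assuming SNP genotypes encoded as minor-allele counts (0, 1, 2) and a score equal to the number of matching loci, compares unknown samples (rows) against known profiles (columns). The encoding, the scoring rule, and the function names are illustrative assumptions, not the paper's GPU implementation.

```python
import operator
from functools import reduce


def overloaded_matmul(A, B, add, mul, zero):
    """Matrix product with pluggable operations: C[i][j] = add-reduce over k of mul(A[i][k], B[k][j])."""
    inner, n_cols = len(B), len(B[0])
    return [
        [reduce(add, (mul(row[k], B[k][j]) for k in range(inner)), zero) for j in range(n_cols)]
        for row in A
    ]


def genotype_match(x, y):
    """'Multiply' step (assumed scoring): 1 if two SNP genotypes agree, else 0."""
    return int(x == y)


# Rows = samples, columns = SNP loci; genotypes coded as minor-allele counts 0/1/2 (assumption).
unknown = [[0, 1, 2, 1]]
known = [[0, 1, 2, 1], [2, 1, 0, 1], [0, 0, 2, 2]]
known_t = [list(col) for col in zip(*known)]  # transpose: loci x known samples

scores = overloaded_matmul(unknown, known_t, operator.add, genotype_match, 0)
print(scores)  # [[4, 2, 2]]
```

To hand the same computation to a stock DGEMM routine, one option is to one-hot encode each genotype into three indicator columns; an ordinary dot product of two such rows then equals the number of loci with identical genotypes.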

arXiv.org e-Print Archive

Crossref

Performance of Bootstrap Embedding for long-range interactions and 2D systems

Author: Ricke Nathan Darrell
Van Voorhis Troy
Welborn Matthew Gregory
Ye Hongzhou
Publication venue: Informa UK Limited
Publication date: 01/11/2016

Fragment embedding approaches offer the possibility of an accurate description of strongly correlated systems at low-scaling computational expense. In particular, wave function embedding approaches have demonstrated the ability to subdivide systems across highly entangled regions, promising wide applicability to a number of challenging systems. In this paper, we focus on the wave function embedding method Bootstrap Embedding, extending it to the Pariser–Parr–Pople and 2D Hubbard models in order to evaluate the behaviour of the method in systems that are less amenable to local fragment embedding. We find that Bootstrap Embedding remains accurate for these systems, and we investigate how fragment size, shape, and choice of matching conditions affect the results. We also evaluate the properties of Bootstrap Embedding that lead to its favourable convergence. Keywords: Embedding; correlation; Bootstrap; DMET. National Science Foundation (U.S.) (Grant CHE-1464804)

DSpace@MIT

Construction of an ~700-kb transcript map around the Familial Mediterranean Fever locus on human chromosome 16p13.3

Author: Adams
Aksentijevich
Andrea Cercek
Anil Vedula
Antequera
Buckland
Calabro
Chen
Dahl
Daniel L. Kastner
Darrell O. Ricke
David F. Callen
David Krizman
Deborah Gumucio
Elizabeth Mansfield
Francis S. Collins
Geryl Wood
Huebner
Ivona Aksentijevich
Jingmei Liu
Kulp
Lancet
Melanie Hamon
Michael Centola
Nathan Fischel-Ghodsian
Neil Richards
Neta Shafran
Norman A. Doggett
Nurit Zaks
P. Paul Liu
Pras
Puder
Raman Sood
Robert I. Richards
Robert K. Moyzis
Sinoula Apostolou
Tanaz Kahan
Trevor Blake
Xiang Chen
Xiaoguang Chen
Yasuda
Yokoyama
Zuoming Deng
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/01/1998

We used a combination of cDNA selection, exon amplification, and computational prediction from genomic sequence to isolate transcribed sequences from genomic DNA surrounding the familial Mediterranean fever (FMF) locus. Eighty-seven kb of genomic DNA around D16S3370, a marker showing a high degree of linkage disequilibrium with FMF, was sequenced to completion, and the sequence was annotated. A transcript map reflecting the minimal number of genes encoded within the ∼700 kb of genomic DNA surrounding the FMF locus was assembled. This map consists of 27 genes with discrete messages detectable on Northern blots, in addition to three olfactory-receptor genes, a cluster of 18 tRNA genes, and two putative transcriptional units that have typical intron–exon splice junctions yet do not detect messages on Northern blots. Four of the transcripts are identical to previously described genes, seven have been independently identified by the French FMF Consortium, and the others are novel. Six related zinc-finger genes, a cluster of tRNAs, and three olfactory receptors account for the majority of transcribed sequences isolated from a 315-kb FMF central region (between D16S468/D16S3070 and cosmid 377A12). Interspersed among them are several genes that may be important in inflammation. This transcript map has not only permitted the identification of the FMF gene (MEFV), but has also provided an opportunity to probe the structural and functional features of this region of chromosome 16.

Crossref

Adelaide Research & Scholarship

Defined Mixtures Set 1

Author: Ricke Darrell
Publication venue: Harvard Dataverse

Defined mixtures of 2 to 5 contributors, built from Estonian profiles for 3K and 39K SNP panels.

Harvard Dataverse Network

Fast P(RMNE) Data

Author: Ricke Darrell
Publication venue: Harvard Dataverse

Data associated with the Fast P(RMNE) article on rapid, high-precision random man not excluded (RMNE) calculation.
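For a DNA mixture, P(RMNE) is commonly computed as the combined probability of inclusion: at each locus a random person is included if both of their alleles are among the alleles observed in the mixture, which under Hardy-Weinberg assumptions gives (sum of those allele frequencies)^2, and independent loci multiply. The sketch below uses this textbook formula, which is not necessarily the article's "Fast P(RMNE)" method, and works in log space because multiplying thousands of per-SNP probabilities can underflow double precision. The data layout and function name are hypothetical.

```python
import math


def log10_p_rmne(mixture_alleles, allele_freqs):
    """Return log10 P(RMNE) = sum over loci of 2 * log10(sum of included allele frequencies).

    mixture_alleles: {locus: set of alleles observed in the mixture}   (hypothetical layout)
    allele_freqs:    {locus: {allele: population frequency}}
    Assumes Hardy-Weinberg proportions and independent loci.
    """
    total = 0.0
    for locus, observed in mixture_alleles.items():
        p_included = sum(allele_freqs[locus][a] for a in observed)
        total += 2 * math.log10(p_included)
    return total


freqs = {
    "snp1": {"A": 0.5, "B": 0.5},
    "snp2": {"A": 0.5, "B": 0.5},
    "snp3": {"A": 0.9, "B": 0.1},
}
mixture = {"snp1": {"A"}, "snp2": {"A", "B"}, "snp3": {"B"}}
print(10 ** log10_p_rmne(mixture, freqs))  # approximately 0.0025 (0.25 * 1.0 * 0.01)
```

A biallelic SNP where the mixture shows both alleles contributes a factor of 1, so only loci where the mixture is missing an allele reduce P(RMNE).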

Harvard Dataverse Network

11 Million SNP Profiles datasets

Author: Ricke Darrell
Publication venue: Harvard Dataverse

High throughput sequencing (HTS) of single nucleotide polymorphisms (SNPs) provides additional applications for DNA forensics, including identification, mixture analysis, kinship prediction, and biogeographic ancestry prediction. Public repositories of human genetic data are being rapidly generated and released, but the majority of these samples are de-identified to protect privacy and have little or no individual metadata such as appearance (photos), ethnicity, or relatives. A reference in silico dataset has been generated to enable the development and testing of new DNA forensics algorithms. This dataset provides 11 million SNP profiles for individuals with defined ethnicities and family relationships spanning eight generations, with admixture, for a panel of 39,108 SNPs.
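Simulated family relationships like those described rest on Mendelian transmission. The sketch below, assuming biallelic SNPs coded as minor-allele counts (0, 1, 2), Hardy-Weinberg founders, and unlinked loci, generates founder and child profiles. The dataset's actual generation procedure (including how ethnicities and admixture were modeled) is not described here, so the function names and parameters are hypothetical.

```python
import random


def founder_profile(minor_freqs, rng):
    """Draw two independent alleles per SNP (Hardy-Weinberg); value = minor-allele count."""
    return [int(rng.random() < p) + int(rng.random() < p) for p in minor_freqs]


def transmit(genotype, rng):
    """Allele a parent passes on: 1 if it is the minor allele, else 0; random if heterozygous."""
    if genotype == 1:
        return rng.randint(0, 1)
    return genotype // 2  # homozygous: 0 -> 0, 2 -> 1


def child_profile(mother, father, rng):
    """Child receives one allele from each parent at every (assumed unlinked) SNP."""
    return [transmit(m, rng) + transmit(f, rng) for m, f in zip(mother, father)]


rng = random.Random(42)
minor_freqs = [0.1, 0.3, 0.5, 0.7]  # hypothetical minor-allele frequencies
mother = founder_profile(minor_freqs, rng)
father = founder_profile(minor_freqs, rng)
child = child_profile(mother, father, rng)
print(mother, father, child)
```

Repeating `child_profile` down a pedigree yields multi-generation families; modeling linkage or admixture would need additional machinery beyond this sketch.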

Harvard Dataverse Network

Two Different Antibody-Dependent Enhancement (ADE) Risks for SARS-CoV-2 Antibodies

Author: Ricke Darrell O.
Publication venue: Frontiers Media SA
Publication date: 01/12/2020

COVID-19 (SARS-CoV-2) disease severity and stages vary, ranging from asymptomatic and mild flu-like symptoms to moderate, severe, critical, and chronic disease. COVID-19 disease progression includes lymphopenia, elevated proinflammatory cytokines and chemokines, accumulation of macrophages and neutrophils in the lungs, immune dysregulation, cytokine storms, acute respiratory distress syndrome (ARDS), etc. Vaccines against severe acute respiratory syndrome (SARS), Middle East respiratory syndrome coronavirus (MERS-CoV), and other coronaviruses have been difficult to develop due to vaccine-induced enhanced disease responses in animal models. Multiple betacoronaviruses, including SARS-CoV-2 and SARS-CoV-1, expand their cellular tropism by infecting some phagocytic cells (immature macrophages and dendritic cells) via Fc receptor uptake of antibody-bound virus. Antibody-dependent enhancement (ADE) may be involved in the clinical observation that early high levels of SARS-CoV-2 antibodies are associated with increased symptom severity in patients. Infants with multisystem inflammatory syndrome in children (MIS-C) associated with COVID-19 may also have ADE caused by maternally acquired SARS-CoV-2 antibodies bound to mast cells. ADE risks associated with SARS-CoV-2 have implications for COVID-19 and MIS-C treatments, B-cell vaccines, SARS-CoV-2 antibody therapy, and convalescent plasma therapy. SARS-CoV-2 antibodies bound to mast cells may be involved in MIS-C and multisystem inflammatory syndrome in adults (MIS-A) following initial COVID-19 infection. SARS-CoV-2 antibodies bound to Fc receptors on macrophages and mast cells may represent two different mechanisms for ADE in patients. These two different ADE risks have possible implications for SARS-CoV-2 B-cell vaccines for population subsets defined by age, cross-reactive antibodies, variability in antibody levels over time, and pregnancy.
These models place increased emphasis on the importance of developing safe SARS-CoV-2 T cell vaccines that are not dependent upon antibodies.

DSpace@MIT

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)