The decreasing costs and increasing speed and accuracy of DNA sample
collection, preparation, and sequencing have rapidly produced an enormous volume
of genetic data. However, fast and accurate analysis of the samples remains a
bottleneck. Here we present D4RAGenS, a genetic sequence identification
algorithm that exploits the Big Data handling and computational power of the
Dynamic Distributed Dimensional Data Model (D4M). The method subsamples the data
and leverages linear algebra and statistical properties to increase
computational performance while retaining accuracy. Two run modes, Fast and
Wise, offer tradeoffs between speed and precision, with applications in
biodefense and medical
diagnostics. The D4RAGenS analysis algorithm is tested on several
datasets, including three used for the Defense Threat Reduction Agency
(DTRA) metagenomic algorithm contest.
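
As a rough illustration of the idea of matching subsampled sequence data, the
following minimal Python sketch counts shared k-mers between sample reads and
reference sequences. It is not the D4RAGenS implementation and does not use the
D4M library: the function names (kmers, keep, build_index, match_counts), the
hash-based subsampling rule, and the `rate` parameter are all assumptions made
for illustration. The shared-k-mer count plays the role of one entry in a sparse
matrix product between a sample-by-k-mer and a reference-by-k-mer incidence
matrix.

```python
import zlib
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def keep(kmer, rate):
    """Hypothetical subsampling rule: keep roughly 1 in `rate` k-mers.

    A deterministic hash is used so that the same k-mer is kept (or dropped)
    in both the reference and the sample. rate=1 keeps everything.
    """
    return zlib.crc32(kmer.encode("ascii")) % rate == 0


def build_index(references, k, rate=1):
    """Map each retained k-mer to the set of reference names containing it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq, k):
            if keep(km, rate):
                index[km].add(name)
    return index


def match_counts(reads, index, k, rate=1):
    """Count retained k-mers shared between the reads and each reference.

    This is the sparse inner product of binary k-mer incidence rows,
    accumulated over all reads.
    """
    counts = defaultdict(int)
    for read in reads:
        for km in kmers(read, k):
            if keep(km, rate):
                for name in index.get(km, ()):
                    counts[name] += 1
    return dict(counts)
```

With rate=1 the sketch compares every k-mer; a larger rate trades sensitivity
for less work, which is only loosely analogous to the speed and precision
tradeoff described above. A reference with a high shared count is a candidate
match for the sample.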