Advances in high-throughput genomics have revealed much about
the human genome, highlighting the importance of variation between
individuals and its contribution to disease. Although numerous
software tools have been developed to make sense of large genomics
datasets, a major shortcoming of these tools has been their inability to
cope with repetitive regions, in particular to validate structural
variants and thereby assess their role in disease. Here we describe our program
STEAK, a massively parallel software tool designed to detect chimeric reads
in high-throughput sequencing data for a broad range of applications,
such as genotyping the presence or absence of transposable elements
(TEs), discovering TE insertions, and detecting retroviral integrations. We highlight
the capabilities of STEAK by comparing its performance in locating HERV-K
HML-2 in clinical whole-genome projects, target enrichment sequencing data,
and the 1000 Genomes CEU Trio with that of other TE- and
virus-detection tools. We show that STEAK outperforms other software in
terms of computational efficiency, sensitivity, and specificity. We
demonstrate that STEAK is a robust tool that allows analysts to
flexibly detect and evaluate TE and retroviral integrations in a diverse
range of sequencing projects for both research and clinical purposes.
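To make the notion of a chimeric read concrete, the following sketch flags reads whose leading bases match the 3′ terminus of an element and whose remaining bases would be genomic flank. It is a minimal illustration under stated assumptions, not STEAK's algorithm: the functions `find_junction` and `classify_read`, the default thresholds, and the use of simple mismatch counting in place of a proper local alignment are all hypothetical choices made for this example.

```python
from typing import Optional, Tuple

# Complement table for reverse-complementing reads (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_junction(read: str, te_end: str, min_overlap: int = 10,
                  max_mismatches: int = 1, min_flank: int = 20) -> Optional[str]:
    """Hypothetical check: does the read start inside the element's 3' end
    and continue into non-element sequence?

    Tries the longest possible overlap first. If read[:k] matches the last
    k bases of te_end with at most max_mismatches differences, the rest of
    the read (at least min_flank bases) is returned as candidate flank.
    Mismatch counting stands in for a real local aligner here.
    """
    max_k = min(len(te_end), len(read) - min_flank)
    for k in range(max_k, min_overlap - 1, -1):
        if hamming(read[:k], te_end[-k:]) <= max_mismatches:
            return read[k:]
    return None


def classify_read(read: str, te_end: str, **kwargs) -> Optional[Tuple[str, str]]:
    """Check both strands; return (strand, flank) for a chimeric read, else None."""
    for strand, seq in (("+", read), ("-", revcomp(read))):
        flank = find_junction(seq, te_end, **kwargs)
        if flank is not None:
            return strand, flank
    return None


if __name__ == "__main__":
    # Made-up sequences, purely for illustration (not a real HERV-K terminus).
    te_end = "ACGTACGTTTGGCCAAGGTT"
    flank = "GATTACA" * 4
    read = te_end[-12:] + flank
    print(classify_read(read, te_end))
```

In a fuller pipeline, the returned flank would be aligned to the reference genome to place the integration site, and reads crossing the element's 5′ end would need the mirror-image check on the read's trailing bases.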