Hadooping the genome: The impact of big data tools on biology

Abstract

This essay examines the consequences of the so-called ‘big data’ technologies in biomedicine. Analyzing algorithms and data structures used by biologists can provide insight into how biologists perceive and understand their objects of study. As such, I examine some of the most widely used algorithms in genomics: those used for sequence comparison or sequence mapping. These algorithms are derived from the powerful tools for text searching and indexing that have been developed since the 1950s and now play an important role in online search. In biology, sequence comparison algorithms have been used to assemble genomes, process next-generation sequence data, and, most recently, for ‘precision medicine.’ I argue that the predominance of a specific set of text-matching and pattern-finding tools has influenced problem choice in genomics. It allowed genomics to continue to think of genomes as textual objects and to increasingly lock genomics into ‘big data’-driven text-searching methods. Many ‘big data’ methods are designed for finding patterns in human-written texts. However, genomes and other’ omic data are not human-written and are unlikely to be meaningful in the same way

    Similar works