We pick up where we left off in the previous lecture on BLAST and dive further into its particulars. BLAST, like other local alignment methods, is based on the ‘Seed’, ‘Extend’, and ‘Report’ paradigm. Since the 1990s, these ideas have been accepted as a notably fast way for scientists to find similarities between a given sequence and all other, already decoded, biological sequences.

Step 1: Seed – First, we construct a dictionary of all the words in either the query or the database of protein sequences. In this context, a word is a short subsequence consisting of a few consecutive letters. We then search for word matches between the query and the database, which in practice takes linear time.

Step 2: Extend – Second, we use a quick local alignment to extend to the left and right of each word match and find the best local alignment score. These extension methods are usually simple, and variants of them differ in speed and sensitivity. A simple extension approach (Figure 1) adds letters to the left and right of a word match without gaps. The extension stops when enough mismatches have lowered the current score to a certain constant ‘C’ below the best alignment score found so far.
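The seed and extend steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not BLAST's actual implementation: the function names, the word length `w`, and the +1/-1 match/mismatch scores are hypothetical choices made for the example, and seeds are restricted to exact word matches, which is a simplification. The dictionary is built over the database words, and the ungapped extension stops once the running score has fallen `c` below the best score seen so far.

```python
from collections import defaultdict


def build_word_index(database, w):
    """Map each length-w word to every (sequence id, offset) where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(database):
        for i in range(len(seq) - w + 1):
            index[seq[i:i + w]].append((seq_id, i))
    return index


def find_seeds(query, index, w):
    """Yield (query offset, sequence id, database offset) for exact word matches."""
    for q in range(len(query) - w + 1):
        for seq_id, d in index.get(query[q:q + w], ()):
            yield q, seq_id, d


def extend_one_side(query, target, q, d, step, c, match, mismatch):
    """Ungapped extension in one direction (step is +1 or -1).

    Returns (best score gained, number of letters kept).
    """
    score = best = 0
    best_len = n = 0
    while 0 <= q < len(query) and 0 <= d < len(target):
        score += match if query[q] == target[d] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score >= c:
            break  # score has dropped c below the best seen so far
        q += step
        d += step
    return best, best_len


def extend_seed(query, target, q, d, w, c=5, match=1, mismatch=-1):
    """Extend a seed at query[q:q+w] / target[d:d+w] left and right without gaps.

    Returns (score, query start, query end), with the end exclusive.
    """
    seed_score = w * match  # assumed scoring: every seed letter is a match
    right, r_len = extend_one_side(query, target, q + w, d + w, +1, c, match, mismatch)
    left, l_len = extend_one_side(query, target, q - 1, d - 1, -1, c, match, mismatch)
    return seed_score + left + right, q - l_len, q + w + r_len


if __name__ == "__main__":
    database = ["CCCGTTACGGG"]
    query = "ACGTTACGA"
    w = 3
    index = build_word_index(database, w)
    for q, seq_id, d in find_seeds(query, index, w):
        print(q, seq_id, d, extend_seed(query, database[seq_id], q, d, w))
```

Building the index once costs time proportional to the database size, after which each query word is a single dictionary lookup; this is one way to see why the seed step runs in roughly linear time in practice. A smaller `c` stops extensions sooner (faster, less sensitive), while a larger `c` tolerates longer runs of mismatches.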