BLAST: steps of the alignment (similarity search between sequences)

Let's consider the query sequence: ARGHLTFYIFQLM
First step: this query sequence is cut into overlapping words of fixed size W (default: W = 3 for proteins). For example: TFY
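
As an illustration of this first step, here is a minimal Python sketch that lists every word of size W in the query. The function name split_into_words is an assumption made for this example, not part of BLAST itself.

```python
def split_into_words(query, w=3):
    """Return every overlapping word of length w with its start offset."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]


words = split_into_words("ARGHLTFYIFQLM")
print(len(words))  # 11 words of length 3 for a 13-residue query
print(words[5])    # (5, 'TFY')
```

A query of length n gives n - W + 1 words, which is why the 13-residue query above yields 11 words.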

For each of these words, a list of similar words is built using a substitution matrix, for example the PAM 250 substitution matrix (see below).

This matrix gives a score for the substitution of one amino acid by another.

For example, comparing the word TFY with the neighbour word TDY gives:

T changed into T gives the score (T,T) = 3

F changed into D gives the score (F,D) = -6

Y changed into Y gives the score (Y,Y) = 10

sum of scores = 3 + (-6) + 10 = 7

If the sum of scores is higher than a fixed threshold value, the word is retained in the list.

Finally, the list of similar words is:

TDY
TWY
THY
TAY
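
The scoring and filtering described above can be sketched as follows. This is only an illustration: the scores are copied from the PAM 250 matrix given below, but the threshold value of 6 and the extra candidate WDY are assumptions chosen for the example, since the text does not state the threshold actually used.

```python
# Only the PAM 250 entries needed for this example (see the matrix below).
PAM250_SUBSET = {
    ("T", "T"): 3, ("Y", "Y"): 10, ("T", "W"): -5,
    ("F", "D"): -6, ("F", "W"): 0, ("F", "H"): -2, ("F", "A"): -4,
}


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in PAM250_SUBSET:
        return PAM250_SUBSET[(a, b)]
    return PAM250_SUBSET[(b, a)]


def word_score(word, candidate):
    """Sum the substitution scores position by position."""
    return sum(pair_score(a, b) for a, b in zip(word, candidate))


def similar_words(word, candidates, threshold):
    """Keep the candidates whose score is higher than the threshold."""
    return [c for c in candidates if word_score(word, c) > threshold]


THRESHOLD = 6  # hypothetical value, not given in the text
candidates = ["TDY", "TWY", "THY", "TAY", "WDY"]  # WDY added as a counter-example

for c in candidates:
    print(c, word_score("TFY", c))  # TDY 7, TWY 13, THY 11, TAY 9, WDY -1

print(similar_words("TFY", candidates, THRESHOLD))
# ['TDY', 'TWY', 'THY', 'TAY']
```

With this assumed threshold, the four words of the list are retained and WDY (score -1) is rejected.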

Second step

Each word from the list of similar words is searched for in all sequences of the database; each occurrence found is a hit.

A hit is likewise defined by a score that must be higher than a fixed threshold value.
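
A minimal sketch of this second step is given here. It assumes that the score threshold has already been applied when the list of similar words was built, so a hit is simply an exact occurrence of one of these words. The database content and the name find_hits are invented for the example.

```python
def find_hits(words, database, w=3):
    """Return (sequence name, position, word) for every occurrence of a listed word."""
    word_set = set(words)
    hits = []
    for name, seq in database.items():
        for pos in range(len(seq) - w + 1):
            window = seq[pos:pos + w]
            if window in word_set:
                hits.append((name, pos, window))
    return hits


# Hypothetical toy database: real databases hold many sequences.
database = {"seq1": "MKTWYLLA", "seq2": "GGTAYPTDY", "seq3": "PPPPPP"}

hits = find_hits(["TDY", "TWY", "THY", "TAY"], database)
print(hits)
# [('seq1', 2, 'TWY'), ('seq2', 2, 'TAY'), ('seq2', 6, 'TDY')]
```

Scanning every window of every sequence is only suitable for a toy example; it is shown here to make the notion of a hit concrete, not to reflect how a real search is indexed.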

 

Third step: the similarity is extended from the common word, in both directions along the two sequences.

The extension stops when either:

  • the cumulative score drops by a fixed amount "X" below the maximum value reached so far
  • the cumulative score falls to 0
  • the end of one of the two sequences is reached

Finally, the best fragment found is called a High-scoring Segment Pair (HSP).
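
The three stopping rules above can be sketched as an ungapped extension. This is a simplified illustration under stated assumptions: the scoring function (+5 for identical residues, -4 otherwise) is a stand-in for the PAM 250 matrix, and the subject sequence and the value X = 8 are invented for the example.

```python
def extend_one_way(query, subject, qi, si, step, start_score, score, x_drop):
    """Walk in one direction; return (best score, number of residues kept)."""
    running = best = start_score
    length = best_len = 0
    # Rule 3: stop when the end of either sequence is reached.
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        running += score(query[qi], subject[si])
        length += 1
        if running > best:
            best, best_len = running, length
        # Rule 2: score fell to 0. Rule 1: dropped by X below the maximum.
        if running <= 0 or best - running >= x_drop:
            break
        qi += step
        si += step
    return best, best_len


def extend_hit(query, subject, q_pos, s_pos, w, score, x_drop):
    """Extend a word hit to the right, then to the left, without gaps."""
    seed = sum(score(query[q_pos + k], subject[s_pos + k]) for k in range(w))
    best, right = extend_one_way(query, subject, q_pos + w, s_pos + w, 1,
                                 seed, score, x_drop)
    best, left = extend_one_way(query, subject, q_pos - 1, s_pos - 1, -1,
                                best, score, x_drop)
    span = left + w + right
    return (best,
            query[q_pos - left:q_pos - left + span],
            subject[s_pos - left:s_pos - left + span])


def simple_score(a, b):
    """Stand-in scoring (assumption): +5 for identity, -4 otherwise."""
    return 5 if a == b else -4


query = "ARGHLTFYIFQLM"
subject = "PPGHLTFYIFKKK"  # hypothetical database sequence
print(extend_hit(query, subject, 5, 5, 3, simple_score, 8))
# (40, 'GHLTFYIF', 'GHLTFYIF')
```

Starting from the word TFY (score 15), the extension gains two residues on the right and three on the left before the score drops by X = 8 below its maximum on each side, giving a fragment of 8 residues with a score of 40.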

 

 

PAM 250 substitution matrix

A low value in the matrix indicates that the amino acid considered is unlikely to be replaced by the other (e.g., tryptophan W replaced by cysteine C: -8) and therefore that this part of the sequence is weakly homologous.

Conversely, a high value indicates a region with high homology.

    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
A   2
R  -2   6
N   0   0   2
D   0  -1   2   4
C  -2  -4  -4  -5   4
Q   0   1   1   2  -5   4
E   0  -1   1   3  -5   2   4
G   1  -3   0   1  -3  -1   0   5
H  -1   2   2   1  -3   3   1  -2   6
I  -1  -2  -2  -2  -2  -2  -2  -3  -2   5
L  -2  -3  -3  -4  -6  -2  -3  -4  -2   2   6
K  -1   3   1   0  -5   1   0  -2   0  -2  -3   5
M  -1   0  -2  -3  -5  -1  -2  -3  -2   2   4   0   6
F  -4  -4  -4  -6  -4  -5  -5  -5  -2   1   2  -5   0   9
P   1   0  -1  -1  -3   0  -1  -1   0  -2  -3  -1  -2  -5   6
S   1   0   1   0   0  -1   0   1  -1  -1  -3   0  -2  -3   1   3
T   1  -1   0   0  -2  -1   0   0  -1   0  -2   0  -1  -2   0   1   3
W  -6   2  -4  -7  -8  -5  -7  -7  -3  -5  -2  -3  -4   0  -6  -2  -5  17
Y  -3  -4  -2  -4   0  -4  -4  -5   0  -1  -1  -4  -2   7  -5  -3  -3   0  10
V   0  -2  -2  -2  -2  -2  -2  -1  -2   4   2  -2   2  -1  -1  -1   0  -6   2   4