BLAST: steps for alignment - similarity search between sequences

Let's consider the query sequence: ARGHLTFYIFQLM

First step: the query sequence is cut into overlapping words of fixed size W (default: W = 3 for proteins), for example TFY. For each of these words, a list of similar words is created using a substitution matrix, for example the PAM 250 substitution matrix (see below). This matrix gives a score for the substitution of one amino acid by another.
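
The first step can be sketched in a few lines of Python. This is only an illustration, not the actual BLAST implementation: the function names and the cutoff of 18 are assumptions made for the example, and only the three PAM 250 columns needed for the word TFY are copied from the matrix shown in this document.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# PAM 250 scores of T, F and Y against each residue of AMINO_ACIDS,
# copied from the matrix shown in this document (same residue order).
PAM250_COLUMNS = {
    "T": [1, -1, 0, 0, -2, -1, 0, 0, -1, 0, -2, 0, -1, -2, 0, 1, 3, -5, -3, 0],
    "F": [-4, -4, -4, -6, -4, -5, -5, -5, -2, 1, 2, -5, 0, 9, -5, -3, -2, 0, 7, -1],
    "Y": [-3, -4, -2, -4, 0, -4, -4, -5, 0, -1, -1, -4, -2, 7, -5, -3, -3, 0, 10, 2],
}


def split_into_words(query, w=3):
    """Cut the query into all overlapping words of length w."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def neighborhood(word, threshold):
    """List (similar word, score) pairs scoring at least `threshold` against `word`.

    `threshold` is a hypothetical cutoff chosen for this example.
    """
    similar = []
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        score = sum(
            PAM250_COLUMNS[q][AMINO_ACIDS.index(c)]
            for q, c in zip(word, candidate)
        )
        if score >= threshold:
            similar.append(("".join(candidate), score))
    # Best-scoring words first.
    return sorted(similar, key=lambda pair: -pair[1])


words = split_into_words("ARGHLTFYIFQLM")
print(words[5])                      # TFY
print(neighborhood("TFY", 18)[0])    # ('TFY', 22)
```

The word itself always heads the list here, because each of the three columns reaches its maximum on the identical residue (3 + 9 + 10 = 22 for TFY).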

Second step: each word from the list of similar words is searched against all sequences of the database. A match (a hit) is kept only if its score is higher than a fixed threshold value.
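
A minimal sketch of this search is given below. It assumes that the score threshold has already been applied when the list of similar words was built, so a hit is simply an exact occurrence of one of those words in a database sequence. The toy database, the word list and the function names are all hypothetical.

```python
from collections import defaultdict


def index_database(sequences, w=3):
    """Map every word of length w to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for offset in range(len(seq) - w + 1):
            index[seq[offset:offset + w]].append((seq_id, offset))
    return index


def find_hits(similar_words, index):
    """Return (word, sequence id, offset) for each similar word found in the database."""
    return [
        (word, seq_id, offset)
        for word in similar_words
        for seq_id, offset in index.get(word, [])
    ]


# Hypothetical toy database and a hand-picked list of words similar to TFY.
database = {"seq1": "MKTYYIFQ", "seq2": "GGSFYLA", "seq3": "PPPPPP"}
similar_words = ["TFY", "TYY", "SFY"]

hits = find_hits(similar_words, index_database(database))
print(hits)  # [('TYY', 'seq1', 2), ('SFY', 'seq2', 2)]
```

Indexing the database once and looking words up in a dictionary avoids rescanning every sequence for every word, which is the main reason word-based searches are fast.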

Third step: the similarity is extended starting from the common word, in both directions along the matching sequence. The extension ends when either :
Finally, the longest fragment found is called a High-scoring Segment Pair (HSP).
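
The extension can be illustrated with the sketch below. The stopping conditions are not listed in the text above, so this example assumes an X-drop style rule: extension in one direction stops once the running score falls more than `x_drop` below the best score seen so far, or when the end of either sequence is reached. The `toy_score` function is a placeholder for a substitution-matrix lookup, and the subject sequence, the parameter values and the function names are hypothetical.

```python
def toy_score(a, b):
    """Placeholder for a substitution-matrix lookup: +5 for identity, -4 otherwise."""
    return 5 if a == b else -4


def extend_one_way(query, subject, q_pos, s_pos, step, x_drop, score=toy_score):
    """Extend in one direction; return (best score gain, number of residues kept)."""
    total = best = 0
    length = best_len = 0
    while 0 <= q_pos < len(query) and 0 <= s_pos < len(subject):
        total += score(query[q_pos], subject[s_pos])
        length += 1
        if total > best:
            best, best_len = total, length
        elif best - total > x_drop:
            break  # assumed stopping rule: score dropped too far below the best
        q_pos += step
        s_pos += step
    return best, best_len


def extend_hit(query, subject, q_start, s_start, w=3, x_drop=8, score=toy_score):
    """Extend a word hit in both directions.

    Returns (score, query segment, subject segment) for the resulting HSP.
    """
    seed = sum(score(query[q_start + i], subject[s_start + i]) for i in range(w))
    right, r_len = extend_one_way(
        query, subject, q_start + w, s_start + w, +1, x_drop, score)
    left, l_len = extend_one_way(
        query, subject, q_start - 1, s_start - 1, -1, x_drop, score)
    q_seg = query[q_start - l_len:q_start + w + r_len]
    s_seg = subject[s_start - l_len:s_start + w + r_len]
    return seed + left + right, q_seg, s_seg


# The word TFY starts at offset 5 in both sequences.
print(extend_hit("ARGHLTFYIFQLM", "WWGHLTFYIFQKK", 5, 5))
# (45, 'GHLTFYIFQ', 'GHLTFYIFQ')
```

Only the residues up to the best-scoring position are kept on each side, so the mismatching ends of the two sequences are trimmed from the reported segment.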

PAM 250 substitution matrix

A low value in the matrix indicates that the amino acid considered is unlikely to be replaced by the other (e.g., tryptophan W replaced by cysteine C: -8), and therefore that this part of the alignment is weakly homologous. Conversely, a high value indicates a region with high homology.

| | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 2 | | | | | | | | | | | | | | | | | | | |
| R | -2 | 6 | | | | | | | | | | | | | | | | | | |
| N | 0 | 0 | 2 | | | | | | | | | | | | | | | | | |
| D | 0 | -1 | 2 | 4 | | | | | | | | | | | | | | | | |
| C | -2 | -4 | -4 | -5 | 4 | | | | | | | | | | | | | | | |
| Q | 0 | 1 | 1 | 2 | -5 | 4 | | | | | | | | | | | | | | |
| E | 0 | -1 | 1 | 3 | -5 | 2 | 4 | | | | | | | | | | | | | |
| G | 1 | -3 | 0 | 1 | -3 | -1 | 0 | 5 | | | | | | | | | | | | |
| H | -1 | 2 | 2 | 1 | -3 | 3 | 1 | -2 | 6 | | | | | | | | | | | |
| I | -1 | -2 | -2 | -2 | -2 | -2 | -2 | -3 | -2 | 5 | | | | | | | | | | |
| L | -2 | -3 | -3 | -4 | -6 | -2 | -3 | -4 | -2 | 2 | 6 | | | | | | | | | |
| K | -1 | 3 | 1 | 0 | -5 | 1 | 0 | -2 | 0 | -2 | -3 | 5 | | | | | | | | |
| M | -1 | 0 | -2 | -3 | -5 | -1 | -2 | -3 | -2 | 2 | 4 | 0 | 6 | | | | | | | |
| F | -4 | -4 | -4 | -6 | -4 | -5 | -5 | -5 | -2 | 1 | 2 | -5 | 0 | 9 | | | | | | |
| P | 1 | 0 | -1 | -1 | -3 | 0 | -1 | -1 | 0 | -2 | -3 | -1 | -2 | -5 | 6 | | | | | |
| S | 1 | 0 | 1 | 0 | 0 | -1 | 0 | 1 | -1 | -1 | -3 | 0 | -2 | -3 | 1 | 3 | | | | |
| T | 1 | -1 | 0 | 0 | -2 | -1 | 0 | 0 | -1 | 0 | -2 | 0 | -1 | -2 | 0 | 1 | 3 | | | |
| W | -6 | 2 | -4 | -7 | -8 | -5 | -7 | -7 | -3 | -5 | -2 | -3 | -4 | 0 | -6 | -2 | -5 | 17 | | |
| Y | -3 | -4 | -2 | -4 | 0 | -4 | -4 | -5 | 0 | -1 | -1 | -4 | -2 | 7 | -5 | -3 | -3 | 0 | 10 | |
| V | 0 | -2 | -2 | -2 | -2 | -2 | -2 | -1 | -2 | 4 | 2 | -2 | 2 | -1 | -1 | -1 | 0 | -6 | 2 | 4 |
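
Because the matrix is symmetric, only its lower triangle is given above. The sketch below shows one possible way to load that triangle and look up a score in either order; the values are copied from the matrix in this document, while the function names and the dictionary layout are choices made for this example only.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Lower triangle of the matrix shown above, one row per amino acid.
LOWER_TRIANGLE = """
2
-2 6
0 0 2
0 -1 2 4
-2 -4 -4 -5 4
0 1 1 2 -5 4
0 -1 1 3 -5 2 4
1 -3 0 1 -3 -1 0 5
-1 2 2 1 -3 3 1 -2 6
-1 -2 -2 -2 -2 -2 -2 -3 -2 5
-2 -3 -3 -4 -6 -2 -3 -4 -2 2 6
-1 3 1 0 -5 1 0 -2 0 -2 -3 5
-1 0 -2 -3 -5 -1 -2 -3 -2 2 4 0 6
-4 -4 -4 -6 -4 -5 -5 -5 -2 1 2 -5 0 9
1 0 -1 -1 -3 0 -1 -1 0 -2 -3 -1 -2 -5 6
1 0 1 0 0 -1 0 1 -1 -1 -3 0 -2 -3 1 3
1 -1 0 0 -2 -1 0 0 -1 0 -2 0 -1 -2 0 1 3
-6 2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3 -4 0 -6 -2 -5 17
-3 -4 -2 -4 0 -4 -4 -5 0 -1 -1 -4 -2 7 -5 -3 -3 0 10
0 -2 -2 -2 -2 -2 -2 -1 -2 4 2 -2 2 -1 -1 -1 0 -6 2 4
"""


def build_matrix(triangle_text, alphabet=AMINO_ACIDS):
    """Expand a lower-triangular listing into a symmetric {(a, b): score} dictionary."""
    matrix = {}
    rows = [line.split() for line in triangle_text.strip().splitlines()]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            a, b = alphabet[i], alphabet[j]
            matrix[a, b] = matrix[b, a] = int(value)
    return matrix


PAM250 = build_matrix(LOWER_TRIANGLE)


def ungapped_score(seq_a, seq_b, matrix=PAM250):
    """Sum the substitution scores of two equal-length segments, position by position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must have the same length")
    return sum(matrix[a, b] for a, b in zip(seq_a, seq_b))


print(PAM250["W", "C"])              # -8
print(ungapped_score("TFY", "TYY"))  # 20
```

Storing both (a, b) and (b, a) doubles the memory used but makes every lookup independent of the order of the two residues.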