About the sequence polymorphism analysis
As multiple sequence alignment is quite time-consuming, we pre-computed
alignments of the genes and proteins of the different strains of influenza A
virus for users' convenience in in-depth research. We grouped the sequences by
host, subtype and segment and performed multiple alignments between groups. The aligned
sequence groups have been manually corrected to remove redundant sequences.
Polymorphisms are presented graphically, as an SNP distribution plot and a
minor allele distribution plot, and as tabular statistics for each position
relative to the consensus sequence. Users can not only search sequence
polymorphisms by host, subtype, and segment but also have instant access to
pre-made alignments, phylogenetic trees, and geographical distributions on a
world map.
For Nucleotide Sequences
There are two ways to illustrate the sequence polymorphisms. One is the SNP distribution plot together with the minor allele distribution plot for each polymorphic site. The other is the consensus sequence with a list of A, T, C, G statistics for each position.
An SNP is defined as a position at which the nucleotide differs between sequences. The major allele at an SNP position is defined as the nucleotide that occurs most often at position i; the other nucleotides at this position are defined as minor alleles. For example, if there are 260 As, 5 Ts and 1 C at position i, the major allele is A and the minor alleles are T and C.
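The definitions above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the database: the function names allele_counts and classify_position are hypothetical, positions are 0-based, and skipping gaps and ambiguity codes is an assumption. If two nucleotides tie for the highest count, this sketch simply reports whichever was counted first.

```python
from collections import Counter


def allele_counts(aligned_seqs, i):
    """Count A/T/C/G at 0-based column i of equal-length aligned sequences."""
    column = (seq[i].upper() for seq in aligned_seqs)
    # Assumption: gaps ('-') and ambiguity codes are ignored, not counted.
    return Counter(base for base in column if base in "ATCG")


def classify_position(counts):
    """Return (is_snp, major_allele, minor_alleles) for one column."""
    ranked = counts.most_common()  # sorted by count, highest first
    if not ranked:
        return False, None, []
    major = ranked[0][0]
    minors = [base for base, _ in ranked[1:]]
    # A position is an SNP when more than one nucleotide is observed.
    return len(ranked) > 1, major, minors


# The example from the text: 260 As, 5 Ts and 1 C at position i.
example = Counter({"A": 260, "T": 5, "C": 1})
is_snp, major, minors = classify_position(example)
print(is_snp, major, minors)
```

For the example counts, the major allele is A and the minor alleles are T and C, matching the definition in the text.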
The SNP distribution plot:
A part of the minor allele distribution plot:
For the second method of illustrating sequence polymorphism, the consensus sequence is defined as the sequence of major alleles. If two nucleotides tie as the major allele at a position, the consensus nucleotide is defined to be N. A picture of the A, T, C, G distribution at every position is also shown.
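The consensus rule can be sketched as follows. Again, this is a minimal illustration under stated assumptions rather than the database's own implementation: the function name consensus is hypothetical, the input is assumed to be a list of equal-length aligned strings, gaps and ambiguity codes are ignored, and a column with no A, T, C or G at all is assigned N.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Build a consensus string from equal-length aligned sequences."""
    result = []
    for column in zip(*aligned_seqs):
        # Assumption: only A/T/C/G are counted; gaps are skipped.
        counts = Counter(b for b in (c.upper() for c in column) if b in "ATCG")
        top = counts.most_common(2)
        if not top:
            result.append("N")  # assumption: nothing countable in this column
        elif len(top) == 2 and top[0][1] == top[1][1]:
            result.append("N")  # tie for the major allele
        else:
            result.append(top[0][0])  # unique major allele
    return "".join(result)


seqs = ["ACGT", "ACGA", "ATGA", "ATCA"]
print(consensus(seqs))
```

In this toy alignment the second column holds two Cs and two Ts, so the tie yields N there, while the other columns take their single most frequent nucleotide.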