The directory ftp://ftp.genomics.org.cn/pub/influenza was created to distribute data stored in the Influenza Virus Database (IVDB).
The IVDB FTP site provides downloads of influenza virus (IV) nucleotide sequences, protein sequences, coding sequences (CDS) and the corresponding UTR sequences, drawn both from BIG and from other public resources.
This directory mainly contains the following files:
influenza_nucleotide_table.info --- Table of detailed information on nucleotide records, not including sequences
influenza_protein_table.info --- Table of detailed information on protein records, not including sequences
influenza_cds_table.info --- Table of detailed information on CDS, including the correlation between CDS and protein
influenza_3utr_table.info --- Table of detailed information on 3'UTRs, not including sequences
influenza_5utr_table.info --- Table of detailed information on 5'UTRs, not including sequences
influenza_nucleotide.fa --- nucleotide FASTA file
influenza_cds.fa --- CDS FASTA file
influenza_protein.fa --- protein FASTA file
influenza_3utr.fa --- 3'UTR FASTA file
influenza_5utr.fa --- 5'UTR FASTA file
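All five .fa files are FASTA, so one small reader covers them. The sketch below is a generic FASTA parser, not an IVDB tool. It assumes (this is not stated above) that the record ID is the first whitespace-delimited token after ">"; the function names and the toy records are hypothetical.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream.

    The header is returned without its leading '>', and the sequence
    lines of each record are concatenated with whitespace stripped.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def index_by_id(handle: TextIO) -> Dict[str, str]:
    """Map record ID -> sequence.

    Assumption: the ID is the first whitespace-delimited token of the header.
    """
    return {header.split()[0]: seq for header, seq in read_fasta(handle)}


# Toy records for illustration only (not real IVDB data).
demo = io.StringIO(">BIGIV00001 toy record\nACGT\nAC\n>TOY2\nMK\n")
records = index_by_id(demo)
# Real use, after downloading a file locally:
# with open("influenza_nucleotide.fa") as fh:
#     sequences = index_by_id(fh)
```

Because read_fasta is a generator, a large file is streamed record by record; only index_by_id holds everything in memory.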
The influenza_nucleotide_table.info, influenza_protein_table.info and influenza_cds_table.info files are tab-delimited tables.
The influenza_nucleotide_table.info file includes the following fields:
GenBank accession number, gi, strain, segment, type, subtype, host, organism, country, district, year, gender, age, sequence length, full_tag, class, definition
For BIG data, IDs are assigned in the form BIGIV00001. The field 'organism' is the specific host species: for strain A/parakeet/Narita/92A/98(H9N2), for example, 'host' is Avian and 'organism' is parakeet. The field 'district' is the specific province or state of the strain: for strain A/Chicken/Beijing/8/98(H9N2), 'country' is China and 'district' is Beijing. The field 'full_tag' indicates whether the nucleotide sequence is full-length. The field 'class' gives the specific sequence category assigned by our Q-Filter.
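A minimal sketch of loading influenza_nucleotide_table.info into one dictionary per row, assuming the file has no header line and its columns appear in exactly the order listed above. The short column keys (accession, length, and so on) are hypothetical names chosen here, and the toy row reuses the Beijing example with made-up values for the other fields.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO

# Hypothetical column keys, in the order the fields are listed above.
NUCLEOTIDE_FIELDS: List[str] = [
    "accession", "gi", "strain", "segment", "type", "subtype", "host",
    "organism", "country", "district", "year", "gender", "age",
    "length", "full_tag", "class", "definition",
]


def read_table(handle: TextIO, fields: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per tab-delimited row, keyed by the given field names.

    Assumes there is no header row and every row has exactly len(fields) columns.
    """
    # QUOTE_NONE keeps quote characters inside 'definition' as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(fields):
            raise ValueError(f"expected {len(fields)} columns, got {len(row)}")
        yield dict(zip(fields, row))


# Toy row (illustrative values only); empty strings stand for unknown fields.
toy_row = "\t".join([
    "BIGIV00001", "", "A/Chicken/Beijing/8/98(H9N2)", "", "A", "H9N2",
    "Avian", "Chicken", "China", "Beijing", "", "", "", "", "", "", "",
])
rows = list(read_table(io.StringIO(toy_row + "\n"), NUCLEOTIDE_FIELDS))
```

The column-count check fails loudly if the real layout differs from the assumed one, which is safer than silently mis-assigning fields.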
The influenza_protein_table.info file includes the following fields:
GenBank accession number, gi, strain, segment, type, subtype, host, organism, country, district, year, gender, age, sequence length, definition
For BIG data, IDs are assigned in the form BIGIVP0001; for ISD data, which have no IDs of their own, IDs are assigned in the form ISDIVP0001. All other fields follow the same conventions as the nucleotide information table.
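The ID conventions can be used to tell where a record came from. The sketch below is a hypothetical helper that assumes any ID without a BIGIV or ISDIVP prefix is an ordinary GenBank accession; the sample IDs are placeholders, not real records.

```python
from collections import Counter
from typing import Iterable


def record_source(record_id: str) -> str:
    """Classify an ID by the naming conventions described above.

    Assumption: any ID without a BIG or ISD prefix is a GenBank accession.
    """
    if record_id.startswith("BIGIV"):  # BIGIV00001 (nucleotide), BIGIVP0001 (protein)
        return "BIG"
    if record_id.startswith("ISDIVP"):  # ISD protein records without their own IDs
        return "ISD"
    return "GenBank"


def count_sources(ids: Iterable[str]) -> Counter:
    """Count how many IDs fall into each source category."""
    return Counter(record_source(i) for i in ids)


counts = count_sources(["BIGIVP0001", "ISDIVP0001", "BIGIV00001", "TOY000001"])
```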
The influenza_cds_table.info file has the following fields:
GenBank accession number for nucleotide, cds range, GenBank accession number for protein
This table links each nucleotide record to its CDS and the corresponding protein.
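Because each row of influenza_cds_table.info ties one nucleotide accession to one CDS range and one protein accession, the table can be turned into lookups in both directions. A minimal sketch, assuming three tab-separated columns in the listed order and no header line. The CDS range is kept as an opaque string, and the accessions in the toy input are hypothetical. A nucleotide maps to a list because a single segment can encode more than one protein.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def link_cds(
    handle: TextIO,
) -> Tuple[Dict[str, List[Tuple[str, str]]], Dict[str, Tuple[str, str]]]:
    """Build two lookups from rows of influenza_cds_table.info.

    Returns:
      by_nucleotide: nucleotide accession -> [(cds_range, protein accession), ...]
      by_protein:    protein accession -> (nucleotide accession, cds_range)
    The CDS range is kept as an opaque string.
    """
    by_nucleotide: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    by_protein: Dict[str, Tuple[str, str]] = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        nucleotide_acc, cds_range, protein_acc = row[:3]
        by_nucleotide[nucleotide_acc].append((cds_range, protein_acc))
        by_protein[protein_acc] = (nucleotide_acc, cds_range)
    return dict(by_nucleotide), by_protein


# Hypothetical accessions: one nucleotide record encoding two proteins.
toy = io.StringIO("NUC1\t1..100\tPROT1\nNUC1\t1..26\tPROT2\n")
by_nuc, by_prot = link_cds(toy)
```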
The influenza_3utr_table.info file includes the following fields:
GenBank accession number for nucleotide, corresponding CDS range, 3'UTR range
This table links each nucleotide record to its CDS and the corresponding 3'UTR.
The influenza_5utr_table.info file contains the same fields as influenza_3utr_table.info, but for the 5'UTR.
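With a nucleotide sequence loaded from influenza_nucleotide.fa, a UTR or CDS range from these tables can be used to cut out the matching subsequence. The range syntax is not specified above, so this sketch assumes GenBank-style "start..end" with 1-based inclusive positions. Ranges with partial markers such as "<" or ">" are rejected rather than guessed at. The helper names and the toy sequence are hypothetical.

```python
import re
from typing import Tuple

# Assumed range syntax: "start..end", 1-based and inclusive (GenBank feature style).
RANGE_RE = re.compile(r"^\s*(\d+)\s*\.\.\s*(\d+)\s*$")


def parse_range(text: str) -> Tuple[int, int]:
    """Parse 'start..end' into a (start, end) pair of 1-based positions."""
    match = RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised range: {text!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start < 1 or end < start:
        raise ValueError(f"invalid range: {text!r}")
    return start, end


def slice_range(sequence: str, text: str) -> str:
    """Return the part of `sequence` covered by a 1-based inclusive range."""
    start, end = parse_range(text)
    if end > len(sequence):
        raise ValueError(f"range {text!r} exceeds sequence length {len(sequence)}")
    return sequence[start - 1:end]  # convert to Python's 0-based, end-exclusive slice


# Toy nucleotide sequence: 5'UTR at 1..3, CDS at 4..9, 3'UTR at 10..12.
toy_seq = "AAACCCGGGTTT"
five_utr = slice_range(toy_seq, "1..3")
cds = slice_range(toy_seq, "4..9")
three_utr = slice_range(toy_seq, "10..12")
```

If the real tables turn out to use another convention (for example 0-based or end-exclusive ranges), only parse_range and the slice offset need to change.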
For questions about the files in this directory, please send an email to: firstname.lastname@example.org