SimFlu
A Simulation Tool for Predicting the Evolutionary Patterns of Influenza A virus
What is SimFlu?
SimFlu stands for "a Simulation tool for influenza virus". It simulates sequences using the codon variation patterns of influenza A viruses over time. The program can be installed on Linux and Macintosh OS X as well as Windows XP or 7, and it automatically searches for the various types of input files in the user input folder of the working directory.
SimFlu is freely available from 'PRODUCT' on the SimFlu homepage. You may download the compressed SimFlu files for your operating system.
Calculation of the SimFlu library
The SimFlu program provides pre-calculated variation parameters of influenza A virus genes between two different year-of-isolation groups as a library. The source nucleotide sequences of these library files were collected from the Influenza Virus Resource (http://www.ncbi.nlm.nih.gov/genomes/FLU/) of the US National Center for Biotechnology Information (NCBI). In the current version of the SimFlu library (ver. 1.0), we collected the messenger RNAs (mRNAs) of 10 genes (HA, NA, NP, PA, PB1, PB2, M1, M2, NS1 and NS2) for 3 major influenza A virus subtypes (H1N1, H3N2 and H5N1). The target years of isolation were 2000 to 2011, and the target host species were human and swine for H1N1 and H3N2, and human and avian for H5N1.
- The first step in calculating the SimFlu library files is to perform multiple sequence alignments (MSAs) for all possible pairs of years of isolation between 2000 and 2011.
- When you run the SimFlu program, you have to choose the interval type of the target years, as well as the initial and final target years of isolation, in the 'Library Settings' step. There are 2 interval types, type A and type B. If you choose 'type A', the interval between the initial (time T) and final (time T') target years is always one year, so both years advance together. In 'type B', the initial year is fixed and only the final year increases by one year at a time. The next figure shows how the SimFlu libraries are calculated when you choose the 'type A' interval with a range between 2000 and 2011.
- Each pair of target years must be aligned together using an MSA program such as ClustalW, and the aligned output sequences are then divided into 2 files according to their year of isolation for further processing (AR2000, AR2001, ..., AR2011). Be sure to save the aligned sequences in FASTA format.
- In the second step, we counted all possible codon variations between 2 MSA result files that contain the 'gap' information. The comparison and counting process is described as follows.
- First, each sequence (Sequence #1, #2, ..., #n, where n is the total number of sequences in 'First year of isolation') in the MSA result file named 'First year of isolation' is compared with every sequence (Sequence #1, #2, ..., #p, where p is the total number of sequences in 'Second year of isolation') in the other MSA result file named 'Second year of isolation'. As a result, a total of n x p comparisons are conducted. Each codon at each position of an aligned sequence from the first year of isolation is compared with the codon at the same position in a sequence from the second year of isolation, and the counted variations are saved in a 61 x 61 matrix named the codon variation matrix (CVM) in Figure 2. In the final step, all CVMs are converted into codon transition matrices (CTMs) using the Markov model. The names of the calculated CTMs are encoded and then saved in the [lib] folder, packaged with their version number.
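The library calculation steps above can be sketched in Python as follows. This is an illustrative sketch, not SimFlu's actual code: the function names (`year_pairs`, `count_codon_variations`, `to_transition_matrix`) are hypothetical, and it assumes that codons containing gaps, ambiguous bases or stop codons are skipped, that unchanged codons are counted on the diagonal, and that the Markov-model conversion is a simple row normalisation of the counts.

```python
from itertools import product

BASES = "TCAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
# The 61 sense codons, in a fixed order used to index both matrices.
SENSE_CODONS = [a + b + c for a, b, c in product(BASES, repeat=3)
                if a + b + c not in STOP_CODONS]
INDEX = {codon: i for i, codon in enumerate(SENSE_CODONS)}


def year_pairs(first, last, interval_type):
    """Return (initial, final) target-year pairs for interval type 'A' or 'B'."""
    if interval_type == "A":  # both years advance: 2000-2001, 2001-2002, ...
        return [(y, y + 1) for y in range(first, last)]
    if interval_type == "B":  # initial year fixed: 2000-2001, 2000-2002, ...
        return [(first, y) for y in range(first + 1, last + 1)]
    raise ValueError("interval_type must be 'A' or 'B'")


def count_codon_variations(first_year_seqs, second_year_seqs):
    """Build a 61 x 61 codon variation matrix (CVM) from two aligned sets."""
    cvm = [[0] * 61 for _ in range(61)]
    # Every sequence of the first year is compared with every sequence
    # of the second year, giving n x p comparisons.
    for seq_a, seq_b in product(first_year_seqs, second_year_seqs):
        for pos in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
            i = INDEX.get(seq_a[pos:pos + 3].upper())
            j = INDEX.get(seq_b[pos:pos + 3].upper())
            # Assumption: codons with gaps, ambiguous bases or stops are skipped.
            if i is not None and j is not None:
                cvm[i][j] += 1
    return cvm


def to_transition_matrix(cvm):
    """Row-normalise a CVM so that each observed row sums to 1 (assumed CTM form)."""
    ctm = []
    for row in cvm:
        total = sum(row)
        ctm.append([c / total for c in row] if total else [0.0] * len(row))
    return ctm


# Toy aligned sequences, invented for illustration only.
cvm = count_codon_variations(["ATGAAAGCA", "ATGAAGGCA"],
                             ["ATGAAAGCA", "ATG---GCA"])
ctm = to_transition_matrix(cvm)
```

In this toy input, the gapped codon `---` contributes nothing to the counts, so only the comparisons against the ungapped second-year sequence inform the second codon position.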
Simulation algorithm
SimFlu is a simulation tool that creates hypothetical future nucleotide sequences from a real influenza A virus sequence (the seed sequence) using codon variation parameters, taken either from SimFlu's library or from user parameter files. The working process is described as follows.
- The simulation begins by importing the seed sequence in units of codons. Once all codons of the seed sequence have been read, SimFlu generates a random number between 0 and 1 for each codon position except the start and termination codons. It then converts each codon of the seed sequence to a new codon, selected by that random number according to the probabilities in the SimFlu library or user parameter file. It repeats the same process as many times as you request. During processing, SimFlu creates a temporary folder named [_tmp] in your working directory for its intensive intermediate work, and this folder is removed when SimFlu finishes the simulation.
- This figure is a screenshot of SimFlu job processing. The first line shows the starting time of the simulation, and each simulation process is represented as a bar graph as shown above. In this case, the user uses the library of the HA (hemagglutinin) gene of the H1N1 subtype isolated between 2000 and 2011. Because the initial and final target years increase together (2000-2001, 2001-2002, ...), the interval type of the target years is type 'A'. If you choose to use your own user parameter files instead of SimFlu's library, your parameter file names appear after 'User Para.:'. After finishing all simulation processes, SimFlu also reports the 'Ending Time' and 'Total Processing Time'.
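The codon replacement step of the simulation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SimFlu's implementation: the function name `simulate_sequence` is hypothetical, the transition matrix is assumed to hold one row of probabilities per codon, and the new codon is assumed to be chosen by walking the cumulative probabilities of that row until they exceed the random number. Codons without usable probabilities are left unchanged.

```python
import random


def simulate_sequence(seed_seq, ctm, codons, rng=random):
    """Return one simulated sequence derived codon by codon from seed_seq.

    ctm[i][j] is the assumed probability that codons[i] becomes codons[j].
    """
    index = {c: i for i, c in enumerate(codons)}
    triplets = [seed_seq[i:i + 3] for i in range(0, len(seed_seq) - 2, 3)]
    last = len(triplets) - 1
    out = []
    for pos, codon in enumerate(triplets):
        row = ctm[index[codon]] if codon in index else None
        # The start and termination codons are never changed; codons with
        # no usable transition probabilities are also kept (assumption).
        if pos == 0 or pos == last or row is None or sum(row) == 0:
            out.append(codon)
            continue
        r = rng.random()  # random number in [0, 1) for this position
        cumulative = 0.0
        new_codon = codon  # fallback if rounding leaves r uncovered
        for target, p in zip(codons, row):
            cumulative += p
            if r < cumulative:
                new_codon = target
                break
        out.append(new_codon)
    # Keep any trailing bases that do not form a complete codon.
    out.append(seed_seq[len(triplets) * 3:])
    return "".join(out)


# Toy two-codon example, invented for illustration: AAA always becomes AAG
# and AAG always becomes AAA, so the result does not depend on the random draw.
toy_codons = ["AAA", "AAG"]
toy_ctm = [[0.0, 1.0], [1.0, 0.0]]
simulated = simulate_sequence("ATGAAAAAGTAA", toy_ctm, toy_codons)
```

With a real 61 x 61 matrix, calling the function repeatedly on the same seed would give different outputs, since each call draws fresh random numbers.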