Development of Global Consensus Sequence and Analysis of Highly Conserved Domains of the HCV NS5B Protein

Background: Hepatitis C virus (HCV) is a positive-strand RNA virus whose genome encodes a polyprotein that is processed into 10 different proteins. The HCV NS5B gene encodes the RNA-dependent RNA polymerase (RdRp), which is responsible for viral replication and is a potential target for the development of antiviral agents. HCV has a high mutation rate and is classified into six major genotypes. Objectives: The aim of this study was to derive a representative consensus sequence for each HCV genotype, align the six consensus sequences to derive a global consensus sequence, and study the highly conserved residues. Materials and Methods: A total of 236 HCV NS5B sequences belonging to all six genotypes, reported from around the world, were aligned and a phylogenetic tree was constructed. Results: The active site residues D220, D225, D318 and D319, which bind the divalent cations, are highly conserved among all HCV genotypes. The other catalytic pocket residues, R158, S367, R386, T390 and R394, which interact with the triphosphate of NTPs, are also highly conserved, except T390, which is replaced by valine in genotype 5. The motif B residues G283, T286, T287 and N291, which take part in sugar selection by the RdRp, are also highly conserved, except T286, which is replaced by proline in genotypes 3 and 6. The residues E18, Y191, C274, Y276 and H502, which take part in primer/template interaction, are also highly conserved, except H502, which is replaced by serine in genotype 2. In contrast, a 12-amino-acid beta hairpin loop that interacts with double-stranded RNA was highly variable among the six consensus sequences. Nine peptides were selected from highly conserved regions of the HCV NS5B protein as candidates for a peptide vaccine. The HCV NS5B phylogenetic tree shows clustering by genotype and the evolutionary relationships among the genotypes. Conclusions: Despite the high mutation rate of HCV, the residues of the catalytic pocket and those involved in sugar selection and template/primer interaction are highly conserved.
These residues are potential target sites for the development of antiviral agents or peptide vaccines. The phylogenetic analysis suggests that the other HCV genotypes evolved from genotype 1a.


Implication for health policy/practice/research/medical education:
HCV NS5B is an attractive target for the development of antiviral agents. Different HCV genotypes respond differently to non-nucleoside inhibitors. We developed an HCV NS5B consensus sequence that may aid the future screening of antiviral agents: a drug that is active against the consensus sequence would have a high chance of being active against all HCV genotypes.

Background
Hepatitis C virus (HCV), a member of the Flaviviridae family, was discovered in 1989 as the causative agent of non-A, non-B hepatitis. About 200 million people, roughly 3.3% of the world's population, are living with HCV (1). Most patients with persistent HCV infection develop chronic hepatitis, which can progress to fibrosis and even liver cancer (2,3). HCV isolates have been classified into genotypes based on a nucleotide sequence similarity of at least 67%. HCV genotype is strongly associated with both the response to interferon treatment and the degree of clinical progression of chronic HCV infection (4). HCV has six major genotypes, and their distribution depends on geographic area and transmission route (5). The HCV genome is about 9.6 kb long and contains a single open reading frame encoding a polyprotein of about 3000 amino acids, flanked by 5' and 3' untranslated regions. The 5' untranslated region (5' NTR) is 341 nucleotides long and acts as an internal ribosome entry site. The HCV polyprotein is cleaved co- and post-translationally into 10 different proteins. The structural proteins result from cleavage of the N-terminal portion of the polyprotein, and two viral proteases mediate the downstream cleavages that produce the nonstructural proteins. NS3 acts as a protease, and NS5B is the HCV RNA-dependent RNA polymerase (RdRp) (6). The HCV NS5B polymerase contains the classic fingers, palm and thumb subdomains of a polymerase. The fingers subdomain interacts with the incoming nucleoside triphosphate and with the template base to which it pairs, the palm subdomain is the catalytic center for the nucleotidyl transfer reaction, and the thumb subdomain helps position the RNA for initiation and elongation (7). NS5B is therefore an attractive target for antiviral strategies. Currently no vaccine is available for HCV. Different vaccination strategies and concepts have been explored over the last decade, in studies on rodents, chimpanzees and humans.
The first HCV vaccination approach tested in humans was a peptide-based vaccine. In clinical settings, HCV vaccination follows two concepts: preventive vaccines, given to healthy people to protect them from infection, and therapeutic vaccines, given to treat patients who are already infected. Preventive vaccination against HCV has been used to induce immune responses in healthy people, including the generation of antigen-specific T cells (8).

Objectives
The aim of the present study was to derive a global consensus sequence of the HCV NS5B protein, study its highly conserved residues and construct a phylogenetic tree.

Peptide Design and Phylogenetic Analysis
The consensus sequence of each of the six HCV genotypes was generated in CLC Workbench software, and these six consensus sequences were then aligned in CLC Workbench to obtain the global consensus sequence. The consensus sequences were used to study variation in the different motifs and domains of HCV NS5B. Short peptides from the highly conserved regions of the HCV NS5B protein were selected on the basis of this consensus sequence analysis; these peptides are candidates to be tested as potential peptide vaccines. To construct a phylogenetic tree of HCV NS5B sequences from the different genotypes, we used 236 sequences: 60 from genotype 1, 52 from genotype 2, 16 from genotype 3, 41 from genotype 4, 17 from genotype 5 and 50 from genotype 6. All 236 sequences were first aligned in CLC Workbench, and the alignment was then used to construct a phylogenetic tree with the UPGMA method.
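The two-step procedure above (one consensus per genotype, then a consensus of the six genotype consensuses) can be illustrated with a simple majority rule, in which variable columns are marked "x" as in Figure 1. This is a minimal sketch, assuming that a column keeps its most frequent residue only when that residue occurs in more than a `threshold` fraction of the sequences; the `consensus` helper and the 0.5 threshold are hypothetical, and the consensus settings used in CLC Workbench may differ.

```python
from collections import Counter


def consensus(aligned, threshold=0.5, variable_symbol="x"):
    """Majority-rule consensus of equal-length aligned sequences.

    A column keeps its most frequent symbol only when that symbol occurs in
    more than `threshold` of the sequences; otherwise the column is marked
    with `variable_symbol`. The 0.5 default is an assumption.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must already be aligned to equal length")
    columns = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        columns.append(symbol if count / len(column) > threshold else variable_symbol)
    return "".join(columns)


# Hypothetical toy alignments, not real NS5B data.
genotype_a = consensus(["GDDL", "GDDL", "GNDL"])   # 'GDDL'
genotype_b = consensus(["GDDI", "GDDV", "GDDL"])   # 'GDDx'
# Step two: a consensus of the per-genotype consensus sequences.
global_consensus = consensus([genotype_a, genotype_b])  # 'GDDx'
```

With this rule, a residue that is fixed within each genotype but differs between genotypes becomes "x" in the global consensus, which is how a column such as T286 (proline in genotypes 3 and 6) would show up as variable.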

Results
We derived the HCV NS5B consensus sequence of each HCV genotype. All the consensus sequences were aligned to identify the residues that are highly conserved among all genotypes. Figure 1 shows the alignment of the consensus sequences of the six HCV genotypes, with the global consensus sequence at the bottom. Conserved residues are shown by their one-letter codes, while highly variable positions are denoted by the symbol "x". This alignment of the consensus sequences allows the highly conserved residues of the HCV NS5B protein to be identified. Short peptides of 9 to 18 amino acids were designed from the highly conserved regions of the HCV NS5B consensus protein sequences; the sequence and position of these peptides are shown in Table 1. These highly conserved positions are candidate targets for peptide vaccines or site-specific inhibitors. A phylogenetic tree of 236 HCV NS5B sequences belonging to all six genotypes, reported from around the world, was constructed using the UPGMA method in CLC Workbench software (Figure 2). Bootstrap analysis was performed with the default value of 100 replicates, and the bootstrap values are shown at each branch. Sequences cluster according to genotype. The tree suggests that the other HCV genotypes evolved from genotype 1a.
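Table 1 lists peptides of 9 to 18 amino acids taken from conserved stretches of the global consensus. Beyond that length range the selection rule is not spelled out, so the sketch below assumes one: every maximal run containing no "x" (variable) or "-" (gap) symbol is treated as conserved, runs longer than 18 residues are tiled into consecutive 18-residue pieces, and pieces shorter than 9 are dropped. The `conserved_peptides` helper and this tiling rule are hypothetical, not the procedure used to build Table 1.

```python
import re


def conserved_peptides(global_consensus, min_len=9, max_len=18, variable="x", gap="-"):
    """Return (start, end, peptide) tuples, 1-based and inclusive.

    A conserved run is a maximal stretch with no variable or gap symbol.
    Runs longer than max_len are tiled into consecutive max_len pieces;
    any piece shorter than min_len is discarded.
    """
    pattern = "[^" + re.escape(variable + gap) + "]+"
    peptides = []
    for match in re.finditer(pattern, global_consensus):
        run = match.group()
        for offset in range(0, len(run), max_len):
            piece = run[offset:offset + max_len]
            if len(piece) >= min_len:
                start = match.start() + offset + 1  # convert to 1-based
                peptides.append((start, start + len(piece) - 1, piece))
    return peptides


# Hypothetical global consensus: a 10-residue run, a variable column,
# a 5-residue run, a gap, and a 20-residue run.
toy = "A" * 10 + "x" + "C" * 5 + "-" + "D" * 20
for start, end, peptide in conserved_peptides(toy):
    print(start, end, peptide)
# Prints runs (1, 10) and (18, 35); the 5-residue run and 2-residue tail are dropped.
```

Tiling without overlap is only one choice; overlapping windows would give more candidate peptides per conserved stretch at the cost of redundancy.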

Discussion
The HCV NS5B protein contains palm, fingers and thumb subdomains. The palm subdomain contains five motifs, A to E, which play a major role in the polymerization activity of HCV polymerase. Motif A spans amino acids 212 to 234 and includes the D220-X4-D225 region, which forms the catalytic pocket. D220 and D225 bind the magnesium ions, and mutation of D220 to glycine or cysteine completely abolishes NS5B function (9-11). Consensus sequence analysis shows that this region is highly conserved among all HCV genotypes. Motif B contains G283, T286, T287 and N291 and takes part in sugar selection by the RdRp (10). The consensus sequence alignment shows that G283, T287 and N291 are highly conserved among all genotypes, whereas T286 is replaced by proline in genotypes 3 and 6. Mutations at G283 and T287 have been reported to completely abolish HCV NS5B function (9,10). Motif C contains the GDD motif, which the consensus sequence alignment shows to be highly conserved among all HCV genotypes. The first aspartate of this motif binds the second divalent cation, and mutations in the motif are not tolerated, abolishing RdRp function (10). Motif D spans amino acids 326 to 347 and forms the palm core structure. Consensus sequence analysis shows that R345 is highly conserved among all HCV genotypes; mutation of this arginine to lysine has been reported to increase RdRp activity to 152% of that of wild-type NS5B (9,10). Motif E spans hydrophobic amino acids 360 to 376 and mediates the interaction between the palm and thumb subdomains. Consensus sequence analysis shows that this motif is highly conserved among all HCV genotypes. The catalytic pocket residues D220, D225, D318 and D319, which bind the divalent cations, are also highly conserved.
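The residue-by-residue comparisons in this discussion (for example, whether D220 or T286 is preserved in each genotype consensus) amount to reading fixed positions in the aligned consensus sequences. The sketch below is a minimal illustration, assuming that alignment columns map one-to-one onto NS5B residue numbers, which holds only if the consensus alignment introduces no gaps relative to the numbering reference. The `residue_report` helper and the short toy sequences are hypothetical, not real data.

```python
def residue_report(genotype_consensus, residues):
    """Report, for each residue label such as 'D220', the genotypes that differ.

    Assumes each label is one amino-acid letter followed by a 1-based position,
    and that alignment columns map one-to-one onto NS5B residue numbers.
    """
    report = {}
    for label in residues:
        expected, position = label[0], int(label[1:])
        report[label] = {
            genotype: seq[position - 1]
            for genotype, seq in genotype_consensus.items()
            if seq[position - 1] != expected
        }
    return report


# Hypothetical 5-residue toy consensus sequences keyed by genotype.
toy = {"1": "RGDDT", "3": "RPDDT", "5": "RGDDV"}
print(residue_report(toy, ["G2", "D3", "T5"]))
# {'G2': {'3': 'P'}, 'D3': {}, 'T5': {'5': 'V'}}
```

An empty inner dictionary means the residue is conserved in every genotype consensus; a non-empty one lists each substitution, in the same form as "T390 is replaced by valine in genotype 5".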
The other catalytic pocket residues, R158, S367, R386, T390 and R394, which interact with the triphosphate of NTPs (11), are highly conserved in all the HCV NS5B consensus sequences except T390, which is replaced by valine in genotype 5. NS5B also contains a 12-amino-acid beta hairpin loop that protrudes toward the enzyme active site; this loop sterically hinders the binding of double-stranded RNA (11,12). The consensus sequence analysis shows that this loop is highly variable among the HCV genotypes. E18, Y191, C274, Y276 and H502 have been reported to be important for template/primer interaction (11). Consensus sequence analysis shows that these residues are highly conserved among all HCV genotypes except H502, which is replaced by serine in genotype 2. In addition, R48, R158, D225, S367, R386 and R394 interact with the initiating GTP (13), and consensus sequence analysis shows that these residues are highly conserved among all six HCV genotypes. In this study we constructed a phylogenetic tree of 236 HCV NS5B sequences reported from different countries, using the UPGMA method (Figure 2). The tree places genotype 1a at the root, with genotype 1b as the first genotype to diverge from it. The genotype 1b lineage then splits into two branches: genotypes 3, 4 and 6 arise from one, and genotypes 2 and 5 from the other. Sarwer et al. constructed a phylogenetic tree of 346 HCV NS4a sequences reported from around the world (14) and concluded that the HCV genotypes evolved from genotype 1b, whereas our study suggests evolution from genotype 1a. The difference may be due to lower variation in HCV NS4a sequences compared with HCV NS5B sequences. Our previous phylogenetic analysis of HCV core sequences suggested that Pakistani core sequences are evolutionarily associated with sequences reported from Japan (15). Such differences between the phylogenies of different HCV genes may reflect differences in their sequence variability and mutation patterns.
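UPGMA, the method used for the 236-sequence tree, repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the two previous distances. A minimal sketch follows, assuming uncorrected p-distances (the proportion of differing sites, skipping gap columns) as input; CLC Workbench may use a different distance measure, and `p_distance` and `upgma` are hypothetical helpers. Bootstrapping, reported at the branches of Figure 2, would repeat this construction on alignments resampled by column and is omitted here.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected distance: share of differing sites, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, seqs):
    """Build a rooted UPGMA tree and return it as a Newick string."""
    if len(names) != len(seqs) or not names:
        raise ValueError("need one name per sequence and at least one sequence")
    # cluster id -> (newick text, number of leaves, height of the cluster's node)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances keyed by (smaller id, larger id).
    dist = {(i, j): p_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), d = min(dist.items(), key=lambda item: item[1])
        newick_i, size_i, height_i = clusters.pop(i)
        newick_j, size_j, height_j = clusters.pop(j)
        height = d / 2  # ultrametric: the new node sits at half the distance
        merged = f"({newick_i}:{height - height_i:.4f},{newick_j}:{height - height_j:.4f})"
        # UPGMA update: size-weighted average of the two old distances.
        updated = {
            k: (size_i * dist[(min(i, k), max(i, k))]
                + size_j * dist[(min(j, k), max(j, k))]) / (size_i + size_j)
            for k in clusters
        }
        dist = {key: value for key, value in dist.items()
                if i not in key and j not in key}
        for k, value in updated.items():
            dist[(k, next_id)] = value  # next_id is larger than every k
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"


# Hypothetical toy sequences, not real NS5B data.
names = ["A", "B", "C", "D"]
seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAACCCC", "CCCCCCCCCC"]
print(upgma(names, seqs))
# (D:0.4167,(C:0.1750,(A:0.0500,B:0.0500):0.1250):0.2417);
```

Because UPGMA assumes a roughly constant rate of evolution across lineages, each node is placed at half the distance between the clusters it joins, and the resulting tree is rooted.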
Our study suggests that the stretches of amino acids involved in divalent cation binding, sugar selection and template/primer interaction are highly conserved. These conserved residues are potential targets for the development of antiviral agents or peptide vaccines. The phylogenetic analysis suggests that the other HCV genotypes evolved from genotype 1a.