In Silico Identification and Conservation Analysis of B-cell and T-Cell Epitopes of Hepatitis C Virus 3a Genotype Enveloped Glycoprotein 2 From Pakistan: A Step Towards Heterologous Vaccine Design

Background: Hepatitis C virus (HCV) imposes a substantial global disease burden on public health. Development of an effective vaccine is an urgent need; however, several obstacles stand in the way. One major barrier is that, as an RNA virus, HCV mutates rapidly, resulting in high sequence divergence and numerous viral isolates worldwide. The envelope glycoprotein 2 (gpE2) is the primary component of the HCV envelope and interacts directly with host cell surface receptors; it is an indispensable target of neutralizing antibodies and hence should be a fundamental component of vaccine design. Objectives: This study focused on predicting B-cell and T-cell epitopes in HCV gpE2, particularly in genotype 3a isolates from Pakistan, and on identifying epitopes conserved among 3a isolates worldwide and, principally, across the major HCV genotypes. Materials and Methods: Epitopes were predicted with online bioinformatics tools including the Immune Epitope Database (IEDB), ProPred-I, and ProPred. Conservation of these epitopes was assessed by aligning selected gpE2 sequences with the MultAlin online software and with the conservancy analysis tool available at IEDB. Results: Many B-cell and T-cell epitopes predicted in gpE2 were conserved among HCV 3a isolates, whereas a few were also conserved in other genotypes, suggesting these epitopes as potential candidates for eliciting strong B-cell and T-cell responses against HCV 3a and other genotypes. Conclusions: HCV gpE2 is an ideal target for an HCV vaccine. Prediction of epitope immunogenicity and characterization on the basis of peptide sequences will be significantly helpful for development of a heterologous vaccine against HCV variants.


Background
Hepatitis C virus (HCV) is a serious health problem, with over 200 million people worldwide at risk of infection (1)(2)(3). In most patients, HCV remains asymptomatic for years after infection, and patients are often unaware of their infection until it is too late for effective treatment. About 20% to 50% of infected individuals develop progressive liver disease that ultimately leads to liver cirrhosis, liver failure, and hepatocellular carcinoma (4)(5)(6). According to a World Health Organization (WHO) report, HCV accounts for over 350000 deaths per year worldwide, with an anticipated three to four million new infections each year. Countries with the highest rates of chronic infection are Egypt (22%), Pakistan (4.8%), and China (3.2%). Unsafe medical practices are the main cause of the extensive spread of HCV in Pakistan (7). More than ten million people are living with this potentially fatal disease in Pakistan (8,9), which is an alarming number.
The HCV genomic RNA consists of a long open reading frame of over 9024 nucleotides and relatively short untranslated regions (UTR) at the 5′ and 3′ ends. The 5′ and 3′ UTR contain cis-acting RNA elements necessary for HCV polyprotein translation and RNA replication. The polyprotein is eventually cleaved by cellular peptidases and viral proteases to produce structural components (core as well as envelope glycoproteins 1 [gpE1] and 2 [gpE2]) and nonstructural components (NS2, NS3, NS4, and NS5A) (10). Unfortunately, there is no available vaccine against HCV. Efforts to develop an HCV vaccine have been hindered by several factors, including the error-prone replication of HCV (11), the gpE2 gene being the most variable component of the viral genome, the lack of suitable animal models, and the absence of well-established in vitro knowledge of protective immunity (12). Recent studies have shown that CD4+ and CD8+ T-cell responses are essential in the control of acute HCV infection and it is suggested that neutralizing anti-HCV antibody responses

Objectives
The present study aimed to locate conserved B-cell and T-cell epitopes in the HCV 3a genotype gpE2 gene cloned from HCV-infected patients in Pakistan by using online bioinformatics tools including the Immune Epitope Database (IEDB), ProPred-I, and ProPred. The conservation of the epitopes predicted by these tools was compared among isolates from Pakistan, Asia, and the rest of the world for HCV 3a and other genotypes. In addition, a specific criterion for epitope conservation was proposed that might help to identify conserved epitopes not only in HCV gpE2 but also in other important immunogenic genes.

Genotype 3a Envelope Glycoprotein 2 Gene Consensus Sequence
The HCV 3a genotype consensus sequence was generated by aligning 24 different gpE2 sequences from Pakistan retrieved from GenBank (Table 1). The online ClustalW and MultAlin programs were used for sequence alignment. The consensus sequence "E2PK" was then used for further applications.
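The consensus step can be illustrated with a minimal sketch. It assumes the sequences have already been aligned to equal length (for example by ClustalW or MultAlin) and applies a simple majority rule per column; the function name, the toy alignment, and the tie-breaking rule are illustrative assumptions, not the procedure used by either program.

```python
from collections import Counter

def consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Columns whose most frequent symbol is the gap character are dropped.
    Ties are broken by first occurrence in the column (an assumption).
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        residue, _ = Counter(column).most_common(1)[0]
        if residue != gap:
            out.append(residue)
    return "".join(out)

# Toy alignment (hypothetical residues, not real gpE2 data)
toy_alignment = ["ETHVTG", "ETYVTG", "ETHV-G", "ETHVSG"]
print(consensus(toy_alignment))
```

A stricter rule (for example, requiring a minimum column frequency before calling a residue) could be substituted without changing the overall structure.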

The Protein Information Resource Database
The Protein Information Resource (PIR; http://pir.georgetown.edu/) database was employed to determine the molecular weight of the viral gpE2 and the percentages of its most and least frequent amino acids. This resource provides computer-based methods for the comparison of protein sequences and the detection of distantly related sequences and of duplications within sequences.
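The amino acid composition part of this analysis can be sketched in a few lines. This is only an illustration of the calculation, not the PIR implementation; the function name and the toy sequence are assumptions.

```python
from collections import Counter

def composition(protein):
    """Return {residue: percent of sequence}, most frequent residue first."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    total = len(protein)
    counts = Counter(protein)
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}

# Toy sequence (hypothetical, not the E2PK consensus)
toy = "GGGPLTSAVG"
comp = composition(toy)
most_frequent = next(iter(comp))   # first key = highest count
least_frequent = list(comp)[-1]    # last key = lowest count (ties: last seen)
print(most_frequent, comp[most_frequent], least_frequent)
```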

VaxiJen
VaxiJen, a bioinformatics tool, was used to analyze the antigenicity of the HCV 3a genotype gpE2 gene product and to compare it with other HCV proteins (Table 2). VaxiJen is a server for alignment-independent prediction of protective antigens. It classifies antigens on the basis of the physicochemical properties of proteins and uses auto cross-covariance (ACC) transformation to convert protein sequences into uniform, equal-length vectors. Antigenicity scores are shown in Tables 3, 4, 5 and 6.
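To make the ACC idea concrete, the following minimal sketch shows one common form of the transformation: for each lag and each pair of descriptors, products of descriptor values at positions i and i + lag are averaged over the sequence, so proteins of any length map to a vector of the same size. The two-property descriptor table holds placeholder numbers chosen for illustration only; it is not the descriptor set used by VaxiJen, and the function name is hypothetical.

```python
def acc_transform(seq, descriptors, max_lag):
    """Auto- and cross-covariance terms for one protein sequence.

    descriptors maps residue -> tuple of d numeric properties.
    Returns a flat list of d * d * max_lag values, independent of len(seq).
    """
    z = [descriptors[aa] for aa in seq]
    n = len(z)
    d = len(z[0])
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features

# Placeholder descriptors (NOT real physicochemical scales)
TOY = {"A": (1.0, 0.0), "G": (0.0, 1.0), "L": (-1.0, 1.0)}

short_vec = acc_transform("AGLA", TOY, max_lag=2)
long_vec = acc_transform("AGLAGL", TOY, max_lag=2)
print(len(short_vec), len(long_vec))  # both vectors have the same length
```

The fixed-length output is what allows sequences of different lengths to be fed to the same classifier without prior alignment.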

ProPred-I and ProPred
Promiscuous T-cell epitopes of HCV 3a genotype gpE2 were predicted for both class I and class II MHC binding by using the online immunoinformatics tools ProPred-I and ProPred, which cover a large number of human leukocyte antigen (HLA) alleles. ProPred-I is an online tool to identify and predict class I MHC-binding regions in protein antigens (23). It predicts binding peptides for 47 alleles. It is a matrix-based method that also allows prediction of standard proteasome and immunoproteasome cleavage sites in an antigenic sequence. This server helps in identifying promiscuous T-cell epitopes. The ProPred server predicts class II MHC-binding regions in an antigen sequence, using quantitative matrices proposed by Sturniolo et al. in 1999 (24). The ProPred server allows prediction of binding peptides for 57 class II MHC alleles. The server helps to determine promiscuous binding regions that are useful in selecting vaccine candidates.
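The matrix-based principle can be sketched as follows: a window slides along the protein, each window is scored by summing position-specific residue scores, and peptides that pass the threshold for several alleles are reported as promiscuous. The matrices, allele names, window width, and threshold below are toy assumptions chosen for readability (real quantitative matrices are typically nine positions wide); none of them come from ProPred or ProPred-I.

```python
def score_peptide(peptide, matrix):
    """Sum position-specific scores; residues absent from a row score 0."""
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in enumerate(peptide))

def predict_binders(protein, matrix, threshold):
    """Return (1-based start, peptide, score) for windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(protein) - width + 1):
        pep = protein[start:start + width]
        s = score_peptide(pep, matrix)
        if s >= threshold:
            hits.append((start + 1, pep, s))
    return hits

def promiscuous(protein, allele_matrices, threshold, min_alleles):
    """Peptides predicted to bind at least min_alleles different alleles."""
    binders = {}
    for allele, matrix in allele_matrices.items():
        for _, pep, _ in predict_binders(protein, matrix, threshold):
            binders.setdefault(pep, set()).add(allele)
    return {p: sorted(a) for p, a in binders.items() if len(a) >= min_alleles}

# Toy 3-position matrices for two hypothetical alleles
TOY_MATRICES = {
    "allele_A": [{"L": 1.0}, {"V": 1.0}, {"T": 1.0}],
    "allele_B": [{"L": 1.0, "G": 0.5}, {"V": 0.5}, {"T": 1.5}],
}
toy_protein = "GLVTGVT"
print(promiscuous(toy_protein, TOY_MATRICES, threshold=2.5, min_alleles=2))
```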

Epitope Conservation Analysis
The predicted B-cell and T-cell epitopes of HCV 3a genotype gpE2 from Pakistani isolates (E2PK) were subjected to conservation analysis against sequences from Pakistan, Asia, and the rest of the world. Conservation of these predicted epitopes was also rated across the major HCV genotypes 1 to 6 worldwide. For T cells, only those epitopes that bound the largest number of alleles were selected. The predicted epitopes of HCV 3a (E2PK), along with selected sequences of genotype 3a (23 from Pakistan, 30 from Asia, and 50 from other countries) and genotypes 1 to 6 (70 from other countries), were submitted to the epitope conservancy analysis tool (IEDB). Epitopes with 80% to 100% conservancy were selected. Finally, all selected conserved epitopes were analyzed for similarity with the human proteome using the BLAST program (http://www.ncbi.nlm.nih.gov/BLAST/) to verify that these peptides would not trigger autoimmunity.
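The conservancy calculation can be sketched as the percentage of sequences that contain the epitope at or above a chosen identity level, followed by the 80% selection cut-off described above. The ungapped window comparison, the function names, and the toy sequences are assumptions for illustration; this is not the IEDB implementation.

```python
def best_identity(epitope, sequence):
    """Highest fraction of matching positions over all ungapped windows."""
    k = len(epitope)
    best = 0.0
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / k)
    return best

def conservancy(epitope, sequences, min_identity=1.0):
    """Percent of sequences containing the epitope at >= min_identity."""
    if not sequences:
        raise ValueError("no sequences given")
    hits = sum(best_identity(epitope, s) >= min_identity for s in sequences)
    return 100.0 * hits / len(sequences)

# Toy data (hypothetical peptide and sequences, not real isolates)
toy_epitope = "QLIN"
toy_isolates = ["AAQLINTT", "QLINGG", "AAQLVNTT", "GGGGGGGG", "TTQLIN"]

exact = conservancy(toy_epitope, toy_isolates)           # exact matches only
relaxed = conservancy(toy_epitope, toy_isolates, 0.75)   # one mismatch allowed
selected = relaxed >= 80.0                               # 80% selection cut-off
print(exact, relaxed, selected)
```

Running the same function separately on the Pakistani, Asian, worldwide 3a, and genotype 1 to 6 sequence sets would give the per-population conservancy values that the selection is based on.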

Results
The HCV gpE2 consensus sequence was developed using sequences reported from Pakistan. The consensus sequence was then used to predict various antigenic epitopes within this protein. The antigenicity score of gpE2 was computed with VaxiJen at a threshold of 0.4 and compared with those of other HCV proteins. The HCV gpE2 showed the highest antigenicity value (0.49) among HCV proteins (Table 2), suggesting gpE2 as a potential candidate for HCV vaccine development. The molecular weight of gpE2 was found to be 38042.92 Da. Among the amino acids, glycine had the highest repetition rate followed by proline, leucine, threonine, serine, alanine, and valine (Figure 1). The frequency of particular amino acid residues influences the probability that a protein is antigenic. Proteins frequently containing cysteine, leucine, and valine are expected to have more antigenic determinants. Seventeen B-cell epitopes were predicted by IEDB in gpE2 (Table 3).
Ninety-five epitopes in gpE2 (E2PK) were predicted against 57 class II MHC-specific alleles by ProPred. Among them, 30 epitopes merited discussion (Tables 4 and 5). Epitopes R1 to R3 were found to be ≥ 90% conserved and epitopes R4 to R6 showed ≥ 80% conservation among 3a and other genotypes. Other epitopes, designated Q1 to Q16, showed ≥ 90% conservation, and epitopes P1 to P6 showed 80% conservation, among the 3a population with no conservation in other genotypes (Tables 4 and 5).
A few epitopes were predicted against 47 class I MHC-specific alleles by ProPred-I. Among these, epitopes M1 and M2 were conserved in HCV 3a and other genotypes (Table 6), whereas epitope M3 showed above 90% conservation only in the 3a population.
To avoid an autoimmune response, all the predicted B-cell-binding and class I as well as class II MHC-binding antigenic regions were analyzed for homology with the human proteome, and no epitope was found to be homologous with the human proteome.

Discussion
High antigenicity of the HCV gpE2 is considered the most pressing obstacle to an HCV vaccine (25). Finding the right antigenic determinants or epitopes that can induce an effective immune response against the pathogen is the major challenge in developing an HCV vaccine. Advances in sequence databases and computer-based epitope design have made it possible to screen for all possible epitopes that are able to provoke an immune response against a particular pathogen (26). This study was designed to predict the conserved B-cell-binding and class I as well as class II MHC-binding epitopes in HCV gpE2 by using a computer-based in silico approach. The HCV gpE2 was found to be more immunogenic than the other HCV proteins when assessed by VaxiJen with its alignment-independent algorithm, as its antigenicity score was above the 0.4 threshold. Among B-cell epitopes, B1 to B4 were found to be conserved among 3a and other genotypes (1 to 6). The data showed that these epitopes might elicit antibodies not only against HCV 3a genotypes but also against other genotypes. Recent in vitro studies have also confirmed that some of these epitopes produce neutralizing antibodies. Keck et al. reported that the epitope present at positions 410 to 425 (B1) produces neutralizing antibodies (15). Another study reported epitope I (B1) as highly conserved among the genotypes as well as the major antibody neutralization target (26). Human monoclonal antibody HCV1 also recognizes a highly conserved linear epitope of the HCV gpE2 (amino acids 412-423) and neutralizes a broad range of HCV genotypes (27). Our in silico results are in line with this report, as B1 was more than 90% conserved across all genotypes. These results are further supported by a recent report advocating B1, B3, and B4 as ideal B-cell epitopes with antigenicity ranging from 0.75 to 0.9 for the 3a genotype (28); in addition, our study emphasized these epitopes as being universal for all major genotypes (Table 3).
The epitopes B5 to B7 were ≥ 90% conserved whereas certain epitopes (B8 to B11) were found to be ≥ 80% conserved among the HCV 3a population; these epitopes, however, were not conserved among other genotypes (Table 3). Furthermore, the antigenicity of B10 and B11 was also low (< 0.4). On the other hand, B9 and B7 had very high antigenicity of 1.2 and 1.75, respectively, and, as further supported by the recent report by Idrees et al. (28), these epitopes might be valuable for 3a genotype therapeutics. Antibodies to amino acids 496 to 515 were isolated by affinity binding and elution from the serum of a vaccinated chimpanzee and were found to specifically neutralize chimeric 1a/2a, 1b/2a, and 2a HCV in cell culture (29). In our study, B5 seems to be a part of the mentioned epitope and showed 80% conservation for other genotypes. These results provided evidence that broadly neutralizing antibodies to HCV might protect against heterologous viral infection and suggested that a prophylactic vaccine against HCV might be achievable. Many neutralizing antibodies against HCV gpE2 have been reported; however, these antibodies differ in their mechanism of neutralization and are mostly homologous in action (17). Recent studies have shown that epitopes predicted by different online tools are immunogenic when checked experimentally in herpes simplex virus, influenza A virus, and Vibrio mimicus (30,31). The human antibodies raised against HCV gpE2 epitopes do not offer protection against multiple viral infections; the reason may be related to genetic variations among viral strains, particularly within the hypervariable region-1 (HVR-1), to low titers of anti-gpE2 antibodies, or to interference of non-neutralizing antibodies with the function of neutralizing antibodies (32).
Thus, recombinant or synthetic antigens may be more efficient in inducing neutralizing antibodies to certain epitopes, and screening virally infected patients may not be the best approach for finding new cross-reactive epitopes.
Analysis of T-cell immune responses to gpE2 has been previously established. In silico-based immunodominant CD8+ epitopes selected for HLA-A2 and HLA-H2d showed an encouraging delayed-type hypersensitivity response in vaccinated mice (21). In this study, conserved class I and class II MHC epitopes were predicted and their conservancy was checked in other genotypes. The initiative for this study was based on the information that certain T-cell epitopes on the HCV gpE2 play an important role in viral clearance (33). Among T-cell epitopes, some showed ≥ 90% and others ≥ 80% conservation among HCV 3a as well as other HCV genotypes. Many T-cell epitopes reported in this study bound the maximum number of alleles, supporting them as potential T-cell epitopes. M2 and M1 were found to be the best class I MHC epitopes for use in a synthetic vaccine against multiple isotypes of HCV; R1, R3, and Q6 likewise are ideal class II MHC-specific epitopes with high antigenicity scores and high conservancy across major genotypes. M2, Q5, and R3 have also been predicted to be ideal candidates for a T-cell-based vaccine for the HCV 3a genotype (29); however, we further suggest that M1, M2, and R3 are equally good for other genotypes as they are > 90% conserved across the six major genotypes (Tables 4 and 5).
Thus, immunoinformatics tools were applied in the present study to predict the antigenicity of HCV 3a genotype gpE2, followed by prediction of its B-cell and T-cell epitopes and the conservancy of these epitopes among populations in Pakistan, Asia, and other countries infected with HCV 3a and other genotypes. In comparison with epitopes derived from highly variable genome regions, the use of conserved epitopes within the protein could provide broader protection against HCV 3a and other genotypes. Therefore, these epitopes can be used as effective vaccine candidates for residents of Asia and other continents. This analysis showed that the predicted epitopes can be used for vaccine design against HCV. Our results showed that most of the B-cell and T-cell epitopes predicted from E2PK showed higher conservation in the 3a gpE2 gene. The conservation was observed to be highest in Pakistan, followed by Asia and other countries, and lower in other genotypes. These epitopes are potential candidates for genotype-specific vaccine design. We also proposed a few significantly cross-reactive epitopes that can be used for vaccine development and are expected to elicit a strong immune response against 3a as well as other genotypes.