Improved method for predicting linear B-cell epitopes
© Larsen et al. 2006
Received: 16 February 2006
Accepted: 24 April 2006
Published: 24 April 2006
B-cell epitopes are the sites of molecules that are recognized by antibodies of the immune system. Knowledge of B-cell epitopes may be used in the design of vaccines and diagnostic tests. It is therefore of interest to develop improved methods for predicting B-cell epitopes. In this paper, we describe an improved method for predicting linear B-cell epitopes.
In order to do this, three data sets of linear B-cell epitope annotated proteins were constructed. One data set was collected from the literature, another was extracted from the AntiJen database, and a data set of epitopes in the proteins of HIV was collected from the Los Alamos HIV database. An unbiased validation of the methods was made by testing them on data sets on which they were neither trained nor optimized. We have measured the performance in a non-parametric way by constructing ROC curves.
The best single method for predicting linear B-cell epitopes is the hidden Markov model. Combining the hidden Markov model with one of the best propensity scale methods, we obtained the BepiPred method. When tested on the validation data set this method performs significantly better than any of the other methods tested. The server and data sets are publicly available at http://www.cbs.dtu.dk/services/BepiPred.
Vaccines have mostly been composed of killed or attenuated whole pathogens. For safety reasons, however, it could be desirable to use peptide vaccines that are able to generate an immune response against a given pathogen. Such vaccines could contain peptides representing linear B-cell epitopes from the proteins of the pathogen. Hughes et al. used linear B-cell epitopes to induce protective immunity in mice against P. aeruginosa. By immunizing animals, synthetic peptides containing linear B-cell epitopes can also be used to raise antibodies against a specific protein, which can be used e.g. in screening assays or as diagnostic tools.
B-cell epitopes are parts of proteins or other molecules that antibodies (made by B-cells) bind. Most protein epitopes are composed of different parts of the polypeptide chain that are brought into spatial proximity by the folding of the protein. These epitopes are called discontinuous, but for approximately 10% of the epitopes, the corresponding antibodies are cross-reactive with a linear peptide fragment of the epitope. These epitopes are denoted linear or continuous and are mainly composed of a single stretch of the polypeptide chain.
Even though linear B-cell epitopes are thus of limited relevance in the detailed understanding of a humoral immune response, identification of such linear peptide segments will often be the initial step in the search for antigenic determinants in pathogenic organisms. The traditional experimental peptide scanning approach is clearly not feasible on a genomic scale. Prediction methods are very cost effective, and reliable methods for predicting linear B-cell epitopes would therefore be a first step in guiding a genome-wide search for B-cell antigens in pathogenic organisms.
The classical way of predicting linear B-cell epitopes is by the use of propensity scale methods. These methods assign a propensity value to every amino acid, based on studies of their physico-chemical properties. Fluctuations in the sequence of prediction values are reduced by applying a running average window. This prediction procedure was first developed by Hopp and Woods.
Pellequer et al. compared several propensity scale methods using a data set of 14 epitope annotated proteins. They found that applying the scales by Parker et al. (hydrophilicity), Chou and Fasman and Levitt (secondary structure) and by Emini et al. (accessibility) gave slightly better results than the other scales tested.
Alix developed a program called PEOPLE, which predicts the location of linear B-cell epitopes using combinations of propensity scale methods. Odorico and Pellequer developed a program, BEPITOPE, for predicting the location of linear B-cell epitopes using propensity scale methods.
Recently, Blythe and Flower studied the performance of many propensity scale methods and found that even the best methods predict only marginally better than a random model. They made a thorough study using a data set of 50 epitope mapped proteins from the AntiJen web page http://www.jenner.ac.uk/AntiJen.
In this study, we have developed a novel method for predicting linear B-cell epitopes, BepiPred, which is found to perform significantly better both than random predictions and than a number of tested propensity scales.
Even though the present method is a significant improvement over earlier methods for predicting linear B-cell epitopes, it still has major limitations. There is a need for further improvements in predictive power before such systems become generally useful to provide reliable predictions of B-cell epitopes.
Predictions by propensity scale methods
We first tested a number of propensity scale methods on the Pellequer data set. For every scale and window size, a ROC curve was constructed and the area under it, the $A_{roc}$ value, was calculated as a measure of the prediction accuracy. 1000 bootstrap samples were drawn from the predictions in order to estimate the standard error of the $A_{roc}$ value. The best scale was found to be the one by Levitt (window size 11, $A_{roc}$ = 0.658 ± 0.013). This method will be denoted Levitt. The second best scale is the scale by Parker et al. (window size 9, $A_{roc}$ = 0.654 ± 0.013), denoted Parker. The other scales that were tested did not perform as well as the scales by Parker et al. and Levitt.
Performing a permutation experiment 1000 times, we estimated the P-value for the hypothesis that a method performs like a random model, where the alternative hypothesis is that it performs better than a random model. The resulting P-values for Parker and Levitt were both below 0.1%.
Predictions by hidden Markov models
Experiments were conducted in which hidden Markov models (HMMs) were used for the prediction of the location of linear B-cell epitopes. The models were built from positive windows extracted from the AntiJen data set. The HMMs were tested on the Pellequer data set to find the optimal parameters. Different sizes of the extracted peptide windows, different weights of the pseudo-count correction for estimating the amino acid frequencies and different sizes of the smoothing window were tested. For the best method, the size of the extracted windows was found to be 5, the size of the smoothing window was 9 and the pseudo-count correction was $10^7$. The performance of the method on the Pellequer data set was $A_{roc}$ = 0.663 ± 0.012. This method with these parameters will be denoted HMM.
Table: Combinations of methods. Predictions on the Pellequer data set, by weight on method 1 ($A_{roc}$ values: 0.671 ± 0.013; 0.669 ± 0.013).
Validating the methods
Table: Validation of the methods on the HIV data set ($A_{roc}$ values: 0.600 ± 0.011; 0.586 ± 0.011; 0.586 ± 0.011; 0.584 ± 0.011; 0.572 ± 0.011).
Sensitivities for selected specificities (both in %) for some of the methods. The data is taken from their ROC-curves, shown in Figure 1. The methods were validated on the HIV data set.
P values (in %) for the comparisons of methods. If a P-value is below the chosen significance level of 5%, the alternative hypothesis, which is that the method to the left is more accurate than the method at the top, can be accepted. The methods were validated on the HIV data set.
We have constructed a prediction method for linear B-cell epitopes using a hidden Markov model. Hidden Markov models have not been used for this specific purpose before.
Our method has a quite low sensitivity. One way of increasing the sensitivity is to lower the applied threshold, but that would also lead to a lower specificity. Pellequer et al. showed that a reduction of over-predictions could be done by combining prediction curves, and further improvements of B-cell epitope prediction methods may be obtained using similar approaches.
Pellequer et al. have made a comparison of several propensity scales using one of the data sets in the present study: the Pellequer data set. They applied a number of propensity scale methods to the data set and used a fixed threshold of 0.7 s, where s is the standard deviation of the prediction values. This threshold classified the predictions as positive or negative. They found that the predictions using the different scales were better than random, in agreement with the findings of the present study. They compared the scales on a data set consisting of nine of the sequences and found that the scales by Parker et al., Chou and Fasman, Levitt and Emini et al. gave slightly better results than the other scales tested.
In the present study, we found that for a similar data set, the scales that performed best were constructed by Levitt and Parker et al. This corresponds well with the findings of Pellequer et al.
Blythe and Flower have found that even the best propensity scale methods perform only marginally better than a random model. They used a data set of 50 epitope mapped proteins from the AntiJen home page http://www.jenner.ac.uk/AntiJen and applied many propensity scale methods to the data.
We have tested several propensity scale methods and optimized their parameters in order to identify the best method. For the Pellequer data set, the best method was found to be the scale by Levitt with a window size of 11. The second best propensity scale method was the scale by Parker et al. with a window size of 7–11. This scale was intended by its authors to be used with a window size of 7, which corresponds well with our findings.
We present a novel method for predicting linear B-cell epitopes, BepiPred. It is a combination method, made by combining the predictions of a hidden Markov model and the propensity scale by Parker et al. We have tested different parameters in order to optimize the hidden Markov model and the propensity scale method.
We have tested the methods using the non-parametric ROC curves and made an unbiased validation using a separate data set. We found that BepiPred had the highest prediction accuracy on the test data set, and it is shown to perform significantly better than all other methods tested on the validation data set. Comparing BepiPred with the best propensity scale methods on the validation data set, for a specificity of 80% the sensitivity for BepiPred, the scale by Parker et al. and the scale by Levitt is 30.9%, 28.8% and 26.8%, respectively.
Future work could include using data from other sources, such as the Immune Epitope Database and Analysis Resource, IEDB, or the Epitome database of structurally inferred antigenic epitopes in proteins http://www.rostlab.org/services/epitome.
Three data sets of proteins with linear B-cell epitope annotation were used in these studies. All data sets were constructed by measuring the cross-reactivity between the intact protein and the peptide fragment.
The Pellequer data set
A data set was used for the tests and optimization of the methods. Since this data set was unavailable in electronic form, it was recreated by Lund et al. The epitope annotations were taken from Pellequer et al. and references therein. An exception was the sequence of scorpion neurotoxin, for which the data were taken from Devaux et al. This data set, denoted the Pellequer data set, contains 14 protein sequences and 83 epitopes. The epitope density is 0.34.
The AntiJen data set
A second data set was used to train and build the hidden Markov model. This data set was extracted from the AntiJen database, formerly JenPep http://www.jenner.ac.uk/AntiJen. This data set, denoted the AntiJen data set, consists of 127 protein sequences, and the epitope density is 0.08. The proteins of this data set are not fully annotated, and the annotation for the non-epitope stretches is not known.
The HIV data set
A separate data set was made allowing an unbiased validation of the methods. It consists of epitopes found in the proteins of HIV taken from the HIV Molecular Immunology Database of the Los Alamos National Laboratory http://www.hiv.lanl.gov. The epitopes in this data set are overlapping to some degree. Therefore a procedure for determining more accurate borders of the minimal epitopes was applied to the epitopes. If a smaller epitope was contained as part of a larger epitope, the larger epitope was discarded from the data set. Two of the sequences had no assigned epitopes and were therefore discarded from the data set. The HIV data set consists of 10 protein sequences and the epitope density is 0.38.
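The containment rule described above can be sketched in a few lines. This is only an illustration under assumptions: epitopes are represented as inclusive (start, end) residue positions on a single protein, the function name `keep_minimal_epitopes` and the example coordinates are hypothetical, and the actual procedure applied to the Los Alamos data may have differed in detail.

```python
def keep_minimal_epitopes(epitopes):
    """Drop every epitope that fully contains another, different epitope.

    Epitopes are (start, end) residue positions, inclusive, on one protein.
    """
    unique = sorted(set(epitopes))
    minimal = []
    for a_start, a_end in unique:
        contains_smaller = any(
            a_start <= b_start and b_end <= a_end
            and (b_start, b_end) != (a_start, a_end)
            for b_start, b_end in unique
        )
        if not contains_smaller:
            minimal.append((a_start, a_end))
    return minimal


# Hypothetical annotations: (10, 25) contains (12, 18), so it is discarded.
# (30, 40) and (38, 45) merely overlap, so both are kept.
example = [(10, 25), (12, 18), (30, 40), (38, 45)]
minimal = keep_minimal_epitopes(example)
```

Partially overlapping epitopes survive this filter, which is consistent with the statement that the epitopes in the data set overlap to some degree.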
Propensity scale methods
The propensity scale methods assign a propensity value to every amino acid of the query protein sequence. Fluctuations are reduced by applying a running mean window. At the N- and C-termini we used asymmetric windows to avoid discarding prediction examples. The scales used in this study are based on antigenicity, hydrophilicity, inverted hydrophobicity [21, 22], accessibility and secondary structure [7, 8].
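A running mean with asymmetric windows at the termini might look like the following minimal sketch. The function name `smooth_propensity` and the three-letter toy scale are assumptions made for illustration; they are not one of the published scales, and the window is assumed to be odd and centred on the residue being scored.

```python
def smooth_propensity(sequence, scale, window):
    """Per-residue propensity averaged over a centred running window.

    Near the termini the window is truncated to the residues that exist
    (an asymmetric window), so every residue still receives a score.
    """
    half = window // 2
    raw = [scale[aa] for aa in sequence]
    smoothed = []
    for pos in range(len(raw)):
        lo = max(0, pos - half)
        hi = min(len(raw), pos + half + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))
    return smoothed


# Toy scale with made-up values, for illustration only.
toy_scale = {"A": 0.0, "K": 3.0, "L": -3.0}
profile = smooth_propensity("AKKLA", toy_scale, window=3)
# profile == [1.5, 2.0, 1.0, 0.0, -1.5]
```

Because the window shrinks at the ends instead of being dropped, the number of prediction examples equals the number of residues.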
Hidden Markov models
Let $i = (i_1, i_2, \ldots, i_w)$ denote a sequence of amino acids that has been extracted from a protein sequence, and let $j = 1, \ldots, w$ denote the position in this window. On the basis of $i$, the hidden Markov model predicts whether the center position of the window is annotated as part of an epitope. At the N- and C-termini, parts of the extracted windows extend beyond the ends of the sequence. For these positions the character 'X' is used, which does not count when the hidden Markov model is used for the predictions. The prediction score for a window is given by
$$S = \sum_{j=1}^{w} \log \frac{p_{i_j,j}}{q_{i_j}} \qquad (1)$$
which is the log odds of the residue at the center position of the window being part of an epitope (Epitope model) as opposed to occurring by chance (Random model).
To construct the Random model, the background frequencies of the Swiss-Prot database, $q_i$, are used. For the Epitope model, $p_{i,j}$ is the effective probability of having amino acid $i$ at position $j$ according to the model.
To calculate the values of $p_{i,j}$, all windows whose center position is annotated as part of an epitope are extracted from a training data set. Again, if an extracted window extends beyond the N- or C-terminus, the character 'X' is used, which does not count when calculating the parameters.
These extracted peptide windows form a matrix of aligned peptides of width $w$. From this alignment, $p_{i,j}$ is calculated as the pseudo-count-corrected probability of occurrence of amino acid $i$ in column $j$. To make the pseudo-count correction, pseudo-count frequencies, $g_{i,j}$, are calculated. They are given by
$$g_{i,j} = \sum_{k} \hat{p}_{k,j}\, b_{i,k} \qquad (2)$$
where $\hat{p}_{k,j}$ is the observed frequency of amino acid $k$ in column $j$ of the alignment. The variable $b_{i,k}$ is the Blosum 62 substitution matrix frequency, i.e. the frequency with which $i$ is aligned to $k$.
To give an example of using (2), let the window size be $w = 1$. The model then covers only residues that are annotated as being part of linear B-cell epitopes. If the observed peptides consist of the single amino acids L and V, with the frequencies $\hat{p}_{L,1} = 0.5$ and $\hat{p}_{V,1} = 0.5$, then the pseudo-count frequency for e.g. I is given by
$$g_{I,1} = \hat{p}_{L,1}\, b_{I,L} + \hat{p}_{V,1}\, b_{I,V} = 0.5\, b_{I,L} + 0.5\, b_{I,V}$$
The effective amino acid frequencies are calculated as a weighted average of the observed frequency and the pseudo-count frequency,
$$p_{i,j} = \frac{\alpha\, \hat{p}_{i,j} + \beta\, g_{i,j}}{\alpha + \beta} \qquad (3)$$
Here, $\alpha$ is the effective number of sequences in the alignment minus 1, and $\beta$ is the pseudo-count correction, which is also called the weight on low counts. To finish the calculation example, let $\beta$ be very large, as it is in this work. Then $p_{I,1} \approx g_{I,1} = 0.14$.
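The worked example with L and V can be checked numerically. The sketch below assumes that the pseudo-count frequency is the sum over amino acids k of the observed frequency of k times the Blosum 62 frequency b(i, k), and that the effective frequency is the α/β-weighted average of the observed and pseudo-count frequencies. The two b values are approximate conditional frequencies supplied here as assumptions (they are not quoted in the text), and α = 1 is likewise an illustrative choice.

```python
def pseudo_count_frequency(observed, blosum_cond, aa):
    """g for amino acid `aa`: sum over k of observed[k] * b[aa][k]."""
    return sum(freq * blosum_cond[aa][k] for k, freq in observed.items())


def effective_frequency(observed_aa, g_aa, alpha, beta):
    """Weighted average of the observed and pseudo-count frequencies."""
    return (alpha * observed_aa + beta * g_aa) / (alpha + beta)


# Assumed, approximate Blosum 62 frequencies b[I][k]; illustrative only.
blosum_cond = {"I": {"L": 0.115, "V": 0.165}}
observed = {"L": 0.5, "V": 0.5}  # column of a width-1 alignment

g_I = pseudo_count_frequency(observed, blosum_cond, "I")
# I is never observed, so its observed frequency is 0; beta is very large.
p_I = effective_frequency(observed.get("I", 0.0), g_I, alpha=1.0, beta=1e7)
print(round(g_I, 2), round(p_I, 2))
```

With a very large β the observed frequencies contribute almost nothing, so the effective frequency of I is essentially its pseudo-count frequency, about 0.14, even though I never occurs in the alignment.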
Note that we shall use the term hidden Markov model throughout this work to refer to the weight matrix generated using (1). The parameters of the ungapped Markov model are calculated using a so-called Gibbs sampler, written by Nielsen et al. .
The result of applying (1) is a prediction score for every residue of the query sequence. To reduce fluctuations, a smoothing window is applied to every position. It is made asymmetric in the N- and C- termini in order to conserve prediction examples.
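As an illustration of the scoring step, the sketch below sums position-specific log odds over a window and skips the padding character 'X'. It assumes a natural logarithm, an odd window width and a toy two-letter alphabet with invented probabilities; the names `window_score` and `score_sequence` are hypothetical and do not refer to the Gibbs sampler program used in this work.

```python
import math


def window_score(window, p, q):
    """Sum of log odds over window positions; 'X' padding is skipped.

    p[j][aa] is the Epitope-model probability of amino acid aa at position j,
    q[aa] is the background (Random-model) frequency.
    """
    return sum(
        math.log(p[j][aa] / q[aa])
        for j, aa in enumerate(window)
        if aa != "X"
    )


def score_sequence(sequence, p, q):
    """Score every residue from the window centred on it, padding with 'X'."""
    w = len(p)  # window width, assumed odd
    half = w // 2
    padded = "X" * half + sequence + "X" * half
    return [window_score(padded[n:n + w], p, q) for n in range(len(sequence))]


# Toy alphabet and a width-3 model with invented probabilities.
q = {"A": 0.5, "K": 0.5}
p = [{"A": 0.25, "K": 0.75}] * 3
scores = score_sequence("AKK", p, q)
```

The per-residue scores returned here would then be passed through the smoothing window described above.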
The result of applying a prediction method to a data set is a set of prediction examples, $x = (x_1, x_2, \ldots, x_N)$. Let $n$ denote the residue number. Every $x_n$ consists of a target value and a predicted value. If the residue is annotated as part of an epitope, the target value is 1, and zero otherwise. If asymmetric smoothing windows are used at the N- and C-termini, $N$ is equal to the number of residues in the data set.
According to a variable threshold, the prediction examples are classified as positives or negatives, and according to the target values, the predictions can be true or false. The predictions can be either true positives (TP), true negatives (TN), false positives (FP) or false negatives (FN).
The prediction accuracy is measured by constructing Receiver Operating Characteristic (ROC) curves. For every value of the threshold, the true positive proportion, TP/(TP+FN), and the false positive proportion, FP/(FP+TN), are calculated. A ROC curve is constructed by plotting the true positive proportion against the false positive proportion for all values of the threshold. It is therefore a non-parametric measure.
The sensitivity is equal to the true positive proportion, and the specificity, given by TN/(FP+TN), is equal to 1 minus the false positive proportion. In this way, a ROC curve displays the trade-off between the sensitivity and the specificity for all possible thresholds. A good method has a high true positive proportion when it has a low false positive proportion. Such a method has a high sensitivity and a high specificity. The performance of the method is measured as the area under the curve, the $A_{roc}$ value. For a random prediction, the true positive proportion is equal to the false positive proportion for every value of the threshold, and then $A_{roc}$ = 0.5. For a perfect method, $A_{roc}$ = 1.
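The construction of a ROC curve and its area can be sketched as follows. The targets and scores are invented, the function names are hypothetical, and the trapezoidal rule is one common way of computing the area; the text does not state which integration scheme was used.

```python
def roc_points(targets, scores):
    """(false positive proportion, true positive proportion) per threshold."""
    positives = sum(targets)
    negatives = len(targets) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(targets, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(targets, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented targets (1 = epitope residue) and prediction scores.
targets = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
a_roc = area_under_curve(roc_points(targets, scores))  # 8/9, about 0.889
```

In this toy case eight of the nine (epitope, non-epitope) residue pairs are ranked correctly, which is why the area equals 8/9.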
Bootstrapping is used to estimate the standard error of the $A_{roc}$ value, as a measure of its uncertainty. The relation between the standard error, $se$, and the standard deviation, $s$, is $se = s/\sqrt{r}$, where $r$ is the number of repeats of the underlying experiment.
Bootstrapping is a method for generating pseudo-replicas (bootstrap samples) of the predictions, denoted $x^*$, which deviate a little from $x$. The bootstrap sample, $x^* = (x^*_1, x^*_2, \ldots, x^*_N)$, is defined as a random sample of size $N$, drawn with replacement from $x$. Some of the prediction examples from $x$ may appear zero times, some once, some twice, etc. In other words, a bootstrap sample can be drawn by copying randomly chosen prediction examples, $x_n$, from $x$ into $x^*$. In this way, some variation from $x$ is introduced into $x^*$.
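A bootstrap estimate of the uncertainty of the area under the ROC curve could be sketched as below. The data are invented, the area is computed with the rank-based (pairwise) formulation, and two choices are assumptions made here: the standard error is taken as the standard deviation of the bootstrap values, and samples that happen to contain only one class are redrawn.

```python
import random
import statistics


def a_roc(examples):
    """Rank-based area: fraction of (positive, negative) pairs ranked correctly."""
    pos = [score for target, score in examples if target == 1]
    neg = [score for target, score in examples if target == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_standard_error(examples, n_samples=1000, seed=0):
    """Standard deviation of the area over bootstrap samples of size N."""
    rng = random.Random(seed)
    values = []
    while len(values) < n_samples:
        sample = rng.choices(examples, k=len(examples))  # with replacement
        if {target for target, _ in sample} == {0, 1}:  # need both classes
            values.append(a_roc(sample))
    return statistics.stdev(values)


# Invented (target, predicted score) pairs.
examples = [(1, 0.9), (1, 0.8), (0, 0.7), (1, 0.6), (0, 0.4), (0, 0.2)]
se = bootstrap_standard_error(examples, n_samples=200)
```

With only six prediction examples the estimate is very noisy; on a real data set with thousands of residues the bootstrap values cluster much more tightly around the observed area.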
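A paired t-test is performed in order to determine whether one method is more accurate than another. The $H_0$ hypothesis for this test is that two means are equal, $\mu_1 = \mu_2$. In place of $\mu$, the mean of the bootstrapped $A_{roc}$ values, and hence $A_{roc}$, is used. The starting point is the performance measures of the two methods, $A_{roc,M1}$ and $A_{roc,M2}$, where M1 denotes method 1 and M2 method 2. Bootstrapping gives the vectors $A^{*}_{roc,M1}$ and $A^{*}_{roc,M2}$, each with one entry per bootstrap sample $b$. For every $b$, the bootstrap samples for the two methods are drawn identically, making the two $A_{roc}$ values paired.
The $H_0$ hypothesis is therefore $A_{roc,M1} = A_{roc,M2}$ and the alternative hypothesis is $A_{roc,M1} > A_{roc,M2}$. The test statistic $t$ is given by
$$t = \frac{\bar{D}}{se_{D}}$$
where $\bar{D}$ is the mean of the paired differences and $se_{D}$ is its standard error.
The paired difference of the $b$'th bootstrap samples, $D_b$, is given by
$$D_b = A^{*}_{roc,M1,b} - A^{*}_{roc,M2,b}$$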
When testing the $H_0$ hypothesis that a method performs like a random model, a permutation experiment can be made. The alternative hypothesis is that the method performs better than a random model. From the predictions of the method, $x$, the target values are permuted to give a new prediction set, $x_{perm,p}$. This is done for $p = 1, \ldots, p_{max}$. For every $p$, the prediction accuracy is calculated as $A_{roc,perm,p}$. The P-value for the $H_0$ hypothesis is calculated as the proportion of times for which $A_{roc,perm,p} > A_{roc}$.
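The permutation experiment can be sketched as follows: the target values are shuffled while the predicted scores stay fixed, and the P-value is the proportion of permutations whose area exceeds the observed one. The data, the function names and the rank-based area computation are assumptions made for illustration.

```python
import random


def a_roc(targets, scores):
    """Rank-based area under the ROC curve."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(targets, scores, n_permutations=1000, seed=0):
    """Proportion of target permutations with an area above the observed one."""
    rng = random.Random(seed)
    observed = a_roc(targets, scores)
    shuffled = list(targets)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # permute targets, keep scores fixed
        if a_roc(shuffled, scores) > observed:
            exceed += 1
    return exceed / n_permutations


# Invented example: scores that separate the two classes perfectly.
targets = [1] * 5 + [0] * 5
scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.5, 0.4, 0.3, 0.2, 0.1]
p_value = permutation_p_value(targets, scores, n_permutations=200)
```

Because no permutation can exceed an area of 1, the P-value in this toy case is 0. With 1000 permutations the smallest non-zero P-value that can be resolved is 0.1%.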
1. Nardin E, Calvo-Calle J, Oliveira G, Nussenzweig R, Schneider M, Tiercy J, Loutan L, Hochstrasser D, Rose K: A totally synthetic polyoxime malaria vaccine containing Plasmodium falciparum B cell and universal T cell epitopes elicits immune response in volunteers of diverse HLA types. J Immunol 2001, 166:481–489.
2. Hughes E, Gilleland HJ: Ability of synthetic peptides representing epitopes of outer membrane protein F of Pseudomonas aeruginosa to afford protection against P. aeruginosa infection in a murine acute pneumonia model. Vaccine 1995, 13(18):1750–1753.
3. Schellekens G, Visser H, de Jong B, van den Hoogen F, Hazes J, Breedveld F, van Venrooij W: The diagnostic properties of rheumatoid arthritis antibodies recognizing a cyclic citrullinated peptide. Arthritis Rheum 2000, 43:155–163.
4. Pellequer J, Westhof E, Van Regenmortel M: Predicting the location of continuous epitopes in proteins from their primary structure. Methods Enzymol 1991, 203:176–201.
5. Hopp T, Woods K: Prediction of protein antigenic determinants from amino acid sequence. Proc Natl Acad Sci U S A 1981, 78(6):3824–3828.
6. Parker J, Guo D, Hodges R: New hydrophilicity scale derived from High-Performance Liquid Chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 1986, 25:5425–5432.
7. Chou P, Fasman G: Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol Relat Areas Mol Biol 1978, 47:45–148.
8. Levitt M: Conformational preferences of amino acids in globular proteins. Biochemistry 1978, 17(20):4277–4285.
9. Emini E, Hughes J, Perlow D, Boger J: Induction of hepatitis A virus-neutralizing antibody by a virus specific synthetic peptide. J Virol 1985, 55(3):836–839.
10. Alix A: Predictive estimation of protein linear epitopes by using the program PEOPLE. Vaccine 1999, 18(3):311–314.
11. Odorico M, Pellequer J: BEPITOPE: predicting the location of continuous epitopes and patterns in proteins. J Mol Recognit 2003, 16:20–22.
12. Blythe M, Flower D: Benchmarking B cell epitope prediction: Underperformance of existing methods. Protein Sci 2005, 14:246–248.
13. McSparron H, Blythe M, Zygouri C, Doytchinova I, Flower D: JenPep: a novel computational information resource for immunobiology and vaccinology. J Chem Inf Comput Sci 2003, 43(4):1276–1287.
14. Pellequer J, Westhof E, Van Regenmortel M: Correlation between the location of antigenic sites and the prediction of turns in proteins. Immunol Lett 1993, 36:83–99.
15. Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, Lund O, Nemazee D, Ponomarenko J, Sathiamurthy M, Schoenberger S, Steward S, Surko P, Way S, Wilson S, Sette A: The Immune Epitope Database and Analysis Resource: From Vision to Blueprint. PLoS Biol 2005, 3:e91.
16. Van Regenmortel M, Muller S: Synthetic Peptides as Antigens. Laboratory Techniques in Biochemistry and Molecular Biology (Edited by: Pillai S, Van der Vliet P), Volume 28. Elsevier 1999.
17. Lund O, Nielsen M, Lundegaard C, Keşmir C, Brunak S: Immunological Bioinformatics. The MIT Press 2005.
18. Devaux C, Juin M, Mansuelle P, Granier C: Fine molecular analysis of the antigenicity of the Androctonus australis hector scorpion neurotoxin II: a new antigenic epitope disclosed by the Pepscan method. Mol Immunol 1993, 30(12):1061–1068.
19. Korber B, Brander C, Haynes B, Koup R, Moore J, Walker B, Watkins D (Eds): HIV Immunology and HIV/SIV Vaccine Databases 2003. Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, New Mexico 2003.
20. Welling G, Weijer W, van der Zee R, Welling-Wester S: Prediction of sequential antigenic regions in proteins. FEBS Lett 1985, 188(2):215–218.
21. Kyte J, Doolittle R: A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982, 157:105–132.
22. Cornette J, Cease K, Margalit H, Spouge J, Berzofsky J, DeLisi C: Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J Mol Biol 1987, 195(3):659–685.
23. Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 2000, 28:45–48.
24. Nielsen M, Lundegaard C, Worning P, Hvid C, Lamberth K, Buus S, Brunak S, Lund O: Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Bioinformatics 2004, 20(9):1388–1397.
25. Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W, Lipman D: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389–3402.
26. Henikoff S, Henikoff J: Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 1992, 89:10915–10919.
27. Swets J: Measuring the accuracy of diagnostic systems. Science 1988, 240(4857):1285–1293.
28. Efron B, Tibshirani R: An Introduction to the Bootstrap. First edition. Chapman & Hall 1993.
29. Johnson R: Probability and Statistics for Engineers. Seventh edition. Prentice Hall International, Inc. 2005.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.