
Quantitative peptide binding motifs for 19 human and mouse MHC class I molecules derived using positional scanning combinatorial peptide libraries



It has been previously shown that combinatorial peptide libraries are a useful tool to characterize the binding specificity of class I MHC molecules. Compared to other methodologies, such as pool sequencing or measuring the affinities of individual peptides, utilizing positional scanning combinatorial libraries provides a baseline characterization of MHC molecular specificity that is cost effective, quantitative and unbiased.


Here, we present a large-scale application of this technology to 19 different human and mouse class I alleles. These include very well characterized alleles (e.g. HLA A*0201), alleles with little previous data available (e.g. HLA A*3201), and alleles with conflicting previous reports on specificity (e.g. HLA A*3001). For all alleles, the positional scanning combinatorial libraries were able to elucidate distinct binding patterns defined with a uniform approach, which we make available here. We introduce a heuristic method to translate these data into classical definitions of main and secondary anchor positions and their preferred residues. Finally, we validate that these matrices can be used to identify candidate MHC binding peptides and T cell epitopes in the vaccinia virus and influenza virus systems, respectively.


These data confirm, on a large scale, including 15 human and 4 mouse class I alleles, the efficacy of the positional scanning combinatorial library approach for describing MHC class I binding specificity and identifying high affinity binding peptides. These libraries were shown to be useful for identifying specific primary and secondary anchor positions, and thereby simpler motifs, analogous to those described by other approaches. The present study also provides matrices useful for predicting high affinity binders for several alleles for which detailed quantitative descriptions of binding specificity were previously unavailable, including A*3001, A*3201, B*0801, B*1501 and B*1503.


T cells recognize a complex formed between a major histocompatibility complex (MHC) molecule and an antigenic peptide, or epitope. The identification of T cell epitopes is crucial to facilitate the study of the correlates of immunity. Different MHC molecules are associated with different peptide binding specificities, usually referred to as MHC peptide binding motifs. A large body of literature relates to the definition of MHC binding motifs for class I molecules of several different species, including humans, mice, chimpanzees and macaques (see, e.g., [1], for review). In general, class I MHC molecules recognize peptides of 9 to 10 residues in length that carry residues with similar physicochemical specificity at main anchor positions. Typically, the main anchors are found in position 2 and at the C-terminus of the peptide ligand, although other anchor arrangements have been described for several alleles.

A variety of different methods are available to define MHC peptide binding motifs, each associated with its own advantages and disadvantages. The most common methods involve the pool sequencing of naturally presented MHC ligands or the evaluation of the binding capacity of individual peptide libraries. The pool sequencing approach is based on the bulk sequencing of peptides naturally bound to MHC following their elution with acidic buffers from the MHC peptide binding site. This is a remarkably simple and effective method, and has been applied with success in dozens of instances [1]. It immediately and reliably identifies the most dominant binding requirements of an MHC molecule. An additional unique advantage of this approach is the fact that it is based on the characterization of physiologically processed ligands. Disadvantages associated with this method are that it is only semi-quantitative, and typically identifies only the most canonical (stringent) motifs. This can be a drawback in terms of utilizing this method for epitope predictions, since it has been shown that many dominant epitopes do not carry canonical pool sequencing defined motifs. For example, the prototypical dominant human leukocyte antigen (HLA) A*0201 restricted influenza matrix 58–66 epitope (sequence GILGFVFTL) [2, 3] does not contain the main anchor pattern associated with the HLA-A*0201 pool sequencing motif, which specifies the presence of L or M in position 2. Indeed, in a recent study we observed that 57% (8/14) of the HLA-A*0201 restricted vaccinia-derived epitopes identified did not conform with the A*0201 motif derived by pool sequencing analysis [4].

The most common alternative method for defining motifs is based on establishing quantitative MHC binding assays in vitro, and then testing series of individual peptides. These peptides are either single substitution analogs of high affinity binding epitopes or ligands, or large libraries of unrelated peptides. This method allows detailed probing of the relative role and chemical specificity of each position along the peptide sequence. A concern over this method when relying solely on single substitution analogs relates to the fact that it might reflect a binding mode specific to the particular parent peptide utilized as "wild type", although in practice the specificity patterns identified by single substitution analysis typically correspond well with those identified by other methodologies [5–19]. The same binding assay approach can be used to test large libraries of unrelated peptides (typically 100 or more) of a given size, and all carrying acceptable main anchor residues. As each peptide represents a unique sequence, this approach overcomes the concern associated with the single substitution approach that any pattern identified is dependent on the context of the specific "wild type" ligand.

Affinity data from individual peptides can be analyzed with different computational approaches to derive quantitative motifs that elucidate both primary and secondary influences on binding capacity with great detail (see, e.g., [1, 9, 20–38]). Predictions based on this type of data can give very accurate quantitative approximations of peptide binding, and can discriminate between candidate ligands bearing the same main anchor motifs. The most significant drawback of this approach is that it is dependent upon the availability of panels of several hundreds of allele specific peptides. As a result, this approach can be relatively labor intensive and expensive. Also, the selection of peptide sequences can introduce biases into the training data, for example by over- or under-representing residues at specific sequence positions.

An alternative approach to characterize the binding specificity of MHC molecules is based on the use of positional scanning combinatorial peptide libraries. Such libraries consist of combinatorial mixtures of large numbers of different peptides all sharing a single residue at a certain position. Measuring the affinity of such a library effectively evaluates the average influence of the shared residue on binding in a diverse set of surrounding sequences. Thus, an estimate of the binding contribution of all 20 residues in a 9-mer peptide can be derived by measuring the affinity of a set of 180 mixtures. This approach has been utilized successfully to determine specificity for several different applications, including analyses of the specificities associated with T cell receptor (TCR) recognition [39], proteasomal cleavage [40], and transporter associated with antigen processing (TAP) transport [41], as well as the identification of T cell epitopes [42, 43]. Their efficacy in characterizing MHC binding specificity was first explored in several studies starting over a decade ago [32, 44–46]. Matrices derived from analysis of combinatorial libraries have been found to perform well in the prediction of peptides with high MHC binding affinity [32, 47, 48]. Buus, in his visionary review of MHC studies, proposed the systematic use of combinatorial libraries in a "Human MHC Project", directed at a complete mapping of human immune reactivities [49, 50].

Like the single substitution or peptide library approaches, data generated from positional scanning combinatorial library studies provide quantitative motifs. The unique advantage of using positional scanning combinatorial libraries is that they can be re-used for every allele, representing potentially very significant cost savings. Retesting the same probes for each allele also removes the risk of introducing bias into the set of tested ligands. These advantages have led us to systematically apply the use of combinatorial libraries to a set of 19 class I MHC alleles. In this large scale evaluation, we test if this approach works uniformly across different alleles. We compare its prediction performance to that of bioinformatics machine learning algorithms. We also developed a heuristic approach to convert the combinatorial library affinity data into a classical representation of primary and secondary anchor positions, which makes them directly comparable to those obtained in pool sequencing. Finally, we test the ability of these matrices in practical applications to identify MHC binding peptides and T cell epitopes.


Positional scanning combinatorial libraries and peptide synthesis

The combinatorial library was synthesized as previously described [51]. Each pool in the library contains 9-mer peptides with one fixed residue at a single position. With each of the 20 naturally occurring residues represented at each position along the 9-mer backbone, the entire library consisted of 180 peptide mixtures.
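The library layout described above can be enumerated in a few lines. The following is a minimal sketch, not the synthesis protocol: the names `library_pools` and `pool_pattern` are hypothetical, and the letter X is used here only as a placeholder for the mixed (non-fixed) positions, whose exact composition is not specified in this section.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 naturally occurring residues
PEPTIDE_LENGTH = 9


def library_pools(length=PEPTIDE_LENGTH, residues=AMINO_ACIDS):
    """Return one (position, fixed_residue) label per mixture; positions are 1-based."""
    return [(pos, aa) for pos in range(1, length + 1) for aa in residues]


def pool_pattern(position, residue, length=PEPTIDE_LENGTH):
    """Render a pool as a pattern, with X marking the mixed (non-fixed) positions."""
    return "".join(residue if i == position else "X" for i in range(1, length + 1))


pools = library_pools()
print(len(pools))            # 180 mixtures = 9 positions x 20 residues
print(pool_pattern(2, "L"))  # XLXXXXXXX
```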

Peptides utilized in screening studies were synthesized as described elsewhere [16], or purchased as crude material from Mimotopes (Minneapolis, MN/Clayton, Victoria, Australia), Pepscan Systems B.V. (Lelystad, Netherlands) or A and A Labs (San Diego, CA). Peptides synthesized for use as radiolabeled ligands were synthesized by A and A Labs and purified to >95% homogeneity by reverse phase HPLC. Purity of these peptides was determined using analytical reverse-phase HPLC and amino acid analysis, sequencing, and/or mass spectrometry. Peptides were radiolabeled with the chloramine T method [52]. Lyophilized peptides were re-suspended at 4–20 mg/ml in 100% DMSO, then diluted to required concentrations in PBS + 0.05% (v/v) nonidet P40 (Fluka Biochemika, Buchs, Switzerland).

MHC purification and peptide binding assays

MHC purification and quantitative binding assays based on the inhibition of binding of a high affinity radiolabeled ligand were performed essentially as described elsewhere [18, 52]. HLA A*0201, A*6802, B*0702, B*0801, B*2705, B*3501, B*5101, B*5301, and B*5401 molecules were purified from EBV transformed homozygous B cell lines, as previously described [15, 16, 18, 52–55]. For A*3201, B*1501, B*5801 and B*5802, the WT47, SPACH, AP and 35841 cell lines were utilized, respectively. A*3001 molecules were obtained from the RSH cell line, or kindly provided by Dr. Soren Buus. B*1503 molecules were purchased from Pure Protein L.L.D. (Oklahoma City, OK), or kindly provided by Dr. Soren Buus. All HLA cell lines are from the IHWG cell bank (Fred Hutchinson Cancer Research Center). Mouse class I molecules were purified from P815 (H-2 Dd and Kd), CH27 (H-2 Kk), or EL-4 (H-2 Db) lines, as previously described [10, 52].

For the B*1501, B*1503, A*3201 and A*3001 assays, the artificial sequences AQIDNYNKF (peptide 3128.0001), YQAVVPLVY (peptide 3054.0065), RILHNFAYSL (peptide 1454.42) and KTKDYVNGL (peptide 1428.02) were utilized as the radiolabeled probes, respectively. Radiolabeled ligands for all other assays were as previously described [15, 16, 18, 52–55]. In competition assays, each mixture or individual peptide was tested in 3 or more independent experiments for its capacity to inhibit the binding of the radiolabeled peptide. The concentration of peptide yielding 50% inhibition of the binding of the radiolabeled peptide was calculated. Under the conditions utilized, where [label] < [MHC] and IC50 ≥ [MHC], the measured IC50 values are reasonable approximations of KD.

Bioinformatic analysis

IC50 nM values for each mixture were standardized as a ratio to the geometric mean IC50 nM value of the entire set of 180 mixtures, and then normalized at each position as previously described [17, 18] so that the value associated with the optimal residue at each position corresponds to 1. For each position, an average (geometric) relative binding affinity (ARB) was calculated, and then the ratio of the ARB for the entire library to the ARB for each position was derived. We have denominated this ratio, which describes the factor by which the normalized geometric average binding affinity associated with all 20 residues at a specified position differs from that of the average affinity of the entire library, as the specificity factor (SF). As calculated, positions with the highest specificity will have the highest SF value. Primary anchor positions were then defined as those associated with an SF > 2.4. This criterion identifies positions where the majority of residues are associated with significant decreases in binding capacity. Secondary anchors were identified based on the standard deviation of residue specific values at each position.
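As a rough illustration of the normalization and SF calculation, the sketch below follows one reading of the description above. Several details are assumptions rather than the published procedure (which is given in [17, 18]): the standardized value is taken as the library geometric mean divided by the pool IC50, so that higher means better binding; the library ARB is taken as the geometric mean over all normalized matrix values; and the function names and toy IC50 values are hypothetical.

```python
import math


def geometric_mean(values):
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_matrix(ic50):
    """ic50: {position: {residue: IC50 in nM}} -> relative values, best residue per position = 1."""
    library_gm = geometric_mean(v for row in ic50.values() for v in row.values())
    matrix = {}
    for pos, row in ic50.items():
        # Assumption: orient the ratio so that a lower IC50 gives a higher value.
        standardized = {aa: library_gm / v for aa, v in row.items()}
        best = max(standardized.values())
        matrix[pos] = {aa: s / best for aa, s in standardized.items()}
    return matrix


def specificity_factors(matrix):
    """SF per position: ARB of the whole library divided by ARB of that position."""
    library_arb = geometric_mean(v for row in matrix.values() for v in row.values())
    return {pos: library_arb / geometric_mean(row.values()) for pos, row in matrix.items()}


def primary_anchors(sf, threshold=2.4):
    """Positions whose SF exceeds the threshold stated in the text."""
    return sorted(pos for pos, value in sf.items() if value > threshold)


# Toy data (3 positions, 4 residues), invented for illustration only.
toy_ic50 = {
    1: {"A": 100.0, "L": 100.0, "K": 100.0, "D": 100.0},
    2: {"A": 10000.0, "L": 10.0, "K": 10000.0, "D": 10000.0},
    3: {"A": 100.0, "L": 100.0, "K": 100.0, "D": 100.0},
}
matrix = normalize_matrix(toy_ic50)
sf = specificity_factors(matrix)
print(primary_anchors(sf))  # [2]
```

In the toy data only position 2 discriminates between residues, so it is the only position whose average relative affinity falls well below the library average.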

To identify predicted binders, all possible 9-mer peptides in vaccinia WR sequences were scored using the matrix values, where the final score for each peptide represents the product of the matrix value for the corresponding residue at each position. Algorithms derived by combining positional scanning combinatorial library and individual peptide data sets were generated using the stabilized matrix method (SMM) approach, as previously described [56].
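The multiplicative scoring scheme can be sketched as follows. This is an illustrative outline only: `score_peptide` and `score_protein` are hypothetical names, the matrix layout (`{position: {residue: value}}`) is an assumption, and the toy matrix values are invented rather than taken from any table in this study.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def score_peptide(peptide, matrix):
    """Product of the matrix values for the residue found at each position (1-based)."""
    score = 1.0
    for pos, aa in enumerate(peptide, start=1):
        score *= matrix[pos][aa]
    return score


def score_protein(sequence, matrix, length=9):
    """Score every overlapping window of the given length in a protein sequence."""
    windows = (sequence[i:i + length] for i in range(len(sequence) - length + 1))
    return [(w, score_peptide(w, matrix)) for w in windows]


# Toy matrix: every residue scores 0.1 except L at position 2 and V at position 9.
toy_matrix = {pos: {aa: 0.1 for aa in AMINO_ACIDS} for pos in range(1, 10)}
toy_matrix[2]["L"] = 1.0
toy_matrix[9]["V"] = 1.0

scored = score_protein("GALAAAAAAVG", toy_matrix)
best_peptide, best_score = max(scored, key=lambda item: item[1])
print(best_peptide)  # ALAAAAAAV
```

Because the score is a product, a single strongly disfavored residue can dominate the result; working with the sum of log values is an equivalent alternative when very small products are a concern.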

Characteristics of the Study Population

Healthy males and females between 25 and 49 years of age were used in this study. Exclusion criteria were body weight of <45.4 kg and/or established pregnancy. Institutional Review Board approval and appropriate consent were obtained.

Peripheral Blood Mononuclear Cell (PBMC) Isolation and HLA Typing

PBMCs were isolated from heparinized blood by gradient centrifugation with Histopaque-1077 (catalogue no. H8889, Sigma) [57], and the cells were cryopreserved in liquid nitrogen in 10% DMSO/FBS. Each donor's PBMCs were typed for HLA-A and -B by high-resolution PCR (Atria Genetics, San Francisco, CA).

Ex Vivo Primary ELISPOT Assay

Peptides were synthesized, and divided into groups according to their predicted HLA-A and HLA-B-restriction. PBMCs from individuals with the corresponding haplotype were incubated at 2 × 10⁵ per well in the presence of individual peptides at 10 μg/ml, or a control pool with 24 peptides derived from commonly encountered pathogens (EBV, CMV, and influenza A virus) [58, 59]. The ELISPOT assays were performed as described previously [60]. Responses against DMSO alone were subtracted from the experimental values. To assess statistical significance, a one-tailed Student t test was performed in which the triplicate values of each condition were compared with those of the negative controls. The criterion for positivity in a single experiment was set to ≥ 20 net spot-forming cells (SFCs)/10⁶, a stimulation index (SI) ≥ 2.0, and p ≤ 0.05. Each experiment was performed twice. Epitopes were defined as peptides giving a positive response in 2/2 experiments using PBMC from a single donor.
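The positivity criteria can be expressed as a small check. This is a sketch under stated assumptions: the stimulation index is taken to be the ratio of mean peptide counts to mean DMSO-control counts (the text does not define it), the p value is expected to come from a one-tailed t test computed elsewhere, and the function names and well counts are hypothetical.

```python
from statistics import mean

CELLS_PER_WELL = 2e5  # plating density stated in the text


def elispot_summary(peptide_wells, control_wells, cells_per_well=CELLS_PER_WELL):
    """Return (net SFCs per 10^6 cells, stimulation index) from replicate spot counts."""
    scale = 1e6 / cells_per_well
    peptide_mean = mean(peptide_wells)
    control_mean = mean(control_wells)
    net_sfc = (peptide_mean - control_mean) * scale
    # Assumption: SI = mean peptide count / mean control count.
    si = peptide_mean / control_mean if control_mean > 0 else float("inf")
    return net_sfc, si


def is_positive(net_sfc, si, p_value):
    """All three criteria from the text must hold in a single experiment."""
    return net_sfc >= 20 and si >= 2.0 and p_value <= 0.05


def is_epitope(experiment_results):
    """A peptide counts as an epitope only if both repeat experiments are positive."""
    return len(experiment_results) == 2 and all(experiment_results)


# Invented triplicate counts for one peptide and the DMSO control.
net, si = elispot_summary([12, 15, 14], [2, 3, 1])
print(is_positive(net, si, p_value=0.01))  # True
```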


Evaluation of the positional scanning combinatorial library approach for predicting HLA A*0201 binding peptides

Previous studies in other laboratories have demonstrated that the combinatorial approach performs well in predicting binders to several murine MHC class I molecules [32, 46, 47]. To verify that the same holds for human MHC molecules, we initially used the positional scanning combinatorial library with the best characterized human allele HLA A*0201, for which detailed primary and secondary anchor motifs have been described (see, e.g., [3, 9, 16, 34, 61, 62]). Also, several different predictive methods for this allele are widely available, and have been rigorously tested and compared (see, e.g., [36, 47]).

Previously, we compared the efficacy of several prediction approaches for A*0201 [47]. In that analysis, 3 algorithms hosted by the Immune Epitope Database (IEDB) [63–65] were evaluated in cross-validation, and 16 other publicly available algorithms were evaluated directly by scoring a library of over 3000 peptides whose capacity to bind A*0201 was known. The performance of each method was then evaluated using receiver operating characteristic (ROC) curves, and calculating the area under the curve (AUC). The performance of the 16 directly evaluated algorithms, measured by AUC, ranged from a best of 0.935 to 0.788 (see [47] and Table 1). Overall, the average performance was 0.864, with a median score of 0.871.
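For readers unfamiliar with the AUC metric, it can be computed directly as the probability that a randomly chosen binder receives a higher predicted score than a randomly chosen non-binder. The sketch below is a generic illustration, not the evaluation code used in [47]; the 500 nM binder cutoff and the example values are assumptions made only for this example.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of binder/non-binder pairs ranked correctly (ties count 0.5)."""
    positives = [s for is_binder, s in zip(labels, scores) if is_binder]
    negatives = [s for is_binder, s in zip(labels, scores) if not is_binder]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


BINDER_THRESHOLD_NM = 500.0  # hypothetical cutoff, not taken from the source

measured_ic50 = [5.0, 50.0, 800.0, 20000.0, 300.0]  # invented values (nM)
predicted_score = [0.9, 0.4, 0.5, 0.1, 0.7]         # higher = predicted better binder
labels = [ic50 <= BINDER_THRESHOLD_NM for ic50 in measured_ic50]
auc = roc_auc(labels, predicted_score)
print(round(auc, 3))  # 0.833
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of binders from non-binders.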

Table 1 Performance of several methods for predicting A*0201 binders.

To gauge the relative performance of the combinatorial approach, each of the 9-mer mixtures was tested for its capacity to inhibit the binding of a high affinity radiolabeled ligand to purified A*0201 molecules. The measured IC50 nM values for each mixture are shown in Additional file 1 [see Additional file 1]. IC50 nM values were then normalized as described in the Materials and Methods. The resulting A*0201 matrix (Table 2) was used to score the same 3089 9-mer peptides used above. Good agreement between predicted IC50 and measured IC50 was noted (r² = 0.53), and an AUC of 0.909 was measured (Table 1 and Figure 1). This performance is above both the average and the median, and exceeds that found for 10 of the 16 other algorithms available on the Internet. This performance is notable in that the combinatorial library utilizes only 180 data points as the training set. By contrast, the top performing ARB, ANN and SMM-based algorithms were developed using a training set with over 10-fold more data points.

Figure 1

Positional scanning combinatorial library based predictions for HLA A*0201. Scatter plot depicting the relationship between the predicted score generated from the A*0201 matrix and measured IC50 nM values for 3089 9-mer peptides. Binding assays were performed as described in the materials and methods for peptides previously [47] utilized to compare various publicly available prediction tools. Peptides were scored using the matrix as described in the text.

Table 2 Positional scanning combinatorial library derived matrix describing 9-mer binding to HLA A*0201.

While information on training set size is unavailable for the remaining algorithms, given the general availability of A*0201 binding data, it is reasonable to assume that these algorithms have utilized training sets of similar size. Furthermore, the training set for the combinatorial approach does not overlap with the test set, and is thus completely unbiased, unlike the case for most of the tools utilized in the comparison [47]. Taken together, the data presented in this section have provided further demonstration of the efficacy of using a positional scanning combinatorial library for identification of MHC class I binding peptides.

Generation and validation of positional scanning combinatorial library matrices for additional human and murine class I alleles

Encouraged by the results obtained in the context of A*0201, we derived, and make herein available to the scientific community, combinatorial library matrices for an additional 14 HLA (A*3001, A*3201, A*6802, B*0702, B*0801, B*1501, B*1503, B*2705, B*3501, B*5101, B*5301, B*5401, B*5801 and B*5802) and 4 mouse (H-2 Dd, Kd, Db and Kk) class I molecules. The measured IC50 values are provided in Additional file 1 [see Additional file 1], and will also be submitted to the Immune Epitope Database (IEDB) for hosting at the IEDB Analysis resource [66]. The IC50 values for each mixture were normalized as described above. The resulting matrix values are tabulated in Additional file 2 [see Additional file 2]. For each allele, the matrices identified a reproducible, characteristic, binding pattern.

Identification of primary and secondary anchor positions by the positional scanning combinatorial library approach

To compare the results of combinatorial matrices with those from pool sequencing and single residue substitutions, and in order to meaningfully summarize the rather large amount of data in each scoring matrix, it is desirable to describe MHC binding in terms of simple motifs. The first step in defining such a motif is identifying the peptide positions that have the strongest influence on binding.

As before, A*0201 was first utilized as a model system. A*0201 binds peptides utilizing the peptide residues in position 2 and at the C-terminus as main anchors. At both main anchor positions hydrophobic or aliphatic residues are preferred or tolerated. Additional influences on binding capacity are contributed by residues at secondary positions, most prominently positions 1, 3, and 7, where both positive and deleterious influences can be noted [9].

To derive an objective criterion useful for identifying primary anchor positions, we reasoned that positions associated with the highest specificity would be associated with the lowest average affinity, as the majority of residues would not be tolerated. Accordingly, we first calculated the average relative binding affinity (ARB) for each position, representing the geometric mean of the values at each position (see Table 2), normalized to the average for the entire library. We have denominated this ratio as the specificity factor (SF). Analysis of the A*0201 data revealed that an SF value of ≥ 2.4 could demarcate only position 2 and the C-terminus as the main anchor positions (Table 3). To test the general applicability of the SF method for identifying primary anchors, we generated SF values for each of the 18 other alleles for which we have derived combinatorial matrices (Table 3). For 13 of the 18 additional alleles examined (72%), an SF > 2.4 identified all main anchor positions identified by pool sequencing or other motif analyses. Overall, 33/38 (87%) anchor positions identified previously were identified using this criterion. Notable exceptions were A*3001 and B*0801. Using a higher or lower threshold resulted in lower correspondence with previously described motifs.

Table 3 Specificity factors derived from positional scanning combinatorial library matrices identify primary anchor positions.

To define secondary anchor positions a different approach was utilized. We reasoned that positions associated with the highest standard deviation (SD) between residues would correspond to those most affected positively or negatively by secondary binding effects. Again analyzing the A*0201 data, it was found that an SD > 3 could successfully identify positions previously described as secondary anchors (Figure 2). Using a higher or lower threshold resulted in lower correspondence with previously described motifs. Applying this criterion to the other alleles, one or more dominant secondary anchor positions could be identified for most alleles (Figure 2). In the majority of these cases, the dominant secondary anchors were found to be in positions 1, 3 and 7. This pattern of secondary interactions is largely in agreement with a previous analysis [67].

Figure 2

Primary and secondary anchor positions for 19 HLA and H-2 class I alleles defined using positional scanning combinatorial peptide library matrices. Maps of primary and secondary anchor positions as defined using the combinatorial library data. Primary anchors (blue shading) were identified using specificity factors (SF), whereas secondary anchor positions (green shading) were determined on the basis of standard deviation (SD), as described in the text.

Validation of the positional scanning combinatorial library approach: identifying primary anchor preferences

We next examined how well the preference patterns at the identified main anchor positions agreed with those identified by pool sequencing or peptide library methods. Again starting with the well characterized A*0201 allele, we found that residues with relative affinities within about 10-fold of the optimal residue correspond to those previously described as preferred at the position 2 and C-terminal main anchors (Tables 2 and 4). More specifically, at position 2, using a 10-fold threshold identified L, M and Q as the most preferred residues. By contrast, the pool sequencing approach identified L and M. Similarly, at the C-terminus, V, I, L and A were identified as the preferred residues by the combinatorial libraries, compared to V and L from pool sequencing. In this respect, the preferences identified by the 10-fold criterion for combinatorial libraries are about mid-way between the more stringent motif defined by pool sequencing analyses, and the extended motif identified by peptide screening approaches [16].
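The fold-threshold rule for preferred residues is simple to state in code. In the sketch below, `preferred_residues` is a hypothetical helper and the relative affinity values are invented (they are not the Table 2 values); they were chosen only so that the 10-fold cutoff reproduces the L, M and Q pattern described above for position 2.

```python
def preferred_residues(position_values, fold=10.0):
    """Residues whose relative affinity is within `fold` of the best residue at a position."""
    best = max(position_values.values())
    keep = {aa: v for aa, v in position_values.items() if v * fold >= best}
    # Report the most preferred residue first.
    return sorted(keep, key=keep.get, reverse=True)


# Invented relative affinities for one anchor position (optimal residue = 1.0).
toy_position_2 = {"L": 1.0, "M": 0.6, "Q": 0.15, "I": 0.05, "V": 0.04, "A": 0.01}

print(preferred_residues(toy_position_2, fold=10.0))  # ['L', 'M', 'Q']
print(preferred_residues(toy_position_2, fold=5.0))   # ['L', 'M']
```

Tightening the threshold from 10-fold to 5-fold narrows the motif, which mirrors the relationship between the combinatorial library motifs and the more stringent pool sequencing motifs.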

Table 4 Comparison of main anchor motifs identified using positional scanning combinatorial libraries with those using other approaches.

The same criteria were then applied to the set of 18 additional alleles. Again, the patterns identified by the combinatorial libraries largely followed those previously described (Table 4). As was the case with A*0201, the 10-fold criterion applied to the combinatorial library data tended to identify a broader motif than identified by pool sequencing. However, when a more stringent threshold (e.g., 5-fold) is utilized, a narrower motif very similar to that described by pool sequencing is identified.

This analysis revealed several unexpected designations. The identification of position 3 as a main anchor for A*3001 binding, instead of position 2, is in disagreement with the published literature, but was not entirely unexpected based on analyses using single amino acid substitution peptides (Sidney and Sette, unpublished observations). The preference in position 2, identified as a dominant secondary anchor here, appears to be more towards small residues (V, T, and A) rather than aromatics, as indicated by pool sequencing, although these latter residues are still well tolerated. The preference at position 3 was found to be for basic residues. Pool sequencing had suggested a preference for hydrophobic residues at the C-terminus. While the combinatorial library generated motif is not in disagreement with this general specificity, the identification of an A3-supertype like preference for K was unexpected. However, subsequent MHC peptide binding studies by others (Harndahl and Buus et al, IEDB submission 1000945, [63]) and us (Sidney and Sette, unpublished observations) have confirmed this preference. Positions 1 and 2 were found to be dominant secondary anchors. This observation, in consideration with the discrepancy identified between the pool sequencing and positional scanning combinatorial libraries, suggests that A*3001 may be able to bind peptides using multiple different anchor arrangements.

In other cases, specifically B*1501, B*5802 and B*2705, no clear anchors were defined at either the N- or C-terminal end, where a more diffuse chemical specificity is apparent. A similar failure to identify dominant signals at more than one anchor residue has also been noted for several alleles when using pool sequencing methods (see e.g., [68]).

A detailed motif for A*3201 has not, to our knowledge, been previously made available. It has been suggested that this allele would be a member of the A1-supertype, and the motif identified herein is congruent with that association. However, peptide binding studies have not yet been able to confirm that this allele shares significant repertoire overlap with other alleles of this supertype.

Application of positional scanning combinatorial libraries: Predicting MHC binding peptides

The performance of combinatorial library matrices was further evaluated with 5 selected alleles (A*3001, A*3201, B*0801, B*1501 and B*1503). Each of these alleles is relatively common in the human population, but large sets of high affinity binding peptides are not, to our knowledge, currently available. Using these matrices, all 9-mer sequences from vaccinia Western Reserve (WR) strain were scored utilizing the product of the matrix value for the corresponding residue at each position. For each allele, the top 300 scoring 9-mer peptides, corresponding to approximately the top 0.5%, were synthesized and tested for binding. The binding data are summarized in Additional file 3 [see Additional file 3]. It was found that on average 68% of the peptides selected bound their corresponding allele with an affinity of 100 nM or better, with a minimum of 58% in the case of B*1501, and a maximum of 78% in the case of A*3001. By comparison, in the cases of A*3001, B*0801 and B*1501, where binding data were available for sets of peptides with poorer scores, it was found that peptides with scores equivalent to those in the lower 50% range were only rarely binders, with rates of binding in the 1 to 5% range (Figure 3).

Figure 3

Efficacy of positional scanning combinatorial library based predictions for 3 HLA class I alleles. The percent of peptides scoring within a specified percentile range that bind A*3001, A*3201 or B*1501. Peptides were scored using the corresponding combinatorial library matrix. Peptides were then assigned a percentile score indexed to the percentile associated with 9-mer peptides derived from vaccinia with the same matrix score. About 60,000 9-mers derived from the vaccinia WR sequence were scored to develop the indices.
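The percentile indexing mentioned in the legend can be approximated with a sorted reference list. This is a sketch under assumptions: the convention that the best-scoring peptides have the lowest percentile (so that the "top 0.5%" are those with percentile ≤ 0.5) is an interpretation, `percentile_index` is a hypothetical name, and the reference scores are synthetic stand-ins for the roughly 60,000 vaccinia 9-mer scores.

```python
from bisect import bisect_right


def percentile_index(reference_scores):
    """Build a lookup that maps a matrix score to its percentile among reference peptides."""
    ranked = sorted(reference_scores)

    def percentile(score):
        # Percent of reference peptides scoring strictly higher (lower = better).
        higher = len(ranked) - bisect_right(ranked, score)
        return 100.0 * higher / len(ranked)

    return percentile


# Synthetic reference scores: 1.0, 2.0, ..., 100.0.
reference = [float(i) for i in range(1, 101)]
to_percentile = percentile_index(reference)

print(to_percentile(99.5))  # 1.0  (only one reference peptide scores higher)
print(to_percentile(50.0))  # 50.0
```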

Taken together, these data further validate the use of combinatorial libraries as a basis for predictive algorithms. Also, the present analysis has provided sets of high affinity binders derived from vaccinia WR for 5 relatively common HLA class I alleles.

Application of positional scanning combinatorial libraries: Predicting T cell epitope candidates

We initially wanted to test the sets of high affinity peptides identified from vaccinia virus in DryVax immunized donors, similar to a previous investigation with donors carrying HLA alleles from common supertypes [69]. However, at the time of this study, we were unable to enroll a large enough number of newly vaccinated donors with the desired matching HLA alleles. We instead decided to validate the ability of combinatorial libraries to aid in the identification of T cell epitopes from influenza recognized in human donors for which we could enroll multiple donors with the HLA alleles A*3001, A*3201, and B*1501.

To make optimal use of both the combinatorial library data and the individual peptide binding data for these alleles, we utilized the SMM (stabilized matrix method) approach [56], which can combine these data to compute second generation matrices. These second generation matrices have been found to perform better than predictions based on either approach alone. All 9-mer peptides present in a representative set of Influenza A H1N1 and H3N2 strains were scored using the second generation matrices for A*3001, A*3201, and B*1501 as shown in Additional file 4 [see Additional file 4], and for each allele the top 100 scoring peptides were synthesized.

The predicted high-affinity-binding peptides were tested for their ability to elicit T cell responses from human donors with matching HLA. PBMCs from donors were isolated from leukapheresis or general blood donation volunteers, and HLA-typed by high-resolution PCR. In total, 13 healthy donors of 25–49 years of age were included in the study, including 3 donors for A*3001 and A*3201, and 8 for B*1501. Cryopreserved PBMCs were assayed with individual peptides from the set(s) corresponding to the donor's haplotype, and the reactivity was determined using IFNγ ELISPOT assays. Positive epitopes were defined as described in the Methods.

From these experiments, epitopes were successfully identified for each allele (Table 5). Specifically, 2, 1, and 13 epitopes were identified in donors typed as A*3001, A*3201 and B*1501, respectively. However, no peptide was recognized in more than one donor. To the best of our knowledge, each of these represents a novel epitope, and together they are the first set of influenza virus derived epitopes based on predictions for A*3001, A*3201 and B*1501.

Table 5 Influenza epitopes.


Because peptide binding to MHC is a requirement to elicit a T cell response, algorithm-based approaches predicting peptide binding are often utilized as a first screen to identify epitopes derived from large pathogens. In the present study, we have utilized 9-mer positional scanning combinatorial libraries to characterize the peptide binding specificities of several mouse and human class I alleles. When the corresponding positional scanning combinatorial library data were utilized to generate matrices for the prediction of binders derived from vaccinia, it was found that, in all cases examined, between 58 and 78% of the top 0.5% scoring peptides were high affinity binders, depending on the specific allele considered. The biological relevance of quantitative motifs derived from combinatorial library analyses was validated by identifying several epitopes derived from influenza A virus that were recognized by PBMCs from human donors. This study therefore provides a set of 19 uniformly generated matrices that can be directly applied to predict MHC peptide binding and T cell epitope candidates.

An implicit feature of the approach is that it provides a detailed quantitative motif for each MHC specificity examined. However, it is often useful to summarize MHC binding specificity in the simpler terms of primary anchor motifs. This "minimalist" approach dates back to the earliest studies of MHC binding, where specificities were defined using pool sequencing or single amino acid substitution analyses. These methods were very good at characterizing the most prominent features of allele specific motifs, and the resulting motifs have generally formed the syntax with which MHC binding is described. To extend the utility of the combinatorial approach, we have developed a heuristic approach to translate the matrix data generated by the combinatorial libraries into the simpler motifs that are the idiom of MHC studies. In the majority of cases, generalizable parameters could be defined that allowed the identification of main and secondary anchor positions congruent with those defined by other approaches.
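To illustrate what translating a quantitative matrix into a simple motif can look like, the following Python sketch flags selective positions and lists their tolerated residues. It is not the heuristic used in this study, whose parameters are defined in the Methods and are not reproduced here. The selectivity measure, both cutoffs, the three-position matrix and the function names are hypothetical placeholders. The sketch assumes normalized values in which the best residue at each position is 1.0 and all values are greater than zero.

```python
import math
from statistics import mean


def position_selectivity(column):
    """Negative mean log10 of the normalized values at one position.

    The result is 0 when every residue binds as well as the best one, and it
    grows as more residues bind far worse than the best one.
    """
    return -mean(math.log10(value) for value in column.values())


def summarize_motif(matrix, anchor_cutoff=1.0, preferred_cutoff=0.1):
    """Map each selective position (1-based) to its sorted preferred residues.

    Both cutoffs are illustrative placeholders, not values from the study.
    """
    motif = {}
    for position, column in enumerate(matrix, start=1):
        if position_selectivity(column) >= anchor_cutoff:
            motif[position] = sorted(
                residue for residue, value in column.items()
                if value >= preferred_cutoff
            )
    return motif


# Invented three-position, three-residue matrix used only to exercise the code.
example_matrix = [
    {"A": 1.0, "L": 1.0, "V": 1.0},      # unselective position
    {"A": 0.001, "L": 1.0, "V": 0.01},   # strongly selective, prefers L
    {"A": 0.001, "L": 0.5, "V": 1.0},    # selective, tolerates L and V
]

if __name__ == "__main__":
    print(summarize_motif(example_matrix))
```

Separating the selectivity measure from the residue cutoff keeps the two decisions independent: which positions count as anchors, and which residues are listed as preferred at those positions.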

For the majority of HLA class I molecules whose binding specificity has been described by crystal structure, pool sequencing or peptide binding studies, the main anchor interactions of the peptide almost invariably involve the residues at position 2 and the C-terminus of the peptide. This pattern also appears to be true for most macaque and chimpanzee class I alleles studied to date. As evidenced by the cases of A*3001 and B*0801, the combinatorial library analysis suggests that the paradigm of position 2/C-terminus anchor spacing for MHC peptide binding does not always hold. This has been reported previously in the case of B*0801 [70], where positions 3 and 5, in addition to the C-terminus, have been identified as primary anchors. Although this exact pattern was not duplicated by the combinatorial analysis, the present data do confirm the importance of positively charged residues in the middle of the peptide for conferring high affinity binding capacity. The ability to pick up unexpected binding patterns of MHC alleles is one of the key advantages of the combinatorial libraries, which make no prior assumptions about which positions are likely to be important for MHC:peptide interactions.

In our previous HLA supertype classification study [71], B*0801 was considered an outlier on the basis of its use of positions 3 and 5 as main anchor positions, which is somewhat unique among HLA molecules. This designation was also made by Lund et al. [72] and Hertz and Yanover [73]. Others [74, 75] have classified it with alleles we [71] and others [72, 73] have assigned as members of the B7-supertype. In the present study, the combinatorial library analysis suggested that positions 5 and 6 are important (in addition to the C-terminus) for peptide binding. As such, the present analysis does not provide sufficient evidence to suggest that B*0801 should be assigned to a specific supertype, as supertypes are largely defined by position 2 (and C-terminus) specificity. We can note that proline, the B7-supertype associated position 2 specificity, does appear to be well tolerated in position 2 by B*0801. Also, our own unpublished binding data suggest that there may be some cross-reactivity between B*0702 and B*0801. However, this potential cross-reactivity has not been examined in enough detail at this point to draw any conclusions.

Previously, B*3501, B*5101, B*5301, and B*5401 were assigned to the B7-supertype [71–73], which describes a set of HLA alleles sharing a preference for proline as the position 2 main anchor. In the present study, the combinatorial library analysis confirmed this preference, but, surprisingly, also indicated that alanine was well tolerated by these alleles in position 2. This would suggest that overlap may exist in the repertoires of at least some B7-supertype alleles with alleles outside the B7-supertype, and in particular those associated with the B58- and/or B62-supertypes. While the binding data we have to date suggest that the majority of instances of repertoire overlap for B7-supertype alleles will fall within the B7-supertype, and that proline is the most dominant preference in position 2, evidence for some cross-reactivity is also quite apparent. Indeed, a recent study [76] has found a high degree of cross-recognition of epitopes between alleles associated with different supertypes. Future studies will hopefully shed additional light on this issue.

In utilizing combinatorial libraries to characterize MHC specificity and identify binders, the approach we have implemented is computationally simple. We have largely utilized relative binding values for each residue/position coordinate. To predict binders, we have assumed the independent binding of peptide side chains, and represented the predicted binding propensity as a product of the values for each coordinate. There are other ways to process the raw data for the purpose of generating prediction matrices, or to define anchor positions. To facilitate further investigation of prediction approaches by the bioinformatic community, we have here provided both the raw and processed data for 15 different HLA, and 4 H-2, class I alleles. We believe that these data will be of value to the community for the prediction of binders and epitopes, at least for several alleles not previously characterized in detail.
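As a minimal sketch of the multiplicative scoring described above, the following Python example computes the predicted binding propensity of a peptide as the product of the relative binding values selected by its residues. It assumes a normalized matrix in which the best residue at each position is 1.0, as in Additional file 2. The matrix entries are invented for illustration and the function name is hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binding_propensity(peptide, matrix):
    """Product of per-position relative binding values (independent side chains)."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the number of matrix positions")
    return math.prod(matrix[i][residue] for i, residue in enumerate(peptide))


# Invented 9-position matrix: every residue scores 0.1 except two made-up optima.
matrix = [{aa: 0.1 for aa in AMINO_ACIDS} for _ in range(9)]
matrix[1]["L"] = 1.0  # hypothetical optimum at position 2
matrix[8]["V"] = 1.0  # hypothetical optimum at the C-terminus

peptides = ["AAAAAAAAA", "ALAAAAAAV", "AAAAAAAAV"]
ranked = sorted(peptides, key=lambda p: binding_propensity(p, matrix), reverse=True)

if __name__ == "__main__":
    for peptide in ranked:
        print(peptide, binding_propensity(peptide, matrix))
```

Because the logarithm is monotonic, ranking peptides by the sum of the logarithms of these values gives the same order as ranking by the product, and it avoids very small numbers when many positions are unfavourable.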

We compared the prediction performance of the combinatorial library with a set of 16 bioinformatic approaches for the best characterized human MHC allele, HLA A*0201. While several algorithms outperformed the combinatorial library, this has to be put into perspective, as these algorithms are based on up to ten times more training data. The combinatorial library nevertheless proved highly competitive, with a better prediction quality than 10 out of 16 algorithms. Taken together, the combinatorial libraries minimally provide a very solid baseline characterization of MHC binding specificity, which can be generated both quickly and cost effectively.

Using combinatorial library based matrices to identify sets of candidate peptides, epitopes were successfully identified in donors typed as A*3001, A*3201 and B*1501. Notably, however, no peptide was recognized in more than one donor. This diversity of responses is similar to what was noted previously for mapping T cell responses to vaccinia-derived peptides in human donors. Because of the small number of A*3001 and A*3201 donors tested, it is possible we have underestimated the number of positive responses. Other factors may also be responsible for the lower response rates observed. The donor pool utilized represents an outbred population, and almost all of the donors were heterozygous at both the A and B loci. Thus, the diverse donor responses may reflect the different influences of other MHC alleles in shaping the overall T cell repertoire. Similarly, the fact that dominant epitopes recognized in multiple donors were not identified may be because the donors have diverse histories of exposure to different viral strains.

The epitope identification aspect of the study was not pursued to the level and detail of our previous studies (e.g., [69]). Several factors are responsible for this, including the fact that while the alleles studied are not rare, neither are they prevalent, making it resource intensive to identify a sufficient number of additional donors. As a result, the identified peptides represent potential leads of a preliminary nature. At the same time, the data do help demonstrate that the matrices derived in the study are useful for epitope identification, even if the epitope identification study was not ideal. Furthermore, to the best of our knowledge, each of the epitopes identified in the present study represents a novel epitope, and together they are the first set of influenza virus derived epitopes based on predictions for A*3001, A*3201 and B*1501.


The present study has extended observations from previous studies [32, 44, 46, 48, 49] showing the usefulness of positional scanning combinatorial libraries for identifying MHC class I binding peptides. Herein we have made available combinatorial library based matrices for 19 class I alleles of human and mouse origin, including several that have not previously been characterized in detail. These libraries have also been shown to be useful for identifying specific primary and secondary anchor positions, and thereby simpler motifs, analogous to those described by other approaches. For A*3001, A*3201, B*0801, B*1501 and B*1503, sets of vaccinia WR derived peptides that bind with high affinity have been identified. These peptides represent candidates for future studies towards the identification of epitopes derived from vaccinia, a virus of high interest for the development of viral vector based vaccines, in addition to its well-known use as a vaccine against smallpox. Finally, we have also identified several epitopes derived from influenza that are recognized in HLA A*3001, A*3201 and B*1501 donors.


  1. Rammensee H, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S: SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 1999, 50:213–9.

  2. Gotch F, Rothbard J, Howland K, Townsend A, McMichael A: Cytotoxic T lymphocytes recognize a fragment of influenza virus matrix protein in association with HLA-A2. Nature 1987, 326:881–2.

  3. Falk K, Rotzschke O, Stevanovic S, Jung G, Rammensee HG: Allele-specific motifs revealed by sequencing of self-peptides eluted from MHC molecules. Nature 1991, 351:290–6.

  4. Pasquetto V, Bui HH, Giannino R, Mirza F, Sidney J, Oseroff C, Tscharke DC, Irvine K, Bennink JR, Peters B, et al.: HLA-A*0201, HLA-A*1101, and HLA-B*0702 Transgenic Mice Recognize Numerous Poxvirus Determinants from a Wide Variety of Viral Gene Products. J Immunol 2005, 175:5504–15.

  5. Allen TM, Sidney J, del Guercio MF, Glickman RL, Lensmeyer GL, Wiebe DA, DeMars R, Pauza CD, Johnson RP, Sette A, et al.: Characterization of the peptide binding motif of a rhesus MHC class I molecule (Mamu-A*01) that binds an immunodominant CTL epitope from simian immunodeficiency virus. J Immunol 1998, 160:6062–71.

  6. Loffredo JT, Sidney J, Piaskowski S, Szymanski A, Furlott J, Rudersdorf R, Reed J, Peters B, Hickman-Miller HD, Bardet W, et al.: The High Frequency Indian Rhesus Macaque MHC Class I Molecule, Mamu-B*01, Does Not Appear to Be Involved in CD8+ T Lymphocyte Responses to SIVmac239. J Immunol 2005, 175:5986–97.

  7. Loffredo JT, Sidney J, Wojewoda C, Dodds E, Reynolds MR, Napoe G, Mothe BR, O'Connor DH, Wilson NA, Watkins DI, et al.: Identification of seventeen new simian immunodeficiency virus-derived CD8+ T cell epitopes restricted by the high frequency molecule, Mamu-A*02, and potential escape from CTL recognition. J Immunol 2004, 173:5064–76.

  8. Mothe BR, Sidney J, Dzuris JL, Liebl ME, Fuenger S, Watkins DI, Sette A: Characterization of the peptide-binding specificity of Mamu-B*17 and identification of Mamu-B*17-restricted epitopes derived from simian immunodeficiency virus proteins. J Immunol 2002, 169:210–9.

  9. Ruppert J, Sidney J, Celis E, Kubo RT, Grey HM, Sette A: Prominent role of secondary anchor residues in peptide binding to HLA-A2.1 molecules. Cell 1993, 74:929–37.

  10. Sette A, Sidney J, Bui HH, Del Guercio MF, Alexander J, Loffredo J, Watkins DI, Mothe BR: Characterization of the peptide-binding specificity of Mamu-A*11 results in the identification of SIV-derived epitopes and interspecies cross-reactivity. Immunogenetics 2005, 57:53–68.

  11. Sidney J, del Guercio MF, Southwood S, Hermanson G, Maewal A, Appella E, Sette A: The HLA-A*0207 peptide binding repertoire is limited to a subset of the A*0201 repertoire. Hum Immunol 1997, 58:12–20.

  12. Sidney J, Del Guercio MF, Southwood S, Sette A: The HLA Molecules DQA1*0501/B1*0201 and DQA1*0301/B1*0302 Share an Extensive Overlap in Peptide Binding Specificity. J Immunol 2002, 169:5098–108.

  13. Sidney J, Dzuris JL, Newman MJ, Johnson RP, Kaur A, Amitinder K, Walker CM, Appella E, Mothe B, Watkins DI, et al.: Definition of the Mamu A*01 peptide binding specificity: application to the identification of wild-type and optimized ligands from simian immunodeficiency virus regulatory proteins. J Immunol 2000, 165:6387–99.

  14. Sidney J, Grey HM, Southwood S, Celis E, Wentworth PA, del Guercio MF, Kubo RT, Chesnut RW, Sette A: Definition of an HLA-A3-like supermotif demonstrates the overlapping peptide-binding repertoires of common HLA molecules. Hum Immunol 1996, 45:79–93.

  15. Sidney J, Southwood S, del Guercio MF, Grey HM, Chesnut RW, Kubo RT, Sette A: Specificity and degeneracy in peptide binding to HLA-B7-like class I molecules. J Immunol 1996, 157:3480–90.

  16. Sidney J, Southwood S, Mann DL, Fernandez-Vina MA, Newman MJ, Sette A: Majority of peptides binding HLA-A*0201 with high affinity crossreact with other A2-supertype molecules. Hum Immunol 2001, 62:1200–16.

  17. Sidney J, Southwood S, Pasquetto V, Sette A: Simultaneous prediction of binding capacity for multiple molecules of the HLA B44-supertype. J Immunol 2003, 171:5964–5974.

  18. Sidney J, Southwood S, Sette A: Classification of A1- and A24-supertype molecules by analysis of their MHC-peptide binding repertoires. Immunogenetics 2005, 57:393–408.

  19. Kubo RT, Sette A, Grey HM, Appella E, Sakaguchi K, Zhu NZ, Arnott D, Sherman N, Shabanowitz J, Michel H, et al.: Definition of specific peptide motifs for four major HLA-A alleles. J Immunol 1994, 152:3913–24.

  20. Donnes P, Elofsson A: Prediction of MHC class I binding peptides, using SVMHC. BMC Bioinformatics 2002, 3:25.

  21. Reche PA, Glutting JP, Zhang H, Reinherz EL: Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles. Immunogenetics 2004, 56:405–19.

  22. Schueler-Furman O, Altuvia Y, Sette A, Margalit H: Structure-based prediction of binding peptides to MHC class I molecules: application to a broad range of MHC alleles. Protein Sci 2000, 9:1838–46.

  23. Zhang GL, Srinivasan KN, Veeramani A, August JT, Brusic V: PREDBALB/c: a system for the prediction of peptide binding to H2d molecules, a haplotype of the BALB/c mouse. Nucleic Acids Res 2005, 33:W180–3.

  24. Hertz T, Yanover C: PepDist: a new framework for protein-peptide binding prediction based on learning peptide distance functions. BMC Bioinformatics 2006, 7(Suppl 1):S3.

  25. Buus S, Lauemoller SL, Worning P, Kesmir C, Frimurer T, Corbet S, Fomsgaard A, Hilden J, Holm A, Brunak S: Sensitive quantitative predictions of peptide-MHC binding by a 'Query by Committee' artificial neural network approach. Tissue Antigens 2003, 62:378–84.

  26. Nielsen M, Lundegaard C, Worning P, Hvid CS, Lamberth K, Buus S, Brunak S, Lund O: Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Bioinformatics 2004, 20:1388–97.

  27. Nielsen M, Lundegaard C, Worning P, Lauemoller SL, Lamberth K, Buus S, Brunak S, Lund O: Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci 2003, 12:1007–17.

  28. Zhang GL, Khan AM, Srinivasan KN, August JT, Brusic V: MULTIPRED: a computational system for prediction of promiscuous HLA binding peptides. Nucleic Acids Res 2005, 33:W172–9.

  29. Guan P, Doytchinova IA, Zygouri C, Flower DR: MHCPred: bringing a quantitative dimension to the online prediction of MHC binding. Appl Bioinformatics 2003, 2:63–6.

  30. Tenzer S, Peters B, Bulik S, Schoor O, Lemmel C, Schatz MM, Kloetzel PM, Rammensee HG, Schild H, Holzhutter HG: Modeling the MHC class I pathway by combining predictions of proteasomal cleavage, TAP transport and MHC class I binding. Cell Mol Life Sci 2005, 62:1025–37.

  31. Hakenberg J, Nussbaum AK, Schild H, Rammensee HG, Kuttler C, Holzhutter HG, Kloetzel PM, Kaufmann SH, Mollenkopf HJ: MAPPP: MHC class I antigenic peptide processing prediction. Appl Bioinformatics 2003, 2:155–8.

  32. Udaka K, Wiesmuller KH, Kienle S, Jung G, Tamamura H, Yamagishi H, Okumura K, Walden P, Suto T, Kawasaki T: An automated prediction of MHC class I-binding peptides based on positional scanning with peptide libraries. Immunogenetics 2000, 51:816–28.

  33. Sathiamurthy M, Hickman HD, Cavett JW, Zahoor A, Prilliman K, Metcalf S, Fernandez Vina M, Hildebrand WH: Population of the HLA ligand database. Tissue Antigens 2003, 61:12–9.

  34. Parker KC, Bednarek MA, Coligan JE: Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J Immunol 1994, 152:163–75.

  35. Bui HH, Sidney J, Peters B, Sathiamurthy M, Sinichi A, Purton KA, Mothe BR, Chisari FV, Watkins DI, Sette A: Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics 2005, 57:304–14.

  36. Gulukota K, Sidney J, Sette A, DeLisi C: Two complementary methods for predicting peptides binding major histocompatibility complex molecules. J Mol Biol 1997, 267:1258–67.

  37. Peters B, Bui HH, Sidney J, Weng Z, Loffredo JT, Watkins DI, Mothe BR, Sette A: A computational resource for the prediction of peptide binding to Indian rhesus macaque MHC class I molecules. Vaccine 2005, 23:5212–5224.

  38. Peters B, Tong W, Sidney J, Sette A, Weng Z: Examining the independent binding assumption for binding of peptide epitopes to MHC-I molecules. Bioinformatics 2003, 19:1765–1772.

  39. Pinilla C, Martin R, Gran B, Appel JR, Boggiano C, Wilson DB, Houghten RA: Exploring immunological specificity using synthetic peptide combinatorial libraries. Curr Opin Immunol 1999, 11:193–202.

  40. Nazif T, Bogyo M: Global analysis of proteasomal substrate specificity using positional-scanning libraries of covalent inhibitors. Proc Natl Acad Sci USA 2001, 98:2967–72.

  41. Uebel S, Kraas W, Kienle S, Wiesmuller KH, Jung G, Tampe R: Recognition principle of the TAP transporter disclosed by combinatorial peptide libraries. Proc Natl Acad Sci USA 1997, 94:8976–81.

  42. Nino-Vasquez JJ, Allicotti G, Borras E, Wilson DB, Valmori D, Simon R, Martin R, Pinilla C: A powerful combination: the use of positional scanning libraries and biometrical analysis to identify cross-reactive T cell epitopes. Mol Immunol 2004, 40:1063–74.

  43. Zhao Y, Gran B, Pinilla C, Markovic-Plese S, Hemmer B, Tzou A, Whitney LW, Biddison WE, Martin R, Simon R: Combinatorial peptide libraries and biometric score matrices permit the quantitative analysis of specific and degenerate interactions between clonotypic TCR and MHC peptide ligands. J Immunol 2001, 167:2130–41.

  44. Stryhn A, Pedersen LO, Romme T, Holm CB, Holm A, Buus S: Peptide binding specificity of major histocompatibility complex class I resolved into an array of apparently independent subspecificities: quantitation by peptide libraries and improved prediction of binding. Eur J Immunol 1996, 26:1911–8.

  45. Udaka K, Wiesmuller KH, Kienle S, Jung G, Walden P: Decrypting the structure of major histocompatibility complex class I-restricted cytotoxic T lymphocyte epitopes with complex peptide libraries. J Exp Med 1995, 181:2097–108.

  46. Lauemoller SL, Holm A, Hilden J, Brunak S, Holst Nissen M, Stryhn A, Ostergaard Pedersen L, Buus S: Quantitative predictions of peptide binding to MHC class I molecules using specificity matrices and anchor-stratified calibrations. Tissue Antigens 2001, 57:405–14.

  47. Peters B, Bui HH, Frankild S, Nielsen M, Lundegaard C, Kostem E, Basch D, Lamberth K, Harndahl M, Fleri W, et al.: A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Comput Biol 2006, 2:e65.

  48. Sidney J, Peters B, Moore C, Pencille TJ, Ngo S, Masterman KA, Asabe S, Pinilla C, Chisari FV, Sette A: Characterization of the peptide-binding specificity of the chimpanzee class I alleles A*0301 and A*0401 using a combinatorial peptide library. Immunogenetics 2007.

  49. Lauemoller SL, Kesmir C, Corbet SL, Fomsgaard A, Holm A, Claesson MH, Brunak S, Buus S: Identifying cytotoxic T cell epitopes from genomic and proteomic information: "The human MHC project.". Rev Immunogenet 2000, 2:477–91.

  50. Buus S: Description and prediction of peptide-MHC binding: the 'human MHC project'. Curr Opin Immunol 1999, 11:209–13.

  51. Pinilla C, Appel JR, Blanc P, Houghten RA: Rapid identification of high affinity peptide ligands using positional scanning synthetic peptide combinatorial libraries. Biotechniques 1992, 13:901–5.

  52. Sidney J, Southwood S, Oseroff C, Del Guercio MF, Sette A, Grey H: Measurement of MHC/Peptide Interactions by Gel Filtration. Current Protocols in Immunology John Wiley & Sons, Inc 1998, 18.3.1–18.3.19.

  53. Sette A, Sidney J, Livingston B, Dzuris J, Crimi C, Walker CM, Southwood S, Collins EJ, Hughes A: Class I molecules with similar peptide binding specificities are the result of both common ancestry and convergent evolution. Immunogenetics 2003, 54:830–841.

  54. van der Most RG, Concepcion RJ, Oseroff C, Alexander J, Southwood S, Sidney J, Chesnut RW, Ahmed R, Sette A: Uncovering subdominant cytotoxic T-lymphocyte responses in lymphocytic choriomeningitis virus-infected BALB/c mice. J Virol 1997, 71:5110–4.

  55. van der Most RG, Sette A, Oseroff C, Alexander J, Murali-Krishna K, Lau LL, Southwood S, Sidney J, Chesnut RW, Matloubian M, et al.: Analysis of cytotoxic T cell responses to dominant and subdominant epitopes during acute and chronic lymphocytic choriomeningitis virus infection. J Immunol 1996, 157:5543–54.

  56. Peters B, Sette A: Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinformatics 2005, 6:132.

  57. Kanof ME, Smith PD, Zola H: Current Protocols in Immunology Wiley, San Diego 1997, 2:7.1.1–7.1.5.

  58. Rothbard JB, Lechler RI, Howland K, Bal V, Eckels DD, Sekaly R, Long EO, Taylor WR, Lamb JR: Structural model of HLA-DR1 restricted T cell antigen recognition. Cell 1988, 52:515–23.

  59. Currier JR, Kuta EG, Turk E, Earhart LB, Loomis-Price L, Janetzki S, Ferrari G, Birx DL, Cox JH: A panel of MHC class I restricted viral peptides for use as a quality control for vaccine trial ELISPOT assays. J Immunol Methods 2002, 260:157–72.

  60. Tangri S, Ishioka GY, Huang X, Sidney J, Southwood S, Fikes J, Sette A: Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide. J Exp Med 2001, 194:833–46.

  61. Kast WM, Brandt RM, Sidney J, Drijfhout JW, Kubo RT, Grey HM, Melief CJ, Sette A: Role of HLA-A motifs in identification of potential CTL epitopes in human papillomavirus type 16 E6 and E7 proteins. J Immunol 1994, 152:3904–12.

  62. Parker KC, Bednarek MA, Hull LK, Utz U, Cunningham B, Zweerink HJ, Biddison WE, Coligan JE: Sequence motifs important for peptide binding to the human MHC class I molecule, HLA-A2. J Immunol 1992, 149:3580–7.

  63. The Immune Epitope Database and Analysis Resource []

  64. Peters B, Sette A: Integrating epitope data into the emerging web of biomedical knowledge resources. Nat Rev Immunol 2007, 7:485–90.

  65. Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, Lund O, et al.: The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol 2005, 3:e91.

  66. The Immune Epitope Database MHC Class I Binding Prediction Resource []

  67. Kondo A, Sidney J, Southwood S, del Guercio MF, Appella E, Sakamoto H, Celis E, Grey HM, Chesnut RW, Kubo RT, et al.: Prominent roles of secondary anchor residues in peptide binding to HLA-A24 human class I molecules. J Immunol 1995, 155:4307–12.

  68. Barber LD, Gillece-Castro B, Percival L, Li X, Clayberger C, Parham P: Overlap in the repertoires of peptides bound in vivo by a group of related class I HLA-B allotypes. Curr Biol 1995, 5:179–90.

  69. Oseroff C, Kos F, Bui HH, Peters B, Pasquetto V, Glenn J, Palmore T, Sidney J, Tscharke DC, Bennink JR, et al.: HLA class I-restricted responses to vaccinia recognize a broad array of proteins mainly involved in virulence and viral gene regulation. Proc Natl Acad Sci USA 2005, 102:13980–5.

  70. DiBrino M, Parker KC, Shiloach J, Turner RV, Tsuchida T, Garfield M, Biddison WE, Coligan JE: Endogenous peptides with distinct amino acid anchor residue motifs bind to HLA-A1 and HLA-B8. J Immunol 1994, 152:620–31.

  71. Sette A, Sidney J: Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism. Immunogenetics 1999, 50:201–12.

  72. Lund O, Nielsen M, Kesmir C, Petersen AG, Lundegaard C, Worning P, Sylvester-Hvid C, Lamberth K, Roder G, Justesen S, et al.: Definition of supertypes for HLA molecules using clustering of specificity matrices. Immunogenetics 2004, 55:797–810.

  73. Hertz T, Yanover C: Identifying HLA supertypes by learning distance functions. Bioinformatics 2007, 23:e148–55.

  74. Tong JC, Tan TW, Ranganathan S: In silico grouping of peptide/HLA class I complexes using structural interaction characteristics. Bioinformatics 2007, 23:177–83.

  75. Doytchinova IA, Guan P, Flower DR: Identifiying human MHC supertypes using bioinformatic methods. J Immunol 2004, 172:4314–23.

  76. Frahm N, Yusim K, Suscovich TJ, Adams S, Sidney J, Hraber P, Hewitt HS, Linde CH, Kavanagh DG, Woodberry T, et al.: Extensive HLA class I allele promiscuity among viral CTL epitopes. Eur J Immunol 2007, 37:2419–33.



The experiments described herein comply with the current laws of the United States of America. This work is supported by NIH NIAID contracts N01-AI-40023 (AS), N01-AI-40024 (AS), HHSN266200400006C (AS), and HHSN266200400080C/N01 AI40080 (CP). EA was supported by the Wenner-Gren Foundations.

Author information


Corresponding author

Correspondence to Bjoern Peters.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

JS performed the MHC binding data analyses and drafted the manuscript. EA performed all cellular immunology assays and associated data analyses, and assisted preparing the manuscript. AS and BP participated in the conceptualization and design of the study, provided interpretation of the data, and assisted in the preparation of the manuscript. CM and SN performed all MHC binding assays and MHC purification. CP participated in conceptualization of the study, designed and provided the positional scanning combinatorial libraries, and helped to draft the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: Binding capacity of positional scanning combinatorial libraries for 19 HLA and H-2 class I alleles. This table lists the measured IC50 nM value of each mixture in the combinatorial library for 19 class I alleles. Binding assays were performed as described in the Methods section. (XLS 96 KB)


Additional file 2: Normalized average relative binding affinities of positional scanning combinatorial libraries for 19 HLA and H-2 class I alleles. This table lists the normalized ARB values of each mixture in the positional scanning combinatorial library for 19 class I alleles. Binding data for each mixture were normalized as described in the Methods, such that the optimal mixture (residue) at each position corresponds to 1.0, and residues associated with lower affinities have correspondingly lower values. The normalized values for A*0201 are also shown in matrix format in Table 2. (XLS 108 KB)


Additional file 3: Vaccinia WR-derived peptides that bind 5 common HLA alleles with high affinity. The vaccinia WR-derived peptides identified that bind A) A*3001, B) A*3201, C) B*0801, D) B*1501 or E) B*1503 with affinities of 500 nM or better are shown. Candidate peptides were selected using the corresponding normalized combinatorial library matrix values [see Additional file 2], and tested for binding as described in the Methods. For each allele, a set of 300 candidate peptides was originally selected. (XLS 133 KB)


Additional file 4: Second generation matrices for predicting binders to HLA A*3001, A*3201 and B*1501. Second generation prediction matrices were derived combining the positional scanning combinatorial library and peptide library data, using the stabilized matrix method (SMM) approach, as previously described [56]. Scores for individual peptides represent log predicted IC50 nM values, and are calculated as the sum of the corresponding position/residue values, plus the constant. Algorithms derived by combining positional scanning and individual peptide data sets were generated. (XLS 30 KB)

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Sidney, J., Assarsson, E., Moore, C. et al. Quantitative peptide binding motifs for 19 human and mouse MHC class I molecules derived using positional scanning combinatorial peptide libraries. Immunome Res 4, 2 (2008).


  • Major Histocompatibility Complex
  • Human Leukocyte Antigen
  • Major Histocompatibility Complex Class
  • Major Histocompatibility Complex Molecule
  • Transporter Associate With Antigen Processing