An integrated approach to epitope analysis I: Dimensional reduction, visualization and prediction of MHC binding using amino acid principal components and regression approaches

Table 3 Coefficient of variation of the mean estimate of LN(ic50) for different alleles of human MHC-II, using two different cross-validation schemes.

Allele	Training, 9 × 67% (1)	Random 1000, 9 × 67% (2)	Training, 9 × 50% (3)
DRB1_0101	10.4%	14.4%	17.8%
DRB1_0301	6.2%	6.2%	7.4%
DRB1_0401	9.5%	9.5%	6.6%
DRB1_0404	7.3%	22.0%	9.4%
DRB1_0405	7.9%	7.3%	9.3%
DRB1_0701	4.8%	10.0%	12.4%
DRB1_0802	7.6%	7.0%	8.5%
DRB1_0901	12.6%	9.4%	12.9%
DRB1_1101	8.3%	7.6%	10.2%
DRB1_1302	6.7%	6.6%	8.5%
DRB1_1501	10.5%	8.3%	10.4%
DRB3_0101	4.4%	4.5%	5.4%
DRB4_0101	8.6%	6.9%	9.8%
DRB5_0101	12.5%	8.9%	13.8%
Average	8.4%	9.2%	10.2%

The training dataset was the IEDB dataset (Additional File 4; Table S3b). The random dataset consisted of 1000 15-mers drawn from the surfome and secretome of the Staphylococcus aureus COL proteome (GenBank NC_002951). (1) A random two-thirds of the training set was selected 9 times to produce 9 sets of prediction equations. Each peptide was used 6 times in combination with other peptides of the training set. (2) The equations from (1) were used to predict the LN(ic50) of the random peptides. (3) As in (1), but with half of the training set used to develop each set of equations.
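The resampling schemes in the footnote can be illustrated with a minimal sketch. It rests on several assumptions that the table does not confirm. Ordinary least squares on synthetic "principal-component" features stands in for the paper's regression on amino acid principal components. The 9 training subsets are drawn independently, so each peptide is included about 6 times on average (9 × 2/3) rather than exactly 6. The coefficient of variation is read as SD/|mean| of the 9 predictions for each peptide, averaged over peptides. The helper names (`fit_ols`, `subsample_models`, `mean_cv`) and all data are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted LN(ic50) for each row (peptide) of X."""
    return coef[0] + X @ coef[1:]

def subsample_models(X, y, n_models=9, frac=2/3, seed=0):
    """Fit n_models equations, each on an independent random `frac` of the peptides."""
    rng = np.random.default_rng(seed)
    n_sub = int(round(frac * len(y)))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(y), size=n_sub, replace=False)  # sample without replacement
        models.append(fit_ols(X[idx], y[idx]))
    return models

def mean_cv(models, X_eval):
    """Per-peptide SD/|mean| of the predictions across models, averaged over peptides."""
    preds = np.array([predict(m, X_eval) for m in models])  # shape (n_models, n_peptides)
    mean = preds.mean(axis=0)
    sd = preds.std(axis=0, ddof=1)
    return float(np.mean(sd / np.abs(mean)))  # abs() guards against negative means

# Hypothetical synthetic stand-ins: 5 principal-component scores per peptide.
rng = np.random.default_rng(42)
beta = np.array([0.5, -0.3, 0.2, 0.1, 0.4])
X_train = rng.normal(size=(300, 5))
y_train = 6.0 + X_train @ beta + rng.normal(scale=0.5, size=300)
X_random = rng.normal(size=(1000, 5))  # stands in for the 1000 random 15-mers

models_67 = subsample_models(X_train, y_train, frac=2/3)  # scheme (1)
models_50 = subsample_models(X_train, y_train, frac=1/2)  # scheme (3)
print(f"(1) training, 9 x 67%: {mean_cv(models_67, X_train):.2%}")
print(f"(2) random,   9 x 67%: {mean_cv(models_67, X_random):.2%}")
print(f"(3) training, 9 x 50%: {mean_cv(models_50, X_train):.2%}")
```

Scheme (2) reuses the equations from scheme (1) and only changes the peptides being predicted, which is why the sketch fits `models_67` once and evaluates it on both sets. The printed percentages come from synthetic data and are not comparable to the values in Table 3.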