1 Introduction
Deep convolutional neural networks (DCNNs) have exhibited exceptional performance in image classification Krizhevsky et al. (2012); He et al. (2016); Huang et al. (2017), so they have been widely used in various real-world applications including face recognition Sun et al. (2015), self-driving cars Bojarski et al. (2016), and biomedical image processing Bakas et al. (2018), among many others Najafabadi et al. (2015). Despite these successes, DCNN classifiers can be easily attacked by adversarial examples with perturbations imperceptible to human vision Szegedy et al. (2013); Goodfellow et al. (2014); Su et al. (2019). This has motivated active research on adversarial attacks and defenses for DCNNs; see Wiyatno et al. (2019); Ren et al. (2020) for reviews.

Existing adversarial attacks can be categorized into white-box, gray-box, and black-box attacks. Adversaries in white-box attacks have full information about the targeted DCNN model, whereas their knowledge is limited to the model structure in gray-box attacks and only to the model's input and output in black-box attacks. Popular algorithms for white-box attacks include the fast gradient sign method Goodfellow et al. (2014); Kurakin et al. (2016), the projected gradient descent method Madry et al. (2017), and the Carlini and Wagner attack Carlini and Wagner (2017), among many others Szegedy et al. (2013); Papernot et al. (2016); Moosavi-Dezfooli et al. (2016). Defensive techniques against these attacks include heuristic and certified defenses. Adversarial training, which simply incorporates adversarial samples into training, is currently the most successful heuristic defense for improving the robustness of DCNNs and shows better numerical performance than certified defenses Ren et al. (2020).

In this paper, we propose a simple yet efficient framework for white-box adversarial image generation and training for DCNN classifiers. For generating an adversarial example of a given image, our framework provides user-customized options for the number of perturbed pixels, the misclassification probability, and the targeted incorrect class. To the best of our knowledge, this is the first approach offering all three of these desirable options. The freedom to specify the number of perturbed pixels allows users to conduct attacks at various pixel levels, such as one-pixel Su et al. (2019) and all-pixel Moosavi-Dezfooli et al. (2017) attacks. In particular, we adopt a recent perturbation-manifold based first-order influence (FI) measure Shu and Zhu (2019) to efficiently locate the most vulnerable pixels and thereby increase the attack success rate. In contrast to traditional Euclidean-space based measures such as the Jacobian norm Novak et al. (2018) and Cook's local influence measure Cook (1986), the FI measure captures the intrinsic change of the perturbed objective function Zhu et al. (2007, 2011) and shows better performance in detecting vulnerable images and pixels. In addition, our framework allows users to specify the misclassification probability and/or the targeted incorrect class. A prespecified misclassification probability is rarely available in existing approaches, which produce an adversarial example either near the model's decision boundary Moosavi-Dezfooli et al. (2016); Nazemi and Fieguth (2019) or with unguaranteed high confidence Nguyen et al. (2015). We tailor different loss functions to the three desirable options and their combinations, and apply particle swarm optimization (PSO) Kennedy and Eberhart (1995), a fast gradient-free method, to obtain the optimal perturbation. Moreover, we observe that our perturbations with high misclassification probability can exhibit a certain adversarial universality Moosavi-Dezfooli et al. (2017) across images from different classes. For adversarial training, we further utilize the FI measure to identify vulnerable training images, and their pixels, that are prone to optional targeted classes. Applying our customized generation approach to these images then yields an adversarial dataset for training. Experiments show that our adversarial training significantly improves the robustness of pretrained DCNN classifiers. Figure 1 illustrates the flowchart of our framework.

We note that two recent papers Zhang et al. (2019); Mosli et al. (2019) also applied PSO to craft adversarial images. However, there are intrinsic distinctions. First, both papers focus on black-box attacks, whereas ours is white-box. Zhang et al. (2019) only studied all-pixel attacks; Mosli et al. (2019) considered few-pixel attacks but searched in random chunks to locate the vulnerable pixels, whereas we use the FI measure to discover those pixels directly. Moreover, targeted attacks are not considered in Mosli et al. (2019), and neither paper can prespecify a misclassification probability for the generated adversarial example. Our framework can design arbitrary-pixel-level, confidence-specified, and/or targeted/non-targeted attacks.
Our contributions are summarized as follows:

We propose a novel white-box framework for adversarial image generation and training for DCNN classifiers. It provides users with multiple options in pixel levels, confidence levels, and targeted classes for adversarial attacks and defenses.

We adopt a manifold-based FI measure to efficiently identify vulnerable images and pixels for adversarial perturbations.

We design different loss functions adaptive to user-customized specifications and apply PSO, a fast gradient-free optimization method, to obtain optimal perturbations.

We demonstrate the effectiveness of our framework via experiments on benchmark datasets and observe that our high-confidence perturbations may have a certain adversarial universality.
2 Method
2.1 Perturbation-Manifold Based Influence Measure
Given an input image $x$ and a DCNN classifier with parameters $\theta$, the prediction probability for class $c$ is denoted by $P(c \mid x, \theta)$. Let $\omega = (\omega_1, \ldots, \omega_q)^{\top}$ be a perturbation vector in an open set $\Omega \subset \mathbb{R}^q$, which can be imposed on any subvector of $x$. Let the prediction probability under perturbation be $P(c \mid x, \omega, \theta)$, with $\omega_0 \in \Omega$ denoting the baseline point of no perturbation, i.e., $P(c \mid x, \omega_0, \theta) = P(c \mid x, \theta)$.

For sensitivity analysis of DCNNs, Shu and Zhu (2019) recently proposed an FI measure to delineate the 'intrinsic' perturbed change of an objective function on the Riemannian manifold of perturbations Zhu et al. (2007, 2011). In contrast to traditional Euclidean-space based measures such as the Jacobian norm Novak et al. (2018) and Cook's local influence measure Cook (1986), this perturbation-manifold based measure enjoys a desirable invariance property under diffeomorphic (e.g., scaling) reparameterizations of perturbations and has better numerical performance in detecting vulnerable images and pixels.
Let $f(\omega)$ be an objective function of interest, for example, the cross-entropy $f(\omega) = -\log P(c \mid x, \omega, \theta)$. The FI measure at $\omega_0$ is defined by

$\mathrm{FI}_f(\omega_0) = \nabla f(\omega_0)^{\top} \, G(\omega_0)^{+} \, \nabla f(\omega_0),$  (1)

where $\nabla f(\omega) = \partial f(\omega)/\partial \omega$, $G(\omega) = \mathbb{E}_{c \mid x, \omega}\big[\partial_\omega \ell(\omega) \, \partial_\omega \ell(\omega)^{\top}\big]$ with $\ell(\omega) = \log P(c \mid x, \omega, \theta)$, and $G(\omega_0)^{+}$ is the Moore-Penrose pseudoinverse of $G(\omega_0)$. A larger value of $\mathrm{FI}_f(\omega_0)$ indicates that the DCNN model is more sensitive in $f$ to local perturbation around $\omega_0$. We shall use the FI measure to discover vulnerable images and pixels for adversarial attacks.
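As a concrete illustration, the FI measure in (1) has a closed form for a toy softmax model whose inputs are perturbed directly. The sketch below is ours, not the paper's code; the weight matrix, dimensions, and the choice of cross-entropy objective are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def first_order_influence(W, x, y_true, omega):
    """FI of the cross-entropy at perturbation omega for the toy model
    p(y | x, omega) = softmax(W @ (x + omega)).

    FI = grad_f^T G^+ grad_f, where f(omega) = -log p(y_true | x, omega)
    and G is the Fisher information of the perturbed prediction distribution.
    """
    p = softmax(W @ (x + omega))
    K = len(p)
    # d log p(y) / d omega = W^T (e_y - p) for each class y
    grads = np.stack([W.T @ (np.eye(K)[y] - p) for y in range(K)])
    # Metric tensor G = sum_y p(y) grad_y grad_y^T (Fisher information)
    G = sum(p[y] * np.outer(grads[y], grads[y]) for y in range(K))
    grad_f = -grads[y_true]              # gradient of the cross-entropy
    return float(grad_f @ np.linalg.pinv(G) @ grad_f)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))              # 3 classes, 5 "pixels" (toy sizes)
x = rng.normal(size=5)
fi = first_order_influence(W, x, y_true=0, omega=np.zeros(5))
print(fi)                                # a nonnegative sensitivity score
```

Because $G^{+}$ is positive semi-definite, the quadratic form is always nonnegative; ranking pixels or images by this score is how the FI measure is used below.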
2.2 Particle Swarm Optimization
Since its introduction by Kennedy and Eberhart (1995), the PSO algorithm has been successfully used to solve complex optimization problems in various fields of engineering and science Poli (2008); Eberhart and Shi (2001); Zhang et al. (2015). Let $g(\cdot)$ be an objective function, which will be specified in Section 2.3 for adversarial scenarios. The PSO algorithm searches via a population (called a swarm) of candidate solutions (called particles) over iterations to optimize the objective function $g$. Specifically, let
$p_i^t = \arg\min_{x_i^s,\; 0 \le s \le t} g(x_i^s),$  (2)

$p_g^t = \arg\min_{1 \le i \le S} g(p_i^t),$  (3)

where $x_i^t$ is the position of particle $i$ in a $D$-dimensional space at iteration $t$, $S$ is the total number of particles, and $t$ is the current iteration. The position $x_i^{t+1}$ of particle $i$ at iteration $t+1$ is updated with a velocity $v_i^{t+1}$ by

$v_i^{t+1} = w v_i^t + c_1 r_1 (p_i^t - x_i^t) + c_2 r_2 (p_g^t - x_i^t), \qquad x_i^{t+1} = x_i^t + v_i^{t+1},$  (4)

where $w$ is the inertia weight, $c_1$ and $c_2$ are acceleration coefficients, and $r_1$ and $r_2$ are uniformly distributed random variables in the range $[0, 1]$. Following Xu et al. (2019), we fix the inertia weight $w$ and the acceleration coefficients $c_1$ and $c_2$. We can see that the movement of each particle is guided by its individual best known position $p_i^t$ and the entire swarm's best known position $p_g^t$. We shall use the PSO algorithm to obtain desirable adversarial perturbations under various user requirements.

2.3 Adversarial Image Generation
Given an image $x$, we combine FI and PSO to generate its adversarial image with user-customized options for the number of pixels to perturb, the misclassification probability, and the targeted class to which the image is misclassified, denoted by $n_p$, $p_m$, and $c_m$, respectively.
Denote the image by $x = (x_1, \ldots, x_n)^{\top}$. For an RGB image of $m$ pixels, we view the three channel components of a pixel as three separate pixels, so $n = 3m$ here. We let the default value of $n_p$ be $n$.
We first locate vulnerable pixels in $x$ for perturbation when $n_p$ is specified but the targeted pixels are not given by the user. We compute the FI measure in (1) for each pixel based on the objective function

$f(\omega) = -\log P(\hat{c} \mid x, \omega, \theta),$  (5)

where $\hat{c} = \arg\max_c P(c \mid x, \theta)$ is the predicted class of the unperturbed image. Denote by $x_{(j)}$ the pixel with the $j$th largest FI value. We use $x_{(1)}, \ldots, x_{(n_p)}$ as the pixels for the adversarial attack and let the perturbation be $\omega = (\omega_1, \ldots, \omega_{n_p})^{\top}$.
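The pixel-selection step reduces to ranking the per-pixel FI values and keeping the top $n_p$. A minimal sketch (function and variable names are ours, not the paper's):

```python
import numpy as np

def top_vulnerable_pixels(fi_values, n_p):
    """Indices of the n_p pixels with the largest FI values,
    ordered from most to least vulnerable."""
    return np.argsort(fi_values)[::-1][:n_p]

fi_values = np.array([0.1, 2.3, 0.7, 1.5, 0.2])
print(top_vulnerable_pixels(fi_values, 3))  # pixels 1, 3, 2
```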
We then apply the PSO algorithm in (2)-(4) to obtain an optimal value of $\omega$ that minimizes the adversarial objective function

$g(\omega) = \alpha L_{\mathrm{mis}}(\omega) + \beta \lVert \omega \rVert_2,$

where we assume $\lVert \omega \rVert_\infty \le \epsilon$, which constrains the range of the perturbation to guarantee the visual quality of the generated adversarial image compared to the original, $L_{\mathrm{mis}}(\omega)$ is a misclassification loss function, $\lVert \omega \rVert_2$ represents the magnitude of the perturbation, and $\alpha$ and $\beta$ are prespecified weights. To ensure the misleading nature of the generated adversarial sample, $\alpha$ is set large relative to $\beta$ to prioritize $L_{\mathrm{mis}}(\omega)$ over $\lVert \omega \rVert_2$.
We use different misclassification loss functions $L_{\mathrm{mis}}$ to meet different user-customized requirements on $(n_p, p_m, c_m)$. If only $n_p$ is known, inspired by Meng (2018); Meng and Chen (2017), we let the misclassification loss function be

$L_{\mathrm{mis}}(\omega) = P(c_{(1)} \mid x, \omega, \theta) - P(c_{(2)} \mid x, \omega, \theta),$

where $c_{(j)}$ is the label with the $j$th largest prediction probability from the trained DCNN for the input image added with perturbation $\omega$. Since a perturbation that carries the image across the decision boundary results in the minimum of $L_{\mathrm{mis}}$, this loss function encourages PSO to yield a valid perturbation. If the perturbed image is prespecified with a misclassification probability $p_m$, we use the misclassification loss function

$L_{\mathrm{mis}}(\omega) = \lvert P(c_{(1)} \mid x, \omega, \theta) - p_m \rvert.$

Later in our experiments, we show that a high $p_m$ is helpful for generating universal adversarial perturbations applicable to images from the other classes. If a targeted class $c_m$ is given, we choose the misclassification loss function

$L_{\mathrm{mis}}(\omega) = -P(c_m \mid x, \omega, \theta).$

Furthermore, if both $p_m$ and $c_m$ are provided, we use

$L_{\mathrm{mis}}(\omega) = \lvert P(c_m \mid x, \omega, \theta) - p_m \rvert,$

or, equivalently, its squared version $(P(c_m \mid x, \omega, \theta) - p_m)^2$.
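To make the optimization concrete, the sketch below runs a minimal PSO attack on a toy softmax classifier standing in for the trained DCNN. The weights, dimensions, loss weights, box bound, and the prespecified target probability are all illustrative assumptions; the misclassification loss used is the targeted, confidence-specified variant described above.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy white-box "classifier" standing in for the trained DCNN.
W = rng.normal(size=(3, 8))
x = rng.normal(size=8)

def adversarial_loss(delta, target, p_target, alpha=10.0, beta=1.0):
    """alpha * misclassification loss + beta * perturbation magnitude,
    using the targeted, confidence-specified loss variant."""
    p = softmax(W @ (x + delta))
    return alpha * abs(p[target] - p_target) + beta * np.linalg.norm(delta)

def pso_minimize(loss, dim, eps, n_particles=40, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5):
    """Minimal gradient-free PSO; positions are clipped to [-eps, eps]."""
    pos = rng.uniform(-eps, eps, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([loss(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -eps, eps)
        vals = np.array([loss(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest

y_pred = int(np.argmax(softmax(W @ x)))
target = (y_pred + 1) % 3                      # some incorrect class
delta = pso_minimize(lambda d: adversarial_loss(d, target, p_target=0.9),
                     dim=8, eps=2.0)
print(softmax(W @ (x + delta))[target])        # should approach 0.9
```

The clipping step plays the role of the $\lVert \omega \rVert_\infty \le \epsilon$ constraint, and no gradient of the classifier is ever required, which is what makes PSO attractive here.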
2.4 Adversarial Training
We aim to create a set of adversarial images for a given trained DCNN model and then fine-tune the model on the training data augmented with this adversarial dataset. To include as many adversarial images as possible, we do not specify a value for $p_m$ in Algorithm 1. Note that Algorithm 1 may not have a feasible solution when given restrictive parameters, such as a small $n_p$ or a small $\epsilon$. To efficiently generate a batch of adversarial images, we first select a set of potentially vulnerable images via some modifications to Algorithm 1.

Specifically, given an image dataset, thresholds on the image-level FI and the prediction probability, and targeted incorrect labels $c_m$ (if not given, the label with the second largest prediction probability), we first find $V$, the set of all correctly classified images whose image-level FI (computed with the perturbation imposed on all pixels) exceeds its threshold and whose prediction probability falls below its threshold. For each image in $V$, we generate its adversarial image by Algorithm 1, in which $n_p$ is the number of pixels with FI above a given threshold and $c_m$ is specified as above. The generated adversarial images form an adversarial dataset. The whole procedure of our adversarial training is illustrated in Figure 1 and detailed in Algorithm 2.
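The vulnerable-image selection step can be sketched as a simple filter over model outputs and image-level FI scores. All names and thresholds below are illustrative assumptions, not the paper's Algorithm 2:

```python
import numpy as np

def select_vulnerable(probs, labels, fi, fi_min, p_max):
    """Indices of correctly classified images whose image-level FI exceeds
    fi_min and whose top prediction probability is below p_max, together
    with a targeted incorrect label (the second most probable class).

    probs: (N, K) prediction probabilities; labels: (N,) true classes;
    fi: (N,) image-level FI values.
    """
    pred = probs.argmax(axis=1)
    top_p = probs.max(axis=1)
    keep = (pred == labels) & (fi > fi_min) & (top_p < p_max)
    idx = np.where(keep)[0]
    targets = np.argsort(probs[idx], axis=1)[:, -2]  # second-largest class
    return idx, targets

probs = np.array([[0.6, 0.3, 0.1],    # correct, low confidence, high FI
                  [0.9, 0.05, 0.05],  # correct but low FI
                  [0.5, 0.1, 0.4]])   # misclassified (true class is 2)
labels = np.array([0, 0, 2])
fi = np.array([3.0, 0.2, 4.0])
idx, tgt = select_vulnerable(probs, labels, fi, fi_min=1.0, p_max=0.8)
print(idx, tgt)  # only the first image survives; its target is class 1
```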
3 Experiments
We conduct experiments on two benchmark datasets, MNIST and CIFAR-10, using the ResNet-32 model He et al. (2016). Data augmentation is used, including random horizontal and vertical shifts of up to 12.5% of the image height and width for both datasets, and additionally random horizontal flips for the CIFAR-10 data. Table 1 shows the prediction accuracy of our trained ResNet-32 models on the two datasets.
              MNIST                                  CIFAR-10
Model         Training (n=60k)   Testing (n=10k)     Training (n=50k)   Testing (n=10k)
Original      99.76%             99.25%              98.82%             91.28%
Adversarial   99.68%             99.32%              99.10%             91.32%
3.1 Customized Adversarial Image Generation
We consider two images with easy visual detection and large image-level FI from MNIST and CIFAR-10, shown in Figures 2 and 3 together with prediction-probability graphs and pixel-level FI maps. The probability bar graphs suggest candidate misclassification classes that can be used as the targeted class $c_m$. The FI maps indicate the vulnerability of each pixel to local perturbation and are useful for locating pixels to attack.
We first evaluate the performance of Algorithm 1 (cf. Figure 1 (b)-(e)) in generating adversarial examples of the two images according to different requirements on $n_p$, $p_m$, and $c_m$. Figures 4 and 5 show the generated adversarial images with the corresponding perturbation maps. Perturbations 1-3 consider three different settings of $n_p$, with no specifications of $p_m$ and $c_m$. For Perturbations 4-6, we only specify $p_m$ at three different levels, assign no value to $c_m$, and tune $n_p$ (the number of pixels with FI above a chosen threshold) and $\epsilon$ to obtain feasible solutions from PSO. Perturbations 7-9 are prespecified with three targeted classes $c_m$ for MNIST and three for CIFAR-10, respectively, with $n_p$ being the number of pixels with FI above a chosen threshold and no value for $p_m$. The detailed parameter settings for Algorithm 1 are provided in the Supplementary Material. We can see that the generated adversarial images have visually negligible differences from the originals and satisfy the prespecified requirements.
We also investigate the adversarial universality of Perturbation 6 shown in Figures 4 and 5, which has a 99% prediction probability for Class 4. Table 2 shows the proportions of originally correctly classified images that are misclassified after the perturbations are added. For the MNIST dataset, the error rates are at least 14.3% for all classes and up to 100% for some, with a total rate above 87.5% in both the training and testing sets. In particular, a remarkably large proportion of each class is misclassified to Class 4, with total rates of 62.2% and 64.5% for the training and testing sets, respectively. Perturbation 6 for CIFAR-10 also exhibits a certain extent of adversarial universality, with non-targeted total error rates of 3.92% and 6.19% and Class-4-targeted total rates of 0.92% and 1.32% for the training and testing sets, respectively. Figure 6 displays images from the other nine classes that are originally correctly classified with high probability but are misclassified (most with high probability) to Class 4 after Perturbation 6 is added. These results indicate that our method may generate a universal adversarial perturbation, which in particular has the potential to misclassify images from different classes to the same specific class. The existence of universal adversarial perturbations may be attributed to the geometric correlations of decision boundaries between classes Moosavi-Dezfooli et al. (2017). An adversarial perturbation with very high confidence may carry salient features of its resulting class and thus may have strong power to drag other images across the decision boundary.
True Class                         0      1      2      3      5      6      7      8      9      Total
MNIST Training    Misclassified    92.9   100    99.3   91.1   96.0   92.3   100    15.3   99.9   87.7
                  Misclass. to 4   81.1   36.9   55.8   84.3   75.5   49.6   95.3   14.2   68.8   62.2
MNIST Testing     Misclassified    91.7   100    99.4   94.4   97.9   91.1   100    14.3   100    87.9
                  Misclass. to 4   79.7   37.3   58.8   88.3   78.5   57.4   94.8   13.7   75.0   64.5
CIFAR-10 Training Misclassified    8.46   2.74   2.90   7.27   5.77   0.42   4.82   2.07   1.00   3.92
                  Misclass. to 4   1.27   0.04   0.89   1.65   1.45   0.12   2.56   0.30   0.06   0.92
CIFAR-10 Testing  Misclassified    11.2   5.62   6.27   12.36  8.29   0.94   6.81   3.05   2.61   6.19
                  Misclass. to 4   2.17   0.10   1.48   2.52   2.34   0.31   2.81   0.32   0.21   1.32
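The universality rates reported in Table 2 can be computed by applying one fixed perturbation to every correctly classified image and counting label changes. The sketch below uses a toy linear classifier as a stand-in for the trained DCNN; all names and data are illustrative assumptions.

```python
import numpy as np

def universality_rates(predict, images, labels, delta, target):
    """Among originally correctly classified images, the fractions that
    (a) change class and (b) are misclassified to `target` after the
    same fixed perturbation delta is added to every image."""
    correct = predict(images) == labels
    adv = predict(images[correct] + delta)
    misclassified = adv != labels[correct]
    to_target = misclassified & (adv == target)
    return misclassified.mean(), to_target.mean()

# Toy linear "classifier" as a stand-in for the trained DCNN.
rng = np.random.default_rng(2)
W = rng.normal(size=(3, 4))
predict = lambda X: np.argmax(X @ W.T, axis=1)
X = rng.normal(size=(200, 4))
y = predict(X)                  # use the model's own labels as ground truth
delta = rng.normal(size=4)      # one fixed perturbation for all images
err_rate, target_rate = universality_rates(predict, X, y, delta, target=1)
print(err_rate, target_rate)
```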
3.2 Adversarial Training
We consider using Algorithm 2 to generate adversarial datasets for adversarial training. Figure 7 shows Manhattan plots of the image-level FIs for correctly classified images, and Figure 8 presents heatmaps of the confusion matrices. We can see that the distributions of image-level FIs and the patterns of misclassification are very close between the training and testing datasets for both MNIST and CIFAR-10. Hence, our adversarial training is expected to be useful against unseen adversarial examples generated from similar mechanisms at testing time.
Based on the two figures, for selecting vulnerable images (cf. Figure 1(a)), we set the image-level FI threshold, let $c_m$ be the most frequently misclassified class for each image's true class, and choose the prediction-probability threshold in Algorithm 2 accordingly. The resulting image set is likely to lie near the decision boundaries of the trained classifier. We then set the remaining parameters of the algorithm. We generate adversarial datasets Adv1 (n=136) and Adv2 (n=26) from the training and testing sets of MNIST, respectively, and Adv3 (n=244) and Adv4 (n=146) from those of CIFAR-10. Adv1 and Adv3 are used for adversarial training (cf. Figure 1(f)), whereas Adv2 and Adv4 test the adversarially trained models. The detailed parameter settings for Algorithm 2 to generate these datasets are given in the Supplementary Material.
The adversarially trained ResNet-32 models are obtained by fine-tuning the original trained models on the training data augmented with Adv1 and Adv3, respectively, for an additional 30 epochs for MNIST and 50 epochs for CIFAR-10. The results of adversarial training are reported in Tables 1 and 3. Since the adversarial datasets are much smaller than the original testing datasets and the original trained models already have high accuracy, the results in Table 1 are only slightly improved on the test datasets. However, as Table 3 shows, the adversarial training on Adv1 and Adv3 indeed benefits the defense of the fine-tuned ResNet-32 models against adversarial attacks. The accuracy is dramatically improved from 0.00% to 83.82% and 88.93% on Adv1 and Adv3, respectively, and reaches 76.92% and 63.01% on the test-data-derived Adv2 and Adv4, respectively. We also observe increases of 0.27% and 0.94%, respectively, in accuracy on the combined data of the original test set and its adversarial samples for MNIST and CIFAR-10. These results indicate that our approach can significantly improve the adversarial defense of DCNN classifiers.

              MNIST                                         CIFAR-10
Model         Adv1      Tr.+Adv1     Adv2     Ts.+Adv2      Adv3      Tr.+Adv3     Adv4      Ts.+Adv4
              (n=136)   (n=60k+136)  (n=26)   (n=10k+26)    (n=244)   (n=50k+244)  (n=146)   (n=10k+146)
Original      0.00%     99.53%       0.00%    98.99%        0.00%     98.34%       0.00%     89.97%
Adversarial   83.82%    99.64%       76.92%   99.26%        88.93%    99.05%       63.01%    90.91%
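The fine-tuning step itself is ordinary training on the augmented dataset. The sketch below shows the idea on a toy linear softmax model in place of ResNet-32; the data, learning rate, epoch count, and the way adversarial samples are formed are all illustrative assumptions.

```python
import numpy as np

def xent(W, X, y):
    """Mean cross-entropy of a linear softmax model."""
    logits = X @ W.T
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    return -np.log(P[np.arange(len(y)), y] + 1e-12).mean()

def finetune(W, X, y, X_adv, y_adv, lr=0.1, epochs=30):
    """Continue training on the original data augmented with adversarial
    samples (full-batch gradient descent on the cross-entropy)."""
    Xa = np.vstack([X, X_adv])
    ya = np.concatenate([y, y_adv])
    onehot = np.eye(W.shape[0])[ya]
    for _ in range(epochs):
        logits = Xa @ W.T
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        W = W - lr * (P - onehot).T @ Xa / len(ya)  # softmax gradient step
    return W

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
true_W = rng.normal(size=(3, 5))
y = np.argmax(X @ true_W.T, axis=1)
W = rng.normal(size=(3, 5)) * 0.01               # toy "pretrained" weights
X_adv = X[:50] + 0.1 * rng.normal(size=(50, 5))  # stand-in adversarial images
y_adv = y[:50]                                   # keep the true labels

before = xent(W, X_adv, y_adv)
W = finetune(W, X, y, X_adv, y_adv)
after = xent(W, X_adv, y_adv)
print(after < before)    # loss on the adversarial set should drop
```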
4 Conclusion
This paper introduced an FI- and PSO-based framework for adversarial image generation and training for DCNN classifiers that accounts for the user-specified number of perturbed pixels, misclassification probability, and/or targeted incorrect class. We used the perturbation-manifold based FI measure to efficiently detect vulnerable images and pixels and thereby increase the attack success rate. We designed different misclassification loss functions to meet various user specifications and obtained the optimal perturbation with the fast, gradient-free PSO algorithm. Experiments showed the good performance of our approach in generating customized adversarial samples and in the associated adversarial training of DCNNs.
Broader Impact
DCNN models for image classification are widely used in various real-world applications, such as self-driving cars and face recognition for identification, but they can be vulnerable to adversarial attacks with small perturbations to the original images, resulting in safety and security concerns in the above-mentioned applications. Our proposed white-box framework for adversarial image generation and training for DCNN classifiers may help developers test and fortify their DCNN-based products to improve reliability in real-world applications.
References

[1] Bakas et al. (2018) Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629.
[2] Bojarski et al. (2016) End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.
[3] Carlini and Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39-57.
[4] Cook (1986) Assessment of local influence. Journal of the Royal Statistical Society: Series B (Methodological) 48(2), pp. 133-155.
[5] Eberhart and Shi (2001) Particle swarm optimization: developments, applications and resources. In Proceedings of the 2001 Congress on Evolutionary Computation, Vol. 1, pp. 81-86.
[6] Goodfellow et al. (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
[7] He et al. (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.
[8] Huang et al. (2017) Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700-4708.
[9] Kennedy and Eberhart (1995) Particle swarm optimization. In Proceedings of ICNN'95 - International Conference on Neural Networks, Vol. 4, pp. 1942-1948.
[10] Krizhevsky et al. (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097-1105.
[11] Kurakin et al. (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
[12] Madry et al. (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
[13] Meng and Chen (2017) MagNet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 135-147.
[14] Meng (2018) Generating deep learning adversarial examples in black-box scenario. Electronic Design Engineering 26(24), pp. 164-173.
[15] Moosavi-Dezfooli et al. (2017) Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1765-1773.
[16] Moosavi-Dezfooli et al. (2016) DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574-2582.
[17] Mosli et al. (2019) They might not be giants: crafting black-box adversarial examples with fewer queries using particle swarm optimization. arXiv preprint arXiv:1909.07490.
[18] Najafabadi et al. (2015) Deep learning applications and challenges in big data analytics. Journal of Big Data 2(1), pp. 1.
[19] Nazemi and Fieguth (2019) Potential adversarial samples for white-box attacks. arXiv preprint arXiv:1912.06409.
[20] Nguyen et al. (2015) Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 427-436.
[21] Novak et al. (2018) Sensitivity and generalization in neural networks: an empirical study. In International Conference on Learning Representations. arXiv preprint arXiv:1802.08760.
[22] Papernot et al. (2016) The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372-387.
[23] Poli (2008) Analysis of the publications on the applications of particle swarm optimisation. Journal of Artificial Evolution and Applications 2008.
[24] Ren et al. (2020) Adversarial attacks and defenses in deep learning. Engineering.
[25] Shu and Zhu (2019) Sensitivity analysis of deep neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 4943-4950.
[26] Su et al. (2019) One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation 23(5), pp. 828-841.
[27] Sun et al. (2015) DeepID3: face recognition with very deep neural networks. arXiv preprint arXiv:1502.00873.
[28] Szegedy et al. (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
[29] Wiyatno et al. (2019) Adversarial examples in modern machine learning: a review. arXiv preprint arXiv:1911.05268.
[30] Xu et al. (2019) Particle swarm optimization based on dimensional learning strategy. Swarm and Evolutionary Computation 45, pp. 33-51.
[31] Zhang et al. (2019) Attacking black-box image classifiers with particle swarm optimization. IEEE Access 7, pp. 158051-158063.
[32] Zhang et al. (2015) A comprehensive survey on particle swarm optimization algorithm and its applications. Mathematical Problems in Engineering 2015.
[33] Zhu et al. (2007) Perturbation selection and influence measures in local influence analysis. The Annals of Statistics 35(6), pp. 2565-2588.
[34] Zhu et al. (2011) Bayesian influence analysis: a geometric approach. Biometrika 98(2), pp. 307-323.