Adversarial Glossary

FGSM (Fast Gradient Sign Method)

A single-step white-box attack that computes the gradient of the loss function with respect to the input image and shifts each pixel by epsilon in the direction of the gradient's sign.
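
A minimal PyTorch sketch of the update, x_adv = x + ε · sign(∇ₓ L); the helper name and the choice of cross-entropy loss are illustrative, not a reference implementation:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step FGSM sketch: shift pixels by eps in the gradient's sign direction."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)  # assumes model returns raw logits
    loss.backward()
    # Move up the loss gradient, then keep pixels in the valid [0, 1] range
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```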

PGD (Projected Gradient Descent)

A powerful iterative white-box attack. It applies FGSM-style steps repeatedly with a small step size (Alpha), projecting the result back into the epsilon-ball around the original image after each step so the total perturbation never exceeds the budget.
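
A hedged sketch of the loop, assuming inputs in [0, 1] and an L_inf budget; the clamp on the difference from the original image is the projection step:

```python
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps, alpha, steps):
    """PGD sketch: repeated signed-gradient steps, each projected into the eps-ball."""
    x = x.clone().detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                 # stay in valid pixel range
    return x_adv.detach()
```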

MI-FGSM (Momentum Iterative FGSM)

An extension of iterative FGSM that incorporates a momentum term into the gradient updates. The accumulated gradient helps the attack escape poor local maxima and produces highly transferable adversarial examples that can fool black-box models.
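
The core change versus plain iterative FGSM is the accumulated, normalized gradient; a sketch where the decay factor mu and the L1-style normalization follow the original formulation, and the other details are illustrative:

```python
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y, eps, alpha, steps, mu=1.0):
    """MI-FGSM sketch: momentum-accumulated gradients drive the sign step."""
    x = x.clone().detach()
    x_adv = x.clone()
    g = torch.zeros_like(x)  # momentum buffer
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Normalize the raw gradient, then fold it into the running momentum
        g = mu * g + grad / (grad.abs().sum() + 1e-12)
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # L_inf projection
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```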

DeepFool

An untargeted attack that iteratively approximates the minimal perturbation needed to push an image across the nearest decision boundary, causing misclassification.
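
A condensed sketch of the multiclass update; this is a simplified take on the published algorithm (batch size 1, raw logits, per-step overshoot), not the reference implementation:

```python
import torch

def deepfool(model, x, num_classes, overshoot=0.02, max_iter=50):
    """DeepFool sketch: step toward the nearest linearized decision boundary."""
    x_adv = x.clone().detach()
    orig = model(x_adv).argmax(dim=1).item()
    for _ in range(max_iter):
        x_adv.requires_grad_(True)
        logits = model(x_adv)[0]
        if logits.argmax().item() != orig:
            break  # crossed a boundary: done
        grad_orig = torch.autograd.grad(logits[orig], x_adv, retain_graph=True)[0]
        best = None
        for k in range(num_classes):
            if k == orig:
                continue
            grad_k = torch.autograd.grad(logits[k], x_adv, retain_graph=True)[0]
            w = grad_k - grad_orig                              # linearized boundary normal
            dist = (logits[k] - logits[orig]).abs() / (w.norm() + 1e-8)
            if best is None or dist < best[0]:
                best = (dist, w)
        # Minimal step across the closest boundary, slightly overshot
        r = (best[0] * best[1] / (best[1].norm() + 1e-8)).detach()
        x_adv = (x_adv.detach() + (1 + overshoot) * r).clamp(0, 1)
    return x_adv.detach()
```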

C&W (Carlini & Wagner)

A strong optimization-based white-box attack originally designed to defeat defensive distillation. It minimizes a custom margin-based objective function to find the smallest possible perturbation that still changes the model's prediction.
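
The margin objective at its core is compact; a sketch of the f6 formulation from the paper (logits Z, target class t, confidence kappa), noting that the full attack also minimizes the L2 size of the perturbation via a change of variables:

```python
import torch

def cw_margin(logits, target, kappa=0.0):
    """C&W margin sketch: f(x') = max(max_{i != t} Z_i - Z_t, -kappa)."""
    one_hot = torch.nn.functional.one_hot(target, logits.size(1)).bool()
    target_logit = logits[one_hot]  # Z_t, one entry per batch row
    other_max = logits.masked_fill(one_hot, float('-inf')).max(dim=1).values
    return torch.clamp(other_max - target_logit, min=-kappa)
```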

AutoAttack

A parameter-free ensemble of four diverse attacks (two APGD variants, FAB, and Square Attack) that provides a reliable evaluation of a model's true adversarial robustness.
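
Usage is deliberately simple because there are no attack hyperparameters to tune; a sketch assuming the reference autoattack package, a logits-returning model, and inputs in [0, 1] (model, x_test, and y_test are placeholders):

```python
from autoattack import AutoAttack  # reference implementation (Croce & Hein)

# model, x_test, y_test are assumed to already exist
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)
```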

Square Attack

A query-efficient, score-based black-box attack that uses randomized search with localized square-shaped updates to decrease the model's confidence in the correct class.
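
One candidate update looks roughly like this (a loose NumPy sketch; the real attack keeps a candidate only if the model's loss improves, and shrinks the square over time):

```python
import numpy as np

def square_candidate(x_orig, x_curr, eps, p=0.05, rng=np.random.default_rng()):
    """Square Attack sketch: overwrite one random square with a +/-eps corner."""
    h, w, c = x_orig.shape
    s = max(1, int(round(np.sqrt(p * h * w))))       # square side from coverage p
    r = int(rng.integers(0, h - s + 1))
    col = int(rng.integers(0, w - s + 1))
    x_new = x_curr.copy()
    signs = rng.choice([-1.0, 1.0], size=(1, 1, c))  # one random sign per channel
    x_new[r:r+s, col:col+s, :] = np.clip(
        x_orig[r:r+s, col:col+s, :] + eps * signs, 0.0, 1.0)
    return x_new
```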

Epsilon (ε)

The maximum allowed perturbation budget. A higher epsilon allows larger, more visible changes to the image, making the attack stronger but more detectable.

Alpha (α) / Step Size

The step size used in each iteration of iterative attacks like PGD. Usually, it is set smaller than Epsilon so the attack can take multiple steps to find a precise adversarial example.

L_inf (L-infinity Norm)

A distance metric where the distance is the maximum absolute change in any single pixel. Constraining an attack with L_inf ensures no single pixel changes by more than the Epsilon budget.

L2 Norm

A distance metric based on Euclidean distance (the square root of the sum of squared pixel differences). L2-constrained attacks tend to spread the perturbation smoothly across the entire image.
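
Both norms from the two entries above are one-liners to measure on a perturbation delta = x_adv - x; a small self-contained sketch with placeholder tensors:

```python
import torch

x = torch.rand(1, 3, 32, 32)                          # placeholder clean image
x_adv = (x + 0.01 * torch.randn_like(x)).clamp(0, 1)  # placeholder adversarial image
delta = x_adv - x
linf = delta.abs().max()                # largest change to any single pixel
l2 = delta.flatten(1).norm(p=2, dim=1)  # Euclidean length, per image
print(f"L_inf = {linf.item():.4f}, L2 = {l2.item():.4f}")
```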

Perturbation

The carefully calculated, often invisible noise added to an image to cause an AI model to misclassify it. In adversarial machine learning, this noise is mathematically optimized, not random.

Saliency Map

A visual representation of which pixels in the image had the most impact on the model's final decision. It helps explain what the neural network was "looking at" when it made a mistake.
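
A vanilla gradient saliency map takes only a few lines in PyTorch; this is one common recipe among many, and the max over color channels is a stylistic choice:

```python
import torch
import torch.nn.functional as F

def saliency_map(model, x, y):
    """Vanilla saliency sketch: |d loss / d pixel|, reduced over channels."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return x.grad.detach().abs().max(dim=1).values  # one heat value per pixel
```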

White-box Attack

An attack where the adversary has full access to the target model's architecture, weights, and gradients. This allows for precise, gradient-based optimization.

Black-box Attack

An attack where the adversary only has access to the model's outputs (e.g., probability scores or labels) but not its internal gradients or weights.

Ensemble Attack

An attack method that combines multiple different individual attacks to test the robustness of a model comprehensively.

Targeted Attack

An attack that aims to cause the model to output a specific, attacker-chosen target class instead of just causing any misclassification.
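
Relative to an untargeted attack, the change is just the sign of the step; a targeted FGSM-style sketch that descends toward the chosen class rather than ascending away from the true one:

```python
import torch
import torch.nn.functional as F

def targeted_fgsm(model, x, target, eps):
    """Targeted sketch: minimize the loss toward the attacker-chosen class."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target)  # target = desired output class
    loss.backward()
    # Note the minus sign: step *down* the loss for the target class
    return (x - eps * x.grad.sign()).clamp(0, 1).detach()
```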

SSIM (Structural Similarity Index Measure)

A perception-based metric that measures the visual similarity between the original and adversarial images. A value of 1 (often reported as 100%) means they are structurally identical, while lower values indicate visible distortions.
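
scikit-image ships an implementation; a sketch with placeholder arrays (the channel_axis and data_range arguments assume a recent skimage version):

```python
import numpy as np
from skimage.metrics import structural_similarity

img = np.random.rand(32, 32, 3)  # placeholder clean image in [0, 1]
img_adv = np.clip(img + 0.01 * np.random.randn(32, 32, 3), 0, 1)
score = structural_similarity(img, img_adv, channel_axis=2, data_range=1.0)
print(f"SSIM = {score:.4f}")  # 1.0 = structurally identical
```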

PSNR (Peak Signal-to-Noise Ratio)

A mathematical metric used to quantify the distortion added by the perturbation. A higher PSNR (measured in decibels or dB) indicates a higher quality, stealthier attack with less perceptible noise.
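
The formula is short enough to compute directly, PSNR = 10 · log10(MAX² / MSE); a sketch for images scaled to [0, 1]:

```python
import numpy as np

def psnr(clean, adv, max_val=1.0):
    """PSNR sketch in dB: higher means less visible distortion."""
    mse = np.mean((clean - adv) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)  # diverges as mse -> 0
```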

Iterations / Steps

The total number of optimization steps the attack algorithm takes to generate the adversarial perturbation. More iterations generally yield a stronger attack with a more precise perturbation, but require more computation time.

Latency

The wall-clock time (in milliseconds) required to compute the adversarial example. Optimization-based attacks like C&W have much higher latency than fast gradient methods like FGSM.