Wolfram Language

Fooling Neural Networks

Optical illusions can fool humans. Similarly, one can construct illusions for image classification networks.

Load a pre-trained image classification network from the Wolfram Neural Net Repository.
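The text does not name the model it loads. As a minimal sketch, assuming the Inception V1 ImageNet classifier (any image classifier from the repository should behave similarly), loading could look like this:

```wolfram
(* Assumption: this model name is illustrative; the text does not say which network was used *)
(* The first call downloads the model, so it needs an internet connection *)
net = NetModel["Inception V1 Trained on ImageNet Competition Data"]
```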

Choose two images that the network can classify with certainty.
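One way to check how certain a classifier is about an image is to ask its class decoder for the most probable classes. The helper name below is hypothetical, and the sketch assumes the network has a class decoder on its output, as repository image classifiers typically do:

```wolfram
(* Hypothetical helper: the n most probable classes with their probabilities *)
topClasses[net_, img_, n_ : 3] := net[img, {"TopProbabilities", n}]

(* Usage, assuming net is a loaded classifier and tiger is an Image: *)
(* topClasses[net, tiger] *)
```

A top probability close to 1 indicates that the network classifies the image with near certainty.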

To "cockroach the tiger", build a new neural net by prepending a ConstantArrayLayer that contains the image of the tiger.
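The key ingredient is a ConstantArrayLayer whose array holds the encoded image. The sketch below shows only that step, not the full chained network. The test image and the 224x224 encoder are assumptions standing in for the tiger photograph and for the encoder of the pretrained network:

```wolfram
(* Stand-in for the tiger photograph, which is not reproduced in the text *)
img = ExampleData[{"TestImage", "Mandrill"}];

(* Assumption: a plain 224x224 RGB encoder; a pretrained net carries its own *)
(* encoder, which can be obtained with NetExtract[net, "Input"] *)
encoder = NetEncoder[{"Image", {224, 224}}];

(* The encoder turns the image into a channels x height x width numeric array *)
array = encoder[img];

(* The layer takes no input and always outputs this array; *)
(* the array is a learnable parameter of the layer *)
layer = ConstantArrayLayer["Array" -> array]
```

Because the array is a learnable parameter, training can modify the stored image itself rather than the weights of the classifier.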

One can retrain just the ConstantArrayLayer in this new network by using the LearningRateMultipliers option to freeze all other layers, with "cockroach" as the training target. This forces the network to always classify the image as a cockroach.
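The freezing mechanism can be illustrated on a toy network. This is not the tiger network: the layers, data, and round count below are made up for illustration. In the setting described above, the layer with multiplier 1 would be the prepended ConstantArrayLayer.

```wolfram
(* Toy two-layer network, initialized so that the starting weights are known *)
toy = NetInitialize[NetChain[{LinearLayer[4], LinearLayer[1]}, "Input" -> 2]];

(* Synthetic regression data, purely for illustration *)
data = Table[With[{x = RandomReal[1, 2]}, x -> {Total[x]}], 64];

(* Train layer 1 only: multiplier 1 for layer 1, multiplier 0 for all other layers *)
trained = NetTrain[toy, data,
   LearningRateMultipliers -> {1 -> 1, _ -> 0},
   MaxTrainingRounds -> 20];

(* Layer 2 is frozen, so its weights should be unchanged after training *)
Normal[NetExtract[trained, {2, "Weights"}]] == Normal[NetExtract[toy, {2, "Weights"}]]
```

A multiplier of 0 excludes a layer from updates, so only the layer with a nonzero multiplier adapts to the training target.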

After 256 training rounds, the resulting image is extracted from the ConstantArrayLayer.
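Extracting the image amounts to reading the array out of the layer and interpreting it as channel-first image data. The helper name below is hypothetical. The sketch assumes an encoder without mean-image normalization; if the encoder of the pretrained network subtracts a mean image, that step has to be undone before the result looks like the original photograph.

```wolfram
(* Hypothetical helper: turn a channels x height x width array stored in a *)
(* ConstantArrayLayer back into an Image *)
arrayLayerToImage[layer_] :=
  Image[Normal[NetExtract[layer, "Array"]], Interleaving -> False]

(* Usage with a self-made layer; for a trained chain, *)
(* NetExtract[trainedNet, 1] would supply the first layer instead *)
encoder = NetEncoder[{"Image", {224, 224}}];
layer = ConstantArrayLayer["Array" -> encoder[ExampleData[{"TestImage", "Mandrill"}]]];
arrayLayerToImage[layer]
```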

To humans, the image appears unchanged, but the network sees it differently. It now classifies the tiger as a cockroach with 83% probability.

The network has been fooled by small changes with high spatial frequency.
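Small changes of this kind can be made visible by amplifying the difference between the original and the modified image. The helper name is hypothetical, and the noisy image in the usage lines is only a stand-in for the retrained tiger image, which is not reproduced here:

```wolfram
(* Hypothetical helper: contrast-stretched absolute difference between two images *)
showPerturbation[original_, modified_] :=
  ImageAdjust[
   ImageDifference[original, ImageResize[modified, ImageDimensions[original]]]]

(* Usage with a stand-in: add faint Gaussian noise to a test image *)
img = ExampleData[{"TestImage", "Mandrill"}];
noisy = ImageEffect[img, {"GaussianNoise", 0.02}];
showPerturbation[img, noisy]
```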

Note that this optical illusion is network-specific. The neural network in ImageIdentify is not fooled.
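A cross-check with ImageIdentify could be sketched as follows, with a test image standing in for the modified tiger image:

```wolfram
(* Stand-in image; in the setting above this would be the image taken from the ConstantArrayLayer *)
img = ExampleData[{"TestImage", "Mandrill"}];

(* The three most likely identifications across all categories *)
ImageIdentify[img, All, 3]
```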

Furthermore, a blur quickly eliminates this optical illusion.
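The effect of a blur can be checked by classifying the blurred image. The helper name and the default radius of 2 pixels are assumptions, and the network is assumed to have a class decoder on its output:

```wolfram
(* Hypothetical helper: classify an image after blurring it with pixel radius r *)
classifyBlurred[net_, img_, r_ : 2] := net[Blur[img, r], {"TopProbabilities", 3}]

(* Usage, assuming net is a loaded classifier and fooled is the modified image: *)
(* classifyBlurred[net, fooled] *)
```

Blurring acts as a low-pass filter, so it suppresses the high-spatial-frequency changes that the misclassification depends on.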
