05 July 2015

Google's DeepDream

From the description on Vimeo:
A visualization of what's happening inside the mind of an artificial neural network.
By recognizing forms in these images, your mind is already reflecting what's going on in the software, projecting its own bias onto what it sees. You think you are seeing things, perhaps puppies, slugs, birds, reptiles etc. If you look carefully, that's not what's in there. But those are the closest things your mind can match to what it's seeing. Your mind is struggling to put together images based on what you know. And that's exactly what's happening in the software. And you've been training your mind for years, probably decades. These neural networks are usually trained for a few hours, days or weeks.

In non-technical speak:
An artificial neural network can be thought of as analogous to a brain (immensely, immensely, immensely simplified. nothing like a brain really). It consists of layers of neurons and connections between neurons. Information is stored in this network as 'weights' (strengths) of connections between neurons. Low layers (i.e. closer to the input, e.g. 'eyes') store (and recognise) low level abstract features (corners, edges, orientations etc.) and higher layers store (and recognise) higher level features. This is analogous to how information is stored in the mammalian cerebral cortex (e.g. our brain).
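To make 'layers' and 'weights' a bit more concrete, here is a toy sketch in plain Python. It is only an illustration: the layer sizes are arbitrary, the random weights stand in for trained ones, and the names `make_layer` and `forward` are made up for this example.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """One row of connection strengths ('weights') per neuron in the layer."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def forward(layers, x):
    """Feed input x through each layer in turn; return every layer's activations."""
    activations = []
    for weights in layers:
        # Each neuron sums its weighted inputs and squashes the result with tanh.
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]
        activations.append(x)
    return activations

rng = random.Random(0)                        # fixed seed so runs are repeatable
layers = [make_layer(4, 3, rng),              # low layer: 4 inputs -> 3 neurons
          make_layer(3, 2, rng)]              # higher layer: 3 -> 2 neurons
acts = forward(layers, [0.5, -0.2, 0.1, 0.9])
print([len(a) for a in acts])                 # number of neurons in each layer
```

Everything the network 'knows' lives in the numbers inside `layers`; training means adjusting those numbers.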

Here a neural network has been 'trained' on millions of images - i.e. the images have been fed into the network, and the network has 'learnt' about them (by establishing the weights / strengths of the connections between its neurons). (NB. The images fed into the network come from a specific database known as ImageNet http://image-net.org/explore )
Then when the network is fed a new unknown image (e.g. me), it tries to make sense of (i.e. recognise) this new image in the context of what it already knows, i.e. what it's already been trained on.
This can be thought of as asking the network "Based on what you've seen / what you know, what do you think this is?", and is analogous to you recognising objects in clouds, ink blots / Rorschach tests etc.
The effect is further exaggerated by encouraging the algorithm to generate an image of what it 'thinks' it is seeing, and feeding that image back into the input. Then it's asked to reevaluate, creating a positive feedback loop, reinforcing the biased misinterpretation.
This is like asking you to draw what you think you see in the clouds, and then asking you to look at your drawing and draw what you think you are seeing in your drawing, and so on.

That last sentence was actually not fully accurate. It would be accurate if, instead of asking you to draw what you think you saw in the clouds, we scanned your brain, looked at a particular group of neurons, reconstructed an image based on the firing patterns of those neurons (i.e. on the in-between representational states in your brain), and gave *that* image to you to look at. Then you would try to make sense of (i.e. recognise) *that* image, and the whole process would be repeated.

We aren't actually asking the system what it thinks the image is; we're extracting the image from somewhere inside the network, from any one of the layers. Since different layers store different levels of abstraction and detail, picking different layers to generate the 'internal picture' highlights different features.
All based on the Google research by Alexander Mordvintsev (Software Engineer), Christopher Olah (Software Engineering Intern) and Mike Tyka (Software Engineer).
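
For the curious, the loop described above can be sketched in plain Python. This is a toy illustration, not the Google code: a tiny two-layer network with hand-picked weights stands in for the trained image network, and the names `layer_energy`, `input_gradient` and `dream` are made up for this example. The sketch also assumes that "reinforcing what a layer sees" means nudging the input so that the sum of squared activations in the chosen layer grows, with each step scaled by the average gradient size.

```python
import math

def forward(layers, x, upto):
    """Return the activations of layers 0..upto (inclusive) for input x."""
    acts = []
    for weights in layers[: upto + 1]:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]
        acts.append(x)
    return acts

def layer_energy(layers, x, layer):
    """How strongly the chosen layer responds: half the sum of squared activations."""
    return 0.5 * sum(a * a for a in forward(layers, x, layer)[-1])

def input_gradient(layers, x, layer):
    """Gradient of layer_energy with respect to the input, by backpropagation."""
    acts = forward(layers, x, layer)
    grad = acts[-1][:]                      # d(energy)/d(activation) at the chosen layer
    for i in range(layer, -1, -1):
        # Back through tanh: d tanh(z)/dz = 1 - tanh(z)^2
        delta = [g * (1.0 - a * a) for g, a in zip(grad, acts[i])]
        # Back through the weights to the previous layer (or to the input)
        n_in = len(layers[i][0])
        grad = [sum(layers[i][j][k] * delta[j] for j in range(len(delta)))
                for k in range(n_in)]
    return grad

def dream(layers, x, layer, steps=20, step_size=0.1):
    """Repeatedly nudge the input so the chosen layer responds more strongly."""
    x = x[:]
    for _ in range(steps):
        g = input_gradient(layers, x, layer)
        scale = sum(abs(v) for v in g) / len(g) or 1.0   # average gradient size
        x = [xi + step_size * gi / scale for xi, gi in zip(x, g)]
    return x

# Hand-picked toy weights (assumptions for illustration, not trained values).
layers = [
    [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.2]],   # layer 0: 3 inputs -> 2 neurons
    [[0.7, -0.4], [0.3, 0.6]],              # layer 1: 2 -> 2 neurons
]
x = [0.2, -0.1, 0.4]                        # stands in for the input image

for layer in (0, 1):
    dreamed = dream(layers, x, layer)
    print(layer, layer_energy(layers, x, layer), layer_energy(layers, dreamed, layer))
```

Changing the `layer` argument changes which internal representation gets amplified, which is the toy equivalent of picking a different layer to generate the 'internal picture'.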
