Neural Networks

Whether it’s a real brain or an artificial one, the trick is to train the damn thing.

Last month I promised to talk in a bit more depth about neural networks, which do a lot of cool things (like recognizing the amount written on a check you deposit into an ATM).

A biological neural network is a network of neurons, or nerve cells. Most, but not all, animals have such a network. For example, a roundworm has about 300 neurons, a cockroach about 1 million, and you, Dear Reader, have roughly 86 billion in your brain alone. Sponges, on the other hand, don’t have a single nerve cell in their bodies.

The key feature of a neuron is electrical activity. A neuron receives signals from (potentially many) other cells via structures called dendrites. Based on those inputs, the neuron may “fire,” transmitting an electrical pulse along its axon (a single longer extension of the cell). The axon branches at its far end and, in response to a pulse, sends signals to the dendrites of (potentially many) other neurons. The function of these neurons is the basis for all the magical things we can do as humans, thanks to an incredibly large and complex neural network that includes the tens of billions of neurons in our brain. (Please understand that this is a vast simplification of an incredibly complex and incompletely understood process.)

Early computer scientists sought to reproduce the action of neurons. One of the earliest attempts was the Perceptron, which accepted a number of inputs. Each input was multiplied by a “weight” associated with that specific input (which might be positive or negative). If the sum of the weighted inputs was greater than a certain number (the “threshold,” which modern treatments usually fold into a term called the “bias”), the Perceptron would “fire.” You can easily see the similarity to the behavior of neurons. Modern artificial neural networks contain several layers of these perceptrons, just as your brain contains several layers of nerve cells in its cerebral cortex, where the heavy lifting of human pattern recognition gets done.
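For readers who like to see it in code, here is a minimal sketch of a single perceptron as described above. The function name, the example weights and the threshold value are all made up for illustration; they are not taken from any historical Perceptron.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 ("fire") if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


# Hypothetical example: a unit that fires only when both of its inputs are on.
both_on_weights = [1.0, 1.0]
both_on_threshold = 1.5

print(perceptron([1, 1], both_on_weights, both_on_threshold))  # 1
print(perceptron([1, 0], both_on_weights, both_on_threshold))  # 0
```

A negative weight would let an input argue against firing, which is how a unit can learn that some part of the picture should be blank.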

Whether it’s a real brain or an artificial one, the trick is to train the damn thing. Our brains do this naturally via processes we don’t fully understand. In the case of an artificial neural network, it comes down to an algorithm for assigning weights to each input. Let’s say we want to train a neural net to recognize a capital “A,” regardless of the font used to depict it. We start with a large set of As in different fonts, and set aside a portion of them to use as the “training set.”

We divide each picture into a 10-by-10 array and feed the value of each element into a perceptron (so there are 100 perceptrons in the first layer of our network). The value of an element might be all black, all white, or somewhere in between (on an edge of the letter).
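One way to turn a picture into those 100 values is to average the darkness of the pixels inside each cell of the grid. The sketch below assumes a square grayscale image stored as rows of numbers from 0 (black) to 255 (white), with a side length that is a multiple of 10; the function name to_grid is hypothetical.

```python
def to_grid(pixels, grid=10):
    """Average a square grayscale image into grid*grid cells.

    Returns a flat list of values from 0.0 (all white) to 1.0 (all black).
    Assumes the image side length is a multiple of `grid`.
    """
    cell = len(pixels) // grid
    values = []
    for grid_row in range(grid):
        for grid_col in range(grid):
            total = 0
            for r in range(grid_row * cell, (grid_row + 1) * cell):
                for c in range(grid_col * cell, (grid_col + 1) * cell):
                    total += pixels[r][c]
            mean_brightness = total / (cell * cell)
            values.append(1.0 - mean_brightness / 255.0)
    return values


# A hypothetical 20-by-20 all-white image becomes 100 zeros.
blank = [[255] * 20 for _ in range(20)]
print(len(to_grid(blank)))
```

A cell that straddles the edge of the letter comes out somewhere between 0.0 and 1.0, which is the "in between" case mentioned above.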

The second layer of our network has 10 perceptrons, each of which has 100 inputs (one from each perceptron in the first layer). The final layer of our network has a single output perceptron, which takes the 10 inputs from the second layer and produces a single output: one if the character is an “A” and zero otherwise.
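The three-layer layout just described can be sketched in a few lines. This is only an illustration under assumptions: the weights and thresholds start as random numbers between -1 and 1, each unit uses the simple fire-or-don't rule, and the names fire, make_network and forward are invented here.

```python
import random


def fire(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0


def make_network(rng):
    """Build random weights and thresholds for the 100 -> 10 -> 1 layout."""
    def unit(n_inputs):
        return {
            "weights": [rng.uniform(-1, 1) for _ in range(n_inputs)],
            "threshold": rng.uniform(-1, 1),
        }
    return [
        [unit(1) for _ in range(100)],   # first layer: one unit per grid cell
        [unit(100) for _ in range(10)],  # second layer: each unit sees all 100 outputs
        [unit(10)],                      # output layer: a single unit
    ]


def forward(network, grid_values):
    """Push 100 grid values through the network and return the final 0 or 1."""
    first, second, output = network
    signals = [fire([x], u["weights"], u["threshold"])
               for x, u in zip(grid_values, first)]
    signals = [fire(signals, u["weights"], u["threshold"]) for u in second]
    return fire(signals, output[0]["weights"], output[0]["threshold"])


network = make_network(random.Random(0))
print(forward(network, [0.5] * 100))  # 0 or 1, meaningless until the network is trained
```

With random weights the answer is no better than a coin toss, which is exactly why the training step matters.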

Initially, we give each input in each layer a random weight. Now we show each image in our training set to the first layer and see what comes out of the third, or output, layer. We know that each image should produce an output of one. (A real training set would also include characters that aren’t an “A,” which should produce a zero; otherwise a network that always answers one would look perfect.) So, when we get a zero, we know that the weights of the various inputs need to be adjusted. But how?

One approach is to randomly adjust each weight by a small amount and see if that change yields a one at the output instead of zero. If it does, then we keep the new weights; otherwise, we throw them away. This is basically what happens with biological evolution through mutation and natural selection. The problem is that it takes a long time. Research has yielded far more efficient methods (backpropagation is the best known), but this is a column, not a textbook.
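Here is a rough sketch of that mutate-and-keep idea, shrunk down to a single perceptron so it fits on the page. It makes a few assumptions that go beyond the description above: the examples carry both one and zero labels, a change is kept whenever the number of correct answers does not drop, and the threshold is nudged along with the weights. All names and numbers are illustrative.

```python
import random


def fire(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0


def count_correct(weights, threshold, examples):
    """Count how many (inputs, label) examples the unit gets right."""
    return sum(fire(x, weights, threshold) == label for x, label in examples)


def train_by_mutation(examples, n_inputs, steps=2000, step_size=0.1, seed=0):
    """Randomly nudge the weights; keep a nudge only if it does no harm."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(n_inputs)]
    threshold = rng.uniform(-1, 1)
    best = count_correct(weights, threshold, examples)
    for _ in range(steps):
        if best == len(examples):
            break  # every example is already answered correctly
        new_weights = [w + rng.uniform(-step_size, step_size) for w in weights]
        new_threshold = threshold + rng.uniform(-step_size, step_size)
        score = count_correct(new_weights, new_threshold, examples)
        if score >= best:
            weights, threshold, best = new_weights, new_threshold, score
    return weights, threshold, best


# Hypothetical toy task: fire if either of two inputs is on.
either_on = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, threshold, correct = train_by_mutation(either_on, n_inputs=2)
print(correct, "of", len(either_on), "examples correct")
```

Even on a toy problem this can burn through many wasted guesses, and the waste grows quickly as the number of weights climbs into the thousands.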

Once our network has been trained, it should be able to correctly recognize the rest of our images, the ones we held back from the training set. A couple of problems can occur with training. One is that the training data doesn’t contain enough information to train the network. Another is that the network “memorizes” the training set (a problem known as overfitting): it gets perfect results on the training set but fails miserably with the rest of the images.
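The standard way to catch memorization is to score the network separately on the training set and on the images that were held back. The sketch below shows the idea with a deliberately silly "network" that is nothing but a lookup table; the function names, the 80/20 split and the made-up data are assumptions for illustration.

```python
import random


def split(examples, train_fraction=0.8, seed=0):
    """Shuffle the examples and split them into a training set and a held-back set."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, examples):
    """Fraction of (inputs, label) examples that predict() gets right."""
    if not examples:
        return 0.0
    return sum(predict(x) == label for x, label in examples) / len(examples)


def make_memorizer(training_set):
    """A pure memorizer: perfect recall of what it has seen, always 0 for anything new."""
    table = {tuple(x): label for x, label in training_set}
    return lambda x: table.get(tuple(x), 0)


# Made-up data: ten distinct "images," all of which should be recognized (label 1).
examples = [([i], 1) for i in range(10)]
training_set, held_back = split(examples)
memorizer = make_memorizer(training_set)

print(accuracy(memorizer, training_set))  # 1.0
print(accuracy(memorizer, held_back))     # 0.0
```

A large gap between the two scores is the warning sign: the network has learned the training images rather than the letter.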

Fortunately, scientists and engineers have found ways to overcome these problems, so Facebook can automatically tag your friends in photos, the Golden Gate Bridge toll system can read your license plate, and the post office can automatically route envelopes by reading the address on the outside. It’s all done with the amazing pattern-recognizing power of neural networks.
