Saturday, September 30, 2017

Evolving Neural Network Weights

Neural networks offer quiet a large repertoire of tasks that they can accomplish well. With the increasingly smart solutions people come up with to squeeze out performance with smaller networks(squeezeNet ,ReLu to name a few) it's likely that neural networks will dominate the Machine learning scenario for some time to come.

There is one limitation however. Current implementations of learning mechanisms ([1], [2]) are mostly gradient based. This limits the loss function to being at least differentiable, if not double differentiable in some cases. There are alternatives like reinforcement learning and genetic search but those have not been as heavily invested into by the community as the gradient based methods.

This post explores genetic search as a method for finding the appropriate weights for neural network architectures. We illustrate the basic principles and follow along with some basic code.

Let's look at the XOR problem. One of the classic problems that Minsky famously pointed to in the Perceptron book[3]. The problem is simply to compute the XOR function given two binary inputs. While a single perceptron is unable to do this, a multi layer perceptron is able to perform this computation.

We use a simple 3 layer neural network and sigmoid activation as done in the classic days of neural networks. Instead of using the wonderful powers of Theano, tensorflow or torch to perform automatic differentiation in order to use gradient based methods, we are able to drop those architectures from our program entirely.

We use numpy to make the matrix operations a little easier to read. The first part of the code imports the relevant libraries and defines the sigmoid function to be used later on.

We then define a roulette function which selects elements from a given array with a probability proportional to the magnitude of elements in the array. This is present in this part of the code.

The next bit defines the forward calculation of the network given it's weights and input data. Besides that it also defines a function which calculates the "score" of a given output from the network and the output expected. Here we ask the genetic search to maximize the ROC AUC score.

The get fitness function is another building block which makes use of the functions we have already defined. It takes in a network configuration and returns the "fitness" associated with this configuration. In genetic search terms this network configuration / network weight list is known as the "gene".

To employ genetic search in it's classic form we write two function which emulate gene crossover and mutation. This is the bread and butter of the genetic search algorithm which allows is to move around in the search space.

The last of our function definitions defines the "main" function which evolves network weights to fit on a certain data set that we provide. This data set is the XOR data set that we manufacture using numpy in the later sections of the code.

All that is left is to set the various configuration variables for the genetic search and to call the evolution function with these variables. We see that genetic search very quickly finds the "correct" weights for this neural network. It does get lost sometimes for very long periods of time but that is simply because we have implemented a vanilla version of genetic search.

The entire code is available at https://github.com/theSage21/evonet.

No comments:

Post a Comment