Neural networks
so it would be a waste of processing time to evaluate every single one of these at the start of the training process when the model is likely to be very inaccurate.
One way of addressing this is known as stochastic gradient descent. This adjusts the model after evaluating smaller batches of samples repeatedly rather than doing them all at once (Bottou, 2012). For example, in a data set with 16384 samples stochastic gradient descent could split it into 128 batches of 64 samples. Pseudocode for both methods follows:
for i = 0 to nTrainingCycles - 1:
loss = 0
for j = 0 to 16383:
loss += Loss(network.predict(data[j]), labels[j])
network.gradientDescent(loss)
for i = 0 to 127:
loss = 0
offset = i * 64
for j = 0 to 63:
loss += Loss(network.predict(data[offset + j], labels[offset + j]))
network.gradientDescent(loss)
Option two: genetic algorithms
Genetic algorithms are a directed random search technique which is often used in optimization problems in which analytical solutions are difficult to obtain. (Leung, 2003) They seek to mimic the way in which living organisms evolve following Darwin’s theo ry of natural selection where in a population only the best adapted organisms in an environment are able to reproduce successfully and therefore pass down their genes to the next generation. This results in each generation being slightly better adapted to their environment than the previous generation and over a long period of time can cause huge changes in organisms. This section of the essay will outline a basic description of a genetic algorithm, simplified for the purpose of explanation and understanding. A genetic algorithm represents objects as chromosomes, an array of genes. Each gene is a bit and the chromosome’s value is determined based on the alleles of the genes (0 or 1) (Dewdney, 1993). The program must first find some way to generate parameters for a neural network based on a chromosome (and vice versa). For the purposes of examples, I will assume that each chromosome contains 8 genes, although in practice it is likely to be significantly larger than this, such as 10101001 or 00011001. The algorithm must first generate a population of random chromosomes. The size of this population will affect the speed of the algorithm, but incredibly large populations relative to the complexity of the problem are not necessary (Dewdney, 1993).
62
Made with FlippingBook interactive PDF creator