
Neural networks

The linear rectifier: f(x) = max(0, x)

The hyperbolic tangent: f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
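As a brief illustration (a sketch in Python with NumPy, neither of which the essay itself uses), both activation functions can be written directly from the definitions above:

```python
import numpy as np

def relu(x):
    # the linear rectifier: max(0, x), applied element-wise
    return np.maximum(0, x)

def tanh(x):
    # the hyperbolic tangent: (e^x - e^-x) / (e^x + e^-x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
```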

From this, an equation for the value of a node can be defined as:

a_i = f\left(\sum_j W_{ij} \cdot x_j + b_i\right)

where a_i is the value of node i in the second layer, f is the activation function, W_{ij} is the weight of the connection between node i in the second layer and node j in the first layer, x_j is the value of node j in the first layer, and b_i is the bias of node i in the second layer. This calculation can be simplified further by defining each layer of nodes (and their biases) as vectors and the weights of the connections as a matrix. For a network in which the first layer has m nodes and the second layer has n nodes (Abdi, 1994):

a, b \in \mathbb{R}^n, \qquad W \in \mathbb{R}^{n \times m}, \qquad x \in \mathbb{R}^m

Here \mathbb{R}^n denotes a vector with n elements and \mathbb{R}^{n \times m} a matrix with dimensions n × m (Darmochwał, 1991). Using this, the simplified equation is:

a_i = f(W_i \cdot x + b_i)

This is possible because the dot product between the two vectors W_i (the i-th row of W) and x is equivalent to the summation in the previous equation. Between two layers, the network essentially maps vectors \mathbb{R}^m \to \mathbb{R}^n. Expanded to cover the whole network, vectors are mapped from the dimension of the input to the dimension of the output. For example, in a network with layers of size 16, 32, 12, and 8, vectors would be mapped \mathbb{R}^{16} \to \mathbb{R}^{32} \to \mathbb{R}^{12} \to \mathbb{R}^{8}.
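To make this concrete, here is a minimal sketch of a forward pass through the example network (assuming Python with NumPy and randomly initialized weights, none of which appears in the essay), mapping a 16-element input vector to an 8-element output:

```python
import numpy as np

def layer(W, b, x, f):
    # a = f(Wx + b): each a_i is f(W_i . x + b_i), computed for all nodes at once
    return f(W @ x + b)

rng = np.random.default_rng(0)
sizes = [16, 32, 12, 8]  # layers of size 16, 32, 12, 8

# one weight matrix in R^(n x m) and one bias vector in R^n per pair of adjacent layers
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes, sizes[1:])]
biases = [rng.standard_normal(n) for n in sizes[1:]]

x = rng.standard_normal(16)  # input vector in R^16
for W, b in zip(weights, biases):
    x = layer(W, b, x, np.tanh)  # R^16 -> R^32 -> R^12 -> R^8

print(x.shape)  # (8,)
```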

