
can simultaneously handle both types of inputs, as reported in previous studies [45–47]. LSTMs, an advancement of recurrent neural networks (RNNs), are known for accurately capturing both long- and short-term time dependencies. Standard RNNs generally suffer from vanishing/exploding gradients due to the repeated multiplication of weights over several layers [34,48]. LSTMs overcome this problem by using feedback loops in the hidden layers and enforcing a constant error flow through the internal states of special units called memory cells. By stacking multiple LSTM layers, where the hidden units of the previous layer serve as inputs to the next layer, complex processes can be modeled and hidden hierarchical information can be extracted. This LSTM-ANN combination is therefore well-suited for processes with both time-varying and time-invariant data. The proposed LSTM-ANN model is trained on the data generated by the multiscale model until a desired test accuracy of over 98% is achieved.

LSTMs model long-term sequential time dependencies through the use of input, output, and forget gates [49,50]. Each LSTM unit takes the current input value x_t, the previous output value h_{t−1}, and the previous cell state c_{t−1} as inputs, and produces the current output value h_t and cell state c_t as outputs. In contrast, ANNs have a set of units in the hidden layer, each connected to every unit in the previous and subsequent layers. The combined LSTM-ANN network is shown in Figure 3, where x denotes the input, h the output, σ(·) the logistic sigmoid and tanh(·) the hyperbolic tangent nonlinear activation functions, ∗ and + point-wise multiplication and addition, and C the cell state. The figure shows three LSTM units as a representative example of the data flow; these units transfer the outputs h_{t−1}, h_t, and h_{t+1} to the ANN layers. The ANN layers then process the time-invariant input data through nonlinear activation functions and connect to the output layer, which here represents the Kappa number and cellulose DP. Placing the LSTM layers first allows the network to extract time-varying information directly from the input layer, making it easier to capture temporal dependencies.

Figure 3. Architecture of the LSTM-ANN network.
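To make the data flow of Figure 3 concrete, the following is a minimal sketch of the combined LSTM-ANN architecture in PyTorch. The layer widths, depth, and hidden activation are illustrative assumptions, since this excerpt does not specify them; only the overall structure (stacked LSTM layers whose final output is merged with the time-invariant inputs and passed through ANN layers) follows the text.

```python
import torch
import torch.nn as nn

class LSTMANN(nn.Module):
    """Stacked LSTM layers process the time-varying inputs x_t; the
    final output h_t is concatenated with the time-invariant inputs
    and passed through fully connected (ANN) layers that predict the
    Kappa number and cellulose DP. Sizes below are assumptions."""

    def __init__(self, n_dynamic, n_static, hidden_size=64, lstm_layers=2):
        super().__init__()
        # Stacked LSTM layers: hidden units of one layer feed the next.
        self.lstm = nn.LSTM(n_dynamic, hidden_size,
                            num_layers=lstm_layers, batch_first=True)
        # ANN layers combine LSTM features with time-invariant inputs.
        self.ann = nn.Sequential(
            nn.Linear(hidden_size + n_static, 32),
            nn.Tanh(),
            nn.Linear(32, 2),  # two outputs: Kappa number, cellulose DP
        )

    def forward(self, x_seq, x_static):
        # x_seq: (batch, time, n_dynamic); x_static: (batch, n_static)
        out, _ = self.lstm(x_seq)
        h_last = out[:, -1, :]  # output h_t at the final time step
        return self.ann(torch.cat([h_last, x_static], dim=1))
```

Placing the LSTM block ahead of the ANN layers, as described in the text, lets the recurrent part extract temporal dependencies before the static inputs are merged in.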

The cell state, c, flows through the chain of the LSTM block at time instant t, being optionally modified by the gates, which are represented by the equations below:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)  (7)

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)  (8)

ĉ_t = tanh(W_c · [h_{t−1}, x_t] + b_c)  (9)
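As a numerical illustration of Equations (7)–(9), the NumPy sketch below computes the three gate activations from the concatenated vector [h_{t−1}, x_t]. The final cell-state update line is the standard LSTM formulation, included only for context, since this excerpt ends before the corresponding equation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_gates(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c):
    """Gate computations of Equations (7)-(9). Each weight matrix
    acts on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate, Eq. (7)
    i_t = sigmoid(W_i @ z + b_i)        # input gate, Eq. (8)
    c_hat = np.tanh(W_c @ z + b_c)      # candidate cell state, Eq. (9)
    # Standard cell-state update (assumed; not shown in this excerpt):
    c_t = f_t * c_prev + i_t * c_hat
    return f_t, i_t, c_hat, c_t
```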
