g_t = i_t · ĉ_t    (10)

c_t = f_t ∗ c_{t-1} + g_t    (11)

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)    (12)
h_t = o_t ∗ tanh(c_t)    (13)

The first gate in an LSTM network is the forget gate, represented by f_t in Equation (7), which controls how much of the previous cell state is discarded. The second gate is the input gate, represented by g_t in Equations (8)–(10), which decides how much of the input information is stored in the cell state. The updated cell state is defined in Equation (11). The input layer, represented by i_t, determines which cell-state entries will be updated, and the candidate values for updating them are represented by ĉ_t. The final gate is the output gate, represented by o_t, which defines the output of the LSTM block; the amount of cell state passed to the output is given by Equations (12) and (13). The trainable weights of the gate layers are represented by W_f, W_i, W_c, and W_o, and the bias associated with each gate layer is represented by b_f, b_i, b_c, and b_o. The training process and the impact of the network hyperparameters on training and model performance are discussed in [51,52].
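For concreteness, the gate computations above can be implemented directly. The following is a minimal NumPy sketch of a single LSTM time step; Equations (7)–(9) are written in their standard form σ(W · [h_{t-1}, x_t] + b), consistent with the weights and biases defined above, and the function and variable names are illustrative rather than taken from the original code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step, following Equations (7)-(13)."""
    z = np.concatenate([h_prev, x_t])   # concatenated [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate, Eq. (7)
    i_t = sigmoid(W_i @ z + b_i)        # input gate, Eq. (8)
    c_hat = np.tanh(W_c @ z + b_c)      # candidate cell values, Eq. (9)
    g_t = i_t * c_hat                   # gated candidate, Eq. (10)
    c_t = f_t * c_prev + g_t            # cell-state update, Eq. (11)
    o_t = sigmoid(W_o @ z + b_o)        # output gate, Eq. (12)
    h_t = o_t * np.tanh(c_t)            # hidden state, Eq. (13)
    return h_t, c_t
```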
For ease of understanding, the output state of time step t in layer l can be described as follows:

h_t^(l) = LSTM(x_t, h_{t-1}^(l))    (14)
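Equation (14) is applied recursively over time and across layers. A common convention (assumed here, since the text does not spell it out) is that the first layer receives the raw input x_t while each deeper layer receives the hidden state of the layer below; a schematic forward pass reusing the lstm_step sketch above, with a hypothetical per-layer parameter container, could look like this:

```python
def lstm_forward(X, layers):
    """Stacked LSTM forward pass over a sequence, per Equation (14).

    X:      array of shape (T, n_inputs), one row per time step
    layers: list of dicts holding each layer's W_*/b_* parameters and
            hidden size "n_hidden" (hypothetical container)
    """
    states = [(np.zeros(l["n_hidden"]), np.zeros(l["n_hidden"])) for l in layers]
    outputs = []
    for x_t in X:
        inp = x_t
        for l, layer in enumerate(layers):
            h_prev, c_prev = states[l]
            h_t, c_t = lstm_step(inp, h_prev, c_prev,
                                 layer["W_f"], layer["W_i"], layer["W_c"], layer["W_o"],
                                 layer["b_f"], layer["b_i"], layer["b_c"], layer["b_o"])
            states[l] = (h_t, c_t)
            inp = h_t                 # layer l's output feeds layer l+1
        outputs.append(inp)           # top-layer hidden state at time t
    return np.array(outputs)
```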
After obtaining the outputs from the LSTM layers, the next layer is the ANN layer. The trainable weights and biases for this layer are represented by W_a and b_a, respectively. The output of the ANN layer is denoted by y_t and is given by the following equation:

y_t = σ(W_a · h_t + b_a)    (15)

The data were split into 80% for training and 20% for testing. The model can be trained using the Deep Learning Toolbox in MATLAB, TensorFlow, or PyTorch. Several optimizers are available, such as RMSProp, SGD, Adam, and AdaGrad; in this work, the Adam optimizer was used [53]. The optimal structure was determined through trial and error by varying the network hyperparameters, such as the number of layers, the number of nodes, the types of activation functions, the batch size, and the dropout rate [54]. Training was considered complete when the R² value of the test predictions exceeded 0.98. Once the LSTM-ANN model was trained, an MPC was designed that used the trained model as an internal evaluator at every instant, instead of a complex first-principles model, to obtain the optimal input profiles.

3.2. LSTM-ANN-Based Model Predictive Controller Design

MPC is widely used in chemical plants and refineries to control and optimize processes while respecting real plant constraints [37,55–57]. A nonlinear MPC is typically advantageous because it can accurately represent the actual process dynamics; however, a nonlinear first-principles model is computationally expensive and cannot be evaluated online quickly enough for timely control actions. Many studies have used state-space models, such as N4SID, and other data-driven techniques to build reduced-order models for control purposes, but these methods suffer from low accuracy, particularly when dealing with nonlinear time-varying data. In this work, we propose an LSTM-ANN-based MPC design in which a well-trained LSTM-ANN model is used to optimize the process instead of the complex first-principles model. The primary goal of the optimizer is to achieve the desired set-point values for the controlled process variables, as sketched below.
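To make the role of the surrogate concrete, the receding-horizon optimization can be posed as a nonlinear program in which the trained LSTM-ANN model replaces the first-principles plant predictor. The sketch below uses SciPy's general-purpose optimizer; the model.predict interface, the single-input formulation, and the move-suppression weight are illustrative assumptions, not the exact formulation used in this work.

```python
from scipy.optimize import minimize
import numpy as np

def mpc_step(model, history, setpoint, horizon, u_min, u_max, u_prev):
    """One receding-horizon move: optimize the input sequence against the
    LSTM-ANN surrogate, then apply only the first input (standard MPC)."""

    def cost(u_seq):
        # Surrogate prediction in place of the first-principles model;
        # model.predict(history, u_seq) is an assumed interface.
        y_pred = model.predict(history, u_seq)
        tracking = np.sum((y_pred - setpoint) ** 2)      # set-point tracking
        du = np.diff(np.concatenate(([u_prev], u_seq)))  # input moves
        return tracking + 0.1 * np.sum(du ** 2)          # illustrative weight

    u0 = np.full(horizon, u_prev)          # warm start at the last input
    bounds = [(u_min, u_max)] * horizon    # actuator (plant) constraints
    res = minimize(cost, u0, method="SLSQP", bounds=bounds)
    return res.x[0]                        # first move of the optimal profile
```

At each sampling instant the optimizer is re-run with updated measurements, which is what allows the trained network, rather than the expensive first-principles model, to be evaluated at every instant.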