
when these extracted features from RD-LNN are fed into SVM. Note that two features are extracted from each variable, corresponding to each class label (y = 0, y = 1). A normal distribution is fitted to the distance data, where the mean and SD are set to the sample median and the SD of the distances falling within the interquartile range (ie, the data ranging from the first quartile Q1 to the third quartile Q3) to minimize the effect of outliers.

3.6 Training model

Different machine learning techniques are used for CL-LNN and RD-LNN, respectively, to train the model and predict failures 2 minutes in advance, based on the type of data each algorithm produces. First, the C5.0 decision tree algorithm, an improved version of its predecessor C4.5, is applied to the CL-LNN output for the classification between normal and abnormal conditions. To improve model performance, we implement adaptive boosting, in which many trees are built and vote for the best class; the number of boosting iterations is set to 10. A cost matrix is also employed, assigning a penalty to each type of error: a cost of 1 is assigned to a false positive and 5 to a false negative, since failing to detect a break is the more expensive mistake. Second, the numerical feature matrix (X_train) obtained by RD-LNN from the training dataset (W_train) is fed into the SVM to generate the model, and the corresponding matrix (X_test) from the test dataset (W_test) is used to evaluate the performance of the generated model. To train the SVM, the kernel function, which takes data as input and transforms it into the required form for training and prediction, is chosen to be radial. The cost parameter is set to 1 to trade off correct classification of the training examples against maximization of the decision function's margin, and the gamma parameter, which defines how far the influence of a single training example reaches, is set to 0.5. These parameters were selected heuristically through experiments on our dataset.
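The article states the model settings but does not publish code. The sketch below is an illustration only: it reproduces the RD-LNN branch in Python with scikit-learn, which is an assumed toolchain (the paper does not name its software), and the function robust_normal_params, the synthetic data, and the variable names X_train, y_train, and X_test are placeholders introduced here. The radial kernel, cost of 1, and gamma of 0.5 match the values quoted above.

```python
import numpy as np
from sklearn.svm import SVC

def robust_normal_params(distances):
    """Fit a normal distribution to nearest-neighbour distances.

    As described above, the mean is taken as the sample median and the SD
    as the standard deviation of the distances lying inside the
    interquartile range [Q1, Q3], which limits the influence of outliers.
    """
    q1, q3 = np.percentile(distances, [25, 75])
    inner = distances[(distances >= q1) & (distances <= q3)]
    return np.median(distances), np.std(inner)

rng = np.random.default_rng(0)

# Robust fit demonstrated on synthetic distance data (placeholder only).
dists = rng.exponential(scale=1.0, size=1000)
mu, sigma = robust_normal_params(dists)

# Placeholder feature matrices and labels standing in for the RD-LNN output.
X_train = rng.normal(size=(200, 8))
y_train = rng.integers(0, 2, size=200)   # 0 = normal, 1 = break
X_test = rng.normal(size=(50, 8))

svm = SVC(kernel="rbf",   # "radial" kernel, as in the paper
          C=1.0,          # cost: margin vs. training-error trade-off
          gamma=0.5)      # reach of a single training example
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)
```

The C5.0 tree with 10 boosting iterations and the 1/5 cost matrix used for the CL-LNN branch is commonly run through the R C50 package; in a Python setting a comparable effect could be approximated by passing class or sample weights to a boosted tree learner, though that is not an exact substitute for C5.0's cost matrix.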

4 RESULTS OF EXPERIMENT

4.1 Performance analysis

We compare our methods with four other approaches, which include a type of artificial neural network and general machine learning models without the feature extraction technique proposed in this article. The first method is an Autoencoder, which comprises an encoder and a decoder, for extremely rare event classification.[1] The encoder learns the features of the input data, normally in a reduced dimension, while the decoder regenerates the original data from the encoder output. This method uses a dense-layer Autoencoder, which selects instances at random without considering the correlation among instances. The second approach is an improved version of the first that constructs an LSTM (long short-term memory) Autoencoder, which takes the temporal features into account.[2] Both methods also attempt to detect failures 2 minutes in advance with the same dataset we use in this article. In addition, we compare the methods without the feature extraction technique (ie, decision tree without CL-LNN, SVM without RD-LNN) in order to show the benefit of the proposed algorithm.

[1] The implementation of the Autoencoder refers to https://github.com/cran2367/autoencoder_classifier/blob/master/autoencoder_classifier.ipynb
[2] The implementation of the LSTM Autoencoder refers to https://github.com/cran2367/lstm_autoencoder_classifier/blob/master/lstm_autoencoder_classifier.ipynb

Table 2 shows the prediction results in the form of confusion matrices to compare the performance of the six methods. As these results show, all six methods appear comparable, and it is hard to tell which one performs better; the results also illustrate the trade-off between true positives/negatives and false positives/negatives. RD-LNN, however, yields the lowest number of false positives among the six methods.

Table 3 provides further metrics to compare the performance of the six methods. Four metrics are used to evaluate the proposed classification algorithms. Precision (also known as the positive predictive value) is defined as the proportion of true positives over the total number of predicted positives. Recall (also known as sensitivity or the true positive rate) is the number of true positives divided by the number of true positives plus the number of false negatives. The false positive rate (1 - specificity) refers to the probability of falsely rejecting the null hypothesis for a particular test. Since the distribution of class labels is highly skewed, however, another performance metric, the F-measure, has also been adopted.
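As a concrete illustration of the four metrics above, the short sketch below computes precision, recall, false positive rate, and F-measure from a 2x2 confusion matrix. The counts are made-up placeholders, not the values reported in Tables 2 and 3.

```python
# Placeholder confusion-matrix counts; not the figures from Table 2.
tp, fp, tn, fn = 5, 10, 2000, 20

precision = tp / (tp + fp)                        # positive predictive value
recall = tp / (tp + fn)                           # sensitivity / true positive rate
fpr = fp / (fp + tn)                              # 1 - specificity
f_measure = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"FPR={fpr:.4f} F-measure={f_measure:.3f}")
```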
