LEE AND SEO
FIGURE 8 The effect of the number of normal instances randomly selected in training data set
[Plot: x-axis "No. of normal instances in training dataset" (100 to 300); two y-axis scales, 0.00 to 0.10 and 5.0 to 5.4, with labels not recoverable]
TABLE 5 Root cause analysis using the decision tree algorithm
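| Rank | Variable | Importance | Rank | Variable | Importance | Rank | Variable | Importance |
|---|---|---|---|---|---|---|---|---|
| 1 | s3 | 100% | 21 | s41 | 62% | 41 | s1 | 0% |
| 2 | s20 | 100% | 22 | s39 | 58% | 42 | s2 | 0% |
| 3 | s21 | 100% | 23 | s37 | 58% | 43 | s7 | 0% |
| 4 | s32 | 100% | 24 | s35 | 49% | 44 | s10 | 0% |
| 5 | s33 | 100% | 25 | s35 | 49% | 45 | s12 | 0% |
| 6 | s40 | 100% | 26 | s34 | 46% | 46 | s13 | 0% |
| 7 | s6 | 81% | 27 | s30 | 44% | 47 | s24 | 0% |
| 8 | s60 | 70% | 28 | s28 | 44% | 48 | s28 | 0% |
| 9 | s15 | 65% | 29 | s27 | 40% | 49 | s29 | 0% |
| 10 | s26 | 65% | 30 | s26 | 39% | 50 | s36 | 0% |
| 11 | s57 | 64% | 31 | s21 | 38% | 51 | s42 | 0% |
| 12 | s18 | 60% | 32 | s21 | 36% | 52 | s45 | 0% |
| 13 | s31 | 59% | 33 | s18 | 34% | 53 | s49 | 0% |
| 14 | s44 | 56% | 34 | s18 | 34% | 54 | s50 | 0% |
| 15 | s14 | 49% | 35 | s18 | 32% | 55 | s52 | 0% |
| 16 | s51 | 49% | 36 | s14 | 31% | 56 | s54 | 0% |
| 17 | s37 | 46% | 37 | s14 | 31% | 57 | s55 | 0% |
| 18 | s34 | 46% | 38 | s13 | 29% | 58 | s56 | 0% |
| 19 | s11 | 45% | 39 | s12 | 29% | 59 | s58 | 0% |
| 20 | s46 | 44% | 40 | s11 | 28% | 60 | s59 | 0% |
| | | | | | | 61 | s61 | 0% |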
4.3 Root cause analysis
Root cause analysis is performed by using the decision tree algorithm to measure the importance of each variable and thereby identify the critical variables that cause failures of the paper manufacturing machinery. The importance of a variable is estimated from the percentage of training samples that fall into all the terminal nodes after the split. Table 5 lists all 61 variables in order of importance, from rank 1 to rank 61. The six variables with an importance of 100% (s3, s20, s21, s32, s33, and s40) are the most important for detecting failures before they occur; in other words, these six variables have the greatest impact on the classification model. Interestingly, a categorical variable (s28) and a binary variable (s61) make no contribution to this model.
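The text does not spell out how the terminal-node percentages are aggregated, so the following is only a minimal Python sketch under one plausible reading: a variable's importance is the percentage of training samples whose root-to-leaf path passes through at least one split on that variable. The `Node` class, the helper functions, and the toy tree on s3 and s20 are hypothetical illustrations, not the authors' implementation. Under this reading, a variable split at the root scores 100%, and a variable that is never used in any split (as s61 is in Table 5) scores 0%.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass
class Node:
    """A decision tree node; a terminal (leaf) node has feature=None."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None  # branch taken when x[feature] > threshold
    label: Optional[int] = None     # class predicted at a terminal node


def path_features(tree: Node, x: Dict[str, float]) -> Set[str]:
    """Variables split on along the path from the root to x's terminal node."""
    used: Set[str] = set()
    node = tree
    while node.feature is not None:
        used.add(node.feature)
        node = node.left if x[node.feature] <= node.threshold else node.right
    return used


def coverage_importance(tree: Node, samples: Sequence[Dict[str, float]],
                        variables: List[str]) -> Dict[str, float]:
    """Assumed measure: % of samples whose decision path splits on each variable."""
    counts = {v: 0 for v in variables}
    for x in samples:
        for v in path_features(tree, x):
            counts[v] += 1
    return {v: 100.0 * counts[v] / len(samples) for v in variables}


def rank(importance: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort variables by decreasing importance (ties keep input order)."""
    return sorted(importance.items(), key=lambda kv: -kv[1])


if __name__ == "__main__":
    # Hypothetical toy tree: split on s3, then on s20; s61 is never used.
    tree = Node("s3", 0.5,
                left=Node(label=0),
                right=Node("s20", 1.0, left=Node(label=0), right=Node(label=1)))
    samples = [
        {"s3": 0.2, "s20": 0.0, "s61": 0},
        {"s3": 0.9, "s20": 0.5, "s61": 1},
        {"s3": 0.7, "s20": 2.0, "s61": 0},
        {"s3": 0.1, "s20": 3.0, "s61": 1},
    ]
    imp = coverage_importance(tree, samples, ["s3", "s20", "s61"])
    for r, (v, pct) in enumerate(rank(imp), start=1):
        print(f"{r:2d}  {v:4s} {pct:5.1f}%")
    print("top:", [v for v, p in imp.items() if p == 100.0])           # ['s3']
    print("no contribution:", [v for v, p in imp.items() if p == 0.0])  # ['s61']
```

In the toy example every sample passes the root split on s3 (100%), only the two samples routed right reach the s20 split (50%), and s61 never appears on a path (0%). Note that scikit-learn's `feature_importances_` uses a different measure (normalized total impurity decrease), so its values are not directly comparable with the percentages in Table 5.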