PAPERmaking! Vol6 Nr2 2020

LEE AND SEO


3 MATERIALS AND METHODS

3.1 Dataset description

The dataset was provided by the Institute of Industrial and Systems Engineers (IISE) 2019 data competition, which recorded real sensor observations from a paper manufacturing process [2]. Many different types of data are collected over a period of time using a variety of sensors located on the machines. Some sensors measure raw materials (eg, amount of pulp fiber, chemicals, and so on) and the others represent process variables (eg, blade type, couch vacuum, rotor speed, and so on). Overall, 61 different sensor signals are collected, and 1 month of monitoring data is recorded every 2 minutes for a paper manufacturing machine, resulting in a dataset of 61 streaming signals at 18 398 time points. In addition, for each time point, the system condition (ie, normal or break) is recorded in a binary response variable. Despite this large number of measurements, failures occur at only 124 time points (0.67% of total observations) during operation, and this rare-event characteristic makes it hard to predict a failure before it occurs. Table 1 summarizes the dataset. A data-driven approach is used for this problem instead of incorporating physical models because no sensor details or domain knowledge were provided.

Predicting failures for a pulp-and-paper mill is critical because a break has a significant impact on the entire process. Even though paper breaks rarely take place during operation, a single failure causes a significant loss of time and labor for identifying the cause of the failure and replacing any broken parts. Once the machine fails, the entire process must be halted until the problem is found and fixed. This maintenance procedure takes more than an hour and incurs a substantial cost. Consequently, even a small reduction in failures through early detection could yield significant cost savings for industry.
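The degree of class imbalance described above can be checked with simple arithmetic. The following is a minimal sketch that uses only the figures quoted in this section; the variable names are illustrative and are not taken from the competition data files.

```python
# Figures quoted in the dataset description (Section 3.1).
n_total = 18_398        # time points, one every 2 minutes
n_failures = 124        # time points labelled as a break
sampling_minutes = 2

n_normal = n_total - n_failures
failure_rate = n_failures / n_total            # share of failure time points
imbalance_ratio = n_normal / n_failures        # normal points per failure point
duration_days = n_total * sampling_minutes / (60 * 24)

print(f"normal points:  {n_normal}")               # 18274
print(f"failure rate:   {failure_rate:.2%}")       # 0.67%
print(f"normal:failure  {imbalance_ratio:.0f}:1")  # 147:1
print(f"record length:  {duration_days:.1f} days") # 25.6 days
```

With roughly 147 normal observations per failure observation, a classifier that always predicts "normal" would be about 99.3% accurate while detecting no breaks, which is why plain accuracy is a poor guide for this problem.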
3.2 Procedure

Figure 1 presents the overall procedure of the proposed algorithms, which consists of preprocessing, the class label of the local nearest neighbor (CL-LNN) and the relative distance of the local nearest neighbor (RD-LNN), and the corresponding machine learning techniques. The original MSTS dataset is preprocessed before the two types of feature extraction methods are carried out, and the resulting features are fed into a decision tree or an SVM, depending on the extracted data type, for early failure detection. More detailed information is given in the following sections.

3.3 Data preprocessing

The MSTS data obtained from the paper manufacturing machinery is given as

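\[
\mathrm{MSTS} =
\begin{pmatrix} \mathbf{s}_1 & \mathbf{s}_2 & \cdots & \mathbf{s}_p & \mathbf{c} \end{pmatrix}
=
\begin{pmatrix}
s_{1,1} & s_{1,2} & \cdots & s_{1,p} & c_1 \\
s_{2,1} & s_{2,2} & \cdots & s_{2,p} & c_2 \\
\vdots  & \vdots  & \vdots & \vdots  & \vdots \\
s_{T,1} & s_{T,2} & \cdots & s_{T,p} & c_T
\end{pmatrix},
\tag{1}
\]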

where $s_{t,j}$'s, $t = 1, \ldots, T$, $j = 1, \ldots, p$, are sensor signals measured at the time point $t$ from the $j$th sensor, $T$ = 18 274 is the number of measurement time points, $p = 61$ is the number of variables by different sensors, and $c_t$'s are records of the

TABLE 1 Dataset description

| Element | Value | Remark |
|---|---|---|
| Number of variables: continuous variables | 59 | s 1 ∼ s 27, s 29 ∼ s 60 |
| Number of variables: categorical variables | 2 | s 28 (8 categories), s 61 (2 categories) |
| Number of measurements: normal | 18 274 | Recorded every 2 minutes |
| Number of measurements: abnormal (failure) | 124 | |
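The layout of Equation (1) and the variable split in Table 1 can be illustrated with a small synthetic example. This is only a sketch: the readings, the toy labels, and the helper `toy_row` are hypothetical placeholders, and only the counts (61 signals, with s 28 and s 61 categorical) come from the text.

```python
import random

P = 61                          # number of sensor signals (from the text)
categorical = {28: 8, 61: 2}    # sensor index -> number of categories (Table 1)
continuous = [j for j in range(1, P + 1) if j not in categorical]

def toy_row(rng, label):
    """Return one row (s_{t,1}, ..., s_{t,p}, c_t) filled with synthetic values."""
    row = []
    for j in range(1, P + 1):
        if j in categorical:
            row.append(rng.randrange(categorical[j]))  # category code (placeholder)
        else:
            row.append(rng.gauss(0.0, 1.0))            # placeholder sensor reading
    row.append(label)                                  # c_t: 0 = normal, 1 = break
    return row

rng = random.Random(0)
labels = [0, 0, 0, 1, 0]        # hypothetical labels for T = 5 toy time points
msts = [toy_row(rng, c) for c in labels]

print(len(continuous), len(categorical))  # 59 2
print(len(msts), len(msts[0]))            # 5 62
```

Each row holds the p = 61 sensor values for one time point followed by the binary condition label, so the toy matrix has p + 1 = 62 columns, matching the structure of Equation (1).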
