FIGURE 4 Three different cases based on nearest neighbor-based feature extraction
On the other hand, for an instance of the test dataset, the algorithm simply searches for the nearest neighbor in the training dataset. The nearest neighbor is found by computing the Euclidean distance over the time window of each variable, as in Equation (6). Note that, for a given instance, the CL-LNN features may differ across variables because the nearest neighbor of each variable can be different. These binary features are fed into the decision tree classifier for model training and prediction, which is described in more detail in Section 3.6. By treating each variable separately, the features preserve the original information of each sensor signal, while correlations between different variables are expected to be handled by the decision tree algorithm.
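To make this concrete, the following Python sketch extracts the CL-LNN features for one test instance, assuming the time windows are stored as NumPy arrays of shape (instances, variables, window length); the function and array names are our own illustration, not the paper's implementation.

```python
import numpy as np

def cl_lnn_features(train_windows, train_labels, test_window):
    """Class label of the per-variable nearest neighbor (CL-LNN).

    train_windows: (n_train, n_vars, window_len) training time windows
    train_labels:  (n_train,) binary labels (0 = normal, 1 = break)
    test_window:   (n_vars, window_len) window of the instance to classify
    """
    n_vars = test_window.shape[0]
    features = np.empty(n_vars, dtype=int)
    for v in range(n_vars):
        # Euclidean distance to every training window, computed
        # separately for variable v as in Equation (6).
        dists = np.linalg.norm(train_windows[:, v, :] - test_window[v], axis=1)
        # CL-LNN feature: the class label of the closest training instance.
        features[v] = train_labels[np.argmin(dists)]
    return features
```

The resulting binary vector is what would be passed to the decision tree classifier of Section 3.6.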
Another feature we propose in this article is the RD-LNN. While the CL-LNN can be viewed as a hard-classification feature whose outcome is strictly 0 or 1, the RD-LNN provides soft-classification features, which can be seen as probability-like features. Although the binary feature extracted by the CL-LNN indicates which class is closest to the instance under consideration, it cannot measure the significance or strength of that information. Consider the three cases of classifying instances with nearest neighbor-based feature extraction shown in Figure 4. In the first case, a clear decision boundary makes it easy to separate the two groups, and the CL-LNN might show superior performance. However, the outliers in the second case make it more challenging to classify the target instance. Suppose the CL-LNN for a given instance is, say, 1, a break signal. To build a robust prediction model, we may also want to know how reliable this signal is. In the second case, even though the nearest neighbor is a break signal, it is an outlier with respect to the majority of the other break signals, so relying solely on its class label may be risky. To remedy this pitfall of binary features, we may measure the distances from the nearest neighbor to the other instances of the same class. If these distances are large, the nearest neighbor lies far from the majority of its class and does not provide reliable information; if they are small, the nearest neighbor represents its class well and the information it provides is more accurate. In lieu of a direct distance measure, we use a probability measure similar to the computation of a P-value in statistical hypothesis testing. In our dataset, however, the rare events (i.e., breaks) appear nearly indistinguishable from the other instances, which makes the two groups difficult to separate, as in the third case. In this situation, we found it more effective to measure the relative distance within each group separately, instead of applying the same nearest neighbor to both groups.

Specifically, given an instance whose class label has to be predicted, the Euclidean distances to all instances in the training dataset are computed, and the nearest neighbor for each class label (y = 0, y = 1) is found. Let d*_0 and d*_1 be the distances to the nearest neighbors with class labels 0 and 1, respectively. We can also find the approximating normal distribution for each class. Let X_0 and X_1 be random variables following these approximated normal distributions. The RD-LNN features are computed as P(X_0 ≤ d*_0) and P(X_1 ≤ d*_1), which can be interpreted as the probability that an observation is located farther from the center of each class than the nearest neighbor. That is,
$$P_i = P(X_i \le d_i^{*}) = \Phi\!\left(\frac{d_i^{*} - \mu_i}{s_i}\right), \qquad i = 0, 1, \tag{8}$$

where $\mu_i$ and $s_i$ are the mean and standard deviation of the approximated normal distribution for class $i$.
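A minimal Python sketch of this computation is given below. The excerpt does not spell out exactly which distances the normal distribution is fitted to, so we assume it approximates the distances from the test instance to the training instances of each class, and we use a single flattened multivariate window rather than the per-variable treatment of the CL-LNN; names and shapes are our own illustration.

```python
import numpy as np
from scipy.stats import norm

def rd_lnn_features(train_windows, train_labels, test_window):
    """Relative-distance features (RD-LNN) following Equation (8)."""
    # Flatten each multivariate window into one vector for the distance.
    X = train_windows.reshape(len(train_windows), -1)
    x = test_window.reshape(-1)
    dists = np.linalg.norm(X - x, axis=1)

    feats = []
    for label in (0, 1):
        class_dists = dists[train_labels == label]
        d_star = class_dists.min()                 # d*_i: distance to the nearest neighbor of class i
        mu = class_dists.mean()                    # mu_i of the normal approximation (assumed fit)
        s = class_dists.std(ddof=1)                # s_i of the normal approximation (assumed fit)
        feats.append(norm.cdf((d_star - mu) / s))  # P_i = Phi((d*_i - mu_i) / s_i)
    return np.array(feats)                         # [P_0, P_1]
```

These two probability-like values serve as the RD-LNN features of the instance.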