PAPERmaking! Vol6 Nr2 2020

LEE AND SEO


To select the appropriate distance measure between Euclidean distance and DTW distance, a separate experiment is conducted to compare their performance; the result is shown in Section 4. For our proposed algorithms, Euclidean distance is used to measure the distance between two time-series instances, as the experiment shows that Euclidean distance requires much less computation time than DTW without a significant difference in performance between the two methods.

3.5 Nearest neighbor-based feature extraction

One possible way to extend the 1-NN for single-stream time-series data to the case of multistream signals would be to use the sum of the Euclidean distances measured by Equation (6) for each variable as the similarity between two multistream window instances. In this case, however, the information of all variables is aggregated, which results in the loss of each variable's information and of the relationships between variables. Instead, we look for the nearest neighbor considering each variable separately, which we call the local nearest neighbor (ie, the nearest neighbor in an embedded space of a single stream), and extract scalar features from it. These features are fed into different classification algorithms depending on the types of extracted features.

Algorithm 1.
CL-LNN feature extraction
 1: Input: multistream window instances W_train for training and W_test for testing; class labels of training instances y_train; index set of training data Train; index set of test data Test
 2: Output: binary feature matrices X_train with elements x_ij, i ∈ Train, and X_test with elements x_ij, i ∈ Test
 3:
 4: for i ∈ Train do
 5:   for j ∈ {1, …, p} do
 6:     d* ← L                        ▷ L is a large number used for initialization
 7:     for k ∈ Train ⧵ {i} do        ▷ all instances in training data except itself (LOO-CV)
 8:       d ← D_ED(w_ij, w_kj)
 9:       if d ≤ d* then k* ← k, d* ← d end if
10:     end for
11:     x_ij ← y_{k*}                 ▷ store class label of the local nearest neighbor as a feature
12:   end for
13: end for
14:
15: for i ∈ Test do
16:   for j ∈ {1, …, p} do
17:     d* ← L                        ▷ L is a large number used for initialization
18:     for k ∈ Train do              ▷ all instances in training data
19:       d ← D_ED(w_ij, w_kj)
20:       if d ≤ d* then k* ← k, d* ← d end if
21:     end for
22:     x_ij ← y_{k*}                 ▷ store class label of the local nearest neighbor as a feature
23:   end for
24: end for

The first feature we propose is the CL-LNN, which is given as 0 or 1 for each variable. Algorithm 1 outlines the procedure of the CL-LNN feature extraction, in which the MSTS data are converted into binary feature matrices X_train and X_test. The local nearest neighbor is found by leave-one-out cross-validation (LOO-CV) for each variable on the training dataset W_train. That is, for an instance of the training dataset, LOO-CV searches all the other instances in the training dataset except the instance itself and chooses the one that gives the highest matching with it, which is simple but effective for 1-NN.38
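As a concrete illustration, the procedure of Algorithm 1 can be sketched in Python with NumPy, vectorized over instances instead of the explicit inner loops. The function name `cl_lnn_features` and the `(n, p, m)` array layout (n instances, p variables, window length m) are assumptions for this sketch, not notation from the paper:

```python
import numpy as np

def cl_lnn_features(W_train, y_train, W_test):
    """Sketch of CL-LNN feature extraction (per-variable local 1-NN).

    W_train : (n_train, p, m) array of training window instances
    y_train : (n_train,) array of binary class labels
    W_test  : (n_test, p, m) array of test window instances
    Returns binary feature matrices X_train (n_train, p) and X_test (n_test, p),
    where entry (i, j) is the class label of instance i's local nearest
    neighbor in variable j's single-stream embedded space.
    """
    n_train, p, _ = W_train.shape
    n_test = W_test.shape[0]
    X_train = np.empty((n_train, p), dtype=y_train.dtype)
    X_test = np.empty((n_test, p), dtype=y_train.dtype)

    for j in range(p):
        # Pairwise Euclidean distances between single-stream windows of variable j.
        d_tr = np.linalg.norm(W_train[:, None, j, :] - W_train[None, :, j, :], axis=-1)
        np.fill_diagonal(d_tr, np.inf)  # LOO-CV: exclude the instance itself
        X_train[:, j] = y_train[np.argmin(d_tr, axis=1)]

        # Test instances search the whole training set (no exclusion).
        d_te = np.linalg.norm(W_test[:, None, j, :] - W_train[None, :, j, :], axis=-1)
        X_test[:, j] = y_train[np.argmin(d_te, axis=1)]
    return X_train, X_test
```

One design difference from the pseudocode: ties are resolved by `argmin` (first minimum) rather than by the `d ≤ d*` update rule (last minimum); for continuous-valued signals exact ties are rare, so the extracted features are the same in practice.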
