LEE AND SEO
Sensor measurements are collected every 2 minutes, and our algorithm takes less than 20 seconds to analyze one measurement and detect a failure. The algorithm is therefore a feasible solution in a real-world environment, where an advance warning lets technicians take appropriate action to prevent a breakdown. Nevertheless, the computational cost could be reduced further for deployment. One possible approach for real-time application relies on the fact that the Euclidean distance is computed from the squared differences between two instances at m time points (see Equation (6)). If we store these squared differences for t = 1 to t = m, the distance can be updated when a new signal is measured at t = m + 1 by dropping the term at t = 1 and adding the term at t = m + 1. Because the terms previously computed for t = 2, …, m are reused, only the term for t = m + 1 has to be computed, which saves considerable time in calculating the distance.
It should also be noted that the test dataset is standardized with the mean and SD obtained from the training dataset, since these parameters of the test dataset are not available during model training. This could degrade performance if new measurements differ significantly from the previous ones (the training dataset). In this article we assume that future examples will have a mean and SD similar to those of the training dataset; where that assumption does not hold, the problem can be alleviated by updating these parameters as new measurements are gained.
Even though the cost-benefit analysis shows promising results, further research to overcome the rare event situation is still necessary, since performance improvement is limited by insufficient labeled data, a problem from which most machine learning algorithms suffer.
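The incremental distance update described above can be sketched as follows. This is a minimal illustration, not the implementation used in this article: the class name `SlidingEuclidean` and its methods are hypothetical, and the sketch assumes that both instances are observed at the same time points, so that the squared differences already computed for t = 2, …, m remain valid when the window advances.

```python
from collections import deque
import math


class SlidingEuclidean:
    """Euclidean distance over the last m time points, updated incrementally.

    Hypothetical helper for illustration. Assumes the two instances advance
    in lockstep, so stored squared differences can be reused as-is.
    """

    def __init__(self, m):
        if m < 1:
            raise ValueError("m must be at least 1")
        self.m = m
        self.terms = deque()  # squared difference stored per time point
        self.total = 0.0      # running sum of the stored terms

    def update(self, x, y):
        """Add the measurements at the newest time point; return the distance.

        x and y are equal-length sequences of sensor values at that time point.
        """
        # Only the term for the newest time point is computed.
        term = sum((a - b) ** 2 for a, b in zip(x, y))
        self.terms.append(term)
        self.total += term
        # Once the window holds more than m terms, drop the oldest one.
        if len(self.terms) > self.m:
            self.total -= self.terms.popleft()
        # Guard against a tiny negative sum caused by floating-point rounding.
        return math.sqrt(max(self.total, 0.0))


# Usage: one sensor, window of m = 3 time points.
dist = SlidingEuclidean(m=3)
for value in (1.0, 2.0, 2.0, 3.0):
    current = dist.update([0.0], [value])
```

Each update costs one squared-difference computation per sensor instead of m. Because the running sum is adjusted by repeated addition and subtraction, rounding error can accumulate over a long stream; recomputing the sum from the stored terms at intervals is one way to limit this.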
More effort should be made to overcome the lack of failure data, which is commonly encountered when collecting data on rare events such as failures, spam email, and fraudulent credit card transactions. The concept of active learning could provide a possible solution to the extremely rare event problem, in which the dataset is severely imbalanced (skewed) and only a small amount of initial training data is available. The basic idea of active learning is that a machine learning algorithm can achieve better performance with fewer labeled training data if it is allowed to choose the data from which it learns. Therefore, we might be able to obtain better performance by adopting active learning algorithms in our future research.
ACKNOWLEDGEMENT
We are very grateful to the two anonymous reviewers and the Editor-in-Chief for their comments on the article.
PEER REVIEW INFORMATION
Engineering Reports thanks Giovanna Martinez Arellano and other anonymous reviewer(s) for their contribution to the peer review of this work.
CONFLICT OF INTEREST
The authors have no potential conflict of interest to declare.
PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/eng2.12291.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in arXiv.org at https://arxiv.org, reference number arXiv:1809.10717.
ORCID
Kangwon Seo https://orcid.org/0000-0002-2128-4079
REFERENCES
1. Bajpai P. Basic Overview of Pulp and Paper Manufacturing Process. New York, NY: Springer; 2015:11-39.
2. Ranjan C, Mustonen M, Paynabar K, Pourak K. Dataset: rare event classification in multivariate time series; 2018. arXiv preprint arXiv:1809.10717.
3. Montgomery DC. Introduction to Statistical Quality Control. Hoboken, NJ: John Wiley & Sons; 2012.
4. Karim F, Majumdar S, Darabi H, Harford S. Multivariate LSTM-FCNs for time series classification. Neural Netw. 2019;116:237-245.