Machines have penetrated almost every sphere of our life: healthcare, manufacturing, transportation, agriculture, logistics and so on. Today, so many processes depend on equipment that a single failure can lead to catastrophic consequences. To avoid or mitigate possible future failures, predictive maintenance was designed. Predictive maintenance is a set of techniques created to predict machine failures and prevent equipment from malfunctioning by performing timely maintenance. We wanted to test our abilities and show off our predictive maintenance skills by attempting to calculate possible future failures of production machines.

## Time Series Forecast

To perform predictive maintenance, first, we need to add sensors to the system that will monitor and collect data about its operations. In our case, we’ll be using vibration sensors (VS). The data used for predictive maintenance is time series data. A time series, in statistical literature, is a series of observations at various moments of time. Most commonly, a time series is a sequence taken at successive equally spaced points in time.

For this experiment, we wanted to use the data from the vibration sensors that measure the vibration magnitude of production machines. Unfortunately, we installed them not that long ago and there wasn’t enough data to make any predictions. So, we decided to mimic the data using an existing source, the ElectricityLoadDiagrams20112014 data set. It contains electricity consumption data for 370 points/clients. We took one column from this data set and scaled it to our range.

*1.1 Line plot of the generated data*

## Autoregression Models for Time Series Forecasting

An autoregression model assumes that observations at previous time steps are useful for predicting the value at the next time step. This relationship between variables is called a correlation. When a correlation is calculated between a variable and itself at previous time steps, it’s called an autocorrelation. We made a quick visual check to see if there was an autocorrelation in our time series data set.

*1.2 Autocorrelation plot of generated data*

As you can see, this plot displays the observation at one time step (t) against the observation at the next time step (t+1) as a scatter plot. The points for the fake VS data cluster along the diagonal of the plot, which indicates a strong positive correlation between consecutive observations.
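The visual check can be backed by a number. Here is a minimal sketch that computes the correlation between each observation and the one that follows it. The `lag_correlation` helper and the sine-wave signal are our own illustrative stand-ins, not the actual VS data.

```python
import math


def lag_correlation(series, lag=1):
    """Pearson correlation between series[t] and series[t + lag]."""
    x = series[:-lag]
    y = series[lag:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical stand-in for sensor readings: a smooth periodic signal.
signal = [math.sin(2 * math.pi * t / 50) for t in range(500)]
r = lag_correlation(signal, lag=1)
print(f"lag-1 correlation: {r:.3f}")
```

A value close to 1 corresponds to points lying along the diagonal of the scatter plot.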

### Checking Stationarity of Time Series

A time series (TS) is said to be stationary if its statistical properties such as mean and variance remain constant over time. This is important because most of the TS models work on the assumption that the TS is stationary. Intuitively, we can say that if a TS has a particular behaviour over time, there is a very high probability that it will keep behaving the same in the future. Also, the theories related to stationary series are more mature and easier to implement than non-stationary series theories.

The formal definition of stationarity uses very strict criteria, but for practical purposes, let’s assume that a series is stationary if it has the following statistical properties:

- constant mean

- constant variance

- autocovariance that does not depend on time

So, to check the stationarity, we’ll be using the rolling statistics plots along with the Dickey-Fuller test results.

*1.3 Rolling mean and standard deviation plots*

*1.4 Results of the Dickey-Fuller test*
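To make the rolling statistics concrete, here is a plain-Python sketch. The `rolling_stats` helper, the window size and the toy series are assumptions made for illustration; with pandas, `Series.rolling(window).mean()` and `.std()` give the same kind of result.

```python
import statistics


def rolling_stats(series, window):
    """Return (means, stds) computed over each full window of the series."""
    means, stds = [], []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        means.append(statistics.fmean(chunk))
        stds.append(statistics.stdev(chunk))
    return means, stds


# Hypothetical toy series with an upward trend, so the rolling mean drifts.
trending = [0.1 * t + (1 if t % 2 else -1) for t in range(200)]
means, stds = rolling_stats(trending, window=20)
print(f"first rolling mean: {means[0]:.2f}, last rolling mean: {means[-1]:.2f}")
```

A rolling mean that drifts like this is a sign of non-stationarity. For the Dickey-Fuller test itself, statsmodels provides the `adfuller` function in `statsmodels.tsa.stattools`.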

As you can see on the plot, the data has a strong seasonal component. We can neutralize this component and make the data stationary by applying seasonal differencing. That is, from each observation we subtract the observation that occurred one assumed cyclical period earlier. This usually works well for improving stationarity. We can use decomposition to determine this period. In this approach, trend and seasonality are modeled separately, and the remaining part of the series is returned.

*1.5 Decomposition plots*

Take a closer look at the trend plot. Let’s assume the period value is 100 time steps.

*1.6 The scaled-up trend plot from the decomposition plots*

After differencing, the plot of our data will look like this:

*1.7 Line plot after differencing data*
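Seasonal differencing and its inverse can be sketched in a few lines. The function names and the toy data below are hypothetical, and the period is set to 4 only to keep the example readable (we assumed 100 time steps for our data).

```python
def seasonal_difference(series, period):
    """Subtract the value observed `period` steps earlier from each value."""
    return [series[i] - series[i - period] for i in range(period, len(series))]


def invert_seasonal_difference(history, diffs, period):
    """Rebuild original-scale values from seasonal differences.

    `history` holds the original-scale values that immediately precede
    `diffs`; it must contain at least `period` values.
    """
    restored = list(history)
    for d in diffs:
        # Each restored value is reused when rebuilding later steps.
        restored.append(d + restored[-period])
    return restored[len(history):]


period = 4  # toy value for illustration
data = [10, 20, 30, 40, 11, 21, 31, 41, 12, 22, 32, 42]
diffed = seasonal_difference(data, period)
rebuilt = invert_seasonal_difference(data[:period], diffed, period)
print(diffed)
print(rebuilt == data[period:])
```

The same inverse step is what brings differenced forecasts back to the original scale before they are compared with the test data.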

It seems like this has considerably reduced the trend. Let’s verify it using the rolling mean and standard deviation plots:

*1.8 Rolling mean & standard deviation plots after differencing*

*1.9 Results of the Dickey-Fuller test after differencing*

The rolling values of the fake VS data appear to vary slightly, but there’s no specific trend. Also, the test statistic is smaller than the 5% critical value, so we can reject the null hypothesis of non-stationarity at the 5% significance level and treat this as a stationary series.

### Autocorrelation and Partial Autocorrelation Plots

Now, let’s make predictions with the Autoregressive Integrated Moving Average (ARIMA) model. The ARIMA forecast for a stationary time series is just a linear equation, much like a linear regression. The predictors depend on the parameters (p, d, q) of the ARIMA model. We can use the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots to get initial estimates of these numbers. After that, we’ll use grid search to find the optimal hyperparameters for the ARIMA model.
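For reference, the linear equation can be written out in standard textbook notation (the symbols are ours and do not come from the model output). Here $y'_t$ is the series after $d$ rounds of differencing, $c$ is a constant, $\phi_i$ are the autoregressive coefficients, $\theta_j$ are the moving-average coefficients and $\varepsilon_t$ is the error at time $t$:

$$
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
$$

So p controls how many past values enter the equation, and q controls how many past errors do.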

*1.10 Autocorrelation and Partial Autocorrelation plots*

Let’s determine the values of p and q.

**p** is the lag value where the PACF chart crosses the upper confidence interval for the first time; in this case, **p=10**

**q** is the lag value where the ACF chart crosses the upper confidence interval for the first time; in this case, **q=2**

Before fitting the model, the data will be divided into **train_ts** and **test_ts**. **test_ts** won’t be involved in model fitting; we’ll use it only to measure the accuracy of predictions.

To build the model, we’ll be using statsmodels, a Python library that provides classes and functions for estimating many different statistical models. So, let’s fit the **arima_model** from **statsmodels.tsa** with the following parameters: **p=10, q=2, d=4**, and differenced data from **train_ts**.

Here is the summary of model fitting:

*1.11 ARIMA Model Results*

Now, we can make multistep predictions with the **predict()** function. Let's compare the existing data with the predicted values after inverse differencing:

*1.12*

Here is a closer look at the predicted values against the test data, which we use to measure the accuracy of the prediction:

*1.13*

Here’s the accuracy of predictions:

- Mean Absolute Percentage Error (MAPE): **14.149%**

- Root Mean Square Error (RMSE): **81.381**
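Both metrics are simple to compute by hand. The sketch below uses made-up numbers purely for illustration, and the function names are ours.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def rmse(actual, predicted):
    """Root mean square error, in the same units as the data."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values, not our test data.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
mape_value = mape(actual, predicted)
rmse_value = rmse(actual, predicted)
print(f"MAPE: {mape_value:.3f}%")
print(f"RMSE: {rmse_value:.3f}")
```

MAPE is relative to the size of each observation, while RMSE is expressed in the units of the data, so the two numbers are not directly comparable with each other.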

### Long Short-Term Memory Recurrent Network

Another way to make a multistep time series forecast is to use the long short-term memory recurrent network (LSTM). For this, we will be using the LSTM model in Keras. We’ll use the same fake VS data for the next prediction, but slightly transformed. One transformation will be differencing, like in the example above; another will turn the time series into a supervised learning problem.

The LSTM model in Keras assumes that the input data is divided into the input (X) and the output (Y) components. For our time series problem, we can do this by using the observation from the last time step (t-1) as the input and the observation from the current time step (t) as the output. After this transformation, we get a 2D matrix of values:

*1.14*
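The transformation into input/output pairs can be sketched as follows. The `series_to_supervised` name and the sample values are hypothetical.

```python
def series_to_supervised(series, lag=1):
    """Pair each observation with the one `lag` steps before it.

    Returns a list of [input, output] rows: [value at t-lag, value at t].
    """
    return [[series[i - lag], series[i]] for i in range(lag, len(series))]


rows = series_to_supervised([5, 7, 6, 9, 8])
for x, y in rows:
    print(x, y)
```

Note that the result has one row fewer than the series, because the first observation has no predecessor to serve as its input.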

Like other neural networks, LSTMs expect data to be within the scale of the activation function used by the network. The default activation function for LSTMs is the hyperbolic tangent (tanh) that outputs values between -1 and 1. This is the preferred range for the time series data. So, our next step is transforming the data set to the range [-1, 1] using the MinMaxScaler class. For example:

*1.15*
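To show what the scaling does, here is a plain-Python stand-in for a min-max scaler with a target range of [-1, 1]. The `RangeScaler` class is hypothetical and written only for illustration; it follows the usual min-max formula and is not the scikit-learn implementation.

```python
class RangeScaler:
    """Hypothetical min-max scaler with a configurable target range."""

    def __init__(self, low=-1.0, high=1.0):
        self.low, self.high = low, high

    def fit(self, values):
        # Remember the data range so the scaling can be inverted later.
        self.data_min, self.data_max = min(values), max(values)
        return self

    def transform(self, values):
        span = self.data_max - self.data_min
        width = self.high - self.low
        return [self.low + (v - self.data_min) * width / span for v in values]

    def inverse_transform(self, scaled):
        span = self.data_max - self.data_min
        width = self.high - self.low
        return [self.data_min + (s - self.low) * span / width for s in scaled]


train = [20.0, 35.0, 50.0, 80.0]  # made-up values
scaler = RangeScaler().fit(train)
scaled = scaler.transform(train)
restored = scaler.inverse_transform(scaled)
print(scaled)
```

Keeping the fitted minimum and maximum around matters, because the predictions have to be mapped back to the original scale before the error is measured.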

By default, the LSTM layer in Keras maintains the state between data within one batch and expects the input data to be a matrix with dimensions [samples, time steps, features]. So, before fitting the model, we’ll reshape the 2D matrix into a 3D matrix by adding a fixed time step.
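The reshape itself is a one-liner with NumPy. In this sketch the array values are made up, and we assume one time step and one feature per sample.

```python
import numpy as np

# Hypothetical supervised rows: column 0 is the input X, column 1 the output y.
supervised = np.array([[0.1, 0.2], [0.2, 0.4], [0.4, 0.3], [0.3, 0.5]])
X, y = supervised[:, 0], supervised[:, 1]

# [samples, time steps, features] with one time step and one feature.
X_3d = X.reshape(X.shape[0], 1, 1)
print(X_3d.shape)
```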

Now, we’ll implement multistep prediction using a recursive strategy: to predict the next step, we’ll use the previously predicted step as the input. After making the predictions, we’ll first need to invert the transformations to return the data to its original scale and then measure the accuracy of the predictions.
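The recursive strategy boils down to a short loop. In the sketch below, `toy_model` is a hypothetical one-step predictor standing in for the fitted LSTM.

```python
def recursive_forecast(predict_next, last_value, steps):
    """Forecast `steps` values, feeding each prediction back in as the next input."""
    forecasts = []
    current = last_value
    for _ in range(steps):
        current = predict_next(current)
        forecasts.append(current)
    return forecasts


def toy_model(x):
    """Hypothetical one-step model: it simply halves its input."""
    return 0.5 * x


forecast = recursive_forecast(toy_model, last_value=8.0, steps=3)
print(forecast)
```

Because every step builds on the previous prediction, errors tend to accumulate with the forecast horizon.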

*1.16 Plot of predicted and test data*

Based on this, we get the following RMSE:

t+1 RMSE: 26.204599

t+2 RMSE: 32.382841

t+3 RMSE: 37.521856

t+4 RMSE: 39.638621

t+5 RMSE: 43.440611

### Summary

As the plots and error metrics show, our predictions are fairly close to the real data. Also, the LSTM RNN model shows better results than the ARIMA model, with an RMSE of 43.44 at t+5 as opposed to 81.381. Another benefit of using the long short-term memory network is that there is no need to determine the trend size. These predictions were made for 200 time steps.

This suggests that, had our experiment been conducted in real-life conditions, it would have allowed us to monitor the state of machines and prevent malfunctions before they happen. Knowing when a piece of equipment may go out of order or will need maintenance is valuable for any industry that uses machinery. When done right, predictive maintenance saves valuable time and money, allows you to better plan maintenance work, increases equipment life span and decreases accidents.