Tesla stock price prediction using stacked LSTMs

Saif Gazali
7 min read · Apr 22, 2021


Introduction

Recurrent neural networks (RNNs) differ from traditional feed-forward neural networks in that they keep information about past inputs and use that context to make predictions. However, RNNs suffer from the vanishing gradient problem and struggle to learn once the number of timesteps grows beyond roughly 5–10; as the gap between an input and the point where its context is needed grows, a plain RNN becomes unable to learn to connect the information.

Long Short-Term Memory networks (LSTMs) are a kind of recurrent neural network designed to remember information over long periods, avoiding the long-term dependency problem of a plain RNN. Their gating mechanism largely mitigates the vanishing gradient problem and also helps keep gradients from exploding.

For more in-depth information on RNNs and LSTMs, please refer to these two articles.

A Gentle Introduction to Long Short-Term Memory Networks

Understanding LSTM Networks

Dataset

The data used here is the Tesla stock price from 2016–2021, which can be found on Yahoo Finance. The aim is to predict the closing price for each day. The model will be trained on the stock data from 2016 to 2019 and then used to predict prices from 2020 to 2021, which amounts to roughly 75% of the data for training and 25% for testing.
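If you prefer to pull the data programmatically rather than downloading a CSV manually, a minimal sketch using the yfinance package could look like this (the exact date range and filename are assumptions; the rest of the post simply reads a local TSLA.csv):

# Optional: fetch the Tesla prices from Yahoo Finance with yfinance
# (assumed date range roughly matching the post; adjust as needed)
import yfinance as yf

tsla = yf.download('TSLA', start='2016-01-01', end='2021-04-01')
tsla.to_csv('TSLA.csv')  # same file the rest of the post reads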

import pandas as pd
import numpy as np

# Parse the Date column and use it as the index so the plots below can use date-based ticks
df = pd.read_csv('TSLA.csv', parse_dates=['Date'], index_col='Date')
df
Tesla stock price data — “Close” will be used for forecasting

Plotting the closing price for the stock would yield the following graph. The training data is shown in blue colour and the testing data is shown in green colour.

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

years = mdates.YearLocator()  # one tick per year on the x-axis

plt.figure(figsize=(26,12))
plt.plot(df.Close[:942], color='blue', label='Training')
plt.plot(df.Close[942:1258], color='green', label='Testing')
plt.gca().xaxis.set_major_locator(years)
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.legend(loc='upper left')
plt.show()
Tesla stock closing price data segregated into training and testing

Feature Creation

Stationarity

A stationary time series is one whose statistical properties do not depend on the time at which the series is observed; properties such as the mean, variance, and autocorrelation are constant over time. A series affected by trend or seasonality is therefore not stationary, and the Tesla closing price plotted above is clearly non-stationary.
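One quick way to check this is the augmented Dickey-Fuller test from statsmodels; a minimal sketch (a large p-value suggests we cannot reject non-stationarity):

# Optional stationarity check with the augmented Dickey-Fuller test
from statsmodels.tsa.stattools import adfuller

result = adfuller(df.Close.dropna())
print('ADF statistic:', result[0])
print('p-value:', result[1])  # large p-value => series looks non-stationary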

Differencing

The process of making a non-stationary time series stationary by computing the difference between consecutive observations is known as differencing. It reduces trend and seasonality by stabilizing the mean of a time series. Further transformations using logarithms stabilize the variance of a time series.

We perform the differencing using the pct_change() method. Consider the second row of the table below: to obtain its per cent change, the method divides that day's close by the previous day's close and subtracts one, i.e. 50.776001 / 50.902000 - 1 = -0.0024753251345722704.
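To make the calculation concrete, the same arithmetic can be spelled out for that row, using the two closing prices quoted above:

# Per cent change for the second row: today's close over yesterday's close, minus one
prev_close, close = 50.902000, 50.776001
print(close / prev_close - 1)  # approximately -0.0024753, matching the value above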

df = df[['Close']]                       # keep only the closing price
df['returns'] = df.Close.pct_change()    # daily per cent change
df.head()
df['log_returns'] = np.log(1 + df['returns'])  # log returns
df.head()

Visualizing our stationary time series

plt.figure(figsize=(26,12))
plt.plot(df.log_returns[:942],color='blue')
plt.plot(df.log_returns[942:1258],color='green')
plt.gca().xaxis.set_major_locator(years)
plt.xlabel('Date')
plt.ylabel('Log returns')
plt.show()
Closing stock price stationary time series

Data Preprocessing

Drop the null values (the first row, which has no previous day to difference against) and extract the closing price and the newly created log_returns column as an array.

df.dropna(inplace=True)
X = df[['Close','log_returns']].values

Normalization of Series Data

Normalization rescales the data from its original range so that all values lie between 0 and 1. The scikit-learn MinMaxScaler object can be used for this: first the scaler is fitted on the dataset to estimate the minimum and maximum values, then the dataset is scaled with the transform() function, and the same fitted scaler is reused wherever the scaling (or its inverse) is needed later.

from sklearn.preprocessing import MinMaxScaler

scalar = MinMaxScaler(feature_range=(0,1)).fit(X)
X_scaled = scalar.transform(X)
X_scaled[:5]
y = [x[0] for x in X_scaled]  # the scaled closing price is the prediction target
y

Splitting the dataset into 75% for training and 25% for testing gives us 942 rows in the training set and 315 rows in the testing set.

split = int(len(X_scaled)*0.75)
X_train = X_scaled[:split]
X_test = X_scaled[split:len(X_scaled)]
Y_train = y[:split]
Y_test = y[split:len(y)]
print(len(X_train)) # 942 observations
print(len(X_test)) # 315 observations

Preparing Sequence Data

We use the closing prices of the first 15 days of Tesla stock to predict the closing price on the 16th day. To predict the closing price on the 17th day we use the closing prices of days 2–16, and so on.

no_of_days = 15  # looking 15 days into the past
Xtrain, Xtest, Ytrain, Ytest = [], [], [], []

for i in range(no_of_days, len(X_train)):
    Xtrain.append(X_train[i - no_of_days:i, :X_train.shape[1]])
    Ytrain.append(Y_train[i])  # predicting the next record

for i in range(no_of_days, len(X_test)):
    Xtest.append(X_test[i - no_of_days:i, :X_test.shape[1]])
    Ytest.append(Y_test[i])  # predicting the next record

Checking the first sequence, we get the following 15 days of closing prices.

Xtrain[0] 
First 15 days closing price
Ytrain[0]
16th-day closing price

LSTMs expect the input to have the shape (number of observations, timesteps, number of features), so we need to reshape our data to this form.

Xtrain, Ytrain = np.array(Xtrain), np.array(Ytrain)
Xtest, Ytest = np.array(Xtest), np.array(Ytest)
# np.array already yields (observations, timesteps, features); the reshape makes that shape explicit
Xtrain = np.reshape(Xtrain, (Xtrain.shape[0], Xtrain.shape[1], Xtrain.shape[2]))
Xtest = np.reshape(Xtest, (Xtest.shape[0], Xtest.shape[1], Xtest.shape[2]))
print(Xtrain.shape)
print(Xtest.shape)
print(Ytrain.shape)
print(Ytest.shape)
Input shapes of the training and test set

From the output above, the training set has 927 observations (942 − 15) with 15 timesteps and 2 features, and the test set has 300 observations (315 − 15) with 15 timesteps and 2 features.

Model Creation

A stacked LSTM consists of multiple recurrent layers stacked on top of each other, which allows the hidden state at each level to operate at a different timescale.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense,LSTM,Dropout

Our model has 3 LSTM layers with 50, 100, and 150 units (the dimensionality of the output space) respectively, with a dropout layer after each LSTM layer to reduce overfitting. The loss function is mean squared error and the Adam optimizer is used to minimize it. These settings were arrived at after hyperparameter tuning.

model = Sequential()

# First LSTM layer returns full sequences so the next LSTM layer can consume them
model.add(LSTM(50, return_sequences=True, input_shape=(Xtrain.shape[1], Xtrain.shape[2])))
model.add(Dropout(0.2))

model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.4))

# Last LSTM layer returns only the final hidden state
model.add(LSTM(150))
model.add(Dropout(0.5))

model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.summary()
Model Summary

Training our model for 100 epochs with a batch size of 32.

model.fit(Xtrain,Ytrain,epochs=100,validation_data=(Xtest,Ytest),batch_size=32)
Training the model

Making Predictions

The model is used to make predictions for both the training set and the test set. However, the predicted values are normalized, so we need to convert them back to their original scale in order to plot them and compare them with the actual prices.

trainPredict = model.predict(Xtrain)
testPredict = model.predict(Xtest)
trainPredict.shape
The shape of our predicted values

To convert these normalized values back to their original scale, we use the inverse_transform method. However, the scaler was fitted on input with 2 features, i.e. shape (n, 2), while the predicted values have shape (n, 1). We therefore add a temporary zero-filled column so that inverse_transform can be applied.

#Adding temporary zero filled column
trainPredict = np.c_[trainPredict,np.zeros(trainPredict.shape)]
testPredict = np.c_[testPredict,np.zeros(testPredict.shape)]
#Converting the normalized predicted values back to original form
trainPredict = scalar.inverse_transform(trainPredict)
trainPredict = [x[0] for x in trainPredict]
#Converting the normalized predicted values back to original form
testPredict = scalar.inverse_transform(testPredict)
testPredict = [x[0] for x in testPredict]
#Adding temporary zero filled column
Ytrain = np.c_[Ytrain,np.zeros(Ytrain.shape)]
Ytest = np.c_[Ytest,np.zeros(Ytest.shape)]
#Converting the normalized existing values back to original form
Ytrain = scalar.inverse_transform(Ytrain)
Ytrain = [x[0] for x in Ytrain]
#Converting the normalized existing values back to original form
Ytest = scalar.inverse_transform(Ytest)
Ytest = [x[0] for x in Ytest]

Model Evaluation

Plotting our predictions for the training set

Plotting the predicted closing price along with the actual closing price for the stock would yield the following graph. The actual data is shown in red colour and the predicted data is shown in blue colour.

plt.figure(figsize=(26,12))
plt.plot(trainPredict, color='blue')
plt.plot(Ytrain, color='red')
plt.xlabel('Days')
plt.ylabel('Stock Price')
plt.show()
Predictions on the training set

Our model seems to fit the training set well. However, since this data was seen by the model during training, good predictions here do not guarantee how the model will behave on unseen data.
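One optional way to put a number on the fit is the root mean squared error (RMSE) of the rescaled prices; a minimal sketch using scikit-learn, applied to the lists produced above:

# Quantify the fit with RMSE on the original price scale
from sklearn.metrics import mean_squared_error
import numpy as np

train_rmse = np.sqrt(mean_squared_error(Ytrain, trainPredict))
test_rmse = np.sqrt(mean_squared_error(Ytest, testPredict))
print('Train RMSE:', train_rmse)
print('Test RMSE:', test_rmse)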

Plotting our predictions for the testing set

The actual data is shown in red colour and the predicted data is shown in blue colour.

plt.figure(figsize=(26,12))
plt.plot(testPredict, color='blue')
plt.plot(Ytest, color='red')
plt.xlabel('Days')
plt.ylabel('Stock Price')
plt.show()
Predictions on the test set

Our model also seems to perform well on the test data. Further improvements can be made by changing the hyperparameters, adding more regularization or dropout, or increasing the depth of the network.

Resources

Understanding LSTM

Machine Learning Mastery

Deep Learning with Python

Keras Documentation
