Time Series Analysis - Forecasting Stock Information

Financial Information using Python

About the dataset:

The dataset, Nike_historical_data.csv, contains historical trading data for Nike stock in eight columns: Date, Open, High, Low, Close, Volume, ticker and name. The analysis below focuses on the Close (closing price) column.

Step 1: Import the required libraries and packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13; this is its replacement
from statsmodels.tsa.arima.model import ARIMA

Step 2: Read the dataset using pandas

nike_data = pd.read_csv('/Users/Sample Datasets Kaggle/Nike_historical_data.csv')

Step 3: View the head of the dataset

nike_data.head(n=10)

Preprocessing: Scaling removes differences in magnitude between variables, so that no single variable dominates simply because it is measured on a larger scale (for example, share volume versus prices in dollars).

First check whether any variable dominates the others; the answer tells us whether this preprocessing is needed.

Step 4: Preprocessing: check whether any null values exist

nike_data.isnull().sum()

Date      0
Open      0
High      0
Low       0
Close     0
Volume    0
ticker    0
name      0
dtype: int64
Running nike_data.isna().sum() gives identical counts, because isna() is an alias of isnull(). No column has missing values, so no imputation is needed.

Step 5: Feature engineering: derive the 7-day and 50-day moving averages of the closing price, plus the daily return (the percentage change in the close from one row to the next)

nike_data['7-Day MA'] = nike_data['Close'].rolling(window=7).mean()
nike_data['50-Day MA'] = nike_data['Close'].rolling(window=50).mean()
nike_data['Daily Return'] = nike_data['Close'].pct_change()
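To make the rolling and percentage-change calls concrete, here is a minimal sketch on a made-up five-value price series (hypothetical numbers, not Nike data) with a 3-row window. It shows that rolling(window=n).mean() averages the current row and the previous n - 1 rows, and that the first n - 1 results, like the first pct_change() result, are NaN.

```python
import pandas as pd

# Toy closing prices (hypothetical values, not taken from the Nike dataset)
close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0])

# 3-row simple moving average: mean of the current and previous two values.
# The first window - 1 = 2 entries are NaN because the window is not yet full.
ma_3 = close.rolling(window=3).mean()

# Daily return: (P_t - P_{t-1}) / P_{t-1}; the first entry is NaN (no previous row).
daily_return = close.pct_change()

print(ma_3.round(4).tolist())
print(daily_return.round(4).tolist())
```

Because the window counts rows, the 7-Day MA column covers 7 trading days (assuming one row per trading day) rather than 7 calendar days. Its first 6 rows, the first 49 rows of 50-Day MA and the first row of Daily Return will therefore be NaN.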

Step 6: Scale the data. Min-max scaling maps the closing prices into the range 0 to 1.

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(nike_data['Close'].values.reshape(-1,1))
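Step 6 fits the scaler on the whole series, before the train/test split in Step 9, so the test period's minimum and maximum influence the scaled training data (a form of look-ahead leakage). A minimal sketch of the alternative, using hypothetical prices in place of nike_data['Close']: split chronologically first, fit MinMaxScaler on the training rows only, then apply the same transform to the test rows. The last line checks the min-max formula x' = (x - min) / (max - min) by hand.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical closing prices standing in for nike_data['Close'], shaped (n, 1)
prices = np.array([50.0, 60.0, 55.0, 70.0, 80.0, 90.0, 85.0, 100.0]).reshape(-1, 1)

# Chronological 80/20 split: the first 80% of rows form the training set
split = int(len(prices) * 0.8)
train, test = prices[:split], prices[split:]

# Fit on training rows only, so test-period min/max do not leak into training
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)

# Same formula by hand, using the training min and max for every row
manual = (prices - train.min()) / (train.max() - train.min())
print(train_scaled.ravel(), test_scaled.ravel())
```

Test values can fall outside 0 to 1 (here 100 maps to 1.25) because they exceed the training maximum; that is expected whenever prices move beyond the training range.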

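Step 7: Check whether the closing-price series is stationary. Many time series models, such as ARIMA, assume the series (after any differencing) is stationary, meaning its mean and variance do not change over time.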

from statsmodels.tsa.stattools import adfuller
close_series = nike_data['Close'].dropna()

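Step 8: Run the Augmented Dickey-Fuller (ADF) test

# ADF null hypothesis: the series has a unit root (is non-stationary); p <= 0.05 rejects it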
def check_stationarity(timeseries):
    result = adfuller(timeseries)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    if result[1] <= 0.05:
        print("The time series is stationary.")
    else:
        print("The time series is non-stationary.")
check_stationarity(close_series)
ADF Statistic: -0.957955
p-value: 0.768289
The time series is non-stationary.

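Step 9: Split the data into training and test sets (80/20)

Each sample uses the previous 100 scaled closing prices to predict the next one. Because this is a time series, the split must preserve chronological order: shuffling would place future prices in the training set.

from sklearn.model_selection import train_test_split

def create_dataset(data, time_step=1):
    """Build sliding windows of length time_step (X) and the value that follows each window (Y)."""
    X, Y = [], []
    for i in range(len(data) - time_step):
        X.append(data[i:(i + time_step), 0])   # time_step consecutive values
        Y.append(data[i + time_step, 0])       # the next value after the window
    return np.array(X), np.array(Y)

X, Y = create_dataset(scaled_data, time_step=100)
# shuffle=False keeps the first 80% of windows for training and the last 20% for testing
trainX, testX, trainY, testY = train_test_split(X, Y, test_size=0.2, shuffle=False)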
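To see what create_dataset returns, the sketch below runs a self-contained copy on a toy series of six values with time_step=3. Each row of X is a window of three consecutive values and the matching Y entry is the value that follows it. The loop bound len(data) - time_step is the largest that keeps data[i + time_step] in range, so every full window is used.

```python
import numpy as np

def create_dataset(data, time_step=1):
    """Turn an (n, 1) array into sliding windows X and next-step targets Y."""
    X, Y = [], []
    for i in range(len(data) - time_step):
        X.append(data[i:(i + time_step), 0])   # time_step consecutive values
        Y.append(data[i + time_step, 0])       # the value right after the window
    return np.array(X), np.array(Y)

# Toy series 0..5 shaped (n, 1), like the output of MinMaxScaler.fit_transform
toy = np.arange(6, dtype=float).reshape(-1, 1)
X_toy, Y_toy = create_dataset(toy, time_step=3)
print(X_toy)
print(Y_toy)
```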

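Step 10: Visualize the closing price with its moving averages

# Parse Date for a real time axis; utc=True also copes with timestamps that carry mixed UTC offsets
dates = pd.to_datetime(nike_data['Date'], utc=True)
plt.figure(figsize=(14,5))
plt.plot(dates, nike_data['Close'], label='Close Price History')
plt.plot(dates, nike_data['7-Day MA'], label='7-Day Moving Average')
plt.plot(dates, nike_data['50-Day MA'], label='50-Day Moving Average')
plt.title('Nike Stock Price with Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price USD ($)')
plt.legend()
plt.show()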

Step 11: Inspect the distribution of each numeric column with box plots

import math
import matplotlib.pyplot as plt
import seaborn as sns

# Box plots need numeric data, so Date, ticker and name are skipped
numeric_columns = nike_data.select_dtypes(include='number').columns
num_columns = 3
num_features = len(numeric_columns)
num_rows = math.ceil(num_features / num_columns)

fig = plt.figure(figsize=(15, num_rows * 5))
for i, column in enumerate(numeric_columns):
    ax = fig.add_subplot(num_rows, num_columns, i + 1)
    sns.boxplot(y=nike_data[column], ax=ax)
    ax.set_title(f'Box Plot of {column}')
plt.tight_layout()
plt.show()
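The box plots flag as outliers the points lying more than 1.5 × IQR below the first quartile or above the third quartile (the default whisker rule, whis=1.5, in seaborn and matplotlib box plots). A minimal sketch that counts those points per numeric column, shown on a hypothetical toy frame; on the real data you would call iqr_outlier_counts(nike_data). The helper name is our own illustration, not a library function.

```python
import pandas as pd

def iqr_outlier_counts(df: pd.DataFrame) -> pd.Series:
    """Count values beyond 1.5 * IQR from the quartiles, per numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # lt/gt align the per-column bounds on column names; NaNs compare False
    mask = numeric.lt(lower) | numeric.gt(upper)
    return mask.sum()

# Toy frame (hypothetical values); the non-numeric ticker column is skipped
toy = pd.DataFrame({
    "Close": [10, 11, 12, 11, 10, 40],
    "Volume": [100, 110, 105, 95, 100, 102],
    "ticker": ["NKE"] * 6,
})
print(iqr_outlier_counts(toy))
```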

Conclusion:

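  • The box plots show a wide spread and many outliers in price and volume. Outliers alone do not prove non-stationarity, but the wide range of price levels is consistent with the ADF result that the closing price is non-stationary.

  • Moving averages smooth short-term noise but still follow the price level; daily returns sit on a much smaller and more stable scale, yet still reflect the stock's underlying volatility.

  • The dataset is well suited to modelling, but the price series should be transformed (for example by differencing or by using returns) and re-tested for stationarity before fitting time series models such as ARIMA.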