Business & Data Research
Time Series Analysis - Forecasting Stock Information

Financial Information using Python

Mahesh Gurumoorthi
December 07, 2025

About the dataset:

The dataset is Nike's historical stock data, loaded in Step 2 from Nike_historical_data.csv (a Kaggle sample dataset). Each row records a Date, the Open, High, Low and Close prices, the traded Volume, and the ticker and company name.

Step 1: Import the required libraries and packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# statsmodels.tsa.arima_model was removed in statsmodels 0.13; the current ARIMA class lives here
from statsmodels.tsa.arima.model import ARIMA

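Step 2: Read the dataset using pandas

# Parse Date as a datetime and sort chronologically, since every later step assumes time order
nike_data = pd.read_csv('/Users/Sample Datasets Kaggle/Nike_historical_data.csv', parse_dates=['Date'])
nike_data = nike_data.sort_values('Date').reset_index(drop=True)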

Step 3: View the head of the dataset

nike_data.head(n=10)

Preprocessing: scaling removes differences in magnitude between variables, so that no single variable measured on a much larger scale (for example, Volume compared with the price columns) dominates the others.

Before scaling, check whether any variable dominates the others in magnitude; this tells us whether the preprocessing is needed.
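One quick way to check for a dominant variable is to compare the ranges of the numeric columns. The sketch below uses a hypothetical helper, `column_ranges` (it is not part of pandas), on made-up rows shaped like the Nike data; on the real data the call would be `column_ranges(nike_data)`.

```python
import pandas as pd

def column_ranges(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: max - min of every numeric column, largest first."""
    numeric = df.select_dtypes(include="number")  # skips text columns such as ticker
    return (numeric.max() - numeric.min()).sort_values(ascending=False)

# Made-up rows shaped like the Nike data (not real prices or volumes)
toy = pd.DataFrame({
    "Close": [100.0, 102.5, 101.0],
    "Volume": [5_000_000, 7_500_000, 6_000_000],
    "ticker": ["NKE", "NKE", "NKE"],
})

ranges = column_ranges(toy)
print(ranges)
# How many times wider the widest range is than the narrowest one
print(ranges.iloc[0] / ranges.iloc[-1])
```

A ratio of several orders of magnitude, as between Volume and Close here, is the kind of imbalance that makes scaling worthwhile.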

Step 4: Preprocessing: check whether any null values exist

nike_data.isnull().sum()

Date      0
Open      0
High      0
Low       0
Close     0
Volume    0
ticker    0
name      0
dtype: int64

nike_data.isna().sum()  # isna() is an alias of isnull(), so the counts are the same
Date      0
Open      0
High      0
Low       0
Close     0
Volume    0
ticker    0
name      0
dtype: int64

Step 5: Feature engineering: derive useful features such as the 7-day and 50-day moving averages of the closing price and the daily return

nike_data['7-Day MA'] = nike_data['Close'].rolling(window=7).mean()

nike_data['50-Day MA'] = nike_data['Close'].rolling(window=50).mean()

nike_data['Daily Return'] = nike_data['Close'].pct_change()
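To see what these features contain, here is a sketch on a hypothetical five-value price series, using a 3-day window for brevity. A rolling window of size w leaves its first w - 1 values as NaN, so in the real data the first 6 rows of 7-Day MA, the first 49 rows of 50-Day MA, and the first row of Daily Return are NaN.

```python
import pandas as pd

# Hypothetical closing prices (not Nike data)
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

ma3 = close.rolling(window=3).mean()  # mean of the current value and the 2 before it
daily_return = close.pct_change()     # (P_t - P_{t-1}) / P_{t-1}

print(pd.DataFrame({"Close": close, "3-Day MA": ma3, "Daily Return": daily_return}))
```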

Step 6: Scale the data: we use min-max scaling to map the closing prices into the range 0 to 1

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(nike_data['Close'].values.reshape(-1,1))
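Min-max scaling applies x' = (x - min) / (max - min) to each value. The sketch below checks that formula against `MinMaxScaler` on three hypothetical prices and shows `inverse_transform`, which maps scaled values (such as model forecasts) back to dollar prices.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

prices = np.array([[50.0], [75.0], [100.0]])  # hypothetical closing prices, one column

# Manual min-max scaling: x' = (x - min) / (max - min)
manual = (prices - prices.min()) / (prices.max() - prices.min())

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(prices)

print(manual.ravel())
print(np.allclose(manual, scaled))

# inverse_transform undoes the scaling, e.g. to report forecasts in USD
restored = scaler.inverse_transform(scaled)
print(restored.ravel())
```

Fitting the scaler on the full Close series, as in Step 6, lets the test period's minimum and maximum influence the scaling. Fitting it on the training portion only and calling `transform` on the test portion avoids that look-ahead.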

Step 7: Check whether the closing-price series is stationary

from statsmodels.tsa.stattools import adfuller

close_series = nike_data['Close'].dropna()

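Step 8: Run the Augmented Dickey-Fuller (ADF) test

# The ADF null hypothesis is that the series has a unit root (is non-stationary);
# a p-value at or below 0.05 rejects it at the 5% significance level.
def check_stationarity(timeseries):
    result = adfuller(timeseries)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    if result[1] <= 0.05:
        print("The time series is stationary.")
    else:
        print("The time series is non-stationary.")

check_stationarity(close_series)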
ADF Statistic: -0.957955
p-value: 0.768289
The time series is non-stationary.

Step 9: Split the data into training and test sets (80/20)

from sklearn.model_selection import train_test_split

def create_dataset(data, time_step=1):
    # Each sample is a window of `time_step` consecutive values; its target is the next value
    X, Y = [], []
    for i in range(len(data) - time_step):
        a = data[i:(i + time_step), 0]
        X.append(a)
        Y.append(data[i + time_step, 0])
    return np.array(X), np.array(Y)

X, Y = create_dataset(scaled_data, time_step=100)
# shuffle=False keeps time order; a random split would train on days that come after the test days
trainX, testX, trainY, testY = train_test_split(X, Y, test_size=0.2, shuffle=False)
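The windowing and the chronological split are easiest to see on a tiny series. `make_windows` is a hypothetical helper that follows the same idea as `create_dataset`, using every window that has a following target, and `np.arange(10)` stands in for `scaled_data`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_windows(data, time_step):
    """Hypothetical helper: each window of `time_step` values predicts the value after it."""
    X, Y = [], []
    for i in range(len(data) - time_step):
        X.append(data[i:i + time_step, 0])
        Y.append(data[i + time_step, 0])
    return np.array(X), np.array(Y)

series = np.arange(10, dtype=float).reshape(-1, 1)  # toy column vector
X, Y = make_windows(series, time_step=3)
print(X[0], Y[0])  # [0. 1. 2.] 3.0
print(X.shape)     # (7, 3)

# shuffle=False keeps time order, so every training target precedes every test target
trainX, testX, trainY, testY = train_test_split(X, Y, test_size=0.2, shuffle=False)
print(trainY.max() < testY.min())  # True
```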

Step 10: Plot the closing price with its moving averages

# Use the Date column on the x-axis so the plot matches its 'Date' label
dates = pd.to_datetime(nike_data['Date'])
plt.figure(figsize=(14, 5))
plt.plot(dates, nike_data['Close'], label='Close Price History')
plt.plot(dates, nike_data['7-Day MA'], label='7-Day Moving Average')
plt.plot(dates, nike_data['50-Day MA'], label='50-Day Moving Average')
plt.title('Nike Stock Price with Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()

Step 11: Inspect the distributions with box plots

import math
import matplotlib.pyplot as plt
import seaborn as sns

# Box plots only make sense for numeric columns, so skip Date, ticker and name
numeric_columns = nike_data.select_dtypes(include='number').columns
num_columns = 3
num_features = len(numeric_columns)
num_rows = math.ceil(num_features / num_columns)

fig = plt.figure(figsize=(15, num_rows * 5))
for i, column in enumerate(numeric_columns):
    ax = fig.add_subplot(num_rows, num_columns, i + 1)
    sns.boxplot(y=nike_data[column], ax=ax)
    ax.set_title(f'Box Plot of {column}')
plt.tight_layout()
plt.show()
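Box-plot whiskers commonly extend 1.5 × IQR beyond the quartiles (seaborn's default `whis=1.5`), and points past them are drawn as outliers. The hypothetical helper `iqr_outlier_count` below counts those points, so columns can be compared numerically instead of by eye.

```python
import pandas as pd

def iqr_outlier_count(series: pd.Series, whis: float = 1.5) -> int:
    """Hypothetical helper: count points below Q1 - whis*IQR or above Q3 + whis*IQR."""
    s = series.dropna()  # moving-average and return columns start with NaN
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return int(((s < lower) | (s > upper)).sum())

toy = pd.Series([1, 2, 3, 4, 5, 100])  # hypothetical values with one obvious outlier
print(iqr_outlier_count(toy))  # 1
```

Applied to the real data: `{col: iqr_outlier_count(nike_data[col]) for col in nike_data.select_dtypes(include='number').columns}`.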

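Conclusion:

The box plots show outliers in most metrics, especially price and volume. Outliers alone do not establish non-stationarity, because a box plot ignores time order; it is the ADF test in Step 8 (p-value 0.768) that shows the closing-price series is non-stationary.
Moving averages and daily returns offer more stable views, but still reflect the underlying volatility.
The dataset is well suited to modelling, but the price series should be transformed (for example, by differencing or by working with daily returns) before applying time series models such as ARIMA.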