Time Series Analysis - Forecasting Stock Information
Financial Information using Python
About the dataset:
The dataset holds daily price records for Nike stock. Each row carries the columns Date, Open, High, Low, Close, Volume, ticker and name, as the null-value check in Step 4 shows.
Step 1: Importing required libraries and packages
Step 2: Reading the dataset using pandas
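A minimal sketch of Steps 1 and 2, assuming the data sits in a CSV file with the columns listed in Step 4. The file name `nike_stock.csv` is hypothetical, and a two-row in-memory sample with made-up values stands in for it so the snippet runs on its own:

```python
import io

import pandas as pd

# Hypothetical two-row sample with made-up values; the real file would be read
# with pd.read_csv("nike_stock.csv", parse_dates=["Date"]) (file name assumed).
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Volume,ticker,name\n"
    "2020-01-02,10.0,11.0,9.5,10.5,1000,NKE,Nike\n"
    "2020-01-03,10.5,11.5,10.0,11.0,1200,NKE,Nike\n"
)

# parse_dates turns the Date column into datetime64 values
nike_data = pd.read_csv(sample_csv, parse_dates=["Date"])

# Keep rows in chronological order, which rolling windows rely on
nike_data = nike_data.sort_values("Date").reset_index(drop=True)
print(nike_data.dtypes)
```

Parsing and sorting by date up front is a design choice: the moving averages and the train/test split later in the post both depend on the rows being in time order.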
Step 3: View the head of the dataset
nike_data.head(n=10)
Preprocessing: Scaling puts the variables on a comparable range, so that no single variable with a large magnitude dominates the others.
First check whether any variable dominates the others in magnitude. The answer tells us whether this preprocessing is needed.
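One way to run this check is to compare the summary statistics of the numeric columns. A minimal sketch on a hypothetical frame with made-up values (the real check would run on `nike_data`):

```python
import pandas as pd

# Hypothetical stand-in for nike_data; the values are made up for illustration
df = pd.DataFrame({
    "Close": [10.5, 11.0, 10.8, 11.4],
    "Volume": [1_000_000, 1_200_000, 900_000, 1_500_000],
})

# Per-column min, max and mean show the difference in magnitude at a glance
summary = df.describe().loc[["min", "max", "mean"]]
print(summary)

# Ratio of the largest to the smallest column maximum; a large ratio means
# one variable would dominate a model that is sensitive to scale
max_ratio = summary.loc["max"].max() / summary.loc["max"].min()
print(f"Largest/smallest column maximum: {max_ratio:.0f}x")
```

If the ratio is close to 1, scaling changes little; when one column is several orders of magnitude larger, as Volume is here, scaling is worth doing.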
Step 4: Preprocessing: check whether null values exist
nike_data.isnull().sum()
Date 0
Open 0
High 0
Low 0
Close 0
Volume 0
ticker 0
name 0
dtype: int64
nike_data.isna().sum()
Date 0
Open 0
High 0
Low 0
Close 0
Volume 0
ticker 0
name 0
dtype: int64
Step 5: Feature Engineering: We extract useful features: the 7-day and 50-day moving averages and the daily return
nike_data['7-Day MA'] = nike_data['Close'].rolling(window=7).mean()
nike_data['50-Day MA'] = nike_data['Close'].rolling(window=50).mean()
nike_data['Daily Return'] = nike_data['Close'].pct_change()
Step 6: Scaling the data: we use min-max scaling to scale prices between 0 and 1
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(nike_data['Close'].values.reshape(-1,1))
Step 7: Check whether the closing price series is stationary
from statsmodels.tsa.stattools import adfuller

close_series = nike_data['Close'].dropna()
Step 8: Run ADF test
# ADF null hypothesis: the series has a unit root (is non-stationary)
def check_stationarity(timeseries):
    result = adfuller(timeseries)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    # Reject the null at the 5% level when the p-value is at most 0.05
    if result[1] <= 0.05:
        print("The time series is stationary.")
    else:
        print("The time series is non-stationary.")

check_stationarity(close_series)
ADF Statistic: -0.957955
p-value: 0.768289
The time series is non-stationary.
Step 9: Split the data into train and test sets using an 80/20 split
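import numpy as np
from sklearn.model_selection import train_test_split

def create_dataset(data, time_step=1):
    # Sliding windows: X holds time_step consecutive values, Y the value that follows
    X, Y = [], []
    for i in range(len(data) - time_step):
        X.append(data[i:(i + time_step), 0])
        Y.append(data[i + time_step, 0])
    return np.array(X), np.array(Y)

X, Y = create_dataset(scaled_data, time_step=100)
# shuffle=False keeps chronological order, so the test set is the most recent 20%
# and overlapping windows from the test period do not leak into training
trainX, testX, trainY, testY = train_test_split(X, Y, test_size=0.2, shuffle=False)
Step 10: Plot the closing price with its moving averages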
plt.figure(figsize=(14,5))
plt.plot(nike_data['Close'], label='Close Price History')
plt.plot(nike_data['7-Day MA'], label='7-Day Moving Average')
plt.plot(nike_data['50-Day MA'], label='50-Day Moving Average')
plt.title('Nike Stock Price with Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price USD ($)')
plt.legend()
plt.show()
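The moving-average lines start later than the price line because a rolling window yields NaN until it has enough observations. A small worked example on made-up prices, using a 3-day window instead of 7 or 50 so the numbers are easy to check by hand:

```python
import pandas as pd

# Made-up closing prices for illustration only
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Mean of the current and two previous values; the first two rows are NaN
ma3 = close.rolling(window=3).mean()

# Relative change from the previous row; the first row is NaN
daily_return = close.pct_change()

print(pd.DataFrame({"Close": close, "3-Day MA": ma3, "Daily Return": daily_return}))
```

The third row of the moving average is (10 + 11 + 12) / 3 = 11, and the second row of the daily return is 11 / 10 - 1 = 0.10. With a 50-day window the first 49 rows are NaN in the same way.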
Step 11: Inspect the distributions with box plots
import math
import matplotlib.pyplot as plt
import seaborn as sns

# Box plots need numeric data, so skip Date, ticker and name
numeric_columns = nike_data.select_dtypes(include='number').columns
num_columns = 3
num_rows = math.ceil(len(numeric_columns) / num_columns)
fig = plt.figure(figsize=(15, num_rows * 5))
for i, column in enumerate(numeric_columns):
    ax = fig.add_subplot(num_rows, num_columns, i + 1)
    sns.boxplot(y=nike_data[column], ax=ax)
    ax.set_title(f'Box Plot of {column}')
plt.tight_layout()
plt.show()
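The points a box plot draws beyond its whiskers follow the 1.5 × IQR rule, which is seaborn's default. To put a number on what the plots show, the same rule can be applied directly. A minimal sketch, where `count_iqr_outliers` is a hypothetical helper and the sample values are made up:

```python
import pandas as pd


def count_iqr_outliers(series: pd.Series, whis: float = 1.5) -> int:
    """Count values outside [Q1 - whis * IQR, Q3 + whis * IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return int(((series < lower) | (series > upper)).sum())


# Made-up sample: seven ordinary values and one extreme value
volume = pd.Series([100, 102, 98, 101, 99, 103, 97, 500])
print(count_iqr_outliers(volume))
```

On the real data, the helper could be applied to every numeric column with `nike_data.select_dtypes(include='number').apply(count_iqr_outliers)`.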
Conclusion:
The ADF test (p-value 0.768) does not reject the null hypothesis of a unit root, so the closing price series is non-stationary. The box plots also show outliers across most metrics, especially in price and volume.
Moving averages and daily returns offer more stable views, but still reflect the underlying volatility.
The dataset is rich for modelling, but the price series needs a transformation such as differencing before time series models that assume stationarity are applied.