Business & Data Research
Customer Behaviour Purchasing Pattern - Random Forest

Mahesh Gurumoorthi
December 23, 2023

This post explores the reasoning behind buying habits and social trends, and looks at the frequency patterns and background factors that influence customer decisions.

Problem Statement: For years, market researchers have pondered the reasoning behind consumer behaviour. This study approaches the question through purchasing patterns. Marketers still wonder which factors are the key determinants of consumer behaviour across age, salary and gender. Customer behaviour refers to individual buying habits, including the social trends, frequency patterns and background factors that influence the decision to buy something.

Required libraries:

library(dplyr)
library(tidyverse)
library(e1071)
library(ROCR)
library(rsample)
library(randomForest)
library(caret)
library(partykit)

Data Set (Kaggle):

customer_behavior <- read.csv("/Users/maheshg/Library/CloudStorage/OneDrive-Microsoft365/Sample Datasets Kaggle/Customer_Behaviour.csv",
                              header = TRUE)

Data Interpretation:

head(customer_behavior)
glimpse(customer_behavior)

#Count missing values per column
colSums(is.na(customer_behavior))

#Replace missing values in Purchased with 0 and keep the result
customer_behavior <- customer_behavior |>
  dplyr::mutate(Purchased = replace_na(Purchased, 0))

Data Wrangling:

#Classification of ages from the dataset:
#cut() keeps the comparison numeric; assigning labels in place would turn the column into text

customer_behavior$Age <- cut(customer_behavior$Age,
                             breaks = c(-Inf, 25, 45, Inf),
                             labels = c("Young", "Adult", "Middle Age"))

#Classification of salaries from the dataset:
#Check the cut-offs (5000 and 9000) against the range of EstimatedSalary in your data

customer_behavior$EstimatedSalary <- cut(customer_behavior$EstimatedSalary,
                                         breaks = c(-Inf, 5000, 9000, Inf),
                                         labels = c("Low", "Moderate", "High"))
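The age binning can be checked on a small made-up vector. This is a minimal sketch, assuming the same cut-offs of 25 and 45; the `ages`, `age_group` and `x` names are hypothetical and not part of the dataset. It also shows why in-place labelling is fragile: writing a text label into a numeric vector converts the whole vector to character, so any later comparison is done on strings.

```r
# Hypothetical ages, chosen to sit on and around each boundary
ages <- c(18, 25, 26, 45, 46, 60)

# right = TRUE is the default, so the bins are (-Inf, 25], (25, 45], (45, Inf]
age_group <- cut(ages,
                 breaks = c(-Inf, 25, 45, Inf),
                 labels = c("Young", "Adult", "Middle Age"))

table(age_group)

# Pitfall: assigning a label into a numeric vector coerces it to character
x <- c(18, 30, 60)
x[x <= 25] <- "Young"
class(x)
```
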

Data Cleansing:

customer_behavior_clean <- customer_behavior |>
  mutate(Gender = as.factor(Gender),
         Age = as.factor(Age),
         EstimatedSalary = as.factor(EstimatedSalary),
         Purchased = as.factor(Purchased)) |>
  select(-User.ID)

###Checking Missing Values

colSums(is.na(customer_behavior_clean))

Exploratory Data Analysis (EDA)

###Splitting the data into two - train and test data [hold-out validation]:

RNGkind(sample.kind = "Rounding")

set.seed(1234)

index <- sample(nrow(customer_behavior_clean), nrow(customer_behavior_clean) * 0.8)

customer_behavior_train <- customer_behavior_clean[index,]

customer_behviour_test <- customer_behavior_clean[-index,]

###Checking target proportion:

prop.table(table(customer_behavior_train$Purchased))
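A plain random split can leave the train and test sets with slightly different shares of purchasers. One way to avoid that is to sample within each class. The following is a minimal base R sketch on simulated data, not the approach used in this post; `toy`, `rows_by_class` and `strat_index` are hypothetical names.

```r
set.seed(1234)

# Simulated stand-in for the cleaned data: 65 non-buyers and 35 buyers
toy <- data.frame(id = 1:100,
                  Purchased = factor(rep(c(0, 1), times = c(65, 35))))

# Sample 80% of the row numbers within each class separately
rows_by_class <- split(seq_len(nrow(toy)), toy$Purchased)
strat_index <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(length(rows) * 0.8))
}))

toy_train <- toy[strat_index, ]
toy_test  <- toy[-strat_index, ]

# Both tables should show the same class proportions
prop.table(table(toy_train$Purchased))
prop.table(table(toy_test$Purchased))
```

The `rsample` package loaded earlier offers a similar stratified split through `initial_split()` with its `strata` argument.
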

###Balancing the target classes in the train dataset (downsampling):

set.seed(1234)

customer_behavior_train_down <- downSample(x = customer_behavior_train |>
                                             select(-Purchased),
                                           y = customer_behavior_train$Purchased,
                                           yname = "Purchased")
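Conceptually, downsampling drops rows at random from the larger class until every class has as many rows as the smallest one. The sketch below illustrates the idea in base R on simulated data; it is not caret's implementation, and `toy_train`, `n_min` and `keep` are hypothetical names.

```r
set.seed(1234)

# Simulated imbalanced training data: 60 non-buyers and 20 buyers
toy_train <- data.frame(Gender = sample(c("Male", "Female"), 80, replace = TRUE),
                        Purchased = factor(rep(c(0, 1), times = c(60, 20))))

# Size of the smallest class
n_min <- min(table(toy_train$Purchased))

# Keep n_min randomly chosen rows from every class
rows_by_class <- split(seq_len(nrow(toy_train)), toy_train$Purchased)
keep <- unlist(lapply(rows_by_class, function(rows) sample(rows, n_min)))

toy_train_down <- toy_train[keep, ]
table(toy_train_down$Purchased)
```
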

prop.table(table(customer_behavior_train_down$Purchased))

###Random Forest

###Model fitting using random forest:

set.seed(1234)

ctrl <- trainControl(method = "repeatedcv", number = 3, repeats = 3)

#The model is fitted on customer_behavior_train; use customer_behavior_train_down instead to train on the balanced data
customer_behavior_model_randomforest <- train(Purchased ~ ., data = customer_behavior_train, method = "rf", trControl = ctrl)

#Creation of RDS file:

saveRDS(customer_behavior_model_randomforest, "model_rf.RDS")

customer_behavior_model_randomforest <- read_rds("model_rf.RDS")

###Inspecting the model:

customer_behavior_model_randomforest

###Predicting on the test dataset:

customer_behavior_prediction_randomforest <- predict(customer_behavior_model_randomforest, newdata = customer_behviour_test, type = "raw")

###Model Evaluation:
#confusionMatrix() compares the predictions, not the model object, with the reference labels

confusionMatrix(data = customer_behavior_prediction_randomforest, reference = customer_behviour_test$Purchased, positive = "1")
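The headline figures in a confusion matrix can be reproduced by hand, which helps when reading the output. This is a minimal sketch with made-up labels and predictions (not results from the model above), treating "1" as the positive class; all variable names are hypothetical.

```r
# Made-up test labels and predictions, for illustration only
actual    <- factor(c(1, 1, 1, 0, 0, 0, 0, 1, 0, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 1, 0, 0, 1, 0, 1, 0, 0), levels = c(0, 1))

# Rows are predictions, columns are the reference labels
cm <- table(Predicted = predicted, Actual = actual)

tp <- cm["1", "1"]  # predicted 1, actually 1
tn <- cm["0", "0"]  # predicted 0, actually 0
fp <- cm["1", "0"]  # predicted 1, actually 0
fn <- cm["0", "1"]  # predicted 0, actually 1

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)  # share of actual buyers that were found
specificity <- tn / (tn + fp)  # share of non-buyers correctly rejected

c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

Accuracy alone can look good on imbalanced data, so sensitivity and specificity are worth reading alongside it.
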

Model Evaluation Conclusion (Random Forest):

Overall, the business is predicting customer behaviour from several attributes, and the models are judged on their confidence level and accuracy. In this analysis, the key evaluation metric is accuracy. Across the previous models (Naive Bayes and Decision Tree) and now Random Forest, the accuracy is 82.9%, although the Decision Tree results show some deviations, which may be due to the sample drawn from the data set. I strongly recommend using the population data or re-sampled data before drawing a final conclusion.