Customer Behaviour Purchasing pattern using Decision Tree

Mahesh Gurumoorthi
December 22, 2023

This post explores the reasoning behind buying habits and social trends, looking at the frequency patterns and background factors that influence customer decisions.

Problem Statement: For years, market researchers have pondered the reasoning behind consumer behaviour. This study looks at that behaviour in terms of purchasing patterns. Marketers still want to know which factor is the key determinant of consumer behaviour among age, salary and gender. Customer behaviour refers to an individual's buying habits, including the social trends, frequency patterns and background factors that influence the decision to buy something.

Required libraries :

library(dplyr)
library(tidyverse)
library(e1071)
library(ROCR)
library(rsample)
library(randomForest)
library(caret)
library(partykit)

Data Set (Kaggle):

customer_behavior <- read.csv("/Users/maheshg/Library/CloudStorage/OneDrive-Microsoft365/Sample Datasets Kaggle/Customer_Behaviour.csv",

header = TRUE)

Data Interpretation:

head(customer_behavior)

glimpse(customer_behavior)

#Count the missing values in each column

colSums(is.na(customer_behavior))

#Replace missing values in Purchased with 0 and keep the result

customer_behavior <- customer_behavior |>

dplyr::mutate(Purchased = replace_na(Purchased, 0))

Data Wrangling:

#Classification of ages from the dataset (<= 25 Young, 26 to 45 Adult, above 45 Middle Age).

#The bands are computed from the numeric column in one step. Assigning labels into the column

#in place turns it into character, so the later comparisons would be string comparisons.

customer_behavior$Age <- cut(customer_behavior$Age, breaks = c(-Inf, 25, 45, Inf), labels = c("Young", "Adult", "Middle Age"))

#Classification of salaries (same thresholds as before: <= 5000 Low, up to 9000 Moderate, above 9000 High):

customer_behavior$EstimatedSalary <- cut(customer_behavior$EstimatedSalary, breaks = c(-Inf, 5000, 9000, Inf), labels = c("Low", "Moderate", "High"))
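The banding rule can be tried on a few toy values before it is applied to the real columns. This is a minimal sketch in base R; the vectors `ages` and `salaries_chr` are hypothetical and are not taken from the Kaggle file. The second part shows why the comparison has to be made on numeric values: once the values are stored as text, R compares them character by character.

```r
# Toy ages (hypothetical values, not from the Kaggle file)
ages <- c(19, 25, 26, 45, 46, 60)

# right = TRUE (the default) closes each interval on the right:
# (-Inf, 25], (25, 45], (45, Inf)
age_band <- cut(ages,
                breaks = c(-Inf, 25, 45, Inf),
                labels = c("Young", "Adult", "Middle Age"))

print(table(age_band))

# Pitfall: values stored as character are compared as strings,
# so "15000" sorts before "5000" because "1" comes before "5".
salaries_chr <- as.character(c(15000, 70000))
print(salaries_chr > "5000")
```

Because `cut()` works on the numeric values and returns a factor, the column is banded in a single pass and never goes through a mixed numeric/text state.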

Data Cleansing:

customer_behavior_clean <- customer_behavior |>

mutate(Gender = as.factor(Gender),

Age = as.factor(Age),

EstimatedSalary = as.factor(EstimatedSalary),

Purchased = as.factor(Purchased)) |>

select(-User.ID)

###Checking Missing Values

colSums(is.na(customer_behavior_clean))

Exploratory Data Analysis (EDA)

###Splitting the data into two parts - train (80%) and test (20%) sets [hold-out validation]:

RNGkind(sample.kind = "Rounding")

set.seed(1234)

index <- sample(nrow(customer_behavior_clean), nrow(customer_behavior_clean) * 0.8)

customer_behavior_train <- customer_behavior_clean[index,]

customer_behviour_test <- customer_behavior_clean[-index,]

###Checking target proportion:

prop.table(table(customer_behavior_train$Purchased))

###Balancing the target classes in the train dataset (down-sampling):

set.seed(1234)

customer_behavior_train_down <- downSample(x = customer_behavior_train |>

select(-Purchased),

y = customer_behavior_train$Purchased,

yname = "Purchased")

prop.table(table(customer_behavior_train_down$Purchased))
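To make the down-sampling step less of a black box, here is a rough base R sketch of the idea: keep every row of the smaller class and a random subset of the same size from the larger one. The data frame `toy` and its columns are hypothetical. This illustrates the technique and is not caret's implementation of `downSample`.

```r
set.seed(1234)

# Toy training set (hypothetical): 8 non-buyers and 4 buyers
toy <- data.frame(x = 1:12,
                  Purchased = factor(c(rep(0, 8), rep(1, 4))))

# Size of the smallest class
n_min <- min(table(toy$Purchased))

# For each class, keep n_min randomly chosen row indices
keep <- unlist(lapply(split(seq_len(nrow(toy)), toy$Purchased),
                      function(idx) idx[sample.int(length(idx), n_min)]))

toy_down <- toy[keep, ]

# Both classes now have the same share
print(prop.table(table(toy_down$Purchased)))
```

The trade-off is that rows from the majority class are thrown away, so the balanced training set is smaller than the original one.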

###Decision Tree Model

###Model fitting

#The response is written as a bare column name so that "." expands to the remaining columns only

customer_behavior_model_decisiontree <- ctree(formula = Purchased ~ .,

data = customer_behavior_train_down)
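How the response is written in the formula matters for what `.` expands to. The sketch below uses only base R's `terms()` on a hypothetical data frame `toy`. It shows the general formula behaviour and is not a statement about how partykit processes the formula internally.

```r
# Toy data (hypothetical columns)
toy <- data.frame(Age = factor(c("Young", "Adult", "Adult", "Young")),
                  Gender = factor(c("Male", "Female", "Male", "Female")),
                  Purchased = factor(c(0, 1, 1, 0)))

# Response written as data$column: "." expands to every column of the data,
# including Purchased itself
labels_dollar <- attr(terms(toy$Purchased ~ ., data = toy), "term.labels")

# Response written as a bare column name: "." leaves the response out
labels_bare <- attr(terms(Purchased ~ ., data = toy), "term.labels")

print(labels_dollar)
print(labels_bare)
```

With the bare column name, the target cannot end up among its own predictors.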

###Plotting the decision tree:

plot(customer_behavior_model_decisiontree,type = "simple")

###Model Evaluation:

customer_behavior_prediction_decisiontree <- predict(customer_behavior_model_decisiontree,

newdata = customer_behviour_test,type= "response")

###Evaluation of model using confusion matrix :

confusionMatrix(data = customer_behavior_prediction_decisiontree,reference = customer_behviour_test$Purchased)
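The accuracy and its 95% confidence interval can also be computed by hand from the confusion matrix. This is a small sketch in base R; the counts in `cm` are hypothetical and are not the results of this post. The interval shown is an exact binomial interval from `binom.test`.

```r
# Hypothetical confusion matrix counts (not the results of this post)
cm <- matrix(c(30, 10,
                8, 32),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

correct  <- sum(diag(cm))   # predictions that match the reference
total    <- sum(cm)         # all test observations
accuracy <- correct / total

# Exact (Clopper-Pearson) 95% interval for the accuracy
ci <- binom.test(correct, total)$conf.int

print(accuracy)
print(ci)
```

The smaller the test set, the wider this interval is, so a single accuracy figure from a small sample should be read together with its lower bound.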

Model Evaluation Conclusion (Decision Tree):

The root node of the tree is Age, which means the purchasing decision is split first on this factor (Young, Adult or Middle Age).

Customers in the Adult and Middle Age bands are then split by salary (High or Moderate). Customers with a High salary are likely to purchase the product, based on the 40 observations in that node from the Kaggle dataset, whereas customers with a Moderate salary are likely to purchase the product with a lower error rate of 9.6% (~10%).

The confusion matrix shows an accuracy of only 73%, which may be due to the small sample dataset. Looking at the statistics in more detail, the 95% confidence interval puts the accuracy as low as about 65%. As a next step, I would like to take a deeper look at the population and then retrain this model. (To be continued…)