Customer Behaviour Purchasing pattern using Decision Tree
Explore the reasoning behind buying habits and social trends, and gain insights into the frequency patterns and background factors that influence customer decisions.
Problem Statement: Market researchers have long pondered the reasoning behind consumer behaviour, and this study approaches the question through behaviour patterns. Marketers still want to know which of age, salary and gender is the key determinant of consumer behaviour. Customer behaviour refers to an individual's buying habits, including the social trends, frequency patterns and background factors that influence the decision to buy something.
Required libraries :
library(dplyr)
library(tidyverse)
library(e1071)
library(ROCR)
library(rsample)
library(randomForest)
library(caret)
library(partykit)
Data Set (Kaggle):
customer_behavior <- read.csv("/Users/maheshg/Library/CloudStorage/OneDrive-Microsoft365/Sample Datasets Kaggle/Customer_Behaviour.csv",
                              header = TRUE)
Data Interpretation:
head(customer_behavior)
glimpse(customer_behavior)
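#Count the missing values in each column
colSums(is.na(customer_behavior))
#Replace any missing Purchased values with 0 and keep the result
customer_behavior <- customer_behavior |>
  dplyr::mutate(Purchased = replace_na(Purchased, 0))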
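The missing-value check and replacement can be tried on a small made-up data frame first. The `toy` data frame and its values are assumptions for illustration only, not rows from the Kaggle file, and the sketch uses base R so it runs without extra packages.

```r
# Toy data (hypothetical values, not taken from the Kaggle file)
toy <- data.frame(
  Age       = c(22, 35, NA, 51),
  Purchased = c(0, NA, 1, 1)
)

# Count missing values per column instead of printing the full logical matrix
na_counts <- colSums(is.na(toy))
print(na_counts)

# Replace missing values in Purchased with 0; other columns are left untouched
toy$Purchased[is.na(toy$Purchased)] <- 0
print(toy$Purchased)
```

Note that only the target column is filled in here; the missing Age is left as NA so it can be handled separately.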
Data Wrangling:
#Classification of ages from the dataset:
#cut() bins the numeric column in one step, so no comparison is made against a column already converted to text
customer_behavior$Age <- cut(customer_behavior$Age,
                             breaks = c(-Inf, 25, 45, Inf),
                             labels = c("Young", "Adult", "Middle Age"))
#Classification of salaries from the dataset:
#The 5000 and 9000 thresholds are kept from the original code; check them against range(customer_behavior$EstimatedSalary)
customer_behavior$EstimatedSalary <- cut(customer_behavior$EstimatedSalary,
                                         breaks = c(-Inf, 5000, 9000, Inf),
                                         labels = c("Low", "Moderate", "High"))
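Binning a numeric column into labelled bands can be sketched with base R's `cut()`. The ages below are hypothetical values chosen to sit on and around the 25 and 45 boundaries. The second half of the sketch shows a pitfall worth knowing about: writing a text label into a numeric vector converts the whole vector to character, so later comparisons are made on text rather than on numbers.

```r
# Hypothetical ages chosen to sit on and around the band boundaries
age <- c(19, 25, 26, 45, 46, 60)

# cut() uses right-closed intervals by default: (-Inf,25], (25,45], (45,Inf)
age_band <- cut(age,
                breaks = c(-Inf, 25, 45, Inf),
                labels = c("Young", "Adult", "Middle Age"))
print(table(age_band))

# Pitfall: assigning a label into a numeric vector turns it into character
x <- c(9, 30, 100)
x[x <= 25] <- "Young"
print(class(x))  # "character"

# The remaining values are now compared as text, not as numbers
compared_as_text <- x[3] > 25
print(compared_as_text)  # FALSE, because "100" sorts before "25" as text
```

Binning from the original numeric values in a single call avoids this problem, and the result is already a factor.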
Data Cleansing:
customer_behavior_clean <- customer_behavior |>
  mutate(Gender = as.factor(Gender),
         Age = as.factor(Age),
         EstimatedSalary = as.factor(EstimatedSalary),
         Purchased = as.factor(Purchased)) |>
  select(-User.ID)
###Checking Missing Values
colSums(is.na(customer_behavior_clean))
Exploratory Data Analysis (EDA)
###Splitting the data into two sets - train and test [hold-out validation]:
RNGkind(sample.kind = "Rounding")
set.seed(1234)
index <- sample(nrow(customer_behavior_clean), nrow(customer_behavior_clean) * 0.8)
customer_behavior_train <- customer_behavior_clean[index,]
customer_behviour_test <- customer_behavior_clean[-index,]
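The 80/20 split can be checked on toy data to confirm that the two sets have the expected sizes and share no rows. The `toy` data frame, its `id` column and the 65/35 class mix are assumptions made up for this sketch.

```r
set.seed(1234)

# Toy data: 100 rows with a hypothetical 65/35 class mix
toy <- data.frame(
  id        = 1:100,
  Purchased = factor(rep(c(0, 1), times = c(65, 35)))
)

# Draw 80% of the row numbers at random for training; the rest form the test set
index     <- sample(nrow(toy), nrow(toy) * 0.8)
toy_train <- toy[index, ]
toy_test  <- toy[-index, ]

# The two sets should not share any rows
shared_rows <- intersect(toy_train$id, toy_test$id)
print(length(shared_rows))

# Class proportions in the training set, to compare with the full data
print(prop.table(table(toy_train$Purchased)))
```

Because the split is random rather than stratified, the class proportions in the training set can drift from those of the full data, which is why the proportion check that follows is useful.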
###Checking target proportion:
prop.table(table(customer_behavior_train$Purchased))
###Balancing the target classes in the training data (down-sampling):
set.seed(1234)
customer_behavior_train_down <- downSample(x = customer_behavior_train |>
                                             select(-Purchased),
                                           y = customer_behavior_train$Purchased,
                                           yname = "Purchased")
prop.table(table(customer_behavior_train_down$Purchased))
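The idea behind down-sampling can be sketched in base R: every class is reduced at random to the size of the smallest class. This is an illustration of the concept on made-up data, not caret's actual implementation, and the `toy` data frame is hypothetical.

```r
set.seed(1234)

# Toy data with an imbalanced target: seven 0s and three 1s (hypothetical)
toy <- data.frame(
  Age       = sample(18:60, 10, replace = TRUE),
  Purchased = factor(c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1))
)

# Size of the smallest class
n_min <- min(table(toy$Purchased))

# Row numbers grouped by class, then n_min rows drawn at random from each group
rows_by_class <- split(seq_len(nrow(toy)), toy$Purchased)
keep <- unlist(lapply(rows_by_class, function(rows) {
  rows[sample.int(length(rows), n_min)]
}))

toy_down <- toy[keep, ]
print(table(toy_down$Purchased))
```

Down-sampling discards rows from the majority class, so on a small dataset it trades training data for balance.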
###Decision Tree Model
###Model fitting
#Refer to the target by its column name so that "." expands to the remaining columns only
customer_behavior_model_decisiontree <- ctree(formula = Purchased ~ .,
                                              data = customer_behavior_train_down)
###Plotting the decision tree:
plot(customer_behavior_model_decisiontree,type = "simple")
###Model Evaluation:
customer_behavior_prediction_decisiontree <- predict(customer_behavior_model_decisiontree,
newdata = customer_behviour_test,type= "response")
###Evaluation of model using confusion matrix :
confusionMatrix(data = customer_behavior_prediction_decisiontree,reference = customer_behviour_test$Purchased)
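The headline numbers in a confusion matrix can be reproduced by hand, which helps when reading the output. The labels below are hypothetical, and the interval is computed with base R's `binom.test()` on the assumption that an exact binomial interval is a reasonable match for the one caret reports.

```r
# Hypothetical true and predicted labels for ten test rows
actual    <- factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1), levels = c(0, 1))
predicted <- factor(c(0, 0, 0, 1, 1, 1, 1, 1, 0, 1), levels = c(0, 1))

# Rows are predictions, columns are the reference labels
cm <- table(Prediction = predicted, Reference = actual)
print(cm)

# Accuracy is the share of predictions on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)

# Exact binomial 95% confidence interval for the accuracy
ci <- binom.test(sum(diag(cm)), sum(cm))$conf.int
print(ci)
```

With few test rows the interval is wide, so a single accuracy figure should be read together with its lower and upper bounds.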

Model Evaluation Conclusion (Decision Tree):
The root node of the tree splits on Age, so the purchasing decision depends first on whether the customer is Young, Adult or Middle Age.
Customers in the Adult and Middle Age groups are then split by salary band (High or Moderate). Customers with a high salary are likely to purchase the product, based on the 40 observations available from the Kaggle dataset, whereas customers with a moderate salary are likely to purchase the product with a lower error rate of 9.6% (~10%).
The confusion matrix shows an accuracy of only 73%, which may be a consequence of the small sample. Looking at the statistics in more detail, the 95% confidence interval suggests the accuracy could be as low as 65%. Based on this analysis, the next step is to look more deeply at the population and then retrain the model. (To be continued…)