Customer Behaviour Patterns using Naive Bayes Algorithm
Explore the reasoning behind buying habits and social trends, and gain insight into the frequency patterns and background factors that influence customer decisions.
Problem Statement : Market researchers have long pondered the reasoning behind consumer behaviour, and this study approaches the question through behaviour patterns. Marketers still want to know which of age, salary and gender is the key determinant of consumer behaviour. Customer behaviour refers to individual buying habits, including the social trends, frequency patterns and background factors that influence the decision to buy something.
Required libraries :
library(dplyr)
library(tidyverse)
library(e1071)
library(ROCR)
library(rsample)
library(randomForest)
library(caret)
library(partykit)
Data Set (kaggle) :
customer_behavior <- read.csv("/Users/maheshg/Library/CloudStorage/OneDrive-Microsoft365/Sample Datasets Kaggle/Customer_Behaviour.csv",
  header = TRUE)
head(customer_behavior)
glimpse(customer_behavior)
colSums(is.na(customer_behavior))
# Replace missing values in the target with 0 and keep the result
customer_behavior <- customer_behavior |>
  dplyr::mutate(Purchased = replace_na(Purchased, 0))
###Data Wrangling:
#Classification of Ages from the dataset:
# Bin the numeric column in a single step so every comparison stays numeric
customer_behavior$Age <- dplyr::case_when(
  customer_behavior$Age <= 25 ~ "Young",
  customer_behavior$Age <= 45 ~ "Adult",
  TRUE ~ "Middle Age")
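The three age bands can also be produced in one pass with base R's cut(), which compares against the numeric values directly. A minimal sketch on a made-up vector (the `ages` values are illustrative, not taken from the dataset):

```r
# Illustrative ages only; not taken from the Kaggle data
ages <- c(19, 25, 26, 45, 46, 60)

# right = TRUE (the default) gives the intervals (-Inf, 25], (25, 45], (45, Inf)
age_band <- cut(ages,
                breaks = c(-Inf, 25, 45, Inf),
                labels = c("Young", "Adult", "Middle Age"))

table(age_band)
```

cut() returns a factor, so a later as.factor() call on the result is harmless but no longer needed.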
#Classification of salaries:
# Compare against the numeric column before it is overwritten with labels
customer_behavior$EstimatedSalary <- dplyr::case_when(
  customer_behavior$EstimatedSalary <= 5000 ~ "Low",
  customer_behavior$EstimatedSalary <= 9000 ~ "Moderate",
  TRUE ~ "High")
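Binning by repeated assignment needs care: the first text label written into a numeric column turns the whole column into character, and later comparisons are then made as text rather than as numbers. A small illustration with made-up salaries (the `salary` vector is hypothetical):

```r
salary <- c(4000, 15000, 60000)   # made-up values
salary[salary <= 5000] <- "Low"   # the vector is now character
class(salary)

# Text comparison goes character by character, so "1" sorts before "5"
text_cmp    <- "15000" > "5000"   # FALSE
numeric_cmp <- 15000 > 5000       # TRUE
c(text = text_cmp, numeric = numeric_cmp)
```

It is worth checking the resulting bin counts with table() against the salary range reported by glimpse().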
###Data Cleansing:
customer_behavior_clean <- customer_behavior |>
mutate(Gender = as.factor(Gender),
Age = as.factor(Age),
EstimatedSalary = as.factor(EstimatedSalary),
Purchased = as.factor(Purchased)) |>
select(-User.ID)
###Checking Missing Values
colSums(is.na(customer_behavior_clean))
###Splitting the data into train and test sets [hold-out validation]:
RNGkind(sample.kind = "Rounding")
set.seed(1234)
index <- sample(nrow(customer_behavior_clean), nrow(customer_behavior_clean) * 0.8)
customer_behavior_train <- customer_behavior_clean[index,]
customer_behviour_test <- customer_behavior_clean[-index,]
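A plain random split can leave the two sets with slightly different shares of buyers. One alternative, shown here only as a sketch on simulated data (the `toy` data frame and its columns are hypothetical), is to sample 80% within each class so that the proportions match:

```r
set.seed(1234)
# Simulated stand-in for the cleaned data: 100 rows, 30 of them buyers
toy <- data.frame(id = 1:100,
                  Purchased = factor(rep(c(0, 1), times = c(70, 30))))

# Draw 80% of the row numbers separately within each class
rows_by_class <- split(seq_len(nrow(toy)), toy$Purchased)
train_idx <- unlist(lapply(rows_by_class, function(i) {
  i[sample.int(length(i), floor(0.8 * length(i)))]
}))

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

prop.table(table(toy_train$Purchased))
prop.table(table(toy_test$Purchased))
```

Indexing through sample.int() avoids the surprise that sample(i, ...) treats a single number as the range 1:i.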
###Checking target proportion:
prop.table(table(customer_behavior_train$Purchased))
###Balancing the target classes in the training data (downsampling):
set.seed(1234)
customer_behavior_train_down <- downSample(x = customer_behavior_train |>
select(-Purchased),
y = customer_behavior_train$Purchased,
yname = "Purchased")
prop.table(table(customer_behavior_train_down$Purchased))
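downSample() randomly drops rows from the larger class until every class is as frequent as the smallest one. The same idea can be sketched in base R on simulated data (the `toy` data frame is hypothetical):

```r
set.seed(1234)
# Simulated imbalanced data: 70 non-buyers, 30 buyers
toy <- data.frame(x = rnorm(100),
                  Purchased = factor(rep(c(0, 1), times = c(70, 30))))

n_min <- min(table(toy$Purchased))   # size of the smallest class
rows_by_class <- split(seq_len(nrow(toy)), toy$Purchased)

# Keep n_min randomly chosen rows from every class
keep <- unlist(lapply(rows_by_class, function(i) {
  i[sample.int(length(i), n_min)]
}))

toy_down <- toy[keep, ]
table(toy_down$Purchased)
```

Only the training data is balanced this way; the test set keeps its natural class mix so that the evaluation reflects real conditions.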
###Naive Bayes Model
#Model fitting using Naive Bayes Model :
# Name the target by column so that `.` does not also pull Purchased in as a predictor
customer_behaviour_model_nb <- naiveBayes(formula = Purchased ~ ., data = customer_behavior_train_down)
customer_behaviour_model_nb
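For categorical predictors and no Laplace smoothing, the tables printed with the model are row-normalised counts, that is, P(predictor level | class). A sketch of how such a table arises and how Bayes' rule turns it into a purchase probability, using a made-up data frame (`toy` and its values are hypothetical):

```r
# Made-up data: 4 buyers (1) and 4 non-buyers (0)
toy <- data.frame(
  Gender    = c("Male", "Male", "Female", "Female", "Female", "Male", "Female", "Male"),
  Purchased = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# P(Gender | Purchased): each row (class) sums to 1
cond <- prop.table(table(toy$Purchased, toy$Gender), margin = 1)

# P(Purchased): the class prior
prior <- prop.table(table(toy$Purchased))

# Bayes' rule for a female customer: prior times likelihood, then normalise
unnorm    <- prior * cond[, "Female"]
posterior <- unnorm / sum(unnorm)

cond
posterior
```

A value in `cond` is the share of a gender among buyers (or non-buyers), whereas `posterior` is the probability of buying given the gender. With several predictors the model multiplies one such likelihood per predictor.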
###Model Evaluation/ Prediction
customer_prediction_naive <- predict(customer_behaviour_model_nb, newdata = customer_behviour_test, type = "class")
###Evaluation of model using confusion matrix :
confusionMatrix(data = customer_prediction_naive, reference = customer_behviour_test$Purchased,
positive = "1")
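The headline figures reported by confusionMatrix() follow directly from the four cells of the table. A sketch with made-up predictions, treating "1" as the positive class as in the call above (the `pred` and `ref` vectors are hypothetical):

```r
# Made-up predictions and true labels
pred <- factor(c(1, 1, 0, 0, 1, 0, 1, 0), levels = c(0, 1))
ref  <- factor(c(1, 0, 0, 0, 1, 1, 1, 1), levels = c(0, 1))

cm <- table(Prediction = pred, Reference = ref)

tp <- cm["1", "1"]   # predicted buyer, was a buyer
fp <- cm["1", "0"]   # predicted buyer, was not
tn <- cm["0", "0"]   # predicted non-buyer, was not a buyer
fn <- cm["0", "1"]   # predicted non-buyer, was a buyer

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)   # share of actual buyers that were found
specificity <- tn / (tn + fp)   # share of actual non-buyers that were found

c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```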
Model Evaluation Conclusion :
Probability of a male customer buying is 45%, compared with 55% for a female customer.
Probability of an adult customer buying is 44%, compared with the middle-aged and young groups.
Probability of a high-salary customer buying is 37%, compared with 64% for a moderate-salary customer.