Customer Behaviour Purchasing Pattern - Random Forest
Explore the reasoning behind buying habits and social trends, and gain insights into frequency patterns and background factors influencing customer decisions.
Problem Statement : For years, market researchers have pondered the reasoning behind consumer behaviour. This study examines it through patterns in purchasing behaviour. Marketers still want to know which factor is the key determinant of consumer behaviour: age, salary or gender. Customer behaviour refers to individual buying habits, including social trends, frequency patterns and background factors that influence the decision to buy something.
Required libraries :
library(dplyr)
library(tidyverse)
library(e1071)
library(ROCR)
library(rsample)
library(randomForest)
library(caret)
library(partykit)
Data Set (kaggle) :
customer_behavior <- read.csv("/Users/maheshg/Library/CloudStorage/OneDrive-Microsoft365/Sample Datasets Kaggle/Customer_Behaviour.csv",
                              header = TRUE)
Data Interpretation:
head(customer_behavior)
glimpse(customer_behavior)
#Count the missing values in each column
colSums(is.na(customer_behavior))
#Replace missing Purchased values with 0 and keep the result
customer_behavior <- customer_behavior |>
  dplyr::mutate(Purchased = replace_na(Purchased, 0))
Data Wrangling:
#Classification of ages from the dataset (all bins are assigned in one step, so the comparisons stay numeric):
customer_behavior <- customer_behavior |>
  mutate(Age = case_when(Age <= 25 ~ "Young",
                         Age <= 45 ~ "Adult",
                         Age > 45 ~ "Middle Age"))
#Classification of estimated salaries:
customer_behavior <- customer_behavior |>
  mutate(EstimatedSalary = case_when(EstimatedSalary <= 5000 ~ "Low",
                                     EstimatedSalary <= 9000 ~ "Moderate",
                                     EstimatedSalary > 9000 ~ "High"))
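The same kind of binning can also be sketched with base R's cut(), which assigns every label in a single pass over a numeric vector. The `ages` vector below is made up for illustration and is not taken from the Kaggle data set; the break points 25 and 45 are the ones used for the age groups in this post.

```r
# Hypothetical toy ages, not taken from the Kaggle data set
ages <- c(19, 25, 26, 45, 46, 60)

# cut() uses right-closed intervals by default:
# (-Inf, 25], (25, 45], (45, Inf)
age_group <- cut(ages,
                 breaks = c(-Inf, 25, 45, Inf),
                 labels = c("Young", "Adult", "Middle Age"))

# Count how many toy customers fall into each group
table(age_group)
```

Because cut() returns a factor directly, a later as.factor() call on such a column would be a no-op.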
Data Cleansing:
customer_behavior_clean <- customer_behavior |>
mutate(Gender = as.factor(Gender),
Age = as.factor(Age),
EstimatedSalary = as.factor(EstimatedSalary),
Purchased = as.factor(Purchased)) |>
select(-User.ID)
###Checking Missing Values
colSums(is.na(customer_behavior_clean))
Exploratory Data Analysis (EDA)
###Splitting the data into train and test sets (80/20 hold-out):
RNGkind(sample.kind = "Rounding")
set.seed(1234)
index <- sample(nrow(customer_behavior_clean), nrow(customer_behavior_clean) * 0.8)
customer_behavior_train <- customer_behavior_clean[index,]
customer_behviour_test <- customer_behavior_clean[-index,]
###Checking target proportion:
prop.table(table(customer_behavior_train$Purchased))
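A simple random split can leave train and test with slightly different class proportions. One possible remedy is a stratified split, sketched here in base R on a made-up data frame; `toy`, `toy_train` and `toy_test` are hypothetical names, and the 70/30 class ratio is an assumption chosen only for the example.

```r
set.seed(1234)

# Hypothetical toy data: 100 rows, 30% in the positive class
toy <- data.frame(x = rnorm(100),
                  Purchased = factor(rep(c(0, 1), times = c(70, 30))))

# Group the row indices by class, then sample 80% within each class
idx_by_class <- split(seq_len(nrow(toy)), toy$Purchased)
train_idx <- unlist(lapply(idx_by_class, function(i) {
  i[sample.int(length(i), size = round(0.8 * length(i)))]
}))

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Both tables show the same class proportions
prop.table(table(toy_train$Purchased))
prop.table(table(toy_test$Purchased))
```

The rsample package loaded earlier offers initial_split() with a strata argument for the same purpose.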
###Balancing the target classes in the training set (down-sampling):
set.seed(1234)
customer_behavior_train_down <- downSample(x = customer_behavior_train |>
select(-Purchased),
y = customer_behavior_train$Purchased,
yname = "Purchased")
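The idea behind down-sampling can be sketched in base R: every class is cut down to the size of the smallest one by random selection. This is an illustration on made-up labels, not a reproduction of caret's internals; `y`, `n_min` and `keep` are hypothetical names.

```r
set.seed(1234)

# Hypothetical imbalanced labels: 70 zeros and 30 ones
y <- factor(rep(c(0, 1), times = c(70, 30)))

# Size of the smallest class
n_min <- min(table(y))

# Keep n_min randomly chosen positions from every class
keep <- unlist(lapply(split(seq_along(y), y), function(i) {
  i[sample.int(length(i), size = n_min)]
}))

# Both classes now have the same number of observations
table(y[keep])
```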
prop.table(table(customer_behavior_train_down$Purchased))
###Random Forest
###Model fitting using random forest :
#Creation of RDS file :
set.seed(1234)
ctrl <- trainControl(method = "repeatedcv", number = 3, repeats = 3)
customer_behavior_model_randomforest <- train(Purchased ~ ., data = customer_behavior_train, method = "rf", trControl = ctrl)
saveRDS(customer_behavior_model_randomforest, "model_rf.RDS")
customer_behavior_model_randomforest <- read_rds("model_rf.RDS")
###Inspecting the model:
customer_behavior_model_randomforest
###Predicting the test dataset with the model:
customer_behavior_prediction_randomforest <- predict(customer_behavior_model_randomforest, newdata = customer_behviour_test, type = "raw")
###Model Evaluation (compare the predictions, not the model object, with the true labels):
confusionMatrix(data = customer_behavior_prediction_randomforest, reference = customer_behviour_test$Purchased, positive = "1")
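The headline figures of a confusion matrix can be reproduced by hand, which helps when reading the evaluation output. The sketch below uses made-up labels for ten customers; `truth` and `pred` are hypothetical and unrelated to the model results reported in this post.

```r
# Hypothetical true labels and predictions for ten customers
truth <- factor(c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0), levels = c(0, 1))
pred  <- factor(c(1, 1, 1, 0, 0, 0, 0, 0, 1, 0), levels = c(0, 1))

# Rows are predictions, columns are the reference labels
cm <- table(Prediction = pred, Reference = truth)

# Cells of the matrix, with "1" as the positive class
tp <- cm["1", "1"]; tn <- cm["0", "0"]
fp <- cm["1", "0"]; fn <- cm["0", "1"]

accuracy    <- (tp + tn) / sum(cm)   # share of correct predictions
sensitivity <- tp / (tp + fn)        # true positive rate for class "1"
specificity <- tn / (tn + fp)        # true negative rate

c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

Accuracy alone can hide poor performance on the minority class, so sensitivity and specificity are worth reading alongside it.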
Model Evaluation Conclusion (Random Forest):
Overall, the business wants to predict customer behaviour from the available attributes and to judge each model by its confidence level and accuracy. In this analysis, accuracy is the key metric for the random forest method. Compared with the earlier Naive Bayes and Decision Tree models, the random forest reaches an accuracy of 82.9%. The Decision Tree results show some deviations, which may be due to the sample drawn from the data set. It is strongly recommended to use the population data or re-sampled data before drawing a conclusion.