Sample Logistic Regression Model
Logistic Regression using the built-in PimaIndiansDiabetes2 dataset from the mlbench package
Problem Statement: Identify and predict diabetes patients from the built-in PimaIndiansDiabetes2 dataset using R.
Takeaway: A logistic regression model (glm() with a binomial family) reached about 75% accuracy on the test data. The model is built twice: once with all variables as predictors and once with glucose as the single predictor.
Logistic Regression using Multiple Variables and a Single Variable as Predictors
Source Code Details:
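###Installing (only needed once) and loading the packages :
install.packages(c("tidyverse", "caret", "mlbench"))
library(tidyverse)   # attaches dplyr (with the %>% pipe) and ggplot2
library(caret)       # provides createDataPartition()
library(mlbench)     # provides the PimaIndiansDiabetes2 dataset
data("PimaIndiansDiabetes2")
View(PimaIndiansDiabetes2)   # opens the data viewer in RStudio; head() works in any console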
###Remove rows containing NA values before moving on to EDA and modelling :
PimaIndiansDiabetes2 <- na.omit(PimaIndiansDiabetes2)
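Because na.omit() drops every row that has at least one NA, it can help to check how many values are missing per column before removing them. A minimal sketch, assuming the mlbench package is installed; the names pima_raw, missing_per_column and pima_complete are illustrative only.

```r
library(mlbench)

data("PimaIndiansDiabetes2", package = "mlbench")
pima_raw <- PimaIndiansDiabetes2

# Number of missing values in each column
missing_per_column <- colSums(is.na(pima_raw))
print(missing_per_column)

# na.omit() keeps only the complete rows
pima_complete <- na.omit(pima_raw)
cat("Rows before:", nrow(pima_raw), " Rows after:", nrow(pima_complete), "\n")
```

The printout shows which columns are responsible for most of the dropped rows, which matters when deciding whether removing rows or imputing values is the better choice.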
###Creating partitions: sample 80% of the rows for training, stratified by the diabetes outcome :
set.seed(123)   # makes the random split reproducible
training_samples <- PimaIndiansDiabetes2$diabetes %>%
  createDataPartition(p = 0.8, list = FALSE)
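When the outcome is a factor, createDataPartition() samples within each class, so the training set keeps roughly the same neg/pos mix as the full data. The base-R function below is a simplified sketch of that idea (stratified random sampling), not caret's actual implementation; stratified_indices is a hypothetical helper name and the toy outcome is made up for illustration.

```r
# Simplified stratified split: draw a fraction p of the row indices within each class
stratified_indices <- function(y, p = 0.8) {
  y <- as.factor(y)
  idx <- unlist(lapply(levels(y), function(lvl) {
    rows <- which(y == lvl)
    # sample.int() avoids sample()'s special case for a length-1 vector
    rows[sample.int(length(rows), size = round(p * length(rows)))]
  }))
  sort(idx)
}

# Toy outcome with 70 "neg" and 30 "pos" cases
set.seed(42)
y <- factor(c(rep("neg", 70), rep("pos", 30)))
train_idx <- stratified_indices(y, p = 0.8)

print(table(y[train_idx]))   # 56 neg, 24 pos
```

Each class contributes 80% of its own rows, so the 70/30 class ratio is preserved in the training indices.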
###Splitting the data into training and testing sets :
trainingdata_samples <- PimaIndiansDiabetes2[training_samples,]
testigdata_sample <- PimaIndiansDiabetes2[-training_samples,]
###Building the model with all variables as predictors (diabetes ~ .) :
model <- glm(diabetes ~., data = trainingdata_samples, family = binomial())
summary(model)
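The coefficients printed by summary(model) are on the log-odds scale. Exponentiating them gives odds ratios: the multiplicative change in the odds of a positive diabetes result for a one-unit increase in a predictor, holding the others fixed. A minimal sketch, assuming mlbench is installed; it refits the all-predictor model on all complete cases rather than the training split, so the numbers will differ somewhat from the model above, and fit_all and odds_ratios are illustrative names.

```r
library(mlbench)

data("PimaIndiansDiabetes2", package = "mlbench")
pima <- na.omit(PimaIndiansDiabetes2)

fit_all <- glm(diabetes ~ ., data = pima, family = binomial)

# Odds ratios with Wald 95% confidence intervals (exponentiated from the log-odds scale)
odds_ratios <- exp(cbind(OR = coef(fit_all), confint.default(fit_all)))
print(round(odds_ratios, 3))
```

An odds ratio above 1 means higher values of that predictor go with higher odds of "pos"; an interval that straddles 1 suggests the effect is not clearly distinguishable from no effect.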
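###Making predictions on the test set 😀
# type = "response" returns P(diabetes = "pos"): glm() treats the second
# factor level ("pos") as the event being modelled
probability <- model %>%
  predict(testigdata_sample, type = "response")
# Classify with a 0.5 probability cut-off
predicted.classes <- ifelse(probability > 0.5, "pos", "neg")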
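The type argument of predict() decides which scale you get back: "link" returns the linear predictor (log-odds), and "response" applies the logistic function 1 / (1 + exp(-eta)) to it. A small sketch on simulated data to show the relationship; x, y and fit are placeholder names, not part of the Pima data.

```r
set.seed(1)
# Simulated data: one predictor and a binary outcome drawn from a known logistic model
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)

eta  <- predict(fit, type = "link")      # log-odds: b0 + b1 * x
prob <- predict(fit, type = "response")  # probabilities between 0 and 1

# plogis() is the logistic function, so the two scales agree
print(all.equal(prob, plogis(eta)))
```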
###Model Accuracy :
mean(predicted.classes == testigdata_sample$diabetes)
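Accuracy alone does not show how the errors split between false positives and false negatives. A base-R sketch of a confusion matrix with sensitivity and specificity; classification_metrics is a hypothetical helper and the toy vectors are made up. caret's confusionMatrix() reports the same quantities in more detail.

```r
# Confusion matrix and basic metrics for two-class predictions
classification_metrics <- function(predicted, actual, positive = "pos", negative = "neg") {
  lv <- c(negative, positive)
  cm <- table(Predicted = factor(predicted, levels = lv),
              Actual = factor(actual, levels = lv))
  tp <- cm[positive, positive]
  tn <- cm[negative, negative]
  fp <- cm[positive, negative]
  fn <- cm[negative, positive]
  list(
    confusion = cm,
    accuracy = (tp + tn) / sum(cm),
    sensitivity = tp / (tp + fn),   # share of actual positives detected
    specificity = tn / (tn + fp)    # share of actual negatives correctly rejected
  )
}

# Toy example: 4 actual positives, 6 actual negatives
actual    <- c("pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg")
predicted <- c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "neg")
m <- classification_metrics(predicted, actual)
print(m$confusion)
```

With the objects above, the call would be classification_metrics(predicted.classes, testigdata_sample$diabetes).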
###Logistic regression with glucose as the single predictor (overwrites the previous model) :
model <- glm(diabetes ~ glucose, data = trainingdata_samples, family = binomial)
summary(model)$coef
###Predicting the class for two new glucose values (20 and 150) :
newdata <- data.frame(glucose = c(20, 150))
probability_1 <- model %>% predict(newdata, type = "response")
predicted.classes_1 <- ifelse(probability_1 > 0.5, "pos","neg")
predicted.classes_1
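With a single predictor, the 0.5 cut-off corresponds to one glucose value: the probability equals 0.5 exactly when the log-odds b0 + b1 * glucose equal 0, that is at glucose = -b0 / b1. Since b1 is positive here, values above that point are classified "pos". A minimal sketch, assuming mlbench is installed; it refits the glucose-only model on all complete cases, so the cut-off will differ slightly from one fitted on the training split, and fit_glucose and glucose_cutoff are illustrative names.

```r
library(mlbench)

data("PimaIndiansDiabetes2", package = "mlbench")
pima <- na.omit(PimaIndiansDiabetes2)

fit_glucose <- glm(diabetes ~ glucose, data = pima, family = binomial)
b <- coef(fit_glucose)

# Glucose level at which the predicted probability of "pos" is exactly 0.5
glucose_cutoff <- unname(-b["(Intercept)"] / b["glucose"])
print(glucose_cutoff)

# Check: the fitted probability at the cut-off should be 0.5
p_at_cutoff <- predict(fit_glucose, newdata = data.frame(glucose = glucose_cutoff),
                       type = "response")
print(p_at_cutoff)
```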
###Plotting the logistic curve for glucose :
trainingdata_samples %>%
  mutate(prob = ifelse(diabetes == "pos", 1, 0)) %>%   # recode the outcome as 0/1
  ggplot(aes(glucose, prob)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "glm", method.args = list(family = "binomial")) +
  labs(
    title = "Logistic Regression Model (Glucose as the Single Predictor)",
    x = "Plasma Glucose Concentration",
    y = "Probability of Having Diabetes"
  )
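geom_smooth() fits its own logistic regression to the plotted points. The same kind of curve can be computed explicitly from a fitted glm object by predicting over a grid of glucose values, which keeps the plotted probabilities available for reporting. A base-R sketch, assuming mlbench is installed and refitting on all complete cases; glucose_grid and curve_df are illustrative names.

```r
library(mlbench)

data("PimaIndiansDiabetes2", package = "mlbench")
pima <- na.omit(PimaIndiansDiabetes2)
fit_glucose <- glm(diabetes ~ glucose, data = pima, family = binomial)

# Evaluate the fitted model on an evenly spaced grid of glucose values
glucose_grid <- seq(min(pima$glucose), max(pima$glucose), length.out = 100)
curve_df <- data.frame(glucose = glucose_grid)
curve_df$prob <- predict(fit_glucose, newdata = curve_df, type = "response")

# Observed 0/1 outcomes with the fitted probability curve drawn on top
plot(pima$glucose, as.numeric(pima$diabetes == "pos"),
     col = rgb(0, 0, 0, 0.2), pch = 16,
     xlab = "Plasma Glucose Concentration", ylab = "Probability of Having Diabetes")
lines(curve_df$glucose, curve_df$prob, lwd = 2)
```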