Sample Logistic Regression Model

Logistic Regression using the inbuilt PimaIndiansDiabetes2 dataset from mlbench

Problem Statement : Identify and predict diabetic patients from an inbuilt dataset using R programming.

Takeaway : The GLM model reaches 75% accuracy on this dataset. Predictions are made with a single variable and with multiple variables as predictors.

Logistic Regression using Single Variable as Predictor

Source Code Details :

install.packages("tidyverse")
library(tidyverse)
install.packages("caret")
library(caret)
install.packages("mlbench")
library(mlbench)
data("PimaIndiansDiabetes2")
View(PimaIndiansDiabetes2)
###dplyr and ggplot2 are already attached by library(tidyverse)

###Remove the NA values from the dataset before moving to EDA : 
PimaIndiansDiabetes2 <-  na.omit(PimaIndiansDiabetes2)

###Creating Partitions and defining the samples from the dataset : 
training_samples <- PimaIndiansDiabetes2$diabetes %>%
  createDataPartition(p = 0.8, list = FALSE)
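
The partition above is drawn at random, so the rows that land in the training set (and therefore the reported accuracy) can change between runs. The sketch below shows the idea of a seeded, class-wise 80/20 split in base R. The `outcome` vector, the seed value 123, and the variable names are assumptions made for illustration. The sampling loop is only a rough stand-in for what createDataPartition does and may differ from it in detail.

```r
# Toy outcome vector for illustration only; not the real dataset.
outcome <- factor(rep(c("neg", "pos"), times = c(60, 40)))

# Fixing the seed makes the random split repeatable across runs.
set.seed(123)

# Sample 80% of the row indices within each class so that both
# classes keep roughly the same proportions in the training portion.
idx_by_class <- split(seq_along(outcome), outcome)
train_idx <- sort(unlist(lapply(idx_by_class, function(i) {
  sample(i, size = round(0.8 * length(i)))
}), use.names = FALSE))

table(outcome[train_idx])   # class counts in the training portion
table(outcome[-train_idx])  # class counts in the held-out portion
```

Calling set.seed() with a fixed value before createDataPartition in the script above should make its split repeatable in the same way.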

###Segregating training and testing data separately : 

trainingdata_samples <- PimaIndiansDiabetes2[training_samples,]

testigdata_sample <- PimaIndiansDiabetes2[-training_samples,]

###Building up the model : 

model <- glm(diabetes ~., data = trainingdata_samples, family = binomial())
summary(model)

###Making Predictions : 
probability <- model %>%
  predict(testigdata_sample, type = "response")
predicted.classes <- ifelse(probability > 0.5, "pos","neg")

###Model Accuracy : 
mean(predicted.classes == testigdata_sample$diabetes)
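
Accuracy alone does not show how the errors are split between the two classes. A minimal sketch of a confusion matrix with base R's table() follows. The `actual` and `predicted` vectors are made-up stand-ins for testigdata_sample$diabetes and predicted.classes, so the numbers are illustrative and are not results from the dataset.

```r
# Hypothetical labels for illustration only; not output from the model above.
actual    <- factor(c("pos", "neg", "neg", "pos", "neg", "pos"),
                    levels = c("neg", "pos"))
predicted <- factor(c("pos", "neg", "pos", "pos", "neg", "neg"),
                    levels = c("neg", "pos"))

# Rows are the predictions, columns are the true classes.
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Accuracy is the share of observations on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Sensitivity: share of actual "pos" cases that were predicted "pos".
sensitivity <- confusion["pos", "pos"] / sum(confusion[, "pos"])
```

The same table() call can be applied to the real predictions once both vectors share the labels "neg" and "pos".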

###Building the model with glucose as the single predictor : 
model <- glm(diabetes ~ glucose, data = trainingdata_samples, family = binomial)
summary(model)$coef

newdata <- data.frame(glucose = c(20,150))
probability_1 <- model %>% predict(newdata, type = "response")
predicted.classes_1 <- ifelse(probability_1 > 0.5, "pos","neg")
predicted.classes_1
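
To see where these probabilities come from, the sketch below fits the same kind of single-predictor model on simulated data and reproduces predict() by hand: the probability is the inverse logit of intercept + slope * glucose. The simulated data frame `toy`, the seed, and the generating coefficients -5 and 0.04 are assumptions chosen for illustration. They are not estimates from PimaIndiansDiabetes2.

```r
# Simulated data for illustration only; the coefficients -5 and 0.04 are made up.
set.seed(1)
glucose  <- runif(200, min = 60, max = 200)
diabetes <- rbinom(200, size = 1, prob = plogis(-5 + 0.04 * glucose))
toy <- data.frame(glucose = glucose, diabetes = diabetes)

fit <- glm(diabetes ~ glucose, data = toy, family = binomial)

# Manual prediction: inverse logit of the linear predictor.
b <- coef(fit)
newdata <- data.frame(glucose = c(20, 150))
manual <- plogis(b["(Intercept)"] + b["glucose"] * newdata$glucose)

# predict() with type = "response" returns the same probabilities.
via_predict <- predict(fit, newdata, type = "response")
all.equal(unname(manual), unname(via_predict))
```

Applying the 0.5 cutoff to either vector gives the "pos"/"neg" labels, as in the ifelse() call above.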

###Plotting the fitted probability curve : 
trainingdata_samples %>%
  mutate(prob = ifelse(diabetes == "pos", 1, 0)) %>%
  ggplot(aes(glucose, prob)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "glm", method.args = list(family = "binomial")) +
  labs(
    title = "Logistic Regression Model (Taking Glucose as single predictor variable)",
    x = "Plasma Glucose Concentration",
    y = "Probability of Being Diabetic"
  )