Statistical Method : One Sample T-Test using Education Dataset


Business Case : Identify the completion rate of males based on the birth rate across the globe. Use a statistical method to test whether the mean of this sample differs significantly from a hypothesized (theoretical) mean.

In statistics, we can define the null and alternative hypotheses as follows, where m is the sample mean and μ is the theoretical mean:

Null Hypothesis:

  1. H0: m = μ

  2. H0: m ≤ μ

  3. H0: m ≥ μ

Alternative Hypothesis:

  1. Ha: m ≠ μ (different)

  2. Ha: m > μ (greater)

  3. Ha: m < μ (less)

In this example, we will perform a one-tailed t-test on the education dataset.

The t-statistic can be calculated as follows:

$$t = \frac{m - \mu}{s / \sqrt{n}}$$

where,

  • m is the sample mean

  • n is the sample size

  • s is the sample standard deviation with n−1 degrees of freedom

  • μ is the theoretical value
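As a quick check of these definitions, the statistic can be computed by hand and compared with the value reported by t.test(). This is only a sketch: the sample values and the theoretical mean of 50 are hypothetical and are not taken from the education dataset.

```r
# Hypothetical sample values, for illustration only (not from the dataset)
x  <- c(52, 61, 47, 58, 66, 49, 55, 63)
mu <- 50         # assumed theoretical mean

m <- mean(x)     # sample mean
s <- sd(x)       # sample standard deviation (n - 1 in the denominator)
n <- length(x)   # sample size

# t = (m - mu) / (s / sqrt(n))
t_manual  <- (m - mu) / (s / sqrt(n))
t_builtin <- unname(t.test(x, mu = mu)$statistic)

# The two values should agree up to floating-point error
print(c(manual = t_manual, builtin = t_builtin))
```

The statistic is then compared against a t distribution with n − 1 degrees of freedom to obtain the p-value.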

Step 1 : Install the required packages

# rstatix is needed later for get_summary_stats() and the %>% pipe
install.packages(c("ggpubr", "rstatix"))
library(ggpubr)
library(rstatix)

Step 2 : R function to compute one sample t-test

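To perform a one-sample t-test, the R function t.test() can be used as follows, where x is a numeric vector, mu is the theoretical mean, and alternative is one of "two.sided", "greater" or "less":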

t.test(x, mu = 0, alternative = "two.sided")
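The three alternative hypotheses listed earlier correspond to the three values accepted by the alternative argument. The sketch below runs all three on hypothetical values (not the education data) with an assumed theoretical mean of 50, simply to show how the p-value depends on the choice.

```r
# Hypothetical sample values and theoretical mean, for illustration only
x   <- c(52, 61, 47, 58, 66, 49, 55, 63)
mu0 <- 50

p_two_sided <- t.test(x, mu = mu0, alternative = "two.sided")$p.value  # Ha: m != mu
p_greater   <- t.test(x, mu = mu0, alternative = "greater")$p.value    # Ha: m > mu
p_less      <- t.test(x, mu = mu0, alternative = "less")$p.value       # Ha: m < mu

print(c(two_sided = p_two_sided, greater = p_greater, less = p_less))
```

The two one-tailed p-values sum to 1, and the two-sided p-value is twice the smaller of them, so the direction of the test should be chosen before looking at the data.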

Step 3 : Importing the dataset into R Environment and explore the dataset

education_districtwise <- read.csv("/Users/Library/CloudStorage/Sample Datasets Kaggle/Global_Education.csv",
                                   header = TRUE)
head(education_districtwise, n = 10)
View(education_districtwise)
summary(education_districtwise)
str(education_districtwise)
class(education_districtwise)
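# Summary statistics (mean and SD) for the OOSR_Primary_Age_Male column
# (get_summary_stats() and %>% come from the rstatix package)
library(rstatix)
education_districtwise %>% get_summary_stats(OOSR_Primary_Age_Male, type = "mean_sd")

# my_data is not defined in the original; it is assumed here to be a one-column data frame,
# which yields the column name education_districtwise.Completion_Rate_Primary_Male
my_data <- data.frame(education_districtwise$Completion_Rate_Primary_Male)
summary(my_data$education_districtwise.Completion_Rate_Primary_Male)
# Visualize the data using a box plot: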
library(ggpubr)
ggboxplot(my_data$education_districtwise.Completion_Rate_Primary_Male,
          ylab = "Completion Rate of Male",xlab = FALSE,
          ggtheme = theme_minimal())

Preliminary Test to check one sample t-test assumptions

We use the Shapiro-Wilk normality test and also look at the normal Q-Q plot.
The Shapiro-Wilk test has the following hypotheses:

1) Null hypothesis: the data are normally distributed
2) Alternative hypothesis: the data are not normally distributed

shapiro.test(my_data$education_districtwise.Completion_Rate_Primary_Male)
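The result of shapiro.test() is read against a significance level. The sketch below shows one way to do that; it uses simulated values as a stand-in for the completion-rate column, and alpha = 0.05 is an assumed threshold.

```r
# Simulated stand-in for the completion-rate column (not the real data)
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)

res   <- shapiro.test(x)
alpha <- 0.05   # assumed significance level

# A p-value above alpha means we do not reject normality;
# it does not prove that the data are normal.
normality_rejected <- res$p.value < alpha
print(res)
print(normality_rejected)
```

If normality is rejected, a non-parametric alternative such as the one-sample Wilcoxon test (wilcox.test() in base R) is usually considered instead of the t-test.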

Step 4 : Visual inspection using a Q-Q plot from the ggpubr package

library(ggpubr)

ggqqplot(my_data$education_districtwise.Completion_Rate_Primary_Male,
         ylab = "Completion Rate for Male",
         ggtheme = theme_minimal())
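The t-test call itself is not shown in the steps above. A minimal sketch of how it might look follows, assuming the theoretical mean of 41.72 quoted in the conclusion and a two-sided alternative as in Step 2. The values are simulated stand-ins; with the real data, the first argument would be my_data$education_districtwise.Completion_Rate_Primary_Male.

```r
# Simulated stand-in for the male completion-rate column (not the real data)
set.seed(1)
completion_rate <- rnorm(150, mean = 42, sd = 25)

# mu = 41.72 is the theoretical mean quoted in the conclusion;
# the two-sided alternative is an assumption
res <- t.test(completion_rate, mu = 41.72, alternative = "two.sided")
print(res)

alpha <- 0.05   # significance level
decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"
print(decision)
```

Only a p-value below alpha supports the alternative hypothesis; a p-value above alpha means the data are compatible with the theoretical mean.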

Conclusion :

Here the p-value (0.993) is greater than the significance level alpha = 0.05, so we fail to reject the null hypothesis. The data provide no evidence that the mean male completion rate is different from 41.72, and the alternative hypothesis is not supported.