Statistical Method : One Sample T-Test using Education Dataset
Business Case : Identify the male completion rate based on the birth rate across the globe. Use statistical methods to test whether the mean of this dataset differs significantly from a theoretical (reference) mean.
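In statistics, the null and alternative hypotheses of a one-sample t-test can be written as below, where $m$ is the sample mean, $\mu$ is the theoretical mean, and each null hypothesis is paired with the alternative hypothesis in the same position: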
Null Hypothesis:
$H_0: m = \mu$
$H_0: m \le \mu$
$H_0: m \ge \mu$
Alternative Hypothesis:
$H_a: m \ne \mu$ (different)
$H_a: m > \mu$ (greater)
$H_a: m < \mu$ (less)
In this example, we apply a one-tailed t-test to the education dataset.
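The t-statistic can be calculated as follows:
$$t = \frac{m - \mu}{s / \sqrt{n}}$$
where,
$m$ is the sample mean
$n$ is the sample size
$s$ is the sample standard deviation (computed with the $n - 1$ denominator); under $H_0$ the statistic follows a t distribution with $n - 1$ degrees of freedom
$\mu$ is the theoretical (hypothesized) mean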
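The one-sample t-statistic, $t = (m - \mu)/(s/\sqrt{n})$, can be checked numerically. The sketch below computes it by hand on a simulated sample and compares it with the statistic and two-sided p-value returned by t.test(). The sample (30 draws with mean 50 and SD 10) and the theoretical mean of 45 are arbitrary assumptions, not values from the education data.

```r
# Sketch: one-sample t-statistic by hand vs. t.test().
# All numbers below are hypothetical; the data are simulated.
set.seed(42)
x  <- rnorm(30, mean = 50, sd = 10)  # simulated sample
mu <- 45                             # hypothetical theoretical mean

m   <- mean(x)     # sample mean
s   <- sd(x)       # sample standard deviation (n - 1 denominator)
n   <- length(x)   # sample size
dof <- n - 1       # degrees of freedom

t_manual <- (m - mu) / (s / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), dof)

res <- t.test(x, mu = mu, alternative = "two.sided")
print(c(t_manual = t_manual, t_builtin = unname(res$statistic)))
print(c(p_manual = p_manual, p_builtin = res$p.value))
```

Both pairs of numbers should agree, which confirms that t.test() uses the formula above with $n - 1$ degrees of freedom.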
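Step 1 : Install and load the required packages
install.packages(c("ggpubr", "rstatix"))
library(ggpubr)   # plotting: ggboxplot(), ggqqplot()
library(rstatix)  # summary statistics: get_summary_stats()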
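Step 2 : R function to compute a one-sample t-test
To perform a one-sample t-test, use the R function t.test():
t.test(x, mu = 0, alternative = "two.sided")
Here x is a numeric vector of data, mu is the theoretical mean (0 by default), and alternative is one of "two.sided" (the default), "greater" or "less".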
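To illustrate how the alternative argument maps onto the three alternative hypotheses, here is a minimal sketch on simulated data. The sample (25 draws with mean 52 and SD 8) and mu = 50 are hypothetical values chosen only for the demonstration.

```r
# Sketch: the `alternative` argument selects which H_a is tested.
# Simulated data; mu = 50 is a hypothetical theoretical mean.
set.seed(1)
x  <- rnorm(25, mean = 52, sd = 8)
mu <- 50

p_two     <- t.test(x, mu = mu, alternative = "two.sided")$p.value  # H_a: m != mu
p_greater <- t.test(x, mu = mu, alternative = "greater")$p.value    # H_a: m > mu
p_less    <- t.test(x, mu = mu, alternative = "less")$p.value       # H_a: m < mu

print(c(two.sided = p_two, greater = p_greater, less = p_less))
```

The two one-sided p-values sum to 1, and the two-sided p-value equals twice the smaller of them. A one-tailed test is only appropriate when its direction is chosen before looking at the data.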
Step 3 : Import the dataset into the R environment and explore it
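# Read the CSV file (adjust the path to where the file is stored on your machine)
education_districtwise <- read.csv("/Users/Library/CloudStorage/Sample Datasets Kaggle/Global_Education.csv", header = TRUE)
head(education_districtwise, n = 10)  # first 10 rows
View(education_districtwise)          # spreadsheet-style viewer (RStudio or R GUI)
summary(education_districtwise)       # per-column summaries
str(education_districtwise)           # column names and types
class(education_districtwise)         # "data.frame"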
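# Mean and standard deviation of the male out-of-school rate (primary age)
library(rstatix)  # provides get_summary_stats() and re-exports %>%
education_districtwise %>% get_summary_stats(OOSR_Primary_Age_Male, type = "mean_sd")
# Keep the male completion rate in its own data frame; data.frame() names the
# column "education_districtwise.Completion_Rate_Primary_Male"
my_data <- data.frame(education_districtwise$Completion_Rate_Primary_Male)
summary(my_data$education_districtwise.Completion_Rate_Primary_Male)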
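If rstatix is not available, a similar count, mean and standard deviation can be computed in base R. This is a minimal sketch assuming missing values should simply be dropped; summarise_mean_sd() is a hypothetical helper and x is a simulated stand-in for a column of the dataset.

```r
# Sketch: base-R mean/SD summary with missing values removed.
# `x` is simulated (20 values plus one NA), not the education data.
set.seed(2)
x <- c(rnorm(20, mean = 40, sd = 12), NA)

summarise_mean_sd <- function(x) {
  x <- x[!is.na(x)]  # drop missing values before summarising
  c(n = length(x), mean = mean(x), sd = sd(x))
}

print(summarise_mean_sd(x))
```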
###Visualize data using box plots:
library(ggpubr)
ggboxplot(my_data$education_districtwise.Completion_Rate_Primary_Male,
ylab = "Completion Rate of Male",xlab = FALSE,
ggtheme = theme_minimal())
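Preliminary test to check the one-sample t-test assumptions
The one-sample t-test assumes the data are approximately normally distributed. We check this with the Shapiro-Wilk normality test and by looking at a normality (Q-Q) plot.
The Shapiro-Wilk test uses:
1) Null hypothesis: the data are normally distributed
2) Alternative hypothesis: the data are not normally distributed
A p-value greater than 0.05 means there is no evidence of a departure from normality.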
shapiro.test(my_data$education_districtwise.Completion_Rate_Primary_Male)
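The Shapiro-Wilk decision rule can be wrapped in a small helper. This is a sketch: check_normality() is a hypothetical name, alpha = 0.05 follows the threshold used in this post, and the two samples are simulated (one normal, one right-skewed exponential) rather than taken from the education data.

```r
# Sketch: apply the Shapiro-Wilk decision rule at a chosen alpha.
# Both samples are simulated stand-ins.
set.seed(123)
normal_sample <- rnorm(100, mean = 40, sd = 5)
skewed_sample <- rexp(100, rate = 0.1)

check_normality <- function(x, alpha = 0.05) {
  sw <- shapiro.test(x)  # needs between 3 and 5000 non-missing values
  list(
    W = unname(sw$statistic),
    p_value = sw$p.value,
    # p > alpha: no evidence against normality (H0 is not rejected)
    looks_normal = sw$p.value > alpha
  )
}

str(check_normality(normal_sample))
str(check_normality(skewed_sample))
```

A strongly skewed sample such as the exponential one is expected to fail the check; for the real data, the same call would be made on the completion-rate column.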
Step 4 : Visual inspection using a Q-Q plot via the ggpubr package
library(ggpubr)
ggqqplot(my_data$education_districtwise.Completion_Rate_Primary_Male,
ylab = "Completion Rate for Male",
ggtheme = theme_minimal())
Conclusion :
Here the p-value (0.993) is greater than the significance level alpha = 0.05, so we fail to reject the null hypothesis. The data do not show a statistically significant difference between the mean male completion rate and the theoretical value of 41.72, so the alternative hypothesis is not supported.
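The decision rule behind a one-sample t-test conclusion (reject $H_0$ only when the p-value is below alpha) can be made explicit. The helper one_sample_decision() below is a hypothetical sketch run on simulated data; only the reference value 41.72 comes from this post.

```r
# Sketch: turn a one-sample t-test into an explicit decision at level alpha.
# one_sample_decision() is a hypothetical helper; the data are simulated.
one_sample_decision <- function(x, mu, alpha = 0.05,
                                alternative = "two.sided") {
  res <- t.test(x, mu = mu, alternative = alternative)
  list(
    t = unname(res$statistic),
    df = unname(res$parameter),
    p_value = res$p.value,
    # Reject H0 only when the p-value falls below alpha; otherwise H0 is
    # retained, which is not proof that H0 is true.
    reject_h0 = res$p.value < alpha
  )
}

set.seed(7)
x <- rnorm(50, mean = 41.72, sd = 6)  # simulated sample
str(one_sample_decision(x, mu = 41.72))

# With the data from this post, the call would be (not run here):
# one_sample_decision(my_data$education_districtwise.Completion_Rate_Primary_Male, mu = 41.72)
```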