Sampling with R Program

Sampling Statistics using R program

Mahesh Gurumoorthi
August 10, 2024

Throughout the following exercises, you will learn to use R to simulate random sampling and to make a point estimate of a population mean from your sample data.

Simulate random sampling

You can use R to simulate taking a random sample of 50 countries from your dataset. To do this, use the sample() function, for example: sample(x, size = n, replace = FALSE, prob = NULL). Set replace to TRUE to sample with replacement or to FALSE to sample without replacement; prob = NULL gives every element the same probability of being selected. The following arguments in the sample() function will help you simulate random sampling:

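x: Refers to the data to sample from. For a data frame, pass the row numbers (for example, nrow(df)) and subset the rows, because sample() applied directly to a data frame draws columns rather than rows
size: The number of items to draw
replace: Indicates whether you are sampling with or without replacement
set.seed(): Not an argument of sample(), but call it before sampling to fix the state of the random number generator so that the same sample is drawn on every run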
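
As a minimal sketch of how these pieces behave, the following uses a made-up population of the integers 1 to 100 (an assumption for illustration, not the tutorial's dataset). The object names and the seed value are hypothetical.

```r
# Hypothetical population: the integers 1 to 100 (not the tutorial's dataset).
population <- 1:100

# Fix the random number generator state so the draw can be reproduced.
set.seed(123)
first_draw <- sample(population, size = 10, replace = FALSE)

# Resetting the same seed reproduces exactly the same sample.
set.seed(123)
second_draw <- sample(population, size = 10, replace = FALSE)
identical(first_draw, second_draw)  # TRUE

# Without replacement, every drawn value is unique.
anyDuplicated(first_draw) == 0      # TRUE

# With replacement, repeats are possible and size may exceed the population size.
with_repeats <- sample(population, size = 150, replace = TRUE)
length(with_repeats)                # 150
```

Changing the seed, or omitting set.seed() altogether, generally produces a different sample on each run.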

Compute the sample mean

Now that you have your random sample, use the mean() function to compute the sample mean. First, name a new variable new_estimate. Next, use mean() to compute the mean for your sample data and assign the result to that variable.
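
As a rough sketch of this step, assume a small made-up data frame. The toy data, its column names, and the seed are illustrative and do not come from the tutorial's dataset. Row sampling and the point estimate might then look like this:

```r
# Hypothetical toy data; the column names are illustrative only.
toy <- data.frame(
  country = paste0("Country_", 1:20),
  enrollment = seq(5, 100, by = 5)
)

set.seed(42)
# sample() applied to a data frame draws columns, so sample row numbers instead.
row_ids <- sample(nrow(toy), size = 8, replace = FALSE)
toy_sample <- toy[row_ids, ]

# Point estimate of the population mean, computed from the sample.
new_estimate <- mean(toy_sample$enrollment)
print(new_estimate)

# The population mean is known here only because toy holds the full population.
population_mean <- mean(toy$enrollment)
print(population_mean)  # 52.5
```

The sample mean will usually differ somewhat from the population mean. That gap is sampling variability, not an error in the code.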

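# Install once; tidyverse already includes dplyr (and tidyr, used for drop_na() below).
install.packages("tidyverse")
# Load the packages; library(tidyverse) also attaches dplyr.
library(tidyverse)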

education_districtwise <- read.csv("/Users/user/Library/user/Sample Datasets Kaggle/Global_Education.csv", header = TRUE)

view(education_districtwise)
summary(education_districtwise)
str(education_districtwise)

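# sample() applied to a data frame draws columns, so sample row numbers and subset the rows instead.
sampled_data <- education_districtwise[sample(nrow(education_districtwise), size = 10, replace = FALSE, prob = NULL), ]
sampled_data2 <- education_districtwise[sample(nrow(education_districtwise), size = 50, replace = TRUE, prob = NULL), ]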

View(sampled_data2)

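### Computing the sample mean for Gross Tertiary Education Enrollment (10% trimmed mean):
# With na.rm = FALSE the result is NA if the sampled column contains missing values.
education_enrollment_country <- mean(sampled_data$Gross_Tertiary_Education_Enrollment, trim = 0.10, na.rm = FALSE)
print(education_enrollment_country)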

### Computing another sample mean for Gross Tertiary Education Enrollment from the second sample (size 50, drawn with replacement):
education_enrollment_country2 <- mean(sampled_data2$Gross_Tertiary_Education_Enrollment, trim = 0.10, na.rm = FALSE)
print(education_enrollment_country2)

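### Computing the mean of the sampling distribution with more samples:
# Draw 10 fresh samples of 50 rows each and store every trimmed sample mean.
education_districtwise_list <- numeric(10)
for (i in 1:10) {
  resampled_rows <- education_districtwise[sample(nrow(education_districtwise), size = 50, replace = TRUE), ]
  education_districtwise_list[i] <- mean(resampled_rows$Gross_Tertiary_Education_Enrollment, trim = 0.10, na.rm = FALSE)
}
education_districtwise_df <- data.frame(sample_mean = education_districtwise_list)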


education_districtwise <- education_districtwise %>% drop_na()
# is.na(education_districtwise)


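### Simulating random sampling from the dataset:
set.seed(31208)
# Sample row numbers, then subset the rows (sample() on a data frame would draw columns).
new_sample_data <- education_districtwise[sample(nrow(education_districtwise), size = 50,
                                                 replace = TRUE, prob = NULL), ]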

View(new_sample_data)

### Computing the sample mean:
new_estimate1 <- mean(new_sample_data$Gross_Tertiary_Education_Enrollment)
print(new_estimate1)

set.seed(56810)
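# A different seed yields a different sample of rows, and therefore a different estimate.
new_sample_data <- education_districtwise[sample(nrow(education_districtwise), size = 50,
                                                 replace = TRUE, prob = NULL), ]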
new_estimate2 <- mean(new_sample_data$Gross_Tertiary_Education_Enrollment)
print(new_estimate2)
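
The two estimates above generally differ because each seed yields a different sample. Repeating the draw many times approximates the sampling distribution of the mean. The sketch below illustrates this on a made-up numeric population; the simulated values, object names, and seed are assumptions and not the education dataset.

```r
# Hypothetical population of 1000 values between 0 and 100 (not the tutorial's data).
set.seed(2024)
population <- runif(1000, min = 0, max = 100)

# Draw 500 samples of size 50 with replacement and keep each sample mean.
sample_means <- replicate(
  500,
  mean(sample(population, size = 50, replace = TRUE))
)

# The mean of the sample means should sit close to the population mean.
mean(sample_means)
mean(population)

# Sample means vary less than the individual values do.
sd(sample_means) < sd(population)
```

With real data, the same pattern would apply to the sampled column, for example Gross_Tertiary_Education_Enrollment, after missing values have been handled.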