Sampling with R Program
Sampling Statistics using R program
In the following exercises, you will use R to simulate random sampling and compute a point estimate of the population mean from your sample data.
Simulate random sampling
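You can use R to simulate taking a random sample of 50 countries from your dataset. To do this, use the sample() function: sample(x, size, replace = FALSE, prob = NULL). The following arguments, together with set.seed(), will help you simulate random sampling:
x
: The vector of items to sample from. A data frame is treated as a list of columns, so to sample countries (rows), sample from the row numbers 1 to nrow(data) and subset the data frame with the result.
size
: The number of items to draw, here 50.
replace
: Indicates whether you are sampling with replacement (TRUE) or without replacement (FALSE, the default).
prob
: An optional vector of probability weights. NULL (the default) gives every item an equal chance of selection.
set.seed()
: Fixes the starting point of the random number generator. Call it immediately before sample() so that the same sample is drawn every time the code runs.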
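As a minimal sketch of how these pieces fit together, the snippet below samples rows from a small made-up data frame. The `countries` data frame and its values are hypothetical and stand in for the real dataset.

```r
# Toy data standing in for the real dataset (hypothetical values).
countries <- data.frame(
  country = paste0("Country_", 1:20),
  enrollment = seq(5, 100, by = 5)
)

# Setting the same seed before each call reproduces the same draw.
set.seed(123)
rows_a <- sample(nrow(countries), size = 5, replace = FALSE)
set.seed(123)
rows_b <- sample(nrow(countries), size = 5, replace = FALSE)
print(identical(rows_a, rows_b))

# Subset the data frame by the sampled row numbers to get 5 whole rows.
sampled_rows <- countries[rows_a, ]
print(nrow(sampled_rows))
```

Sampling row numbers rather than the data frame itself keeps every column of each selected country together.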
Compute the sample mean
Now that you have your random sample, use the mean() function to compute the sample mean. First, name a new variable new_estimate. Next, use mean() to compute the mean of your sample data and assign the result to new_estimate.
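To illustrate what a point estimate is, here is a small sketch on simulated data. The `population` data frame is hypothetical (random values between 0 and 100), not the education dataset, so the numbers only show how a sample mean approximates a population mean.

```r
set.seed(42)
# Hypothetical population of 200 enrollment rates (percent), not real data.
population <- data.frame(enrollment = runif(200, min = 0, max = 100))

# Draw 50 rows with replacement; drop = FALSE keeps the result a data frame.
sample_rows <- population[sample(nrow(population), size = 50, replace = TRUE), , drop = FALSE]

# The sample mean is the point estimate of the population mean.
new_estimate <- mean(sample_rows$enrollment)
population_mean <- mean(population$enrollment)

print(new_estimate)
print(population_mean)
```

In practice the population mean is unknown, which is why the sample mean serves as the estimate.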
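# Install the packages once; comment these two lines out on later runs.
install.packages("dplyr")
install.packages("tidyverse")
library(tidyverse)
library(dplyr)
# Load the dataset (adjust the path to match the location of the CSV on your machine).
education_districtwise <- read.csv("/Users/user/Library/user/Sample Datasets Kaggle/Global_Education.csv", header = TRUE)
View(education_districtwise)
summary(education_districtwise)
str(education_districtwise)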
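# sample() applied to a data frame draws columns, so sample row numbers and subset by them.
sampled_data <- education_districtwise[sample(nrow(education_districtwise), size = 10, replace = FALSE, prob = NULL), ]
sampled_data2 <- education_districtwise[sample(nrow(education_districtwise), size = 50, replace = TRUE, prob = NULL), ]
View(sampled_data2)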
### Computing the sample mean for Gross Tertiary Education Enrollment:
# trim = 0.10 drops the lowest and highest 10% of values before averaging (a trimmed mean).
# na.rm = TRUE skips missing values, which would otherwise make the result NA.
education_enrollment_country <- mean(sampled_data$Gross_Tertiary_Education_Enrollment, trim = 0.10, na.rm = TRUE)
print(education_enrollment_country)
### Computing another sample mean for Gross Tertiary Education Enrollment, this time from the sample of 50 drawn with replacement:
education_enrollment_country2 <- mean(sampled_data2$Gross_Tertiary_Education_Enrollment, trim = 0.10, na.rm = TRUE)
print(education_enrollment_country2)
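### Computing the mean of the sampling distribution from more samples:
# Draw 10 samples of 50 rows each and store the trimmed mean of each sample.
sample_means <- numeric(10)
for (i in 1:10) {
  resample <- education_districtwise[sample(nrow(education_districtwise), size = 50, replace = TRUE), ]
  sample_means[i] <- mean(resample$Gross_Tertiary_Education_Enrollment, trim = 0.10, na.rm = TRUE)
}
education_districtwise_df <- data.frame(sample_mean = sample_means)
# The average of the sample means estimates the mean of the sampling distribution.
print(mean(education_districtwise_df$sample_mean))
# Drop rows with missing values before drawing the next samples.
education_districtwise <- education_districtwise %>% drop_na()
# is.na(education_districtwise)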
### Simulating random sampling from the dataset:
set.seed(31208)
# Sample row numbers, then subset, so that whole rows are drawn.
new_sample_data <- education_districtwise[sample(nrow(education_districtwise), size = 50,
replace = TRUE, prob = NULL), ]
View(new_sample_data)
### Computing the sample mean:
new_estimate1 <- mean(new_sample_data$Gross_Tertiary_Education_Enrollment)
print(new_estimate1)
# A different seed draws a different sample, so the estimate changes.
set.seed(56810)
new_sample_data <- education_districtwise[sample(nrow(education_districtwise), size = 50,
replace = TRUE, prob = NULL), ]
new_estimate2 <- mean(new_sample_data$Gross_Tertiary_Education_Enrollment)
print(new_estimate2)
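The estimates new_estimate1 and new_estimate2 differ because each seed produces a different sample. Repeating the draw many times shows how such estimates are distributed. The sketch below uses replicate() on a simulated `population` vector, which is an assumption made for illustration and not the education dataset.

```r
set.seed(2024)
# Hypothetical population of 1,000 values (simulated, not the Kaggle data).
population <- rnorm(1000, mean = 40, sd = 15)

# Draw 1,000 samples of size 50 with replacement and record each sample mean.
sample_means <- replicate(1000, mean(sample(population, size = 50, replace = TRUE)))

# The sample means centre on the population mean ...
print(mean(sample_means))
print(mean(population))
# ... and their spread is roughly sd(population) / sqrt(50).
print(sd(sample_means))
```

replicate() avoids managing a loop counter and a results vector by hand, though a for loop that fills a preallocated vector works equally well.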