Steps to Follow for a Linear Regression Problem Statement

Linear Regression

(a) A Simple Line. Starting out, generate 10 to 20 data points for values along the x-axis. Then generate data points along the y-axis using the equation yᵢ = β₀ + β₁xᵢ. Make it a straight line, nothing fancy.

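# Draw 10 x-values uniformly from [10, 20]
x <- runif(10, 10, 20)
b0 <- 0  # intercept
b1 <- 1  # slope
# Noise-free line: every point lies exactly on y = b0 + b1 * x
y <- b0 + b1 * x
print(y)
# No error term is added here; part (a) asks for a perfect straight line
df <- data.frame(x, y)
print(df)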

(b) The Error Component. That is a perfect set of data points, but that is a problem in itself. In almost any real-life situation, when we measure data, there will be some error in those measurements. Recall that our simple linear model is of the form: yᵢ = β₀ + β₁xᵢ + ϵᵢ, ϵᵢ ∼ N(0, σ²). Add an error term to your y-data following the formula above. Make at least three different plots (using ggplot!) with different values of σ². How does the value of σ² affect the final data points? Type your answer in the Markdown cell below the R cell.

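library(ggplot2)

x <- 1:10
x
# Note: rnorm() takes the standard deviation sigma, not the variance sigma^2.
# The same y-range is used in every plot so the three are directly comparable.

# sigma = 1
error_comp <- rnorm(10, 0, 1)
error_comp
y <- 3 + 2 * x + error_comp
y
ggplot(data.frame(x, y), aes(x, y)) + geom_point() + ylim(-1, 30) +
  ggtitle("3 + 2*x + random error with SD = 1")

# sigma = 2
error_comp <- rnorm(10, 0, 2)
error_comp
y <- 3 + 2 * x + error_comp
y
ggplot(data.frame(x, y), aes(x, y)) + geom_point() + ylim(-1, 30) +
  ggtitle("3 + 2*x + random error with SD = 2")

# sigma = 5
error_comp <- rnorm(10, 0, 5)
error_comp
y <- 3 + 2 * x + error_comp
y
ggplot(data.frame(x, y), aes(x, y)) + geom_point() + ylim(-1, 30) +
  ggtitle("3 + 2*x + random error with SD = 5")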


library(ggplot2)
library(tidyr)

x <- 1:10
y0 <- 3 + 2 * x                     # noise-free line (sd = 0)
y1 <- 3 + 2 * x + rnorm(10, 0, 1)   # sd = 1
y2 <- 3 + 2 * x + rnorm(10, 0, 2)   # sd = 2
y5 <- 3 + 2 * x + rnorm(10, 0, 5)   # sd = 5
DATAFRAME <- data.frame(x = x, y0 = y0, y1 = y1, y2 = y2, y5 = y5)
# Reshape to long format, keeping x as the identifier column
df2 <- pivot_longer(DATAFRAME, cols = c("y0", "y1", "y2", "y5"))
df2
ggplot(
  data = df2,
  mapping = aes(x = x, y = value, color = name)
) +
  geom_point(aes(size = name)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Points and fitted lines, from sd = 0 (smallest points) to sd = 5 (largest)",
    y = "y values",
    x = "x values",
    color = "Legend",
    size = "Legend"
  ) +
  theme_bw()
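
To put a number on how σ affects the data points, the sketch below simulates the same line with a larger sample and compares the sample standard deviation of the deviations from the true line. It is an illustration only: the seed, the sample size `n`, and the helper name `simulate_line` are assumptions that do not appear in the original answer.

```r
set.seed(42)  # assumed seed, only so the run is reproducible

# Hypothetical helper: simulate y = b0 + b1*x + N(0, sigma^2) noise
simulate_line <- function(x, sigma, b0 = 3, b1 = 2) {
  b0 + b1 * x + rnorm(length(x), mean = 0, sd = sigma)
}

n <- 1000                   # assumed sample size, larger than the 10 points above
x_sim <- runif(n, 1, 10)
sigmas <- c(1, 2, 5)

# Sample SD of the deviations from the true line, one value per sigma
spread <- sapply(sigmas, function(s) {
  y_sim <- simulate_line(x_sim, s)
  sd(y_sim - (3 + 2 * x_sim))
})
names(spread) <- paste0("sigma=", sigmas)
print(round(spread, 2))
```

Each entry of `spread` should land close to the σ used to generate it, which matches what the plots show: a larger σ² scatters the points farther from the underlying line without moving the line itself.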

Problem 2: The Effects of Variance on Linear Models. Once you've completed Problem 1, you should have three different "datasets" from the same underlying data function but with different variances. Let's see how those variances affect a best-fit line. Use the lm() function to fit a best-fit line to each of those three datasets. Add that best-fit line to each of the plots and report the slope of each line. Do the slopes of the best-fit lines change as σ² changes? Type your answer in the Markdown cell below the R cell. Tip: The lm() function requires the syntax lm(y~x).

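
Problem 2 asks for explicit lm() fits and their slopes. A minimal sketch of one way to do that follows. The seed and the names `sigmas`, `fits`, `slopes`, and `plots` are assumptions introduced for illustration, and the fitted line is drawn with geom_abline() from the lm() coefficients.

```r
library(ggplot2)

set.seed(1)  # assumed seed for reproducibility
x <- 1:10
sigmas <- c(1, 2, 5)

# One simulated dataset and one lm() fit per sigma
fits <- lapply(sigmas, function(s) {
  d <- data.frame(x = x, y = 3 + 2 * x + rnorm(length(x), 0, s))
  list(data = d, model = lm(y ~ x, data = d))
})

# Estimated slope of each best-fit line (the true slope is 2)
slopes <- sapply(fits, function(f) unname(coef(f$model)["x"]))
names(slopes) <- paste0("sd=", sigmas)
print(slopes)

# Scatter plot plus the fitted line for each dataset
plots <- lapply(seq_along(fits), function(i) {
  cf <- coef(fits[[i]]$model)
  ggplot(fits[[i]]$data, aes(x = x, y = y)) +
    geom_point() +
    geom_abline(intercept = cf[["(Intercept)"]], slope = cf[["x"]],
                color = "blue") +
    ggtitle(paste("3 + 2*x + random error with SD =", sigmas[i]))
})
for (p in plots) print(p)
```

The least-squares slope is an unbiased estimate of the true slope for every σ, so the slopes do not shift systematically as σ² changes. What grows is their sampling variability: the standard error of the slope is σ / sqrt(Σ(xᵢ − x̄)²), so a single fit at SD = 5 can land noticeably farther from 2 than a fit at SD = 1.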