OBJECTIVES & STANDARDS
Math Objectives
Calculate the residual for a point using a linear regression model
Describe how OLS determines the line of best fit
Analyze the connection between residuals and r^2 values
Interpret a regression model in context, including describing trends and identifying predicted and observed values
Common Core Math Standards
Link to all CCSS Math
CCSS.PRACTICE.MP3
CCSS.PRACTICE.MP4
CCSS.HSS.ID.C.8
CCSS.HSS.ID.C.7
CCSS.HSS.ID.B.6.C
Personal Finance Objectives
Analyze the relationship between college sticker price and student:faculty ratio.
Use regression models to summarize the relationship between college sticker price and another variable, including graduation rate, time, and ACT scores.
National Standards for Personal Financial Education
Earning Income
3a: Evaluate the costs and benefits of investing in additional education or training
DISTRIBUTION & PLANNING
Distribute to students
Student Activity Packet
Application Problems
Intro
CONSIDER: Choosing Where to Meet Up
Three friends - Harper, Luke, and Jaxon - are trying to decide the best spot to meet up. Compare the two options based on how far each friend would need to travel to get there.
Cafe Mezze
Harper lives next door and is 0 miles away
Luke lives 15 miles away
Jaxon lives 3 miles away
Rabat Restaurant
Harper lives 6 miles away
Luke lives 5 miles away
Jaxon lives 7 miles away
Question 1 (3 points)
Harper says, “Let’s add up the total distance we would collectively need to travel to get to each restaurant.”
What is the sum of the distances the friends will travel to Cafe Mezze?_______
What is the sum of the distances the friends will travel to Rabat Restaurant?_______
Why isn’t Harper’s strategy a helpful representation of the scenario?_______
Question 2 (3 points)
Luke says, “Of course Harper would suggest that, they live right next to Cafe Mezze! Let’s square the distance we each travel, then find the sum.”
Square the distance each friend would travel to Cafe Mezze. What is the sum of those squares?_______
Square the distance each friend would travel to Rabat Restaurant. What is the sum of those squares?_______
Using Luke’s strategy, why is one of the sums much larger than the other?_______
Question 3 (1 point)
Which option do you think they should choose? Why?
Learn It 1
Residuals
In the Intro, you compared how far the three friends were from a particular place. You can apply the same concept to a scatterplot with a line of best fit. The vertical distance between a data point and the line is called a residual. Just like the friends chose the restaurant that was best for all of them, we can use residuals to find the line that best fits the data points.
Let’s look at the example scatterplot below, which shows data on 6 colleges that Kelly is interested in. Kelly looked up the acceptance rate and average student grant aid for each college. Use the graph to answer the following questions.
Question 4 (1 point)
What is the average student grant aid for the school that admits 10% of students? Estimate based on the data point.
Question 5 (1 point)
What does Kelly’s line of best fit predict the average student grant aid will be for a school that admits 10% of students? Estimate based on the graph or by using the equation for the line of best fit: ŷ = -0.32x + 45.50.
Question 6 (1 point)
What is the difference between those two values?
The value you just found is the residual. You may also hear it called an error when looking at data representing an entire population. The residuals for a line of best fit always add up to 0.
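A residual is usually computed as the observed value minus the predicted value. The sketch below works through that arithmetic for a small made-up data set (the numbers are hypothetical and are not Kelly's colleges) and checks that the residuals of a least-squares line add up to zero. The helper name `fit_line` is an assumption chosen for illustration.

```python
# Hypothetical data, NOT Kelly's colleges:
# x = acceptance rate (%), y = average student grant aid ($ thousands).
xs = [10, 20, 40, 60, 80, 90]
ys = [44, 37, 35, 24, 21, 17]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)

# Residual = observed y minus predicted y, one per data point.
predicted = [slope * x + intercept for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(residuals)
print(sum(residuals))  # zero, up to floating-point rounding
```

Points above the line produce positive residuals and points below it produce negative ones, which is why they can cancel out.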
Practice It
GRAPH: Sticker Price and Student:Faculty Ratio
When choosing a college, one factor you might consider is the student:faculty ratio. This ratio tells you how many students there are for each faculty member. A lower student:faculty ratio might mean smaller class sizes, more opportunities to talk to professors, and more individualized support.
The scatterplot below shows data on sticker price and student:faculty ratio for a representative sample of 27 4-year US colleges.
Question 7 (1 point)
Sketch the line that you think best fits the data below.
Question 8 (1 point)
Based on your line of best fit, what is the residual at x = 60?
Question 9 (1 point)
Describe the correlation between the sticker price and student:faculty ratio.
Consider: is it strong or weak? Positive or negative?
Question 10 (1 point)
Compare your line with your classmates’ lines. How do you think we can figure out which line is the BEST possible fit?
Learn It 2
Ordinary Least Squares (OLS) Regression
Residuals are a part of finding the line of best fit. That process is called Ordinary Least Squares (OLS) Regression.
OLS regression figures out which line is the BEST fit by finding the line that minimizes the sum of the squared residuals.
Let’s see how OLS regression works for our data on sticker price and student:faculty ratio. The graph on the left shows the residuals for all of the data points. The graph on the right squares each of those residuals. The best fit line makes the total area of those squares as small as possible.
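The idea of "minimizing the sum of the squared residuals" can be sketched in a few lines of Python. The data below is made up for illustration (it is not the 27-college sample), and the helper names `fit_line` and `sum_of_squares` are assumptions. The sketch fits the least-squares line, then nudges the slope and intercept slightly to show that every nearby line gives a larger total.

```python
# Hypothetical data: x = sticker price ($ thousands), y = student:faculty ratio.
xs = [10, 20, 30, 40, 50, 60]
ys = [17, 14, 15, 11, 10, 8]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sum_of_squares(slope, intercept):
    """Total area of the squares drawn on each residual."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

best_slope, best_intercept = fit_line(xs, ys)
best_total = sum_of_squares(best_slope, best_intercept)

# Nudge the slope and/or intercept a little; each nudged line should do worse.
nudged_totals = [
    sum_of_squares(best_slope + ds, best_intercept + di)
    for ds in (-0.05, 0, 0.05)
    for di in (-1, 0, 1)
    if (ds, di) != (0, 0)
]

print(best_total)
print(min(nudged_totals))  # larger than best_total
```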
Question 11 (1 point)
Which data point has the residual with the greatest vertical distance? Approximate the coordinates.
Question 12 (1 point)
The residuals for this model add up to 0.
How is that possible if the line does not perfectly fit all the points?_______
In the Intro, we squared the distances from the restaurant for two reasons: 1) because both options added up to the same sum and 2) because we wanted to account for one person living far away. How can that approach be applied to the line of best fit?_______
Question 13 (1 point)
The equation for this regression line is ŷ = -0.16x + 18. Which of the following statements about the slope is correct?
The slope is -0.16. This means the predicted student:faculty ratio decreases by 0.16 for every additional $1000 in tuition.
The slope is -0.16. This means the predicted student:faculty ratio is the tuition multiplied by 0.16.
The slope is 18. This means the predicted student:faculty ratio increases by 18 for every additional $1000 in tuition.
The slope is 18. This means the average student:faculty ratio is 18.
Question 14 (1 point)
The average annual sticker price of a 4-year college has increased by approximately $10,000 since 2000, adjusted for inflation.[2] Do you think that has caused a corresponding decrease in the average student:faculty ratio over the same time period? Why or why not?
Desmos
DESMOS: OLS Regression and College Costs
One way we can measure how well a model fits the data is by looking at the r^2 value. r^2 is a number between 0 and 1 that tells you how well the model can predict the data points. A high r^2 value means the points are close to the line. Let’s explore how to find the line of best fit and see the connection between r^2 and residuals. Follow your teacher’s instructions to complete this Desmos activity.
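One common way to compute this value for a least-squares line is 1 minus the ratio of the sum of squared residuals to the sum of squared distances from the mean of y. The sketch below uses that definition on made-up data; the data and the helper names `fit_line` and `r_squared` are assumptions for illustration, not part of the Desmos activity.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """1 - (sum of squared residuals) / (sum of squared distances from mean y)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical scattered data: small residuals, so r^2 is close to 1.
scattered = r_squared([10, 20, 30, 40, 50, 60], [17, 14, 15, 11, 10, 8])

# Points that sit exactly on a line leave no residuals at all.
perfect = r_squared([1, 2, 3, 4], [3, 5, 7, 9])

print(scattered, perfect)
```

Smaller squared residuals make the subtracted ratio smaller, which pushes the result toward 1.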
Question 15 (20 points)
Desmos Link: What did you learn from this activity?
APPLICATION: College Tuition, Graduation Rate, and ACT Scores
Level 1
Tuition and Graduation Rate
Ethan is researching the relationship between a college’s sticker price and its graduation rate. He finds data for 35 four-year colleges and uses “sticker price” to represent the published annual tuition and fees for an out-of-state student.[1]
Ethan uses linear regression to find the line of best fit. Here are his results:
ŷ = 0.87x + 35.98
r^2 = 0.57
Question 16 (1 point)
Describe the correlation between sticker price and graduation rate. Consider: is the correlation strong, weak, or nonexistent? Is the correlation positive or negative?
Question 17 (3 points)
Answer the subquestions to find the residual for the data point at x = 40.
What is the predicted y value (ŷ) at x = 40, based on Ethan’s line?_______
Approximate based on the data points: what is the actual y value at x = 40?_______
What is the residual for that data point?_______
Question 18 (1 point)
What would happen to the r^2 value if you removed the outlier at (6, 75)? Circle your answer and explain your reasoning.
The r^2 value would increase if you removed the outlier
The r^2 value would decrease if you removed the outlier
The r^2 value would stay the same if you removed the outlier
Question 19 (1 point)
Ethan concludes, “The slope of the line of best fit is 0.87. That means that if a college increases their sticker price by $1000, it will cause their graduation rate to increase by 0.87 percentage points.” Is Ethan correct? Why or why not?
Level 2
The Costs of College Over Time
Kiana is researching how the costs of college have changed over time. She finds data on the average annual cost of attending a 4-year college since 1970, adjusted for inflation. It includes the cost of tuition, required fees, housing, and food.
Kiana uses linear regression to find the line of best fit. Here are her results:
ŷ = 0.42x + 8.11
r^2 = 0.95
Question 20 (1 point)
Based on her linear regression model, what is the predicted annual cost of college in 2030 (60 years after 1970)?
Question 21 (3 points)
Answer the subquestions to find the residual for the data point at x = 40.
What is the predicted y value (ŷ) at x = 40, based on Kiana’s line?_______
Approximate based on the graph: what is the actual y value at x = 40?_______
What is the residual for that data point?_______
Question 22 (2 points)
The equation of Kiana’s line is: ŷ = 0.42x + 8.11
What is the slope?_______
What does the slope tell you in this context?_______
Kiana decides to also try an exponential regression model. Here are her results:
ŷ = 9.90⋅(1 + 0.023)^x
r^2 = 0.975
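To compare the two models side by side, it can help to write each one as a small function. The sketch below does this in Python, reading the exponential model as 9.90 times (1 + 0.023) raised to the power x, which is an assumption about the intended exponent. The function names and the input x = 10 are hypothetical and chosen only for illustration.

```python
def linear_model(x):
    """Kiana's linear model: y-hat = 0.42x + 8.11."""
    return 0.42 * x + 8.11

def exponential_model(x):
    """Kiana's exponential model, assuming x is the exponent."""
    return 9.90 * (1 + 0.023) ** x

# Hypothetical input: 10 years after 1970.
x = 10
print(linear_model(x))
print(exponential_model(x))
```

Calling both functions with the same x makes it easy to see where the two predictions agree and where they start to drift apart.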
Question 23 (1 point)
Based on this exponential model, what is the predicted cost of college in 2030 (after 60 years)? How does that compare to the cost predicted by the linear model in Question 20?
Question 24 (1 point)
Based on the exponential model, how much are college costs increasing each year?
Question 25 (1 point)
Which model do you think best describes how college costs have changed over time? Explain your reasoning.
Level 3
VIDEO: What is a Residual Plot?
We can learn a lot more about our model from residuals. A residual plot is a graph that shows just the residuals from a regression model. Analyzing it can help you notice patterns and determine whether you’ve chosen the best type of function to fit the data. Watch the video and answer the questions.
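As a rough illustration of what a residual plot can reveal, the sketch below fits a straight line to made-up data that actually follows a curve, then lists the residual at each x value. The data and the helper name `fit_line` are assumptions for illustration and do not come from the video.

```python
# Hypothetical curved data: y = x squared.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# A residual plot graphs these pairs: x on the horizontal axis,
# residual on the vertical axis.
for x, r in zip(xs, residuals):
    print(x, r)
```

Here the residuals are positive at both ends and negative in the middle. A clear pattern like this suggests that a straight line is not the best type of function for the data, while residuals scattered randomly around zero suggest a reasonable choice.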
Question 26 (1 point)
For a point above the line of best fit…
The residual will be positive
The residual will be negative
The residual will be equal to zero
The residual will be equal to the predicted y value
Question 27 (1 point)
If a data point has a residual of -4, that means…
a. The data point is above the line of best fit
b. The data point is below the line of best fit
c. The data point is on the line of best fit
d. The data point is an outlier
Question 28 (1 point)
Here are the residual plots for four different regression models. Based on the residuals, which model do you think is the best fit? Explain your reasoning.
Question 29 (12 points)
DESMOS: College Sticker Price and ACT scores
How does college sticker price relate to students’ ACT scores? In this activity, you will explore real-world data and determine which model fits best. Follow your teacher’s instructions to complete this Desmos activity.