6/2 FA 9.6 Interpreting Regression Models

Last updated over 2 years ago
29 questions

OBJECTIVES & STANDARDS

Math Objectives
  • Calculate the residual for a point using a linear regression model
  • Describe how OLS determines the line of best fit
  • Analyze the connection between residuals and r² values
  • Interpret a regression model in context, including describing trends and identifying predicted and observed values
Common Core Math Standards
  • Link to all CCSS Math
  • CCSS.PRACTICE.MP3
  • CCSS.PRACTICE.MP4
  • CCSS.HSS.ID.C.8
  • CCSS.HSS.ID.C.7
  • CCSS.HSS.ID.B.6.C
Personal Finance Objectives
  • Analyze the relationship between college sticker price and student:faculty ratio.
  • Use regression models to summarize the relationship between college sticker price and another variable, including graduation rate, time, and ACT scores.
National Standards for Personal Financial Education
Earning Income
  • 3a: Evaluate the costs and benefits of investing in additional education or training

DISTRIBUTION & PLANNING

Distribute to students
  • Student Activity Packet
  • Application Problems

Intro

CONSIDER: Choosing Where to Meet Up

Three friends (Harper, Luke, and Jaxon) are trying to decide on the best spot to meet up. Compare the two options based on how far each friend would need to travel to get there.
Cafe Mezze
  • Harper lives next door and is 0 miles away
  • Luke lives 15 miles away
  • Jaxon lives 3 miles away
Rabat Restaurant
  • Harper lives 6 miles away
  • Luke lives 5 miles away
  • Jaxon lives 7 miles away
Harper says, “Let’s add up the total distance we would collectively need to travel to get to each restaurant.”
  1. What is the sum of the distances the friends will travel to Cafe Mezze?_______
  2. What is the sum of the distances the friends will travel to Rabat Restaurant?_______
  3. Why isn’t Harper’s strategy a helpful representation of the scenario?_______
Luke says, “Of course Harper would suggest that; they live right next to Cafe Mezze! Let’s square the distance we each travel, then find the sum.”
  1. Square the distance each friend would travel to Cafe Mezze. What is the sum of those squares?_______
  2. Square the distance each friend would travel to Rabat Restaurant. What is the sum of those squares?_______
  3. Using Luke’s strategy, why is one of the sums much larger than the other?_______
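To check the arithmetic behind both strategies, here is a small Python sketch. The distances come straight from the lists above; the dictionary layout and the function names `total_distance` and `total_squared_distance` are just illustrative choices.

```python
# Distances (in miles) from the Intro: how far each friend would travel.
distances = {
    "Cafe Mezze": {"Harper": 0, "Luke": 15, "Jaxon": 3},
    "Rabat Restaurant": {"Harper": 6, "Luke": 5, "Jaxon": 7},
}


def total_distance(trips):
    """Harper's strategy: add up the raw distances."""
    return sum(trips.values())


def total_squared_distance(trips):
    """Luke's strategy: square each distance, then add the squares."""
    return sum(d ** 2 for d in trips.values())


for place, trips in distances.items():
    print(f"{place}: sum = {total_distance(trips)}, "
          f"sum of squares = {total_squared_distance(trips)}")
```

Both restaurants tie at 18 miles under Harper's strategy. Squaring gives 234 for Cafe Mezze and 110 for Rabat Restaurant, because Luke's 15-mile trip alone contributes 225.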

Which option do you think they should choose? Why?

Learn It 1

Residuals

In the Intro, you compared how far the three friends would have to travel to each restaurant. You can apply the same idea to a scatterplot with a line of best fit. The vertical distance between a data point and the line, calculated as the observed y-value minus the predicted y-value (y - ŷ), is called a residual. Just as the friends compared distances to find the restaurant that worked best for all of them, we can use residuals to find the line that best fits the data points.
Let’s look at the example scatterplot below, which shows data on 6 colleges that Kelly is interested in. Kelly looked up the acceptance rate and average student grant aid for each college. Use the graph to answer the following questions.

What is the average student grant aid for the school that admits 10% of students? Estimate based on the data point.

What does Kelly’s line of best fit predict the average student grant aid will be for a school that admits 10% of students? Estimate based on the graph or by using the equation for the line of best fit: ŷ = -0.32x + 45.50.

What is the difference between those two values?

The value you just found is the residual: the observed value minus the predicted value. You may also hear it called an error when looking at data representing an entire population. The residuals for a least squares line of best fit always add up to 0.
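The residual calculation can be written out in a few lines of Python. This sketch uses Kelly's equation ŷ = -0.32x + 45.50 from the question above, with y in whatever units the graph uses. The observed value is a hypothetical placeholder, since the real one has to be read off the scatterplot.

```python
def predict(x, slope=-0.32, intercept=45.50):
    """Kelly's line of best fit: y-hat = -0.32x + 45.50."""
    return slope * x + intercept


def residual(observed_y, predicted_y):
    """Residual = observed value minus predicted value (y - y-hat)."""
    return observed_y - predicted_y


y_hat = predict(10)  # school that admits 10% of students
print(f"predicted: {y_hat:.2f}")  # 42.30

# HYPOTHETICAL observed value for illustration; read the real one off the scatterplot.
observed = 40.0
print(f"residual: {residual(observed, y_hat):.2f}")  # -2.30
```

A negative residual means the point lies below the line; a positive residual means it lies above.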

Practice It

GRAPH: Sticker Price and Student:Faculty Ratio

When choosing a college, one factor you might consider is the student:faculty ratio. This ratio tells you how many students there are for each faculty member. A lower student:faculty ratio might mean smaller class sizes, more opportunities to talk to professors, and more individualized support.
The scatterplot below shows data on sticker price and student:faculty ratio for a representative sample of 27 four-year US colleges.

Sketch the line that you think best fits the data below.

Based on your line of best fit, what is the residual at x = 60?

Describe the correlation between the sticker price and student:faculty ratio. Consider: is it strong or weak? Positive or negative?

Compare your line with your classmates’ lines. How do you think we can figure out which line is the BEST possible fit?

Learn It 2

Ordinary Least Squares (OLS) Regression

Residuals are used to find the line of best fit through a process called Ordinary Least Squares (OLS) regression.
OLS regression figures out which line is the BEST fit by finding the line that minimizes the sum of the squared residuals.
Let’s see how OLS regression works for our data on sticker price and student:faculty ratio. The graph on the left shows the residuals for all of the data points. The graph on the right squares each of those residuals. The best fit line makes the total area of those squares as small as possible.
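The sketch below shows one standard way to compute the OLS line, using the closed-form formulas slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope · x̄. The six (x, y) points are hypothetical, not the 27 colleges in the graph, and the function names are illustrative. It also checks two claims from this lesson: nudging the line increases the sum of squared residuals, and the OLS residuals add up to 0.

```python
# HYPOTHETICAL data for illustration only (x = sticker price in $1000s,
# y = student:faculty ratio); these are NOT the 27 colleges in the graph.
xs = [20, 30, 40, 50, 60, 70]
ys = [17, 14, 13, 11, 9, 8]


def ols_fit(xs, ys):
    """Return (slope, intercept) of the line that minimizes the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for every point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, slope, intercept):
    """Total area of the residual squares for a given line."""
    return sum(r ** 2 for r in residuals(xs, ys, slope, intercept))


slope, intercept = ols_fit(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
print(f"OLS line: y-hat = {slope:.3f}x + {intercept:.3f}, SSR = {best:.3f}")

# Nudging the slope or intercept in either direction makes the total larger.
for d_slope, d_intercept in [(0.01, 0), (-0.01, 0), (0, 0.5), (0, -0.5)]:
    nudged = sum_squared_residuals(xs, ys, slope + d_slope, intercept + d_intercept)
    print(f"nudged by ({d_slope}, {d_intercept}): SSR = {nudged:.3f}")

# The OLS residuals add up to 0 (apart from tiny floating-point rounding).
print(f"sum of residuals: {sum(residuals(xs, ys, slope, intercept)):.10f}")
```

With these made-up points the fitted slope is -310/1750, or about -0.177.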

Which data point has the residual with the greatest vertical distance? Approximate the coordinates.

The residuals for this model add up to 0.
  1. How is that possible if the line does not perfectly fit all the points?_______
  2. In the Intro, we squared the distances from the restaurant for two reasons: 1) because both options added up to the same sum and 2) because we wanted to account for one person living far away. How can that approach be applied to the line of best fit?_______

The equation for this regression line is ŷ = -0.16x + 18. Which of the following statements about the slope is correct?
  1. The slope is -0.16. This means the predicted student:faculty ratio decreases by 0.16 for every additional $1000 in tuition.
  2. The slope is -0.16. This means the predicted student:faculty ratio is the tuition multiplied by 0.16.
  3. The slope is 18. This means the predicted student:faculty ratio increases by 18 for every additional $1000 in tuition.
  4. The slope is 18. This means the average student:faculty ratio is 18.
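One way to check a slope interpretation is to compare predictions one unit apart. This sketch assumes, as the answer choices do, that x is the sticker price in thousands of dollars; the function name `predicted_ratio` is illustrative.

```python
def predicted_ratio(price_thousands):
    """Predicted student:faculty ratio from the regression line y-hat = -0.16x + 18.

    Assumes x is the sticker price in thousands of dollars.
    """
    return -0.16 * price_thousands + 18


# Increasing x by 1 ($1000 more in sticker price) changes the prediction by the slope.
for x in (30, 60):
    change = predicted_ratio(x + 1) - predicted_ratio(x)
    print(f"x = {x}: predicted ratio {predicted_ratio(x):.2f}, "
          f"change per +$1000: {change:.2f}")
```

Both steps show a change of -0.16, matching the slope; the predicted ratio at x = 60 is 8.4.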

The average annual sticker price of a 4-year college has increased by approximately $10,000 since 2000, adjusted for inflation.[2] Do you think that has caused a corresponding decrease in the average student:faculty ratio over the same time period? Why or why not?

Desmos

DESMOS: OLS Regression and College Costs

One way we can measure how well a model fits the data is by looking at the r² value. For a linear regression line, r² is a number between 0 and 1 that measures the proportion of the variation in the y-values that the model explains. A high r² value means the points are close to the line, so the residuals are small compared with the overall spread of the data. Let’s explore how to find the line of best fit and see the connection between r² and residuals. Follow your teacher’s instructions to complete this Desmos activity.
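A common way to compute r² for a regression line is r² = 1 - (sum of squared residuals) / (total sum of squares), where the total sum of squares measures how far the y-values are from their mean. This sketch uses hypothetical data and illustrative function names to show the connection to residuals: a line that sits farther from the points has a larger sum of squared residuals and a lower r².

```python
# HYPOTHETICAL data for illustration only.
xs = [20, 30, 40, 50, 60, 70]
ys = [17, 14, 13, 11, 9, 8]


def ols_fit(xs, ys):
    """Least squares slope and intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """1 - SS_res / SS_tot for the line y-hat = slope * x + intercept."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


slope, intercept = ols_fit(xs, ys)
print(f"best-fit line: r^2 = {r_squared(xs, ys, slope, intercept):.3f}")

# Shifting the line up by 2 moves it away from the points, so SS_res grows and r^2 falls.
print(f"shifted line:  r^2 = {r_squared(xs, ys, slope, intercept + 2):.3f}")
```

For these made-up points, r² is about 0.981 for the best-fit line and about 0.552 for the shifted line.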

Desmos Link: What did you learn from this activity?

APPLICATION: College Tuition, Graduation Rate, and ACT Scores

Level 1


Tuition and Graduation Rate

Ethan is researching the relationship between a college’s sticker price and its graduation rate. He finds data for 35 four-year colleges and uses “sticker price” to represent the published annual tuition and fees for an out-of-state student.[1]
Ethan uses linear regression to find the line of best fit. Here are his results:
ŷ = 0.87x + 35.98
r² = 0.57
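Ethan's results can be used directly. This sketch computes his predicted graduation rate at x = 40 and converts r² into the correlation coefficient r (for a line with one x-variable, r is the square root of r², taking the sign of the slope). It assumes x is the sticker price in thousands of dollars and y is the graduation rate in percent; the observed value at x = 40 still has to come from the graph.

```python
import math

slope, intercept = 0.87, 35.98  # Ethan's line: y-hat = 0.87x + 35.98
r_squared = 0.57

# Predicted graduation rate (percent) at x = 40 (assumed to mean $40,000).
y_hat_40 = slope * 40 + intercept
print(f"predicted y at x = 40: {y_hat_40:.2f}")

# For simple linear regression, r = +/- sqrt(r^2), with the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), slope)
print(f"r = {r:.2f}")
```

The residual at x = 40 is then the observed graduation rate minus 70.78.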

Describe the correlation between sticker price and graduation rate. Consider: is the correlation strong, weak, or nonexistent? Is the correlation positive or negative?

Answer the subquestions to find the residual for the data point at x = 40.
  1. What is the predicted y value (ŷ) at x = 40, based on Ethan’s line?_______
  2. Approximate based on the data points: what is the actual y value at x = 40?_______
  3. What is the residual for that data point?_______

What would happen to the r² value if you removed the outlier at (6, 75)? Circle your answer and explain your reasoning.
  1. The r² value would increase if you removed the outlier
  2. The r² value would decrease if you removed the outlier
  3. The r² value would stay the same if you removed the outlier
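To experiment with how a single outlier affects r², you can compute r² with and without it. In this sketch only the point (6, 75) comes from the question; the other five points are hypothetical values chosen to follow a rising trend, so the printed numbers do not describe Ethan's data.

```python
def r_squared(xs, ys):
    """r^2 of the least squares line, computed as Sxy^2 / (Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# HYPOTHETICAL points that follow a rising trend.
xs = [10, 20, 30, 40, 50]
ys = [45, 53, 62, 70, 79]

# Add the outlier from the question: a low sticker price with a high graduation rate.
xs_with = xs + [6]
ys_with = ys + [75]

print(f"without outlier: r^2 = {r_squared(xs, ys):.4f}")
print(f"with outlier:    r^2 = {r_squared(xs_with, ys_with):.4f}")
```

A point far from the trend of the others adds a large squared residual, which shows up as a change in r².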

Ethan concludes, “The slope of the line of best fit is 0.87. That means that if a college increases their sticker price by $1000, it will cause their graduation rate to increase by 0.87 percentage points.” Is Ethan correct? Why or why not?

Level 2

The Costs of College Over Time

Kiana is researching how the costs of college have changed over time. She finds data on the average annual cost of attending a 4-year college since 1970, adjusted for inflation. It includes the cost of tuition, required fees, housing, and food.
Kiana uses linear regression to find the line of best fit. Here are her results:
ŷ = 0.42x + 8.11
r² = 0.95
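Kiana's linear model can be evaluated the same way as any regression line. In this sketch x is years since 1970, as in her data; the units of y are assumed to be thousands of dollars, so check the graph's axis label before reading the outputs as dollar amounts. The function name `linear_cost` is illustrative.

```python
def linear_cost(years_since_1970):
    """Kiana's linear model: y-hat = 0.42x + 8.11."""
    return 0.42 * years_since_1970 + 8.11


for year in (2010, 2030):
    x = year - 1970
    print(f"{year} (x = {x}): predicted cost {linear_cost(x):.2f}")

# The slope is the predicted change in cost for each additional year.
print(f"change per year: {linear_cost(41) - linear_cost(40):.2f}")
```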

Based on her linear regression model, what is the predicted annual cost of college in 2030 (60 years after 1970)?

Answer the subquestions to find the residual for the data point at x = 40.
  1. What is the predicted y value (ŷ) at x = 40, based on Kiana’s line?_______
  2. Approximate based on the graph: what is the actual y value at x = 40?_______
  3. What is the residual for that data point?_______
The equation of Kiana’s line is: ŷ = 0.42x + 8.11
  1. What is the slope?_______
  2. What does the slope tell you in this context?_______
Kiana decides to also try an exponential regression model. Here are her results:
ŷ = 9.90⋅(1 + 0.023)^x
r² = 0.975
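The two models can be compared side by side. This sketch evaluates both at x = 60 (the year 2030) and shows that the exponential model multiplies the predicted cost by the same factor, 1.023, each year, which is a 2.3% increase. The units of y are assumed to be thousands of dollars, and the function names are illustrative.

```python
def linear_cost(x):
    """Kiana's linear model: y-hat = 0.42x + 8.11 (x = years since 1970)."""
    return 0.42 * x + 8.11


def exponential_cost(x):
    """Kiana's exponential model: y-hat = 9.90 * (1 + 0.023)^x."""
    return 9.90 * (1 + 0.023) ** x


x = 60  # the year 2030
print(f"linear:      {linear_cost(x):.2f}")
print(f"exponential: {exponential_cost(x):.2f}")

# Each extra year multiplies the exponential prediction by the same growth factor.
growth_factor = exponential_cost(x + 1) / exponential_cost(x)
print(f"growth factor per year: {growth_factor:.3f}")
```

Under these models, the exponential prediction for 2030 (about 38.74) is roughly 5.43 higher than the linear one (33.31).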

Based on this exponential model, what is the predicted cost of college in 2030 (60 years after 1970)? How does that compare to the cost predicted by the linear model in Question 1?

Based on the exponential model, how much are college costs increasing each year?

Which model do you think best describes how college costs have changed over time? Explain your reasoning.

Level 3

VIDEO: What is a Residual Plot?

We can learn a lot more about our model from residuals. A residual plot is a graph of a regression model’s residuals, with each data point’s x-value on the horizontal axis and its residual on the vertical axis. Analyzing it can help you notice patterns and determine whether you’ve chosen the best type of function to fit the data. Watch the video and answer the questions.
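A residual plot can reveal when a line is the wrong shape for the data. In this sketch the data are hypothetical values that grow by 10% per step (an exponential pattern). A straight OLS line is fit anyway, and each residual is listed in a simple text version of a residual plot; the function name `ols_fit` is illustrative.

```python
# HYPOTHETICAL data that grows 10% per step (an exponential pattern).
xs = list(range(11))
ys = [10 * 1.1 ** x for x in xs]


def ols_fit(xs, ys):
    """Least squares slope and intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = ols_fit(xs, ys)
resid = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Text residual plot: positive residuals are above the line, negative ones below it.
for x, r in zip(xs, resid):
    side = "above" if r > 0 else "below"
    print(f"x = {x:2d}: residual {r:+.2f} ({side} the line)")
```

The residuals are positive at both ends and negative in the middle. A curved pattern like this suggests a linear model is not the best choice; residuals from a well-chosen model should look randomly scattered around 0.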

For a point above the line of best fit…
  1. The residual will be positive
  2. The residual will be negative
  3. The residual will be equal to zero
  4. The residual will be equal to the predicted y value

If a data point has a residual of -4, that means…
  1. The data point is above the line of best fit
  2. The data point is below the line of best fit
  3. The data point is on the line of best fit
  4. The data point is an outlier

Here are the residual plots for four different regression models. Based on the residuals, which model do you think is the best fit? Explain your reasoning.

DESMOS: College Sticker Price and ACT scores

How does college sticker price relate to students’ ACT scores? In this activity, you will explore real-world data and determine which model fits best. Follow your teacher’s instructions to complete this Desmos activity.