Math Objectives
Calculate measures of central tendency
Identify outliers and explain their effects on central tendency calculations
Create a 5-number summary of a data set
Create a box plot to represent a data set
Analyze data using a box plot, range, and interquartile range
Common Core Math Standards
Personal Finance Objectives
Compare student loan debt change
Analyze outstanding mortgage balances
Graph and summarize auto loan data
National Standards for Personal Financial Education
There are no relevant personal finance standards for this lesson
Consider the following three sets of numbers.
Set 1: {1, 11, 21, 31, 41, 51, 61}
Set 2: {28, 29, 30, 31, 32, 33, 34}
Set 3: {0, 31, 31, 31, 31, 31, 62}
Hint:
The mean is the number you get by dividing the sum of a set of values by the number of values in the set.
In contrast, the median is the middle number in a set of values when those values are arranged from smallest to largest.
The mode of a set of values is the most frequently repeated value in the set.
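The three definitions in the hint can be checked with a short Python sketch. It uses only Python's standard `statistics` module; the helper name `central_tendency` is made up for this illustration. Because Set 1 and Set 2 have no repeated values, `multimode` returns every value in those sets.

```python
from statistics import mean, median, multimode

sets = {
    "Set 1": [1, 11, 21, 31, 41, 51, 61],
    "Set 2": [28, 29, 30, 31, 32, 33, 34],
    "Set 3": [0, 31, 31, 31, 31, 31, 62],
}

def central_tendency(values):
    """Return (mean, median, modes) for a list of numbers (hypothetical helper)."""
    # multimode returns every value tied for most frequent, so a set with
    # no repeated values returns all of its values.
    return mean(values), median(values), multimode(values)

for name, values in sets.items():
    m, med, modes = central_tendency(values)
    print(f"{name}: mean={m}, median={med}, modes={modes}")
```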
Mean and median are measures of the center of a set of data, but they don’t tell us the whole story about the data. The three sets in the intro are all very different, but they have the same mean and median.
To get a clearer picture of a set of data, we can make a box plot. A box plot is a visual representation of 5 numbers that represent the spread and skew of the data. These numbers are:
Minimum - the smallest number in the set
Maximum - the largest number in the set
Median - the middle number of the set
1st Quartile (Q1) - the number below which 25% of the data points are found. You can find this by locating the middle number of the ordered data BELOW the median
3rd Quartile (Q3) - the number below which 75% of the data points are found. You can find this by locating the middle number of the ordered data ABOVE the median
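The five numbers above can be computed with a small Python sketch. It follows the method described here (Q1 and Q3 are the middle numbers of the ordered data below and above the median) and assumes the set has at least two values. The function name `five_number_summary` is made up for this illustration. Spreadsheets and calculators sometimes use slightly different quartile rules, so their answers may not match exactly.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the middle-of-each-half method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]  # ordered values below the median
    # With an odd count, the median itself is left out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

set_1 = [1, 11, 21, 31, 41, 51, 61]  # Set 1 from the intro
print(five_number_summary(set_1))
```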
As a part of a final project for school, Nick asked 11 of his classmates how much student loan debt they will have after graduating college and received the following responses:
A box plot is a visual representation of the 5-number summary. Here are the steps to create a box plot of Nick’s student loan debt data:
Create a number line that covers the entire range of values
Place a line above each value of the 5 number summary
Create a box from Q1 to Q3 that includes the median in between
Draw lines from the middle of the box to the minimum and maximum values
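The four steps above can be sketched as a one-line text drawing in Python. This is only an illustration: the function name `text_box_plot` and the drawing characters are made up, and the sketch assumes the maximum is larger than the minimum. The example uses the 5-number summary of Set 1 from the intro (1, 11, 31, 51, 61) rather than Nick's data.

```python
def text_box_plot(minimum, q1, med, q3, maximum, width=31):
    """Draw a one-line box plot from a 5-number summary (illustrative sketch)."""
    def column(x):
        # Step 1: scale a data value to a character position on the number line.
        return round((x - minimum) * (width - 1) / (maximum - minimum))

    row = ["-"] * width  # Step 4: lines reaching out to the minimum and maximum
    for i in range(column(q1), column(q3) + 1):
        row[i] = "="     # Step 3: the box from Q1 to Q3
    # Step 2: mark each value of the 5-number summary.
    row[column(minimum)] = "|"
    row[column(maximum)] = "|"
    row[column(q1)] = "["
    row[column(q3)] = "]"
    row[column(med)] = "M"
    return "".join(row)

print(text_box_plot(1, 11, 31, 51, 61))
```

In the printed line, `|` marks the minimum and maximum, `[` and `]` mark Q1 and Q3, and `M` marks the median.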
Timor is looking into retirement investments and surveys 15 people about the age at which they plan to retire. He got the following responses:
35 45 50 51 52 52 64 64 65 65 66 66 66 67 70
Create an appropriate number line for this data, then draw the box plot.
When looking at a set of data, we can ask some important questions:
How spread out is the data? In the intro, the first set changed by 10s and spread from 1 to 61. The second set only changed by 1s and was grouped around the center. This is called dispersion.
Is there any data that looks like it doesn’t belong? If you are a typical A student and most of your grades are in the 90s but you get one bad grade because you were sick that week, that one bad grade would stand out. We call that type of data an outlier.
How is the data grouped? If we looked at data on age of retirement, most of the data would be near the high end of ages, but if we looked at the age at which someone got their driver's license, most of the data would be near the low end. This is called skewness. (You’ll learn more about this one in a couple of lessons.)
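Two numbers that describe dispersion are the range (maximum minus minimum) and the interquartile range (Q3 minus Q1). The Python sketch below computes both for the three sets from the intro. The helper names `quartiles` and `spread` are made up for this illustration, and the quartiles use the middle-of-each-half method.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the middle numbers of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(upper)

def spread(values):
    """Return (range, interquartile range) for a list of numbers."""
    q1, q3 = quartiles(values)
    return max(values) - min(values), q3 - q1

intro_sets = {
    "Set 1": [1, 11, 21, 31, 41, 51, 61],
    "Set 2": [28, 29, 30, 31, 32, 33, 34],
    "Set 3": [0, 31, 31, 31, 31, 31, 62],
}

for name, values in intro_sets.items():
    data_range, iqr = spread(values)
    print(f"{name}: range={data_range}, IQR={iqr}")
```

A larger range or interquartile range means the data is more spread out, even when the mean and median are the same.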
Now that we have a visual representation of Nick’s data, let’s look at the conclusions that Nick can draw about his data set to include in his presentation. Here’s the box plot again for reference.
A box plot breaks the data up into quartiles, which means 4 equal parts. They might not look equal on the graph but each region represents 25% of the data:
This allows us to analyze where the data is located. For example, Nick adds the following analysis to his presentation:
“75% of survey respondents said that they will have less than $42,000 of student loan debt when they graduate.”
Because $42,000 is Q3 of Nick’s data, three of the four 25% sections of the data fall below that value.
If Nick wanted to include information about the bottom 25% of respondents, what statement could he make?
Are there any outliers in Nick’s data?
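This lesson does not give a numeric test for outliers. One common convention, assumed here, is the 1.5 × IQR rule: a value is flagged as an outlier if it is more than 1.5 times the interquartile range below Q1 or above Q3. The Python sketch below applies that rule to a made-up list of grades like the A-student example (it is not Nick's data), then compares the mean and median with and without the flagged value. The function name `find_outliers` is hypothetical.

```python
from statistics import mean, median

def find_outliers(values):
    """Return values outside the 1.5 x IQR fences (an assumed convention)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical grades: mostly in the 90s, with one bad grade from a sick week.
grades = [92, 94, 95, 96, 97, 98, 55]
outliers = find_outliers(grades)
without_outliers = [g for g in grades if g not in outliers]

print("Outliers:", outliers)
print("With outliers:    mean =", round(mean(grades), 2), " median =", median(grades))
print("Without outliers: mean =", round(mean(without_outliers), 2), " median =", median(without_outliers))
```

Comparing the two printed lines shows how much more the mean moves than the median when an outlier is included.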
How can we use box plots to compare similar data sets? In this activity, you will work with your groupmates to create a box plot for your designated region then compare your results with another region to analyze the results.
Part 1: Create Your Box Plot
The ordered tables below represent average credit card debt of states in different regions of the US. Follow your teacher's directions to create a box plot for your assigned region.
The box plots below use the data on state average student loan debt for all 50 US states from 2016 and 2021. Use it to answer the questions that follow.
What is the approximate interquartile range of each set of data? Draw a conclusion about what this means about student loan borrowers in 2016 and 2021.
Use the number line to create a box plot, then create a poster with your box plot to share with the class.
Which region had the highest median credit card debt?
Which region had the smallest interquartile range? What does this mean about the region?
Compare the Midwest and Southern regions and summarize their similarities and differences.
What general statement could you make about the US based on your answer to question 6?
If the US gained a 51st state that had $0 in credit card debt, how would that data point change the box plot?