
1.2 Data, Sampling, and Variation in Data and Sampling (4/21/2025)

Last updated about 1 year ago
Qualitative Data Discussion
Omitting Categories/Missing Data

The table displays Ethnicity of Students but is missing the "Other/Unknown" category. This category contains people who did not feel they fit into any of the ethnicity categories or declined to respond. Notice that the frequencies do not add up to the total number of students. In this situation, create a bar graph and not a pie chart.

Pie Chart: No Missing Data

Sampling

Nonrandom sampling example

Convenience sampling involves using results that are readily available (for example, interviewing only people who are easily accessible).

True random sampling is done with replacement: a member who is picked goes back into the population and may be selected more than once. For practical reasons, however, most studies use random sampling without replacement.
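The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the roster names and the seed are made up for the example and do not come from the notes.

```python
import random

# Hypothetical roster; the names are placeholders, not data from the notes.
roster = ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay"]

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# With replacement: every draw is made from the full roster,
# so the same person can appear more than once.
with_replacement = rng.choices(roster, k=4)

# Without replacement: a person who is picked cannot be picked again.
without_replacement = rng.sample(roster, k=4)

print("With replacement:   ", with_replacement)
print("Without replacement:", without_replacement)
```

Only the first list can contain repeats; the second always holds four different people.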

Homework

Classwork 53-79 odd

https://openstax.org/books/introductory-statistics-2e/pages/1-homework

Question 1

Discrete vs Continuous

Quantitative discrete data: data that can take only certain numerical values, usually the result of counting.

Examples: number of phone calls a day, money a person has, steps you take a day.

Quantitative continuous data: data that are not limited to counting numbers and may include fractions, decimals, or irrational numbers; usually the result of measuring.

Examples: lengths, weights, times.

Question 2
Question 3
Question 4

Below are tables comparing the number of part-time and full-time students at De Anza College and Foothill College enrolled for the most recent spring quarter. The tables display counts (frequencies) and percentages or proportions (relative frequencies). The percent columns make comparing the same categories in the colleges easier. Displaying percentages along with the numbers is often helpful, but it is particularly important when comparing sets of data that do not have the same totals, such as the total enrollments for both colleges in this example. Notice how much larger the percentage for part-time students at Foothill College is compared to De Anza College.
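As a rough sketch of how relative frequencies are obtained from counts, the Python below divides each count by its own college's total. The enrollment numbers and the names `college_a` and `college_b` are invented for illustration; they are not the De Anza or Foothill figures.

```python
def relative_frequencies(counts):
    """Convert a {category: count} table into {category: proportion of total}."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Hypothetical enrollment counts (NOT the real college data).
college_a = {"Full-time": 9000, "Part-time": 13000}
college_b = {"Full-time": 4000, "Part-time": 11000}

for name, counts in [("College A", college_a), ("College B", college_b)]:
    for category, share in relative_frequencies(counts).items():
        print(f"{name} {category}: {counts[category]} students ({share:.1%})")
```

Because each proportion is relative to its own total, the two colleges can be compared even though their total enrollments differ.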

Tables are a good way of organizing and displaying data. But graphs can be even more helpful in understanding the data. There are no strict rules concerning which graphs to use. Two graphs that are used to display qualitative data are pie charts and bar graphs.

In a pie chart, categories of data are represented by wedges in a circle and are proportional in size to the percent of individuals in each category.

In a bar graph, the length of the bar for each category is proportional to the number or percent of individuals in each category. Bars may be vertical or horizontal.

A Pareto chart consists of bars that are sorted into order by category size (largest to smallest).
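The ordering that defines a Pareto chart can be sketched as a simple sort. The category names and counts below are placeholders chosen for the example, and the text bars only stand in for a real chart.

```python
# Hypothetical category counts (placeholders, not data from the notes).
counts = {"Category A": 12, "Category B": 30, "Category C": 7, "Category D": 21}

# A Pareto chart orders the bars from the largest category to the smallest.
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for category, count in pareto_order:
    print(f"{category:<12} {'#' * count} ({count})")
```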

Percentages That Add to More (or Less) Than 100%

Sometimes percentages add up to be more than 100% (or less than 100%). In the graph, the percentages add to more than 100% because students can be in more than one category. A bar graph is appropriate to compare the relative size of the categories. A pie chart cannot be used. It also could not be used if the percentages added to less than 100%.
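One way to express this rule is a small check on whether the percentages account for the whole. The function name, the tolerance, and all three example tables are assumptions made for this sketch.

```python
import math

def pie_chart_is_appropriate(percentages, tolerance=0.5):
    """Return True only when the percentages sum to about 100%."""
    return math.isclose(sum(percentages.values()), 100.0, abs_tol=tolerance)

# Hypothetical tables (illustrative values only).
overlapping = {"Takes math": 60, "Takes English": 55, "Takes art": 20}  # sums to 135
complete = {"Full-time": 40, "Part-time": 60}                            # sums to 100
missing_category = {"Group 1": 30, "Group 2": 50}                        # sums to 80

for label, table in [("overlapping", overlapping),
                     ("complete", complete),
                     ("missing category", missing_category)]:
    print(label, "->", "pie chart OK" if pie_chart_is_appropriate(table) else "use a bar graph")
```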

A sample should have the same characteristics as the population it represents.

To achieve this, it must be a random sample.

Simple Random Sample- Each group of size n is as likely to be chosen as any other group of size n. (e.g., pulling names out of a hat, or generating random numbers on a computer to select people)

Stratified Sample- Divide the population into groups (strata) and take a proportionate simple random sample from each. (e.g., pick three random students from each classroom)

Cluster Sample- Divide the population into clusters, and randomly select some of the clusters. (e.g., pick 6 random classrooms and get information from all students in them)

Systematic Sample- Randomly select a starting point and take every nth piece of data from a listing of the population. (e.g., list students by ID and select every tenth student for the study)
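The four random sampling methods above can be sketched side by side in Python. The population here (6 classrooms of 10 students), the sample sizes, and the seed are all assumptions made for the illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 6 classrooms of 10 students, labeled (classroom, seat).
population = [(room, seat) for room in range(1, 7) for seat in range(1, 11)]

# Simple random sample: every group of 12 students is equally likely.
simple = rng.sample(population, k=12)

# Stratified sample: a simple random sample of 2 students from each classroom.
stratified = []
for room in range(1, 7):
    stratum = [student for student in population if student[0] == room]
    stratified.extend(rng.sample(stratum, k=2))

# Cluster sample: randomly choose 2 whole classrooms and keep every student in them.
chosen_rooms = rng.sample(range(1, 7), k=2)
cluster = [student for student in population if student[0] in chosen_rooms]

# Systematic sample: random starting point, then every 5th student on the list.
step = 5
start = rng.randrange(step)
systematic = population[start::step]

for name, sample in [("simple", simple), ("stratified", stratified),
                     ("cluster", cluster), ("systematic", systematic)]:
    print(f"{name:<11} n={len(sample)}  first few: {sample[:3]}")
```

Note the structural difference: the stratified sample takes a few students from every classroom, while the cluster sample takes every student from a few classrooms.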

Sampling Errors- Errors caused by the actual process of sampling, such as a sample that is not large enough.

Nonsampling Errors- Errors caused by factors unrelated to the sampling process, such as a faulty machine or bad record keeping.

Sampling Bias- Occurs when not all members of the population are equally likely to be chosen. (e.g., you only ask your friends and family about a study to do with Salinas)

  • Problems with samples: A sample must be representative of the population. A sample that is not representative of the population is biased. Biased samples that are not representative of the population give results that are inaccurate and not valid.

  • Self-selected samples: Responses only by people who choose to respond, such as call-in surveys, are often unreliable.

  • Sample size issues: Samples that are too small may be unreliable. Larger samples are better, if possible. In some situations, small samples are unavoidable and can still be used to draw conclusions. Examples: crash testing cars or medical testing for rare conditions.

  • Undue influence: Collecting data or asking questions in a way that influences the response.

  • Non-response or refusal of subject to participate:  The collected responses may no longer be representative of the population.  Often, people with strong positive or negative opinions may answer surveys, which can affect the results.

  • Causality: A relationship between two variables does not mean that one causes the other to occur. They may be related (correlated) because of their relationship through a different variable.

  • Self-funded or self-interest studies: A study performed by a person or organization in order to support their claim. Is the study impartial? Read the study carefully to evaluate the work. Do not automatically assume that the study is good, but do not automatically assume the study is bad either. Evaluate it on its merits and the work done.

  • Misleading use of data: Improperly displayed graphs, incomplete data, or lack of context.

  • Confounding:  When the effects of multiple factors on a response cannot be separated.  Confounding makes it difficult or impossible to draw valid conclusions about the effect of each factor.

Question 5
Question 6

In the Confidence Intervals chapter of the text, formulas are provided for determining the sample size needed when sampling from a population. The sample size is a function of the desired precision, not of the population size, which may seem counterintuitive. It implies that a sample of 1,000 can be adequate to represent a population of 100,000 or a population of 1,000,000, provided the same level of precision is desired. When working with these formulas, the student will notice that population size is not a factor in determining the sample size.
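As an illustration, the sketch below uses one standard sample size formula for estimating a proportion, n = z^2 * p * (1 - p) / E^2, where E is the margin of error. It is an assumption that this matches the formula in the Confidence Intervals chapter; the 95% z-value of 1.96 and the conservative guess p = 0.5 are also choices made for the example.

```python
import math

def sample_size_for_proportion(margin_of_error, z=1.96, p=0.5):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin_of_error.

    Note that the population size is not an input at all.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

for margin in (0.05, 0.03):
    n = sample_size_for_proportion(margin)
    print(f"Margin of error {margin:.0%}: need about {n} respondents")
```

The function returns the same answer whether the population has 100,000 or 1,000,000 members, because only the desired precision enters the calculation.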

Suppose ABC College has 10,000 part-time students (the population). We are interested in the average amount of money a part-time student spends on books in the fall term. Asking all 10,000 students is an almost impossible task.

Suppose we take two different samples.

First, we use convenience sampling and survey ten students from a first term organic chemistry class. Many of these students are taking first term calculus in addition to the organic chemistry class. The amount of money they spend on books is as follows:

$128; $87; $173; $116; $130; $204; $147; $189; $93; $153

The second sample is taken using a list of senior citizens who take P.E. classes and taking every fifth senior citizen on the list, for a total of ten senior citizens. They spend:

$50; $40; $36; $15; $50; $100; $40; $53; $22; $22

It is unlikely that any student is in both samples.
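A quick calculation of each sample's average shows how far apart the two samples are. The dollar amounts are the ones listed above; the variable names are chosen for this sketch.

```python
from statistics import mean

# Amounts spent on books (dollars), copied from the two samples above.
chemistry_sample = [128, 87, 173, 116, 130, 204, 147, 189, 93, 153]
senior_pe_sample = [50, 40, 36, 15, 50, 100, 40, 53, 22, 22]

print(f"Organic chemistry sample average: ${mean(chemistry_sample):.2f}")  # $142.00
print(f"Senior citizen P.E. sample average: ${mean(senior_pe_sample):.2f}")  # $42.80
```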

Question 7

Do you think that either of these samples is representative of (or is characteristic of) the entire 10,000 part-time student population?

Question 8

Since these samples are not representative of the entire population, is it wise to use the results to describe the entire population?

Question 9

Now, suppose we take a third sample. We choose ten different part-time students from the disciplines of chemistry, math, English, psychology, sociology, history, nursing, physical education, art, and early childhood development. (We assume that these are the only disciplines in which part-time students at ABC College are enrolled and that an equal number of part-time students are enrolled in each of the disciplines.) Each student is chosen using simple random sampling. Using a calculator, random numbers are generated and a student from a particular discipline is selected if they have a corresponding number. The students spend the following amounts:

$180; $50; $150; $85; $260; $75; $180; $200; $200; $150

Is the sample biased?

A local radio station has a fan base of 20,000 listeners. The station wants to know if its audience would prefer more music or more talk shows. Asking all 20,000 listeners is an almost impossible task.

The station uses convenience sampling and surveys the first 200 people they meet at one of the station’s music concert events. 24 people said they’d prefer more talk shows, and 176 people said they’d prefer more music.
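The survey counts can be turned into proportions with a short calculation. The numbers come from the example above; the variable names are chosen for this sketch.

```python
# Counts from the concert survey described above.
prefer_talk = 24
prefer_music = 176
fan_base = 20_000

surveyed = prefer_talk + prefer_music
share_music = prefer_music / surveyed
share_talk = prefer_talk / surveyed

print(f"Surveyed {surveyed} of {fan_base} listeners ({surveyed / fan_base:.0%})")
print(f"Prefer more music: {share_music:.0%}")
print(f"Prefer more talk shows: {share_talk:.0%}")
```

These percentages describe only the people surveyed at the concert, not necessarily the whole fan base.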

Question 10

Do you think that this sample is representative of (or is characteristic of) the entire 20,000 listener population?

Question 11

What do you recommend they do to make their study unbiased?

Variation in Data

Variation is present in any set of data. For example, 16-ounce cans of beverage may contain more or less than 16 ounces of liquid. In one study, eight 16-ounce cans were measured and produced the following amount (in ounces) of beverage:

15.8; 16.1; 15.2; 14.8; 15.8; 15.9; 16.0; 15.5
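A few summary numbers make the variation in these eight measurements concrete. The values are the ones listed above; the choice of summaries (average, range, count under 16 ounces) is an assumption about what is useful here.

```python
from statistics import mean

# Measured contents (ounces) of the eight 16-ounce cans listed above.
ounces = [15.8, 16.1, 15.2, 14.8, 15.8, 15.9, 16.0, 15.5]

average = mean(ounces)
spread = max(ounces) - min(ounces)            # range: largest minus smallest
under_filled = sum(1 for x in ounces if x < 16)

print(f"Average: {average:.2f} oz")
print(f"Range: {spread:.1f} oz (from {min(ounces)} to {max(ounces)})")
print(f"Cans under 16 oz: {under_filled} of {len(ounces)}")
```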

Variation in Samples

Two samples taken from the same population will likely differ from each other, although they may be close. The larger the sample size, the closer the sample results (such as the averages) tend to be to each other and to the population value. This is variability in samples.
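A small simulation can illustrate this. Everything here is assumed for the example: the population is 10,000 simulated spending amounts, and the sample sizes (10 and 1,000), the number of repeats, and the seed are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated spending amounts in dollars.
population = [rng.uniform(0, 300) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repeats=200):
    """Draw many random samples and measure how much their averages vary."""
    means = [mean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return pstdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(1000)

print(f"Spread of averages with n=10:   {small:.2f}")
print(f"Spread of averages with n=1000: {large:.2f}")
```

The averages from the larger samples cluster much more tightly, even though every sample is drawn from the same population.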

Size of Sample

The samples so far have been small. Polling samples of 1,200 to 1,500 people are usually considered large enough, provided the sample is random. Be aware that convenience sampling and call-in surveys will still produce biased samples, no matter how large they are.

Question 12
Question 13
Question 14
