Discrete vs Continuous
Quantitative discrete data: data that result from counting and can take only certain numerical values.
Examples: number of phone calls received in a day, amount of money a person has (counted in whole cents), number of steps taken in a day.
Quantitative continuous data: data that result from measuring rather than counting, and that may include fractions, decimals, or irrational numbers.
Examples: lengths, weights, times.
Below are tables comparing the number of part-time and full-time students at De Anza College and Foothill College enrolled for the most recent spring quarter. The tables display counts (frequencies) and percentages or proportions (relative frequencies). The percent columns make comparing the same categories in the colleges easier. Displaying percentages along with the numbers is often helpful, but it is particularly important when comparing sets of data that do not have the same totals, such as the total enrollments for both colleges in this example. Notice how much larger the percentage for part-time students at Foothill College is compared to De Anza College.
Tables are a good way of organizing and displaying data. But graphs can be even more helpful in understanding the data. There are no strict rules concerning which graphs to use. Two graphs that are used to display qualitative data are pie charts and bar graphs.


Sometimes percentages add up to be more than 100% (or less than 100%). In the graph, the percentages add to more than 100% because students can be in more than one category. A bar graph is appropriate to compare the relative size of the categories. A pie chart cannot be used. It also could not be used if the percentages added to less than 100%.
A sample should have the same characteristics as the population it represents.
To achieve this, it must be a random sample.
Simple random sample: every group of size n is as likely to be chosen as any other group of size n. Examples: pulling names out of a hat, or generating random numbers on a computer to select people.
Stratified sample: divide the population into groups (strata) and take a proportionate simple random sample from each (e.g., pick three random students from each classroom).
Cluster sample: divide the population into clusters and randomly select some of the clusters; every member of a selected cluster is in the sample (e.g., pick 6 random classrooms and collect information from all of their students).
Systematic sample: randomly select a starting point and take every nth item from a listing of the population (e.g., list students by ID and select every tenth student).
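The four sampling methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 12 classrooms with 25 students each, the student labels, and the function names are all hypothetical and are not part of the notes.

```python
import random

# Hypothetical population: 12 classrooms of 25 students each (made-up labels).
rng = random.Random(0)  # fixed seed so the sketch is repeatable
classrooms = {
    room: [f"room{room}-student{i}" for i in range(25)]
    for room in range(12)
}
population = [s for students in classrooms.values() for s in students]

def simple_random_sample(pop, n):
    """Every group of n members has the same chance of being chosen."""
    return rng.sample(pop, n)

def stratified_sample(strata, per_stratum):
    """Take a simple random sample from every stratum (here, every classroom)."""
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, per_stratum))
    return chosen

def cluster_sample(clusters, n_clusters):
    """Randomly pick whole clusters and keep every member of each one."""
    picked = rng.sample(list(clusters), n_clusters)
    return [s for room in picked for s in clusters[room]]

def systematic_sample(pop, step):
    """Random starting point, then every step-th member of the listing."""
    start = rng.randrange(step)
    return pop[start::step]

srs = simple_random_sample(population, 30)
strat = stratified_sample(classrooms, 3)   # three students per classroom
clus = cluster_sample(classrooms, 6)       # six whole classrooms
syst = systematic_sample(population, 10)   # every tenth student
print(len(srs), len(strat), len(clus), len(syst))
```

Because the hypothetical classrooms are all the same size, taking three students from each one is a proportionate stratified sample; with unequal groups, the number taken from each would need to scale with group size.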
Sampling error: error caused by the sampling process itself, for example a sample that is not large enough.
Non-sampling error: error unrelated to the sampling process, such as a faulty measuring device or bad record keeping.
Sampling bias: occurs when some members of the population are less likely to be chosen than others (e.g., you ask only your friends and family for a study about Salinas).
Problems with samples: A sample must be representative of the population. A sample that is not representative is biased, and biased samples give results that are inaccurate and not valid.
Self-selected samples: Responses only by people who choose to respond, such as call-in surveys, are often unreliable.
Sample size issues: Samples that are too small may be unreliable. Larger samples are better, if possible. In some situations, small samples are unavoidable and can still be used to draw conclusions. Examples: crash testing cars or medical testing for rare conditions.
Undue influence: collecting data or asking questions in a way that influences the response
Non-response or refusal of subject to participate: The collected responses may no longer be representative of the population. Often, people with strong positive or negative opinions may answer surveys, which can affect the results.
Causality: A relationship between two variables does not mean that one causes the other to occur. They may be related (correlated) because of their relationship through a different variable.
Self-funded or self-interest studies: A study performed by a person or organization in order to support their claim. Is the study impartial? Read the study carefully to evaluate the work. Do not automatically assume that the study is good, but do not automatically assume the study is bad either. Evaluate it on its merits and the work done.
Misleading use of data: improperly displayed graphs, incomplete data, or lack of context
Confounding: When the effects of multiple factors on a response cannot be separated. Confounding makes it difficult or impossible to draw valid conclusions about the effect of each factor.
The Confidence Intervals chapter of the text provides formulas for determining the sample size needed when sampling from a population. The sample size is a function of the desired precision, not of the population size. This may seem counterintuitive, but it means that a sample of 1,000 can be adequate for a population of 100,000 or of 1,000,000, provided the same level of precision is desired.
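As an illustration, here is a minimal sketch of one common sample size formula, the one for estimating a proportion: n = z^2 * p * (1 - p) / E^2. The choice of this particular formula, the 95% z-value of 1.96, the worst-case p = 0.5, and the function name are assumptions made here, not taken from the notes. The point is that population size never appears in the calculation.

```python
import math

def sample_size_for_proportion(margin_of_error, z=1.96, p=0.5):
    """Smallest n such that z * sqrt(p * (1 - p) / n) <= margin_of_error."""
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

# Assumed target precision: a margin of error of 3 percentage points.
n = sample_size_for_proportion(0.03)

# The population size is not an input, so the answer is the same for both.
for population_size in (100_000, 1_000_000):
    print(f"population {population_size:>9,}: sample size {n}")
```

Under these assumptions the required sample size is 1,068 in both cases, which is in line with the statement that about 1,000 can be adequate for either population.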
Suppose ABC College has 10,000 part-time students (the population). We are interested in the average amount of money a part-time student spends on books in the fall term. Asking all 10,000 students is an almost impossible task.
Suppose we take two different samples.
First, we use convenience sampling and survey ten students from a first term organic chemistry class. Many of these students are taking first term calculus in addition to the organic chemistry class. The amount of money they spend on books is as follows:
$128; $87; $173; $116; $130; $204; $147; $189; $93; $153
The second sample is taken using a list of senior citizens who take P.E. classes and taking every fifth senior citizen on the list, for a total of ten senior citizens. They spend:
$50; $40; $36; $15; $50; $100; $40; $53; $22; $22
It is unlikely that any student is in both samples.
Do you think that either of these samples is representative of (or is characteristic of) the entire 10,000 part-time student population?
Since these samples are not representative of the entire population, is it wise to use the results to describe the entire population?
Now, suppose we take a third sample. We choose ten different part-time students from the disciplines of chemistry, math, English, psychology, sociology, history, nursing, physical education, art, and early childhood development. (We assume that these are the only disciplines in which part-time students at ABC College are enrolled and that an equal number of part-time students are enrolled in each of the disciplines.) Each student is chosen using simple random sampling. Using a calculator, random numbers are generated and a student from a particular discipline is selected if they have a corresponding number. The students spend the following amounts:
$180; $50; $150; $85; $260; $75; $180; $200; $200; $150
Is the sample biased?
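To compare the three samples, it can help to compute each sample mean. A minimal sketch using only the amounts listed above; the variable names are labels chosen here for illustration.

```python
from statistics import mean

# Dollar amounts spent on books, copied from the three samples above.
samples = {
    "organic chemistry class (convenience)": [128, 87, 173, 116, 130, 204, 147, 189, 93, 153],
    "senior citizens in P.E. (every fifth)": [50, 40, 36, 15, 50, 100, 40, 53, 22, 22],
    "one random student per discipline": [180, 50, 150, 85, 260, 75, 180, 200, 200, 150],
}

sample_means = {name: mean(amounts) for name, amounts in samples.items()}
for name, m in sample_means.items():
    print(f"{name}: ${m:.2f}")
```

The sample means come out to $142.00, $42.80, and $153.00. The gap between them shows how strongly the sampling method can move the estimate. The means alone do not reveal the true population average, which is unknown here.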
A local radio station has a fan base of 20,000 listeners. The station wants to know if its audience would prefer more music or more talk shows. Asking all 20,000 listeners is an almost impossible task.
The station uses convenience sampling and surveys the first 200 people its staff meet at one of the station’s music concert events. Of these, 24 said they’d prefer more talk shows and 176 said they’d prefer more music.
Do you think that this sample is representative of (or is characteristic of) the entire 20,000 listener population?
What do you recommend they do to make their study unbiased?
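The survey counts can be turned into relative frequencies, as described earlier for tables of counts and percentages. A minimal sketch using the two counts from the concert survey; the dictionary labels are chosen here for illustration.

```python
# Counts reported from the 200 people surveyed at the concert.
counts = {"more talk shows": 24, "more music": 176}
total = sum(counts.values())

# Relative frequency = count / total.
relative = {answer: count / total for answer, count in counts.items()}
for answer, share in relative.items():
    print(f"{answer}: {counts[answer]} of {total} = {share:.0%}")
```

These percentages (12% and 88%) describe only the people surveyed at a music event; whether they describe all 20,000 listeners is exactly what the questions above ask.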
Variation is present in any set of data. For example, 16-ounce cans of beverage may contain more or less than 16 ounces of liquid. In one study, eight 16-ounce cans were measured and produced the following amount (in ounces) of beverage:
15.8; 16.1; 15.2; 14.8; 15.8; 15.9; 16.0; 15.5
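The variation in these eight measurements can be summarized numerically. A minimal sketch; the choice of mean, range, and sample standard deviation as summaries is an assumption made here, not something the notes prescribe.

```python
from statistics import mean, stdev

# Ounces of beverage measured in the eight 16-ounce cans.
cans = [15.8, 16.1, 15.2, 14.8, 15.8, 15.9, 16.0, 15.5]

can_mean = mean(cans)
can_range = max(cans) - min(cans)
can_stdev = stdev(cans)  # sample standard deviation (divides by n - 1)

print(f"mean  = {can_mean:.4f} oz")
print(f"range = {can_range:.1f} oz")
print(f"stdev = {can_stdev:.4f} oz")
```

The mean is about 15.64 ounces, a little under the labeled 16, and the cans span 1.3 ounces from smallest to largest.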
Two samples taken from the same population will likely differ from each other, although their results may be close. The larger the sample size, the closer the sample results tend to be to each other and to the population value. This is variability in samples.
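A small simulation can make variability in samples concrete. This sketch rests on made-up data: the population of 10,000 spending amounts drawn uniformly between $0 and $300, the sample sizes, and the function name are all hypothetical.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up spending amounts in dollars.
population = [rng.uniform(0, 300) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the means of `repeats` simple random samples of size n."""
    means = [fmean(rng.sample(population, n)) for _ in range(repeats)]
    return pstdev(means)

# Larger samples should give sample means that sit closer together.
spreads = {n: spread_of_sample_means(n) for n in (10, 100, 1000)}
for n, spread in spreads.items():
    print(f"sample size {n:>4}: spread of sample means = {spread:.2f}")
```

In this setup the spread of the sample means shrinks as the sample size grows, which is the pattern the paragraph above describes.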
Samples thus far have been small. Polling samples of 1,200 to 1,500 people are usually considered large enough, provided the sample is random. Be aware that convenience sampling and call-in surveys still produce biased results, no matter how large the sample is.