Chapter 9


Estimation Using a Single Sample


In many practical problems, we want to estimate some population characteristic, for example, the population mean $\mu$, the population standard deviation $\sigma$, or the proportion $\pi$ of S's in a population. In this chapter, we introduce two estimation techniques: point estimation and interval estimation.


9.1 Point Estimation


Definition 9.1: A point estimate of a population characteristic is a single number computed from sample data and represents a plausible value of the characteristic.


Note: (1) The adjective point reflects the fact that the estimate corresponds to a single point on the number line.

(2) A point estimate is obtained by (i) selecting an appropriate statistic and (ii) computing the value of the statistic for the given sample. For example, the computed value of the sample mean $\bar{x}$ provides a point estimate of a population mean $\mu$.


Sometimes, several statistics can reasonably be used to obtain a point estimate of a specified population characteristic. For example, to obtain a point estimate of a population mean $\mu$, we can use the sample mean $\bar{x}$, a trimmed mean, or the sample median. Which one should we choose for computing an estimate?



Generally, we choose the statistic that tends, on average, to produce an estimate closest to the true value, that is, the most accurate estimate. Information about the accuracy of estimation for a particular statistic is provided by the statistic’s sampling distribution.


  1. If a statistic whose sampling distribution is centered to the right of the true value is used to compute an estimate, the estimate will tend to be larger than the true value.


  2. If a statistic whose sampling distribution is centered to the left of the true value is used to compute an estimate, the estimate will tend to be smaller than the true value.


  3. When a statistic whose sampling distribution is centered at the true value is used to compute an estimate, there will be no long-run tendency to over- or underestimate the true value.


Definition 9.2: A statistic whose mean is equal to the value of the population characteristic being estimated is said to be an unbiased statistic. A statistic that is not unbiased is said to be biased.


Questions: (1) Is $\bar{x}$ an unbiased statistic for estimating $\mu$? (2) Is p an unbiased statistic for estimating a population proportion $\pi$?


Generally, we prefer an unbiased statistic. If several unbiased statistics could be used for estimating a population characteristic, the best statistic to use is the one with the smallest standard deviation.


Consider the problem of estimating a population mean, $\mu$. When the population distribution is symmetric, the sample mean $\bar{x}$, the sample median, and any trimmed mean are all unbiased statistics. Which statistic should be used? The following facts are helpful in making a choice.


  1. If the population distribution is normal, then $\bar{x}$ has a smaller standard deviation than any other unbiased statistic for estimating $\mu$.

  2. When the population is symmetric with heavier tails than the normal curve, a trimmed mean has a smaller standard deviation than $\bar{x}$.


Thus, when the population distribution is normal, we should use $\bar{x}$ to estimate $\mu$. But with a heavy-tailed distribution, a trimmed mean is a better statistic than $\bar{x}$ for estimating $\mu$.
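
The two facts above can be explored with a small simulation. The following is a minimal sketch, not part of the original notes: the heavy-tailed population (a mixture of 90% N(0, 1) and 10% N(0, 10^2)), the 10% trimming fraction, the sample size n = 20, and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def trimmed_mean(values, trim_fraction=0.10):
    """Mean after dropping the smallest and largest trim_fraction of the values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number dropped from each end
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


def normal_draw(rng):
    """One observation from a standard normal population."""
    return rng.gauss(0.0, 1.0)


def heavy_tailed_draw(rng):
    """One observation from an assumed symmetric heavy-tailed population:
    90% N(0, 1) mixed with 10% N(0, 10^2)."""
    sigma = 10.0 if rng.random() < 0.10 else 1.0
    return rng.gauss(0.0, sigma)


def estimator_sds(draw, n=20, reps=4000, seed=1):
    """Simulated standard deviations of the mean, median, and 10% trimmed mean."""
    rng = random.Random(seed)
    means, medians, trimmed = [], [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
        trimmed.append(trimmed_mean(sample))
    return {
        "mean": statistics.stdev(means),
        "median": statistics.stdev(medians),
        "trimmed": statistics.stdev(trimmed),
    }


print("normal population:      ", estimator_sds(normal_draw))
print("heavy-tailed population:", estimator_sds(heavy_tailed_draw))
```

In runs of this kind one would expect the sample mean to show the smallest spread for the normal population and the trimmed mean to show a clearly smaller spread than the sample mean for the heavy-tailed mixture, in line with facts 1 and 2.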


Generally, we estimate a population characteristic by the corresponding sample characteristic, which is summarized in the following table.


| Population characteristic to be estimated | Statistic to use | Unbiasedness |
|-------------------------------------------|------------------|--------------|
| Population proportion, $\pi$              | p                | Unbiased     |
| Population mean, $\mu$                    | $\bar{x}$        | Unbiased     |
| Population variance, $\sigma^2$           | $s^2$            | Unbiased     |
| Population standard deviation, $\sigma$   | s                | Biased       |


Table 9.1 Statistics used to estimate some important population characteristics


Note: s is not an unbiased statistic for estimating $\sigma$. It tends to underestimate slightly the true value of $\sigma$. Since unbiasedness is not the only criterion for judging a statistic, and there are other good reasons for using s to estimate $\sigma$, we will use s to estimate $\sigma$ in the following discussion.
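
The slight downward bias of s can be seen in a simulation. This is a minimal sketch under assumed settings (a normal population with standard deviation 1, samples of size n = 5, 20,000 repetitions); the function name is hypothetical.

```python
import random
import statistics


def average_s_and_s2(sigma=1.0, n=5, reps=20000, seed=2):
    """Average value of s and of s^2 over many simulated normal samples."""
    rng = random.Random(seed)
    s_values, s2_values = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divisor n - 1
        s2_values.append(s2)
        s_values.append(s2 ** 0.5)
    return statistics.fmean(s_values), statistics.fmean(s2_values)


mean_s, mean_s2 = average_s_and_s2()
print(f"average s   = {mean_s:.3f}  (population standard deviation = 1)")
print(f"average s^2 = {mean_s2:.3f}  (population variance = 1)")
```

With these settings the average of $s^2$ should land close to 1, while the average of s should fall noticeably below 1. The gap shrinks as n grows.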


9.2 A Large-Sample Confidence Interval for a Population Proportion


In Section 9.1, we saw how to use a statistic to produce a point estimate of a population characteristic. However, because of sampling variability, rarely is the point estimate from a sample exactly equal to the true value of the population characteristic. Although a point estimate may represent our best single-number guess for the value of the population characteristic, it is not the only plausible value. Thus we need to indicate in some way how precisely the population characteristic has been estimated. A point estimate by itself does not provide this information. As an alternative to a point estimate, we report an interval of reasonable values based on the sample data. Then we can have some “confidence” in the interval estimate.


Definition 9.3: A confidence interval for a population characteristic is an interval of plausible values for the characteristic. It is constructed so that, with a chosen degree of confidence, the value of the characteristic will be captured inside the interval.


Definition 9.4: The confidence level associated with a confidence interval estimate is the success rate of the method used to construct the interval.


Note: The confidence level provides information on how much “confidence” we can have in the method used to construct the interval, not our confidence in any one particular interval. See Figure 9.4 on page 381 for interpretation.


We first consider a large-sample confidence interval for a population proportion. Let


 $\pi$ = proportion of individuals in the population that possess the property of interest,


p = (number of individuals in sample that possess the property of interest) / n,


the sample proportion.


We know that the sampling distribution of the statistic p has the following properties:


    1. The sampling distribution of p is centered at $\pi$; that is, $\mu_p = \pi$. Therefore, p is an unbiased statistic for estimating $\pi$.

    2. The standard deviation of p is $\sigma_p = \sqrt{\pi(1-\pi)/n}$.

    3. When $n\pi \ge 10$ and $n(1-\pi) \ge 10$, the sampling distribution of p is approximately normal with mean $\pi$ and standard deviation $\sqrt{\pi(1-\pi)/n}$.


      • The development of a confidence interval for $\pi$ with confidence level 95%


        (i) Use Appendix Table 2 to determine a value z* such that P(-z* < z < z*) = 0.95. This gives z* = 1.96.

        (ii) Since

$$-1.96 < \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} < 1.96$$

is equivalent to

$$p - 1.96\sqrt{\frac{\pi(1-\pi)}{n}} < \pi < p + 1.96\sqrt{\frac{\pi(1-\pi)}{n}},$$

we have, approximately,

$$P\left(p - 1.96\sqrt{\frac{\pi(1-\pi)}{n}} < \pi < p + 1.96\sqrt{\frac{\pi(1-\pi)}{n}}\right) = P\left(-1.96 < \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} < 1.96\right) = 0.95.$$

This implies that in repeated sampling, 95% of the time the interval


$$p - 1.96\sqrt{\frac{\pi(1-\pi)}{n}} \quad\text{to}\quad p + 1.96\sqrt{\frac{\pi(1-\pi)}{n}}$$

will contain $\pi$.

        (iii) Since $\pi$ is unknown, $\sqrt{\pi(1-\pi)/n}$ must be estimated. When the sample size is large, the value of $\sqrt{p(1-p)/n}$ should be close to $\sqrt{\pi(1-\pi)/n}$ and can be used in its place.


Thus, when n is large, a 95% confidence interval for $\pi$ is


$$\left(p - 1.96\sqrt{\frac{p(1-p)}{n}},\; p + 1.96\sqrt{\frac{p(1-p)}{n}}\right)$$


An abbreviated formula for the interval is


$$p \pm 1.96\sqrt{\frac{p(1-p)}{n}}$$


where + gives the upper endpoint of the interval and - gives the lower endpoint of the interval. The interval can be used as long as $np \ge 10$ and $n(1-p) \ge 10$.


The formula given for a 95% confidence interval can easily be adapted for other confidence levels.



When

  1. p is the sample proportion from a random sample, and

  2. the sample size n is large ($np \ge 10$ and $n(1-p) \ge 10$),

the general formula for a confidence interval for a population proportion $\pi$ is


$$p \pm (z\ \text{critical value})\sqrt{\frac{p(1-p)}{n}}$$


The desired confidence level determines the z critical value. The three most commonly used confidence levels, 90%, 95%, and 99%, use z critical values 1.645, 1.96, and 2.58, respectively.
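
The general formula translates directly into a few lines of code. The following is a minimal sketch, not part of the original notes: the function names and the data (120 successes out of 300) are hypothetical. The z critical value is computed from the standard normal distribution rather than read from a table, so for 99% it is about 2.576 instead of the rounded table value 2.58.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z critical value, e.g. about 1.96 for confidence = 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n
    # Check the large-sample conditions stated in the text.
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("large-sample conditions np >= 10 and n(1-p) >= 10 not met")
    margin = z_critical(confidence) * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical data: 120 of 300 sampled individuals have the property.
low, high = proportion_ci(120, 300, confidence=0.95)
print(f"95% interval: ({low:.4f}, {high:.4f})")
```

Raising an error when the large-sample conditions fail is a design choice of this sketch; the notes only say the interval "can be used" when the conditions hold.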


Note: Some z critical values can be found in Appendix Table 3 on page 708.


Exercise in class: Discuss how each of the following factors affects the width of the confidence interval for $\pi$: (1) the confidence level; (2) the sample size n; (3) the value of p.


Generally, higher reliability of an interval (where "reliability" is specified by the confidence level) entails a loss in precision (as indicated by a wider interval). For example, the width of the 99% interval is $2\left(2.58\sqrt{p(1-p)/n}\right)$, which is larger than the width of the 95% interval, $2\left(1.96\sqrt{p(1-p)/n}\right)$. In the opinion of many investigators, a 95% interval gives a reasonable compromise between reliability and precision.



Many confidence intervals have the same general form as the large-sample interval for $\pi$:


  1. (Point estimate using a specified statistic) ± (critical value)(standard deviation of the statistic), if the standard deviation of the statistic is known;


  2. (Point estimate using a specified statistic) ± (critical value)(estimated standard deviation of the statistic), if the standard deviation of the statistic is unknown.


Definition 9.5: The standard error of a statistic is the estimated standard deviation of the statistic.



Definition 9.6: If the sampling distribution of a statistic is normal (approximately), the bound on error of estimation, B, associated with a confidence interval is


(z critical value)(standard deviation of the statistic).


When we use p to construct a 95% confidence interval for $\pi$, the bound is $B = 1.96\sqrt{\pi(1-\pi)/n}$. Sometimes, we may wish to determine a sample size such that a particular value of the bound B is achieved. For such purposes, solving the equation $B = 1.96\sqrt{\pi(1-\pi)/n}$ for n, we obtain

$$n = \pi(1-\pi)\left(\frac{1.96}{B}\right)^2$$


Generally, the sample size required to estimate a population proportion $\pi$ to within an amount B with a given confidence level is

$$n = \pi(1-\pi)\left(\frac{z\ \text{critical value}}{B}\right)^2$$


The value of $\pi$ may be estimated using prior information. In the absence of any such information, using $\pi$ = .5 in this formula gives a conservatively large value for the required sample size.


Note: $\left(\pi - \tfrac{1}{2}\right)^2 \ge 0 \;\Rightarrow\; \pi(1-\pi) \le \tfrac{1}{4} \;\Rightarrow\; \pi(1-\pi)\left(\frac{z\ \text{critical value}}{B}\right)^2 \le \tfrac{1}{4}\left(\frac{z\ \text{critical value}}{B}\right)^2$ for any $\pi$.


Example 9.1 A survey designed to obtain information on $\pi$ = the proportion of registered voters who are in favor of a constitutional amendment requiring a balanced budget results in a sample of size n = 400. Of the 400 voters sampled, 272 are in favor of a constitutional amendment requiring a balanced budget.

  (a) Give a point estimate of $\pi$.

  (b) Determine the estimated standard deviation of your estimate in part (a).

  (c) Calculate a 99% confidence interval for $\pi$ and interpret the confidence interval.

  (d) Based on this confidence interval, do the majority of registered voters favor the constitutional amendment?

  (e) How large would n have needed to be in order to have estimated $\pi$ to within .03 with 95% confidence?


  (a) The point estimate of $\pi$ is p = 272 / 400 = 0.68.

  (b) The estimated standard deviation of p is $\sqrt{p(1-p)/n} = \sqrt{(0.68)(0.32)/400} = 0.0233$.

  (c) Since np = 400 × 0.68 = 272 > 10 and n(1-p) = 400(1-0.68) = 128 > 10, we can use the formula for a large-sample confidence interval to obtain a 99% confidence interval for $\pi$.

$$p \pm (z\ \text{critical value})\sqrt{\frac{p(1-p)}{n}} = 0.68 \pm 2.58\sqrt{\frac{(0.68)(0.32)}{400}} = 0.68 \pm 2.58 \times 0.0233$$

$$= 0.68 \pm 0.0601 = (0.6199, 0.7401).$$


We are 99% confident that $\pi$ is between 0.6199 and 0.7401.

  (d) Yes, since the entire interval is above 0.5.

  (e) Using a conservative value of $\pi$ = .5 in the formula for required sample size gives

$$n = \pi(1-\pi)\left(\frac{1.96}{B}\right)^2 = 0.5(1-0.5)\left(\frac{1.96}{0.03}\right)^2 = 1067.11$$

Thus, n would need to be at least 1068 in order to estimate $\pi$ to within .03 with 95% confidence.


Question: Are the following statements correct?

(1) Since (0.6199, 0.7401) is a 99% confidence interval for $\pi$, P((0.6199, 0.7401) contains $\pi$) = 99%.

(2) If the process of selecting a sample of size 400 and then computing the corresponding 99% confidence interval is repeated 100 times, 99 of the resulting intervals will include $\pi$.
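
One way to think about these statements is to simulate the repeated-sampling process described in Definition 9.4. The following is a minimal sketch, not part of the original notes. It assumes, purely for the simulation, that the true proportion is known to be 0.68; in practice the true value is unknown. The function name and the number of repetitions are also assumptions.

```python
import random
from math import sqrt


def coverage_rate(pi=0.68, n=400, z=2.58, reps=5000, seed=3):
    """Fraction of simulated intervals p +/- z*sqrt(p(1-p)/n) that contain pi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and count individuals with the property.
        successes = sum(rng.random() < pi for _ in range(n))
        p = successes / n
        margin = z * sqrt(p * (1 - p) / n)
        if p - margin <= pi <= p + margin:
            hits += 1
    return hits / reps


print(f"observed coverage: {coverage_rate():.3f}")
```

The observed coverage should be close to 0.99 but need not equal it exactly, and any single interval either contains the true proportion or does not. Both points bear on how the two statements should be judged.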


9.3 A Confidence Interval for a Population Mean


In this section, we consider how to use information from a random sample to construct a confidence interval estimate for a population mean.


Recall the four properties of the sampling distribution of $\bar{x}$:


  1. The mean of $\bar{x}$ is $\mu_{\bar{x}} = \mu$.

  2. The standard deviation of $\bar{x}$ is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

  3. When the population distribution is normal, the sampling distribution of $\bar{x}$ is also normal.

  4. When n is sufficiently large (generally $n \ge 30$), the sampling distribution of $\bar{x}$ is approximately normal.



When

  1. $\bar{x}$ is the sample mean of a random sample from a population,

  2. the population distribution is normal OR the sample size n is large (generally $n \ge 30$), and

  3. the population standard deviation $\sigma$ is known,

the formula for a confidence interval for the population mean $\mu$ is


$$\bar{x} \pm (z\ \text{critical value})\left(\frac{\sigma}{\sqrt{n}}\right)$$
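
As a minimal sketch of the one-sample z interval (not part of the original notes), the function below computes the interval from a sample mean, a known population standard deviation, and a sample size. The function name and the inputs (sample mean 100, population standard deviation 15, n = 36) are hypothetical, and the z critical value is computed rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def mean_z_ci(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical inputs: sample mean 100, known sigma 15, n = 36.
low, high = mean_z_ci(100.0, 15.0, 36)
print(f"95% interval: ({low:.2f}, {high:.2f})")
```

The function does not check the conditions listed above (random sample, normal population or large n); those remain the user's responsibility.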


Example 9.2 The McClatchy News Service reported on a sample of prime-time television hours. The following table summarizes the information reported for two networks.


| Network | Mean Number of Violent Acts per Hour |
|---------|--------------------------------------|
| ABC     | 15.6                                 |
| FOX     | 11.7                                 |


Suppose that each of these sample means was computed on the basis of viewing n = 50 randomly selected prime-time hours and that the population standard deviation for each of the two networks is known to be $\sigma$ = 5.


Since n = 50 > 30 and $\sigma$ = 5 is known, we can use the one-sample z confidence interval formula.


  (a) The 95% confidence interval for ABC is

$$\bar{x} \pm (z\ \text{critical value})\left(\frac{\sigma}{\sqrt{n}}\right) = 15.6 \pm (1.96)\left(\frac{5}{\sqrt{50}}\right) = 15.6 \pm 1.39 = (14.21, 16.99)$$


  (b) The 95% confidence interval for FOX is

$\bar{x} \pm (z\ \text{critical value})\left(\sigma/\sqrt{n}\right)$ = ? ± ? × ? = ? ± ? = (?, ?)


  (c) Yes, because the plausible values for the ABC mean are at least 14.21, while the plausible values for the FOX mean are no greater than 13.09.


Let us look at the development of the 95% confidence interval for $\mu$ when $\sigma$ is known. When the population distribution is normal, the sampling distribution of $\bar{x}$ is normal.

Thus,

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$


has the standard normal distribution.


Since $-1.96 < \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96$ is equivalent to $\bar{x} - 1.96\,\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\dfrac{\sigma}{\sqrt{n}}$,

$$P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$

Then a 95% confidence interval for $\mu$ is $\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$.


When $\sigma$ is unknown, we must use the sample data to estimate $\sigma$. A natural estimate of $\sigma$ is s. Now we use

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

To use t to develop a confidence interval for $\mu$, we must know the probability distribution of t.


Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal population distribution. Then the probability distribution of the standardized variable


$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$


is the t distribution with n-1 df.


When

  1. $\bar{x}$ is the sample mean of a random sample from a population,

  2. the population distribution is normal OR the sample size n is large (generally $n \ge 30$), and

  3. the population standard deviation $\sigma$ is unknown,

the formula for a confidence interval for the population mean $\mu$ is


$$\bar{x} \pm (t\ \text{critical value})\left(\frac{s}{\sqrt{n}}\right)$$


where the t critical value is based on n-1 df and can be found in Appendix Table 3 on page 708.

Note: Appendix Table 3 jumps from 30 df to 40 df, then 60 df, then 120 df, and finally to the row of z critical values. If we need a critical value for a number of degrees of freedom between those tabulated, we just use the critical value for the closest df. For df > 120, we use the z critical values.
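
The closest-df rule in the note can be sketched in code. This is a minimal illustration, not part of the original notes. The dictionary below holds the usual two-sided 95% t critical values rounded to two decimals, which are assumed (not confirmed) to match the rows of Appendix Table 3. The function names are hypothetical, only df of 30 or more are covered, and a tie between two tabulated rows is resolved in favor of the smaller df.

```python
from math import sqrt

# Two-sided 95% t critical values for the sparse rows of the table
# (standard values rounded to two decimals; assumed to match the appendix).
T_95 = {30: 2.04, 40: 2.02, 60: 2.00, 120: 1.98}
Z_95 = 1.96


def t_critical_95(df):
    """95% critical value using the closest tabulated df (z value beyond 120)."""
    if df < 30:
        raise ValueError("this sketch only covers df >= 30; read smaller df from the table")
    if df > 120:
        return Z_95
    # min() returns the first key with the smallest distance, so ties go to the smaller df.
    closest = min(T_95, key=lambda tabulated: abs(tabulated - df))
    return T_95[closest]


def mean_t_ci_95(xbar, s, n):
    """95% one-sample t interval for a population mean (sigma unknown)."""
    margin = t_critical_95(n - 1) * s / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical inputs: sample mean 20, sample standard deviation 4, n = 45.
low, high = mean_t_ci_95(20.0, 4.0, 45)
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

Statistical software computes the exact critical value for any df, so the closest-row rule is only needed when working from a printed table.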

Example 9.3 A medical researcher from the National Institute of Health has collected samples on the life expectancies of people who are long-time smokers and those who are nonsmokers. The sample data is summarized in the table below.


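| Group      | Sample Size | Sample Mean | Sample Standard Deviation |
|------------|-------------|-------------|---------------------------|
| Smokers    | 50          | 67.6        | 5                         |
| Nonsmokers | 60          | 74.5        | 3.5                       |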


  (a) Compute a 95% confidence interval for the mean life expectancy of a smoker.

  (b) Compute a 95% confidence interval for the mean life expectancy of a nonsmoker.

  (c) Do the confidence intervals in parts (a) and (b) provide convincing evidence that nonsmokers live longer on average than do smokers? Explain.


  (a) Since $n_1$ = 50 > 30 and $\sigma$ is unknown, we can use the one-sample t confidence interval formula. With 50 - 1 = 49 df, the closest tabulated row is 40 df, which gives a critical value of 2.02.

$$\bar{x} \pm (t_{50-1}\ \text{critical value})\,\frac{s}{\sqrt{n}} = 67.6 \pm 2.02\left(\frac{5}{\sqrt{50}}\right) = 67.6 \pm 1.4284$$

$$= (66.1716, 69.0284)$$


  (b) Since $n_2$ = 60 > 30 and $\sigma$ is unknown, we can use the one-sample t confidence interval formula.

$\bar{x} \pm (t_{60-1}\ \text{critical value})\,\dfrac{s}{\sqrt{n}}$ = ? ± ? × ? = ? ± ?

= (?, ?)


  (c) The confidence intervals in parts (a) and (b) do provide convincing evidence that nonsmokers live longer than long-time smokers, since the largest value in the confidence interval for smokers is roughly 4.5679 years less than the smallest value in the confidence interval for nonsmokers.



When we use $\bar{x}$ to construct a 95% confidence interval for $\mu$, the bound on error of estimation is


$$B = 1.96\,\frac{\sigma}{\sqrt{n}}$$


Before collecting any data, an investigator may wish to determine a sample size for which a particular value of the bound is achieved. Solving $B = 1.96\,\sigma/\sqrt{n}$ for n, we obtain


$$n = \left(\frac{1.96\,\sigma}{B}\right)^2.$$





Generally, we have the following result.


The sample size required to estimate a population mean $\mu$ to within an amount B with a given confidence level is


$$n = \left(\frac{(z\ \text{critical value})\,\sigma}{B}\right)^2.$$


If $\sigma$ is unknown, it may be estimated based on previous information or, for a population that is not too skewed, by using (range)/4.


Example 9.4 The financial aid office wishes to estimate the mean cost of textbooks per semester for students at a university. For the estimate to be useful, it should be within $20 of the true population mean. How large a sample should be used to be 95% confident of achieving this level of accuracy?


To determine the required sample size, we must have a value for $\sigma$. The financial aid office is pretty sure that the amount spent on books varies widely, with most values between $50 and $450. A reasonable estimate of $\sigma$ is then


(range) / 4 = (450 – 50) / 4 = 400 / 4 = 100.


The required sample size is


$$n = \left[\frac{1.96\,\sigma}{B}\right]^2 = \left[\frac{(1.96)(100)}{20}\right]^2 = [9.8]^2 = 96.04.$$


Rounding up, a sample size of 97 or larger is required.
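
The calculation in Example 9.4 can be reproduced with two small helpers. This is a minimal sketch; the function names are hypothetical, and the (range)/4 rule is applied exactly as described in the text.

```python
from math import ceil


def sigma_from_range(low, high):
    """Rough estimate of sigma as (range)/4, for a population that is not too skewed."""
    return (high - low) / 4


def sample_size_for_mean(bound, sigma, z=1.96):
    """Smallest whole n with z * sigma / sqrt(n) <= bound."""
    return ceil((z * sigma / bound) ** 2)


sigma_guess = sigma_from_range(50, 450)        # 100.0, as in Example 9.4
print(sample_size_for_mean(20, sigma_guess))   # 97, matching Example 9.4
```

Because the required n grows with the square of 1/B, halving the bound roughly quadruples the sample size.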



