Sample size determination a comparison of attribute


Annals of Library Science and Documentation 42,3; 1995; 96-100.



Library Department University of Calabar Calabar


Each of the two methods of sample size determination, the Attribute Method and the Continuous Variable Method, has its use in the investigation of social science problems. The former allows the computation of sample size with reference to any parameters of the variable and, therefore, can substitute for the Continuous Variable Method, but with a probable increase in sample size. The latter is very useful when data are collected in ratio form. However, it demands estimates of dispersion from the mean, which may be the primary purpose of the research in the first place. The Attribute Method is highly recommended for library and information science since it can be substituted for the Continuous Variable Method.


Researchers intending to use sampling procedures for studying library and information science subjects face many and diverse methodological problems. One of these problems is the decision about sample size, that is, the minimum amount of data to be collected, or whether enough cases are available for the results to be statistically valid.

To determine an appropriate sample size, one must have a thorough knowledge of the levels of measurement to be used, the hypotheses to be tested and the type of statistical tests most appropriate to the problem. Some knowledge of population parameters is necessary, for example, the percentage of occurrence or the standard deviation, as well as the desired level of confidence and the desired accuracy for the sample, expressed as tolerance. Hence, a sound grasp of statistical problems and some training are required to decide which of the two methods, the Attribute Method or the Continuous Variable Method, is more suitable for library and information science surveys.

One simple and informal method for determining sample size is to use the sample size used by others studying a similar problem. For example, Roscoe [4] states that in behavioral research there are few occasions when samples smaller than 30 or larger than 500 in size can be justified. While the range from 30 to 500 may appear to be a large one, it does narrow the number to some extent. Unfortunately, no rationale is given for this recommendation, and this view is shared by Uko [5] and others.

The most frequently used statistical approaches are the Attribute and the Continuous Variable Methods. In this paper a basis for comparing the two approaches is given for the calculation of sample size in library and information science surveys. Convenience alone dictated the comparison of the two approaches.


To generalize from a sample to a universe or population, that is, to hypothesize that the mean of the population variable falls within a certain range of values at a certain level of confidence, statistical techniques must be used.

In discussions of the Attribute Method and the Continuous Variable Method found in the literature on research methodology and statistics, two key factors are always mentioned [3]: first, the need to establish the level of confidence, and second, the need to establish the degree of accuracy or tolerance that is required.

The Attribute Method deals with the significance of proportions and requires an estimate of the percentage of occurrence of the key variable in the study. The Continuous Variable Method requires an estimate of the dispersion within the key variable, usually the standard deviation. In both cases, estimates based on prior knowledge are required. The Attribute Method does permit the selection of the largest possible sample size by estimating the occurrence of the key variable at 50%.

The confidence level establishes the degree of sampling error that will be permitted in the study. For example, when a confidence level of 95% is used, it is said that the probability is 95%. That means that 95% of the time the confidence interval will contain the true mean, and 5% of the time it will not. Carpenter and Vasu [1] stated that the most commonly used confidence level was the 95% level and that it might be used as a standard. The 99% level increases the sample size, and the 90% level is least used. To minimize the total error that results from non-response, inaccurate recording of responses, copying figures from one file to another and so on, the confidence level should be set at a conservatively high level.

The degree of accuracy required of a sample is translated into statements such as, "The population mean falls within plus or minus 5 units of the sample mean". The researcher establishes this level of tolerance through inspection of the variable in question and the need for accuracy. But, as will be shown, sample size increases dramatically as the tolerance is tightened.


The Attribute Method of sample size determination requires an estimate of the proportion of occurrence of a property or activity in the universe. Dougherty and Heinritz [2] used the example of "books that have been in circulation for fourteen days or more" to illustrate the concept of attribute. In this case, there are only two possible conditions: the case has the attribute or it does not. In effect, there are only two values. Other examples would be books borrowed for five days or more than five days; students who use libraries heavily or not heavily; documents that contain ten or more citations and those containing three or less.

A formula for the computation of Attribute sample size is given by Dougherty and Heinritz [2] and Uko [5]:

$$F = \frac{C^2 \, p(1-p)}{t^2}$$

where
$F$ = the sample size
$C^2$ = the z-score squared, representing the desired confidence level
$t$ = the desired tolerance, expressed as a fraction or decimal
$p$ = the estimated percentage occurrence of the attribute being measured.

The z-score squared and tolerance squared have been used for ease of computation. For those who are more familiar with the z-score notation, the following gives the value of the z-score, and C², for six typical confidence levels:

Confidence level    z-score    C²
99%                 2.58       6.6564
98%                 2.33       5.4289
                    2.27       5.1529
96%                 2.05       4.2025
95%                 1.96       3.8416
90%                 1.65       2.7225

The percentage occurrence portion of the formula, p(1-p), has the property of maximizing sample size at the 50% level of occurrence for a given tolerance. Hence, sample size is the same for both p = 40% and p = 60%, or p = 30% and p = 70% [2,5].
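As a minimal sketch of the attribute formula, F = C² p(1-p) / t², the Python function below computes a sample size from a z-score, an estimated proportion and a tolerance. The function name `attribute_sample_size` and its parameter names are illustrative assumptions, not taken from the cited sources. Rounding to the nearest whole case is also an assumption, chosen because it matches the worked figures in this paper; rounding up would be the more cautious choice.

```python
def attribute_sample_size(z: float, p: float, t: float) -> int:
    """Attribute Method: F = C^2 * p * (1 - p) / t^2, with C^2 = z squared.

    z: z-score for the desired confidence level (e.g. 1.96 for 95%)
    p: estimated proportion of occurrence, as a decimal
    t: tolerance, as a decimal
    Rounding to the nearest whole case is an assumption of this sketch.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if t <= 0.0:
        raise ValueError("t must be positive")
    return round(z ** 2 * p * (1.0 - p) / t ** 2)


# Maximum sample size at 95% confidence and 5% tolerance (p = 50%).
print(attribute_sample_size(1.96, 0.50, 0.05))  # 384

# p and 1 - p give the same sample size, because p(1 - p) is symmetric about 50%.
print(attribute_sample_size(1.96, 0.40, 0.05) == attribute_sample_size(1.96, 0.60, 0.05))  # True
```

Passing the z-score rather than C² keeps the call close to the notation most readers know; the squaring happens inside the function.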




This property eases the problem of deciding on the criterion variable for sample size calculation in a multivariable study. For example, a hypothetical study might have three variables: age, distance and rate of visiting. Age could be defined as having the attribute of "greater than 18 years", distance as having the attribute of "more than three kilometres" and rate of visiting as "three or more visits per month". A problem in some people's minds would be to decide which of the three should be used in calculating sample size. In the Attribute Method, estimating the percentage of occurrence at 50% would maximize sample size for any variable. If it is necessary to decrease sample size, the choice should go to the variable with the percentage closest to 50%.

If, in the above example, p = 65% for age, p = 30% for distance and p = 20% for rate of visiting, sample size is highest for age because its percentage is closest to 50%.

Vol 42 No 3 September 1995



Table 1 illustrates the results of the computation at the 95%, 98% and 99% confidence levels with tolerance set at ±5% for all variables.

Table 1. Computed sample sizes for three variables with tolerance set at ±5%.

Confidence level    Age (65%)    Distance (30%)    Rate of visiting (20%)    Maximum size (50%)
95%                 350          323               246                       384
98%                 494          456               347                       543
99%                 606          559               426                       666

The rationale for choosing age as the criterion variable rather than distance or rate of visiting is simple. By choosing the greatest sample size, all the other variables will be generalizable to the population. If the maximum size is selected, it is not even necessary to estimate the percentage occurrence of any of the other variables. In some cases, the difference between the maximum size and the size computed for a variable may be great enough to choose the lower of the two. In Table 1, under the heading 95% confidence, there is a difference of 34 cases between the maximum size of 384 and the next highest, age, at 350. A difference of 34 cases might be a rationale for choosing the lower figure. This rationale could be used when the cost of collecting data for each case is high. If the cost of collecting data is low, choosing the maximum sample size would negate any errors in estimation of the percentage occurrence. This assumes, of course, the same level of tolerance for all variables, but such an assumption may not be warranted. If the tolerance in Table 1 is changed to 3% for distance and remains at 5% for both age and rate of visiting, the sample size for distance would more than double to 896 and would become the highest sample size. It is important to bear in mind that this interrelationship between tolerance and percentage occurrence exists.
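The figures in Table 1, and the effect of tightening the tolerance for distance alone, can be checked with a short script. This is a sketch under stated assumptions: the helper `attribute_sample_size`, the dictionary names and the round-to-nearest rule are illustrative choices, not part of the cited sources.

```python
def attribute_sample_size(z, p, t):
    """F = z^2 * p * (1 - p) / t^2, rounded to the nearest whole case (assumption)."""
    return round(z ** 2 * p * (1.0 - p) / t ** 2)


# z-scores for the three confidence levels used in Table 1.
Z_SCORES = {"95%": 1.96, "98%": 2.33, "99%": 2.58}

# Estimated proportions from the worked example, plus the 50% maximum.
PROPORTIONS = {"age": 0.65, "distance": 0.30, "rate of visiting": 0.20, "maximum": 0.50}

# One row per confidence level, all at a tolerance of 5%.
table_1 = {
    level: {name: attribute_sample_size(z, p, 0.05) for name, p in PROPORTIONS.items()}
    for level, z in Z_SCORES.items()
}
for level, row in table_1.items():
    print(level, row)

# Tightening the tolerance for distance alone from 5% to 3% at 95% confidence.
distance_at_3_percent = attribute_sample_size(1.96, 0.30, 0.03)
print(distance_at_3_percent)  # 896
```

Because tolerance enters the formula squared, moving from 5% to 3% multiplies the sample size by about 2.8, which is why distance overtakes age as the largest requirement.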

When the cost of data collection is low, the Attribute Method is well suited, but when the cost of data collection is high, estimates of the percentage occurrence on each variable should be made. Standard tables can be constructed for various levels of confidence and tolerance with p = 50%.
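A standard table of this kind might be built as follows. The sketch assumes the attribute formula F = C² p(1-p) / t² with p fixed at 50%; the function name `standard_table`, the chosen tolerances and the round-to-nearest rule are illustrative assumptions.

```python
def standard_table(z_scores, tolerances, p=0.50):
    """Return {(confidence label, tolerance): sample size} for a fixed proportion p."""
    return {
        (label, t): round(z ** 2 * p * (1.0 - p) / t ** 2)
        for label, z in z_scores.items()
        for t in tolerances
    }


Z_SCORES = {"95%": 1.96, "98%": 2.33, "99%": 2.58}
TOLERANCES = (0.10, 0.05, 0.03)  # hypothetical choice of tolerances

table = standard_table(Z_SCORES, TOLERANCES)

# Print one row per tolerance and one column per confidence level.
print("tolerance", *Z_SCORES)
for t in TOLERANCES:
    print(f"{t:.0%}", *(table[(label, t)] for label in Z_SCORES))
```

A researcher with no prior estimate of the percentage occurrence can read the required size straight from such a table, at the price of a possibly larger sample than strictly needed.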


The Continuous Variable Method is similar to the Attribute Method with the substitution of a measure of dispersion for the estimate of the percentage occurrence.

The formula used by Uko [5] is typical:

$$n = \left(\frac{z_q \, s}{t}\right)^2$$

where
$n$ = sample size
$z_q$ = z-score for the confidence level
$s$ = standard deviation
$t$ = tolerance or degree of accuracy.
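A minimal sketch of the continuous variable formula, n = (z s / t)², is given below. The function name `continuous_sample_size` and its parameter names are illustrative assumptions, and rounding to the nearest whole case is assumed because it matches the figures worked in this paper.

```python
def continuous_sample_size(z: float, s: float, t: float) -> int:
    """Continuous Variable Method: n = (z * s / t)^2.

    z: z-score for the desired confidence level (e.g. 1.96 for 95%)
    s: estimated standard deviation, in the units of the variable
    t: tolerance, in the same units as s
    Rounding to the nearest whole case is an assumption of this sketch.
    """
    if s <= 0.0 or t <= 0.0:
        raise ValueError("s and t must be positive")
    return round((z * s / t) ** 2)


# Age with a standard deviation of 15.5 years, 95% confidence, tolerance of 2 years.
print(continuous_sample_size(1.96, 15.5, 2.0))  # 231
```

Unlike the Attribute Method, the tolerance here is an absolute quantity in the variable's own units, so `s` and `t` must be expressed on the same scale.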

The Continuous Variable Method is used when the variables are measured on a ratio scale. In this case, an estimate of the standard deviation is needed. The standard deviation can be determined by using an electronic calculator and a small random sample of 10 or 20 cases from the intended sampling frame. A sampling frame consists of a list, directory, index, map or other record listing the population elements from which the sample may be drawn. Some commonly used sampling frames are the telephone directory for telephone interviews, the staff nominal roll for sampling employees of institutions, and membership lists for associations and clubs.

For our example, we might estimate the standard deviation of distance as 0.5 kilometres, of age as 5 years and of rate of visiting as 0.65 visits per month.

Deciding on the tolerance level is also somewhat different from the procedure used in the Attribute Method. A percentage of accuracy was decided upon in the previous method. In the Continuous Variable Method, an absolute value for the degree of accuracy is required, such as ±0.5 kilometres or ±5 years of age. The desired degree of accuracy affects sample size considerably.

Table 2. Variation in sample size given different tolerances and standard deviations for three variables.

Variable    Tolerance (±)      Confidence level    Standard deviation    Sample size
Age         4 years            95%                 15.5                  58
Age         3 years            95%                 15.5                  103
Age         2 years            95%                 15.5                  231
Age         1 year             95%                 15.5                  923
Distance    0.50 kilometres    95%                 1.5                   35
Distance    0.25 kilometres    95%                 1.5                   138
Distance    0.20 kilometres    95%                 1.5                   216
Distance    0.10 kilometres    95%                 1.5                   864
Visits      0.50 visits        95%                 1.05                  17
Visits      0.25 visits        95%                 1.05                  68
Visits      0.20 visits        95%                 1.05                  106
Visits      0.10 visits        95%                 1.05                  435
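The age and distance rows of Table 2 can be reproduced with the continuous variable formula, n = (z s / t)². The sketch below is illustrative only: the helper name, the constants and the round-to-nearest rule are assumptions rather than part of the cited sources.

```python
def continuous_sample_size(z, s, t):
    """n = (z * s / t)^2, rounded to the nearest whole case (assumption)."""
    return round((z * s / t) ** 2)


Z_95 = 1.96         # z-score for the 95% confidence level
AGE_SD = 15.5       # standard deviation of age, in years
DISTANCE_SD = 1.5   # standard deviation of distance, in kilometres

# Sample sizes for progressively tighter tolerances on each variable.
age_sizes = [continuous_sample_size(Z_95, AGE_SD, t) for t in (4, 3, 2, 1)]
distance_sizes = [continuous_sample_size(Z_95, DISTANCE_SD, t) for t in (0.50, 0.25, 0.20, 0.10)]

print(age_sizes)       # [58, 103, 231, 923]
print(distance_sizes)  # [35, 138, 216, 864]
```

Since the tolerance is squared in the denominator, halving it roughly quadruples the required sample, as the step from 2 years (231) to 1 year (923) shows.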

Table 2 illustrates the differences in sample size, given various hypothetical degrees of accuracy. As with the Attribute Method, the sample size of the study should be based on the variable that yields the maximum sample size. In the example shown in Table 2, the tolerance of ±0.10 visits should be used as our criterion variable, as this would allow us to generalize to the population for distance as well as age.


The formulae for calculating the sample size can be used to solve for other quantities, such as tolerance, confidence level or proportion, when n, the sample size, is known. For example, the attribute formula for sample size is given as:

$$F = \frac{C^2 \, p(1-p)}{t^2}$$

Solving for t:

$$t = \sqrt{\frac{C^2}{F} \, p(1-p)}$$

The tolerance can be determined for a given sample size (F), confidence level (C²), and percentage of occurrence. Taking a sample size of 500, a 95% confidence level, and the 50% occurrence level, the tolerance is

$$t = \sqrt{\frac{3.8416 \, (0.5)(0.5)}{500}} = \pm 4.4\%$$

Tolerances can be computed for various levels of confidence, sample sizes and occurrence rates using this formula.

The Continuous Variable Method can be used to calculate either confidence level or tolerance, given an estimate of the standard deviation and a known sample size. The Continuous Variable sample size formula is given as:

$$n = \left(\frac{z_q \, s}{t}\right)^2$$

Solving for t:

$$t = \frac{z_q \, s}{\sqrt{n}}$$

where $z_q$ is the z-score, $s$ is the standard deviation, and $n$ is the sample size.

From the example in Table 2, where the standard deviation for age was 15.5, with a 95% confidence level and a sample size of 500, tolerance can be calculated thus:

$$t = \frac{1.96 \, (15.5)}{\sqrt{500}} = \pm 1.36 \text{ years}$$

It is possible to compute the confidence level given a tolerance, sample size and standard deviation.

Using the Continuous Variable Method with 95% confidence, ±2 units of tolerance, and a standard deviation of 15.5 yields a sample size of only 231. A calculation of sample size by the Attribute Method with 95% confidence, ±5% tolerance, and 50% occurrence would yield a sample size of 384, which is 153 cases larger.


The main intention of the study is to compare the Attribute Method with the Continuous Variable Method. The Attribute Method is highly recommended for library and information science surveys, as it allows the computation of sample size with reference to any parameters of the variable and can be substituted for the Continuous Variable Method.

The choice of the 95% level of confidence is arbitrary in many respects, and the estimates of the desired tolerance or the standard deviation are often equally arbitrary, depending on the knowledge the researcher has of the population parameters.

The ultimate criterion for choosing sample size is cost. If the costs of gathering each unit of information are high, the method that keeps sample size at a minimum can be used, and when the costs per unit are low, methods that increase understanding, planning, and flexibility are to be considered. It is the author's contention that statistically trained researchers will probably continue to use the Attribute Method as previously recommended.


1. CARPENTER (R L) and VASU (E S). Statistical methods for librarians. 1978. American Library Association; Chicago. p. 39.

2. DOUGHERTY (R M) and HEINRITZ (F J). Scientific management of library operations. 1982. The Scarecrow Press; Metuchen. pp. 212-216.

3. KERLINGER (F N). Foundations of behavioural research. 1986. Holt, Rinehart and Winston; New York. pp. 34-36, 148-149.

4. ROSCOE (J T). Fundamental research statistics for behavioural science. 1975. Holt, Rinehart and Winston; New York. p. 184.

5. UKO (J U). Basic research method; GDM Handout. 1994. University of Calabar; Calabar. pp. 20-24.

