ABSTRACT
A limitation of ratio-type estimators in stratified sampling is their reliance on non-attribute auxiliary information. In this study, some ratio-type estimators in stratified random sampling that use an attribute as auxiliary information are proposed. The sample mean of the study variable was transformed linearly, and the proportion of the auxiliary attribute was transformed using auxiliary parameters. Biases and mean square errors (MSE) of these estimators were derived, and the MSEs were compared with the MSE of the traditional combined ratio estimator. The results show that the proposed estimators are more efficient and less biased than the combined ratio estimator under all conditions. An empirical study was also conducted using students' height data from each faculty of the Usmanu Danfodiyo University, Sokoto. Its results also show that the proposed estimators are more efficient and less biased than the combined ratio estimator. In addition, formulae for determining sample sizes when the proposed estimators are adopted under various allocations (optimum, Neyman and proportional), for fixed cost and for desired precision, were obtained.
TABLE OF CONTENTS
TITLE PAGE
CERTIFICATION
DEDICATION
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
ABBREVIATIONS/NOTATIONS
ABSTRACT
CHAPTER ONE: INTRODUCTION
1.1 INTRODUCTION
1.2 CENSUS VERSUS SAMPLE SURVEY
1.3 RANDOM SAMPLING
1.4 DEFINITION OF BASIC TERMS
1.5 AIM AND OBJECTIVES
1.6 SIGNIFICANCE OF THE STUDY
1.7 SCOPE AND LIMITATION
CHAPTER TWO: LITERATURE REVIEW
2.1 RATIO ESTIMATORS
2.2 RANKED SET SAMPLING
2.3 STRATIFIED RATIO ESTIMATOR
CHAPTER THREE: MATERIALS AND METHODS
3.1 INTRODUCTION
3.2 DATA USED FOR THE ANALYSIS
3.3 SOFTWARE USED FOR THE ANALYSIS
3.4 PROPOSED ESTIMATORS
3.5 BIAS AND MEAN SQUARE ERROR (MSE) OF ESTIMATOR $\hat{T}$
3.6 BIAS AND MEAN SQUARE ERROR OF THE PROPOSED ESTIMATORS $\hat{T}_i$
3.7 EFFICIENCY COMPARISONS
3.8 PROPERTIES OF THE PROPOSED ESTIMATORS
3.9 DETERMINATION OF SAMPLE SIZE
3.9.1 Constants of Proportionality for Fixed Cost
3.9.2 Constants of Proportionality for Fixed Precision
CHAPTER FOUR: EMPIRICAL STUDY
4.1 PREAMBLE
4.2 RESULTS AND DISCUSSION
CHAPTER FIVE: SUMMARY, CONCLUSION AND RECOMMENDATION
5.1 SUMMARY
5.2 CONCLUSION
5.3 RECOMMENDATION
REFERENCES
APPENDIX I: DATA
APPENDIX II: SOURCE CODE
CHAPTER ONE
INTRODUCTION
1.1 INTRODUCTION
Prior knowledge of the population mean, coefficient of variation, kurtosis and correlation coefficient of an auxiliary variable is known to be very useful, particularly when ratio, product and regression estimators are used to estimate the population mean of a variable of interest. The use of auxiliary information can increase the precision of an estimator when the study variable is highly correlated with the auxiliary variable. Srivastava and Jhajj (1981) suggested a class of estimators of the population mean, provided that the mean and variance of the auxiliary variable are known. Singh and Tailor (2003) considered a modified ratio estimator that exploits the known value of the correlation coefficient. Singh and Upadhyaya (1999) suggested two ratio-type estimators for use when the coefficient of variation and kurtosis of the auxiliary variable are known.
However, the fact that the known population proportion of an attribute provides similar information has not drawn as much attention. In several situations, instead of auxiliary variables there exist auxiliary attributes that are highly correlated with the study variable (Singh et al., 2008). Examples include sex and the height of persons, the amount of milk produced and a particular breed of cow, and the yield of a wheat crop and a particular variety of wheat (Jhajj et al., 2006). In such situations, by taking advantage of the point-biserial correlation between the study variable and the auxiliary attribute, estimators of the parameters of interest can be constructed using prior knowledge of the parameters of the auxiliary attribute.
It is often useful to incorporate auxiliary information of the population in a sampling
procedure. In practice, auxiliary information can be obtained in different ways. For
example, the sampling frames often used in official statistics production may include
auxiliary information on the population elements or these data are extracted from
administrative registers and are merged with the sampling frame elements. In addition,
aggregate-level auxiliary information can be obtained from different sources,
such as published official statistics. The use of auxiliary information in sampling and
estimation can be very useful in the construction of an efficient sampling design.
In the estimation of population parameters, auxiliary information is used to improve
efficiency for the variable of interest. Whenever there is auxiliary information, the
researcher wants to utilize it in the method of estimation to obtain the most efficient
estimator.
In simple random sampling, the variance of the estimate (say, of the population mean $\bar{Y}$)
depends, apart from the sample size, on the variability of the character y in the
population. If the population is very heterogeneous and considerations of cost limit the
size of the sample, it may be impossible to get a sufficiently precise estimate by
taking a simple random sample from the entire population. Populations encountered in
practice are generally very heterogeneous (Raj and Chandhok, 1998). In surveys of
manufacturing establishments, for example, some establishments are
very large, that is, they employ 1000 or more persons, but there are many others which
have only two or three persons on their rolls. Any estimate made from a direct random
sample taken from the totality of such establishments would be subject to exceedingly
large sampling fluctuations. But suppose it is possible to divide this population into parts
or strata on the basis of, say, employment, thereby separating the very large ones, the
medium-sized ones and the smaller ones. If a random sample of establishments is now taken
from each stratum, it should be possible to make a better estimate of each stratum average,
which in turn should help in producing a better estimate of the population average. Similarly, if a
sample is selected with probability proportionate to x from the entire population, the
variance of the population-total estimate may be very high because the ratio of y to x
varies considerably over the population. If a way can be found of subdividing the
population so that the variation of the ratio of y to x is considerably reduced within the
subdivisions or strata, a better estimate of the population total can be made. This is the basic
consideration involved in the use of stratification for improving the precision of
estimation (Raj and Chandhok, 1998).
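The effect of stratification described above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the employment figures, the three stratum sizes, the sample size of 100 and the function names `srs_mean` and `stratified_mean` are all hypothetical and are not taken from the study data.

```python
import random
import statistics

def srs_mean(population, n, rng):
    """Mean of a simple random sample of size n drawn without replacement."""
    return statistics.fmean(rng.sample(population, n))

def stratified_mean(strata, sizes, rng):
    """Stratified estimate: stratum sample means weighted by N_h / N."""
    total = sum(len(s) for s in strata)
    return sum(
        len(s) / total * statistics.fmean(rng.sample(s, n_h))
        for s, n_h in zip(strata, sizes)
    )

rng = random.Random(2024)

# Hypothetical employment figures: many small establishments,
# some medium-sized ones and a few very large ones.
small = [rng.randint(2, 10) for _ in range(800)]
medium = [rng.randint(50, 200) for _ in range(150)]
large = [rng.randint(1000, 3000) for _ in range(50)]
strata = [small, medium, large]
population = small + medium + large

n = 100
sizes = [80, 15, 5]  # proportional allocation: n_h = n * N_h / N

# Repeat both sampling schemes many times and compare the spread of the estimates.
reps = 2000
srs_estimates = [srs_mean(population, n, rng) for _ in range(reps)]
str_estimates = [stratified_mean(strata, sizes, rng) for _ in range(reps)]

var_srs = statistics.pvariance(srs_estimates)
var_str = statistics.pvariance(str_estimates)
print(f"variance of simple random sample mean: {var_srs:.2f}")
print(f"variance of stratified mean:           {var_str:.2f}")
```

With a population this heterogeneous, the stratified estimates should vary far less from sample to sample than the simple random sample means, because the large differences between strata no longer contribute to the sampling variance.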
1.2 CENSUS VERSUS SAMPLE SURVEY
Broadly speaking, information on a population may be collected in two ways. Either every
unit in the population is enumerated (called complete enumeration, or census) or
enumeration is limited to only a part or sample selected from the population (called
sample enumeration or sample survey). A sample survey will usually be less costly than a
complete census because the expense of covering all units would be greater than that of
covering only a sample fraction. Also, it will take less time to collect and process data
from a sample than from a census. But economy is not the only consideration; the most
important point is whether the accuracy of the results would be adequate for the end in
view. It is a curious fact that the results from a carefully planned and well-executed
sample survey are expected to be more accurate (nearer to the aim of the study) than those
from a complete enumeration. A complete census ordinarily requires a
huge and unwieldy organization and therefore many types of errors creep in which cannot
be controlled adequately. In a sample survey the volume of work is reduced considerably,
and it becomes possible to employ persons of higher caliber, train them suitably, and
supervise their work adequately. In a properly designed sample survey it is also possible
to make a valid estimate of the margin of error and hence decide whether the results are
sufficiently accurate. A complete census does not reveal by itself the margin of
uncertainty to which it is subject. But there is not always a choice of one versus the other.
For example, if the data are required for every small administrative area in a country, no
sample survey of a reasonable size will be able to deliver the desired information; only a
complete census can do this (Raj and Chandhok, 1998).
1.3 RANDOM SAMPLING
Simple random sampling is a method of selecting n units out of the N such that every
one of the ${}_{N}C_{n}$ distinct samples has an equal chance of being drawn. In practice a simple
random sample is drawn unit by unit. The units in the population are numbered from 1
to N. A series of random numbers between 1 and N is then drawn, either by means of a
table of random numbers or by means of a computer program that produces such a table.
At any draw the process used must give an equal chance of selection to any number in the
population not already drawn. The units that bear these n numbers constitute the sample.
It is easily verified that all ${}_{N}C_{n}$ distinct samples have an equal chance of being selected
by this method. Consider one distinct sample, that is, one set of n specified units. At the
first draw the probability that some one of the n specified units is selected is $n/N$. At the
second draw the probability that some one of the remaining $(n-1)$ specified units is
drawn is $(n-1)/(N-1)$, and so on. Hence the probability that all n specified units are selected in n
draws is
$$\frac{n}{N} \cdot \frac{(n-1)}{(N-1)} \cdot \frac{(n-2)}{(N-2)} \cdots \frac{1}{(N-n+1)} = \frac{n!\,(N-n)!}{N!} = \frac{1}{{}_{N}C_{n}}$$
Since a number that has been drawn is removed from the population for all subsequent
draws, this method is also called random sampling without replacement. Random
sampling with replacement is entirely feasible; at any draw, all N members of the
population are given equal chance of being drawn, no matter how often they have been
drawn. The formulas for the variances and estimated variances of estimates made from
the sample are often simpler when sampling is with replacement than when it is without
replacement. For this reason sampling with replacement is sometimes used in the more
complex sampling plans (Cochran, 1977).
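The draw-by-draw argument for sampling without replacement can be checked numerically. The following is a minimal sketch using exact fractions; the function name `prob_specified_sample` and the sizes N = 10 and n = 3 are illustrative assumptions only.

```python
from fractions import Fraction
from math import comb

def prob_specified_sample(N, n):
    """Probability that n specified units are exactly the ones drawn
    in n successive draws without replacement from N units."""
    p = Fraction(1)
    for k in range(n):
        # At draw k + 1, (n - k) specified units remain among (N - k) undrawn units.
        p *= Fraction(n - k, N - k)
    return p

N, n = 10, 3  # illustrative population and sample sizes
p = prob_specified_sample(N, n)
print(p, Fraction(1, comb(N, n)))  # prints: 1/120 1/120
```

The product of the draw-by-draw probabilities agrees with the reciprocal of the number of distinct samples, so every sample of size n is equally likely.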
1.4 DEFINITION OF BASIC TERMS
Sample:- A sample is a group of units selected from a larger group (the population). By
studying the sample, it is hoped to draw valid conclusions about the larger group. A
sample is generally selected for study because the population is too large to study in its
entirety. The sample should be representative of the general population. This is often best
achieved by random sampling. Also, before collecting the sample, it is important that the
researcher carefully and completely defines the population, including a description of the
members to be included (Cochran, 1977).
Parameter:- A parameter is a value, usually unknown (and which therefore has to be
estimated), used to represent a certain population characteristic. Within a population, a
parameter is a fixed value which does not vary. Parameters are often denoted by Greek letters
(Cochran, 1977).
Statistic:- A statistic is a quantity that is calculated from sample data. It is used to give
information about unknown values in the corresponding population. It is possible to draw
more than one sample from the same population, and the value of a statistic will in general
vary from sample to sample. Therefore, a statistic is a random variable (Cochran, 1977).
Estimator:- An estimator is a rule for calculating an estimate of a given quantity based
on observed data. There are point and interval estimators. A point estimator yields
single-valued results, although this includes the possibility of single vector-valued results
and results that can be expressed as a single function. This is in contrast to an interval
estimator, where the result would be a range of plausible values (or vectors or
functions). An estimator is a statistic (that is, a function of the data) that is used to infer the
value of an unknown parameter in a statistical model. The parameter being estimated is
sometimes called the estimand. It can be either finite-dimensional (in parametric and semiparametric
models) or infinite-dimensional (in nonparametric and semi-nonparametric models). If
the parameter is denoted by $\theta$, then the estimator is typically written as $\hat{\theta}$. Being a
function of the data, the estimator is a random variable (Cochran, 1977).
Bias:- The bias of an estimator is the difference between the estimator's expected value
and the true value of the parameter being estimated. An estimator with zero bias is called
unbiased; otherwise the estimator is said to be biased. Suppose we have a statistical
model parameterized by $\theta$, giving rise to a probability distribution $p(x \mid \theta)$ for the observed data,
and a statistic $\hat{\theta}$ which serves as an estimator of $\theta$ based on any observed data
x. That is, we assume that our data follow some unknown distribution $p(x \mid \theta)$ (where
$\theta$ is a fixed constant that is part of this distribution, but is unknown), and then we
construct some estimator $\hat{\theta}$ that maps observed data to values that we hope are close to $\theta$.
Then the bias of this estimator is defined to be
$$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$
(Cochran, 1977).
Mean Square Error:- The MSE of an estimator is one of many ways to quantify the
difference between the values implied by an estimator and the true values of the quantity
being estimated. MSE is a risk function, corresponding to the expected value of the
squared error loss or quadratic loss. MSE measures the average of the squares of the
errors. The error is the amount by which the value implied by the estimator differs from
the quantity to be estimated. The difference occurs because of randomness or because the
estimator does not account for information that could produce a more accurate estimate.
The MSE is the second moment (about the origin) of the error, and thus incorporates both
the variance of the estimator and its bias. For an unbiased estimator, the MSE is the
variance. The MSE of an estimator $\hat{\theta}$ with respect to the estimated parameter $\theta$ is defined
mathematically as
$$\mathrm{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right] = \mathrm{Var}(\hat{\theta}) + \left[\mathrm{Bias}(\hat{\theta})\right]^2$$
The MSE thus assesses the quality of an estimator in terms of its variation and
unbiasedness (Cochran, 1977).
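The definitions of bias and MSE can be illustrated by repeated sampling. The sketch below is a rough illustration, not the estimators proposed in this work: it applies the classical ratio estimator of the population mean to a hypothetical finite population (the values of x and y, the population size of 500 and the sample size of 20 are assumptions) and checks that the simulated MSE splits into variance plus squared bias.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical finite population: auxiliary variable x and a study
# variable y that is roughly proportional to x.
N = 500
x = [rng.uniform(10, 50) for _ in range(N)]
y = [2.0 * xi + rng.gauss(0, 5) for xi in x]
X_bar = statistics.fmean(x)
Y_bar = statistics.fmean(y)  # the parameter being estimated

def ratio_estimate(n):
    """Classical ratio estimate of the population mean from a simple
    random sample of size n drawn without replacement."""
    idx = rng.sample(range(N), n)
    y_bar = statistics.fmean([y[i] for i in idx])
    x_bar = statistics.fmean([x[i] for i in idx])
    return y_bar / x_bar * X_bar

# Approximate the sampling distribution of the estimator by simulation.
estimates = [ratio_estimate(20) for _ in range(5000)]

bias = statistics.fmean(estimates) - Y_bar
variance = statistics.pvariance(estimates)
mse = statistics.fmean([(t - Y_bar) ** 2 for t in estimates])

print(f"bias            = {bias:.4f}")
print(f"variance        = {variance:.4f}")
print(f"MSE             = {mse:.4f}")
print(f"variance+bias^2 = {variance + bias ** 2:.4f}")
```

The last two printed values coincide up to rounding, which is the decomposition of the MSE into variance and squared bias applied to the simulated estimates.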
Kurtosis:- Kurtosis is any measure of the peakedness of the probability distribution of a
real-valued random variable. It is a descriptor of the shape of a probability distribution. One
common measure of kurtosis, originated by Pearson, is based on a scaled version of the
fourth moment of the data or population. For this measure, higher kurtosis means more of
the variance is the result of infrequent extreme deviations as opposed to frequent
modestly sized deviations. Distributions with negative or positive excess kurtosis are called
platykurtic or leptokurtic respectively. The fourth standardized moment is defined as
$$\beta_2 = \frac{\mu_4}{\sigma^4}$$
where $\mu_4$ is the fourth moment about the mean and $\sigma$ is the standard
deviation (Cochran, 1977).
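As a small worked example of the fourth standardized moment, the sketch below computes it for a 0/1 auxiliary attribute. The function name `kurtosis` and the data (an attribute possessed by 30 of 100 units) are hypothetical and are used only for illustration.

```python
import statistics

def kurtosis(data):
    """Fourth standardized moment mu_4 / sigma^4, using population
    (divide-by-n) moments about the mean."""
    mean = statistics.fmean(data)
    mu2 = statistics.fmean([(v - mean) ** 2 for v in data])
    mu4 = statistics.fmean([(v - mean) ** 4 for v in data])
    return mu4 / mu2 ** 2

# Hypothetical 0/1 attribute possessed by 30 of 100 units.
attribute = [1] * 30 + [0] * 70
print(kurtosis(attribute))
```

For a 0/1 attribute with proportion P and Q = 1 - P, the moments reduce to the closed form (1 - 3PQ)/(PQ), which is about 1.76 here. The value is below 3, so the excess is negative and the distribution is platykurtic.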
Point-biserial correlation coefficient:- The point-biserial correlation coefficient, denoted by
$\rho_{pb}$, is a correlation used when one variable (e.g. Y) is dichotomous; Y can either be
naturally dichotomous, like gender, or an artificially dichotomized variable. The point-biserial
correlation is mathematically equivalent to the Pearson product moment correlation; that
is, if we have one continuously measured variable X and a dichotomous variable Y, the
Pearson correlation between X and Y equals $\rho_{pb}$.
This can be shown by assigning two distinct numerical values (say, 1 and 2) to the
dichotomous variable. The point-biserial correlation coefficient is given as
$$\rho_{pb} = \frac{M_1 - M_2}{s_n} \sqrt{\frac{n_1 n_2}{n^2}}$$
where $s_n = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ is the standard deviation of X, $M_1$ is the mean value on the
continuous variable for all data points in group 1, $M_2$ is the mean value on the
continuous variable for all data points in group 2, $n_1$ is the number of data points in
group 1, $n_2$ is the number of data points in group 2 and n is the sample size (John, 2008).
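The equivalence between the point-biserial formula and the Pearson correlation can be checked on a small data set. In the sketch below, the heights and the 0/1 group indicator are hypothetical values chosen for illustration, and group 1 is taken to be the units coded 1.

```python
import math
import statistics

def point_biserial(x, g):
    """Point-biserial correlation between a continuous variable x and a
    0/1 indicator g, with group 1 taken as the units where g == 1."""
    group1 = [xi for xi, gi in zip(x, g) if gi == 1]
    group2 = [xi for xi, gi in zip(x, g) if gi == 0]
    n, n1, n2 = len(x), len(group1), len(group2)
    mean = statistics.fmean(x)
    s_n = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)  # divide-by-n SD
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
    return (m1 - m2) / s_n * math.sqrt(n1 * n2 / n ** 2)

def pearson(x, y):
    """Pearson product moment correlation computed from its definition."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and a 0/1 indicator for the attribute.
heights = [172, 168, 180, 175, 160, 158, 165, 162]
group = [1, 1, 1, 1, 0, 0, 0, 0]

r_pb = point_biserial(heights, group)
r = pearson(heights, group)
print(round(r_pb, 6), round(r, 6))
```

The two printed values agree, since coding the dichotomous variable numerically and applying the Pearson formula reproduces the point-biserial coefficient.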
Coefficient of Variation (CV):- The CV is a normalized measure of the dispersion of a
probability distribution. It is also known as unitized risk or the variation coefficient. The
absolute value of the CV is sometimes known as the relative standard deviation (RSD), which is
expressed as a percentage. The CV is defined as the ratio of the standard deviation $\sigma$ to the
mean $\mu$,
$$CV = \frac{\sigma}{\mu}$$
which is the inverse of the signal-to-noise ratio. It shows the extent of variability
in relation to the mean of the population (Cochran, 1977).
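A short computation of the CV for a 0/1 attribute is sketched below. The function name `coefficient_of_variation` and the data (an attribute possessed by 30 of 100 units) are hypothetical, and the population (divide-by-n) standard deviation is assumed.

```python
import math
import statistics

def coefficient_of_variation(data):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(data) / statistics.fmean(data)

# Hypothetical 0/1 attribute possessed by 30 of 100 units.
attribute = [1] * 30 + [0] * 70
cv = coefficient_of_variation(attribute)

# For a 0/1 attribute with proportion P, the CV reduces to sqrt((1 - P) / P).
print(cv, math.sqrt(0.7 / 0.3))
```

For an attribute, the mean is the proportion P and the variance is P(1 - P), so the CV depends on P alone.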
1.5 AIM AND OBJECTIVES
The aim of this research work is to develop some ratio-type estimators under a stratified
random sampling scheme, using auxiliary attributes, that will produce more precise
estimates than the conventional estimator.
The above aim is achieved through the following objectives:
1. To linearly transform the sample mean of the variable of interest.
2. To transform the proportion of the auxiliary attribute using auxiliary parameters such as
kurtosis, coefficient of variation and point-biserial correlation coefficient.
3. To obtain the biases and mean square errors of the proposed estimators up to the first
order of approximation using Taylor's expansion.
4. To obtain the conditions for efficiency of the proposed estimators over the conventional estimator.
1.6 SIGNIFICANCE OF THE STUDY
Ratio estimators of population parameters are more precise than their simple random sampling counterparts (Cochran, 1942). The mean square error of a ratio estimator can be reduced by applying transformations to the study and auxiliary variables (Chaudhuri and Adrikari, 1979). Situations arise in which the available auxiliary information is in the form of attributes instead of variables. For such situations, several researchers have proposed ratio-type estimators in simple random sampling, which regards the population units as homogeneous. However, a population may be heterogeneous as a whole but homogeneous within sub-populations (strata). In such cases, there is a need to develop estimators of the population parameters of interest that capture the variability within and between the strata, with emphasis on bias reduction and efficiency improvement.
1.7 SCOPE AND LIMITATION
This research work primarily considers some ratio-type estimators in stratified sampling that use an attribute as auxiliary information. The transformation of the study variable mean is linear, and kurtosis, coefficient of variation and point-biserial correlation coefficient are the parameters of the auxiliary attribute used to transform its proportion. The data used for the empirical study were taken from the Students Pre-medical Registration, Usmanu Danfodiyo University, Sokoto (2011/2012 Session). The results of the analysis are limited to the data used, the set of proposed estimators and the sample sizes taken from the data. In future research, efforts will be made to modify the proposed estimators to obtain unbiased or almost unbiased estimators with higher precision.