## ABSTRACT

A limitation of existing ratio-type estimators in stratified sampling is that they rely on non-attribute auxiliary information. In this study, some ratio-type estimators in stratified random sampling that use an attribute as auxiliary information are proposed. The sample mean of the study variable was transformed linearly, and the proportion of the auxiliary attribute was transformed using known parameters of the attribute. The biases and mean square errors (MSE) of these estimators were derived, and their MSEs were compared with the MSE of the traditional combined ratio estimator. The results show that the proposed estimators are more efficient and less biased than the combined ratio estimator under all conditions. An empirical study was also conducted using data on the heights of students from each faculty of Usmanu Danfodiyo University, Sokoto. Its results also show that the proposed estimators are more efficient and less biased than the combined ratio estimator. In addition, formulae for determining the sample size when the proposed estimators are adopted under various allocations (optimum, Neyman and proportional), for fixed cost and for desired precision, were obtained.

## TABLE OF CONTENTS

TITLE PAGE

CERTIFICATION

DEDICATION

ACKNOWLEDGEMENTS

TABLE OF CONTENTS

LIST OF TABLES

ABBREVIATIONS/NOTATIONS

ABSTRACT

CHAPTER ONE: INTRODUCTION

1.1 INTRODUCTION

1.2 CENSUS VERSUS SAMPLE SURVEY

1.3 RANDOM SAMPLING

1.4 DEFINITION OF BASIC TERMS

1.5 AIM AND OBJECTIVES

1.6 SIGNIFICANCE OF THE STUDY

1.7 SCOPE AND LIMITATION

CHAPTER TWO: LITERATURE REVIEW

2.1 RATIO ESTIMATORS

2.2 RANKED SET SAMPLING

2.3 STRATIFIED RATIO ESTIMATOR

CHAPTER THREE: MATERIALS AND METHODS

3.1 INTRODUCTION

3.2 DATA USED FOR THE ANALYSIS

3.3 SOFTWARE USED FOR THE ANALYSIS

3.4 PROPOSED ESTIMATORS

3.5 BIAS AND MEAN SQUARE ERROR (MSE) OF ESTIMATOR $\hat{T}$

3.6 BIAS AND MEAN SQUARE ERROR OF THE PROPOSED ESTIMATORS $\hat{T}_i$

3.7 EFFICIENCY COMPARISONS

3.8 PROPERTIES OF THE PROPOSED ESTIMATORS

3.9 DETERMINATION OF SAMPLE SIZE

3.9.1 Constants of Proportionality for Fixed Cost

3.9.2 Constants of Proportionality for Fixed Precision

CHAPTER FOUR: EMPIRICAL STUDY

4.1 PRE-AMBLE

4.2 RESULTS AND DISCUSSION

CHAPTER FIVE: SUMMARY, CONCLUSION AND RECOMMENDATION

5.1 SUMMARY

5.2 CONCLUSION

5.3 RECOMMENDATION

REFERENCES

APPENDIX I: DATA

APPENDIX II: SOURCE CODE

## CHAPTER ONE

INTRODUCTION

1.1 INTRODUCTION

Prior knowledge of the population mean, coefficient of variation, kurtosis and correlation coefficient of an auxiliary variable is known to be very useful, particularly when ratio, product and regression estimators are used to estimate the population mean of a variable of interest. The use of auxiliary information can increase the precision of an estimator when the study variable is highly correlated with the auxiliary variable. Srivastava and Jhajj (1981) suggested a class of estimators of the population mean, provided that the mean and variance of the auxiliary variable are known. Singh and Tailor (2003) considered a modified ratio estimator that exploits the known value of the correlation coefficient between the study and auxiliary variables. Singh and Upadhyaya (1999) suggested two ratio-type estimators for the case where the coefficient of variation and kurtosis of the auxiliary variable are known.
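Several of the modified ratio estimators mentioned above share a common shape: the classical ratio estimator $\bar{y}\,\bar{X}/\bar{x}$ with $\bar{X}$ and $\bar{x}$ replaced by $a\bar{X}+b$ and $a\bar{x}+b$, where $a$ and $b$ are known auxiliary parameters (for instance $a = 1$, $b = \rho$ in the spirit of Singh and Tailor, 2003). The Python sketch below assumes that general form purely for illustration; the function name and the sample figures are hypothetical, and the exact $(a, b)$ pair used in each cited paper should be taken from the paper itself.

```python
def ratio_type_estimate(y_bar, x_bar, X_bar, a=1.0, b=0.0):
    """Hypothetical helper: y_bar * (a * X_bar + b) / (a * x_bar + b).

    a = 1, b = 0 reproduces the classical ratio estimator y_bar * X_bar / x_bar;
    other (a, b) pairs are built from known auxiliary parameters.
    """
    denominator = a * x_bar + b
    if denominator == 0:
        raise ValueError("a * x_bar + b must be non-zero")
    return y_bar * (a * X_bar + b) / denominator


# Hypothetical sample means and a known population mean of x.
y_bar, x_bar, X_bar = 52.0, 20.0, 22.0
classical = ratio_type_estimate(y_bar, x_bar, X_bar)        # 52 * 22 / 20
with_rho = ratio_type_estimate(y_bar, x_bar, X_bar, b=0.8)  # b = assumed known rho
print(classical, with_rho)
```

With positive means and $b > 0$, adding the same constant to numerator and denominator pulls the correction factor $(a\bar{X}+b)/(a\bar{x}+b)$ toward 1, which damps the adjustment when $\bar{x}$ happens to be far from $\bar{X}$.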

However, the fact that a known population proportion of an attribute provides similar information has not drawn as much attention. In many situations, instead of auxiliary variables there exist auxiliary attributes that are highly correlated with the study variable (Singh et al., 2008). Examples include the height of persons and their sex, the amount of milk produced and the breed of cow, and the yield of a wheat crop and the variety of wheat (Jhajj et al., 2006). In such situations, by taking advantage of the point-biserial correlation between the study variable and the auxiliary attribute, estimators of the parameters of interest can be constructed using prior knowledge of the parameters of the auxiliary attribute.
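When the auxiliary information is an attribute, the simplest ratio-type construction replaces $\bar{x}$ and $\bar{X}$ by the sample and population proportions $p$ and $P$ of units possessing the attribute, giving $\bar{y}\,P/p$. The Python sketch below assumes that form for illustration only; it is not one of the estimators proposed in this study, and the function name and data are hypothetical.

```python
def attribute_ratio_estimate(y_sample, phi_sample, P):
    """Hypothetical helper: estimate the population mean of y as y_bar * P / p.

    phi_sample holds 1 if a sampled unit possesses the attribute, else 0;
    P is the known population proportion possessing the attribute.
    """
    n = len(y_sample)
    if n == 0 or len(phi_sample) != n:
        raise ValueError("samples must be non-empty and of equal length")
    y_bar = sum(y_sample) / n
    p = sum(phi_sample) / n  # sample proportion possessing the attribute
    if p == 0:
        raise ValueError("no sampled unit has the attribute, so p = 0")
    return y_bar * P / p


# Made-up study-variable values and 0/1 attribute indicators.
y = [12, 15, 9, 20, 14]
phi = [1, 1, 0, 1, 0]
estimate = attribute_ratio_estimate(y, phi, P=0.5)
print(estimate)
```

Here the sample over-represents the attribute ($p = 0.6$ against $P = 0.5$), so the estimate scales $\bar{y} = 14$ down by the factor $P/p$.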


It is often useful to incorporate auxiliary information about the population into a sampling procedure. In practice, auxiliary information can be obtained in different ways. For example, the sampling frames used in producing official statistics may include auxiliary information on the population elements, or such data may be extracted from administrative registers and merged with the sampling frame elements. Aggregate-level auxiliary information can also be obtained from other sources, such as published official statistics. Used in sampling and estimation, auxiliary information can be very helpful in constructing an efficient sampling design.

In the estimation of population parameters, auxiliary information is used to improve the efficiency of estimators of the variable of interest. Whenever auxiliary information is available, the researcher will want to use it in the method of estimation to obtain the most efficient estimator.

In simple random sampling, the variance of the estimate (say, of the population mean $\bar{Y}$) depends, apart from the sample size, on the variability of the character $y$ in the population. If the population is very heterogeneous and considerations of cost limit the size of the sample, it may be impossible to get a sufficiently precise estimate by taking a simple random sample from the entire population, and populations encountered in practice are generally very heterogeneous (Raj and Chandhok, 1998). In surveys of manufacturing establishments, for example, some establishments are very large, employing 1000 or more persons, while many others have only two or three persons on their rolls. Any estimate made from a random sample taken directly from all such establishments would be subject to exceedingly large sampling fluctuations. But suppose it is possible to divide this population into parts, or strata, on the basis of, say, employment, thereby separating the very large establishments, the medium-sized ones and the small ones. If a random sample of establishments is now taken from each stratum, it should be possible to make a better estimate of each stratum average, which in turn should help in producing a better estimate of the population average. Similarly, if a sample is selected with probability proportionate to $x$ from the entire population, the variance of the estimated population total may be very high because the ratio of $y$ to $x$ varies considerably over the population. If a way can be found of subdividing the population so that the variation of the ratio of $y$ to $x$ is considerably reduced within the subdivisions or strata, a better estimate of the population total can be made. This is the basic consideration involved in the use of stratification for improving the precision of estimation (Raj and Chandhok, 1998).
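The stratified estimator this argument leads to is $\bar{y}_{st} = \sum_h W_h \bar{y}_h$ with stratum weights $W_h = N_h/N$; its usual unbiased variance estimator, when each stratum is sampled by simple random sampling without replacement, is $\sum_h W_h^2 (1 - f_h)\, s_h^2 / n_h$ with $f_h = n_h/N_h$. A minimal Python sketch of these standard formulas, with a hypothetical function name and made-up data:

```python
def stratified_mean(strata):
    """Return (y_bar_st, estimated variance) for stratified random sampling.

    strata: list of (N_h, sample_values) pairs, one per stratum, where each
    sample was drawn by simple random sampling without replacement.
    """
    N = sum(N_h for N_h, _ in strata)
    y_bar_st = 0.0
    variance = 0.0
    for N_h, sample in strata:
        n_h = len(sample)
        if n_h < 2:
            raise ValueError("each stratum needs at least two sampled units")
        W_h = N_h / N                                   # stratum weight
        mean_h = sum(sample) / n_h                      # stratum sample mean
        s2_h = sum((v - mean_h) ** 2 for v in sample) / (n_h - 1)
        f_h = n_h / N_h                                 # sampling fraction
        y_bar_st += W_h * mean_h
        variance += W_h ** 2 * (1 - f_h) * s2_h / n_h
    return y_bar_st, variance


# Two made-up strata: 100 small units and 300 large units.
y_bar_st, var_st = stratified_mean([(100, [2, 3, 4]), (300, [10, 12, 14, 16])])
print(y_bar_st, var_st)
```

Because the variance is built only from within-stratum spreads $s_h^2$, the large gap between the two stratum means does not inflate it, which is exactly the gain from stratification described above.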

1.2 CENSUS VERSUS SAMPLE SURVEY

Broadly speaking, information on a population may be collected in two ways. Either every unit in the population is enumerated (called complete enumeration, or census) or enumeration is limited to only a part, or sample, selected from the population (called sample enumeration, or sample survey). A sample survey will usually be less costly than a complete census because the expense of covering all units would be greater than that of covering only a sample fraction. It will also take less time to collect and process data from a sample than from a census. But economy is not the only consideration; the most important point is whether the accuracy of the results would be adequate for the end in view. It is a curious fact that the results of a carefully planned and well-executed sample survey are expected to be more accurate (closer to the aim of the study) than those of a complete enumeration. A complete census ordinarily requires a huge and unwieldy organization, and therefore many types of error creep in which cannot be controlled adequately. In a sample survey the volume of work is reduced considerably, and it becomes possible to employ persons of higher caliber, train them suitably and supervise their work adequately. In a properly designed sample survey it is also possible to make a valid estimate of the margin of error and hence decide whether the results are sufficiently accurate. A complete census does not by itself reveal the margin of uncertainty to which it is subject. But there is not always a choice of one versus the other. For example, if data are required for every small administrative area in a country, no sample survey of a reasonable size will be able to deliver the desired information; only a complete census can do this (Raj and Chandhok, 1998).

1.3 RANDOM SAMPLING

Simple random sampling is a method of selecting $n$ units out of the $N$ such that every one of the $\binom{N}{n}$ distinct samples has an equal chance of being drawn. In practice a simple random sample is drawn unit by unit. The units in the population are numbered from 1 to $N$. A series of random numbers between 1 and $N$ is then drawn, either by means of a table of random numbers or by means of a computer program that produces such a table. At any draw the process used must give an equal chance of selection to any number in the population not already drawn. The units that bear these $n$ numbers constitute the sample.

It is easily verified that all $\binom{N}{n}$ distinct samples have an equal chance of being selected by this method. Consider one distinct sample, that is, one set of $n$ specified units. At the first draw, the probability that one of the $n$ specified units is selected is $n/N$. At the second draw, the probability that one of the remaining $(n-1)$ specified units is drawn is $(n-1)/(N-1)$, and so on. Hence the probability that all $n$ specified units are selected in $n$ draws is

$$\frac{n}{N}\cdot\frac{n-1}{N-1}\cdot\frac{n-2}{N-2}\cdots\frac{1}{N-n+1} = \frac{n!\,(N-n)!}{N!} = \frac{1}{\binom{N}{n}}$$

Since a number that has been drawn is removed from the population for all subsequent draws, this method is also called random sampling without replacement. Random sampling with replacement is entirely feasible: at any draw, all $N$ members of the population are given an equal chance of being drawn, no matter how often they have already been drawn. The formulas for the variances and estimated variances of estimates made from the sample are often simpler when sampling is with replacement than when it is without replacement. For this reason, sampling with replacement is sometimes used in more complex sampling plans (Cochran, 1977).
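The draw-by-draw argument for simple random sampling without replacement can be checked exactly with rational arithmetic. The sketch below (hypothetical function name) multiplies the successive draw probabilities $(n-k)/(N-k)$ and compares the product with $1/\binom{N}{n}$.

```python
from fractions import Fraction
from math import comb


def prob_specific_sample(N, n):
    """Probability that one given set of n units is drawn in n draws without replacement.

    At draw k (k = 0, ..., n - 1), one of the n - k specified units not yet drawn
    must be chosen from the N - k units still in the population.
    """
    prob = Fraction(1)
    for k in range(n):
        prob *= Fraction(n - k, N - k)
    return prob


for N, n in [(5, 2), (10, 3), (20, 7)]:
    # Every distinct sample has the same probability 1 / C(N, n).
    assert prob_specific_sample(N, n) == Fraction(1, comb(N, n))
```

Using `Fraction` avoids floating-point rounding, so the equality is tested exactly rather than approximately.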

1.4 DEFINITION OF BASIC TERMS

Sample:- A sample is a group of units selected from a larger group (the population). By studying the sample, it is hoped to draw valid conclusions about the larger group. A sample is generally selected for study because the population is too large to study in its entirety. The sample should be representative of the general population, which is often best achieved by random sampling. Also, before collecting the sample, it is important that the researcher carefully and completely define the population, including a description of the members to be included (Cochran, 1977).

Parameter:- A parameter is a value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic. Within a population, a parameter is a fixed value which does not vary. Parameters are often denoted by Greek letters (Cochran, 1977).

Statistic:- A statistic is a quantity that is calculated from sample data. It is used to give information about unknown values in the corresponding population. It is possible to draw more than one sample from the same population, and the value of a statistic will in general vary from sample to sample. Therefore, a statistic is a random variable (Cochran, 1977).

Estimator:- An estimator is a rule for calculating an estimate of a given quantity based on observed data. There are point and interval estimators. A point estimator yields single-valued results, although this includes the possibility of single vector-valued results and results that can be expressed as a single function. This is in contrast to an interval estimator, where the result is a range of plausible values (or vectors or functions). An estimator is a statistic (that is, a function of the data) that is used to infer the value of an unknown parameter in a statistical model. The parameter being estimated is sometimes called the estimand. It can be either finite-dimensional (in parametric and semiparametric models) or infinite-dimensional (in nonparametric and semi-nonparametric models). If the parameter is denoted by $\theta$, then the estimator is typically written as $\hat{\theta}$. Being a function of the data, the estimator is a random variable (Cochran, 1977).

Bias:- The bias of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator with zero bias is called unbiased; otherwise the estimator is said to be biased. Suppose we have a statistical model parameterized by $\theta$, giving rise to a probability distribution $p(x \mid \theta)$ for observed data, and a statistic $\hat{\theta}$ which serves as an estimator of $\theta$ based on any observed data $x$. That is, we assume that our data follow some unknown distribution $p(x \mid \theta)$ (where $\theta$ is a fixed constant that is part of this distribution, but is unknown), and we then construct an estimator $\hat{\theta}$ that maps observed data to values that we hope are close to $\theta$. The bias of this estimator is then defined as

$$\operatorname{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta \quad \text{(Cochran, 1977).}$$
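Because every one of the $\binom{N}{n}$ samples is equally likely under simple random sampling without replacement, $E(\hat{\theta})$ for a small population can be computed exactly by averaging the estimator over all possible samples. The sketch below does this for the classical ratio estimator $\bar{y}\,\bar{X}/\bar{x}$, chosen only as a familiar biased estimator; the function name and the toy populations are made up.

```python
from itertools import combinations


def exact_bias_ratio_estimator(y, x, n):
    """Exact bias of y_bar * X_bar / x_bar under SRSWOR of size n.

    All C(N, n) samples are equally likely, so E(estimator) is the plain
    average of the estimator over every possible sample.
    """
    N = len(y)
    Y_bar = sum(y) / N
    X_bar = sum(x) / N
    estimates = []
    for idx in combinations(range(N), n):
        y_bar = sum(y[i] for i in idx) / n
        x_bar = sum(x[i] for i in idx) / n
        estimates.append(y_bar * X_bar / x_bar)
    expected = sum(estimates) / len(estimates)
    return expected - Y_bar  # Bias = E(theta_hat) - theta


bias_proportional = exact_bias_ratio_estimator([2, 4, 6], [1, 2, 3], n=2)
bias_general = exact_bias_ratio_estimator([1, 5, 3], [2, 1, 4], n=2)
print(bias_proportional, bias_general)
```

In the first population $y$ is exactly proportional to $x$, so every sample gives $\bar{y}/\bar{x} = \bar{Y}/\bar{X}$ and the bias is zero; in the second the bias works out to $43/135 \approx 0.32$.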

Mean Square Error:- The mean square error (MSE) of an estimator is one of many ways to quantify the difference between the values implied by an estimator and the true value of the quantity being estimated. MSE is a risk function, corresponding to the expected value of the squared error loss, or quadratic loss; it measures the average of the squares of the errors. The error is the amount by which the value implied by the estimator differs from the quantity to be estimated. The difference occurs because of randomness or because the estimator does not account for information that could produce a more accurate estimate. The MSE is the second moment (about the origin) of the error, and thus incorporates both the variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance. The MSE of an estimator $\hat{\theta}$ with respect to the estimand $\theta$ is defined mathematically as

$$\operatorname{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = \operatorname{Var}(\hat{\theta}) + \big[\operatorname{Bias}(\hat{\theta})\big]^2$$

The MSE thus assesses the quality of an estimator in terms of both its variation and its unbiasedness (Cochran, 1977).
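The decomposition $\operatorname{MSE} = \operatorname{Var} + \operatorname{Bias}^2$ can be confirmed on a toy population by enumerating every sample, so all moments are exact rather than simulated. The sketch below again uses the classical ratio estimator as an example; the function name and the population values are hypothetical.

```python
from itertools import combinations


def exact_mse_decomposition(y, x, n):
    """Return (MSE, variance, bias) of y_bar * X_bar / x_bar under SRSWOR of size n."""
    N = len(y)
    Y_bar = sum(y) / N
    X_bar = sum(x) / N
    # One estimate per equally likely sample.
    estimates = [
        (sum(y[i] for i in idx) / n) * X_bar / (sum(x[i] for i in idx) / n)
        for idx in combinations(range(N), n)
    ]
    k = len(estimates)
    mean_est = sum(estimates) / k
    mse = sum((t - Y_bar) ** 2 for t in estimates) / k          # E[(t - theta)^2]
    variance = sum((t - mean_est) ** 2 for t in estimates) / k  # E[(t - E t)^2]
    bias = mean_est - Y_bar
    return mse, variance, bias


mse, variance, bias = exact_mse_decomposition([1, 5, 3, 8], [2, 1, 4, 6], 2)
# MSE equals variance plus squared bias, up to floating-point rounding.
assert abs(mse - (variance + bias ** 2)) < 1e-9
```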


Kurtosis:- Kurtosis is any measure of the peakedness of the probability distribution of a real-valued random variable; it is a descriptor of the shape of a probability distribution. One common measure of kurtosis, originated by Pearson, is based on a scaled version of the fourth moment of the data or population. For this measure, higher kurtosis means that more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. Distributions with negative or positive excess kurtosis ($\beta_2 - 3$) are called platykurtic or leptokurtic respectively. The fourth standardized moment is defined as

$$\beta_2 = \frac{\mu_4}{\sigma^4}$$

where $\mu_4$ is the fourth moment about the mean and $\sigma$ is the standard deviation (Cochran, 1977).
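Since kurtosis of the auxiliary attribute is one of the parameters used in this work, a short sketch computing $\beta_2$ from population values may help. For a 0/1 attribute with population proportion $P$, the formula reduces to $\beta_2 = [1 - 3P(1-P)]/[P(1-P)]$, which the checks below use. The function name and data are illustrative assumptions.

```python
def beta2(values):
    """Pearson's coefficient of kurtosis beta_2 = mu_4 / sigma^4 (population moments, divisor N)."""
    N = len(values)
    mean = sum(values) / N
    mu2 = sum((v - mean) ** 2 for v in values) / N  # second central moment (sigma^2)
    mu4 = sum((v - mean) ** 4 for v in values) / N  # fourth central moment
    if mu2 == 0:
        raise ValueError("beta_2 is undefined for a constant variable")
    return mu4 / mu2 ** 2


# 0/1 attribute indicators: P = 0.5 and P = 0.25.
print(beta2([0, 1, 0, 1]), beta2([0, 0, 0, 1]))
```

At $P = 0.5$ the attribute attains the smallest possible value, $\beta_2 = 1$, so it is strongly platykurtic; as $P$ moves toward 0 or 1, $\beta_2$ grows without bound.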

Point-biserial correlation coefficient:- The point-biserial correlation coefficient, denoted by $\rho_{pb}$, is a correlation used when one variable (e.g. $Y$) is dichotomous; $Y$ can be either naturally dichotomous, like gender, or an artificially dichotomized variable. The point-biserial correlation is mathematically equivalent to the Pearson product-moment correlation: if we have one continuously measured variable $X$ and a dichotomous variable $Y$, then $\rho_{pb}$ is the Pearson correlation between $X$ and $Y$ once $Y$ is coded numerically. This can be shown by assigning two distinct numerical values (say, 1 and 2) to the dichotomous variable; the magnitude of the correlation does not depend on the values chosen, and its sign depends only on which group receives the larger value. The point-biserial correlation coefficient is given as

$$\rho_{pb} = \frac{M_1 - M_2}{s_n}\sqrt{\frac{n_1 n_2}{n^2}}$$

where

$$s_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

is the standard deviation of $X$, $M_1$ is the mean value of the continuous variable for all data points in group 1, $M_2$ is the mean value of the continuous variable for all data points in group 2, $n_1$ is the number of data points in group 1, $n_2$ is the number of data points in group 2 and $n$ is the sample size (John, 2008).
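The formula can be checked numerically against its Pearson equivalence. The sketch below (hypothetical function names and made-up heights) computes $\rho_{pb}$ from the group means, then recomputes it as an ordinary Pearson correlation after coding group 1 as 1 and group 2 as 0; that coding makes the sign agree with $M_1 - M_2$ in the formula.

```python
from math import sqrt


def point_biserial(x, group):
    """r_pb = (M1 - M2) / s_n * sqrt(n1 * n2 / n^2), with group[i] in {1, 2}."""
    n = len(x)
    g1 = [v for v, g in zip(x, group) if g == 1]
    g2 = [v for v, g in zip(x, group) if g == 2]
    n1, n2 = len(g1), len(g2)
    if n1 == 0 or n2 == 0 or n1 + n2 != n:
        raise ValueError("group must contain only the labels 1 and 2, both present")
    mean = sum(x) / n
    s_n = sqrt(sum((v - mean) ** 2 for v in x) / n)  # divisor n, not n - 1
    M1, M2 = sum(g1) / n1, sum(g2) / n2
    return (M1 - M2) / s_n * sqrt(n1 * n2 / n ** 2)


def pearson(u, v):
    """Ordinary Pearson product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


heights = [170.0, 158.0, 175.0, 162.0, 168.0, 155.0]
group = [1, 2, 1, 2, 1, 2]  # hypothetical coding: 1 = male, 2 = female
r = point_biserial(heights, group)
coded = [1 if g == 1 else 0 for g in group]
assert abs(r - pearson(heights, coded)) < 1e-12
```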

Coefficient of Variation (CV):- The CV is a normalized measure of the dispersion of a probability distribution. It is also known as unitized risk or the variation coefficient. The absolute value of the CV is sometimes known as the relative standard deviation (RSD), which is expressed as a percentage. The CV is defined as the ratio of the standard deviation to the mean,

$$C_V = \frac{\sigma}{\mu}$$

which is the inverse of the signal-to-noise ratio. It shows the extent of variability in relation to the mean of the population (Cochran, 1977).
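A short sketch of the CV computation follows. It assumes the population standard deviation (divisor $N$); survey-sampling texts often define $S^2$ with divisor $N-1$, which changes the value slightly. For a 0/1 attribute with proportion $P$, the population CV reduces to $\sqrt{(1-P)/P}$, which the second call illustrates. The function name and data are hypothetical.

```python
from math import sqrt


def coefficient_of_variation(values):
    """CV = sigma / mu, using the population standard deviation (divisor N)."""
    N = len(values)
    mu = sum(values) / N
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    sigma = sqrt(sum((v - mu) ** 2 for v in values) / N)
    return sigma / mu


print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))  # mean 5, sigma 2
print(coefficient_of_variation([0, 0, 0, 1]))              # attribute with P = 0.25
```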

1.5 AIM AND OBJECTIVES

The aim of this research work is to develop some ratio-type estimators under stratified

random sampling scheme using auxiliary attributes that will produce more precise

estimates than the conventional estimator.

The above aim is achieved through the following objectives:

1. To linearly transform the sample mean of the variable of interest.

2. To transform the proportion of the auxiliary attribute using auxiliary parameters such as the kurtosis, the coefficient of variation and the point-biserial correlation coefficient.

3. To obtain the biases and mean square errors of the proposed estimators up to the first order of approximation using Taylor's series expansion.

4. To obtain the conditions under which the proposed estimators are more efficient than the conventional estimator.
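The objectives refer to a conventional estimator. Assuming it is the combined ratio estimator named in the abstract, with an auxiliary attribute it takes the usual form $\bar{y}_{st}P/p_{st}$, where $\bar{y}_{st} = \sum_h W_h\bar{y}_h$ and $p_{st} = \sum_h W_h p_h$. The sketch below computes it and, as a hypothetical illustration of transforming the attribute proportion, lets $P$ and $p_{st}$ be replaced by $aP + b$ and $ap_{st} + b$ for constants $a$ and $b$ (which could, for example, be built from the attribute's kurtosis, coefficient of variation or $\rho_{pb}$). It does not reproduce the estimators proposed in Chapter Three, which also transform the sample mean; the function name and data are made up.

```python
def combined_ratio_attribute(strata, P, a=1.0, b=0.0):
    """Hypothetical helper: y_bar_st * (a * P + b) / (a * p_st + b).

    strata: list of (W_h, y_sample, phi_sample) with stratum weight W_h = N_h / N
    and phi = 1 if a unit has the attribute, else 0. P is the known population
    proportion. a = 1, b = 0 gives the combined ratio estimator y_bar_st * P / p_st.
    """
    y_bar_st = 0.0
    p_st = 0.0
    for W_h, y_sample, phi_sample in strata:
        y_bar_st += W_h * sum(y_sample) / len(y_sample)    # stratified mean of y
        p_st += W_h * sum(phi_sample) / len(phi_sample)    # stratified proportion
    denominator = a * p_st + b
    if denominator == 0:
        raise ValueError("a * p_st + b must be non-zero")
    return y_bar_st * (a * P + b) / denominator


strata = [(0.4, [10, 12], [1, 0]), (0.6, [20, 22, 24], [1, 1, 0])]
print(combined_ratio_attribute(strata, P=0.3))                # plain combined ratio
print(combined_ratio_attribute(strata, P=0.3, a=2.0, b=1.0))  # transformed proportion
```

In this toy example $\bar{y}_{st} = 17.6$ and $p_{st} = 0.6$; the transformation with $b > 0$ moderates how far the known $P = 0.3$ pulls the estimate away from $\bar{y}_{st}$.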

1.6 SIGNIFICANCE OF THE STUDY

Ratio estimators of population parameters are more precise than their simple random sampling counterparts when the study and auxiliary variables are strongly positively correlated (Cochran, 1942). The mean square error of a ratio estimator can be reduced by applying transformations to the study and auxiliary variables (Chaudhuri and Adrikari, 1979). Situations arise in which the available auxiliary information is in the form of attributes instead of variables. For such situations, several researchers have proposed ratio-type estimators under simple random sampling, which regards the population units as homogeneous. However, population units may be heterogeneous as a whole but homogeneous within sub-populations (strata). In such situations, there is a need to develop estimators that capture the variability within and between strata for the population parameters of interest, with emphasis on bias reduction and efficiency improvement.
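The gain from a ratio estimator is conditional. Under the usual first-order (Taylor) approximation for simple random sampling without replacement, with $\theta = 1/n - 1/N$, the sample mean has variance $\theta\bar{Y}^2C_y^2$ while the ratio estimator has $\operatorname{MSE} \approx \theta\bar{Y}^2(C_y^2 + C_x^2 - 2\rho C_y C_x)$, so the ratio estimator is more efficient only when $\rho > C_x/(2C_y)$. The sketch below encodes these standard expressions; the function names and parameter values are hypothetical.

```python
def srs_mean_variance(n, N, Y_bar, C_y):
    """Variance of the SRSWOR sample mean: theta * Y_bar^2 * C_y^2."""
    theta = 1 / n - 1 / N
    return theta * Y_bar ** 2 * C_y ** 2


def ratio_mse_first_order(n, N, Y_bar, C_y, C_x, rho):
    """First-order MSE of the classical ratio estimator under SRSWOR."""
    theta = 1 / n - 1 / N
    return theta * Y_bar ** 2 * (C_y ** 2 + C_x ** 2 - 2 * rho * C_y * C_x)


def ratio_is_more_efficient(C_y, C_x, rho):
    """First-order condition: the ratio estimator wins iff rho > C_x / (2 * C_y)."""
    return rho > C_x / (2 * C_y)


# Made-up population: mean 160, C_y = 0.05, C_x = 0.06, so the threshold is rho > 0.6.
for rho in (0.8, 0.5):
    print(rho, ratio_is_more_efficient(0.05, 0.06, rho),
          ratio_mse_first_order(50, 500, 160.0, 0.05, 0.06, rho),
          srs_mean_variance(50, 500, 160.0, 0.05))
```

The threshold does not involve $n$, $N$ or $\bar{Y}$, because $\theta\bar{Y}^2$ is a common positive factor of both expressions.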

1.7 SCOPE AND LIMITATION

This research work primarily considers some ratio-type estimators in stratified sampling using an attribute as auxiliary information. The transformation of the study variable mean is linear, and the kurtosis, coefficient of variation and point-biserial correlation coefficient are the parameters of the auxiliary attribute used to transform the proportion of the auxiliary attribute. The data used for the empirical study were taken from the Students Pre-medical Registration, Usmanu Danfodiyo University, Sokoto (2011/2012 Session). The results of the analysis are limited to the data used, the set of proposed estimators and the sample sizes taken from the data. In future research, efforts will be made toward modifying the proposed estimators to obtain unbiased or almost unbiased estimators with higher precision.
