A Generalized Two Phase Sampling Estimator of Ratio of Population Means Using Auxiliary Information

Tarunpreet Kaur Ahuja1, Peeyush Misra1,* and O. K. Belwal2

1Department of Statistics, D.A.V. (P.G.) College, Dehradun-248001, Uttarakhand, India

2Department of Statistics, HNB Garhwal University, Srinagar-246174, Garhwal, Uttarakhand, India

E-mail: tarunpreetkaur23@gmail.com; dr.pmisra.dav@gmail.com; okbelwal@rediffmail.com

*Corresponding Author

Received 02 July 2020; Accepted 20 December 2020; Publication 08 March 2021


This paper addresses the problem of estimating the ratio of two population means by using quantitative auxiliary information in the form of first and second moments. An improved generalized two-phase sampling estimator is proposed. The relative bias and mean squared error (MSE) of the suggested estimator are derived and studied. A comparative study with conventional estimators is included to establish its superiority. Besides theoretical comparisons, a subset of optimum estimators having the same minimum MSE is also explored. An empirical study is carried out to support the theoretical results.

Keywords: Auxiliary Character, Two-phase Sampling, Taylor’s Series, Bias, Mean Squared Error and Efficiency.

1 Introduction

Sampling theory deals with the optimum combination of sampling and estimation procedures so that inferences about population parameters are made with minimum error. The problem of estimating the ratio of two population means occupies an important place in the extensive literature of sampling theory. Over the past six decades, numerous authors have incorporated auxiliary information at the estimation stage to enhance the efficiency and precision of estimators. In almost every field of scientific study, such as agriculture, forestry, economic surveys, management and biomedical sciences, estimation of a population ratio is of significant importance. The input-output ratio in an industrial survey, the ratio of outlay on employees to total expenses, the proportion of liquid to total assets, the profitability rate, the crop production rate and the literacy rate are some illustrations of the ratio of two population parameters. Similarly, a health analyst may be interested in estimating a growth index by measuring the ratio of weight to height, using chest and skull circumference as auxiliary variables. Many survey statisticians have made notable efforts to estimate the population ratio. For further details one may see Murthy (1967), Cochran (1977), Sukhatme et al. (1984), Singh and Chaudhary (1997) and Mukhopadhyay (2012).

An extensive literature records the contributions of several authors who addressed this problem and estimated the population ratio using supplementary information in whatever form it was available. For the historical developments in this context, the works of Singh (1965, 1967, 1969), Shah and Shah (1978), Tripathi (1980), Singh (1982), Singh (1998), Biradar and Singh (1997–98), Upadhyaya et al. (2000), Singh and Rani (2005, 2006), Singh and Naqvi (2015) and Kumar and Srivastava (2018) can be consulted. Although many similar estimators by renowned statisticians are recorded in the sampling literature, there always remains scope for improvement. An effort in this direction is made in the subsequent sections of this paper.

2 Proposed Estimator

Subsidiary information on one or more auxiliary variables may be known beforehand through census reports, pilot surveys or historical data. In practice, however, a sampler may often encounter situations in which parametric information associated with the auxiliary variables is not known a priori, or is only partially available. To overcome such situations, Neyman (1938) proposed the double or two-phase sampling technique. This technique is widely recommended because it is a flexible, robust and cost-effective procedure for developing reliable estimates of unknown population characteristics.

Let $(y_1, y_2)$ be the study variables, which are highly correlated with the auxiliary variable $x$. Using an SRSWOR design in both phases, the double (two-phase) sampling technique is described as follows:

(i) At the first phase, we select a preliminary large sample $(x'_1, x'_2, \ldots, x'_{n'})$ of size $n'$ from a population $U$ having $N$ distinct units. The first phase sample is observed only on the ancillary variable $X$, and its sample mean is denoted by $\bar{x}'$.

(ii) At the second phase, we select a small subsample $\{(y_{11}, y_{21}, x_1), (y_{12}, y_{22}, x_2), \ldots, (y_{1n}, y_{2n}, x_n)\}$ of size $n$ from the large first phase sample. The second phase sample is observed on both the study variables $Y_1, Y_2$ and the auxiliary variable $X$, and the respective sample means are denoted by $\bar{y}_1$, $\bar{y}_2$ and $\bar{x}$.
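The two steps above can be sketched in code. The following is a minimal illustration, assuming a hypothetical synthetic population of `(y1, y2, x)` triples; the function name `two_phase_srswor`, the population generator and the sample sizes are illustrative choices and do not come from the paper.

```python
import random


def two_phase_srswor(population, n_first, n_second, seed=0):
    """Draw a two-phase SRSWOR sample from a list of (y1, y2, x) triples.

    Phase 1 uses only x; phase 2 subsamples the phase-1 units and uses
    y1, y2 and x.
    """
    if not (0 < n_second <= n_first <= len(population)):
        raise ValueError("need 0 < n_second <= n_first <= N")
    rng = random.Random(seed)
    # Phase 1: large preliminary sample, observed on x only.
    first = rng.sample(population, n_first)
    # Phase 2: small subsample drawn without replacement from phase 1.
    second = rng.sample(first, n_second)
    return {
        "first": first,
        "second": second,
        "x_bar_first": sum(x for _, _, x in first) / n_first,
        "y1_bar": sum(y1 for y1, _, _ in second) / n_second,
        "y2_bar": sum(y2 for _, y2, _ in second) / n_second,
        "x_bar": sum(x for _, _, x in second) / n_second,
    }


# Hypothetical population of N = 200 units (values are made up).
pop_rng = random.Random(1)
population = []
for _ in range(200):
    x = pop_rng.uniform(50.0, 150.0)
    y1 = 4.0 * x + pop_rng.gauss(0.0, 20.0)
    y2 = x + pop_rng.gauss(0.0, 5.0)
    population.append((y1, y2, x))

result = two_phase_srswor(population, n_first=60, n_second=20)
print(result["x_bar_first"], result["x_bar"], result["y1_bar"], result["y2_bar"])
```

The first-phase mean of x is cheap to obtain and stands in for the unknown population mean, which is the motivation for the design.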

Let $\bar{Y}_1$, $\bar{Y}_2$ and $\bar{X}$ denote the population means, and $S_{Y_1}^2$, $S_{Y_2}^2$ and $S_X^2$ the population variances, of the study variables and the correlated ancillary character.

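We have

$$
\begin{aligned}
\bar{Y}_1 &= \frac{1}{N}\sum_{i=1}^{N} Y_{1i}, \quad \bar{Y}_2 = \frac{1}{N}\sum_{i=1}^{N} Y_{2i}, \quad \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \\
S_{Y_1}^2 &= \frac{1}{N-1}\sum_{i=1}^{N} (Y_{1i}-\bar{Y}_1)^2, \quad S_{Y_2}^2 = \frac{1}{N-1}\sum_{i=1}^{N} (Y_{2i}-\bar{Y}_2)^2, \\
S_X^2 &= \frac{1}{N-1}\sum_{i=1}^{N} (X_i-\bar{X})^2, \\
S_{Y_1X} &= \frac{1}{N-1}\sum_{i=1}^{N} (Y_{1i}-\bar{Y}_1)(X_i-\bar{X}), \\
S_{Y_2X} &= \frac{1}{N-1}\sum_{i=1}^{N} (Y_{2i}-\bar{Y}_2)(X_i-\bar{X}), \\
S_{Y_1Y_2} &= \frac{1}{N-1}\sum_{i=1}^{N} (Y_{1i}-\bar{Y}_1)(Y_{2i}-\bar{Y}_2), \\
\mu_{rst} &= \frac{1}{N}\sum_{i=1}^{N} (Y_{1i}-\bar{Y}_1)^r (Y_{2i}-\bar{Y}_2)^s (X_i-\bar{X})^t; \quad r,s,t = 0,1,2,3,4, \\
\sigma_{Y_1}^2 &= \frac{1}{N}\sum_{i=1}^{N} (Y_{1i}-\bar{Y}_1)^2, \quad \sigma_{Y_2}^2 = \frac{1}{N}\sum_{i=1}^{N} (Y_{2i}-\bar{Y}_2)^2, \\
\sigma_X^2 &= \frac{1}{N}\sum_{i=1}^{N} (X_i-\bar{X})^2, \quad \rho = \frac{S_{Y_1Y_2}}{S_{Y_1}S_{Y_2}}, \quad \rho_1 = \frac{S_{Y_1X}}{S_{Y_1}S_X}, \\
\rho_2 &= \frac{S_{Y_2X}}{S_{Y_2}S_X}, \quad C_{Y_1}^2 = \frac{S_{Y_1}^2}{\bar{Y}_1^2}, \quad C_{Y_2}^2 = \frac{S_{Y_2}^2}{\bar{Y}_2^2}, \quad C_X^2 = \frac{S_X^2}{\bar{X}^2}.
\end{aligned}
$$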

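Auxiliary information in the form of moments about zero, namely $\bar{x}$, $\bar{x}'$, $\bar{\theta}_x$ and $\bar{\theta}'_x$, is used to define a generalized class of double sampling estimators, denoted by $\hat{R}_g$, for efficient estimation of the population ratio:

$$
\hat{R}_g = g\left(\bar{y}_1, \bar{y}_2, \frac{\bar{x}}{\bar{x}'}, \frac{\bar{\theta}_x}{\bar{\theta}'_x}\right) = g(\bar{y}_1, \bar{y}_2, u_1, u_2) \tag{1}
$$

where $R = \bar{Y}_1/\bar{Y}_2$, $\hat{R} = \bar{y}_1/\bar{y}_2$, $u_1 = \bar{x}/\bar{x}'$, $u_2 = \bar{\theta}_x/\bar{\theta}'_x$, $\bar{\theta}_x = \frac{1}{n}\sum_{i=1}^{n} x_i^2$, $\bar{\theta}'_x = \frac{1}{n'}\sum_{i=1}^{n'} x_i'^2$, and $g(\bar{y}_1, \bar{y}_2, u_1, u_2)$, which satisfies the validity conditions of Taylor's series expansion, is a bounded function of $t = (\bar{y}_1, \bar{y}_2, u_1, u_2)$ such that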

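(i) At the point $T = (\bar{Y}_1, \bar{Y}_2, 1, 1)$ we have

$$
g(t = T) = R = \frac{\bar{Y}_1}{\bar{Y}_2} \tag{2}
$$

(ii) The first order partial derivatives are

$$
\begin{aligned}
g_1 &= \left(\frac{\partial g(\bar{y}_1, \bar{y}_2, u_1, u_2)}{\partial \bar{y}_1}\right)_T = \frac{1}{\bar{Y}_2}, \\
g_2 &= \left(\frac{\partial g(\bar{y}_1, \bar{y}_2, u_1, u_2)}{\partial \bar{y}_2}\right)_T = -\frac{\bar{Y}_1}{\bar{Y}_2^2}, \\
g_3 &= \left(\frac{\partial g(\bar{y}_1, \bar{y}_2, u_1, u_2)}{\partial u_1}\right)_T \quad \text{and} \quad g_4 = \left(\frac{\partial g(\bar{y}_1, \bar{y}_2, u_1, u_2)}{\partial u_2}\right)_T
\end{aligned} \tag{3}
$$

(iii) Also, the second order partial derivatives are

$$
\begin{aligned}
g_{11} &= \left(\frac{\partial^2 g}{\partial \bar{y}_1^2}\right)_T = 0, \quad g_{22} = \left(\frac{\partial^2 g}{\partial \bar{y}_2^2}\right)_T, \\
g_{33} &= \left(\frac{\partial^2 g}{\partial u_1^2}\right)_T, \quad g_{44} = \left(\frac{\partial^2 g}{\partial u_2^2}\right)_T, \\
g_{13} &= \left(\frac{\partial^2 g}{\partial \bar{y}_1 \partial u_1}\right)_T, \quad g_{14} = \left(\frac{\partial^2 g}{\partial \bar{y}_1 \partial u_2}\right)_T, \\
g_{23} &= \left(\frac{\partial^2 g}{\partial \bar{y}_2 \partial u_1}\right)_T, \quad g_{24} = \left(\frac{\partial^2 g}{\partial \bar{y}_2 \partial u_2}\right)_T, \\
g_{34} &= \left(\frac{\partial^2 g}{\partial u_1 \partial u_2}\right)_T,
\end{aligned} \tag{4}
$$

where $g$ stands for $g(\bar{y}_1, \bar{y}_2, u_1, u_2)$.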
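Conditions (i) and (ii) can be checked numerically for any candidate member of the class. The sketch below uses a hypothetical member, a ratio scaled by powers of $u_1$ and $u_2$ with arbitrarily chosen exponents `k1` and `k2`, and approximates the partial derivatives at $T$ by central finite differences. The population means are taken from the empirical study in this paper; everything else is an illustrative assumption.

```python
import math


def g(y1, y2, u1, u2, k1=-0.5, k2=0.3):
    """Hypothetical member of the class: (y1 / y2) * u1**k1 * u2**k2."""
    return (y1 / y2) * u1 ** k1 * u2 ** k2


def partial(f, point, index, h=1e-6):
    """Central finite-difference approximation of df/d(point[index])."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2.0 * h)


Y1_BAR, Y2_BAR = 856.4118, 208.8824  # population means from the empirical study
T = (Y1_BAR, Y2_BAR, 1.0, 1.0)
R = Y1_BAR / Y2_BAR

g_at_T = g(*T)
g1 = partial(g, T, 0)
g2 = partial(g, T, 1)
g3 = partial(g, T, 2)
g4 = partial(g, T, 3)

print("g(T) - R          :", g_at_T - R)
print("g1 - 1/Y2_BAR     :", g1 - 1.0 / Y2_BAR)
print("g2 + Y1/Y2^2      :", g2 + Y1_BAR / Y2_BAR ** 2)
print("g3, g4            :", g3, g4)
```

For this particular member the derivatives with respect to $u_1$ and $u_2$ are $k_1 R$ and $k_2 R$, so the free constants directly control $g_3$ and $g_4$.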

3 The Expression for Bias and Mean Squared Error

For analyzing distinct properties relating to suggested estimator, we define

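$$
\begin{aligned}
\bar{y}_1 &= \bar{Y}_1(1+e_1), & \bar{y}_2 &= \bar{Y}_2(1+e_2), \\
\bar{x} &= \bar{X}(1+e_3), & \bar{x}' &= \bar{X}(1+e_3'), \\
\bar{\theta}_x &= \bar{\theta}_X(1+e_4), & \bar{\theta}'_x &= \bar{\theta}_X(1+e_4')
\end{aligned} \tag{5}
$$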

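Since the population under consideration is large relative to the sample, we ignore finite population correction terms for simplicity, so that

$$
E(e_i) = 0 \;\; (i = 1,2,3,4) \quad \text{and} \quad E(e_j') = 0 \;\; (j = 3,4) \tag{6}
$$

$$
\begin{aligned}
E(e_1^2) &= \frac{1}{n} C_{Y_1}^2, \quad E(e_2^2) = \frac{1}{n} C_{Y_2}^2, \\
E(e_3^2) &= \frac{1}{n} C_X^2, \quad E(e_3'^2) = \frac{1}{n'} C_X^2, \\
E(e_4^2) &= \frac{1}{n\bar{\theta}_X^2}\left(\mu_{004} + 4\bar{X}\mu_{003} + 4\bar{X}^2\mu_{002} - \mu_{002}^2\right), \\
E(e_4'^2) &= \frac{1}{n'\bar{\theta}_X^2}\left(\mu_{004} + 4\bar{X}\mu_{003} + 4\bar{X}^2\mu_{002} - \mu_{002}^2\right), \\
E(e_1e_2) &= \frac{1}{n}\rho C_{Y_1}C_{Y_2} = \frac{1}{n\bar{Y}_1\bar{Y}_2}\mu_{110}, \\
E(e_1e_3) &= \frac{1}{n}\rho_1 C_{Y_1}C_X = \frac{1}{n\bar{Y}_1\bar{X}}\mu_{101}, \quad E(e_1e_3') = \frac{1}{n'}\rho_1 C_{Y_1}C_X, \\
E(e_2e_3) &= \frac{1}{n}\rho_2 C_{Y_2}C_X = \frac{1}{n\bar{Y}_2\bar{X}}\mu_{011}, \quad E(e_2e_3') = \frac{1}{n'}\rho_2 C_{Y_2}C_X, \\
E(e_1e_4) &= \frac{1}{n\bar{Y}_1\bar{\theta}_X}\left(\mu_{102} + 2\bar{X}\mu_{101}\right), \quad E(e_1e_4') = \frac{1}{n'\bar{Y}_1\bar{\theta}_X}\left(\mu_{102} + 2\bar{X}\mu_{101}\right), \\
E(e_2e_4) &= \frac{1}{n\bar{Y}_2\bar{\theta}_X}\left(\mu_{012} + 2\bar{X}\mu_{011}\right), \quad E(e_2e_4') = \frac{1}{n'\bar{Y}_2\bar{\theta}_X}\left(\mu_{012} + 2\bar{X}\mu_{011}\right), \\
E(e_3e_3') &= \frac{1}{n'} C_X^2 = \frac{1}{n'\bar{X}^2}\mu_{002}, \\
E(e_3e_4) &= \frac{1}{n\bar{X}\bar{\theta}_X}\left(\mu_{003} + 2\bar{X}\mu_{002}\right), \\
E(e_3e_4') &= E(e_3'e_4) = E(e_3'e_4') = \frac{1}{n'\bar{X}\bar{\theta}_X}\left(\mu_{003} + 2\bar{X}\mu_{002}\right), \\
E(e_4e_4') &= \frac{1}{n'\bar{\theta}_X^2}\left(\mu_{004} + 4\bar{X}\mu_{003} + 4\bar{X}^2\mu_{002} - \mu_{002}^2\right)
\end{aligned} \tag{7}
$$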
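The pattern of $1/n$ and $1/n'$ factors in these moments can be checked by simulation. The sketch below is a rough Monte Carlo illustration for $e_3$ and $e_3'$ only, assuming units drawn independently from a hypothetical normal superpopulation (which mimics ignoring the finite population correction). The distribution, the sample sizes and the number of replications are arbitrary choices, not values from the paper.

```python
import random


def simulate_relative_errors(n_first, n_second, reps,
                             mean_x=100.0, sd_x=20.0, seed=0):
    """Simulate (e3, e3') pairs for a two-phase design without fpc."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(reps):
        first = [rng.gauss(mean_x, sd_x) for _ in range(n_first)]
        # For i.i.d. draws, the first n_second units behave like a
        # simple random subsample of the first-phase sample.
        second = first[:n_second]
        e3_prime = sum(first) / n_first / mean_x - 1.0
        e3 = sum(second) / n_second / mean_x - 1.0
        pairs.append((e3, e3_prime))
    return pairs


n_first, n_second, reps = 40, 10, 20000
pairs = simulate_relative_errors(n_first, n_second, reps)
cx2 = (20.0 / 100.0) ** 2  # squared coefficient of variation of x

mc_e3_sq = sum(e * e for e, _ in pairs) / reps
mc_e3_e3p = sum(e * ep for e, ep in pairs) / reps
mc_diff_sq = sum((e - ep) ** 2 for e, ep in pairs) / reps

print("E(e3^2)        simulated vs theory:", mc_e3_sq, cx2 / n_second)
print("E(e3 e3')      simulated vs theory:", mc_e3_e3p, cx2 / n_first)
print("E((e3 - e3')^2) simulated vs theory:", mc_diff_sq,
      (1.0 / n_second - 1.0 / n_first) * cx2)
```

The last quantity is the source of the factor $(1/n - 1/n')$ that multiplies every auxiliary term in the mean squared error.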

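Expanding $g(\bar{y}_1, \bar{y}_2, u_1, u_2)$ about the point $T = (\bar{Y}_1, \bar{Y}_2, 1, 1)$ in a Taylor's series, we have

$$
\begin{aligned}
\hat{R}_g ={}& g(\bar{Y}_1, \bar{Y}_2, 1, 1) + (\bar{y}_1-\bar{Y}_1)g_1 + (\bar{y}_2-\bar{Y}_2)g_2 + (u_1-1)g_3 + (u_2-1)g_4 \\
&+ \frac{1}{2!}\Big\{(\bar{y}_1-\bar{Y}_1)^2 g_{11} + (\bar{y}_2-\bar{Y}_2)^2 g_{22} + (u_1-1)^2 g_{33} + (u_2-1)^2 g_{44} \\
&\quad + 2(\bar{y}_1-\bar{Y}_1)(\bar{y}_2-\bar{Y}_2) g_{12} + 2(\bar{y}_1-\bar{Y}_1)(u_1-1) g_{13} + 2(\bar{y}_1-\bar{Y}_1)(u_2-1) g_{14} \\
&\quad + 2(\bar{y}_2-\bar{Y}_2)(u_1-1) g_{23} + 2(\bar{y}_2-\bar{Y}_2)(u_2-1) g_{24} + 2(u_1-1)(u_2-1) g_{34}\Big\} \\
&+ \frac{1}{3!}\left\{(\bar{y}_1-\bar{Y}_1)\frac{\partial}{\partial \bar{y}_1} + (\bar{y}_2-\bar{Y}_2)\frac{\partial}{\partial \bar{y}_2} + (u_1-1)\frac{\partial}{\partial u_1} + (u_2-1)\frac{\partial}{\partial u_2}\right\}^3 g(\bar{y}_1^*, \bar{y}_2^*, u_1^*, u_2^*)
\end{aligned} \tag{8}
$$

where $\bar{y}_1^* = \bar{Y}_1 + h(\bar{y}_1-\bar{Y}_1)$, $\bar{y}_2^* = \bar{Y}_2 + h(\bar{y}_2-\bar{Y}_2)$, $u_1^* = 1 + h(u_1-1)$ and $u_2^* = 1 + h(u_2-1)$ for $0 < h < 1$.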

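Retaining terms up to the second order and rewriting Equation (8) in terms of the $e_i$'s, we obtain, to the first order of approximation,

$$
\begin{aligned}
\hat{R}_g - R ={}& (Re_1 - Re_2 + e_3g_3 - e_3'g_3 + e_4g_4 - e_4'g_4) - e_3e_3'g_3 + e_3'^2g_3 - e_4e_4'g_4 + e_4'^2g_4 \\
&+ \frac{1}{2}\Big\{\bar{Y}_1^2e_1^2g_{11} + \bar{Y}_2^2e_2^2g_{22} + (e_3^2 + e_3'^2 - 2e_3e_3')g_{33} + (e_4^2 + e_4'^2 - 2e_4e_4')g_{44} \\
&\quad + 2\bar{Y}_1\bar{Y}_2e_1e_2g_{12} + 2\bar{Y}_1(e_1e_3 - e_1e_3')g_{13} + 2\bar{Y}_1(e_1e_4 - e_1e_4')g_{14} \\
&\quad + 2\bar{Y}_2(e_2e_3 - e_2e_3')g_{23} + 2\bar{Y}_2(e_2e_4 - e_2e_4')g_{24} + 2(e_3e_4 - e_3e_4' - e_3'e_4 + e_3'e_4')g_{34}\Big\}
\end{aligned} \tag{9}
$$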
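The second-order terms in $e_3'$ and $e_4'$ that multiply $g_3$ and $g_4$ come from expanding the ratios $u_1$ and $u_2$. A short derivation sketch, assuming $|e_3'| < 1$ and $|e_4'| < 1$ so that the geometric series converges:

```latex
\begin{aligned}
u_1 - 1 &= \frac{\bar{x}}{\bar{x}'} - 1 = (1+e_3)(1+e_3')^{-1} - 1 \\
        &= (1+e_3)\left(1 - e_3' + e_3'^2 - \cdots\right) - 1 \\
        &= e_3 - e_3' - e_3 e_3' + e_3'^2 + O(e^3), \\
u_2 - 1 &= \frac{\bar{\theta}_x}{\bar{\theta}'_x} - 1
         = e_4 - e_4' - e_4 e_4' + e_4'^2 + O(e^3).
\end{aligned}
```

Under the two-phase moment results, $E(e_3'^2)$ and $E(e_3e_3')$ are both equal to $C_X^2/n'$, so $E(e_3'^2 - e_3e_3') = 0$, and the same holds for $e_4$ and $e_4'$. This is why $g_3$ and $g_4$ themselves drop out of the bias to this order.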

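Taking expectation on both sides of Equation (9), the bias of the formulated generalized double sampling estimator $\hat{R}_g$ up to terms of order $O(1/n)$ is

$$
\begin{aligned}
\mathrm{Bias}(\hat{R}_g) ={}& \frac{1}{2}\left\{\frac{\bar{Y}_1^2}{n}C_{Y_1}^2g_{11} + \frac{\bar{Y}_2^2}{n}C_{Y_2}^2g_{22} + \frac{2\bar{Y}_1\bar{Y}_2}{n}\rho C_{Y_1}C_{Y_2}g_{12}\right\} \\
&+ \frac{1}{2}\left(\frac{1}{n}-\frac{1}{n'}\right)\Big\{C_X^2g_{33} + \frac{1}{\bar{\theta}_X^2}\left(\mu_{004} + 4\bar{X}\mu_{003} + 4\bar{X}^2\mu_{002} - \mu_{002}^2\right)g_{44} \\
&\quad + \frac{2}{\bar{X}}\mu_{101}g_{13} + \frac{2}{\bar{\theta}_X}\left(\mu_{102} + 2\bar{X}\mu_{101}\right)g_{14} + \frac{2}{\bar{X}}\mu_{011}g_{23} \\
&\quad + \frac{2}{\bar{\theta}_X}\left(\mu_{012} + 2\bar{X}\mu_{011}\right)g_{24} + \frac{2}{\bar{X}\bar{\theta}_X}\left(\mu_{003} + 2\bar{X}\mu_{002}\right)g_{34}\Big\}
\end{aligned} \tag{10}
$$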

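Squaring Equation (9), retaining terms up to the second order and taking expectation, the mean squared error of $\hat{R}_g$ is

$$
\mathrm{MSE}(\hat{R}_g) = E(\hat{R}_g - R)^2 = E\left\{R(e_1 - e_2) + g_3(e_3 - e_3') + g_4(e_4 - e_4')\right\}^2
$$

Substituting the expected values given in Equations (6) and (7), this simplifies to

$$
\begin{aligned}
\mathrm{MSE}(\hat{R}_g) ={}& \frac{R^2}{n}\left(C_{Y_1}^2 + C_{Y_2}^2 - 2\rho C_{Y_1}C_{Y_2}\right) + \left(\frac{1}{n}-\frac{1}{n'}\right)\Big[g_3^2C_X^2 \\
&+ g_4^2\frac{1}{\bar{\theta}_X^2}\left(\mu_{004} + 4\bar{X}\mu_{003} + 4\bar{X}^2\mu_{002} - \mu_{002}^2\right) + 2Rg_3\left(\rho_1C_{Y_1} - \rho_2C_{Y_2}\right)C_X \\
&+ 2Rg_4\frac{1}{\bar{\theta}_X}\left\{\frac{\mu_{102} + 2\bar{X}\mu_{101}}{\bar{Y}_1} - \frac{\mu_{012} + 2\bar{X}\mu_{011}}{\bar{Y}_2}\right\} + 2g_3g_4\frac{1}{\bar{X}\bar{\theta}_X}\left(\mu_{003} + 2\bar{X}\mu_{002}\right)\Big]
\end{aligned} \tag{11}
$$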

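The value of $\mathrm{MSE}(\hat{R}_g)$ given in Equation (11) depends on $g_3$ and $g_4$. Differentiating Equation (11) with respect to $g_3$ and $g_4$ and equating the derivatives to zero, the optimum values of $g_3$ and $g_4$ for which $\mathrm{MSE}(\hat{R}_g)$ attains its minimum are

$$
g_3 = -\frac{R}{C\,C_X^3}\left\{C^2C_X^2 + \frac{\delta_2(\delta_2-\delta_1)}{\Delta}\right\} \tag{12}
$$

$$
g_4 = \frac{R\bar{X}\bar{\theta}_X(\delta_2-\delta_1)}{\Delta} \tag{13}
$$

where

$$
\begin{aligned}
\Delta &= C_X^2\bar{X}^2\left(\mu_{004} + 4\bar{X}\mu_{003} + 4\bar{X}^2\mu_{002} - \mu_{002}^2\right) - \left(\mu_{003} + 2\bar{X}\mu_{002}\right)^2 \neq 0, \\
\delta_1 &= C_X^2\bar{X}\left\{\frac{\mu_{102} + 2\bar{X}\mu_{101}}{\bar{Y}_1} - \frac{\mu_{012} + 2\bar{X}\mu_{011}}{\bar{Y}_2}\right\}, \\
\delta_2 &= \left(\rho_1C_{Y_1} - \rho_2C_{Y_2}\right)C_X\left(\mu_{003} + 2\bar{X}\mu_{002}\right), \\
C &= \rho_1C_{Y_1} - \rho_2C_{Y_2}.
\end{aligned}
$$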
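The optimum values can be cross-checked numerically. The sketch below treats the bracketed part of the MSE as a quadratic in $(g_3, g_4)$, solves the two normal equations directly, and compares the result with the closed forms for $g_3$, $g_4$ and the resulting reduction in MSE. All numerical inputs are hypothetical placeholders rather than values from the paper, and `Q` is shorthand introduced here for the braced difference that appears in $\delta_1$.

```python
import math

# Hypothetical population quantities (illustrative only).
R = 4.1
C_X = 0.75
X_BAR = 200.0
THETA_X = 62500.0            # population mean of X^2
C = -0.53                    # rho1*C_Y1 - rho2*C_Y2
MU002, MU003, MU004 = 22500.0, 2.0e6, 2.0e9
Q = -4.0e4                   # (mu102+2*X_BAR*mu101)/Y1_BAR - (mu012+2*X_BAR*mu011)/Y2_BAR

A = MU004 + 4 * X_BAR * MU003 + 4 * X_BAR ** 2 * MU002 - MU002 ** 2
B = MU003 + 2 * X_BAR * MU002

# Quadratic in (g3, g4): a*g3^2 + b*g4^2 + 2*p*g3 + 2*q*g4 + 2*c*g3*g4
a = C_X ** 2
b = A / THETA_X ** 2
c = B / (X_BAR * THETA_X)
p = R * C * C_X
q = R * Q / THETA_X

# Normal equations: a*g3 + c*g4 = -p and c*g3 + b*g4 = -q (Cramer's rule).
det = a * b - c * c
g3_solved = (-p * b + q * c) / det
g4_solved = (-q * a + p * c) / det

# Closed forms from the text.
delta = C_X ** 2 * X_BAR ** 2 * A - B ** 2
delta1 = C_X ** 2 * X_BAR * Q
delta2 = C * C_X * B
g3_closed = -(R / (C * C_X ** 3)) * (C ** 2 * C_X ** 2 + delta2 * (delta2 - delta1) / delta)
g4_closed = R * X_BAR * THETA_X * (delta2 - delta1) / delta

# Value of the quadratic at the optimum versus the closed-form reduction.
quad_min = (a * g3_solved ** 2 + b * g4_solved ** 2 + 2 * p * g3_solved
            + 2 * q * g4_solved + 2 * c * g3_solved * g4_solved)
reduction_closed = -(R ** 2 * C ** 2 + R ** 2 * (delta2 - delta1) ** 2 / (delta * C_X ** 2))

print(g3_solved, g3_closed)
print(g4_solved, g4_closed)
print(quad_min, reduction_closed)
```

Multiplying `quad_min` by $(1/n - 1/n')$ gives the amount by which the optimum estimator improves on $\mathrm{MSE}(\hat{R})$.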

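The minimum mean squared error of $\hat{R}_g$, denoted by $\mathrm{MSE}(\hat{R}_g)_{\min}$, is obtained by substituting the optimum values from Equations (12) and (13) in Equation (11):

$$
\mathrm{MSE}(\hat{R}_g)_{\min} = \frac{R^2}{n}\left(C_{Y_1}^2 + C_{Y_2}^2 - 2\rho C_{Y_1}C_{Y_2}\right) - \left(\frac{1}{n}-\frac{1}{n'}\right)\left\{R^2C^2 + \frac{R^2(\delta_2-\delta_1)^2}{\Delta C_X^2}\right\} \tag{14}
$$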



4 Efficiency Comparison

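(i) The usual estimator of the ratio of two population means and some of its particular cases, with their respective mean squared errors, are given in Table 1.

Table 1 Some particular estimators with their respective MSE

| Estimators | MSE |
|---|---|
| $\hat{R} = \dfrac{\bar{y}_1}{\bar{y}_2}$ | $\dfrac{R^2}{n}\left(C_{Y_1}^2 + C_{Y_2}^2 - 2\rho C_{Y_1}C_{Y_2}\right)$ |
| $\hat{R}_1 = \dfrac{\bar{y}_1}{\bar{y}_2} + k(\bar{x}-\bar{X}) = \hat{R} + k(\bar{x}-\bar{X})$ | $\mathrm{MSE}(\hat{R}) - \left(\dfrac{1}{n}-\dfrac{1}{N}\right)C_X^2R^2D^2$ |
| $\hat{R}_2 = \dfrac{\bar{y}_1 + k(\bar{x}-\bar{X})}{\bar{y}_2}$ | $\mathrm{MSE}(\hat{R}) - \left(\dfrac{1}{n}-\dfrac{1}{N}\right)C_X^2R^2D^2$ |
| $\hat{R}_3 = \dfrac{\bar{y}_1}{\bar{y}_2}\cdot\dfrac{\bar{x}}{\bar{X}}$ | $\mathrm{MSE}(\hat{R}) + \left(\dfrac{1}{n}-\dfrac{1}{N}\right)C_X^2R^2(1+2D)$ |
| $\hat{R}_4 = \dfrac{\bar{y}_1}{\bar{y}_2}\cdot\dfrac{\bar{X}}{\bar{x}}$ | $\mathrm{MSE}(\hat{R}) + \left(\dfrac{1}{n}-\dfrac{1}{N}\right)C_X^2R^2(1-2D)$ |
| $\hat{R}_5 = \dfrac{\bar{y}_1 + k(\bar{x}-\bar{X})}{\bar{y}_2 + (\bar{x}-\bar{X})}$ | $\mathrm{MSE}(\hat{R}) - \left(\dfrac{1}{n}-\dfrac{1}{N}\right)C_X^2R^2D^2$ |
| $\hat{R}_6 = \dfrac{\bar{y}_1 + k(\bar{x}-\bar{X})}{\bar{y}_2}\left(\dfrac{\bar{x}}{\bar{X}}\right)$ | $\mathrm{MSE}(\hat{R}) - \left(\dfrac{1}{n}-\dfrac{1}{N}\right)C_X^2R^2D^2$ |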



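(ii) The generalized estimator of the ratio of two population means by Singh and Naqvi (2015) is $\hat{R}_7 = g(\bar{y}_1, \bar{y}_2, \bar{x})$, and its minimum mean squared error is

$$
\mathrm{MSE}(\hat{R}_7)_{\min} = \mathrm{MSE}(\hat{R}) - \left(\frac{1}{n}-\frac{1}{N}\right)C_X^2R^2D^2 \tag{16}
$$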

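The proposed generalized estimator $\hat{R}_g$ of the ratio of two population means has minimum mean squared error

$$
\mathrm{MSE}(\hat{R}_g)_{\min} = \mathrm{MSE}(\hat{R}) - \left(\frac{1}{n}-\frac{1}{n'}\right)\left\{R^2C^2 + \frac{R^2(\delta_2-\delta_1)^2}{\Delta C_X^2}\right\} \tag{17}
$$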


From the MSE expressions given in Equations (15)–(17) and the section given below, it can be concluded that the suggested class of generalized double sampling estimators, which utilizes known values of the first and second moments about zero, has smaller MSE than the usual estimator, in which no such information is used. Therefore, the use of the proposed estimator in practical situations is recommended for obtaining precise results.

5 An Empirical Study

To support theoretical results, a numerical illustration has been carried out using the data set given on page 177, Singh and Chaudhary (2009). The summary of the population data set is as follows.

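$$
\begin{aligned}
\bar{Y}_1 &= 856.4118, \quad \bar{Y}_2 = 208.8824, \quad \bar{X} = 199.4412, \quad C_{Y_1} = 0.8372, \\
C_{Y_2} &= 0.7205, \quad C_X = 0.7532, \quad \rho_{Y_1Y_2} = 0.2090, \quad \rho_{Y_1X} = 0.2105, \\
\rho_{Y_2X} &= 0.9801, \quad n = 12, \quad n' = 34
\end{aligned}
$$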

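Table 2 MSE and PRE comparison of traditional estimators with $\hat{R}_g$

| Estimators | MSE | PRE |
|---|---|---|
| $\hat{R}$ | 1.555166 | 159 |
| $\hat{R}_1$ | 1.103932 | 113 |
| $\hat{R}_2$ | 1.103932 | 113 |
| $\hat{R}_3$ | 1.184017 | 121 |
| $\hat{R}_4$ | 3.749341 | 384 |
| $\hat{R}_5$ | 1.103932 | 113 |
| $\hat{R}_6$ | 1.103932 | 113 |
| $\hat{R}_7$ | 1.103932 | 113 |
| $\hat{R}_g$ | 0.974356 | 100 |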
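The PRE column is not defined explicitly in the text. The tabulated values appear to be consistent with the MSE of each estimator expressed as a percentage of the MSE of $\hat{R}_g$, truncated to an integer. The sketch below recomputes the column under that assumption; the formula is inferred from the numbers and is not stated in the paper.

```python
# MSE values copied from Table 2.
mse = {
    "R": 1.555166,
    "R1": 1.103932,
    "R2": 1.103932,
    "R3": 1.184017,
    "R4": 3.749341,
    "R5": 1.103932,
    "R6": 1.103932,
    "R7": 1.103932,
    "Rg": 0.974356,
}


def pre_relative_to(mse_table, baseline):
    """Assumed PRE: 100 * MSE(estimator) / MSE(baseline), truncated."""
    base = mse_table[baseline]
    return {name: int(100 * value / base) for name, value in mse_table.items()}


pre = pre_relative_to(mse, "Rg")
for name, value in pre.items():
    print(f"{name:>3}  MSE = {mse[name]:.6f}  PRE = {value}")
```

Under this reading, a value above 100 means the estimator has a larger MSE than $\hat{R}_g$, so $\hat{R}_g$ is the most efficient entry in the table.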

6 Conclusions and Discussion

(i) The minimum MSE for the estimator represented by R^g is

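$$
\mathrm{MSE}(\hat{R}_g)_{\min} = \mathrm{MSE}(\hat{R}) - \left(\frac{1}{n}-\frac{1}{n'}\right)\left\{R^2C^2 + \frac{R^2(\delta_2-\delta_1)^2}{\Delta C_X^2}\right\} \tag{18}
$$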

Any estimator belonging to the suggested generalized class of estimators represented by R^g cannot have mean squared error smaller than the expression (18).

(ii) There exists a subset of estimators in the class $\hat{R}_g$ satisfying Equations (12) and (13), such that every member of this subset attains the same minimum mean squared error as given in Equation (18). For example, the estimators

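$$
\hat{R}_{P1} = \frac{1}{\bar{y}_2}\left\{\bar{y}_1\left(\frac{\bar{x}}{\bar{x}'}\right)\left(\frac{\bar{\theta}_x}{\bar{\theta}'_x}\right)\right\} \tag{19}
$$

$$
\hat{R}_{P2} = \frac{1}{\bar{y}_2}\left\{\bar{y}_1\left(\frac{\bar{x}}{\bar{x}'}-1\right)\left(\frac{\bar{\theta}_x}{\bar{\theta}'_x}-1\right)\right\} \tag{20}
$$

$$
\hat{R}_{P3} = \frac{1}{\bar{y}_2}\left\{\bar{y}_1\left(\frac{\bar{x}}{\bar{x}'}\right)^{k_1}\left(\frac{\bar{\theta}_x}{\bar{\theta}'_x}\right)^{k_2}\right\} \tag{21}
$$

$$
\hat{R}_{P4} = \frac{1}{\bar{y}_2}\left\{\bar{y}_1 + k_1\left(\frac{\bar{x}}{\bar{x}'}-1\right) + k_2\left(\frac{\bar{\theta}_x}{\bar{\theta}'_x}-1\right)\right\} \tag{22}
$$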

are some particular members of the proposed generalized class that attain the same minimum mean squared error as given in Equation (18).
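For the members that carry free constants, the link between the constants and the optimum values can be made explicit. The following is a sketch, assuming $k_1$ and $k_2$ in (21) are exponents and in (22) are additive coefficients, and writing $g_3^{*}$ and $g_4^{*}$ for the optimum values in Equations (12) and (13); the starred notation is introduced here only for illustration.

```latex
\begin{aligned}
\hat{R}_{P3}:&\quad g = \frac{\bar{y}_1}{\bar{y}_2}\,u_1^{k_1}u_2^{k_2}
  \;\Rightarrow\; g_3 = k_1 R,\quad g_4 = k_2 R
  \;\Rightarrow\; k_1 = \frac{g_3^{*}}{R},\quad k_2 = \frac{g_4^{*}}{R}, \\
\hat{R}_{P4}:&\quad g = \frac{\bar{y}_1 + k_1(u_1-1) + k_2(u_2-1)}{\bar{y}_2}
  \;\Rightarrow\; g_3 = \frac{k_1}{\bar{Y}_2},\quad g_4 = \frac{k_2}{\bar{Y}_2}
  \;\Rightarrow\; k_1 = \bar{Y}_2\,g_3^{*},\quad k_2 = \bar{Y}_2\,g_4^{*}.
\end{aligned}
```

In both cases $g(T) = R$, $g_1 = 1/\bar{Y}_2$ and $g_2 = -\bar{Y}_1/\bar{Y}_2^2$ hold for any choice of the constants, so only $k_1$ and $k_2$ need to be tuned.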

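(iii) The MSE of the formulated estimator $\hat{R}_g$ is minimized at the optimum values of $g_3$ and $g_4$ given in Equations (12) and (13), that is,

$$
g_3 = -\frac{R}{C\,C_X^3}\left\{C^2C_X^2 + \frac{\delta_2(\delta_2-\delta_1)}{\Delta}\right\} \tag{23}
$$

$$
g_4 = \frac{R\bar{X}\bar{\theta}_X(\delta_2-\delta_1)}{\Delta} \tag{24}
$$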

In many practical situations, the values of some of the parameters involved in the optimum values may not be known a priori. In such situations it is suggested to replace the unknown parameters in the optimum values by their unbiased estimators.

(iv) From the theoretical and empirical efficiency comparisons, it can reasonably be concluded that the suggested estimator is more efficient than the traditional estimators considered here. Therefore, the use of the proposed estimator in practical situations is advisable.


The authors are very much indebted to the editor-in-chief Prof. (Dr.) Vinod Kumar and learned referees for their valuable suggestions leading to improvement of the quality in the contents of the paper.


[1] Biradar, R.S. and Singh, H.P. (1997–98). A class of estimators for population parameters using supplementary information, Aligarh J. Statist., 17 & 18, pp. 54–71.

[2] Cochran, W.G. (1977). Sampling Techniques, 3rd edition, John Wiley and Sons, New York.

[3] Kumar, K. and Srivastava, U. (2018). Estimation of ratio and product of two population means using exponential type estimators in sample surveys, International Journal of Mathematics and Statistics, 19–3, pp. 102–109.

[4] Mukhopadhyay, P. (2012). Theory and Methods of Survey and Sampling, 2nd edition, PHI Learning Private Limited, New Delhi, India.

[5] Murthy, M. (1967). Sampling Theory and Methods, 1st edition, Calcutta Statistical Publishing Society, Kolkata, India.

[6] Neyman, J. (1938). Contribution to the theory of sampling human populations, Journal of the American Statistical Association, 33(201), pp. 101–116.

[7] Shah, S.M. and Shah, D.N. (1978). Ratio cum product estimator for estimation ratio (product) of two population parameters, Sankhya C, 40, pp. 156–166.

[8] Singh, D. and Chaudhary, F.S. (1997). Theory and Analysis of Sampling Survey Designs, New Age International Publishers, New Delhi, India.

[9] Singh, D. and Chaudhary, F.S. (2009). Sampling Techniques (2nd ed.), New Age International Publishers, New Delhi, India.

[10] Singh, G.N. and Rani, R. (2005, 2006). Some linear transformations on auxiliary variable for estimating the ratio of two population means in sample surveys, Model Assisted Statistics and Applications, 1(1), IOS Press, pp. 1–5.

[11] Singh, H.P. (1998). On the estimation of ratio and product of two finite populations means, Proc. Nat. Acad. Sci. India, Sec. A, 58, pp. 399–402.

[12] Singh, M.P. (1965). On the estimation of ratio and product of population parameters, Sankhya B, 27, pp. 321–328.

[13] Singh, M.P. (1967). Ratio cum product method of estimation, Metrika, 12, pp. 34–43.

[14] Singh, M.P. (1969). Comparison of some ratio cum product estimators, Sankhya B, 31, pp. 375–378.

[15] Singh, R. K. and Naqvi, N. (2015). A generalized class of estimator of the ratio of two population means using auxiliary information, Sri Lankan Journal of Applied Statistics, 16–3, pp. 179–193.

[16] Singh, R.K. (1982). On estimating ratio and product of population parameters, Calcutta Statist. Assoc. Bull., 20, pp. 39–49.

[17] Sukhatme, P. V., Sukhatme, B. V., Sukhatme, S. and Asok, C. (1984). Sampling Theory of Surveys with Applications, 3rd Edition, Ames, Iowa (USA) and Indian Society of Agricultural Statistics, New Delhi, India.

[18] Tripathi, T.P. (1980). A general class of estimators of population ratio, Sankhya C, 42, pp. 63–75.

[19] Upadhyaya, L.N., Singh, G.N. and Singh, H.P. (2000). Use of transformed auxiliary variable in the estimation of population ratio in sample survey, Statistics in Transition, 4, pp. 1019–1027.



Tarunpreet Kaur Ahuja is a Research Scholar in Department of Statistics, D.A.V. (P.G) College, Dehradun, which is affiliated to Hemvati Nandan Bahuguna Garhwal University (A Central University), Srinagar, Uttarakhand. She has completed her B.Sc. (Physics, Mathematics & Statistics) degree in 2014 and M.Sc. in Statistics in 2016 from D.A.V. (P.G) College, Dehradun, Uttarakhand. She worked as an intern for Data Analytics team in Pitney Bowes, a commerce company in 2016. Her research interest is in the area of developing sampling theory and she has also published four research papers in different national and international journals of repute.


Peeyush Misra is working as an Associate Professor in Department of Statistics, D.A.V. (P.G) College, Dehradun, which is affiliated to Hemvati Nandan Bahuguna Garhwal University (A Central University), Srinagar, Uttarakhand. He has completed his Graduation, Post Graduation and PhD degree in Statistics from the University of Lucknow, Lucknow, Uttar Pradesh. He is contributing to the scientific field of Statistics mainly in the area of sampling theory. He has published more than sixty research papers in various national and international journals of repute. He has also successfully completed one research project sponsored by UGC.


O. K. Belwal is currently working as a Professor and Head of Department of Statistics, Hemvati Nandan Bahuguna Garhwal University (A Central University), Srinagar, Uttarakhand. He has contributed to the various fields of Statistics through many research publications in different national and international journals of repute. He has also successfully completed some research projects in the field of Statistics. He has also successfully organized some national seminars/conferences of repute. He has also supervised many research scholars.

