A CLASS OF REGRESSION-CUM-RATIO ESTIMATORS IN TWO-PHASE SAMPLING FOR UTILIZING INFORMATION FROM HIGH RESOLUTION SATELLITE DATA

Two-phase sampling design offers a variety of possibilities for the effective use of auxiliary information, such as that from high resolution remote sensing data. Continuous satellite data with large area coverage provide scope for deriving population values of the auxiliary variables, which can be used effectively to estimate the population parameters of the variable of interest. This study examines the possibilities of using different forms of auxiliary information derived from remote sensing data in a two-phase sampling design and suggests an appropriate estimator of broad interest and application. A new class of regression-cum-ratio estimators is proposed for two-phase sampling, using information on two auxiliary variables derived from high resolution satellite data. The bias and the mean square error (MSE) of the proposed estimators are obtained up to the first order of approximation, and their efficiency is compared with that of some traditional estimators. The proposed estimator is found to be more efficient than the usual regression and ratio estimators. A numerical illustration examines the efficiency of the estimator for forest timber volume estimation, using tree crown diameter and tree height as auxiliary variables. The results show that these estimators can be employed in a variety of conditions where satellite-derived information is strongly correlated with sample-based ground measurements and where the cost of ground measurements is relatively high.


INTRODUCTION
With the relatively recent widespread availability of operational fine spatial resolution imagery from satellites, there is more opportunity to conduct spatial sampling with combinations of data at different spatial resolutions (Haack and Rafter, 2010). Required parameters can be estimated efficiently with the ratio and regression methods of estimation under two-phase (double) sampling (Cochran, 1999). There is a large body of theoretical work on ratio and regression estimators in two-phase sampling, suggesting a large number of estimators to suit different conditions (Khan and Tripathi, 1967; Särndal and Swensson, 1987). Most of these estimators differ in the auxiliary variables available and the number of phases considered (Srivastava and Jhajj, 1981). In addition, a great deal of work has been done on regression- and ratio-type estimators, including regression-in-regression, regression-in-ratio and ratio-in-regression estimators (Kiregyera, 1984; Mukerjee et al., 1987; Ahmed, 1998; Senapati and Sahoo, 2006). Similarly, there has been significant theoretical work on transformed auxiliary variables (Mohanty and Sahoo, 1995; Singh, 2001). Because most of these estimators have been proposed on the basis of theoretical efficiency comparisons, their practical applications are limited. Moreover, the wide range of auxiliary variables that can be derived from remote sensing leaves scope for new estimators of broad interest and application.

Two-phase sampling design offers a variety of possibilities for the effective use of auxiliary variables such as those from high resolution remote sensing data (Schreuder et al., 1993; Johnson, 2000). The estimation method consists of first using a coarser spatial resolution sensor over a large area. For selected samples, a second delineation is done with a finer spatial resolution system or with ground-based observations, which are presumed to be more accurate. Regression and ratio estimators were used effectively to integrate AVHRR-GAC and Landsat MSS digital data to estimate forest area in the continental United States (Nelson, 1989). Köhl and Kushwaha (1994) demonstrated a four-phase sampling method for assessing standing volume using Landsat TM data, aerial photography and field measurements.
In the past decade, high resolution satellite images, such as IKONOS, QuickBird and Kompsat-2, have allowed the development of interactive models for photo-interpreting biometrical parameters of stands and trees (Barnoaiea, 2007). With such high spatial resolution commercial images, together with advances in image processing techniques, population estimation accuracies have improved significantly (Goetz et al., 2003). Visual interpretation of QuickBird images to extract auxiliary variables (image data), such as photogrammetric crown diameter, stand volume, stand density and diameter distribution, was found to be significantly cost-efficient (Kamwi and Katsch, 2009). Similarly, satellite-derived NDVI can be used as an auxiliary variable in many applications when estimating study variables such as biomass and fuelwood. A double sampling technique combining NDVI with field data could be used effectively to obtain a rapid estimate of fuelwood in northern Zimbabwe (Mutanga and Skidmore, 2004). In this paper, a new class of regression-cum-ratio estimators is proposed, and the efficiency of the estimators is tested with satellite-derived auxiliary variables.

Regression and ratio estimators in two-phase sampling
There are two types of two-phase sampling design. In the first type, the auxiliary variable does not depend on the measurements but is purely an indicator variable showing the stratum to which the variable of interest is to be allocated; this type is termed two-phase sampling for stratification. In the second type, the relationship between the auxiliary variables and the variables of interest is described by means of a ratio or a regression, and the design is termed two-phase sampling with ratio estimators or two-phase sampling with regression estimators, respectively.
In the ratio method, a large first-phase sample of size $n'$ is selected with a probability sampling design to obtain a good estimate $\bar x'$ of $\bar X$, and then a probability sample of size $n$ is drawn, either from the $n'$ units or from the remaining units of the population, to observe the main character $y$ under study. Assuming simple random sampling without replacement (SRSWOR) in both phases, the two-phase ratio estimate of the population mean is given by
$$\bar y_{rd} = \frac{\bar y}{\bar x}\,\bar x'.$$
Regression-type estimators of the population mean or total of $y$ assume advance knowledge of either the population mean $\bar X$ or the total $X$ of the auxiliary variable $x$. In the absence of such information, a large first-phase sample of size $n'$ is selected to observe $x$ and thereby to estimate $\bar X$, while a subsample of size $n$ is drawn to measure $y$. Thus the two-phase regression-type estimator of the population mean $\bar Y$ is
$$\bar y_{lrd} = \bar y + b_{yx}\left(\bar x' - \bar x\right),$$
where $b_{yx}$ is the sample regression coefficient of $y$ on $x$ computed from the second-phase sample.
Now suppose that information on yet another auxiliary variable $z$ is available on all units of the population, so that its population mean $\bar Z$ is known, $x$ being the first auxiliary variable. Mohanty (1967) suggested the following regression-in-ratio estimator:
$$T_M = \left[\bar y + b_{yx}\left(\bar x' - \bar x\right)\right]\frac{\bar Z}{\bar z} \qquad (1)$$
Chand (1975) proposed the chain ratio-type estimator under similar conditions:
$$T_C = \bar y\,\frac{\bar x'}{\bar x}\,\frac{\bar Z}{\bar z'} \qquad (2)$$
Singh et al. (2007) suggested a transformed chain ratio-type estimator of the population mean $\bar Y$ that uses the known correlation coefficient $\rho_{xz}$ between the two auxiliary characters, through a simple transformation of the second auxiliary character, so that the population mean of the auxiliary character is estimated more precisely in the first-phase (preliminary) sample:
$$T_S = \bar y\,\frac{\bar x'}{\bar x}\,\frac{\bar V}{\bar v'} \qquad (3)$$
where $\bar v'$ is the sample mean of the transformed variable $v = z + \rho_{xz}$ in the first-phase sample and $\bar V = \bar Z + \rho_{xz}$ is the corresponding population mean.
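The estimators above can be sketched in a few lines of code. The following is a minimal NumPy sketch, assuming SRSWOR in both phases, that computes the two-phase ratio and regression estimates, Chand's chain ratio estimate, and a transformed chain ratio estimate of the assumed form $\bar y(\bar x'/\bar x)(\bar V/\bar v')$ with $v = z + \rho_{xz}$. The function name `two_phase_estimates` and its argument names are hypothetical. Mohanty's estimator and the proposed $T_p$ are deliberately left out.

```python
import numpy as np


def two_phase_estimates(x1, z1, y2, x2, z_bar_pop, rho_xz):
    """Hypothetical helper: point estimates of the population mean of y.

    x1, z1    : first-phase observations (size n') of x and z
    y2, x2    : second-phase observations (size n) of y and x
    z_bar_pop : known population mean of z
    rho_xz    : known correlation coefficient between x and z
    """
    x1, z1, y2, x2 = (np.asarray(a, dtype=float) for a in (x1, z1, y2, x2))
    x_bar1 = x1.mean()  # first-phase mean of x (estimates X_bar)
    z_bar1 = z1.mean()  # first-phase mean of z
    y_bar = y2.mean()   # second-phase mean of y
    x_bar = x2.mean()   # second-phase mean of x

    # Sample regression coefficient of y on x from the second-phase sample.
    b_yx = np.cov(x2, y2, ddof=1)[0, 1] / x2.var(ddof=1)

    # Transformed variable v = z + rho_xz (assumed form of the transformation).
    v_bar1 = z_bar1 + rho_xz
    v_bar_pop = z_bar_pop + rho_xz

    return {
        "ratio": y_bar * x_bar1 / x_bar,
        "regression": y_bar + b_yx * (x_bar1 - x_bar),
        "chain_ratio": y_bar * (x_bar1 / x_bar) * (z_bar_pop / z_bar1),
        "transformed_chain_ratio": y_bar * (x_bar1 / x_bar) * (v_bar_pop / v_bar1),
    }


# Toy usage with made-up numbers (not data from this study).
est = two_phase_estimates(x1=[1, 2, 3, 4, 5, 6], z1=[4] * 6,
                          y2=[2, 6, 10], x2=[1, 3, 5],
                          z_bar_pop=5.0, rho_xz=0.5)
print(est)
```

Because $v$ differs from $z$ only by the constant $\rho_{xz}$, the transformation leaves the correlation with $y$ unchanged but shrinks the coefficient of variation of the second auxiliary variable when $\rho_{xz} > 0$.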

Proposed Estimator
Motivated by the above estimators, we propose a class of regression-cum-ratio estimators, $T_p$. The construction rests on the fact that, in practical situations, it is easy to fit a regression equation of $y$ on $x$ or of $y$ on $z$, and it uses the known correlation coefficient between the auxiliary variables as an additive factor to the second auxiliary variable. The mean square error (MSE) of the estimator up to the first order of approximation has been derived; the derivation is given in the Appendix.

Efficiency comparison
To compare the efficiency of the proposed estimator with similar estimators, such as those given in Equations 1-3, we write the MSEs of these estimators in a comparable form. The condition for the proposed estimator to be more efficient holds for any positive value of $\rho_{xz}$, which will be true in all practical situations under study. It is also obvious that the first-phase sample is larger than the second-phase sample ($n' > n$) in situations where double sampling is used.
The remaining conditions will always hold because the quantities involved are always positive; in particular, $C_x^2$ is always greater than 0. We may therefore conclude that, for all practical situations under a double sampling scheme with two auxiliary variables, MSE($T_P$) will be smaller than MSE($T_M$), MSE($T_C$) and MSE($T_S$).

Empirical Studies
Let us assume that the samples are selected by SRSWOR. Then the appropriate estimator based on single-phase sampling without using any auxiliary variable is the mean per unit estimator $\bar y$, whose variance is given by
$$V(\bar y) = \left(\frac{1}{n} - \frac{1}{N}\right) S_y^2.$$
It may be shown that the ratio method of estimation will give a more precise result whenever
$$\rho_{yx} > \frac{1}{2}\,\frac{C_x}{C_y}.$$
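The variance of $\bar y$ and the usual first-order MSEs of the two-phase ratio, regression and chain ratio estimators can be evaluated numerically and turned into relative efficiencies, $\mathrm{RE} = 100\,V(\bar y)/\mathrm{MSE}$. The sketch below uses the standard textbook first-order expressions under SRSWOR, with $\lambda = 1/n - 1/N$ and $\lambda' = 1/n' - 1/N$; these are not the paper's Equations 6-8 for $T_p$, which are not reproduced here. The function names are hypothetical, and the coefficients of variation and $\rho_{xz}$ in the usage line are made-up placeholders; only $N$, $n$, $n'$, $\bar Y$, $\bar Z$ and the two correlations are taken from the text.

```python
def first_order_mse(N, n, n1, Y_bar, C_y, C_x, C_z, rho_yx, rho_yz,
                    Z_bar=None, rho_xz=None):
    """Hypothetical helper: first-order variance/MSEs under two-phase SRSWOR.

    n1 is the first-phase size n', n the second-phase size.
    """
    lam = 1 / n - 1 / N    # lambda
    lam1 = 1 / n1 - 1 / N  # lambda'
    d = lam - lam1         # = 1/n - 1/n', positive whenever n < n'

    var_mean = lam * Y_bar**2 * C_y**2
    mse_ratio = Y_bar**2 * (lam * C_y**2 + d * (C_x**2 - 2 * rho_yx * C_y * C_x))
    mse_reg = Y_bar**2 * C_y**2 * (lam - d * rho_yx**2)
    # Chain ratio adds a first-phase term for the known mean of z.
    mse_chain = mse_ratio + Y_bar**2 * lam1 * (C_z**2 - 2 * rho_yz * C_y * C_z)

    out = {"mean_per_unit": var_mean, "ratio": mse_ratio,
           "regression": mse_reg, "chain_ratio": mse_chain}

    if Z_bar is not None and rho_xz is not None:
        # v = z + rho_xz keeps corr(y, v) = rho_yz but scales C_z by Z/(Z + rho_xz).
        C_v = C_z * Z_bar / (Z_bar + rho_xz)
        out["transformed_chain_ratio"] = (
            mse_ratio + Y_bar**2 * lam1 * (C_v**2 - 2 * rho_yz * C_y * C_v))
    return out


def ratio_beats_mean_per_unit(rho_yx, C_y, C_x):
    # Classical condition: rho_yx > C_x / (2 C_y).
    return rho_yx > C_x / (2 * C_y)


def relative_efficiency(var_mean, mse):
    return 100 * var_mean / mse


# Usage: C_y, C_x, C_z and rho_xz below are hypothetical placeholders.
m = first_order_mse(N=2500, n=25, n1=200, Y_bar=4.63, C_y=0.5, C_x=0.3,
                    C_z=0.4, rho_yx=0.794, rho_yz=0.718, Z_bar=13.0, rho_xz=0.6)
for name, value in m.items():
    print(name, round(relative_efficiency(m["mean_per_unit"], value), 1))
```

Note that the regression MSE never exceeds the ratio MSE at first order, since their difference is $\bar Y^2(1/n - 1/n')(C_x - \rho_{yx}C_y)^2 \ge 0$.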

Population I: Murthy (1967)
We have considered this population, which has been widely used by many authors (Singh and Singh, 1991; Patel and Patel, 2012).

Population II: Handique et al. (2011)
The second population is taken from the dataset of a research project carried out by Handique et al. (2011). A 250 ha study site was selected in a subtropical semi-evergreen forest in north-eastern India. The site is part of the Borail Reserve Forest in the North Cachar Hills district of Assam. The topography of the forest is undulating, with elevation ranging from 600 m to 960 m above mean sea level. The distribution of timber species is good, as the area is fairly undisturbed. Important timber species are Mesua ferrea, Michelia champaca and Amoora wallichii; Lagerstroemia spp., Pterospermum acerifolium, Duabanga spp., Elaeocarpus rugosus, Vatica lanceaefolia, etc. are the major co-dominant species. IKONOS images were used to measure the plot-wise average crown diameter and the photogrammetric mean tree height per sample plot (Figure 1). Following the convention of the local forest department, 0.1 ha plots were used, which makes the population size 2500 for the 250 ha study area. The following parameters were derived to make the population estimate and calculate the MSE:
N = 2500
y: forest timber volume in cubic metres (m³) in a 0.1 ha sample plot
x: average tree height in the sample plot, in metres (m)
z: average crown diameter in the sample plot, in metres (m)

Figure 1. IKONOS image of part of the study area
The plot centres were located on the IKONOS images, and the timber volume, average stand height and average crown diameter of each plot were estimated with the help of standard volume tables. The relationships among these three variables are shown in Figures 2, 3 and 4. Average stand height (variable x) has a higher correlation (r = 0.794) with the timber volume estimates than the average crown diameter (variable z; r = 0.718). The estimates could be made rapidly when the images were viewed stereoscopically, so this method is much faster and more cost-effective than traditional field inventories. Our concern, however, is that the volume tables were based on regional data rather than local measurements and are therefore not very accurate. In addition, there is a likelihood of photo-interpreter bias.
To overcome this, 25 plots were randomly selected from the image using sampling without replacement and enumerated in the field in the traditional manner. Taking n = 25 and n′ = 200, the MSEs of the estimators were computed using the formulae given in Equations 6-8 for both populations I and II. The relative efficiencies of the proposed and existing estimators are given in Table 1.
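As a cross-check on analytical first-order MSEs, MSEs can also be estimated by repeating the two-phase SRSWOR draw many times. The sketch below is a hedged illustration, not the procedure used in this paper: it draws n′ = 200 first-phase units from a population of N = 2500 (mirroring the sizes above), subsamples n = 25 of them, and reports empirical relative efficiencies 100 MSE($\bar y$)/MSE(·). The population generator `make_synthetic_population` and all its coefficients are hypothetical and are not the Borail forest data; Mohanty's estimator and $T_p$ are omitted.

```python
import numpy as np


def make_synthetic_population(N=2500, seed=42):
    """Hypothetical correlated population standing in for plot-level data."""
    rng = np.random.default_rng(seed)
    x = rng.normal(20.0, 4.0, N)                      # stand-in for mean tree height (m)
    z = 0.5 * x + rng.normal(3.0, 1.5, N)             # stand-in for mean crown diameter (m)
    y = 0.2 * x + 0.1 * z + rng.normal(0.0, 0.6, N)   # stand-in for timber volume per plot
    return y, x, z


def simulate_two_phase(y, x, z, n1, n, reps=2000, seed=0):
    """Monte Carlo MSEs and REs of simple two-phase estimators (SRSWOR in both phases)."""
    rng = np.random.default_rng(seed)
    N = len(y)
    Y_bar, Z_bar = y.mean(), z.mean()  # true mean of y; known mean of z
    sq_err = {"mean": [], "ratio": [], "regression": [], "chain_ratio": []}

    for _ in range(reps):
        s1 = rng.choice(N, size=n1, replace=False)  # first phase: SRSWOR from population
        s2 = rng.choice(s1, size=n, replace=False)  # second phase: SRSWOR subsample of s1
        x1_bar, z1_bar = x[s1].mean(), z[s1].mean()
        ys, xs = y[s2], x[s2]
        y_bar, x_bar = ys.mean(), xs.mean()
        b = np.cov(xs, ys)[0, 1] / xs.var(ddof=1)   # regression of y on x (second phase)

        est = {
            "mean": y_bar,
            "ratio": y_bar * x1_bar / x_bar,
            "regression": y_bar + b * (x1_bar - x_bar),
            "chain_ratio": y_bar * (x1_bar / x_bar) * (Z_bar / z1_bar),
        }
        for name, value in est.items():
            sq_err[name].append((value - Y_bar) ** 2)

    mse = {name: float(np.mean(errs)) for name, errs in sq_err.items()}
    re = {name: 100 * mse["mean"] / m for name, m in mse.items()}
    return re, mse


y, x, z = make_synthetic_population(2500, seed=42)
re, mse = simulate_two_phase(y, x, z, n1=200, n=25)
for name, value in re.items():
    print(f"{name:12s} RE = {value:6.1f}")
```

With a strongly correlated auxiliary variable, the regression estimator's empirical RE should sit well above 100; the simulated figures depend on the synthetic population and should not be read as results for Population II.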

Conclusions
We have observed that, even with the relatively low correlation values in population II, the proposed estimator outperformed the existing estimators in terms of the accuracy of the estimates. This suggests that the proposed estimator can be employed in a variety of conditions where significant correlations exist between the study variable and the auxiliary variables. Satellite-derived information combined with sample-based ground measurements can best be utilised with the proposed estimator, where the inter-relation of the auxiliary variables can be evaluated by means of regression. An appropriate cost function can be built to optimise the sampling design when cost is a limiting factor. Interested workers can evaluate the efficiency of the proposed estimator with LIDAR-derived auxiliary variables, such as those used by Wang and Glenn (2008) and Tesfamichael et al. (2010), which are expected to give an even better fit in the regression models and hence higher accuracy in the population estimates.

Appendix
Writing $T_p$ in terms of the e's, we get

To examine the relative efficiencies of the various estimators discussed in this paper, we considered three populations: two used earlier by other authors and one based on remote sensing data. Relative efficiency has been measured as
$$\mathrm{RE} = \frac{100\,V(\bar y)}{\mathrm{MSE}(\hat{\bar Y})}$$

Population I (Murthy, 1967) was used to test the efficiency of the proposed estimators. Here, N = 34; y: area under wheat in 1964; x: area under wheat in 1963; z: cultivated area in 1961.

Figure 2. Relation of timber volume with tree height

The following parameters for Population II were estimated from image and ground-based observations: $\bar Y = 4.63$, $\bar X = 21.09$, $\bar Z = 13$.

Expanding the right-hand side of xxx, retaining terms up to the first powers of the e's, taking expectations on both sides of xxx and then subtracting $\bar Y$ from both sides, we get the bias of the estimator up to the first order of approximation as

Table 1. Relative efficiency of the proposed and existing estimators