2003 Master Sample (MS) Design

Beginning July 2003, the former National Statistics Office (NSO) employed the 2003 master sample (MS) design in the conduct of its household surveys. The design of the 2003 MS drew extensively on the results of the 2000 Census of Population and Housing as well as on the results of past national surveys, such as the 2000 Family Income and Expenditure Survey (FIES), the 2001 Labor Force Survey (LFS), and the 1997 Family Planning Survey (FPS).

 


This note provides an overview and general description of the different aspects of the 2003 MS. More thorough discussions are given in the main technical documentation (The 2003 Master Sample Documentation).

A master sample is defined as a sample from which subsamples are drawn to serve the needs of several surveys. Master samples are usually employed for several surveys covering different themes that are integrated in terms of target population, sample design and field operations. The use of master samples promotes efficiency in the use of limited resources (e.g., a single cost for the development of the survey design and the preparation of sampling frames). It also allows the linking of the different survey variables, thereby creating a richer database for more meaningful and useful analyses. Usually, a master sample is an area sample of clusters of households referred to as Primary Sampling Units (PSUs).

With the availability of updated information for the general household population from the 2000 Census of Population and Housing, a redesign of the master sample was done.

Target Population

The 2003 master sample design covers all households in the Philippines, excluding institutional households as well as households in the Least Accessible Barangays (LABs).

For the 2003 MS, a barangay is classified as a LAB if: (a) there is no regular means of transportation (frequency of transportation is less than three times a week); (b) the cost of a one-way fare is more than 500 pesos; or (c) it takes more than 8 hours of walking to reach the barangay. The LABs were identified by the PSA (former NSO) field offices. The final list was determined after further consultation between the PSA (former NSO) Central Office MS project team and the field offices. A total of 350 barangays were classified as LABs and were excluded from the MS frame.
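The three criteria are joined by "or", so meeting any one of them is enough. A minimal sketch of the rule in Python follows; the function name and its three parameters are hypothetical and are used only for illustration, not taken from the MS documentation.

```python
def is_least_accessible(trips_per_week, one_way_fare_pesos, walking_hours):
    """Return True if a barangay meets any of the three LAB criteria."""
    no_regular_transport = trips_per_week < 3   # criterion (a)
    costly_fare = one_way_fare_pesos > 500      # criterion (b)
    long_walk = walking_hours > 8               # criterion (c)
    return no_regular_transport or costly_fare or long_walk


# Hypothetical barangays: daily transport and a cheap fare, then twice-weekly transport.
print(is_least_accessible(trips_per_week=7, one_way_fare_pesos=120, walking_hours=0))
print(is_least_accessible(trips_per_week=2, one_way_fare_pesos=120, walking_hours=0))
```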


Primary Sampling Units (PSU)

Do you know that…
  • There are 41,942 barangays in the country, 350 of which were considered least accessible barangays (LABs) and were excluded from the frame
  • The total number of PSUs formed from 41,592 barangays is 16,586
  • The average number of households in a 2003 MS PSU (or PSU size) is 923

A master sample is a sample of PSUs. A PSU, in turn, is a cluster of households with clear and stable boundaries, that is, boundaries that do not change rapidly over time. A PSU should also contain a sufficient number of households to support all the household surveys for which it will be used as a sample. The 2003 MS, for instance, needs PSUs with at least 500 households.

The barangays were found to be the most suitable administrative unit (in terms of number) to form the PSUs for the 2003 MS. However, more than half of the barangays do not satisfy the minimum size requirement (number of households) of an ideal PSU. Thus, “small” barangays were grouped with contiguous barangays within the municipality to form the desired PSUs.

A list of all the PSUs formed and their characteristics in terms of the stratification variables used is contained in the Master Sample Frame (MSF).

Domains

Survey estimates are generally needed for the nation as a whole as well as for various subgroups. These subgroups may refer to socio-demographic subdivisions that are usually spread throughout the population such as female-headed households by age of head or educational levels by age and sex, or geographic subdivision such as regions or provinces. Thus, the survey may be designed taking into consideration the provision of estimates with adequate level of precision for such subdivisions. At the design stage, geographic subdivisions are usually treated as domains. A domain refers to such subdivisions in which estimates of adequate precision are desired.

Based on past surveys and other available resources, most national surveys are able to produce estimates of adequate precision at the regional level only. The precision of estimates may be measured in several ways. One way is to construct a 95% confidence interval estimate (note that a wider confidence interval estimate is deemed imprecise and less useful).

Example: The estimated proportion of poor families for a given domain is 30%

Coefficient of Variation (CV)    Standard Error (SE)    95% Confidence Interval Estimate
10%                              3%                     30% ± (2*3%) ⇒ 24% to 36%
20%                              6%                     30% ± (2*6%) ⇒ 18% to 42%

The example above means that, with a CV of 10%, the 95% confidence interval for the true proportion of poor families runs from 24% to 36%; intervals constructed this way are expected to cover the true proportion about ninety-five percent of the time. With a CV of 20%, the interval runs from 18% to 42%. Notice that the interval widens as the CV or SE increases. A summary of the provincial and regional level CV values of the estimated proportion of poor families is shown in Table 1.
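The arithmetic behind the example can be sketched in a few lines of Python. The sketch assumes, as the example does, that SE = CV × estimate and that the 95% interval is approximated by the estimate plus or minus 2 SE; the function name is hypothetical.

```python
def approx_ci_95(estimate, cv):
    """Return (SE, lower, upper) using SE = CV * estimate and a +/- 2 SE interval."""
    se = cv * estimate
    return se, estimate - 2 * se, estimate + 2 * se


# Estimated proportion of poor families of 30%, with CVs of 10% and 20%.
for cv in (0.10, 0.20):
    se, lower, upper = approx_ci_95(0.30, cv)
    print(f"CV={cv:.0%}  SE={se:.0%}  interval={lower:.0%} to {upper:.0%}")
```

Doubling the CV doubles the SE and therefore doubles the width of the interval.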

Table 1. Distribution of Regional and Provincial Estimates of the Proportion of Poor Families Based on the Results of the 2000 Family Income and Expenditures Survey (FIES)

Range of CV Values    Number of Regional Estimates    %
<5%                   6                               35.3
5% - 10%              11                              64.7
Total                 17                              100.0

Range of CV Values    Number of Provincial Estimates    %
<5%                   3                                 3.7
5% - 10%              36                                43.9
10% - 15%             33                                40.2
15% - 20%             8                                 9.8
20% - 25%             1                                 1.2
>25%                  1                                 1.2
Total                 82                                100.0

Source of Primary Data: PSA (former NSO), 2000 FIES

For domain specification, an estimate is considered precise if the CV value of the estimated proportion of poor families does not exceed 10%. This criterion was used in specifying regions as domains of the MS. Note that in Table 1, only 39 out of 82 provincial estimates of the proportion of poor households yielded CV values less than 10%.

The importance of generating provincial level estimates was seriously considered in defining the major sampling domains for the MS. However, generating provincial level estimates with adequate precision requires a larger sample size, which is usually not feasible or sustainable given the resources available for the survey.

With regions as domains, the computed total sample size that would give the desired reliability in the estimates for each domain is manageable. In particular, the required sample size per region was computed so that the expected CV of the estimated proportion of poor households would not exceed 5% except in the NCR where the CV value was set to 10%. The exception was made through the observation that the estimated proportion of poor households in NCR is small (around 8%). The total sample size computed that satisfies this reliability condition is about 43,000 households. If provinces were to be specified as domains, the total sample size requirement would be much larger than this.
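To see why a small proportion such as the 8% in NCR drives up the sample size, consider the following rough sketch. It assumes simple random sampling, under which the CV of an estimated proportion p is approximately sqrt((1 - p) / (n p)), plus an optional design effect multiplier. The function name and the `deff` parameter are hypothetical, and the resulting figures are illustrative only; they are not the sample sizes computed for the 2003 MS.

```python
def required_sample_size(p, target_cv, deff=1.0):
    """Approximate households needed so that CV(p_hat) is about target_cv.

    Solves target_cv**2 = (1 - p) / (n * p) for n, then inflates by deff
    (a hypothetical design effect; 1.0 means simple random sampling).
    """
    return deff * (1 - p) / (p * target_cv ** 2)


# Proportion of poor households of about 8%, at two target CVs.
print(round(required_sample_size(0.08, 0.05)))
print(round(required_sample_size(0.08, 0.10)))
```

Under these assumptions, halving the target CV multiplies the required sample size by four, which is consistent with relaxing the target to 10% where the proportion being estimated is small.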

Sample Allocation

The procedure for allocating the total sample size among the domains directly affects the precision of the estimates. The allocation has to serve two important purposes:

  • The need to generate precise estimates at the national level or for subclasses of the population that cut across domains. Examples of subclass estimates are the proportion of poor households among female-headed households or the employment rate by major industry classification (e.g. agriculture, manufacturing, etc.). For this purpose, allocating the sample proportional to the total number of households in the domain is considered the best solution.
  • The need to generate precise estimates at the domain level for purposes of comparison. In this case, allocating the total sample size equally across domains is the best solution.

Clearly, the best solutions for each of the two concerns are not consistent with one another. Because of this, a compromise allocation scheme was used. In particular, the Kish Allocation Scheme was used to allocate the total sample size to each domain.

The final sample size per region was further adjusted (increased) to account for projected non-response and population growth. These adjustments resulted in a total sample size of about 47,000 households.

Under the Kish Allocation Scheme, the sample size in each domain, denoted by n_d, is determined by

\[ n_d = n \cdot \frac{\sqrt{W_d^2 + 1/H^2}}{\sum_{h=1}^{H} \sqrt{W_h^2 + 1/H^2}} \qquad \text{(Equation 1)} \]

where:

n - total sample size (about 43,000);
H - number of specified domains/regions (=17); and
W_d = N_d / N - proportion of the total household population (N) found in region d.

Note that Equation 1 gives equal importance to the two allocation concerns mentioned.
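A minimal sketch of the allocation in Python follows. It assumes that Equation 1 has the commonly cited form of Kish's compromise allocation with equal weights on the two concerns, n_d = n · sqrt(W_d² + 1/H²) / Σ sqrt(W_h² + 1/H²). The function name, the three domains, and their household counts are hypothetical and chosen only to show the behavior.

```python
import math


def kish_allocation(n, household_counts):
    """Allocate total sample size n across domains by Kish's compromise rule.

    household_counts maps each domain to its number of households (N_d).
    """
    H = len(household_counts)
    N = sum(household_counts.values())
    scores = {
        d: math.sqrt((N_d / N) ** 2 + 1 / H ** 2)
        for d, N_d in household_counts.items()
    }
    total_score = sum(scores.values())
    return {d: n * s / total_score for d, s in scores.items()}


# Hypothetical domains; proportional allocation would give 1800 / 900 / 300,
# and equal allocation would give 1000 each.
allocation = kish_allocation(3000, {"R1": 600_000, "R2": 300_000, "R3": 100_000})
for domain, n_d in allocation.items():
    print(domain, round(n_d))
```

Each domain's allocation falls between its proportional share and its equal share, so large domains give up some sample to small ones without the design collapsing to equal allocation.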

Number of PSUs per Domain/Region

The number of sampled PSUs per domain was computed by simply dividing the allocated sample size by the desired sample size per PSU. The desired sample size per PSU was determined using: (1) the information on the cost of data collection efforts in the region; and (2) the indication of similarity or homogeneity of the households within the PSU. The basic idea is to take a smaller sample per PSU when the households within PSUs are homogeneous and when data collection is more expensive. With this information, gathered from past survey results, the number of sample households from each PSU was set at 16 for areas outside the National Capital Region (NCR) and 12 for the NCR. This means that for NCR, the total number of PSUs is equal to the allocated sample size divided by 12. For the other regions, it is equal to the allocated sample size divided by 16.
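The division can be sketched as follows. The function name is hypothetical, and the sketch returns the unrounded quotient because the rounding rule is not stated here. The Region 3 figure is the original allocated sample size from Table A; the NCR figure is a made-up value used only to show the divisor of 12.

```python
def sample_psus(allocated_households, is_ncr=False):
    """Unrounded number of sample PSUs: allocated households / households per PSU."""
    households_per_psu = 12 if is_ncr else 16
    return allocated_households / households_per_psu


# Region 3: original allocation of 3,726 households (Table A).
print(sample_psus(3726))
# Hypothetical NCR allocation of 6,000 households.
print(sample_psus(6000, is_ncr=True))
```

For Region 3 the quotient is close to the 233 sample PSUs listed in Table A.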

Definition

SR PSU or Self-Representing Primary Sampling Unit – a very large PSU in the region/domain with a selection probability of approximately 1 or higher, which is included outright in the MS; it is properly treated as a stratum; also known as a certainty PSU

NSR PSU or Non-Self-Representing Primary Sampling Unit – a regular to small sized PSU in a region/domain; also known as a non-certainty PSU

The final number of sample PSUs for each domain was determined by first classifying PSUs as either self-representing (SR) or non-self-representing (NSR). In addition, to facilitate the selection of subsamples, the total number of NSR PSUs in each region was adjusted to make it a multiple of 4.

The 2003 MS consists of a sample of 2,826 PSUs. The sample size distribution across regions and provinces is shown in the attached Table A.

Stratification of PSUs

Stratification involves the division of the entire population into non-overlapping subgroups called strata, from which samples are selected independently. This procedure is done to:

  • Improve the efficiency of the estimates as a result of combining units that are similar in characteristic. This means improving on the precision of the estimates for a given sample size.
  • Provide samples for specific subgroups of the population in which separate estimates are desired.

The stratification procedure used in the 2003 MS is described in Diagram A.

[Diagram A. Stratification procedure used in the 2003 MS]

A total of 955 explicit strata were formed, 330 of which were the SR PSUs.

1 This allows the generation of either direct or indirect subregional estimates
2 Proportion of strongly built houses
3 An indication of the proportion of households engaged in agriculture
4 Per capita municipal income

Sample Selection

In each explicit stratum, a sample of PSUs, and then a sample of enumeration areas (EAs) within the sampled PSUs, was selected with probability proportional to size (PPS), where size is the number of households enumerated in the 2000 Census of Population and Housing (CPH). Within each sampled EA, a sample of housing units was selected with equal probability. All households in the sampled housing units are completely enumerated, except in the few cases where a housing unit has more than three households. For operational considerations, the maximum number of households that can be enumerated in each sampled housing unit is three. In the case of SR PSUs, the EAs were the PSUs and a minimum of two EAs were selected with PPS to ensure valid estimation of the variances.
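One common way to draw a PPS sample is systematic selection along the cumulated sizes. The text does not say which PPS method was used for the 2003 MS, so the sketch below is an illustration of the general idea under that assumption, not a description of the actual procedure. The function name, its parameters, and the household counts are hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, start=None, rng=None):
    """Select n units with probability proportional to size (systematic PPS).

    Returns indices into `sizes`. A unit larger than the sampling interval
    can be hit more than once; such units behave like certainty (SR) units.
    """
    total = sum(sizes)
    interval = total / n
    if start is None:
        rng = rng or random.Random()
        start = rng.random() * interval  # random start in [0, interval)
    cumulative = list(accumulate(sizes))
    selected, i = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulated size range contains the point.
        while cumulative[i] <= point and i < len(cumulative) - 1:
            i += 1
        selected.append(i)
    return selected


# Hypothetical census household counts for four PSUs in one stratum.
households = [500, 1500, 1000, 1000]
print(pps_systematic(households, n=2, start=250))  # fixed start for reproducibility
```

With a random start, a unit smaller than the sampling interval is selected with probability n × size / total, which is what "proportional to size" means here.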

Formation of Replicates

Another important feature of the 2003 MS design is its flexibility to meet the needs of different surveys. Some surveys require only a smaller sample, and thus there is a need to subsample from the master sample. To facilitate the selection of subsamples, the MS was divided into four replicates. A replicate is defined as a subsample that possesses the properties of the full master sample, such that each replicate is able to generate national level estimates of adequate precision.

For the NSR PSUs, each of the four PSUs in every stratum is assigned to one replicate. In the case of SR PSUs, on the other hand, the EAs were distributed to the replicates in such a way that a balance between two half samples (each consisting of two replicates) can be achieved. A balanced distribution of the EAs of the SR PSUs across all four replicates cannot be achieved because most of the SR PSUs have only two EAs.
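For the NSR case, the assignment can be sketched as below. The sketch assumes the four sampled PSUs of a stratum are assigned to replicates 1 to 4 at random; the text does not state how the assignment was made. The function name and PSU labels are hypothetical.

```python
import random


def assign_replicates(psu_ids, rng=None):
    """Assign the four sampled NSR PSUs of one stratum to replicates 1-4."""
    if len(psu_ids) != 4:
        raise ValueError("expected exactly four sampled PSUs in the stratum")
    rng = rng or random.Random()
    replicates = [1, 2, 3, 4]
    rng.shuffle(replicates)  # random order is an assumption of this sketch
    return dict(zip(psu_ids, replicates))


print(assign_replicates(["psu_a", "psu_b", "psu_c", "psu_d"], rng=random.Random(2003)))
```

Because every stratum contributes exactly one PSU to each replicate, each replicate spans all NSR strata.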

Selection of Subsamples

Several options are available in the selection of subsamples from the new master sample. These options depend on whether the survey is done together with the regular Labor Force Survey (LFS) or as a stand-alone survey.

  • If a survey that requires only a subsample is conducted together with the LFS, then it is more efficient to select a subsample of housing units within a PSU. For instance, suppose the total number of sampled housing units within a PSU is 16, a quarter sample is drawn by selecting 4 housing units from among the 16 with equal probability.
  • If the survey is to be conducted independently of the LFS, then it is more efficient to select a subsample of PSUs rather than a subsample of housing units in all PSUs. The subsampling of PSUs can be done by selecting one or more replicates. For instance, if a 50% sample is desired, then this can be achieved by selecting two replicates. This applies to both SR and NSR PSUs.
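The two options above can be sketched as follows. Both function names, the housing unit numbering, and the PSU labels are hypothetical; the sketch assumes simple random selection without replacement for the housing units.

```python
import random


def quarter_sample(housing_units, rng=None):
    """Option 1: draw a quarter of a PSU's sampled housing units with equal probability."""
    rng = rng or random.Random()
    return rng.sample(housing_units, len(housing_units) // 4)


def psus_in_replicates(psu_replicate, chosen_replicates):
    """Option 2: keep only the PSUs that belong to the chosen replicates."""
    return [psu for psu, rep in psu_replicate.items() if rep in chosen_replicates]


# Option 1: 4 housing units out of the 16 sampled in one PSU.
print(quarter_sample(list(range(1, 17)), rng=random.Random(7)))

# Option 2: a 50% sample of PSUs, obtained by taking two of the four replicates.
stratum = {"psu_a": 1, "psu_b": 2, "psu_c": 3, "psu_d": 4}
print(psus_in_replicates(stratum, {1, 2}))
```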

 

Estimation Procedures

The generation of the survey weights for each responding element is one of the key activities in generating estimates using the MS. The weight may be interpreted as the relative importance given to the responding unit in the generation of estimates. It can also be interpreted as the number of units in the population that each responding unit represents. Basically, the final survey weight is defined as the product of: (1) the base weight; (2) the nonresponse adjustment weight; and (3) the weight adjustment based on known population totals, or simply the post-stratification weight. The base weight is determined by taking the inverse of the selection probabilities of each unit of analysis. The nonresponse adjustment weight is determined by taking the inverse.
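The product of the three components can be sketched as follows. The text does not say what the nonresponse adjustment is the inverse of; the sketch assumes the response rate within an adjustment cell, which is a common choice. The function name, parameter names, and numbers are hypothetical.

```python
def final_weight(selection_prob, response_rate, poststrat_factor):
    """Final survey weight = base weight x nonresponse adjustment x post-stratification."""
    base_weight = 1 / selection_prob
    # Assumption: nonresponse adjustment is the inverse of the response rate.
    nonresponse_adjustment = 1 / response_rate
    return base_weight * nonresponse_adjustment * poststrat_factor


# A household selected with probability 1 in 500, in a cell with a 90% response
# rate, and a post-stratification factor of 1.05 (all values hypothetical).
print(round(final_weight(selection_prob=0.002, response_rate=0.9, poststrat_factor=1.05), 1))
```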

Rotation of Samples

The MS will be used for a period of 10 years. As such, sample elements need to be replaced by a new set at certain points in time. Retaining the original sample elements would create problems such as response burden that would eventually affect the overall quality of the survey results. In addition, units that are repeatedly interviewed are more likely to become non-respondents. A solution to this problem is to devise a sample rotation plan so that a unit stays in the sample for some period and is then replaced permanently by a new set of sample units. To facilitate a sample replacement scheme, each replicate will form a panel. In each PSU, all units were divided into rotation groups of equal size. The sample replacement scheme is such that, every quarter of the year, a new rotation group in each panel will be selected. However, to maximize the effect of the correlation of the estimates between years, 50% of the panels will have common samples for a quarter in consecutive years. For illustration, refer to the proposed sample rotation design in Table 2.

Future Direction

The completion of the research for the 2003 master sample design directed the PSA (former NSO), through the Statistical Methodology Unit (SMU), to conduct other related research studies. For 2004, the lineup of research studies is as follows:

  • Validation of Raking Procedure used for LFS Estimates;
  • Provincial Estimation of Unemployment Using Aggregated Four Quarter Samples;
  • Comparison of Estimates (levels/rates and precision) Using Old and New Nonresponse Adjustment Procedure; and
  • Comparison of the number of households obtained in C2K and CA/CF listing by EA.

 

Table 2. Sample rotation design from 2004 to 2008

Year    Quarter    Sample/Rotation Cluster*
2004    January    A1     B1
        April      A2     B2
        July       A3     B3
        October    A4     B4
2005    January    A1     B5
        April      A2     B6
        July       A3     B7
        October    A4     B8
2006    January    A5     B5
        April      A6     B6
        July       A7     B7
        October    A8     B8
2007    January    A7     B7
        April      A6     B9
        July       A5     B10
        October    A8     B11
2008    January    A9     B12
        April      A10    B9
        July       A11    B10
        October    A12    B11

* Numbers represent rotation groups formed for the housing units within the sampled EAs and letters represent rotation clusters. Rotation cluster A includes replicates 1 and 2, while rotation cluster B includes replicates 3 and 4.

Table A: Sample size distribution by region and province. 2003 PSA (former NSO) Master Sample.

Columns: (1) Region / Province; (2) Total population; (3) No. of households; (4) No. of PSUs; (5) Allocated sample size, original; (6) Allocated sample size, adjusted for nonresponse; (7) No. of sample PSUs; Final PSU allocation: (8) SR PSUs, (9) NSR PSUs, (10) Total PSUs
          
PHILIPPINES 76,311,169 15,312,424 16,579 43,882 46,976 2,835 330 2,496 2,826
                   
REGION 1 4,192,048 837,348 1,199 2,408 2,543 150 0 148 148
Ilocos Norte 513,850 108,477 164 312 329 19 0 20 20
Ilocos Sur 589,797 119,270 197 343 362 21 0 20 20
La Union 655,651 131,140 189 377 398 24 0 24 24
Pangasinan 2,432,840 478,461 649 1,376 1,453 86 0 84 84
                   
REGION 2 2,776,100 568,347 839 2,085 2,240 130 0 132 132
Batanes 16,548 3,489 5 13 14 1 0 48 48
Cagayan 969,824 196,046 292 719 773 45      
Isabela 1,172,502 239,624 362 879 944 55 0 56 56
Nueva Vizcaya 362,603 75,920 109 279 299 17 0 16 16
Quirino 144,203 29,904 46 110 118 7 0 8 8
Santiago City 110,420 23,364 25 86 92 5 0 4 4
                   
REGION 3 8,228,567 1,676,713 1,780 3,726 3,882 233 7 224 231
Bataan 556,930 113,596 135 252 263 16 0 16 16
Bulacan 2,235,626 465,743 420 1,035 1,078 65 1 64 65
Nueva Ecija 1,659,257 342,216 447 761 792 48 0 48 48
Pampanga 1,629,273 310,483 309 690 719 43 3 40 43
Tarlac 1,077,289 217,940 259 484 505 30 1 28 29
Zambales 431,625 91,269 115 203 211 13 0 12 12
Aurora 172,963 34,896 50 78 81 5 0 4 4
Angeles City 271,383 57,367 30 127 133 8 1 8 9
Olongapo City 194,221 43,203 15 96 100 6 1 4 5
                   
CALABARZON 9,383,464 1,936,232 1,842 4,181 4,346 261 32 232 264
Batangas 1,908,864 378,091 485 816 849 51 0 52 52
Cavite 2,090,786 436,356 459 942 979 59 5 56 61
Laguna 1,972,247 419,163 350 905 941 57 6 52 58
Quezon 1,473,460 298,778 399 645 671 40 0 40 40
Rizal 1,746,603 364,886 126 788 819 49 20 28 48
Lucena City 191,504 38,958 23 84 87 5 1 4 5
                   
MIMAROPA 2,253,006 452,790 596 1,974 2,052 123 2 124 126
Marinduque 216,887 43,892 67 191 199 12 0 12 12
Occidental Mindoro 368,210 74,420 84 324 337 20 1 20 21
Oriental Mindoro 676,651 133,971 191 584 607 36 0 36 36
Palawan 728,723 147,069 179 641 666 40 1 40 41
Romblon 262,535 53,438 75 233 242 15 0 16 16
                   
REGION 5 4,659,730 892,720 1,242 2,483 2,667 155 0 156 156
Albay 1,083,327 208,039 300 579 621 36 0 36 36
Camarines Norte 465,098 90,982 126 253 272 16 0 16 16
Camarines Sur 1,408,937 261,686 383 728 782 45 0 44 44
Catanduanes 215,616 41,109 61 114 123 7 0 8 8
Masbate 709,737 140,458 199 391 420 24 0 24 24
Sorsogon 644,364 124,944 152 347 373 22 0 24 24
Naga City 132,651 25,502 21 71 76 4 0 4 4
                   
REGION 6 6,227,183 1,220,660 1,503 2,970 3,282 186 7 176 183
Aklan 456,822 89,375 125 217 240 14 0 12 12
Antique 454,149 90,186 135 219 242 14 0 12 12
Capiz 652,955 128,479 183 313 345 20 0 20 20
Iloilo 1,554,030 298,585 468 726 803 45 0 44 44
Negros Occidental 2,163,915 423,839 420 1,031 1,140 64 0 64 64
Guimaras 140,741 27,496 39 67 74 4 0 4 4
Iloilo City 363,706 72,459 91 176 195 11 0 12 12
Bacolod City 440,865 90,241 42 220 243 14 7 8 15