Administrative Records for Survey Methodology. Group of authors

[…] that sums to both Y(r) and […] marginally. The technique has many applications, including small area estimation (Purcell and Kish 1980) and statistical matching (D’Orazio, Di Zio, and Scanu 2006; Zhang 2015a); more in Section 1.3.2.

      A key difference between the asymmetric-linked setting and the asymmetric-unlinked setting discussed above is that one generally does not expect a benchmarked adjustment method based on unlinked data to yield unbiased results below the level where the benchmarks are imposed. For instance, the repeated weighting of Renssen and Nieuwenbroek (1997) can yield design-consistent domain estimates subject to population benchmark totals, because the overlapping survey variables are both considered as the target measure here and no relevance bias is admitted. However, when the same technique is applied to reweight a register dataset, e.g. with the initial weights all set to 1, one cannot generally claim design- or model-based consistency below the level of the imposed benchmarks, regardless of whether the benchmarks themselves are true or unbiased from either the design- or model-based perspective. Similarly, under suitable assumptions, the one-number census imputation can yield model-consistent estimates below the level of the imposed constraints, because the donor records are taken from the enumerated census records that are considered to provide the target measures. However, the model-consistency would break down when the donor pool is a register dataset that suffers from relevance bias, even if all the other “suitable” assumptions are retained. Assessment of the statistical uncertainty associated with benchmarked adjustment is therefore an important research topic. An illustration in the contingency table case is given in Section 1.3.2.

      1.3.2 Uncertainty Evaluation: A Case of Two-Way Data

      For the asymmetric-linked setting, suppose there is available an observed sample two-way classification of (a, j). For survey weighting, let s denote the sample and let di = 1/πi be the sampling weight of unit i ∈ s, where πi is the inclusion probability. Let yi(a, j) = 1 if sample unit i ∈ s has classification (a, j) according to the target measure and yi(a, j) = 0 otherwise; let xi(a, j) = 1 if it has classification (a, j) according to the proxy measure and xi(a, j) = 0 otherwise. Post-stratification with respect to X then yields the post-stratification weight, say,

$$w_i = d_i \sum_{a,j} x_i(a,j) \frac{X_{aj}}{\hat{X}_{aj}}, \quad \text{where} \quad \hat{X}_{aj} = \sum_{i \in s} d_i x_i(a,j).$$

      This is problematic when there are empty and very small sample cells of (a, j). The raking ratio weight can then be given by

$$w_i = d_i \sum_{a,j} x_i(a,j) \frac{\tilde{X}_{aj}}{\hat{X}_{aj}},$$

where $\tilde{X}_{aj}$ is derived by the IPF of $\hat{X}_{aj}$ to row and column totals Xa+ and X+j, respectively. Deville, Särndal, and Sautory (1993) provide an approximate variance of the raking ratio estimator, say,

$$\hat{Y}_{aj} = \sum_{i \in s} w_i y_i(a,j),$$

where $w_i$ is the raking ratio weight.
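      The two weighting schemes above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the population table `X`, the 20-unit sample, the constant design weight of 10, and the helper names `ipf` and `weighted_table` are all hypothetical, and the IPF routine assumes every row and column of the starting table has a positive sum.

```python
import numpy as np

def ipf(seed, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting of a two-way table to given margins.

    Assumes every row and column of `seed` has a positive sum.
    """
    table = np.array(seed, dtype=float)
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        table *= (row_totals / table.sum(axis=1))[:, None]  # match row totals
        table *= (col_totals / table.sum(axis=0))[None, :]  # match column totals
        if np.abs(table.sum(axis=1) - row_totals).max() < tol:
            break
    return table

def weighted_table(weights, rows, cols, shape=(2, 2)):
    """Sum unit weights into the cells of a two-way table."""
    out = np.zeros(shape)
    np.add.at(out, (rows, cols), weights)
    return out

# Hypothetical population counts X_aj by the proxy measure (2 areas x 2 categories).
X = np.array([[40.0, 60.0], [30.0, 70.0]])

# Hypothetical sample of 20 units, each with design weight d_i = 10.
a = np.repeat([0, 0, 1, 1], [5, 5, 2, 8])         # area of each unit
j_proxy = np.repeat([0, 1, 0, 1], [5, 5, 2, 8])   # category by the proxy measure
j_target = j_proxy.copy()
j_target[[0, 12]] = 1 - j_target[[0, 12]]         # two units differ on the target measure
d = np.full(a.size, 10.0)

# Weighted sample estimate of the proxy table.
X_hat = weighted_table(d, a, j_proxy)

# Post-stratification weights: d_i * X_aj / X_hat_aj for the unit's proxy cell.
w_post = d * X[a, j_proxy] / X_hat[a, j_proxy]

# Raking ratio weights: replace X_aj by the IPF of X_hat to the margins of X.
X_tilde = ipf(X_hat, X.sum(axis=1), X.sum(axis=0))
w_rake = d * X_tilde[a, j_proxy] / X_hat[a, j_proxy]

# Raking ratio estimate of the target table Y_aj.
Y_hat = weighted_table(w_rake, a, j_target)
```

      The post-stratification weights reproduce every proxy cell count, which is why they break down when a sample cell is empty (the ratio is undefined). The raking ratio weights only reproduce the row and column totals, so they need the margins of the sample table, not every cell, to be well populated.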

      A drawback of the weighting approach above is that no estimate of Yaj is available when the sample cell (a, j) is empty, and the estimate has a large sampling variance when the sample cell (a, j) is small. This is typically the situation in small area estimation, where, e.g. a is the index of a large number of local areas. Zhang and Chambers (2004) and Luna-Hernández (2016) develop a prediction modeling approach.

      The within-area composition (Ya1, Ya2, …, YaJ) is related to the corresponding proxy composition (Xa1, Xa2, …, XaJ) by means of a structural equation

$$\alpha_a^Y = \beta \alpha_a^X,$$

      where $\alpha_a^Y = (\alpha_{a1}^Y, \ldots, \alpha_{aJ}^Y)^\top$ is the area-vector of interactions on the log scale, i.e. […], where […] = […], and similarly for $\alpha_a^X$, and β is a matrix of unknown coefficients that sum to zero by row and by column.

      The structural equation can be used to specify a generalized linear model of the observed sample cell counts, or their weighted totals, which allows one to estimate β and Y. It is further possible to develop the mixed-effects modeling approach that is popular in small area estimation, by introducing the mixed structural equation

$$\alpha_a^Y = \beta \alpha_a^X + v_a,$$

where $v_a$ is a vector of area-level random effects. The associated uncertainty can then be evaluated under the postulated model. The prediction modeling approach can thus improve on the survey weighting approach in the presence of empty and very small sample cells.

      For an example under the asymmetric-unlinked setting, consider the Norwegian register-based household statistics. When the household register was first introduced for the year 2005, about 6% of persons still had a missing dwelling identification in the Central Population Register. As the missing rate differed by local area as well as household type, direct tabulation did not yield acceptable results compared to the Census 2001 outputs. The IPF was applied to the sub-population of households that have the dwelling identification to yield a weight for every such household. The method falls under the benchmarked adjustment approach. However, direct evaluation of the associated uncertainty is not straightforward. Zhang (2009b) extends the prediction modeling approach above to accommodate informative missing data. By comparison with the model-based predictions, one can indirectly assess the benchmarked adjustment results.

      Using the IPF for small area estimation is known as structure preserving estimation (SPREE, Purcell and Kish 1980). The model underpinning the SPREE is a special case of the prediction models mentioned above, i.e. by setting β = 1. It does not require linkage between the proxy data X and the data that yield the benchmarks Ya+ and Y+j. While this is convenient for deriving the estimates, a difficulty arises when it comes to uncertainty evaluation directly under the SPREE model. See also Dostál et al. (2016) for a benchmarked adjustment method based on the chi-squared measure in this respect.
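      The "structure preserving" property can be checked numerically. The sketch below is an illustration under stated assumptions, not the published SPREE code: the proxy table `X`, the benchmark margins `Y_row` and `Y_col`, and the helper names are hypothetical, and the interactions are taken to be the double-centred logs of the cell counts, which is one common definition and may differ in detail from the one used in the text.

```python
import numpy as np

def ipf(seed, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Fit a two-way table to given margins by iterative proportional fitting.

    Assumes all cells of `seed` are positive.
    """
    table = np.array(seed, dtype=float)
    for _ in range(max_iter):
        table *= (row_totals / table.sum(axis=1))[:, None]
        table *= (col_totals / table.sum(axis=0))[None, :]
        if np.abs(table.sum(axis=1) - row_totals).max() < tol:
            break
    return table

def interactions(table):
    """Two-way interactions on the log scale, taken here as double-centred logs."""
    logs = np.log(table)
    return (logs
            - logs.mean(axis=1, keepdims=True)
            - logs.mean(axis=0, keepdims=True)
            + logs.mean())

# Hypothetical proxy table X_aj (3 areas x 2 categories), e.g. from a register.
X = np.array([[20.0, 30.0], [50.0, 10.0], [40.0, 40.0]])

# Hypothetical benchmarks Y_a+ and Y_+j from a separate source; no unit-level
# linkage to X is needed, only that both margins share the same grand total.
Y_row = np.array([60.0, 70.0, 90.0])
Y_col = np.array([120.0, 100.0])

# SPREE estimate: the proxy table raked to the benchmark margins.
Y_spree = ipf(X, Y_row, Y_col)

# IPF only rescales rows and columns, so the interactions of X carry over.
preserved = np.allclose(interactions(Y_spree), interactions(X))
```

      Because each IPF step multiplies a whole row or a whole column by a constant, the log of every cell changes by a row term plus a column term, and double-centring removes both. The estimate therefore inherits all of its association structure from the proxy table, which is what setting the interactions of Y equal to those of X amounts to.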

      Finally, let Y

