Administrative Records for Survey Methodology. Group of authors

three of the examples of linked data provided in this paper rely on some version of secure data enclaves to provide microdata access to approved researchers. HRS data are made available to tenure-track researchers who sign a data use agreement and provide documentation of a secure local computing environment. An additional option for HRS data is to visit the Michigan Center on the Demography of Aging data enclave, which, like those of many NSOs, makes data accessible to researchers in a physical enclave at “headquarters.” More recently, HRS has started to offer secure VDI access to researchers. The confidential data underlying the SSB, and against which validation requests are run, are also accessible either within the FSRDC network or by sending validation requests by email to staff at Census headquarters (a form of “remote processing”). LEHD microdata are only available through the FSRDC.

      An open question is whether the disclosure risks addressed through physical security measures are greater for linked data. Enabling researchers to compute heuristic disclosure risk measures, such as the n cell count or the p-percent rule (O’Keefe et al. 2013), becomes more important when any possible combination of k variables (k large) leads to small cells or dominated cells. Even subject matter experts cannot assess these situations a priori.
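
      The two heuristics named above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the thresholds (n = 3, p = 10), and the cell contents are hypothetical and are not taken from any agency's rules. The p-percent rule is written in its common textbook form, which flags a cell when the contributions other than the two largest sum to less than p percent of the largest contribution.

```python
from typing import Dict, Sequence


def fails_min_count(contributions: Sequence[float], n: int = 3) -> bool:
    """Return True if the cell has fewer than n contributors (n is an assumed threshold)."""
    return len(contributions) < n


def fails_p_percent(contributions: Sequence[float], p: float = 10.0) -> bool:
    """Return True if the cell is dominated under the p-percent rule.

    Assumes non-negative contributions. The cell is flagged when the sum of all
    contributions except the two largest is below p percent of the largest one,
    i.e. the second-largest contributor could estimate the largest too closely.
    """
    ordered = sorted(contributions, reverse=True)
    if not ordered:
        return False  # an empty cell has no contributor to protect
    largest = ordered[0]
    remainder = sum(ordered[2:])
    return remainder < (p / 100.0) * largest


# Hypothetical cells: each list holds the contributions of the units in one cell.
cells: Dict[str, Sequence[float]] = {
    "cell_a": [50.0, 40.0, 30.0, 20.0],   # balanced cell
    "cell_b": [900.0, 50.0, 20.0, 10.0],  # dominated by one large unit
    "cell_c": [10.0, 12.0],               # too few contributors
}

flags = {
    name: {
        "min_count": fails_min_count(values, n=3),
        "p_percent": fails_p_percent(values, p=10.0),
    }
    for name, values in cells.items()
}

for name, result in flags.items():
    print(name, result)
```

      With many linking variables, the number of cells to screen grows combinatorially, which is why such checks are typically automated rather than assessed by inspection.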

      2.4.5 Data Silos

      Such administrative barriers may also be driven by ethical or confidentiality concerns. The question of consent by survey or census respondents may explicitly prevent the linkage of their survey responses or of their biological specimens with other data. For example, the Canadian Census long form of 2006 offered respondents the option either to answer survey questions on earnings or to consent to the linkage of their tax data on earnings. In the 2016 census, the question was no longer asked, and users were simply notified that linkage would happen.

      In the case of the LEHD data, as of December 2015, all 50 states as well as the District of Columbia had signed agreements with the Census Bureau to share data and produce public-use statistics. It would thus seem possible for researchers to access a comprehensive LEHD jobs database through the FSRDC network, by linking together the job databases from 51 administrative entities. However, all but 12 of the states had declined to automatically extend the right to use the data to external researchers within the FSRDC network. Nevertheless, some of the same states that declined to provide such permission in the FSRDC give access to researchers through their state data centers or other means. The UI state-level data are thus siloed, and researchers may be faced with nonrepresentative data on the American job market. Several European projects, such as Data without Boundaries (DwB), have investigated cross-national access with elevated expectations but relatively limited success (Schiller and Welpton 2014; Bender and Heining 2011). Increasingly, the U.S. Census Bureau and CASD also host data from other data providers, through collaborative agreements, moving toward a reduction of the siloing of data.

      Secure multiparty computing may be one solution to this problem (Sanil et al. 2004; Karr et al. 2005, 2006, 2009). However, implementation of such methods, at least in the domain of the social and medical sciences cooperating with NSOs, is in its infancy (Raab, Dibben, and Burton 2015). The typical limitations are the throughput of the secure interconnection between the sources and the requirement of manual model output checking. These limitations drastically slow down any iterative procedure.
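
      To make the idea concrete, the following is a minimal sketch of one basic building block of secure multiparty computation, a secure sum based on additive secret sharing. It is not claimed to be the specific protocol of the works cited above. The function names, the modulus, and the example values are assumptions, and the parties are simulated in a single process, whereas a real deployment would exchange shares over secure channels between data custodians.

```python
import secrets
from typing import List

# Assumption: the true total is a non-negative integer smaller than MODULUS.
MODULUS = 2**61 - 1


def make_shares(value: int, n_parties: int, modulus: int = MODULUS) -> List[int]:
    """Split value into n_parties random additive shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values: List[int], modulus: int = MODULUS) -> int:
    """Simulate a secure sum: no single share or partial sum reveals a party's value."""
    n = len(private_values)
    # Row i holds the shares that party i creates, one destined for each party.
    share_matrix = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds the shares it received (column j) and publishes only that partial sum.
    partial_sums = [sum(row[j] for row in share_matrix) % modulus for j in range(n)]
    # The published partial sums add up to the true total modulo the modulus.
    return sum(partial_sums) % modulus


# Hypothetical counts held by three separate data custodians.
counts = [120, 45, 300]
total = secure_sum(counts)
print(total)
```

      Each round of such a protocol requires every party to exchange messages with the others, which illustrates why the throughput of the interconnection becomes a bottleneck for iterative procedures.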

      In conclusion, we note that from a theoretical perspective, there does not appear to be a clear distinction between the threats to confidentiality in linked data relative to unlinked data, or in survey data relative to administrative data. Richly detailed data pose disclosure risks, irrespective of whether that richness is inherent in the data design or comes from linkages of variables from multiple sources. Likewise, there are no special methods to protect confidentiality in linked versus unlinked data. Any data with a network, relational, panel, or hierarchical structure pose special challenges to data providers seeking to protect confidentiality while preserving analytical validity. Our example of the QWI shows one way this challenge has been successfully managed in a linked data setting, but the same tools could be effective in application to the QCEW, which uses the same frame but does not involve worker-firm linkages.

      However, from a legal perspective, linking two datasets can change the nature of confidentiality protection in a more practical manner. Any output must conform to the strongest privacy protections required across each of the linked datasets. For example, when the LEHD program links SSA data on individuals to IRS data on firms, any downstream research must comply with the confidentiality demands of all three agencies. Likewise, the data must conform to the U.S. Census Bureau publication thresholds for data involving individuals and firms. Hence, linking data can produce a maze of confidentiality requirements that are difficult to articulate, comply with, and monitor. Harmonizing or standardizing such requirements and practices across data providers, both public and private, and across jurisdictions would be helpful. Privacy and confidentiality issues also invite updated and continuing research on the demand for privacy from citizens and businesses, as well as the social benefit that arises from the dissemination of data.
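
      The principle that output must satisfy the strongest protection across all linked sources can be illustrated with a deliberately simplified sketch. It assumes that each provider's requirement can be reduced to a single minimum cell size, which real confidentiality rules generally cannot. The provider labels and threshold values below are hypothetical and do not represent the rules of any actual agency.

```python
from typing import Dict


def strictest_min_cell_size(requirements: Dict[str, int]) -> int:
    """Return the largest minimum cell size demanded by any data provider."""
    return max(requirements.values())


def releasable(cell_count: int, requirements: Dict[str, int]) -> bool:
    """A cell may be released only if it meets every provider's threshold."""
    return cell_count >= strictest_min_cell_size(requirements)


# Hypothetical thresholds for three linked sources (not actual agency rules).
hypothetical_requirements = {
    "provider_a": 3,
    "provider_b": 10,
    "provider_c": 5,
}

print(strictest_min_cell_size(hypothetical_requirements))
print(releasable(7, hypothetical_requirements))
print(releasable(12, hypothetical_requirements))
```

      In this sketch, a cell of 7 units would pass two of the three thresholds taken separately but still could not be released from the linked data.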

       ACS – American Community Survey, a large survey conducted continuously by the U.S. Census Bureau, covering topics such as jobs and occupations, educational attainment, veterans, and housing characteristics (https://www.census.gov/programs-surveys/acs/)

       BDS – Business Dynamics Statistics, produced by the U.S. Census Bureau, see https://www.census.gov/programs-surveys/bds.html for more details.

       CBP – County Business Patterns, produced by the U.S. Census Bureau, see www.census.gov/programs-surveys/cbp.html for more details.

       COEP – Canadian Out-of-Employment Panel, a survey initially conducted by McMaster University in Canada, subsequently taken over by Statistics Canada (Browning, Jones, and Kuhn 1995)

       COMPUSTAT – a commercial database maintained by Standard and Poor’s, with information on companies in the United States and around the world (http://www.compustat.com/).

       HRS – Health and Retirement Study, a long-running survey on aging in the United States population, run by the Institute for Social Research at the University of Michigan (http://hrsonline.isr.umich.edu/)

       LEHD – Longitudinal Employer-Household Dynamics Program at the U.S. Census Bureau, which links data provided by 51 state administrations to data from federal agencies and surveys (https://lehd.ces.census.gov/)

       LODES – LEHD Origin-Destination Employment Statistics describe the geographic distribution of jobs according to the place of employment and the place of worker residence, in part through

