Administrative Records for Survey Methodology. Group of authors

was to make the newly available linked administrative data at LEHD accessible to researchers. The network operates under physical security constraints managed by the Census Bureau and the IRS, in locations that are considered part of the Census Bureau itself, and staffed by Census Bureau employees.

      Statistical data enclaves can be central locations, in which a single location at the statistical agency is made available to approved researchers. In the United States, NCHS and BLS follow this model, in addition to using the FSRDC network. In Canada, business data can be accessed at Statistics Canada headquarters, while other data may be accessed both there and at the geographically dispersed RDCs, which obtain physical copies of the confidential data.

      The location of remote access points is often limited to the country of the data provider (United States, Canada), or to countries with reciprocal or common enforcement mechanisms (within the European Union, for European NSOs). Cross-border access, even within the European Union, remains exceedingly rare, with only a handful of cross-border secure remote access points open in the European Union. The most prolific user of cross-border secure remote access points, as of this writing, is the German IAB, with multiple data access points in the United States and a recently opened one in the United Kingdom.

      2.4.2 Remote Processing

      Two alternative remote access mechanisms are often used: manual and automatic remote processing. Manual remote processing occurs when the remote “processor” is a staff member of the data provider. This can be as simple as sending programs in by email, or finding a co-author who is an employee of the data provider. The U.S. NCHS, German IAB, and Statistics Canada provide this type of access. Generally, the costs of manual remote processing are paid by the users.

      More sophisticated mechanisms automate some or all of the data flow. For instance, programs may be executed automatically based on email or web submission, while disclosure review is still performed manually. This method is used by the IAB’s JoSuA (Institute for Employment Research 2016). Fully automated mechanisms, such as LISSY (Luxembourg), ANDRE (U.S. NCHS), DAS (U.S. NCES), Australia’s Remote Access Data Laboratory (RADL), and Canada’s Real Time Remote Access (RTRA), generally restrict the command set of the allowed statistical programming languages (SAS, Stata, and SPSS) and limit users to those statistical procedures and languages for which known automated disclosure limitation procedures have been implemented.
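      The idea of restricting the command set can be illustrated with a minimal sketch. It is not the implementation of LISSY, ANDRE, RADL, or any other system named above. The allowed-command list, the function name `screen_program`, and the Stata-like syntax (one command per line, `*` for comments) are all assumptions made for illustration.

```python
# Hypothetical allow-list: commands for which we pretend automated
# disclosure limitation exists. Real systems define their own lists.
ALLOWED_COMMANDS = {"use", "regress", "tabulate", "summarize"}


def screen_program(source):
    """Return (line number, command) pairs for commands not on the allow-list.

    Assumes a simplified Stata-like syntax: one command per line,
    the command is the first word, and lines starting with '*' are comments.
    """
    rejected = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            continue  # skip blank lines and comments
        command = stripped.split()[0].lower()
        if command not in ALLOWED_COMMANDS:
            rejected.append((lineno, command))
    return rejected


program = "use survey\n* fit a simple model\nregress y x\nlist id income\n"
rejected = screen_program(program)
# 'list' would print individual records, so it is not on the allow-list
print(rejected)
```

      A production system would have to parse the language properly (abbreviations, prefixes, macros, multi-line statements) rather than inspect the first word of each line. The sketch only shows the allow-list principle.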

      Most of these systems only provide access to household and person surveys. Of the known systems surveyed above, only Australia’s RADL systems and the Bank of Italy’s implementation of LISSY (Bruno, D’Aurizio, and Tartaglia-Polcini 2009, 2014) seem to provide access to business microdata through automated remote processing facilities.

      2.4.3 Licensing

      In the United States, some surveys (NCES, NLSY, and HRS) use licensing to distribute portions of the data they collect on their respondents. Commercial data providers (COMPUSTAT, etc.) also license the data distributed to researchers. Penalties for license infractions range from restricting future research grant funding, for example in HRS, to monetary penalties, for example in commercial data licenses. We are not aware of any studies that quantify the violation rates or financial penalties actually incurred due to license violations. Licensing may be limited by the enforceability of laws or contracts, and thus may be limited to residents of the same jurisdiction in which the data provider is housed. Often, some licensing is combined with the creation of ad-hoc data enclaves, the simplest of these being stand-alone, nonnetworked computer workstations.

      2.4.4 Disclosure Avoidance Methods

      Data enclaves exist to allow researchers to perform analyses within the restricted environment, and then extract or publish some form of statistical summary that can be released from the secure environment. Typically, these summaries are estimates from a statistical model. Model-based output is generally evaluated against the same criteria traditionally used for tabular output (a minimum number of units within a reporting cell, a minimum percentage of global activity within a reporting cell). In contrast to licensing arrangements, which allow researchers to self-monitor, statistical data enclaves have regimented output monitoring, typically by staff of the data provider. Released statistical outputs are usually registered in some fashion, but documentation of the full provenance chain may be limited.
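      The two tabular criteria mentioned above can be sketched as a simple check. This is a literal reading of the text under stated assumptions, not any agency's actual rule. The thresholds (`min_units=3`, `min_share=0.01`), the function name `review_cells`, and the record layout are hypothetical, and real disclosure review applies additional, agency-specific criteria.

```python
from collections import defaultdict


def review_cells(records, min_units=3, min_share=0.01):
    """Flag reporting cells that fail either tabular criterion.

    records: iterable of (cell, unit_id, value) tuples.
    min_units and min_share are hypothetical thresholds.
    """
    units = defaultdict(set)      # distinct contributing units per cell
    totals = defaultdict(float)   # total activity per cell
    for cell, unit_id, value in records:
        units[cell].add(unit_id)
        totals[cell] += value

    grand_total = sum(totals.values())
    report = {}
    for cell, cell_total in totals.items():
        n_units = len(units[cell])
        share = cell_total / grand_total if grand_total else 0.0
        report[cell] = {
            "units": n_units,
            "share": share,
            "releasable": n_units >= min_units and share >= min_share,
        }
    return report


# Made-up example data: three cells of firm-level values
records = [
    ("A", "f1", 100.0), ("A", "f2", 150.0), ("A", "f3", 250.0),
    ("B", "f4", 400.0), ("B", "f5", 95.0),
    ("C", "f6", 2.0), ("C", "f7", 2.0), ("C", "f8", 1.0),
]
report = review_cells(records)
for cell, result in sorted(report.items()):
    print(cell, result["units"], round(result["share"], 3), result["releasable"])
```

      In this made-up example, cell A passes both checks, cell B fails on the unit count (two contributing units), and cell C fails on its share of total activity (0.5 percent).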

