Title: Disclosure-Protected Regression Coefficients with Linked Micro-data
Industry Partner: Australian Bureau of Statistics
Large amounts of micro-data are collected by data custodians, e.g. the ABS or the Department of Families, in the form of censuses, surveys and administrative sources. Often data custodians will collect different information on the same individuals or businesses. Important information can be obtained by linking the micro-data collected by the different data custodians. There is very strong demand for linked micro-data from analysts within government, business and universities. Potential analysts are either data custodians or non-custodians (e.g. an academic or a member of the public). Data custodians are often legally obliged to ensure that the risk of disclosing information about a person or organisation is acceptably low.
The task is to link the data custodians’ micro-data and then to facilitate the analysts’ access to the linked micro-data while ensuring each data custodian appropriately manages its disclosure risk. This project considers a particular version of this scenario.
A simple scenario involves an analyst’s query and a remote server where the analysis is carried out.
1. An analyst submits a query, via the internet, to the analysis server, a remote server.
2. The analysis server processes the analyst’s query on the sensitive micro-data. The statistical output (e.g. regression coefficients) is modified or restricted in order to ensure the risk of disclosure is acceptably low.
3. The analysis server sends the modified output, via the internet, to the analyst.
In the above scenario the analyst is restricted from viewing the micro-data but may attempt to use the regression output to learn the values of variables on the linked micro-data.
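The three steps of the scenario can be sketched in a few lines of Python. This is only an illustrative sketch, not the ABS's analysis server: the function names, the minimum record count and the Gaussian noise added to the coefficients are all assumptions chosen to show one way of "restricting" and "modifying" regression output.

```python
import random

MIN_RECORDS = 10  # hypothetical threshold; below it the server releases nothing


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def answer_query(x, y, noise_sd=0.05, rng=None):
    """Steps 2 and 3: fit on the sensitive data, release only protected output."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < MIN_RECORDS:
        return None  # restriction: too few records to release anything
    rng = rng or random.Random()
    intercept, slope = fit_simple_ols(x, y)
    # Modification: perturb each coefficient with assumed Gaussian noise.
    return intercept + rng.gauss(0, noise_sd), slope + rng.gauss(0, noise_sd)
```

The analyst only ever sees the return value of answer_query, never x or y. How much noise is needed, and which queries should be refused, is exactly the kind of question such a server has to settle; the values here are placeholders.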
To date, research has focused exclusively on analysts who are non-custodians. Solutions have included restricting the set of models that can be fitted and perturbing the output values. This project concerns the disclosure risks posed by data custodians. A data custodian may combine the micro-data it supplied to the Integrating Authority with the regression output based on the linked micro-data to disclose information about a person or organisation that was collected by the other data custodian. Data custodians will commonly collect names and addresses. This means that if a data custodian learns the value of a sensitive variable from the regression output, there is a real possibility that the value could be attributed to the person who provided that information.
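To make the custodian risk concrete, the sketch below shows one well-known way unprotected regression output can leak a single value. It is an illustration only and not necessarily an attack this project studies: the data are invented, and it assumes the attacking custodian can use its own variables to build an indicator that equals 1 for one target record and 0 for every other record. With such a predictor, the fitted line passes exactly through the target's value, so intercept plus slope reproduces it.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented sensitive values, collected by the OTHER custodian.
sensitive = [52.0, 61.0, 47.0, 58.0, 250.0, 55.0]
target = 4  # hypothetical position of the target record in the linked file

# Indicator built from the attacker's own data: 1 for the target, 0 otherwise.
indicator = [1.0 if i == target else 0.0 for i in range(len(sensitive))]

# Unprotected output: exact coefficients from the linked micro-data.
intercept, slope = fit_simple_ols(indicator, sensitive)

# The intercept is the mean of the other records; adding the slope
# gives the target's sensitive value (up to floating-point rounding).
recovered = intercept + slope
```

Because the custodian also holds names and addresses, the recovered value can be attributed to an identifiable person. Restricting which predictors may be used, or perturbing the coefficients, are the kinds of protection that would have to defeat this sort of query.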
This project will investigate various approaches to protecting against these possible attacks so that disclosure is unlikely. Under its legislation, the ABS is obliged to ensure that the disclosure of information about a particular person or organisation is unlikely.