What does database privacy mean exactly?

The difficulty with privacy (or, more correctly, information confidentiality) in database terms is that databases exist to hold huge amounts of information, and processing and recording data is difficult, if not impossible, without them. Public bodies especially have difficulty in defining and maintaining the boundaries of the information they should disclose whilst preserving the utility of that information for the improvement of welfare and services.

  Privacy is contingent on first having a correctly secured database. Additional privacy controls are required when sensitive data leaves the protected trust perimeter of the database to be utilised by third parties.

Context Sensitive Privacy

What is clear is that database confidentiality, like privacy itself, is dependent upon context [1, 2], and good privacy is about maintaining contextual integrity.

Contexts can be divided into the privacy of the person to whom the records correspond (Respondent privacy); the privacy of information owned by a corporation operating in market conditions (Data Owner privacy); and the privacy of the records that individuals create as part of the everyday data trail of their online activities (User privacy).

Information confidentiality therefore depends upon whose privacy is being sought, and different tactics can be employed to assure privacy under different conditions.

| Privacy Dimension | Privacy Method | Implementation / Structure Details |
|---|---|---|
| Respondent Privacy | Statistical Disclosure Control (SDC) | Modification of the data by recoding variables or by perturbing (damaging) the data by changing sensitive values. See: GSS Advice on SDC. |
| Respondent Privacy | Record Pseudonymisation | Pseudonymise the most identifying fields in order to maintain the anonymity of the respondent while retaining structure in the database (a minimal sketch follows this table). |
| Data Owner Privacy | Privacy Preserving Data Mining (PPDM) | PPDM preserves data privacy for the owner whilst maintaining the utility of the data for the production of information and insight that is of general use. See my post here. |
| User Privacy | Private Information Retrieval (PIR) | PIR mechanisms allow a user to retrieve information from a database while keeping the queries private from the database. See Wikipedia. |
| User Privacy | Database Roles | Configuring the database so that individuals can access their own records and data administrators cannot easily access sensitive information. |
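To make the Record Pseudonymisation row concrete, here is a minimal sketch (assuming a keyed HMAC pseudonym, illustrative field names, and a secret held inside the trusted perimeter; none of these details are prescribed by the table) of replacing the most identifying fields while the record keeps its structure:

```python
import hmac
import hashlib

# Assumed for the example: a pseudonymisation secret kept inside the trusted
# perimeter (in practice this would come from a key-management service).
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

# Hypothetical choice of identifying fields for this illustration.
IDENTIFYING_FIELDS = ("name", "nhs_number", "email")

def pseudonymise(value: str) -> str:
    """Return a stable keyed pseudonym for a single identifying value."""
    return hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def pseudonymise_record(record: dict) -> dict:
    """Replace identifying fields with pseudonyms, keeping the record's structure."""
    return {
        field: pseudonymise(str(value)) if field in IDENTIFYING_FIELDS else value
        for field, value in record.items()
    }

if __name__ == "__main__":
    record = {
        "name": "Jane Doe",
        "nhs_number": "943 476 5919",
        "email": "jane@example.org",
        "diagnosis": "asthma",
    }
    print(pseudonymise_record(record))
```

Because the pseudonyms are deterministic for a given key, records can still be linked and joined on the pseudonymised fields, which is what retains the structure of the database while protecting the respondent.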

However, the presence of the three dimensions (Respondent, Owner, and User) makes it more difficult to apply the correct privacy controls. Context defines the boundaries of acceptable data disclosure, and certain contexts, such as healthcare records, demand both respondent and data owner privacy. The problem is how to create privacy controls that are appropriate for boundary setting across more than one type of context.

Cross-Context Privacy Controls

Where privacy concerns cross contexts, additional disclosure protection may be required. The transfer of sensitive data can be protected by layering these additional data-level protections onto the output(s) of a database query, as sketched after the table below.

| Protection Level | Method | Related Articles | Drawbacks |
|---|---|---|---|
| Highest: 'Gold Standard' protection | Encryption (symmetric or asymmetric) | | Encryption key matching; processing overhead; key management |
| Strong identity protection | Anonymisation | k-anonymity | Not applicable to all data sets; re-identification risk; need to know the data shape |
| 'Golden mean' protection | Pseudonymisation | | Pseudonym service needed; quality assurance; user management |
| Lowest: Basic protection | Masking | Dynamic Data Masking | Key matching; reusability in large/complex sets |
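As a minimal sketch of layering these data-level protections onto query output (assuming illustrative column names, an in-memory SQLite table, and a simple masking rule rather than any particular dynamic data masking product), the example below pseudonymises an identifier and partially masks a quasi-identifier before the rows leave the trust perimeter:

```python
import hashlib
import sqlite3

def mask(value: str, visible: int = 2) -> str:
    """Mask all but the last `visible` characters (illustrative rule only)."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]

def pseudonym(value: str) -> str:
    """Unkeyed pseudonym for the example; a keyed scheme would be used in practice."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def protected_query(conn: sqlite3.Connection):
    """Layer pseudonymisation and masking onto the output of a query."""
    for patient_id, postcode, diagnosis in conn.execute(
        "SELECT patient_id, postcode, diagnosis FROM records"
    ):
        yield {
            "patient_id": pseudonym(patient_id),  # identifier never leaves in the clear
            "postcode": mask(postcode),           # partially masked quasi-identifier
            "diagnosis": diagnosis,               # retained for analytical utility
        }

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (patient_id TEXT, postcode TEXT, diagnosis TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [("P001", "SW1A 1AA", "asthma"), ("P002", "LS1 4DY", "diabetes")],
    )
    for row in protected_query(conn):
        print(row)
```

In practice the pseudonym would be keyed and the masking rule driven by policy; the point is only that the protection is applied to the query output rather than to the stored data.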

For example, data protection using k-anonymity could be used to provide both respondent and data owner privacy, and, when combined with Private Information Retrieval, could provide an implementation of all three privacy protection dimensions; a minimal sketch of a k-anonymity check follows.
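Assuming illustrative quasi-identifier columns and k = 2 (both assumptions for the example, not part of the method above), the k-anonymity half of that combination amounts to checking that every quasi-identifier combination in a candidate release is shared by at least k records:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination appears in at least k rows."""
    combos = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in combos.values())

if __name__ == "__main__":
    # Hypothetical release with generalised quasi-identifiers (age band, partial postcode).
    release = [
        {"age_band": "30-39", "postcode_area": "SW1", "diagnosis": "asthma"},
        {"age_band": "30-39", "postcode_area": "SW1", "diagnosis": "diabetes"},
        {"age_band": "40-49", "postcode_area": "LS1", "diagnosis": "asthma"},
        {"age_band": "40-49", "postcode_area": "LS1", "diagnosis": "hypertension"},
    ]
    print(is_k_anonymous(release, ("age_band", "postcode_area"), k=2))  # True
```

A PIR front end would then sit between the user and a release like this so that the queries themselves also remain private.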

Summary

When dealing with privacy issues in databases, the guardians of the shared data store should first ask 'Who is the privacy control here to protect?' Answering this will suggest some of the data-sharing implementations presented here.

Having answered the first question, the second should be 'How are the data protected across contexts?' The answer should give clarity on which data controls to implement to protect the data from disclosure where multiple contexts are likely to be in play.

References:

[1]: Domingo-Ferrer, J. and Torra, V., 2008, March. A critique of k-anonymity and some of its enhancements. In 2008 Third International Conference on Availability, Reliability and Security (pp. 990-993). IEEE.

[2]: Barth, A., Datta, A., Mitchell, J.C. and Nissenbaum, H., 2006, May. Privacy and contextual integrity: Framework and applications. In 2006 IEEE Symposium on Security and Privacy (S&P’06) (pp. 15-pp). IEEE.