Regardless of whether the UK remains in or leaves the European Union, organisations will have to ensure that their data processing and handling is compliant with the General Data Protection Regulation (GDPR), which comes into effect on 25th May 2018 and supersedes the existing Data Protection Act (DPA).

Whilst this may appear to be a long way off, organisations need to start ensuring now that they are handling data appropriately. The GDPR builds on existing legislation, but strengthens the protections afforded in several key areas (see https://en.wikipedia.org/wiki/General_Data_Protection_Regulation).

The legislation applies to organisations based in the EU and to organisations outside the EU that process personal data relating to individuals resident in the EU. These more stringent regulations are backed up by stiffer penalties of up to 20 million Euros or 4% of the previous year's global turnover, whichever is greater, and it is the risk of these sanctions that is making information security a top priority for companies in the UK.

The GDPR will afford greater protection to data subjects in many areas, and organisations should be taking steps now to fully understand the implications of this legislation for their data processing and transfer life cycles and to assess the impact (I briefly touched on data life cycles in this post here..).

The existing DPA principle relating to data minimisation requires that personal data shall be “adequate, relevant and not excessive in relation to the purpose or purposes for which they are processed”. The GDPR requirement, however, states that personal data shall be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed”.

Although the two statements sound similar, the introduction of the word “necessary” means that the data collector should only process data that is actually needed for the stated purpose, as opposed to data that is merely nice to have or beneficial to the data processing organisation.

Organisations should be reviewing their personal data processing operations to identify ways of carrying out that processing without infringing their obligations under the GDPR, and to examine whether each item of personal information is genuinely necessary for the processing tasks.

Organisations using databases for processing can take advantage of some or all of the following techniques, using built-in functions or third-party add-ins, to minimise the data held and avoid processing unnecessary personal data (a short illustrative sketch of each technique follows the list):

  1. Aggregation: Arithmetic operations such as ‘SUM’ or ‘COUNT’ can allow data analysts to avoid processing individual personal data records. An example would be a banking institution basing a decision on the number of credit cards held by a customer, rather than on each individual card record as it would for operational purposes.
  2. Bundling: Personal data records can be bundled using the ‘GROUP BY’ and ‘DISTINCT’ operations to similarly avoid the use of personal data for processing. In some cases it may be preferable to collect data grouped by attribute in order to maintain the privacy of the data. The outcome of bundling is similar to aggregation, but by using business groupings such as age bands it protects data that relates to identifiable individuals.
  3. Pseudonymisation: This technique is often used in health contexts to maintain patient confidentiality when datasets are used for secondary purposes such as population health studies. In automated collection systems the data can be pseudonymised either at source (e.g. by using client-side operations to replace the identifier with a generated GUID before it reaches the data collector) or centrally (where the data is pseudonymised after collection). There is no one-size-fits-all solution, and techniques can be combined to best effect.
  4. Anonymisation: Resources on how data can be effectively anonymised whilst remaining fit for purpose are available from the UK Anonymisation Network (http://ukanon.net/). Depending on the datasets you are using, there are several options to consider. As an example, for many financial institutions it is acceptable to anonymise credit card information by masking all but the first 6 or last 4 digits of the card number. Users of SQL Server 2016 can use the dynamic data masking feature to protect sensitive data within the database.
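
As an illustration only, the following T-SQL sketch shows how each of the four techniques might be applied within the database. The table and column names (dbo.CreditCards, dbo.Customers, dbo.PseudonymMap and so on) are hypothetical, and the dynamic data masking statement requires SQL Server 2016 or later.

```sql
-- Illustrative sketches only: table and column names are hypothetical.

-- 1. Aggregation: base decisions on the COUNT of cards per customer
--    rather than on the individual card records.
SELECT CustomerId, COUNT(*) AS CardsHeld
FROM dbo.CreditCards
GROUP BY CustomerId;

-- 2. Bundling: report on approximate age bands rather than dates of birth.
SELECT AgeBand, COUNT(*) AS Customers
FROM (SELECT CASE
               WHEN DATEDIFF(YEAR, DateOfBirth, GETDATE()) < 30 THEN 'Under 30'
               WHEN DATEDIFF(YEAR, DateOfBirth, GETDATE()) < 60 THEN '30-59'
               ELSE '60+'
             END AS AgeBand
      FROM dbo.Customers) AS Banded
GROUP BY AgeBand;

-- 3. Pseudonymisation: hold the link between the real key and a GUID in a
--    separate, access-controlled mapping table so analytical extracts only
--    ever carry the pseudonym.
CREATE TABLE dbo.PseudonymMap (
    CustomerId INT PRIMARY KEY,
    Pseudonym  UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID()
);
INSERT INTO dbo.PseudonymMap (CustomerId)
SELECT CustomerId FROM dbo.Customers;

-- 4. Dynamic data masking (SQL Server 2016+): show non-privileged users only
--    the last 4 digits of the card number.
ALTER TABLE dbo.CreditCards
ALTER COLUMN CardNumber ADD MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)');
```

Note that dynamic data masking operates at the presentation layer and leaves the underlying data unchanged, so it complements rather than replaces the other techniques.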

Although some degree of automation is possible in the management of personal and sensitive data, organisations still need to weigh the anonymisation of the data against retaining the utility of the processing and analysis. It has been argued that data can be useful or anonymous, but not both.

The loss of utility in datasets as a result of aggregation can be quite marked where there is interdependence in the data. For example, a consumer's product preference may be influenced by other consumers, and aggregating a number of records would mask this important relationship. Increasingly, where data is used to individualise the customer experience, aggregation will lead to a loss of individual-level analysis.

It is also important to ensure that datasets cannot be re-identified through further processing. In a landmark US study, for example, 87% of the population could be uniquely re-identified using only postal (ZIP) code, gender and date of birth.
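
As a rough sketch of how this risk can be spotted, a simple grouping query can flag combinations of quasi-identifiers that occur fewer than k times in a dataset prepared for release. The dbo.ReleaseDataset table and its columns are hypothetical, and the choice of k is a policy decision.

```sql
-- Simple k-anonymity spot check on a dataset prepared for release.
DECLARE @k INT = 5;   -- minimum acceptable group size (policy decision)

SELECT PostalCode, Gender, AgeBand, COUNT(*) AS GroupSize
FROM dbo.ReleaseDataset              -- hypothetical release table
GROUP BY PostalCode, Gender, AgeBand
HAVING COUNT(*) < @k;                -- rows returned need further generalisation or suppression
```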

It is advisable to implement best-practice guidelines for data de-identification where these are available, such as the HIPAA Safe Harbor or Expert Determination methods, to ensure that re-identification is not possible, especially where combination with other publicly available data such as the electoral roll or spatial data could lead to personal or sensitive data being disclosed.
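
By way of a hedged sketch only, generalisation in the spirit of the Safe Harbor method (which, among many other requirements, reduces dates to the year and geographic detail to a broad area) might look like the following, against a hypothetical dbo.PatientEvents table:

```sql
-- Generalise quasi-identifiers before release: keep only the outward part of the
-- postcode and the year of the event date. Safe Harbor also requires the removal
-- of many other direct identifiers (names, contact details, record numbers, etc.).
SELECT LEFT(PostalCode, 3)  AS PostalArea,
       YEAR(AdmissionDate)  AS AdmissionYear,
       Gender,
       DiagnosisCode
FROM dbo.PatientEvents;
```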

The regulation applies to all datasets employed in processing, so companies should also look at the use of data across the board, including data held in cloud data lakes and the ‘shadow’ processing undertaken in spreadsheets, backups, and development and test environments.

Data minimisation techniques can assist GDPR compliance and reduce the risk that organisations suffer breaches involving the disclosure of personal and sensitive data. However, efforts to aggregate data in order to protect it are hampered by a lack of guidance covering every situation that will demand it. Although guidance is available from public bodies such as the Information Commissioner, it will fall to the management team to choose the best practice for data minimisation according to their data processing activities.

Data minimisation should be implemented through corporate data policy, information security, data governance and transparency measures to ensure compliance with the new regulation's data minimisation rules.

Other GDPR obligations, such as the EU ‘right to be forgotten’, the implementation of a ‘privacy by design’ culture and data breach reporting, will also need to be considered by data collection and processing organisations.

One useful source of information on information handling for organisations based in the UK is the Information Commissioner's Office at https://ico.org.uk/for-organisations/

This article forms the first part of a short series looking at the areas where the GDPR presents changes and implementation challenges.

Related Posts:

The GDPR and Data Portability

The GDPR and Trust in Business

Privacy by Design and Default 

The GDPR and Privacy Impact Assessments