What can we learn from genomic privacy research that can be applied to GDPR?

On the 1st of March a meeting took place at the European Parliament to discuss the balance between the need to protect sensitive data and the need to open them up for research in order to foster scientific progress. The panelists were Paulo Silva from the European Commission, Alea López de San Román from the League of European Research Universities, Prof. Lars Juhl Jensen from the University of Copenhagen, Prof. Erman Ayday from Bilkent University and Dr. Pawel Szczesny from the Open Science Foundation. The meeting was hosted by Michal Boni, Member of the European Parliament (EPP).


The importance of open data in the life sciences is hard to overestimate. Unrestricted access to both raw data and computational services built on that data fosters scientific research. However, the further we go in opening scientific resources, the closer we come to the natural limits of that process. Information about the locations of endangered species or the genomes of deadly bacteria is already withheld from the public because of justified concern about unintended consequences. Huge amounts of sensitive data (such as medical records or behaviour patterns) are stored in closed silos that comply with legal privacy requirements. These closed resources are also expected to grow, as a huge stream of personal data from wearable sensors, from both commercial applications and citizen science projects, is about to appear. One of the most special cases of sensitive data is the human genome.

The genome of a person is a unique, static and valuable resource that can be abused well beyond the lifetime of a single person, because much of its information is inherited and heritable. Current research on genomic privacy clearly shows the failure of anonymization methods, the temporary nature of security measures and, most importantly, the lack of an appropriate legal framework to protect donors of genetic information from the consequences of sharing such data. On the other hand, large-scale human genome sequencing is needed to make progress in fighting and preventing major diseases such as cancer and diabetes. Does this mean that the promise of breakthroughs in the life sciences, which were to stem from the availability of big data, will not be fulfilled?

Not necessarily. The field of genomic privacy has already produced several solutions that make it possible to analyze genomic data without releasing the data and without compromising the privacy of data donors. The same solutions, at a larger scale, can be applied to give researchers access to computations on sensitive data of many different kinds. To benefit fully from computational access to sensitive data, it is important to remember that no single resource holds all available data of a particular kind. Even for open data, researchers routinely consult different databases, as the process of adding new records and maintaining the quality of the resources differs from institution to institution. For sensitive data, aggregating different sources is even harder because of legal requirements. Therefore, the crucial element in providing computational access to sensitive data for research purposes is the development of a unified standard of communication between these resources. Work has just started on Compute Commons, an initiative that aims to provide unified access to computations on closed data for the life sciences.
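To make the idea of computational access more concrete, here is a minimal sketch in Python. It is not the Compute Commons design and it does not implement any of the cryptographic techniques from the genomic privacy literature. It only illustrates the general pattern: the records never leave the data holder, every holder answers the same declarative query, and very small counts are withheld. All names (`CountQuery`, `ClosedResource`, `federated_count`, `min_cohort`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A record is a plain dictionary here; real resources would have their own schemas.
Record = Dict[str, object]


@dataclass(frozen=True)
class CountQuery:
    """Hypothetical shared query format: count records where `field` equals `equals`."""
    field: str
    equals: object


@dataclass
class ClosedResource:
    """Hypothetical data holder that answers aggregate queries but never returns records."""
    name: str
    _records: List[Record]
    min_cohort: int = 5  # assumed threshold below which a count is withheld

    def count(self, query: CountQuery) -> Optional[int]:
        matches = sum(1 for r in self._records if r.get(query.field) == query.equals)
        # Withhold small counts, since they could single out individual donors.
        return matches if matches >= self.min_cohort else None


def federated_count(
    resources: List[ClosedResource], query: CountQuery
) -> Tuple[int, Dict[str, Optional[int]]]:
    """Send the same query to every resource and sum the answers that were released."""
    answers = {res.name: res.count(query) for res in resources}
    total = sum(v for v in answers.values() if v is not None)
    return total, answers
```

Because every resource accepts the same `CountQuery`, a researcher can combine answers from several institutions without any of them handing over raw data. Suppressing small counts is only a basic disclosure control and is not a complete privacy guarantee on its own.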

The recommendations in view of the implementation of the General Data Protection Regulation (GDPR) were as follows:

  1. It is important to develop a legal framework for research data that is as unified as possible across Member States. Too many options for closing research data will render Europe-wide research projects impossible because of legal incompatibilities.
  2. It is important to start systematic research on the privacy and security of sensitive data. This will not only allow appropriate measures to be developed, but will also let data owners be less restrictive about the security of some types of data.
  3. It is important to develop, in parallel, a technological counterpart to the GDPR. The process of unification that occurs during work on the GDPR should be mirrored by a unification of standards for security and for computations on sensitive data across Member States. The European Data Protection Board could have a role in this process.
  4. It is important to extend the spectrum of stakeholders in discussions on the implementation of the GDPR to include research organisations and academic societies directly involved with open data and open access to scientific literature. These communities have the most up-to-date knowledge about the innovations and opportunities stemming from unrestricted access to scientific resources.
