The Data Science Competence Center (CCSD) of the University of Geneva is pleased to invite you to the fourth edition of the Data Science Seminars, exploring data protection and ownership in academic research.
The ongoing digitalization of contemporary societies brings new challenges related to the regulation of data and its use. At national and regional levels, all over the world, legal regimes are being set up to provide a framework for data protection and ownership. This reinforcement of protection mechanisms is not without consequences for the scientific world. Whether at the level of data collection or of data analysis, researchers must comply with complex rules, the understanding of which often goes beyond the scope of their scientific knowledge. This requirement generates all the more uncertainty because data protection is being strengthened at the same time as Open Science policies, favored by a growing number of funders, are on the rise. Such a situation contributes to making law, and in particular digital law, an increasingly central field of expertise in data science.
Through concrete examples drawn from their research, the speakers at this seminar will share the challenges they have encountered in dealing with data ownership and data protection, and the technical solutions and good practices they have developed to navigate these issues. These presentations will notably highlight the technical and legal challenges inherent in the automatic collection of web data on social networks. When dealing with large internet platforms, gathering data for research purposes often comes up against legal obstacles related to ownership. The question of data protection is also central when it comes to analysis and publication, and forces researchers to find innovative solutions, such as the development of image de-identification methods and tools, as is the case at Geneva University Hospitals. In order to rigorously anchor these challenges, technical solutions and best practices in current law, this seminar will close with a legal perspective on these issues by a researcher from the Digital Law Center.
11 May 2021, 12h15-14h00.
The Data Science Seminar will take place online via Zoom.
Anonymize to Legitimize: Analysis of Facebook Public Conversations under Covid-19
Matteo Tarantino, Institute for Environmental Sciences.
In many countries, Facebook has become the biggest mainstream platform for public discourse. In times of crisis such as the COVID-19 pandemic, understanding how Facebook discourses develop and circulate becomes even more urgent, as aligning understandings and needs is essential to producing effective responses. In most cases, given the volume of exchanges, automatic extraction and processing of such discourses appears to be the only viable methodology for such studies. However, content, along with user data, is a key asset of the Facebook enterprise. This has increasingly led the company to adopt explicit and implicit measures against third-party extraction and analysis of such data, hindering academia and institutions from fully understanding phenomena such as the spread of fake news.
Drawing from a research project on public Facebook discourses during the first three months of the COVID-19 pandemic in Italy, this presentation examines the struggle to balance the interests of various stakeholders, including the scientific community, the public good, the legal institutions, Facebook, and Facebook users. The resulting solutions raise the cost of research and entail the continuous development of ad-hoc software, anonymization measures and partial agreements with Facebook. At the same time, particularly in this time of crisis, they point towards the need to develop a viable understanding of social media data that also conceptualizes it as a public good.
Challenges of patient consent and data protection for the use of medical images in research
Nicolas Roduit, Geneva University Hospitals.
Medical imaging is becoming a major part of the data required for clinical trials and medical research. The rapid evolution of machine learning and deep learning tools applicable to large data collections can potentially open the way to a new generation of image analytics and feature extraction.
The main factor limiting the development of these analysis techniques is the lack of access to sufficiently large sets of structured and well-documented imaging data. In addition, regulatory constraints prevent the use of existing imaging data without formal patient approval. Traditional informed consent requires that the patient be informed of the purpose and goals of the research performed with the data. To overcome this constraint, regulatory bodies have promoted new concepts of “general consent” allowing patients to contribute to the development of open access image collections.
Another constraint is data de-identification, which requires adaptable algorithms for medical imaging data. A special platform has been developed to perform rule-based imaging data de-identification that complies with international guidelines and is adaptable to local ethics committees’ requirements.
Legal barriers and solutions to data access
Yaniv Benhamou, Faculty of Law, Digital Law Center.
Data access, which is key to enabling innovation, may be restricted by legal barriers, as data may be subject to multiple and sometimes conflicting legal and appropriation regimes, such as data protection for personal information, intellectual property for software and databases, or trade secrets and contract law for confidential data. Consequently, it is important to develop a clear and coherent body of law that organizes the different relationships among all stakeholders and the legal regimes that grant different ownership rights, what can be called a “holistic approach to data”. Three specific research questions and solutions for data access will be discussed: (i) what qualifies as personal data, (ii) data altruism and (iii) the overlaps and interfaces between different legal regimes, in particular in the case of mixed datasets.