Artificial Intelligence: accuracy principle in the processing activity
The performance of an algorithm, including an artificial intelligence algorithm, can be compromised by inaccurate input data at run time, not only by the data used during development. Therefore, the assessment of a processing activity that embeds an AI system needs to cover the accuracy of the input data, since inaccurate inputs could introduce biases and compromise the performance not only of the algorithm but of the whole processing activity. Suitable safeguards that prevent inaccuracy and protect against the impact of inaccurate input data must be included “by design” in the implementation of the processing activity, and the processing activity should be reviewed and updated where necessary.
Continuing the AEPD’s series of posts on artificial intelligence, this post illustrates the GDPR accuracy principle by analyzing a true story.
An elderly grandmother was taken to the hospital by her relatives. She appeared to be in a critical condition. At the front desk, the anxious relatives answered the clerk’s questions during a fast check-in. One of the questions was “Is she dependent on the relatives?”. The grandmother lived with the relatives, so they answered “Yes”. The assessment algorithm of the triage processing was then executed and, as a result, the grandmother was set apart to receive only palliative care. At that moment, one of the relatives, a prominent scientist, realized the mistake. The question had been misunderstood, in part because of the use of a euphemism: the actual question should have been “Is she disabled?”. That was not the case. She lived with her relatives, but she was an independent woman running her own life. Moreover, that answer was key to the decision that the triage process produced when the algorithm was run.
What could they do when the decision had already been made and nobody was attending to their claims? Thanks to his contacts, the professor was able to “hack” the algorithm: he obtained immediate surgery that saved the grandmother, and she was back to her normal life in a week.
Triage algorithms are an essential part of the emergency health system, which must optimize resources in order to save lives. In this case, the triage algorithm gathered some data to offer guidance for the decisions of the health professionals. However, something went wrong in the execution of the triage processing in this example. Let’s see what we can learn from the GDPR point of view that could be useful in the use of algorithms, in particular those based on AI systems embedded in processing activities.
Two key points can be highlighted from this case. First, the whole processing activity is more than the execution of the algorithm. Second, a proper implementation of the accuracy principle (Article 5(1)(d) GDPR) is essential in the execution of the algorithm and in the performance of the whole processing activity.
Regarding the first point, it is important to take into account that the assessment of a single operation of the processing activity (here, the assessment algorithm) is not equivalent to the assessment of the whole processing activity. In this case, the assessment of the algorithm used in the triage processing should be just one part of the entire assessment, which also includes operations such as data gathering, data checking, human involvement and the way in which decisions are executed, reviewed and contested.
Regarding the accuracy principle, we can identify several key points from the GDPR point of view:
- A lack of definition of the input data could lead to errors or biases that are not part of the algorithm itself.
- The accuracy principle should be applied to the input data, the output data, and even the intermediate data of the whole processing activity.
- The precise definition of every input data item (its semantics) must be set up “by design” and properly documented. Moreover, its value range (for example, “yes/no”, “0 to 10” or “high/medium/low”) should be defined and assessed in the context of the processing.
- The impact of every input data item on the final result should be assessed “by design”, for each specific purpose, through an analysis of the implemented algorithm, verification tests of the requirements, and validation tests in the context of operation.
- The input data could be gathered manually from the data subjects. In that case, the data subjects (and those who gather the data) should know and understand the semantics of the data and the impact of their answers. In the current case, the right questions should have been asked in order to get the accurate input data needed by the triage processing, and the data subjects (the grandmother’s relatives in this case) should have understood what was being asked and how their answer could change the triage result.
- The input data to the algorithm used in the processing could be gathered from other sources, such as databases or sensors like cameras, fingerprint readers or IoB. Data could undergo several transformations between the gathering stage and the algorithm execution. All those operations are part of the processing activity, together with the algorithm.
- “Every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay” (Article 5(1)(d) GDPR), that is, for each specific purpose and during the operation of the processing activity.
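As an illustration of the points above on documenting semantics and value ranges “by design”, here is a minimal Python sketch. It is not taken from any real triage system: the `InputField` structure, the field name `needs_daily_assistance`, the question wording and the `validate_answer` helper are all hypothetical, chosen only to show how an input definition could carry its own documentation and how an out-of-range answer could be rejected before it reaches the algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputField:
    """Documented definition of one input item (hypothetical structure)."""
    name: str
    question: str          # exact wording shown to whoever provides the answer
    semantics: str         # what the value means for the processing
    allowed_values: tuple  # the documented value range
    impact_note: str       # how the answer may affect the result


# Hypothetical field: the wording avoids the ambiguous term "dependent".
NEEDS_DAILY_ASSISTANCE = InputField(
    name="needs_daily_assistance",
    question="Does the patient need another person's help "
             "for basic daily activities?",
    semantics="'yes' only if the patient cannot manage basic daily "
              "activities alone; living with relatives is not enough.",
    allowed_values=("yes", "no", "unknown"),
    impact_note="This answer may change the result of the assessment.",
)


def validate_answer(field: InputField, raw_answer: str) -> str:
    """Normalise an answer and reject anything outside the documented range."""
    answer = raw_answer.strip().lower()
    if answer not in field.allowed_values:
        raise ValueError(
            f"{field.name}: {raw_answer!r} is not one of {field.allowed_values}"
        )
    return answer
```

For example, `validate_answer(NEEDS_DAILY_ASSISTANCE, " Yes ")` returns `"yes"`, while an answer such as `"maybe"` raises a `ValueError`. Including an explicit `"unknown"` value is a design choice of this sketch: it gives the person answering a way to avoid guessing when the question is not understood.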
Even the most precise algorithm could fail because another element of the processing activity is not working properly. Conversely, a less precise algorithm could work well enough if the right safeguards to manage malfunctions are included by design.
This has two important consequences. First, the design of the whole processing activity involves more elements than just “the algorithm”. Second, the designer of the whole processing must accept that “perfection” doesn’t exist. Therefore, the whole processing should implement measures that “shall be reviewed and updated where necessary” (Article 24(1) GDPR). Moreover, mainly when the processing activity leads to a decision that “produces legal effects concerning him or her or similarly significantly affects him or her” (Article 22(1) GDPR), “suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least [but not exclusively] the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision” (Article 22(3) GDPR) must be implemented.
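One possible safeguard of this kind can be sketched in Python. The idea, which is an assumption of this example and not a method described in the GDPR or used by any real triage system, is to check whether changing a single input alone would change the outcome and, if so, to flag the decision for human review and reconfirmation of that input. The names `decisive_inputs`, `decide_with_safeguard` and `toy_decide` are hypothetical, and `toy_decide` is a deliberately trivial stand-in, not a triage rule.

```python
from typing import Callable, Dict, List

Inputs = Dict[str, str]


def decisive_inputs(
    decide: Callable[[Inputs], str],
    inputs: Inputs,
    value_ranges: Dict[str, List[str]],
) -> List[str]:
    """Return the names of inputs whose value alone can change the decision."""
    baseline = decide(inputs)
    decisive = []
    for name, allowed in value_ranges.items():
        for alternative in allowed:
            if alternative == inputs[name]:
                continue
            changed = {**inputs, name: alternative}
            if decide(changed) != baseline:
                decisive.append(name)
                break  # one differing outcome is enough to flag this input
    return decisive


def decide_with_safeguard(
    decide: Callable[[Inputs], str],
    inputs: Inputs,
    value_ranges: Dict[str, List[str]],
) -> dict:
    """Run the decision and flag it for human review if one input is decisive."""
    flagged = decisive_inputs(decide, inputs, value_ranges)
    return {
        "outcome": decide(inputs),
        "needs_human_review": bool(flagged),
        "inputs_to_reconfirm": flagged,
    }


# Toy stand-in for an assessment algorithm; NOT a real triage rule.
def toy_decide(inputs: Inputs) -> str:
    return "path_b" if inputs["flag_a"] == "yes" else "path_a"


result = decide_with_safeguard(
    toy_decide,
    {"flag_a": "yes", "flag_b": "no"},
    {"flag_a": ["yes", "no"], "flag_b": ["yes", "no"]},
)
```

Here `result["inputs_to_reconfirm"]` is `["flag_a"]` and `result["needs_human_review"]` is `True`, because flipping `flag_a` alone changes the outcome while flipping `flag_b` does not. A check like this does not replace the right to human intervention or to contest the decision. It only makes it more likely that a single misunderstood answer is looked at again before the decision is executed.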
Finally, we would like to point out that we have not said whether the triage processing was carried out manually or by automated means. If it was automated, it could have been built through traditional software development or through machine learning. But it is important to keep in mind that, although humans sometimes run algorithms like robots, you can at least talk to humans to make them come to their senses.
This post is related to other material released by the AEPD’s Innovation and Technology Division, such as:
- Post AI: System vs Processing, Means vs Purposes
- Post Federated Learning: Artificial Intelligence without compromising privacy
- GDPR compliance of processing that embed Artificial Intelligence. An introduction
- Audit Requirements for Personal Data Processing Activities involving AI
- 10 Misunderstandings about Machine Learning
- Reference map of personal data processing that embed artificial intelligence