AI voice transcription: Implications for data protection
Artificial intelligence is an engine of innovation, enabling increased quality and productivity in the workplace and improving efficiency across various areas. Among these, we find voice transcription services, which have raised some questions among data controllers.
Artificial intelligence has transformed many sectors, thanks both to its ability to perform a wide range of processes and to its growing accessibility to individuals and companies. Among the applications of AI in business, this article analyses AI voice transcription services and their implications for data protection.
As a general rule, a person's voice must be considered personal data, insofar as it identifies a natural person or makes that person identifiable, directly or indirectly, taking into account the context, the reasonably usable means and the state of the art. The voice has its own features that, in certain circumstances, may allow a person to be identified, in particular by those who know the speaker or when reference samples are available. This capacity to identify is not uniform or automatic in all cases, and the voice may be anonymised using different techniques.
The voice will be associated with information related to the inherent content of the communication, as well as metadata specific to digital services, for example, the telephone number from which the call is made, the connection from which the voice is transmitted (IP address), information about the use of the application (cookies) or other types of information. This additional information also constitutes personal data and must be taken into account when ensuring compliance with the GDPR.
Focusing only on voice processing, AI-based transcription systems may, though not necessarily, involve two processing operations with different purposes:
- On the one hand, voice transcription is carried out using an AI-based system, for example, to produce minutes of meetings.
- On the other hand, an AI system could be fine-tuned using voice samples, either by the controller itself or, more commonly, by the transcription service provider.
A controller that decides to incorporate a voice transcription service must act diligently to protect data subjects' rights when selecting the systems or processors with which new processing relationships are established. The controller must take care to select those that provide clear information on the following: any additional processing operations and their respective controllers; confidentiality guarantees (regarding the voice, the metadata and the content of the message); security measures; where applicable, how retraining is carried out and its lawfulness; retention periods; data minimisation (particularly the minimisation of metadata); processors and data location; and all other guarantees and measures necessary to comply with the GDPR in general and, in particular, with the requirements of Article 28 of the Regulation.
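As a minimal illustration of metadata minimisation, the Python sketch below keeps only the fields assumed to be needed for a transcription request and drops everything else before the request leaves the controller. The field names (`audio_id`, `language`, `caller_number`, `ip_address`, `cookies`) and the allow-list are hypothetical and are not taken from any real service; which fields are actually necessary depends on the controller's own analysis.

```python
# Hypothetical allow-list: the only metadata fields assumed necessary
# for the transcription purpose in this sketch.
ALLOWED_FIELDS = frozenset({"audio_id", "language"})


def minimise_metadata(metadata: dict) -> dict:
    """Return a copy of the metadata containing only allow-listed fields."""
    return {key: value for key, value in metadata.items() if key in ALLOWED_FIELDS}


# Fictitious example values for illustration only.
request_metadata = {
    "audio_id": "meeting-001",
    "language": "en",
    "caller_number": "+34 600 000 000",
    "ip_address": "192.0.2.10",
    "cookies": "session=abc",
}

minimised = minimise_metadata(request_metadata)
print(sorted(minimised))
```

An allow-list is used here rather than a deny-list so that any new metadata field is excluded by default until someone decides it is needed.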
With regard to possible additional processing operations, the organisation using the system must determine whether any processing is carried out that infers emotions, thoughts, beliefs or health status, that performs biometric identification, or similar. Transcription services that include emotion detection, or the inference of any special category of data, will be subject to a strict data protection regime and may even involve the use of AI systems prohibited under Regulation (EU) 2024/1689 on Artificial Intelligence.
On the other hand, the controller of the transcription processing is usually not the party that maintains or further develops the AI system using personal data. Therefore, if data from the transcription system are used to retrain the AI system, the entity that performs that processing for its own purposes assumes the role of data controller under the GDPR, and its legal basis will differ from that of the transcription processing. In these cases, it should be noted that the training of transcription systems is commonly supervised, which means that third parties will listen to the voice and transcribe its content manually as part of adjusting the model.
Applying the purpose limitation principle, the controller that decides to use a transcription system must identify the purposes of each processing operation, which must be specific, explicit and legitimate. The controller should also consider whether including a transcription operation within the processing is necessary, and must comply with the remaining obligations under the GDPR.
The legal bases for processing could be, depending on the case, the performance of a contract, legitimate interest, consent, or others for special cases such as emergency services. Where the basis is consent, it must be given through a clear affirmative act that reflects a freely given, specific, informed and unambiguous indication of agreement to each processing operation, and it must not be an option activated by default when using the service. The user must not suffer any detriment for refusing consent or for withdrawing it at any time, and withdrawing consent must be as easy as giving it. Consent could be appropriate for certain professional relationships. However, it is necessary to assess whether such consent could be impaired in an employee-employer relationship or in dealings with Public Administrations.
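The consent conditions above can be sketched as a simple record kept per purpose. This is only an illustrative Python sketch under stated assumptions: the `ConsentRecord` class and its field names are hypothetical, and code of this kind cannot establish whether consent was freely given, which requires a case-by-case assessment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical record of consent for a single, specific purpose."""

    purpose: str                    # one record per processing purpose
    affirmative_action: bool        # True only if the user actively opted in
    informed: bool                  # information was provided beforehand
    given_at: Optional[datetime] = None
    withdrawn_at: Optional[datetime] = None

    def is_valid(self) -> bool:
        """Consent counts only if actively given, informed and not withdrawn."""
        return (
            self.affirmative_action
            and self.informed
            and self.given_at is not None
            and self.withdrawn_at is None
        )

    def withdraw(self, when: datetime) -> None:
        """Withdrawal is a single call, mirroring the single act of consenting."""
        self.withdrawn_at = when


# An option enabled by default is not an affirmative act, so it is not valid.
default_on = ConsentRecord(
    purpose="transcription",
    affirmative_action=False,
    informed=True,
    given_at=datetime(2025, 1, 1),
)

# Consent actively given for retraining, then withdrawn later.
opted_in = ConsentRecord(
    purpose="retraining",
    affirmative_action=True,
    informed=True,
    given_at=datetime(2025, 1, 1),
)
valid_before_withdrawal = opted_in.is_valid()
opted_in.withdraw(datetime(2025, 2, 1))
```

Keeping one record per purpose reflects the requirement that consent be specific: agreeing to transcription in this sketch says nothing about agreeing to retraining.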
In the many other cases where legitimate interest is relied on as the legal basis, that interest must be assessed in a way that demonstrates it has been balanced against the rights and freedoms of the data subject.
Once the organisation has fulfilled its due diligence obligations, determined the lawfulness of the processing (to the extent that it may only use the system under the platform's conditions), and selected the most appropriate service, it must adequately inform the data subject whose voice is to be recorded and processed. In accordance with the principles of transparency and data protection by default, and before the processing is carried out, the data subject must be informed, among other things, of the use of the service, whether additional processing will take place and its nature, whether third parties will listen to the conversation (for example, for retraining), as well as of their rights and the risks associated with the service, specifically regarding the exercise of the rights to rectification and erasure.
To conclude, with regard to transcription processing, the GDPR would not apply to synthetic voices, or where measures have been taken to modify the voice at the source so that it is no longer identifiable, including unlinking it from other identifying information, which is common in some Internet services. It should be reiterated, however, that the GDPR would still apply to the service's metadata and to the content of the communication to the extent that they relate to a natural person.
This blog post is related to other materials published by the AEPD’s Innovation and Technology Division, such as:
- Blog post Data and Information in Artificial Intelligence [dec 2024]
- Blog post Artificial Intelligence: Transparency [sep 2023]
- Audit Requirements for Personal Data Processing Activities involving AI [jan 2021]
- GDPR compliance of processing that embed Artificial Intelligence. An introduction [feb 2020]
- Blog post Consent receipt: A tool for transparency and proactive accountability [feb 2020]