DINUM, the interministerial digital directorate responsible for the French State's digital strategy, last week published an open source tool that uses artificial intelligence to pseudonymize documents. Developed by Etalab's AI Lab, the API is available for testing, as is a pseudonymization guide that details each step of the process. The user uploads or drags a document in a format such as .doc, .docx or .txt, up to a maximum size of 100 KB, into the web page. The tool detects the personal data in the text and highlights the identified entities with a green or red color code.
This initiative stems from the Digital Republic Act, enacted in October 2016, which makes opening public data (open data) the default rule. Since administrations are required to conceal personal data in the documents they publish, pseudonymization is an essential step. It consists of processing personal data in such a way that they can no longer be attributed to a specific individual without the use of additional information.
In practice, pseudonymization involves replacing the directly identifying data in a dataset (surname, first name, etc.) with indirectly identifying data (an alias, a sequence number, etc.). However, it is often still possible to recover these individuals' identities by cross-referencing third-party data, which is why pseudonymized data remains personal data. Unlike anonymization, pseudonymization is reversible.
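To make the replacement step and its reversibility concrete, here is a minimal Python sketch. It rests on several assumptions: the names to hide are supplied by hand, standing in for the AI detection step, which it does not reproduce; the `Pseudonymizer` class and the `PERSON_n` alias format are hypothetical and are not part of the DINUM tool or its API. The correspondence table held in `mapping` plays the role of the "additional information": whoever holds it can reverse the operation.

```python
from __future__ import annotations

import re


class Pseudonymizer:
    """Hypothetical sketch: swap directly identifying names for numbered aliases."""

    def __init__(self) -> None:
        # Correspondence table (name -> alias). This is the "additional information"
        # that must be stored separately from the published text.
        self.mapping: dict[str, str] = {}

    def _alias_for(self, name: str) -> str:
        # Each distinct name gets a stable alias, numbered by first appearance.
        if name not in self.mapping:
            self.mapping[name] = f"PERSON_{len(self.mapping) + 1}"
        return self.mapping[name]

    def pseudonymize(self, text: str, names: list[str]) -> str:
        # `names` are assumed to come from an upstream detection step;
        # this sketch performs no detection itself.
        if not names:
            return text
        # Longest names first, so a full name wins over a shorter name it contains.
        pattern = re.compile(
            "|".join(re.escape(n) for n in sorted(set(names), key=len, reverse=True))
        )
        return pattern.sub(lambda m: self._alias_for(m.group(0)), text)

    def reidentify(self, text: str) -> str:
        # Reversal is possible only for whoever holds the mapping.
        if not self.mapping:
            return text
        reverse = {alias: name for name, alias in self.mapping.items()}
        # Word boundaries stop PERSON_1 from matching inside PERSON_10.
        pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, reverse)) + r")\b")
        return pattern.sub(lambda m: reverse[m.group(0)], text)


doc = "Marie Dupont met Jean Martin. Later, Jean Martin called Marie Dupont."
p = Pseudonymizer()
masked = p.pseudonymize(doc, ["Marie Dupont", "Jean Martin"])
print(masked)  # PERSON_1 met PERSON_2. Later, PERSON_2 called PERSON_1.
print(p.reidentify(masked) == doc)  # True
```

Deleting the table would not by itself make the output anonymous: the remaining context (dates, places, roles) can still allow re-identification through third-party data.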