Anonymize DICOM data

This example is a starting point to anonymize DICOM data.

It shows how to read data and replace tags: person names, patient id, optionally remove curves and private tags, and write the results in a new file.

# authors : Guillaume Lemaitre <g.lemaitre58@gmail.com>
# license : MIT

from __future__ import print_function

import tempfile

import pydicom
from pydicom.data import get_testdata_files

print(__doc__)

Anonymize a single file

filename = get_testdata_files('MR_small.dcm')[0]
dataset = pydicom.dcmread(filename)

data_elements = ['PatientID',
                 'PatientBirthDate']
for de in data_elements:
    print(dataset.data_element(de))

Out:

(0010, 0020) Patient ID                          LO: '4MR1'
(0010, 0030) Patient's Birth Date                DA: ''

We can define a callback function to find all tags corresponding to a person names inside the dataset. We can also define a callback function to remove curves tags.

def person_names_callback(dataset, data_element):
    if data_element.VR == "PN":
        data_element.value = "anonymous"


def curves_callback(dataset, data_element):
    if data_element.tag.group & 0xFF00 == 0x5000:
        del dataset[data_element.tag]

We can use the different callback function to iterate through the dataset but also some other tags such that patient ID, etc.

dataset.PatientID = "id"
dataset.walk(person_names_callback)
dataset.walk(curves_callback)

pydicom allows to remove private tags using remove_private_tags method

dataset.remove_private_tags()

Data elements of type 3 (optional) can be easily deleted using del or delattr.

if 'OtherPatientIDs' in dataset:
    delattr(dataset, 'OtherPatientIDs')

if 'OtherPatientIDsSequence' in dataset:
    del dataset.OtherPatientIDsSequence

For data elements of type 2, this is possible to blank it by assigning a blank string.

tag = 'PatientBirthDate'
if tag in dataset:
    dataset.data_element(tag).value = '01011900'

Finally, this is possible to store the image

data_elements = ['PatientID',
                 'PatientBirthDate']
for de in data_elements:
    print(dataset.data_element(de))

output_filename = tempfile.NamedTemporaryFile().name
dataset.save_as(output_filename)

Out:

(0010, 0020) Patient ID                          LO: 'id'
(0010, 0030) Patient's Birth Date                DA: '01011900'

Total running time of the script: ( 0 minutes 0.013 seconds)

Gallery generated by Sphinx-Gallery