deid

As we’ve discussed, the basic actions of using header filters to flag images, and performing actions on headers (for replacement), are controlled by a text file called a deid recipe. If you want a reminder about how to write this text file, read here, and we hope to at some point have an interactive way as well (let us know your feedback!). The basic gist of the file is that we have different sections.

In this small tutorial, we will walk through the basic steps of loading a recipe, interacting with it, and then using it to replace identifiers. If you want to jump in, then go straight to the script that describes this example.

Data

To run these examples, you’ll need to install the external deid-data package.

$ pip install deid-data

Recipe Management

The following sections will describe creating and combining recipes.

Create a DeidRecipe

We will start with how to work with a DeidRecipe object. If you aren’t interested in this use case or just want to use a provided deid recipe file, continue to the next section.

We start by importing the class, and instantiating it.

from deid.config import DeidRecipe
recipe = DeidRecipe()
WARNING No specification, loading default base deid.dicom

Since we didn’t load a custom deid recipe text file, we get a warning message that a default is being used. That default is a dicom base provided by the library. If you want to see the raw data structure that is loaded, look here:

recipe.deid

You can also double check the recipe format. We currently only support dicom, but this could in the future be other image formats (seriously, open an issue)!

recipe.get_format()
# dicom

Note that validation of this structure happens at load time. If something is incorrectly labeled or formatted, you will get an error message and the recipe will fail to load. You can also provide your own deid recipe file, and in doing so, you won’t load the default. Here is one from our examples folder:

wget https://raw.githubusercontent.com/pydicom/deid/master/examples/deid/deid.dicom

and in Python…

import os

deid_file = os.path.abspath('deid.dicom')
recipe = DeidRecipe(deid=deid_file)

I would strongly recommend starting with an example, and building your custom recipe from it. If you have an example that you think others would find useful, please contribute it to the repository in the examples folder.

Combine Recipes

You can also choose to load the default base with your own recipe. In this case the two recipes are combined, and wherever they conflict the recipe loaded second is given preference. For example, if the first deid you load removes a field and the second adds the same field, the final result will have it added. Keep this in mind and take care when combining recipes. Here is how it would look to load the default base and provide your custom file:

recipe = DeidRecipe(deid=deid_file, base=True)
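
One way to picture the remove-then-add example is to treat each recipe as a list of actions applied in order. The sketch below is plain Python and does not use deid; `apply_actions`, the sample header, and the sample actions are all hypothetical, and this is not how the library itself combines recipes.

```python
# Hypothetical illustration only: this is not deid's implementation.
def apply_actions(header, actions):
    """Apply REMOVE and ADD actions, in order, to a copy of a header dict."""
    result = dict(header)
    for item in actions:
        if item["action"] == "REMOVE":
            result.pop(item["field"], None)
        elif item["action"] == "ADD":
            result[item["field"]] = item["value"]
    return result


# The first recipe removes a field, the second adds the same field back
first_recipe = [{"action": "REMOVE", "field": "PatientID"}]
second_recipe = [{"action": "ADD", "field": "PatientID", "value": "anonymous"}]

header = {"PatientID": "12345", "Modality": "CT"}
combined = first_recipe + second_recipe
cleaned = apply_actions(header, combined)
print(cleaned["PatientID"])  # anonymous
```

Because the second recipe's action runs last, the field ends up present with the new value, which matches the behavior described above.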

You can also specify a different base entirely, and this would be equivalent to just providing a list of deid files:

recipe = DeidRecipe(deid=[deid_file1, deid_file2])
recipe = DeidRecipe(deid=deid_file1, base=True, default_base=deid_file2)

When we load bases, we are looking in the data folder provided by the module. A base is a file in this folder whose name starts with `deid.` and is followed by a tag. So for example, if we wanted to use `deid/data/deid.dicom.chest.xray` we would specify:

# Use dicom.xray.chest as a base
recipe = DeidRecipe(deid=path, base=True, default_base='dicom.xray.chest')

# Use dicom.xray.chest as the only one
recipe = DeidRecipe(deid='dicom.xray.chest')

# On top of the default base, deid.dicom
recipe = DeidRecipe(deid='dicom.xray.chest', base=True)
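
As a rough picture of the naming convention, a short base name can be thought of as the part of the file name that follows the `deid.` prefix. The helper below is a hypothetical sketch under that assumption (read off the examples here, not taken from the library), and `base_path` is not a deid function.

```python
import os


def base_path(data_folder, name):
    """Build the path of a base recipe from its short name (hypothetical helper)."""
    return os.path.join(data_folder, "deid." + name)


# The default base "dicom" would correspond to a file named deid.dicom
print(os.path.basename(base_path("deid/data", "dicom")))  # deid.dicom
```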

This data folder is to encourage sharing! It is often a lot of work to develop criteria specific to your group or interest. If you have a general recipe that others might use, please contribute it.

Sections

Now let’s discuss the sections that a recipe can contain: a header, labels, filters, and groups for lists of values or fields.

Filters

The process of flagging images comes down to writing a set of filters to check if each image meets some criteria of interest. For example, I might create a filter called “xray” that is triggered when the Modality is CT or XR. The filters are found in the %filter sections of the deid recipe.

First, to get a complete dict of all filters (a dictionary with keys corresponding to filter group names and values the filters themselves) we can do the following actions:

recipe.get_filters()

# To get the group names
recipe.ls_filters()
# ['whitelist', 'blacklist']

# To get a list of specific filters under a group
recipe.get_filters('blacklist')
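
To make the idea of a filter concrete, the "xray" example mentioned earlier (triggered when the Modality is CT or XR) could be sketched in plain Python as below. This is only an illustration of the concept: `flag_xray` and the sample headers are hypothetical, and deid expresses such criteria in the recipe file rather than as Python functions.

```python
def flag_xray(header):
    """Return True when the Modality field is CT or XR (hypothetical filter)."""
    return header.get("Modality") in ("CT", "XR")


# Stand-ins for image headers, keyed by made-up file names
images = {
    "image1.dcm": {"Modality": "CT"},
    "image2.dcm": {"Modality": "MR"},
}

flagged = [name for name, header in images.items() if flag_xray(header)]
print(flagged)  # ['image1.dcm']
```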

Header Actions

A header action is a step (e.g., replace, remove, blank) to be applied to a dicom image header. The headers are also part of the deid recipe. You don’t need to necessarily use header actions and filters at the same time, but since it’s nice to keep things tidy for a single dataset using a shared file, we support having them both represented in the same file. You could just as easily keep them in separate files to load separately - a DeidRecipe is not required to have header actions and/or filters.

First, let’s load the default deid recipe file (deid.dicom in the data folder) that we know has a %header section.

recipe = DeidRecipe()

Here is how to get and interact with actions defined by the recipe.

# We can get a complete list of actions
recipe.get_actions()

# We can filter to an action type
recipe.get_actions(action='ADD')

#[{'action': 'ADD',
#  'field': 'IssuerOfPatientID',
#  'value': 'STARR. In an effort to remove PHI all dates are offset from their original values.'},
# {'action': 'ADD',
#  'field': 'PatientBirthDate',
#  'value': 'var:entity_timestamp'},
# {'action': 'ADD', 'field': 'StudyDate', 'value': 'var:item_timestamp'},
# {'action': 'ADD', 'field': 'PatientID', 'value': 'var:entity_id'},
# {'action': 'ADD', 'field': 'AccessionNumber', 'value': 'var:item_id'},
# {'action': 'ADD', 'field': 'PatientIdentityRemoved', 'value': 'Yes'}]

# or we can filter to a field
recipe.get_actions(field='PatientID')

#[{'action': 'REMOVE', 'field': 'PatientID'},
# {'action': 'ADD', 'field': 'PatientID', 'value': 'var:entity_id'}]

# and logically, both
recipe.get_actions(field='PatientID', action="REMOVE")
#  [{'action': 'REMOVE', 'field': 'PatientID'}]

# If you have lists of fields or values defined, you can retrieve them too
recipe.get_fields_lists()
# OrderedDict([('instance_fields',
#              [{'action': 'FIELD', 'field': 'contains:Instance'}])])

recipe.get_values_lists()
# OrderedDict([('cookie_names',
#              [{'action': 'SPLIT',
#                'field': 'PatientID',
#                'value': 'by="^";minlength=4'}]),
#             ('operator_names',
#              [{'action': 'FIELD', 'field': 'startswith:Operator'}])])

recipe.get_values_lists("cookie_names")
# [{'action': 'SPLIT', 'field': 'PatientID', 'value': 'by="^";minlength=4'}]
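
Since the actions are returned as a list of dictionaries, the filtering by field and action shown above can be pictured with a few lines of plain Python. This is a sketch of the idea, not the library's code; `filter_actions` is a hypothetical helper and the sample list is abbreviated from the output above.

```python
def filter_actions(actions, field=None, action=None):
    """Keep the actions that match the given field and/or action type."""
    selected = []
    for item in actions:
        if field is not None and item["field"] != field:
            continue
        if action is not None and item["action"] != action:
            continue
        selected.append(item)
    return selected


actions = [
    {"action": "REMOVE", "field": "PatientID"},
    {"action": "ADD", "field": "PatientID", "value": "var:entity_id"},
    {"action": "ADD", "field": "PatientIdentityRemoved", "value": "Yes"},
]

print(filter_actions(actions, field="PatientID", action="REMOVE"))
# [{'action': 'REMOVE', 'field': 'PatientID'}]
```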

If you have need for more advanced functions, please file an issue.

Replace Identifiers

The %header section of a deid recipe defines a set of actions and associated fields to perform them on. As we saw in the examples above, we could easily view and filter the actions based on the header field or action type. For this next section, we will pretend that we’ve just extracted ids from our data files (in a dictionary called ids) and we will prepare a second dictionary of updated fields.

In this first step, let’s import needed functions and load a set of cookie dicoms!

from deid.dicom import get_files, replace_identifiers
from deid.utils import get_installdir
from deid.data import get_dataset
import os

# This will get a set of example cookie dicoms
base = get_dataset('dicom-cookies')
dicom_files = list(get_files(base))

Here is the function to get identifiers

from deid.dicom import get_identifiers
ids = get_identifiers(dicom_files)

Remember, the data above probably has PHI in it (e.g., a real PatientID). At this point you might save the identifiers in your special (IRB approved) places, and then take some action to provide anonymous replacement ids to put back in the data. We provide a cookie tumor example of doing this below.

# Load the dummy / example deid
path = os.path.abspath("%s/../examples/deid/" %get_installdir())
recipe = DeidRecipe(deid=path)

We can quickly double check the actions that are defined

recipe.get_actions()

[{'action': 'ADD', 'field': 'PatientIdentityRemoved', 'value': 'Yes'},
 {'action': 'REPLACE', 'field': 'PatientID', 'value': 'var:id'},
 {'action': 'REPLACE', 'field': 'SOPInstanceUID', 'value': 'var:source_id'}]

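The above says that we are going to add a PatientIdentityRemoved field with the value Yes, replace PatientID with the value of the id variable, and replace SOPInstanceUID with the value of the source_id variable.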

We have 7 dicom cookie images we loaded above, so we have two options. We can either loop through the dictionary of ids and update values (in this case, adding values to be used as new variables) or we can make a new data structure. Let’s be lazy and just update the extracted ones.

updated_ids = dict()
for count, (image, fields) in enumerate(ids.items()):
    # Add the variables that the recipe refers to as var:id and var:source_id
    fields['id'] = 'cookiemonster'
    fields['source_id'] = "cookiemonster-image-%s" % count
    updated_ids[image] = fields

You can look at each of the updated_ids entries and see the added variables

updated_ids

...

  'id': 'cookiemonster',
  'source_id': 'cookiemonster-image-2'}}

And then use the deid recipe and the updated ids to create new files

cleaned_files = replace_identifiers(dicom_files=dicom_files,
                                    deid=recipe,
                                    ids=updated_ids)
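
The link between a recipe value such as var:id and the entries we added to updated_ids can be pictured as a simple lookup. The sketch below is plain Python and only illustrates that idea; `resolve_value` is a hypothetical helper, not a deid function, and how the library itself handles a missing variable is not shown here.

```python
def resolve_value(value, fields):
    """Return the replacement for a recipe value, looking up var: names in fields."""
    prefix = "var:"
    if value.startswith(prefix):
        # In this sketch a missing variable raises KeyError
        return fields[value[len(prefix):]]
    return value


fields = {"id": "cookiemonster", "source_id": "cookiemonster-image-0"}

print(resolve_value("var:id", fields))  # cookiemonster
print(resolve_value("Yes", fields))     # Yes
```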

To check your work, you can load in a cleaned file to see what was done

# dcmread is the current name for pydicom's file reader (read_file is the older alias)
from pydicom import dcmread
test_file = dcmread(cleaned_files[0])

# test_file
# (0008, 0018) SOP Instance UID                    UI: cookiemonster-image-1
# (0010, 0020) Patient ID                          LO: 'cookiemonster'
# (0012, 0062) Patient Identity Removed            CS: 'Yes'
# (0028, 0002) Samples per Pixel                   US: 3
# (0028, 0010) Rows                                US: 1536
# (0028, 0011) Columns                             US: 2048
# (7fe0, 0010) Pixel Data                          OB: Array of 738444 bytes

And finally, a few extra customizations for different output folders and settings.

# Different output folder
cleaned_files = replace_identifiers(dicom_files=dicom_files,
                                    ids=updated_ids,
                                    output_folder='/home/vanessa/Desktop')

# Force overwrite (be careful!)
cleaned_files = replace_identifiers(dicom_files=dicom_files,
                                    ids=updated_ids,
                                    output_folder='/home/vanessa/Desktop',
                                    overwrite=True)

Groups

More advanced usage of header actions would be to define a group of values (the content of the header fields) or field names (the names themselves) to use in an action. This corresponds to %fields (a list of fields) and %values (a list of values from fields) to parse at the onset of the dicom load, and use later in a recipe. Here is how that might look in a recipe:

FORMAT dicom

%values cookie_names
SPLIT PatientID by=" ";minlength=3

%values operator_names
FIELD startswith:Operator

%fields instance_fields
FIELD contains:Instance

%header

ADD PatientIdentityRemoved YES
REPLACE values:cookie_names var:id
REPLACE values:operator_names var:source_id
REMOVE fields:instance_fields

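In the above, we define two lists of values (operator_names and cookie_names) and a list of fields (instance_fields). The sections read as follows: cookie_names is a list of values created by splitting the PatientID field by a space and keeping pieces with a minimum length of 3, operator_names is a list of the values of any field whose name starts with Operator, and instance_fields is a list of any fields whose name contains Instance.

And then in our %header section we take the following actions: we add PatientIdentityRemoved with the value YES, we replace fields that have any value in cookie_names with the id variable, we replace fields that have any value in operator_names with the source_id variable, and we remove the fields listed in instance_fields.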

Let’s give this a try with an example. We’ll load a recipe, and then look at the loaded deid (recipe.deid).

from deid.config import DeidRecipe
recipe = DeidRecipe("examples/deid/deid.dicom-group")
recipe.deid
OrderedDict([('format', 'dicom'),
             ('values',
              OrderedDict([('cookie_names',
                            [{'action': 'SPLIT',
                              'field': 'PatientID',
                              'value': 'by="^";minlength=4'}]),
                           ('operator_names',
                            [{'action': 'FIELD',
                              'field': 'startswith:Operator'}])])),
             ('fields',
              OrderedDict([('instance_fields',
                            [{'action': 'FIELD',
                              'field': 'contains:Instance'}])])),
             ('header',
              [{'action': 'ADD',
                'field': 'PatientIdentityRemoved',
                'value': 'Yes'},
               {'action': 'REPLACE',
                'field': 'values:cookie_names',
                'value': 'var:id'},
               {'action': 'REPLACE',
                'field': 'values:operator_names',
                'value': 'var:source_id'}])])
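
To get a feel for what the %values and %fields definitions collect, here is a plain Python sketch that mimics them on a made-up header. It is an illustration under assumptions, not the library's parsing code: the three helper functions and the sample header are hypothetical, and treating minlength as "at least this many characters" is a guess.

```python
def split_values(header, field, by=" ", minlength=3):
    """Mimic SPLIT: split a field's value and keep the long-enough pieces."""
    parts = str(header.get(field, "")).split(by)
    # Assumption: minlength means "at least this many characters"
    return [part for part in parts if len(part) >= minlength]


def values_startswith(header, prefix):
    """Mimic a %values FIELD startswith: entry, collecting matching values."""
    return [value for name, value in header.items() if name.startswith(prefix)]


def fields_containing(header, text):
    """Mimic a %fields FIELD contains: entry, collecting matching field names."""
    return [name for name in header if text in name]


header = {
    "PatientID": "cookie monster 12",
    "OperatorsName": "bert",
    "SOPInstanceUID": "1.2.3",
}

print(split_values(header, "PatientID"))      # ['cookie', 'monster']
print(values_startswith(header, "Operator"))  # ['bert']
print(fields_containing(header, "Instance"))  # ['SOPInstanceUID']
```

The two value lists hold header contents, while the field list holds header names, which is the distinction between %values and %fields described above.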