deid

As you recall from the configuration notes page, the deid recipe allows you to configure both cleaning of pixels and changing header values. This document will cover the first, how to write and apply filters to clean images.

Filters

You might want to flag images based on their header values. For example, “I have a directory of dicom images, and I want to find those with Modality as CT. You can create one or more filter groups to do this. If you create more than one, an image is attributed to the filter group that comes first in the deid recipe file (indicating higher priority). For example, an image that would be flagged with a general Blacklist criteria that is first flagged with Greylist (meaning we know how to clean it) belongs to Greylist. We check higher priority first to be computationally more efficient, because we can stop checking the image when we hit the first criteria flag.

Filter Groups

While you are free to define your own groups and criteria, we provide a default deid that has the following levels defined within:

Filter Example

I’ll again show you the previous example, but give more detail this time. The start of a filter might look like this:

FORMAT dicom

%filter graylist

LABEL CT Dose Series
  contains Modality CT
  + contains Manufacturer GE
  + contains CodeMeaning IEC Body Dosimetry Phantom
  coordinates 0,0,512,200

LABEL Dose Report
  contains Modality CT
  + contains Manufacturer GE
  + contains SeriesDescription Dose Report
  coordinates 0,0,512,110

%filter blacklist

LABEL Burned In Annotation
contains ImageType SAVE
contains SeriesDescription SAVE
contains BurnedInAnnotation YES
empty ImageType
empty DateOfSecondaryCapture
empty SecondaryCaptureDeviceManufacturer
empty SecondaryCaptureDeviceManufacturerModelName
empty SecondaryCaptureDeviceSoftwareVersions

Each section is indicated by %filter, and within sections, a set of criteria are defined under a LABEL. The formatting of this is inspired by both CTP and my early work with Singularity Containers, which is based on RPM.

How are images filtered?

You can imagine an image starting at the top of the file, and moving down line by line. If at any point it doesn’t pass criteria, it is flagged and placed with the group, and no further checking is done. For this purpose, the sections are ordered by their specificity and preference. This means that, for the above, by placing blacklist after graylist we are saying that an image that could be flagged to be both in the blacklist and graylist will hit the graylist first. This is logical because the graylist corresponds to a specific set of image header criteria for which we know how to clean. We only resort to general blacklist criteria if we make it far enough and haven’t been convinced that there isn’t PHI.

How do I read a criteria?

Each filter section criteria starts with LABEL. this is an identifier to report to the user given that the flag goes off. Each criteria then has the following format:

<criteria> <field> <value>

where “criteria” is a test such as “contains”, “notcontains”, “equals”, “notequals”, “missing”, “present” or “empty”; “field” is the single-word name of a DICOM tag (as known to pydicom), or the number as 8-digit hex format with 0x prefix; and “value” is optional, depending on the filter. For example:

LABEL Burned In Annotation
contains ImageType SAVE

Reads “flag the image with a Burned In Annotation, which belongs to the blacklist filter, if the ImageType fields contains SAVE.” If you want to do an “and” statement across two fields, just use +:

contains ImageType SAVE
  + contains Manufacturer GE

“flag the image if the ImageType contains SAVE AND Manufacturer contains GE. To do an “or”

contains ImageType SAVE
  || contains Manufacturer GE

“flag the image if the ImageType contains SAVE OR Manufacturer contains GE.

And you can have multiple criteria for one LABEL

contains SeriesDescription SAVE
+ contains BurnedInAnnotation YES
empty ImageType
|| empty DateOfSecondaryCapture

And to make it even simpler, if you want to check one field for a value a OR b, you can use regular expressions. The following checks ImageType for “CT” OR “MRI”

equals ImageType CT|MRI

which is equivalent to:

equals ImageType CT
|| equals ImageType MRI

What if you want to evaluate an inner statement? Eg: “flag the image if Criteria 1 and (Criteria 2 OR Criteria 3)? The inner parentheses would need to be evaluated first. You would represent the content of the inner parentheses (Criteria 1 or Criteria 2) on the same line:

LABEL Ct Dose Series
  contains Criteria1
  + contains Criteria2 Value2 || contains Criteria3 Value3
  coordinates 0,0,512,200

What are the criteria options?

For all of the below, case does not matter. All fields are changed to lowercase before comparison, and stripped of leading and trailing white spaces.

How do I customize the process?

There are several things you can customize!