Here is a quick example of using the deid executable. For more information on this client, see the user docs page.
Data
To run these examples, you’ll need to install the external deid-data package.
$ pip install deid-data
Deid Executable
The deid executable is installed automatically with the module. Running deid on its own prints the usage:
usage: deid [-h] [--version] [--quiet] [--debug] [--outfolder OUTFOLDER]
[--format {dicom}] [--overwrite] [--deid DEID]
{inspect,identifiers} ...
Deid (de-identification, anonymization) command line tool.
optional arguments:
-h, --help show this help message and exit
--version, -v show deid software version
--quiet, -q Quiet the verbose output
--debug use verbose logging to debug.
--outfolder OUTFOLDER, -o OUTFOLDER
full path to save output, will use temporary folder if
not specified
--format {dicom}, -f {dicom}
format of images, default is dicom
--overwrite overwrite pre-existing files in output directory, if
they exist.
actions:
actions for deid to perform
{inspect,identifiers}
action for deid to perform
inspect various checks for PHI and quality
identifiers extract and replace identifiers from headers
What we want to do is inspect:
usage: deid inspect [-h] [--deid DEID] [--save] folder [folder ...]
positional arguments:
folder input folder or single image. If not provided, test data will
be used.
optional arguments:
-h, --help show this help message and exit
--deid DEID deid file with preferences, if not specified, default used.
--save, -s save result to output tab separated file.
Let’s run the command with test data (dicom cookies) and specify the deid in our examples folder:
deid inspect --deid examples/deid deid/data/dicom-cookies
Found 7 valid dicom files
FLAGGED image6.dcm in section dangerouscookie
LABEL: LABEL Criteria for Dangerous Cookie
CRITERIA: PatientSex contains M and OperatorsName notequals bold bread
FLAGGED image5.dcm in section dangerouscookie
LABEL: LABEL Criteria for Dangerous Cookie
CRITERIA: PatientSex contains M and OperatorsName notequals bold bread
SUMMARY ================================
CLEAN 5 files
FLAGGED dangerouscookie 2 files
The output lists each flagged image with the label and criteria of the filter that matched, followed by a summary count of clean and flagged files.
If you want to run the above and save the result to file:
deid inspect --deid examples/deid deid/data/dicom-cookies --save
...
SUMMARY ================================
CLEAN 5 files
FLAGGED dangerouscookie 2 files
Result written to pixel-flag-results-dicom-cookies-17-09-02.tsv
The file looks like this. Images where PatientSex contains “M” and OperatorsName is not “bold bread” are flagged:
dicom_file pixels_flagged flag_list reason
deid/data/dicom-cookies/image4.dcm,CLEAN
deid/data/dicom-cookies/image2.dcm,CLEAN
deid/data/dicom-cookies/image7.dcm,CLEAN
deid/data/dicom-cookies/image3.dcm,CLEAN
deid/data/dicom-cookies/image1.dcm,CLEAN
deid/data/dicom-cookies/image6.dcm,FLAGGED dangerouscookie PatientSex contains M and OperatorsName notequals bold bread
deid/data/dicom-cookies/image5.dcm,FLAGGED dangerouscookie PatientSex contains M and OperatorsName notequals bold bread
Within Python
First, let’s load the example “dicom cookies” dataset. Here we run the same inspection from within Python instead of through the command line client shown above.
from deid.data import get_dataset
from deid.dicom import get_files
dicom_files = list(get_files(get_dataset('dicom-cookies')))
DEBUG Found 7 contender files in dicom-cookies
DEBUG Checking 7 dicom files for validation.
Found 7 valid dicom files
['/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image4.dcm',
'/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image2.dcm',
'/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image7.dcm',
'/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image6.dcm',
'/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image3.dcm',
'/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image1.dcm',
'/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image5.dcm']
Next, let’s use the example deid specification file that is relevant to this dataset. We read it in like this:
from deid.config import load_deid
# From the base of the deid repo
deid = load_deid('examples/deid')
DEBUG FORMAT set to dicom
DEBUG Adding section filter dangerouscookie
DEBUG Adding section header
DEBUG Adding ADD PatientIdentityRemoved YES
DEBUG Adding REPLACE PatientID var:id
DEBUG Adding REPLACE SOPInstanceUID var:source_id
The file we are reading looks like this. We have groups of filters, ordered from more specific at the top to more general at the bottom, and each is named (“dangerouscookie” and “bigimage”). Within each filter we have one criteria group, in which a leading “+” means “and”. We could have more groups under each filter, but this example does not.
FORMAT dicom
%filter dangerouscookie
LABEL Criteria for Dangerous Cookie
contains PatientSex M
+ notequals OperatorsName bold bread
coordinates 0,0,512,110
%filter bigimage
LABEL Image Size Good for Machine Learning
equals Rows 2048
+ equals Columns 1536
coordinates 0,0,512,200
%header
ADD PatientIdentityRemoved YES
REPLACE PatientID var:id
REPLACE SOPInstanceUID var:source_id
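To make the layout of the recipe concrete, here is a minimal sketch of reading its %filter sections into a dictionary. It is not how load_deid works internally: parse_filters and the shape of the result are hypothetical, and the sketch only understands the line types that appear in the recipe above.

```python
RECIPE = """\
FORMAT dicom
%filter dangerouscookie
LABEL Criteria for Dangerous Cookie
contains PatientSex M
+ notequals OperatorsName bold bread
coordinates 0,0,512,110
%filter bigimage
LABEL Image Size Good for Machine Learning
equals Rows 2048
+ equals Columns 1536
coordinates 0,0,512,200
%header
ADD PatientIdentityRemoved YES
REPLACE PatientID var:id
REPLACE SOPInstanceUID var:source_id
"""


def parse_filters(text):
    """Collect the label, criteria and coordinates of each %filter section.

    Hypothetical helper for illustration only, not part of deid.
    """
    filters = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("%"):
            parts = line[1:].split(None, 1)
            if parts[0] == "filter" and len(parts) == 2:
                current = {"label": None, "criteria": [], "coordinates": []}
                filters[parts[1]] = current
            else:
                # Any other section (for example %header) is skipped here.
                current = None
        elif current is None:
            # FORMAT line, or a line that belongs to a skipped section.
            continue
        elif line.startswith("LABEL "):
            current["label"] = line[len("LABEL "):]
        elif line.startswith("coordinates "):
            numbers = line.split(None, 1)[1].split(",")
            current["coordinates"].append([int(n) for n in numbers])
        else:
            # A leading "+" joins this criterion to the previous one with "and".
            operator, field, value = line.lstrip("+ ").split(None, 2)
            current["criteria"].append((operator, field, value))
    return filters


filters = parse_filters(RECIPE)
for name, content in filters.items():
    print(name, content["criteria"])
```

Values are kept as strings here, so a real comparison against numeric header fields such as Rows would need a conversion step.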
We won’t be using the header section in this example. For reference, it is the recipe for how we would replace information in the header if we were cleaning the headers. Right now we are only filtering images to flag those that might have PHI. Let’s walk through the logic of the filters above step by step:
- If the header contains field PatientSex “M” (Male), and OperatorsName is not “bold bread,” we flag. Otherwise, keep going.
- If the header has field Rows 2048 and Columns 1536 we flag.
The first filter to flag an image (the more specific one) makes the final decision. This means you should list filters with known coordinates of PHI (e.g., a specific modality or manufacturer) first, followed by more general estimates of PHI. A later, more general group will likely create flags for manual inspection.
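The "first match wins" logic can be sketched in plain Python. This is an illustration under stated assumptions, not deid's implementation: FILTERS, check and first_flag are hypothetical names, headers are ordinary dictionaries instead of DICOM datasets, “contains” is simplified to a substring test, and treating a missing field as a non-match is an assumption.

```python
# Hypothetical stand-in for the %filter sections, kept in recipe order.
# Each entry is (name, [(operator, field, value), ...]); criteria are and-ed.
FILTERS = [
    ("dangerouscookie", [("contains", "PatientSex", "M"),
                         ("notequals", "OperatorsName", "bold bread")]),
    ("bigimage", [("equals", "Rows", 2048),
                  ("equals", "Columns", 1536)]),
]


def check(operator, actual, expected):
    """Evaluate one criterion; a missing field (None) never matches (assumption)."""
    if actual is None:
        return False
    if operator == "contains":
        return str(expected) in str(actual)
    if operator == "equals":
        return actual == expected
    if operator == "notequals":
        return actual != expected
    raise ValueError("unknown operator: %s" % operator)


def first_flag(header):
    """Return the name of the first filter whose criteria all match, else None."""
    for name, criteria in FILTERS:
        if all(check(op, header.get(field), value) for op, field, value in criteria):
            return name
    return None


# Illustrative headers (made-up values, not read from the dataset).
both = {"PatientSex": "M", "OperatorsName": "green paper", "Rows": 2048, "Columns": 1536}
big_only = {"PatientSex": "F", "OperatorsName": "sweet brook", "Rows": 2048, "Columns": 1536}
clean = {"PatientSex": "M", "OperatorsName": "bold bread", "Rows": 512, "Columns": 512}

print(first_flag(both))      # dangerouscookie (the earlier filter decides)
print(first_flag(big_only))  # bigimage
print(first_flag(clean))     # None
```

Because the loop returns on the first match, reordering FILTERS changes which name is reported for an image that satisfies more than one filter, which is why the specific filters belong at the top.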
Now let’s run the filter! First just within python:
from deid.dicom import has_burned_pixels
groups = has_burned_pixels(dicom_files=dicom_files, deid='examples/deid')
We immediately see that two are flagged:
FLAGGED image6.dcm in section dangerouscookie
LABEL: LABEL Criteria for Dangerous Cookie
CRITERIA: PatientSex contains M and OperatorsName notequals bold bread
FLAGGED image5.dcm in section dangerouscookie
LABEL: LABEL Criteria for Dangerous Cookie
CRITERIA: PatientSex contains M and OperatorsName notequals bold bread
Is this accurate?
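import os
from pydicom import dcmread  # dcmread was called read_file in older pydicom releases

# Print the two fields used by the dangerouscookie filter for every file
for dicom_file in dicom_files:
    dicom = dcmread(dicom_file)
    print("%s:%s - %s" % (os.path.basename(dicom_file),
                          dicom.OperatorsName,
                          dicom.PatientSex))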
image4.dcm:bold bread - M
image2.dcm:lingering hill - F
image7.dcm:sweet brook - F
image6.dcm:green paper - M <--- FLAGGED
image3.dcm:nameless voice - F
image1.dcm:fragrant pond - F
image5.dcm:curly darkness - M <--- FLAGGED
Seems to be! The returned data structure gives us programmatic access to the groups: a list of clean files (“clean”), the flagged files keyed by the name of the filter that flagged them (“flagged”), and a lookup dictionary from each flagged file to the reason it was flagged (“reason”):
{
"clean":[
"/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image4.dcm",
"/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image2.dcm",
"/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image7.dcm",
"/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image3.dcm",
"/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image1.dcm"
],
"flagged":{
"dangerouscookie":[
"/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image6.dcm",
"/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image5.dcm"
]
},
"reason":{
"/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image5.dcm":" PatientSex contains M and OperatorsName notequals bold bread",
"/home/vanessa/Documents/Dropbox/Code/dicom/deid/deid/data/dicom-cookies/image6.dcm":" PatientSex contains M and OperatorsName notequals bold bread"
}
}
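As a small sketch of working with this structure, the snippet below flattens it into one row per file. The groups dictionary is a hand-written sample with shortened, hypothetical paths in the same shape as the result above, and summarize is a hypothetical helper, not a deid function.

```python
import os

# Hand-written sample shaped like the result above (paths shortened).
groups = {
    "clean": [
        "dicom-cookies/image4.dcm",
        "dicom-cookies/image2.dcm",
        "dicom-cookies/image7.dcm",
        "dicom-cookies/image3.dcm",
        "dicom-cookies/image1.dcm",
    ],
    "flagged": {
        "dangerouscookie": [
            "dicom-cookies/image6.dcm",
            "dicom-cookies/image5.dcm",
        ]
    },
    "reason": {
        "dicom-cookies/image5.dcm": " PatientSex contains M and OperatorsName notequals bold bread",
        "dicom-cookies/image6.dcm": " PatientSex contains M and OperatorsName notequals bold bread",
    },
}


def summarize(groups):
    """Return (basename, status, reason) rows, clean files first."""
    rows = [(os.path.basename(path), "CLEAN", "") for path in groups["clean"]]
    for flag_name, paths in groups["flagged"].items():
        for path in paths:
            # Reasons carry a leading space in the sample, so strip it.
            reason = groups["reason"].get(path, "").strip()
            rows.append((os.path.basename(path), "FLAGGED " + flag_name, reason))
    return rows


rows = summarize(groups)
for row in rows:
    print("\t".join(row))
```

Looking up the reason by file path, not by flag name, keeps the sketch correct even if two files flagged by the same filter carried different reasons.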