Data
To run these examples, you’ll need to install external deid-data.
$ pip install deid-data
Get Identifiers
A get request using the deid module will return a data structure with headers found in a particular dataset. Let’s walk through these steps. As we did in the loading, the first step was to load a dicom dataset:
from deid.data import get_dataset
from deid.dicom import get_files
base = get_dataset("dicom-cookies")
dicom_files = list(get_files(base))
We now have our small dataset that we want to de-identify! The first step is to get the identifiers. By default, we will return all of them. That call will look like this:
from deid.dicom import get_identifiers
ids = get_identifiers(dicom_files)
You’ll get back a dictionary(indexed by the file name) for each dicom file. Within each entry, the value is another dictionary with an expanded string of the tag. For example:
ids[dicom_files[0]]
{'(0008, 0005)': (0008, 0005) Specific Character Set CS: 'ISO_IR 100' [SpecificCharacterSet],
'(0008, 0016)': (0008, 0016) SOP Class UID UI: Secondary Capture Image Storage [SOPClassUID],
'(0008, 0018)': (0008, 0018) SOP Instance UID UI: 1.2.276.0.7230010.3.1.4.8323329.5329.1495927169.580351 [SOPInstanceUID],
'(0008, 0020)': (0008, 0020) Study Date DA: '20131210' [StudyDate],
'(0008, 0030)': (0008, 0030) Study Time TM: '191929' [StudyTime],
'(0008, 0050)': (0008, 0050) Accession Number SH: '' [AccessionNumber],
'(0008, 0064)': (0008, 0064) Conversion Type CS: 'WSD' [ConversionType],
'(0008, 0080)': (0008, 0080) Institution Name LO: 'STANFORD' [InstitutionName],
'(0008, 0090)': (0008, 0090) Referring Physician's Name PN: 'Dr. solitary heart' [ReferringPhysicianName],
'(0008, 1060)': (0008, 1060) Name of Physician(s) Reading Study PN: 'Dr. lively wind' [NameOfPhysiciansReadingStudy],
'(0008, 1070)': (0008, 1070) Operators' Name PN: 'curly darkness' [OperatorsName],
'(0010, 0010)': (0010, 0010) Patient's Name PN: 'falling disk' [PatientName],
'(0010, 0020)': (0010, 0020) Patient ID LO: 'cookie-47' [PatientID],
'(0010, 0030)': (0010, 0030) Patient's Birth Date DA: '' [PatientBirthDate],
'(0010, 0040)': (0010, 0040) Patient's Sex CS: 'M' [PatientSex],
'(0020, 000d)': (0020, 000d) Study Instance UID UI: 1.2.276.0.7230010.3.1.2.8323329.5329.1495927169.580350 [StudyInstanceUID],
'(0020, 000e)': (0020, 000e) Series Instance UID UI: 1.2.276.0.7230010.3.1.3.8323329.5329.1495927169.580349 [SeriesInstanceUID],
'(0020, 0010)': (0020, 0010) Study ID SH: '' [StudyID],
'(0020, 0011)': (0020, 0011) Series Number IS: '' [SeriesNumber],
'(0020, 0013)': (0020, 0013) Instance Number IS: '' [InstanceNumber],
'(0020, 0020)': (0020, 0020) Patient Orientation CS: '' [PatientOrientation],
'(0020, 4000)': (0020, 4000) Image Comments LT: 'This is a cookie tumor dataset for testing dicom tools.' [ImageComments],
'(0028, 0002)': (0028, 0002) Samples per Pixel US: 3 [SamplesPerPixel],
'(0028, 0004)': (0028, 0004) Photometric Interpretation CS: 'YBR_FULL_422' [PhotometricInterpretation],
'(0028, 0006)': (0028, 0006) Planar Configuration US: 0 [PlanarConfiguration],
'(0028, 0010)': (0028, 0010) Rows US: 1536 [Rows],
'(0028, 0011)': (0028, 0011) Columns US: 2048 [Columns],
'(0028, 0100)': (0028, 0100) Bits Allocated US: 8 [BitsAllocated],
'(0028, 0101)': (0028, 0101) Bits Stored US: 8 [BitsStored],
'(0028, 0102)': (0028, 0102) High Bit US: 7 [HighBit],
'(0028, 0103)': (0028, 0103) Pixel Representation US: 0 [PixelRepresentation],
'(0028, 2110)': (0028, 2110) Lossy Image Compression CS: '01' [LossyImageCompression],
'(0028, 2114)': (0028, 2114) Lossy Image Compression Method CS: 'ISO_10918_1' [LossyImageCompressionMethod],
'(7fe0, 0010)': (7fe0, 0010) Pixel Data OB: Array of 652494 bytes [PixelData]}
If there is a nested tag, you’ll see it with the format (7fe0, 0010)__(0080, 0012)
. If there
is a nested sequence, you’ll see the index provided in that same format. For example,
(7fe0, 0010)__0__(0080, 0012)
counts as the first element of a sequence,
and (7fe0, 0010)__1__(0080, 0012)
the second. We start counting at 0, we aren’t barbarians!
DicomField
The content of each field is a DicomField, which carries with it the dicom tag (string), name (string), and the actual element for further parsing. For example:
field = ids[dicom_files[0]]['(0010, 0010)']
field.element
(0010, 0010) Patient's Name PN: 'falling disk'
field.name
'PatientName'
field.uid
'(0010, 0010)'
The field.element is what you would get if you indexed the dicom Dataset
at dicom.get(“PatientName”). The name refers to the keyword (which, if there
is nesting, will include that. For example, a Sequence with header value AdditionalData
and item Modality
will be returned as AdditionalData_Modality
,
and this name string is used to help with filters. The uid would also
include the index of the sequence, since we use it to index into the
Dataset.
Next Steps
The get_identifiers
function is an easy way to quickly extract (in bulk) multiple
identifiers for inspection, across a lot of files. You might be writing or developing
a recipe, and need easy access to all these fields. What should you do next?
At this point, you have a few options:
Recipe Interaction
If you want to write a recipe to perform a bunch of custom actions on your dicom files, you should read about how to work with recipes.
Clean Pixels
It’s likely that the pixels in the images have burned in annotations, and we can use the header data to flag these images. Thus, before you replace identifiers, you probably want to do this. We have a DicomCleaner class that can flag images for PHI based on matching some header filter criteria, and you can read about that here.
Update Identifiers
Once you are finished with any customization of the recipe, updating identifiers, and/or potentially flagging and quarantining images that have PHI, you should be ready to replace (PUT) with new fields based on the deid recipe.