

To run these examples, you’ll need to install the external deid-data package.

$ pip install deid-data


While there are different ways to organize DICOM files, we are going to take the simple approach of assuming a top-level directory with some number of files within it (yes, including subdirectories). For example, if you retrieved your data using a tool like dcmqr with a C-MOVE, then you might have a flat directory structure. Sometimes the files won’t have an extension (for example, when they are named by a SOPInstanceUID).

tree deid/data/dicom-cookies/
├── image1.dcm
├── image2.dcm
├── image3.dcm
├── image4.dcm
├── image5.dcm
├── image6.dcm
└── image7.dcm

It doesn’t matter much how your data is structured, and you can use any method you like to list the files. You could technically just use os.listdir or glob:

from glob import glob
import os

base = "deid/data/dicom-cookies"

dicom_files = glob("%s/*" %base)


Notice anything that might trigger a bug with the above? The paths are relative to the current working directory, so you should probably convert them to absolute paths.

# For glob
dicom_files = glob("%s/*" %base)
dicom_files = [os.path.abspath(x) for x in dicom_files]

# For os module
dicom_files = []
for root, folders, files in os.walk(base):
    for file in files:
        fullpath = os.path.abspath(os.path.join(root, file))
        # Keep each absolute path, otherwise the list stays empty
        dicom_files.append(fullpath)

We provide a few more robust functions to find datasets, because you usually want to match a file pattern, search subfolders, or validate that each file is DICOM.
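To illustrate what such a validation might involve, here is a minimal sketch that is not deid’s own implementation. It assumes DICOM Part 10 files, which carry a 128-byte preamble followed by the four bytes `DICM`; older files without that preamble would be rejected by this check. The names `looks_like_dicom` and `find_candidates` are hypothetical helpers made up for this example.

```python
import os


def looks_like_dicom(path):
    """Return True if the file has the 'DICM' marker after the 128-byte preamble."""
    with open(path, "rb") as handle:
        handle.seek(128)
        return handle.read(4) == b"DICM"


def find_candidates(base):
    """Recursively collect absolute paths under base that pass the marker check."""
    found = []
    for root, _folders, files in os.walk(base):
        for name in files:
            fullpath = os.path.abspath(os.path.join(root, name))
            if looks_like_dicom(fullpath):
                found.append(fullpath)
    return sorted(found)
```

Because the check reads the file content rather than the name, it also works for files that have no extension.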

Find Datasets

The function that we have provided will find all datasets matching some pattern (or all files recursively in a folder). You simply need to provide a list of top folders, a list of files and folders, or just files to start. For the purposes of this walkthrough, we will load data folders that are provided with the application.

from deid.data import get_dataset

base = get_dataset("dicom-cookies")

In the above, all we’ve done is retrieve the full path to a folder of DICOM files. Let’s try to read in the data:

from deid.dicom import get_files

dicom_files = list(get_files(base))
DEBUG Found 7 contender files in dicom-cookies
DEBUG Checking 7 dicom files for validation.
Found 7 valid dicom files

We can also skip the validation check if we are absolutely sure that every file is DICOM. For larger datasets this might speed up processing a little bit.

dicom_files = list(get_files(base,check=False))
DEBUG Found 7 contender files in dicom-cookies

We can also give it a particular pattern to match. Since these files all end with .dcm, that’s not so useful. Let’s give a pattern to just match image1.dcm:

dicom_files = list(get_files(base,pattern="image1*"))
DEBUG Found 1 contender files in dicom-cookies
DEBUG Checking 1 dicom files for validation.
Found 1 valid dicom files
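To get a feel for how a shell-style pattern such as "image1*" selects file names, here is a small sketch using Python’s standard fnmatch module. This only illustrates wildcard matching in general; whether get_files matches patterns in exactly this way is an assumption, and `filter_by_pattern` is a hypothetical helper written for this example.

```python
from fnmatch import fnmatch


def filter_by_pattern(filenames, pattern):
    """Keep only the names that match a shell-style wildcard pattern."""
    return [name for name in filenames if fnmatch(name, pattern)]


# The seven file names from the dicom-cookies folder shown earlier
names = ["image%d.dcm" % number for number in range(1, 8)]
print(filter_by_pattern(names, "image1*"))  # ['image1.dcm']
```

Note that with this kind of matching, "image1*" would also select a file named image10.dcm if one were present, so a stricter pattern such as "image1.dcm" is safer when you want exactly one file.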

At this point, you should have a list of DICOM files. You might now want to configure your de-identification.