Data
To run these examples, you’ll need to install external deid-data.
$ pip install deid-data
Loading
While they are different file organizations for dicom, we are going to take a simple
approach of assuming some top level directory with some number of files within
(yes, including subdirectories). For example, if you retrieved your data using a
tool like dcmqr with a
C-MOVE
, then you might have a flat directory structure. Sometimes the
files won’t have an extension (for example, being named by a SOPInstanceUID
.
tree deid/data/dicom-cookies/
deid/data/dicom-cookies/
├── image1.dcm
├── image2.dcm
├── image3.dcm
├── image4.dcm
├── image5.dcm
├── image6.dcm
└── image7.dcm
It doesn’t actually matter so much how your data is structured,
you can use any method that you like to. You could technically
just use os.listdir
or glob
:
from glob import glob
import os
base = "deid/data/dicom-cookies"
dicom_files = glob("%s/*" %base)
['deid/data/cookie-series/image4.dcm',
'deid/data/cookie-series/image2.dcm',
'deid/data/cookie-series/image7.dcm',
'deid/data/cookie-series/image6.dcm',
'deid/data/cookie-series/image3.dcm',
'deid/data/cookie-series/image1.dcm',
'deid/data/cookie-series/image5.dcm']
os.listdir(base)
['image4.dcm',
'image2.dcm',
'image7.dcm',
'image6.dcm',
'image3.dcm',
'image1.dcm',
'image5.dcm']
Notice anything that might trigger a bug with the above? You probably should ask for an absolute path.
# For glob
dicom_files = glob("%s/*" %base)
dicom_files = [os.path.abspath(x) for x in dicom_files]
# For os module
dicom_files = []
for root, folders, files in os.walk(base):
for file in files:
fullpath = os.path.abspath(os.path.join(root,file))
dicom_files.append(fullpath)
We provide a few more robust functions to find datasets, because it’s usually the case that you want to match a pattern of file, have subfolders, or want a validation done to be sure that each file is dicom.
Find Datasets
The function that we have provided will find all datasets matching some pattern (or all files recursively in a folder). You simply need to provide a list of top folders, a list of files and folders, or just files to start. For the purposes of this walkthrough, we will load data folders that are provided with the application.
from deid.data import get_dataset
base = get_dataset("dicom-cookies")
base
'/home/vanessa/anaconda3/lib/python3.5/site-packages/som-0.1.1-py3.5.egg/som/data/dicom-cookies'
In the above, all we’ve done it retrieved the full path for a folder of dicom files. Let’s try to read in the data:
from deid.dicom import get_files
dicom_files = list(get_files(base))
DEBUG Found 7 contender files in dicom-cookies
DEBUG Checking 7 dicom files for validation.
Found 7 valid dicom files
We can also specify to not do the check, if we are absolutely sure. For larger datasets this might speed up processing a little bit.
dicom_files = list(get_files(base,check=False))
DEBUG Found 7 contender files in dicom-cookies
We can also give it a particular pattern to match. Since these files all end with
.dcm
, that’s not so useful. Let’s give a pattern to just match image1.dcm
:
dicom_files = list(get_files(base,pattern="image1*"))
DEBUG Found 1 contender files in dicom-cookies
DEBUG Checking 1 dicom files for validation.
Found 1 valid dicom files
At this point, you should have a list of dicom files. You might now want to configure your deidentifation.