pydicom quick start#
Welcome to the quick start guide for pydicom.
pydicom is an MIT licensed, open source Python library for creating, reading, modifying and writing DICOM Data Sets (datasets for short) and File-sets. It can also convert the imaging and waveform data in certain dataset types to a NumPy ndarray (and back again), as long as suitable optional packages are installed.
What is a DICOM dataset?#
A DICOM dataset represents an instance of a real world object, such as a single image slice from a CT scan acquisition. Each dataset is made up of a collection of Data Elements, with each Data Element representing an attribute of the object. A Data Element is itself made of a unique identifier called the Element Tag, has a format specifier called the Value Representation and contains the Value of the attribute. The DICOM Standard groups Data Elements that describe related attributes into modules.
In Part 3, the DICOM Standard defines the many different types of dataset using something called an Information Object Definition (IOD). Each IOD contains a table of optional (U) and mandatory (M) modules that a dataset must have in order to meet that definition. This means you can use the IOD that corresponds to a given dataset to determine which Data Elements it should contain.
As an example, the CT Image IOD contains this table with the modules that are required for a dataset to be considered a valid CT Image instance. This includes the Patient module, which contains patient demographic information. If we look at the Patient module itself, we see that it contains attributes for the Patient’s Name, Patient ID and Patient’s Birth Date, all of which are considered Type 2.
Type 2 attributes must be present, but may have an empty value, so in any given CT Image dataset we should be able to find three Data Elements corresponding to those attributes, albeit with no guarantee they’ll have a useful value.
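As a rough illustration of the Type 2 rule, the sketch below models a dataset as a plain Python dict keyed by (group, element) tag. The dict layout and the missing_type2() helper are hypothetical and are not part of pydicom; the three tags are those of the Patient module attributes named above.

```python
# Toy model: a dataset as a dict mapping (group, element) -> value.
# This is an illustration only, not how pydicom stores elements.
PATIENT_TYPE_2 = {
    (0x0010, 0x0010): "Patient's Name",
    (0x0010, 0x0020): "Patient ID",
    (0x0010, 0x0030): "Patient's Birth Date",
}


def missing_type2(dataset: dict) -> list:
    """Return the names of Type 2 attributes that are absent.

    An empty value (None or '') is acceptable for Type 2; only a
    missing element is a problem.
    """
    return [name for tag, name in PATIENT_TYPE_2.items() if tag not in dataset]


toy = {
    (0x0010, 0x0010): "Citizen^Jan",
    (0x0010, 0x0020): "",  # present but empty: still fine for Type 2
}
print(missing_type2(toy))  # ["Patient's Birth Date"]
```

Note that the empty Patient ID passes the check, while the absent Patient's Birth Date does not.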
Reading a dataset#
Note
We’re going to be using example DICOM datasets that are included with pydicom, such as CT_small.dcm. You can get the local file path to these datasets by using the get_path() function to return the path as a pathlib.Path (your path may vary):
>>> from pydicom import examples
>>> path = examples.get_path("ct")
>>> path
PosixPath('/path/to/pydicom/data/test_files/CT_small.dcm')
When using pydicom to read your own data, use the path to those files directly instead.
To read the DICOM dataset at a given file path (as a str or pathlib.Path) we use dcmread(), which returns a FileDataset instance:
>>> from pydicom import dcmread, examples
>>> path = examples.get_path("ct")
>>> ds = dcmread(path)
dcmread() can also handle file-likes:
>>> with open(path, 'rb') as f:
...     ds = dcmread(f)
And can be used as a context manager:
>>> with dcmread(path) as ds:
...     type(ds)
...
<class 'pydicom.dataset.FileDataset'>
By default, dcmread() will read any DICOM dataset stored in accordance with the DICOM File Format. However, you may occasionally read a file that gives you the following exception:
>>> no_meta_path = examples.get_path('no_meta')
>>> ds = dcmread(no_meta_path)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../pydicom/filereader.py", line 887, in dcmread
force=force, specific_tags=specific_tags)
File ".../pydicom/filereader.py", line 678, in read_partial
preamble = read_preamble(fileobj, force)
File ".../pydicom/filereader.py", line 631, in read_preamble
raise InvalidDicomError("File is missing DICOM File Meta Information "
pydicom.errors.InvalidDicomError: File is missing DICOM File Meta Information
header or the 'DICM' prefix is missing from the header. Use force=True to
force reading.
This indicates that either:
The file isn’t a DICOM file, or
The file contains DICOM data but isn’t in the DICOM File Format
If you’re sure the file contains DICOM data, you can use the force keyword parameter to force reading:
>>> ds = dcmread(no_meta_path, force=True)
A note of caution about using force=True: because pydicom uses a deferred-read system, no exceptions will be raised at the time of reading, no matter what the contents of the file are:
>>> with open('not_dicom.txt', 'w') as not_dicom:
...     not_dicom.write('This is not a DICOM file!')
...
>>> ds = dcmread('not_dicom.txt', force=True)
You’ll only run into problems when trying to use the dataset:
>>> print(ds)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "../pydicom/dataset.py", line 1703, in __str__
return self._pretty_str()
File "../pydicom/dataset.py", line 1436, in _pretty_str
for data_element in self:
File "../pydicom/dataset.py", line 1079, in __iter__
yield self[tag]
File "../pydicom/dataset.py", line 833, in __getitem__
self[tag] = DataElement_from_raw(data_elem, character_set)
File "../pydicom/dataelem.py", line 581, in DataElement_from_raw
raise KeyError(msg)
KeyError: "Unknown DICOM tag (6854,7369) can't look up VR"
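Before reaching for force=True, it can help to check whether the file has the 'DICM' prefix that the exception message refers to. The following is a minimal sketch using only the standard library; has_dicm_prefix() is a hypothetical helper, not a pydicom function, and the in-memory streams stand in for real files.

```python
from io import BytesIO


def has_dicm_prefix(fileobj) -> bool:
    """Return True if bytes 128-131 of a binary stream are b'DICM'."""
    fileobj.seek(128)  # skip the 128 byte preamble
    return fileobj.read(4) == b"DICM"


# In-memory stand-ins for files opened with open(path, 'rb')
file_format = BytesIO(b"\x00" * 128 + b"DICM" + b"\x02\x00\x00\x00")
plain_text = BytesIO(b"This is not a DICOM file!")

print(has_dicm_prefix(file_format))  # True
print(has_dicm_prefix(plain_text))   # False
```

A missing prefix doesn't prove the content isn't DICOM, since a dataset may have been written without the File Format header, so treat this as a sanity check rather than a validator.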
Viewing and accessing#
The CT_small.dcm dataset is also included as an example FileDataset:
>>> from pydicom import examples
>>> ds = examples.ct
>>> type(ds)
<class 'pydicom.dataset.FileDataset'>
>>> ds.filename
'/path/to/pydicom/data/test_files/CT_small.dcm'
You can view the contents of the entire dataset by using print():
>>> print(ds)
Dataset.file_meta -------------------------------
(0002,0000) File Meta Information Group Length UL: 192
(0002,0001) File Meta Information Version OB: b'\x00\x01'
(0002,0002) Media Storage SOP Class UID UI: CT Image Storage
(0002,0003) Media Storage SOP Instance UID UI: 1.3.6.1.4.1.5962.1.1.1.1.1.20040119072730.12322
(0002,0010) Transfer Syntax UID UI: Explicit VR Little Endian
(0002,0012) Implementation Class UID UI: 1.3.6.1.4.1.5962.2
(0002,0013) Implementation Version Name SH: 'DCTOOL100'
(0002,0016) Source Application Entity Title AE: 'CLUNIE1'
-------------------------------------------------
(0008,0005) Specific Character Set CS: 'ISO_IR 100'
(0008,0008) Image Type CS: ['ORIGINAL', 'PRIMARY', 'AXIAL']
(0008,0012) Instance Creation Date DA: '20040119'
(0008,0013) Instance Creation Time TM: '072731'
(0008,0014) Instance Creator UID UI: 1.3.6.1.4.1.5962.3
(0008,0016) SOP Class UID UI: CT Image Storage
...
(0010,1002) Other Patient IDs Sequence 2 item(s) ----
(0010,0020) Patient ID LO: 'ABCD1234'
(0010,0022) Type of Patient ID CS: 'TEXT'
---------
(0010,0020) Patient ID LO: '1234ABCD'
(0010,0022) Type of Patient ID CS: 'TEXT'
---------
...
(0043,0010) Private Creator LO: 'GEMS_PARM_01'
(0043,1010) [Window value] US: 400
...
(7FE0,0010) Pixel Data OW: Array of 32768 elements
(FFFC,FFFC) Data Set Trailing Padding OB: Array of 126 elements
The print output shows a list of the Data Elements (or elements for short) present in the dataset, one element per line. The format of each line is:
(0008,0005): the element’s tag, as (group number, element number) in hexadecimal
Specific Character Set: the element’s name, if known
CS: the element’s Value Representation (VR), if known
'ISO_IR 100': the element’s stored value, or the length of the value if it’s too long to show concisely
Elements#
There are three categories of elements:
Standard elements such as (0008,0016) SOP Class UID. These elements are registered in Part 6 of the official DICOM Standard, have a tag with an even group number and are unique at each level of the dataset.
Repeating group elements such as (60xx,3000) Overlay Data (not found in this dataset). Repeating group elements are also registered in the official DICOM Standard, however they have a tag with a group number defined over a range rather than a fixed value. For example, there may be multiple Overlay Data elements at a given level of the dataset as long as each has its own unique group number: 0x6000, 0x6002, 0x6004, or any even value up to 0x601E.
Private elements such as (0043,1010) [Window value]. Private elements have a tag with an odd group number, aren’t registered in the official DICOM Standard, and are instead created privately, as specified by the (gggg,0010-00FF) Private Creator element.
If the private creator is unknown to pydicom then the element name will be Private tag data and the VR UN.
If the private creator is known then the element name will be surrounded by square brackets, e.g. [Window value] and the VR will be shown.
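The three categories can be told apart, roughly, from the tag numbers alone. The sketch below is a simplified illustration based only on the rules described above; classify_tag() is a hypothetical helper, it doesn't consult the DICOM data dictionary, and it only knows about the 60xx Overlay range.

```python
def classify_tag(group: int, element: int) -> str:
    """Roughly classify a tag using only its group and element numbers."""
    if group % 2 == 1:
        # Odd group number: private. Elements 0x0010-0x00FF in a private
        # group hold the Private Creator identifiers.
        if 0x0010 <= element <= 0x00FF:
            return "private creator"
        return "private"
    if 0x6000 <= group <= 0x601E:
        # Even group numbers in this range belong to the Overlay repeating group
        return "repeating group"
    return "standard"


print(classify_tag(0x0008, 0x0016))  # standard
print(classify_tag(0x6002, 0x3000))  # repeating group
print(classify_tag(0x0043, 0x1010))  # private
print(classify_tag(0x0043, 0x0010))  # private creator
```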
For all element categories, we can access a particular element in the dataset through its tag, which returns a DataElement instance:
>>> elem = ds[0x0008, 0x0016]
>>> elem
(0008,0016) SOP Class UID UI: CT Image Storage
>>> elem.tag
(0008,0016)
>>> elem.keyword
'SOPClassUID'
>>> private_elem = ds[0x0043, 0x1010]
>>> private_elem
(0043,1010) [Window value] US: 400
>>> private_elem.keyword
''
We can also access standard elements through their keyword. The keyword is usually the same as the element’s name without any spaces, but there are exceptions - such as (0010,0010) Patient’s Name having a keyword of PatientName. A list of keywords for all standard elements can be found here.
>>> elem = ds['SOPClassUID']
>>> elem
(0008,0016) SOP Class UID UI: CT Image Storage
Because of the lack of a unique keyword, this won’t work for private or repeating group elements, so for those elements stick to the Dataset[group number, element number] method.
In most cases, the important thing about an element is its value:
>>> elem.value
'1.2.840.10008.5.1.4.1.1.2'
For standard elements, you can use the Python dot notation with the keyword to get the value:
>>> ds.SOPClassUID
'1.2.840.10008.5.1.4.1.1.2'
This is the recommended method of accessing the value of standard elements. It’s simpler and more human-friendly than dealing with element tags, and later on you’ll see how you can use the keyword to do far more than just access the value.
Elements may also be multi-valued - that is, have a Value Multiplicity (VM) > 1:
>>> ds.ImageType
['ORIGINAL', 'PRIMARY', 'AXIAL']
>>> ds['ImageType'].VM
3
The items for multi-valued elements can be accessed using the standard Python list methods:
>>> ds.ImageType[1]
'PRIMARY'
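For background, the DICOM encoding stores the values of a multi-valued text element as a single string, with a backslash separating each value; pydicom does the splitting for you. The sketch below only illustrates that convention with a hypothetical split_values() helper and is not pydicom's implementation.

```python
def split_values(encoded: str) -> list:
    """Split an encoded multi-valued text value on the backslash delimiter."""
    return encoded.split("\\")


raw = "ORIGINAL\\PRIMARY\\AXIAL"  # the three Image Type values, as encoded
values = split_values(raw)
print(values)       # ['ORIGINAL', 'PRIMARY', 'AXIAL']
print(len(values))  # 3, the Value Multiplicity
```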
Sequences#
When viewing a dataset, you may see that some of the elements are indented:
>>> print(ds)
...
(0010,1002) Other Patient IDs Sequence 2 item(s) ----
(0010,0020) Patient ID LO: 'ABCD1234'
(0010,0022) Type of Patient ID CS: 'TEXT'
---------
(0010,0020) Patient ID LO: '1234ABCD'
(0010,0022) Type of Patient ID CS: 'TEXT'
---------
...
This indicates that those elements are part of a sequence, in this case part of the Other Patient IDs Sequence element. Sequence elements have a VR of SQ and have a name that ends in the word Sequence. DICOM datasets use the tree data structure, with non-sequence elements acting as leaves and sequence elements acting as the nodes where branches start.
The top-level (root) dataset contains 0 or more elements:
An element may be non-sequence type; its VR is not SQ (leaf), or
An element may be a sequence type; its VR is SQ and it contains 0 or more items (branches):
Each item in the sequence is another dataset, containing 0 or more elements:
An element may be non-sequence type, or
An element may be a sequence type, and so on…
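The recursive structure described above can be sketched with nested dicts and lists. This is a toy model under stated assumptions: a dict stands in for a dataset, a list of dicts stands in for a sequence, and count_elements() is a hypothetical helper rather than anything pydicom provides.

```python
def count_elements(dataset: dict) -> int:
    """Count every element in a nested toy dataset, including those in sequence items."""
    total = 0
    for value in dataset.values():
        total += 1  # the element itself, sequence or not
        if isinstance(value, list):  # toy stand-in for a sequence element
            for item in value:  # each item is another (toy) dataset
                total += count_elements(item)
    return total


toy = {
    "PatientName": "Citizen^Jan",
    "OtherPatientIDsSequence": [
        {"PatientID": "ABCD1234", "TypeOfPatientID": "TEXT"},
        {"PatientID": "1234ABCD", "TypeOfPatientID": "TEXT"},
    ],
}
print(count_elements(toy))  # 6: two at the root plus two in each item
```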
Sequence elements can be accessed in the same manner as non-sequence ones:
>>> elem = ds[0x0010, 0x1002]
>>> elem = ds['OtherPatientIDsSequence']
The main difference between sequence and non-sequence elements is that the value of a sequence element is a list-like object containing zero or more Dataset instances, which can be accessed using the standard Python list methods:
>>> len(ds.OtherPatientIDsSequence)
2
>>> type(ds.OtherPatientIDsSequence[0])
<class 'pydicom.dataset.Dataset'>
>>> ds.OtherPatientIDsSequence[0]
(0010,0020) Patient ID LO: 'ABCD1234'
(0010,0022) Type of Patient ID CS: 'TEXT'
>>> ds.OtherPatientIDsSequence[1]
(0010,0020) Patient ID LO: '1234ABCD'
(0010,0022) Type of Patient ID CS: 'TEXT'
Dataset.file_meta#
Earlier we saw that by default dcmread() only reads files that are in the DICOM File Format. So what’s the difference between a DICOM dataset written to file and one written in the DICOM File Format?
The answer is a file header containing:
A 128 byte preamble:
>>> ds.preamble
b'II*\x00T\x18\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...
Followed by a 4 byte DICM prefix
Followed by the required DICOM File Meta Information elements, which in pydicom are stored in a FileMetaDataset instance in the file_meta attribute:
>>> ds.file_meta
(0002,0000) File Meta Information Group Length UL: 192
(0002,0001) File Meta Information Version OB: b'\x00\x01'
(0002,0002) Media Storage SOP Class UID UI: CT Image Storage
(0002,0003) Media Storage SOP Instance UID UI: 1.3.6.1.4.1.5962.1.1.1.1.1.20040119072730.12322
(0002,0010) Transfer Syntax UID UI: Explicit VR Little Endian
(0002,0012) Implementation Class UID UI: 1.3.6.1.4.1.5962.2
(0002,0013) Implementation Version Name SH: 'DCTOOL100'
(0002,0016) Source Application Entity Title AE: 'CLUNIE1'
As you can see, all the elements in the file_meta have tags with a group number of 0x0002. In fact, the DICOM File Format header is the only place you should find group 0x0002 elements, as their presence anywhere else is non-conformant.
Out of all of the elements in the file_meta, the most important is (0002,0010) Transfer Syntax UID, as the transfer syntax defines the way the entire dataset (including the pixel data) has been encoded. Chances are that at some point you’ll need to know it:
>>> ds.file_meta.TransferSyntaxUID
'1.2.840.10008.1.2.1'
>>> ds.file_meta.TransferSyntaxUID.name
'Explicit VR Little Endian'
>>> ds.file_meta.TransferSyntaxUID.keyword
'ExplicitVRLittleEndian'
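To make "defines the way the entire dataset has been encoded" more concrete, here is a small lookup table for three well-known uncompressed transfer syntaxes. The table and the describe() helper are illustrative only and cover a tiny fraction of the registered UIDs; they are not how pydicom represents transfer syntaxes.

```python
# UID -> (name, uses explicit VR?, little endian byte order?)
TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": ("Implicit VR Little Endian", False, True),
    "1.2.840.10008.1.2.1": ("Explicit VR Little Endian", True, True),
    "1.2.840.10008.1.2.2": ("Explicit VR Big Endian", True, False),
}


def describe(uid: str) -> str:
    """Summarize how a dataset with this Transfer Syntax UID is encoded."""
    try:
        name, explicit_vr, little_endian = TRANSFER_SYNTAXES[uid]
    except KeyError:
        return f"{uid}: not in this toy table"
    vr = "explicit VR" if explicit_vr else "implicit VR"
    order = "little endian" if little_endian else "big endian"
    return f"{name}: {vr}, {order}"


print(describe("1.2.840.10008.1.2.1"))
# Explicit VR Little Endian: explicit VR, little endian
```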
Modifying a dataset#
Modifying elements#
We can modify the value of any element by retrieving it and setting the value:
>>> elem = ds[0x0010, 0x0010]
>>> elem.value
'CompressedSamples^CT1'
>>> elem.value = 'Citizen^Jan'
>>> elem
(0010,0010) Patient's Name PN: 'Citizen^Jan'
This raises the question: what kind of value should be used to set an element’s value?
In the above example we used a str to set the Patient’s Name, but what about other elements? Should they all be strings too? (Hint: no.)
The allowed object type to use for an element’s value depends on its Value Representation. We can see from the above that Patient’s Name has a VR of PN. By checking the Element VR and Python types guide, we see that elements with a VR of PN can be set using:
None, str or PersonName if the Value Multiplicity (VM) is 1, or
list[str] or list[PersonName] for VM > 1.
Each standard element also has restrictions on its allowed VM, given in Part 6. For Patient’s Name the VM must always be 1, so the allowed types are None, str or PersonName. If instead we look up (0018,106C) Synchronization Channel, we see the VR is US and the allowed VM is 2, so using the Element VR and Python types guide, we see the only type that may be used is list[int].
For standard elements it’s simpler to use the keyword to set the value:
>>> ds.PatientName = 'Citizen^Snips'
>>> elem
(0010,0010) Patient's Name PN: 'Citizen^Snips'
Multi-valued elements can be set using a list or modified using the list methods:
>>> ds.ImageType = ['ORIGINAL', 'PRIMARY', 'LOCALIZER'] # VR 'CS'
>>> ds.ImageType
['ORIGINAL', 'PRIMARY', 'LOCALIZER']
>>> ds.ImageType[1] = 'DERIVED'
>>> ds.ImageType
['ORIGINAL', 'DERIVED', 'LOCALIZER']
>>> ds.ImageType.insert(1, 'PRIMARY')
>>> ds.ImageType
['ORIGINAL', 'PRIMARY', 'DERIVED', 'LOCALIZER']
Similarly, for sequence elements:
>>> from pydicom.dataset import Dataset
>>> ds.OtherPatientIDsSequence = [Dataset(), Dataset()] # VR 'SQ'
>>> ds.OtherPatientIDsSequence.append(Dataset())
>>> len(ds.OtherPatientIDsSequence)
3
The items in a sequence are always Dataset instances; if you try to add any other type to a sequence you’ll get an exception:
>>> ds.OtherPatientIDsSequence.append('Hello world?')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../pydicom/multival.py", line 63, in append
self._list.append(self.type_constructor(val))
File ".../pydicom/sequence.py", line 15, in validate_dataset
raise TypeError('Sequence contents must be Dataset instances.')
TypeError: Sequence contents must be Dataset instances.
You can set any element value as empty by using None (sequence elements will automatically be converted to an empty list when you do so):
>>> ds.PatientName = None
>>> elem
(0010,0010) Patient's Name PN: None
>>> ds.OtherPatientIDsSequence = None
>>> len(ds.OtherPatientIDsSequence)
0
Elements with a value of None, b'', '' or [] will still be written to file, but will have an empty value and zero length.
Adding elements#
Standard and repeating group elements#
New elements of any category can be added to the dataset with the add_new() method, which takes the tag, VR and value to use for the new element.
Let’s say we wanted to add the (0028,1050) Window Center standard element. We already know the tag is (0028,1050), but how do we get the VR and how do we know the Python type to use for the value?
There are two ways to get an element’s VR:
You can use Part 6 of the DICOM Standard and search for the element
Alternatively, you can use the dictionary_VR() function to look it up
>>> from pydicom.datadict import dictionary_VR
>>> dictionary_VR([0x0028, 0x1050])
'DS'
As we saw earlier, you can use the Element VR and Python types guide to find the Python type to use for a given VR.
For DS with a VM of 1-n, we can use a str, int or float, or a list of those types. So to add the new element:
>>> ds.add_new([0x0028, 0x1050], 'DS', "100.0")
>>> elem = ds[0x0028, 0x1050]
>>> elem
(0028,1050) Window Center DS: "100.0"
Some VRs, like DS, require the value be formatted correctly. For example, elements with a VR of DA should use the YYYYMMDD format and only allow ASCII characters 0 to 9 (unless used for query matching). The full list of VRs and their formatting requirements can be found in Section 6.2 of Part 5 of the DICOM Standard.
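As an example of such a formatting check, the sketch below tests whether a string is a valid DA value in the YYYYMMDD form described above. is_valid_da() is a hypothetical helper written with the standard library only; it ignores the query-matching forms and is not pydicom's own validation.

```python
import re
from datetime import datetime


def is_valid_da(value: str) -> bool:
    """Return True if value is an 8 digit YYYYMMDD calendar date."""
    if not re.fullmatch(r"[0-9]{8}", value):
        return False  # wrong length or characters other than 0-9
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False  # eight digits, but not a real calendar date
    return True


print(is_valid_da("20040119"))    # True
print(is_valid_da("2004-01-19"))  # False: separators are not allowed
print(is_valid_da("20040230"))    # False: there is no 30th of February
```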
Alternative for standard elements#
Adding elements with add_new() is a lot of work, so for standard elements you can just use the keyword and pydicom will do the VR lookup for you:
>>> 'WindowWidth' in ds
False
>>> ds.WindowWidth = 500
>>> ds['WindowWidth']
(0028,1051) Window Width DS: "500.0"
Notice how we can also use the element keyword with the Python in operator to see if a standard element is in the dataset? This also works with element tags, so private and repeating group elements are also covered:
>>> [0x0043, 0x1010] in ds
True
Sequences#
Because sequence items are also Dataset instances, you can use the same methods on them as well.
>>> seq = ds.OtherPatientIDsSequence
>>> seq += [Dataset(), Dataset(), Dataset()]
>>> seq[0].PatientID = 'Citizen^Jan'
>>> seq[0].TypeOfPatientID = 'TEXT'
>>> seq[1].PatientID = 'CompressedSamples^CT1'
>>> seq[1].TypeOfPatientID = 'TEXT'
>>> seq[0]
(0010,0020) Patient ID LO: 'Citizen^Jan'
(0010,0022) Type of Patient ID CS: 'TEXT'
>>> seq[1]
(0010,0020) Patient ID LO: 'CompressedSamples^CT1'
(0010,0022) Type of Patient ID CS: 'TEXT'
Private elements#
When adding private elements, the DICOM Standard requires a (gggg,0010-00FF) Private Creator element also be added to identify and reserve the gggg section of private tags. pydicom provides the add_new_private() convenience method to help manage this:
>>> ds.add_new_private("Private Creator Name", 0x000B, 0x01, "my value", "SH")
>>> ds
...
(000B,0010) Private Creator LO: 'Private Creator Name'
(000B,1001) Private tag data SH: 'my value'
...
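The relationship between the Private Creator at (000B,0010) and the new element at (000B,1001) follows from how private blocks are reserved: a creator at element 0x00XX reserves elements 0xXX00 to 0xXXFF in the same group. The sketch below shows that arithmetic with a hypothetical private_tag() helper; it is not the pydicom implementation.

```python
def private_tag(group: int, creator_element: int, offset: int) -> tuple:
    """Return the (group, element) tag reserved by a Private Creator.

    creator_element is the element number of the Private Creator
    (0x0010-0x00FF) and offset is the position within its block (0x00-0xFF).
    """
    if group % 2 == 0:
        raise ValueError("private groups must have an odd group number")
    if not 0x0010 <= creator_element <= 0x00FF:
        raise ValueError("Private Creator elements are 0x0010 to 0x00FF")
    if not 0x00 <= offset <= 0xFF:
        raise ValueError("the offset must be 0x00 to 0xFF")
    # The creator's element number becomes the high byte of the new element
    return group, (creator_element << 8) | offset


group, element = private_tag(0x000B, 0x0010, 0x01)
print(f"({group:04X},{element:04X})")  # (000B,1001)
```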
Deleting elements#
All elements can be deleted with the del operator in combination with the element tag:
>>> del ds[0x0043, 0x1010]
>>> [0x0043, 0x1010] in ds
False
For standard elements you can use the keyword instead:
>>> del ds.WindowCenter
>>> 'WindowCenter' in ds
False
And you can remove items from sequences and multi-valued elements using your preferred list method:
>>> del ds.OtherPatientIDsSequence[2]
>>> len(seq)
2
>>> ds.ImageType
['ORIGINAL', 'PRIMARY', 'DERIVED', 'LOCALIZER']
>>> del ds.ImageType[2]
>>> ds.ImageType
['ORIGINAL', 'PRIMARY', 'LOCALIZER']
Writing a dataset#
After changing the dataset, the final step is to write the modifications back to file. This can be done by using save_as() to write the dataset to the supplied path:
>>> ds.save_as('out.dcm')
You can also write to any Python file-like:
>>> with open('out.dcm', 'wb') as f:
...     ds.save_as(f)
...
>>> from io import BytesIO
>>> out = BytesIO()
>>> ds.save_as(out)
By default, save_as() will write the dataset as-is. This means that even if your dataset is not conformant to the DICOM File Format it will still be written exactly as given. To be certain you’re writing the dataset in the DICOM File Format you can use the enforce_file_format keyword parameter:
>>> ds.save_as('out.dcm', enforce_file_format=True)
This will attempt to automatically add in any missing required group 0x0002 File Meta Information elements and set a blank 128 byte preamble (if required). If it’s unable to do so then an exception will be raised:
>>> del ds.file_meta
>>> ds.save_as('out.dcm', enforce_file_format=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../pydicom/dataset.py", line 2452, in save_as
pydicom.dcmwrite(
File ".../pydicom/filewriter.py", line 1311, in dcmwrite
validate_file_meta(file_meta, enforce_standard=True)
File ".../pydicom/dataset.py", line 3204, in validate_file_meta
raise AttributeError(
AttributeError: Required File Meta Information elements are either missing
or have an empty value: (0002,0010) Transfer Syntax UID
The exception message lists the required element(s) that need to be added; usually this will only be the Transfer Syntax UID. It’s an important element, so get in the habit of making sure it’s there and correct.
Because we deleted the file_meta dataset we need to add it back:
>>> from pydicom.dataset import FileMetaDataset
>>> ds.file_meta = FileMetaDataset()
And now we can add our Transfer Syntax UID element and save to file:
>>> ds.file_meta.TransferSyntaxUID = '1.2.840.10008.1.2.1'
>>> ds.save_as('out.dcm', enforce_file_format=True)
Accessing Pixel Data#
See also
We have a separate and more in-depth pixel data tutorial that also covers creation of new pixel data and compressing and decompressing existing pixel data.
Many DICOM datasets have image and image-like data available in the (7FE0,0010) Pixel Data element. If present, the data will either be available as uncompressed raw binary data or as an encapsulated and compressed image codestream, depending on the Transfer Syntax UID:
>>> ds = examples.ct
>>> ds.file_meta.TransferSyntaxUID.is_compressed # raw binary
False
>>> ds = examples.jpeg2k
>>> ds.file_meta.TransferSyntaxUID.is_compressed # encapsulated codestreams
True
As bytes#
For datasets with an uncompressed Transfer Syntax UID, accessing the image data is simply a matter of accessing the Pixel Data value, which will return all frames concatenated together as bytes:
>>> ds = examples.ct
>>> pixel_data = ds.PixelData
>>> type(pixel_data)
<class 'bytes'>
>>> len(pixel_data)
32768
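For uncompressed data, the length in bytes can be predicted from the image dimensions. The sketch below shows the arithmetic with a hypothetical expected_length() helper. It assumes a whole number of bytes per sample and ignores special cases such as 1-bit data, chroma subsampling and padding to an even length. The 128 x 128, 16 bits allocated figures used here are an assumption about the example CT dataset, chosen because they are consistent with the 32768 bytes shown above.

```python
def expected_length(rows, columns, samples_per_pixel=1,
                    bits_allocated=16, number_of_frames=1):
    """Return the expected uncompressed Pixel Data length in bytes."""
    bytes_per_sample = bits_allocated // 8  # assumes 8, 16, 32, ... bits
    return rows * columns * samples_per_pixel * bytes_per_sample * number_of_frames


# Assumed values: single frame, 128 x 128, one 16-bit sample per pixel
print(expected_length(128, 128))  # 32768
```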
For datasets with a compressed Transfer Syntax, each frame of image data will have been encapsulated, which must be reversed. In pydicom this can be done with the get_frame() function or the generate_frames() iterator to return or yield a frame of compressed image data as bytes:
>>> from pydicom.encaps import get_frame
>>> ds = examples.jpeg2k
>>> len(ds.PixelData)
152326
>>> nr_frames = ds.get("NumberOfFrames", 1) # Number Of Frames may not be present
>>> frame = get_frame(ds.PixelData, 0, number_of_frames=nr_frames)
>>> len(frame)
152294
As a NumPy ndarray#
Note
Converting uncompressed Pixel Data to an ndarray requires installing NumPy, and converting compressed Pixel Data may require installing other packages. See this page for a list of supported transfer syntaxes and the packages required to decompress them.
There are three main methods for converting Pixel Data to an ndarray, depending on your use case:
Using the Dataset.pixel_array property, in conjunction with Dataset.pixel_array_options().
Using the pixel_array() or iter_pixels() functions.
Using the Decoder.as_array() or Decoder.iter_array() instance methods.
With Dataset.pixel_array#
The most convenient way to return the entire pixel data as an ndarray is with the Dataset.pixel_array property:
>>> from pydicom import examples
>>> ds = examples.ybr_color
>>> arr = ds.pixel_array
>>> arr.shape
(30, 240, 320, 3)
This will load the entire pixel data into memory and convert it to an ndarray. By default, it will also convert any pixel data in the YCbCr color space to RGB using the convert_color_space() function. Customization of the conversion process can be done through Dataset.pixel_array_options().
>>> ds.pixel_array_options(index=0) # Convert only the first frame
>>> arr = ds.pixel_array # still reads all frames into memory
>>> arr.shape
(240, 320, 3)
The main drawbacks of Dataset.pixel_array are:
It requires loading the entire Pixel Data into memory.
The ndarray is kept as an attribute of the dataset, taking up memory when it might not need to do so.
The returned ndarray lacks any descriptive metadata.
The conversion can only be customized using a second function.
With pixel_array() and iter_pixels()#
The pixel_array() and iter_pixels() functions convert the pixel data to an ndarray directly:
>>> from pydicom.pixels import pixel_array
>>> arr = pixel_array(ds, index=0) # reads all frames into memory
>>> arr.shape
(240, 320, 3)
If you’re concerned about memory usage, both functions can be used with the path to the dataset instead. This will reduce the amount of Pixel Data read into memory to the minimum required:
>>> path = examples.get_path("ybr_color")
>>> arr = pixel_array(path, index=0) # reads only a single frame into memory
If you need the elements from the dataset’s Image Pixel module in order to perform any required image processing operations (such as rescale and windowing), you can pass an empty Dataset via the ds_out parameter, which will be populated by the group 0x0028 elements:
>>> from pydicom.dataset import Dataset
>>> ds = Dataset()
>>> arr = pixel_array(path, index=0, ds_out=ds)
>>> ds.Rows, ds.Columns
(240, 320)
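To show why those group 0x0028 elements matter, here is the arithmetic behind a linear rescale, which uses (0028,1053) Rescale Slope and (0028,1052) Rescale Intercept when a dataset has them. The rescale() helper and the sample values are hypothetical, and the sketch works on a plain list rather than an ndarray; it shows only the underlying formula, not pydicom's implementation.

```python
def rescale(stored_values, slope=1.0, intercept=0.0):
    """Apply output = slope * stored value + intercept to each value."""
    return [slope * value + intercept for value in stored_values]


# Made-up stored values with a slope of 1 and an intercept of -1024
print(rescale([0, 1024, 2048], slope=1.0, intercept=-1024.0))
# [-1024.0, 0.0, 1024.0]
```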
The main drawback of pixel_array() and iter_pixels() is that the returned ndarray lacks any descriptive metadata.
With Decoder.as_array() and Decoder.iter_array()#
Warning
Do not use the Decoder class directly; instead use the class instance returned by the get_decoder() function.
Finally, if you need metadata describing the returned ndarray, you can use the Decoder.as_array() and Decoder.iter_array() methods for the Decoder class instance returned by get_decoder():
>>> from pydicom.pixels import get_decoder
>>> ds = examples.ybr_color
>>> decoder = get_decoder(ds.file_meta.TransferSyntaxUID)
>>> arr, meta = decoder.as_array(ds, index=0)
>>> meta["photometric_interpretation"]
'RGB'
>>> meta["number_of_frames"]
1
The main drawback of using the Decoder methods in the manner shown above is that the entire Pixel Data will be read into memory. However, this doesn’t necessarily have to be the case; the pixel_array() and iter_pixels() functions use those same methods to perform memory-efficient decoding. If you need both image metadata and memory efficiency, take a look at the source code for those functions to see how you can implement this yourself.
Loading a File-set#
See also
We have a separate and more in-depth File-set tutorial that also covers creating and modifying File-sets.
A File-set is a collection of (usually) related datasets that have been written to file and share a common naming space. They’re identifiable by a DICOMDIR file located in their root directory, which is used to summarize the File-set’s contents. While the DICOMDIR file can be read using dcmread() like any other DICOM dataset, we recommend that you use the FileSet class to manage the DICOMDIR and related File-set instead.
Warning
The DICOMDIR dataset contains a series of records that are referenced to each other by their file offsets, which makes it very easy to ‘break’ a DICOMDIR dataset, even by changing something seemingly innocuous like the (0004,1130) File-set ID. This is why we recommend using FileSet to manage any changes.
>>> from pydicom import examples
>>> from pydicom.fileset import FileSet
>>> path = examples.get_path("dicomdir") # The path to the example File-set
>>> fs = FileSet(path)
A summary of the File-set’s contents is shown when printing:
>>> print(fs)
DICOM File-set
Root directory: .../pydicom/data/test_files/dicomdirtests
File-set ID: PYDICOM_TEST
File-set UID: 1.2.276.0.7230010.3.1.4.0.31906.1359940846.78187
Descriptor file ID: (no value available)
Descriptor file character set: (no value available)
Changes staged for write(): DICOMDIR update, directory structure update
Managed instances:
PATIENT: PatientID='77654033', PatientName='Doe^Archibald'
STUDY: StudyDate=20010101, StudyTime=000000, StudyDescription='XR C Spine Comp Min 4 Views'
SERIES: Modality=CR, SeriesNumber=1
IMAGE: 1 SOP Instance
SERIES: Modality=CR, SeriesNumber=2
IMAGE: 1 SOP Instance
SERIES: Modality=CR, SeriesNumber=3
IMAGE: 1 SOP Instance
STUDY: StudyDate=19950903, StudyTime=173032, StudyDescription='CT, HEAD/BRAIN WO CONTRAST'
SERIES: Modality=CT, SeriesNumber=2
IMAGE: 4 SOP Instances
PATIENT: PatientID='98890234', PatientName='Doe^Peter'
STUDY: StudyDate=20010101, StudyTime=000000
SERIES: Modality=CT, SeriesNumber=4
IMAGE: 2 SOP Instances
SERIES: Modality=CT, SeriesNumber=5
IMAGE: 5 SOP Instances
...
You can search the File-set with the find_values() method to return a list of element values found in the DICOMDIR’s records:
>>> fs.find_values("PatientID")
['77654033', '98890234']
The search can be expanded to the File-set’s managed instances (its datasets) by passing load=True, at the cost of a longer search time due to having to read and decode the corresponding files:
>>> fs.find_values("PhotometricInterpretation")
[]
>>> fs.find_values("PhotometricInterpretation", load=True)
['MONOCHROME1', 'MONOCHROME2']
The File-set can also be searched to find instances matching a query using the find() method, which returns a list of FileInstance objects. Each one can be read and decoded using FileInstance.load(), which returns it as a FileDataset:
>>> matches = fs.find(PatientID="77654033")
>>> len(matches)
7
>>> ds = matches[0].load()
>>> ds.PatientName
'Doe^Archibald'
find() also supports the use of the load parameter:
>>> len(fs.find(PhotometricInterpretation='MONOCHROME1'))
0
>>> len(fs.find(PhotometricInterpretation='MONOCHROME1', load=True))
3