pydicom quick start#

Welcome to the quick start guide for pydicom.

pydicom is an MIT licensed, open source Python library for creating, reading, modifying and writing DICOM Data Sets (datasets for short) and File-sets. It can also convert the imaging and waveform data in certain dataset types to a NumPy ndarray (and back again), as long as suitable optional packages are installed.

What is a DICOM dataset?#

A DICOM dataset represents an instance of a real world object, such as a single image slice from a CT scan acquisition. Each dataset is made up of a collection of Data Elements, with each Data Element representing an attribute of the object. A Data Element is itself made of a unique identifier called the Element Tag, has a format specifier called the Value Representation and contains the Value of the attribute. The DICOM Standard groups Data Elements that describe related attributes into modules.

In Part 3, the DICOM Standard defines the many different types of dataset using something called an Information Object Definition (IOD). Each IOD contains a table of optional (U) and mandatory (M) modules that a dataset must have in order to meet that definition. This means you can use the IOD that corresponds to a given dataset to determine which Data Elements it should contain.

As an example, the CT Image IOD contains this table with the modules that are required for a dataset to be considered a valid CT Image instance. This includes the Patient module, which contains patient demographic information. If we look at the Patient module itself, we see that it contains attributes for the Patient’s Name, Patient ID and Patient’s Birth Date, all of which are considered Type 2.

Type 2 attributes must be present, but may have an empty value, so in any given CT Image dataset we should be able to find three Data Elements corresponding to those attributes, albeit with no guarantee they’ll have a useful value.

Reading a dataset#

Note

We’re going to be using example DICOM datasets that are included with pydicom, such as CT_small.dcm. You can get the local file path to these datasets by using the get_path() function to return the path as a pathlib.Path (your path may vary):

>>> from pydicom import examples
>>> path = examples.get_path("ct")
>>> path
PosixPath('/path/to/pydicom/data/test_files/CT_small.dcm')

When using pydicom to read your own data, use the path to those files directly instead.

To read the DICOM dataset at a given file path (as a str or pathlib.Path) we use dcmread(), which returns a FileDataset instance:

>>> from pydicom import dcmread, examples
>>> path = examples.get_path("ct")
>>> ds = dcmread(path)

dcmread() can also handle file-likes:

>>> with open(path, 'rb') as f:
...     ds = dcmread(f)

And can be used as a context manager:

>>> with dcmread(path) as ds:
...    type(ds)
...
<class 'pydicom.dataset.FileDataset'>

By default, dcmread() will read any DICOM dataset stored in accordance with the DICOM File Format. However, you may occasionally read a file that gives you the following exception:

>>> no_meta_path = examples.get_path('no_meta')
>>> ds = dcmread(no_meta_path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../pydicom/filereader.py", line 887, in dcmread
    force=force, specific_tags=specific_tags)
  File ".../pydicom/filereader.py", line 678, in read_partial
    preamble = read_preamble(fileobj, force)
  File ".../pydicom/filereader.py", line 631, in read_preamble
    raise InvalidDicomError("File is missing DICOM File Meta Information "
  pydicom.errors.InvalidDicomError: File is missing DICOM File Meta Information
  header or the 'DICM' prefix is missing from the header. Use force=True to
  force reading.

This indicates that either:

  • The file isn’t a DICOM file, or

  • The file contains DICOM data but isn’t in the DICOM File Format

If you’re sure the file contains DICOM data, you can use the force keyword parameter to force reading:

>>> ds = dcmread(no_meta_path, force=True)

A note of caution about using force=True; because pydicom uses a deferred-read system, no exceptions will be raised at the time of reading, no matter what the contents of the file are:

>>> with open('not_dicom.txt', 'w') as not_dicom:
...    not_dicom.write('This is not a DICOM file!')
...
>>> ds = dcmread('not_dicom.txt', force=True)

You’ll only run into problems when trying to use the dataset:

>>> print(ds)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "../pydicom/dataset.py", line 1703, in __str__
      return self._pretty_str()
  File "../pydicom/dataset.py", line 1436, in _pretty_str
      for data_element in self:
  File "../pydicom/dataset.py", line 1079, in __iter__
      yield self[tag]
  File "../pydicom/dataset.py", line 833, in __getitem__
      self[tag] = DataElement_from_raw(data_elem, character_set)
  File "../pydicom/dataelem.py", line 581, in DataElement_from_raw
      raise KeyError(msg)
  KeyError: "Unknown DICOM tag (6854,7369) can't look up VR"

Viewing and accessing#

The CT_small.dcm dataset is also included as an example FileDataset:

>>> from pydicom import examples
>>> ds = examples.ct
>>> type(ds)
<class 'pydicom.dataset.FileDataset'>
>>> ds.filename
'/path/to/pydicom/data/test_files/CT_small.dcm'

You can view the contents of the entire dataset by using print():

>>> print(ds)
Dataset.file_meta -------------------------------
(0002,0000) File Meta Information Group Length  UL: 192
(0002,0001) File Meta Information Version       OB: b'\x00\x01'
(0002,0002) Media Storage SOP Class UID         UI: CT Image Storage
(0002,0003) Media Storage SOP Instance UID      UI: 1.3.6.1.4.1.5962.1.1.1.1.1.20040119072730.12322
(0002,0010) Transfer Syntax UID                 UI: Explicit VR Little Endian
(0002,0012) Implementation Class UID            UI: 1.3.6.1.4.1.5962.2
(0002,0013) Implementation Version Name         SH: 'DCTOOL100'
(0002,0016) Source Application Entity Title     AE: 'CLUNIE1'
-------------------------------------------------
(0008,0005) Specific Character Set              CS: 'ISO_IR 100'
(0008,0008) Image Type                          CS: ['ORIGINAL', 'PRIMARY', 'AXIAL']
(0008,0012) Instance Creation Date              DA: '20040119'
(0008,0013) Instance Creation Time              TM: '072731'
(0008,0014) Instance Creator UID                UI: 1.3.6.1.4.1.5962.3
(0008,0016) SOP Class UID                       UI: CT Image Storage
...
(0010,1002)  Other Patient IDs Sequence   2 item(s) ----
    (0010,0020) Patient ID                          LO: 'ABCD1234'
    (0010,0022) Type of Patient ID                  CS: 'TEXT'
    ---------
    (0010,0020) Patient ID                          LO: '1234ABCD'
    (0010,0022) Type of Patient ID                  CS: 'TEXT'
    ---------
...
(0043,0010) Private Creator                     LO: 'GEMS_PARM_01'
(0043,1010) [Window value]                      US: 400
...
(7FE0,0010) Pixel Data                          OW: Array of 32768 elements
(FFFC,FFFC) Data Set Trailing Padding           OB: Array of 126 elements

The print output shows a list of the Data Elements (or elements for short) present in the dataset, one element per line. The format of each line is:

  • (0008,0005): The element’s tag, as (group number, element number) in hexadecimal

  • Specific Character Set: the element’s name, if known

  • CS: The element’s Value Representation (VR), if known

  • ‘ISO_IR_100’: the element’s stored value, or the length of the value if it’s too long to show concisely

Elements#

There are three categories of elements:

  • Standard elements such as (0008,0016) SOP Class UID. These elements are registered in Part 6 of the official DICOM Standard, have a tag with an even group number and are unique at each level of the dataset.

  • Repeating group elements such as (60xx,3000) Overlay Data (not found in this dataset). Repeating group elements are also registered in the official DICOM Standard, however they have a tag with a group number defined over a range rather than a fixed value. For example, there may be multiple Overlay Data elements at a given level of the dataset as long as each has its own unique group number; 0x6000, 0x6002, 0x6004, or any even value up to 0x601E.

  • Private elements such as (0043,1010) [Window value]. Private elements have a tag with an odd group number, aren’t registered in the official DICOM Standard, and are instead created privately, as specified by the (gggg,0010-00FF) Private Creator element.

    • If the private creator is unknown to pydicom then the element name will be Private tag data and the VR UN.

    • If the private creator is known then the element name will be surrounded by square brackets, e.g. [Window value] and the VR will be shown.

For all element categories, we can access a particular element in the dataset through its tag, which returns a DataElement instance:

>>> elem = ds[0x0008, 0x0016]
>>> elem
(0008,0016) SOP Class UID                       UI: CT Image Storage
>>> elem.tag
(0008,0016)
>>> elem.keyword
'SOPClassUID'
>>> private_elem = ds[0x0043, 0x1010]
>>> private_elem
(0043,1010) [Window value]                      US: 400
>>> private_elem.keyword
''

We can also access standard elements through their keyword. The keyword is usually the same as the element’s name without any spaces, but there are exceptions - such as (0010,0010) Patient’s Name having a keyword of PatientName. A list of keywords for all standard elements can be found here.

>>> elem = ds['SOPClassUID']
>>> elem
(0008,0016) SOP Class UID                       UI: CT Image Storage

Because of the lack of a unique keyword, this won’t work for private or repeating group elements. So for those elements stick to the Dataset[group number, element number] method.

In most cases, the important thing about an element is its value:

>>> elem.value
'1.2.840.10008.5.1.4.1.1.2'

For standard elements, you can use the Python dot notation with the keyword to get the value:

>>> ds.SOPClassUID
'1.2.840.10008.5.1.4.1.1.2'

This is the recommended method of accessing the value of standard elements. It’s simpler and more human-friendly then dealing with element tags and later on you’ll see how you can use the keyword to do far more than just accessing the value.

Elements may also be multi-valued - that is, have a Value Multiplicity (VM) > 1:

>>> ds.ImageType
['ORIGINAL', 'PRIMARY', 'AXIAL']
>>> ds['ImageType'].VM
3

The items for multi-valued elements can be accessed using the standard Python list methods:

>>> ds.ImageType[1]
'PRIMARY'

Sequences#

When viewing a dataset, you may see that some of the elements are indented:

>>> print(ds)
...
(0010,1002)  Other Patient IDs Sequence   2 item(s) ----
    (0010,0020) Patient ID                          LO: 'ABCD1234'
    (0010,0022) Type of Patient ID                  CS: 'TEXT'
    ---------
    (0010,0020) Patient ID                          LO: '1234ABCD'
    (0010,0022) Type of Patient ID                  CS: 'TEXT'
    ---------
...

This indicates that those elements are part of a sequence, in this case part of the Other Patient IDs Sequence element. Sequence elements have a VR of SQ and have a name that ends in the word Sequence. DICOM datasets use the tree data structure, with non-sequence elements acting as leaves and sequence elements acting as the nodes where branches start.

  • The top-level (root) dataset contains 0 or more elements:

    • An element may be non-sequence type; its VR is not SQ (leaf), or

    • An element may be a sequence type; its VR is SQ and it contains 0 or more items (branches):

      • Each item in the sequence is another dataset, containing 0 or more elements:

        • An element may be non-sequence type, or

        • An element may be a sequence type, and so on…

Sequence elements can be accessed in the same manner as non-sequence ones:

>>> elem = ds[0x0010, 0x1002]
>>> elem = ds['OtherPatientIDsSequence']

The main difference between sequence and non-sequence elements is that their value is a list-like object containing zero or more Dataset instances, which can be accessed using the standard Python list methods:

>>> len(ds.OtherPatientIDsSequence)
2
>>> type(ds.OtherPatientIDsSequence[0])
<class 'pydicom.dataset.Dataset'>
>>> ds.OtherPatientIDsSequence[0]
(0010,0020) Patient ID                          LO: 'ABCD1234'
(0010,0022) Type of Patient ID                  CS: 'TEXT'
>>> ds.OtherPatientIDsSequence[1]
(0010,0020) Patient ID                          LO: '1234ABCD'
(0010,0022) Type of Patient ID                  CS: 'TEXT'

Dataset.file_meta#

Earlier we saw that by default dcmread() only reads files that are in the DICOM File Format. So what’s the difference between a DICOM dataset written to file and one written in the DICOM File Format? The answer is a file header containing:

  • An 128 byte preamble:

    >>> ds.preamble
    b'II*\x00T\x18\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...
    
  • Followed by a 4 byte DICM prefix

  • Followed by the required DICOM File Meta Information elements, which in pydicom are stored in a FileMetaDataset instance in the file_meta attribute:

    >>> ds.file_meta
    (0002,0000) File Meta Information Group Length  UL: 192
    (0002,0001) File Meta Information Version       OB: b'\x00\x01'
    (0002,0002) Media Storage SOP Class UID         UI: CT Image Storage
    (0002,0003) Media Storage SOP Instance UID      UI: 1.3.6.1.4.1.5962.1.1.1.1.1.20040119072730.12322
    (0002,0010) Transfer Syntax UID                 UI: Explicit VR Little Endian
    (0002,0012) Implementation Class UID            UI: 1.3.6.1.4.1.5962.2
    (0002,0013) Implementation Version Name         SH: 'DCTOOL100'
    (0002,0016) Source Application Entity Title     AE: 'CLUNIE1'
    

As you can see, all the elements in the file_meta have tags with a group number of 0x0002. In fact, the DICOM File Format header is the only place you should find group 0x0002 elements as their presence anywhere else is non-conformant.

Out of all of the elements in the file_meta, the most important is (0002,0010) Transfer Syntax UID, as the transfer syntax defines the way the entire dataset (including the pixel data) has been encoded. Chances are that at some point you’ll need to know it:

>>> ds.file_meta.TransferSyntaxUID
'1.2.840.10008.1.2.1'
>>> ds.file_meta.TransferSyntaxUID.name
'Explicit VR Little Endian'
>>> ds.file_meta.TransferSyntaxUID.keyword
'ExplicitVRLittleEndian'

Modifying a dataset#

Modifying elements#

We can modify the value of any element by retrieving it and setting the value:

>>> elem = ds[0x0010, 0x0010]
>>> elem.value
'CompressedSamples^CT1'
>>> elem.value = 'Citizen^Jan'
>>> elem
(0010,0010) Patient's Name                      PN: 'Citizen^Jan'

Which raises the question; what kind of value should be used to set an element’s value? In the above example we used a str to set the Patient’s Name, but what about for other elements? Should they all be strings too? (Hint: no).

The allowed object type to use for an element’s value depends on its Value Representation. We can see from the above that Patient’s Name has a VR of PN. By checking the Element VR and Python types guide, we see that elements with a VR of PN can be set using:

  • None, str or PersonName if the Value Multiplicity (VM) is 1, or

  • list[str] or list[PersonName] for VM > 1.

Each standard element also has restrictions on its allowed VM, given in Part 6. For Patient’s Name the VM must always be 1, so the allowed types are None, str or PersonName. If instead we look up (0018,106C) Synchronization Channel, we see the VR is US and the allowed VM 2, so using the Element VR and Python types guide, we see the only type that may be used is list[int].

For standard elements it’s simpler to use the keyword to set the value:

>>> ds.PatientName = 'Citizen^Snips'
>>> elem
(0010,0010) Patient's Name                      PN: 'Citizen^Snips'

Multi-valued elements can be set using a list or modified using the list methods:

>>> ds.ImageType = ['ORIGINAL', 'PRIMARY', 'LOCALIZER']  # VR 'CS'
>>> ds.ImageType
['ORIGINAL', 'PRIMARY', 'LOCALIZER']
>>> ds.ImageType[1] = 'DERIVED'
>>> ds.ImageType
['ORIGINAL', 'DERIVED', 'LOCALIZER']
>>> ds.ImageType.insert(1, 'PRIMARY')
>>> ds.ImageType
['ORIGINAL', 'PRIMARY', 'DERIVED', 'LOCALIZER']

Similarly, for sequence elements:

>>> from pydicom.dataset import Dataset
>>> ds.OtherPatientIDsSequence = [Dataset(), Dataset()]  # VR 'SQ'
>>> ds.OtherPatientIDsSequence.append(Dataset())
>>> len(ds.OtherPatientIDsSequence)
3

The items in a sequence are always Dataset instances, if you try to add any other type to a sequence you’ll get an exception:

>>> ds.OtherPatientIDsSequence.append('Hello world?')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../pydicom/multival.py", line 63, in append
    self._list.append(self.type_constructor(val))
  File ".../pydicom/sequence.py", line 15, in validate_dataset
    raise TypeError('Sequence contents must be Dataset instances.')
  TypeError: Sequence contents must be Dataset instances.

You can set any element value as empty by using None (sequence elements will automatically be converted to an empty list when you do so):

>>> ds.PatientName = None
>>> elem
(0010,0010) Patient's Name                      PN: None
>>> ds.OtherPatientIDsSequence = None
>>> len(ds.OtherPatientIDsSequence)
0

Elements with a value of None, b'', '' or [] will still be written to file, but will have an empty value and zero length.

Adding elements#

Standard and repeating group elements#

New elements of any category can be added to the dataset with the add_new() method, which takes the tag, VR and value to use for the new element.

Let’s say we wanted to add the (0028,1050) Window Center standard element. We already know the tag is (0028,1050), but how we get the VR and how do we know the Python type to use for the value?

There are two ways to get an element’s VR:

>>> from pydicom.datadict import dictionary_VR
>>> dictionary_VR([0x0028, 0x1050])
'DS'

As we saw earlier, you can use the Element VR and Python types guide to find the Python type to use for a given VR. For DS with a VM of 1-n, we can use a str, int or float, or a list of those types. So to add the new element:

>>> ds.add_new([0x0028, 0x1050], 'DS', "100.0")
>>> elem = ds[0x0028, 0x1050]
>>> elem
(0028,1050) Window Center                       DS: "100.0"

Some VRs, like DS, require the value be formatted correctly. For example, elements with a VR of DA should use the YYYYMMDD format and only allow ASCII characters 0 to 9 (unless used for query matching). The full list of VRs and their formatting requirements can be found in Section 6.2 of Part 5 of the DICOM Standard.

Alternative for standard elements#

Adding elements with add_new() is a lot of work, so for standard elements you can just use the keyword and pydicom will do the VR lookup for you:

>>> 'WindowWidth' in ds
False
>>> ds.WindowWidth = 500
>>> ds['WindowWidth']
(0028,1051) Window Width                        DS: "500.0"

Notice how we can also use the element keyword with the Python in operator to see if a standard element is in the dataset? This also works with element tags, so private and repeating group elements are also covered:

>>> [0x0043, 0x1010] in ds
True

Sequences#

Because sequence items are also Dataset instances, you can use the same methods on them as well.

>>> seq = ds.OtherPatientIDsSequence
>>> seq += [Dataset(), Dataset(), Dataset()]
>>> seq[0].PatientID = 'Citizen^Jan'
>>> seq[0].TypeOfPatientID = 'TEXT'
>>> seq[1].PatientID = 'CompressedSamples^CT1'
>>> seq[1].TypeOfPatientID = 'TEXT'
>>> seq[0]
(0010,0020) Patient ID                          LO: 'Citizen^Jan'
(0010,0022) Type of Patient ID                  CS: 'TEXT'
>>> seq[1]
(0010,0020) Patient ID                          LO: 'CompressedSamples^CT1'
(0010,0022) Type of Patient ID                  CS: 'TEXT'

Private elements#

When adding private elements, the DICOM Standard requires a (gggg,0010-00FF) Private Creator element also be added to identify and reserve the gggg section of private tags. pydicom provides the add_new_private() convenience method to help manage this:

>>> ds.add_new_private("Private Creator Name", 0x000B, 0x01, "my value", "SH")
>>> ds
...
(000B,0010) Private Creator                     LO: 'Private Creator Name'
(000B,1001) Private tag data                    SH: 'my value'
...

Deleting elements#

All elements can be deleted with the del operator in combination with the element tag:

>>> del ds[0x0043, 0x1010]
>>> [0x0043, 0x1010] in ds
False

For standard elements you can use the keyword instead:

>>> del ds.WindowCenter
>>> 'WindowCenter' in ds
False

And you can remove items from sequences and multi-valued elements using your preferred list method:

>>> del ds.OtherPatientIDsSequence[2]
>>> len(seq)
2
>>> ds.ImageType
['ORIGINAL', 'PRIMARY', 'DERIVED', 'LOCALIZER']
>>> del ds.ImageType[2]
>>> ds.ImageType
['ORIGINAL', 'PRIMARY', 'LOCALIZER']

Writing a dataset#

After changing the dataset, the final step is to write the modifications back to file. This can be done by using save_as() to write the dataset to the supplied path:

>>> ds.save_as('out.dcm')

You can also write to any Python file-like:

>>> with open('out.dcm', 'wb') as f:
...    ds.save_as(f)
...
>>> from io import BytesIO
>>> out = BytesIO()
>>> ds.save_as(out)

By default, save_as() will write the dataset as-is. This means that even if your dataset is not conformant to the DICOM File Format it will still be written exactly as given. To be certain you’re writing the dataset in the DICOM File Format you can use the enforce_file_format keyword parameter:

>>> ds.save_as('out.dcm', enforce_file_format=True)

This will attempt to automatically add in any missing required group 0x0002 File Meta Information elements and set a blank 128 byte preamble (if required). If it’s unable to do so then an exception will be raised:

>>> del ds.file_meta
>>> ds.save_as('out.dcm', enforce_file_format=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../pydicom/dataset.py", line 2452, in save_as
    pydicom.dcmwrite(
  File ".../pydicom/filewriter.py", line 1311, in dcmwrite
    validate_file_meta(file_meta, enforce_standard=True)
  File ".../pydicom/dataset.py", line 3204, in validate_file_meta
    raise AttributeError(
AttributeError: Required File Meta Information elements are either missing
or have an empty value: (0002,0010) Transfer Syntax UID

The exception message contains the required element(s) that need to be added, usually this will only be the Transfer Syntax UID. It’s an important element, so get in the habit of making sure it’s there and correct.

Because we deleted the file_meta dataset we need to add it back:

>>> from pydicom.dataset import FileMetaDataset
>>> ds.file_meta = FileMetaDataset()

And now we can add our Transfer Syntax UID element and save to file:

>>> ds.file_meta.TransferSyntaxUID = '1.2.840.10008.1.2.1'
>>> ds.save_as('out.dcm', enforce_file_format=True)

Accessing Pixel Data#

See also

We have a separate and more in-depth pixel data tutorial that also covers creation of new pixel data and compressing and decompressing existing pixel data.

Many DICOM datasets have image and image-like data available in the (7FE0,0010) Pixel Data element. If present, the data will either be available as uncompressed raw binary data or as an encapsulated and compressed image codestream, depending on the Transfer Syntax UID:

>>> ds = examples.ct
>>> ds.file_meta.TransferSyntaxUID.is_compressed  # raw binary
False
>>> ds = examples.jpeg2k
>>> ds.file_meta.TransferSyntaxUID.is_compressed  # encapsulated codestreams
True

As bytes#

For datasets with an uncompressed Transfer Syntax UID, accessing the image data is simply a matter of accessing the Pixel Data value, which will return all frames concatenated together as bytes:

>>> ds = examples.ct
>>> pixel_data = ds.PixelData
>>> type(pixel_data)
<class 'bytes'>
>>> len(pixel_data)
32768

For datasets with a compressed Transfer Syntax, each frame of image data will have been encapsulated, which must be reversed. In pydicom this can be done with the get_frame() function or the generate_frames() iterator to return or yield a frame of compressed image data as bytes:

>>> from pydicom.encaps import get_frame
>>> ds = examples.jpeg2k
>>> len(ds.PixelData)
152326
>>> nr_frames = ds.get("NumberOfFrames", 1)  # Number Of Frames may not be present
>>> frame = get_frame(ds.PixelData, 0, number_of_frames=nr_frames)
>>> len(frame)
152294

As a NumPy ndarray#

Note

Converting uncompressed Pixel Data to an ndarray requires installing NumPy, and converting compressed Pixel Data may require installing other packages. See this page for a list of supported transfer syntaxes and the packages required to decompress them.

There are three main methods for converting Pixel Data to an ndarray, depending on your use case:

  1. Using the Dataset.pixel_array property, in conjunction with Dataset.pixel_array_options().

  2. Using the pixel_array() or iter_pixels() functions.

  3. Using the Decoder.as_array() or Decoder.iter_array() instance methods.

With Dataset.pixel_array#

The most convenient way to return the entire pixel data as an ndarray is with the Dataset.pixel_array property:

>>> from pydicom import examples
>>> ds = examples.ybr_color
>>> arr = ds.pixel_array
>>> arr.shape
(30, 240, 320, 3)

This will load the entire pixel data into memory and convert it to a ndarray. By default, it will also convert any pixel data in the YCbCr color space to RGB using the convert_color_space() function. Customization of the conversion process can be done through Dataset.pixel_array_options().

>>> ds.pixel_array_options(index=0)  # Convert only the first frame
>>> arr = ds.pixel_array  # still reads all frames into memory
>>> arr.shape
(240, 320, 3)

The main drawbacks of Dataset.pixel_array are:

  • It requires loading the entire Pixel Data into memory.

  • The ndarray is kept as an attribute of the dataset, taking up memory when it might not need to do so.

  • The returned ndarray lacks any descriptive metadata.

  • The conversion can only be customized using a second function.

With pixel_array() and iter_pixels()#

The pixel_array() and iter_pixels() functions convert the pixel data data to an ndarray directly:

>>> from pydicom.pixels import pixel_array
>>> arr = pixel_array(ds, index=0)  # reads all frames into memory
>>> arr.shape
(240, 320, 3)

If you’re concerned about memory usage, both functions can be used with the path to the dataset instead. This will reduce the amount of Pixel Data read into memory to the minimum required:

>>> path = examples.get_path("ybr_color")
>>> arr = pixel_array(path, index=0)   # reads only a single frame into memory

If you need the elements from the dataset’s Image Pixel module in order to perform any required image processing operations (such as rescale and windowing), you can pass an empty Dataset via the ds_out parameter, which will be populated by the group 0x0028 elements:

>>> from pydicom.dataset import Dataset
>>> ds = Dataset()
>>> arr = pixel_array(path, index=0, ds_out=ds)
>>> ds.Rows, ds.Columns
(240, 320)

The main drawback of pixel_array() and iter_pixels() is that the returned ndarray lacks any descriptive metadata.

With Decoder.as_array() and Decoder.iter_array()#

Warning

Do not use the Decoder class directly, instead use the class instance returned by the get_decoder() function.

Finally, if you need metadata describing the returned ndarray, you can use the Decoder.as_array() and Decoder.iter_array() methods for the Decoder class instance returned by get_decoder():

>>> from pydicom.pixels import get_decoder
>>> ds = examples.ybr_color
>>> decoder = get_decoder(ds.file_meta.TransferSyntaxUID)
>>> arr, meta = decoder.as_array(ds, index=0)
>>> meta["photometric_interpretation"]
'RGB'
>>> meta["number_of_frames"]
1

The main drawback of using the Decoder methods in the manner shown above is that the entire Pixel Data will be read into memory. However, this doesn’t necessarily have to be the case; the pixel_array() and iter_pixels() functions use those same methods to perform memory-efficient decoding. If you need both image metadata and memory efficiency, take a look at the source code for those functions to see how you can implement this yourself.

Loading a File-set#

See also

We have a separate and more in-depth File-set tutorial that also covers creating and modifying File-sets.

A File-set is a collection of (usually) related datasets that have been written to file and share a common naming space. They’re identifiable by a DICOMDIR file located in their root directory, which is used to summarize the File-set’s contents. While the DICOMDIR file can be read using dcmread() like any other DICOM dataset, we recommend that you use the FileSet class to manage the DICOMDIR and related File-set instead.

Warning

The DICOMDIR dataset contains a series of records that are referenced to each other by their file offsets, which makes it very easy to ‘break’ a DICOMDIR dataset, even by changing something seemingly innocuous like the (0004,1130) File-set ID. This is why we recommend using FileSet to manage any changes.

>>> from pydicom import examples
>>> from pydicom.fileset import FileSet
>>> path = examples.get_path("dicomdir")  # The path to the example File-set
>>> fs = FileSet(path)

A summary of the File-set’s contents is shown when printing:

>>> print(fs)
DICOM File-set
  Root directory: .../pydicom/data/test_files/dicomdirtests
  File-set ID: PYDICOM_TEST
  File-set UID: 1.2.276.0.7230010.3.1.4.0.31906.1359940846.78187
  Descriptor file ID: (no value available)
  Descriptor file character set: (no value available)
  Changes staged for write(): DICOMDIR update, directory structure update

  Managed instances:
    PATIENT: PatientID='77654033', PatientName='Doe^Archibald'
      STUDY: StudyDate=20010101, StudyTime=000000, StudyDescription='XR C Spine Comp Min 4 Views'
        SERIES: Modality=CR, SeriesNumber=1
          IMAGE: 1 SOP Instance
        SERIES: Modality=CR, SeriesNumber=2
          IMAGE: 1 SOP Instance
        SERIES: Modality=CR, SeriesNumber=3
          IMAGE: 1 SOP Instance
      STUDY: StudyDate=19950903, StudyTime=173032, StudyDescription='CT, HEAD/BRAIN WO CONTRAST'
        SERIES: Modality=CT, SeriesNumber=2
          IMAGE: 4 SOP Instances
    PATIENT: PatientID='98890234', PatientName='Doe^Peter'
      STUDY: StudyDate=20010101, StudyTime=000000
        SERIES: Modality=CT, SeriesNumber=4
          IMAGE: 2 SOP Instances
        SERIES: Modality=CT, SeriesNumber=5
          IMAGE: 5 SOP Instances
      ...

You can search the File-set with the find_values() method to return a list of element values found in the DICOMDIR’s records:

>>> fs.find_values("PatientID")
['77654033', '98890234']

The search can be expanded to the File-set’s managed instances (its datasets), by passing load=True, at the cost of a longer search time due to having to read and decode the corresponding files:

>>> fs.find_values("PhotometricInterpretation")
[]
>>> fs.find_values("PhotometricInterpretation", load=True)
['MONOCHROME1', 'MONOCHROME2']

The File-set can also be searched to find instances matching a query using the find() method, which returns a list of FileInstance that can be read and decoded using FileInstance.load() to return them as a FileDataset:

>>> matches = fs.find(PatientID="77654033")
>>> len(matches)
7
>>> ds = matches[0].load()
>>> ds.PatientName
'Doe^Archibald'

find() also supports the use of the load parameter:

>>> len(fs.find(PhotometricInterpretation='MONOCHROME1'))
0
>>> len(fs.find(PhotometricInterpretation='MONOCHROME1', load=True))
3