Core elements in pydicom
pydicom object model, description of classes, examples
Dataset
The main class in pydicom is Dataset which emulates the behavior
of a Python dict whose keys are DICOM tags (BaseTag instances),
and values are the corresponding DataElement instances.
Warning
The iterator of a Dataset yields
DataElement instances, i.e. the values of the
dictionary, instead of the keys normally yielded by iterating a dict.
A Dataset can be created directly, but you’ll
usually get one by reading an existing DICOM dataset from file using
dcmread():
>>> from pydicom import dcmread, examples
>>> # Returns the path to pydicom's examples.rt_plan dataset
>>> path = examples.get_path("rt_plan")
>>> print(path)
/path/to/pydicom/data/test_files/rtplan.dcm
>>> # Read the DICOM dataset at `path`
>>> ds = dcmread(path)
You can display the contents of the entire dataset using str(ds) or with:
>>> ds
Dataset.file_meta -------------------------------
(0002,0000) File Meta Information Group Length UL: 156
(0002,0001) File Meta Information Version OB: b'\x00\x01'
(0002,0002) Media Storage SOP Class UID UI: RT Plan Storage
(0002,0003) Media Storage SOP Instance UID UI: 1.2.999.999.99.9.9999.9999.20030903150023
(0002,0010) Transfer Syntax UID UI: Implicit VR Little Endian
(0002,0012) Implementation Class UID UI: 1.2.888.888.88.8.8.8
-------------------------------------------------
(0008,0012) Instance Creation Date DA: '20030903'
(0008,0013) Instance Creation Time TM: '150031'
(0008,0016) SOP Class UID UI: RT Plan Storage
(0008,0018) SOP Instance UID UI: 1.2.777.777.77.7.7777.7777.20030903150023
(0008,0020) Study Date DA: '20030716'
(0008,0030) Study Time TM: '153557'
(0008,0050) Accession Number SH: ''
(0008,0060) Modality CS: 'RTPLAN'
...
You can access specific elements by their DICOM keyword or tag:
>>> ds.PatientName # element keyword
'Last^First^mid^pre'
>>> ds[0x10, 0x10].value # element tag
'Last^First^mid^pre'
When using the element tag directly a DataElement
instance is returned, so DataElement.value
must be used to get the value.
Warning
In pydicom, private data elements are displayed with square brackets around the name (if the name is known to pydicom). These are shown for convenience only; the descriptive name in brackets cannot be used to retrieve data elements. See details in Private Data Elements.
You can also set an element’s value by using the element’s keyword or tag number:
>>> ds.PatientID = "12345"
>>> ds.SeriesNumber = 5
>>> ds[0x10, 0x10].value = 'Test'
The use of element keywords is possible because pydicom intercepts requests for member variables, and checks if they are in the DICOM dictionary. It translates the keyword to a (group, element) tag and returns the corresponding value for that tag if it exists in the dataset.
See Anonymize DICOM data for a usage example of removing and assigning data elements.
Note
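To understand how Sequence is used in pydicom, refer
to this object model:
Dataset (emulates a Python dict) contains DataElement instances; the value of each element can be one of:
a regular value, such as a number or string
a list of regular values (when the VM > 1)
a Sequence instance, whose items are themselves Dataset instances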
The value of sequence elements is a Sequence
instance, which wraps a Python list. Items in the sequence are
referenced by number, beginning at index 0 as per Python convention:
>>> ds.BeamSequence[0].BeamName
'Field 1'
>>> # Or, set an intermediate variable to a dataset in the list
>>> beam1 = ds.BeamSequence[0] # First dataset in the sequence
>>> beam1.BeamName
'Field 1'
Using DICOM keywords is the recommended way to access data elements, but you can also use the tag numbers directly, such as:
>>> # Same thing with tag numbers - much harder to read:
>>> # Really should only be used if DICOM keyword not in pydicom dictionary
>>> ds[0x300a, 0xb0][0][0x300a, 0xc2].value
'Field 1'
If you don’t remember or know the exact element tag or keyword,
Dataset provides a handy
Dataset.dir() method, useful during interactive
sessions at the Python prompt:
>>> ds.dir("pat")
['PatientBirthDate', 'PatientID', 'PatientName', 'PatientSetupSequence', 'PatientSex']
Dataset.dir() will return any non-private element
keywords in the dataset that have the specified string anywhere in the
keyword (case insensitive).
Note
Calling Dataset.dir() without passing it an
argument will return a list of all non-private element keywords in
the dataset.
You can also see all the names that pydicom knows about by viewing the
_dicom_dict.py file. It
should not normally be necessary, but you can add your own entries to the
DICOM dictionary at run time using add_dict_entry() or
add_dict_entries(). Similarly, you can add private data
elements to the private dictionary using
add_private_dict_entry() or
add_private_dict_entries().
Under the hood, Dataset stores a
DataElement object for each item, but when
accessed by keyword (e.g. ds.PatientName) only the value of that
DataElement is returned. If you need the object itself,
you can access the item using either the keyword (for official DICOM
elements) or tag number:
>>> # reload the data
>>> ds = dcmread(path)
>>> elem = ds['PatientName']
>>> elem.VR, elem.value
('PN', 'Last^First^mid^pre')
>>> # an alternative is to use:
>>> elem = ds[0x0010,0x0010]
>>> elem.VR, elem.value
('PN', 'Last^First^mid^pre')
To see whether the Dataset contains a particular element, use
the in operator with the element’s keyword or tag:
>>> "PatientName" in ds # or (0x0010, 0x0010) in ds
True
To remove an element from the Dataset, use the del
operator:
>>> del ds.SoftwareVersions # or del ds[0x0018, 0x1020]
To work with (7FE0,0010) Pixel Data, the raw bytes are available
through the PixelData keyword:
>>> # example CT dataset with actual pixel data
>>> ds = examples.ct
>>> pixel_bytes = ds.PixelData
However, it's much more convenient to use
Dataset.pixel_array to return a
numpy.ndarray (requires the NumPy library):
>>> arr = ds.pixel_array
>>> arr
array([[175, 180, 166, ..., 203, 207, 216],
[186, 183, 157, ..., 181, 190, 239],
[184, 180, 171, ..., 152, 164, 235],
...,
[906, 910, 923, ..., 922, 929, 927],
[914, 954, 938, ..., 942, 925, 905],
[959, 955, 916, ..., 911, 904, 909]], dtype=int16)
For more details, see Working with Pixel Data.
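To give a rough idea of how raw bytes relate to an array, here is a NumPy-only sketch. It is not pydicom's implementation, and it assumes the simplest case: uncompressed, single-frame, one sample per pixel, signed 16-bit little-endian data. The variables rows, columns and pixel_bytes are stand-ins for ds.Rows, ds.Columns and ds.PixelData:

```python
import numpy as np

# Stand-ins for ds.Rows and ds.Columns (illustrative sizes)
rows, columns = 2, 3

# Stand-in for ds.PixelData: 6 signed 16-bit little-endian values
pixel_bytes = np.arange(rows * columns, dtype="<i2").tobytes()

# Interpret the bytes and give them the image's shape
arr = np.frombuffer(pixel_bytes, dtype="<i2").reshape(rows, columns)
```

Real datasets can be compressed, multi-frame or multi-sample, which is why Dataset.pixel_array is the more convenient route.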
DataElement
The DataElement class is not usually used directly in user
code, but is used extensively by Dataset.
DataElement is a simple object which stores the following
things:
VR: the element's Value Representation, a two-letter str that describes the format of the stored value.
VM: the element's Value Multiplicity as an int. This is automatically determined from the contents of the value.
value: the element's actual value. A regular value like a number or string (or a list of them if the VM > 1), or a Sequence.
Tag
Tag() is not generally used directly in user code, as
BaseTags are automatically created when you assign or read
elements using their keywords as illustrated in sections above.
The BaseTag class is derived from int,
so in effect it’s just a number with some extra behavior:
Tag() is used to create instances of BaseTag and enforces the expected 4-byte (group, element) structure.
A BaseTag instance can be created from an int or a tuple containing the (group, element), or from the DICOM keyword:
>>> from pydicom.tag import Tag
>>> t1 = Tag(0x00100010) # all of these are equivalent
>>> t2 = Tag(0x10, 0x10)
>>> t3 = Tag((0x10, 0x10))
>>> t4 = Tag("PatientName")
>>> t1
(0010,0010)
>>> type(t1)
<class 'pydicom.tag.BaseTag'>
>>> t1==t2, t1==t3, t1==t4
(True, True, True)
BaseTag.group and BaseTag.elem return the group and element portions of the tag.
The BaseTag.is_private property checks whether the tag represents a private tag (i.e. if the group number is odd).
Sequence
Sequence is derived from Python’s list.
The only added functionality is to make string representations prettier.
Otherwise all the usual methods of list like item selection, append,
etc. are available.
For examples of accessing data nested in sequences, see Working with sequences.