Core elements in pydicom
pydicom object model, description of classes, examples
Dataset
The main class in pydicom is Dataset
which emulates the behavior
of a Python dict
whose keys are DICOM tags (BaseTag
instances),
and values are the corresponding DataElement
instances.
Warning
The iterator of a Dataset yields DataElement instances, i.e. the values of the
dictionary, instead of the keys normally yielded by iterating a dict.
A Dataset
can be created directly, but you’ll
usually get one by reading an existing DICOM dataset from file using
dcmread()
:
>>> from pydicom import dcmread, examples
>>> # Returns the path to pydicom's examples.rt_plan dataset
>>> path = examples.get_path("rt_plan")
>>> path
PosixPath('/path/to/pydicom/data/test_files/rtplan.dcm')
>>> # Read the DICOM dataset at `path`
>>> ds = dcmread(path)
You can display the contents of the entire dataset using str(ds)
or with:
>>> ds
Dataset.file_meta -------------------------------
(0002,0000) File Meta Information Group Length UL: 156
(0002,0001) File Meta Information Version OB: b'\x00\x01'
(0002,0002) Media Storage SOP Class UID UI: RT Plan Storage
(0002,0003) Media Storage SOP Instance UID UI: 1.2.999.999.99.9.9999.9999.20030903150023
(0002,0010) Transfer Syntax UID UI: Implicit VR Little Endian
(0002,0012) Implementation Class UID UI: 1.2.888.888.88.8.8.8
-------------------------------------------------
(0008,0012) Instance Creation Date DA: '20030903'
(0008,0013) Instance Creation Time TM: '150031'
(0008,0016) SOP Class UID UI: RT Plan Storage
(0008,0018) SOP Instance UID UI: 1.2.777.777.77.7.7777.7777.20030903150023
(0008,0020) Study Date DA: '20030716'
(0008,0030) Study Time TM: '153557'
(0008,0050) Accession Number SH: ''
(0008,0060) Modality CS: 'RTPLAN'
...
You can access specific elements by their DICOM keyword or tag:
>>> ds.PatientName # element keyword
'Last^First^mid^pre'
>>> ds[0x10, 0x10].value # element tag
'Last^First^mid^pre'
When using the element tag directly a DataElement
instance is returned, so DataElement.value
must be used to get the value.
Warning
In pydicom, private data elements are displayed with square brackets around the name (if the name is known to pydicom). These are shown for convenience only; the descriptive name in brackets cannot be used to retrieve data elements. See details in Private Data Elements.
You can also set an element’s value by using the element’s keyword or tag number:
>>> ds.PatientID = "12345"
>>> ds.SeriesNumber = 5
>>> ds[0x10, 0x10].value = 'Test'
The use of element keywords is possible because pydicom intercepts requests for member variables, and checks if they are in the DICOM dictionary. It translates the keyword to a (group, element) tag and returns the corresponding value for that tag if it exists in the dataset.
See Anonymize DICOM data for a usage example of removing data elements and assigning new values.
Note
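To understand using Sequence in pydicom, please refer to this object model:
Dataset (emulates a Python dict)
  Contains DataElement instances, the value of each element can be one of:
    a regular value such as a number or string
    a list of regular values (if the VM > 1)
    a Sequence instance, which is a list of Dataset instances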
The value of sequence elements is a Sequence
instance, which wraps a Python list
. Items in the sequence are
referenced by number, beginning at index 0
as per Python convention:
>>> ds.BeamSequence[0].BeamName
'Field 1'
>>> # Or, set an intermediate variable to a dataset in the list
>>> beam1 = ds.BeamSequence[0] # First dataset in the sequence
>>> beam1.BeamName
'Field 1'
Using DICOM keywords is the recommended way to access data elements, but you can also use the tag numbers directly, such as:
>>> # Same thing with tag numbers - much harder to read:
>>> # Really should only be used if DICOM keyword not in pydicom dictionary
>>> ds[0x300a, 0xb0][0][0x300a, 0xc2].value
'Field 1'
If you don’t remember or know the exact element tag or keyword,
Dataset
provides a handy
Dataset.dir()
method, useful during interactive
sessions at the Python prompt:
>>> ds.dir("pat")
['PatientBirthDate', 'PatientID', 'PatientName', 'PatientSetupSequence', 'PatientSex']
Dataset.dir()
will return any non-private element
keywords in the dataset that have the specified string anywhere in the
keyword (case insensitive).
Note
Calling Dataset.dir()
without passing it an
argument will return a list
of all non-private element keywords in
the dataset.
You can also see all the names that pydicom knows about by viewing the
_dicom_dict.py file. It
should not normally be necessary, but you can add your own entries to the
DICOM dictionary at run time using add_dict_entry()
or
add_dict_entries()
. Similarly, you can add private data
elements to the private dictionary using
add_private_dict_entry()
or
add_private_dict_entries()
.
Under the hood, Dataset
stores a
DataElement
object for each item, but when
accessed by keyword (e.g. ds.PatientName
) only the value of that
DataElement
is returned. If you need the object itself,
you can access the item using either the keyword (for official DICOM
elements) or the tag number:
>>> # reload the data
>>> ds = dcmread(path)
>>> elem = ds['PatientName']
>>> elem.VR, elem.value
('PN', 'Last^First^mid^pre')
>>> # an alternative is to use:
>>> elem = ds[0x0010,0x0010]
>>> elem.VR, elem.value
('PN', 'Last^First^mid^pre')
To see whether the Dataset
contains a particular element, use
the in
operator with the element’s keyword or tag:
>>> "PatientName" in ds # or (0x0010, 0x0010) in ds
True
To remove an element from the Dataset
, use the del
operator:
>>> del ds.SoftwareVersions # or del ds[0x0018, 0x1020]
To work with (7FE0,0010) Pixel Data, the raw bytes
are available
through the PixelData keyword:
>>> # example CT dataset with actual pixel data
>>> ds = examples.ct
>>> pixel_bytes = ds.PixelData
However, it's much more convenient to use
Dataset.pixel_array
to return a
numpy.ndarray
(requires the NumPy library):
>>> arr = ds.pixel_array
>>> arr
array([[175, 180, 166, ..., 203, 207, 216],
[186, 183, 157, ..., 181, 190, 239],
[184, 180, 171, ..., 152, 164, 235],
...,
[906, 910, 923, ..., 922, 929, 927],
[914, 954, 938, ..., 942, 925, 905],
[959, 955, 916, ..., 911, 904, 909]], dtype=int16)
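To illustrate how raw bytes relate to such an array, here is a NumPy-only sketch. It assumes uncompressed, single-frame, 16-bit signed little-endian data and uses made-up sample values; Dataset.pixel_array handles many more cases (compression, multiple frames, other bit depths), so this is not how pydicom itself is implemented.

```python
import numpy as np

# Made-up 2 x 3 image standing in for decoded pixel values
rows, columns = 2, 3
expected = np.array([[175, 180, 166], [-1, 0, 959]], dtype="<i2")

# What the raw bytes would look like for this assumed encoding
pixel_bytes = expected.tobytes()

# Reinterpret the bytes as 16-bit signed integers and restore the image shape
arr = np.frombuffer(pixel_bytes, dtype="<i2").reshape(rows, columns)
```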
For more details, see Working with Pixel Data.
DataElement
The DataElement
class is not usually used directly in user
code, but is used extensively by Dataset
.
DataElement
is a simple object which stores the following
things:
VR – the element’s Value Representation – a two letter str that describes the format of the stored value.
VM – the element’s Value Multiplicity as an int. This is automatically determined from the contents of the value.
value – the element’s actual value. A regular value like a number or string (or list of them if the VM > 1), or a Sequence.
Tag
Tag()
is not generally used directly in user code, as
BaseTags
are automatically created when you assign or read
elements using their keywords as illustrated in sections above.
The BaseTag
class is derived from int
,
so in effect it’s just a number with some extra behavior:
Tag() is used to create instances of BaseTag and enforces the expected 4-byte (group, element) structure.
A BaseTag instance can be created from an int or a tuple containing the (group, element), or from the DICOM keyword:
>>> from pydicom.tag import Tag
>>> t1 = Tag(0x00100010) # all of these are equivalent
>>> t2 = Tag(0x10, 0x10)
>>> t3 = Tag((0x10, 0x10))
>>> t4 = Tag("PatientName")
>>> t1
(0010,0010)
>>> type(t1)
<class 'pydicom.tag.BaseTag'>
>>> t1==t2, t1==t3, t1==t4
(True, True, True)
BaseTag.group and BaseTag.elem return the group and element portions of the tag.
The BaseTag.is_private property checks whether the tag represents a private tag (i.e. whether the group number is odd).
Sequence
Sequence
is derived from Python’s list
.
The only added functionality is to make string representations prettier.
Otherwise all the usual methods of list
like item selection, append,
etc. are available.
For examples of accessing data nested in sequences, see Working with sequences.