Core elements in pydicom
pydicom object model, description of classes, examples
Dataset
dataset.Dataset is the main object you will work with directly. Dataset is derived from Python’s dict, so it inherits (and overrides some of) the methods of dict. In other words, it is a collection of key:value pairs, where the key is the DICOM (group,element) tag (as a Tag object, described below), and the value is a DataElement instance (also described below).
A dataset could be created directly, but you will usually get one by reading an existing DICOM file:
>>> import pydicom
>>> from pydicom.data import get_testdata_files
>>> # get some test data
>>> filename = get_testdata_files("rtplan.dcm")[0]
>>> ds = pydicom.dcmread(filename)
You can display the entire dataset by simply printing its string (str or repr) value:
>>> ds
(0008, 0012) Instance Creation Date DA: '20030903'
(0008, 0013) Instance Creation Time TM: '150031'
(0008, 0016) SOP Class UID UI: RT Plan Storage
(0008, 0018) SOP Instance UID UI: 1.2.777.777.77.7.7777.7777.20030903150023
(0008, 0020) Study Date DA: '20030716'
(0008, 0030) Study Time TM: '153557'
(0008, 0050) Accession Number SH: ''
(0008, 0060) Modality CS: 'RTPLAN'
...
Note
You can also view DICOM files in a collapsible tree using the example program dcm_qt_tree.py.
You can access specific data elements by name (DICOM ‘keyword’) or by DICOM tag number:
>>> ds.PatientName
'Last^First^mid^pre'
>>> ds[0x10,0x10].value
'Last^First^mid^pre'
In the latter case (using the tag number directly) a DataElement instance is returned, so its .value attribute must be used to get the value.
You can also set values by name (DICOM keyword) or tag number:
>>> ds.PatientID = "12345"
>>> ds.SeriesNumber = 5
>>> ds[0x10,0x10].value = 'Test'
The use of names is possible because pydicom intercepts requests for member variables, and checks if they are in the DICOM dictionary. It translates the keyword to a (group,element) number and returns the corresponding value for that key if it exists.
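The idea of intercepting attribute access can be illustrated with a small toy class. This is only a sketch of the general technique, not pydicom’s actual implementation: KeywordDict and the two-entry KEYWORD_TO_TAG table are hypothetical stand-ins, and the toy stores raw values where pydicom stores DataElement instances.

```python
# Hypothetical two-entry stand-in for the DICOM dictionary (keyword -> tag).
KEYWORD_TO_TAG = {
    "PatientName": (0x0010, 0x0010),
    "PatientID": (0x0010, 0x0020),
}


class KeywordDict(dict):
    """Toy dict keyed by (group, element) that also accepts keyword attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        tag = KEYWORD_TO_TAG.get(name)
        if tag is None or tag not in self:
            raise AttributeError(name)
        return self[tag]

    def __setattr__(self, name, value):
        tag = KEYWORD_TO_TAG.get(name)
        if tag is None:
            # Not a known keyword: fall back to a normal attribute.
            super().__setattr__(name, value)
        else:
            self[tag] = value


toy = KeywordDict()
toy.PatientID = "12345"       # stored under the key (0x0010, 0x0020)
print(toy[(0x0010, 0x0020)])  # reached by tag
print(toy.PatientID)          # reached by keyword
```

Names that are not in the lookup table behave like ordinary Python attributes, which is why a misspelled keyword would not create a data element in this toy.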
See Anonymize DICOM data for a usage example of removing data elements and assigning values to them.
Note
To understand using sequence.Sequence objects in pydicom, please refer to this object model:
dataset.Dataset (derived from Python’s dict)
  --> contains DataElement instances
    --> the value of the data element can be one of:
      - a regular value like a number, string, etc.
      - a list of regular values (e.g. a 3-D coordinate)
      - a Sequence instance
        --> a Sequence is a list of dataset.Dataset (and so we come full circle)
DICOM sequence.Sequences are turned into Python lists. Items in the sequence are referenced by number, beginning at index 0 as per Python convention:
>>> ds.BeamSequence[0].BeamName
'Field 1'
Using DICOM keywords is the recommended way to access data elements, but you can also use the tag numbers directly, such as:
>>> # Same thing with tag numbers:
>>> ds[0x300a,0xb0][0][0x300a,0xc2].value
'Field 1'
>>> # yet another way, using another variable
>>> beam1=ds[0x300a,0xb0][0]
>>> beam1.BeamName, beam1[0x300a,0xc2].value
('Field 1', 'Field 1')
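Because a dataset is dict-like and a sequence is list-like, nested data can be visited with a short recursive function. The sketch below uses plain dict and list stand-ins rather than pydicom classes, so it runs without any test file. The walk function and the toy_plan data are hypothetical and only mirror the shape of the object model.

```python
def walk(dataset, depth=0):
    """Yield (depth, tag, value) for each non-sequence element, recursing into sequences."""
    for tag, value in sorted(dataset.items()):
        is_sequence = (
            isinstance(value, list)
            and value
            and all(isinstance(item, dict) for item in value)
        )
        if is_sequence:
            # A sequence is a list of nested datasets.
            for item in value:
                yield from walk(item, depth + 1)
        else:
            yield depth, tag, value


# Stand-in for a plan with one beam; the values are illustrative.
toy_plan = {
    (0x0008, 0x0060): "RTPLAN",                         # Modality
    (0x300A, 0x00B0): [{(0x300A, 0x00C2): "Field 1"}],  # BeamSequence -> BeamName
}

for depth, tag, value in walk(toy_plan):
    print("  " * depth + f"({tag[0]:04x}, {tag[1]:04x}) {value}")
```

A list of regular values, such as a 3-D coordinate, is not treated as a sequence here because its items are not dict instances.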
If you don’t remember or know the exact tag name, dataset.Dataset provides a handy dataset.Dataset.dir() method, useful during interactive sessions at the Python prompt:
>>> ds.dir("pat")
['PatientBirthDate', 'PatientID', 'PatientName', 'PatientSetupSequence', 'PatientSex']
dataset.Dataset.dir() will return any DICOM tag names in the dataset that have the specified string anywhere in the name (case insensitive).
Note
Calling dataset.Dataset.dir() with no string will list all tag names available in the dataset.
You can also see all the names that pydicom knows about by viewing the _dicom_dict.py file. You could modify that file to add tags that pydicom doesn’t already know about.
Under the hood, dataset.Dataset stores a DataElement object for each item, but when accessed by name (e.g. ds.PatientName) only the value of that dataelem.DataElement is returned. If you need the whole data element (see the dataelem.DataElement discussion), you can use the dataset.Dataset.data_element() method or access the item using the tag number:
>>> # reload the data
>>> ds = pydicom.dcmread(filename)
>>> data_element = ds.data_element("PatientName")
>>> data_element.VR, data_element.value
('PN', 'Last^First^mid^pre')
>>> # an alternative is to use:
>>> data_element = ds[0x10,0x10]
>>> data_element.VR, data_element.value
('PN', 'Last^First^mid^pre')
To check for the existence of a particular tag before using it, use the in keyword:
>>> "PatientName" in ds
True
To remove a data element from the dataset, use Python’s del statement:
>>> del ds.SoftwareVersions # or del ds[0x0018, 0x1020]
To work with pixel data, the raw bytes are available through the usual tag:
>>> # read data with actual pixel data
>>> filename = get_testdata_files("CT_small.dcm")[0]
>>> ds = pydicom.dcmread(filename)
>>> pixel_bytes = ds.PixelData
but to work with them in a more intelligent way, use pixel_array (requires the NumPy library):
>>> pix = ds.pixel_array
>>> pix
array([[175, 180, 166, ..., 203, 207, 216],
       [186, 183, 157, ..., 181, 190, 239],
       [184, 180, 171, ..., 152, 164, 235],
       ...,
       [906, 910, 923, ..., 922, 929, 927],
       [914, 954, 938, ..., 942, 925, 905],
       [959, 955, 916, ..., 911, 904, 909]], dtype=int16)
For more details, see Working with Pixel Data.
DataElement
The dataelem.DataElement class is not usually used directly in user code, but is used extensively by dataset.Dataset. dataelem.DataElement is a simple object which stores the following things:
- tag – a DICOM tag (as a Tag object)
- VR – DICOM value representation – various number and string formats, etc.
- VM – value multiplicity. This is 1 for most DICOM tags, but can be multiple, e.g. for coordinates. You do not have to specify this; the DataElement class keeps track of it based on value.
- value – the actual value. A regular value like a number or string (or list of them), or a Sequence.
Tag
The Tag class is derived from Python’s int, so in effect, it is just a number with some extra behaviour:
- Tag enforces that the DICOM tag fits in the expected 4-byte (group,element) format.
- A Tag instance can be created from an int or from a tuple containing the (group,element) separately:
>>> from pydicom.tag import Tag
>>> t1=Tag(0x00100010) # all of these are equivalent
>>> t2=Tag(0x10,0x10)
>>> t3=Tag((0x10, 0x10))
>>> t1
(0010, 0010)
>>> t1==t2, t1==t3
(True, True)
- Tag has properties group and element (or elem) to return the group and element portions.
- The is_private property checks whether the tag represents a private tag (i.e. if group number is odd).
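The arithmetic behind a 4-byte tag can be sketched with plain integers. The helper functions below are hypothetical and are not part of pydicom. They only illustrate how a 16-bit group and a 16-bit element share one 32-bit number, and how the odd-group rule for private tags can be checked.

```python
def pack_tag(group, element):
    """Combine 16-bit group and element numbers into one 32-bit integer."""
    if not (0 <= group <= 0xFFFF and 0 <= element <= 0xFFFF):
        raise ValueError("group and element must each fit in 16 bits")
    return (group << 16) | element


def unpack_tag(tag):
    """Split a 32-bit tag integer into its (group, element) parts."""
    return tag >> 16, tag & 0xFFFF


def is_private_group(tag):
    """Apply the odd-group rule for private tags."""
    group, _ = unpack_tag(tag)
    return group % 2 == 1


patient_name = pack_tag(0x0010, 0x0010)
print(hex(patient_name), unpack_tag(patient_name), is_private_group(patient_name))
```

This is why Tag(0x00100010) and Tag(0x10, 0x10) in the example above compare equal: both describe the same 32-bit number.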
Sequence
Sequence is derived from Python’s list. The only added functionality is to make string representations prettier. Otherwise all the usual methods of list like item selection, append, etc. are available.