pydicom.dataset.Dataset
class pydicom.dataset.Dataset(*args: Dict[pydicom.tag.BaseTag, Union[pydicom.dataelem.DataElement, pydicom.dataelem.RawDataElement]], **kwargs: object)
Contains a collection (dictionary) of DICOM Data Elements.
Behaves like a dict.
Note
Dataset is only derived from dict to make it work in a NumPy ndarray. The parent dict class is never called, as all dict methods are overridden.
Examples
Add an element to the Dataset (for elements in the DICOM dictionary):
>>> ds = Dataset()
>>> ds.PatientName = "CITIZEN^Joan"
>>> ds.add_new(0x00100020, 'LO', '12345')
>>> ds[0x0010, 0x0030] = DataElement(0x00100030, 'DA', '20010101')
Add a sequence element to the Dataset:
>>> ds.BeamSequence = [Dataset(), Dataset(), Dataset()]
>>> ds.BeamSequence[0].Manufacturer = "Linac, co."
>>> ds.BeamSequence[1].Manufacturer = "Linac and Sons, co."
>>> ds.BeamSequence[2].Manufacturer = "Linac and Daughters, co."
Add private elements to the Dataset:
>>> block = ds.private_block(0x0041, 'My Creator', create=True)
>>> block.add_new(0x01, 'LO', '12345')
Updating and retrieving element values:
>>> ds.PatientName = "CITIZEN^Joan"
>>> ds.PatientName
'CITIZEN^Joan'
>>> ds.PatientName = "CITIZEN^John"
>>> ds.PatientName
'CITIZEN^John'
Retrieving an element’s value from a Sequence:
>>> ds.BeamSequence[0].Manufacturer
'Linac, co.'
>>> ds.BeamSequence[1].Manufacturer
'Linac and Sons, co.'
Accessing the DataElement items:
>>> elem = ds['PatientName']
>>> elem
(0010, 0010) Patient's Name PN: 'CITIZEN^John'
>>> elem = ds[0x00100010]
>>> elem
(0010, 0010) Patient's Name PN: 'CITIZEN^John'
>>> elem = ds.data_element('PatientName')
>>> elem
(0010, 0010) Patient's Name PN: 'CITIZEN^John'
Accessing a private DataElement item:
>>> block = ds.private_block(0x0041, 'My Creator')
>>> elem = block[0x01]
>>> elem
(0041, 1001) Private tag data LO: '12345'
>>> elem.value
'12345'
Alternatively:
>>> ds.get_private_item(0x0041, 0x01, 'My Creator').value
'12345'
Deleting an element from the Dataset:
>>> del ds.PatientID
>>> del ds.BeamSequence[1].Manufacturer
>>> del ds.BeamSequence[2]
Deleting a private element from the Dataset:
>>> block = ds.private_block(0x0041, 'My Creator')
>>> if 0x01 in block:
...     del block[0x01]
Determining if an element is present in the Dataset:
>>> 'PatientName' in ds
True
>>> 'PatientID' in ds
False
>>> (0x0010, 0x0030) in ds
True
>>> 'Manufacturer' in ds.BeamSequence[0]
True
Iterating through the top level of a Dataset only (excluding Sequences):
>>> for elem in ds:
...     print(elem)
(0010, 0010) Patient's Name PN: 'CITIZEN^John'
Iterating through the entire Dataset (including Sequences):
>>> for elem in ds.iterall():
...     print(elem)
(0010, 0010) Patient's Name PN: 'CITIZEN^John'
Recursively iterate through a Dataset (including Sequences):
>>> def recurse(ds):
...     for elem in ds:
...         if elem.VR == 'SQ':
...             # A sequence element iterates over its item Datasets
...             for item in elem:
...                 recurse(item)
...         else:
...             # Do something useful with each DataElement
...             pass
Converting the Dataset to and from JSON:
>>> ds = Dataset()
>>> ds.PatientName = "Some^Name"
>>> jsonmodel = ds.to_json()
>>> ds2 = Dataset()
>>> ds2.from_json(jsonmodel)
(0010, 0010) Patient's Name PN: 'Some^Name'
-
indent_chars
For string display, the characters used to indent nested Sequences. Default is " ".
Type
str
-
is_little_endian
Shall be set before writing with write_like_original=False. The Dataset (excluding the pixel data) will be written using the given endianness.
Type
bool
-
is_implicit_VR
Shall be set before writing with write_like_original=False. The Dataset will be written using the transfer syntax with the given VR handling, e.g. Little Endian Implicit VR if True, and Little Endian Explicit VR or Big Endian Explicit VR (depending on Dataset.is_little_endian) if False.
Type
bool
-
__init__(*args: Dict[pydicom.tag.BaseTag, Union[pydicom.dataelem.DataElement, pydicom.dataelem.RawDataElement]], **kwargs: object) → None
Create a new Dataset instance.
Methods
__init__(*args, **kwargs) – Create a new Dataset instance.
add(data_element) – Add an element to the Dataset.
add_new(tag, VR, value) – Create a new element and add it to the Dataset.
clear() – Delete all the elements from the Dataset.
convert_pixel_data([handler_name]) – Convert pixel data to a numpy.ndarray internally.
copy() – Return a shallow copy of the dataset.
data_element(name) – Return the element corresponding to the element keyword name.
decode() – Apply character set decoding to the elements in the Dataset.
decompress([handler_name]) – Decompresses Pixel Data and modifies the Dataset in-place.
dir(*filters) – Return an alphabetical list of element keywords in the Dataset.
elements() – Yield the top-level elements of the Dataset.
ensure_file_meta() – Create an empty Dataset.file_meta if none exists.
fix_meta_info([enforce_standard]) – Ensure the file meta info exists and has the correct values for transfer syntax and media storage UIDs.
formatted_lines([element_format, …]) – Iterate through the Dataset yielding formatted str for each element.
from_json(json_dataset[, bulk_data_uri_handler]) – Add elements to the Dataset from DICOM JSON format.
fromkeys([value]) – Create a new dictionary with keys from iterable and values set to value.
get() – Simulate dict.get() to handle element tags and keywords.
get_item() – Return the raw data element if possible.
get_private_item(group, element_offset, …) – Return the data element for the given private tag group.
group_dataset(group) – Return a Dataset containing only elements of a certain group.
items() – Return the Dataset items to simulate dict.items().
iterall() – Iterate through the Dataset, yielding all the elements.
keys() – Return the Dataset keys to simulate dict.keys().
overlay_array(group) – Return the Overlay Data in group as a numpy.ndarray.
pop(key, *args) – Emulate dict.pop() with support for tags and keywords.
popitem() – Emulate dict.popitem().
private_block(group, private_creator[, create]) – Return the block for the given tag group and private_creator.
private_creators(group) – Return a list of private creator names in the given group.
remove_private_tags() – Remove all private elements from the Dataset.
save_as(filename[, write_like_original]) – Write the Dataset to filename.
set_original_encoding(is_implicit_vr, …) – Set the values for the original transfer syntax and encoding.
setdefault(key[, default]) – Emulate dict.setdefault() with support for tags and keywords.
to_json([bulk_data_threshold, …]) – Return a JSON representation of the Dataset.
to_json_dict([bulk_data_threshold, …]) – Return a dictionary representation of the Dataset conforming to the DICOM JSON Model as described in the DICOM Standard, Part 18, Annex F.
top() – Return a str representation of the top level elements.
trait_names() – Return a list of valid names for auto-completion code.
update(dictionary) – Extend dict.update() to handle DICOM tags and keywords.
values() – Return the Dataset values to simulate dict.values().
walk(callback[, recursive]) – Iterate through the Dataset's elements and run callback on each.
waveform_array(index) – Return an ndarray for the multiplex group at index in the (5400,0100) Waveform Sequence.
Attributes
is_original_encoding – Return True if the encoding to be used for writing is set and is the same as that used to originally encode the Dataset.
pixel_array – Return the pixel data as a numpy.ndarray.
-
add(data_element: pydicom.dataelem.DataElement) → None
Add an element to the Dataset.
Equivalent to ds[data_element.tag] = data_element
Parameters
data_element (dataelem.DataElement) – The DataElement to add.
-
add_new(tag: Union[int, str, Tuple[int, int], BaseTag], VR: str, value: object) → None
Create a new element and add it to the Dataset.
Parameters
tag – The DICOM (group, element) tag in any form accepted by Tag() such as [0x0010, 0x0010], (0x10, 0x10), 0x00100010, etc.
VR (str) – The 2 character DICOM value representation (see DICOM Standard, Part 5, Section 6.2).
value – The value of the data element. One of the following:
-
convert_pixel_data(handler_name: str = '') → None
Convert pixel data to a numpy.ndarray internally.
Parameters
handler_name (str, optional) – The name of the pixel handler that shall be used to decode the data. Supported names are: 'gdcm', 'pillow', 'jpeg_ls', 'rle', 'numpy' and 'pylibjpeg'. If not used (the default), a matching handler is used from the handlers configured in pixel_data_handlers.
Returns
Converted pixel data is stored internally in the dataset.
Return type
None
Raises
ValueError – If handler_name is not a valid handler name.
NotImplementedError – If the given handler or any handler, if none given, is unable to decompress pixel data with the current transfer syntax.
RuntimeError – If the given handler, or the handler that has been selected if none given, is not available.
Notes
If the pixel data is in a compressed image format, the data is decompressed and any related data elements are changed accordingly.
-
copy() → pydicom.dataset.Dataset
Return a shallow copy of the dataset.
-
data_element(name: str) → Optional[pydicom.dataelem.DataElement]
Return the element corresponding to the element keyword name.
Parameters
name (str) – A DICOM element keyword.
Returns
For the given DICOM element keyword, return the corresponding DataElement if present, None otherwise.
Return type
dataelem.DataElement or None
-
decode() → None
Apply character set decoding to the elements in the Dataset.
See DICOM Standard, Part 5, Section 6.1.1.
-
decompress(handler_name: str = '') → None
Decompresses Pixel Data and modifies the Dataset in-place.
If the transfer syntax is not compressed, the pixel data is converted to a numpy.ndarray internally, but not returned.
If the pixel data is compressed, it is decompressed using an image handler, and internal state is updated appropriately:
Dataset.file_meta.TransferSyntaxUID is updated to the non-compressed form.
is_undefined_length is False for the (7FE0,0010) Pixel Data element.
Changed in version 1.4: The handler_name keyword argument was added.
Parameters
handler_name (str, optional) – The name of the pixel handler that shall be used to decode the data. Supported names are: 'gdcm', 'pillow', 'jpeg_ls', 'rle' and 'numpy'. If not used (the default), a matching handler is used from the handlers configured in pixel_data_handlers.
Return type
None
Raises
NotImplementedError – If the pixel data was originally compressed but the file is not Explicit VR Little Endian as required by the DICOM Standard.
-
dir(*filters: str) → List[str]
Return an alphabetical list of element keywords in the Dataset.
Intended mainly for use in interactive Python sessions. Only lists the element keywords in the current level of the Dataset (i.e. the contents of any sequence elements are ignored).
Parameters
filters (str) – Zero or more string arguments to the function. Used for case-insensitive match to any part of the DICOM keyword.
Returns
The matching element keywords in the dataset. If no filters are used then all element keywords are returned.
Return type
list of str
-
elements() → Iterator[pydicom.dataelem.DataElement]
Yield the top-level elements of the Dataset.
New in version 1.1.
Examples
>>> ds = Dataset()
>>> for elem in ds.elements():
...     print(elem)
The elements are returned in the same way as in Dataset.__getitem__().
Yields
dataelem.DataElement or dataelem.RawDataElement – The unconverted elements sorted by increasing tag order.
-
ensure_file_meta() → None
Create an empty Dataset.file_meta if none exists.
New in version 1.2.
-
fix_meta_info(enforce_standard: bool = True) → None
Ensure the file meta info exists and has the correct values for transfer syntax and media storage UIDs.
New in version 1.2.
Warning
The transfer syntax for is_implicit_VR = False and is_little_endian = True is ambiguous and will therefore not be set.
Parameters
enforce_standard (bool, optional) – If True, a check for incorrect and missing elements is performed (see validate_file_meta()).
-
formatted_lines(element_format: str = '%(tag)s %(name)-35.35s %(VR)s: %(repval)s', sequence_element_format: str = '%(tag)s %(name)-35.35s %(VR)s: %(repval)s', indent_format: Optional[str] = None) → Iterator[str]
Iterate through the Dataset yielding formatted str for each element.
Parameters
element_format (str) – The string format to use for non-sequence elements. Formatting uses the attributes of DataElement. Default is "%(tag)s %(name)-35.35s %(VR)s: %(repval)s".
sequence_element_format (str) – The string format to use for sequence elements. Formatting uses the attributes of DataElement. Default is "%(tag)s %(name)-35.35s %(VR)s: %(repval)s".
indent_format (str or None) – Placeholder for future functionality.
Yields
str – A string representation of an element.
-
classmethod from_json(json_dataset: Union[Dict[str, bytes], str], bulk_data_uri_handler: Optional[Union[Callable[[pydicom.tag.BaseTag, str, str], object], Callable[[str], object]]] = None) → _Dataset
Add elements to the Dataset from DICOM JSON format.
New in version 1.3.
See the DICOM Standard, Part 18, Annex F.
Parameters
json_dataset (dict or str) – dict or str representing a DICOM Data Set formatted based on the DICOM JSON Model.
bulk_data_uri_handler (callable, optional) – Callable function that accepts either the tag, vr and “BulkDataURI” or just the “BulkDataURI” of the JSON representation of a data element and returns the actual value of the data element (retrieved via DICOMweb WADO-RS).
Return type
Dataset
-
get(key: str, default: Optional[object] = None) → object
get(key: Union[int, Tuple[int, int], pydicom.tag.BaseTag], default: Optional[object] = None) → pydicom.dataelem.DataElement
Simulate dict.get() to handle element tags and keywords.
Returns
value – If key is the keyword for an element in the Dataset then return the element’s value.
dataelem.DataElement – If key is a tag for an element in the Dataset then return the DataElement instance.
value – If key is a class attribute then return its value.
-
get_item(key: slice) → Dataset
get_item(key: Union[int, str, Tuple[int, int], BaseTag]) → pydicom.dataelem.DataElement
Return the raw data element if possible.
It will be raw if the user has never accessed the value, or set their own value. Note if the data element is a deferred-read element, then it is read and converted before being returned.
-
get_private_item(group: int, element_offset: int, private_creator: str) → pydicom.dataelem.DataElement
Return the data element for the given private tag group.
New in version 1.3.
This is analogous to Dataset.__getitem__(), but only for private tags. It makes it possible to find the private tag for the correct private creator without first adding the tag to the private dictionary.
Returns
The corresponding element.
Return type
dataelem.DataElement
Raises
ValueError – If group is not part of a private tag or private_creator is empty.
KeyError – If the private creator tag is not found in the given group, or if the private tag is not found.
-
group_dataset(group: int) → pydicom.dataset.Dataset
Return a Dataset containing only elements of a certain group.
-
property is_original_encoding
Return True if the encoding to be used for writing is set and is the same as that used to originally encode the Dataset.
New in version 1.1.
This includes properties related to endianness, VR handling and the (0008,0005) Specific Character Set.
-
items() → ItemsView[pydicom.tag.BaseTag, Union[pydicom.dataelem.DataElement, pydicom.dataelem.RawDataElement]]
Return the Dataset items to simulate dict.items().
Returns
The top-level (BaseTag, DataElement) items for the Dataset.
Return type
dict_items
-
iterall() → Iterator[pydicom.dataelem.DataElement]
Iterate through the Dataset, yielding all the elements.
Unlike iter(Dataset), this does recurse into sequences, and so yields all elements as if the dataset were “flattened”.
Yields
dataelem.DataElement
-
keys() → KeysView[pydicom.tag.BaseTag]
Return the Dataset keys to simulate dict.keys().
-
overlay_array(group: int) → np.ndarray
Return the Overlay Data in group as a numpy.ndarray.
New in version 1.4.
Parameters
group (int) – The group number of the overlay data.
Returns
The (group,3000) Overlay Data converted to a numpy.ndarray.
Return type
numpy.ndarray
-
property pixel_array
Return the pixel data as a numpy.ndarray.
Changed in version 1.4: Added support for Float Pixel Data and Double Float Pixel Data.
Returns
The (7FE0,0008) Float Pixel Data, (7FE0,0009) Double Float Pixel Data or (7FE0,0010) Pixel Data converted to a numpy.ndarray.
Return type
numpy.ndarray
-
pop(key: Union[int, str, Tuple[int, int], BaseTag], *args: object) → Union[pydicom.dataelem.DataElement, pydicom.dataelem.RawDataElement]
Emulate dict.pop() with support for tags and keywords.
Removes the element for key if it exists and returns it, otherwise returns a default value if given or raises KeyError.
Parameters
*args (zero or one argument) – Defines the behavior if no tag exists for key: if given, it defines the return value; if not given, KeyError is raised.
Returns
The element for key if it exists, or the default value if given.
Raises
KeyError – If the key is not a valid tag or keyword, or if the tag does not exist and no default is given.
-
popitem() → Tuple[pydicom.tag.BaseTag, Union[pydicom.dataelem.DataElement, pydicom.dataelem.RawDataElement]]
Emulate dict.popitem().
Return type
tuple of (BaseTag, DataElement)
-
private_block(group: int, private_creator: str, create: bool = False) → pydicom.dataset.PrivateBlock
Return the block for the given tag group and private_creator.
New in version 1.3.
If create is True and the private_creator does not exist, the private creator tag is added.
Notes
We ignore the unrealistic case that no free block is available.
Parameters
group (int) – The group of the private tag to be found as a 32-bit int. Must be an odd number (e.g. a private group).
private_creator (str) – The private creator string associated with the tag.
create (bool, optional) – If True and private_creator does not exist, a new private creator tag is added at the next free block. If False (the default) and private_creator does not exist, KeyError is raised instead.
Returns
The existing or newly created private block.
Return type
PrivateBlock
Raises
ValueError – If group doesn’t belong to a private tag or private_creator is empty.
KeyError – If the private creator tag is not found in the given group and the create parameter is False.
-
private_creators(group: int) → List[str]
Return a list of private creator names in the given group.
New in version 1.3.
Examples
This can be used to check if a given private creator exists in the group of the dataset:
>>> ds = Dataset()
>>> if 'My Creator' in ds.private_creators(0x0041):
...     block = ds.private_block(0x0041, 'My Creator')
Parameters
group (int) – The private group as a 32-bit int. Must be an odd number.
Returns
All private creator names for private blocks in the group.
Return type
list of str
Raises
ValueError – If group is not a private group.
-
remove_private_tags()
Remove all private elements from the Dataset.
-
save_as(filename: Union[str, os.PathLike[AnyStr], BinaryIO], write_like_original: bool = True) → None
Write the Dataset to filename.
Wrapper for pydicom.filewriter.dcmwrite, passing this dataset to it. See documentation for that function for details.
See also
pydicom.filewriter.dcmwrite – Write a DICOM file from a FileDataset instance.
-
set_original_encoding(is_implicit_vr: Optional[bool], is_little_endian: Optional[bool], character_encoding: Optional[str]) → None
Set the values for the original transfer syntax and encoding.
New in version 1.2.
Can be used for a Dataset with raw data elements to enable optimized writing (e.g. without decoding the data elements).
-
setdefault(key: Union[int, str, Tuple[int, int], BaseTag], default: Optional[object] = None) → pydicom.dataelem.DataElement
Emulate dict.setdefault() with support for tags and keywords.
Examples
>>> ds = Dataset()
>>> elem = ds.setdefault((0x0010, 0x0010), "Test")
>>> elem
(0010, 0010) Patient's Name PN: 'Test'
>>> elem.value
'Test'
>>> elem = ds.setdefault('PatientSex',
...     DataElement(0x00100040, 'CS', 'F'))
>>> elem.value
'F'
Parameters
default (pydicom.dataelem.DataElement or object, optional) – The DataElement to use with key, or the value of the DataElement to use with key (default None).
Returns
The DataElement for key.
Return type
dataelem.DataElement
Raises
ValueError – If key is not convertible to a valid tag or a known element keyword.
KeyError – If enforce_valid_values is True and key is an unknown non-private tag.
-
to_json(bulk_data_threshold: int = 1024, bulk_data_element_handler: Optional[Callable[[pydicom.dataelem.DataElement], str]] = None, dump_handler: Optional[Callable[[Dataset], str]] = None) → str
Return a JSON representation of the Dataset.
New in version 1.3.
See the DICOM Standard, Part 18, Annex F.
Parameters
bulk_data_threshold (int, optional) – Threshold for the length of a base64-encoded binary data element above which the element should be considered bulk data and the value provided as a URI rather than included inline (default: 1024). Ignored if no bulk data handler is given.
bulk_data_element_handler (callable, optional) – Callable function that accepts a bulk data element and returns a JSON representation of the data element (dictionary including the “vr” key and either the “InlineBinary” or the “BulkDataURI” key).
dump_handler (callable, optional) – Callable function that accepts a dict and returns the serialized (dumped) JSON string (by default uses json.dumps()).
Returns
Dataset serialized into a string based on the DICOM JSON Model.
Return type
str
Examples
>>> import json
>>> def my_json_dumps(data):
...     return json.dumps(data, indent=4, sort_keys=True)
>>> ds.to_json(dump_handler=my_json_dumps)
-
to_json_dict(bulk_data_threshold: int = 1024, bulk_data_element_handler: Optional[Callable[[pydicom.dataelem.DataElement], str]] = None) → _Dataset
Return a dictionary representation of the Dataset conforming to the DICOM JSON Model as described in the DICOM Standard, Part 18, Annex F.
New in version 1.4.
Parameters
bulk_data_threshold (int, optional) – Threshold for the length of a base64-encoded binary data element above which the element should be considered bulk data and the value provided as a URI rather than included inline (default: 1024). Ignored if no bulk data handler is given.
bulk_data_element_handler (callable, optional) – Callable function that accepts a bulk data element and returns a JSON representation of the data element (dictionary including the “vr” key and either the “InlineBinary” or the “BulkDataURI” key).
Returns
Dataset representation based on the DICOM JSON Model.
-
trait_names() → List[str]
Return a list of valid names for auto-completion code.
Used in IPython, so that data element names can be found and offered for autocompletion on the IPython command line.
-
update(dictionary: Union[Dict[str, object], Dict[Union[int, str, Tuple[int, int], BaseTag], pydicom.dataelem.DataElement]]) → None
Extend dict.update() to handle DICOM tags and keywords.
-
values() → ValuesView[Union[pydicom.dataelem.DataElement, pydicom.dataelem.RawDataElement]]
Return the Dataset values to simulate dict.values().
Returns
The DataElements that make up the values of the Dataset.
Return type
dict_values
-
walk(callback: Callable[[Dataset, pydicom.dataelem.DataElement], None], recursive: bool = True) → None
Iterate through the Dataset's elements and run callback on each.
Visit all elements in the Dataset, possibly recursing into sequences and their items. The callback function is called for each DataElement (including elements with a VR of ‘SQ’). Can be used to perform an operation on certain types of elements.
For example, remove_private_tags() finds all elements with private tags and deletes them.
The elements will be returned in order of increasing tag number within their current Dataset.
Parameters
callback – A callable function that takes two arguments: a Dataset, and a DataElement belonging to that Dataset.
recursive (bool, optional) – Flag to indicate whether to recurse into sequences (default True).
-
waveform_array(index: int) → np.ndarray
Return an ndarray for the multiplex group at index in the (5400,0100) Waveform Sequence.
New in version 2.1.
Parameters
index (int) – The index of the multiplex group to return the array for.
Returns
The Waveform Data for the multiplex group as an ndarray with shape (samples, channels). If (003A,0210) Channel Sensitivity is present then the values will be in the units specified by the (003A,0211) Channel Sensitivity Units Sequence.
Return type
numpy.ndarray