API

Public

Todo

load / save from FileIO.jl. #26

Private

ASDF.ASDFFile (Type)
ASDFFile

The in-memory representation of a loaded ASDF file, combining the parsed YAML metadata tree with the binary block infrastructure needed to lazily materialize any array data it references.

Fields

| Field | Description |
|---|---|
| filename | Path of the file on disk, as passed to ASDF.load_file. Used for display and diagnostics only. The file is not kept open after loading. |
| metadata | The fully parsed YAML tree. Keys are strings matching the top-level YAML keys. Values may be any Julia type produced by the YAML constructors, including ASDF.NDArray, ASDF.ChunkedNDArray, nested Dicts, Vectors, and scalars. |
| lazy_block_headers | All binary block headers found in the file, scanned once at load time. Shared by reference with every ASDF.NDArray in metadata, allowing them to locate and read their backing blocks on demand. |

ASDF.ASDFLibrary (Type)
ASDFLibrary

Software provenance metadata, serialized as a !core/software-1.0.0 YAML tag. ASDF.write_file inserts an entry automatically under the key "asdf/library" if one is not already present, using the package's own name, author, homepage, and version.

ASDF.BlockHeader (Type)
BlockHeader

Parsed representation of a single ASDF binary block header.

Every binary block in an ASDF file begins with a fixed-layout header that describes the block's compression, size, and integrity checksum. BlockHeader captures all decoded fields from that header, together with the IO handle and file position needed to subsequently read the block's data payload.

Fields

| Field | Description |
|---|---|
| io | The open file handle from which the block's data can be read. |
| position | Absolute byte offset of the block magic token within io. |
| token | 4-byte magic token. Always equal to block_magic_token (\323BLK). |
| header_size | Size of the extended header region in bytes (excludes the 6-byte prefix). |
| flags | Block flags. Bit 0 (0x1) indicates a streamed block (not currently supported). |
| compression | 4-byte compression key (e.g. "zstd", "bzp2"). |
| allocated_size | Number of bytes allocated in the file for this block's data (≥ used_size). |
| used_size | Number of bytes of (compressed) data actually written. |
| data_size | Number of bytes of the uncompressed data. |
| checksum | 16-byte MD5 digest of the compressed data, or all zeros if omitted. |
| validate_checksum | When true, ASDF.read_block verifies the MD5 digest before returning data. |

File layout

The block occupies the following byte range within io, starting at position:

| Offset | Size | Contents |
|---|---|---|
| [position] | 4 bytes | magic token |
| [position + 4] | 2 bytes | header_size (big-endian UInt16) |
| [position + 6] | header_size bytes | extended header fields |
| [position + 6 + header_size] | allocated_size bytes | data payload |

The extended header (always 48 bytes in the current implementation) contains, in order: flags (4 B), compression (4 B), allocated_size (8 B), used_size (8 B), data_size (8 B), and checksum (16 B).

All multi-byte integers in the header are stored in big-endian byte order.
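As a rough illustration of the layout above, the following sketch decodes one header from any IO using only Base Julia. The name parse_block_header and the returned field names are hypothetical and are not ASDF.jl's internal API. The example builds a fake header in memory rather than reading a real file.

```julia
# Hypothetical helper, not part of ASDF.jl: decode one block header from `io`.
const MAGIC = UInt8[0xd3, 0x42, 0x4c, 0x4b]   # the bytes of "\323BLK"

function parse_block_header(io::IO)
    start = position(io)                       # offset of the magic token
    read(io, 4) == MAGIC || error("no ASDF block at offset $start")
    header_size = Int(ntoh(read(io, UInt16)))  # big-endian UInt16
    header = IOBuffer(read(io, header_size))   # extended header region
    flags = ntoh(read(header, UInt32))
    compression = read(header, 4)
    allocated_size = ntoh(read(header, UInt64))
    used_size = ntoh(read(header, UInt64))
    data_size = ntoh(read(header, UInt64))
    checksum = read(header, 16)
    data_start = start + 6 + header_size       # first byte of the payload
    return (; start, header_size, flags, compression,
            allocated_size, used_size, data_size, checksum, data_start)
end

# Build a fake block in memory: 48-byte extended header plus a 16-byte payload.
io = IOBuffer()
write(io, MAGIC)
write(io, hton(UInt16(48)))
write(io, hton(UInt32(0)))          # flags
write(io, "zstd")                   # 4-byte compression key
write(io, hton(UInt64(16)))         # allocated_size
write(io, hton(UInt64(10)))         # used_size
write(io, hton(UInt64(32)))         # data_size
write(io, zeros(UInt8, 16))         # checksum omitted (all zeros)
write(io, zeros(UInt8, 16))         # payload placeholder
seekstart(io)

h = parse_block_header(io)
```

Reading the extended header into its own buffer means a header_size larger than 48 would simply leave trailing bytes unread, while data_start is still computed from the declared size.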

ASDF.Blocks (Type)
Blocks

Module-level accumulator that collects ASDF.NDArrayWrapper values and their corresponding file positions during a single call to ASDF.write_file.

Blocks acts as a two-phase write buffer. In the first phase, as the YAML tree is serialized, each non-inline ASDF.NDArrayWrapper appends itself to arrays and reserves a sequential source index. In the second phase, ASDF.write_file iterates over arrays, compresses and writes each block to disk, and records the resulting file offsets in positions. The finalized positions vector is then written as the ASDF block index at the end of the file.
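The two phases can be sketched in isolation as follows. PendingBlocks, register!, and write_blocks! are hypothetical names chosen for this illustration. The sketch uses a local instance and raw byte payloads instead of the module-level ASDF.blocks and real ASDF.NDArrayWrapper values, and it skips headers and compression entirely.

```julia
# Hypothetical stand-in for Blocks: payloads to write and where they ended up.
struct PendingBlocks
    arrays::Vector{Vector{UInt8}}
    positions::Vector{Int}
end
PendingBlocks() = PendingBlocks(Vector{UInt8}[], Int[])

# Phase 1: register a payload and return its zero-based source index.
function register!(blocks::PendingBlocks, payload::Vector{UInt8})
    push!(blocks.arrays, payload)
    return length(blocks.arrays) - 1
end

# Phase 2: write every payload in order and record each starting offset.
function write_blocks!(io::IO, blocks::PendingBlocks)
    for payload in blocks.arrays
        push!(blocks.positions, position(io))
        write(io, payload)
    end
    return blocks.positions
end

io = IOBuffer()
write(io, "yaml")                   # stand-in for the serialized YAML tree
blocks = PendingBlocks()
first_index = register!(blocks, UInt8[1, 2, 3])
second_index = register!(blocks, UInt8[4, 5])
positions = write_blocks!(io, blocks)
```

Because the source index is handed out in phase 1, the YAML tree can refer to a block before its file offset is known.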

Not thread-safe

A single instance of Blocks is held in the module-level constant ASDF.blocks. Because this global state is mutated by ASDF.write_file, concurrent calls to write_file from multiple threads will corrupt each other's block lists. Do not call write_file concurrently.

Fields

| Field | Description |
|---|---|
| arrays | Ordered list of arrays to be written as binary blocks, accumulated during YAML serialisation. The position of each wrapper in this vector is its zero-based block source index. |
| positions | Absolute byte offsets of each written block within the output file, populated during the block-writing phase of ASDF.write_file. positions[i] corresponds to arrays[i]. |

ASDF.Byteorder (Type)
ASDF.Byteorder

Represents the byte order of array data stored in a block. Available variants:

  • Byteorder_little: Little-endian
  • Byteorder_big: Big-endian

ASDF.ChunkedNDArray (Type)
ChunkedNDArray

A logical N-dimensional array assembled from a collection of arbitrarily-positioned ASDF.NDArrayChunk tiles, each backed by its own binary block or inline data.

ChunkedNDArray is the in-memory representation of a !core/chunked-ndarray-1.X.Y YAML node. It defines the shape and element type of the full logical array, but defers all data access to the individual chunks. The full array is only allocated and populated when Base.getindex(chunked_ndarray::ChunkedNDArray) is called, i.e., chunked_ndarray[].

Fields

| Field | Description |
|---|---|
| shape | Dimensions of the full logical array in Python/C (row-major) order, outermost dimension first. All elements must be non-negative. The equivalent Julia shape is reverse(shape). |
| datatype | Element type shared by all chunks. Convert to a Julia type with Type(datatype). |
| chunks | Ordered collection of tiles that together populate the logical array. Tiles are written in iteration order when materializing. Later tiles overwrite earlier ones in any overlapping region. |

ASDF.Compression (Type)

Identifies the compression algorithm used for a data block. Available variants:

| Scheme | 4-byte key | Backend | Description |
|---|---|---|---|
| C_None | \0\0\0\0 | | Fast I/O speed, no CPU overhead |
| C_Blosc | blsc | ChunkCodecLibBlosc.jl | Multi-threaded, shuffle-aware, best with typed numeric arrays |
| C_Blosc2 | bls2 | See Issue #49 | Like Blosc but supports more than 2 GB of data |
| C_Bzip2 | bzp2 | ChunkCodecLibBzip2.jl | Good ratio, moderate speed (default) |
| C_Lz4 (:block) | lz4\0 | ChunkCodecLibLz4.jl | Fastest decompression, Python-compatible |
| C_Lz4 (:frame) | lz4\0 | ChunkCodecLibLz4.jl | LZ4 frame format for non-Python consumers |
| C_Xz | xz\0\0 | CodecXz.jl | Highest compression ratio, slowest |
| C_Zlib | zlib | ChunkCodecLibZlib.jl | Broad compatibility |
| C_Zstd | zstd | ChunkCodecLibZstd.jl | Best ratio/speed trade-off |

ASDF.Datatype (Type)

Maps ASDF datatype strings to Julia types. Note this is unrelated to Base.DataType. Defined mappings:

| ASDF string | Julia type |
|---|---|
| bool8 | Bool |
| int8 ... int128 | Int8 ... Int128 |
| uint8 ... uint128 | UInt8 ... UInt128 |
| float16 ... float64 | Float16 ... Float64 |
| complex32 | Complex{Float16} |
| complex64 | Complex{Float32} |
| complex128 | Complex{Float64} |

ASDF.LazyBlockHeaders (Type)
LazyBlockHeaders

A mutable container holding the complete list of ASDF.BlockHeader values scanned from an ASDF file, shared by reference with every ASDF.NDArray and ASDF.ChunkedNDArray constructed during parsing.

Reference sharing

Every ASDF.NDArray created during parsing of a given file holds a reference to the same LazyBlockHeaders instance. When ndarray[] is called, it indexes into lazy_block_headers.block_headers using the array's zero-based source field:

header = ndarray.lazy_block_headers.block_headers[ndarray.source + 1]
data   = ASDF.read_block(header)

Because block_headers is populated after all NDArray objects are constructed, no array needs to be updated individually when block scanning completes. The shared mutable reference propagates the result automatically.
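The effect of the shared mutable reference can be reproduced with two small stand-in types. SharedHeaders, LazyArray, and resolve are hypothetical names for this sketch only, and strings take the place of real ASDF.BlockHeader values.

```julia
# Hypothetical stand-in for LazyBlockHeaders: one mutable, shared container.
mutable struct SharedHeaders
    block_headers::Vector{String}
    SharedHeaders() = new(String[])    # starts empty, filled in later
end

# Hypothetical stand-in for NDArray: keeps a reference, not a copy.
struct LazyArray
    headers::SharedHeaders
    source::Int                        # zero-based block index
end

# Look up the header only when it is actually needed.
resolve(a::LazyArray) = a.headers.block_headers[a.source + 1]

shared = SharedHeaders()
a = LazyArray(shared, 0)               # arrays are created first ...
b = LazyArray(shared, 1)
shared.block_headers = ["header for block 0", "header for block 1"]  # ... headers assigned afterwards

resolved_a = resolve(a)
resolved_b = resolve(b)
```

Neither a nor b is touched after construction, yet both see the assigned list because they hold the same SharedHeaders object.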

Mutability

LazyBlockHeaders is a mutable struct solely to allow block_headers to be assigned after construction. The field is written exactly once per file load, immediately after YAML.load within ASDF.load_file returns. It is never modified again during normal use. Treat it as effectively immutable after load_file returns.

ASDF.NDArray (Type)
NDArray

A lazily-materialized N-dimensional array stored in an ASDF file, either as a binary block or inline within the ASDF file.

NDArray is the in-memory representation of an !core/ndarray-1.0.0 YAML node. It holds the array's shape, type, and layout metadata, but defers reading and decompressing block data until the array is explicitly materialized by calling Base.getindex(ndarray::NDArray).

Fields

| Field | Description |
|---|---|
| lazy_block_headers | Reference to the file's block header list. Used to resolve source indices at materialization time. |
| source | Zero-based index of the backing binary block, or nothing for inline arrays. |
| data | In-memory array for inline data, or nothing for block-backed arrays. |
| shape | Array dimensions in Python/C (row-major) order, outermost dimension first. The equivalent Julia shape is reverse(shape). |
| datatype | Element type, as an ASDF.Datatype enum value. Convert to a Julia type with Type(datatype). |
| byteorder | Byte order of the stored data (Byteorder_little or Byteorder_big). |
| offset | Byte offset from the start of the block to the first array element. Non-negative. |
| strides | Byte strides in Python/C (row-major) order. Must all be positive and length(strides) == length(shape). |

ASDF.NDArrayChunk (Type)
NDArrayChunk

A positioned tile within an ASDF.ChunkedNDArray, pairing an ASDF.NDArray with a zero-based origin that locates the tile in the parent array's coordinate space.

Fields

| Field | Description |
|---|---|
| start | Zero-based origin of the tile in Python/C (row-major) order, outermost dimension first. All elements must be non-negative. length(start) must equal length(ndarray.strides). |
| ndarray | The tile data, including its own shape, datatype, byte order, and backing block reference. |

ASDF.NDArrayWrapper (Type)
NDArrayWrapper

A write-side wrapper around a Julia array that carries compression and layout options. Used as the value type when building a document dict for ASDF.write_file.

| Parameter | Default | Description |
|---|---|---|
| compression | C_Bzip2 | Applied compression scheme |
| inline | false | Embed data in YAML instead of a binary block |
| lz4_layout | :block | :block for Python-compatible chunked LZ4, :frame for LZ4 frame format |

Note

If the compressed output is larger than the raw input, the block is stored uncompressed regardless of the chosen compression.
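The fallback described in this note amounts to a size comparison after compressing. The sketch below uses a toy run-length encoder so that it runs without any codec package. Both toy_rle and choose_payload are hypothetical and do not reflect how ASDF.jl calls its compression backends.

```julia
# Toy run-length encoder, used only to keep the sketch self-contained.
function toy_rle(raw::Vector{UInt8})
    out = UInt8[]
    i = 1
    while i <= length(raw)
        n = 1
        while i + n <= length(raw) && raw[i + n] == raw[i] && n < 255
            n += 1
        end
        push!(out, UInt8(n), raw[i])   # (run length, value) pairs
        i += n
    end
    return out
end

# Keep the compressed bytes unless they are larger than the raw input.
function choose_payload(raw::Vector{UInt8}, compress)
    packed = compress(raw)
    return length(packed) > length(raw) ? (raw, false) : (packed, true)
end

repetitive = zeros(UInt8, 100)         # compresses well
varied = UInt8[1, 2, 3, 4]             # grows under run-length encoding

payload_a, compressed_a = choose_payload(repetitive, toy_rle)
payload_b, compressed_b = choose_payload(varied, toy_rle)
```

The returned flag shows which path was taken; a real writer would also have to record in the block header that the data is uncompressed.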

ASDF.load_file (Method)
load_file(filename::AbstractString; extensions = false, validate_checksum = true)

Reads an ASDF file from disk.

| Parameter | Description |
|---|---|
| filename | Path to the .asdf file |
| extensions | When true, unknown YAML tags are parsed leniently (as maps, sequences, or scalars) instead of raising an error |
| validate_checksum | When true, each block's MD5 checksum is verified against the stored value |

Block data is located lazily. Block headers are scanned after the YAML is parsed, and array data (ndarray) is read only when Base.getindex(ndarray::NDArray) is called, i.e., ndarray[].

File handle lifetime

The file handle opened by load_file is retained for the lifetime of the returned ASDF.ASDFFile so that block data can be read on demand. Do not move, truncate, or overwrite the source file while any ASDF.NDArray from it may still be accessed.

ASDF.write_file (Method)
write_file(filename::AbstractString, document::Dict{Any,Any})

Writes an ASDF file to disk. document is a plain Dict whose values may include NDArrayWrapper instances. These are serialized as binary blocks with appropriate compression.

Layout of the output file:

  1. ASDF/YAML header (#ASDF 1.0.0, #ASDF_STANDARD 1.2.0, %YAML 1.1)
  2. YAML tree (!core/asdf-1.1.0)
  3. Binary blocks — one per NDArrayWrapper that has inline == false
  4. Block index (#ASDF BLOCK INDEX)

Base.getindex (Method)
Base.getindex(chunked_ndarray::ChunkedNDArray)

Allocates a dense array of shape reverse(shape) and fills it by calling chunk.ndarray[] for each chunk, placing the result at the correct offset.
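The placement step can be sketched with plain Julia arrays. The helper assemble is hypothetical; it takes already-materialized tiles (in Julia, column-major shape) paired with their zero-based row-major start, fixes the element type to Float64, and does no bounds or shape validation.

```julia
# Hypothetical helper: copy tiles into a dense array of size reverse(shape).
function assemble(shape::Vector{Int}, tiles)
    result = zeros(Float64, reverse(shape)...)
    for (start, tile) in tiles
        # `start` is zero-based and row-major: reverse it, then shift to one-based.
        origin = reverse(start) .+ 1
        ranges = ntuple(d -> origin[d]:(origin[d] + size(tile, d) - 1), ndims(tile))
        result[ranges...] = tile       # later tiles overwrite earlier ones
    end
    return result
end

shape = [2, 4]                         # row-major shape, Julia size is (4, 2)
tiles = [
    ([0, 0], ones(2, 2)),              # covers result[1:2, 1:2]
    ([0, 2], fill(2.0, 2, 2)),         # covers result[3:4, 1:2]
    ([1, 2], fill(3.0, 2, 1)),         # overlaps the previous tile at result[3:4, 2]
]

result = assemble(shape, tiles)
```

The third tile overlaps the second, so the overlapping cells end up holding the values of the tile that was written last.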

Base.getindex (Method)
Base.getindex(ndarray::NDArray)

Returns the fully materialized array. See ASDF.NDArray for definitions. When block-backed (source !== nothing), this reads and decompresses the block, applies offset and strides via a StridedViews.StridedView, reinterprets bytes to Type(datatype), and byte-swaps if byteorder != host_byteorder. The returned array satisfies:

size(result) == Tuple(reverse(ndarray.shape))
eltype(result) == Type(ndarray.datatype)
sizeof(eltype(result)) .* strides(result) == Tuple(reverse(ndarray.strides))
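The reinterpret, byte-swap, and shape-reversal steps can be tried on a raw byte vector without ASDF.jl. The helper materialize is hypothetical. It assumes a contiguous block with zero offset and default strides, so it does not use StridedViews, and it is shown here only for an integer element type.

```julia
# Hypothetical helper: decode contiguous bytes into a Julia array.
function materialize(bytes::Vector{UInt8}, ::Type{T}, shape::Vector{Int};
                     big_endian::Bool) where {T}
    values = collect(reinterpret(T, bytes))      # bytes -> elements, host order
    host_is_big = ENDIAN_BOM == 0x01020304
    if big_endian != host_is_big
        values = bswap.(values)                  # fix up the byte order
    end
    return reshape(values, reverse(shape)...)    # row-major shape -> Julia shape
end

# Six big-endian Int16 values 1..6, described as a row-major 2 x 3 array.
bytes = UInt8[0x00, 0x01, 0x00, 0x02, 0x00, 0x03,
              0x00, 0x04, 0x00, 0x05, 0x00, 0x06]
shape = [2, 3]

result = materialize(bytes, Int16, shape; big_endian = true)
```

No data is transposed: reversing the shape is enough because the row-major and column-major descriptions refer to the same memory order. The first row of the row-major array therefore appears as the first column of the Julia array.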