Introduction

Adapted from the ADASS 2024 workshop.

The ASDF file format is based on the human-readable YAML standard, extended with efficient binary blocks to store array data. Basic arithmetic types (Bool, Int, Float, Complex) and String types are supported out of the box. Other types (structures) need to be declared to be supported.

ASDF supports arbitrary array strides, both C (Python) and Fortran (Julia) memory layouts, as well as compression. The YAML metadata can contain arbitrary information corresponding to scalars, arrays, or dictionaries.
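To make the two memory layouts concrete, here is a small sketch that uses only Base Julia and does not call ASDF.jl at all. The variable names are illustrative. It builds the same 2×3 shape once in Julia's native column-major (Fortran) order and once as a row-major (C) view, then compares their strides.

```julia
# Julia arrays are column-major (Fortran order): the first index varies fastest.
A = reshape(collect(1:6), 2, 3)    # 2×3 matrix over the memory sequence 1..6
col_major_strides = strides(A)     # step in memory per index: (1, 2)

# A row-major (C order) 2×3 array can be expressed as a dimension-permuted
# view of a 3×2 column-major array. No data is copied.
B = PermutedDimsArray(reshape(collect(1:6), 3, 2), (2, 1))
row_major_strides = strides(B)     # last index now varies fastest: (3, 1)

println(col_major_strides)
println(row_major_strides)
println(B[1, :])                   # first row is contiguous in memory
```

Both arrays sit on the same linear memory sequence; only the strides differ. That is the kind of layout information a format has to record in order to support both conventions.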

The ASDF file format targets an audience similar to that of the HDF5 format.

Getting started

An ASDF file starts out as a nested dictionary with keys of your choosing:

af_payload = Dict{Any, Any}( # To-do: see if type signature needs to be this general
    "meta" => Dict("my" => Dict("nested" => "metadata")),
    "data" => [1, 2, 3, 4],
)
Dict{Any, Any} with 2 entries:
  "meta" => Dict("my"=>Dict("nested"=>"metadata"))
  "data" => [1, 2, 3, 4]

Next, this dictionary can be written to the ASDF file format with ASDF.write_file:

using ASDF

data_dir = joinpath("..", "data")
mkpath(data_dir)
fpath = joinpath(data_dir, "my_file.asdf")

ASDF.write_file(fpath, af_payload)

The resulting file has the following contents:

#ASDF 1.0.0
#ASDF_STANDARD 1.2.0
# This is an ASDF file <https://asdf-standard.readthedocs.io/>
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
---
!core/asdf-1.1.0
meta:
  my:
    nested: "metadata"
data:
  - 1
  - 2
  - 3
  - 4
asdf/library: !core/software-1.0.0
  version: "2.0.0"
  name: "ASDF.jl"
  author: "Erik Schnetter <schnetter@gmail.com>"
  homepage: "https://github.com/JuliaAstro/ASDF.jl"
...
#ASDF BLOCK INDEX
%YAML 1.1
---
[]
...

This file can be loaded back with ASDF.load_file:

af = ASDF.load_file(fpath)
ASDF.ASDFFile("../data/my_file.asdf", Dict{Any, Any}("meta" => Dict{Any, Any}("my" => Dict{Any, Any}("nested" => "metadata")), "data" => [1, 2, 3, 4], "asdf/library" => Dict{Any, Any}("name" => "ASDF.jl", "author" => "Erik Schnetter <schnetter@gmail.com>", "homepage" => "https://github.com/JuliaAstro/ASDF.jl", "version" => "2.0.0")), ASDF.LazyBlockHeaders(ASDF.BlockHeader[]))

This creates an ASDF.ASDFFile object, which contains a metadata field. This is a new dictionary that merges information about this library (stored under the asdf/library key) with the original user-defined af_payload dictionary:

af.metadata
Dict{Any, Any} with 3 entries:
  "meta"         => Dict{Any, Any}("my"=>Dict{Any, Any}("nested"=>"metadata"))
  "data"         => [1, 2, 3, 4]
  "asdf/library" => Dict{Any, Any}("name"=>"ASDF.jl", "author"=>"Erik Schnetter…
af.metadata["asdf/library"]
Dict{Any, Any} with 4 entries:
  "name"     => "ASDF.jl"
  "author"   => "Erik Schnetter <schnetter@gmail.com>"
  "homepage" => "https://github.com/JuliaAstro/ASDF.jl"
  "version"  => "2.0.0"

Since the underlying data is a dictionary, it can be modified in the standard way:

af.metadata["meta"]["my"]["nested2"] = "metadata2"

af.metadata
Dict{Any, Any} with 3 entries:
  "meta"         => Dict{Any, Any}("my"=>Dict{Any, Any}("nested2"=>"metadata2",…
  "data"         => [1, 2, 3, 4]
  "asdf/library" => Dict{Any, Any}("name"=>"ASDF.jl", "author"=>"Erik Schnetter…

Array storage

By default, array data is written inline as a literal in the ASDF file. Arrays can be stored and later accessed more efficiently by wrapping them in an ASDF.NDArrayWrapper. Passing the inline = false keyword stores the data in a binary block, and the compression keyword selects a supported compression algorithm for that block:

af_payload = Dict{Any, Any}(
    "meta" => Dict("my" => Dict("nested" => "metadata")),
    # Default
    "data" => ASDF.NDArrayWrapper([1, 2, 3, 4]; inline = false, compression = ASDF.C_Bzip2),
)

fpath = joinpath(data_dir, "my_file_compressed.asdf")
ASDF.write_file(fpath, af_payload)

Saving your data as an NDArrayWrapper allows for it to be lazily accessed as a strided view. To access the underlying data, use the [] (dereference) syntax:

af = ASDF.load_file(fpath)

af.metadata["data"][]
4-element reshape(reinterpret(Int64, ::StridedViews.StridedView{UInt8, 2, Memory{UInt8}, typeof(identity)}), 4) with eltype Int64:
 1
 2
 3
 4
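If an ordinary Julia array is more convenient than the lazy view, the view can be copied into one. The sketch below repeats the write and load steps from above in a scratch directory and then materializes the result with Base's collect. The file name example.asdf and the variable names are hypothetical. It assumes the dereferenced view behaves like any other AbstractArray, as the output above suggests.

```julia
using ASDF

dir = mktempdir()                        # scratch directory, cleaned up at process exit
fpath = joinpath(dir, "example.asdf")    # hypothetical file name

payload = Dict{Any, Any}(
    "data" => ASDF.NDArrayWrapper([1, 2, 3, 4]; inline = false, compression = ASDF.C_Bzip2),
)
ASDF.write_file(fpath, payload)

af = ASDF.load_file(fpath)
lazy = af.metadata["data"][]   # lazy strided view, as shown above
plain = collect(lazy)          # copy into a regular in-memory Array

println(typeof(plain))
println(plain)
```

Keeping the lazy view is preferable for large arrays that are only partially read. Copying is mainly useful when downstream code expects a plain Array.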

Tagged objects

Coming soon. Supporting custom objects, extensions.