Introduction
Adapted from ADASS 2024 workshop.
The ASDF file format is based on the human-readable YAML standard, extended with efficient binary blocks to store array data. Basic arithmetic types (Bool, Int, Float, Complex) and String types are supported out of the box. Other types (structures) need to be declared to be supported.
ASDF supports arbitrary array strides, both C (Python) and Fortran (Julia) memory layouts, as well as compression. The YAML metadata can contain arbitrary information corresponding to scalars, arrays, or dictionaries.
The ASDF file format targets a similar audience as the HDF5 format.
Getting started
ASDF files are initially created as a dictionary with arbitrarily nested data:
julia> using OrderedCollections
julia> af_payload = OrderedDict("field_1" => [5, 6, 7, 8], "field_2" => ["up", "down", "left", "right"], "field_3" => OrderedDict("field_3a" => ["apple", "orange", "pear"], "field_3b" => [1.0, 2.0, 3.0]));We use an OrderedDict from OrderedCollections.jl to preserve the order of data on write and load, and to maximize compatibility with YAML.jl. For a potential alternative using Dictionaries.jl, see this experimental branch.
ASDF.jl is registered with FileIO.jl, so this data can be written to the ASDF file format with the generic save function:
julia> using ASDF
julia> save("intro.asdf", af_payload)The saved file contains the following human-readable contents:
View file
julia> read("intro.asdf", String) |> print
#ASDF 1.0.0
#ASDF_STANDARD 1.2.0
# This is an ASDF file <https://asdf-standard.readthedocs.io/>
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
---
!core/asdf-1.1.0
field_1:
- 5
- 6
- 7
- 8
field_2:
- "up"
- "down"
- "left"
- "right"
field_3:
field_3a:
- "apple"
- "orange"
- "pear"
field_3b:
- 1.0
- 2.0
- 3.0
asdf/library: !core/software-1.0.0
name: "ASDF.jl"
author: "Erik Schnetter <schnetter@gmail.com>"
homepage: "https://github.com/JuliaAstro/ASDF.jl"
version: "2.0.0"
...
#ASDF BLOCK INDEX
%YAML 1.1
---
[]
...which can be loaded back with FileIO.jl's generic load function:
julia> af = load("intro.asdf")
intro.asdf
├─ field_1::Vector{Int64} | shape = (4,)
├─ field_2::Vector{String} | shape = (4,)
├─ field_3::String
│ ├─ field_3a::Vector{String} | shape = (3,)
│ └─ field_3b::Vector{Float64} | shape = (3,)
└─ asdf/library::String
├─ author::String | Erik Schnetter <schnetter@gmail.com>
├─ homepage::String | https://github.com/JuliaAstro/ASDF.jl
├─ name::String | ASDF.jl
└─ version::String | 2.0.0This is stored as an ASDF.ASDFFile. To change the number of rows shown, pass this object to ASDF.info:
julia> ASDF.info(af; max_rows = 3)
intro.asdf
├─ field_1::Vector{Int64} | shape = (4,)
├─ field_2::Vector{String} | shape = (4,)
⋮ (8) more rowsIt contains a metadata field, which is a new dictionary that merges information about this library (stored under the asdf/library key) with the original user-defined af_payload dictionary. For convenience, af.metadata[<key>] can be accessed directly as af[key]. Since the underlying data is a dictionary, it can be modified in the standard way:
julia> af["field_1"] = [50, 60, 70, 80];
julia> af["field_1"]
4-element Vector{Int64}:
50
60
70
80The convenience syntax can also be used to save the modified ASDF.ASDFFile object directly:
julia> save("intro_modified.asdf", af)View file
julia> read("intro_modified.asdf", String) |> print
#ASDF 1.0.0
#ASDF_STANDARD 1.2.0
# This is an ASDF file <https://asdf-standard.readthedocs.io/>
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
---
!core/asdf-1.1.0
field_1:
- 50
- 60
- 70
- 80
field_2:
- "up"
- "down"
- "left"
- "right"
field_3:
field_3a:
- "apple"
- "orange"
- "pear"
field_3b:
- 1.0
- 2.0
- 3.0
asdf/library: !core/software-1.0.0
name: "ASDF.jl"
author: "Erik Schnetter <schnetter@gmail.com>"
homepage: "https://github.com/JuliaAstro/ASDF.jl"
version: "2.0.0"
...
#ASDF BLOCK INDEX
%YAML 1.1
---
[]
...Array storage
By default, array data is written inline as a literal to the ASDF file. This can be stored and later accessed more efficiently by wrapping your data in an ASDF.NDArrayWrapper. This allows for your data to be stored as a binary via the inline = false keyword (default), which can be further optimized by specifying a supported compression algorithm to use via the compression keyword:
julia> af_payload = OrderedDict("meta" => OrderedDict("my" => OrderedDict("nested" => "metadata")), "data" => ASDF.NDArrayWrapper([1, 2, 3, 4]; compression = ASDF.C_Bzip2));
julia> save("intro_compressed.asdf", af_payload)
julia> af = load("intro_compressed.asdf")
intro_compressed.asdf
├─ meta::String
│ └─ my::String
│ └─ nested::String | metadata
├─ data::ASDF.NDArray | shape = [4]
└─ asdf/library::String
├─ author::String | Erik Schnetter <schnetter@gmail.com>
├─ homepage::String | https://github.com/JuliaAstro/ASDF.jl
├─ name::String | ASDF.jl
└─ version::String | 2.0.0View file
julia> read("intro_compressed.asdf", String) |> print
#ASDF 1.0.0
#ASDF_STANDARD 1.2.0
# This is an ASDF file <https://asdf-standard.readthedocs.io/>
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
---
!core/asdf-1.1.0
meta:
my:
nested: "metadata"
data: !core/ndarray-1.0.0
source: 0
shape:
- 4
datatype: "int64"
byteorder: "little"
asdf/library: !core/software-1.0.0
name: "ASDF.jl"
author: "Erik Schnetter <schnetter@gmail.com>"
homepage: "https://github.com/JuliaAstro/ASDF.jl"
version: "2.0.0"
...
�BLK0 f�0xj�sq���r#ASDF BLOCK INDEX
%YAML 1.1
---
[463,]
...Using NDArrayWrapper allows for the wrapped data to be lazily accessed as a strided view. To access the underlying data, use the [] (dereference) syntax:
julia> af["data"][]
4-element reshape(reinterpret(Int64, ::StridedViews.StridedView{UInt8, 2, Memory{UInt8}, typeof(identity)}), 4) with eltype Int64:
1
2
3
4Tagged objects
Come back soon to see how custom Julia objects can be handled in ASDF.jl.