Working with tabular data

Tables are a common way to represent various forms of catalogs. One common format for storing this data in astronomy is as a FITS file.

In these examples, we will fetch and load the Hipparcos-GAIA Catalog of Accelerations [(HGCA, Brandt et al 2021])](https://iopscience.iop.org/article/10.3847/1538-4365/abf93c). This catalog cross matches stars from the Hipparcos and GAIA catalogs in order to calculate the long term astrometric proper motion anomaly; that is, the star's deviation from straight line motion in the plane of the sky over the ~20 baseline between the two missions.

A wide range of tabular data formats are supported in Julia under a common Tables.jl interface. For example, CSV, Excel, Arrow, CASA Tables, and various SQL formats to name a few.

Packages

  • FITSIO: we'll use this package to load a table stored in a FITS file
  • DataFrames: we'll use this package to examine and manipulate the table
  • Plots: we'll use this package to visualize the contents of the table
  • AstroLib: general utility package. We'll use a helper function to compute a map projection.

You can install the necessary packages by running Julia, and typing ] to enter Pkg-mode. Then: add FITSIO DataFrames Plots AstroLib. Alternatively, you can run using Pkg; Pkg.add(["FITSIO", "DataFrames", "Plots", "AstroLib"]).

If you will be using these tools as part of a larger project, it's strongly recommended to create a Julia Project to record package versions. If you're just experimenting, you can create a temporary project by running ] activate --temp.

If you're using Pluto notebooks, installing and recording package versions in a project are handled for you automatically.

Downloading the data

The table in question is hosted alongside the article. Go to Table 4 and click the link at the bottom to download it in FITS format. You'll need to uncompress the archive to see the HGCA_vEDR3.fits file.

FITS tables can be loaded using the FITSIO package or the AstroImages package which wraps it.

Loading the table

julia> using FITSIO
julia> fits = FITS("HGCA_vEDR3.fits")

File: HGCA_vEDR3.fits
Mode: "r" (read-only)
HDUs: Num  Name  Type   
      1          Image  
      2          Table
julia> table_fits = fits[2]
File: HGCA_vEDR3.fits
HDU: 2
Type: Table
Rows: 115346
Columns: Name                    Size  Type     TFORM  
         hip_id                        Int32    J
         gaia_source_id                Int64    K
         gaia_ra                       Float64  D
         gaia_dec                      Float64  D
         radial_velocity               Float32  E
...
         chisq                         Float32  E

If we choose to use the AstroImages package, this code could be substituted for:

julia> using AstroImages
julia> table_fits = load("HGCA_vEDR3.fits", 2);

Since this table conforms to the Tables.jl interface we can already pass it to a wide range of analysis and plotting tools; however, for interactive work it's useful to wrap this data in a DataFrame from DataFrames.jl.

julia> using DataFrames
julia> df = DataFrame(table_fits)
115346×35 DataFrame
    Row │ hip_id  gaia_source_id       gaia_ra        gaia_dec ⋯
        │ Int32   Int64                Float64        Float64  ⋯
────────┼───────────────────────────────────────────────────────
      1 │      1  2738327528519591936    0.000871957    1.0889 ⋯
      2 │      2  2341871673090078592    0.00511158   -19.4988  
      3 │      3  2881742980523997824    0.00506023    38.8593  
      4 │      4  4973386040722654336    0.00907157   -51.8935  
      5 │      5  2305974989264598272    0.00997423   -40.5912 ⋯
   ⋮    │   ⋮              ⋮                 ⋮            ⋮    ⋱
 115343 │ 120401  5290738562888564736  119.382        -60.6309  
 115344 │ 120402  5290832364972775808  119.449        -60.6097  
 115345 │ 120403  5290725643625189504  119.455        -60.6836  
 115346 │ 120404  5290820682661822848  119.512        -60.6147 ⋯
                              32 columns and 115337 rows omitted

Examining the table

As a first step, let's summarize the contents of the data frame using describe:

julia> describe(df)
35×7 DataFrame
 Row │ variable                mean         min            median       max                  nmissing  eltype   
     │ Symbol                  Union…       Any            Union…       Any                  Int64     DataType 
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ hip_id                  59162.8      1              59133.5      120404                      0  Int32
   2 │ gaia_source_id          3.5587e18    7632157690368  3.58418e18   6917489002841762304         0  Int64
   3 │ gaia_ra                 181.445      0.000871957    181.843      359.982                     0  Float64
   4 │ gaia_dec                -2.13495     -89.7824       -1.96568     89.5695                     0  Float64
   5 │ radial_velocity         NaN          NaN                         NaN                         0  Float32
   6 │ radial_velocity_error   NaN          NaN                         NaN                         0  Float32
   7 │ radial_velocity_source               Gaia_DR2                    None                        0  String
   8 │ parallax_gaia           6.98091      0.1            4.29612      768.067                     0  Float32
   9 │ parallax_gaia_error     0.0450278    0.00802848     0.023354     1.52339                     0  Float32
  10 │ pmra_gaia               -1.43563     -4406.47       -1.73408     6766.0                      0  Float32
  11 │ pmdec_gaia              -16.3946     -5817.8        -5.53577     10362.4                     0  Float32
  12 │ pmra_gaia_error         0.062217     0.00743185     0.0326008    2.03418                     0  Float32
  13 │ pmdec_gaia_error        0.0574897    0.00869586     0.029098     1.99573                     0  Float32
  14 │ pmra_pmdec_gaia         -0.0142746   -0.971819      -0.0116653   0.891564                    0  Float32
  15 │ pmra_hg                 -1.43057     -4406.68       -1.69382     6765.91                     0  Float32
  ⋮  │           ⋮                  ⋮             ⋮             ⋮                ⋮              ⋮         ⋮
  22 │ pmra_hip_error          1.22346      0.290212       0.93546      2269.23                     0  Float32
  23 │ pmdec_hip_error         1.00177      0.289527       0.794942     113.61                      0  Float32
  24 │ pmra_pmdec_hip          0.00441718   -0.94068       0.000400515  0.986897                    0  Float32
  25 │ epoch_ra_gaia           2016.07      2015.11        2016.07      2017.05                     0  Float64
  26 │ epoch_dec_gaia          2016.09      2014.95        2016.1       2017.2                      0  Float64
  27 │ epoch_ra_hip            1991.25      1990.4         1991.25      1992.43                     0  Float64
  28 │ epoch_dec_hip           1991.28      1990.34        1991.28      1992.41                     0  Float64
  29 │ crosscal_pmra_hip       -0.0578266   -1.36874       -0.0648208   1.50959                     0  Float32
  30 │ crosscal_pmdec_hip      0.00226569   -1.15742       0.00306844   1.59232                     0  Float32
  31 │ crosscal_pmra_hg        -0.0013262   -0.0511373     -0.00269026  0.0619938                   0  Float32
  32 │ crosscal_pmdec_hg       0.000217839  -0.0598967     0.00025581   0.0564424                   0  Float32
  33 │ nonlinear_dpmra         -9.0424e-5   -7.98001       3.44128e-8   2.64822                     0  Float32
  34 │ nonlinear_dpmdec        0.000311498  -4.1194        1.92019e-7   16.0394                     0  Float32
  35 │ chisq                   566.555      3.11559e-5     3.35103      3.67633e5                   0  Float32
                                                                                                  6 rows omitted

This lists all the columns of the table along with their min, max, median, and means. It also specifies how many entries are missing and the element type of the column.

We can access a specific column from the table using two different syntaxes: df[:,"epoch_ra_gaia"], or simply df.epoch_ra_gaia.

Filtering

Let's apply a cut to the parallax column to only include nearby stars:

julia> nearby = filter(:parallax_gaia => >(50.0), df)
799×35 DataFrame
 Row │ hip_id  gaia_source_id       gaia_ra    gaia_dec   radial_velocity  radial_vel ⋯
     │ Int32   Int64                Float64    Float64    Float32          Float32    ⋯
─────┼─────────────────────────────────────────────────────────────────────────────────
   1 │    428   386655019234959872    1.30108   45.7859          -1.24273             ⋯
   2 │    436  4706630501049679744    1.32232  -67.8351          40.2364
   3 │    439  2306965202564744064    1.38379  -37.3675          25.2944
   4 │    473   386653851004022144    1.42676   45.8114           1.15092
  ⋮  │   ⋮              ⋮               ⋮          ⋮             ⋮                    ⋱
 796 │ 117779  2867175035571212416  358.285     29.0182           1.30031             ⋯
 797 │ 117828  6377828354964753792  358.463    -75.6342          -9.62612
 798 │ 117966  2442996678074668288  358.914     -6.14423         17.0672
 799 │ 120005  1022456104850892928  138.591     52.6834          11.9794
                                                        30 columns and 791 rows omitted

Let's break this down. First, we specify the column name as :parallax_gaia. The : syntax defines a Symbol in Julia which is a bit like a string and a variable name. Next, we say what filter we want to apply to this column by passing a key-value Pair constructed with =>. This syntax, e.g. 1 => 2 just groups two values and is unrelated to keyword arguments. Then, we pass a predicate function, that is a function that takes one value and returns true or false. The expression >(50.0) produces such a function that takes a value and compares it with 50.0 milliarseconds of parallax. Finally, we pass the table we want to filter.

This useful cheatsheet by Tom Kwong is a great reference for these sort of operations.

Plotting

Let's now visualize these stars as they appear in the plane of the sky. We'll colour them based on the significance of the anomalous acceleration they had between the two satellite missions. This acceleration could be caused by a hidden companion star or planet.

julia> using Plots
julia> scatter(
    nearby.gaia_ra,
    nearby.gaia_dec;
    marker_z = log10.(nearby.chisq),
    colorbartitle="log10 χ²", # typed as \chi <tab> \^2 <tab>
    label = "",
    xlabel = "right ascension (°)", # typed as \degree <tab>
    ylabel = "declination (°)" 
)

Let's improve this plot by using a different map projection. We can make this conversion using AstroLib.jl.

The function aitoff takes longitude and latitude (or in this case, right-ascension and delcination) and returns a new position using an Aitoff projection.

julia> using AstroLib
julia> newpoints = aitoff.(nearby.gaia_ra, nearby.gaia_dec)

newpoints is returned as a vector of Tuples of x and y coordinates, but to plot them we'll need separate flat vectors of x and y values. We can convert using getindex:

julia> newx = getindex.(newpoints, 1)
julia> newy = getindex.(newpoints, 2)
799-element Vector{Float64}:
  49.51360693576993
 -71.02232113037604
 -40.77386766281466
   ⋮
 -78.04105630927837
  -6.821326623255115
  64.9538084311332

getindex(obj, n) is equivalent to writing obj[n]. Here we use broadcasting to fetch the first and then second element of each point in the list.

Finally, we'll make the plot and tweak some formatting options:

julia> scatter(
    newx,
    newy;
    marker_z = log10.(nearby.chisq),
    color = :turbo,
    colorbartitle="log10 χ²", # typed as \chi <tab> \^2 <tab>
    label = "",
    xlabel = "right ascension (°)", # typed as \degree <tab>
    ylabel = "declination (°)",
    background=:transparent,
    foreground=:gray,
    framestyle=:box,
    markerstrokewidth=0,
    grid=:none
)

Plot of nearby stars with significant acceleration

We can save the plot using savefig("myplot.pdf"). png, svg, and other formats are also supported.

For more on plotting in general, see the Plots.jl documentation.