Parallel Computing with OhMyThreads.jl

SolarPosition.jl provides a parallel computing extension using OhMyThreads.jl for efficient multithreaded solar position calculations across large time series. This extension is particularly useful when processing thousands of timestamps, where parallelization can provide significant speedups.

Installation

The OhMyThreads extension is loaded automatically when both SolarPosition.jl and OhMyThreads.jl are loaded:

using SolarPosition
using OhMyThreads

Thread Configuration

Julia must be started with multiple threads to benefit from parallelization. Use julia --threads=auto or set the JULIA_NUM_THREADS environment variable. Check the number of available threads with Threads.nthreads().
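
For example, you can verify from within Julia that more than one thread is available before running the examples below:

# This session's thread count; parallel speedups require it to be > 1
println("Julia is running with $(Threads.nthreads()) thread(s)")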

Quick Start

The extension adds new methods to solar_position and solar_position! that accept an OhMyThreads.Scheduler as the last argument. These methods automatically parallelize computations across the provided timestamp vector.

using SolarPosition
using OhMyThreads
using Dates
using StructArrays

# Create observer location
obs = Observer(51.5, -0.18, 15.0)  # London

# Generate a year of minute timestamps
times = collect(DateTime(2024, 1, 1):Minute(1):DateTime(2025, 1, 1))

# Parallel computation with DynamicScheduler
t0 = time()
positions = solar_position(obs, times, PSA(), NoRefraction(), DynamicScheduler())
dt_parallel = time() - t0
println("Time taken (parallel): $(round(dt_parallel, digits=5)) seconds")
Time taken (parallel): 0.77307 seconds

Now we compare this to the serial version:

# Serial computation (no scheduler argument)
t0 = time()
positions_serial = solar_position(obs, times, PSA(), NoRefraction())
dt_serial = time() - t0
println("Time taken (serial): $(round(dt_serial, digits=5)) seconds")
Time taken (serial): 0.13224 seconds

We now compute the ratio of serial to parallel time. Note that a single timed run like this includes compilation time and depends on how many threads the session has, so the measured ratio can fall below 1× (as it does in this documentation build); the benchmarks further down give representative multi-threaded timings:

speedup = dt_serial / dt_parallel
println("Speedup: $(round(speedup, digits=2))×")
Speedup: 0.17×

Simplified Syntax

You can also use the simplified syntax with the scheduler as the third argument, which uses the default algorithm (PSA) and no refraction correction:

# Simplified syntax with default algorithm
positions = solar_position(obs, times, DynamicScheduler())
@show first(positions, 3)
3-element StructArray(::Vector{Float64}, ::Vector{Float64}, ::Vector{Float64}) with eltype SolPos{Float64}:
 SolPos(azimuth=358.1624783703248°, elevation=-61.55014545423738°, zenith=151.5501454542374°)
 SolPos(azimuth=358.64510532358224°, elevation=-61.554425651495485°, zenith=151.55442565149548°)
 SolPos(azimuth=359.12783221156707°, elevation=-61.55739562548632°, zenith=151.55739562548632°)

Available Schedulers

OhMyThreads.jl provides different scheduling strategies optimized for various workload characteristics:

DynamicScheduler

The DynamicScheduler is the default and recommended scheduler for most workloads. It dynamically balances tasks among threads, making it suitable for non-uniform workloads where computation times may vary. Please visit the OhMyThreads.jl documentation for more details.

# Dynamic scheduling (recommended)
positions = solar_position(obs, times, PSA(), NoRefraction(), DynamicScheduler());

StaticScheduler

The StaticScheduler partitions work statically among threads. This can be more efficient for uniform workloads where all computations take approximately the same time.

# Static scheduling for uniform workloads
positions = solar_position(obs, times, PSA(), NoRefraction(), StaticScheduler())

In-Place Computation

For maximum performance and minimal allocations, use the in-place version solar_position! with a pre-allocated StructVector:

using StructArrays

# Pre-allocate output array
positions = StructVector{SolPos{Float64}}(undef, length(times))

# Compute in-place
solar_position!(positions, obs, times, PSA(), NoRefraction(), DynamicScheduler())

The in-place version avoids allocating the output array and minimizes intermediate allocations, making it ideal for repeated computations or memory-constrained environments.
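
As a sketch of that reuse pattern (assuming the solar_position! signature shown above), the same pre-allocated buffer can be filled repeatedly as long as its length matches each input batch:

# Pre-allocate once, then reuse the buffer for several equal-length batches
day = collect(DateTime(2024, 6, 1):Minute(1):DateTime(2024, 6, 2))
buf = StructVector{SolPos{Float64}}(undef, length(day))

for offset in 0:6
    # Shift the batch by whole days; the buffer is overwritten on every call
    solar_position!(buf, obs, day .+ Day(offset), PSA(), NoRefraction(), DynamicScheduler())
    # ... consume `buf` here before the next iteration overwrites it
end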

Performance Comparison

Here's a typical performance comparison between serial and parallel execution:

using BenchmarkTools

### Serial execution (no scheduler argument)
@benchmark solar_position($obs, $times, PSA(), NoRefraction())
# BenchmarkTools.Trial: 57 samples with 1 evaluation per sample.
#  Range (min … max):  83.994 ms … 98.110 ms  ┊ GC (min … max): 0.00% … 12.50%
#  Time  (median):     87.907 ms              ┊ GC (median):    0.66%
#  Time  (mean ± σ):   88.194 ms ±  2.478 ms  ┊ GC (mean ± σ):  1.39% ±  2.23%

#                ▁             █
#   ▆▁▄▁▁▁▇▆▆▁▁▇▄█▄▁▇▆▇▇▄▄▁▄▄▆▆█▆▆▄▁▆▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
#   84 ms           Histogram: frequency by time        95.8 ms <

#  Memory estimate: 12.06 MiB, allocs estimate: 9.

### Parallel execution with DynamicScheduler
@benchmark solar_position($obs, $times, PSA(), NoRefraction(), DynamicScheduler())
# BenchmarkTools.Trial: 312 samples with 1 evaluation per sample.
#  Range (min … max):   7.588 ms … 35.575 ms  ┊ GC (min … max):  0.00% … 74.79%
#  Time  (median):     14.718 ms              ┊ GC (median):     6.16%
#  Time  (mean ± σ):   16.026 ms ±  6.387 ms  ┊ GC (mean ± σ):  23.51% ± 19.37%

#     ▆▆█▁▃▂▅ ▁▁▁▄▁▃▅▁  ▁▄ ▁
#   █▇███████▄████████▇▄██▆█▇▆▇█▅▆▅▆▄▄▁▅▄▁▁▄▃▄▅▄▃▃▄▃▄▆▃▃▁▄▄▁▃▁▃ ▄
#   7.59 ms         Histogram: frequency by time        34.4 ms <

#  Memory estimate: 66.59 MiB, allocs estimate: 468.

### In-place parallel execution
pos = StructVector{SolPos{Float64}}(undef, length(times))
@benchmark solar_position!($pos, $obs, $times, PSA(), NoRefraction(), DynamicScheduler())
# BenchmarkTools.Trial: 908 samples with 1 evaluation per sample.
#  Range (min … max):  4.061 ms …   7.846 ms  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     5.532 ms               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   5.501 ms ± 644.881 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

#                                ▃▅█▄▂▂       ▁
#   ▃▄▅▃▄▃▃▅▃▄▃▁▃▂▃▄▂▃▂▂▃▂▂▃▂▅▃▃████████▆▅▅▄▆▅██▅▅▆▄▅▄▃▂▃▃▃▃▃▂▂ ▃
#   4.06 ms         Histogram: frequency by time        6.72 ms <

#  Memory estimate: 20.47 KiB, allocs estimate: 284.

### In-place parallel execution with StaticScheduler
@benchmark solar_position!($pos, $obs, $times, PSA(), NoRefraction(), StaticScheduler())
# BenchmarkTools.Trial: 902 samples with 1 evaluation per sample.
#  Range (min … max):  4.027 ms …   7.228 ms  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     5.842 ms               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   5.537 ms ± 802.636 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

#  ▃▁   ▁▄▃                       ▁      ▁▅▆█▂▄▇▇▅▁▃    ▁
#  ██▃▃▅███▅▆▆▄▄▃▃▃▃▃▂▄▅▂▁▃▆▃▃▂▅▃▅█▇▆▂▃▅▅██████████████▇█▆▇▇▂▂ ▄
#  4.03 ms         Histogram: frequency by time        6.72 ms <

#  Memory estimate: 15.97 KiB, allocs estimate: 220.

On a system with 32 threads processing 527,041 timestamps (one year, minutely):

Method                         Time      Speedup   Allocations
Serial                         87.9 ms   1.0×      12.06 MiB
Parallel (DynamicScheduler)    14.7 ms   6.0×      66.59 MiB
In-place (DynamicScheduler)    5.53 ms   15.9×     20.47 KiB
In-place (StaticScheduler)     5.84 ms   15.0×     15.97 KiB

Performance Tips

For the best performance:

  • Use solar_position! with pre-allocated output for minimal allocations
  • Use DynamicScheduler() for most workloads
  • Ensure Julia is running with multiple threads (e.g., --threads=auto)
  • Process larger batches of timestamps to amortize threading overhead (see the quick check after this list)
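
As a quick check of the batch-size point, for very small inputs the serial method may win, since spawning tasks has a fixed overhead; the actual numbers depend on your machine and thread count:

# 61 timestamps: threading overhead can outweigh the work itself
small_times = collect(DateTime(2024, 1, 1):Minute(1):DateTime(2024, 1, 1, 1, 0))
@time solar_position(obs, small_times, PSA(), NoRefraction())                      # serial
@time solar_position(obs, small_times, PSA(), NoRefraction(), DynamicScheduler())  # parallel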

Working with Different Time Types

The parallel methods work with both DateTime and ZonedDateTime:

using TimeZones

# Using ZonedDateTime (avoiding DST transitions)
tz = tz"Europe/London"
# Use a subset of times to avoid DST transition issues in documentation
summer_times = collect(DateTime(2024, 6, 1):Hour(1):DateTime(2024, 7, 1))
zoned_times = ZonedDateTime.(summer_times, tz)

# Parallel computation with time zone aware timestamps
zoned_positions = solar_position(obs, zoned_times, PSA(), NoRefraction(), DynamicScheduler())

println("Computed $(length(zoned_positions)) positions with time zone awareness")
Computed 721 positions with time zone awareness

Algorithm Comparison

The parallel interface works with all solar position algorithms:

# Test different algorithms in parallel
algorithms = [PSA(), NOAA(), SPA()]

for alg in algorithms
    pos = solar_position(obs, times[1:100], alg, NoRefraction(), DynamicScheduler())
    println("$(typeof(alg).name.name): azimuth=$(round(pos.azimuth[50], digits=5))°")
end
PSA: azimuth=21.29414°
NOAA: azimuth=21.29068°
SPA: azimuth=21.2971°

Refraction Correction

Atmospheric refraction corrections can be applied in parallel computations:

# Parallel computation with Bennett refraction correction
positions_refracted = solar_position(
    obs,
    times,
    PSA(),
    BENNETT(),
    DynamicScheduler()
)

println("First position with refraction:")
println("  Apparent elevation: $(round(positions_refracted.apparent_elevation[1], digits=2))°")
First position with refraction:
  Apparent elevation: -61.56°

Implementation Details

The extension uses OhMyThreads' tmap and tmap! for task-based parallelism. Each timestamp is processed independently, making the computation embarrassingly parallel with no inter-thread communication required.

The results from tmap are automatically converted to a StructVector for efficient columnar storage compatible with the rest of SolarPosition.jl's API.
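
A conceptual sketch of the wrapping described above (not the extension's actual source; it assumes a single-timestamp method of solar_position and that the scheduler is forwarded to tmap via its scheduler keyword):

using SolarPosition
using OhMyThreads
using StructArrays

# Hypothetical illustration: map over timestamps with the given scheduler,
# then convert the resulting vector of SolPos structs to columnar StructVector storage.
function positions_sketch(obs, times, alg, refraction, scheduler)
    results = tmap(t -> solar_position(obs, t, alg, refraction), times; scheduler = scheduler)
    return StructVector(results)
end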

See Also