Curve Fitting
This tutorial will demonstrate fitting data with a straight line (linear regression), an arbitrary non-linear model, and finally a Bayesian model.
Packages
LinearAlgebra: we'll use this built-in Julia standard library to perform a linear regression.
Plots: we'll use this package to visualize the data and our fits.
Optimization: we'll use this package to fit an arbitrary non-linear model to the data.
OptimizationOptimJL: the specific optimizer backend we will use. For your own problems, select the best backend from the Optimization.jl documentation page.
Turing: we'll use this package for Bayesian modelling.
PairPlots: we'll use this for creating a corner plot of the posterior from our Bayesian models.
You can install the necessary packages by running Julia and typing ] to enter Pkg-mode. Then: add Plots Optimization OptimizationOptimJL Turing PairPlots. Alternatively, you can run using Pkg; Pkg.add(["Plots", "Optimization", "OptimizationOptimJL", "Turing", "PairPlots"]).
In your own code, you most likely won't need all of these packages. Pick and choose the ones that best fit your problem.
If you will be using these tools as part of a bigger project, it's strongly recommended to create a Julia Project to record package versions. If you're just experimenting, you can create a temporary project by running ] activate --temp.
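For example, here is a minimal sketch of setting up a dedicated project from the REPL; the project name fitting-tutorial is purely illustrative:
julia> using Pkg
julia> Pkg.activate("fitting-tutorial")  # activate (and, on first add, create) a project in ./fitting-tutorial
julia> Pkg.add(["Plots", "Optimization", "OptimizationOptimJL", "Turing", "PairPlots"])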
If you're using Pluto notebooks, installing and recording package versions in a project are handled for you automatically.
Generating the data
We'll generate synthetic data for this problem. We'll make a weak parabola with some noise. For consistency, we'll seed the Julia random number generator so that we see the same noise each time the tutorial is run.
julia> using Random
julia> Random.seed!(1234)
By calling seed!, the pattern of random numbers generated by rand and randn will be the same each time.
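A quick way to convince yourself of this: re-seeding and drawing again reproduces exactly the same values.
julia> Random.seed!(1234); a = rand(3);
julia> Random.seed!(1234); b = rand(3);
julia> a == b
true
julia> Random.seed!(1234)  # re-seed before continuing so the data generated below matches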
Now we'll generate the data:
julia> x = 0:5:100 # Or equivalently: range(0, 100, step=5)
0:5:100
julia> y = (x ./ 20 .- 0.2).^2 .+ 2 .+ randn(length(x))
21-element Vector{Float64}:
3.010656328855214
1.0432815884648003
⋮
25.653998582068482
26.260043796712125
The randn function generates random values normally distributed around 0 with a standard deviation of 1. rand, on the other hand, generates uniformly distributed random values between 0 and 1.
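If you need other parameters, you can rescale these generators yourself. A small illustrative aside (the names μ, σ, a, and b are hypothetical and not used elsewhere in this tutorial):
julia> μ, σ = 5.0, 2.0
julia> μ .+ σ .* randn(10)       # normally distributed with mean μ and standard deviation σ
julia> a, b = -1.0, 1.0
julia> a .+ (b - a) .* rand(10)  # uniformly distributed between a and b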
Let's plot the data to see what it looks like:
julia> using Plots
julia> scatter(x, y; xlabel="x", ylabel="y", label="data")
Linear regression
Before using any packages, let's perform a linear fit from scratch using some linear algebra.
The least-squares normal equations for a straight-line fit can be written in matrix form as
\[\quad \begin{pmatrix} N & \sum x_i \\ \sum x_i & \sum x_{i}^2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \end{pmatrix}= \begin{pmatrix} \sum y_i \\ \sum y_i x_i \end{pmatrix}\]
where $c_1$ and $c_2$ are the intercept and slope.
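These normal equations follow from minimizing the sum of squared residuals $S(c_1, c_2) = \sum_i (c_1 + c_2 x_i - y_i)^2$; setting both partial derivatives to zero gives
\[\frac{\partial S}{\partial c_1} = 2\sum_i (c_1 + c_2 x_i - y_i) = 0, \qquad \frac{\partial S}{\partial c_2} = 2\sum_i x_i (c_1 + c_2 x_i - y_i) = 0,\]
which rearranges to $N c_1 + c_2 \sum x_i = \sum y_i$ and $c_1 \sum x_i + c_2 \sum x_i^2 = \sum x_i y_i$.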
Multiplying both sides by the inverse of the first matrix gives
\[\quad \begin{pmatrix} c_1 \\ c_2 \\ \end{pmatrix}= \begin{pmatrix} N & \sum x_i \\ \sum x_i & \sum x_{i}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum y_i \\ \sum y_i x_i \end{pmatrix}\]
We can write the matrix and vector on the right-hand side (let's call them A and b) in Julia notation like so:
julia> A = [
length(x) sum(x)
sum(x) sum(x.^2)
]
2×2 Matrix{Int64}:
21 1050
1050 71750
julia> b = [
sum(y)
sum(y .* x)
]
2-element Vector{Float64}:
210.4250937868108
15023.030866331104
We can now perform the linear fit by solving the system of equations with the \ operator:
julia> c = A\b
2-element Vector{Float64}:
-1.67268257376372
0.2338585027008085
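As an aside, the same least-squares solution can be obtained without forming the normal equations explicitly, by building the design matrix and letting \ solve the overdetermined system directly. A sketch (equivalent to c above up to floating-point round-off):
julia> X = [ones(length(x)) x]  # design matrix: a column of ones and the x values
julia> X \ y                    # least-squares solution, matching c above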
Let's make a helper function linfunc that takes an x value, a slope, and an intercept and calculates the corresponding y value:
julia> linfunc(x; slope, intercept) = slope*x + intercept
linfunc (generic function with 1 method)
Finally, we can plot the solution:
julia> yfit = linfunc.(x; slope=c[2], intercept=c[1])
julia> scatter(x, y, xlabel="x", ylabel="y", label="data")
julia> plot!(x, yfit, label="best fit")
The packages LsqFit and GLM (for generalized linear models) contain functions for performing and evaluating these types of linear fits.
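For example, here is a minimal sketch using LsqFit's curve_fit interface (you would need to add LsqFit first; the model signature model(x, p) and the starting guess below are illustrative):
julia> using LsqFit
julia> lsq_model(x, p) = p[1] .+ p[2] .* x   # p[1] = intercept, p[2] = slope
julia> lsq = curve_fit(lsq_model, collect(x), y, [0.0, 0.0])
julia> lsq.param                             # ≈ [intercept, slope], comparable to c above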
(Non-)linear curve fit
The packages above can be used to fit different polynomial models, but if we have a truly arbitrary Julia function we would like to fit to some data we can use the Optimization.jl package. Through its various backends, Optimization.jl supports a very wide range of algorithms for local, global, convex, and non-convex optimization.
The first step is to define our objective function. We'll reuse our simple linfunc linear function from above:
linfunc(x; slope, intercept) = slope*x + intercept
# We must supply an objective function that will be minimized
# The u argument is a vector of parameters from the optimizer.
# data is a vector of static parameters passed through below.
function objective(u, data)
    # Get our fit parameters from u
    slope, intercept = u
    # equivalent to:
    # slope = u[1]
    # intercept = u[2]
    # Get the x and y vectors from data
    x, y = data
    # Calculate the residuals between our model and the data
    residuals = linfunc.(x; slope, intercept) .- y
    # Return the sum of squares of the residuals to minimize
    return sum(residuals.^2)
end
# Define the initial parameter values for slope and intercept
u0 = [1.0, 1.0]
# Pass through the data we want to fit
data = [x,y]
# Create an OptimizationProblem object to hold the function, initial
# values, and data.
using Optimization
prob = OptimizationProblem(objective,u0,data)
# Import the optimization backend we want to use
using OptimizationOptimJL
# Minimize the function. Optimization.jl uses the SciML common solver
# interface. Pass the problem you want to solve (optimization problem
# here) and a solver to use.
# NelderMead() is a derivative-free method for finding a function's
# local minimum.
sol = solve(prob,NelderMead())
# Extract the best-fitting parameters
slope, intercept = sol.u
Note: the NelderMead() algorithm behaves nearly identically to MATLAB's fminsearch.
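A couple of fields of the returned solution object are worth checking before plotting (a quick sketch; these follow the SciML OptimizationSolution interface):
sol.retcode    # termination status reported by the solver
sol.objective  # final value of the objective (the sum of squared residuals here)
sol.u          # the best-fitting parameter vector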
We can now plot the solution:
julia> yfit = linfunc.(x; slope, intercept)
julia> scatter(x, y, xlabel="x", ylabel="y", label="data")
julia> plot!(x, yfit, label="best fit")
We can now test out a quadratic fit using the same package:
function objective(u, data)
    x, y = data
    # Define the equation of a quadratic, e.g.:
    # 3x^2 + 2x + 1
    model = u[1] .* x.^2 .+ u[2] .* x .+ u[3]
    # Calculate the residuals between our model and the data
    residuals = model .- y
    # Return the sum of squares of the residuals to minimize
    return sum(residuals.^2)
end
u0 = [1.0, 1.0, 1.0]
data = [x,y]
prob = OptimizationProblem(objective,u0,data)
using OptimizationOptimJL
sol = solve(prob,NelderMead())
u = sol.u
yfit = u[1] .* x.^2 .+ u[2] .* x .+ u[3]
scatter(x, y, xlabel="x", ylabel="y", label="data")
plot!(x, yfit, label="quadratic fit")
This is already very fast; however, as the scale of your problem grows, there are several routes you can take to improve the optimization performance. First, you can use automatic differentiation and a higher order optimization algorithm:
using ForwardDiff
optf = OptimizationFunction(objective, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf,u0,data)
@time sol = solve(prob,BFGS()) # another good algorithm is Newton()
You can also write an "in-place" version of objective that doesn't allocate new arrays with each iteration.
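For example, here is one possible sketch of a lower-allocation objective that reuses a preallocated residual buffer passed through data (illustrative names; a plain Float64 buffer like this works with derivative-free solvers such as NelderMead, but would need adjusting before combining it with AutoForwardDiff):
# Buffered objective: fills `resid` in place instead of allocating a new array
function objective_buffered(u, data)
    x, y, resid = data                    # resid is a preallocated scratch vector
    slope, intercept = u
    @. resid = slope * x + intercept - y  # broadcast assignment, no new allocation
    return sum(abs2, resid)
end

data = (x, y, similar(y))                 # pass the buffer along with the data
prob = OptimizationProblem(objective_buffered, [1.0, 1.0], data)
sol = solve(prob, NelderMead())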
Bayesian models
Let's shift gears and now create a fully Bayesian model using the Turing.jl package.
Instead of defining an arbitrary Julia function, this package requires us to use a macro called @model.
Let's start with a linear model once more, now with the Turing @model syntax:
using Turing

# Bayesian linear regression.
@model function linear_regression(x, y)
    # Set variance prior.
    σ₂ ~ truncated(Normal(0, 100), 0, Inf)
    # Typed as \sigma <tab> \_2 <tab>
    # Set intercept prior.
    intercept ~ Normal(0, 5)
    # Set the prior on our slope coefficient.
    slope ~ Normal(0, 10)
    # Each point is drawn from a Gaussian (Normal) distribution
    # with mean calculated from our linear model, and standard
    # deviation equal to the square root of the variance variable.
    for i in eachindex(x, y)
        y[i] ~ Normal(x[i] * slope + intercept, sqrt(σ₂))
    end
end
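As an optional aside, before conditioning on the data you can check the priors by themselves using Turing's Prior sampler (not part of the fit below, just a sanity check on the prior choices):
prior_chain = sample(linear_regression(x, y), Prior(), 500)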
We can now draw posterior samples from this model using one of the many available samplers: NUTS, the No U-Turn Sampler.
model = linear_regression(x, y)
chain = sample(model, NUTS(0.65), 500)
┌ Info: Found initial step size
└ ϵ = 0.003125
Sampling 100%|█████████████████████████████████████| Time: 0:00:05
Chains MCMC chain (25000×15×1 Array{Float64, 3}):
Iterations = 1001:1:26000
Number of chains = 1
Samples per chain = 25000
Wall duration = 5.88 seconds
Compute duration = 5.88 seconds
parameters = σ₂, intercept, slope
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std naive_se mcse ess rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
σ₂ 6.7431 2.6279 0.0166 0.0265 10640.9415 1.0000 1810.6077
intercept -1.5979 1.0739 0.0068 0.0105 10239.7534 1.0001 1742.3436
slope 0.2328 0.0186 0.0001 0.0002 10306.9493 1.0001 1753.7773
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
σ₂ 3.3126 4.9457 6.1965 7.9372 13.3608
intercept -3.6910 -2.2965 -1.5992 -0.9049 0.5423
slope 0.1959 0.2206 0.2329 0.2449 0.2690
intercept = chain["intercept"]
slope = chain["slope"]
σ₂ = chain["σ₂"]
plot(x, x .* slope' .+ intercept';
    label="",
    color=:gray,
    alpha=0.05
)
scatter!(x, y, xlabel="x", ylabel="y", label="data", color=1)
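If you'd like a single summary curve on top of the posterior draws, one option (a small illustrative addition) is to overlay the line built from the posterior means:
using Statistics
plot!(x, x .* mean(slope) .+ mean(intercept); color=:black, label="posterior mean")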
Each gray curve is a sample from the posterior distribution of this model. To examine the model parameters and their covariance in greater detail, we can make a corner plot using the PairPlots.jl package. We'll need a few more samples for a nice plot, so re-run the NUTS sampler with more iterations first.
Random.seed!(1234)
chain = sample(model, NUTS(0.65), 25_000)
using PairPlots
table = (;
    intercept = chain["intercept"],
    slope = chain["slope"],
    σ = sqrt.(chain["σ₂"])
)
PairPlots.corner(table)
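You can also pull numerical summaries straight from the chain. For example (an illustrative aside), a median and central 68% credible interval for the slope:
using Statistics  # if not already loaded above
slope_samples = vec(Array(chain["slope"]))
quantile(slope_samples, [0.16, 0.5, 0.84])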
Let's now repeat this procedure with a Bayesian quadratic model.
@model function quad_regression(x, y)
    # Prior on the variance of the data around the best-fit curve
    σ₂ ~ truncated(Normal(0, 10), 0, Inf)
    # Priors on the three quadratic parameters
    u1 ~ Normal(0, 0.01)
    u2 ~ Normal(0, 0.1)
    u3 ~ Normal(0, 5)
    for i in eachindex(x, y)
        model = u1 * x[i]^2 + u2*x[i] + u3
        y[i] ~ Normal(model, sqrt(σ₂))
    end
end
As before, we draw posterior samples from this model using the NUTS sampler.
model = quad_regression(x, y)
chain = sample(model, NUTS(0.65), 500)
┌ Info: Found initial step size
└ ϵ = 0.0001953125
Sampling 100%|█████████████████████████████████████| Time: 0:00:05
Chains MCMC chain (500×16×1 Array{Float64, 3}):
Iterations = 251:1:750
Number of chains = 1
Samples per chain = 500
Wall duration = 5.89 seconds
Compute duration = 5.89 seconds
parameters = σ₂, u1, u2, u3
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std naive_se mcse ess rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
σ₂ 1.5698 0.6322 0.0283 0.0518 117.5553 0.9994 19.9517
u1 0.0024 0.0003 0.0000 0.0000 134.9184 0.9997 22.8986
u2 -0.0059 0.0283 0.0013 0.0024 107.3698 0.9995 18.2230
u3 2.1371 0.6109 0.0273 0.0562 87.2121 0.9995 14.8018
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
σ₂ 0.8757 1.1468 1.3945 1.8181 3.3834
u1 0.0018 0.0022 0.0024 0.0026 0.0030
u2 -0.0612 -0.0237 -0.0045 0.0133 0.0438
u3 0.9635 1.7155 2.1211 2.5172 3.3960
u1 = chain["u1"]
u2 = chain["u2"]
u3 = chain["u3"]
posterior = u1' .* x.^2 .+ u2' .* x .+ u3'
plot(x, posterior;
    label="",
    color=:gray,
    alpha=0.1
)
scatter!(x, y, xlabel="x", ylabel="y", label="data", color=1)
Random.seed!(1)
chain = sample(model, NUTS(0.65), 25_000)
using PairPlots
table = (;
    u_1 = chain["u1"],
    u_2 = chain["u2"],
    u_3 = chain["u3"],
    σ = sqrt.(chain["σ₂"])
)
PairPlots.corner(table)