Uncertainty Quantification Tutorial

This tutorial demonstrates how to perform Bayesian uncertainty quantification in piblin-jax for rheological and scientific data analysis. You’ll learn how to:

  • Fit Bayesian models to experimental data

  • Obtain parameter estimates with credible intervals

  • Generate predictions with uncertainty bands

  • Propagate uncertainty through transforms

  • Interpret posterior distributions

Why Uncertainty Quantification?

Traditional least-squares fitting gives point estimates of parameters but doesn’t quantify uncertainty in those estimates. Bayesian methods provide:

Full posterior distributions

Not just a single value, but the entire probability distribution over parameter values given your data.

Credible intervals

Rigorous uncertainty ranges for parameters (e.g., “95% probability that K is between 4.5 and 5.5”).

Prediction intervals

Uncertainty bands for model predictions that account for both parameter uncertainty and measurement noise.

Principled model comparison

Bayesian evidence for comparing competing models.

Basic Example: Power-Law Model

Let’s start with a simple shear-thinning fluid following the power-law model:

\[\eta = K \dot{\gamma}^{n-1}\]

where \(\eta\) is viscosity, \(\dot{\gamma}\) is shear rate, \(K\) is consistency index, and \(n\) is flow behavior index.

Generate Synthetic Data

First, create some synthetic data with noise:

import numpy as np
from piblin_jax.bayesian.models import PowerLawModel

# True parameters
K_true = 5.0
n_true = 0.6

# Generate data
np.random.seed(42)
shear_rate = np.logspace(-1, 2, 20)
viscosity_true = K_true * shear_rate ** (n_true - 1)
viscosity = viscosity_true + np.random.normal(0, 0.5, size=len(shear_rate))

Fit Bayesian Model

Now fit the Bayesian power-law model:

# Create and fit model
model = PowerLawModel(n_samples=2000, n_warmup=1000)
model.fit(shear_rate, viscosity)

# Check if sampling succeeded
if model.is_fitted:
    print("Model fitting successful!")
else:
    print("Warning: Check sampling diagnostics")

The model uses MCMC (Markov Chain Monte Carlo) to sample from the posterior distribution. n_samples=2000 specifies the number of posterior samples, and n_warmup=1000 is the burn-in period.

Examine Results

View a summary of the posterior distribution:

summary = model.summary()
print(summary)

Output:

Parameter Posterior Summary:
----------------------------
K: mean=5.02, std=0.15, 95% CI=[4.73, 5.31]
n: mean=0.598, std=0.012, 95% CI=[0.575, 0.621]
sigma: mean=0.51, std=0.09, 95% CI=[0.38, 0.71]

The summary shows:

  • mean: Posterior mean (point estimate)

  • std: Posterior standard deviation (uncertainty)

  • 95% CI: 95% credible interval

Extract Specific Parameters

Access posterior samples directly:

# Get posterior samples
samples = model.samples
K_samples = samples['K']
n_samples = samples['n']

# Calculate custom statistics
K_median = np.median(K_samples)
K_90_interval = np.percentile(K_samples, [5, 95])

print(f"K median: {K_median:.3f}")
print(f"K 90% interval: [{K_90_interval[0]:.3f}, {K_90_interval[1]:.3f}]")

Visualize Results

Generate diagnostic plots:

import matplotlib.pyplot as plt

# Plot fit with uncertainty
fig, axes = model.plot_fit(
    shear_rate,
    viscosity,
    show_uncertainty=True,
    uncertainty_level=0.95
)
plt.savefig('power_law_fit.png', dpi=300)
plt.show()

The plot shows:

  • Data points (observed viscosity)

  • Mean prediction (posterior mean fit)

  • 95% credible interval (shaded band)

Make Predictions

Predict at new shear rates with uncertainty:

# New prediction points
new_shear_rate = np.logspace(-2, 3, 100)

# Predict with uncertainty
predictions = model.predict(new_shear_rate, return_uncertainty=True)
mean_pred = predictions['mean']
lower_pred = predictions['lower']
upper_pred = predictions['upper']

# Plot predictions
plt.figure(figsize=(10, 6))
plt.plot(new_shear_rate, mean_pred, 'r-', label='Mean prediction')
plt.fill_between(new_shear_rate, lower_pred, upper_pred,
                 alpha=0.3, label='95% prediction interval')
plt.scatter(shear_rate, viscosity, c='k', label='Data')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('Shear Rate (1/s)')
plt.ylabel('Viscosity (Pa·s)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Working with Datasets

Quantiq integrates Bayesian uncertainty with the Dataset API:

from piblin_jax.data.datasets import OneDimensionalDataset

# Create dataset
dataset = OneDimensionalDataset(
    independent_variable_data=shear_rate,
    dependent_variable_data=viscosity,
    conditions={'temperature': 25.0, 'sample': 'A1'}
)

# Fit model to dataset
model = PowerLawModel(n_samples=2000)
model.fit(dataset.independent_variable_data,
          dataset.dependent_variable_data)

# Add uncertainty to dataset
dataset_with_unc = dataset.with_uncertainty(
    model=model,
    n_samples=1000,
    keep_samples=True
)

# Check uncertainty status
print(f"Has uncertainty: {dataset_with_unc.has_uncertainty}")

# Get credible intervals
lower, upper = dataset_with_unc.get_credible_intervals(level=0.95)

Propagating Uncertainty Through Transforms

Uncertainty can be propagated through data transformations:

from piblin_jax.transform.dataset import GaussianSmoothing

# Create dataset with uncertainty
dataset_with_unc = dataset.with_uncertainty(
    model=model,
    n_samples=1000,
    keep_samples=True
)

# Apply transform with uncertainty propagation
smoother = GaussianSmoothing(sigma=2.0)
smoothed = smoother.apply_to(
    dataset_with_unc,
    propagate_uncertainty=True
)

# Uncertainty is now propagated through the transform
print(f"Smoothed has uncertainty: {smoothed.has_uncertainty}")

Advanced Example: Arrhenius Model

Temperature-dependent viscosity following Arrhenius equation:

\[\eta = A \exp\left(\frac{E_a}{RT}\right)\]

where \(A\) is pre-exponential factor, \(E_a\) is activation energy, \(R\) is gas constant, and \(T\) is temperature.

Generate Temperature-Dependent Data

from piblin_jax.bayesian.models import ArrheniusModel

# True parameters
A_true = 1e-3  # Pa·s
Ea_true = 50000  # J/mol
R = 8.314  # J/(mol·K)

# Temperature range (K)
temperature = np.linspace(273, 373, 15)

# Generate data with noise
np.random.seed(42)
viscosity_true = A_true * np.exp(Ea_true / (R * temperature))
viscosity = viscosity_true * np.random.lognormal(0, 0.1, size=len(temperature))

Fit Arrhenius Model

# Fit model
model = ArrheniusModel(n_samples=2000)
model.fit(temperature, viscosity)

# View results
print(model.summary())

# Plot fit
fig, axes = model.plot_fit(temperature, viscosity, show_uncertainty=True)
plt.xlabel('Temperature (K)')
plt.ylabel('Viscosity (Pa·s)')
plt.show()

Interpreting Results

The posterior samples reveal parameter correlations:

import matplotlib.pyplot as plt

samples = model.samples
A_samples = samples['A']
Ea_samples = samples['Ea']

# Plot joint distribution
plt.figure(figsize=(8, 6))
plt.scatter(A_samples, Ea_samples / 1000, alpha=0.3, s=1)
plt.xlabel('A (Pa·s)')
plt.ylabel('Ea (kJ/mol)')
plt.title('Joint Posterior Distribution')
plt.grid(True, alpha=0.3)
plt.show()

This scatter plot reveals correlation between \(A\) and \(E_a\): if \(A\) is higher, \(E_a\) tends to be higher too.

Model Comparison

Compare different rheological models using Bayesian evidence:

from piblin_jax.bayesian.models import PowerLawModel, CrossModel

# Fit competing models
power_law = PowerLawModel(n_samples=2000)
power_law.fit(shear_rate, viscosity)

cross = CrossModel(n_samples=2000)
cross.fit(shear_rate, viscosity)

# Compare using information criteria
power_law_aic = power_law.aic()
cross_aic = cross.aic()

print(f"Power-law AIC: {power_law_aic:.1f}")
print(f"Cross AIC: {cross_aic:.1f}")

if cross_aic < power_law_aic:
    print("Cross model is preferred (lower AIC)")
else:
    print("Power-law model is preferred (lower AIC)")

Lower AIC indicates better model fit penalized for complexity.

Tips and Best Practices

Number of samples

Use at least 1000-2000 samples for reliable uncertainty estimates. More samples give smoother distributions but take longer.

Convergence diagnostics

Always check model.is_fitted and examine trace plots to ensure MCMC chains have converged:

if not model.is_fitted:
    print("Warning: Sampling may not have converged")
    # Increase n_samples or n_warmup
Prior sensitivity

Bayesian results depend on priors. piblin-jax uses weakly informative priors by default. For custom priors, see the API documentation.

Computational cost

Bayesian fitting is ~10-100x slower than NLSQ. Use NLSQ first for initial exploration, then Bayesian for final analysis with uncertainty.

Uncertainty vs confidence

Credible intervals (Bayesian) have a different interpretation than confidence intervals (frequentist). A 95% credible interval means “95% probability the parameter is in this range given the data.”

Next Steps

References

  • Gelman, A., et al. (2013). Bayesian Data Analysis, 3rd Edition. Chapman and Hall/CRC.

  • McElreath, R. (2020). Statistical Rethinking, 2nd Edition. CRC Press.