Uncertainty Quantification Tutorial
This tutorial demonstrates how to perform Bayesian uncertainty quantification in piblin-jax for rheological and scientific data analysis. You’ll learn how to:
Fit Bayesian models to experimental data
Obtain parameter estimates with credible intervals
Generate predictions with uncertainty bands
Propagate uncertainty through transforms
Interpret posterior distributions
Why Uncertainty Quantification?
Traditional least-squares fitting gives point estimates of parameters but doesn’t quantify uncertainty in those estimates. Bayesian methods provide:
- Full posterior distributions
Not just a single value, but the entire probability distribution over parameter values given your data.
- Credible intervals
Rigorous uncertainty ranges for parameters (e.g., “95% probability that K is between 4.5 and 5.5”).
- Prediction intervals
Uncertainty bands for model predictions that account for both parameter uncertainty and measurement noise.
- Principled model comparison
Bayesian evidence for comparing competing models.
Basic Example: Power-Law Model
Let’s start with a simple shear-thinning fluid following the power-law model:
where \(\eta\) is viscosity, \(\dot{\gamma}\) is shear rate, \(K\) is consistency index, and \(n\) is flow behavior index.
Generate Synthetic Data
First, create some synthetic data with noise:
import numpy as np
from piblin_jax.bayesian.models import PowerLawModel
# True parameters
K_true = 5.0
n_true = 0.6
# Generate data
np.random.seed(42)
shear_rate = np.logspace(-1, 2, 20)
viscosity_true = K_true * shear_rate ** (n_true - 1)
viscosity = viscosity_true + np.random.normal(0, 0.5, size=len(shear_rate))
Fit Bayesian Model
Now fit the Bayesian power-law model:
# Create and fit model
model = PowerLawModel(n_samples=2000, n_warmup=1000)
model.fit(shear_rate, viscosity)
# Check if sampling succeeded
if model.is_fitted:
print("Model fitting successful!")
else:
print("Warning: Check sampling diagnostics")
The model uses MCMC (Markov Chain Monte Carlo) to sample from the posterior
distribution. n_samples=2000 specifies the number of posterior samples,
and n_warmup=1000 is the burn-in period.
Examine Results
View a summary of the posterior distribution:
summary = model.summary()
print(summary)
Output:
Parameter Posterior Summary:
----------------------------
K: mean=5.02, std=0.15, 95% CI=[4.73, 5.31]
n: mean=0.598, std=0.012, 95% CI=[0.575, 0.621]
sigma: mean=0.51, std=0.09, 95% CI=[0.38, 0.71]
The summary shows:
mean: Posterior mean (point estimate)
std: Posterior standard deviation (uncertainty)
95% CI: 95% credible interval
Extract Specific Parameters
Access posterior samples directly:
# Get posterior samples
samples = model.samples
K_samples = samples['K']
n_samples = samples['n']
# Calculate custom statistics
K_median = np.median(K_samples)
K_90_interval = np.percentile(K_samples, [5, 95])
print(f"K median: {K_median:.3f}")
print(f"K 90% interval: [{K_90_interval[0]:.3f}, {K_90_interval[1]:.3f}]")
Visualize Results
Generate diagnostic plots:
import matplotlib.pyplot as plt
# Plot fit with uncertainty
fig, axes = model.plot_fit(
shear_rate,
viscosity,
show_uncertainty=True,
uncertainty_level=0.95
)
plt.savefig('power_law_fit.png', dpi=300)
plt.show()
The plot shows:
Data points (observed viscosity)
Mean prediction (posterior mean fit)
95% credible interval (shaded band)
Make Predictions
Predict at new shear rates with uncertainty:
# New prediction points
new_shear_rate = np.logspace(-2, 3, 100)
# Predict with uncertainty
predictions = model.predict(new_shear_rate, return_uncertainty=True)
mean_pred = predictions['mean']
lower_pred = predictions['lower']
upper_pred = predictions['upper']
# Plot predictions
plt.figure(figsize=(10, 6))
plt.plot(new_shear_rate, mean_pred, 'r-', label='Mean prediction')
plt.fill_between(new_shear_rate, lower_pred, upper_pred,
alpha=0.3, label='95% prediction interval')
plt.scatter(shear_rate, viscosity, c='k', label='Data')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('Shear Rate (1/s)')
plt.ylabel('Viscosity (Pa·s)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Working with Datasets
Quantiq integrates Bayesian uncertainty with the Dataset API:
from piblin_jax.data.datasets import OneDimensionalDataset
# Create dataset
dataset = OneDimensionalDataset(
independent_variable_data=shear_rate,
dependent_variable_data=viscosity,
conditions={'temperature': 25.0, 'sample': 'A1'}
)
# Fit model to dataset
model = PowerLawModel(n_samples=2000)
model.fit(dataset.independent_variable_data,
dataset.dependent_variable_data)
# Add uncertainty to dataset
dataset_with_unc = dataset.with_uncertainty(
model=model,
n_samples=1000,
keep_samples=True
)
# Check uncertainty status
print(f"Has uncertainty: {dataset_with_unc.has_uncertainty}")
# Get credible intervals
lower, upper = dataset_with_unc.get_credible_intervals(level=0.95)
Propagating Uncertainty Through Transforms
Uncertainty can be propagated through data transformations:
from piblin_jax.transform.dataset import GaussianSmoothing
# Create dataset with uncertainty
dataset_with_unc = dataset.with_uncertainty(
model=model,
n_samples=1000,
keep_samples=True
)
# Apply transform with uncertainty propagation
smoother = GaussianSmoothing(sigma=2.0)
smoothed = smoother.apply_to(
dataset_with_unc,
propagate_uncertainty=True
)
# Uncertainty is now propagated through the transform
print(f"Smoothed has uncertainty: {smoothed.has_uncertainty}")
Advanced Example: Arrhenius Model
Temperature-dependent viscosity following Arrhenius equation:
where \(A\) is pre-exponential factor, \(E_a\) is activation energy, \(R\) is gas constant, and \(T\) is temperature.
Generate Temperature-Dependent Data
from piblin_jax.bayesian.models import ArrheniusModel
# True parameters
A_true = 1e-3 # Pa·s
Ea_true = 50000 # J/mol
R = 8.314 # J/(mol·K)
# Temperature range (K)
temperature = np.linspace(273, 373, 15)
# Generate data with noise
np.random.seed(42)
viscosity_true = A_true * np.exp(Ea_true / (R * temperature))
viscosity = viscosity_true * np.random.lognormal(0, 0.1, size=len(temperature))
Fit Arrhenius Model
# Fit model
model = ArrheniusModel(n_samples=2000)
model.fit(temperature, viscosity)
# View results
print(model.summary())
# Plot fit
fig, axes = model.plot_fit(temperature, viscosity, show_uncertainty=True)
plt.xlabel('Temperature (K)')
plt.ylabel('Viscosity (Pa·s)')
plt.show()
Interpreting Results
The posterior samples reveal parameter correlations:
import matplotlib.pyplot as plt
samples = model.samples
A_samples = samples['A']
Ea_samples = samples['Ea']
# Plot joint distribution
plt.figure(figsize=(8, 6))
plt.scatter(A_samples, Ea_samples / 1000, alpha=0.3, s=1)
plt.xlabel('A (Pa·s)')
plt.ylabel('Ea (kJ/mol)')
plt.title('Joint Posterior Distribution')
plt.grid(True, alpha=0.3)
plt.show()
This scatter plot reveals correlation between \(A\) and \(E_a\): if \(A\) is higher, \(E_a\) tends to be higher too.
Model Comparison
Compare different rheological models using Bayesian evidence:
from piblin_jax.bayesian.models import PowerLawModel, CrossModel
# Fit competing models
power_law = PowerLawModel(n_samples=2000)
power_law.fit(shear_rate, viscosity)
cross = CrossModel(n_samples=2000)
cross.fit(shear_rate, viscosity)
# Compare using information criteria
power_law_aic = power_law.aic()
cross_aic = cross.aic()
print(f"Power-law AIC: {power_law_aic:.1f}")
print(f"Cross AIC: {cross_aic:.1f}")
if cross_aic < power_law_aic:
print("Cross model is preferred (lower AIC)")
else:
print("Power-law model is preferred (lower AIC)")
Lower AIC indicates better model fit penalized for complexity.
Tips and Best Practices
- Number of samples
Use at least 1000-2000 samples for reliable uncertainty estimates. More samples give smoother distributions but take longer.
- Convergence diagnostics
Always check
model.is_fittedand examine trace plots to ensure MCMC chains have converged:if not model.is_fitted: print("Warning: Sampling may not have converged") # Increase n_samples or n_warmup
- Prior sensitivity
Bayesian results depend on priors. piblin-jax uses weakly informative priors by default. For custom priors, see the API documentation.
- Computational cost
Bayesian fitting is ~10-100x slower than NLSQ. Use NLSQ first for initial exploration, then Bayesian for final analysis with uncertainty.
- Uncertainty vs confidence
Credible intervals (Bayesian) have a different interpretation than confidence intervals (frequentist). A 95% credible interval means “95% probability the parameter is in this range given the data.”
Next Steps
See Rheological Models Tutorial for detailed model descriptions
See Custom Transforms Tutorial to create uncertainty-aware transforms
See Uncertainty Quantification for complete API reference
See
examples/bayesian_rheological_models.pyfor full code examples
References
Gelman, A., et al. (2013). Bayesian Data Analysis, 3rd Edition. Chapman and Hall/CRC.
McElreath, R. (2020). Statistical Rethinking, 2nd Edition. CRC Press.