Basic Workflow Tutorial
This tutorial demonstrates a complete data analysis workflow using piblin-jax, from data loading through transformation to visualization and analysis.
Overview
We’ll walk through a typical rheology experiment workflow:
Load experimental data
Inspect and visualize raw data
Apply smoothing to reduce noise
Interpolate data onto a regular grid
Extract regions of interest
Perform statistical analysis
Generate publication-quality plots
This tutorial assumes you have piblin-jax installed. See Installation if you need to install it first.
Step 1: Loading Data
Let’s start by creating a dataset. For this tutorial we generate synthetic rheology data; in practice you would load measured data from a file.
Creating Sample Data
import numpy as np
import matplotlib.pyplot as plt
from piblin_jax.data import OneDimensionalDataset
# Generate synthetic flow curve data
# (shear rate vs viscosity for a shear-thinning fluid)
np.random.seed(42)
# Shear rate from 0.1 to 100 s^-1
shear_rate = np.logspace(-1, 2, 50)
# Power-law fluid: eta = K * gamma_dot^(n-1)
K = 5.0 # Consistency index
n = 0.6 # Flow behavior index (< 1 = shear-thinning)
# True viscosity with added noise
viscosity_true = K * shear_rate**(n - 1)
noise = 0.05 * viscosity_true * np.random.randn(len(shear_rate))
viscosity = viscosity_true + noise
# Create dataset
dataset = OneDimensionalDataset(
    x=shear_rate,
    y=viscosity,
    x_label='Shear Rate (1/s)',
    y_label='Viscosity (Pa.s)',
    name='Flow Curve'
)
print(f"Dataset: {dataset.name}")
print(f"Points: {len(dataset.x)}")
print(f"X range: [{dataset.x.min():.2f}, {dataset.x.max():.2f}]")
print(f"Y range: [{dataset.y.min():.2f}, {dataset.y.max():.2f}]")
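Taking log10 of the power law makes the model linear: log10(eta) = log10(K) + (n - 1) * log10(gamma_dot). This is why the flow curve looks like a straight line on log-log axes, and why Step 8 can recover K and n with a straight-line fit. The short NumPy check below uses the same K and n as above. It confirms that the slope between every pair of neighboring points equals n - 1.

```python
import numpy as np

# Same power-law parameters as the synthetic data above
K = 5.0   # consistency index (Pa.s^n)
n = 0.6   # flow behavior index

shear_rate = np.logspace(-1, 2, 50)
viscosity_true = K * shear_rate**(n - 1)

# In log-log space the power law is a straight line:
# log10(eta) = log10(K) + (n - 1) * log10(gamma_dot)
log_x = np.log10(shear_rate)
log_y = np.log10(viscosity_true)
slopes = np.diff(log_y) / np.diff(log_x)

print(f"slope range: {slopes.min():.4f} to {slopes.max():.4f}")  # both close to n - 1
print(f"intercept log10(K): {np.log10(K):.4f}")
```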
Loading from File
In real applications, you’d load data from files:
import piblin_jax
# Load CSV file
dataset = piblin_jax.read_file('flow_curve.csv')
# Or use specific reader
from piblin_jax.dataio import CSVReader
reader = CSVReader(x_column=0, y_column=1)
dataset = reader.read('flow_curve.csv')
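If you only need the raw arrays, or want to check what a reader should return, plain NumPy can parse a simple two-column CSV. This sketch assumes the following file layout:

- comma-delimited, with one header line;
- shear rate in column 0 and viscosity in column 1.

Adjust `delimiter`, `skiprows`, and `usecols` to match your instrument's export format. So that the example runs on its own, it first writes a small illustrative file.

```python
import os
import tempfile

import numpy as np

# Write a small example file (assumed layout: header + two comma-separated columns)
csv_text = """shear_rate,viscosity
0.1,12.6
1.0,5.0
10.0,1.99
100.0,0.79
"""
path = os.path.join(tempfile.mkdtemp(), "flow_curve.csv")
with open(path, "w") as f:
    f.write(csv_text)

# Parse it: skip the header row and read columns 0 (x) and 1 (y)
data = np.loadtxt(path, delimiter=",", skiprows=1, usecols=(0, 1))
shear_rate, viscosity = data[:, 0], data[:, 1]

print(shear_rate)
print(viscosity)
```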
Step 2: Initial Visualization
Always visualize your raw data first to understand its characteristics:
fig, ax = plt.subplots(figsize=(8, 6))
# Plot on log-log scale (common for rheology)
ax.loglog(dataset.x, dataset.y, 'o', alpha=0.6, label='Raw Data')
ax.set_xlabel(dataset.x_label)
ax.set_ylabel(dataset.y_label)
ax.set_title(f'{dataset.name} - Raw Data')
ax.grid(True, alpha=0.3)
ax.legend()
plt.tight_layout()
plt.show()
Key observations from the plot:
The data follow power-law behavior (a straight line on log-log axes)
Some scatter due to measurement noise
No obvious outliers
Good coverage of the shear-rate range
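The "no obvious outliers" observation can also be checked numerically. One common approach, which is not necessarily something piblin-jax provides, works in two steps:

1. Fit a straight line in log-log space.
2. Flag points whose residual is far from the median in units of a robust spread estimate, the median absolute deviation (MAD).

The threshold of 3.5 is a conventional choice, not a rule. The helper `flag_outliers_loglog` is hypothetical. The example injects one corrupted point to show what gets flagged.

```python
import numpy as np

def flag_outliers_loglog(x, y, threshold=3.5):
    """Return a boolean mask marking points that deviate strongly from a
    straight-line (power-law) fit in log-log space, using a MAD-based
    robust z-score."""
    log_x, log_y = np.log10(x), np.log10(y)
    slope, intercept = np.polyfit(log_x, log_y, 1)
    resid = log_y - (slope * log_x + intercept)
    med = np.median(resid)
    mad = np.median(np.abs(resid - med))
    if mad == 0:
        return np.zeros(resid.shape, dtype=bool)
    # 0.6745 rescales the MAD to match a standard deviation for Gaussian noise
    robust_z = 0.6745 * (resid - med) / mad
    return np.abs(robust_z) > threshold

# Synthetic data as in Step 1, plus one deliberately corrupted point
rng = np.random.default_rng(42)
shear_rate = np.logspace(-1, 2, 50)
viscosity = 5.0 * shear_rate**(0.6 - 1) * (1 + 0.05 * rng.standard_normal(50))
viscosity[20] *= 3.0  # inject an outlier

mask = flag_outliers_loglog(shear_rate, viscosity)
print("Flagged indices:", np.flatnonzero(mask))
```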
Step 3: Data Smoothing
Apply Gaussian smoothing to reduce noise while preserving trends:
from piblin_jax.transform import GaussianSmoothing
# Create smoothing transform
# sigma controls smoothness (larger sigma = smoother curve)
smoother = GaussianSmoothing(sigma=1.5)
# Apply to dataset
smoothed = smoother.apply_to(dataset)
print(f"Original dataset: {len(dataset.x)} points")
print(f"Smoothed dataset: {len(smoothed.x)} points")
Compare raw and smoothed data:
fig, ax = plt.subplots(figsize=(8, 6))
ax.loglog(dataset.x, dataset.y, 'o', alpha=0.4, label='Raw Data')
ax.loglog(smoothed.x, smoothed.y, '-', linewidth=2, label='Smoothed')
ax.set_xlabel(dataset.x_label)
ax.set_ylabel(dataset.y_label)
ax.set_title('Smoothing Effect')
ax.grid(True, alpha=0.3)
ax.legend()
plt.tight_layout()
plt.show()
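Conceptually, Gaussian smoothing replaces each point with a weighted average of its neighbors, where the weights follow a Gaussian of width `sigma`. Two details of piblin-jax's `GaussianSmoothing` are assumptions here, so check its API reference:

- whether `sigma` is measured in samples or in x-units;
- how the transform handles the ends of the data.

The NumPy sketch below measures `sigma` in samples and pads the ends by reflection, so the output keeps the same number of points. Because the data are log-spaced, smoothing in index space corresponds to a roughly constant width in log(shear rate).

```python
import numpy as np

def gaussian_smooth(y, sigma):
    """Smooth a 1-D array with a normalized Gaussian kernel.

    sigma is measured in samples (an assumption for this sketch).
    Edges are handled by reflecting the signal so the output length
    equals the input length.
    """
    radius = int(np.ceil(4 * sigma))            # truncate the kernel at 4 sigma
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()                      # weights sum to 1
    padded = np.pad(y, radius, mode="reflect")
    # 'valid' trims the padding back off, restoring the original length
    return np.convolve(padded, kernel, mode="valid")

rng = np.random.default_rng(0)
x = np.logspace(-1, 2, 50)
y_noisy = 5.0 * x**(-0.4) * (1 + 0.05 * rng.standard_normal(x.size))

y_smooth = gaussian_smooth(y_noisy, sigma=1.5)
print(len(y_noisy), len(y_smooth))
```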
Step 4: Interpolation
Interpolate onto a grid that is evenly spaced in log(shear rate), which simplifies later analysis:
from piblin_jax.transform import Interpolate1D
# Create regular grid on log scale
new_shear_rate = np.logspace(-1, 2, 100)
# Interpolate
interpolator = Interpolate1D(
    new_x=new_shear_rate,
    kind='cubic'  # Use cubic interpolation
)
interpolated = interpolator.apply_to(smoothed)
print(f"Interpolated to {len(interpolated.x)} points")
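The new grid is evenly spaced in log10(shear rate), so it often helps to interpolate in log-log space, where a power law is a straight line. `np.interp` has no cubic option. This sketch therefore swaps in piecewise-linear interpolation on log10(x) and log10(y), which is exact for a pure power law. It illustrates the idea and is not a description of what `Interpolate1D(kind='cubic')` does internally. The helper name `interp_loglog` is hypothetical.

```python
import numpy as np

def interp_loglog(x_new, x, y):
    """Piecewise-linear interpolation in log10-log10 space.

    Requires x increasing and all x, y, x_new strictly positive.
    Between samples this is equivalent to a local power law.
    """
    log_y_new = np.interp(np.log10(x_new), np.log10(x), np.log10(y))
    return 10.0 ** log_y_new

x = np.logspace(-1, 2, 50)
y = 5.0 * x**(0.6 - 1)             # exact power law
x_new = np.logspace(-1, 2, 100)    # same grid as new_shear_rate above

y_new = interp_loglog(x_new, x, y)
rel_err = np.max(np.abs(y_new / (5.0 * x_new**(0.6 - 1)) - 1))
print(f"max relative error: {rel_err:.2e}")
```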
Step 5: Building a Pipeline
Combine multiple transforms into a reusable pipeline:
from piblin_jax.transform import Pipeline
# Create pipeline: smooth -> interpolate
pipeline = Pipeline([
    GaussianSmoothing(sigma=1.5),
    Interpolate1D(new_x=new_shear_rate, kind='cubic')
])
# Apply pipeline
processed = pipeline.apply_to(dataset)
# Visualize result
fig, ax = plt.subplots(figsize=(8, 6))
ax.loglog(dataset.x, dataset.y, 'o', alpha=0.4, label='Raw')
ax.loglog(processed.x, processed.y, '-', linewidth=2, label='Processed')
ax.set_xlabel(dataset.x_label)
ax.set_ylabel(dataset.y_label)
ax.set_title('Pipeline Result')
ax.grid(True, alpha=0.3)
ax.legend()
plt.tight_layout()
plt.show()
Pipelines are reusable: the same pipeline can be applied to multiple datasets.
dataset1 = piblin_jax.read_file('sample1.csv')
dataset2 = piblin_jax.read_file('sample2.csv')
result1 = pipeline.apply_to(dataset1)
result2 = pipeline.apply_to(dataset2)
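Conceptually, a pipeline is function composition: each step receives the previous step's output, in list order. The toy class below illustrates that idea using plain functions on `(x, y)` tuples. `SimplePipeline`, `crop_x`, and `scale_y` are hypothetical illustrations. They are not piblin-jax's `Pipeline` or transforms, which operate on dataset objects.

```python
from functools import reduce

import numpy as np

class SimplePipeline:
    """Hypothetical illustration: apply callables in order, left to right."""

    def __init__(self, steps):
        self.steps = list(steps)

    def apply_to(self, data):
        # reduce feeds the output of each step into the next one
        return reduce(lambda acc, step: step(acc), self.steps, data)

def scale_y(factor):
    return lambda xy: (xy[0], xy[1] * factor)

def crop_x(x_min, x_max):
    def _crop(xy):
        x, y = xy
        keep = (x >= x_min) & (x <= x_max)
        return x[keep], y[keep]
    return _crop

x = np.logspace(-1, 2, 50)
y = 5.0 * x**-0.4

pipe = SimplePipeline([crop_x(1.0, 10.0), scale_y(2.0)])
# The same pipeline object can be reused on any number of inputs
x_out, y_out = pipe.apply_to((x, y))
print(len(x_out), x_out.min(), x_out.max())
```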
Step 6: Region of Interest
Extract and analyze specific regions:
from piblin_jax.transform import SelectRegion
# Extract low shear rate region (gamma_dot < 10 s^-1)
low_shear_selector = SelectRegion(x_min=0.1, x_max=10.0)
low_shear = low_shear_selector.apply_to(processed)
# Extract high shear rate region (gamma_dot > 10 s^-1)
high_shear_selector = SelectRegion(x_min=10.0, x_max=100.0)
high_shear = high_shear_selector.apply_to(processed)
# Visualize regions
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 4))
# Full range
ax1.loglog(processed.x, processed.y, '-', linewidth=2)
ax1.set_xlabel(dataset.x_label)
ax1.set_ylabel(dataset.y_label)
ax1.set_title('Full Range')
ax1.grid(True, alpha=0.3)
# Low shear
ax2.loglog(low_shear.x, low_shear.y, '-', linewidth=2, color='orange')
ax2.set_xlabel(dataset.x_label)
ax2.set_ylabel(dataset.y_label)
ax2.set_title('Low Shear Rate')
ax2.grid(True, alpha=0.3)
# High shear
ax3.loglog(high_shear.x, high_shear.y, '-', linewidth=2, color='green')
ax3.set_xlabel(dataset.x_label)
ax3.set_ylabel(dataset.y_label)
ax3.set_title('High Shear Rate')
ax3.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
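Both selectors above include 10 s^-1 in their ranges. If their bounds are inclusive and a grid point falls exactly on 10, that point would appear in both regions. This tutorial does not say whether `SelectRegion` treats `x_min`/`x_max` as inclusive. The NumPy sketch below makes the choice explicit by splitting at the boundary with `x < boundary` versus `x >= boundary`, so every point lands in exactly one region. The helper `split_at` is hypothetical.

```python
import numpy as np

def split_at(x, y, boundary):
    """Split (x, y) into x < boundary and x >= boundary, so no point is
    counted twice. Returns two (x, y) tuples."""
    low = x < boundary
    return (x[low], y[low]), (x[~low], y[~low])

x = np.logspace(-1, 2, 100)
y = 5.0 * x**-0.4

(low_x, low_y), (high_x, high_y) = split_at(x, y, boundary=10.0)
print(len(low_x), len(high_x), len(low_x) + len(high_x) == len(x))
```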
Step 7: Derived Quantities
Calculate shear stress from viscosity and shear rate:
from piblin_jax.data import OneDimensionalDataset
# Shear stress: tau = eta * gamma_dot
# (in log-log space this is addition: log(tau) = log(eta) + log(gamma_dot))
shear_stress = processed.y * processed.x
# Create shear stress dataset
stress_dataset = OneDimensionalDataset(
    x=processed.x,
    y=shear_stress,
    x_label='Shear Rate (1/s)',
    y_label='Shear Stress (Pa)',
    name='Shear Stress Curve'
)
# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.loglog(processed.x, processed.y, '-', linewidth=2)
ax1.set_xlabel('Shear Rate (1/s)')
ax1.set_ylabel('Viscosity (Pa.s)')
ax1.set_title('Flow Curve')
ax1.grid(True, alpha=0.3)
ax2.loglog(stress_dataset.x, stress_dataset.y, '-', linewidth=2, color='red')
ax2.set_xlabel('Shear Rate (1/s)')
ax2.set_ylabel('Shear Stress (Pa)')
ax2.set_title('Stress Curve')
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
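A genuinely numerical derivative that is useful for flow curves is the local log-log slope d(log eta)/d(log gamma_dot). For a power-law fluid it is constant and equal to n - 1, so departures from a flat line show where the simple model stops applying. This sketch applies `np.gradient` to the log-transformed arrays; `np.gradient` accepts a coordinate array and uses second-order central differences in the interior. It is a NumPy illustration rather than a library transform, and the helper name `local_loglog_slope` is hypothetical. On your own results you could call it as `local_loglog_slope(processed.x, processed.y)`, assuming those attributes are NumPy-compatible arrays.

```python
import numpy as np

def local_loglog_slope(x, y):
    """Return d(log10 y)/d(log10 x) at each sample using np.gradient,
    which handles non-uniform spacing of the log10(x) coordinates."""
    return np.gradient(np.log10(y), np.log10(x))

K, n = 5.0, 0.6
x = np.logspace(-1, 2, 100)
y = K * x**(n - 1)

slope = local_loglog_slope(x, y)
print(f"slope range: {slope.min():.4f} to {slope.max():.4f}")  # both close to n - 1
```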
Step 8: Statistical Analysis
Perform statistical analysis on processed data:
# Calculate statistics
mean_viscosity = np.mean(processed.y)
std_viscosity = np.std(processed.y)
min_viscosity = np.min(processed.y)
max_viscosity = np.max(processed.y)
print("\nViscosity Statistics:")
print(f" Mean: {mean_viscosity:.2f} Pa.s")
print(f" Std Dev: {std_viscosity:.2f} Pa.s")
print(f" Range: [{min_viscosity:.2f}, {max_viscosity:.2f}] Pa.s")
# Power-law parameters from log-log slope
log_x = np.log10(processed.x)
log_y = np.log10(processed.y)
# Linear fit in log-log space
coeffs = np.polyfit(log_x, log_y, 1)
slope = coeffs[0]
intercept = coeffs[1]
n_fitted = slope + 1 # Power-law index
K_fitted = 10**intercept # Consistency
print("\nPower-Law Fit (eta = K*gamma_dot^(n-1)):")
print(f" K (consistency): {K_fitted:.2f} Pa.s^n")
print(f" n (flow index): {n_fitted:.2f}")
print(f" True values: K={K:.2f}, n={n:.2f}")
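`np.polyfit` can also return the covariance matrix of the fitted coefficients (`cov=True`), which gives rough standard errors for the slope and intercept:

- n = slope + 1, so the uncertainty on n equals the slope's standard error.
- K = 10^intercept, so first-order error propagation gives sigma_K ≈ K ln(10) sigma_intercept.

These estimates assume independent, equal-variance residuals in log space. Smoothing and interpolation correlate neighboring points, so on `processed` the errors would be optimistic. For that reason the sketch fits raw synthetic data generated like Step 1.

```python
import numpy as np

rng = np.random.default_rng(42)
K_true, n_true = 5.0, 0.6
shear_rate = np.logspace(-1, 2, 50)
viscosity = K_true * shear_rate**(n_true - 1) * (1 + 0.05 * rng.standard_normal(50))

log_x, log_y = np.log10(shear_rate), np.log10(viscosity)
coeffs, cov = np.polyfit(log_x, log_y, 1, cov=True)
slope, intercept = coeffs
slope_err, intercept_err = np.sqrt(np.diag(cov))

n_fit = slope + 1
n_err = slope_err                             # n = slope + 1, same uncertainty
K_fit = 10**intercept
K_err = K_fit * np.log(10) * intercept_err    # first-order error propagation

print(f"n = {n_fit:.3f} +/- {n_err:.3f}")
print(f"K = {K_fit:.3f} +/- {K_err:.3f} Pa.s^n")
```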
Step 9: Publication-Quality Plot
Create a polished figure for publication:
fig = plt.figure(figsize=(10, 8))
gs = fig.add_gridspec(2, 2, hspace=0.3, wspace=0.3)
# Main plot: Flow curve
ax_main = fig.add_subplot(gs[0, :])
ax_main.loglog(dataset.x, dataset.y, 'o', alpha=0.3,
               markersize=6, label='Raw Data')
ax_main.loglog(processed.x, processed.y, '-', linewidth=2.5,
               color='darkblue', label='Smoothed & Interpolated')
# Add power-law fit
y_fit = K_fitted * processed.x**(n_fitted - 1)
ax_main.loglog(processed.x, y_fit, '--', linewidth=2,
               color='red', alpha=0.7,
               label=f'Power-Law Fit (n={n_fitted:.2f})')
ax_main.set_xlabel('Shear Rate, $\\dot{\\gamma}$ (s$^{-1}$)', fontsize=12)
ax_main.set_ylabel('Viscosity, $\\eta$ (Pa.s)', fontsize=12)
ax_main.set_title('Rheological Flow Curve', fontsize=14, fontweight='bold')
ax_main.grid(True, alpha=0.3, which='both')
ax_main.legend(fontsize=10, framealpha=0.9)
# Bottom left: Residuals
ax_resid = fig.add_subplot(gs[1, 0])
residuals = (processed.y - y_fit) / y_fit * 100 # Percent error
ax_resid.semilogx(processed.x, residuals, 'o-', markersize=4, alpha=0.7)
ax_resid.axhline(0, color='black', linestyle='--', alpha=0.5)
ax_resid.set_xlabel('Shear Rate (s$^{-1}$)', fontsize=10)
ax_resid.set_ylabel('Residual (%)', fontsize=10)
ax_resid.set_title('Fit Residuals', fontsize=11)
ax_resid.grid(True, alpha=0.3)
# Bottom right: Statistics
ax_stats = fig.add_subplot(gs[1, 1])
ax_stats.axis('off')
stats_text = f"""
Dataset Statistics
Data Points: {len(processed.x)}
Shear Rate Range:
  {processed.x.min():.2f} - {processed.x.max():.2f} 1/s
Viscosity Range:
  {processed.y.min():.2f} - {processed.y.max():.2f} Pa.s
Power-Law Parameters:
  K = {K_fitted:.2f} Pa.s^n
  n = {n_fitted:.2f}
Shear-Thinning Index:
  {((1-n_fitted)*100):.0f}% (n < 1)
"""
ax_stats.text(0.1, 0.9, stats_text, transform=ax_stats.transAxes,
              fontsize=9, verticalalignment='top',
              fontfamily='monospace',
              bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.3))
plt.suptitle('Rheological Analysis with piblin-jax', fontsize=15,
             fontweight='bold', y=0.98)
# Save figure
plt.savefig('rheology_analysis.png', dpi=300, bbox_inches='tight')
plt.show()
print("\nFigure saved as 'rheology_analysis.png'")
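The residual panel can also be summarized numerically. This sketch computes two quantities:

- the coefficient of determination R² of the straight-line fit in log-log space;
- the root-mean-square of the percent residuals plotted in the bottom-left panel.

These are common summary statistics, not values that piblin-jax reports. The helper name `fit_summary` is hypothetical. On smoothed and interpolated data, R² will look better than the raw measurement quality alone would justify.

```python
import numpy as np

def fit_summary(x, y):
    """Fit log10(y) = a*log10(x) + b and return (R^2 in log space,
    RMS percent residual in linear space)."""
    log_x, log_y = np.log10(x), np.log10(y)
    a, b = np.polyfit(log_x, log_y, 1)
    log_pred = a * log_x + b
    ss_res = np.sum((log_y - log_pred) ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    y_pred = 10.0 ** log_pred
    rms_pct = np.sqrt(np.mean(((y - y_pred) / y_pred * 100) ** 2))
    return r_squared, rms_pct

x = np.logspace(-1, 2, 100)
r2_exact, rms_exact = fit_summary(x, 5.0 * x**-0.4)
print(f"exact power law: R^2 = {r2_exact:.6f}, RMS residual = {rms_exact:.2e} %")
```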
Step 10: Working with Multiple Samples
Analyze multiple samples, such as the same fluid measured at several temperatures, by applying one pipeline to every dataset:
# Create multiple datasets (e.g., different temperatures)
temperatures = [20, 40, 60]  # degC
datasets = {}
for temp in temperatures:
    # Simple exponential temperature dependence (a stand-in for Arrhenius
    # behavior): viscosity decreases as temperature increases
    viscosity_temp = viscosity_true * np.exp(-0.02 * (temp - 20))
    noise_temp = 0.05 * viscosity_temp * np.random.randn(len(shear_rate))
    datasets[temp] = OneDimensionalDataset(
        x=shear_rate,
        y=viscosity_temp + noise_temp,
        x_label='Shear Rate (1/s)',
        y_label='Viscosity (Pa.s)',
        name=f'Flow Curve @ {temp} degC'
    )
# Apply same pipeline to all datasets
processed_datasets = {}
for temp, ds in datasets.items():
    processed_datasets[temp] = pipeline.apply_to(ds)
# Visualize all temperatures
fig, ax = plt.subplots(figsize=(10, 7))
colors = plt.cm.coolwarm(np.linspace(0, 1, len(temperatures)))
for i, (temp, ds) in enumerate(processed_datasets.items()):
    ax.loglog(ds.x, ds.y, '-', linewidth=2,
              color=colors[i], label=f'{temp} degC')
ax.set_xlabel('Shear Rate (s$^{-1}$)', fontsize=12)
ax.set_ylabel('Viscosity (Pa.s)', fontsize=12)
ax.set_title('Temperature-Dependent Flow Curves', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3, which='both')
ax.legend(fontsize=10, title='Temperature')
plt.tight_layout()
plt.show()
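The synthetic temperature series above uses a simple exponential temperature dependence. A true Arrhenius model is eta(T) = A exp(Ea / (R T)) with T in kelvin, so ln(eta) is linear in 1/T with slope Ea/R. The sketch below generates Arrhenius viscosities at a single shear rate for the same three temperatures, then recovers Ea with a straight-line fit. The activation energy of 30 kJ/mol and the pre-factor A are illustrative assumptions, not measured values.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol K)

# Illustrative (assumed) Arrhenius parameters
Ea_true = 30e3   # activation energy, J/mol
A = 1e-5         # pre-exponential factor, Pa.s

temps_c = np.array([20.0, 40.0, 60.0])
temps_k = temps_c + 273.15
eta = A * np.exp(Ea_true / (R * temps_k))   # viscosity falls as T rises

# ln(eta) = ln(A) + (Ea / R) * (1 / T): fit a straight line against 1/T
slope, intercept = np.polyfit(1.0 / temps_k, np.log(eta), 1)
Ea_fit = slope * R
A_fit = np.exp(intercept)

for t, v in zip(temps_c, eta):
    print(f"{t:.0f} degC: eta = {v:.3f} Pa.s")
print(f"Recovered Ea = {Ea_fit / 1e3:.1f} kJ/mol, A = {A_fit:.2e} Pa.s")
```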
Summary
In this tutorial, we’ve covered a complete workflow:
Data loading (synthetic and from files)
Initial visualization and inspection
Data smoothing and noise reduction
Interpolation to regular grids
Building reusable transform pipelines
Region selection and analysis
Derived quantities (shear stress)
Statistical analysis and model fitting
Publication-quality visualization
Multi-sample analysis
Next Steps
Bayesian Analysis: See Uncertainty Quantification Tutorial for advanced parameter estimation with uncertainty
Custom Transforms: Learn to create your own transforms in Custom Transforms Tutorial
Rheological Models: Explore built-in models in Rheological Models Tutorial
Performance: Optimize for large datasets in Performance Guide
Complete Code
The complete code for this tutorial is available in the examples/ directory as basic_workflow.py.
To run it:
python examples/basic_workflow.py