Basic Workflow Tutorial

This tutorial demonstrates a complete data analysis workflow using piblin-jax, from data loading through transformation to visualization and analysis.

Overview

We’ll walk through a typical rheology experiment workflow:

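  1. Load experimental data

  2. Inspect and visualize raw data

  3. Apply smoothing to reduce noise

  4. Interpolate onto a regular grid

  5. Combine transforms into a reusable pipeline

  6. Extract regions of interest

  7. Compute derived quantities such as shear stress

  8. Perform statistical analysis

  9. Generate publication-quality plots

  10. Work with multiple samples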

This tutorial assumes you have piblin-jax installed. See Installation if you need to install it first.

Step 1: Loading Data

Let’s start with some rheology data. We’ll generate a synthetic flow curve for this tutorial; in practice you’d load measurements from a file.

Creating Sample Data

import numpy as np
import matplotlib.pyplot as plt
from piblin_jax.data import OneDimensionalDataset

# Generate synthetic flow curve data
# (shear rate vs viscosity for a shear-thinning fluid)
np.random.seed(42)

# Shear rate from 0.1 to 100 s^-1
shear_rate = np.logspace(-1, 2, 50)

# Power-law fluid: eta = K * gamma_dot^(n-1)
K = 5.0  # Consistency index
n = 0.6  # Flow behavior index (< 1 = shear-thinning)

# True viscosity with added noise
viscosity_true = K * shear_rate**(n - 1)
noise = 0.05 * viscosity_true * np.random.randn(len(shear_rate))
viscosity = viscosity_true + noise

# Create dataset
dataset = OneDimensionalDataset(
    x=shear_rate,
    y=viscosity,
    x_label='Shear Rate (1/s)',
    y_label='Viscosity (Pa.s)',
    name='Flow Curve'
)

print(f"Dataset: {dataset.name}")
print(f"Points: {len(dataset.x)}")
print(f"X range: [{dataset.x.min():.2f}, {dataset.x.max():.2f}]")
print(f"Y range: [{dataset.y.min():.2f}, {dataset.y.max():.2f}]")

Loading from File

In real applications, you’d load data from files:

import piblin_jax

# Load CSV file
dataset = piblin_jax.read_file('flow_curve.csv')

# Or use specific reader
from piblin_jax.dataio import CSVReader

reader = CSVReader(x_column=0, y_column=1)
dataset = reader.read('flow_curve.csv')
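The layout of flow_curve.csv is not shown here. As a minimal sketch, assuming a comma-separated file with one header row and shear rate and viscosity in the first two columns, the same arrays can be read with NumPy alone. The file contents and variable names below are hypothetical, and the in-memory buffer stands in for a real file path.

```python
import io
import numpy as np

# Hypothetical file contents: a header row, then shear rate and viscosity
csv_text = """shear_rate,viscosity
0.1,12.56
1.0,5.00
10.0,1.99
100.0,0.79
"""

# np.loadtxt accepts a file-like object; pass a path string for a real file
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
shear_rate_csv = data[:, 0]
viscosity_csv = data[:, 1]

print(f"Loaded {data.shape[0]} rows")
print(f"Shear rate: {shear_rate_csv}")
print(f"Viscosity:  {viscosity_csv}")
```

The resulting arrays can then be passed as x and y when constructing a dataset, as in the synthetic example.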

Step 2: Initial Visualization

Always visualize your raw data first to understand its characteristics:

fig, ax = plt.subplots(figsize=(8, 6))

# Plot on log-log scale (common for rheology)
ax.loglog(dataset.x, dataset.y, 'o', alpha=0.6, label='Raw Data')
ax.set_xlabel(dataset.x_label)
ax.set_ylabel(dataset.y_label)
ax.set_title(f'{dataset.name} - Raw Data')
ax.grid(True, alpha=0.3)
ax.legend()

plt.tight_layout()
plt.show()

Key observations from the plot:

  • Data shows power-law behavior (linear on log-log plot)

  • Some scatter due to measurement noise

  • No obvious outliers

  • Good coverage of shear rate range
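The first two observations can be checked numerically rather than by eye. The sketch below regenerates a comparable synthetic flow curve (it uses NumPy's Generator API, so the noise values differ from those in Step 1) and measures how straight the data are in log-log space. What counts as "straight enough" is a judgment call, not something piblin-jax prescribes.

```python
import numpy as np

# Synthetic shear-thinning flow curve, same parameters as in Step 1
rng = np.random.default_rng(42)
shear_rate = np.logspace(-1, 2, 50)
viscosity_true = 5.0 * shear_rate ** (0.6 - 1)
viscosity = viscosity_true * (1 + 0.05 * rng.standard_normal(shear_rate.size))

log_x = np.log10(shear_rate)
log_y = np.log10(viscosity)

# Pearson correlation in log-log space: a value near -1 or +1 means
# the points lie close to a straight line, i.e. a power law
r = np.corrcoef(log_x, log_y)[0, 1]

# Scatter about the best-fit line, in decades of viscosity
slope, intercept = np.polyfit(log_x, log_y, 1)
scatter = np.std(log_y - (slope * log_x + intercept))

print(f"log-log correlation: {r:.4f}")
print(f"log-log slope:       {slope:.3f}")
print(f"scatter (decades):   {scatter:.4f}")
```

For this data the slope should sit near n - 1 = -0.4, and a scatter of roughly 0.02 decades corresponds to about 5% relative noise.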

Step 3: Data Smoothing

Apply Gaussian smoothing to reduce noise while preserving trends:

from piblin_jax.transform import GaussianSmoothing

# Create smoothing transform
# sigma controls the amount of smoothing (larger = smoother)
smoother = GaussianSmoothing(sigma=1.5)

# Apply to dataset
smoothed = smoother.apply_to(dataset)

print(f"Original dataset: {len(dataset.x)} points")
print(f"Smoothed dataset: {len(smoothed.x)} points")

Compare raw and smoothed data:

fig, ax = plt.subplots(figsize=(8, 6))

ax.loglog(dataset.x, dataset.y, 'o', alpha=0.4, label='Raw Data')
ax.loglog(smoothed.x, smoothed.y, '-', linewidth=2, label='Smoothed')
ax.set_xlabel(dataset.x_label)
ax.set_ylabel(dataset.y_label)
ax.set_title('Smoothing Effect')
ax.grid(True, alpha=0.3)
ax.legend()

plt.tight_layout()
plt.show()
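To make the role of sigma concrete, here is a NumPy-only sketch of Gaussian smoothing as a weighted moving average. It is not piblin-jax's implementation: it assumes sigma is measured in samples, truncates the kernel at four sigma, and reflects the signal at the edges, any of which GaussianSmoothing may handle differently. The helper name gaussian_smooth is hypothetical.

```python
import numpy as np

def gaussian_smooth(y, sigma, truncate=4.0):
    """Smooth y with a normalized Gaussian kernel; sigma is in samples."""
    radius = int(truncate * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()  # weights sum to 1, so constants are preserved
    # Reflect the signal at both ends so the output keeps the input length
    padded = np.pad(y, radius, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")

rng = np.random.default_rng(42)
shear_rate = np.logspace(-1, 2, 50)
viscosity_true = 5.0 * shear_rate ** (0.6 - 1)
viscosity = viscosity_true * (1 + 0.05 * rng.standard_normal(shear_rate.size))

# Smooth log-viscosity: a power law is a straight line there, and a
# symmetric kernel leaves straight lines unchanged away from the edges
log_smoothed = gaussian_smooth(np.log10(viscosity), sigma=1.5)
smoothed = 10.0 ** log_smoothed

# Compare errors against the noise-free curve, skipping edge points
inner = slice(6, -6)
err_raw = np.std(np.log10(viscosity[inner] / viscosity_true[inner]))
err_smooth = np.std(np.log10(smoothed[inner] / viscosity_true[inner]))
print(f"raw scatter:      {err_raw:.4f} decades")
print(f"smoothed scatter: {err_smooth:.4f} decades")
```

Smoothing the logarithm rather than the raw viscosity is a design choice of this sketch: on a steep power law, averaging raw values across neighbors biases the curve, while averaging in log space does not.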

Step 4: Interpolation

Interpolate the smoothed data onto a denser grid that is evenly spaced in log shear rate, which simplifies later analysis:

from piblin_jax.transform import Interpolate1D

# Create regular grid on log scale
new_shear_rate = np.logspace(-1, 2, 100)

# Interpolate
interpolator = Interpolate1D(
    new_x=new_shear_rate,
    kind='cubic'  # Use cubic interpolation
)

interpolated = interpolator.apply_to(smoothed)

print(f"Interpolated to {len(interpolated.x)} points")
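Because the shear rates span three decades, it can help to interpolate in log-log space rather than on the raw values. The sketch below uses NumPy's linear np.interp on the logarithms, which is a different technique from the cubic interpolation above, chosen here because it needs no extra dependencies and reproduces an exact power law without error. Whether Interpolate1D can be configured to work in log space is not something this sketch assumes.

```python
import numpy as np

# Noise-free power-law flow curve on the original 50-point grid
shear_rate = np.logspace(-1, 2, 50)
viscosity = 5.0 * shear_rate ** (0.6 - 1)

# Target grid: 100 points, evenly spaced in log shear rate
new_shear_rate = np.logspace(-1, 2, 100)

# Linear interpolation of log(eta) against log(gamma_dot)
log_interp = np.interp(
    np.log10(new_shear_rate),
    np.log10(shear_rate),
    np.log10(viscosity),
)
viscosity_interp = 10.0 ** log_interp

# A power law is a straight line in log-log space, so the result is exact
exact = 5.0 * new_shear_rate ** (0.6 - 1)
max_rel_err = np.max(np.abs(viscosity_interp / exact - 1))
print(f"Interpolated to {viscosity_interp.size} points")
print(f"Max relative error vs exact power law: {max_rel_err:.2e}")
```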

Step 5: Building a Pipeline

Combine multiple transforms into a reusable pipeline:

from piblin_jax.transform import Pipeline

# Create pipeline: smooth  ->  interpolate
pipeline = Pipeline([
    GaussianSmoothing(sigma=1.5),
    Interpolate1D(new_x=new_shear_rate, kind='cubic')
])

# Apply pipeline
processed = pipeline.apply_to(dataset)

# Visualize result
fig, ax = plt.subplots(figsize=(8, 6))

ax.loglog(dataset.x, dataset.y, 'o', alpha=0.4, label='Raw')
ax.loglog(processed.x, processed.y, '-', linewidth=2, label='Processed')
ax.set_xlabel(dataset.x_label)
ax.set_ylabel(dataset.y_label)
ax.set_title('Pipeline Result')
ax.grid(True, alpha=0.3)
ax.legend()

plt.tight_layout()
plt.show()

Pipelines are reusable, so the same instance can be applied to multiple datasets:

dataset1 = piblin_jax.read_file('sample1.csv')
dataset2 = piblin_jax.read_file('sample2.csv')

result1 = pipeline.apply_to(dataset1)
result2 = pipeline.apply_to(dataset2)
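Conceptually, a pipeline is function composition: each step receives the output of the previous one. The sketch below illustrates that idea with plain functions on (x, y) arrays. The names make_pipeline, moving_average and log_resample are hypothetical and do not mirror piblin-jax's Pipeline class, which operates on dataset objects.

```python
import numpy as np

def make_pipeline(*steps):
    """Return a function that applies each step in order to (x, y)."""
    def run(x, y):
        for step in steps:
            x, y = step(x, y)
        return x, y
    return run

def moving_average(window):
    """Step factory: boxcar average; window must be odd and at least 3."""
    half = window // 2
    def step(x, y):
        kernel = np.ones(window) / window
        # 'valid' drops the half-window at each end, so trim x to match
        return x[half:-half], np.convolve(y, kernel, mode="valid")
    return step

def log_resample(n_points):
    """Step factory: resample onto a log-spaced grid over the same range."""
    def step(x, y):
        new_x = np.logspace(np.log10(x[0]), np.log10(x[-1]), n_points)
        new_y = 10.0 ** np.interp(np.log10(new_x), np.log10(x), np.log10(y))
        return new_x, new_y
    return step

process = make_pipeline(moving_average(5), log_resample(100))

# The same pipeline object is reused for two independent samples
x = np.logspace(-1, 2, 50)
sample1 = 5.0 * x ** -0.4
sample2 = 2.0 * x ** -0.2
x1, y1 = process(x, sample1)
x2, y2 = process(x, sample2)
print(f"sample1: {len(x1)} points, sample2: {len(x2)} points")
```

Each step returns new arrays rather than modifying its inputs, which is what makes it safe to reuse one pipeline across samples.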

Step 6: Region of Interest

Extract and analyze specific regions:

from piblin_jax.transform import SelectRegion

# Extract low shear rate region (gamma_dot < 10 s^-1)
low_shear_selector = SelectRegion(x_min=0.1, x_max=10.0)
low_shear = low_shear_selector.apply_to(processed)

# Extract high shear rate region (gamma_dot > 10 s^-1)
high_shear_selector = SelectRegion(x_min=10.0, x_max=100.0)
high_shear = high_shear_selector.apply_to(processed)

# Visualize regions
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 4))

# Full range
ax1.loglog(processed.x, processed.y, '-', linewidth=2)
ax1.set_xlabel(dataset.x_label)
ax1.set_ylabel(dataset.y_label)
ax1.set_title('Full Range')
ax1.grid(True, alpha=0.3)

# Low shear
ax2.loglog(low_shear.x, low_shear.y, '-', linewidth=2, color='orange')
ax2.set_xlabel(dataset.x_label)
ax2.set_ylabel(dataset.y_label)
ax2.set_title('Low Shear Rate')
ax2.grid(True, alpha=0.3)

# High shear
ax3.loglog(high_shear.x, high_shear.y, '-', linewidth=2, color='green')
ax3.set_xlabel(dataset.x_label)
ax3.set_ylabel(dataset.y_label)
ax3.set_title('High Shear Rate')
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
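Both selectors above list 10.0 as a bound. Whether SelectRegion treats its bounds as inclusive is not stated here, so a sample lying exactly on the boundary could end up in both regions. The NumPy sketch below shows the underlying boolean-mask idea with an explicit half-open interval, so that the two regions partition the data. The select_region helper and its include_max flag are hypothetical.

```python
import numpy as np

def select_region(x, y, x_min, x_max, include_max=False):
    """Keep points with x_min <= x < x_max (or <= x_max if include_max)."""
    upper = (x <= x_max) if include_max else (x < x_max)
    mask = (x >= x_min) & upper
    return x[mask], y[mask]

x = np.logspace(-1, 2, 100)
y = 5.0 * x ** -0.4

# Half-open low region, closed high region: every point lands in exactly one
low_x, low_y = select_region(x, y, x.min(), 10.0)
high_x, high_y = select_region(x, y, 10.0, x.max(), include_max=True)

print(f"low shear:  {low_x.size} points, up to {low_x.max():.2f} 1/s")
print(f"high shear: {high_x.size} points, from {high_x.min():.2f} 1/s")
print(f"total:      {low_x.size + high_x.size} of {x.size}")
```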

Step 7: Numerical Derivatives

Calculate shear stress from viscosity and shear rate:

from piblin_jax.transform import Derivative

# Shear stress tau = eta * gamma_dot
# In log-log space, this is addition: log(tau) = log(eta) + log(gamma_dot)

# Compute the product in log space with element-wise operations
log_shear_rate = np.log10(processed.x)
log_viscosity = np.log10(processed.y)
log_shear_stress = log_viscosity + log_shear_rate

# Create shear stress dataset (OneDimensionalDataset was imported in Step 1)

stress_dataset = OneDimensionalDataset(
    x=processed.x,
    y=10**log_shear_stress,
    x_label='Shear Rate (1/s)',
    y_label='Shear Stress (Pa)',
    name='Shear Stress Curve'
)

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.loglog(processed.x, processed.y, '-', linewidth=2)
ax1.set_xlabel('Shear Rate (1/s)')
ax1.set_ylabel('Viscosity (Pa.s)')
ax1.set_title('Flow Curve')
ax1.grid(True, alpha=0.3)

ax2.loglog(stress_dataset.x, stress_dataset.y, '-', linewidth=2, color='red')
ax2.set_xlabel('Shear Rate (1/s)')
ax2.set_ylabel('Shear Stress (Pa)')
ax2.set_title('Stress Curve')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
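A derivative becomes useful once the stress curve exists: the slope of log(tau) against log(gamma_dot) is the local flow behavior index n. The sketch below estimates it with np.gradient on a noise-free power law. It stands in for whatever the Derivative transform computes, whose signature is not shown in this tutorial. On noisy data, differentiate the smoothed curve, since finite differences amplify noise.

```python
import numpy as np

# Noise-free power-law fluid: eta = K * gamma_dot^(n - 1)
K, n = 5.0, 0.6
shear_rate = np.logspace(-1, 2, 100)
viscosity = K * shear_rate ** (n - 1)

# Shear stress tau = eta * gamma_dot = K * gamma_dot^n
shear_stress = viscosity * shear_rate

# Local flow index: d log(tau) / d log(gamma_dot)
# np.gradient takes the sample coordinates as its second argument
local_n = np.gradient(np.log10(shear_stress), np.log10(shear_rate))

print(f"local n: min = {local_n.min():.4f}, max = {local_n.max():.4f}")
```

For a pure power law the local index is constant and equal to n. On measured data, a local index that drifts with shear rate is a sign that a single power law does not describe the whole curve.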

Step 8: Statistical Analysis

Perform statistical analysis on processed data:

# Calculate statistics
mean_viscosity = np.mean(processed.y)
std_viscosity = np.std(processed.y)
min_viscosity = np.min(processed.y)
max_viscosity = np.max(processed.y)

print("\nViscosity Statistics:")
print(f"  Mean: {mean_viscosity:.2f} Pa.s")
print(f"  Std Dev: {std_viscosity:.2f} Pa.s")
print(f"  Range: [{min_viscosity:.2f}, {max_viscosity:.2f}] Pa.s")

# Power-law parameters from log-log slope
log_x = np.log10(processed.x)
log_y = np.log10(processed.y)

# Linear fit in log-log space
coeffs = np.polyfit(log_x, log_y, 1)
slope = coeffs[0]
intercept = coeffs[1]

n_fitted = slope + 1  # Power-law index
K_fitted = 10**intercept  # Consistency

print("\nPower-Law Fit (eta = K*gamma_dot^(n-1)):")
print(f"  K (consistency): {K_fitted:.2f} Pa.s^n")
print(f"  n (flow index): {n_fitted:.2f}")
print(f"  True values: K={K:.2f}, n={n:.2f}")
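The fit above reports point estimates only. As a sketch of how uncertainties might be attached, np.polyfit can return the covariance matrix of the coefficients, from which standard errors and an R-squared value follow. The example fits freshly generated raw data rather than the processed curve, because smoothing and interpolation correlate neighboring points and would make the errors look smaller than they are. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.logspace(-1, 2, 100)
y = 5.0 * x ** -0.4 * (1 + 0.05 * rng.standard_normal(x.size))

log_x, log_y = np.log10(x), np.log10(y)

# cov=True returns the coefficient covariance matrix alongside the fit
coeffs, cov = np.polyfit(log_x, log_y, 1, cov=True)
slope, intercept = coeffs
slope_err, intercept_err = np.sqrt(np.diag(cov))

n_fit = slope + 1
K_fit = 10.0 ** intercept
# Propagate the intercept error through K = 10^intercept
K_err = K_fit * np.log(10) * intercept_err

# Coefficient of determination in log-log space
residuals = log_y - np.polyval(coeffs, log_x)
r_squared = 1 - residuals.var() / log_y.var()

print(f"n = {n_fit:.3f} +/- {slope_err:.3f}")
print(f"K = {K_fit:.3f} +/- {K_err:.3f} Pa.s^n")
print(f"R^2 = {r_squared:.4f}")
```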

Step 9: Publication-Quality Plot

Create a polished figure for publication:

fig = plt.figure(figsize=(10, 8))
gs = fig.add_gridspec(2, 2, hspace=0.3, wspace=0.3)

# Main plot: Flow curve
ax_main = fig.add_subplot(gs[0, :])
ax_main.loglog(dataset.x, dataset.y, 'o', alpha=0.3,
               markersize=6, label='Raw Data')
ax_main.loglog(processed.x, processed.y, '-', linewidth=2.5,
               color='darkblue', label='Smoothed & Interpolated')

# Add power-law fit
y_fit = K_fitted * processed.x**(n_fitted - 1)
ax_main.loglog(processed.x, y_fit, '--', linewidth=2,
               color='red', alpha=0.7,
               label=f'Power-Law Fit (n={n_fitted:.2f})')

ax_main.set_xlabel('Shear Rate, $\\dot{\\gamma}$ (s$^{-1}$)', fontsize=12)
ax_main.set_ylabel('Viscosity, $\\eta$ (Pa.s)', fontsize=12)
ax_main.set_title('Rheological Flow Curve', fontsize=14, fontweight='bold')
ax_main.grid(True, alpha=0.3, which='both')
ax_main.legend(fontsize=10, framealpha=0.9)

# Bottom left: Residuals
ax_resid = fig.add_subplot(gs[1, 0])
residuals = (processed.y - y_fit) / y_fit * 100  # Percent error
ax_resid.semilogx(processed.x, residuals, 'o-', markersize=4, alpha=0.7)
ax_resid.axhline(0, color='black', linestyle='--', alpha=0.5)
ax_resid.set_xlabel('Shear Rate (s$^{-1}$)', fontsize=10)
ax_resid.set_ylabel('Residual (%)', fontsize=10)
ax_resid.set_title('Fit Residuals', fontsize=11)
ax_resid.grid(True, alpha=0.3)

# Bottom right: Statistics
ax_stats = fig.add_subplot(gs[1, 1])
ax_stats.axis('off')

stats_text = f"""
Dataset Statistics
Data Points: {len(processed.x)}

Shear Rate Range:
  {processed.x.min():.2f} - {processed.x.max():.2f} s^-1

Viscosity Range:
  {processed.y.min():.2f} - {processed.y.max():.2f} Pa.s

Power-Law Parameters:
  K = {K_fitted:.2f} Pa.s^n
  n = {n_fitted:.2f}

Shear-Thinning Index:
  {((1-n_fitted)*100):.0f}% (n < 1)
"""

ax_stats.text(0.1, 0.9, stats_text, transform=ax_stats.transAxes,
              fontsize=9, verticalalignment='top',
              fontfamily='monospace',
              bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.3))

plt.suptitle('Rheological Analysis with piblin-jax', fontsize=15,
             fontweight='bold', y=0.98)

# Save figure
plt.savefig('rheology_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

print("\nFigure saved as 'rheology_analysis.png'")

Step 10: Working with Multiple Samples

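Analyze multiple samples by applying the same pipeline to each one. This example keeps the datasets in a plain dictionary keyed by temperature; MeasurementSet is imported but not used here: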

from piblin_jax.data.collections import MeasurementSet

# Create multiple datasets (e.g., different temperatures)
temperatures = [20, 40, 60]  #  degC
datasets = {}

for temp in temperatures:
    # Generate data with temperature-dependent viscosity
    # (simple exponential decrease with temperature, a rough
    # stand-in for Arrhenius behavior)
    viscosity_temp = viscosity_true * np.exp(-0.02 * (temp - 20))
    noise_temp = 0.05 * viscosity_temp * np.random.randn(len(shear_rate))

    datasets[temp] = OneDimensionalDataset(
        x=shear_rate,
        y=viscosity_temp + noise_temp,
        x_label='Shear Rate (1/s)',
        y_label='Viscosity (Pa.s)',
        name=f'Flow Curve @ {temp} degC'
    )

# Apply same pipeline to all datasets
processed_datasets = {}
for temp, ds in datasets.items():
    processed_datasets[temp] = pipeline.apply_to(ds)

# Visualize all temperatures
fig, ax = plt.subplots(figsize=(10, 7))

colors = plt.cm.coolwarm(np.linspace(0, 1, len(temperatures)))

for i, (temp, ds) in enumerate(processed_datasets.items()):
    ax.loglog(ds.x, ds.y, '-', linewidth=2,
             color=colors[i], label=f'{temp} degC')

ax.set_xlabel('Shear Rate (s$^{-1}$)', fontsize=12)
ax.set_ylabel('Viscosity (Pa.s)', fontsize=12)
ax.set_title('Temperature-Dependent Flow Curves', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3, which='both')
ax.legend(fontsize=10, title='Temperature')

plt.tight_layout()
plt.show()
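A natural follow-up is to fit the power law at each temperature and see how the consistency K changes. The sketch below assumes, purely for illustration, that viscosity falls exponentially with temperature with a hypothetical coefficient b = 0.02 per degC. It fits K at each temperature, then recovers b from a straight-line fit of ln K against temperature. It uses NumPy only and does not depend on piblin-jax.

```python
import numpy as np

rng = np.random.default_rng(1)
shear_rate = np.logspace(-1, 2, 50)
K_ref, n_true, b_true = 5.0, 0.6, 0.02  # illustrative values
temperatures = np.array([20.0, 40.0, 60.0])  # degC

K_fitted = []
for temp in temperatures:
    # Viscosity drops as temperature rises (assumed exponential form)
    eta = K_ref * np.exp(-b_true * (temp - 20.0)) * shear_rate ** (n_true - 1)
    eta *= 1 + 0.05 * rng.standard_normal(shear_rate.size)
    slope, intercept = np.polyfit(np.log10(shear_rate), np.log10(eta), 1)
    K_fitted.append(10.0 ** intercept)
    print(f"{temp:.0f} degC: K = {K_fitted[-1]:.2f}, n = {slope + 1:.2f}")

# ln K = ln K_ref - b * (T - 20), so the slope of ln K vs T is -b
b_fit = -np.polyfit(temperatures, np.log(K_fitted), 1)[0]
print(f"recovered b = {b_fit:.4f} per degC")
```

With only three temperatures the estimate of b is rough; more temperatures would be needed before reading much into the functional form.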

Summary

In this tutorial, we’ve covered a complete workflow:

  1. Data loading (synthetic and from files)

  2. Initial visualization and inspection

  3. Data smoothing and noise reduction

  4. Interpolation to regular grids

  5. Building reusable transform pipelines

  6. Region selection and analysis

  7. Derived quantities (shear stress)

  8. Statistical analysis and model fitting

  9. Publication-quality visualization

  10. Multi-sample analysis

Complete Code

The complete code for this tutorial is available in the examples/ directory as basic_workflow.py.

To run it:

python examples/basic_workflow.py