Data Structures

Overview

The piblin_jax.data module provides the core data structures for representing measurement data in piblin_jax. It implements a hierarchical system for organizing experimental data, from individual measurements to complex experimental campaigns with multiple conditions and replicates.

The module is built around three key concepts:

  • Immutability: All data structures are immutable by design, ensuring data integrity and enabling safe sharing across transforms and analyses. Once created, datasets cannot be modified; instead, transformations return new datasets.

  • Type Safety: Each dataset type (0D, 1D, 2D, 3D) has specific guarantees about its structure. Combined with type hints, this enables static type checking and better IDE support, while maintaining flexibility through metadata.

  • Hierarchical Organization: Data is organized in a natural hierarchy: Dataset → Measurement → MeasurementSet → Experiment → ExperimentSet. This mirrors typical experimental workflows where you collect replicate measurements under various conditions.
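
The immutability principle can be illustrated without the library itself. The following is a minimal sketch using only the standard library; `Series` and `scaled` are hypothetical names chosen for illustration and are not part of piblin_jax.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Series:
    """Hypothetical stand-in for an immutable 1D dataset (not a piblin_jax class)."""

    x: tuple[float, ...]
    y: tuple[float, ...]

    def scaled(self, factor: float) -> Series:
        # A transformation builds a new object instead of mutating this one.
        return replace(self, y=tuple(value * factor for value in self.y))


original = Series(x=(0.0, 1.0, 2.0), y=(1.0, 2.0, 3.0))
doubled = original.scaled(2.0)

try:
    original.y = (9.0, 9.0, 9.0)  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    # Frozen dataclasses reject attribute assignment after construction.
    mutated = False
```

Because `original` is never changed, it can be shared between several analyses without defensive copies.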

The module also provides comprehensive metadata support with validation, merging utilities, and Region of Interest (ROI) definitions for selective data processing.

Quick Examples

Creating a 1D Dataset

The most common use case is creating a 1D dataset from arrays:

from piblin_jax.data.datasets import OneDimensionalDataset
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)
dataset = OneDimensionalDataset(
    independent_variable_data=x,
    dependent_variable_data=y,
    details={"name": "Sine Wave"},
)

Working with Metadata

Datasets carry two metadata dictionaries: conditions for experimental conditions and details for additional context:

dataset = OneDimensionalDataset(
    independent_variable_data=x,
    dependent_variable_data=y,
    conditions={"temperature": 25.0},
    details={
        "name": "Viscosity vs Shear Rate",
        "sample_id": "ABC123",
        "operator": "Jane Doe",
        "timestamp": "2024-01-15T10:30:00",
    },
)

# Access metadata
temp = dataset.conditions["temperature"]

Building Collections

Organize multiple measurements into collections:

from piblin_jax.data.collections import Measurement, MeasurementSet

# Create a measurement with multiple datasets
measurement = Measurement(
    datasets=[dataset1, dataset2, dataset3],
    metadata={"replicate": 1, "temperature": 25.0}
)

# Group measurements into a set
measurement_set = MeasurementSet(
    measurements=[meas1, meas2, meas3],
    metadata={"experiment_id": "EXP001"}
)

API Reference

Module Contents

Data types and utilities for piblin-jax.

This package provides the core data structures for measurement data science:

  • Datasets: Typed array containers (0D, 1D, 2D, 3D, composite, distributions)

  • Collections: Hierarchical measurement organization (Measurement, MeasurementSet, Experiment, ExperimentSet)

  • Metadata: Structured conditions and details with validation and merging

  • ROI: Region of interest definitions for selective data analysis

## Package Structure

### Datasets Module (piblin_jax.data.datasets)

Core dataset classes for different dimensionalities:

  • ZeroDimensionalDataset: Scalar values with metadata

  • OneDimensionalDataset: Paired (x, y) data (time series, spectra, etc.)

  • TwoDimensionalDataset: 2D grid data (heatmaps, images)

  • ThreeDimensionalDataset: 3D volumetric data

  • OneDimensionalCompositeDataset: Multiple dependent variables with shared x-axis

  • Histogram: Binned frequency distributions

  • Distribution: Probability density functions

All datasets include:

  • JAX/NumPy backend abstraction for performance

  • Metadata system (conditions and details)

  • Uncertainty quantification support

  • Immutable design for functional programming

  • Type-safe API with comprehensive type hints

### Collections Module (piblin_jax.data.collections)

Hierarchical organization for experimental data:

ExperimentSet (top level)
└── Experiment (single experimental condition set)
    └── MeasurementSet (group of related measurements)
        └── Measurement (individual measurement with datasets)

Collection Types:

  • Measurement: Container for related datasets from one measurement

  • MeasurementSet: Group of measurements (e.g., replicate trials)

  • ConsistentMeasurementSet: Enforces same conditions across measurements

  • TabularMeasurementSet: Optimized for tabular data access

  • TidyMeasurementSet: Tidy (long-form) data representation

  • Experiment: Collection of measurement sets under same conditions

  • ExperimentSet: Top-level container for multiple experiments
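
How the library decides that measurements share the same conditions is not described here. As a rough illustration of the idea behind grouping replicates, the sketch below finds the conditions that are identical across a set of plain dictionaries; `shared_conditions` is a hypothetical helper, not a piblin_jax function.

```python
from __future__ import annotations

from typing import Any


def shared_conditions(condition_sets: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the key/value pairs that are identical in every condition dict."""
    if not condition_sets:
        return {}
    first, *rest = condition_sets
    return {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }


# Two replicates measured at the same temperature.
replicates = [
    {"temperature": 25.0, "replicate": 1},
    {"temperature": 25.0, "replicate": 2},
]
common = shared_conditions(replicates)
varying = sorted(set(replicates[0]) - set(common))
```

Here `common` holds the conditions every replicate agrees on, and `varying` lists the keys that distinguish one replicate from another.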

### Metadata Module (piblin_jax.data.metadata)

Metadata management utilities:

  • Merging: Combine metadata from multiple sources with conflict resolution

  • Validation: Type checking and schema validation

  • Extraction: Parse metadata from filenames, paths, and file headers

  • Separation: Distinguish experimental conditions from details

Supported Operations:

  • merge_metadata() – Combine metadata with strategies (override, keep_first, raise, list)

  • validate_metadata() – Validate against schemas with type checking

  • extract_from_filename() – Parse metadata from file naming patterns

  • extract_from_path() – Extract metadata from directory structure

  • parse_header_metadata() – Parse comment headers in data files

  • separate_conditions_details() – Split metadata into conditions and details
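
The four strategy names listed for merge_metadata() suggest the following conflict-resolution behaviour. This is only a sketch of one plausible reading, written against plain dictionaries; `merge_dicts` is a hypothetical helper and the library's actual semantics may differ.

```python
from __future__ import annotations

from typing import Any


def merge_dicts(sources: list[dict[str, Any]], strategy: str = "override") -> dict[str, Any]:
    """Hypothetical helper illustrating the four strategies; not the library code."""
    merged: dict[str, Any] = {}
    conflicted: set[str] = set()  # keys already turned into lists by "list"
    for source in sources:
        for key, value in source.items():
            if key not in merged:
                merged[key] = value
            elif key in conflicted:
                merged[key].append(value)
            elif merged[key] == value:
                continue  # identical values never conflict
            elif strategy == "override":
                merged[key] = value  # last source wins
            elif strategy == "keep_first":
                pass  # first source wins
            elif strategy == "raise":
                raise ValueError(f"conflicting values for {key!r}")
            elif strategy == "list":
                merged[key] = [merged[key], value]  # keep every value
                conflicted.add(key)
            else:
                raise ValueError(f"unknown strategy {strategy!r}")
    return merged


file_meta = {"temperature": 25.0, "sample": "A"}
header_meta = {"temperature": 30.0, "operator": "Jane"}
overridden = merge_dicts([file_meta, header_meta], strategy="override")
collected = merge_dicts([file_meta, header_meta], strategy="list")
```

With "override" the later source replaces the earlier temperature, while "list" keeps both values so the conflict stays visible.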

### ROI Module (piblin_jax.data.roi)

Region of interest definitions for selective analysis:

  • ROI: Base class for defining regions in datasets

  • Support for 1D, 2D, and 3D regions

  • Boolean masking and index-based selection

  • Integration with transform pipeline
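
Boolean masking over a 1D region can be sketched as follows. The class name `IntervalROI` and its methods are hypothetical and use plain lists; they are meant to show the masking idea, not the interface of the ROI base class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalROI:
    """Hypothetical 1D region that keeps points with lower <= x <= upper."""

    lower: float
    upper: float

    def mask(self, x):
        # One boolean per point: True where the point lies inside the region.
        return [self.lower <= value <= self.upper for value in x]

    def apply(self, x, y):
        # Select the (x, y) pairs whose x value falls inside the region.
        keep = self.mask(x)
        xs = [xi for xi, inside in zip(x, keep) if inside]
        ys = [yi for yi, inside in zip(y, keep) if inside]
        return xs, ys


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [10.0, 11.0, 12.0, 13.0, 14.0]
roi = IntervalROI(lower=1.0, upper=3.0)
selected_x, selected_y = roi.apply(x, y)
```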

## Usage Examples

### Basic Dataset Creation

Example:

import numpy as np
from piblin_jax.data.datasets import OneDimensionalDataset

# Create 1D dataset
x = np.linspace(0, 10, 100)
y = np.sin(x)
dataset = OneDimensionalDataset(
    independent_variable_data=x,
    dependent_variable_data=y,
    conditions={"temperature": 25.0, "sample": "A"},
    details={"operator": "John", "date": "2025-01-15"}
)

### Building Hierarchical Collections

Example:

from piblin_jax.data.collections import Measurement, MeasurementSet, Experiment

# Create measurements
m1 = Measurement({"dataset1": dataset1, "dataset2": dataset2})
m2 = Measurement({"dataset1": dataset3, "dataset2": dataset4})

# Group into measurement set
mset = MeasurementSet([m1, m2])

# Create experiment
experiment = Experiment({"trial1": mset})

### Metadata Operations

Example:

from piblin_jax.data import metadata

# Merge metadata from multiple sources
file_meta = metadata.extract_from_filename("sample_A1_25C.csv")
path_meta = {"experiment": "viscosity"}
combined = metadata.merge_metadata([file_meta, path_meta])

# Validate against schema
schema = {"temperature": float, "sample": str}
metadata.validate_metadata(combined, schema=schema)

# Separate conditions from details
conditions, details = metadata.separate_conditions_details(
    combined,
    condition_keys=["temperature", "pressure"]
)
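
To make the extract-then-separate flow concrete, here is a standard-library sketch. The filename convention (`sample_<id>_<temperature>C`) and the helpers `parse_filename` and `split_metadata` are assumptions made for this illustration; the patterns the library actually recognises are not specified here.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed naming convention: sample_<id>_<temperature>C.<ext>
FILENAME_PATTERN = re.compile(
    r"sample_(?P<sample>[A-Za-z0-9]+)_(?P<temperature>-?\d+(?:\.\d+)?)C"
)


def parse_filename(filename: str) -> dict:
    """Extract sample ID and temperature from the file stem, or return {}."""
    match = FILENAME_PATTERN.fullmatch(Path(filename).stem)
    if match is None:
        return {}
    return {"sample": match["sample"], "temperature": float(match["temperature"])}


def split_metadata(metadata: dict, condition_keys: list[str]) -> tuple[dict, dict]:
    """Split metadata into (conditions, details) by key membership."""
    conditions = {k: v for k, v in metadata.items() if k in condition_keys}
    details = {k: v for k, v in metadata.items() if k not in condition_keys}
    return conditions, details


parsed = parse_filename("sample_A1_25C.csv")
parsed_conditions, parsed_details = split_metadata(parsed, ["temperature", "pressure"])
```

Keys named in `condition_keys` but absent from the metadata (here "pressure") are simply skipped in this sketch.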

## Design Principles

  1. Type Safety: Comprehensive type hints for all public APIs

  2. Immutability: Datasets are immutable by design (functional programming)

  3. Backend Agnostic: JAX for performance, NumPy for compatibility

  4. Metadata-First: Rich metadata support throughout the hierarchy

  5. Hierarchical Organization: Natural experiment → measurement → dataset structure

  6. Extensibility: Easy to add custom dataset types and collection classes

## See Also

  • piblin_jax.transform - Transform pipelines for data processing

  • piblin_jax.bayesian - Bayesian uncertainty quantification

  • piblin_jax.dataio - File I/O for reading experimental data

  • piblin_jax.backend - Backend abstraction layer (JAX/NumPy)

Datasets

Base Dataset

Base dataset class for piblin-jax.

Provides the abstract base class for all dataset types with metadata support.

class piblin_jax.data.datasets.base.Dataset(conditions=None, details=None)[source]

Bases: ABC

Abstract base class for all dataset types.

All piblin-jax datasets inherit from this class and provide:

  • Metadata system (conditions and details)

  • Internal storage using backend arrays (JAX or NumPy)

  • External NumPy conversion for API boundaries

  • Immutable design for JAX compatibility

Parameters:
  • conditions (dict[str, Any] | None, optional) – Experimental conditions (temperature, pressure, flow rate, etc.). Defaults to an empty dict when None is passed.

  • details (dict[str, Any] | None, optional) – Additional context (sample ID, operator, instrument, date, etc.). Defaults to an empty dict when None is passed.

conditions

Experimental conditions associated with the dataset.

Type:

dict[str, Any]

details

Additional metadata and context for the dataset.

Type:

dict[str, Any]

Notes

This class cannot be instantiated directly. Use one of the concrete dataset types:

  • ZeroDimensionalDataset (0D)

  • OneDimensionalDataset (1D)

  • TwoDimensionalDataset (2D)

  • ThreeDimensionalDataset (3D)

  • Histogram

  • Distribution

  • OneDimensionalCompositeDataset

The dataset uses an immutable design pattern to ensure compatibility with JAX transformations (jit, grad, vmap). Arrays are stored internally as backend arrays (JAX arrays when JAX is available, NumPy ndarrays otherwise) and converted to NumPy arrays when accessed through properties.

Examples

>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> import numpy as np
>>> x = np.linspace(0, 10, 100)
>>> y = np.sin(x)
>>> conditions = {"temperature": 25.0, "sample": "A"}
>>> details = {"operator": "Jane Doe", "date": "2025-10-18"}
>>> dataset = OneDimensionalDataset(
...     independent_variable_data=x,
...     dependent_variable_data=y,
...     conditions=conditions,
...     details=details
... )
>>> dataset.conditions["temperature"]
25.0
>>> type(dataset.independent_variable_data)
<class 'numpy.ndarray'>

Attributes:

  • conditions – Get experimental conditions.

  • credible_intervals – Get cached credible intervals.

  • details – Get additional dataset details.

  • has_uncertainty – Check if dataset has uncertainty information.

  • uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).

Methods:

  • copy() – Create a deep copy of this dataset.

__init__(conditions=None, details=None)[source]

Initialize Dataset with metadata.

Parameters:
  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional context and metadata.

property conditions: dict[str, Any]

Get experimental conditions.

Returns:

Dictionary of experimental conditions (temperature, pressure, etc.).

Return type:

dict[str, Any]

Examples

>>> dataset.conditions
{'temperature': 25.0, 'pressure': 1.0, 'sample': 'A'}

property details: dict[str, Any]

Get additional dataset details.

Returns:

Dictionary of additional context (operator, instrument, date, etc.).

Return type:

dict[str, Any]

Examples

>>> dataset.details
{'operator': 'Jane Doe', 'instrument': 'Spectrometer X', 'date': '2025-10-18'}

property has_uncertainty: bool

Check if dataset has uncertainty information.

Returns:

True if dataset has uncertainty information, False otherwise.

Return type:

bool

Examples

>>> dataset.has_uncertainty
False
>>> dataset_with_unc = dataset.with_uncertainty(n_samples=1000)
>>> dataset_with_unc.has_uncertainty
True

Notes

This property checks for the presence of either uncertainty samples or cached credible intervals. It does not validate the uncertainty quantification method or parameter values.

property uncertainty_samples: Any | None

Get uncertainty samples (if keep_samples=True was used).

Returns:

Posterior samples from Bayesian inference if keep_samples=True, None otherwise.

Return type:

dict | None

Examples

>>> dataset_with_unc = dataset.with_uncertainty(
...     n_samples=1000,
...     method='bayesian',
...     keep_samples=True
... )
>>> samples = dataset_with_unc.uncertainty_samples
>>> sigma_samples = samples['sigma']

Notes

Storing samples can be memory-intensive for large datasets. Use keep_samples=False if you only need credible intervals.

property credible_intervals: Any | None

Get cached credible intervals.

Returns:

Cached credible intervals (lower, upper) if computed, None otherwise.

Return type:

tuple | None

Examples

>>> dataset_with_unc = dataset.with_uncertainty(n_samples=1000)
>>> intervals = dataset_with_unc.credible_intervals
>>> if intervals is not None:
...     lower, upper = intervals

Notes

Credible intervals are cached after computation to avoid recomputation. Use get_credible_intervals() to compute intervals with custom parameters.

copy()[source]

Create a deep copy of this dataset.

Returns:

A new dataset instance with copied data and metadata.

Return type:

Dataset

Examples

>>> dataset_copy = dataset.copy()
>>> dataset_copy.conditions is not dataset.conditions
True

Notes

This creates a deep copy of all data arrays, metadata, and uncertainty information. The copied dataset is completely independent of the original.
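
The difference between a deep copy and a shallow copy matters for nested metadata. The sketch below uses the standard `copy` module on a plain dictionary that stands in for a dataset; it illustrates the independence guarantee described above rather than the library's own copy() implementation.

```python
import copy

# A plain dict standing in for a dataset with nested metadata.
original = {"conditions": {"temperature": 25.0}, "values": [1.0, 2.0, 3.0]}

shallow = copy.copy(original)   # new outer dict, shared inner objects
deep = copy.deepcopy(original)  # inner objects are duplicated too

# Mutating the nested dict of the original affects only the shallow copy.
original["conditions"]["temperature"] = 30.0

shallow_temperature = shallow["conditions"]["temperature"]
deep_temperature = deep["conditions"]["temperature"]
```

Only the deep copy keeps the original temperature, which is the behaviour the `dataset_copy.conditions is not dataset.conditions` check in the example above relies on.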

Zero-Dimensional Dataset

Zero-dimensional dataset for scalar values.

Used for single values like steady-state measurements, summary statistics, or aggregated results.

class piblin_jax.data.datasets.zero_dimensional.ZeroDimensionalDataset(value, conditions=None, details=None)[source]

Bases: Dataset

Zero-dimensional dataset containing a single scalar value.

This dataset type represents a single measured or calculated value, such as a steady-state measurement, summary statistic, or aggregated result.

Parameters:
  • value (float) – The scalar value to store.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions associated with this measurement.

  • details (dict[str, Any] | None, optional) – Additional context and metadata.

value

The scalar value (converted to Python float).

Type:

float

conditions

Experimental conditions.

Type:

dict[str, Any]

details

Additional metadata.

Type:

dict[str, Any]

Examples

>>> from piblin_jax.data.datasets import ZeroDimensionalDataset
>>> # Steady-state temperature measurement
>>> temp = ZeroDimensionalDataset(
...     value=98.6,
...     conditions={"location": "oral", "patient_id": "12345"},
...     details={"units": "fahrenheit", "instrument": "thermometer"}
... )
>>> temp.value
98.6
>>> # Summary statistic
>>> mean_concentration = ZeroDimensionalDataset(
...     value=2.5e-3,
...     conditions={"sample": "batch_42"},
...     details={"units": "mol/L", "statistic": "mean"}
... )
>>> mean_concentration.value
0.0025

Notes

The value is stored internally as a backend array (JAX or NumPy scalar) and converted to a Python float when accessed through the value property. This ensures compatibility with both JAX transformations and standard Python numeric operations.

Attributes:

  • conditions – Get experimental conditions.

  • credible_intervals – Get cached credible intervals.

  • details – Get additional dataset details.

  • has_uncertainty – Check if dataset has uncertainty information.

  • uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).

  • value – Get the scalar value as a Python float.

Methods:

  • copy() – Create a deep copy of this dataset.

__init__(value, conditions=None, details=None)[source]

Initialize zero-dimensional dataset with a scalar value.

Parameters:
  • value (float) – The scalar value to store.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

property value: float

Get the scalar value as a Python float.

Returns:

The stored scalar value.

Return type:

float

Examples

>>> dataset = ZeroDimensionalDataset(value=42.5)
>>> dataset.value
42.5
>>> type(dataset.value)
<class 'float'>

One-Dimensional Dataset

One-dimensional dataset with independent and dependent variables.

This is the most common dataset type, used for time series, spectra, chromatograms, and other 1D data.

class piblin_jax.data.datasets.one_dimensional.OneDimensionalDataset(independent_variable_data, dependent_variable_data, conditions=None, details=None)[source]

Bases: Dataset

One-dimensional dataset with independent and dependent variables.

This is the most common dataset type, representing paired (x, y) data such as:

  • Time series measurements

  • Spectroscopy data (wavelength vs. absorbance)

  • Chromatography traces (time vs. detector response)

  • Titration curves (volume vs. pH)

Parameters:
  • independent_variable_data (array_like) – 1D array of independent variable values (e.g., time, wavelength).

  • dependent_variable_data (array_like) – 1D array of dependent variable values (e.g., signal, absorbance).

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

independent_variable_data

Independent variable as NumPy array.

Type:

np.ndarray

dependent_variable_data

Dependent variable as NumPy array.

Type:

np.ndarray

conditions

Experimental conditions.

Type:

dict[str, Any]

details

Additional metadata.

Type:

dict[str, Any]

Raises:

ValueError – If independent and dependent arrays have different shapes.

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> # Time series data
>>> time = np.linspace(0, 10, 100)
>>> signal = np.sin(time)
>>> dataset = OneDimensionalDataset(
...     independent_variable_data=time,
...     dependent_variable_data=signal,
...     conditions={"temperature": 25.0, "sample": "A"},
...     details={"instrument": "oscilloscope", "sampling_rate": 10.0}
... )
>>> dataset.independent_variable_data.shape
(100,)
>>> dataset.dependent_variable_data.shape
(100,)
>>> # Spectroscopy data
>>> wavelength = np.linspace(200, 800, 500)
>>> absorbance = np.exp(-((wavelength - 450) ** 2) / 5000)
>>> spectrum = OneDimensionalDataset(
...     independent_variable_data=wavelength,
...     dependent_variable_data=absorbance,
...     conditions={"concentration": 1e-5, "solvent": "water"},
...     details={"units_x": "nm", "units_y": "AU"}
... )

Notes

Arrays are stored internally as backend arrays (JAX arrays when JAX is available, NumPy ndarrays otherwise) and converted to NumPy arrays when accessed through properties. This ensures compatibility with JAX transformations while maintaining a NumPy-compatible API.
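
The documented ValueError for mismatched shapes amounts to a check like the one below. This is a dependency-free sketch on plain sequences; `validate_pair` is a hypothetical name, and the real constructor operates on backend arrays and may report the error differently.

```python
def validate_pair(independent, dependent):
    """Hypothetical check mirroring the documented ValueError on mismatched shapes."""
    x = tuple(float(value) for value in independent)
    y = tuple(float(value) for value in dependent)
    if len(x) != len(y):
        raise ValueError(
            f"shape mismatch: independent has {len(x)} points, dependent has {len(y)}"
        )
    return x, y


x_ok, y_ok = validate_pair([0, 1, 2], [1.0, 2.0, 3.0])

try:
    validate_pair([0, 1, 2], [1.0, 2.0])
    mismatch_detected = False
except ValueError:
    # Three x values cannot be paired with two y values.
    mismatch_detected = True
```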

Attributes:

  • conditions – Get experimental conditions.

  • credible_intervals – Get cached credible intervals.

  • dependent_variable_data – Get dependent variable data as NumPy array.

  • details – Get additional dataset details.

  • has_uncertainty – Check if dataset has uncertainty information.

  • independent_variable_data – Get independent variable data as NumPy array.

  • uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).

Methods:

  • copy() – Create a deep copy of this dataset.

  • get_credible_intervals([level, method]) – Get credible intervals for dependent variable.

  • visualize([show_uncertainty, level, ...]) – Visualize the 1D dataset with optional uncertainty bands.

  • with_uncertainty([n_samples, method, ...]) – Add uncertainty quantification to dataset.

__init__(independent_variable_data, dependent_variable_data, conditions=None, details=None)[source]

Initialize one-dimensional dataset.

Parameters:
  • independent_variable_data (array_like) – 1D array of independent variable values.

  • dependent_variable_data (array_like) – 1D array of dependent variable values.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

Raises:

ValueError – If arrays have different shapes.

property independent_variable_data: ndarray

Get independent variable data as NumPy array.

Returns:

1D NumPy array of independent variable values.

Return type:

np.ndarray

Examples

>>> dataset.independent_variable_data
array([0., 0.1, 0.2, ..., 9.8, 9.9, 10.])

property dependent_variable_data: ndarray

Get dependent variable data as NumPy array.

Returns:

1D NumPy array of dependent variable values.

Return type:

np.ndarray

Examples

>>> dataset.dependent_variable_data
array([0.000, 0.100, 0.199, ..., -0.366, -0.458, -0.544])

with_uncertainty(n_samples=1000, method='bayesian', keep_samples=False, level=0.95)[source]

Add uncertainty quantification to dataset.

This method creates a new dataset with uncertainty information computed using the specified method. The original dataset is not modified.

Parameters:
  • n_samples (int, optional) – Number of samples for uncertainty quantification (default: 1000)

  • method (str, optional) – Method for uncertainty quantification (default: 'bayesian'). Options are 'bayesian' (NumPyro MCMC sampling), 'bootstrap' (bootstrap resampling, not yet implemented), and 'analytical' (analytical uncertainty propagation, not yet implemented).

  • keep_samples (bool, optional) – If True, store full posterior samples (default: False)

  • level (float, optional) – Credible interval level (default: 0.95)

Returns:

New dataset with uncertainty information

Return type:

OneDimensionalDataset

Raises:

NotImplementedError – If method is not ‘bayesian’

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> x = np.linspace(0, 10, 50)
>>> y = 2.0 * x + 1.0 + 0.1 * np.random.randn(len(x))
>>> dataset = OneDimensionalDataset(
...     independent_variable_data=x,
...     dependent_variable_data=y
... )
>>> # Add Bayesian uncertainty
>>> dataset_with_unc = dataset.with_uncertainty(
...     n_samples=1000,
...     method='bayesian',
...     keep_samples=False,
...     level=0.95
... )
>>> dataset_with_unc.has_uncertainty
True
>>> lower, upper = dataset_with_unc.credible_intervals
>>> # With full samples
>>> dataset_with_samples = dataset.with_uncertainty(
...     n_samples=1000,
...     keep_samples=True
... )
>>> samples = dataset_with_samples.uncertainty_samples
>>> sigma_samples = samples['sigma']

Notes

Currently only the ‘bayesian’ method is implemented. This uses a simple Gaussian noise model to estimate measurement uncertainty. Future versions will support custom priors and more sophisticated models.

The method creates a copy of the dataset to preserve immutability.

get_credible_intervals(level=0.95, method='eti')[source]

Get credible intervals for dependent variable.

Parameters:
  • level (float, optional) – Credible interval level (default: 0.95)

  • method (str, optional) – Method for computing intervals (default: 'eti'). Options are 'eti' (equal-tailed interval) and 'hpd' (highest posterior density, not yet implemented).

Returns:

(lower_bound, upper_bound) arrays with same shape as dependent variable

Return type:

tuple[np.ndarray, np.ndarray]

Raises:

Examples

>>> dataset_with_unc = dataset.with_uncertainty(n_samples=1000)
>>> lower, upper = dataset_with_unc.get_credible_intervals(level=0.95)
>>> # 68% interval (approximately 1 sigma)
>>> lower_68, upper_68 = dataset_with_unc.get_credible_intervals(level=0.68)

Notes

If credible intervals have been cached (from with_uncertainty call), they are returned directly. Otherwise, they are computed from stored uncertainty samples.

For the simple Gaussian noise model, the credible intervals represent the uncertainty in the measurement noise level, not the data points themselves.
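
An equal-tailed interval leaves the same probability mass in each tail: for level 0.95, the bounds are the 2.5% and 97.5% quantiles of the samples. The sketch below computes this for a flat list of samples with linearly interpolated quantiles; `equal_tailed_interval` is a hypothetical helper and does not reproduce how the library processes its stored posterior samples.

```python
def equal_tailed_interval(samples, level=0.95):
    """Equal-tailed interval from 1D samples using linearly interpolated quantiles."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be between 0 and 1")
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("samples must not be empty")

    def quantile(q):
        # Fractional position of quantile q within the sorted samples.
        position = q * (len(ordered) - 1)
        low = int(position)
        high = min(low + 1, len(ordered) - 1)
        fraction = position - low
        return ordered[low] + fraction * (ordered[high] - ordered[low])

    tail = (1.0 - level) / 2.0  # probability mass excluded on each side
    return quantile(tail), quantile(1.0 - tail)


samples = list(range(101))  # evenly spread values 0, 1, ..., 100
lower_50, upper_50 = equal_tailed_interval(samples, level=0.5)
```

For these evenly spread samples, the 50% interval spans the middle half of the range.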

visualize(show_uncertainty=False, level=0.95, figsize=(10, 6), xlabel=None, ylabel=None, title=None, **kwargs)[source]

Visualize the 1D dataset with optional uncertainty bands.

Creates a line plot of the data with optional shaded uncertainty regions when the dataset has uncertainty information.

Parameters:
  • show_uncertainty (bool, default False) – If True and dataset has uncertainty, show shaded error bands

  • level (float, default 0.95) – Credible interval level for uncertainty bands (e.g., 0.95 for 95% CI)

  • figsize (tuple[float, float], default (10, 6)) – Figure size in inches (width, height)

  • xlabel (str, optional) – Label for x-axis. If None, uses “Independent Variable”

  • ylabel (str, optional) – Label for y-axis. If None, uses “Dependent Variable”

  • title (str, optional) – Plot title. If None, no title is shown

  • **kwargs (Any) – Additional keyword arguments passed to matplotlib.pyplot.plot()

Returns:

(fig, ax) matplotlib figure and axis objects

Return type:

tuple

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> x = np.linspace(0, 10, 50)
>>> y = 2.0 * x + 1.0
>>> dataset = OneDimensionalDataset(
...     independent_variable_data=x,
...     dependent_variable_data=y
... )
>>> fig, ax = dataset.visualize(xlabel='Time (s)', ylabel='Signal (V)')
>>> # With uncertainty
>>> dataset_with_unc = dataset.with_uncertainty(n_samples=1000, method='bayesian')
>>> fig, ax = dataset_with_unc.visualize(
...     show_uncertainty=True,
...     level=0.95,
...     xlabel='Time (s)',
...     ylabel='Signal (V)'
... )

Notes

  • Requires matplotlib to be installed

  • For datasets with uncertainty, shaded bands show the credible intervals

  • Multiple credible interval levels can be shown by calling visualize multiple times

Two-Dimensional Dataset

Two-dimensional dataset with two independent variables and 2D dependent data.

Used for data that varies with two parameters, such as time-temperature maps, spatial imaging data, or parameter sweeps.

class piblin_jax.data.datasets.two_dimensional.TwoDimensionalDataset(independent_variable_data_1, independent_variable_data_2, dependent_variable_data, conditions=None, details=None)[source]

Bases: Dataset

Two-dimensional dataset with two independent variables and a 2D dependent array.

This dataset type represents data that varies with two independent parameters:

  • Time-temperature maps (kinetics studies)

  • Spatial imaging data (microscopy, spectroscopy maps)

  • Parameter sweep experiments

  • Contour plots and heatmaps

Parameters:
  • independent_variable_data_1 (array_like) – 1D array of first independent variable (e.g., temperature, x-coordinate).

  • independent_variable_data_2 (array_like) – 1D array of second independent variable (e.g., time, y-coordinate).

  • dependent_variable_data (array_like) – 2D array of dependent variable values with shape (len(var1), len(var2)).

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

independent_variable_data_1

First independent variable as NumPy array.

Type:

np.ndarray

independent_variable_data_2

Second independent variable as NumPy array.

Type:

np.ndarray

dependent_variable_data

2D dependent variable as NumPy array.

Type:

np.ndarray

conditions

Experimental conditions.

Type:

dict[str, Any]

details

Additional metadata.

Type:

dict[str, Any]

Raises:

ValueError – If dimension compatibility is violated.

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import TwoDimensionalDataset
>>> # Temperature-time kinetics map
>>> temperature = np.linspace(20, 100, 50)  # 50 temperatures
>>> time = np.linspace(0, 3600, 100)        # 100 time points
>>> # Reaction extent at each (temp, time) combination
>>> extent = np.random.rand(50, 100)
>>> dataset = TwoDimensionalDataset(
...     independent_variable_data_1=temperature,
...     independent_variable_data_2=time,
...     dependent_variable_data=extent,
...     conditions={"catalyst": "Pd/C", "solvent": "ethanol"},
...     details={"experiment_id": "KIN-2025-042"}
... )
>>> dataset.independent_variable_data_1.shape
(50,)
>>> dataset.dependent_variable_data.shape
(50, 100)
>>> # Spectroscopy imaging (spatial map)
>>> x_coords = np.linspace(0, 10, 64)
>>> y_coords = np.linspace(0, 10, 64)
>>> intensity_map = np.random.rand(64, 64)
>>> image = TwoDimensionalDataset(
...     independent_variable_data_1=x_coords,
...     independent_variable_data_2=y_coords,
...     dependent_variable_data=intensity_map,
...     conditions={"wavelength": 532, "power": 10},
...     details={"units": "microns", "resolution": "0.156 um/pixel"}
... )

Notes

The dependent variable must have shape (len(var1), len(var2)). Arrays are stored internally as backend arrays and converted to NumPy when accessed.
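
The shape convention means the first axis of the dependent data follows the first independent variable and the second axis follows the second. A small sketch with nested lists shows how such a grid is laid out; `evaluate_on_grid` and `shape_2d` are hypothetical helpers for illustration only.

```python
def evaluate_on_grid(var1, var2, func):
    """Return rows indexed by var1 and columns indexed by var2."""
    return [[func(a, b) for b in var2] for a in var1]


def shape_2d(grid):
    """Shape of a rectangular nested list as (rows, columns)."""
    return (len(grid), len(grid[0]) if grid else 0)


temperatures = [20.0, 60.0, 100.0]  # first independent variable (3 values)
times = [0.0, 10.0, 20.0, 30.0]     # second independent variable (4 values)

# Toy response: product of temperature and time at every grid point.
response = evaluate_on_grid(temperatures, times, lambda temp, t: temp * t)
grid_shape = shape_2d(response)
```

Indexing `response[i][j]` then gives the value at `temperatures[i]` and `times[j]`, matching the (len(var1), len(var2)) layout.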

Attributes:

  • conditions – Get experimental conditions.

  • credible_intervals – Get cached credible intervals.

  • dependent_variable_data – Get dependent variable data as NumPy array.

  • details – Get additional dataset details.

  • has_uncertainty – Check if dataset has uncertainty information.

  • independent_variable_data_1 – Get first independent variable as NumPy array.

  • independent_variable_data_2 – Get second independent variable as NumPy array.

  • uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).

Methods:

  • copy() – Create a deep copy of this dataset.

__init__(independent_variable_data_1, independent_variable_data_2, dependent_variable_data, conditions=None, details=None)[source]

Initialize two-dimensional dataset.

Parameters:
  • independent_variable_data_1 (array_like) – 1D array of first independent variable.

  • independent_variable_data_2 (array_like) – 1D array of second independent variable.

  • dependent_variable_data (array_like) – 2D array of dependent variable.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

Raises:

ValueError – If dimension compatibility is violated.

property independent_variable_data_1: ndarray

Get first independent variable as NumPy array.

Returns:

1D NumPy array of first independent variable values.

Return type:

np.ndarray

Examples

>>> dataset.independent_variable_data_1
array([20., 21.63, 23.27, ..., 98.37, 100.])

property independent_variable_data_2: ndarray

Get second independent variable as NumPy array.

Returns:

1D NumPy array of second independent variable values.

Return type:

np.ndarray

Examples

>>> dataset.independent_variable_data_2
array([0., 36.36, 72.73, ..., 3527.27, 3563.64, 3600.])

property dependent_variable_data: ndarray

Get dependent variable data as NumPy array.

Returns:

2D NumPy array of dependent variable values with shape (len(var1), len(var2)).

Return type:

np.ndarray

Examples

>>> dataset.dependent_variable_data.shape
(50, 100)
>>> dataset.dependent_variable_data[0, 0]  # Value at first (temp, time)
0.234

Three-Dimensional Dataset

Three-dimensional dataset with three independent variables and 3D dependent data.

Used for volumetric data, 3D imaging, or data varying with three parameters.

class piblin_jax.data.datasets.three_dimensional.ThreeDimensionalDataset(independent_variable_data_1, independent_variable_data_2, independent_variable_data_3, dependent_variable_data, conditions=None, details=None)[source]

Bases: Dataset

Three-dimensional dataset with three independent variables and a 3D dependent array.

This dataset type represents volumetric or 3D data:

  • 3D microscopy/imaging (confocal, CT, MRI)

  • Volumetric spectroscopy

  • Three-parameter experiments (e.g., temperature, pressure, time)

  • Computational fluid dynamics results

  • Molecular dynamics trajectories

Parameters:
  • independent_variable_data_1 (array_like) – 1D array of first independent variable (e.g., x-coordinate, temperature).

  • independent_variable_data_2 (array_like) – 1D array of second independent variable (e.g., y-coordinate, pressure).

  • independent_variable_data_3 (array_like) – 1D array of third independent variable (e.g., z-coordinate, time).

  • dependent_variable_data (array_like) – 3D array of dependent variable values with shape (len(var1), len(var2), len(var3)).

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

independent_variable_data_1

First independent variable as NumPy array.

Type:

np.ndarray

independent_variable_data_2

Second independent variable as NumPy array.

Type:

np.ndarray

independent_variable_data_3

Third independent variable as NumPy array.

Type:

np.ndarray

dependent_variable_data

3D dependent variable as NumPy array.

Type:

np.ndarray

conditions

Experimental conditions.

Type:

dict[str, Any]

details

Additional metadata.

Type:

dict[str, Any]

Raises:

ValueError – If the array dimensions are incompatible, e.g., dependent_variable_data does not have shape (len(var1), len(var2), len(var3)).

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import ThreeDimensionalDataset
>>> # 3D confocal microscopy data
>>> x = np.linspace(0, 100, 64)  # microns
>>> y = np.linspace(0, 100, 64)  # microns
>>> z = np.linspace(0, 50, 32)   # microns (z-stack)
>>> intensity = np.random.rand(64, 64, 32)
>>> volume = ThreeDimensionalDataset(
...     independent_variable_data_1=x,
...     independent_variable_data_2=y,
...     independent_variable_data_3=z,
...     dependent_variable_data=intensity,
...     conditions={"wavelength": 488, "objective": "40x"},
...     details={"voxel_size": "1.59 x 1.59 x 1.61 um"}
... )
>>> volume.dependent_variable_data.shape
(64, 64, 32)
>>> # Three-parameter experiment (T, P, t)
>>> temperatures = np.array([25, 50, 75, 100])
>>> pressures = np.array([1, 5, 10, 15, 20])
>>> times = np.array([0, 60, 120, 180])
>>> conversion = np.random.rand(4, 5, 4)
>>> experiment = ThreeDimensionalDataset(
...     independent_variable_data_1=temperatures,
...     independent_variable_data_2=pressures,
...     independent_variable_data_3=times,
...     dependent_variable_data=conversion,
...     conditions={"catalyst": "Pt/Al2O3", "reactant": "H2 + CO"},
...     details={"experiment": "Fischer-Tropsch"}
... )

Notes

The dependent variable must have shape (len(var1), len(var2), len(var3)). Arrays are stored internally as backend arrays and converted to NumPy when accessed.
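The shape requirement can be illustrated without piblin_jax. The following is a minimal sketch using plain nested lists; `nested_shape` and `check_3d_compatibility` are hypothetical helpers introduced here for illustration, and the library's own validation may differ in detail.

```python
def nested_shape(values):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    level = values
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0] if level else None
    return tuple(shape)


def check_3d_compatibility(var1, var2, var3, dependent):
    """Raise ValueError unless dependent has shape (len(var1), len(var2), len(var3))."""
    expected = (len(var1), len(var2), len(var3))
    actual = nested_shape(dependent)
    if actual != expected:
        raise ValueError(f"expected shape {expected}, got {actual}")
    return expected


temperatures = [25, 50, 75, 100]
pressures = [1, 5, 10, 15, 20]
times = [0, 60, 120, 180]
# 4 x 5 x 4 block of zeros standing in for measured conversion values
conversion = [[[0.0 for _ in times] for _ in pressures] for _ in temperatures]

shape = check_3d_compatibility(temperatures, pressures, times, conversion)
```

The axis order matters: the first axis of the dependent array follows the first independent variable, the second follows the second, and so on.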

Attributes:
conditions

Get experimental conditions.

credible_intervals

Get cached credible intervals.

dependent_variable_data

Get dependent variable data as NumPy array.

details

Get additional dataset details.

has_uncertainty

Check if dataset has uncertainty information.

independent_variable_data_1

Get first independent variable as NumPy array.

independent_variable_data_2

Get second independent variable as NumPy array.

independent_variable_data_3

Get third independent variable as NumPy array.

uncertainty_samples

Get uncertainty samples (if keep_samples=True was used).

Methods

copy()

Create a deep copy of this dataset.

__init__(independent_variable_data_1, independent_variable_data_2, independent_variable_data_3, dependent_variable_data, conditions=None, details=None)[source]

Initialize three-dimensional dataset.

Parameters:
  • independent_variable_data_1 (array_like) – 1D array of first independent variable.

  • independent_variable_data_2 (array_like) – 1D array of second independent variable.

  • independent_variable_data_3 (array_like) – 1D array of third independent variable.

  • dependent_variable_data (array_like) – 3D array of dependent variable.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

Raises:

ValueError – If dimension compatibility is violated.

property independent_variable_data_1: ndarray

Get first independent variable as NumPy array.

Returns:

1D NumPy array of first independent variable values.

Return type:

np.ndarray

Examples

>>> dataset.independent_variable_data_1
array([0., 1.59, 3.17, ..., 98.41, 100.])

property independent_variable_data_2: ndarray

Get second independent variable as NumPy array.

Returns:

1D NumPy array of second independent variable values.

Return type:

np.ndarray

Examples

>>> dataset.independent_variable_data_2
array([0., 1.59, 3.17, ..., 98.41, 100.])

property independent_variable_data_3: ndarray

Get third independent variable as NumPy array.

Returns:

1D NumPy array of third independent variable values.

Return type:

np.ndarray

Examples

>>> dataset.independent_variable_data_3
array([0., 1.61, 3.23, ..., 48.39, 50.])

property dependent_variable_data: ndarray

Get dependent variable data as NumPy array.

Returns:

3D NumPy array of dependent variable values with shape (len(var1), len(var2), len(var3)).

Return type:

np.ndarray

Examples

>>> dataset.dependent_variable_data.shape
(64, 64, 32)
>>> dataset.dependent_variable_data[0, 0, 0]  # Value at first point
0.456

Composite Dataset

Composite one-dimensional dataset with multiple dependent variables.

Used for multi-channel instrument data where multiple signals share the same independent variable (e.g., time, wavelength).

class piblin_jax.data.datasets.composite.OneDimensionalCompositeDataset(independent_variable_data, dependent_variable_data_list, conditions=None, details=None)[source]

Bases: Dataset

Composite 1D dataset with shared independent variable and multiple dependents.

This dataset type represents multi-channel or multi-detector data where multiple signals share the same independent variable:

  • Multi-detector chromatography (UV, fluorescence, conductivity)

  • Multi-channel spectroscopy

  • Multi-sensor time series

  • Parallel measurements with shared axis

Parameters:
  • independent_variable_data (array_like) – 1D array of independent variable (time, wavelength, etc.) shared by all channels.

  • dependent_variable_data_list (list of array_like) – List of 1D arrays, each representing a different channel/detector. All must have the same length as independent_variable_data.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

independent_variable_data

Shared independent variable as NumPy array.

Type:

np.ndarray

dependent_variable_data_list

List of dependent variables as NumPy arrays.

Type:

list of np.ndarray

conditions

Experimental conditions.

Type:

dict[str, Any]

details

Additional metadata.

Type:

dict[str, Any]

Raises:

ValueError – If dependent_variable_data_list is empty, or if any channel has a different length from independent_variable_data.

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalCompositeDataset
>>> # Multi-detector HPLC data
>>> time = np.linspace(0, 20, 2000)  # minutes
>>> uv_254 = np.sin(time) + 0.1 * np.random.randn(2000)
>>> uv_280 = np.cos(time) + 0.1 * np.random.randn(2000)
>>> fluorescence = np.sin(2 * time) + 0.05 * np.random.randn(2000)
>>> hplc = OneDimensionalCompositeDataset(
...     independent_variable_data=time,
...     dependent_variable_data_list=[uv_254, uv_280, fluorescence],
...     conditions={"mobile_phase": "ACN/H2O 60:40", "flow_rate": 1.0},
...     details={
...         "channels": ["UV 254nm", "UV 280nm", "Fluorescence"],
...         "instrument": "HPLC-1"
...     }
... )
>>> hplc.independent_variable_data.shape
(2000,)
>>> len(hplc.dependent_variable_data_list)
3
>>> hplc.dependent_variable_data_list[0].shape
(2000,)
>>> # Multi-channel oscilloscope data
>>> t = np.linspace(0, 1, 10000)
>>> ch1 = np.sin(2 * np.pi * 5 * t)
>>> ch2 = np.sin(2 * np.pi * 10 * t)
>>> ch3 = np.sin(2 * np.pi * 15 * t)
>>> ch4 = np.sin(2 * np.pi * 20 * t)
>>> scope_data = OneDimensionalCompositeDataset(
...     independent_variable_data=t,
...     dependent_variable_data_list=[ch1, ch2, ch3, ch4],
...     conditions={"sampling_rate": 10000},
...     details={"instrument": "oscilloscope", "channels": 4}
... )

Notes

This dataset type is useful when multiple measurements are made simultaneously along the same independent axis. Each channel is stored as a separate NumPy array in the list, allowing different processing or analysis on each channel while maintaining their shared relationship through the common independent variable.

The internal storage uses backend arrays (JAX when available) and converts to NumPy at the property boundaries.
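To illustrate how the shared axis ties the channels together, here is a small sketch in plain Python. It does not use piblin_jax; `peak_positions` is a hypothetical helper, and the lists stand in for the arrays returned by independent_variable_data and dependent_variable_data_list.

```python
def peak_positions(independent, channels):
    """Return, for each channel, the independent-variable value at its maximum."""
    positions = []
    for channel in channels:
        if len(channel) != len(independent):
            raise ValueError("channel length must match the independent variable")
        # Index of the largest value in this channel
        peak_index = max(range(len(channel)), key=channel.__getitem__)
        positions.append(independent[peak_index])
    return positions


time = [0.0, 0.5, 1.0, 1.5, 2.0]          # minutes
uv_254 = [0.1, 0.9, 0.4, 0.2, 0.1]        # largest value at t = 0.5
fluorescence = [0.0, 0.1, 0.3, 0.8, 0.2]  # largest value at t = 1.5

peaks = peak_positions(time, [uv_254, fluorescence])
```

Each channel is processed on its own, but every result is expressed on the common time axis, which is the relationship the composite dataset preserves.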

Attributes:
conditions

Get experimental conditions.

credible_intervals

Get cached credible intervals.

dependent_variable_data_list

Get list of dependent variables as NumPy arrays.

details

Get additional dataset details.

has_uncertainty

Check if dataset has uncertainty information.

independent_variable_data

Get shared independent variable as NumPy array.

uncertainty_samples

Get uncertainty samples (if keep_samples=True was used).

Methods

copy()

Create a deep copy of this dataset.

__init__(independent_variable_data, dependent_variable_data_list, conditions=None, details=None)[source]

Initialize composite one-dimensional dataset.

Parameters:
  • independent_variable_data (array_like) – 1D array of shared independent variable.

  • dependent_variable_data_list (list of array_like) – List of 1D arrays for each channel.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

Raises:

ValueError – If the list is empty or any channel's length does not match the independent variable.

property independent_variable_data: ndarray

Get shared independent variable as NumPy array.

Returns:

1D NumPy array of independent variable shared by all channels.

Return type:

np.ndarray

Examples

>>> dataset.independent_variable_data
array([0., 0.01, 0.02, ..., 19.98, 19.99, 20.])

property dependent_variable_data_list: list[ndarray]

Get list of dependent variables as NumPy arrays.

Returns:

List of 1D NumPy arrays, one for each channel/detector.

Return type:

list of np.ndarray

Examples

>>> len(dataset.dependent_variable_data_list)
3
>>> dataset.dependent_variable_data_list[0]  # First channel
array([0.123, 0.145, ..., 0.234])
>>> dataset.dependent_variable_data_list[1]  # Second channel
array([0.456, 0.478, ..., 0.567])
>>> # Process each channel
>>> for i, channel in enumerate(dataset.dependent_variable_data_list):
...     print(f"Channel {i}: max = {channel.max():.3f}")
Channel 0: max = 1.234
Channel 1: max = 1.567
Channel 2: max = 0.987

Distribution Dataset

Distribution dataset for continuous probability density functions.

Used for molecular weight distributions, continuous PDFs, and other distribution data where the probability density is a continuous function.

class piblin_jax.data.datasets.distribution.Distribution(variable_data, probability_density, conditions=None, details=None)[source]

Bases: Dataset

Distribution dataset with variable data and probability density.

This dataset type represents continuous probability density functions:

  • Molecular weight distributions (GPC/SEC)

  • Particle size distributions (continuous)

  • Statistical distributions

  • Probability density functions

  • Any continuous distribution data

Parameters:
  • variable_data (array_like) – 1D array of the variable (e.g., molecular weight, particle size).

  • probability_density (array_like) – 1D array of probability density values corresponding to variable_data. Must have the same shape as variable_data.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

variable_data

Variable data as NumPy array.

Type:

np.ndarray

probability_density

Probability density as NumPy array.

Type:

np.ndarray

conditions

Experimental conditions.

Type:

dict[str, Any]

details

Additional metadata.

Type:

dict[str, Any]

Raises:

ValueError – If variable_data and probability_density have different shapes.

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import Distribution
>>> # Molecular weight distribution from GPC
>>> molecular_weight = np.linspace(1000, 100000, 500)
>>> # Gaussian-like distribution centered at 50000
>>> pdf = np.exp(-((molecular_weight - 50000) ** 2) / (2 * 10000 ** 2))
>>> # Normalize so integral equals 1
>>> pdf = pdf / np.trapz(pdf, molecular_weight)
>>> mwd = Distribution(
...     variable_data=molecular_weight,
...     probability_density=pdf,
...     conditions={"polymer": "polystyrene", "solvent": "THF"},
...     details={"technique": "GPC", "standard": "PS"}
... )
>>> mwd.variable_data.shape
(500,)
>>> mwd.probability_density.shape
(500,)
>>> # Particle size distribution
>>> diameter = np.linspace(1, 1000, 1000)  # nm
>>> psd = np.exp(-((np.log(diameter) - np.log(100)) ** 2) / (2 * 0.5 ** 2))
>>> psd = psd / np.trapz(psd, diameter)
>>> particle_dist = Distribution(
...     variable_data=diameter,
...     probability_density=psd,
...     conditions={"sample": "nanoparticles_Au"},
...     details={"units": "nm", "technique": "DLS"}
... )
>>> # Custom probability distribution
>>> x = np.linspace(-5, 5, 1000)
>>> pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
>>> normal_dist = Distribution(
...     variable_data=x,
...     probability_density=pdf,
...     details={"distribution": "standard normal"}
... )

Notes

Unlike Histogram, which represents discrete bins, Distribution represents a continuous probability density function. The probability density values are typically normalized so that the integral over the variable range equals 1, but the class does not enforce this.

The distinction between Distribution and OneDimensionalDataset is primarily semantic: Distribution emphasizes that the dependent variable represents a probability density, while OneDimensionalDataset is more general.
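Because normalization is left to the caller, it can help to check it before constructing a Distribution. The following is a dependency-free sketch of that check using the composite trapezoidal rule; `trapezoid_integral` and `normalize` are hypothetical helpers written for this example, not part of piblin_jax.

```python
import math


def trapezoid_integral(x, y):
    """Integrate y over x with the composite trapezoidal rule."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(
        0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
        for i in range(len(x) - 1)
    )


def normalize(x, density):
    """Scale density so that its trapezoidal integral over x equals 1."""
    area = trapezoid_integral(x, density)
    if area <= 0:
        raise ValueError("density must have positive area")
    return [value / area for value in density]


# Unnormalized Gaussian-shaped curve sampled on [-5, 5]
x = [-5 + 10 * i / 1000 for i in range(1001)]
raw = [math.exp(-xi ** 2 / 2) for xi in x]

pdf = normalize(x, raw)
area = trapezoid_integral(x, pdf)  # close to 1 after normalization
```

The same integral, applied to the arrays returned by variable_data and probability_density, tells you whether an existing Distribution is normalized.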

Attributes:
conditions

Get experimental conditions.

credible_intervals

Get cached credible intervals.

details

Get additional dataset details.

has_uncertainty

Check if dataset has uncertainty information.

probability_density

Get probability density as NumPy array.

uncertainty_samples

Get uncertainty samples (if keep_samples=True was used).

variable_data

Get variable data as NumPy array.

Methods

copy()

Create a deep copy of this dataset.

__init__(variable_data, probability_density, conditions=None, details=None)[source]

Initialize distribution dataset.

Parameters:
  • variable_data (array_like) – 1D array of variable values.

  • probability_density (array_like) – 1D array of probability density values.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

Raises:

ValueError – If arrays have different shapes.

property variable_data: ndarray

Get variable data as NumPy array.

Returns:

1D NumPy array of variable values (e.g., molecular weight, particle size, x-values).

Return type:

np.ndarray

Examples

>>> dist.variable_data
array([1000., 1198., 1396., ..., 99604., 99802., 100000.])

property probability_density: ndarray

Get probability density as NumPy array.

Returns:

1D NumPy array of probability density values.

Return type:

np.ndarray

Examples

>>> dist.probability_density
array([0.000001, 0.000002, ..., 0.000003, 0.000001])
>>> # Check normalization (should be close to 1)
>>> np.trapz(dist.probability_density, dist.variable_data)
1.0000234

Histogram Dataset

Histogram dataset for binned data with variable-width bins.

Used for particle size distributions, histograms, and other binned data where bin widths may vary.

class piblin_jax.data.datasets.histogram.Histogram(bin_edges, counts, conditions=None, details=None)[source]

Bases: Dataset

Histogram dataset with bin edges and counts.

This dataset type represents binned data with potentially variable-width bins:

  • Particle size distributions

  • Molecular weight distributions (binned)

  • Intensity histograms

  • Frequency distributions

  • Any data organized into discrete bins

Parameters:
  • bin_edges (array_like) – 1D array of bin edges. For n bins, this array has n+1 elements. Bins are defined as [bin_edges[i], bin_edges[i+1]).

  • counts (array_like) – 1D array of counts or frequencies in each bin. Must have length n (one less than bin_edges).

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

bin_edges

Bin edges as NumPy array.

Type:

np.ndarray

counts

Counts per bin as NumPy array.

Type:

np.ndarray

conditions

Experimental conditions.

Type:

dict[str, Any]

details

Additional metadata.

Type:

dict[str, Any]

Raises:

ValueError – If counts length is not compatible with bin_edges (must be len(bin_edges) - 1).
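The half-open bin convention [bin_edges[i], bin_edges[i+1]) can be sketched as a lookup with the standard-library bisect module. `find_bin` is a hypothetical helper for illustration only; the source does not say whether Histogram offers such a method.

```python
from bisect import bisect_right


def find_bin(bin_edges, value):
    """Return the index i with bin_edges[i] <= value < bin_edges[i + 1], or None."""
    # Values below the first edge or at/above the last edge fall in no bin.
    if value < bin_edges[0] or value >= bin_edges[-1]:
        return None
    return bisect_right(bin_edges, value) - 1


bin_edges = [0, 1, 3, 6, 10, 20]

inside = find_bin(bin_edges, 2.5)    # bin 1 covers [1, 3)
on_edge = find_bin(bin_edges, 3)     # a left edge belongs to the bin it opens: bin 2
at_end = find_bin(bin_edges, 20)     # the last edge is excluded: None
```

Under this convention a value equal to an interior edge always belongs to the bin on its right, so no value is counted twice.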

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import Histogram
>>> # Particle size distribution with variable-width bins
>>> bin_edges = np.array([0, 1, 3, 6, 10, 20])  # 5 bins
>>> counts = np.array([12, 45, 67, 34, 8])      # 5 counts
>>> psd = Histogram(
...     bin_edges=bin_edges,
...     counts=counts,
...     conditions={"sample": "nanoparticles_batch_42"},
...     details={"units": "nm", "technique": "DLS"}
... )
>>> psd.bin_edges
array([0, 1, 3, 6, 10, 20])
>>> psd.counts
array([12, 45, 67, 34, 8])
>>> # Intensity histogram
>>> # Equal-width bins for pixel intensities
>>> bins = np.linspace(0, 255, 256)  # 255 bins
>>> hist_counts = np.random.poisson(100, 255)
>>> intensity_hist = Histogram(
...     bin_edges=bins,
...     counts=hist_counts,
...     conditions={"image": "sample_001.tif"},
...     details={"bit_depth": 8}
... )

Notes

Unlike Distribution, Histogram represents discrete bins rather than a continuous probability density. The bin_edges array has one more element than the counts array. For variable-width bins, the bin width can be computed as np.diff(bin_edges).
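With variable-width bins, raw counts are not directly comparable between bins, so it is common to divide by the bin width. The sketch below shows that conversion in plain Python; `bin_widths` and `counts_to_density` are hypothetical helpers, not piblin_jax functions.

```python
def bin_widths(bin_edges):
    """Width of each bin; the pure-Python analogue of np.diff(bin_edges)."""
    return [hi - lo for lo, hi in zip(bin_edges, bin_edges[1:])]


def counts_to_density(bin_edges, counts):
    """Convert counts to a density whose sum of density * width equals 1."""
    if len(counts) != len(bin_edges) - 1:
        raise ValueError("counts must have len(bin_edges) - 1 elements")
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an empty histogram")
    return [c / (total * w) for c, w in zip(counts, bin_widths(bin_edges))]


bin_edges = [0, 1, 3, 6, 10, 20]
counts = [12, 45, 67, 34, 8]

widths = bin_widths(bin_edges)                    # [1, 2, 3, 4, 10]
density = counts_to_density(bin_edges, counts)
```

The resulting values are piecewise-constant density estimates, one per bin, which is the usual bridge from a Histogram to a Distribution-like representation.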

Attributes:
bin_edges

Get bin edges as NumPy array.

conditions

Get experimental conditions.

counts

Get bin counts as NumPy array.

credible_intervals

Get cached credible intervals.

details

Get additional dataset details.

has_uncertainty

Check if dataset has uncertainty information.

uncertainty_samples

Get uncertainty samples (if keep_samples=True was used).

Methods

copy()

Create a deep copy of this dataset.

__init__(bin_edges, counts, conditions=None, details=None)[source]

Initialize histogram dataset.

Parameters:
  • bin_edges (array_like) – 1D array of bin edges (n+1 elements for n bins).

  • counts (array_like) – 1D array of counts (n elements).

  • conditions (dict[str, Any] | None, optional) – Experimental conditions.

  • details (dict[str, Any] | None, optional) – Additional metadata.

Raises:

ValueError – If counts length is not compatible with bin_edges.

property bin_edges: ndarray

Get bin edges as NumPy array.

Returns:

1D NumPy array of bin edges. For n bins, has n+1 elements.

Return type:

np.ndarray

Examples

>>> hist.bin_edges
array([0, 1, 3, 6, 10, 20])
>>> # Bin widths can be computed as:
>>> np.diff(hist.bin_edges)
array([1, 2, 3, 4, 10])

property counts: ndarray

Get bin counts as NumPy array.

Returns:

1D NumPy array of counts or frequencies in each bin.

Return type:

np.ndarray

Examples

>>> hist.counts
array([12, 45, 67, 34, 8])
>>> # Total count
>>> hist.counts.sum()
166

Collections

Measurement

Measurement class for piblin-jax.

Container for multiple Dataset objects representing a single measurement event.

class piblin_jax.data.collections.measurement.Measurement(datasets, conditions=None, details=None)[source]

Bases: object

Container for multiple Dataset objects from a single measurement.

A Measurement represents a single experimental measurement event that may produce multiple datasets (e.g., multiple channels, multiple observables). The collection is immutable for JAX compatibility.

Parameters:
  • datasets (list[Dataset]) – List of Dataset objects from this measurement.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions specific to this measurement (e.g., timestamp, replicate number, environmental conditions).

  • details (dict[str, Any] | None, optional) – Additional context for this measurement (e.g., quality flags, operator notes, instrument state).

datasets

Immutable tuple of datasets from this measurement.

Type:

tuple[Dataset, ...]

conditions

Experimental conditions for this measurement.

Type:

dict[str, Any]

details

Additional metadata for this measurement.

Type:

dict[str, Any]

Notes

The datasets are stored as a tuple to ensure immutability, which is required for JAX transformations. Individual datasets can be accessed by indexing or iteration.
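The tuple-backed storage pattern described here can be sketched in a few lines. `TupleBackedCollection` is a hypothetical stand-in written for this example; it is not the Measurement implementation, only an illustration of the same access protocol (len, iteration, indexing, slicing).

```python
class TupleBackedCollection:
    """Hypothetical stand-in showing tuple storage behind a read-only property."""

    def __init__(self, items):
        # Copy into a tuple so later changes to the caller's list are not seen.
        self._items = tuple(items)

    @property
    def items(self):
        return self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __getitem__(self, index):
        # An int returns one item; a slice returns a tuple of items.
        return self._items[index]


source = ["ds1", "ds2"]
collection = TupleBackedCollection(source)
source.append("ds3")  # does not affect the collection
```

Because the property has no setter and the stored object is a tuple, the contents cannot be reassigned or appended to through the public interface.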

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import Measurement
>>>
>>> # Create datasets for multiple channels
>>> x = np.linspace(0, 10, 100)
>>> y_ch1 = np.sin(x)
>>> y_ch2 = np.cos(x)
>>>
>>> ds1 = OneDimensionalDataset(x, y_ch1, conditions={"channel": "A"})
>>> ds2 = OneDimensionalDataset(x, y_ch2, conditions={"channel": "B"})
>>>
>>> # Create measurement with both channels
>>> measurement = Measurement(
...     datasets=[ds1, ds2],
...     conditions={"temperature": 25.0, "replicate": 1},
...     details={"timestamp": "2025-10-18 10:00:00"}
... )
>>>
>>> # Access datasets
>>> len(measurement)
2
>>> first_dataset = measurement[0]
>>> for ds in measurement:
...     print(ds.conditions["channel"])
A
B

Attributes:
conditions

Get experimental conditions for this measurement.

datasets

Get all datasets in this measurement.

details

Get additional details for this measurement.

__init__(datasets, conditions=None, details=None)[source]

Initialize Measurement with datasets and metadata.

Parameters:
  • datasets (list[Dataset]) – List of Dataset objects from this measurement.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for this measurement.

  • details (dict[str, Any] | None, optional) – Additional context for this measurement.

property datasets: tuple[Dataset, ...]

Get all datasets in this measurement.

Returns:

Immutable tuple of Dataset objects.

Return type:

tuple[Dataset, ...]

Examples

>>> measurement.datasets
(<OneDimensionalDataset at 0x...>, <OneDimensionalDataset at 0x...>)

property conditions: dict[str, Any]

Get experimental conditions for this measurement.

Returns:

Dictionary of experimental conditions (timestamp, replicate, etc.).

Return type:

dict[str, Any]

Examples

>>> measurement.conditions
{'temperature': 25.0, 'replicate': 1, 'timestamp': '10:00:00'}

property details: dict[str, Any]

Get additional details for this measurement.

Returns:

Dictionary of additional context (quality flags, notes, etc.).

Return type:

dict[str, Any]

Examples

>>> measurement.details
{'quality': 'good', 'operator': 'John Doe'}

__len__()[source]

Get number of datasets in this measurement.

Returns:

Number of datasets.

Return type:

int

Examples

>>> len(measurement)
2

__iter__()[source]

Iterate over datasets in this measurement.

Yields:

Dataset – Each dataset in order.

Examples

>>> for dataset in measurement:
...     print(type(dataset).__name__)
OneDimensionalDataset
OneDimensionalDataset

__getitem__(index)[source]

Get dataset by index.

Parameters:

index (int or slice) – Index or slice to access datasets.

Returns:

Dataset at the given index, or tuple of datasets for slice.

Return type:

Dataset or tuple[Dataset, ...]

Examples

>>> measurement[0]
<OneDimensionalDataset at 0x...>
>>> measurement[0:2]
(<OneDimensionalDataset at 0x...>, <OneDimensionalDataset at 0x...>)

MeasurementSet

MeasurementSet base class for piblin-jax.

Container for multiple Measurement objects representing a series of related measurements.

class piblin_jax.data.collections.measurement_set.MeasurementSet(measurements, conditions=None, details=None)[source]

Bases: object

Base class for collections of Measurement objects.

A MeasurementSet represents a series of related measurements, such as:

  • Time series measurements

  • Replicate measurements

  • Parameter sweep measurements

  • Multi-sample measurements

This is the base class. Specialized variants include:

  • ConsistentMeasurementSet: All measurements have the same structure

  • TidyMeasurementSet: All measurements share comparable conditions

  • TabularMeasurementSet: Measurements arranged in tabular format

Parameters:
  • measurements (list[Measurement]) – List of Measurement objects in this set.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for the entire measurement series (e.g., sample, experimental setup, date).

  • details (dict[str, Any] | None, optional) – Additional context for this measurement series (e.g., series description, experimental notes).

measurements

Immutable tuple of measurements in this set.

Type:

tuple[Measurement, ...]

conditions

Experimental conditions for this measurement series.

Type:

dict[str, Any]

details

Additional metadata for this measurement series.

Type:

dict[str, Any]

Notes

The measurements are stored as a tuple to ensure immutability, which is required for JAX transformations. Individual measurements can be accessed by indexing or iteration.

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import Measurement, MeasurementSet
>>>
>>> # Create replicate measurements
>>> x = np.linspace(0, 10, 100)
>>> measurements = []
>>> for i in range(3):
...     y = np.sin(x) + np.random.normal(0, 0.1, len(x))
...     ds = OneDimensionalDataset(x, y)
...     m = Measurement(
...         datasets=[ds],
...         conditions={"replicate": i+1}
...     )
...     measurements.append(m)
>>>
>>> # Create measurement set
>>> ms = MeasurementSet(
...     measurements=measurements,
...     conditions={"sample": "S1", "date": "2025-10-18"},
...     details={"notes": "Replicate measurements with noise"}
... )
>>>
>>> # Access measurements
>>> len(ms)
3
>>> first_measurement = ms[0]
>>> for m in ms:
...     print(m.conditions["replicate"])
1
2
3

Attributes:
conditions

Get experimental conditions for this measurement series.

details

Get additional details for this measurement series.

measurements

Get all measurements in this set.

__init__(measurements, conditions=None, details=None)[source]

Initialize MeasurementSet with measurements and metadata.

Parameters:
  • measurements (list[Measurement]) – List of Measurement objects in this set.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for this measurement series.

  • details (dict[str, Any] | None, optional) – Additional context for this measurement series.

property measurements: tuple[Measurement, ...]

Get all measurements in this set.

Returns:

Immutable tuple of Measurement objects.

Return type:

tuple[Measurement, ...]

Examples

>>> ms.measurements
(<Measurement at 0x...>, <Measurement at 0x...>, <Measurement at 0x...>)

property conditions: dict[str, Any]

Get experimental conditions for this measurement series.

Returns:

Dictionary of experimental conditions (sample, date, setup, etc.).

Return type:

dict[str, Any]

Examples

>>> ms.conditions
{'sample': 'S1', 'date': '2025-10-18', 'instrument': 'Spec-X'}

property details: dict[str, Any]

Get additional details for this measurement series.

Returns:

Dictionary of additional context (notes, quality, etc.).

Return type:

dict[str, Any]

Examples

>>> ms.details
{'notes': 'Time series', 'quality': 'good'}

__len__()[source]

Get number of measurements in this set.

Returns:

Number of measurements.

Return type:

int

Examples

>>> len(ms)
3

__iter__()[source]

Iterate over measurements in this set.

Yields:

Measurement – Each measurement in order.

Examples

>>> for measurement in ms:
...     print(len(measurement))
1
1
1

__getitem__(index)[source]

Get measurement by index.

Parameters:

index (int or slice) – Index or slice to access measurements.

Returns:

Measurement at the given index, or tuple of measurements for slice.

Return type:

Measurement or tuple[Measurement, ...]

Examples

>>> ms[0]
<Measurement at 0x...>
>>> ms[0:2]
(<Measurement at 0x...>, <Measurement at 0x...>)

ConsistentMeasurementSet

ConsistentMeasurementSet class for piblin-jax.

MeasurementSet variant where all measurements have the same dataset structure.

class piblin_jax.data.collections.consistent_measurement_set.ConsistentMeasurementSet(measurements, conditions=None, details=None)[source]

Bases: MeasurementSet

MeasurementSet where all measurements have the same dataset structure.

This specialized variant enforces that all measurements contain datasets of the same types in the same order. This is useful for:

  • Replicate measurements (same protocol, multiple runs)

  • Time series measurements (same observables at different times)

  • Consistent multi-channel measurements

The structural consistency enables array-based operations and easier data aggregation.
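As one example of the aggregation that a consistent structure makes possible, the sketch below averages replicate signals point by point. It uses plain lists rather than piblin_jax objects; `mean_across_replicates` is a hypothetical helper, and in practice the inputs would be the dependent-variable arrays taken from each measurement.

```python
def mean_across_replicates(replicates):
    """Pointwise mean of equally long replicate signals."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    n_points = len(replicates[0])
    if any(len(signal) != n_points for signal in replicates):
        raise ValueError("all replicates must have the same length")
    # zip(*replicates) groups the i-th value of every replicate together
    return [sum(values) / len(replicates) for values in zip(*replicates)]


replicates = [
    [1.0, 2.0, 3.0],
    [1.2, 1.8, 3.3],
    [0.8, 2.2, 2.7],
]

mean_signal = mean_across_replicates(replicates)  # approximately [1.0, 2.0, 3.0]
```

The length check is the part that a consistent set makes reliable: if every measurement has the same dataset types in the same order, stacking them this way is well defined.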

Parameters:
  • measurements (list[Measurement]) – List of Measurement objects. All must have the same structure.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for the measurement series.

  • details (dict[str, Any] | None, optional) – Additional context for the measurement series.

Raises:

ValueError – If measurements do not all have the same dataset structure.

Notes

Structure is defined as the sequence of dataset types. For example:

  • [OneDimensionalDataset, OneDimensionalDataset] is consistent with itself

  • [OneDimensionalDataset] is NOT consistent with [ZeroDimensionalDataset]

  • [OneDimensionalDataset, ZeroDimensionalDataset] is NOT consistent with [ZeroDimensionalDataset, OneDimensionalDataset] (order matters)
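The rule above can be sketched as a comparison of type sequences. Everything in this block is hypothetical: `ZeroD`, `OneD`, and `FakeMeasurement` are placeholders for the real classes, and comparing exact types (rather than allowing subclasses) is an assumption, since the source does not specify how the library treats subclasses.

```python
class ZeroD:
    """Placeholder for a zero-dimensional dataset type."""


class OneD:
    """Placeholder for a one-dimensional dataset type."""


class FakeMeasurement:
    """Placeholder measurement that only stores its datasets as a tuple."""

    def __init__(self, datasets):
        self.datasets = tuple(datasets)


def structure(measurement):
    """Return the ordered sequence of dataset types in a measurement."""
    return tuple(type(ds) for ds in measurement.datasets)


def check_consistent(measurements):
    """Raise ValueError unless all measurements share the same structure."""
    if not measurements:
        return
    reference = structure(measurements[0])
    for position, measurement in enumerate(measurements[1:], start=1):
        if structure(measurement) != reference:
            raise ValueError(
                f"measurement {position} does not match the structure of measurement 0"
            )


replicates = [FakeMeasurement([OneD(), ZeroD()]) for _ in range(3)]
check_consistent(replicates)  # passes silently

swapped = FakeMeasurement([ZeroD(), OneD()])  # same types, different order
```

Passing `replicates + [swapped]` to `check_consistent` raises ValueError, because tuples compare element by element and the order of the types differs.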

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import Measurement, ConsistentMeasurementSet
>>>
>>> # Create replicate measurements with consistent structure
>>> x = np.linspace(0, 10, 100)
>>> measurements = []
>>> for i in range(5):
...     y = np.sin(x) + np.random.normal(0, 0.1, len(x))
...     ds = OneDimensionalDataset(x, y)
...     m = Measurement([ds], conditions={"replicate": i+1})
...     measurements.append(m)
>>>
>>> # Create consistent measurement set
>>> cms = ConsistentMeasurementSet(
...     measurements=measurements,
...     conditions={"sample": "S1", "experiment": "replicates"}
... )
>>>
>>> len(cms)
5
>>>
>>> # All measurements have the same structure
>>> for m in cms:
...     print(len(m.datasets), type(m.datasets[0]).__name__)
1 OneDimensionalDataset
1 OneDimensionalDataset
1 OneDimensionalDataset
1 OneDimensionalDataset
1 OneDimensionalDataset
>>>
>>> # This will raise ValueError - inconsistent structures
>>> from piblin_jax.data.datasets import ZeroDimensionalDataset
>>> m1 = Measurement([OneDimensionalDataset(np.array([1, 2]), np.array([3, 4]))])
>>> m2 = Measurement([ZeroDimensionalDataset(5.0)])
>>> ConsistentMeasurementSet([m1, m2])
Traceback (most recent call last):
    ...
ValueError: All measurements must have same structure

Attributes:
conditions

Get experimental conditions for this measurement series.

details

Get additional details for this measurement series.

measurements

Get all measurements in this set.

__init__(measurements, conditions=None, details=None)[source]

Initialize ConsistentMeasurementSet with structure validation.

Parameters:
  • measurements (list[Measurement]) – List of Measurement objects with consistent structure.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for this measurement series.

  • details (dict[str, Any] | None, optional) – Additional context for this measurement series.

Raises:

ValueError – If measurements do not all have the same structure.

TabularMeasurementSet

TabularMeasurementSet class for piblin-jax.

MeasurementSet variant with tabular access patterns (rows and columns).

class piblin_jax.data.collections.tabular_measurement_set.TabularMeasurementSet(measurements, row_labels=None, col_labels=None, conditions=None, details=None)[source]

Bases: MeasurementSet

MeasurementSet with measurements arranged in tabular format.

This specialized variant organizes measurements in a logical table structure with row and column labels. This is useful for:

  • Experimental design matrices (e.g., multi-factor designs)

  • Microplate/well plate layouts

  • Spatial arrangements of measurements

  • Grid-based sampling patterns

The tabular structure enables intuitive access patterns and natural visualization as tables or heatmaps.

Parameters:
  • measurements (list[Measurement]) – List of Measurement objects. The order corresponds to row-major ordering in the table (row1-col1, row1-col2, …, row2-col1, …).

  • row_labels (list[str] | None, optional) – Labels for table rows. If provided, must satisfy: len(row_labels) * len(col_labels) == len(measurements)

  • col_labels (list[str] | None, optional) – Labels for table columns. If provided, must satisfy: len(row_labels) * len(col_labels) == len(measurements)

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for the measurement series.

  • details (dict[str, Any] | None, optional) – Additional context for the measurement series.

row_labels

Labels for table rows.

Type:

list[str] | None

col_labels

Labels for table columns.

Type:

list[str] | None

Notes

Measurements are stored in row-major order. For a 2x3 table:

  • measurements[0] = row 0, col 0

  • measurements[1] = row 0, col 1

  • measurements[2] = row 0, col 2

  • measurements[3] = row 1, col 0

  • measurements[4] = row 1, col 1

  • measurements[5] = row 1, col 2
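
The row-major layout can be checked with a few lines of plain Python. This is a minimal sketch, not the library's implementation: `flat_index` and `grid_position` are hypothetical helper names introduced only for illustration.

```python
def flat_index(row: int, col: int, n_cols: int) -> int:
    """Map a (row, col) position to its index in a row-major list."""
    return row * n_cols + col


def grid_position(index: int, n_cols: int) -> tuple[int, int]:
    """Invert flat_index: recover (row, col) from a row-major index."""
    return divmod(index, n_cols)


# A 2x3 table, as in the note above.
n_rows, n_cols = 2, 3
layout = [grid_position(i, n_cols) for i in range(n_rows * n_cols)]
print(layout)
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```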

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import Measurement, TabularMeasurementSet
>>>
>>> # Create a 2x3 grid of measurements
>>> x = np.linspace(0, 10, 50)
>>> measurements = []
>>>
>>> for i in range(2):  # rows
...     for j in range(3):  # columns
...         y = np.sin(x * (i + 1)) * (j + 1)
...         ds = OneDimensionalDataset(x, y)
...         m = Measurement(
...             [ds],
...             conditions={"row": i, "col": j}
...         )
...         measurements.append(m)
>>>
>>> # Create tabular measurement set
>>> tms = TabularMeasurementSet(
...     measurements=measurements,
...     row_labels=["row_A", "row_B"],
...     col_labels=["col_1", "col_2", "col_3"],
...     conditions={"plate": "plate_001"},
...     details={"date": "2025-10-18"}
... )
>>>
>>> len(tms)
6
>>> tms.row_labels
['row_A', 'row_B']
>>> tms.col_labels
['col_1', 'col_2', 'col_3']
>>>
>>> # Access measurement at row 1, col 2
>>> m = tms.get_measurement(1, 2)
>>> m.conditions["row"]
1
>>> m.conditions["col"]
2
Attributes:
col_labels

Get column labels for the table.

conditions

Get experimental conditions for this measurement series.

details

Get additional details for this measurement series.

measurements

Get all measurements in this set.

row_labels

Get row labels for the table.

shape

Get the shape of the table (rows, columns).

Methods

get_column(col)

Get all measurements in a specified column.

get_measurement(row, col)

Get measurement at specified row and column indices.

get_row(row)

Get all measurements in a specified row.

__init__(measurements, row_labels=None, col_labels=None, conditions=None, details=None)[source]

Initialize TabularMeasurementSet with optional row/column labels.

Parameters:
  • measurements (list[Measurement]) – List of Measurement objects in row-major order.

  • row_labels (list[str] | None, optional) – Labels for table rows.

  • col_labels (list[str] | None, optional) – Labels for table columns.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for this measurement series.

  • details (dict[str, Any] | None, optional) – Additional context for this measurement series.

Raises:

ValueError – If row_labels and col_labels are provided but their product doesn’t match the number of measurements.

property row_labels: list[str] | None

Get row labels for the table.

Returns:

List of row labels, or None if not provided.

Return type:

list[str] | None

Examples

>>> tms.row_labels
['row_A', 'row_B']
property col_labels: list[str] | None

Get column labels for the table.

Returns:

List of column labels, or None if not provided.

Return type:

list[str] | None

Examples

>>> tms.col_labels
['col_1', 'col_2', 'col_3']
property shape: tuple[int, int] | None

Get the shape of the table (rows, columns).

Returns:

(n_rows, n_cols) if labels are provided, None otherwise.

Return type:

tuple[int, int] | None

Examples

>>> tms.shape
(2, 3)
get_measurement(row, col)[source]

Get measurement at specified row and column indices.

Uses row-major ordering: index = row * n_cols + col

Parameters:
  • row (int) – Row index (0-based).

  • col (int) – Column index (0-based).

Returns:

Measurement at the specified position.

Return type:

Measurement

Raises:
  • ValueError – If row_labels and col_labels were not provided.

  • IndexError – If row or col indices are out of bounds.

Examples

>>> m = tms.get_measurement(1, 2)
>>> m.conditions["row"]
1
>>> m.conditions["col"]
2
get_row(row)[source]

Get all measurements in a specified row.

Parameters:

row (int) – Row index (0-based).

Returns:

List of measurements in the row.

Return type:

list[Measurement]

Raises:
  • ValueError – If row_labels and col_labels were not provided.

  • IndexError – If row index is out of bounds.

Examples

>>> row_measurements = tms.get_row(0)
>>> len(row_measurements)
3
>>> [m.conditions["col"] for m in row_measurements]
[0, 1, 2]
get_column(col)[source]

Get all measurements in a specified column.

Parameters:

col (int) – Column index (0-based).

Returns:

List of measurements in the column.

Return type:

list[Measurement]

Raises:
  • ValueError – If row_labels and col_labels were not provided.

  • IndexError – If col index is out of bounds.

Examples

>>> col_measurements = tms.get_column(1)
>>> len(col_measurements)
2
>>> [m.conditions["row"] for m in col_measurements]
[0, 1]
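
Row and column access over a row-major list reduces to slicing. The sketch below assumes a flat Python list and uses hypothetical helpers (`table_shape`, `row_of`, `column_of`); it mirrors the documented label-product check but omits the bounds checking that the real methods perform.

```python
def table_shape(n_items, row_labels, col_labels):
    """Return (n_rows, n_cols), rejecting labels that do not cover every item."""
    n_rows, n_cols = len(row_labels), len(col_labels)
    if n_rows * n_cols != n_items:
        raise ValueError("len(row_labels) * len(col_labels) must equal the number of items")
    return n_rows, n_cols


def row_of(items, row, n_cols):
    """Items of one row: a contiguous slice of the flat list."""
    return items[row * n_cols:(row + 1) * n_cols]


def column_of(items, col, n_cols):
    """Items of one column: every n_cols-th element starting at col."""
    return items[col::n_cols]


# Placeholder strings stand in for Measurement objects.
items = ["A1", "A2", "A3", "B1", "B2", "B3"]
n_rows, n_cols = table_shape(len(items), ["row_A", "row_B"], ["col_1", "col_2", "col_3"])
second_row = row_of(items, 1, n_cols)        # ['B1', 'B2', 'B3']
middle_column = column_of(items, 1, n_cols)  # ['A2', 'B2']
```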

TidyMeasurementSet

TidyMeasurementSet class for piblin-jax.

MeasurementSet variant where measurements share comparable experimental conditions.

class piblin_jax.data.collections.tidy_measurement_set.TidyMeasurementSet(measurements, conditions=None, details=None)[source]

Bases: MeasurementSet

MeasurementSet where measurements share comparable experimental conditions.

This specialized variant is designed for measurements that can be compared across shared experimental conditions, following “tidy data” principles. This is useful for:

  • Parameter sweeps (varying temperature, pressure, etc.)

  • Multi-factor experiments (factorial designs)

  • Grouped experimental conditions

  • Long-form data representation

The shared condition structure enables statistical analysis, grouping, and faceted visualization.

Parameters:
  • measurements (list[Measurement]) – List of Measurement objects with comparable conditions.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for the measurement series.

  • details (dict[str, Any] | None, optional) – Additional context for the measurement series.

Notes

“Tidy data” refers to a data organization principle where:

  • Each measurement is an observation

  • Each condition is a variable

  • Each unique condition value identifies a group

This enables standard statistical and data manipulation tools to work effectively with the measurement set.
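
As an illustration of why this layout is convenient, the sketch below treats each measurement as one long-form record and groups by a condition. The `records` list and its `peak` values are invented for the example and are not produced by piblin_jax.

```python
from collections import defaultdict

# Hypothetical long-form records: one dict per observation, one key per variable.
records = [
    {"temperature": 20, "sample": "A", "peak": 0.5},
    {"temperature": 20, "sample": "B", "peak": 1.5},
    {"temperature": 25, "sample": "A", "peak": 1.0},
    {"temperature": 25, "sample": "B", "peak": 2.0},
    {"temperature": 30, "sample": "A", "peak": 2.0},
    {"temperature": 30, "sample": "B", "peak": 3.0},
]

# Each unique condition value identifies a group.
groups = defaultdict(list)
for record in records:
    groups[record["temperature"]].append(record["peak"])

# Per-group summary statistic (mean peak per temperature).
mean_peak = {temp: sum(values) / len(values) for temp, values in groups.items()}
print(mean_peak)
# {20: 1.0, 25: 1.5, 30: 2.5}
```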

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import Measurement, TidyMeasurementSet
>>>
>>> # Create measurements with varying conditions
>>> x = np.linspace(0, 10, 100)
>>> measurements = []
>>>
>>> for temp in [20, 25, 30]:
...     for sample in ['A', 'B']:
...         y = np.sin(x) * temp / 25
...         ds = OneDimensionalDataset(x, y)
...         m = Measurement(
...             [ds],
...             conditions={"temperature": temp, "sample": sample}
...         )
...         measurements.append(m)
>>>
>>> # Create tidy measurement set
>>> tms = TidyMeasurementSet(
...     measurements=measurements,
...     conditions={"experiment": "temperature_sweep"},
...     details={"date": "2025-10-18"}
... )
>>>
>>> len(tms)
6
>>>
>>> # Get unique condition values
>>> unique = tms.get_unique_conditions()
>>> sorted(unique["temperature"])
[20, 25, 30]
>>> sorted(unique["sample"])
['A', 'B']
Attributes:
conditions

Get experimental conditions for this measurement series.

details

Get additional details for this measurement series.

measurements

Get all measurements in this set.

Methods

filter_by_conditions(**condition_filters)

Create a new TidyMeasurementSet with measurements matching conditions.

get_unique_conditions()

Get all unique values for each condition across measurements.

__init__(measurements, conditions=None, details=None)[source]

Initialize TidyMeasurementSet.

Parameters:
  • measurements (list[Measurement]) – List of Measurement objects with comparable conditions.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for this measurement series.

  • details (dict[str, Any] | None, optional) – Additional context for this measurement series.

get_unique_conditions()[source]

Get all unique values for each condition across measurements.

This method analyzes all measurements and returns the set of unique values for each condition key. This is useful for:

  • Understanding the experimental design

  • Identifying factor levels

  • Grouping measurements

  • Creating faceted plots

Returns:

Dictionary mapping condition names to sets of unique values.

Return type:

dict[str, set]
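
The collection step can be sketched with a dictionary of sets. This is a hedged approximation that works on plain condition dictionaries rather than Measurement objects; `unique_conditions` is a hypothetical name, and the sketch assumes all condition values are hashable.

```python
from collections import defaultdict


def unique_conditions(condition_dicts):
    """Map each condition key to the set of values seen across all dicts."""
    unique = defaultdict(set)
    for conditions in condition_dicts:
        for key, value in conditions.items():
            unique[key].add(value)  # requires hashable values
    return dict(unique)


# Dicts with different keys are handled: each key collects its own values.
result = unique_conditions([
    {"temp": 25, "pressure": 1.0},
    {"temp": 30, "sample": "A"},
])
print(sorted(result))
# ['pressure', 'sample', 'temp']
```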

Examples

>>> # Continuing from class docstring example
>>> unique = tms.get_unique_conditions()
>>> unique["temperature"]
{20, 25, 30}
>>> unique["sample"]
{'A', 'B'}
>>>
>>> # Empty measurement set
>>> tms_empty = TidyMeasurementSet([])
>>> tms_empty.get_unique_conditions()
{}
>>>
>>> # Measurements with different condition keys
>>> m1 = Measurement([OneDimensionalDataset(np.array([1]), np.array([2]))],
...                  conditions={"temp": 25, "pressure": 1.0})
>>> m2 = Measurement([OneDimensionalDataset(np.array([3]), np.array([4]))],
...                  conditions={"temp": 30, "sample": "A"})
>>> tms = TidyMeasurementSet([m1, m2])
>>> unique = tms.get_unique_conditions()
>>> sorted(unique.keys())
['pressure', 'sample', 'temp']
>>> unique["temp"]
{25, 30}
filter_by_conditions(**condition_filters)[source]

Create a new TidyMeasurementSet with measurements matching conditions.

Parameters:

**condition_filters (Any) – Keyword arguments specifying condition values to match. Only measurements where ALL specified conditions match the given values will be included.

Returns:

New TidyMeasurementSet containing only matching measurements.

Return type:

TidyMeasurementSet
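
The "ALL specified conditions must match" rule can be sketched on plain dictionaries. `filter_by` is a hypothetical helper, and treating a missing key as a non-match is an assumption of this sketch, not documented library behavior.

```python
def filter_by(condition_dicts, **filters):
    """Keep only the dicts whose values equal every requested filter value."""
    return [
        conditions
        for conditions in condition_dicts
        # Assumption: a dict lacking a filtered key does not match.
        if all(key in conditions and conditions[key] == value
               for key, value in filters.items())
    ]


# The same temperature/sample grid as the class example above.
grid = [
    {"temperature": temp, "sample": sample}
    for temp in [20, 25, 30]
    for sample in ["A", "B"]
]

at_25 = filter_by(grid, temperature=25)
at_25_a = filter_by(grid, temperature=25, sample="A")
print(len(at_25), len(at_25_a))
# 2 1
```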

Examples

>>> # Filter by single condition
>>> tms_25 = tms.filter_by_conditions(temperature=25)
>>> len(tms_25)
2
>>> all(m.conditions["temperature"] == 25 for m in tms_25)
True
>>>
>>> # Filter by multiple conditions
>>> tms_25_A = tms.filter_by_conditions(temperature=25, sample="A")
>>> len(tms_25_A)
1
>>> m = tms_25_A[0]
>>> m.conditions["temperature"]
25
>>> m.conditions["sample"]
'A'

Experiment

Experiment class for piblin-jax.

Container for multiple MeasurementSet objects representing a single experiment.

class piblin_jax.data.collections.experiment.Experiment(measurement_sets, conditions=None, details=None)[source]

Bases: object

Container for MeasurementSet objects from a single experiment.

An Experiment represents a complete experimental run or sample, which may contain multiple series of measurements (MeasurementSets). This is useful for:

  • Single sample with multiple measurement types

  • Complete experimental protocol with multiple phases

  • Single experimental run with multiple observables

  • One sample measured under different conditions

Parameters:
  • measurement_sets (list[MeasurementSet]) – List of MeasurementSet objects from this experiment.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for the entire experiment (e.g., sample ID, experimental date, operator).

  • details (dict[str, Any] | None, optional) – Additional context for this experiment (e.g., sample description, experimental notes, quality flags).

measurement_sets

Immutable tuple of measurement sets in this experiment.

Type:

tuple[MeasurementSet, ...]

conditions

Experimental conditions for this experiment.

Type:

dict[str, Any]

details

Additional metadata for this experiment.

Type:

dict[str, Any]

Notes

The measurement sets are stored as a tuple to ensure immutability, which is required for JAX transformations. Individual measurement sets can be accessed by indexing or iteration.

Hierarchy level: ExperimentSet → Experiment → MeasurementSet → Measurement → Dataset
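
To illustrate the tuple-backed container pattern described in these notes, here is a minimal sketch using a frozen dataclass. `FrozenContainer` is a hypothetical stand-in and does not reflect how Experiment is actually implemented.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class FrozenContainer:
    """Hypothetical container: items live in a tuple and cannot be reassigned."""
    items: tuple
    conditions: dict = field(default_factory=dict)

    def __len__(self):
        return len(self.items)

    def __iter__(self):
        return iter(self.items)

    def __getitem__(self, index):
        # An int returns one item; a slice returns a tuple of items.
        return self.items[index]


# Strings stand in for MeasurementSet objects.
sets = FrozenContainer(
    items=tuple(["absorption", "fluorescence"]),
    conditions={"sample": "S001"},
)

try:
    sets.items = ()  # reassignment is rejected on a frozen dataclass
    reassigned = True
except FrozenInstanceError:
    reassigned = False
```

Note that in this sketch only the attribute binding is frozen; the `conditions` dictionary itself remains mutable.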

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import (
...     Measurement, MeasurementSet, Experiment
... )
>>>
>>> # Create first measurement set (absorption spectra)
>>> x_abs = np.linspace(400, 800, 200)
>>> measurements_abs = []
>>> for i in range(3):
...     y = np.exp(-(x_abs - 550)**2 / 1000) * (1 + 0.1 * i)
...     ds = OneDimensionalDataset(x_abs, y)
...     m = Measurement([ds], conditions={"replicate": i+1})
...     measurements_abs.append(m)
>>> ms_abs = MeasurementSet(
...     measurements_abs,
...     conditions={"measurement_type": "absorption"}
... )
>>>
>>> # Create second measurement set (fluorescence spectra)
>>> x_fl = np.linspace(500, 900, 200)
>>> measurements_fl = []
>>> for i in range(3):
...     y = np.exp(-(x_fl - 650)**2 / 1500) * (0.8 + 0.1 * i)
...     ds = OneDimensionalDataset(x_fl, y)
...     m = Measurement([ds], conditions={"replicate": i+1})
...     measurements_fl.append(m)
>>> ms_fl = MeasurementSet(
...     measurements_fl,
...     conditions={"measurement_type": "fluorescence"}
... )
>>>
>>> # Create experiment combining both measurement types
>>> exp = Experiment(
...     measurement_sets=[ms_abs, ms_fl],
...     conditions={"sample": "S001", "date": "2025-10-18"},
...     details={"operator": "Jane Doe", "instrument": "Spec-X"}
... )
>>>
>>> len(exp)
2
>>> exp.conditions["sample"]
'S001'
>>> exp[0].conditions["measurement_type"]
'absorption'
>>> exp[1].conditions["measurement_type"]
'fluorescence'
Attributes:
conditions

Get experimental conditions for this experiment.

details

Get additional details for this experiment.

measurement_sets

Get all measurement sets in this experiment.

__init__(measurement_sets, conditions=None, details=None)[source]

Initialize Experiment with measurement sets and metadata.

Parameters:
  • measurement_sets (list[MeasurementSet]) – List of MeasurementSet objects from this experiment.

  • conditions (dict[str, Any] | None, optional) – Experimental conditions for this experiment.

  • details (dict[str, Any] | None, optional) – Additional context for this experiment.

property measurement_sets: tuple[MeasurementSet, ...]

Get all measurement sets in this experiment.

Returns:

Immutable tuple of MeasurementSet objects.

Return type:

tuple[MeasurementSet, ...]

Examples

>>> exp.measurement_sets
(<MeasurementSet at 0x...>, <MeasurementSet at 0x...>)
property conditions: dict[str, Any]

Get experimental conditions for this experiment.

Returns:

Dictionary of experimental conditions (sample, date, operator, etc.).

Return type:

dict[str, Any]

Examples

>>> exp.conditions
{'sample': 'S001', 'date': '2025-10-18'}
property details: dict[str, Any]

Get additional details for this experiment.

Returns:

Dictionary of additional context (notes, quality, instrument, etc.).

Return type:

dict[str, Any]

Examples

>>> exp.details
{'operator': 'Jane Doe', 'instrument': 'Spec-X'}
__len__()[source]

Get number of measurement sets in this experiment.

Returns:

Number of measurement sets.

Return type:

int

Examples

>>> len(exp)
2
__iter__()[source]

Iterate over measurement sets in this experiment.

Yields:

MeasurementSet – Each measurement set in order.

Examples

>>> for ms in exp:
...     print(ms.conditions["measurement_type"])
absorption
fluorescence
__getitem__(index)[source]

Get measurement set by index.

Parameters:

index (int or slice) – Index or slice to access measurement sets.

Returns:

MeasurementSet at the given index, or tuple for slice.

Return type:

MeasurementSet or tuple[MeasurementSet, ...]

Examples

>>> exp[0]
<MeasurementSet at 0x...>
>>> exp[0:2]
(<MeasurementSet at 0x...>, <MeasurementSet at 0x...>)

ExperimentSet

ExperimentSet class for piblin-jax.

Top-level container for multiple Experiment objects representing a study or project.

class piblin_jax.data.collections.experiment_set.ExperimentSet(experiments, conditions=None, details=None)[source]

Bases: object

Top-level container for multiple Experiment objects.

An ExperimentSet represents the highest level of the data hierarchy, typically corresponding to:

  • Complete research project or study

  • Multi-sample analysis

  • Entire experimental campaign

  • Publication dataset

This is the entry point for organizing and managing entire experimental datasets with consistent metadata and structure.

Parameters:
  • experiments (list[Experiment]) – List of Experiment objects in this set.

  • conditions (dict[str, Any] | None, optional) – Global conditions for the entire study (e.g., project name, year, instrument, principal investigator).

  • details (dict[str, Any] | None, optional) – Additional context for the study (e.g., publication info, funding source, study objectives).

experiments

Immutable tuple of experiments in this set.

Type:

tuple[Experiment, ...]

conditions

Global metadata for the entire study.

Type:

dict[str, Any]

details

Additional metadata for the study.

Type:

dict[str, Any]

Notes

The experiments are stored as a tuple to ensure immutability, which is required for JAX transformations. Individual experiments can be accessed by indexing or iteration.

Hierarchy level: ExperimentSet → Experiment → MeasurementSet → Measurement → Dataset

This is the top level of the hierarchy and provides global context for all contained data.

Examples

>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import (
...     Measurement, MeasurementSet, Experiment, ExperimentSet
... )
>>>
>>> # Create experiments for multiple samples
>>> experiments = []
>>>
>>> for sample_id in ['S001', 'S002', 'S003']:
...     # Create measurements for this sample
...     x = np.linspace(0, 10, 100)
...     y = np.sin(x) * (ord(sample_id[-1]) - ord('0'))
...     ds = OneDimensionalDataset(x, y)
...     m = Measurement([ds])
...     ms = MeasurementSet([m])
...
...     # Create experiment for this sample
...     exp = Experiment(
...         [ms],
...         conditions={"sample": sample_id, "date": "2025-10-18"}
...     )
...     experiments.append(exp)
>>>
>>> # Create experiment set for the complete study
>>> study = ExperimentSet(
...     experiments=experiments,
...     conditions={
...         "project": "QuantIQ-2025",
...         "instrument": "Spectrometer-X",
...         "year": 2025
...     },
...     details={
...         "pi": "Dr. Jane Smith",
...         "funding": "NSF Grant 12345",
...         "description": "Comparative spectroscopy study"
...     }
... )
>>>
>>> len(study)
3
>>> study.conditions["project"]
'QuantIQ-2025'
>>> study[0].conditions["sample"]
'S001'
>>> study.details["pi"]
'Dr. Jane Smith'
Attributes:
conditions

Get global conditions for the entire study.

details

Get additional details for the study.

experiments

Get all experiments in this set.

Methods

get_experiment_by_condition(**condition_filters)

Get experiments matching specified conditions.

__init__(experiments, conditions=None, details=None)[source]

Initialize ExperimentSet with experiments and metadata.

Parameters:
  • experiments (list[Experiment]) – List of Experiment objects in this set.

  • conditions (dict[str, Any] | None, optional) – Global conditions for the entire study.

  • details (dict[str, Any] | None, optional) – Additional context for the study.

property experiments: tuple[Experiment, ...]

Get all experiments in this set.

Returns:

Immutable tuple of Experiment objects.

Return type:

tuple[Experiment, ...]

Examples

>>> study.experiments
(<Experiment at 0x...>, <Experiment at 0x...>, <Experiment at 0x...>)
property conditions: dict[str, Any]

Get global conditions for the entire study.

Returns:

Dictionary of global metadata (project, year, instrument, etc.).

Return type:

dict[str, Any]

Examples

>>> study.conditions
{'project': 'QuantIQ-2025', 'instrument': 'Spectrometer-X', 'year': 2025}
property details: dict[str, Any]

Get additional details for the study.

Returns:

Dictionary of additional context (PI, funding, objectives, etc.).

Return type:

dict[str, Any]

Examples

>>> study.details
{'pi': 'Dr. Jane Smith', 'funding': 'NSF Grant 12345', 'description': '...'}
__len__()[source]

Get number of experiments in this set.

Returns:

Number of experiments.

Return type:

int

Examples

>>> len(study)
3
__iter__()[source]

Iterate over experiments in this set.

Yields:

Experiment – Each experiment in order.

Examples

>>> for exp in study:
...     print(exp.conditions["sample"])
S001
S002
S003
__getitem__(index)[source]

Get experiment by index.

Parameters:

index (int or slice) – Index or slice to access experiments.

Returns:

Experiment at the given index, or tuple of experiments for slice.

Return type:

Experiment or tuple[Experiment, ...]

Examples

>>> study[0]
<Experiment at 0x...>
>>> study[0:2]
(<Experiment at 0x...>, <Experiment at 0x...>)
get_experiment_by_condition(**condition_filters)[source]

Get experiments matching specified conditions.

Parameters:

**condition_filters (Any) – Keyword arguments specifying condition values to match. Only experiments where ALL specified conditions match the given values will be included.

Returns:

List of experiments matching the conditions.

Return type:

list[Experiment]

Examples

>>> # Get all experiments for sample S001
>>> s001_exps = study.get_experiment_by_condition(sample="S001")
>>> len(s001_exps)
1
>>> s001_exps[0].conditions["sample"]
'S001'
>>>
>>> # Get experiments matching multiple conditions
>>> dated_exps = study.get_experiment_by_condition(
...     date="2025-10-18",
...     sample="S002"
... )
>>> len(dated_exps)
1

Utilities

Metadata

Metadata utilities for managing, validating, extracting, and merging metadata.

This module provides utilities for working with metadata (conditions and details) across the data hierarchy. Metadata is separated into:

  • Conditions: Experimental parameters that define comparability between datasets (e.g., temperature, pressure, concentration)

  • Details: Contextual information that doesn’t affect experimental conditions (e.g., operator, date, notes)

The module supports:

  • Merging metadata from multiple sources with configurable conflict resolution

  • Separating conditions from details using explicit keys or heuristics

  • Validating metadata against schemas with type checking

  • Extracting metadata from filenames, paths, and file headers

piblin_jax.data.metadata.merge_metadata(metadata_list, strategy='override')[source]

Merge multiple metadata dictionaries.

Combines metadata from multiple sources with configurable conflict resolution. Metadata dictionaries are processed in order, with later dictionaries having higher priority (for ‘override’ strategy).

Parameters:
  • metadata_list (list[dict[str, Any]]) – List of metadata dictionaries to merge, ordered from lowest to highest priority. Under the ‘override’ strategy, later dictionaries win conflicts.

  • strategy (str, optional) –

    Conflict resolution strategy (default: “override”):

    • ’override’: Later values override earlier ones

    • ’keep_first’: Keep first value encountered

    • ’raise’: Raise ValueError on conflicts

    • ’list’: Collect conflicting values in a list (duplicates removed)

Returns:

Merged metadata dictionary

Return type:

dict[str, Any]

Raises:

ValueError – If strategy is ‘raise’ and conflicts are detected, or if strategy is unknown
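
The four strategies can be sketched in plain Python as follows. This is an approximation written from the descriptions above, not the library source: `merge` is a hypothetical name, and treating repeated identical values as non-conflicting under 'raise' is an assumption.

```python
def merge(metadata_list, strategy="override"):
    """Merge dicts in order, resolving key conflicts per the chosen strategy."""
    if strategy not in ("override", "keep_first", "raise", "list"):
        raise ValueError(f"Unknown strategy: {strategy!r}")

    merged = {}
    collected = {}  # key -> distinct values in first-seen order (used by "list")
    for metadata in metadata_list:
        for key, value in metadata.items():
            distinct = collected.setdefault(key, [])
            if value not in distinct:
                distinct.append(value)
            if key not in merged or strategy == "override":
                merged[key] = value  # first value, or the latest under "override"
            elif strategy == "raise" and merged[key] != value:
                raise ValueError(f"Conflicting values for {key!r}")
            # "keep_first": leave the existing value untouched

    if strategy == "list":
        # Keys with a single distinct value stay scalar; conflicts become lists.
        merged = {k: (v[0] if len(v) == 1 else v) for k, v in collected.items()}
    return merged


meta1 = {"temp": 20, "sample": "A1"}
meta2 = {"temp": 25, "pressure": 1.0}
print(merge([meta1, meta2], strategy="list"))
# {'temp': [20, 25], 'sample': 'A1', 'pressure': 1.0}
```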

Examples

>>> meta1 = {"temp": 20, "sample": "A1"}
>>> meta2 = {"temp": 25, "pressure": 1.0}
>>> merge_metadata([meta1, meta2])
{'temp': 25, 'sample': 'A1', 'pressure': 1.0}
>>> merge_metadata([meta1, meta2], strategy="keep_first")
{'temp': 20, 'sample': 'A1', 'pressure': 1.0}
>>> merge_metadata([meta1, meta2], strategy="list")
{'temp': [20, 25], 'sample': 'A1', 'pressure': 1.0}
piblin_jax.data.metadata.separate_conditions_details(metadata, condition_keys=None)[source]

Separate metadata into conditions and details.

Conditions are experimental parameters that define comparability between datasets (e.g., temperature, pressure). Details are contextual information (e.g., operator, date, notes).

Parameters:
  • metadata (dict[str, Any]) – Combined metadata dictionary

  • condition_keys (list[str] | None, optional) – Known condition keys (experimental parameters). If None, heuristics are used to identify conditions based on common experimental parameter names.

Returns:

  • conditions (dict[str, Any]) – Experimental conditions (parameters defining comparability)

  • details (dict[str, Any]) – Context information (non-experimental metadata)

Return type:

tuple[dict[str, Any], dict[str, Any]]
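
With explicit keys, the split amounts to partitioning one dictionary into two. The sketch below covers only that case; `split_metadata` is a hypothetical name, and the heuristic branch is left out because the list of recognized parameter names is not given here.

```python
def split_metadata(metadata, condition_keys):
    """Partition metadata into (conditions, details) by an explicit key list."""
    keys = set(condition_keys)
    conditions = {k: v for k, v in metadata.items() if k in keys}
    details = {k: v for k, v in metadata.items() if k not in keys}
    return conditions, details


conditions, details = split_metadata(
    {"temp": 25, "pressure": 1.0, "operator": "John"},
    condition_keys=["temp", "pressure"],
)
print(conditions, details)
# {'temp': 25, 'pressure': 1.0} {'operator': 'John'}
```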

Examples

>>> metadata = {"temp": 25, "pressure": 1.0, "operator": "John"}
>>> conditions, details = separate_conditions_details(
...     metadata,
...     condition_keys=["temp", "pressure"]
... )
>>> conditions
{'temp': 25, 'pressure': 1.0}
>>> details
{'operator': 'John'}

Using heuristics:

>>> metadata = {"temperature": 25, "strain": 0.1, "notes": "Trial 1"}
>>> conditions, details = separate_conditions_details(metadata)
>>> "temperature" in conditions
True
>>> "notes" in details
True
piblin_jax.data.metadata.validate_metadata(metadata, schema=None, required_keys=None)[source]

Validate metadata against a schema.

Performs type checking and required key validation. Validation is optional and can be configured with schema and required_keys parameters.

Parameters:
  • metadata (dict[str, Any]) – Metadata to validate

  • schema (dict[str, type | Callable[[Any], bool]] | None, optional) –

    Schema defining expected types or validation functions. Keys are metadata field names, values are either:

    • Type objects (e.g., float, str, int) for type checking

    • Callable validators that return True if valid

    Example: {'temperature': float, 'sample_id': str}

  • required_keys (list[str] | None, optional) – Keys that must be present in metadata

Returns:

True if valid

Return type:

bool

Raises:

ValueError – If validation fails (missing required keys, type mismatch, or custom validation function returns False)
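
A schema that mixes type objects and callables can be checked as in the sketch below. It is written from the description above rather than from the library source; `validate` is a hypothetical name, and skipping schema keys that are absent from the metadata (unless listed as required) is an assumption.

```python
def validate(metadata, schema=None, required_keys=None):
    """Return True if metadata passes all checks, else raise ValueError."""
    for key in required_keys or []:
        if key not in metadata:
            raise ValueError(f"Missing required key: {key!r}")

    for key, rule in (schema or {}).items():
        if key not in metadata:
            continue  # assumption: optional unless listed in required_keys
        value = metadata[key]
        if isinstance(rule, type):
            # Type objects are checked with isinstance.
            if not isinstance(value, rule):
                raise ValueError(f"{key!r} should be {rule.__name__}, got {type(value).__name__}")
        elif not rule(value):
            # Any other callable is treated as a predicate.
            raise ValueError(f"{key!r} failed custom validation")
    return True


ok = validate(
    {"temp": 25.0, "ph": 7.0},
    schema={"temp": float, "ph": lambda x: 0 <= x <= 14},
    required_keys=["temp"],
)
```

In this sketch an integer such as 25 would fail a `float` check, because `isinstance(25, float)` is False; whether the library is stricter or more lenient is not stated here.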

Examples

Type checking:

>>> metadata = {"temp": 25.0, "sample": "A1"}
>>> schema = {"temp": float, "sample": str}
>>> validate_metadata(metadata, schema=schema)
True

Required keys:

>>> validate_metadata(metadata, required_keys=["temp", "sample"])
True

Custom validation:

>>> schema = {"ph": lambda x: 0 <= x <= 14}
>>> validate_metadata({"ph": 7.0}, schema=schema)
True
piblin_jax.data.metadata.parse_key_value_string(text, separator='=', delimiter=',')[source]

Parse key-value pairs from a string.

Extracts metadata from delimited key-value strings commonly found in filenames, headers, or configuration strings.

Parameters:
  • text (str) – String containing key-value pairs. Example: "temp=25,pressure=1.0,sample=A1"

  • separator (str, optional) – Character separating keys from values (default: “=”)

  • delimiter (str, optional) – Character separating pairs (default: “,”)

Returns:

Parsed metadata (all values are strings, convert as needed)

Return type:

dict[str, str]
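
Because every parsed value is a string, a conversion step usually follows. The sketch below pairs a simplified parser with a best-effort converter; `parse_pairs` and `coerce` are hypothetical helpers, not part of piblin_jax.

```python
def parse_pairs(text, separator="=", delimiter=","):
    """Split 'k=v,k=v' style text into a dict of stripped strings."""
    result = {}
    for pair in text.split(delimiter):
        if separator not in pair:
            continue  # skip fragments that are not key-value pairs
        key, value = pair.split(separator, 1)
        result[key.strip()] = value.strip()
    return result


def coerce(value):
    """Try int, then float; fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


raw = parse_pairs("temp=25,pressure=1.0,sample=A1")
typed = {key: coerce(value) for key, value in raw.items()}
print(typed)
# {'temp': 25, 'pressure': 1.0, 'sample': 'A1'}
```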

Examples

>>> parse_key_value_string("temp=25,pressure=1.0")
{'temp': '25', 'pressure': '1.0'}
>>> parse_key_value_string("temp:25;pressure:1.0", separator=":", delimiter=";")
{'temp': '25', 'pressure': '1.0'}
piblin_jax.data.metadata.extract_from_filename(filename, pattern=None)[source]

Extract metadata from filename using regex pattern.

Parses filenames to extract metadata using either custom regex patterns or common heuristics for scientific data files.

Parameters:
  • filename (str | Path) – Filename or path (extension is removed before matching)

  • pattern (str | None, optional) – Regex pattern with named groups for extraction. If None, uses common heuristics for sample names, temperatures, and replicate numbers.

Returns:

Extracted metadata (all values are strings)

Return type:

dict[str, str]
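
Named-group extraction can be reproduced with the standard `re` and `pathlib` modules. This sketch assumes the pattern is applied with `re.search` to the filename stem; `match_filename` is a hypothetical helper and the library may match differently.

```python
import re
from pathlib import Path


def match_filename(filename, pattern):
    """Return the named groups matched in the filename stem, or {} if no match."""
    stem = Path(filename).stem  # drops the directory and the final extension
    match = re.search(pattern, stem)
    return match.groupdict() if match else {}


# [^_]+ stops at the next underscore, so each field captures one token.
pattern = r"sample_(?P<sample>[^_]+)_(?P<temp>\d+)C"
print(match_filename("sample_A1_25C.csv", pattern))
# {'sample': 'A1', 'temp': '25'}
```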

Examples

Using heuristics:

>>> extract_from_filename("sample_A1_temp_25C_001.csv")
{'sample': 'A1', 'temp': '25', 'replicate': '001'}

Using custom pattern:

>>> pattern = r"sample_(?P<sample>[^_]+)_(?P<temp>\d+)C"
>>> extract_from_filename("sample_A1_25C.csv", pattern)
{'sample': 'A1', 'temp': '25'}
piblin_jax.data.metadata.extract_from_path(filepath, level_names=None)[source]

Extract metadata from directory structure.

Parses directory hierarchy to extract metadata based on directory names at different levels.

Parameters:
  • filepath (str | Path) – File path

  • level_names (list[str] | None, optional) – Names for each directory level (from deepest to root). Example: ['sample', 'experiment', 'project'] extracts sample from parent directory, experiment from grandparent, etc. If None, returns empty dict.

Returns:

Extracted metadata

Return type:

dict[str, str]
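
The deepest-to-root mapping corresponds to walking a path's parents. The sketch below uses `pathlib.PurePosixPath` so that it behaves the same on every platform; `levels_from_path` is a hypothetical helper, and ignoring level names beyond the available directories is an assumption.

```python
from pathlib import PurePosixPath


def levels_from_path(filepath, level_names=None):
    """Name the parent directories of a file, starting from the deepest."""
    if not level_names:
        return {}
    parents = PurePosixPath(filepath).parents  # parents[0] is the containing directory
    return {
        name: parents[i].name
        for i, name in enumerate(level_names)
        if i < len(parents) and parents[i].name  # skip the root, which has no name
    }


print(levels_from_path(
    "/data/ProjectA/ExpB/SampleC/data.csv",
    ["sample", "experiment", "project"],
))
# {'sample': 'SampleC', 'experiment': 'ExpB', 'project': 'ProjectA'}
```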

Examples

>>> extract_from_path(
...     "/data/ProjectA/ExpB/SampleC/data.csv",
...     ['sample', 'experiment', 'project']
... )
{'sample': 'SampleC', 'experiment': 'ExpB', 'project': 'ProjectA'}
piblin_jax.data.metadata.parse_header_metadata(header_lines, comment_char='#', separator=':')[source]

Parse metadata from file header comment lines.

Extracts metadata from comment lines in file headers, commonly used in scientific data files to store experimental conditions and context.

Parameters:
  • header_lines (list[str]) – Lines from file header

  • comment_char (str, optional) – Comment character (default: “#”)

  • separator (str, optional) – Character separating keys from values (default: “:”)

Returns:

Parsed metadata (all values are strings)

Return type:

dict[str, str]
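
Header parsing can be approximated by stripping the comment prefix and splitting on the first separator. `parse_header` below is a hypothetical re-implementation for illustration; splitting only once keeps values that themselves contain the separator (such as timestamps) intact.

```python
def parse_header(lines, comment_char="#", separator=":"):
    """Collect 'key<separator>value' pairs from comment lines."""
    result = {}
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith(comment_char):
            continue  # data rows and blank lines are ignored
        body = stripped[len(comment_char):]
        if separator not in body:
            continue  # plain comments without a key-value pair
        key, value = body.split(separator, 1)  # split on the first separator only
        result[key.strip()] = value.strip()
    return result


header = parse_header([
    "# Temperature: 25",
    "# Timestamp: 2024-01-15T10:30:00",
    "# free-form comment",
    "1.0,2.0",
])
print(header)
# {'Temperature': '25', 'Timestamp': '2024-01-15T10:30:00'}
```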

Examples

>>> lines = [
...     "# Temperature: 25",
...     "# Pressure: 1.0",
...     "# Sample: A1"
... ]
>>> parse_header_metadata(lines)
{'Temperature': '25', 'Pressure': '1.0', 'Sample': 'A1'}

With custom separators:

>>> lines = ["// Temp = 25", "// Sample = A1"]
>>> parse_header_metadata(lines, comment_char="//", separator="=")
{'Temp': '25', 'Sample': 'A1'}

Region of Interest (ROI)

Region of Interest (ROI) classes for piblin-jax.

This module provides classes for defining regions on independent variables:

  • LinearRegion: Contiguous region on a 1D independent variable

  • CompoundRegion: Container for multiple LinearRegion objects (union)

Regions are used with RegionTransform to apply transformations only within specified regions while preserving data outside those regions.
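
The idea of transforming only the points inside a region can be sketched without any library code. `apply_in_region` is a hypothetical function operating on plain lists; the actual RegionTransform interface is not shown in this section and may differ.

```python
def apply_in_region(xs, ys, x_min, x_max, func):
    """Apply func to y where x_min <= x <= x_max; leave other points unchanged."""
    return [
        func(y) if x_min <= x <= x_max else y
        for x, y in zip(xs, ys)
    ]


xs = [0, 1, 2, 3, 4, 5]
ys = [10, 11, 12, 13, 14, 15]

# Double the values for 2.0 <= x <= 4.0 only.
result = apply_in_region(xs, ys, 2.0, 4.0, lambda y: y * 2)
print(result)
# [10, 11, 24, 26, 28, 15]
```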

class piblin_jax.data.roi.CompoundRegion(regions)[source]

Bases: object

Container for multiple LinearRegion objects (union of regions).

A CompoundRegion represents the union of multiple disjoint or overlapping LinearRegion objects. It generates combined masks that include all points in any of the constituent regions.

Parameters:

regions (list[LinearRegion]) – List of LinearRegion objects

Examples

>>> import numpy as np
>>> from piblin_jax.data.roi import LinearRegion, CompoundRegion
>>> # Define two disjoint regions
>>> region1 = LinearRegion(x_min=1.0, x_max=2.0)
>>> region2 = LinearRegion(x_min=4.0, x_max=5.0)
>>> compound = CompoundRegion([region1, region2])
>>> # Generate combined mask
>>> x_data = np.array([0, 1, 2, 3, 4, 5, 6])
>>> mask = compound.get_mask(x_data)
>>> mask
array([False,  True,  True, False,  True,  True, False])
>>> # Extract data from both regions
>>> x_data[mask]
array([1, 2, 4, 5])

Notes

  • The mask is the union (OR) of all constituent region masks

  • Regions can be disjoint or overlapping

  • Access individual regions using indexing: compound[0], compound[1], etc.

  • Get number of regions using len(compound)
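
The union (OR) behavior can be reproduced on plain lists. In this sketch regions are represented as hypothetical `(x_min, x_max)` tuples with inclusive bounds, and `union_mask` is an illustrative helper rather than the library's method.

```python
def union_mask(xs, bounds):
    """True where x falls inside at least one inclusive [x_min, x_max] range."""
    return [
        any(x_min <= x <= x_max for x_min, x_max in bounds)
        for x in xs
    ]


xs = [0, 1, 2, 3, 4, 5, 6]

disjoint = union_mask(xs, [(1.0, 2.0), (4.0, 5.0)])
# [False, True, True, False, True, True, False]

overlapping = union_mask(xs, [(1.0, 3.0), (2.0, 4.0)])
# [False, True, True, True, True, False, False]

selected = [x for x, inside in zip(xs, disjoint) if inside]
# [1, 2, 4, 5]
```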

Methods

get_mask(x_data)

Generate combined boolean mask (union of all regions).

__init__(regions)[source]

Initialize CompoundRegion.

Parameters:

regions (list[LinearRegion]) – List of LinearRegion objects

Raises:

get_mask(x_data)[source]

Generate combined boolean mask (union of all regions).

Creates a boolean array where True indicates points within any of the constituent regions.

Parameters:

x_data (np.ndarray) – Independent variable data

Returns:

Boolean mask (True for points in any region)

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from piblin_jax.data.roi import LinearRegion, CompoundRegion
>>> region1 = LinearRegion(x_min=1.0, x_max=2.0)
>>> region2 = LinearRegion(x_min=4.0, x_max=5.0)
>>> compound = CompoundRegion([region1, region2])
>>> x_data = np.array([0, 1, 2, 3, 4, 5, 6])
>>> compound.get_mask(x_data)
array([False,  True,  True, False,  True,  True, False])

__len__()[source]

Return number of regions.

__getitem__(index)[source]

Get region by index.

__repr__()[source]

Return string representation of CompoundRegion.
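
The three special methods above make the container usable with len(), indexing, and repr(). A minimal sketch of that protocol follows; RegionListSketch is a hypothetical stand-in that stores (x_min, x_max) tuples, and neither its internals nor its repr format is taken from the piblin_jax source.

```python
class RegionListSketch:
    """Hypothetical stand-in showing the container protocol only."""

    def __init__(self, bounds):
        # Copy into a list so later changes to the caller's sequence have no effect
        self._bounds = list(bounds)

    def __len__(self):
        # Number of stored regions
        return len(self._bounds)

    def __getitem__(self, index):
        # Delegate to the list, so negative indices work as usual
        return self._bounds[index]

    def __repr__(self):
        return f"RegionListSketch({self._bounds!r})"


regions = RegionListSketch([(1.0, 2.0), (4.0, 5.0)])
count = len(regions)   # uses __len__
first = regions[0]     # uses __getitem__
last = regions[-1]
text = repr(regions)   # uses __repr__
```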

class piblin_jax.data.roi.LinearRegion(x_min, x_max)[source]

Bases: object

Represents a contiguous region on a 1D independent variable.

A LinearRegion defines a contiguous range [x_min, x_max] (inclusive) on an independent variable. It can generate boolean masks to select data points within this range.

Parameters:
  • x_min (float) – Lower bound (inclusive)

  • x_max (float) – Upper bound (inclusive)

Raises:

ValueError – If x_min >= x_max

Examples

>>> import numpy as np
>>> from piblin_jax.data.roi import LinearRegion
>>> # Define region from 2.0 to 5.0
>>> region = LinearRegion(x_min=2.0, x_max=5.0)
>>> # Generate mask for data
>>> x_data = np.array([0, 1, 2, 3, 4, 5, 6, 7])
>>> mask = region.get_mask(x_data)
>>> mask
array([False, False,  True,  True,  True,  True, False, False])
>>> # Extract data within region
>>> x_data[mask]
array([2, 3, 4, 5])

Notes

  • Bounds are inclusive: both x_min and x_max are included in the region

  • Masks are returned as NumPy boolean arrays for compatibility

  • Use with RegionTransform to apply selective transformations
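
The inclusive-bounds rule and the documented ValueError for x_min >= x_max can be illustrated with a small stand-in class. IntervalSketch below is hypothetical and written only from the behavior described in this section; the actual LinearRegion implementation and its error message may differ.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class IntervalSketch:
    """Hypothetical stand-in for illustration; not the piblin_jax implementation."""

    x_min: float
    x_max: float

    def __post_init__(self):
        # Mirror the documented rule: reject empty or reversed intervals
        if self.x_min >= self.x_max:
            raise ValueError(
                f"x_min ({self.x_min}) must be less than x_max ({self.x_max})"
            )

    def get_mask(self, x_data):
        x_data = np.asarray(x_data)
        # Closed interval: points equal to either bound are selected
        return (x_data >= self.x_min) & (x_data <= self.x_max)


interval = IntervalSketch(2.0, 5.0)
mask = interval.get_mask([1.0, 2.0, 3.5, 5.0, 5.1])

# A zero-width interval violates x_min < x_max and is rejected
try:
    IntervalSketch(3.0, 3.0)
    rejected = False
except ValueError:
    rejected = True
```

Note that under this rule a single-point region (x_min equal to x_max) cannot be constructed, even though the bounds themselves are inclusive.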

Methods

get_mask(x_data)

Generate boolean mask for data within region.

__init__(x_min, x_max)[source]

Initialize LinearRegion.

Parameters:
  • x_min (float) – Lower bound (inclusive)

  • x_max (float) – Upper bound (inclusive)

Raises:

ValueError – If x_min >= x_max

get_mask(x_data)[source]

Generate boolean mask for data within region.

Creates a boolean array where True indicates points within the region [x_min, x_max] (inclusive).

Parameters:

x_data (np.ndarray) – Independent variable data

Returns:

Boolean mask (True for points in region)

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from piblin_jax.data.roi import LinearRegion
>>> region = LinearRegion(x_min=2.0, x_max=5.0)
>>> x_data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> region.get_mask(x_data)
array([False,  True,  True,  True,  True, False])

__repr__()[source]

Return string representation of LinearRegion.