Data Structures
Overview
The piblin_jax.data module provides the core data structures for representing
measurement data in piblin_jax. It implements a hierarchical system for organizing
experimental data, from individual measurements to complex experimental campaigns
with multiple conditions and replicates.
The module is built around three key concepts:
Immutability: All data structures are immutable by design, ensuring data integrity and enabling safe sharing across transforms and analyses. Once created, datasets cannot be modified; instead, transformations return new datasets.
Type Safety: Each dataset type (0D, 1D, 2D, 3D) has specific guarantees about its structure. Comprehensive type hints enable static type checking (for example with mypy) and better IDE support, while metadata keeps the system flexible.
Hierarchical Organization: Data is organized in a natural hierarchy: Dataset → Measurement → MeasurementSet → Experiment → ExperimentSet. This mirrors typical experimental workflows where you collect replicate measurements under various conditions.
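The Immutability principle means transformations return new objects instead of modifying existing ones. The minimal sketch below shows that pattern with a frozen dataclass. `FrozenSeries` and its `scaled()` method are hypothetical names used only for illustration. They are not piblin_jax classes and say nothing about how the library implements immutability internally.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenSeries:
    """Hypothetical immutable 1D container (not the piblin_jax class)."""

    x: tuple[float, ...]
    y: tuple[float, ...]

    def scaled(self, factor: float) -> "FrozenSeries":
        # Build and return a new object; the original instance is untouched.
        return replace(self, y=tuple(v * factor for v in self.y))


original = FrozenSeries(x=(0.0, 1.0, 2.0), y=(1.0, 2.0, 3.0))
doubled = original.scaled(2.0)

try:
    original.y = (0.0, 0.0, 0.0)  # attribute assignment is blocked on frozen dataclasses
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

print(original.y)  # (1.0, 2.0, 3.0)
print(doubled.y)   # (2.0, 4.0, 6.0)
```

`dataclasses.replace()` creates a new instance that reuses the unchanged fields. The untouched `x` tuple is therefore shared rather than copied. That sharing is safe only because nothing can mutate it.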
The module also provides comprehensive metadata support with validation, merging utilities, and Region of Interest (ROI) definitions for selective data processing.
Quick Examples
Creating a 1D Dataset
The most common use case is creating a 1D dataset from arrays:
from piblin_jax.data.datasets import OneDimensionalDataset
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
dataset = OneDimensionalDataset(
    independent_variable_data=x,
    dependent_variable_data=y,
    details={"name": "Sine Wave"},
)
Working with Metadata
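Datasets carry two kinds of metadata: conditions (experimental parameters such as temperature) and details (supporting context such as sample ID, operator, and timestamp):
dataset = OneDimensionalDataset(
    independent_variable_data=x,
    dependent_variable_data=y,
    conditions={"temperature": 25.0},
    details={
        "name": "Viscosity vs Shear Rate",
        "sample_id": "ABC123",
        "operator": "Jane Doe",
        "timestamp": "2024-01-15T10:30:00"
    }
)
# Access metadata
temp = dataset.conditions["temperature"]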
Building Collections
Organize multiple measurements into collections:
from piblin_jax.data.collections import Measurement, MeasurementSet
# Create a measurement from previously created datasets (dataset1, dataset2, dataset3)
measurement = Measurement(
    datasets=[dataset1, dataset2, dataset3],
    metadata={"replicate": 1, "temperature": 25.0}
)
# Group previously created measurements (meas1, meas2, meas3) into a set
measurement_set = MeasurementSet(
    measurements=[meas1, meas2, meas3],
    metadata={"experiment_id": "EXP001"}
)
See Also
Transformations - Data transformation API
Data I/O - Reading and writing data files
JAX Array API - Underlying array operations
API Reference
Module Contents
Data types and utilities for piblin-jax.
This package provides the core data structures for measurement data science:
- Datasets: Typed array containers (0D, 1D, 2D, 3D, composite, distributions)
- Collections: Hierarchical measurement organization (Measurement, MeasurementSet, Experiment, ExperimentSet)
- Metadata: Structured conditions and details with validation and merging
- ROI: Region of interest definitions for selective data analysis
## Package Structure
### Datasets Module (piblin_jax.data.datasets)
Core dataset classes for different dimensionalities:
- ZeroDimensionalDataset: Scalar values with metadata
- OneDimensionalDataset: Paired (x, y) data (time series, spectra, etc.)
- TwoDimensionalDataset: 2D grid data (heatmaps, images)
- ThreeDimensionalDataset: 3D volumetric data
- OneDimensionalCompositeDataset: Multiple dependent variables with shared x-axis
- Histogram: Binned frequency distributions
- Distribution: Probability density functions
All datasets include:
- JAX/NumPy backend abstraction for performance
- Metadata system (conditions and details)
- Uncertainty quantification support
- Immutable design for functional programming
- Type-safe API with comprehensive type hints
### Collections Module (piblin_jax.data.collections)
Hierarchical organization for experimental data:
ExperimentSet (top level)
└── Experiment (single experimental condition set)
    └── MeasurementSet (group of related measurements)
        └── Measurement (individual measurement with datasets)
Collection Types:
- Measurement: Container for related datasets from one measurement
- MeasurementSet: Group of measurements (e.g., replicate trials)
- ConsistentMeasurementSet: Enforces same conditions across measurements
- TabularMeasurementSet: Optimized for tabular data access
- TidyMeasurementSet: Tidy (long-form) data representation
- Experiment: Collection of measurement sets under same conditions
- ExperimentSet: Top-level container for multiple experiments
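The nesting shown in the tree above can be pictured with plain Python containers. This is a hypothetical stand-in, not the piblin_jax collection classes. Each level is a small dataclass that holds the level below it. The `*Node` names and `count_datasets()` are illustrative assumptions, and the real constructors may take different arguments. A helper function walks the tree from the top down.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MeasurementNode:
    """Hypothetical stand-in for a Measurement: named datasets plus conditions."""

    datasets: dict[str, Any]
    conditions: dict[str, Any] = field(default_factory=dict)


@dataclass
class MeasurementSetNode:
    measurements: list[MeasurementNode]


@dataclass
class ExperimentNode:
    measurement_sets: dict[str, MeasurementSetNode]


@dataclass
class ExperimentSetNode:
    experiments: list[ExperimentNode]


def count_datasets(experiment_set: ExperimentSetNode) -> int:
    """Walk ExperimentSet -> Experiment -> MeasurementSet -> Measurement."""
    return sum(
        len(measurement.datasets)
        for experiment in experiment_set.experiments
        for mset in experiment.measurement_sets.values()
        for measurement in mset.measurements
    )


# Two replicate measurements, each holding two placeholder datasets
m1 = MeasurementNode({"flow_curve": [1.0, 2.0], "temperature_log": [25.0]}, {"replicate": 1})
m2 = MeasurementNode({"flow_curve": [1.1, 2.1], "temperature_log": [25.1]}, {"replicate": 2})
experiment = ExperimentNode({"trial1": MeasurementSetNode([m1, m2])})
campaign = ExperimentSetNode([experiment])
print(count_datasets(campaign))  # 4
```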
### Metadata Module (piblin_jax.data.metadata)
Metadata management utilities:
- Merging: Combine metadata from multiple sources with conflict resolution
- Validation: Type checking and schema validation
- Extraction: Parse metadata from filenames, paths, and file headers
- Separation: Distinguish experimental conditions from details
Supported Operations:
- merge_metadata() - Combine metadata with strategies (override, keep_first, raise, list)
- validate_metadata() - Validate against schemas with type checking
- extract_from_filename() - Parse metadata from file naming patterns
- extract_from_path() - Extract metadata from directory structure
- parse_header_metadata() - Parse comment headers in data files
- separate_conditions_details() - Split metadata into conditions and details
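Each of the four merge strategies answers the same question in a different way: what happens when two sources define the same key? The function below is a hypothetical standalone re-implementation. It illustrates those semantics as the strategy names suggest them. It is not the piblin_jax code. `merge_sketch` is an illustrative name, and the library's actual parameters and edge-case behavior may differ.

```python
from typing import Any

STRATEGIES = {"override", "keep_first", "raise", "list"}


def merge_sketch(sources: list[dict[str, Any]], strategy: str = "override") -> dict[str, Any]:
    """Merge dicts left to right, resolving conflicting keys by `strategy`.

    Assumed semantics (illustrative only):
      override   - the later source's value wins
      keep_first - the earlier source's value wins
      raise      - any conflict raises ValueError
      list       - all values seen for a conflicting key are kept in a list
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    merged: dict[str, Any] = {}
    seen: dict[str, list[Any]] = {}  # every value observed for each key
    conflicts: set[str] = set()
    for source in sources:
        for key, value in source.items():
            seen.setdefault(key, []).append(value)
            if key not in merged:
                merged[key] = value
            elif merged[key] != value:
                conflicts.add(key)
                if strategy == "raise":
                    raise ValueError(f"conflicting values for {key!r}: {merged[key]!r} vs {value!r}")
                if strategy == "override":
                    merged[key] = value
                # keep_first and list leave the first value in place for now
    if strategy == "list":
        for key in conflicts:
            merged[key] = seen[key]
    return merged


file_meta = {"sample": "A1", "temperature": 25.0}
header_meta = {"temperature": 25.5, "operator": "Jane Doe"}

print(merge_sketch([file_meta, header_meta], "override"))
# {'sample': 'A1', 'temperature': 25.5, 'operator': 'Jane Doe'}
print(merge_sketch([file_meta, header_meta], "keep_first"))
# {'sample': 'A1', 'temperature': 25.0, 'operator': 'Jane Doe'}
print(merge_sketch([file_meta, header_meta], "list"))
# {'sample': 'A1', 'temperature': [25.0, 25.5], 'operator': 'Jane Doe'}
```

Identical values from different sources are not treated as conflicts in this sketch. Under "raise", only genuinely different values stop the merge.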
### ROI Module (piblin_jax.data.roi)
Region of interest definitions for selective analysis:
- ROI: Base class for defining regions in datasets
- Support for 1D, 2D, and 3D regions
- Boolean masking and index-based selection
- Integration with transform pipeline
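Boolean masking, one of the selection mechanisms listed above, can be sketched for a 1D region without any library support. The `Range1D` class is hypothetical and not the piblin_jax ROI API. It shows how an inclusive [start, stop] interval on the independent variable becomes a mask. That single mask then selects matching (x, y) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range1D:
    """Hypothetical inclusive 1D region of interest on the x-axis."""

    start: float
    stop: float

    def mask(self, x: list[float]) -> list[bool]:
        # True where x lies inside [start, stop]
        return [self.start <= xi <= self.stop for xi in x]

    def select(self, x: list[float], y: list[float]) -> tuple[list[float], list[float]]:
        # Apply one mask to both sequences so the (x, y) pairs stay aligned
        keep = self.mask(x)
        return (
            [xi for xi, k in zip(x, keep) if k],
            [yi for yi, k in zip(y, keep) if k],
        )


x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
roi = Range1D(start=1.0, stop=2.0)
x_roi, y_roi = roi.select(x, y)
print(x_roi, y_roi)  # [1.0, 1.5, 2.0] [12.0, 13.0, 14.0]
```

With NumPy arrays, the same mask is usually written as `(x >= start) & (x <= stop)` and applied as `x[mask]`, `y[mask]`.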
## Usage Examples
### Basic Dataset Creation
Example:
import numpy as np
from piblin_jax.data.datasets import OneDimensionalDataset
# Create 1D dataset
x = np.linspace(0, 10, 100)
y = np.sin(x)
dataset = OneDimensionalDataset(
    independent_variable_data=x,
    dependent_variable_data=y,
    conditions={"temperature": 25.0, "sample": "A"},
    details={"operator": "John", "date": "2025-01-15"}
)
### Building Hierarchical Collections
Example:
from piblin_jax.data.collections import Measurement, MeasurementSet, Experiment
# Create measurements
m1 = Measurement({"dataset1": dataset1, "dataset2": dataset2})
m2 = Measurement({"dataset1": dataset3, "dataset2": dataset4})
# Group into measurement set
mset = MeasurementSet([m1, m2])
# Create experiment
experiment = Experiment({"trial1": mset})
### Metadata Operations
Example:
from piblin_jax.data import metadata
# Merge metadata from multiple sources
file_meta = metadata.extract_from_filename("sample_A1_25C.csv")
path_meta = {"experiment": "viscosity"}
combined = metadata.merge_metadata([file_meta, path_meta])
# Validate against schema
schema = {"temperature": float, "sample": str}
metadata.validate_metadata(combined, schema=schema)
# Separate conditions from details
conditions, details = metadata.separate_conditions_details(
    combined,
    condition_keys=["temperature", "pressure"]
)
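The metadata workflow above has three stages: parse a filename, validate against a schema, and separate conditions from details. Each stage can be mimicked with the standard library. Everything here is a hypothetical stand-in. The filename pattern `sample_<id>_<temp>C.csv`, the helper names, and their return values are assumptions made for illustration. They do not describe how piblin_jax implements extract_from_filename(), validate_metadata(), or separate_conditions_details().

```python
import re
from typing import Any

# Hypothetical naming convention: sample_<sample id>_<temperature>C.csv
FILENAME_PATTERN = re.compile(
    r"sample_(?P<sample>[A-Za-z0-9]+)_(?P<temperature>\d+(?:\.\d+)?)C\.csv$"
)


def parse_filename(name: str) -> dict[str, Any]:
    """Return metadata parsed from `name`, or {} if it does not match."""
    match = FILENAME_PATTERN.search(name)
    if match is None:
        return {}
    return {"sample": match["sample"], "temperature": float(match["temperature"])}


def check_schema(meta: dict[str, Any], schema: dict[str, type]) -> None:
    """Raise TypeError when a key named in the schema holds the wrong type."""
    for key, expected in schema.items():
        if key in meta and not isinstance(meta[key], expected):
            raise TypeError(
                f"{key!r} should be {expected.__name__}, got {type(meta[key]).__name__}"
            )


def split_conditions(
    meta: dict[str, Any], condition_keys: list[str]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Listed keys become conditions; every other key becomes a detail."""
    conditions = {k: v for k, v in meta.items() if k in condition_keys}
    details = {k: v for k, v in meta.items() if k not in condition_keys}
    return conditions, details


file_meta = parse_filename("sample_A1_25C.csv")
combined = {**file_meta, "experiment": "viscosity"}
check_schema(combined, {"temperature": float, "sample": str})
conditions, details = split_conditions(combined, ["temperature", "pressure"])
print(conditions)  # {'temperature': 25.0}
print(details)     # {'sample': 'A1', 'experiment': 'viscosity'}
```

In this sketch, a schema key that is absent from the metadata is not an error. Condition keys that never appear, such as "pressure" here, are simply skipped.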
## Design Principles
Type Safety: Comprehensive type hints for all public APIs
Immutability: Datasets are immutable by design (functional programming)
Backend Agnostic: JAX for performance, NumPy for compatibility
Metadata-First: Rich metadata support throughout the hierarchy
Hierarchical Organization: Natural experiment → measurement → dataset structure
Extensibility: Easy to add custom dataset types and collection classes
## See Also
piblin_jax.transform - Transform pipelines for data processing
piblin_jax.bayesian - Bayesian uncertainty quantification
piblin_jax.dataio - File I/O for reading experimental data
piblin_jax.backend - Backend abstraction layer (JAX/NumPy)
Datasets
Base Dataset
Base dataset class for piblin-jax.
Provides the abstract base class for all dataset types with metadata support.
- class piblin_jax.data.datasets.base.Dataset(conditions=None, details=None)
Bases: ABC
Abstract base class for all dataset types.
All piblin-jax datasets inherit from this class and provide:
- Metadata system (conditions and details)
- Internal storage using backend arrays (JAX or NumPy)
- External NumPy conversion for API boundaries
- Immutable design for JAX compatibility
- Parameters:
conditions (dict[str, Any] | None, optional) – Experimental conditions (temperature, pressure, flow rate, etc.). Defaults to an empty dict.
details (dict[str, Any] | None, optional) – Additional context (sample ID, operator, instrument, date, etc.). Defaults to an empty dict.
- conditions
Experimental conditions associated with the dataset.
- Type:
dict[str,Any]
- details
Additional metadata and context for the dataset.
- Type:
dict[str,Any]
Notes
This class cannot be instantiated directly. Use one of the concrete dataset types:
- ZeroDimensionalDataset (0D)
- OneDimensionalDataset (1D)
- TwoDimensionalDataset (2D)
- ThreeDimensionalDataset (3D)
- Histogram
- Distribution
- OneDimensionalCompositeDataset
The dataset uses an immutable design pattern to ensure compatibility with JAX transformations (jit, grad, vmap). Arrays are stored internally as backend arrays (jax.Array when JAX is available, NumPy ndarray otherwise) and converted to NumPy arrays when accessed through properties.
Examples
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> import numpy as np
>>> x = np.linspace(0, 10, 100)
>>> y = np.sin(x)
>>> conditions = {"temperature": 25.0, "sample": "A"}
>>> details = {"operator": "Jane Doe", "date": "2025-10-18"}
>>> dataset = OneDimensionalDataset(
...     independent_variable_data=x,
...     dependent_variable_data=y,
...     conditions=conditions,
...     details=details
... )
>>> dataset.conditions["temperature"]
25.0
>>> type(dataset.independent_variable_data)
<class 'numpy.ndarray'>
- Attributes:
conditions – Get experimental conditions.
credible_intervals – Get cached credible intervals.
details – Get additional dataset details.
has_uncertainty – Check if dataset has uncertainty information.
uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).
Methods
copy() – Create a deep copy of this dataset.
- __init__(conditions=None, details=None)
Initialize Dataset with metadata.
- Parameters:
conditions (dict[str, Any] | None, optional) – Experimental conditions.
details (dict[str, Any] | None, optional) – Additional context and metadata.
- property conditions: dict[str, Any]
Get experimental conditions.
- Returns:
Dictionary of experimental conditions (temperature, pressure, etc.).
- Return type:
dict[str,Any]
Examples
>>> dataset.conditions
{'temperature': 25.0, 'pressure': 1.0, 'sample': 'A'}
- property details: dict[str, Any]
Get additional dataset details.
- Returns:
Dictionary of additional context (operator, instrument, date, etc.).
- Return type:
dict[str,Any]
Examples
>>> dataset.details
{'operator': 'Jane Doe', 'instrument': 'Spectrometer X', 'date': '2025-10-18'}
- property has_uncertainty: bool
Check if dataset has uncertainty information.
- Returns:
True if dataset has uncertainty information, False otherwise.
- Return type:
bool
Examples
>>> dataset.has_uncertainty
False
>>> dataset_with_unc = dataset.with_uncertainty(n_samples=1000)
>>> dataset_with_unc.has_uncertainty
True
Notes
This property checks for the presence of either uncertainty samples or cached credible intervals. It does not validate the uncertainty quantification method or parameter values.
- property uncertainty_samples: Any | None
Get uncertainty samples (if keep_samples=True was used).
- No-index:
- Returns:
Posterior samples from Bayesian inference if keep_samples=True, None otherwise.
- Return type:
dict | None
Examples
>>> dataset_with_unc = dataset.with_uncertainty(
...     n_samples=1000,
...     method='bayesian',
...     keep_samples=True
... )
>>> samples = dataset_with_unc.uncertainty_samples
>>> sigma_samples = samples['sigma']
Notes
Storing samples can be memory-intensive for large datasets. Use keep_samples=False if you only need credible intervals.
- property credible_intervals: Any | None
Get cached credible intervals.
- Returns:
Cached credible intervals (lower, upper) if computed, None otherwise.
- Return type:
tuple | None
Examples
>>> dataset_with_unc = dataset.with_uncertainty(n_samples=1000)
>>> intervals = dataset_with_unc.credible_intervals
>>> if intervals is not None:
...     lower, upper = intervals
Notes
Credible intervals are cached after computation to avoid recomputation. Use get_credible_intervals() to compute intervals with custom parameters.
- copy()
Create a deep copy of this dataset.
- Returns:
A new dataset instance with copied data and metadata.
- Return type:
Dataset
Examples
>>> dataset_copy = dataset.copy()
>>> dataset_copy.conditions is not dataset.conditions
True
Notes
This creates a deep copy of all data arrays, metadata, and uncertainty information. The copied dataset is completely independent of the original.
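The independence guarantee described for copy() is the standard deep-copy behavior. The snippet below demonstrates the difference between shallow and deep copies using Python's `copy` module. It uses a hypothetical metadata-bearing object. `Record` is illustrative only and not a piblin_jax class, and this is not how copy() is implemented. After a deep copy, changing nested metadata in the copy leaves the original unchanged. A shallow copy, by contrast, shares the nested dict.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """Hypothetical object holding data plus nested metadata."""

    values: list[float]
    conditions: dict[str, Any] = field(default_factory=dict)


original = Record(values=[1.0, 2.0], conditions={"temperature": 25.0, "tags": ["A"]})
shallow = copy.copy(original)   # new Record, but the same nested dict and lists
deep = copy.deepcopy(original)  # new Record with independent nested containers

shallow.conditions["temperature"] = 30.0  # leaks into the original
deep.conditions["tags"].append("B")       # stays local to the deep copy

print(original.conditions)  # {'temperature': 30.0, 'tags': ['A']}
print(deep.conditions)      # {'temperature': 25.0, 'tags': ['A', 'B']}
```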
Zero-Dimensional Dataset
Zero-dimensional dataset for scalar values.
Used for single values like steady-state measurements, summary statistics, or aggregated results.
- class piblin_jax.data.datasets.zero_dimensional.ZeroDimensionalDataset(value, conditions=None, details=None)
Bases: Dataset
Zero-dimensional dataset containing a single scalar value.
This dataset type represents a single measured or calculated value, such as a steady-state measurement, summary statistic, or aggregated result.
- Parameters:
value (float) – The scalar value to store.
conditions (dict[str, Any] | None, optional) – Experimental conditions associated with this measurement.
details (dict[str, Any] | None, optional) – Additional context and metadata.
- value
The scalar value (converted to Python float).
- Type:
float
- conditions
Experimental conditions.
- Type:
dict[str,Any]
- details
Additional metadata.
- Type:
dict[str,Any]
Examples
>>> from piblin_jax.data.datasets import ZeroDimensionalDataset
>>> # Steady-state temperature measurement
>>> temp = ZeroDimensionalDataset(
...     value=98.6,
...     conditions={"location": "oral", "patient_id": "12345"},
...     details={"units": "fahrenheit", "instrument": "thermometer"}
... )
>>> temp.value
98.6
>>> # Summary statistic
>>> mean_concentration = ZeroDimensionalDataset(
...     value=2.5e-3,
...     conditions={"sample": "batch_42"},
...     details={"units": "mol/L", "statistic": "mean"}
... )
>>> mean_concentration.value
0.0025
Notes
The value is stored internally as a backend array (JAX or NumPy scalar) and converted to a Python float when accessed through the value property. This ensures compatibility with both JAX transformations and standard Python numeric operations.
- Attributes:
conditions – Get experimental conditions.
credible_intervals – Get cached credible intervals.
details – Get additional dataset details.
has_uncertainty – Check if dataset has uncertainty information.
uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).
value – Get the scalar value as a Python float.
Methods
copy() – Create a deep copy of this dataset.
- __init__(value, conditions=None, details=None)
Initialize zero-dimensional dataset with a scalar value.
- Parameters:
value (float) – The scalar value to store.
conditions (dict[str, Any] | None, optional) – Experimental conditions.
details (dict[str, Any] | None, optional) – Additional metadata.
One-Dimensional Dataset
One-dimensional dataset with independent and dependent variables.
This is the most common dataset type, used for time series, spectra, chromatograms, and other 1D data.
- class piblin_jax.data.datasets.one_dimensional.OneDimensionalDataset(independent_variable_data, dependent_variable_data, conditions=None, details=None)
Bases: Dataset
One-dimensional dataset with independent and dependent variables.
This is the most common dataset type, representing paired (x, y) data such as:
- Time series measurements
- Spectroscopy data (wavelength vs. absorbance)
- Chromatography traces (time vs. detector response)
- Titration curves (volume vs. pH)
- Parameters:
independent_variable_data (array_like) – 1D array of independent variable values (e.g., time, wavelength).
dependent_variable_data (array_like) – 1D array of dependent variable values (e.g., signal, absorbance).
conditions (dict[str, Any] | None, optional) – Experimental conditions.
details (dict[str, Any] | None, optional) – Additional metadata.
- independent_variable_data
Independent variable as NumPy array.
- Type:
np.ndarray
- dependent_variable_data
Dependent variable as NumPy array.
- Type:
np.ndarray
- conditions
Experimental conditions.
- Type:
dict[str,Any]
- details
Additional metadata.
- Type:
dict[str,Any]
- Raises:
ValueError – If independent and dependent arrays have different shapes.
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> # Time series data
>>> time = np.linspace(0, 10, 100)
>>> signal = np.sin(time)
>>> dataset = OneDimensionalDataset(
...     independent_variable_data=time,
...     dependent_variable_data=signal,
...     conditions={"temperature": 25.0, "sample": "A"},
...     details={"instrument": "oscilloscope", "sampling_rate": 10.0}
... )
>>> dataset.independent_variable_data.shape
(100,)
>>> dataset.dependent_variable_data.shape
(100,)
>>> # Spectroscopy data
>>> wavelength = np.linspace(200, 800, 500)
>>> absorbance = np.exp(-((wavelength - 450) ** 2) / 5000)
>>> spectrum = OneDimensionalDataset(
...     independent_variable_data=wavelength,
...     dependent_variable_data=absorbance,
...     conditions={"concentration": 1e-5, "solvent": "water"},
...     details={"units_x": "nm", "units_y": "AU"}
... )
Notes
Arrays are stored internally as backend arrays (jax.Array when JAX is available, NumPy ndarray otherwise) and converted to NumPy arrays when accessed through properties. This keeps the internal data compatible with JAX transformations while exposing a NumPy-compatible API.
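The ValueError listed under Raises amounts to a shape check performed at construction time. The sketch below shows that kind of check using plain tuples in a frozen dataclass's `__post_init__`. `PairedSeries` and its error message are hypothetical and not the piblin_jax implementation, which compares array shapes rather than tuple lengths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedSeries:
    """Hypothetical (x, y) container that validates lengths on construction."""

    x: tuple[float, ...]
    y: tuple[float, ...]

    def __post_init__(self) -> None:
        # Reject mismatched lengths before the object can be used anywhere
        if len(self.x) != len(self.y):
            raise ValueError(
                f"independent ({len(self.x)}) and dependent ({len(self.y)}) lengths differ"
            )


ok = PairedSeries(x=(0.0, 1.0, 2.0), y=(0.0, 0.84, 0.91))

message = ""
try:
    PairedSeries(x=(0.0, 1.0, 2.0), y=(0.0, 0.84))
except ValueError as err:
    message = str(err)
print(message)  # independent (3) and dependent (2) lengths differ
```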
- Attributes:
conditions – Get experimental conditions.
credible_intervals – Get cached credible intervals.
dependent_variable_data – Get dependent variable data as NumPy array.
details – Get additional dataset details.
has_uncertainty – Check if dataset has uncertainty information.
independent_variable_data – Get independent variable data as NumPy array.
uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).
Methods
copy() – Create a deep copy of this dataset.
get_credible_intervals([level, method]) – Get credible intervals for the dependent variable.
visualize([show_uncertainty, level, ...]) – Visualize the 1D dataset with optional uncertainty bands.
with_uncertainty([n_samples, method, ...]) – Add uncertainty quantification to the dataset.
- __init__(independent_variable_data, dependent_variable_data, conditions=None, details=None)
Initialize one-dimensional dataset.
- Parameters:
independent_variable_data (array_like) – 1D array of independent variable values.
dependent_variable_data (array_like) – 1D array of dependent variable values.
conditions (dict[str, Any] | None, optional) – Experimental conditions.
details (dict[str, Any] | None, optional) – Additional metadata.
- Raises:
ValueError – If arrays have different shapes.
- property independent_variable_data: ndarray
Get independent variable data as NumPy array.
- Returns:
1D NumPy array of independent variable values.
- Return type:
np.ndarray
Examples
>>> dataset.independent_variable_data
array([0., 0.101, 0.202, ..., 9.798, 9.899, 10.])
- property dependent_variable_data: ndarray
Get dependent variable data as NumPy array.
- Returns:
1D NumPy array of dependent variable values.
- Return type:
np.ndarray
Examples
>>> dataset.dependent_variable_data
array([0., 0.101, 0.201, ..., -0.365, -0.457, -0.544])
- with_uncertainty(n_samples=1000, method='bayesian', keep_samples=False, level=0.95)
Add uncertainty quantification to dataset.
This method creates a new dataset with uncertainty information computed using the specified method. The original dataset is not modified.
- Parameters:
n_samples (int, optional) – Number of samples for uncertainty quantification (default: 1000).
method (str, optional) – Method for uncertainty quantification (default: 'bayesian'):
- 'bayesian': NumPyro MCMC sampling
- 'bootstrap': Bootstrap resampling (not yet implemented)
- 'analytical': Analytical uncertainty propagation (not yet implemented)
keep_samples (bool, optional) – If True, store full posterior samples (default: False).
level (float, optional) – Credible interval level (default: 0.95).
- Returns:
New dataset with uncertainty information
- Return type:
OneDimensionalDataset
- Raises:
NotImplementedError – If method is not 'bayesian'.
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> x = np.linspace(0, 10, 50)
>>> y = 2.0 * x + 1.0 + 0.1 * np.random.randn(len(x))
>>> dataset = OneDimensionalDataset(
...     independent_variable_data=x,
...     dependent_variable_data=y
... )
>>> # Add Bayesian uncertainty
>>> dataset_with_unc = dataset.with_uncertainty(
...     n_samples=1000,
...     method='bayesian',
...     keep_samples=False,
...     level=0.95
... )
>>> dataset_with_unc.has_uncertainty
True
>>> lower, upper = dataset_with_unc.credible_intervals
>>> # With full samples
>>> dataset_with_samples = dataset.with_uncertainty(
...     n_samples=1000,
...     keep_samples=True
... )
>>> samples = dataset_with_samples.uncertainty_samples
>>> sigma_samples = samples['sigma']
Notes
Currently only the ‘bayesian’ method is implemented. This uses a simple Gaussian noise model to estimate measurement uncertainty. Future versions will support custom priors and more sophisticated models.
The method creates a copy of the dataset to preserve immutability.
- get_credible_intervals(level=0.95, method='eti')
Get credible intervals for the dependent variable.
- Parameters:
level (float, optional) – Credible interval level (default: 0.95).
method (str, optional) – Interval method (default: 'eti').
- Returns:
(lower_bound, upper_bound) arrays with same shape as dependent variable
- Return type:
tuple[np.ndarray, np.ndarray]
- Raises:
RuntimeError – If dataset has no uncertainty information
NotImplementedError – If method is not supported
Examples
>>> dataset_with_unc = dataset.with_uncertainty(n_samples=1000)
>>> lower, upper = dataset_with_unc.get_credible_intervals(level=0.95)
>>> # 68% interval (approximately 1 sigma)
>>> lower_68, upper_68 = dataset_with_unc.get_credible_intervals(level=0.68)
Notes
If credible intervals have been cached (from with_uncertainty call), they are returned directly. Otherwise, they are computed from stored uncertainty samples.
For the simple Gaussian noise model, the credible intervals represent the uncertainty in the measurement noise level, not the data points themselves.
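When intervals are computed from stored samples, an equal-tailed interval at level L takes the (1 - L)/2 and (1 + L)/2 quantiles of the samples. That is the usual meaning of the 'eti' abbreviation. The sketch below computes such an interval with linear interpolation between order statistics, which matches NumPy's default 'linear' percentile rule. The function name is hypothetical. The quantile convention is an assumption, because the library may use a different one.

```python
import math


def equal_tailed_interval(samples: list[float], level: float = 0.95) -> tuple[float, float]:
    """Equal-tailed credible interval from posterior samples.

    Uses linear interpolation between order statistics (NumPy's default
    'linear' percentile rule); the library may use another convention.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)

    def quantile(q: float) -> float:
        pos = q * (n - 1)  # fractional index into the sorted samples
        lower = math.floor(pos)
        upper = min(lower + 1, n - 1)
        frac = pos - lower
        return ordered[lower] + frac * (ordered[upper] - ordered[lower])

    tail = (1.0 - level) / 2.0  # probability mass excluded on each side
    return quantile(tail), quantile(1.0 - tail)


samples = [float(i) for i in range(101)]  # 0.0, 1.0, ..., 100.0
print(equal_tailed_interval(samples, level=0.5))  # (25.0, 75.0)
```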
- visualize(show_uncertainty=False, level=0.95, figsize=(10, 6), xlabel=None, ylabel=None, title=None, **kwargs)
Visualize the 1D dataset with optional uncertainty bands.
Creates a line plot of the data with optional shaded uncertainty regions when the dataset has uncertainty information.
- Parameters:
show_uncertainty (bool, default False) – If True and the dataset has uncertainty, show shaded error bands.
level (float, default 0.95) – Credible interval level for uncertainty bands (e.g., 0.95 for a 95% credible interval).
figsize (tuple[float, float], default (10, 6)) – Figure size in inches (width, height).
xlabel (str, optional) – Label for x-axis. If None, uses "Independent Variable".
ylabel (str, optional) – Label for y-axis. If None, uses "Dependent Variable".
title (str, optional) – Plot title. If None, no title is shown.
**kwargs (Any) – Additional keyword arguments passed to matplotlib.pyplot.plot().
- Returns:
(fig, ax) matplotlib figure and axis objects
- Return type:
tuple[matplotlib.figure.Figure, matplotlib.axes.Axes]
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> x = np.linspace(0, 10, 50)
>>> y = 2.0 * x + 1.0
>>> dataset = OneDimensionalDataset(
...     independent_variable_data=x,
...     dependent_variable_data=y
... )
>>> fig, ax = dataset.visualize(xlabel='Time (s)', ylabel='Signal (V)')
>>> # With uncertainty
>>> dataset_with_unc = dataset.with_uncertainty(n_samples=1000, method='bayesian')
>>> fig, ax = dataset_with_unc.visualize(
...     show_uncertainty=True,
...     level=0.95,
...     xlabel='Time (s)',
...     ylabel='Signal (V)'
... )
Notes
Requires matplotlib to be installed.
For datasets with uncertainty, the shaded bands show the credible intervals.
Multiple credible levels can be shown by calling visualize() multiple times.
Two-Dimensional Dataset
Two-dimensional dataset with two independent variables and 2D dependent data.
Used for data that varies with two parameters, such as time-temperature maps, spatial imaging data, or parameter sweeps.
- class piblin_jax.data.datasets.two_dimensional.TwoDimensionalDataset(independent_variable_data_1, independent_variable_data_2, dependent_variable_data, conditions=None, details=None)
Bases: Dataset
Two-dimensional dataset with two independent variables and a 2D dependent array.
This dataset type represents data that varies with two independent parameters:
- Time-temperature maps (kinetics studies)
- Spatial imaging data (microscopy, spectroscopy maps)
- Parameter sweep experiments
- Contour plots and heatmaps
- Parameters:
independent_variable_data_1 (array_like) – 1D array of first independent variable (e.g., temperature, x-coordinate).
independent_variable_data_2 (array_like) – 1D array of second independent variable (e.g., time, y-coordinate).
dependent_variable_data (array_like) – 2D array of dependent variable values with shape (len(var1), len(var2)).
conditions (dict[str, Any] | None, optional) – Experimental conditions.
details (dict[str, Any] | None, optional) – Additional metadata.
- independent_variable_data_1
First independent variable as NumPy array.
- Type:
np.ndarray
- independent_variable_data_2
Second independent variable as NumPy array.
- Type:
np.ndarray
- dependent_variable_data
2D dependent variable as NumPy array.
- Type:
np.ndarray
- conditions
Experimental conditions.
- Type:
dict[str,Any]
- details
Additional metadata.
- Type:
dict[str,Any]
- Raises:
ValueError – If dimension compatibility is violated.
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import TwoDimensionalDataset
>>> # Temperature-time kinetics map
>>> temperature = np.linspace(20, 100, 50)  # 50 temperatures
>>> time = np.linspace(0, 3600, 100)  # 100 time points
>>> # Reaction extent at each (temp, time) combination
>>> extent = np.random.rand(50, 100)
>>> dataset = TwoDimensionalDataset(
...     independent_variable_data_1=temperature,
...     independent_variable_data_2=time,
...     dependent_variable_data=extent,
...     conditions={"catalyst": "Pd/C", "solvent": "ethanol"},
...     details={"experiment_id": "KIN-2025-042"}
... )
>>> dataset.independent_variable_data_1.shape
(50,)
>>> dataset.dependent_variable_data.shape
(50, 100)
>>> # Spectroscopy imaging (spatial map)
>>> x_coords = np.linspace(0, 10, 64)
>>> y_coords = np.linspace(0, 10, 64)
>>> intensity_map = np.random.rand(64, 64)
>>> image = TwoDimensionalDataset(
...     independent_variable_data_1=x_coords,
...     independent_variable_data_2=y_coords,
...     dependent_variable_data=intensity_map,
...     conditions={"wavelength": 532, "power": 10},
...     details={"units": "microns", "resolution": "0.159 um/pixel"}
... )
Notes
The dependent variable must have shape (len(var1), len(var2)). Arrays are stored internally as backend arrays and converted to NumPy when accessed.
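The shape rule for gridded datasets has two parts. The dependent array has one axis per independent variable, with matching lengths. Element [i, j] belongs to (var1[i], var2[j]). Below is a dependency-free sketch of that check using nested lists. `grid_shape` and `check_grid` are hypothetical helpers, not piblin_jax functions. The same check covers the 3D rule when three axes are passed.

```python
from typing import Any


def grid_shape(data: Any) -> tuple[int, ...]:
    """Shape of a rectangular nested list; raises ValueError if ragged."""
    if not isinstance(data, list):
        return ()  # a scalar leaf
    if not data:
        return (0,)
    inner = [grid_shape(item) for item in data]
    if any(shape != inner[0] for shape in inner):
        raise ValueError("ragged nested list")
    return (len(data),) + inner[0]


def check_grid(axes: list[list[float]], dependent: Any) -> None:
    """Require shape (len(axes[0]), len(axes[1]), ...)."""
    expected = tuple(len(axis) for axis in axes)
    actual = grid_shape(dependent)
    if actual != expected:
        raise ValueError(f"dependent shape {actual} != expected {expected}")


temperature = [20.0, 60.0, 100.0]       # var1, length 3
times = [0.0, 1800.0, 3600.0, 5400.0]   # var2, length 4
# extent[i][j] is the value at (temperature[i], times[j])
extent = [[i * 10 + j for j in range(len(times))] for i in range(len(temperature))]
check_grid([temperature, times], extent)  # passes: shape (3, 4)

transposed = [list(row) for row in zip(*extent)]  # shape (4, 3)
error_message = ""
try:
    check_grid([temperature, times], transposed)
except ValueError as err:
    error_message = str(err)
print(error_message)  # dependent shape (4, 3) != expected (3, 4)
```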
- Attributes:
conditions – Get experimental conditions.
credible_intervals – Get cached credible intervals.
dependent_variable_data – Get dependent variable data as NumPy array.
details – Get additional dataset details.
has_uncertainty – Check if dataset has uncertainty information.
independent_variable_data_1 – Get first independent variable as NumPy array.
independent_variable_data_2 – Get second independent variable as NumPy array.
uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).
Methods
copy() – Create a deep copy of this dataset.
- __init__(independent_variable_data_1, independent_variable_data_2, dependent_variable_data, conditions=None, details=None)
Initialize two-dimensional dataset.
- Parameters:
independent_variable_data_1 (array_like) – 1D array of first independent variable.
independent_variable_data_2 (array_like) – 1D array of second independent variable.
dependent_variable_data (array_like) – 2D array of dependent variable.
conditions (dict[str, Any] | None, optional) – Experimental conditions.
details (dict[str, Any] | None, optional) – Additional metadata.
- Raises:
ValueError – If dimension compatibility is violated.
- property independent_variable_data_1: ndarray
Get first independent variable as NumPy array.
- Returns:
1D NumPy array of first independent variable values.
- Return type:
np.ndarray
Examples
>>> dataset.independent_variable_data_1
array([20., 21.63, 23.27, ..., 98.37, 100.])
- property independent_variable_data_2: ndarray
Get second independent variable as NumPy array.
- Returns:
1D NumPy array of second independent variable values.
- Return type:
np.ndarray
Examples
>>> dataset.independent_variable_data_2
array([0., 36.36, 72.73, ..., 3527.27, 3563.64, 3600.])
- property dependent_variable_data: ndarray
Get dependent variable data as NumPy array.
- Returns:
2D NumPy array of dependent variable values with shape (len(var1), len(var2)).
- Return type:
np.ndarray
Examples
>>> dataset.dependent_variable_data.shape
(50, 100)
>>> dataset.dependent_variable_data[0, 0]  # Value at first (temp, time)
0.234
Three-Dimensional Dataset
Three-dimensional dataset with three independent variables and 3D dependent data.
Used for volumetric data, 3D imaging, or data varying with three parameters.
- class piblin_jax.data.datasets.three_dimensional.ThreeDimensionalDataset(independent_variable_data_1, independent_variable_data_2, independent_variable_data_3, dependent_variable_data, conditions=None, details=None)
Bases: Dataset
Three-dimensional dataset with three independent variables and a 3D dependent array.
This dataset type represents volumetric or 3D data:
- 3D microscopy/imaging (confocal, CT, MRI)
- Volumetric spectroscopy
- Three-parameter experiments (e.g., temperature, pressure, time)
- Computational fluid dynamics results
- Molecular dynamics trajectories
- Parameters:
independent_variable_data_1 (array_like) – 1D array of first independent variable (e.g., x-coordinate, temperature).
independent_variable_data_2 (array_like) – 1D array of second independent variable (e.g., y-coordinate, pressure).
independent_variable_data_3 (array_like) – 1D array of third independent variable (e.g., z-coordinate, time).
dependent_variable_data (array_like) – 3D array of dependent variable values with shape (len(var1), len(var2), len(var3)).
conditions (dict[str, Any] | None, optional) – Experimental conditions.
details (dict[str, Any] | None, optional) – Additional metadata.
- independent_variable_data_1
First independent variable as NumPy array.
- Type:
np.ndarray
- independent_variable_data_2
Second independent variable as NumPy array.
- Type:
np.ndarray
- independent_variable_data_3
Third independent variable as NumPy array.
- Type:
np.ndarray
- dependent_variable_data
3D dependent variable as NumPy array.
- Type:
np.ndarray
- conditions
Experimental conditions.
- Type:
dict[str,Any]
- details
Additional metadata.
- Type:
dict[str,Any]
- Raises:
ValueError – If dimension compatibility is violated.
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import ThreeDimensionalDataset
>>> # 3D confocal microscopy data
>>> x = np.linspace(0, 100, 64)  # microns
>>> y = np.linspace(0, 100, 64)  # microns
>>> z = np.linspace(0, 50, 32)  # microns (z-stack)
>>> intensity = np.random.rand(64, 64, 32)
>>> volume = ThreeDimensionalDataset(
...     independent_variable_data_1=x,
...     independent_variable_data_2=y,
...     independent_variable_data_3=z,
...     dependent_variable_data=intensity,
...     conditions={"wavelength": 488, "objective": "40x"},
...     details={"voxel_size": "1.59 x 1.59 x 1.61 um"}
... )
>>> volume.dependent_variable_data.shape
(64, 64, 32)
>>> # Three-parameter experiment (T, P, t)
>>> temperatures = np.array([25, 50, 75, 100])
>>> pressures = np.array([1, 5, 10, 15, 20])
>>> times = np.array([0, 60, 120, 180])
>>> conversion = np.random.rand(4, 5, 4)
>>> experiment = ThreeDimensionalDataset(
...     independent_variable_data_1=temperatures,
...     independent_variable_data_2=pressures,
...     independent_variable_data_3=times,
...     dependent_variable_data=conversion,
...     conditions={"catalyst": "Pt/Al2O3", "reactant": "H2 + CO"},
...     details={"experiment": "Fischer-Tropsch"}
... )
Notes
The dependent variable must have shape (len(var1), len(var2), len(var3)). Arrays are stored internally as backend arrays and converted to NumPy when accessed.
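The shape rule above is easiest to satisfy by building the dependent array on an indexing="ij" grid, so its axes follow (var1, var2, var3). The sketch below does that with NumPy and checks compatibility the way the Raises entry describes; `check_3d_shapes` is a hypothetical helper for illustration, not the class's actual validation code.

```python
import numpy as np


def check_3d_shapes(var1, var2, var3, values):
    """Hypothetical helper (not piblin_jax API): enforce the documented shape rule.

    Raises ValueError unless each axis is 1D and
    values.shape == (len(var1), len(var2), len(var3)).
    """
    axes = [np.asarray(v) for v in (var1, var2, var3)]
    for i, axis in enumerate(axes, start=1):
        if axis.ndim != 1:
            raise ValueError(f"var{i} must be 1D, got shape {axis.shape}")
    expected = tuple(axis.size for axis in axes)
    actual = np.shape(values)
    if actual != expected:
        raise ValueError(f"dependent data has shape {actual}, expected {expected}")
    return expected


x = np.linspace(0, 100, 64)  # microns
y = np.linspace(0, 100, 64)  # microns
z = np.linspace(0, 50, 32)   # microns (z-stack)

# indexing="ij" gives arrays of shape (len(x), len(y), len(z)); the default
# indexing="xy" would return (len(y), len(x), len(z)) instead.
X, Y, Z = np.meshgrid(x, y, z, indexing="ij")

# Synthetic signal: a Gaussian spot in x-y that decays along z.
intensity = np.exp(-((X - 50) ** 2 + (Y - 50) ** 2) / 200.0) * np.exp(-Z / 25.0)

shape = check_3d_shapes(x, y, z, intensity)
```

With the default "xy" indexing, np.meshgrid swaps the first two axes. That goes unnoticed when var1 and var2 have the same length (as in the confocal example) but violates the shape rule otherwise.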
- Attributes:
conditions – Get experimental conditions.
credible_intervals – Get cached credible intervals.
dependent_variable_data – Get dependent variable data as NumPy array.
details – Get additional dataset details.
has_uncertainty – Check if dataset has uncertainty information.
independent_variable_data_1 – Get first independent variable as NumPy array.
independent_variable_data_2 – Get second independent variable as NumPy array.
independent_variable_data_3 – Get third independent variable as NumPy array.
uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).
Methods
copy() – Create a deep copy of this dataset.
- __init__(independent_variable_data_1, independent_variable_data_2, independent_variable_data_3, dependent_variable_data, conditions=None, details=None)
Initialize three-dimensional dataset.
- Parameters:
independent_variable_data_1 (array_like) – 1D array of first independent variable.
independent_variable_data_2 (array_like) – 1D array of second independent variable.
independent_variable_data_3 (array_like) – 1D array of third independent variable.
dependent_variable_data (array_like) – 3D array of dependent variable.
conditions (dict[str,Any] | None, optional) – Experimental conditions.
details (dict[str,Any] | None, optional) – Additional metadata.
- Raises:
ValueError – If dimension compatibility is violated.
- property independent_variable_data_1: ndarray
Get first independent variable as NumPy array.
- Returns:
1D NumPy array of first independent variable values.
- Return type:
np.ndarray
Examples
>>> dataset.independent_variable_data_1
array([0., 1.59, 3.17, ..., 98.41, 100.])
- property independent_variable_data_2: ndarray
Get second independent variable as NumPy array.
- Returns:
1D NumPy array of second independent variable values.
- Return type:
np.ndarray
Examples
>>> dataset.independent_variable_data_2
array([0., 1.59, 3.17, ..., 98.41, 100.])
- property independent_variable_data_3: ndarray
Get third independent variable as NumPy array.
- Returns:
1D NumPy array of third independent variable values.
- Return type:
np.ndarray
Examples
>>> dataset.independent_variable_data_3
array([0., 1.61, 3.23, ..., 48.39, 50.])
- property dependent_variable_data: ndarray
Get dependent variable data as NumPy array.
- Returns:
3D NumPy array of dependent variable values with shape (len(var1), len(var2), len(var3)).
- Return type:
np.ndarray
Examples
>>> dataset.dependent_variable_data.shape
(64, 64, 32)
>>> dataset.dependent_variable_data[0, 0, 0]  # Value at first point
0.456
Composite Dataset
Composite one-dimensional dataset with multiple dependent variables.
Used for multi-channel instrument data where multiple signals share the same independent variable (e.g., time, wavelength).
- class piblin_jax.data.datasets.composite.OneDimensionalCompositeDataset(independent_variable_data, dependent_variable_data_list, conditions=None, details=None)
Bases: Dataset
Composite 1D dataset with shared independent variable and multiple dependents.
This dataset type represents multi-channel or multi-detector data where multiple signals share the same independent variable:
- Multi-detector chromatography (UV, fluorescence, conductivity)
- Multi-channel spectroscopy
- Multi-sensor time series
- Parallel measurements with shared axis
- Parameters:
independent_variable_data (array_like) – 1D array of independent variable (time, wavelength, etc.) shared by all channels.
dependent_variable_data_list (list of array_like) – List of 1D arrays, each representing a different channel/detector. All must have the same length as independent_variable_data.
conditions (dict[str,Any] | None, optional) – Experimental conditions.
details (dict[str,Any] | None, optional) – Additional metadata.
- independent_variable_data
Shared independent variable as NumPy array.
- Type:
np.ndarray
- dependent_variable_data_list
List of dependent variables as NumPy arrays.
- Type:
list of np.ndarray
- conditions
Experimental conditions.
- Type:
dict[str,Any]
- details
Additional metadata.
- Type:
dict[str,Any]
- Raises:
ValueError – If dependent_variable_data_list is empty, or if any channel has a different length from independent_variable_data.
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalCompositeDataset
>>> # Multi-detector HPLC data
>>> time = np.linspace(0, 20, 2000)  # minutes
>>> uv_254 = np.sin(time) + 0.1 * np.random.randn(2000)
>>> uv_280 = np.cos(time) + 0.1 * np.random.randn(2000)
>>> fluorescence = np.sin(2 * time) + 0.05 * np.random.randn(2000)
>>> hplc = OneDimensionalCompositeDataset(
...     independent_variable_data=time,
...     dependent_variable_data_list=[uv_254, uv_280, fluorescence],
...     conditions={"mobile_phase": "ACN/H2O 60:40", "flow_rate": 1.0},
...     details={
...         "channels": ["UV 254nm", "UV 280nm", "Fluorescence"],
...         "instrument": "HPLC-1"
...     }
... )
>>> hplc.independent_variable_data.shape
(2000,)
>>> len(hplc.dependent_variable_data_list)
3
>>> hplc.dependent_variable_data_list[0].shape
(2000,)
>>> # Multi-channel oscilloscope data
>>> t = np.linspace(0, 1, 10000, endpoint=False)  # 1 s sampled at 10 kHz
>>> ch1 = np.sin(2 * np.pi * 5 * t)
>>> ch2 = np.sin(2 * np.pi * 10 * t)
>>> ch3 = np.sin(2 * np.pi * 15 * t)
>>> ch4 = np.sin(2 * np.pi * 20 * t)
>>> scope_data = OneDimensionalCompositeDataset(
...     independent_variable_data=t,
...     dependent_variable_data_list=[ch1, ch2, ch3, ch4],
...     conditions={"sampling_rate": 10000},
...     details={"instrument": "oscilloscope", "channels": 4}
... )
Notes
This dataset type is useful when multiple measurements are made simultaneously along the same independent axis. Each channel is stored as a separate NumPy array in the list, allowing different processing or analysis on each channel while maintaining their shared relationship through the common independent variable.
The internal storage uses backend arrays (JAX when available) and converts to NumPy at the property boundaries.
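A minimal sketch of the two rules in the Raises entry (non-empty list, every channel the same length as the shared axis), followed by the kind of per-channel processing that a shared axis makes easy. `validate_channels` is a hypothetical helper written with NumPy for illustration; it is not the class's own validation code.

```python
import numpy as np


def validate_channels(independent, channels):
    """Hypothetical helper: non-empty list, every channel shaped like the shared axis."""
    independent = np.asarray(independent)
    if len(channels) == 0:
        raise ValueError("dependent_variable_data_list must not be empty")
    arrays = [np.asarray(c) for c in channels]
    for i, arr in enumerate(arrays):
        if arr.shape != independent.shape:
            raise ValueError(
                f"channel {i} has shape {arr.shape}, expected {independent.shape}"
            )
    return arrays


t = np.linspace(0, 1, 1000)
channels = validate_channels(t, [np.sin(2 * np.pi * f * t) for f in (5, 10, 15)])

# Every channel shares the same axis, so the list stacks cleanly into an
# (n_channels, n_points) array and per-channel statistics become one call each.
stacked = np.stack(channels)
peak_per_channel = stacked.max(axis=1)
peak_time_per_channel = t[stacked.argmax(axis=1)]
```

Stacking only works because every channel has the same length, which is the practical payoff of the length check.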
- Attributes:
conditions – Get experimental conditions.
credible_intervals – Get cached credible intervals.
dependent_variable_data_list – Get list of dependent variables as NumPy arrays.
details – Get additional dataset details.
has_uncertainty – Check if dataset has uncertainty information.
independent_variable_data – Get shared independent variable as NumPy array.
uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).
Methods
copy() – Create a deep copy of this dataset.
- __init__(independent_variable_data, dependent_variable_data_list, conditions=None, details=None)
Initialize composite one-dimensional dataset.
- Parameters:
independent_variable_data (array_like) – 1D array of shared independent variable.
dependent_variable_data_list (list of array_like) – List of 1D arrays for each channel.
conditions (dict[str,Any] | None, optional) – Experimental conditions.
details (dict[str,Any] | None, optional) – Additional metadata.
- Raises:
ValueError – If list is empty or if any channel length doesn’t match independent variable.
- property independent_variable_data: ndarray
Get shared independent variable as NumPy array.
- Returns:
1D NumPy array of independent variable shared by all channels.
- Return type:
np.ndarray
Examples
>>> dataset.independent_variable_data
array([0., 0.01, 0.02, ..., 19.98, 19.99, 20.])
- property dependent_variable_data_list: list[ndarray]
Get list of dependent variables as NumPy arrays.
- Returns:
List of 1D NumPy arrays, one for each channel/detector.
- Return type:
list of np.ndarray
Examples
>>> len(dataset.dependent_variable_data_list)
3
>>> dataset.dependent_variable_data_list[0]  # First channel
array([0.123, 0.145, ..., 0.234])
>>> dataset.dependent_variable_data_list[1]  # Second channel
array([0.456, 0.478, ..., 0.567])
>>> # Process each channel
>>> for i, channel in enumerate(dataset.dependent_variable_data_list):
...     print(f"Channel {i}: max = {channel.max():.3f}")
Channel 0: max = 1.234
Channel 1: max = 1.567
Channel 2: max = 0.987
Distribution Dataset
Distribution dataset for continuous probability density functions.
Used for molecular weight distributions, continuous PDFs, and other distribution data where the probability density is a continuous function.
- class piblin_jax.data.datasets.distribution.Distribution(variable_data, probability_density, conditions=None, details=None)
Bases: Dataset
Distribution dataset with variable data and probability density.
This dataset type represents continuous probability density functions:
- Molecular weight distributions (GPC/SEC)
- Particle size distributions (continuous)
- Statistical distributions
- Probability density functions
- Any continuous distribution data
- Parameters:
variable_data (array_like) – 1D array of the variable (e.g., molecular weight, particle size).
probability_density (array_like) – 1D array of probability density values corresponding to variable_data. Must have the same shape as variable_data.
conditions (dict[str,Any] | None, optional) – Experimental conditions.
details (dict[str,Any] | None, optional) – Additional metadata.
- variable_data
Variable data as NumPy array.
- Type:
np.ndarray
- probability_density
Probability density as NumPy array.
- Type:
np.ndarray
- conditions
Experimental conditions.
- Type:
dict[str,Any]
- details
Additional metadata.
- Type:
dict[str,Any]
- Raises:
ValueError – If variable_data and probability_density have different shapes.
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import Distribution
>>> # Molecular weight distribution from GPC
>>> molecular_weight = np.linspace(1000, 100000, 500)
>>> # Gaussian-like distribution centered at 50000
>>> pdf = np.exp(-((molecular_weight - 50000) ** 2) / (2 * 10000 ** 2))
>>> # Normalize so integral equals 1
>>> pdf = pdf / np.trapz(pdf, molecular_weight)
>>> mwd = Distribution(
...     variable_data=molecular_weight,
...     probability_density=pdf,
...     conditions={"polymer": "polystyrene", "solvent": "THF"},
...     details={"technique": "GPC", "standard": "PS"}
... )
>>> mwd.variable_data.shape
(500,)
>>> mwd.probability_density.shape
(500,)
>>> # Particle size distribution
>>> diameter = np.linspace(1, 1000, 1000)  # nm
>>> psd = np.exp(-((np.log(diameter) - np.log(100)) ** 2) / (2 * 0.5 ** 2))
>>> psd = psd / np.trapz(psd, diameter)
>>> particle_dist = Distribution(
...     variable_data=diameter,
...     probability_density=psd,
...     conditions={"sample": "nanoparticles_Au"},
...     details={"units": "nm", "technique": "DLS"}
... )
>>> # Custom probability distribution
>>> x = np.linspace(-5, 5, 1000)
>>> pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
>>> normal_dist = Distribution(
...     variable_data=x,
...     probability_density=pdf,
...     details={"distribution": "standard normal"}
... )
Notes
Unlike Histogram, which represents discrete bins, Distribution represents a continuous probability density function. The probability density values are typically normalized so that the integral over the variable range equals 1, but the class does not enforce this.
The distinction between Distribution and OneDimensionalDataset is primarily semantic: Distribution emphasizes that the dependent variable represents a probability density, while OneDimensionalDataset is more general.
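Because normalization is not enforced, it can be worth checking or applying it yourself before treating the values as a density. The sketch below does this with a trapezoidal rule written out by hand, so it runs the same whether the installed NumPy provides np.trapz (deprecated in NumPy 2.0) or np.trapezoid. `trapezoid_integral` and `normalize_density` are hypothetical helpers, not piblin_jax functions.

```python
import numpy as np


def trapezoid_integral(y, x):
    """Trapezoidal rule written out explicitly (independent of np.trapz / np.trapezoid)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


def normalize_density(variable, density):
    """Hypothetical helper: rescale density so its integral over `variable` is 1."""
    area = trapezoid_integral(density, variable)
    if area <= 0:
        raise ValueError("density must have a positive integral")
    return np.asarray(density, dtype=float) / area


mw = np.linspace(1000, 100000, 500)
raw = np.exp(-((mw - 50000) ** 2) / (2 * 10000**2))  # unnormalized Gaussian, as in the GPC example

pdf = normalize_density(mw, raw)
area = trapezoid_integral(pdf, mw)           # 1.0 up to floating-point rounding
mean_mw = trapezoid_integral(mw * pdf, mw)   # first moment of the density
```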
- Attributes:
conditions – Get experimental conditions.
credible_intervals – Get cached credible intervals.
details – Get additional dataset details.
has_uncertainty – Check if dataset has uncertainty information.
probability_density – Get probability density as NumPy array.
uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).
variable_data – Get variable data as NumPy array.
Methods
copy() – Create a deep copy of this dataset.
- __init__(variable_data, probability_density, conditions=None, details=None)
Initialize distribution dataset.
- Parameters:
variable_data (array_like) – 1D array of variable values.
probability_density (array_like) – 1D array of probability density values.
conditions (dict[str,Any] | None, optional) – Experimental conditions.
details (dict[str,Any] | None, optional) – Additional metadata.
- Raises:
ValueError – If arrays have different shapes.
- property variable_data: ndarray
Get variable data as NumPy array.
- Returns:
1D NumPy array of variable values (e.g., molecular weight, particle size, x-values).
- Return type:
np.ndarray
Examples
>>> dist.variable_data
array([1000., 1198.4, 1396.79, ..., 99603.21, 99801.6, 100000.])
- property probability_density: ndarray
Get probability density as NumPy array.
- Returns:
1D NumPy array of probability density values.
- Return type:
np.ndarray
Examples
>>> dist.probability_density
array([0.000001, 0.000002, ..., 0.000003, 0.000001])
>>> # Check normalization (should be close to 1)
>>> np.trapz(dist.probability_density, dist.variable_data)
1.0000234
Histogram Dataset
Histogram dataset for binned data with variable-width bins.
Used for particle size distributions, histograms, and other binned data where bin widths may vary.
- class piblin_jax.data.datasets.histogram.Histogram(bin_edges, counts, conditions=None, details=None)
Bases: Dataset
Histogram dataset with bin edges and counts.
This dataset type represents binned data with potentially variable-width bins:
- Particle size distributions
- Molecular weight distributions (binned)
- Intensity histograms
- Frequency distributions
- Any data organized into discrete bins
- Parameters:
bin_edges (array_like) – 1D array of bin edges. For n bins, this array has n+1 elements. Bins are defined as [bin_edges[i], bin_edges[i+1]).
counts (array_like) – 1D array of counts or frequencies in each bin. Must have length n (one less than bin_edges).
conditions (dict[str,Any] | None, optional) – Experimental conditions.
details (dict[str,Any] | None, optional) – Additional metadata.
- bin_edges
Bin edges as NumPy array.
- Type:
np.ndarray
- counts
Counts per bin as NumPy array.
- Type:
np.ndarray
- conditions
Experimental conditions.
- Type:
dict[str,Any]
- details
Additional metadata.
- Type:
dict[str,Any]
- Raises:
ValueError – If counts length is not compatible with bin_edges (must be len(bin_edges) - 1).
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import Histogram
>>> # Particle size distribution with variable-width bins
>>> bin_edges = np.array([0, 1, 3, 6, 10, 20])  # 5 bins
>>> counts = np.array([12, 45, 67, 34, 8])  # 5 counts
>>> psd = Histogram(
...     bin_edges=bin_edges,
...     counts=counts,
...     conditions={"sample": "nanoparticles_batch_42"},
...     details={"units": "nm", "technique": "DLS"}
... )
>>> psd.bin_edges
array([0, 1, 3, 6, 10, 20])
>>> psd.counts
array([12, 45, 67, 34, 8])
>>> # Intensity histogram
>>> # Unit-width bins, one per 8-bit pixel level (0 to 255)
>>> bins = np.arange(257)  # 257 edges -> 256 bins
>>> hist_counts = np.random.poisson(100, 256)
>>> intensity_hist = Histogram(
...     bin_edges=bins,
...     counts=hist_counts,
...     conditions={"image": "sample_001.tif"},
...     details={"bit_depth": 8}
... )
Notes
Unlike Distribution, Histogram represents discrete bins rather than a continuous probability density. The bin_edges array has one more element than the counts array. For variable-width bins, the bin width can be computed as np.diff(bin_edges).
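The relationship between bin_edges, counts, and bin widths can be sketched with the variable-width particle size example. This is plain NumPy, not piblin_jax code: `bin_index` is a hypothetical helper that applies the documented [bin_edges[i], bin_edges[i+1]) convention, and the density conversion is one common way (an assumption here, not a class feature) to make a histogram comparable with a Distribution.

```python
import numpy as np

bin_edges = np.array([0, 1, 3, 6, 10, 20], dtype=float)  # 5 variable-width bins
counts = np.array([12, 45, 67, 34, 8], dtype=float)

widths = np.diff(bin_edges)                       # [1, 2, 3, 4, 10]
centers = (bin_edges[:-1] + bin_edges[1:]) / 2.0  # midpoint of each bin

# Raw counts over-weight wide bins; dividing by total count and bin width gives
# a density whose bar areas (density * width) sum to 1.
density = counts / (counts.sum() * widths)


def bin_index(value, edges):
    """Hypothetical helper: i with edges[i] <= value < edges[i+1], or -1 if outside."""
    i = int(np.searchsorted(edges, value, side="right")) - 1
    return i if 0 <= i < len(edges) - 1 else -1
```

Under the half-open convention, a value equal to the last edge (20 here) falls outside every bin.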
- Attributes:
bin_edges – Get bin edges as NumPy array.
conditions – Get experimental conditions.
counts – Get bin counts as NumPy array.
credible_intervals – Get cached credible intervals.
details – Get additional dataset details.
has_uncertainty – Check if dataset has uncertainty information.
uncertainty_samples – Get uncertainty samples (if keep_samples=True was used).
Methods
copy() – Create a deep copy of this dataset.
- __init__(bin_edges, counts, conditions=None, details=None)
Initialize histogram dataset.
- Parameters:
bin_edges (array_like) – 1D array of bin edges (n+1 elements for n bins).
counts (array_like) – 1D array of counts (n elements).
conditions (dict[str,Any] | None, optional) – Experimental conditions.
details (dict[str,Any] | None, optional) – Additional metadata.
- Raises:
ValueError – If counts length is not compatible with bin_edges.
- property bin_edges: ndarray
Get bin edges as NumPy array.
- Returns:
1D NumPy array of bin edges. For n bins, has n+1 elements.
- Return type:
np.ndarray
Examples
>>> hist.bin_edges
array([0, 1, 3, 6, 10, 20])
>>> # Bin widths can be computed as:
>>> np.diff(hist.bin_edges)
array([1, 2, 3, 4, 10])
- property counts: ndarray
Get bin counts as NumPy array.
- Returns:
1D NumPy array of counts or frequencies in each bin.
- Return type:
np.ndarray
Examples
>>> hist.counts
array([12, 45, 67, 34, 8])
>>> # Total count
>>> hist.counts.sum()
166
Collections
Measurement
Measurement class for piblin-jax.
Container for multiple Dataset objects representing a single measurement event.
- class piblin_jax.data.collections.measurement.Measurement(datasets, conditions=None, details=None)
Bases: object
Container for multiple Dataset objects from a single measurement.
A Measurement represents a single experimental measurement event that may produce multiple datasets (e.g., multiple channels, multiple observables). The collection is immutable for JAX compatibility.
- Parameters:
datasets (list[Dataset]) – List of Dataset objects from this measurement.
conditions (dict[str,Any] | None, optional) – Experimental conditions specific to this measurement (e.g., timestamp, replicate number, environmental conditions).
details (dict[str,Any] | None, optional) – Additional context for this measurement (e.g., quality flags, operator notes, instrument state).
- datasets
Immutable tuple of datasets from this measurement.
- Type:
tuple[Dataset, ...]
- conditions
Experimental conditions for this measurement.
- Type:
dict[str,Any]
- details
Additional metadata for this measurement.
- Type:
dict[str,Any]
Notes
The datasets are stored as a tuple to ensure immutability, which is required for JAX transformations. Individual datasets can be accessed by indexing or iteration.
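The tuple-based storage described above can be illustrated with a small stand-in class. `TupleContainer` is hypothetical and only mimics the documented behavior (immutable tuple storage, len(), iteration, integer and slice indexing); it is not the Measurement implementation.

```python
from typing import Any, Iterator, Optional, Sequence, Union


class TupleContainer:
    """Hypothetical stand-in illustrating tuple-based storage; not the Measurement class."""

    def __init__(self, items: Sequence[Any], conditions: Optional[dict] = None):
        # Copy into a tuple so later changes to the caller's list cannot leak in.
        self._items = tuple(items)
        # Shallow copy of the caller's metadata dict, for the same reason.
        self._conditions = dict(conditions or {})

    @property
    def items(self) -> tuple:
        return self._items

    @property
    def conditions(self) -> dict:
        return self._conditions

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[Any]:
        return iter(self._items)

    def __getitem__(self, index: Union[int, slice]) -> Any:
        # Slicing a tuple returns a tuple, matching the documented return type for slices.
        return self._items[index]


source = ["ds_A", "ds_B", "ds_C"]
container = TupleContainer(source, conditions={"replicate": 1})
source.append("ds_D")  # the container still holds three items
```

Copying the input list into a tuple at construction time is what keeps later edits to the caller's list from changing the container.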
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import Measurement
>>>
>>> # Create datasets for multiple channels
>>> x = np.linspace(0, 10, 100)
>>> y_ch1 = np.sin(x)
>>> y_ch2 = np.cos(x)
>>>
>>> ds1 = OneDimensionalDataset(x, y_ch1, conditions={"channel": "A"})
>>> ds2 = OneDimensionalDataset(x, y_ch2, conditions={"channel": "B"})
>>>
>>> # Create measurement with both channels
>>> measurement = Measurement(
...     datasets=[ds1, ds2],
...     conditions={"temperature": 25.0, "replicate": 1},
...     details={"timestamp": "2025-10-18 10:00:00"}
... )
>>>
>>> # Access datasets
>>> len(measurement)
2
>>> first_dataset = measurement[0]
>>> for ds in measurement:
...     print(ds.conditions["channel"])
A
B
- Attributes:
conditions – Get experimental conditions for this measurement.
datasets – Get all datasets in this measurement.
details – Get additional details for this measurement.
- __init__(datasets, conditions=None, details=None)
Initialize Measurement with datasets and metadata.
- Parameters:
datasets (list[Dataset]) – List of Dataset objects from this measurement.
conditions (dict[str,Any] | None, optional) – Experimental conditions for this measurement.
details (dict[str,Any] | None, optional) – Additional context for this measurement.
- property datasets: tuple[Dataset, ...]
Get all datasets in this measurement.
- Returns:
Immutable tuple of Dataset objects.
- Return type:
tuple[Dataset, ...]
Examples
>>> measurement.datasets
(<OneDimensionalDataset at 0x...>, <OneDimensionalDataset at 0x...>)
- property conditions: dict[str, Any]
Get experimental conditions for this measurement.
- Returns:
Dictionary of experimental conditions (timestamp, replicate, etc.).
- Return type:
dict[str,Any]
Examples
>>> measurement.conditions
{'temperature': 25.0, 'replicate': 1, 'timestamp': '10:00:00'}
- property details: dict[str, Any]
Get additional details for this measurement.
- Returns:
Dictionary of additional context (quality flags, notes, etc.).
- Return type:
dict[str,Any]
Examples
>>> measurement.details
{'quality': 'good', 'operator': 'John Doe'}
- __len__()
Get number of datasets in this measurement.
- Returns:
Number of datasets.
- Return type:
int
Examples
>>> len(measurement)
2
- __iter__()
Iterate over datasets in this measurement.
- Yields:
Dataset – Each dataset in order.
Examples
>>> for dataset in measurement:
...     print(type(dataset).__name__)
OneDimensionalDataset
OneDimensionalDataset
- __getitem__(index)
Get dataset by index.
- Parameters:
index (int or slice) – Index or slice to access datasets.
- Returns:
Dataset at the given index, or tuple of datasets for slice.
- Return type:
Dataset or tuple[Dataset, ...]
Examples
>>> measurement[0]
<OneDimensionalDataset at 0x...>
>>> measurement[0:2]
(<OneDimensionalDataset at 0x...>, <OneDimensionalDataset at 0x...>)
MeasurementSet
MeasurementSet base class for piblin-jax.
Container for multiple Measurement objects representing a series of related measurements.
- class piblin_jax.data.collections.measurement_set.MeasurementSet(measurements, conditions=None, details=None)
Bases: object
Base class for collections of Measurement objects.
A MeasurementSet represents a series of related measurements, such as:
- Time series measurements
- Replicate measurements
- Parameter sweep measurements
- Multi-sample measurements
This is the base class. Specialized variants include:
- ConsistentMeasurementSet: All measurements have the same structure
- TidyMeasurementSet: All measurements share comparable conditions
- TabularMeasurementSet: Measurements arranged in tabular format
- Parameters:
measurements (list[Measurement]) – List of Measurement objects in this set.
conditions (dict[str,Any] | None, optional) – Experimental conditions for the entire measurement series (e.g., sample, experimental setup, date).
details (dict[str,Any] | None, optional) – Additional context for this measurement series (e.g., series description, experimental notes).
- measurements
Immutable tuple of measurements in this set.
- Type:
tuple[Measurement, ...]
- conditions
Experimental conditions for this measurement series.
- Type:
dict[str,Any]
- details
Additional metadata for this measurement series.
- Type:
dict[str,Any]
Notes
The measurements are stored as a tuple to ensure immutability, which is required for JAX transformations. Individual measurements can be accessed by indexing or iteration.
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import Measurement, MeasurementSet
>>>
>>> # Create replicate measurements
>>> x = np.linspace(0, 10, 100)
>>> measurements = []
>>> for i in range(3):
...     y = np.sin(x) + np.random.normal(0, 0.1, len(x))
...     ds = OneDimensionalDataset(x, y)
...     m = Measurement(
...         datasets=[ds],
...         conditions={"replicate": i+1}
...     )
...     measurements.append(m)
>>>
>>> # Create measurement set
>>> ms = MeasurementSet(
...     measurements=measurements,
...     conditions={"sample": "S1", "date": "2025-10-18"},
...     details={"notes": "Replicate measurements with noise"}
... )
>>>
>>> # Access measurements
>>> len(ms)
3
>>> first_measurement = ms[0]
>>> for m in ms:
...     print(m.conditions["replicate"])
1
2
3
- Attributes:
conditions – Get experimental conditions for this measurement series.
details – Get additional details for this measurement series.
measurements – Get all measurements in this set.
- __init__(measurements, conditions=None, details=None)
Initialize MeasurementSet with measurements and metadata.
- Parameters:
measurements (list[Measurement]) – List of Measurement objects in this set.
conditions (dict[str,Any] | None, optional) – Experimental conditions for this measurement series.
details (dict[str,Any] | None, optional) – Additional context for this measurement series.
- property measurements: tuple[Measurement, ...]
Get all measurements in this set.
- Returns:
Immutable tuple of Measurement objects.
- Return type:
tuple[Measurement, ...]
Examples
>>> ms.measurements
(<Measurement at 0x...>, <Measurement at 0x...>, <Measurement at 0x...>)
- property conditions: dict[str, Any]
Get experimental conditions for this measurement series.
- Returns:
Dictionary of experimental conditions (sample, date, setup, etc.).
- Return type:
dict[str,Any]
Examples
>>> ms.conditions
{'sample': 'S1', 'date': '2025-10-18', 'instrument': 'Spec-X'}
- property details: dict[str, Any]
Get additional details for this measurement series.
- Returns:
Dictionary of additional context (notes, quality, etc.).
- Return type:
dict[str,Any]
Examples
>>> ms.details
{'notes': 'Time series', 'quality': 'good'}
- __len__()
Get number of measurements in this set.
- Returns:
Number of measurements.
- Return type:
int
Examples
>>> len(ms)
3
- __iter__()
Iterate over measurements in this set.
- Yields:
Measurement – Each measurement in order.
Examples
>>> for measurement in ms:
...     print(len(measurement))
1
1
1
- __getitem__(index)
Get measurement by index.
- Parameters:
index (int or slice) – Index or slice to access measurements.
- Returns:
Measurement at the given index, or tuple of measurements for slice.
- Return type:
Measurement or tuple[Measurement, ...]
Examples
>>> ms[0]
<Measurement at 0x...>
>>> ms[0:2]
(<Measurement at 0x...>, <Measurement at 0x...>)
ConsistentMeasurementSet
ConsistentMeasurementSet class for piblin-jax.
MeasurementSet variant where all measurements have the same dataset structure.
- class piblin_jax.data.collections.consistent_measurement_set.ConsistentMeasurementSet(measurements, conditions=None, details=None)
Bases: MeasurementSet
MeasurementSet where all measurements have the same dataset structure.
This specialized variant enforces that all measurements contain datasets of the same types in the same order. This is useful for:
- Replicate measurements (same protocol, multiple runs)
- Time series measurements (same observables at different times)
- Consistent multi-channel measurements
The structural consistency enables array-based operations and easier data aggregation.
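One way the structural consistency pays off is replicate averaging: when every measurement holds the same kind of 1D dataset, the dependent arrays can be stacked and reduced along the replicate axis. The sketch below uses plain NumPy (x, y) tuples as stand-ins for datasets. It also checks that the x grids match, which is an extra assumption of this sketch: the documented consistency check compares dataset types only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)

# Stand-ins for five replicate measurements, each holding one 1D dataset as (x, y).
replicates = [(x, np.sin(x) + rng.normal(0, 0.1, x.size)) for _ in range(5)]

# Extra check made by this sketch only: stacking assumes a shared x grid.
if not all(np.array_equal(xi, x) for xi, _ in replicates):
    raise ValueError("replicates do not share the same x grid")

ys = np.stack([yi for _, yi in replicates])   # shape (n_replicates, n_points)

mean_curve = ys.mean(axis=0)                  # replicate-averaged signal
std_curve = ys.std(axis=0, ddof=1)            # sample standard deviation per point
sem_curve = std_curve / np.sqrt(ys.shape[0])  # standard error of the mean
```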
- Parameters:
measurements (list[Measurement]) – List of Measurement objects. All must have the same structure.
conditions (dict[str,Any] | None, optional) – Experimental conditions for the measurement series.
details (dict[str,Any] | None, optional) – Additional context for the measurement series.
- Raises:
ValueError – If measurements do not all have the same dataset structure.
Notes
Structure is defined as the sequence of dataset types. For example:
[OneDimensionalDataset, OneDimensionalDataset] is consistent with itself
[OneDimensionalDataset] is NOT consistent with [ZeroDimensionalDataset]
[OneDimensionalDataset, ZeroDimensionalDataset] is NOT consistent with [ZeroDimensionalDataset, OneDimensionalDataset] (order matters)
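The structure comparison described in these Notes amounts to comparing ordered tuples of types. The sketch below uses hypothetical stand-in classes `OneD` and `ZeroD` and a hypothetical `check_consistent` function; it mirrors the documented rule but is not the library's implementation, and returning None for an empty list is an assumption.

```python
from typing import Optional, Sequence, Tuple


class OneD:
    """Hypothetical stand-in for a one-dimensional dataset class."""


class ZeroD:
    """Hypothetical stand-in for a zero-dimensional dataset class."""


def structure(datasets: Sequence[object]) -> Tuple[type, ...]:
    """Structure as defined in the Notes: the ordered sequence of dataset types."""
    return tuple(type(ds) for ds in datasets)


def check_consistent(measurements: Sequence[Sequence[object]]) -> Optional[Tuple[type, ...]]:
    """Raise ValueError unless every measurement matches the first one's structure."""
    if not measurements:
        return None  # assumption: an empty collection is trivially consistent
    reference = structure(measurements[0])
    for i, datasets in enumerate(measurements[1:], start=1):
        found = structure(datasets)
        if found != reference:
            names = [t.__name__ for t in found]
            expected = [t.__name__ for t in reference]
            raise ValueError(f"measurement {i} has structure {names}, expected {expected}")
    return reference
```

Because tuples compare element by element, order matters automatically, which matches the third case in the Notes.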
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import Measurement, ConsistentMeasurementSet
>>>
>>> # Create replicate measurements with consistent structure
>>> x = np.linspace(0, 10, 100)
>>> measurements = []
>>> for i in range(5):
...     y = np.sin(x) + np.random.normal(0, 0.1, len(x))
...     ds = OneDimensionalDataset(x, y)
...     m = Measurement([ds], conditions={"replicate": i+1})
...     measurements.append(m)
>>>
>>> # Create consistent measurement set
>>> cms = ConsistentMeasurementSet(
...     measurements=measurements,
...     conditions={"sample": "S1", "experiment": "replicates"}
... )
>>>
>>> len(cms)
5
>>>
>>> # All measurements have the same structure
>>> for m in cms:
...     print(len(m.datasets), type(m.datasets[0]).__name__)
1 OneDimensionalDataset
1 OneDimensionalDataset
1 OneDimensionalDataset
1 OneDimensionalDataset
1 OneDimensionalDataset
>>>
>>> # This will raise ValueError: inconsistent structures
>>> from piblin_jax.data.datasets import ZeroDimensionalDataset
>>> m1 = Measurement([OneDimensionalDataset(np.array([1, 2]), np.array([3, 4]))])
>>> m2 = Measurement([ZeroDimensionalDataset(5.0)])
>>> ConsistentMeasurementSet([m1, m2])
Traceback (most recent call last):
    ...
ValueError: All measurements must have same structure
- Attributes:
conditions – Get experimental conditions for this measurement series.
details – Get additional details for this measurement series.
measurements – Get all measurements in this set.
- __init__(measurements, conditions=None, details=None)
Initialize ConsistentMeasurementSet with structure validation.
- Parameters:
measurements (list[Measurement]) – List of Measurement objects with consistent structure.
conditions (dict[str,Any] | None, optional) – Experimental conditions for this measurement series.
details (dict[str,Any] | None, optional) – Additional context for this measurement series.
- Raises:
ValueError – If measurements do not all have the same structure.
TabularMeasurementSet
TabularMeasurementSet class for piblin-jax.
MeasurementSet variant with tabular access patterns (rows and columns).
- class piblin_jax.data.collections.tabular_measurement_set.TabularMeasurementSet(measurements, row_labels=None, col_labels=None, conditions=None, details=None)
Bases: MeasurementSet
MeasurementSet with measurements arranged in tabular format.
This specialized variant organizes measurements in a logical table structure with row and column labels. This is useful for:
- Experimental design matrices (e.g., multi-factor designs)
- Microplate/well plate layouts
- Spatial arrangements of measurements
- Grid-based sampling patterns
The tabular structure enables intuitive access patterns and natural visualization as tables or heatmaps.
- Parameters:
measurements (list[Measurement]) – List of Measurement objects. The order corresponds to row-major ordering in the table (row1-col1, row1-col2, …, row2-col1, …).
row_labels (list[str] | None, optional) – Labels for table rows. If provided, must satisfy: len(row_labels) * len(col_labels) == len(measurements).
col_labels (list[str] | None, optional) – Labels for table columns. If provided, must satisfy: len(row_labels) * len(col_labels) == len(measurements).
conditions (dict[str,Any] | None, optional) – Experimental conditions for the measurement series.
details (dict[str,Any] | None, optional) – Additional context for the measurement series.
- row_labels
Labels for table rows.
- Type:
list[str] | None
- col_labels
Labels for table columns.
- Type:
list[str] | None
Notes
Measurements are stored in row-major order. For a 2x3 table:
- measurements[0] = row 0, col 0
- measurements[1] = row 0, col 1
- measurements[2] = row 0, col 2
- measurements[3] = row 1, col 0
- measurements[4] = row 1, col 1
- measurements[5] = row 1, col 2
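The row-major layout means (row, col) maps to index = row * n_cols + col, and whole rows and columns become slices of the flat list. The helpers below are hypothetical, written in plain Python to illustrate the arithmetic; they are not the get_measurement, get_row, or get_column methods themselves.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def flat_index(row: int, col: int, n_rows: int, n_cols: int) -> int:
    """Row-major position of (row, col): index = row * n_cols + col."""
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise IndexError(f"({row}, {col}) is outside a {n_rows}x{n_cols} table")
    return row * n_cols + col


def row_col(index: int, n_cols: int) -> Tuple[int, int]:
    """Inverse mapping: flat index back to (row, col)."""
    return divmod(index, n_cols)


def get_row(items: Sequence[T], row: int, n_cols: int) -> List[T]:
    # A row is a contiguous run of n_cols items.
    return list(items[row * n_cols:(row + 1) * n_cols])


def get_column(items: Sequence[T], col: int, n_cols: int) -> List[T]:
    # A column takes every n_cols-th item, starting at position col.
    return list(items[col::n_cols])


# A 2x3 table in row-major order: r0c0, r0c1, r0c2, r1c0, r1c1, r1c2.
cells = [f"r{r}c{c}" for r in range(2) for c in range(3)]
```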
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import Measurement, TabularMeasurementSet
>>>
>>> # Create a 2x3 grid of measurements
>>> x = np.linspace(0, 10, 50)
>>> measurements = []
>>>
>>> for i in range(2):  # rows
...     for j in range(3):  # columns
...         y = np.sin(x * (i + 1)) * (j + 1)
...         ds = OneDimensionalDataset(x, y)
...         m = Measurement(
...             [ds],
...             conditions={"row": i, "col": j}
...         )
...         measurements.append(m)
>>>
>>> # Create tabular measurement set
>>> tms = TabularMeasurementSet(
...     measurements=measurements,
...     row_labels=["row_A", "row_B"],
...     col_labels=["col_1", "col_2", "col_3"],
...     conditions={"plate": "plate_001"},
...     details={"date": "2025-10-18"}
... )
>>>
>>> len(tms)
6
>>> tms.row_labels
['row_A', 'row_B']
>>> tms.col_labels
['col_1', 'col_2', 'col_3']
>>>
>>> # Access measurement at row 1, col 2
>>> m = tms.get_measurement(1, 2)
>>> m.conditions["row"]
1
>>> m.conditions["col"]
2
- Attributes:
col_labelsGet column labels for the table.
conditionsGet experimental conditions for this measurement series.
detailsGet additional details for this measurement series.
measurementsGet all measurements in this set.
row_labelsGet row labels for the table.
shapeGet the shape of the table (rows, columns).
Methods
get_column(col) – Get all measurements in a specified column.
get_measurement(row, col) – Get measurement at specified row and column indices.
get_row(row) – Get all measurements in a specified row.
- __init__(measurements, row_labels=None, col_labels=None, conditions=None, details=None)
Initialize TabularMeasurementSet with optional row/column labels.
- Parameters:
measurements (list[Measurement]) – List of Measurement objects in row-major order.
row_labels (list[str] | None, optional) – Labels for table rows.
col_labels (list[str] | None, optional) – Labels for table columns.
conditions (dict[str, Any] | None, optional) – Experimental conditions for this measurement series.
details (dict[str, Any] | None, optional) – Additional context for this measurement series.
- Raises:
ValueError – If row_labels and col_labels are provided but their product doesn’t match the number of measurements.
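The label check and the shape property described in this section can be sketched together. This is an illustration, not piblin_jax's code: `check_labels` is a hypothetical helper, and treating a missing row or column label list as "no table shape" (returning None) is an assumption consistent with the shape property returning None when labels are not provided.

```python
from typing import Optional, Sequence


def check_labels(
    n_measurements: int,
    row_labels: Optional[Sequence[str]],
    col_labels: Optional[Sequence[str]],
) -> Optional[tuple[int, int]]:
    """Return (n_rows, n_cols) when both label lists are given, else None.

    Raises ValueError when the label grid does not hold exactly one cell
    per measurement.
    """
    if row_labels is None or col_labels is None:
        return None  # assumption: without both label lists there is no table shape
    n_rows, n_cols = len(row_labels), len(col_labels)
    if n_rows * n_cols != n_measurements:
        raise ValueError(
            f"{n_rows} row labels x {n_cols} column labels = {n_rows * n_cols} cells, "
            f"but {n_measurements} measurements were given"
        )
    return (n_rows, n_cols)


print(check_labels(6, ["row_A", "row_B"], ["col_1", "col_2", "col_3"]))  # (2, 3)
print(check_labels(6, None, None))                                        # None
```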
- property row_labels: list[str] | None
Get row labels for the table.
- Returns:
List of row labels, or None if not provided.
- Return type:
list[str] | None
Examples
>>> tms.row_labels
['row_A', 'row_B']
- property col_labels: list[str] | None
Get column labels for the table.
- Returns:
List of column labels, or None if not provided.
- Return type:
list[str] | None
Examples
>>> tms.col_labels
['col_1', 'col_2', 'col_3']
- property shape: tuple[int, int] | None
Get the shape of the table (rows, columns).
- Returns:
(n_rows, n_cols) if labels are provided, None otherwise.
- Return type:
tuple[int,int] | None
Examples
>>> tms.shape
(2, 3)
- get_measurement(row, col)
Get measurement at specified row and column indices.
Uses row-major ordering: index = row * n_cols + col.
- Parameters:
row (int) – Row index (0-based).
col (int) – Column index (0-based).
- Returns:
Measurement at the specified position.
- Return type:
Measurement
- Raises:
ValueError – If row_labels and col_labels were not provided.
IndexError – If row or col indices are out of bounds.
Examples
>>> m = tms.get_measurement(1, 2)
>>> m.conditions["row"]
1
>>> m.conditions["col"]
2
- get_row(row)
Get all measurements in a specified row.
- Parameters:
row (int) – Row index (0-based).
- Returns:
List of measurements in the row.
- Return type:
list[Measurement]
- Raises:
ValueError – If row_labels and col_labels were not provided.
IndexError – If row index is out of bounds.
Examples
>>> row_measurements = tms.get_row(0)
>>> len(row_measurements)
3
>>> [m.conditions["col"] for m in row_measurements]
[0, 1, 2]
- get_column(col)
Get all measurements in a specified column.
- Parameters:
col (int) – Column index (0-based).
- Returns:
List of measurements in the column.
- Return type:
list[Measurement]
- Raises:
ValueError – If row_labels and col_labels were not provided.
IndexError – If col index is out of bounds.
Examples
>>> col_measurements = tms.get_column(1)
>>> len(col_measurements)
2
>>> [m.conditions["row"] for m in col_measurements]
[0, 1]
TidyMeasurementSet
TidyMeasurementSet class for piblin-jax.
MeasurementSet variant where measurements share comparable experimental conditions.
- class piblin_jax.data.collections.tidy_measurement_set.TidyMeasurementSet(measurements, conditions=None, details=None)
Bases: MeasurementSet
MeasurementSet where measurements share comparable experimental conditions.
This specialized variant is designed for measurements that can be compared across shared experimental conditions, following “tidy data” principles. This is useful for:
- Parameter sweeps (varying temperature, pressure, etc.)
- Multi-factor experiments (factorial designs)
- Grouped experimental conditions
- Long-form data representation
The shared condition structure enables statistical analysis, grouping, and faceted visualization.
- Parameters:
measurements (list[Measurement]) – List of Measurement objects with comparable conditions.
conditions (dict[str, Any] | None, optional) – Experimental conditions for the measurement series.
details (dict[str, Any] | None, optional) – Additional context for the measurement series.
Notes
“Tidy data” refers to a data organization principle where:
- Each measurement is an observation
- Each condition is a variable
- Each unique condition value identifies a group
This enables standard statistical and data manipulation tools to work effectively with the measurement set.
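Because every measurement in a tidy set carries the same kind of condition keys, grouping reduces to bucketing by a condition value. The sketch below is illustrative only: each measurement is represented by a plain conditions dictionary (a real Measurement exposes `.conditions`), and `group_by` is a hypothetical helper, not part of piblin_jax.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Any


def group_by(conditions_list: list[dict[str, Any]], key: str) -> dict[Hashable, list[int]]:
    """Map each value of `key` to the indices of the measurements that have it.

    Measurements lacking `key` are skipped rather than grouped under None.
    """
    groups: dict[Hashable, list[int]] = defaultdict(list)
    for index, conditions in enumerate(conditions_list):
        if key in conditions:
            groups[conditions[key]].append(index)
    return dict(groups)


# Long-form records: one row per measurement, one column per condition.
records = [
    {"temperature": t, "sample": s}
    for t in (20, 25, 30)
    for s in ("A", "B")
]
print(group_by(records, "temperature"))  # {20: [0, 1], 25: [2, 3], 30: [4, 5]}
print(group_by(records, "sample"))       # {'A': [0, 2, 4], 'B': [1, 3, 5]}
```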
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import Measurement, TidyMeasurementSet
>>>
>>> # Create measurements with varying conditions
>>> x = np.linspace(0, 10, 100)
>>> measurements = []
>>>
>>> for temp in [20, 25, 30]:
...     for sample in ['A', 'B']:
...         y = np.sin(x) * temp / 25
...         ds = OneDimensionalDataset(x, y)
...         m = Measurement(
...             [ds],
...             conditions={"temperature": temp, "sample": sample}
...         )
...         measurements.append(m)
>>>
>>> # Create tidy measurement set
>>> tms = TidyMeasurementSet(
...     measurements=measurements,
...     conditions={"experiment": "temperature_sweep"},
...     details={"date": "2025-10-18"}
... )
>>>
>>> len(tms)
6
>>>
>>> # Get unique condition values
>>> unique = tms.get_unique_conditions()
>>> sorted(unique["temperature"])
[20, 25, 30]
>>> sorted(unique["sample"])
['A', 'B']
- Attributes:
conditions – Get experimental conditions for this measurement series.
details – Get additional details for this measurement series.
measurements – Get all measurements in this set.
Methods
filter_by_conditions(**condition_filters) – Create a new TidyMeasurementSet with measurements matching conditions.
get_unique_conditions() – Get all unique values for each condition across measurements.
- __init__(measurements, conditions=None, details=None)
Initialize TidyMeasurementSet.
- Parameters:
measurements (list[Measurement]) – List of Measurement objects with comparable conditions.
conditions (dict[str, Any] | None, optional) – Experimental conditions for this measurement series.
details (dict[str, Any] | None, optional) – Additional context for this measurement series.
- get_unique_conditions()
Get all unique values for each condition across measurements.
This method analyzes all measurements and returns the set of unique values for each condition key. This is useful for:
- Understanding the experimental design
- Identifying factor levels
- Grouping measurements
- Creating faceted plots
- Returns:
Dictionary mapping condition names to sets of unique values.
- Return type:
dict[str,set]
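The collection step can be sketched in a few lines. This is a minimal illustration over plain conditions dictionaries, not the method's actual source; `unique_conditions` is a hypothetical name, and it assumes condition values are hashable, since they are stored in sets. Keys present in only some measurements are still reported, and an empty input gives an empty dictionary.

```python
from typing import Any


def unique_conditions(conditions_list: list[dict[str, Any]]) -> dict[str, set]:
    """Collect the distinct values seen for every condition key."""
    unique: dict[str, set] = {}
    for conditions in conditions_list:
        for key, value in conditions.items():
            # setdefault creates the set the first time a key is seen.
            unique.setdefault(key, set()).add(value)
    return unique


levels = unique_conditions([
    {"temp": 25, "pressure": 1.0},
    {"temp": 30, "sample": "A"},
])
print(sorted(levels))          # ['pressure', 'sample', 'temp']
print(sorted(levels["temp"]))  # [25, 30]
```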
Examples
>>> # Continuing from class docstring example
>>> unique = tms.get_unique_conditions()
>>> unique["temperature"]
{20, 25, 30}
>>> sorted(unique["sample"])
['A', 'B']
>>>
>>> # Empty measurement set
>>> tms_empty = TidyMeasurementSet([])
>>> tms_empty.get_unique_conditions()
{}
>>>
>>> # Measurements with different condition keys
>>> m1 = Measurement([OneDimensionalDataset(np.array([1]), np.array([2]))],
...                  conditions={"temp": 25, "pressure": 1.0})
>>> m2 = Measurement([OneDimensionalDataset(np.array([3]), np.array([4]))],
...                  conditions={"temp": 30, "sample": "A"})
>>> tms_mixed = TidyMeasurementSet([m1, m2])
>>> unique = tms_mixed.get_unique_conditions()
>>> sorted(unique.keys())
['pressure', 'sample', 'temp']
>>> unique["temp"]
{25, 30}
- filter_by_conditions(**condition_filters)
Create a new TidyMeasurementSet with measurements matching conditions.
- Parameters:
**condition_filters (Any) – Keyword arguments specifying condition values to match. Only measurements where ALL specified conditions match the given values will be included.
- Returns:
New TidyMeasurementSet containing only matching measurements.
- Return type:
TidyMeasurementSet
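The "ALL specified conditions must match" rule amounts to a single `all()` predicate. A minimal sketch over plain conditions dictionaries; `matches` is a hypothetical helper, and treating a measurement that lacks a filtered key as a non-match is an assumption, not documented behaviour.

```python
from typing import Any


def matches(conditions: dict[str, Any], **condition_filters: Any) -> bool:
    """True only if every requested key is present and equal to its filter value."""
    return all(
        key in conditions and conditions[key] == value
        for key, value in condition_filters.items()
    )


records = [
    {"temperature": t, "sample": s}
    for t in (20, 25, 30)
    for s in ("A", "B")
]
at_25 = [r for r in records if matches(r, temperature=25)]
at_25_a = [r for r in records if matches(r, temperature=25, sample="A")]
print(len(at_25), len(at_25_a))  # 2 1
```

With no keyword arguments `all()` is vacuously true, so this sketch keeps every measurement; whether piblin_jax behaves the same way in that case is not stated here.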
Examples
>>> # Filter by single condition
>>> tms_25 = tms.filter_by_conditions(temperature=25)
>>> len(tms_25)
2
>>> all(m.conditions["temperature"] == 25 for m in tms_25)
True
>>>
>>> # Filter by multiple conditions
>>> tms_25_A = tms.filter_by_conditions(temperature=25, sample="A")
>>> len(tms_25_A)
1
>>> m = tms_25_A[0]
>>> m.conditions["temperature"]
25
>>> m.conditions["sample"]
'A'
Experiment
Experiment class for piblin-jax.
Container for multiple MeasurementSet objects representing a single experiment.
- class piblin_jax.data.collections.experiment.Experiment(measurement_sets, conditions=None, details=None)
Bases: object
Container for MeasurementSet objects from a single experiment.
An Experiment represents a complete experimental run or sample, which may contain multiple series of measurements (MeasurementSets). This is useful for:
- Single sample with multiple measurement types
- Complete experimental protocol with multiple phases
- Single experimental run with multiple observables
- One sample measured under different conditions
- Parameters:
measurement_sets (list[MeasurementSet]) – List of MeasurementSet objects from this experiment.
conditions (dict[str, Any] | None, optional) – Experimental conditions for the entire experiment (e.g., sample ID, experimental date, operator).
details (dict[str, Any] | None, optional) – Additional context for this experiment (e.g., sample description, experimental notes, quality flags).
- measurement_sets
Immutable tuple of measurement sets in this experiment.
- Type:
tuple[MeasurementSet, ...]
- conditions
Experimental conditions for this experiment.
- Type:
dict[str,Any]
- details
Additional metadata for this experiment.
- Type:
dict[str,Any]
Notes
The measurement sets are stored as a tuple to ensure immutability, which is required for JAX transformations. Individual measurement sets can be accessed by indexing or iteration.
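The tuple-backed storage described in this note can be sketched as a small generic container. `ImmutableContainer` is a hypothetical stand-in, not piblin_jax's Experiment class: it shows that converting the input list to a tuple makes the container independent of later changes to the caller's list, and that integer indexing returns one item while slicing returns a tuple.

```python
from typing import Any, Generic, Iterator, Optional, TypeVar, Union

T = TypeVar("T")


class ImmutableContainer(Generic[T]):
    """Hypothetical tuple-backed collection in the style of Experiment."""

    def __init__(self, items: list[T], conditions: Optional[dict[str, Any]] = None) -> None:
        # tuple(...) snapshots the caller's list; later appends to it are not seen here.
        self._items: tuple[T, ...] = tuple(items)
        self._conditions: dict[str, Any] = dict(conditions or {})

    @property
    def items(self) -> tuple[T, ...]:
        return self._items

    @property
    def conditions(self) -> dict[str, Any]:
        return self._conditions

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def __getitem__(self, index: Union[int, slice]) -> Union[T, tuple[T, ...]]:
        # Slicing a tuple already returns a tuple.
        return self._items[index]


source = ["absorption", "fluorescence"]
exp = ImmutableContainer(source, {"sample": "S001"})
source.append("raman")  # changes the caller's list only
print(len(exp))         # 2
print(exp[0])           # absorption
print(exp[0:2])         # ('absorption', 'fluorescence')
```

The tuple fixes the sequence of children but does not deep-freeze them: mutable objects stored inside it, and the conditions dictionary this sketch exposes, can still be changed in place.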
Hierarchy level: ExperimentSet → Experiment → MeasurementSet → Measurement → Dataset
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import (
...     Measurement, MeasurementSet, Experiment
... )
>>>
>>> # Create first measurement set (absorption spectra)
>>> x_abs = np.linspace(400, 800, 200)
>>> measurements_abs = []
>>> for i in range(3):
...     y = np.exp(-(x_abs - 550)**2 / 1000) * (1 + 0.1 * i)
...     ds = OneDimensionalDataset(x_abs, y)
...     m = Measurement([ds], conditions={"replicate": i+1})
...     measurements_abs.append(m)
>>> ms_abs = MeasurementSet(
...     measurements_abs,
...     conditions={"measurement_type": "absorption"}
... )
>>>
>>> # Create second measurement set (fluorescence spectra)
>>> x_fl = np.linspace(500, 900, 200)
>>> measurements_fl = []
>>> for i in range(3):
...     y = np.exp(-(x_fl - 650)**2 / 1500) * (0.8 + 0.1 * i)
...     ds = OneDimensionalDataset(x_fl, y)
...     m = Measurement([ds], conditions={"replicate": i+1})
...     measurements_fl.append(m)
>>> ms_fl = MeasurementSet(
...     measurements_fl,
...     conditions={"measurement_type": "fluorescence"}
... )
>>>
>>> # Create experiment combining both measurement types
>>> exp = Experiment(
...     measurement_sets=[ms_abs, ms_fl],
...     conditions={"sample": "S001", "date": "2025-10-18"},
...     details={"operator": "Jane Doe", "instrument": "Spec-X"}
... )
>>>
>>> len(exp)
2
>>> exp.conditions["sample"]
'S001'
>>> exp[0].conditions["measurement_type"]
'absorption'
>>> exp[1].conditions["measurement_type"]
'fluorescence'
- Attributes:
conditions – Get experimental conditions for this experiment.
details – Get additional details for this experiment.
measurement_sets – Get all measurement sets in this experiment.
- __init__(measurement_sets, conditions=None, details=None)
Initialize Experiment with measurement sets and metadata.
- Parameters:
measurement_sets (list[MeasurementSet]) – List of MeasurementSet objects from this experiment.
conditions (dict[str, Any] | None, optional) – Experimental conditions for this experiment.
details (dict[str, Any] | None, optional) – Additional context for this experiment.
- property measurement_sets: tuple[MeasurementSet, ...]
Get all measurement sets in this experiment.
- Returns:
Immutable tuple of MeasurementSet objects.
- Return type:
tuple[MeasurementSet, ...]
Examples
>>> exp.measurement_sets
(<MeasurementSet at 0x...>, <MeasurementSet at 0x...>)
- property conditions: dict[str, Any]
Get experimental conditions for this experiment.
- Returns:
Dictionary of experimental conditions (sample, date, operator, etc.).
- Return type:
dict[str, Any]
Examples
>>> exp.conditions
{'sample': 'S001', 'date': '2025-10-18'}
- property details: dict[str, Any]
Get additional details for this experiment.
- Returns:
Dictionary of additional context (notes, quality, instrument, etc.).
- Return type:
dict[str, Any]
Examples
>>> exp.details
{'operator': 'Jane Doe', 'instrument': 'Spec-X'}
- __len__()
Get number of measurement sets in this experiment.
- Returns:
Number of measurement sets.
- Return type:
int
Examples
>>> len(exp)
2
- __iter__()
Iterate over measurement sets in this experiment.
- Yields:
MeasurementSet – Each measurement set in order.
Examples
>>> for ms in exp:
...     print(ms.conditions["measurement_type"])
absorption
fluorescence
- __getitem__(index)
Get measurement set by index.
- Parameters:
index (int or slice) – Index or slice to access measurement sets.
- Returns:
MeasurementSet at the given index, or tuple for slice.
- Return type:
MeasurementSet or tuple[MeasurementSet, ...]
Examples
>>> exp[0]
<MeasurementSet at 0x...>
>>> exp[0:2]
(<MeasurementSet at 0x...>, <MeasurementSet at 0x...>)
ExperimentSet
ExperimentSet class for piblin-jax.
Top-level container for multiple Experiment objects representing a study or project.
- class piblin_jax.data.collections.experiment_set.ExperimentSet(experiments, conditions=None, details=None)
Bases: object
Top-level container for multiple Experiment objects.
An ExperimentSet represents the highest level of the data hierarchy, typically corresponding to:
- Complete research project or study
- Multi-sample analysis
- Entire experimental campaign
- Publication dataset
This is the entry point for organizing and managing entire experimental datasets with consistent metadata and structure.
- Parameters:
experiments (list[Experiment]) – List of Experiment objects in this set.
conditions (dict[str, Any] | None, optional) – Global conditions for the entire study (e.g., project name, year, instrument, principal investigator).
details (dict[str, Any] | None, optional) – Additional context for the study (e.g., publication info, funding source, study objectives).
- experiments
Immutable tuple of experiments in this set.
- Type:
tuple[Experiment, ...]
- conditions
Global metadata for the entire study.
- Type:
dict[str,Any]
- details
Additional metadata for the study.
- Type:
dict[str,Any]
Notes
The experiments are stored as a tuple to ensure immutability, which is required for JAX transformations. Individual experiments can be accessed by indexing or iteration.
Hierarchy level: ExperimentSet → Experiment → MeasurementSet → Measurement → Dataset
This is the top level of the hierarchy and provides global context for all contained data.
Examples
>>> import numpy as np
>>> from piblin_jax.data.datasets import OneDimensionalDataset
>>> from piblin_jax.data.collections import (
...     Measurement, MeasurementSet, Experiment, ExperimentSet
... )
>>>
>>> # Create experiments for multiple samples
>>> experiments = []
>>>
>>> for sample_id in ['S001', 'S002', 'S003']:
...     # Create measurements for this sample
...     x = np.linspace(0, 10, 100)
...     y = np.sin(x) * (ord(sample_id[-1]) - ord('0'))
...     ds = OneDimensionalDataset(x, y)
...     m = Measurement([ds])
...     ms = MeasurementSet([m])
...
...     # Create experiment for this sample
...     exp = Experiment(
...         [ms],
...         conditions={"sample": sample_id, "date": "2025-10-18"}
...     )
...     experiments.append(exp)
>>>
>>> # Create experiment set for the complete study
>>> study = ExperimentSet(
...     experiments=experiments,
...     conditions={
...         "project": "QuantIQ-2025",
...         "instrument": "Spectrometer-X",
...         "year": 2025
...     },
...     details={
...         "pi": "Dr. Jane Smith",
...         "funding": "NSF Grant 12345",
...         "description": "Comparative spectroscopy study"
...     }
... )
>>>
>>> len(study)
3
>>> study.conditions["project"]
'QuantIQ-2025'
>>> study[0].conditions["sample"]
'S001'
>>> study.details["pi"]
'Dr. Jane Smith'
- Attributes:
conditions – Get global conditions for the entire study.
details – Get additional details for the study.
experiments – Get all experiments in this set.
Methods
get_experiment_by_condition(**condition_filters) – Get experiments matching specified conditions.
- __init__(experiments, conditions=None, details=None)
Initialize ExperimentSet with experiments and metadata.
- Parameters:
experiments (list[Experiment]) – List of Experiment objects in this set.
conditions (dict[str, Any] | None, optional) – Global conditions for the entire study.
details (dict[str, Any] | None, optional) – Additional context for the study.
- property experiments: tuple[Experiment, ...]
Get all experiments in this set.
- Returns:
Immutable tuple of Experiment objects.
- Return type:
tuple[Experiment, ...]
Examples
>>> study.experiments
(<Experiment at 0x...>, <Experiment at 0x...>, <Experiment at 0x...>)
- property conditions: dict[str, Any]
Get global conditions for the entire study.
- Returns:
Dictionary of global metadata (project, year, instrument, etc.).
- Return type:
dict[str, Any]
Examples
>>> study.conditions
{'project': 'QuantIQ-2025', 'instrument': 'Spectrometer-X', 'year': 2025}
- property details: dict[str, Any]
Get additional details for the study.
- Returns:
Dictionary of additional context (PI, funding, objectives, etc.).
- Return type:
dict[str, Any]
Examples
>>> study.details
{'pi': 'Dr. Jane Smith', 'funding': 'NSF Grant 12345', 'description': '...'}
- __len__()
Get number of experiments in this set.
- Returns:
Number of experiments.
- Return type:
int
Examples
>>> len(study)
3
- __iter__()
Iterate over experiments in this set.
- Yields:
Experiment – Each experiment in order.
Examples
>>> for exp in study:
...     print(exp.conditions["sample"])
S001
S002
S003
- __getitem__(index)
Get experiment by index.
- Parameters:
index (int or slice) – Index or slice to access experiments.
- Returns:
Experiment at the given index, or tuple of experiments for slice.
- Return type:
Experiment or tuple[Experiment, ...]
Examples
>>> study[0]
<Experiment at 0x...>
>>> study[0:2]
(<Experiment at 0x...>, <Experiment at 0x...>)
- get_experiment_by_condition(**condition_filters)
Get experiments matching specified conditions.
- Parameters:
**condition_filters (Any) – Keyword arguments specifying condition values to match. Only experiments where ALL specified conditions match the given values will be included.
- Returns:
List of experiments matching the conditions.
- Return type:
list[Experiment]
Examples
>>> # Get all experiments for sample S001
>>> s001_exps = study.get_experiment_by_condition(sample="S001")
>>> len(s001_exps)
1
>>> s001_exps[0].conditions["sample"]
'S001'
>>>
>>> # Get experiments matching multiple conditions
>>> dated_exps = study.get_experiment_by_condition(
...     date="2025-10-18",
...     sample="S002"
... )
Utilities
Metadata
Metadata utilities for managing, validating, extracting, and merging metadata.
This module provides utilities for working with metadata (conditions and details) across the data hierarchy. Metadata is separated into:
Conditions: Experimental parameters that define comparability between datasets (e.g., temperature, pressure, concentration)
Details: Contextual information that doesn’t affect experimental conditions (e.g., operator, date, notes)
The module supports:
- Merging metadata from multiple sources with configurable conflict resolution
- Separating conditions from details using explicit keys or heuristics
- Validating metadata against schemas with type checking
- Extracting metadata from filenames, paths, and file headers
- piblin_jax.data.metadata.merge_metadata(metadata_list, strategy='override')
Merge multiple metadata dictionaries.
Combines metadata from multiple sources with configurable conflict resolution. Metadata dictionaries are processed in order, with later dictionaries having higher priority under the 'override' strategy.
- Parameters:
metadata_list (list[dict[str, Any]]) – List of metadata dictionaries to merge, in processing order. Under 'override', later dictionaries win on conflicting keys; under 'keep_first', earlier ones do.
strategy (str, optional) – Conflict resolution strategy (default: 'override'):
'override': Later values override earlier ones
'keep_first': Keep first value encountered
'raise': Raise ValueError on conflicts
'list': Collect conflicting values in a list (duplicates removed)
- Returns:
Merged metadata dictionary
- Return type:
dict[str, Any]
- Raises:
ValueError – If strategy is 'raise' and conflicts are detected, or if strategy is unknown
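The four strategies can be illustrated with a single ordered pass. This is a sketch, not piblin_jax's implementation: `merge_sketch` is a hypothetical name, and two details are assumptions, namely that identical values are not treated as a conflict under 'raise', and that under 'list' a key with only one distinct value stays a scalar (as 'sample' does in the examples that follow).

```python
from typing import Any


def merge_sketch(metadata_list: list[dict[str, Any]], strategy: str = "override") -> dict[str, Any]:
    """Merge dictionaries in order, resolving key conflicts according to `strategy`."""
    if strategy not in {"override", "keep_first", "raise", "list"}:
        raise ValueError(f"unknown strategy: {strategy!r}")
    merged: dict[str, Any] = {}
    collected: dict[str, list] = {}  # distinct values per key, used by 'list'
    for metadata in metadata_list:
        for key, value in metadata.items():
            if key not in merged:
                merged[key] = value
                collected[key] = [value]
            elif strategy == "override":
                merged[key] = value
            elif strategy == "raise" and merged[key] != value:
                raise ValueError(f"conflict on {key!r}: {merged[key]!r} vs {value!r}")
            elif strategy == "list" and value not in collected[key]:
                collected[key].append(value)
            # 'keep_first' (and equal values under 'raise') leave merged[key] unchanged.
    if strategy == "list":
        return {key: vals[0] if len(vals) == 1 else vals for key, vals in collected.items()}
    return merged


meta1 = {"temp": 20, "sample": "A1"}
meta2 = {"temp": 25, "pressure": 1.0}
print(merge_sketch([meta1, meta2]))                # {'temp': 25, 'sample': 'A1', 'pressure': 1.0}
print(merge_sketch([meta1, meta2], "keep_first"))  # {'temp': 20, 'sample': 'A1', 'pressure': 1.0}
print(merge_sketch([meta1, meta2], "list"))        # {'temp': [20, 25], 'sample': 'A1', 'pressure': 1.0}
try:
    merge_sketch([meta1, meta2], "raise")
except ValueError as err:
    print(err)                                     # conflict on 'temp': 20 vs 25
```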
Examples
>>> meta1 = {"temp": 20, "sample": "A1"}
>>> meta2 = {"temp": 25, "pressure": 1.0}
>>> merge_metadata([meta1, meta2])
{'temp': 25, 'sample': 'A1', 'pressure': 1.0}
>>> merge_metadata([meta1, meta2], strategy="keep_first")
{'temp': 20, 'sample': 'A1', 'pressure': 1.0}
>>> merge_metadata([meta1, meta2], strategy="list")
{'temp': [20, 25], 'sample': 'A1', 'pressure': 1.0}
- piblin_jax.data.metadata.separate_conditions_details(metadata, condition_keys=None)
Separate metadata into conditions and details.
Conditions are experimental parameters that define comparability between datasets (e.g., temperature, pressure). Details are contextual information (e.g., operator, date, notes).
- Parameters:
metadata (dict[str, Any]) – Combined metadata dictionary.
condition_keys (list[str] | None, optional) – Known condition keys (experimental parameters). If None, heuristics are used to identify conditions based on common experimental parameter names.
- Returns:
conditions (dict[str, Any]) – Experimental conditions (parameters defining comparability).
details (dict[str, Any]) – Context information (non-experimental metadata).
- Return type:
tuple[dict[str, Any], dict[str, Any]]
Examples
>>> metadata = {"temp": 25, "pressure": 1.0, "operator": "John"}
>>> conditions, details = separate_conditions_details(
...     metadata,
...     condition_keys=["temp", "pressure"]
... )
>>> conditions
{'temp': 25, 'pressure': 1.0}
>>> details
{'operator': 'John'}
Using heuristics:
>>> metadata = {"temperature": 25, "strain": 0.1, "notes": "Trial 1"}
>>> conditions, details = separate_conditions_details(metadata)
>>> "temperature" in conditions
True
>>> "notes" in details
True
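The split can be sketched as a partition of the keys. This illustration makes a loud assumption: the heuristic branch uses a hypothetical, hard-coded `CONDITION_NAMES` set with case-insensitive exact matching, because the actual list of "common experimental parameter names" piblin_jax uses is not documented here. `split_metadata` is likewise a hypothetical name.

```python
from typing import Any, Iterable, Optional

# Hypothetical parameter names; piblin_jax's real heuristic list is not documented here.
CONDITION_NAMES = {
    "temp", "temperature", "pressure", "strain", "strain_rate",
    "shear_rate", "concentration", "ph", "frequency",
}


def split_metadata(
    metadata: dict[str, Any],
    condition_keys: Optional[Iterable[str]] = None,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (conditions, details); every key lands in exactly one of the two."""
    if condition_keys is None:
        # Heuristic mode: case-insensitive exact match against known parameter names.
        chosen = {key for key in metadata if key.lower() in CONDITION_NAMES}
    else:
        chosen = set(condition_keys)
    conditions = {k: v for k, v in metadata.items() if k in chosen}
    details = {k: v for k, v in metadata.items() if k not in chosen}
    return conditions, details


print(split_metadata({"temp": 25, "pressure": 1.0, "operator": "John"}, ["temp", "pressure"]))
# ({'temp': 25, 'pressure': 1.0}, {'operator': 'John'})
print(split_metadata({"temperature": 25, "strain": 0.1, "notes": "Trial 1"}))
# ({'temperature': 25, 'strain': 0.1}, {'notes': 'Trial 1'})
```

Exact matching is deliberately conservative: a substring test such as "ph" in key would also catch unrelated names like "phase" or "photo".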
- piblin_jax.data.metadata.validate_metadata(metadata, schema=None, required_keys=None)
Validate metadata against a schema.
Performs type checking and required-key validation. Both checks are optional and are configured through the schema and required_keys parameters.
- Parameters:
metadata (dict[str, Any]) – Metadata to validate.
schema (dict[str, type | Callable[[Any], bool]] | None, optional) – Schema defining expected types or validation functions. Keys are metadata field names; values are either:
Type objects (e.g., float, str, int) for type checking
Callable validators that return True if valid
Example: {'temperature': float, 'sample_id': str}
required_keys (list[str] | None, optional) – Keys that must be present in metadata.
- Returns:
True if valid
- Return type:
bool
- Raises:
ValueError – If validation fails (missing required keys, type mismatch, or custom validation function returns False)
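A schema whose values are either types or predicates can be checked by branching on `isinstance(rule, type)`. This sketch (`validate_sketch` is a hypothetical name) assumes that a schema key absent from the metadata is skipped unless it is also listed in required_keys; it also shows that isinstance-based checks are strict, so an int fails a float rule.

```python
from typing import Any, Callable, Optional, Union

Rule = Union[type, Callable[[Any], bool]]


def validate_sketch(
    metadata: dict[str, Any],
    schema: Optional[dict[str, Rule]] = None,
    required_keys: Optional[list[str]] = None,
) -> bool:
    """Return True, or raise ValueError describing the first failure found."""
    missing = [key for key in (required_keys or []) if key not in metadata]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    for key, rule in (schema or {}).items():
        if key not in metadata:
            continue  # assumption: schema keys are optional unless also required
        value = metadata[key]
        if isinstance(rule, type):
            if not isinstance(value, rule):
                raise ValueError(f"{key!r} should be {rule.__name__}, got {type(value).__name__}")
        elif not rule(value):
            raise ValueError(f"{key!r} failed custom validation: {value!r}")
    return True


print(validate_sketch({"temp": 25.0, "sample": "A1"}, schema={"temp": float, "sample": str}))  # True
print(validate_sketch({"ph": 7.0}, schema={"ph": lambda x: 0 <= x <= 14}))                    # True
try:
    validate_sketch({"temp": 25}, schema={"temp": float})
except ValueError as err:
    print(err)  # 'temp' should be float, got int
```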
Examples
Type checking:
>>> metadata = {"temp": 25.0, "sample": "A1"}
>>> schema = {"temp": float, "sample": str}
>>> validate_metadata(metadata, schema=schema)
True
Required keys:
>>> validate_metadata(metadata, required_keys=["temp", "sample"])
True
Custom validation:
>>> schema = {"ph": lambda x: 0 <= x <= 14}
>>> validate_metadata({"ph": 7.0}, schema=schema)
True
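- piblin_jax.data.metadata.parse_key_value_string(text, separator='=', delimiter=',')
Parse key-value pairs from a string.
Extracts metadata from delimited key-value strings commonly found in filenames, headers, or configuration strings.
- Parameters:
text (str) – String containing the key-value pairs.
separator (str, optional) – String between each key and its value (default: '=').
delimiter (str, optional) – String between consecutive pairs (default: ',').
- Returns:
Parsed metadata (all values are strings; convert as needed)
- Return type:
dict[str, str]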
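Parsing is a two-level split: on the delimiter to get pairs, then on the first separator within each pair. A minimal sketch; `parse_pairs` is a hypothetical name, and skipping fragments that contain no separator is an assumption about how malformed input is handled.

```python
def parse_pairs(text: str, separator: str = "=", delimiter: str = ",") -> dict[str, str]:
    """Split `text` into pairs on `delimiter`, then each pair on the first `separator`."""
    result: dict[str, str] = {}
    for chunk in text.split(delimiter):
        if separator not in chunk:
            continue  # assumption: fragments without a separator are skipped
        key, value = chunk.split(separator, 1)  # maxsplit=1 keeps later separators in the value
        result[key.strip()] = value.strip()
    return result


print(parse_pairs("temp=25,pressure=1.0"))            # {'temp': '25', 'pressure': '1.0'}
print(parse_pairs("temp:25;pressure:1.0", ":", ";"))  # {'temp': '25', 'pressure': '1.0'}
print(parse_pairs("note=a=b,flag"))                   # {'note': 'a=b'}
```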
Examples
>>> parse_key_value_string("temp=25,pressure=1.0")
{'temp': '25', 'pressure': '1.0'}
>>> parse_key_value_string("temp:25;pressure:1.0", separator=":", delimiter=";")
{'temp': '25', 'pressure': '1.0'}
- piblin_jax.data.metadata.extract_from_filename(filename, pattern=None)
Extract metadata from filename using regex pattern.
Parses filenames to extract metadata using either custom regex patterns or common heuristics for scientific data files.
- Parameters:
filename (str | Path) – Filename or path (extension is removed before matching).
pattern (str | None, optional) – Regex pattern with named groups for extraction. If None, uses common heuristics for sample names, temperatures, and replicate numbers.
- Returns:
Extracted metadata (all values are strings)
- Return type:
dict[str, str]
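Named-group extraction maps directly onto `re.search` plus `Match.groupdict()`. The sketch below is a hypothetical helper (`extract_named_groups`), not the library function; it uses `Path.stem` to drop the extension and returns an empty dict when nothing matches, both assumptions. It also demonstrates a common pitfall: `\w` matches the underscore, so a greedy `\w+` group can absorb more of the name than intended.

```python
import re
from pathlib import Path
from typing import Union


def extract_named_groups(filename: Union[str, Path], pattern: str) -> dict[str, str]:
    """Match `pattern` against the filename stem and return its named groups."""
    stem = Path(filename).stem  # "sample_A1_25C.csv" -> "sample_A1_25C"
    match = re.search(pattern, stem)
    if match is None:
        return {}  # assumption: no match means no metadata
    # Optional groups that did not participate come back as None; drop them.
    return {key: value for key, value in match.groupdict().items() if value is not None}


anchored = r"sample_(?P<sample>[A-Za-z0-9]+)_(?P<temp>\d+)C"
print(extract_named_groups("sample_A1_25C.csv", anchored))  # {'sample': 'A1', 'temp': '25'}

# \w also matches "_", so a greedy \w+ group swallows the "sample_" prefix too.
greedy = r"(?P<sample>\w+)_(?P<temp>\d+)C"
print(extract_named_groups("sample_A1_25C.csv", greedy))    # {'sample': 'sample_A1', 'temp': '25'}
```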
Examples
Using heuristics:
>>> extract_from_filename("sample_A1_temp_25C_001.csv")
{'sample': 'A1', 'temp': '25', 'replicate': '001'}
Using custom pattern:
>>> pattern = r"sample_(?P<sample>[A-Za-z0-9]+)_(?P<temp>\d+)C"
>>> extract_from_filename("sample_A1_25C.csv", pattern)
{'sample': 'A1', 'temp': '25'}
- piblin_jax.data.metadata.extract_from_path(filepath, level_names=None)
Extract metadata from directory structure.
Parses directory hierarchy to extract metadata based on directory names at different levels.
- Parameters:
filepath (str | Path) – File path.
level_names (list[str] | None, optional) – Names for each directory level (from deepest to root). Example: ['sample', 'experiment', 'project'] extracts sample from the parent directory, experiment from the grandparent, etc. If None, returns an empty dict.
- Returns:
Extracted metadata
- Return type:
dict[str, str]
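Walking from the deepest directory toward the root is what `PurePath.parents` provides: `parents[0]` is the directory holding the file, `parents[1]` its parent, and so on. The sketch below is a hypothetical helper (`metadata_from_dirs`) that uses `PurePosixPath` so string paths with forward slashes behave the same on every OS; stopping at the root, whose name is empty, is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional


def metadata_from_dirs(filepath: str, level_names: Optional[list[str]] = None) -> dict[str, str]:
    """Name the enclosing directories of `filepath`, deepest first."""
    if not level_names:
        return {}
    parents = PurePosixPath(filepath).parents  # parents[0] is the directory holding the file
    result: dict[str, str] = {}
    for depth, name in enumerate(level_names):
        if depth >= len(parents) or not parents[depth].name:
            break  # ran out of named directories (the root "/" has an empty name)
        result[name] = parents[depth].name
    return result


print(metadata_from_dirs("/data/ProjectA/ExpB/SampleC/data.csv", ["sample", "experiment", "project"]))
# {'sample': 'SampleC', 'experiment': 'ExpB', 'project': 'ProjectA'}
```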
Examples
>>> extract_from_path(
...     "/data/ProjectA/ExpB/SampleC/data.csv",
...     ['sample', 'experiment', 'project']
... )
{'sample': 'SampleC', 'experiment': 'ExpB', 'project': 'ProjectA'}
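- piblin_jax.data.metadata.parse_header_metadata(header_lines, comment_char='#', separator=':')
Parse metadata from file header comment lines.
Extracts metadata from comment lines in file headers, commonly used in scientific data files to store experimental conditions and context.
- Parameters:
header_lines (list[str]) – Header lines to scan for metadata.
comment_char (str, optional) – Prefix that marks a comment line (default: '#').
separator (str, optional) – String between each key and its value (default: ':').
- Returns:
Parsed metadata (all values are strings)
- Return type:
dict[str, str]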
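Header parsing follows the same pattern on a per-line basis: keep only lines that start with the comment prefix, strip the prefix, and split once on the separator. A minimal sketch with a hypothetical name (`parse_header_sketch`); skipping comment lines that contain no separator is an assumption. Splitting only on the first separator keeps values such as timestamps intact.

```python
def parse_header_sketch(header_lines: list[str], comment_char: str = "#", separator: str = ":") -> dict[str, str]:
    """Collect key/value pairs from comment lines, ignoring everything else."""
    result: dict[str, str] = {}
    for line in header_lines:
        stripped = line.strip()
        if not stripped.startswith(comment_char):
            continue  # not a comment line, e.g. a CSV column header
        body = stripped[len(comment_char):]
        if separator not in body:
            continue  # assumption: comments without a key/value pair are skipped
        key, value = body.split(separator, 1)  # first separator only: "12:30:00" survives
        result[key.strip()] = value.strip()
    return result


print(parse_header_sketch(["# Temperature: 25", "# Sample: A1", "time,value"]))
# {'Temperature': '25', 'Sample': 'A1'}
print(parse_header_sketch(["// Temp = 25", "// Time = 12:30:00"], comment_char="//", separator="="))
# {'Temp': '25', 'Time': '12:30:00'}
```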
Examples
>>> lines = [
...     "# Temperature: 25",
...     "# Pressure: 1.0",
...     "# Sample: A1"
... ]
>>> parse_header_metadata(lines)
{'Temperature': '25', 'Pressure': '1.0', 'Sample': 'A1'}
With custom separators:
>>> lines = ["// Temp = 25", "// Sample = A1"]
>>> parse_header_metadata(lines, comment_char="//", separator="=")
{'Temp': '25', 'Sample': 'A1'}
Region of Interest (ROI)
Region of Interest (ROI) classes for piblin-jax.
This module provides classes for defining regions on independent variables:
- LinearRegion: Contiguous region on a 1D independent variable
- CompoundRegion: Container for multiple LinearRegion objects (union)
Regions are used with RegionTransform to apply transformations only within specified regions while preserving data outside those regions.
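The "transform inside, preserve outside" behaviour can be sketched by applying a function only where a point falls in the region. `apply_in_region` is hypothetical and is not RegionTransform's API; it uses plain lists and the inclusive [x_min, x_max] bounds that LinearRegion documents, and it returns a new list rather than modifying its input.

```python
from typing import Callable


def apply_in_region(
    x: list[float],
    y: list[float],
    x_min: float,
    x_max: float,
    func: Callable[[float], float],
) -> list[float]:
    """Return a new y where func is applied only at points with x_min <= x <= x_max."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [func(yi) if x_min <= xi <= x_max else yi for xi, yi in zip(x, y)]


x = [0, 1, 2, 3, 4, 5]
y = [10, 10, 10, 10, 10, 10]
print(apply_in_region(x, y, 2.0, 4.0, lambda v: v * 2))  # [10, 10, 20, 20, 20, 10]
print(y)                                                 # [10, 10, 10, 10, 10, 10]
```

With NumPy arrays the same selection is usually written as `np.where(mask, func(y), y)`, where `mask` is a boolean array such as the ones produced by `get_mask`.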
- class piblin_jax.data.roi.CompoundRegion(regions)
Bases: object
Container for multiple LinearRegion objects (union of regions).
A CompoundRegion represents the union of multiple disjoint or overlapping LinearRegion objects. It generates combined masks that include all points in any of the constituent regions.
- Parameters:
regions (list[LinearRegion]) – List of LinearRegion objects
- Raises:
ValueError – If regions list is empty
TypeError – If any element is not a LinearRegion
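The union mask is an element-wise OR across the per-region masks. The sketch below uses plain lists and (x_min, x_max) tuples in place of LinearRegion objects; `linear_mask` and `union_mask` are hypothetical helpers, and the empty-input ValueError mirrors the documented CompoundRegion behaviour.

```python
def linear_mask(x: list[float], x_min: float, x_max: float) -> list[bool]:
    """Inclusive membership test for one region."""
    return [x_min <= xi <= x_max for xi in x]


def union_mask(x: list[float], bounds: list[tuple[float, float]]) -> list[bool]:
    """True wherever a point lies in at least one region (logical OR of the masks)."""
    if not bounds:
        raise ValueError("at least one region is required")
    masks = [linear_mask(x, lo, hi) for lo, hi in bounds]
    # zip(*masks) yields, for each point, its membership flag in every region.
    return [any(flags) for flags in zip(*masks)]


x = [0, 1, 2, 3, 4, 5, 6]
print(union_mask(x, [(1.0, 2.0), (4.0, 5.0)]))
# [False, True, True, False, True, True, False]
```

Overlapping regions need no special handling: a point covered twice is still just True.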
Examples
>>> import numpy as np
>>> from piblin_jax.data.roi import LinearRegion, CompoundRegion
>>> # Define two disjoint regions
>>> region1 = LinearRegion(x_min=1.0, x_max=2.0)
>>> region2 = LinearRegion(x_min=4.0, x_max=5.0)
>>> compound = CompoundRegion([region1, region2])
>>> # Generate combined mask
>>> x_data = np.array([0, 1, 2, 3, 4, 5, 6])
>>> mask = compound.get_mask(x_data)
>>> mask
array([False,  True,  True, False,  True,  True, False])
>>> # Extract data from both regions
>>> x_data[mask]
array([1, 2, 4, 5])
Notes
The mask is the union (OR) of all constituent region masks
Regions can be disjoint or overlapping
Access individual regions using indexing: compound[0], compound[1], etc.
Get number of regions using len(compound)
Methods
get_mask(x_data) – Generate combined boolean mask (union of all regions).
- __init__(regions)
Initialize CompoundRegion.
- Parameters:
regions (list[LinearRegion]) – List of LinearRegion objects
- Raises:
ValueError – If regions list is empty
TypeError – If any element is not a LinearRegion
- get_mask(x_data)
Generate combined boolean mask (union of all regions).
Creates a boolean array where True indicates points within any of the constituent regions.
- Parameters:
x_data (np.ndarray) – Independent variable data
- Returns:
Boolean mask (True for points in any region)
- Return type:
np.ndarray
Examples
>>> region1 = LinearRegion(x_min=1.0, x_max=2.0)
>>> region2 = LinearRegion(x_min=4.0, x_max=5.0)
>>> compound = CompoundRegion([region1, region2])
>>> x_data = np.array([0, 1, 2, 3, 4, 5, 6])
>>> compound.get_mask(x_data)
array([False,  True,  True, False,  True,  True, False])
- __len__()
Return number of regions.
- __getitem__(index)
Get region by index.
- __repr__()
Return string representation of CompoundRegion.
- class piblin_jax.data.roi.LinearRegion(x_min, x_max)
Bases: object
Represents a contiguous region on a 1D independent variable.
A LinearRegion defines a contiguous range [x_min, x_max] (inclusive) on an independent variable. It can generate boolean masks to select data points within this range.
- Parameters:
x_min (float) – Lower bound of the region (inclusive).
x_max (float) – Upper bound of the region (inclusive).
- Raises:
ValueError – If x_min >= x_max
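The validation rule and the inclusive bounds can be captured in a frozen dataclass. `RegionSketch` is a hypothetical stand-in for LinearRegion, not its source: `__post_init__` enforces x_min < x_max, `frozen=True` makes the bounds read-only (in the spirit of the module's immutability), and the generated `__repr__` gives a readable string form.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class RegionSketch:
    """Hypothetical read-only stand-in for LinearRegion."""

    x_min: float
    x_max: float

    def __post_init__(self) -> None:
        # Reject empty or inverted ranges, as LinearRegion does for x_min >= x_max.
        if self.x_min >= self.x_max:
            raise ValueError(f"x_min ({self.x_min}) must be less than x_max ({self.x_max})")

    def get_mask(self, x_data: list[float]) -> list[bool]:
        # Both bounds are inclusive.
        return [self.x_min <= x <= self.x_max for x in x_data]


region = RegionSketch(2.0, 5.0)
print(region)                                           # RegionSketch(x_min=2.0, x_max=5.0)
print(region.get_mask([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))  # [False, True, True, True, True, False]
try:
    region.x_min = 0.0
except FrozenInstanceError:
    print("regions are read-only")
```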
Examples
>>> import numpy as np
>>> from piblin_jax.data.roi import LinearRegion
>>> # Define region from 2.0 to 5.0
>>> region = LinearRegion(x_min=2.0, x_max=5.0)
>>> # Generate mask for data
>>> x_data = np.array([0, 1, 2, 3, 4, 5, 6, 7])
>>> mask = region.get_mask(x_data)
>>> mask
array([False, False,  True,  True,  True,  True, False, False])
>>> # Extract data within region
>>> x_data[mask]
array([2, 3, 4, 5])
Notes
Bounds are inclusive: both x_min and x_max are included in the region
Masks are generated using NumPy arrays for compatibility
Use with RegionTransform to apply selective transformations
Methods
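get_mask(x_data) – Generate boolean mask for data within region.
- __init__(x_min, x_max)
Initialize LinearRegion.
- Parameters:
x_min (float) – Lower bound of the region (inclusive).
x_max (float) – Upper bound of the region (inclusive).
- Raises:
ValueError – If x_min >= x_max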
- get_mask(x_data)
Generate boolean mask for data within region.
Creates a boolean array where True indicates points within the region [x_min, x_max] (inclusive).
- Parameters:
x_data (np.ndarray) – Independent variable data
- Returns:
Boolean mask (True for points in region)
- Return type:
np.ndarray
Examples
>>> region = LinearRegion(x_min=2.0, x_max=5.0)
>>> x_data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> region.get_mask(x_data)
array([False,  True,  True,  True,  True, False])
- __repr__()
Return string representation of LinearRegion.