This notebook takes you from "what is remote sensing?" to running a complete multi-parameter satellite retrieval pipeline. No prior knowledge of radiative transfer or remote sensing is assumed — every concept is built from first principles using lightweight toy models before connecting to the real BioSNICAR codebase.
| Act | Sections | Topic |
|---|---|---|
| I | 1–5 | What Satellites See — spectra, bands, SRFs, platforms, indices |
| II | 6–9 | Bridging Spectra and Bands — forward problem, information content, multi-platform |
| III | 10–14 | The Inverse Problem in Practice — degeneracy, log-space, uncertainty, masking |
| IV | 15–18 | Building and Using the Emulator — speed, default emulator, custom builds, retrieval API |
| V | 19–23 | Real-World Retrieval Scenarios — SSA, Sentinel-2, MCMC, multi-param, time-series |
| VI | 24–25 | Putting It All Together — capstone pipeline (optional) and decision guide |
All sections except the optional capstone (Section 24) run on standard scientific Python. BioSNICAR is only required for Section 24 and is detected automatically.
.ipynb
filegit clone https://github.com/jmcook1186/biosnicar-py.git && cd biosnicar-py/notebooks && jupyter notebook remote_sensing_with_biosnicar.ipynb
Before we begin, we define a parametric snow/ice spectrum that serves as our toy forward model throughout the notebook. It maps physical parameters to a 480-band spectral albedo — the same output shape as BioSNICAR's real forward model.
The primary surface parameter is the specific surface area (SSA), which combines grain size and density into a single variable that controls scattering:
$$\text{SSA} = \frac{3}{r_{\text{eff}} \cdot \rho_{\text{ice}}}$$where $r_{\text{eff}}$ is effective grain radius (m) and $\rho_{\text{ice}}$ = 917 kg m⁻³.
The spectral albedo uses the Kokhanovsky (2004) approximation for a semi-infinite snowpack:
$$\alpha(\lambda) \approx \exp\!\left(-\alpha_0\,\sqrt{\gamma(\lambda)\,r_{\text{eff}}}\right)$$where
$\gamma(\lambda) = 4\pi\kappa(\lambda)/\lambda$ is the bulk ice
absorption coefficient computed from the imaginary refractive index
of ice ($\kappa$, loaded from data/OP_data/k_ice_480.csv). Because we
use the same $\kappa(\lambda)$ data as BioSNICAR, our toy spectra have
the correct absorption band structure — the characteristic features at
1.03, 1.25, 1.5, and 2.0 µm appear in the right places with the right
relative depths.
A small Fresnel floor (~2% reflectance) prevents the albedo from reaching exactly zero at strongly absorbing wavelengths, matching the physical minimum set by surface reflection off ice grain facets.
Act I introduces the fundamentals of satellite remote sensing: what electromagnetic radiation reaches the sensor, why satellites divide the spectrum into discrete bands, and how spectral indices compress band data into diagnostic numbers.
Remote sensing is the measurement of electromagnetic radiation reflected or emitted by the Earth's surface, collected by a sensor that is not in direct contact with the surface. For snow and ice, we focus on the solar window (0.3–2.5 µm) where sunlight illuminates the surface and the reflected fraction — the albedo — encodes information about the surface's physical properties.
The basic geometry is:
Albedo ($\alpha$) is the fraction of incident radiation reflected:
$$\alpha(\lambda) = \frac{F^{\uparrow}(\lambda)}{F^{\downarrow}(\lambda)}$$A perfectly white surface has $\alpha = 1$; a perfectly absorbing surface has $\alpha = 0$. Fresh snow can exceed $\alpha = 0.95$ in the visible but drops below 0.1 in the thermal infrared — this strong wavelength dependence is what makes spectral remote sensing powerful.
Figure 1. (Left) Normalised top-of-atmosphere solar irradiance as a function of wavelength, approximated by a 5778 K black body. The shaded region marks the 0.3–2.5 µm solar window where reflected-light remote sensing operates. (Right) Spectral albedo for three clean snow/ice surfaces spanning SSA = 50 (fresh snow, r ≈ 65 µm) to SSA = 2 (coarse glacier ice, r ≈ 1635 µm). All three share high visible albedo (> 0.9) but diverge sharply in the near-infrared, where ice absorption bands at 1.03, 1.25, 1.5, and 2.0 µm deepen with decreasing SSA.
BioSNICAR connection — The real forward model in
biosnicar/drivers/run_model.pyproduces the same output: a 480-band spectral albedo on the grid defined bybiosnicar/bands/_core.py:14(WVL = np.arange(0.205, 4.999, 0.01)). Our toy function mimics this interface so that everything we build in this notebook transfers directly to the real model.
A spectrometer records albedo at hundreds of wavelengths, but satellite sensors typically measure in only 5–20 spectral bands. Why?
A band integrates the continuous spectrum over a wavelength interval. The simplest model is a rectangular (tophat) filter:
$$\alpha_{\text{band}} = \frac{\int_{\lambda_1}^{\lambda_2} \alpha(\lambda)\, F(\lambda)\, d\lambda}{\int_{\lambda_1}^{\lambda_2} F(\lambda)\, d\lambda}$$where $F(\lambda)$ is the solar flux. This flux-weighted average ensures that wavelengths contributing more energy count more.
Figure 2. (Left) A 480-band albedo spectrum (aged snow, SSA = 10) overlaid with five tophat spectral response functions (VIS, Red, NIR, SWIR1, SWIR2). Each coloured bar spans the band's wavelength range; the label gives the flux-weighted band-average albedo. (Right) The effect of bandwidth on the retrieved band value: narrow bands (¼ width) sample a smaller spectral region and can differ from the standard-width result, while wide bands (3× width) smooth out spectral features. The differences are largest for SWIR bands where the spectrum changes rapidly.
Real satellite bands aren't perfect rectangles. Each detector has a spectral response function (SRF) that describes its sensitivity as a function of wavelength. The SRF might look roughly Gaussian, or have side lobes, or be deliberately shaped by optical filters.
The convolution formula generalises to:
$$\alpha_{\text{band}} = \frac{\sum_i \alpha(\lambda_i)\, \text{SRF}(\lambda_i)\, F(\lambda_i)}{\sum_i \text{SRF}(\lambda_i)\, F(\lambda_i)}$$This is exactly the formula implemented in
biosnicar/bands/_core.py:56
(srf_convolve()). The key point: the SRF and solar flux together
determine the weighting — wavelengths where both the SRF and the flux
are large dominate the average.
Figure 3. (Left) Two spectral response function shapes — a tophat and a Gaussian of equal FWHM — for the NIR band centred at 0.842 µm. (Centre) The convolution integrand (albedo × SRF × solar flux) for each shape, showing the quantity that is integrated to produce the band value. The Gaussian extends beyond the tophat edges, including slightly different spectral regions. (Right) Band-averaged NIR albedo as a function of SSA for both SRF shapes, demonstrating that the tophat approximation introduces negligible error for this broad band.
data/band_srfs/. These can be
replaced with manufacturer-measured SRFs when available.BioSNICAR connection —
biosnicar/bands/_core.py:21(load_srf()) reads SRF CSVs and caches them.srf_convolve()(line 56) implements the flux-weighted convolution. The tophat CSVs live indata/band_srfs/sentinel2_msi.csv, etc.
BioSNICAR supports 8 satellite/model platforms, each with a different number of bands and spectral coverage:
| Platform | Bands | Range (µm) | Method | Type |
|---|---|---|---|---|
| Sentinel-2 MSI | 13 | 0.44–2.19 | SRF convolution | Satellite |
| Sentinel-3 OLCI | 21 | 0.40–1.02 | SRF convolution | Satellite |
| Landsat 8 OLI | 7 | 0.44–2.20 | SRF convolution | Satellite |
| MODIS | 7 (+3 broadband) | 0.47–2.13 | SRF convolution | Satellite |
| CESM2 (2-band) | 2 | 0.2–5.0 | Interval average | Climate model |
| CESM RRTMG | 14 | 0.2–12.2 | Interval average | Climate model |
| MAR | 4 | 0.25–4.0 | Interval average | Climate model |
| HadCM3 | 6 | 0.2–5.0 | Interval average | Climate model |
SRF convolution uses the platform's spectral response functions. Interval averaging computes flux-weighted means over fixed wavelength ranges — appropriate for climate models whose radiation schemes work in broad intervals.
Figure 4. The same snow/ice surface (SSA = 10, BC = 50 ppb) viewed through four different platforms. (Top left) Sentinel-2's 13 bands sample the spectrum densely from the visible to 2.2 µm. (Top right) Landsat 8's 7 bands provide similar SWIR coverage but with fewer VIS-NIR points. (Bottom left) MODIS's 7 land bands are sparser. (Bottom right) CESM2 reduces the entire spectrum to just two broadband values (VIS and NIR), losing all spectral structure but retaining the energy-budget split needed by climate models.
to_platform() call in
BioSNICAR.BioSNICAR connection — Satellite platforms live in
biosnicar/bands/platforms/(sentinel2, sentinel3, landsat8, modis) and GCM platforms inbiosnicar/bands/gcm/(cesm, mar, hadcm3). Both callsrf_convolve()orinterval_average()from_core.py. The_register()function (biosnicar/bands/__init__.py:65) maps string names (e.g.,"sentinel2") to platform functions. Sentinel-2's 13 bands are defined inbiosnicar/bands/platforms/sentinel2.py:29–32.
A spectral index combines two or more bands into a single number that highlights a specific surface property. The most common form is a normalised difference:
$$\text{Index} = \frac{B_a - B_b}{B_a + B_b}$$which ranges from −1 to +1 and is insensitive to overall brightness (e.g., illumination geometry). Key indices for snow and ice:
| Index | Formula | What it measures |
|---|---|---|
| NDSI | (Green − SWIR1) / (Green + SWIR1) | Snow vs non-snow (NDSI > 0.4 ≈ snow) |
| NDVI | (NIR − Red) / (NIR + Red) | Vegetation (algae on ice → NDVI > 0) |
| Impurity Index (II) | Green / NIR-A | Darkening in visible → impurity load |
These are fast, cheap diagnostics — but they cannot separate underlying causes. A low NDSI could mean dirty snow, wet snow, or thin snow over rock. For that, you need a physical retrieval (Acts III–V).
Figure 5. Spectral indices for six surface scenarios ranging from clean snow to a dirty glacier. (Left) NDSI vs NDVI scatter plot: clean/aged snow clusters at high NDSI and near-zero NDVI, while biological impurities (algae) push NDVI positive and reduce NDSI. (Centre) NDSI bar chart showing that all snow/ice surfaces exceed 0.4 (the standard snow-detection threshold) except the most heavily polluted. (Right) Impurity Index (B3 / B8A): clean surfaces are near 1.0; increasing visible-band darkening from impurities drives the index below 1.0.
BioSNICAR connection — Sentinel-2 indices (NDSI, NDVI, II) are computed in
biosnicar/bands/platforms/sentinel2.py:42–51and stored as named attributes on theBandResultobject. Access them viaresult.NDSI,result.NDVI,result.II.
Act II connects the forward model to satellite bands and explores how much information survives the spectral-to-band compression.
The forward problem maps physical parameters to observables. In remote sensing, there are two levels:
The complete pipeline is:
$$\text{(rds, ρ, BC, algae, ...)} \xrightarrow{\text{forward model}} \alpha(\lambda) \xrightarrow{\texttt{to\_platform()}} (B_1, B_2, \ldots, B_N)$$At each arrow, information is lost: the forward model is many-to-one (different parameter combinations can give similar spectra), and band convolution compresses 480 values into N ≪ 480.
Figure 6. The forward-problem pipeline visualised for a moderately impure glacier surface (SSA = 8, BC = 20, GA = 2000). (Left) The full 480-band spectral albedo showing the characteristic ice absorption features and visible-band darkening from impurities. (Right) The same spectrum compressed to Sentinel-2's 13 bands, each bar positioned at the band centre wavelength. The compression ratio is 37:1 — the 13 band values must encode all retrievable information about the surface.
Not all 13 Sentinel-2 bands are independent — adjacent bands are correlated because the underlying spectrum changes smoothly. Principal Component Analysis (PCA) reveals how many independent dimensions of information the bands actually carry.
The number of PCA components needed to explain >99% of variance gives the effective degrees of freedom (DOF) — roughly how many parameters we can hope to retrieve from the band data.
Figure 7. Information content of Sentinel-2's 13 bands, assessed over 2000 randomly generated snow/ice surfaces. (Left) Band-to-band correlation matrix: visible bands (B1–B4) are highly correlated (red), while SWIR bands (B11, B12) add independent information (blue/white off-diagonal). (Right) PCA variance explained: ~3 principal components capture > 99% of the total variance (red curve), indicating that the 13 bands carry only ~3–4 independent degrees of freedom. This sets the practical ceiling for multi-parameter retrieval.
The same surface looks different through different satellites because each one samples different parts of the spectrum with different bandwidths. Understanding these differences is critical for multi-platform studies.
Figure 8. The same surface spectrum viewed through four platforms with different band configurations. Each panel overlays the 480-band spectrum (grey) with the platform's band-averaged values. The comparison illustrates how spectral resolution and coverage vary: Sentinel-2 and Landsat 8 extend into the SWIR (critical for SSA retrieval), MODIS provides daily global coverage with 7 land bands, and CESM 2-band reduces everything to a VIS/NIR split for climate energy budgets.
to_platform()
Entry Points¶BioSNICAR provides three equivalent ways to convert spectra to bands, depending on your workflow:
| Entry point | When to use |
|---|---|
biosnicar.to_platform(albedo, "sentinel2", flx_slr) |
Standalone function — works with any albedo array |
outputs.to_platform("sentinel2") |
After run_model() — uses stored albedo and flux |
sweep_result.to_platform("sentinel2") |
After sweep() — batch-converts all spectra |
All three return a BandResult object with named band attributes (.B1,
.B2, ...) and index attributes (.NDSI, .NDVI, .II).
# Example usage (requires biosnicar):
result = biosnicar.to_platform(albedo, "sentinel2", flx_slr=flux)
print(result.B3) # Green band albedo
print(result.NDSI) # Normalised Difference Snow Index
print(result.as_dict()) # All bands + indices as a dictionary
BioSNICAR connection — The three entry points are:
biosnicar/bands/__init__.py:69(to_platform()),Outputs.to_platform()inbiosnicar/classes/outputs.py, andSweepResult.to_platform()inbiosnicar/drivers/sweep.py. TheBandResultcontainer is defined atbiosnicar/bands/__init__.py:18.
The inverse problem is: given satellite observations, what are the surface properties? This is harder than the forward problem because the mapping is not one-to-one. Act III builds the mathematical tools for inversion.
The forward problem is well-posed: given parameters, compute the spectrum. The inverse problem asks: given an observed spectrum (or band values), what parameters produced it?
The standard approach is optimisation: find parameters that minimise a cost function measuring the mismatch between model and observation:
$$J(\mathbf{x}) = \sum_i \left(\frac{\alpha^{\text{obs}}_i - \alpha^{\text{model}}_i(\mathbf{x})}{\sigma_i}\right)^2$$where $\sigma_i$ is the measurement uncertainty in band $i$.
The challenge: the cost landscape may be flat (parameters to which the spectrum is insensitive), multi-modal (multiple parameter sets giving equally good fits), or degenerate (different parameters trading off against each other).
Figure 9. A single-parameter retrieval of SSA from a noisy 480-band observation. (Left) Cost function (sum of squared residuals) as a function of SSA, showing a well-defined minimum near the true value (green dashed line). The parabolic shape near the minimum indicates the problem is well-posed. (Right) Spectral comparison: the noisy observation (grey dots), the true spectrum (green), and the best-fit model (red dashed) are nearly indistinguishable, confirming a successful retrieval.
BioSNICAR connection — The cost function is implemented in
biosnicar/inverse/cost.py:19(spectral_cost()) for 480-band mode andbiosnicar/inverse/cost.py:76(band_cost()) for satellite-band mode. Both compute chi-squared with optional uncertainty weighting. Theretrieve()function (biosnicar/inverse/optimize.py:71) wraps the optimiser and dispatches to the appropriate cost function.
Some parameters produce degenerate spectral signatures — different combinations give nearly identical spectra. The classic example in snow remote sensing is the grain radius – density trade-off.
The specific surface area can be written as:
$$\text{SSA} = \frac{3}{r_{\text{eff}} \cdot \rho}$$where $\rho$ is the bulk density of the medium (kg m⁻³). For solid ice ($\rho = 917$) this simplifies to $3 / (r \cdot 917)$. But for porous snow or bubbly glacier ice, both grain radius and bulk density matter: the same SSA can arise from small grains in dense ice or larger grains in less dense snow.
If you try to retrieve rds and ρ jointly, the optimiser sees a valley in cost space along constant-SSA hyperbolas — it can trade rds against ρ without improving the fit. The result: both parameters are poorly constrained, even though their combination (SSA) is well-determined.
The solution: retrieve SSA directly.
Figure 10. Why SSA, not (rds, ρ), should be the retrieval target. (Left) Cost surface in (grain radius, density) space for a fixed observation: the deep valley (dark colours) runs along the iso-SSA hyperbola (cyan dashed line) where SSA = 3/(r·ρ) is constant, meaning any (rds, ρ) pair on this curve fits the data equally well. (Centre) Box plot comparing retrieval errors over 80 random trials: retrieving SSA directly (green, 1 parameter) gives consistently low errors, while jointly retrieving rds and ρ (red, 2 parameters) produces large errors on both because the optimiser slides along the cost valley. (Right) Spectra for four (rds, ρ) pairs that all give SSA = 10 — they are identical, confirming the degeneracy.
Best practice: Always retrieve SSA, not grain radius. If you need rds, derive it from SSA using ancillary density information.
BioSNICAR connection — The
RetrievalResultclass (biosnicar/inverse/result.py:72) automatically computes SSA from retrieved parameters using_compute_ssa()(line 16). The emulator's_make_ssa_emulator_fn()inoptimize.pybuilds a dedicated SSA forward function that bypasses the degeneracy entirely.
Impurity concentrations span orders of magnitude (0.1–10,000 ppb for BC). Optimising in linear space wastes steps: a change from 1 to 2 ppb matters much more than 5001 to 5002. The solution is to optimise in log space:
$$x_{\text{opt}} = \log_{10}(x + 1)$$The +1 avoids the singularity at $x = 0$. This
transform compresses the
dynamic range and makes the cost surface more symmetric.
A second challenge: not all impurities are equally detectable. Dust has very low spectral sensitivity — its spectrum overlaps with grain-size effects, making it the hardest impurity to retrieve.
Figure 11. Log-space optimisation and impurity sensitivity. (Left) Cost function vs BC concentration in linear space — the landscape is highly asymmetric, steep near zero and flat at high concentrations. (Centre) The same cost function in log₁₀(x + 1) space — the landscape is more symmetric, improving gradient-based optimiser convergence. (Right) Spectral sensitivity |Δα/Δc| per unit concentration for four impurity types: black carbon and algae produce strong, spectrally distinct signatures, while dust sensitivity is an order of magnitude weaker and spectrally similar to grain-size effects.
BioSNICAR connection —
_LOG_SPACE_PARAMSis defined atbiosnicar/inverse/optimize.py:32:{"black_carbon", "snow_algae", "glacier_algae", "dust", "ssa"}. Theretrieve()function automatically applies the log₁₀(x+1) transform for these parameters during optimisation.
Three increasingly sophisticated cost functions:
Figure 12. Three cost-function formulations for a 2-parameter (SSA, BC) retrieval from a spectrum with wavelength-dependent noise. Each panel shows contours of log₁₀(cost) in (SSA, BC) space, with the truth (cyan star) and grid-search minimum (green square). (Left) Unweighted sum of squares — the minimum is biased because noisy SWIR bands dominate. (Centre) Chi-squared weighting by 1/σ² — the minimum tightens and moves closer to the truth. (Right) Regularised cost with Gaussian priors — the minimum is further stabilised but pulled slightly toward the prior values.
Some spectral regions carry more information about certain parameters than others. Wavelength masking selects a subset of bands or wavelengths to:
Figure 13. Wavelength masking for SSA retrieval. (Left) Spectral sensitivity |dα/dSSA| as a function of wavelength: sensitivity peaks in the NIR (0.9–1.3 µm) where ice absorption is moderate, and is near zero in the visible. Grey shading marks atmospheric water-vapour absorption bands (1.4, 1.9 µm) that should be excluded from ground-truth retrievals. The green fill shows an optimal mask targeting high-sensitivity, low-contamination regions. (Right) SSA retrieval error (m² kg⁻¹) for three masking strategies, showing that targeted masking can match or improve upon using the full spectrum.
BioSNICAR connection — The
retrieve()function accepts awavelength_maskparameter to restrict the fit to specific wavelength regions. For band-mode retrieval, the platform's band set acts as an implicit mask.
Physical forward models are expensive (~0.5 s per evaluation). Optimisation requires thousands of evaluations. The solution: train a machine-learning emulator that reproduces the forward model in microseconds.
A retrieval typically requires:
At 0.5 seconds per evaluation:
| Method | Evaluations | Wall time |
|---|---|---|
| L-BFGS-B | 200 | 100 s |
| DE + L-BFGS-B | 5,000 | 42 min |
| MCMC (short) | 50,000 | 7 hours |
| MCMC (full) | 500,000 | 3 days |
An emulator trained on ~1,000 forward-model evaluations can predict in ~10 µs — making MCMC feasible in seconds rather than days.
Figure 14. Execution speed comparison on a logarithmic scale. The toy forward model evaluates in ~10² µs, the toy emulator in ~10³ µs (limited by the nearest-neighbour lookup), and BioSNICAR's real forward model requires ~5 × 10⁵ µs (~0.5 s). The real BioSNICAR emulator (PCA + MLP) achieves ~10 µs — a 50,000× speedup that makes MCMC-scale sampling (10⁵ evaluations) feasible in seconds.
BioSNICAR ships with a pre-trained default emulator for glacier ice with 8 parameters:
| Parameter | Range | Description |
|---|---|---|
rds |
50–5000 µm | Effective grain radius |
rho |
300–917 kg/m³ | Ice density |
dz |
0.01–10 m | Ice thickness |
solzen |
0–89° | Solar zenith angle |
black_carbon |
0–2000 ppb | BC concentration |
snow_algae |
0–50000 cells/mL | Snow algae |
glacier_algae |
0–50000 cells/mL | Glacier algae |
dust |
0–10000 ppb | Mineral dust |
Note: the default emulator is parameterised by (rds, rho) rather than SSA
for maximum flexibility. When using retrieve(), you can request SSA
directly — the retrieval system builds a composite SSA forward function
internally, avoiding the rds–ρ degeneracy (see Section 11).
# Load and use the default emulator (requires biosnicar):
from biosnicar.emulator import Emulator
emu = Emulator.load("data/emulators/glacier_ice_8_param_default.npz")
spectrum = emu.predict(rds=500, rho=500, black_carbon=50, ...)
verification = emu.verify(n_test=200)
BioSNICAR connection —
Emulator.load()is atbiosnicar/emulator.py:687,.predict()at line 449,.verify()at line 527. The default emulator file isdata/emulators/glacier_ice_8_param_default.npz.
Sometimes you need an emulator for a different surface type, parameter
range, or subset of parameters. The Emulator.build() API handles this:
emu = Emulator.build(
params={"rds": (100, 2000), "black_carbon": (0, 500), "solzen": (30, 70)},
n_samples=1000, # Number of Latin hypercube training samples
seed=42,
)
emu.save("my_emulator.npz")
The params dict maps parameter names to (min, max) bounds. Any
keyword accepted by run_model() can be used. Additional parameters
can be held fixed via keyword arguments (e.g., rho=600).
For SSA-based retrieval, you can build an emulator parameterised
directly by SSA — or use an (rds, rho) emulator and let retrieve()
handle the SSA mapping internally.
The key tradeoff is n_samples: more samples → better accuracy but longer build time (each sample requires one forward-model evaluation).
Figure 15. Verification of a custom 3-parameter toy emulator (SSA, BC, solzen) trained on 300 Latin hypercube samples. (Left) Emulator prediction (red dashed) vs ground truth (black) for a single test case (SSA = 8, BC = 100, solzen = 45°), with the shaded region showing the residual. (Right) Histogram of maximum absolute error across 100 unseen test cases; the red dashed line marks the median. Most errors are small, confirming that the emulator generalises beyond its training data.
retrieve()¶The retrieve() function is the main entry point for inversions. It
accepts:
| Argument | Required | Description |
|---|---|---|
observed |
Yes | Observed spectrum (480 values) or band dict |
parameters |
Yes | List of parameter names to retrieve (e.g. ["ssa", "black_carbon"]) |
emulator |
Yes | Trained Emulator object |
bounds |
No | Dict of (lo, hi) per parameter |
fixed_params |
No | Dict of fixed parameter values |
platform |
No | String → triggers band-mode retrieval |
uncertainty |
No | "hessian" or "mcmc" |
wavelength_mask |
No | Boolean array for spectral mode |
Best practice is to retrieve SSA rather than grain radius, because SSA is what the spectrum actually constrains (Section 11).
It returns a RetrievalResult with:
.best_fit — dict of retrieved parameter values.predicted_albedo — model spectrum at the optimum.cost — final cost value.uncertainty — dict of parameter uncertainties (if requested).derived — dict of derived quantities (e.g., SSA from rds + ρ).converged — whether the optimiser converged.chains — MCMC chains (if uncertainty="mcmc")BioSNICAR connection —
retrieve()is atbiosnicar/inverse/optimize.py:71. It dispatches tospectral_cost()(line 19 ofcost.py) orband_cost()(line 76) depending on whetherplatformis specified. Binary parameters likedirectcannot be optimised (guarded by_BINARY_PARAMSat line 26 ofoptimize.py).
Figure 16. Demonstration of the toy retrieve() function
recovering SSA and BC from a noisy 480-band observation. (Left)
Spectral fit: the observed spectrum (grey dots), the true spectrum
(green), and the retrieved model (red dashed) overlap closely.
(Right) Parameter comparison: side-by-side bars for truth (green)
and retrieved values (blue) for SSA and BC, showing small absolute
errors on both parameters.
Five progressively complex scenarios showing how the tools from Acts I–IV combine for practical remote-sensing applications.
The simplest retrieval: estimate SSA from a high-resolution spectrometer measurement (480 bands).
Setup:
ssaSSA is the best-constrained single parameter because it controls the depth of every NIR ice absorption band simultaneously (Section 11).
Figure 17. Single-parameter SSA retrieval across four surface types, from fresh snow (SSA = 50) to coarse glacier ice (SSA = 2). Each panel shows the noisy 480-band observation (grey dots), the true spectrum (green), and the retrieved fit (red dashed). Subplot titles give the true SSA and equivalent grain radius. The retrieval captures both the overall albedo level and the depth of near-infrared absorption features, with accuracy depending on the signal-to-noise ratio in the SSA-sensitive wavelength region.
Satellite data comes as band values, not continuous spectra. The retrieval must work in band space:
to_platform("sentinel2") converts to 13 bands.The key question: how many bands do we need? With only 13 observations,
we can't constrain many parameters — fixed_params is essential for
under-constrained problems.
Figure 18. Band-mode SSA and BC retrieval from synthetic Sentinel-2 observations. (Left) The 13 Sentinel-2 band values: green bars are the truth, red dots are the noisy observations used for retrieval. (Right) Retrieval error (%) vs the number of bands used (3, 7, or all 13). Adding more bands generally improves the SSA estimate by providing additional spectral constraints, though the improvement depends on which bands are included.
Two approaches to estimating parameter uncertainty:
Hessian (fast, Gaussian): Compute the curvature of the cost function at the optimum. The inverse Hessian gives the covariance matrix. Assumes the posterior is approximately Gaussian.
MCMC (rigorous, arbitrary shape): Markov Chain Monte Carlo samples the full posterior distribution. Captures non-Gaussian features (asymmetry, multimodality) but is slower.
In practice, Hessian is usually sufficient for well-constrained 1–2 parameter problems. MCMC is needed when degeneracies create non-Gaussian posteriors.
Figure 19. Two approaches to quantifying SSA uncertainty. (Left) Hessian-based Gaussian approximation: the inverse curvature of the cost function at the optimum defines a symmetric error bar (blue shading). (Centre) MCMC posterior histogram from 20,000 Metropolis-Hastings samples (after burn-in): the posterior is approximately Gaussian for this well-constrained single-parameter problem. (Right) Side-by-side comparison of 2σ error bars from both methods, showing they give consistent estimates when the posterior is unimodal.
As we saw in Section 7, satellite bands carry ~3–4 independent pieces of information. Attempting to retrieve more parameters than this leads to:
Rule of thumb: Retrieve at most 2–3 parameters; fix everything else using ancillary data (e.g., known solar geometry, reanalysis density).
Figure 20. The practical ceiling on multi-parameter retrieval. (Left) A 4-parameter retrieval (SSA, BC, glacier algae, solzen) from a single spectrum: at least one parameter shows > 20% error (red bars), demonstrating that the available spectral information cannot constrain all four simultaneously. (Right) A 2-parameter retrieval (SSA, BC) with glacier algae and solar zenith fixed to their true values: both parameters are recovered with much lower error, confirming that fixing known parameters is essential for accurate retrieval.
BioSNICAR connection — The
fixed_paramsargument toretrieve()(optimize.py:71) serves exactly this purpose. The_BINARY_PARAMSset (line 26) additionally preventsdirect(beam type) from being continuously optimised — it must be fixed.
Real-world monitoring involves:
Figure 21. Monitoring a simulated 90-day glacier melt season with two satellite platforms. (Top left) SSA evolution: the true trajectory (black) shows grain coarsening over time; Sentinel-2 (blue circles, 5-day revisit) and Landsat 8 (red triangles, 16-day revisit) retrievals track the trend. (Top right) Glacier algae bloom detection: algae appear after day 30 and are captured by both platforms. (Bottom left) CESM 2-band broadband albedo (VIS, NIR) shows the overall darkening trend but cannot resolve its cause. (Bottom right) Platform bias (L8 − S2 SSA) at coincident observation times, quantifying systematic differences between the two retrieval band sets.
This section requires BioSNICAR to be installed (pip install -e . from
the repository root). It demonstrates the complete end-to-end pipeline:
run_model() for a known surface.to_platform().If BioSNICAR is not installed, this section is skipped automatically.
Figure 22 (optional — requires BioSNICAR). Output of the full BioSNICAR adding-doubling forward model for a glacier surface (rds = 800 µm, ρ = 600 kg m⁻³, BC = 50 ppb). (Left) The 480-band spectral albedo showing all ice absorption features at their physically correct depths. (Right) The same spectrum convolved to Sentinel-2's 13 bands, ready for comparison with satellite observations.
| Act | Topic | Key takeaway |
|---|---|---|
| I | What Satellites See | Spectra → bands via flux-weighted SRF convolution |
| II | Bridging Spectra and Bands | ~3–4 independent DOF from 13 S2 bands |
| III | The Inverse Problem | Retrieve SSA (not rds); log-space; masking; regularisation |
| IV | Emulators | 10⁴× speedup enables practical retrieval |
| V | Scenarios | Fix what you know; 2–3 free params max |
| VI | Full Pipeline | run_model() → to_platform() → retrieve() |
Do you need per-band spectral detail?
├─ YES → Do you need SWIR coverage?
│ ├─ YES → Sentinel-2 (13 bands, 0.44–2.19 µm)
│ │ or Landsat 8 (7 bands, 0.44–2.20 µm)
│ └─ NO → Sentinel-3 OLCI (21 VIS-NIR bands, high revisit)
│ or MODIS (7 bands, daily global)
└─ NO → Broadband albedo for energy budget?
├─ YES → CESM2 2-band (VIS + NIR)
│ or MAR (4 bands) / HadCM3 (6 bands)
└─ NO → CESM RRTMG (14 bands, radiative transfer detail)
What to retrieve?
├─ SSA only → L-BFGS-B (fast, reliable)
├─ SSA + 1 impurity → Hybrid DE + L-BFGS-B (global → local)
├─ SSA + 2 impurities → Possible if bands are informative; test carefully
├─ 4+ params → Reduce! Fix some with ancillary data.
└─ Uncertainty?
├─ Quick estimate → Hessian (seconds)
└─ Full posterior → MCMC (minutes with emulator)
Always retrieve SSA, not grain radius. SSA is the optically meaningful variable; rds and ρ are degenerate (Section 11).
biosnicar/
├── drivers/run_model.py # Forward model entry point
├── bands/
│ ├── __init__.py # to_platform(), BandResult
│ ├── _core.py # WVL, srf_convolve(), interval_average()
│ ├── platforms/ # sentinel2.py, sentinel3.py, landsat8.py, modis.py
│ └── gcm/ # cesm.py, mar.py, hadcm3.py
├── emulator.py # Emulator.build(), .predict(), .verify()
└── inverse/
├── optimize.py # retrieve(), _LOG_SPACE_PARAMS
├── cost.py # spectral_cost(), band_cost()
└── result.py # RetrievalResult, _compute_ssa()