Stable Isotope Mixing Models for Dietary Reconstruction

Understanding Mixing Models

Chronologies

11 April 2026

Module 2

Module 2 — Learning outcomes

By the end of this session you should be able to:

  • Distinguish simple linear mixing models from Bayesian routed approaches
  • Explain why isotopic routing matters for collagen-based dietary reconstruction
  • Understand the ReSources workflow and its advantages over unrouted models

Why Simple Endpoint Models Are Not Enough

The two-endpoint interpolation approach

The simplest approach: linear interpolation between two known dietary endpoints.

\[ \% \text{ marine} = \frac{\delta^{13}C_{\text{consumer}} - \delta^{13}C_{\text{terrestrial}}}{\delta^{13}C_{\text{marine}} - \delta^{13}C_{\text{terrestrial}}} \times 100 \]
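As a minimal sketch, the interpolation can be written as a one-line R function. The function name and the endpoint values are hypothetical, chosen only so the arithmetic is easy to check, and no diet-to-tissue offset is applied, as in the equation itself.

```r
# Two-endpoint linear interpolation: percent marine from d13C values (per mil).
percent_marine <- function(consumer, terrestrial, marine) {
  stopifnot(marine != terrestrial)  # the endpoints must differ
  100 * (consumer - terrestrial) / (marine - terrestrial)
}

# Hypothetical endpoints, for illustration only.
terrestrial_end <- -21.0
marine_end      <- -12.0

# A consumer at either endpoint, and one exactly halfway between them.
percent_marine(c(-21.0, -16.5, -12.0), terrestrial_end, marine_end)
# returns 0, 50, 100
```

Consumers that fall outside the two endpoints return values below 0 or above 100, which is one quick way to spot a missing source.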

Appealing because:

  • Transparent algebra
  • Immediately interpretable
  • Useful pedagogical starting point

(Schoeninger & DeNiro 1984; Sealy et al. 1987)

Problems only become apparent when we plot a whole population — and add the second isotope axis.

Two-endpoint interpolation — visualised

Figure 1

Two-endpoint model in biplot space

Figure 2

Four failure modes — overview

Problem | Why it matters
Endpoint uncertainty | Measurement error ignored in a point estimate
Third-source problem | Unlisted sources make consumers appear impossible
Trophic level invisible | Carbon alone cannot distinguish diet quality
Non-uniqueness (equifinality) | Multiple source combos → same δ¹³C
Routing ignored | Collagen reflects protein C preferentially


The Endpoint Uncertainty Problem

When you use a simple linear mixing model to estimate % marine diet from, say, δ¹³C, the basic approach is:

% marine = (sample − terrestrial endpoint) / (marine endpoint − terrestrial endpoint) × 100

The critical assumption buried in this is that the endpoints are known constants. In practice they are not: they are themselves estimates.

Four failure modes — detail

Figure 3

What the bootstrapping histogram shows

The bootstrapping exercise shown in the figure propagates endpoint uncertainty properly. On each iteration, it:

  1. Draws a terrestrial endpoint value from its distribution (mean ± SD)
  2. Draws a marine endpoint value from its distribution
  3. Computes % marine for the target individual using those sampled endpoints
  4. Repeats hundreds or thousands of times

The resulting distribution of % marine estimates, ranging here from roughly 38% to 90%, reflects the true uncertainty in the dietary estimate given realistic endpoint variability. The median lands at 66%, but the spread is enormous.

This is particularly acute in radiocarbon-based dietary mixing models (e.g. marine reservoir determinations), where the marine ¹⁴C endpoint varies spatially and temporally, but the same logic applies to stable isotope two-source mixing models using δ¹³C or δ¹⁵N.
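The resampling procedure described above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the code behind Figure 3: the endpoint means and SDs and the consumer value are hypothetical, and drawing endpoints from independent normal distributions is a simplifying choice.

```r
set.seed(42)  # reproducible draws

n_iter <- 5000

# Hypothetical endpoint distributions (mean and SD, per mil); NOT the Figure 3 values.
terr_mean <- -21.0; terr_sd <- 1.0
mar_mean  <- -12.0; mar_sd  <- 1.0
consumer  <- -15.0  # hypothetical consumer d13C

# Steps 1 and 2: draw one terrestrial and one marine endpoint per iteration.
terr_draw <- rnorm(n_iter, terr_mean, terr_sd)
mar_draw  <- rnorm(n_iter, mar_mean, mar_sd)

# Step 3: percent marine for every pair of sampled endpoints.
pct_marine <- 100 * (consumer - terr_draw) / (mar_draw - terr_draw)

# Point estimate that treats the endpoint means as known constants.
point_estimate <- 100 * (consumer - terr_mean) / (mar_mean - terr_mean)

# Median and 95% interval of the resampled estimates.
quantile(pct_marine, c(0.025, 0.5, 0.975))
```

The point estimate is a single number, while `pct_marine` is a distribution whose width depends directly on the endpoint SDs. Shrinking `terr_sd` and `mar_sd` narrows the interval.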

What mixing models add

Working in two or more isotope dimensions simultaneously constrains the solution space far more tightly than a single axis.

  1. Multi-isotope — δ¹³C and δ¹⁵N used jointly.
  2. All known sources included (not forced binary).
  3. Posterior distributions — honest uncertainty, not hidden in a point estimate.
  4. Full error propagation — source SD, TEF uncertainty, analytical error.
  5. Routing (Generation 3 only) — protein vs. energy routing to collagen.

Two-endpoint interpolation is not wrong in all contexts: for a first pass with genuinely only two well-separated sources and tight endpoint distributions it can be useful. The problem is uncritical application to datasets with more sources and wider endpoint distributions, which describes almost every archaeological case.

Linear Mixing Models vs. Bayesian Routed Approaches

Three generations of mixing models

Generation 1 (1980s–2000s)
Simple linear models

  • Hand calculations
  • IsoSource
  • ✓ Transparent algebra
  • ✗ Bulk-diet assumption
  • ✗ Point estimates only

Generation 2 (2008–2015)
Bayesian unrouted

  • SIAR
  • Early MixSIAR
  • ✓ Uncertainty quantified
  • ✓ Priors incorporated
  • ✗ Still bulk-diet

Generation 3 (2014–present)
Bayesian routed

  • FRUITS
  • ReSources
  • ✓ Protein vs. energy routing
  • ✓ Collagen-specific
  • ✓ Full Bayesian uncertainty

Why routing matters

The routing problem

A population consuming 70% maize (C₄) by energy but only 20% animal protein will have bone collagen δ¹³C closer to the animal protein value than to the maize value — because collagen is synthesised from amino acids sourced preferentially from dietary protein.

Unrouted models (SIAR, early MixSIAR) assume collagen carries the full ~70% C₄ signal, so they misread the measured value and underestimate maize. ❌
Routed models (ReSources, FRUITS) correctly reconstruct the protein/energy balance. ✓

Bone collagen is not synthesised from your whole diet proportionally. It is built primarily from amino acids, and the body sources those amino acids preferentially from dietary protein rather than from carbohydrates or fats. This is because:

  • Carbohydrates and lipids are used predominantly for energy
  • Amino acids from dietary protein are used preferentially for tissue synthesis, including collagen

So the isotopic signal in collagen reflects the protein routing from diet, not the total energy budget.

The Maize Example Unpacked

Take a population eating:

  • 70% of their calories from maize (a C₄ plant, so isotopically distinct, high δ¹³C)
  • 20% of their protein from animal sources (isotopically different again)

An unrouted model assumes collagen δ¹³C is simply the weighted average of all dietary inputs, so it expects a diet that is 70% maize to leave a 70% C₄ signal in collagen. But collagen doesn’t work that way. The amino acids that are incorporated into collagen come disproportionately from the animal protein fraction, because that is where the routing goes. The maize carbon, mostly burned for energy, leaves a much weaker imprint on collagen than its caloric dominance would suggest. The result is that the collagen δ¹³C sits much closer to the animal protein endpoint than to the maize endpoint, and a naive model that reads that value as bulk diet badly underestimates maize consumption.
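The size of the routing effect can be illustrated with a small forward calculation in R. Everything here is an assumption made for the sketch: the end-member δ¹³C values, the maize share of dietary protein, and the linear weighting in which a fraction λ of collagen carbon comes from dietary protein and the rest from energy macronutrients. The diet-to-collagen offset is omitted because it cancels in the comparison.

```r
# Hypothetical d13C end-members (per mil).
d13C_maize  <- -10.0
d13C_animal <- -21.0

# Assumed diet: maize supplies 70% of energy-macronutrient carbon
# but only 30% of dietary protein carbon (both hypothetical).
maize_share_energy  <- 0.70
maize_share_protein <- 0.30

# Weighted average of two end-members.
mix2 <- function(share_a, a, b) share_a * a + (1 - share_a) * b

d13C_energy  <- mix2(maize_share_energy,  d13C_maize, d13C_animal)
d13C_protein <- mix2(maize_share_protein, d13C_maize, d13C_animal)

# Assumed routing: lambda = share of collagen carbon drawn from dietary protein.
lambda <- 0.74
d13C_collagen <- lambda * d13C_protein + (1 - lambda) * d13C_energy

# Unrouted reading: treat the collagen value as a bulk-diet average.
maize_unrouted <- (d13C_collagen - d13C_animal) / (d13C_maize - d13C_animal)
round(maize_unrouted, 3)
# returns 0.404
```

Under these assumed numbers the unrouted reading returns about 40% maize, against a 70% share of energy carbon. That gap is the bias a routed model is designed to remove.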

MixSIAR: A Bayesian Non-Routed Mixing Model

MixSIAR — key characteristics

MixSIAR is a fully Bayesian stable isotope mixing model framework implemented in R.

It estimates:

  • Posterior probability distributions of dietary source contributions
  • Uncertainty propagation from all sources (isotope variability, TEF, consumer variation)

Why MixSIAR over Simpler Approaches?

Feature IsoSource SIAR MixSIAR
Uncertainty in sources
Uncertainty in TEFs
Residual + process error
Individual-level effects Partial
Covariates
Full posterior output
Model comparison (LOO-IC)

MixSIAR is particularly valuable where trophic enrichment factor (TEF) uncertainty is substantial — as is typical in archaeological dietary reconstruction.

Data Requirements

Three input files are required (all .csv):

1. Consumer (mixture) data

  • One row per individual
  • Columns for each isotope ratio (e.g. d13C, d15N)
  • Optional: grouping factors, continuous covariates

2. Source data

  • Mean and SD of each isotope per source
  • Sample size per source
  • Can be aggregated from raw source data in R

3. Discrimination (TEF) data

  • Mean and SD of trophic enrichment per source per isotope
  • Often the greatest source of uncertainty in the model
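As a hedged sketch, the three input files could be assembled in base R as below. The isotope values are borrowed from the late medieval worked dataset later in this module, purely to illustrate layout. Marine fish is left out because no sample size is given for it. The column headers (`Meand13C`, `SDd13C`, `n`, and so on) follow my understanding of the naming MixSIAR expects for `data_type = "means"` and should be checked against the package documentation. The `Source` header and the use of a temporary directory are assumptions.

```r
out_dir <- tempdir()  # assumption: write to a scratch directory

# 1. Consumer (mixture) data: one row per individual.
consumer <- data.frame(
  d13C = c(-19.8, -20.1, -19.5),
  d15N = c(10.5, 9.8, 11.2)
)

# 2. Source data: mean, SD and sample size per source.
sources <- data.frame(
  Source   = c("Terrestrial meat", "Freshwater fish", "Legumes"),
  Meand13C = c(-21.3, -24.2, -26.1),
  SDd13C   = c(0.7, 1.3, 1.0),
  Meand15N = c(6.8, 12.5, 2.5),
  SDd15N   = c(1.1, 1.6, 1.5),
  n        = c(45, 12, 23)
)

# 3. Discrimination (TEF) data: same TEF assumed for every source.
discrimination <- data.frame(
  Source   = sources$Source,
  Meand13C = 1.0, SDd13C = 0.3,
  Meand15N = 4.0, SDd15N = 1.0
)

write.csv(consumer,       file.path(out_dir, "consumer.csv"),       row.names = FALSE)
write.csv(sources,        file.path(out_dir, "sources.csv"),        row.names = FALSE)
write.csv(discrimination, file.path(out_dir, "discrimination.csv"), row.names = FALSE)
```

Building the files from data frames keeps source names identical between the source and discrimination tables, which avoids a common mismatch.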

R Workflow: Loading Data

library(MixSIAR)
library(R2jags)

# Load consumer (mixture) data
mix <- load_mix_data(
  filename     = "consumer.csv",
  iso_names    = c("d13C", "d15N"),
  factors      = NULL,
  fac_random   = NULL,
  fac_nested   = NULL,
  cont_effects = NULL
)

# Load source data (already summarised as mean, SD and n per source)
source <- load_source_data(
  filename       = "sources.csv",
  source_factors = NULL,
  conc_dep       = FALSE,
  data_type      = "means",
  mix            = mix
)

# Load discrimination (TEF) data
d <- load_discr_data("discrimination.csv", mix)

Interpreting Outputs: Posterior Distributions

Dietary source contributions are expressed as full posterior probability distributions, not point estimates.

Key quantities to report:

  • 95% credible interval (2.5–97.5%): uncertainty bounds
  • Distributions skewed toward zero are expected for minor dietary sources
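To make the reporting step concrete, the sketch below summarises a matrix of posterior draws into a median and 95% credible interval per source. The draws are simulated stand-ins, not MixSIAR output: the source labels and Dirichlet shape values are hypothetical, and Dirichlet draws are generated by normalising gamma variates.

```r
set.seed(1)  # reproducible draws

n_draws      <- 4000
source_names <- c("Source A", "Source B", "Source C", "Source D")  # hypothetical labels
alpha        <- c(8, 1, 6, 3)  # hypothetical Dirichlet shape parameters

# Simulate Dirichlet draws: independent gamma variates, normalised per row.
g <- matrix(
  rgamma(n_draws * length(alpha), shape = rep(alpha, each = n_draws)),
  nrow = n_draws
)
draws <- g / rowSums(g)
colnames(draws) <- source_names

# Median and 95% credible interval (2.5% and 97.5% quantiles) per source.
summary_tab <- t(apply(draws, 2, quantile, probs = c(0.025, 0.5, 0.975)))
round(summary_tab, 2)
```

The minor source (`Source B`, shape 1) gives a distribution piled up against zero with a long right tail, which is the expected shape for a small dietary contribution.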

ReSources: A Bayesian Routed Mixing Model

ReSources — key characteristics

Based on FRUITS (Fernandes et al. 2016)

Feature | Detail
Framework | Bayesian inference
Routing | Protein vs. energy routing to collagen
Uncertainty | Full posterior distributions
Priors | Uninformative → strongly informative
Interface | R package; Shiny App
Target tissue | Bone/dentine collagen
Efficiency | Faster convergence than FRUITS

Key improvement on previous models: the routing concept

Figure 4

Key insight: A diet 70% maize by calories but 20% animal protein produces collagen whose δ¹³C reflects the animal protein more than the maize.

ReSources Workflow: A Practical Example

Dataset — late medieval urban population

Consumer data (bone collagen):

ID | δ¹³C (‰) | δ¹⁵N (‰) | Context
Burial 12 | −19.8 ± 0.2 | +10.5 ± 0.3 | High-status
Burial 23 | −20.1 ± 0.2 | +9.8 ± 0.3 | Common
Burial 34 | −19.5 ± 0.2 | +11.2 ± 0.3 | Monastic

TEF values used:

  • Δ¹³C (protein-routed): +1.0 ± 0.3 ‰
  • Δ¹³C (energy-routed): +0.5 ± 0.5 ‰
  • Δ¹⁵N: +4.0 ± 1.0 ‰
  • λ (routing): 0.74 ± 0.08

(O’Connell et al. 2012; Styring et al. 2010)

Dietary source end-members

Protein sources (from faunal assemblage and historical records):

Source | δ¹³C (‰) | δ¹⁵N (‰) | Evidence
Terrestrial meat (cattle, sheep, pig) | −21.3 ± 0.7 | +6.8 ± 1.1 | Faunal (n = 45)
Freshwater fish | −24.2 ± 1.3 | +12.5 ± 1.6 | Fish bones (n = 12)
Marine fish (cod, herring) | −14.8 ± 0.9 | +14.2 ± 1.4 | Historical records
Legumes (peas, beans) | −26.1 ± 1.0 | +2.5 ± 1.5 | Charred seeds (n = 23)

Energy sources:

Source | δ¹³C (‰) | Evidence
C₃ cereals (wheat, barley) | −25.8 ± 0.9 | Charred grain (n = 67)
Dairy products | −22.0 ± 1.2 | Historical records; lipid residues

Step 3 — Interpret results

Example posterior for Burial 12 (high-status):

Protein sources:

Source | Median | 95% CI
Terrestrial meat | 0.42 | [0.28, 0.58]
Freshwater fish | 0.08 | [0.01, 0.21]
Marine fish | 0.35 | [0.18, 0.53]
Legumes | 0.15 | [0.06, 0.27]

Energy sources:

Source | Median | 95% CI
C₃ cereals | 0.68 | [0.52, 0.82]
Dairy products | 0.32 | [0.18, 0.48]

Interpretation: Mixed diet dominated by terrestrial + marine animal protein (~77% combined). High marine fish (35%) consistent with high social status and access to imported preserved fish.

Visualising posterior distributions

Figure 5

Limitations and Assumptions

Assumption 1 — All sources known and measured

The “unknown unknowns” problem

If you omit a source contributing 20% of dietary protein, the model will spuriously inflate the contributions of all included sources. There is no statistical test for this — it requires archaeological and zooarchaeological expertise.

Checklist before running ReSources:
☐ Faunal reference collection from the same site/period
☐ Botanically identified plant macrofossils with isotope measurements
☐ Evidence ruling out major unlisted sources (e.g., C₄ plants, marine mammals)

Assumption 2 — Isotopic distinctness of sources

Figure 6

When sources overlap: (1) add a third tracer (e.g., δ³⁴S), (2) pool similar sources, or (3) apply informative priors from zooarchaeological abundance.

Assumption 3 — Routing parameter (λ)

ReSources requires a prior distribution for λ from controlled feeding experiments:

Population | λ (mean ± SD)
Human adults | 0.74 ± 0.08 (Fernandes et al. 2012)
Omnivores (pigs) | 0.76 ± 0.07
Carnivores | ≈ 0.85

Sensitivity analysis is mandatory

λ has a large effect on estimated dietary proportions. Always test alternative values (e.g., 0.70, 0.74, 0.80) and report the full range of plausible results.
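A rough sense of this sensitivity can be had from a deliberately simplified calculation, sketched below. It is not the ReSources model. It assumes a linear routing equation, a single energy source (C₃ cereals), only two protein sources (terrestrial meat and marine fish), and δ¹³C alone. The isotope and TEF values are the ones quoted in the worked dataset in this module; the function name is hypothetical.

```r
# Values from the worked dataset (per mil).
collagen_d13C <- -19.8  # Burial 12
terr_meat     <- -21.3
marine_fish   <- -14.8
c3_cereals    <- -25.8
tef_protein   <- 1.0    # protein-routed offset
tef_energy    <- 0.5    # energy-routed offset

# Assumed routing equation (a simplification, not the ReSources likelihood):
#   collagen = lambda * (protein + tef_protein) + (1 - lambda) * (energy + tef_energy)
# Solve for the protein d13C, then interpolate between the two protein sources.
marine_share <- function(lambda) {
  protein_d13C <- (collagen_d13C - (1 - lambda) * (c3_cereals + tef_energy)) / lambda -
    tef_protein
  (protein_d13C - terr_meat) / (marine_fish - terr_meat)
}

lambdas <- c(0.70, 0.74, 0.80)
data.frame(lambda = lambdas, marine_share = round(marine_share(lambdas), 2))
# marine_share column: 0.44, 0.37, 0.29
```

Even in this stripped-down setting, moving λ across the three suggested values shifts the inferred marine share of protein by about 15 percentage points, which is why a single λ should not be reported alone.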

Problem 4 — Equifinality

Three protein sources all at δ¹³C ≈ −21 ‰ but different δ¹⁵N:

Source | δ¹³C | δ¹⁵N
Sheep | −21.2 | +6.0
Cattle | −20.8 | +7.0
Pig | −21.5 | +8.5

A consumer at δ¹³C = −20.2 ‰, δ¹⁵N = +10.8 ‰ is consistent with:

  • Solution A: 60% sheep, 20% cattle, 20% pig
  • Solution B: 20% sheep, 60% cattle, 20% pig
  • Solution C: 20% sheep, 20% cattle, 60% pig

ReSources handles this by reporting very wide posteriors — which is the correct answer, not a failure.
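The arithmetic behind the three solutions can be checked directly. The sketch below predicts the consumer values for each solution using the source table above and the mean TEFs quoted in the worked dataset (+1.0 ‰ for δ¹³C, +4.0 ‰ for δ¹⁵N). Using the protein-routed δ¹³C offset alone and ignoring energy routing are simplifying assumptions.

```r
# Source values from the table above (per mil).
sources <- rbind(
  Sheep  = c(d13C = -21.2, d15N = 6.0),
  Cattle = c(d13C = -20.8, d15N = 7.0),
  Pig    = c(d13C = -21.5, d15N = 8.5)
)

# Candidate proportions (sheep, cattle, pig) for Solutions A, B and C.
solutions <- rbind(
  A = c(0.6, 0.2, 0.2),
  B = c(0.2, 0.6, 0.2),
  C = c(0.2, 0.2, 0.6)
)

tef      <- c(d13C = 1.0, d15N = 4.0)     # mean TEFs; energy routing ignored (assumption)
consumer <- c(d13C = -20.2, d15N = 10.8)  # observed consumer

# Predicted consumer values = weighted source mean + TEF.
predicted <- sweep(solutions %*% sources, 2, tef, "+")

# Residuals = predicted minus observed.
residual <- sweep(predicted, 2, consumer, "-")
colnames(residual) <- paste0("resid_", colnames(residual))

round(cbind(predicted, residual), 2)
```

All three solutions land within 0.2 ‰ of the observed δ¹³C and within 1.0 ‰ of the observed δ¹⁵N, which is no larger than the stated δ¹⁵N TEF uncertainty (± 1.0 ‰). The data alone cannot separate them.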

Problem 5 — Sample size effects

Source reference specimens:

Source type | Minimum n | Target n
Domesticated animals (low variability) | 5 | 10–15
Wild animals (high variability) | 10 | 15–20
Fish (very high variability) | 15 | 20–30
Plants (moderate variability) | 10 | 15–20
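One way to see why more variable sources need more specimens is the standard error of the source mean, SD / √n. The sketch below uses the freshwater fish δ¹³C SD from the worked dataset (1.3 ‰) and assumes that the SD stays the same as more independent specimens are added. The function name is hypothetical.

```r
# Standard error of a source mean for a given SD and sample size.
se_mean <- function(sd, n) sd / sqrt(n)

fish_sd  <- 1.3                 # freshwater fish d13C SD (per mil) from the worked dataset
n_values <- c(12, 15, 20, 30)   # current n, minimum n, and the target range for fish

data.frame(n = n_values, se = round(se_mean(fish_sd, n_values), 2))
# se column: 0.38, 0.34, 0.29, 0.24
```

Going from 12 to 30 specimens cuts the uncertainty on the source mean by roughly a third, with diminishing returns beyond that.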

Consumer groups for comparison:

  • n ≥ 10 per group — exploratory analyses
  • n ≥ 20 per group — publication-quality comparisons

ReSources vs. Other Bayesian Models

Software comparison

Model | Routing | Framework | Key use case
ReSources | ✓ Prot/energy | Bayesian | Archaeological bone collagen
MixSIAR | ✗ Bulk diet | Bayesian | Ecology; not ideal for collagen
SIAR | ✗ Bulk diet | Bayesian | Legacy; superseded by MixSIAR
IsoSource | ✗ Bulk diet | Linear (deterministic) | Teaching; fast exploration

Use MixSIAR (not ReSources) when analysing muscle tissue, blood, or other tissues where routing is less pronounced.

Learning Outcomes Checklist

Module 2 — what you should now be able to do

  1. Explain why two-endpoint interpolation fails — name ≥ 4 specific problems
  2. Distinguish simple linear / Bayesian unrouted / Bayesian routed models and explain why routing matters for collagen
  3. Explain why SIAR/early MixSIAR can produce biased estimates for bone collagen
  4. Critically evaluate a mixing model’s validity: are all sources known? isotopically distinct? λ appropriate?