Stable Isotope Mixing Models for Dietary Reconstruction

Understanding Mixing Models

Chronologies

11 April 2026

Module 2

Module 2 — Learning outcomes

By the end of this session you should be able to:

Distinguish simple linear mixing models from Bayesian routed approaches
Explain why isotopic routing matters for collagen-based dietary reconstruction
Understand the ReSources workflow and its advantages over unrouted models

Why Simple Endpoint Models Are Not Enough

The two-endpoint interpolation approach

The simplest approach: linear interpolation between two known dietary endpoints.

\[ \% \text{ marine} = \frac{\delta^{13}C_{\text{consumer}} - \delta^{13}C_{\text{terrestrial}}}{\delta^{13}C_{\text{marine}} - \delta^{13}C_{\text{terrestrial}}} \times 100 \]

Appealing because:

Transparent algebra
Immediately interpretable
Useful pedagogical starting point

(Schoeninger & DeNiro 1984; Sealy et al. 1987)
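As a worked illustration, take hypothetical endpoint values of −20 ‰ (terrestrial) and −12 ‰ (marine), chosen only to keep the arithmetic simple and not drawn from any dataset. A consumer at δ¹³C = −16 ‰ then gives:

\[ \% \text{ marine} = \frac{-16 - (-20)}{-12 - (-20)} \times 100 = \frac{4}{8} \times 100 = 50\% \]

The result is a single number with no uncertainty attached.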

Problems only become apparent when we plot a whole population — and add the second isotope axis.

Two-endpoint interpolation — visualised

Two-endpoint model in biplot space

Five failure modes: overview

Problem	Why it matters
Endpoint uncertainty	Measurement error ignored in a point estimate
Third-source problem	Unlisted sources make consumers appear impossible
Trophic level invisible	Carbon alone cannot distinguish diet quality
Non-uniqueness (equifinality)	Multiple source combos → same δ¹³C
Routing ignored	Collagen reflects protein C preferentially

The Endpoint Uncertainty Problem

When you use a simple linear mixing model to estimate % marine diet from, say, δ¹³C, the basic approach is:

\[ \% \text{ marine} = \frac{\delta^{13}C_{\text{consumer}} - \delta^{13}C_{\text{terrestrial}}}{\delta^{13}C_{\text{marine}} - \delta^{13}C_{\text{terrestrial}}} \times 100 \]

The critical assumption buried in this is that the endpoints are known constants. In practice they are not: they are themselves estimates, each with its own mean and spread.

Five failure modes: detail

What the bootstrapping histogram shows

The bootstrapping exercise shown in the figure propagates endpoint uncertainty properly. On each iteration, it:

Draws a terrestrial endpoint value from its distribution (mean ± SD)
Draws a marine endpoint value from its distribution
Computes % marine for the target individual using those sampled endpoints
Repeats hundreds or thousands of times

The resulting distribution of % marine estimates — ranging here from roughly 38-90% — reflects the true uncertainty in the dietary estimate given realistic endpoint variability. The median lands at 66%, but the spread is enormous.

This is particularly acute in radiocarbon-based dietary mixing models (e.g. marine reservoir determinations), where the marine ¹⁴C endpoint varies spatially and temporally, but the same logic applies to stable isotope two-source mixing models using δ¹³C or δ¹⁵N.
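The bootstrapping steps described above can be sketched in a few lines of base R. This is a minimal illustration, not the code behind the figure: the consumer value, the endpoint means and SDs, and all variable names are hypothetical, and the endpoints are assumed to be normally distributed.

```r
set.seed(1)      # reproducible draws
n_iter <- 5000   # number of bootstrap iterations

# Hypothetical values for illustration only (per mil)
consumer_d13C <- -15.5
terr_mean <- -20.5; terr_sd <- 0.8   # terrestrial endpoint (assumed)
mar_mean  <- -12.5; mar_sd  <- 1.0   # marine endpoint (assumed)

# Draw both endpoints afresh on every iteration
terr <- rnorm(n_iter, mean = terr_mean, sd = terr_sd)
mar  <- rnorm(n_iter, mean = mar_mean,  sd = mar_sd)

# Two-endpoint interpolation using the sampled endpoints
pct_marine <- (consumer_d13C - terr) / (mar - terr) * 100

# Summarise the spread of the estimate
quantile(pct_marine, probs = c(0.025, 0.5, 0.975))
# hist(pct_marine, breaks = 40)   # uncomment to draw the histogram
```

With fixed endpoints the same inputs would return a single value of 62.5%; the quantiles show how far that figure can move once the endpoints are allowed to vary.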

What mixing models add

Working in two or more isotope dimensions simultaneously constrains the solution space far more tightly than a single axis.

Multi-isotope — δ¹³C and δ¹⁵N used jointly.
All known sources included (not forced binary).
Posterior distributions — honest uncertainty, not hidden in a point estimate.
Full error propagation — source SD, TEF uncertainty, analytical error.
Routing (Generation 3 only) — protein vs. energy routing to collagen.

Two-endpoint interpolation is not wrong in all contexts: for a first pass with genuinely only two well-separated sources and tight endpoint distributions it can be useful. The problem is uncritical application to more complex real-world datasets, and archaeological datasets are almost always complex in this sense.

Linear Mixing Models vs. Bayesian Routed Approaches

Three generations of mixing models

Generation 1 (1980s–2000s)
Simple linear models

Hand calculations
IsoSource
✓ Transparent algebra
✗ Bulk-diet assumption
✗ Point estimates only

Generation 2 (2008–2015)
Bayesian unrouted

SIAR
Early MixSIAR
✓ Uncertainty quantified
✓ Priors incorporated
✗ Still bulk-diet

Generation 3 (2014–present)
Bayesian routed

FRUITS
ReSources
✓ Protein vs. energy routing
✓ Collagen-specific
✓ Full Bayesian uncertainty

Why routing matters

The routing problem

A population consuming 70% maize (C₄) by energy but only 20% animal protein will have bone collagen δ¹³C closer to the animal protein value than to the maize value — because collagen is synthesised from amino acids sourced preferentially from dietary protein.

Unrouted models (SIAR, early MixSIAR) treat collagen as a bulk-diet average, so they expect a ~70% C₄ signal and misread the weaker one that collagen actually carries. ❌
Routed models (ReSources, FRUITS) correctly reconstruct the protein/energy balance. ✓

Bone collagen is not synthesised from the whole diet proportionally. It is built primarily from amino acids, and the body sources those amino acids preferentially from dietary protein rather than from carbohydrates or fats. This is because:

Carbohydrates and lipids are used predominantly for energy
Amino acids from dietary protein are used preferentially for tissue synthesis, including collagen

So the isotopic signal in collagen mainly reflects the protein fraction of the diet, not the total energy budget.

The Maize Example Unpacked

Take a population eating:

70% of their calories from maize (a C₄ plant, so isotopically distinct, with high δ¹³C)
20% of their protein from animal sources (isotopically different again)

An unrouted model assumes collagen δ¹³C is simply the average of all dietary inputs weighted by their proportional contribution, so for this diet it expects a collagen value dominated by the C₄ signal. But collagen doesn't work that way. The amino acids that actually get incorporated into collagen come disproportionately from that animal protein fraction, because that is where the routing goes. The maize carbon, mostly burned for energy, leaves a much weaker imprint on collagen than its caloric dominance would suggest. The result is that the collagen δ¹³C sits much closer to the animal protein endpoint than to the maize endpoint. A naive model fitted to that value reads the weak C₄ signal as low maize intake and badly underestimates maize consumption.
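The contrast between the two expectations can be put into numbers with a deliberately simplified sketch in R. The pool values (−10 ‰ for maize-derived energy, −20 ‰ for a C₃-fed animal protein pool), the treatment of each pool as isotopically uniform, the omission of diet-to-collagen offsets, and all variable names are illustrative assumptions. The routing weight of 0.74 is the λ value quoted later in this module.

```r
# Illustrative pool values (assumed, not from a real dataset; per mil)
d13C_protein <- -20.0   # protein pool: C3-fed animal protein
d13C_energy  <- -10.0   # energy pool: maize carbohydrate

energy_share <- 0.70    # maize share of the bulk diet by calories
lambda       <- 0.74    # fraction of collagen carbon routed from dietary protein

# Unrouted (bulk-diet) expectation: plain weighted average of the two pools
bulk_d13C <- energy_share * d13C_energy + (1 - energy_share) * d13C_protein

# Routed expectation: protein pool weighted by lambda, energy pool by 1 - lambda
routed_d13C <- lambda * d13C_protein + (1 - lambda) * d13C_energy

round(c(unrouted = bulk_d13C, routed = routed_d13C), 1)
```

Under these assumptions the bulk-diet average is −13.0 ‰ while the routed value is −17.4 ‰, i.e. several per mil closer to the protein pool.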

MixSIAR: A Bayesian Unrouted Mixing Model

MixSIAR — key characteristics

MixSIAR is a fully Bayesian stable isotope mixing model framework implemented in R.

It estimates:

Posterior probability distributions of dietary source contributions
Uncertainty propagated from all inputs (source isotope variability, TEF, consumer variation)

Why MixSIAR over Simpler Approaches?

Feature	IsoSource	SIAR	MixSIAR
Uncertainty in sources	✗	✓	✓
Uncertainty in TEFs	✗	✓	✓
Residual + process error	✗	✗	✓
Individual-level effects	✗	Partial	✓
Covariates	✗	✗	✓
Full posterior output	✗	✓	✓
Model comparison (LOO-IC)	✗	✗	✓

MixSIAR is particularly valuable where trophic enrichment factor (TEF) uncertainty is substantial — as is typical in archaeological dietary reconstruction.

Data Requirements

Three input files are required (all .csv):

1. Consumer (mixture) data

One row per individual
Columns for each isotope ratio (e.g. d13C, d15N)
Optional: grouping factors, continuous covariates

2. Source data

Mean and SD of each isotope per source
Sample size per source
Can be aggregated from raw source data in R

3. Discrimination (TEF) data

Mean and SD of trophic enrichment per source per isotope
Often the greatest source of uncertainty in the model
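The three files can be assembled in base R before any model is run. The sketch below reuses values from the worked example later in this module (three burials, three of the protein sources, and the protein-routed Δ¹³C and Δ¹⁵N), purely to show the layout. The column naming for pre-aggregated sources (a Mean/SD prefix plus the isotope name, and a column n) is assumed here to be what MixSIAR expects for data_type = "means"; check the vignettes for your installed version. The out_dir variable and the ID and Source column names are hypothetical.

```r
out_dir <- tempdir()   # swap for your project folder

# 1. Consumer (mixture) data: one row per individual
consumer <- data.frame(
  ID   = c("Burial 12", "Burial 23", "Burial 34"),
  d13C = c(-19.8, -20.1, -19.5),
  d15N = c(10.5, 9.8, 11.2)
)

# 2. Source data: mean, SD and sample size per source
sources <- data.frame(
  Source   = c("Terrestrial meat", "Freshwater fish", "Legumes"),
  Meand13C = c(-21.3, -24.2, -26.1),
  SDd13C   = c(0.7, 1.3, 1.0),
  Meand15N = c(6.8, 12.5, 2.5),
  SDd15N   = c(1.1, 1.6, 1.5),
  n        = c(45, 12, 23)
)

# 3. Discrimination (TEF) data: same sources, same order
discrimination <- data.frame(
  Source   = sources$Source,
  Meand13C = 1.0, SDd13C = 0.3,
  Meand15N = 4.0, SDd15N = 1.0
)

write.csv(consumer,       file.path(out_dir, "consumer.csv"),       row.names = FALSE)
write.csv(sources,        file.path(out_dir, "sources.csv"),        row.names = FALSE)
write.csv(discrimination, file.path(out_dir, "discrimination.csv"), row.names = FALSE)
```

Keeping the source names identical and in the same order across the source and discrimination files avoids mismatches when the files are loaded.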

R Workflow: Loading Data

library(MixSIAR)
library(R2jags)

# Load consumer (mixture) data
mix <- load_mix_data(
  filename     = "consumer.csv",
  iso_names    = c("d13C", "d15N"),
  factors      = NULL,
  fac_random   = NULL,
  fac_nested   = NULL,
  cont_effects = NULL
)

# Load source data (already summarised as means and SDs, hence data_type = "means")
source <- load_source_data(
  filename       = "sources.csv",
  source_factors = NULL,
  conc_dep       = FALSE,
  data_type      = "means",
  mix            = mix
)

# Load discrimination (TEF) data
d <- load_discr_data("discrimination.csv", mix)

Interpreting Outputs: Posterior Distributions

Dietary source contributions are expressed as full posterior probability distributions, not point estimates.

Key quantities to report:

95% credible interval (2.5–97.5%): uncertainty bounds
Distributions skewed toward zero are expected for minor dietary sources
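A minimal base R sketch of how such summaries are computed from posterior draws. The draws here are simulated stand-ins (normalised gamma variates, which give Dirichlet-distributed proportions), not MixSIAR output, and the source names and shape values are hypothetical.

```r
set.seed(7)
n_draws <- 4000

# Stand-in for MCMC output: Dirichlet draws built from normalised gamma variates
shape <- c(meat = 8, marine = 6, freshwater = 1)   # hypothetical shape values
g     <- sapply(shape, function(a) rgamma(n_draws, shape = a))
post  <- g / rowSums(g)   # each row is one draw of proportions summing to 1

# Median and 95% credible interval (2.5% and 97.5% quantiles) per source
summ <- t(apply(post, 2, quantile, probs = c(0.025, 0.5, 0.975)))
round(summ, 2)

# For a minor source the mean exceeds the median: the posterior is skewed toward zero
c(mean = mean(post[, "freshwater"]), median = median(post[, "freshwater"]))
```

Reporting the interval alongside the median keeps the asymmetry of minor-source posteriors visible instead of hiding it in a single number.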

ReSources: A Bayesian Routed Mixing Model

ReSources — key characteristics

Based on FRUITS (Fernandes et al. 2016)

Feature	Detail
Framework	Bayesian inference
Routing	Protein vs. energy routing to collagen
Uncertainty	Full posterior distributions
Priors	Uninformative → strongly informative
Interface	R package; Shiny App
Target tissue	Bone/dentine collagen
Efficiency	Faster convergence than FRUITS

Key improvement on previous models: the routing concept

Key insight: A diet 70% maize by calories but 30% animal protein produces collagen reflecting more protein than maize.

Panel 1: Dietary protein (g/day)
This is the actual diet: 70% of protein grams come from protein sources (meat, fish) and 30% from energy sources (cereals). Note that this is already expressed as protein contribution, not total calories, so the energy dominance of maize is even more extreme than this panel suggests.

Panel 2: Collagen C contribution
The carbon signal in collagen has shifted to 75% protein sources / 25% energy sources. So even though cereals contributed 30% of dietary protein, they account for only 25% of the carbon incorporated into collagen. The routing effect is already visible but relatively modest for carbon.

Panel 3: Collagen N contribution
This is where the routing effect is most dramatic. Collagen nitrogen is 100% from protein sources, with zero contribution from the energy/carbohydrate fraction. This makes biochemical sense: nitrogen in collagen comes entirely from amino acids, which are sourced from dietary protein. Cereals and tubers, even when they contribute some protein, are metabolically treated as energy substrates and their nitrogen barely registers in collagen.

ReSources Workflow: A Practical Example

Dataset — late medieval urban population

Consumer data (bone collagen):

ID	δ¹³C (‰)	δ¹⁵N (‰)	Context
Burial 12	−19.8 ± 0.2	+10.5 ± 0.3	High-status
Burial 23	−20.1 ± 0.2	+9.8 ± 0.3	Common
Burial 34	−19.5 ± 0.2	+11.2 ± 0.3	Monastic

TEF values used:

Δ¹³C (protein-routed): +1.0 ± 0.3 ‰
Δ¹³C (energy-routed): +0.5 ± 0.5 ‰
Δ¹⁵N: +4.0 ± 1.0 ‰
λ (routing): 0.74 ± 0.08

(O’Connell et al. 2012; Styring et al. 2010)

Dietary source end-members

Protein sources (from faunal assemblage and historical records):

Source	δ¹³C (‰)	δ¹⁵N (‰)	Evidence
Terrestrial meat (cattle, sheep, pig)	−21.3 ± 0.7	+6.8 ± 1.1	Faunal (n = 45)
Freshwater fish	−24.2 ± 1.3	+12.5 ± 1.6	Fish bones (n = 12)
Marine fish (cod, herring)	−14.8 ± 0.9	+14.2 ± 1.4	Historical records
Legumes (peas, beans)	−26.1 ± 1.0	+2.5 ± 1.5	Charred seeds (n = 23)

Energy sources:

Source	δ¹³C (‰)	Evidence
C₃ cereals (wheat, barley)	−25.8 ± 0.9	Charred grain (n = 67)
Dairy products	−22.0 ± 1.2	Historical records; lipid residues

Step 3 — Interpret results

Example posterior for Burial 12 (high-status):

Protein sources:

Source	Median	95% CI
Terrestrial meat	0.42	[0.28, 0.58]
Freshwater fish	0.08	[0.01, 0.21]
Marine fish	0.35	[0.18, 0.53]
Legumes	0.15	[0.06, 0.27]

Energy sources:

Source	Median	95% CI
C₃ cereals	0.68	[0.52, 0.82]
Dairy products	0.32	[0.18, 0.48]

Interpretation: Mixed diet dominated by terrestrial + marine animal protein (~77% combined). High marine fish (35%) consistent with high social status and access to imported preserved fish.

Visualising posterior distributions

Limitations and Assumptions

Assumption 1 — All sources known and measured

The “unknown unknowns” problem

If you omit a source contributing 20% of dietary protein, the model will spuriously inflate the contributions of all included sources. There is no statistical test for this — it requires archaeological and zooarchaeological expertise.

Checklist before running ReSources:
☐ Faunal reference collection from the same site/period
☐ Botanically identified plant macrofossils with isotope measurements
☐ Evidence ruling out major unlisted sources (e.g., C₄ plants, marine mammals)

Assumption 2 — Isotopic distinctness of sources

When sources overlap: (1) add a third tracer (e.g., δ³⁴S), (2) pool similar sources, or (3) apply informative priors from zooarchaeological abundance.

Assumption 3 — Routing parameter (λ)

ReSources requires a prior distribution for λ from controlled feeding experiments:

Population	λ (mean ± SD)
Human adults	0.74 ± 0.08 (Fernandes et al. 2012)
Omnivores (pigs)	0.76 ± 0.07
Carnivores	≈ 0.85

Sensitivity analysis is mandatory

λ has a large effect on estimated dietary proportions. Always test alternative values (e.g., 0.70, 0.74, 0.80) and report the full range of plausible results.
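A deterministic sketch of such a sensitivity test in R, using values from the worked example (Burial 12 collagen, terrestrial meat, marine fish, C₃ cereals and the two Δ¹³C offsets). It rests on strong simplifying assumptions that are not part of ReSources itself: collagen δ¹³C is taken as λ × (protein pool + Δ¹³C protein) + (1 − λ) × (energy pool + Δ¹³C energy), protein comes from only two sources, and energy from cereals alone. The function name marine_share is hypothetical.

```r
# Values from the worked example in this module (per mil)
collagen_d13C <- -19.8   # Burial 12
terr_d13C     <- -21.3   # terrestrial meat
marine_d13C   <- -14.8   # marine fish
energy_d13C   <- -25.8   # C3 cereals
tef_protein   <- 1.0     # protein-routed offset
tef_energy    <- 0.5     # energy-routed offset

# Marine share of dietary protein implied by a given lambda (assumed equation)
marine_share <- function(lambda) {
  # Back out the protein-pool d13C from the collagen value
  protein_d13C <- (collagen_d13C - (1 - lambda) * (energy_d13C + tef_energy)) /
    lambda - tef_protein
  # Interpolate between the two protein sources
  (protein_d13C - terr_d13C) / (marine_d13C - terr_d13C)
}

lambdas <- c(0.70, 0.74, 0.80)
data.frame(lambda = lambdas, marine_share = round(marine_share(lambdas), 2))
```

Under these assumptions the implied marine share of protein falls from about 0.44 at λ = 0.70 to about 0.29 at λ = 0.80, a shift large enough to change an interpretation.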

Problem 4 — Equifinality

Three protein sources all at δ¹³C ≈ −21 ‰ but different δ¹⁵N:

Source	δ¹³C	δ¹⁵N
Sheep	−21.2	+6.0
Cattle	−20.8	+7.0
Pig	−21.5	+8.5

A consumer at δ¹³C = −20.2 ‰, δ¹⁵N = +10.8 ‰ is consistent with:

Solution A: 60% sheep, 20% cattle, 20% pig
Solution B: 20% sheep, 60% cattle, 20% pig
Solution C: 20% sheep, 20% cattle, 60% pig

ReSources handles this by reporting very wide posteriors — which is the correct answer, not a failure.
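The three solutions can be checked with a short forward calculation in R. The sketch assumes the consumer signal is simply the proportion-weighted source mean plus a fixed offset (Δ¹³C = +1.0 ‰ and Δ¹⁵N = +4.0 ‰, the mean values listed earlier in this module) and ignores energy routing; the object names are hypothetical.

```r
# Source means from the table above (per mil)
sources <- data.frame(
  d13C = c(-21.2, -20.8, -21.5),
  d15N = c(6.0, 7.0, 8.5),
  row.names = c("Sheep", "Cattle", "Pig")
)

# Candidate diets: proportions of sheep, cattle, pig
solutions <- rbind(
  A = c(0.6, 0.2, 0.2),
  B = c(0.2, 0.6, 0.2),
  C = c(0.2, 0.2, 0.6)
)

tef      <- c(d13C = 1.0, d15N = 4.0)      # assumed fixed offsets
consumer <- c(d13C = -20.2, d15N = 10.8)   # consumer values from the text

# Predicted consumer values: weighted source mean plus offset
predicted <- sweep(solutions %*% as.matrix(sources), 2, tef, "+")

# Residuals against the measured consumer
residuals <- sweep(predicted, 2, consumer, "-")
round(residuals, 2)
```

All three diets land within 0.2 ‰ of the consumer in δ¹³C and within 1 ‰ in δ¹⁵N, which is no larger than the stated Δ¹⁵N uncertainty of ± 1.0 ‰, so the data cannot separate them.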

Problem 5 — Sample size effects

Source reference specimens:

Source type	Minimum n	Target n
Domesticated animals (low variability)	5	10–15
Wild animals (high variability)	10	15–20
Fish (very high variability)	15	20–30
Plants (moderate variability)	10	15–20

Consumer groups for comparison:

n ≥ 10 per group — exploratory analyses
n ≥ 20 per group — publication-quality comparisons

ReSources vs. Other Bayesian Models

Software comparison

Model	Routing	Framework	Key use case
ReSources	✓ Prot/energy	Bayesian	Archaeological bone collagen ✓
MixSIAR	✗ Bulk diet	Bayesian	Ecology; not ideal for collagen
SIAR	✗ Bulk diet	Bayesian	Legacy; superseded by MixSIAR
IsoSource	✗ Bulk diet	Linear (deterministic)	Teaching; fast exploration

Use MixSIAR (not ReSources) when analysing muscle tissue, blood, or other tissues where routing is less pronounced.

Learning Outcomes Checklist

Module 2 — what you should now be able to do

Explain why two-endpoint interpolation fails — name ≥ 4 specific problems
Distinguish simple linear / Bayesian unrouted / Bayesian routed models and explain why routing matters for collagen
Explain why SIAR/early MixSIAR can produce biased estimates for bone collagen
Critically evaluate a mixing model’s validity: are all sources known? isotopically distinct? λ appropriate?