Understanding Mixing Models
11 April 2026
Module 2
By the end of this session you should be able to:
The simplest approach: linear interpolation between two known dietary endpoints.
\[ \% \text{ marine} = \frac{\delta^{13}C_{\text{consumer}} - \delta^{13}C_{\text{terrestrial}}}{\delta^{13}C_{\text{marine}} - \delta^{13}C_{\text{terrestrial}}} \times 100 \]
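As a minimal sketch of the formula above, the R function below computes % marine for a single consumer. The function name `pct_marine` and the endpoint values (−21 ‰ terrestrial, −12 ‰ marine) are illustrative assumptions, not values taken from this module.

```r
# Two-endpoint linear mixing model for delta13C.
# All numeric values below are hypothetical, for illustration only.
pct_marine <- function(d13c_consumer, d13c_terrestrial, d13c_marine) {
  100 * (d13c_consumer - d13c_terrestrial) / (d13c_marine - d13c_terrestrial)
}

# A consumer at -16.5 permil, halfway between the assumed endpoints
example_pct <- pct_marine(-16.5, d13c_terrestrial = -21, d13c_marine = -12)
print(example_pct)
```

A result below 0 or above 100 means the consumer lies outside the range spanned by the two endpoints; the function deliberately does not clip such values, since they are a useful warning sign.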
Appealing because:
(Schoeninger & DeNiro 1984; Sealy et al. 1987)
Problems only become apparent when we plot a whole population — and add the second isotope axis.
| Problem | Why it matters |
|---|---|
| Endpoint uncertainty | Measurement error ignored in a point estimate |
| Third-source problem | Unlisted sources make consumers appear impossible |
| Trophic level invisible | Carbon alone cannot distinguish diet quality |
| Non-uniqueness (equifinality) | Multiple source combos → same δ¹³C |
| Routing ignored | Collagen reflects protein C preferentially |
The Endpoint Uncertainty Problem
When you use a simple linear mixing model to estimate % marine diet from, say, δ¹³C, the basic approach is:
\[ \% \text{ marine} = \frac{\delta^{13}C_{\text{consumer}} - \delta^{13}C_{\text{terrestrial}}}{\delta^{13}C_{\text{marine}} - \delta^{13}C_{\text{terrestrial}}} \times 100 \]
The critical assumption buried in this is that the endpoints are known constants. In practice they are not: they are themselves estimates.
The bootstrapping exercise shown in the figure propagates endpoint uncertainty properly. On each iteration, it:
1. Draws a terrestrial endpoint value from its distribution (mean ± SD)
2. Draws a marine endpoint value from its distribution
3. Computes % marine for the target individual using those sampled endpoints
4. Repeats hundreds or thousands of times
The resulting distribution of % marine estimates, ranging here from roughly 38% to 90%, reflects the uncertainty in the dietary estimate given realistic endpoint variability. The median lands at 66%, but the spread is enormous.
This is particularly acute in radiocarbon-based dietary mixing models (e.g., marine reservoir determinations), where the marine ¹⁴C endpoint varies spatially and temporally, but the same logic applies to stable isotope two-source mixing models using δ¹³C or δ¹⁵N.
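The resampling steps listed above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code behind the figure: endpoints are drawn from normal distributions, and the consumer value, endpoint means and SDs are all hypothetical.

```r
# Monte Carlo propagation of endpoint uncertainty in a two-endpoint model.
# Endpoint means/SDs and the consumer value are hypothetical.
set.seed(42)
n_iter <- 5000
d13c_consumer <- -15.0

terrestrial <- rnorm(n_iter, mean = -21.0, sd = 1.0)  # step 1: terrestrial endpoint
marine      <- rnorm(n_iter, mean = -12.0, sd = 1.0)  # step 2: marine endpoint

# Step 3: % marine for each pair of sampled endpoints
pct_marine <- 100 * (d13c_consumer - terrestrial) / (marine - terrestrial)

# Step 4: summarise the resulting distribution (2.5%, 50%, 97.5% quantiles)
endpoint_summary <- quantile(pct_marine, probs = c(0.025, 0.5, 0.975))
print(round(endpoint_summary, 1))
```

Reporting the interval alongside the median, rather than the median alone, is what distinguishes this from the point estimate.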
Working in two or more isotope dimensions simultaneously constrains the solution space far more tightly than a single axis.
Two-endpoint interpolation is not wrong in all contexts: as a first pass, with genuinely only two well-separated sources and tight endpoint distributions, it can be useful. The problem is its uncritical application to more complex real-world datasets, and archaeological datasets are almost always complex.
Generation 1 (1980s–2000s)
Simple linear models
Generation 2 (2008–2015)
Bayesian unrouted
Generation 3 (2014–present)
Bayesian routed
The routing problem
A population consuming 70% maize (C₄) by energy but only 20% animal protein will have bone collagen δ¹³C closer to the animal protein value than to the maize value — because collagen is synthesised from amino acids sourced preferentially from dietary protein.
Unrouted models (SIAR, early MixSIAR) would infer ~70% C₄ diet. ❌
Routed models (ReSources, FRUITS) correctly reconstruct the protein/energy balance. ✓
Bone collagen is not synthesised from your whole diet proportionally. It is built primarily from amino acids, and the body sources those amino acids preferentially from dietary protein rather than from carbohydrates or fats. This is because:
- Carbohydrates and lipids are used predominantly for energy
- Amino acids from dietary protein are used preferentially for tissue synthesis, including collagen
So the isotopic signal in collagen reflects the protein routing from diet, not the total energy budget.
The Maize Example Unpacked
Take a population eating:
- 70% of their calories from maize (a C₄ plant, so isotopically distinct, with high δ¹³C)
- 20% of their protein from animal sources (isotopically different again)
An unrouted model assumes collagen δ¹³C simply reflects the weighted average of all dietary inputs by their proportional contribution — so it would see 70% C₄ signal and infer a heavily maize-based diet. But collagen doesn’t work that way. The amino acids that actually get incorporated into collagen come disproportionately from that 20% animal protein fraction, because that’s where the routing goes. The maize carbon, mostly burned for energy, leaves a much weaker imprint on collagen than its caloric dominance would suggest. The result is that the collagen δ¹³C sits much closer to the animal protein endpoint than to the maize endpoint — and a naive model catastrophically overestimates maize consumption.
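A worked comparison may make the size of the effect concrete. The numbers are illustrative assumptions rather than values from this module: an animal-protein pool at δ¹³C = −21 ‰, a maize-dominated energy pool at −11 ‰, protein supplying 20% of dietary carbon, and a routing fraction λ = 0.74 (the human value tabulated later in this module). Diet-to-collagen offsets are left out for simplicity, and the routed expression is a deliberately simplified two-pool model.
\[ \delta^{13}C_{\text{unrouted}} = 0.20 \times (-21) + 0.80 \times (-11) = -13.0 \]
\[ \delta^{13}C_{\text{routed}} = \lambda \times (-21) + (1 - \lambda) \times (-11) = 0.74 \times (-21) + 0.26 \times (-11) = -18.4 \]
Under these assumptions the two predictions differ by more than 5 ‰, so reading a routed collagen value with an unrouted model would badly misjudge the maize contribution.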
MixSIAR is a fully Bayesian stable isotope mixing model framework implemented in R.
It estimates:
| Feature | IsoSource | SIAR | MixSIAR |
|---|---|---|---|
| Uncertainty in sources | ✗ | ✓ | ✓ |
| Uncertainty in TEFs | ✗ | ✓ | ✓ |
| Residual + process error | ✗ | ✗ | ✓ |
| Individual-level effects | ✗ | Partial | ✓ |
| Covariates | ✗ | ✗ | ✓ |
| Full posterior output | ✗ | ✓ | ✓ |
| Model comparison (LOO-IC) | ✗ | ✗ | ✓ |
MixSIAR is particularly valuable where trophic enrichment factor (TEF) uncertainty is substantial — as is typical in archaeological dietary reconstruction.
Three input files are required (all .csv):
1. Consumer (mixture) data, with one column per isotope (d13C, d15N)
2. Source data
3. Discrimination (TEF) data
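To make the file layout concrete, the sketch below writes three toy .csv files to a temporary directory. All values are hypothetical. The column naming (`Mean`/`SD` prefixed to each isotope name, plus an `n` column for sources supplied as means, with source names in the first column) reflects my understanding of the MixSIAR convention and should be checked against the package vignette before use.

```r
# Write three toy input files in the layout MixSIAR is assumed to expect.
# All values are hypothetical; column names other than d13C/d15N are assumptions.
out_dir <- tempdir()

# 1. Consumer (mixture) data: one row per individual
consumer <- data.frame(
  d13C = c(-19.0, -18.2, -17.5),
  d15N = c(10.0, 11.1, 12.3)
)

# 2. Source data supplied as means, SDs and sample sizes
sources <- data.frame(
  Source   = c("Terrestrial", "Marine"),
  Meand13C = c(-21.0, -13.0),
  SDd13C   = c(0.8, 1.0),
  Meand15N = c(6.5, 14.0),
  SDd15N   = c(1.0, 1.5),
  n        = c(20, 15)
)

# 3. Discrimination (TEF) data: one row per source
discrimination <- data.frame(
  Source   = c("Terrestrial", "Marine"),
  Meand13C = c(1.0, 1.0),
  SDd13C   = c(0.5, 0.5),
  Meand15N = c(4.0, 4.0),
  SDd15N   = c(1.0, 1.0)
)

write.csv(consumer, file.path(out_dir, "consumer.csv"), row.names = FALSE)
write.csv(sources, file.path(out_dir, "sources.csv"), row.names = FALSE)
write.csv(discrimination, file.path(out_dir, "discrimination.csv"), row.names = FALSE)
```

The source names in the discrimination file are kept identical to those in the source file so that the two can be matched row for row.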
library(MixSIAR)
library(R2jags)
# Load consumer (mixture) data
mix <- load_mix_data(
filename = "consumer.csv",
iso_names = c("d13C", "d15N"),
factors = NULL,
fac_random = NULL,
fac_nested = NULL,
cont_effects = NULL
)
# Aggregate and load source data
source <- load_source_data(
filename = "sources.csv",
source_factors = NULL,
conc_dep = FALSE,
data_type = "means",
mix
)
# Load discrimination (TEF) data
d <- load_discr_data("discrimination.csv", mix)
Dietary source contributions are expressed as full posterior probability distributions, not point estimates.
Key quantities to report:
Based on FRUITS (Fernandes et al. 2016)
| Feature | Detail |
|---|---|
| Framework | Bayesian inference |
| Routing | Protein vs. energy routing to collagen |
| Uncertainty | Full posterior distributions |
| Priors | Uninformative → strongly informative |
| Interface | R package; Shiny App |
| Target tissue | Bone/dentine collagen |
| Efficiency | Faster convergence than FRUITS |
Key insight: A diet that is 70% maize by calories but 30% animal protein produces collagen whose δ¹³C reflects the protein more than the maize.
Consumer data (bone collagen):
| ID | δ¹³C (‰) | δ¹⁵N (‰) | Context |
|---|---|---|---|
| Burial 12 | −19.8 ± 0.2 | +10.5 ± 0.3 | High-status |
| Burial 23 | −20.1 ± 0.2 | +9.8 ± 0.3 | Common |
| Burial 34 | −19.5 ± 0.2 | +11.2 ± 0.3 | Monastic |
TEF values used:
(O’Connell et al. 2012; Styring et al. 2010)
Protein sources (from faunal assemblage and historical records):
| Source | δ¹³C (‰) | δ¹⁵N (‰) | Evidence |
|---|---|---|---|
| Terrestrial meat (cattle, sheep, pig) | −21.3 ± 0.7 | +6.8 ± 1.1 | Faunal (n = 45) |
| Freshwater fish | −24.2 ± 1.3 | +12.5 ± 1.6 | Fish bones (n = 12) |
| Marine fish (cod, herring) | −14.8 ± 0.9 | +14.2 ± 1.4 | Historical records |
| Legumes (peas, beans) | −26.1 ± 1.0 | +2.5 ± 1.5 | Charred seeds (n = 23) |
Energy sources:
| Source | δ¹³C (‰) | Evidence |
|---|---|---|
| C₃ cereals (wheat, barley) | −25.8 ± 0.9 | Charred grain (n = 67) |
| Dairy products | −22.0 ± 1.2 | Historical records; lipid residues |
Example posterior for Burial 12 (high-status):
Protein sources:
| Source | Median | 95% CI |
|---|---|---|
| Terrestrial meat | 0.42 | [0.28, 0.58] |
| Freshwater fish | 0.08 | [0.01, 0.21] |
| Marine fish | 0.35 | [0.18, 0.53] |
| Legumes | 0.15 | [0.06, 0.27] |
Energy sources:
| Source | Median | 95% CI |
|---|---|---|
| C₃ cereals | 0.68 | [0.52, 0.82] |
| Dairy products | 0.32 | [0.18, 0.48] |
Interpretation: Mixed diet dominated by terrestrial + marine animal protein (~77% combined). High marine fish (35%) consistent with high social status and access to imported preserved fish.
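Medians and intervals like those in the tables above are summaries of posterior draws. The sketch below shows one way to compute them in base R. The draws are simulated from a Dirichlet distribution with made-up parameters and stand in for real model output; the object names are hypothetical.

```r
# Summarise posterior draws of source proportions.
# The draws are simulated with made-up Dirichlet parameters (hypothetical);
# in practice they would come from the fitted model.
set.seed(1)
n_draws <- 4000
alpha <- c(terrestrial = 8, freshwater = 2, marine = 7, legumes = 3)

# Dirichlet draws via normalised gamma variates (base R only)
g <- sapply(alpha, function(a) rgamma(n_draws, shape = a, rate = 1))
draws <- g / rowSums(g)

# Median and 95% credible interval for each source
post_summary <- t(apply(draws, 2, quantile, probs = c(0.025, 0.5, 0.975)))
print(round(post_summary, 2))

# Combined terrestrial + marine share: sum within each draw, then summarise
animal <- draws[, "terrestrial"] + draws[, "marine"]
animal_summary <- quantile(animal, probs = c(0.025, 0.5, 0.975))
print(round(animal_summary, 2))
```

Summing within each draw gives a credible interval for the combined share, which adding the per-source medians cannot provide.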
The “unknown unknowns” problem
If you omit a source contributing 20% of dietary protein, the model will spuriously inflate the contributions of all included sources. There is no statistical test for this — it requires archaeological and zooarchaeological expertise.
Checklist before running ReSources:
☐ Faunal reference collection from the same site/period
☐ Botanically identified plant macrofossils with isotope measurements
☐ Evidence ruling out major unlisted sources (e.g., C₄ plants, marine mammals)
When sources overlap: (1) add a third tracer (e.g., δ³⁴S), (2) pool similar sources, or (3) apply informative priors from zooarchaeological abundance.
ReSources requires a prior distribution for λ, the fraction of collagen carbon routed from dietary protein, taken from controlled feeding experiments:
| Population | λ (mean ± SD) |
|---|---|
| Human adults | 0.74 ± 0.08 (Fernandes et al. 2012) |
| Omnivores (pigs) | 0.76 ± 0.07 |
| Carnivores | ≈ 0.85 |
Sensitivity analysis is mandatory
λ has a large effect on estimated dietary proportions. Always test alternative values (e.g., 0.70, 0.74, 0.80) and report the full range of plausible results.
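A quick way to see why the sensitivity test matters is to vary λ in a simplified forward model. The sketch below is not the ReSources implementation: it assumes that a fraction λ of collagen carbon comes from a single protein pool and the remainder from a single energy pool, and it omits diet-to-collagen offsets. The function name is hypothetical; the pool values are borrowed from the case-study source tables (marine fish and C₃ cereals).

```r
# Sensitivity of predicted collagen d13C to the routing fraction lambda.
# Simplified two-pool forward model (an assumption, not the ReSources code);
# diet-to-collagen offsets are omitted.
predict_collagen_d13c <- function(lambda, d13c_protein, d13c_energy) {
  lambda * d13c_protein + (1 - lambda) * d13c_energy
}

lambdas <- c(0.70, 0.74, 0.80)
sensitivity <- data.frame(
  lambda = lambdas,
  d13c_collagen = predict_collagen_d13c(
    lambdas,
    d13c_protein = -14.8,  # marine fish, from the case-study table
    d13c_energy  = -25.8   # C3 cereals, from the case-study table
  )
)
print(sensitivity)
```

With these inputs the prediction moves by about 1.1 ‰ between λ = 0.70 and λ = 0.80, which is larger than the measurement uncertainty quoted for the consumers.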
Three protein sources all at δ¹³C ≈ −21 ‰ but different δ¹⁵N:
| Source | δ¹³C | δ¹⁵N |
|---|---|---|
| Sheep | −21.2 | +6.0 |
| Cattle | −20.8 | +7.0 |
| Pig | −21.5 | +8.5 |
A consumer at δ¹³C = −20.2 ‰, δ¹⁵N = +10.8 ‰ is consistent with:
ReSources handles this by reporting very wide posteriors — which is the correct answer, not a failure.
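The non-uniqueness can be demonstrated with a coarse grid search in the spirit of IsoSource. The sketch below uses the source values from the table above; the diet-to-collagen offsets (+1.0 ‰ for δ¹³C, +4.0 ‰ for δ¹⁵N), the 0.3 ‰ tolerance and the 10% grid step are hypothetical choices made only for illustration.

```r
# Grid search for source proportions that reproduce one consumer value.
# Offsets and tolerance are hypothetical; source values come from the table above.
src <- data.frame(
  source = c("sheep", "cattle", "pig"),
  d13c = c(-21.2, -20.8, -21.5),
  d15n = c(6.0, 7.0, 8.5)
)
consumer <- c(d13c = -20.2, d15n = 10.8)
offset <- c(d13c = 1.0, d15n = 4.0)
tol <- 0.3

# All sheep/cattle/pig combinations in 10% steps that sum to 1
grid <- expand.grid(sheep = seq(0, 1, by = 0.1), cattle = seq(0, 1, by = 0.1))
grid <- grid[grid$sheep + grid$cattle <= 1 + 1e-9, ]
grid$pig <- pmax(0, 1 - grid$sheep - grid$cattle)

# Predicted consumer values for every combination
props <- as.matrix(grid[, c("sheep", "cattle", "pig")])
pred_d13c <- as.vector(props %*% src$d13c) + offset[["d13c"]]
pred_d15n <- as.vector(props %*% src$d15n) + offset[["d15n"]]

# Keep combinations that match the consumer within tolerance on both isotopes
feasible <- grid[abs(pred_d13c - consumer[["d13c"]]) <= tol &
                 abs(pred_d15n - consumer[["d15n"]]) <= tol, ]
print(feasible)
```

Several quite different sheep/cattle/pig combinations pass the tolerance check, which is the deterministic counterpart of a wide posterior.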
Source reference specimens:
| Source type | Minimum n | Target n |
|---|---|---|
| Domesticated animals (low variability) | 5 | 10–15 |
| Wild animals (high variability) | 10 | 15–20 |
| Fish (very high variability) | 15 | 20–30 |
| Plants (moderate variability) | 10 | 15–20 |
Consumer groups for comparison:
| Model | Routing | Framework | Key use case |
|---|---|---|---|
| ReSources | ✓ Prot/energy | Bayesian | Archaeological bone collagen ✓ |
| MixSIAR | ✗ Bulk diet | Bayesian | Ecology; not ideal for collagen |
| SIAR | ✗ Bulk diet | Bayesian | Legacy; superseded by MixSIAR |
| IsoSource | ✗ Bulk diet | Linear (deterministic) | Teaching; fast exploration |
Use MixSIAR (not ReSources) when analysing muscle tissue, blood, or other tissues where routing is less pronounced.