Cohort-Component Model — Full Pipeline

Atlanta MSA Population Forecast 2025–2035 · Every step with formulas

Section 1 — Data loading and filtering
Load Excel workbook
Final_Data_for_Modeling.xlsx · 3 sheets
pop_estimate_components
Annual: YEAR, POP_ESTIMATE, BIRTHS, DEATHS,
INTERNATIONAL_MIG, DOMESTIC_MIG, NET_MIG, RESIDUAL
Coverage: 2000–2025
pop_by_agesex
Age × sex panel: YEAR (code 1–6), AGE (0–85),
TOT_POP, TOT_MALE, TOT_FEMALE
Coverage: 2019–2024 (codes → calendar years)
mortality_rate_2020_census
National 2020 Census: age, total resident population,
births, deaths by single year of age
Used for Gompertz mortality estimation
Filter and align
CBSA = 12060 (Atlanta MSA only) · YEAR codes mapped: code + 2018 → calendar year
Panel indexed by (YEAR, AGE): 86 ages (0–84 plus open-ended 85+) × 5 years (2020–2024) = 430 rows
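The filter-and-align step can be sketched in a few lines. This is a minimal illustration on made-up rows, not the pipeline's actual loader: the row layout, the `build_panel` helper, and the choice to keep 2020–2024 only are assumptions (the last inferred from the 5-year panel size).

```python
# Hypothetical rows standing in for the pop_by_agesex sheet (all values made up).
rows = [
    {"CBSA": 12060, "YEAR": 1, "AGE": 0, "TOT_POP": 70000},   # code 1 -> 2019, outside panel
    {"CBSA": 12060, "YEAR": 2, "AGE": 0, "TOT_POP": 69000},   # code 2 -> 2020, kept
    {"CBSA": 99999, "YEAR": 2, "AGE": 0, "TOT_POP": 110000},  # other area, dropped
    {"CBSA": 12060, "YEAR": 6, "AGE": 85, "TOT_POP": 80000},  # code 6 -> 2024, kept
]

ATLANTA_CBSA = 12060
YEAR_OFFSET = 2018  # calendar year = code + 2018


def build_panel(rows, first_year=2020, last_year=2024):
    """Return {(calendar_year, age): TOT_POP} for the Atlanta MSA only."""
    panel = {}
    for row in rows:
        if row["CBSA"] != ATLANTA_CBSA:
            continue
        year = row["YEAR"] + YEAR_OFFSET
        if first_year <= year <= last_year:
            panel[(year, row["AGE"])] = row["TOT_POP"]
    return panel


panel = build_panel(rows)
expected_rows = 86 * 5  # 86 ages x 5 years in the full panel
```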
Section 2 — Three parallel estimation pipelines (run independently, merged before projection)
Define age-group fertility weights w_a
Ages 15–19: w = 0.025 (low, teen fertility)
Ages 20–24: w = 0.200
Ages 25–29: w = 0.300 (peak fertility)
Ages 30–34: w = 0.250
Ages 35–39: w = 0.150
Ages 40–44: w = 0.050
Ages 45–49: w = 0.025 (decline)
Normalise to probabilities p_a
p_a = w_a / Σ_a w_a
Property: Σ_a p_a = 1 over the seven groups (the weights listed already sum to 1, so p_a = w_a)

Expand 5-yr group probabilities to single-year ages within [15, 49]
(each age in a group inherits the group's p_a)
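A minimal sketch of the normalise-and-expand step, using the seven group weights listed above. The names `GROUP_WEIGHTS`, `group_prob` and `fertility_prop_by_age` are illustrative, not taken from the original pipeline.

```python
# Five-year age-group fertility weights w_a, as listed in the text.
GROUP_WEIGHTS = {
    (15, 19): 0.025,
    (20, 24): 0.200,
    (25, 29): 0.300,
    (30, 34): 0.250,
    (35, 39): 0.150,
    (40, 44): 0.050,
    (45, 49): 0.025,
}

# Normalise: p_a = w_a / sum(w_a).
total_weight = sum(GROUP_WEIGHTS.values())
group_prob = {group: w / total_weight for group, w in GROUP_WEIGHTS.items()}

# Expand to single-year ages: every age inherits its group's p_a unchanged.
fertility_prop_by_age = {}
for (lo, hi), p in group_prob.items():
    for age in range(lo, hi + 1):
        fertility_prop_by_age[age] = p
```

Because each of the five ages in a group inherits the full group value, the single-year proportions sum to about 5 rather than 1. Under this reading the constant factor is absorbed by the scaling factor k when it is calibrated against observed births.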
Merge with female population F_{a,t}
Use F_{a,2020} from pop_by_agesex (YEAR=2020)
Restrict to ages a ∈ [15, 49]

Weighted fertile population:
W = Σ_a (F_a · p_a)

Higher-fertility ages (25–34) contribute more to W
Calibrate scaling factor k
k = B_obs / W

B_obs = observed births from pop_estimate_components

Metro estimate: k ≈ 0.325
National proxy check: k ≈ 0.334
Close agreement supports treating metro fertility as approximately equal to national fertility
Birth formula (used each projection year)
B_t = k · Σ_{a∈[15,49]} (F_{a,t−1} · p_a)

B_t assigned to P_{0,t} (age 0 entry)
F_{a,t−1} is the previous year's female population: births lag the female population by 1 yr
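The calibration of k and the birth formula can be sketched together. The female counts and birth total below are made up, chosen only so that the toy k lands on the magnitude quoted in the text; the function names are hypothetical.

```python
# Single-year fertility proportions: each age inherits its 5-year group's p_a.
GROUP_PROB = {(15, 19): 0.025, (20, 24): 0.200, (25, 29): 0.300, (30, 34): 0.250,
              (35, 39): 0.150, (40, 44): 0.050, (45, 49): 0.025}
fertility_prop = {age: p for (lo, hi), p in GROUP_PROB.items()
                  for age in range(lo, hi + 1)}


def weighted_fertile_population(females_by_age, fertility_prop):
    """W = sum over ages 15-49 of F_a * p_a."""
    return sum(females_by_age[a] * p for a, p in fertility_prop.items())


def calibrate_k(observed_births, females_by_age, fertility_prop):
    """k = B_obs / W."""
    return observed_births / weighted_fertile_population(females_by_age, fertility_prop)


def project_births(k, females_prev_year, fertility_prop):
    """B_t = k * sum_a F_{a,t-1} * p_a, using the previous year's females."""
    return k * weighted_fertile_population(females_prev_year, fertility_prop)


# Made-up inputs (not Atlanta data): 40,000 women at every age 15-49.
females_2020 = {age: 40_000.0 for age in range(15, 50)}
observed_births = 65_000.0

k = calibrate_k(observed_births, females_2020, fertility_prop)
births_next = project_births(k, females_2020, fertility_prop)
```

With an unchanged female population, the projected births reproduce the observed births used for calibration, which is a quick sanity check on the two functions.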
Compute observed death rates d_a
d_a = Deaths_a / Pop_a

Source: 2020 Census national mortality table
Covers ages 0–75 from observed data
Metro mortality ≈ national mortality (reasonable for large MSA)
Log-transform (Gompertz linearisation)
Empirical mortality rises exponentially with age:
m_x ≈ A·e^(βx) (Gompertz law)

Taking logs linearises (α = log A; d_a is the observed rate at age a):
log(d_a) = α + β·a

Allows OLS regression on log scale
OLS fit on ages 50–75
Fit: log(m_x) = α̂ + β̂·x
Fitted parameters: β̂ = 0.0621, α̂ = −5.584

Interpretation: mortality doubles every ln(2)/β̂ ≈ 11 years of age

Smoothed rate: m̂_x = e^(α̂ + β̂·x)
Extrapolate ages 76–84 and 85+
Ages 76–84: predict via fitted Gompertz model
m̂_x = e^(α̂ + β̂·x) for x = 76, 77, …, 84

Age 85+ (open-ended group):
Use representative age = 88 in the model
m̂_{85+} = e^(α̂ + β̂·88)
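The log-linear fit and the extrapolation can be sketched as follows. Since the national mortality table is not reproduced here, the example synthesises noise-free log rates from the fitted parameters quoted above and checks that ordinary least squares recovers them. The `ols_line` helper is an assumption for illustration, not the original fitting code.

```python
import math

# Parameters quoted in the text, reused only to synthesise illustrative data.
ALPHA_DOC, BETA_DOC = -5.584, 0.0621

ages = list(range(50, 76))                           # fitting window: ages 50-75
log_rates = [ALPHA_DOC + BETA_DOC * x for x in ages] # synthetic, noise-free log(d_a)


def ols_line(xs, ys):
    """Simple linear regression y = intercept + slope * x by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


alpha_hat, beta_hat = ols_line(ages, log_rates)


def gompertz_rate(x):
    """Smoothed rate exp(alpha_hat + beta_hat * x)."""
    return math.exp(alpha_hat + beta_hat * x)


extrapolated = {x: gompertz_rate(x) for x in range(76, 85)}  # ages 76-84
rate_85_plus = gompertz_rate(88)                             # representative age 88
doubling_years = math.log(2) / beta_hat                      # about 11 years
```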
Incremental cohort mortality m(x)
m(0) = m̂_0
m(x) = m̂_x − m̂_{x−1} for x ≥ 1

Captures the marginal increase in mortality risk from one age to the next
Used in cohort transition: D_{x,t} = m(x) · P_{x,t}
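A small sketch of the incremental-mortality definition and the death calculation. The smoothed rates and populations are toy values and `incremental_mortality` is a hypothetical helper name.

```python
def incremental_mortality(smoothed):
    """m(0) = smoothed[0]; m(x) = smoothed[x] - smoothed[x-1] for x >= 1."""
    return [smoothed[0]] + [smoothed[x] - smoothed[x - 1]
                            for x in range(1, len(smoothed))]


# Toy smoothed rates for four consecutive ages (made up).
smoothed = [0.010, 0.012, 0.015, 0.019]
population = [1000.0, 1000.0, 2000.0, 500.0]

m = incremental_mortality(smoothed)

# Deaths per age: D_x = m(x) * P_x.
deaths = [mx * px for mx, px in zip(m, population)]
```

Two properties follow directly from the definition: the increments telescope, so their sum equals the last smoothed rate, and any age at which the smoothed series falls produces a negative m(x). The source does not say how that case is handled.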
Validation passed
Σ_x m(x)·P_{x,t} ≈ observed total deaths
Years 2020–2024 all within acceptable range
Confirms Gompertz extrapolation is valid for this MSA
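The validation check can be sketched as a relative-error comparison. The text does not state what counts as an acceptable range, so the 5% tolerance below is purely a placeholder assumption, and all numbers are made up.

```python
TOLERANCE = 0.05  # hypothetical threshold; the source does not give one


def implied_total_deaths(m, population):
    """Sum over ages of m(x) * P_x for one year."""
    return sum(mx * px for mx, px in zip(m, population))


def relative_error(implied, observed):
    """Signed relative gap between implied and observed deaths."""
    return (implied - observed) / observed


# Toy inputs for a single year (made up).
m = [0.010, 0.002, 0.003]
population = [1000.0, 2000.0, 3000.0]
observed_deaths = 24.0

implied = implied_total_deaths(m, population)
error = relative_error(implied, observed_deaths)
passes = abs(error) <= TOLERANCE
```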
Compute annual migration differential
DIFF_t = NET_MIG_t + RESIDUAL_t

NET_MIG = domestic + international migration
RESIDUAL = unexplained pop change component

Historical range: ~22K (2011) to ~120K (2006)
Covers 2000–2024 (25 years)
Age-specific population weights w_{x,t}
Total population per year:
N_t = Σ_x P_{x,t}

Age weight:
w_{x,t} = P_{x,t} / N_t

Age-weighted migration:
M_{x,t} = w_{x,t} · DIFF_t

Distributes total migration proportionally by age share
Per-age mean and std (2000–2024)
M̄_x = E[M_{x,t}] across all historical years
σ_x = std[M_{x,t}] across all historical years

M̄_x used as baseline in projection
σ_x used to construct uncertainty scenarios

Future migration assumed to revert to historical mean
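The migration pipeline (differential, age weighting, per-age mean and standard deviation) can be sketched on a toy panel with two ages and three years. All values are made up. Using the sample standard deviation (`statistics.stdev`) is an assumption, since the text only says "std".

```python
from statistics import mean, stdev

# Made-up annual components and age-specific populations (two ages only).
components = {
    2022: {"NET_MIG": 36.0, "RESIDUAL": 4.0},
    2023: {"NET_MIG": 70.0, "RESIDUAL": 10.0},
    2024: {"NET_MIG": 66.0, "RESIDUAL": -6.0},
}
population = {
    2022: [100.0, 300.0],
    2023: [200.0, 200.0],
    2024: [150.0, 250.0],
}


def age_weighted_migration(pop_by_age, diff):
    """M_x = (P_x / N) * DIFF, where N is the year's total population."""
    total = sum(pop_by_age)
    return [diff * p / total for p in pop_by_age]


allocated = {}
for year, comp in components.items():
    diff = comp["NET_MIG"] + comp["RESIDUAL"]  # DIFF_t
    allocated[year] = age_weighted_migration(population[year], diff)

n_ages = 2
years = sorted(allocated)
mean_mig = [mean([allocated[y][x] for y in years]) for x in range(n_ages)]
std_mig = [stdev([allocated[y][x] for y in years]) for x in range(n_ages)]
```

By construction the age-allocated values for a year add back up to that year's DIFF, so the weighting redistributes the total without changing it.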
Female ratio for sex split
From 2021–2024 data, ages 15–49:
Female ratio = Σ TOT_FEMALE / Σ TOT_POP
≈ 0.511 (stable across years)

Applied each projection year:
F_t = 0.511 · P_t
M_t = P_t − F_t (male population, distinct from the migration term M_{x,t})
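A brief sketch of the pooled female ratio and the sex split. The two input rows are made up so that the pooled ratio comes out at 0.511; the helper names are hypothetical.

```python
def pooled_female_ratio(rows):
    """Sum of TOT_FEMALE divided by sum of TOT_POP over all supplied rows."""
    total_female = sum(r["TOT_FEMALE"] for r in rows)
    total_pop = sum(r["TOT_POP"] for r in rows)
    return total_female / total_pop


def split_by_sex(total_pop, female_ratio):
    """Return (female, male), with males taken as the remainder."""
    female = female_ratio * total_pop
    return female, total_pop - female


# Made-up rows within ages 15-49 and years 2021-2024.
rows = [
    {"YEAR": 2021, "AGE": 15, "TOT_POP": 1000.0, "TOT_FEMALE": 510.0},
    {"YEAR": 2022, "AGE": 30, "TOT_POP": 1000.0, "TOT_FEMALE": 512.0},
]

ratio = pooled_female_ratio(rows)
female, male = split_by_sex(6_490_000.0, ratio)
```

Taking males as the remainder rather than multiplying by (1 − ratio) keeps the two parts summing exactly to the total.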
Age-85+ geometric growth rate g_85
Open-ended group cannot be modelled by cohort shift

Geometric trend from observed data:
g_85 = (P_{85,2024} / P_{85,2020})^(1/4) − 1

Applied each year:
P_{85,t} = P_{85,t−1} · (1 + g_85)
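The geometric growth rate is a compound annual rate over the four intervals between 2020 and 2024. A minimal sketch with made-up counts, chosen so the rate is exactly 5%:

```python
def geometric_growth_rate(p_start, p_end, n_years):
    """Compound annual growth rate: (p_end / p_start) ** (1 / n_years) - 1."""
    return (p_end / p_start) ** (1.0 / n_years) - 1.0


# Made-up 85+ counts: 100,000 in 2020 and 121,550.625 in 2024 (1.05 ** 4 growth).
p85_2020 = 100_000.0
p85_2024 = 121_550.625

g85 = geometric_growth_rate(p85_2020, p85_2024, n_years=4)

# Apply the rate forward one year at a time, as in the projection loop.
p85 = p85_2024
trajectory = {}
for year in range(2025, 2036):
    p85 = p85 * (1.0 + g85)
    trajectory[year] = p85
```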
Merge all three pipelines into projection panel
Panel: (YEAR, AGE) → TOT_POP, TOT_FEMALE, mortality_rate, fertility_prop, avg_ageweighted_diff, std_ageweighted_diff
Historical years 2020–2024 populated · Future years 2025–2035 initialised as NaN, filled by loop
Section 3 — Annual projection loop
for t = 2025, 2026, … 2035 (iterate over all ages x = 0, 1, … 85)
Uses output of year t−1 as input · 5 steps per year
Step 1 — Births (age-0 cohort entry)
B_t = k · Σ_{a∈[15,49]} (F_{a,t−1} · p_a)
Set P_{0,t} = B_t; the female ratio then determines F_{0,t} and M_{0,t}
Step 2 — Cohort aging (ages x = 1 to 84)
P_{x,t} = P_{x−1,t−1} − m(x−1)·P_{x−1,t−1} + M̄_{x−1}

Each cohort shifts forward one age · Deaths subtracted using incremental mortality m(x) · Mean migration added
Step 3 — Age 85+ (open-ended group)
P_{85,t} = P_{85,t−1} · (1 + g_85)
Geometric extrapolation — open interval cannot receive a cohort shift from age 84
Step 4 — Sex split
F_{x,t} = 0.511 · P_{x,t}    M_{x,t} = P_{x,t} − F_{x,t} (here M denotes males, not migration)
Applied uniformly across all ages · Ratio validated as stable (2021–2024)
Step 5 — Record deaths for this year
D_{x,t} = m(x) · P_{x,t}
Stored for output and validation · Not fed back into next iteration (deaths already subtracted in Step 2)
↺ t = t + 1 → return to loop start if t ≤ 2035
Output of year t becomes the lagged input (the P_{x,t−1} terms) for year t+1
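The five steps of the loop can be sketched as one function applied repeatedly. This is an illustration under stated assumptions, not the original implementation: it runs on a toy population with four ages (the last index standing in for the open-ended 85+ group), toy "fertile" ages 1 and 2, and made-up rates. The function name `project_one_year` and all parameter values are hypothetical.

```python
def project_one_year(prev_pop, prev_female, m, mean_mig, fertility_prop,
                     k, g_open, female_ratio):
    """Advance the population one year following Steps 1-5 in the text."""
    last = len(prev_pop) - 1  # index of the open-ended group (85 in the text)
    pop = [0.0] * (last + 1)

    # Step 1: births from the previous year's females enter at age 0.
    pop[0] = k * sum(prev_female[a] * p for a, p in fertility_prop.items())

    # Step 2: cohort aging, P_x = P_{x-1} - m(x-1) * P_{x-1} + mean migration.
    for x in range(1, last):
        pop[x] = prev_pop[x - 1] - m[x - 1] * prev_pop[x - 1] + mean_mig[x - 1]

    # Step 3: open-ended group grows geometrically instead of being shifted.
    pop[last] = prev_pop[last] * (1.0 + g_open)

    # Step 4: sex split with a single ratio applied at every age.
    female = [female_ratio * p for p in pop]

    # Step 5: record deaths; they are not fed back into the next iteration.
    deaths = [m[x] * pop[x] for x in range(last + 1)]
    return pop, female, deaths


# Toy inputs (made up, four ages only).
pop = [1000.0, 1000.0, 1000.0, 500.0]
female = [500.0, 500.0, 500.0, 250.0]
m = [0.01, 0.002, 0.003, 0.1]
mean_mig = [10.0, 20.0, 30.0, 0.0]
fertility_prop = {1: 0.5, 2: 0.5}

totals = {}
for year in range(2025, 2036):
    pop, female, deaths = project_one_year(
        pop, female, m, mean_mig, fertility_prop,
        k=0.2, g_open=0.05, female_ratio=0.5)
    totals[year] = sum(pop)
```

Reassigning `pop` and `female` at the end of each iteration is what makes year t's output the lagged input for year t+1.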
Section 4 — Uncertainty scenarios (migration is the dominant source of variance)
Why migration drives uncertainty
Fertility and mortality patterns are relatively stable year-to-year
Migration fluctuated from ~22K (2011 recession trough) to ~120K (2006 boom) — a 5× swing
σ_x from the historical distribution is used to construct plausible future bounds
Very low: M̄_x − 2σ_x · Severe sustained out-migration scenario · 2035 population: ~6.69M
Low: M̄_x − σ_x · Below-average migration · 2035 population: ~6.88M
Base: M̄_x · Historical average migration · 2035 population: ~7.07M
High: M̄_x + σ_x · Above-average migration · 2035 population: ~7.26M
Very high: M̄_x + 2σ_x · Sustained high in-migration · 2035 population: ~7.45M
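The five scenarios differ only in the per-age migration input, which shifts the mean by a whole number of standard deviations. The sketch below builds those inputs from made-up per-age means and standard deviations, then checks the spacing of the 2035 totals reported above. The names `SCENARIO_Z` and `scenario_migration` are hypothetical.

```python
# Number of standard deviations added to the mean in each scenario.
SCENARIO_Z = {"very_low": -2, "low": -1, "base": 0, "high": 1, "very_high": 2}


def scenario_migration(mean_mig, std_mig, z):
    """Per-age migration input: mean + z * std."""
    return [mu + z * sd for mu, sd in zip(mean_mig, std_mig)]


# Made-up per-age mean and std for three ages.
mean_mig = [500.0, 800.0, 300.0]
std_mig = [200.0, 350.0, 100.0]

inputs = {name: scenario_migration(mean_mig, std_mig, z)
          for name, z in SCENARIO_Z.items()}

# 2035 totals reported in the text (millions), in scenario order.
reported_2035 = {"very_low": 6.69, "low": 6.88, "base": 7.07,
                 "high": 7.26, "very_high": 7.45}
ordered = [reported_2035[name] for name in SCENARIO_Z]
steps = [round(b - a, 2) for a, b in zip(ordered, ordered[1:])]
```

The reported totals are evenly spaced at 0.19M per step, which is consistent with each one-σ shift in migration adding roughly the same amount to the 2035 population.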
Section 5 — Outputs
Population totals 2025–2035
Annual Σ_x P_{x,t} for base and all scenarios
Base: 6.49M (2025) → 7.07M (2035)
Decade increase ≈ 580,000 (base scenario)
Saved to population_forecast.xlsx
Forecast fan chart
Historical Census (2000–2024) + base forecast
±1σ shaded band (dark)
±2σ shaded band (light)
Bands widen over time — compounding migration variance
Log mortality chart
Observed log(d_a) scatter vs age
Fitted Gompertz line: y = 0.0621x − 5.584
Extrapolated segment beyond age 75 shown
Validates β̂ = 0.0621 (doubles every ≈11 yrs)
Key assumptions summary
· Fertility: metro ≈ national pattern (validated: k_metro ≈ k_national)
· Mortality: metro ≈ national; Gompertz extrapolation for ages 76+ (validated vs. observed deaths)
· Migration: reverts to 2000–2024 historical mean; uncertainty from σ across that period
· Female ratio: stable at 0.511 through projection horizon
· Age 85+: geometric trend g_85 from 2020–2024; no cohort-specific mortality modelled
10-year forecast complete — 2025–2035