2021 Census Population Estimates

This post updates analysis performed last year when the Census released 2019-2020 population data. Now we have the COVID story told by the data collected during 2020-2021.

What Happened During COVID?

Let us begin with the COVID effect. To understand population flows during COVID, dear reader, we regressed the difference in population growth in 2019-2020 compared to 2020-2021, controlling for county population size and state. Below we’ve plotted the states which had statistically significant coefficients, one way or the other.

Interestingly, New York was the only state with a statistically significant negative coefficient. Very rough, but also indicative of the value of performing this at the county level. A state like California had a coefficient of 0.2%, reflecting in part chance (:) not the strongest signal) but also intra-state COVID flight patterns. Delaware was the other surprise to me, a state many forget exists. Otherwise the results are intuitive.

Updates at the County and Divisional Levels from Last Year

Many of the same stories are in tact. This year, TX continued to reign supreme with recent population growth and the Mountain Division outperformed in terms of median population county growth. St. Louis looks like it is hurting badly on the county plot, and the NYC story, we know from the COVID notes above, will likely reverse. San Francisco, well, that remains to be seen.

Worth noting, though, was the relative position changes for the bottom of the divisional splits. At the very bottom of the heap, the East North Central (Big 10 country; IL, IN, MI, OH, WI) switched places with the Mid-Atlantic (NY, NJ, PA) to drop into last place. The Central divisions saw shuffling too. Last year, the West South Central (AR, LA, OK, TX) was the fastest grower, followed by West North Central (IA, KS, MN, MO, ND, NE, SD), with the East South Central (AL, KY, MI, TN) bringing up the rear. This year, though, it was West South (thanks TX), East South (TN doing work), and then the West North (… sad).

Thus concludes our update. As ever, you can find all the data (and more), at this Census link.

library(data.table)
library(tidyverse)
library(ggrepel)
library(extrafont)
library(scales)
library(ggpmisc)
library(noncensus)
library(rcartocolor)

# set theme

theme_set(
  theme_minimal(base_family = "Gill Sans MT") +
    theme(axis.line = element_line(color = "black"),
          axis.text = element_text(color = "black"),
          axis.title = element_text(color = "black"),
          panel.grid.minor = element_blank(),
          plot.background = element_rect(fill = "white", color = "white"),
          plot.caption = element_text(hjust = 0)
    )
)

# here's the link to the census data and we import the states data from noncensus

states <- tibble(
  name = as.character(state.name),
  division = as.character(state.division),
  region = as.character(state.region)
)

import_census_old <- fread("https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/counties/totals/co-est2020-alldata.csv")
import_census <- fread("https://www2.census.gov/programs-surveys/popest/datasets/2020-2021/counties/totals/co-est2021-alldata.csv")

# if the column appears in both, delete it from old unless it is in ID cols

old_cols <- colnames(import_census_old)[colnames(import_census_old) %in% colnames(import_census) & !colnames(import_census_old) %in% c("COUNTY", "STNAME", "CTYNAME", "STATE")]
import_census_old[, c(old_cols) := NULL]
import_census1 <- merge(import_census_old, import_census, by = c("COUNTY", "STNAME", "CTYNAME", "STATE"), all.x = T)

# clean to get one year and three year changes

cty_chg <- import_census1 %>% 
  select(
    county = CTYNAME, 
    state = STNAME, 
    `2015` = POPESTIMATE2015,
    `2016` = POPESTIMATE2016,
    `2017` = POPESTIMATE2017,
    `2018` = POPESTIMATE2018,
    `2019` = POPESTIMATE2019, 
    `2020` = POPESTIMATE2020,
    `2021` = POPESTIMATE2021
  ) %>% 
  pivot_longer(
    -c(county, state), 
    names_to = "year", 
    values_to = "pop_est"
  ) %>% 
  filter(county != state) %>% 
  arrange(year) %>% 
  group_by(county, state) %>% 
  summarize(
    pct_one = pop_est[year == "2021"] / pop_est[year == "2020"] - 1,
    pct_three = (pop_est[year == "2021"] / pop_est[year == "2018"]) ^ (1/3) - 1,
    pct_five = (pop_est[year == "2021"] / pop_est[year == "2016"]) ^ (1/5) - 1,
    pct20192020 = (pop_est[year == "2020"] / pop_est[year == "2019"]) ^ (1/5) - 1,
    pop2021 = pop_est[year == "2021"],
    .groups = "drop"
  )

# get summary stats & identify outliers

summary_stat <- cty_chg %>% 
  filter(pop2021 > 100000) %>% 
  summarize(across(c(pct_one, pct_five), median))

# create plot data

cty_chg_pdata <- cty_chg %>% 
  left_join(states, by = c("state" = "name")) %>% 
  mutate(county = str_to_title(county),
         cty_state = sprintf("%s (%s)", county, state.abb[match(state, state.name)])) %>% 
  filter(pop2021 > 100000)

# who chged the most mod


rd <- cty_chg_pdata %>% 
  mutate(up_down = pct_one - pct20192020)

mod <- lm(up_down ~ log(pop2021) + state, data = rd)

broom::tidy(mod) %>% 
  filter(p.value <= 0.05) %>% 
  filter(!term %in% c("(Intercept)", "log(pop2021)")) %>% 
  ggplot(aes(estimate, fct_reorder(term, estimate))) +
  geom_vline(xintercept = 0) +
  geom_col(fill = "#9aacbc") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) + 
  scale_y_discrete(labels = \(x) gsub("state", "", x, fixed = T)) +
  labs(
    x = "Estimated Swing in 1y Population Growth ('19-'20 vs. '20-'21)",
    y = "",
    title = "Which States Had Strong Signals for COVID Migration?",
    subtitle = "Significant State Coefficients, Controlling for Pop. Size, Comparing Change in Growth Rate '19-'20 vs. '20-'21",
    caption = "verbumdata.netlify.com\nSource: U.S. Census Bureau") 
  )

-0.004705*log(2)

-0.0051905*log(2)

# a look at recent growth vs 5y cagr

ggplot(cty_chg_pdata, aes(pct_one, pct_five, label = cty_state, color = region)) +
  geom_vline(xintercept = summary_stat$pct_one, lty = 2, color = "grey") +
  geom_hline(yintercept = summary_stat$pct_five, lty = 2, color = "grey") +
  geom_point(size = 3, alpha = 0.5) +
  stat_dens2d_filter(
    geom = "text_repel", 
    keep.fraction = 0.03, 
    size = 3, 
    force = 1,
    force_pull = 1,
    nudge_y = -2,
    family = "Gill Sans MT", 
    max.overlaps = 10,
    segment.alpha = 0.5,
    fontface = "bold") +
  scale_color_carto_d(palette = "Safe", direction = -1) +
  scale_x_continuous(labels = percent_format(accuracy = 1), breaks = pretty_breaks(10)) +
  scale_y_continuous(labels = percent_format(accuracy = 1), breaks = pretty_breaks(10)) +
  labs(x = "% Change Last Year",
       y = "5-Year CAGR %",
       title = 'U.S. County Population Growth: Most Recent vs. 5-Year CAGR',
       subtitle = 'Counties with > 100,000 People | - - - Dotted Lines Show Median Readings',
       color = "",
       caption = "verbumdata.netlify.com\nSource: U.S. Census Bureau") +
  theme(legend.position = "top")

# now for a finer parsing at the division breakout

ggplot(cty_chg_pdata, aes(pct_five, fct_reorder(division, pct_five, .fun = median), color = division)) +
  geom_vline(xintercept = 0, lty = 2) +
  geom_boxplot(size = 1, show.legend = FALSE) +
  scale_color_carto_d(palette = "Safe") +
  scale_x_continuous(labels = percent_format(accuracy = 1), breaks = pretty_breaks(10)) +
  labs(x = "Average Population Growth Rate 2016-2021",
       y = "",
       title = "Census Division Population Growth Distributions - 5 Year Avg Growth Rates",
       subtitle = "Counties with > 100,000 People | - - - Dotted Line at 0%",
       caption = "verbumdata.netlify.com\nSource: U.S. Census Bureau") +
  theme(plot.title.position = "plot",
        plot.caption.position = "plot")