2020 Census Population Estimates

First post in some time. The inspiration for today’s drivel is the release of the provisional 2020 Census county-level population estimates. If you are interested in playing with the data, you can find it all (and much more) here.

My goal here is to provide a brief review of population dynamics at the county level. Code at the bottom for the curious.

Top Takeaway

Texas continues to reign supreme, with counties across the state sporting top-decile population growth over both the past year and the past five years. This is all the more impressive when you consider the wild volatility of the natural resource complex over that time.

Also worth noting is the broad strength of the South–of which more below.

Finally, you’ll no doubt notice San Francisco County in the least desirable quadrant (lower left, i.e., below-median population growth in both time frames).

You might skeptically chortle, “Sure, people have left SF for neighboring counties!” But it is not so.

In fact, the neighboring counties of San Mateo, Santa Clara, Santa Cruz, and Marin are all in the top 10 for lowest growth rate (decline, actually) over both the last year and the past five years.

Where is California growing? The top county in both time periods is Placer, which borders Nevada on the east and is close to the growth engine that is Reno/Carson City. San Joaquin–the only “Bay Area Adjacent” major grower–and Riverside round out the top 3.
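
For anyone who wants the "quadrant" and "top decile" language spelled out, here is a minimal sketch in base R. The five counties and their growth rates are made up for illustration (they are not Census figures), and the column names simply mirror the ones used in the code at the bottom. Treat it as one way to do the bookkeeping, not the exact method behind the plot.

```r
# Toy data: hypothetical growth rates, NOT Census figures
toy <- data.frame(
  county   = c("A", "B", "C", "D", "E"),
  pct_one  = c(0.030, 0.012, -0.010, 0.004, -0.002),
  pct_five = c(0.025, 0.002, -0.005, 0.015, 0.006)
)

# Medians play the role of the dashed crosshairs on the scatter plot
med_one  <- median(toy$pct_one)
med_five <- median(toy$pct_five)

# Label each county by which side of each median it falls on
# (x axis = one-year change, y axis = five-year CAGR)
toy$quadrant <- ifelse(
  toy$pct_one >= med_one,
  ifelse(toy$pct_five >= med_five, "upper right", "lower right"),
  ifelse(toy$pct_five >= med_five, "upper left", "lower left")
)

# Flag counties at or above the 90th percentile on both measures
cut_one  <- unname(quantile(toy$pct_one, 0.9))
cut_five <- unname(quantile(toy$pct_five, 0.9))
toy$top_decile_both <- toy$pct_one >= cut_one & toy$pct_five >= cut_five

print(toy)
```

In this toy set, county C lands in the lower left (below median on both measures) and only county A clears the top-decile bar on both.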

What else? Divisional Differences Dominate

County-level data is interesting, no doubt, but given the color patterns on the plot above, a closer look at Regional/Divisional growth patterns made sense. To avoid the bias from COVID, I’m looking here at 5Y CAGRs (five-year compound annual growth rates).
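
If the CAGR shorthand is unfamiliar, here is a quick sketch of the arithmetic. The helper name `cagr` and the population numbers are hypothetical and used only for illustration; the formula matches the one used for `pct_five` in the code at the bottom.

```r
# Compound annual growth rate between a start and an end value
# (hypothetical helper, not part of any package)
cagr <- function(start, end, years) {
  (end / start)^(1 / years) - 1
}

# Hypothetical county: 100,000 people in 2015, 110,000 in 2020
five_year <- cagr(100000, 110000, years = 5)
print(round(five_year, 4))
```

A county that grows 10% in total over five years works out to roughly 1.9% per year, a bit less than 10% / 5 because of compounding.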

Here the trends emerge more starkly. The Southern and Mountain divisions are posting impressive numbers! Meanwhile, the Northeast (New England and the Mid-Atlantic) is sucking wind.

That’s all for today, folks. A brief review of the newest Census population data, which, mind you, will likely be revised substantially in time, shows clear dominance of the West (ex-California) and the South in terms of growth.

Post Script

For reference here are the official Census Regions/Divisions:

Code for the interested

# load up libraries

library(data.table)
library(tidyverse)
library(ggrepel)
library(extrafont)
library(scales)
library(ggpmisc)
library(noncensus)
library(rcartocolor)

# set theme

theme_set(
  theme_minimal(base_family = "Gill Sans MT") +
    theme(axis.line = element_line(color = "black"),
          axis.text = element_text(color = "black"),
          axis.title = element_text(color = "black"),
          panel.grid.minor = element_blank(),
          plot.caption = element_text(hjust = 0)
    )
)

# here's the link to the census data and we import the states data from noncensus

data(states)
import_census <- fread("https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/counties/totals/co-est2020-alldata.csv")

# clean to get one-year change plus three-year and five-year CAGRs

cty_chg <- import_census %>% 
  select(
    county = CTYNAME, 
    state = STNAME, 
    `2015` = POPESTIMATE2015,
    `2017` = POPESTIMATE2017,
    `2019` = POPESTIMATE2019, 
    `2020` = POPESTIMATE2020
    ) %>% 
  pivot_longer(
    -c(county, state), 
    names_to = "year", 
    values_to = "pop_est"
    ) %>% 
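  # drop state-level rows, where CTYNAME equals STNAME
  # (note: this also drops District of Columbia, whose county and state names match)
  filter(county != state) %>% 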
  arrange(year) %>% 
  group_by(county, state) %>% 
  summarize(
    pct_one = pop_est[year == "2020"] / pop_est[year == "2019"] - 1,
    pct_three = (pop_est[year == "2020"] / pop_est[year == "2017"]) ^ (1/3) - 1,
    pct_five = (pop_est[year == "2020"] / pop_est[year == "2015"]) ^ (1/5) - 1,
    pop2020 = pop_est[year == "2020"],
    .groups = "drop"
    )

# get median growth rates for counties with > 100,000 people (used for the reference lines)

summary_stat <- cty_chg %>% 
  filter(pop2020 > 100000) %>% 
  summarize(across(c(pct_one, pct_five), median))

# create plot data

cty_chg_pdata <- cty_chg %>% 
  left_join(states, by = c("state" = "name")) %>% 
  mutate(county = str_to_title(county),
         cty_state = sprintf("%s (%s)", county, state.abb[match(state, state.name)])) %>% 
  filter(pop2020 > 100000)

# a look at recent growth vs 5y cagr

ggplot(cty_chg_pdata, aes(pct_one, pct_five, label = cty_state, color = region)) +
  geom_vline(xintercept = summary_stat$pct_one, lty = 2, color = "grey") +
  geom_hline(yintercept = summary_stat$pct_five, lty = 2, color = "grey") +
  geom_point(size = 3, alpha = 0.5) +
  stat_dens2d_filter(
    geom = "text_repel", 
    keep.fraction = 0.03, 
    size = 3, 
    force = 1,
    force_pull = 1,
    nudge_y = -2,
    family = "Gill Sans MT", 
    max.overlaps = 10,
    segment.alpha = 0.5,
    fontface = "bold") +
  scale_color_carto_d(palette = "Safe", direction = -1) +
  scale_x_continuous(labels = percent_format(accuracy = 1), breaks = pretty_breaks(10)) +
  scale_y_continuous(labels = percent_format(accuracy = 1), breaks = pretty_breaks(10)) +
  labs(x = "% Change Last Year",
       y = "5-Year CAGR %",
       title = 'U.S. County Population Growth: Most Recent vs. 5-Year CAGR',
       subtitle = 'Counties with > 100,000 People | - - - Dotted Lines Show Median Readings',
       color = "",
       caption = "verbumdata.netlify.com\nSource: U.S. Census Bureau") +
  theme(legend.position = "top")

# now for a finer parsing at the division breakout

ggplot(cty_chg_pdata, aes(pct_five, fct_reorder(division, pct_five, .fun = median), color = division)) +
  geom_vline(xintercept = 0, lty = 2) +
  geom_boxplot(size = 1, show.legend = FALSE) +
  scale_color_carto_d(palette = "Safe") +
  scale_x_continuous(labels = percent_format(accuracy = 1), breaks = pretty_breaks(10)) +
  labs(x = "Average Population Growth Rate 2015-2020",
       y = "",
       title = "Census Division Population Growth Distributions - 5 Year Avg Growth Rates",
       subtitle = "Counties with > 100,000 People | - - - Dotted Line at 0%",
       caption = "verbumdata.netlify.com\nSource: U.S. Census Bureau") +
  theme(plot.title.position = "plot",
        plot.caption.position = "plot")