Today I’d like to draw attention to a super cool dataset highlighted by the super cool data-focused newsletter, Data is Plural.
If this post increases awareness of Jeremy Singer-Vine’s fantastic Data is Plural newsletter by even one person, I consider that a victory. #Rstats enthusiasts might recognize the name from #tidytuesday. Last’s week’s Dairy data challenge was found via Data is Plural.
The highest honor I feel I can pay to Data is Plural is a little plot based on their data set. Let’s dive into the Foreign Gifts and Contracts Report from the Office of Federal Student Aid.
Here is the link to the data: Foreign Gifts and Contracts Report
As usual, we have to import libraries and set options.
library(tidyverse)
library(readxl)
library(janitor)
extrafont::loadfonts(device = "win")
theme_set(theme_minimal(base_family = "Gill Sans MT") +
theme(plot.subtitle = element_text(size = 9),
axis.text = element_text(color = "black"),
axis.title = element_text(color = "black"),
panel.grid.minor = element_blank()))
For those unaware, until readxl
can process links directly, here is a handy way to read an .xls
or .xlsx
file into R
without saving it to a permanent directory.
# create temp file
temp <- tempfile()
# direct saving to tempfile
download.file("https://studentaid.ed.gov/sa/sites/default/files/fsawg/datacenter/library/ForeignGifts.xls",
mode = "wb",
destfile = temp)
# read in and clean names
fg_import <- read_xls(temp,
skip = 1) %>%
clean_names()
unlink(temp)
One could ask many questions of this data. My first, and simplest question, was to understand which universities received the most $ from non-United States citizens. This data includes both gifts and contracts so it makes sense to split out each category.
# find well heeled institutions (from foreign sources)
by_uni <- fg_import %>%
group_by(institution_name) %>%
summarize(total_received = sum(foreign_gift_amount),
contract_amt = sum(foreign_gift_amount[gift_type == "Contract"]),
gift_amt = sum(foreign_gift_amount[gift_type != "Contract"]),
n = n()) %>%
gather(gift_type, amt, contract_amt:gift_amt) %>%
mutate(gift_type = ifelse(gift_type == "gift_amt", "Gift", "Contract")) %>%
arrange(-total_received)
It isn’t the cleanest data frame but it will get the job done. Let’s plot the top 20.
ggplot(by_uni[1:40,], aes(fct_reorder(institution_name, total_received),
amt,
fill = gift_type)) +
geom_col() +
coord_flip() +
scale_y_continuous(labels = scales::dollar) +
scale_fill_manual(values = c("lightsalmon", "slategrey")) +
labs(x = "",
y = "Foreign Gifts Since 2012",
title = "When it comes to foreign contracts & giving, the usual suspects are at the top",
subtitle = "Total gifts/contracts from foreign sources, Title IV-eligible domestic institutions since 2012",
fill = "Gift Type",
caption = "Source: Office of Federal Student Aid") +
geom_curve(aes(x = 16, y = 8e8, xend = 18, yend = 5.5e8),
color = "black",
size = 0.70,
ncp = 10,
arrow = arrow(type = "closed",
length = unit(0.25, "cm"))) +
annotate("text",
x = 15.3, y = 6e8,
label = str_wrap("MIT hits the top of the tables with gifts alone!", 25),
color = "black",
hjust = 0,
size = 3,
family = "Gill Sans MT") +
theme(plot.title = element_text(size = 9.5),
plot.subtitle = element_text(size = 8))
Not too much surprising here. The most interesting thing is the institutions which seem to be heavily contract dependent. My suspicion is that the majority of this comes from non-U.S. campuses. Doha’s Education City plays (or is it pays) home to Carnegie Melon, Texas A&M, Georgetown, Northwestern, and others. For that reason we see those universities on this list. NYU maintains a campus in Abu Dhabi and so earns its share of “contract” revenue.
Notably, though, MIT and Stanford make the top 20 with gifts alone!
I this post has motivated readers to investigate the Data is Plural newsletter. It is a terrific resource and should be maintained and harvested regularly.