Towards the end of 2020, after a fumble-heavy day, we penned this post which looked at modeling fumble counts for a day of NFL football.
This post is a short follow up which evaluates fumble persistence. That is, are fumbles truly as random as they are regularly suggested to be?
In short, YES!
A look at the data
Using NFL regular season games, excluding overtime, we can see if there is a relationship between fumble counts by team for odd number and even number games. If there was a persistent relationship, i.e. if fumbles were unlikely random, then we should expect to see some reasonable relationship between the count of fumbles in the odd game sample and the even game sample.
The plot below shows just such a relationship. The dotted grey line represents a perfect relationship. However, as should be apparent, in hardly any season was there much resembling this strong upward linear relationship.
In some cases (especially 2015), it almost appears as though the relationship is negative–teams which fumbled more in the odd numbered games fumbled less in the even numbered games.
To put some numbers to the relationship, we offer up a few usual measures (Pearson’s correlation and R-squared), but we also bring along for the ride the newly conceived \(\xi\) (“Xi”) correlation coefficient. The introductory paper is quite cool!
This newer metric works to detect, asymmetrically, whether a variable \(Y\) is a function of a variable \(X\) in the case of \(\xi_n(X,Y)\). Importantly, the function need not be linear, which makes \(\xi\) correlation potentially very useful! If we want to detect symmetrically whether \(X\) or \(Y\) is a function of the other, we simply need to take \(\max\{\xi(X,Y),\xi(Y,X)\}\). As the paper states, this coefficient converges to 0 if \(X\) and \(Y\) are independent and to 1 if at least one of \(X\) and \(Y\) is a function of the other.
Season | Xi Corr | Pearson Cor | Rsq |
---|---|---|---|
2012 | 0.24 | 0.35 | 0.12 |
2013 | 0.07 | 0.28 | 0.08 |
2014 | -0.06 | 0.26 | 0.07 |
2015 | -0.23 | -0.19 | 0.04 |
2016 | 0.11 | -0.08 | 0.01 |
2017 | 0.11 | 0.20 | 0.04 |
2018 | -0.20 | -0.13 | 0.02 |
2019 | 0.00 | 0.15 | 0.02 |
2020 | 0.08 | 0.22 | 0.05 |
2021 | 0.04 | 0.22 | 0.05 |
As we can see, in no case do any of the measures of relationship rise very high. Fumbles, indeed, appear to be random!
For those curious, code is below.
library(tidyverse)
library(extrafont)
library(XICOR)
library(data.table)
theme_set(
theme_minimal(base_family = "Gill Sans MT") +
theme(axis.line = element_line(color = "black"),
axis.text = element_text(color = "black"),
axis.title = element_text(color = "black"),
legend.position = "top",
panel.grid.minor = element_blank(),
plot.caption = element_text(hjust = 0)
)
)
# read data
fastRlink_root <- "https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_"
pbp <- rbindlist(
lapply(2012:2021, \(x) fread(paste0(fastRlink_root, x, ".csv.gz")))
) %>%
as_tibble()
# filter and select cols
fum_cols <- c("game_id", "game_date", "week", "home_team", "away_team", "posteam", "ydstogo", "down", "game_seconds_remaining",
"fumbled_1_team", "fumbled_1_player_name", "fumbled_2_team", "fumbled_2_player_name", "fumble_lost", "fumble_forced",
"fumble", "epa", "wpa", "season")
fumbles <- pbp[pbp$season_type == "REG" & !is.na(pbp$posteam) & !is.na(pbp$down) & pbp$qtr %in% 1:4, fum_cols]
# get fumble count and then join to full data set
fc_start <- fumbles %>%
group_by(season, week, fumbled_1_team, fumbled_2_team) %>%
count() %>%
pivot_longer(c(fumbled_1_team, fumbled_2_team), names_to = "indicator", values_to = "fumble_team") %>%
group_by(season, week, fumble_team) %>%
summarize(n = sum(n), .groups = "drop") %>%
filter(!is.na(fumble_team))
fc <- expand(fc_start, season, week, fumble_team) %>%
left_join(fc_start, by = c("season", "week", "fumble_team")) %>%
mutate(
n = replace_na(n, 0),
odd_wk = week %% 2
)
fc_split <- fc %>%
group_by(season, odd_wk, fumble_team) %>%
summarize(n = sum(n), .groups = "drop") %>%
pivot_wider(names_from = odd_wk, values_from = n, names_prefix = "odd_wk_")
ggplot(fc_split, aes(odd_wk_0, odd_wk_1)) +
geom_abline(slope = 1, lty = 2, color = "darkgrey") +
geom_point(color = "dodgerblue4") +
facet_wrap(~ season, ncol = 5) +
labs(
x = "Sum of Fumbles in Odd Weeks",
y = "Sum of Fumbles in Even Weeks",
title = "Fumble Persistence in the NFL?",
subtitle = "Count of fumbles in odd weeks vs. even weeks",
caption = "@isbrutussick | verbumdata.netlify.com\nSource: nflfastR"
) +
theme(
strip.text = element_text(face = "bold")
)
# look at relationship
rels <- fc_split %>%
group_by(season) %>%
summarize(
xicor = pmax(calculateXI(odd_wk_0, odd_wk_1), calculateXI(odd_wk_1, odd_wk_0)),
cor = cor(odd_wk_0, odd_wk_1),
rsq = cor ^ 2
) %>%
mutate(across(!c(season), ~ round(.x, 2))) %>%
setNames(c("Season", "Xi Corr", "Pearson Cor", "Rsq"))