If you are familiar with the topic, you know that there are a bunch of distributions in natural and engineered systems that seem to follow a power law. But ever since people have claimed to find power law distributions, other people have said, “Not so fast”. There are two persistent problems with power law distributions:
- They generally don’t fit the left part of the distribution, only the tail.
- They don’t fit the whole tail; in the extreme, the data usually drop off faster than a real power law.
On the other hand, a lognormal distribution often fits the left part of these distributions, but it’s not always a good model for the tail. Well, what if there were another simple, well-known model that fits the entire distribution? I think there is, at least for some datasets: the Student-t distribution.
For a long time I have recommended using CDFs to compare distributions. If you are comparing an empirical distribution to a model, the CDF gives you the best view of any differences between the data and the model. Now I want to amend my advice. CDFs give you a good view of the distribution between the 5th and 95th percentiles, but they are not as good for the tails.
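To see why a CDF compresses the tail, here is a minimal sketch using only the Python standard library. The lognormal sample, the seed, and the helper name `empirical_cdf` are assumptions made for illustration; they are not taken from the original analysis. The sketch compares the CDF with its complement, 1 - CDF, at three large values.

```python
import bisect
import random

def empirical_cdf(sorted_xs, x):
    """Fraction of values in sorted_xs that are <= x."""
    return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)

# Illustrative data only: 100,000 lognormal draws with a fixed seed.
random.seed(17)
sample = sorted(random.lognormvariate(0, 1) for _ in range(100_000))

# Look at the values ranked 99,000th, 99,900th, and 99,990th.
for rank in (99_000, 99_900, 99_990):
    x = sample[rank - 1]
    cdf = empirical_cdf(sample, x)
    print(f"x={x:8.2f}  CDF={cdf:.4f}  1-CDF={1 - cdf:.4f}")
```

The three CDF values differ by less than 0.01, so on a linear axis they are nearly indistinguishable, while the complements span two orders of magnitude. That gap is what a plain CDF plot hides.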
In this post I’ll walk through an example of how to convert between currencies. A challenge is that the conversion rate is constantly changing. If you have historical data, you’ll want the conversion to be based on what the exchange rate was at the time… For my example I’ll use the priceR package, which provides an R interface to the exchangerate.host API. To limit the number of API hits required, I first create a lookup table with all unique currency conversions and dates required and then use this table to convert between currencies.
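The lookup-table idea can be sketched independently of any particular API. The example below is in Python rather than R, and `fetch_rate` is a hypothetical stand-in for a real exchange-rate request (it is not a priceR or exchangerate.host function). The records and rates are made up for illustration.

```python
from datetime import date

api_calls = 0  # counts how often the (fake) API is hit

def fetch_rate(from_cur, to_cur, on_date):
    """Hypothetical stand-in for an exchange-rate API call; returns made-up rates."""
    global api_calls
    api_calls += 1
    fake_rates = {("EUR", "USD"): 1.10, ("GBP", "USD"): 1.30}
    return fake_rates[(from_cur, to_cur)]

# Made-up historical records; several share the same currency and date.
records = [
    {"amount": 100.0, "currency": "EUR", "date": date(2021, 3, 1)},
    {"amount": 250.0, "currency": "EUR", "date": date(2021, 3, 1)},
    {"amount": 80.0, "currency": "GBP", "date": date(2021, 3, 2)},
    {"amount": 40.0, "currency": "EUR", "date": date(2021, 3, 1)},
]

def convert_all(rows, to_cur="USD"):
    # Step 1: build the lookup table, one request per unique (currency, date) pair.
    keys = {(r["currency"], r["date"]) for r in rows}
    lookup = {key: fetch_rate(key[0], to_cur, key[1]) for key in keys}
    # Step 2: join the rates back onto every row.
    return [
        dict(r, converted=r["amount"] * lookup[(r["currency"], r["date"])])
        for r in rows
    ]

converted = convert_all(records)
print(f"{len(records)} rows converted with {api_calls} API calls")
```

Four rows need only two requests here because three of them share one currency and date. The saving grows with the amount of repetition in the data.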
My conservative estimate is that if I bike to work in New York City most days over a twenty-year career, then I have a 2.4% chance of severe injury or death from my commute. This calculation is conservative in the sense that I think it’s an upper bound on my risk. It makes a lot of assumptions which may not hold in practice, but it is based on city-reported data, and I believe it is a reasonable first approximation.
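The excerpt does not show the underlying arithmetic, so the following is only a sketch of how a per-year risk compounds into a career-long one. It assumes a constant annual risk that is independent from year to year, and it backs the annual figure out of the stated 2.4%. Neither the assumption nor the derived number comes from the original post.

```python
def cumulative_risk(annual_risk, years):
    """Chance of at least one event in `years`, given a constant independent annual risk."""
    return 1 - (1 - annual_risk) ** years

def implied_annual_risk(total_risk, years):
    """Invert cumulative_risk: the annual risk that compounds to `total_risk`."""
    return 1 - (1 - total_risk) ** (1 / years)

# 2.4% over twenty years is the figure from the text; the rest is derived.
annual = implied_annual_risk(0.024, 20)
print(f"implied annual risk: {annual:.4%}")
print(f"check, 20-year risk: {cumulative_risk(annual, 20):.2%}")
```

Under these assumptions the implied annual risk is a little over 0.1%.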
Much of the discussion around these models has centered around nicely-behaved Gaussian-type data of the kind you’d analyze with OLS. For simplicity, we’ll stay close to that paradigm in this post. However, we’ve also benefited from the rise of multilevel models over the past few decades and it turns out both the change-score and ANCOVA models can be largely reproduced within a multilevel model framework. The goal of this post is to introduce the change-score and ANCOVA models, introduce their multilevel-model counterparts, and compare their behavior in a couple quick simulation studies. Spoiler alert: The multilevel variant of the ANCOVA model is the winner.
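For reference, the two single-level models are commonly written as below. The notation is assumed here and may differ from the post's: $y_{\text{pre},i}$ and $y_{\text{post},i}$ are the pre- and post-treatment outcomes for participant $i$, and $\text{tx}_i$ is a treatment indicator.

```latex
\begin{align*}
\text{change-score:}\quad & y_{\text{post},i} - y_{\text{pre},i} = \beta_0 + \beta_1\,\text{tx}_i + \epsilon_i \\
\text{ANCOVA:}\quad & y_{\text{post},i} = \beta_0 + \beta_1\,\text{tx}_i + \beta_2\,y_{\text{pre},i} + \epsilon_i
\end{align*}
```

Written this way, the change-score model is the ANCOVA model with $\beta_2$ fixed at 1.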
Seth Klarman: Opportunities and Pitfalls for Investors in 2022 (Harvard Business School interview)