Building a Bayesian Model: Part 5

In Part 4, we took our simple ballast model and re-derived it as a Bayesian model complete with priors, posteriors, hyperparameters, loglikelihoods…the whole shebang. We ended with an interpretation of the prior alpha parameter as a measure of both the signal-to-noise ratio in the emerging data and the quality of our prior. Of course, the global average as a prior is not great…can we come up with a better one?

Let’s think again about the Miami example. The goal is to predict the 3-point percentage allowed by Miami in this game. We’ve spent 4 articles and 4,200 words on how to use Miami’s season-to-date 3P% allowed as a predictor. To the extent that the season-to-date stats can’t be used because they’re too noisy, we’ve relied on the global average to fill the gap. Every piece of information other than the season-to-date stats and the global average is being ignored. Let’s focus on two of those pieces of information.

First, we can look at the team’s 3-point % allowed in the prior season. Teams will generally experience some turnover in their rosters from season to season, but there might be enough similarity that there’s some predictive value. It’s a full-year number, so there are no sample size concerns.

Second, we can look at the opponent (in this case Orlando) and how they’ve been shooting for the season to date. We’re going to run into sample size issues here especially early in the season, so we can use our good old ballast method to smooth the numbers out.

So our Level 5 model will be the same as our Level 4 model, with one important change. Instead of the global average, our prior will be some function of the global average, the team’s prior season average and the opponent’s season-to-date shooting. Hmm…”some function of” a bunch of predictors…sounds kinda like a regression, right?

Yep. We’re going to take our Bayesian model and weld a regression onto it.


Now, there are two ways to do this. The easy way would be to do it in two separate steps – first use a regression to fit the prior, then use the prior to fit the Bayesian piece. But the magic of conjugate priors together with the functionality of Excel Solver allows us to fit the whole thing in one step. This matters for more than just time savings…to the extent that there is correlation between the predictors and the season-to-date stats (which there usually is), it can actually make a significant difference in the final model. In the two-step approach, the predictors get first priority and any residual signal gets allocated to the observed data. In the single-step approach, we let the data decide how much signal gets allocated to the predictors vs the observed data.

I’ve done a lot of research and I have not found this thing described in any textbook or academic paper. I’ve had email exchanges with several prominent statistics professors, and they’ve all said that it makes sense and it’s similar to a bunch of things (Bayesian hierarchical models, latent variable models) but they’re not aware of any literature that puts it together in this specific way. So either I haven’t looked hard enough (email me if you find anything!) or I’ve made an error somewhere, or I’ve invented something new. If it’s new, I’m calling it “the conjugate regression”.

Let’s build up our end-boss, “Level 5” model, a conjugate regression, on 3P% allowed. Because 3P% lives between 0 and 1, it will actually be a logistic conjugate regression. And because the global average of 0.355 is such a strong anchor point, I’m going to use it as an offset:

ln(prior mean / (1 – prior mean)) = ln(0.355 / (1 – 0.355)) + (a bunch of stuff that should be centered around zero).

We’ve identified two predictors to make up the “bunch of stuff”, the team’s prior year 3P% allowed and the opposing team’s ballasted season-to-date 3P% made.

We’re modeling using 2018-19, so the prior year is 2017-18 and the global average 3P% for that season was 0.361. So we need to transform the prior year 3P% to be on a logit scale and centered around zero:

Prior year logit differential = ln(prior year 3P% / (1 – prior year 3P%)) – ln(0.361 / (1 – 0.361)).

For 2017-18 Miami, the 3P% allowed was 36.0%, so Miami’s prior year logit differential is -0.006.
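As a sketch in Python (rather than the article’s Excel), the centering transform looks like this; the function names are mine, not the author’s:

```python
import math

PRIOR_YEAR_GLOBAL = 0.361  # 2017-18 league-wide 3P%

def logit(p):
    """Log-odds transform: maps (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))

def prior_year_logit_diff(team_3p_pct):
    """Team's prior-year 3P% allowed on the logit scale, centered on
    that season's global average so the predictor is roughly zero-mean."""
    return logit(team_3p_pct) - logit(PRIOR_YEAR_GLOBAL)
```

A team that allowed exactly the league average gets a differential of zero; better-than-average defenses come out negative, worse ones positive.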

We need to do three things to the offense season-to-date stats – ballast them, logit-transform them and center them:

Offense ballasted average = (Opponent season to date 3PM + Offense 3pt ballast * 0.355) / (Opponent season to date 3PA + Offense 3pt ballast)

Offense logit differential = LN(Offense ballasted average / (1 – Offense ballasted average)) – LN(0.355 / (1 – 0.355))


Putting the pieces together, the prior mean comes out of the regression:

ln(prior mean / (1 – prior mean)) = ln(0.355 / (1 – 0.355)) + (prior year coefficient * prior year logit differential) + (offense coefficient * offense logit differential)

Prior beta = prior alpha * (1 – prior mean) / prior mean.

Posterior alpha = prior alpha + season-to-date 3PM against.

Posterior beta = prior beta + season-to-date 3PA against – season-to-date 3PM against.

Loglikelihood = ln(beta_binomial_pmf(3PM against in game, 3PA against in game, posterior alpha, posterior beta))
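The whole chain – regression prior, conjugate update, beta-binomial scoring – can be sketched in a few lines of Python (scipy standing in for the spreadsheet; function and argument names are mine):

```python
import math
from scipy.stats import betabinom

GLOBAL_AVG = 0.355  # league-wide 3P%, used as the offset

def prior_mean(prior_year_diff, offense_diff, prior_year_coef, offense_coef):
    """Logistic regression with the global average as an offset; because
    both predictors are centered, zero coefficients return the global
    average exactly."""
    z = (math.log(GLOBAL_AVG / (1 - GLOBAL_AVG))
         + prior_year_coef * prior_year_diff
         + offense_coef * offense_diff)
    return 1 / (1 + math.exp(-z))

def game_loglik(game_3pm, game_3pa, std_3pm, std_3pa, prior_alpha, p_mean):
    """Conjugate update: the prior mean and prior alpha define a Beta
    prior, the season-to-date (std) totals update it, and the game
    result is scored with the beta-binomial pmf."""
    prior_beta = prior_alpha * (1 - p_mean) / p_mean
    post_alpha = prior_alpha + std_3pm
    post_beta = prior_beta + std_3pa - std_3pm
    return betabinom.logpmf(game_3pm, game_3pa, post_alpha, post_beta)
```

Note that the prior beta line reproduces the formula above: it is chosen so the Beta prior’s mean, alpha / (alpha + beta), equals the regression’s prior mean.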

When you put it all together, there are four parameters that are not derived from other parameters: prior alpha, offense 3pt ballast, prior year coefficient and offense coefficient. So we are solving for the combination of those four parameters that maximizes loglikelihood.
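To see the single-step fit in miniature, here is a hedged Python sketch that maximizes the total loglikelihood over all four free parameters at once, with scipy’s optimizer standing in for Excel Solver. The data is synthetic and every column name is invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import betabinom

GLOBAL_AVG = 0.355
rng = np.random.default_rng(0)

# Synthetic data, one row per game (entirely made up for illustration).
n = 200
std_3pa = rng.integers(50, 600, n)            # defense season-to-date 3PA against
std_3pm = rng.binomial(std_3pa, GLOBAL_AVG)   # ...and 3PM against
py_diff = rng.normal(0.0, 0.05, n)            # prior year logit differential
off_3pa = np.full(n, 300)                     # opponent season-to-date 3PA
off_3pm = rng.binomial(off_3pa, GLOBAL_AVG)   # opponent season-to-date 3PM
game_3pa = rng.integers(15, 40, n)            # this game's 3PA against
game_3pm = rng.binomial(game_3pa, GLOBAL_AVG)

def neg_loglik(params):
    """Negative total loglikelihood as a function of the four free
    parameters: prior alpha, offense ballast, and the two coefficients."""
    prior_alpha, off_ballast, py_coef, off_coef = params
    off_avg = (off_3pm + off_ballast * GLOBAL_AVG) / (off_3pa + off_ballast)
    offset = np.log(GLOBAL_AVG / (1 - GLOBAL_AVG))
    off_diff = np.log(off_avg / (1 - off_avg)) - offset
    p_mean = 1 / (1 + np.exp(-(offset + py_coef * py_diff + off_coef * off_diff)))
    prior_beta = prior_alpha * (1 - p_mean) / p_mean
    post_a = prior_alpha + std_3pm
    post_b = prior_beta + std_3pa - std_3pm
    return -betabinom.logpmf(game_3pm, game_3pa, post_a, post_b).sum()

# Solve for the combination of the four parameters that maximizes
# the loglikelihood (i.e. minimizes its negative).
res = minimize(neg_loglik, x0=[100.0, 100.0, 0.5, 0.5],
               bounds=[(1, 1e7), (1, 1e7), (0, 1), (0, 1)],
               method="L-BFGS-B")
```

Because the offense ballast feeds into the predictor that feeds into the prior that feeds into the posterior, every parameter is fit in a single pass – the Excel version does the same thing with one Solver run.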

Let’s take a quick time out and discuss one of the major limitations of the Excel Solver tool: it tends to get stuck on local maxima when searching for a global maximum, especially with more complex functions.

Fortunately Solver comes with an option called “MultiStart”. It starts the search from a bunch of different points, finds a bunch of candidate solutions, and picks the best of the bunch. To use MultiStart you have to give Solver a search space by specifying upper and lower bounds for each of your parameters. Because of the way I defined the prior year coefficient and offense coefficient, they should be between 0 and 1. Prior alpha and offense 3pt ballast should be between 1 and 9,999,999.
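The MultiStart idea is easy to mimic outside of Excel: run a local optimizer from many starting points inside the bounds and keep the best answer. A toy Python sketch (the objective here is invented purely to have one local and one global minimum):

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    """Bumpy objective with a global minimum near x = 0 and a
    shallower local minimum near x = 3."""
    return -(np.exp(-x[0] ** 2) + 0.8 * np.exp(-(x[0] - 3) ** 2))

bounds = [(-5, 5)]

# A single start near the shallow basin gets stuck at the local minimum.
single = minimize(f, x0=[3.0], bounds=bounds, method="L-BFGS-B")

# MultiStart analog: launch the local optimizer from a grid of starting
# points across the bounds and keep the best result found.
starts = np.linspace(-5, 5, 11)
results = [minimize(f, x0=[s], bounds=bounds, method="L-BFGS-B")
           for s in starts]
best = min(results, key=lambda r: r.fun)
```

Real MultiStart randomizes the starting points rather than using a grid, but the principle is the same: many local searches, one global winner.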


An interesting note about the free throw model – the prior alpha is now effectively infinity and the prior year coefficient is almost zero. The offensive team and the global average are the only pieces that are left. So it took us a while to get there, but we eventually demonstrated what we all suspected – that “free throw defense” is not a measurable skill that a team can have.

Squared errors and loglikelihoods:

You can see how much improvement there is between level 4 and level 5!

And finally, the back test…


We’ve come all this way only to end up with a losing model. This is actually fortunate for you; if the model had been profitable, I would not be giving it away to you for free.

So what happened? I still have a very high degree of confidence in our estimates of opponent 3P%, 2P% and FT% coming out of the model. I think the error is in our application of that model. The assumption that the market is using raw season-to-date percentages is naïve and wrong. Also, there are all kinds of things going on from game to game – home/away splits, rest, injuries – that are not being accounted for.

So, dear reader, unfortunately this journey does not end with a plug-and-play winning NBA model dropped into your lap. However, this is something that I think could be incorporated as part of a wider-scope NBA model. As well, everything in this series – from ballast models to Bayesian models to conjugate regressions – has all kinds of applications in a wide variety of modeling contexts. It’s not the destination, it’s the journey!

I hope you’ve enjoyed reading this half as much as I’ve enjoyed writing it. If you have any questions, you can try to email or tweet me and I’ll do my best to answer (no promises!)

Happy modeling!

4 thoughts on “Building a Bayesian Model: Part 5”

  1. I’m still a bit confused on how these two equations fit together?
    1) ln(prior mean/(1-prior mean)) = ln(.355/(1-.355)) + (prior year coeff * prior year logit) + (offense coeff * off logit)
    2) loglikelihood = ln(beta_binomial_pmf(3ptm against, 3pta against, posterior alpha, posterior beta))

    I get that they’re supposed to be solved together in a regression/bayes approach.

    Should the loglikelihood be solved first for the optimal prior alpha, but then how do we solve for the other 3 unknown parameters using the first equation?

    I’m fascinated by this approach but don’t quite understand this specific point.


  2. This is a tremendously interesting and informative series of posts. I’ve always been solid with Excel — for instance, I use Solver for NFL power ratings — but this is a level beyond and I look forward to seeing if I can apply it. Thanks for posting this. Look forward to your future posts.

