Tuesday, April 12, 2016

New & Improved Projection Model

The projections for the 2017 season are now live. My goal is to keep up with all the transfer and early-entry decisions as best I can, and to run an update at least two or three times a week through the end of the academic year, after which everything will calm down until the start of practice in September.

For now, I've made the decision to remove any player from the rosters who has declared himself eligible for the NBA draft, whether or not he's hired an agent. As the inevitable returning-to-school announcements are made, I'll add them back to the rosters, and try to flag any big movers on Twitter.

This year's projections should be better than ever.

For the past three seasons, I've used a similar-player model to find comparable players for every returning player, and then used those players' collective year-over-year changes to make predictions for how the returnees would improve or decline. Those individual projections were then run through a program which combined them with all the other players on each team to generate a team projection.
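For the curious, here's roughly what that old comparable-player approach looks like in code. This is only a minimal sketch, not the actual program: the function name `project_from_comparables`, the stat names, the choice of plain Euclidean distance, and the value of `k` are all assumptions made for illustration.

```python
from math import sqrt


def project_from_comparables(player, history, k=3):
    """Project a returnee's next season from his k most similar past players.

    player:  dict of stat name -> value for the returnee's latest season.
    history: list of (before, after) pairs of stat dicts for past players,
             where `before` and `after` are consecutive seasons.
    Both the distance metric and k are illustrative assumptions.
    """
    def distance(a, b):
        # Plain Euclidean distance over the returnee's stats. A real model
        # would likely standardize each stat first so no one stat dominates.
        return sqrt(sum((a[s] - b[s]) ** 2 for s in player))

    # Find the k past players whose "before" season looks most like this one.
    nearest = sorted(history, key=lambda pair: distance(player, pair[0]))[:k]

    # Apply the comparables' average year-over-year change to the returnee.
    projection = {}
    for stat in player:
        avg_change = sum(after[stat] - before[stat]
                         for before, after in nearest) / len(nearest)
        projection[stat] = player[stat] + avg_change
    return projection
```

Each player's projection would then be fed, along with his teammates', into whatever step rolls the individuals up into a team number.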

This method worked pretty well, especially for the teams in power conferences and other top-100 type teams, but it had a nasty habit of overestimating everyone else's prospects. And while it did an okay job of predicting conference wins, the fact that I was missing so badly on the mid- and low-major teams' predicted TAPE ratings, and that other systems were consistently doing better at predicting conference records, was enough to send me back to the drawing board.

The result is a system which is similar to the old method in that it uses individual projections to build team predictions, but differs in a couple of key areas. The big difference is that the comparable-player model has been scrapped. My hypothesis when building that model was that it would be a good one for identifying potential breakout candidates. If Player X looks a lot like these other players, the thinking went, and a lot of them broke out, then this guy should, too.

Three years later, it's clear that this just didn't work. With short seasons, small sample sizes, and the unevenness of player development inherent in 19- to 22-year-old basketball players, there was just too much noise there, and it wound up being a model that said, in essence, everybody's probably going to get a little bit better. Problem is, it did a pretty lousy job of predicting even the extent of that improvement, especially among the bottom two-thirds of Division I players.

In its place now is a simpler regression model in which the year-zero performance of a similar cohort of players--"high-major sophomore big men who played starter minutes," for example--forms the basis of every returnee's projection. But whereas the old system merely had a sanity-check at the end which would, for example, nudge up Shaka Smart's players' steal rate if it wasn't sufficiently high based on historical norms, the new model uses team and coach development history as a much bigger factor at the front end of the process. Likewise, incoming players have a projection built on how, say, other true freshman consensus top-30 shooting guards coming into high-major programs have performed in the past, with a further refinement based on the numbers that others coming into that same program have posted.
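To make the idea of a cohort baseline plus a program adjustment a bit more concrete, here's a minimal sketch. It is not the actual regression: the function name `project_stat`, the additive form, and the `prior_weight` shrinkage term are all assumptions chosen to keep the example short.

```python
def project_stat(current, cohort_changes, program_residuals, prior_weight=5.0):
    """Sketch of a cohort baseline plus a program-history adjustment.

    current:           the returnee's value for one stat last season.
    cohort_changes:    year-over-year changes in that stat for past players
                       in the same cohort (e.g. high-major sophomore bigs).
    program_residuals: how far past players in this program landed above or
                       below their own cohort baselines.
    prior_weight:      hypothetical shrinkage constant; a program with few
                       past players gets pulled toward no adjustment at all.
    """
    # Baseline: what players like this one have typically done next.
    cohort_change = sum(cohort_changes) / len(cohort_changes)

    # Program effect: average residual, shrunk toward zero for small samples.
    n = len(program_residuals)
    program_effect = sum(program_residuals) / (n + prior_weight) if n else 0.0

    return current + cohort_change + program_effect
```

The shrinkage is there so that one or two unusual players don't convince the system that a program is a development factory; again, that's a design choice in this sketch, not a description of the real model.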

The end result is a system that's far more influenced by a program's own history than the previous one was, while still, I think, accurately reflecting roster quality. Most importantly, it's one that will just do a better job of predicting how teams will play in the upcoming season. The average error for predicted 2015-2016 conference wins was 2.20 under the old system. Re-projecting the season under the new method (using only information that was in the database as of October 2015, of course) yielded an average error of 2.04 conference victories.
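For reference, an "average error" like the ones above can be computed as a mean absolute error across teams. The sketch below assumes that definition, and the win totals in the usage note are made up purely to show the arithmetic.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| conference wins across teams."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

With hypothetical predictions of 10, 8, and 12 wins against actual totals of 12, 8, and 9, the misses are 2, 0, and 3, for an average error of about 1.67 wins.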
