Friday, October 12, 2012

North Carolina Isn't Going To Be Very Good (or: Matchup Zone's Projection System Explained)

There seems to be a pretty clear pattern when it comes to the way people approach the projections on the site. First off, they'll scan the page or do a Ctrl-F to find their favorite team. They'll almost always mutter under their breath that it's too low, that some stupid computer can't know how hard their guys are working, and how the chemistry on this year's team is gonna be better, and how that new recruit is going to take the league by storm. Then they'll scroll back up to the top of the list and start reading down. That team's too high, they'll think, and that other one's underrated, and New Mexico? Really? Ah, whatever.

They'll usually click around a little bit, visit a team site or two, and one or two conference sites. But, eventually, in all of that clicking and scrolling, they'll come across this line, and they'll start to question my sanity.

RANK  TEAM            CONF.  TAPE   At Large  Top 25  Win Conf.
88    North Carolina  ACC    8.46   19%       3%      1%

*The numbers on the projection might be slightly different by the time you're reading this; rankings and relative strengths and postseason probabilities are changing all the time as players transfer out, receive immediate eligibility at new schools, or are ruled ineligible by either the schools or the NCAA.

They'll click through to the UNC team page to try and figure out what in the world could justify such a patently ridiculous prediction. What could make North Carolina a less than 1-in-5 shot to reach the NCAA Tournament? And what seems to jump out to most folks is that James Michael McAdoo--the guy who everybody just knows is going to have a breakout sophomore season--is projected to go for just 6.4 points and 3.1 rebounds in a little over 18 minutes per game.

That's crazy, they'll think. The guy averaged 6.1 points and 3.9 rebounds in 15 minutes per game last year. Surely he'll be better than that this year, and doesn't Mr. Matchup Zone know that McAdoo's gonna be The Man this year? This is stupid. I'm gonna go back to watching cat videos.

And so they leave the site. They don't come back. And they don't even tell their friends that they need to check out this clown site on the Internet that says UNC will be lucky to be an NIT bubble team come March, because it's just a bunch of numbers on a screen that don't tell a story.

So I guess I've got to tell the story of how the system arrived at its predictions. Follow me past the jump, and I'll try to explain this thing.



Build a Roster
The first step toward building a team projection is a simple one in theory, but poses some interesting practical challenges. I've got to find out just who is going to be playing. The starting point for that is simply last year's roster. Take away the seniors, and move everyone else up a class. Jeff Goodman's list of Division I transfers is a valuable reference for keeping track of who's moving, and his list from the previous year is great for checking on who'll be eligible for their new teams. Subscriptions to various national RSS feeds covering college basketball catch most of the late player movement happening at the high- and mid-major level. And the various recruiting services' listings of incoming freshmen help me fill out the rosters.

Even with all of that, I'm sure there's lots of player churn that's completely missed. If you see anyone listed on the team pages who you know won't be playing, let me know. Likewise, if I'm missing someone, let me know about that as well. The rosters are the foundation upon which the projection model is built, so the better that information is, the better the projections will be.
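
The rollover step is mechanical enough to sketch in a few lines of Python. This is an illustration only, not the site's actual code: the `Player` class, the `roll_roster` function, and the placeholder names are all made up for the example.

```python
from dataclasses import dataclass

CLASS_ORDER = ["FR", "SO", "JR", "SR"]


@dataclass(frozen=True)
class Player:
    name: str
    year: str  # one of CLASS_ORDER


def roll_roster(last_year, departures=(), arrivals=()):
    """Drop seniors and known departures, move everyone else up a class,
    then add transfers and incoming freshmen."""
    gone = set(departures)
    returning = [
        Player(p.name, CLASS_ORDER[CLASS_ORDER.index(p.year) + 1])
        for p in last_year
        if p.year != "SR" and p.name not in gone
    ]
    return returning + list(arrivals)


# Hypothetical roster, just to show the shape of the data.
last_year = [
    Player("Senior A", "SR"),
    Player("Soph B", "SO"),
    Player("Frosh C", "FR"),
]
this_year = roll_roster(
    last_year,
    departures=["Soph B"],                 # e.g. transferred out
    arrivals=[Player("Recruit D", "FR")],  # e.g. incoming freshman
)
print(this_year)
```

The hard part in practice is not this function but keeping the `departures` and `arrivals` lists accurate, which is where the transfer lists and recruiting services come in.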

Build Individual Player Projections
Once I know who's going to be playing, the next step is to start making some predictions about how those players will perform in the coming year. There's no way to know exactly what an 18- to 22-year-old is going to do over the course of 30 games. But what we can do is hazard an educated guess based on how other similar players have done in the past.

So, getting back to our North Carolina example of James Michael McAdoo, the computer looked for other freshman power forwards who'd been highly regarded recruits out of high school and played somewhere between 25% and 50% of their team's floor minutes. It then found out how those players' sophomore seasons differed from their freshman seasons. Using McAdoo's freshman numbers as a baseline, it then built a projection of what his sophomore season would look like. The computer goes through the same process of finding similar players and building projections for every returning player on every team in the country.
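
For the curious, here is a rough Python sketch of the similar-player idea. It is a simplification under stated assumptions, not the real model: the `Comp` record, the single `rating` number per season, and the made-up comps below are all hypothetical, and the real system projects many separate stats rather than one rating.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Comp:
    position: str
    top_recruit: bool     # was he a highly regarded recruit?
    minutes_share: float  # share of team floor minutes as a freshman
    fr_rating: float      # hypothetical one-number freshman rating
    so_rating: float      # same rating, sophomore year


def project_sophomore(baseline, position, comps, lo=0.25, hi=0.50):
    """Add the average freshman-to-sophomore change of similar
    players to the player's own freshman baseline."""
    similar = [
        c for c in comps
        if c.position == position
        and c.top_recruit
        and lo <= c.minutes_share <= hi
    ]
    if not similar:
        return baseline  # no comps found: assume no change
    avg_change = mean(c.so_rating - c.fr_rating for c in similar)
    return baseline + avg_change


# Made-up historical comps, for illustration only.
comps = [
    Comp("PF", True, 0.40, 0.0, 2.0),   # similar: improved by 2.0
    Comp("PF", True, 0.30, 1.0, 2.0),   # similar: improved by 1.0
    Comp("PF", True, 0.80, 3.0, 3.0),   # played too much to be a comp
    Comp("PG", True, 0.40, 0.0, 5.0),   # wrong position
]
print(project_sophomore(-1.0, "PF", comps))
```

Only the first two comps pass the filter, so the projected change is the average of their improvements applied to the player's own starting point.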

For freshmen, there are a couple of different ways to get projections. For the 100 or so freshmen listed in the RSCI, the process isn't all that different from the way returning players are projected. Other players are found with similar recruiting rankings and positions, and those previous players' performances are averaged to form the expectation for the incoming freshman. The remaining freshmen and junior college players are projected based on how freshmen at their positions in their conferences have performed, with some minor tweaking based on their reputation scores (stars, grades, etc.) aggregated from various recruiting services.

These are crude projections, though, because the numbers they're based on--both from the returning players and from the historical comps--have been stripped of all their context. They're all adjusted rates, meaning that they show how the players would perform against a standardized level of competition, and surrounded by average teammates. This is a really important feature of the system, actually. It's what makes it possible to project how a guy transferring from an ACC team to one in the MAC will perform, and it allows us to compare individuals and teams across widely varying levels of competition in an apples-to-apples way.

Build a Team Projection
What we need to do now is to put some of that context back in by assembling the team and determining how all these individual pieces will interact with one another. The first order of business here is dividing the players' minutes up so that the sum of all players' minutes on each team is 200 minutes per game. (In the real world this never happens; since no team has every player play in every game, the sum of players' minutes per game is always somewhere north of 200. What we're really doing is figuring out what percentage of the team's minutes each player will play over the course of the full season.) This is accomplished by determining first how deep a rotation each coach likes to play by checking previous seasons. Then, using a model based on that rotation size, each player's projected effectiveness relative to his teammates, and each player's minutes played the previous season, the computer spits out everyone's predicted share of minutes played.
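
The "make it add up to 200" part can be sketched in Python like this. Everything here is assumed for illustration: the `allocate_minutes` function, the idea of collapsing effectiveness and last year's minutes into one raw weight per player, and the sample weights. The real model is more involved, and a fuller version would also need to keep any one player from going over 40 minutes.

```python
def allocate_minutes(weights, rotation_size, team_minutes=200.0):
    """Keep the top `rotation_size` players by raw weight and scale
    their weights so the minutes add up to `team_minutes` per game."""
    rotation = sorted(weights, key=weights.get, reverse=True)[:rotation_size]
    total = sum(weights[name] for name in rotation)
    if total <= 0:
        raise ValueError("rotation needs at least one positive weight")
    return {name: team_minutes * weights[name] / total for name in rotation}


# Hypothetical raw weights for a ten-man roster.
weights = {
    "p1": 9, "p2": 8, "p3": 8, "p4": 7, "p5": 6,
    "p6": 5, "p7": 4, "p8": 3, "p9": 1, "p10": 1,
}
minutes = allocate_minutes(weights, rotation_size=8)
for name, mins in minutes.items():
    print(f"{name}: {mins:.1f} min ({mins / 200:.0%} of team minutes)")
```

A coach who plays a tighter rotation would get a smaller `rotation_size`, which pushes more of the 200 minutes onto the top of the roster.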

With that done, it's on to the business of figuring out how all these players will interact with one another. Most guys will see their projected share of field goal attempts adjusted up or down, depending on who they'll be sharing the floor with. Their shooting percentages will change based on whether or not they'll be playing with others who are good at setting up their teammates' shots. And their rebound rates will change a little bit depending on whether they play with teammates who either cannibalize or run away from rebounding opportunities.

Once all that is done, it's simply a matter of adding everything back up to arrive at each team's projected effectiveness. At this point, we have a complete, working model of how each team will play, all the way down to turnover rates and defensive block percentages and everything else one would need to project individual matchups, just like we do here in the regular season. Pretty cool.

Build In Some Error
We're not quite done yet, though. What we've actually gotten ourselves is a series of snapshots of what each team is most likely to do, with that roster and that rotation. But they're not going to play that exact rotation--some guys are going to get hurt and miss chunks of the season, others are going to get themselves kicked off the team at some point, some freshmen will get homesick and quit--and those players aren't all going to have exactly that output.

What's cool about using a similar-player model for projecting players is that we can not only make some pretty good predictions about how guys are going to perform, but we can also have a pretty good idea of how much error there is in those projections. And on the individual level, there's a lot of error. For an individual player, we're talking about something on the order of 3 or 4 points of value per 100 possessions that we know we're likely to miss by. On the team level, though, all that error tends to even out. This guy overperforms, this other guy doesn't develop at all. So even though the individual predictions might be off, there's much less error in the team's projection. And teams with lots of seniors, obviously, tend to have a lot less expected error than those full of freshmen.
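
Here is a back-of-the-envelope Python version of why the error shrinks at the team level. It leans on two assumptions that are mine for the sake of the example: that the players' errors are independent of one another, and that the team figure is a minutes-weighted average of the player figures. The function name and the minutes are hypothetical.

```python
import math


def team_error(minutes, player_sd):
    """Standard deviation of a minutes-weighted average of player
    values, assuming each player's error is independent."""
    total = sum(minutes.values())
    return math.sqrt(
        sum((m / total * player_sd[name]) ** 2 for name, m in minutes.items())
    )


# Eight-man rotation; every player carries 3.5 points per 100
# possessions of individual error.
minutes = {"p1": 36, "p2": 32, "p3": 32, "p4": 28,
           "p5": 24, "p6": 20, "p7": 16, "p8": 12}
player_sd = {name: 3.5 for name in minutes}
print(round(team_error(minutes, player_sd), 2))
```

Under those assumptions the team-level error comes out to about 1.3, well under the 3.5 attached to any one player. Give the seniors a smaller individual error than the freshmen and the veteran team's number drops further.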

To account for the possibility of injuries and other player losses, the system removes each player from his team's projection completely, and then makes note of the difference between the team's performance with and without the player. For the best players on relatively thin teams--guys like Aaron Craft at Ohio State, Erick Green at Virginia Tech, and Lorenzo Brown at NC State--this downside risk can be as high as 4 projected TAPE wins. For the really good players who are expected to carry their mid-major teams--Pierce Hornung at Colorado State, C.J. McCollum at Lehigh, Doug McDermott at Creighton--the number can be as high as 5 or 6. Since we can't know when or whom the injury bug is going to bite, the model aggregates the individual discounts and applies them to the team.
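
The with-and-without comparison looks something like this in Python. Treat it as a toy: the `attrition_discount` function, the flat `miss_prob` used to weight each loss, and the "sum of the best five values" stand-in for a team projection are all assumptions made for the example, and the units are not TAPE wins.

```python
def attrition_discount(players, project, miss_prob):
    """Remove each player in turn, measure how much the team
    projection drops, and add up the probability-weighted losses."""
    full = project(players)
    discount = 0.0
    for name in players:
        without = {n: v for n, v in players.items() if n != name}
        loss = full - project(without)
        discount += miss_prob * loss
    return discount


def top_five(players):
    """Toy team projection: total value of the five best players."""
    return sum(sorted(players.values(), reverse=True)[:5])


# Hypothetical player values: one star, little depth behind him.
thin_team = {"star": 5.0, "b": 3.0, "c": 2.0, "d": 2.0, "e": 1.0, "f": 1.0}
print(attrition_discount(thin_team, top_five, miss_prob=0.25))
```

Losing the star costs this toy team 4.0 because the replacement is so much worse, while losing the fifth or sixth man costs nothing. That is the thin-team effect described above: the bigger the drop-off to the next guy, the bigger the discount.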

Putting It All Together
So that's pretty much it. We've now got a median performance expectation for each team. We know the associated expected error, and can build in a discount for the expectation of missed time due to player attrition. All of that together gives us the ability to identify the full range of best-case to worst-case scenarios for each and every team in Division I.


But You Said Carolina's Not Going To Be Good. Please To Explain.
I will, thanks. The Tar Heels have a median TAPE expectation of 8.38, which means that over the course of 18 games against average BCS-league competition they'd have a win expectancy of 8.38 games. On its face, that seems absurd. I get that. I mean, they're North-friggin'-Carolina, right? Everybody knows they're gonna be good. But looking deeper, it makes a certain amount of sense.

Carolina's returning players (and their incoming players, for that matter) were all highly-regarded prospects coming out of high school. They're thus just sort of assumed to be very good players. But, so far at least, they haven't done much at the collegiate level to justify their reputations.

The Tar Heels' backcourt consists of two shooting guards who can't shoot and a freshman point guard. Leslie McDonald, who missed all of 2012, has hit just 38% of his career two-point attempts and 33% of his threes; P.J. Hairston hit 39% of his twos and 27% of his threes as a freshman. And, most concerning for UNC fans, those numbers were put up while sharing the floor with an all-time great assist man. Any gains in experience and age are likely to be offset by the loss of Kendall Marshall.

In the frontcourt, much is expected of James Michael McAdoo, but he suffers from the exact same problem his backcourt mates face. He put up middling numbers as a freshman, and those came while surrounded by much more impressive teammates than he'll share the floor with in 2013. McAdoo was a good defensive player as a freshman, but he was well below average at the offensive end of the floor. Based on the performance of similar players in their sophomore years, McAdoo is most likely to improve to simply average on offense.

But what if the projection for McAdoo is wrong? What if he turns into Tyler Hansbrough on offense? Well, I ran a dummy projection for UNC, substituting Hansbrough's junior season numbers for McAdoo's (I left McAdoo's defensive numbers there, because he's already a better defender than Hansbrough ever was). And, it should come as no surprise, Carolina did a lot better in that model:

RANK  TEAM            CONF.  TAPE   At Large  Top 25  Win Conf.
48    North Carolina  ACC    10.51  55%       21%     6%

You'll notice, though, that even in this scenario, where McAdoo turns into a beast the likes of which we've not seen in some time, and undergoes just a ridiculously off-the-charts breakout improvement, Carolina still is just a bubble team. In order for UNC to have the kind of season that everybody seems to expect of them, they're going to have to see tremendous improvement from all of their returnees, and their incoming freshmen--especially point guard Marcus Paige--are going to have to be better than their recruiting rankings suggest.

That could all happen. It's a lot that has to go right, though, and that's why this system gives the Heels such long odds to enjoy the sort of success they're accustomed to.
