Friday, March 3, 2017

Introducing Résumé REPORT

TAPE is a predictive ratings system. If you want to know how a team is likely to perform against any other team in Division I, it can do a pretty good job of telling you the chances of winning that matchup. By extension, it's also pretty good at sorting teams by how they'd be likely to perform against a hypothetical benchmark team.

Against that benchmark team, TAPE thinks only 33 teams in the whole country would be more likely to win a game than Clemson. On a neutral floor, TAPE thinks that there are only 32 teams which would be favored to beat the Tigers.

There are 42 teams which TAPE thinks would be more likely to beat that same hypothetical TAPE Index team than Maryland would. There are 42 teams which TAPE would favor in a neutral-site matchup with the Terps.

If Maryland and Clemson were to play each other on a neutral floor, TAPE predicts that Clemson would have about a 54% chance of winning the game, and would be favored by about 1.4 points. Simply put, according to the parameters by which TAPE evaluates basketball teams, the Clemson Tigers are a better basketball team than the Maryland Terrapins.

That's great in theory, but here in the real world of the 2017 college basketball season, Maryland fans are having a whole lot better time than Clemson fans. Even with a tough February, the Terps have lost only 6 times all season, are tied for second place in their conference, and are pretty close to a lock for the NCAA Tournament. The Tigers, on the other hand, are a game over .500 overall, 5-12 in league play, sit in 12th place in the ACC standings, and are, I think, generously listed with a 34% chance of earning a bid to the NCAA Tournament by the STAPLE algorithm.

It doesn't take a computer program to tell you that Maryland, even if they might not be as "good" as Clemson, has sure had a better season. And you wouldn't find many folks who would argue that Clemson would be more deserving of a bid to the NCAA Tournament than Maryland.

Louisville, Notre Dame, Duke, Florida State, West Virginia, Minnesota, VCU, and Valparaiso all have identical 23-7 records against Division I opponents. We know intuitively that not all of those 23-7 records mean the same thing, that the first six of those are probably more impressive than the other two. But how do we quantify the differences? How do we scale those records in order to make a true apples-to-apples comparison among those teams?

For nearly four decades the NCAA Tournament Selection Committee has used the RPI as a blunt instrument to compare teams. The RPI has long since been discredited as a tool with the ability to rate or rank teams precisely, and its continued use has been justified by calling it a "sorting tool" by which teams themselves are not judged so much as their schedules are judged against one another. Yet the RPI is inadequate even for this purpose, and its days, finally, appear to be numbered.

While the RPI is simplistic in its design, and that design is deeply flawed, I believe that the concept of the RPI--the thing that it set out to accomplish--is essential for its successor to strive for. Namely, in a sport where upwards of 350 teams of vastly differing levels of quality play a short season of 25 to 30 contests each, we need to find a way to translate every team's on-court wins and losses into something approximating one standard. We need to be able to say that Minnesota's 23-7 is better than Valpo's 23-7 (but maybe not quite as good as Duke's), and to be able to say, with confidence, just how much better it is.

The good news is that the proliferation of predictive ratings systems for college basketball teams has made that possible. The concept of Wins Above Bubble (WAB), I think, gets us most of the way there. WAB is the difference between the number of games a team has won and the number of games that a bubble-quality team would have been expected to win against that team's schedule. It's simple to understand, relatively simple to compute, and a pretty elegant solution to the problem.
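As a rough illustration of that definition, here is a minimal Python sketch. The function name, the input format, and the three-game schedule are all made up for the example; the per-game probabilities would have to come from whatever predictive ratings system is being used.

```python
def wins_above_bubble(results, bubble_win_probs):
    """Actual wins minus the wins a bubble-quality team would be expected to get.

    results          -- list of booleans, True for each game the team won
    bubble_win_probs -- for each game, the (assumed) probability that a
                        bubble-quality team would have won it
    """
    if len(results) != len(bubble_win_probs):
        raise ValueError("need one probability per game")
    actual_wins = sum(1 for won in results if won)
    # Expected wins for the bubble team is just the sum of its win probabilities.
    expected_bubble_wins = sum(bubble_win_probs)
    return actual_wins - expected_bubble_wins


# Hypothetical three-game schedule (made-up probabilities, not real data).
example_wab = wins_above_bubble([True, True, False], [0.9, 0.5, 0.3])
print(round(example_wab, 2))  # 2 wins minus 1.7 expected wins
```

Because the result is a simple sum over games, it grows or shrinks with the number of games played, which is the first shortcoming discussed next.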

WAB has a couple of key shortcomings, though. The first is one of quantity. Since WAB is a "counting" stat, it's possible for a team's net wins relative to a bubble team to be, at least partly, a function of the number of games that team has played. Second, like RPI, WAB is agnostic as to who the wins and losses came against. The inputs are simply a team's record and its schedule strength; individual results are not weighed by opponent.

To that end, I've added a résumé feature to the site: Results Expressed as a Percentage Outcome Relative to TAPE index (or REPORT for short). Each team has been assigned a Schedule Factor--actually three different schedule factors, one each for home, neutral, and road games--which is the probability that said team would beat a TAPE index team in a given game. The Schedule Factor for neutral-site games is just a team's TAPE rating with the weighting for recent games stripped out; the home and away factors are the same rating with each team's home court advantage or road disadvantage applied.

For each game a team plays, a win is multiplied by the opponent's schedule factor (listed on the site in thousandths, so 600 means a probability of 0.600). So a win against a team with a schedule factor of 600--in other words, a team which a TAPE index team would lose to 60% of the time--is worth 0.6 wins. A loss to that same team would be worth 0.4 losses.
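The per-game arithmetic can be sketched in a few lines of Python. This is only an illustration under my reading of the description above: the function name is hypothetical, and it assumes the schedule factor passed in is already the opponent's factor for the right venue (home, neutral, or road).

```python
from typing import Tuple


def game_credit(opponent_factor: int, won: bool) -> Tuple[float, float]:
    """Return (win_quality, loss_quality) earned from a single game.

    opponent_factor is the opponent's schedule factor in thousandths,
    i.e. 600 means a TAPE index team would lose to them 60% of the time.
    """
    p_index_loses = opponent_factor / 1000.0
    if won:
        # A win is worth the chance an index team would have lost.
        return (p_index_loses, 0.0)
    # A loss is worth the chance an index team would have won.
    return (0.0, 1.0 - p_index_loses)


print(game_credit(600, True))   # win quality of 0.6, no loss quality
print(game_credit(600, False))  # no win quality, loss quality of about 0.4
```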

Each team's schedule page now features this information for each game. Wins and losses are highlighted in green and red, respectively, with darker colors indicating more unlikely results--better wins and worse losses. Games against teams with a schedule factor greater than 500 (in other words, games against NCAA Tournament quality teams) are highlighted in green on the REPORT side of the ledger.

By way of explanation, here are Indiana's first 9 games:
Note the dark green for the Kansas and UNC wins. Those were games that a TAPE index team would have lost 78.9% and 76.2% of the time, respectively, and so they're great wins for the Hoosiers. Since a TAPE index team would have won at Fort Wayne 62.6% of the time, that's a fairly dark red loss. The remaining games, all wins, are nearly white since they're all games that a TAPE index team would have won more than 90% of the time.

There's also a REPORT page, where all 351 teams are listed with their three Schedule Factors, total win quality, total loss quality, and net wins (WAB). In order to resolve the two shortcomings identified above, teams are ranked and sorted by the REPORT column, which is total win quality divided by the sum of win quality and loss quality.

Expressing the records in this manner obviates the issue of the number of games played; all teams have a rating between zero (all winless teams will be 0) and one (all undefeated teams will have a rating of 1). 

Furthermore, expressing the REPORT as a percentage has the advantage of giving teams extra credit for big wins, as well as penalizing them for bad losses. Think of two teams who had played identically difficult schedules. Each played five games against teams with schedule factors of 200, followed by one against a team with a schedule factor of 500. Team A won the first five games and lost the sixth, while Team B lost the first game and won the next five. Each would have 0.5 net wins, but Team A would have a REPORT of 0.667 (1.0 in win quality divided by the sum of 1.0 in win quality and 0.5 in loss quality), while Team B's REPORT would be 0.619 (1.3 in win quality divided by the sum of 1.3 in win quality and 0.8 in loss quality).

Team A won all the games it should have won and then lost a coin-flip game, while Team B lost a game it really shouldn't have and then won a coin-flip game. It makes sense to reward Team A for that.
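For anyone who wants to check the arithmetic, here is a small Python sketch of the Team A and Team B comparison. The function name and the (schedule factor, won) input format are my own shorthand for this example, not the code behind the site, and returning None for a team with no games is an assumption.

```python
def report(games):
    """Compute (win_quality, loss_quality, net_wins, report_rating).

    games is a list of (opponent_schedule_factor, won) pairs, with the
    schedule factor in thousandths (200 means 0.200).
    """
    win_quality = sum(f / 1000.0 for f, won in games if won)
    loss_quality = sum(1.0 - f / 1000.0 for f, won in games if not won)
    total = win_quality + loss_quality
    # Assumption: a team with no games played has no rating.
    rating = win_quality / total if total > 0 else None
    return win_quality, loss_quality, win_quality - loss_quality, rating


# Five games against 200-factor teams, then one against a 500-factor team.
team_a = [(200, True)] * 5 + [(500, False)]
team_b = [(200, False)] + [(200, True)] * 4 + [(500, True)]

for name, games in (("Team A", team_a), ("Team B", team_b)):
    wq, lq, net, rating = report(games)
    print(name, round(wq, 1), round(lq, 1), round(net, 1), round(rating, 3))
```

Both teams come out with the same 0.5 net wins, but the ratio separates them: about 0.667 for Team A and about 0.619 for Team B.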
