Showing posts with label long-winded explanations. Show all posts

Friday, December 2, 2011

About That Florida-Syracuse Projection

Astute readers will notice that the projection for tonight's Florida-Syracuse game at the Carrier Dome has Florida as a six-point favorite. That seems weird, right? I mean, the Vegas line is the reverse of that, with Syracuse a 6.5-point favorite as of this writing, and my own future games report has the Orange as almost a 5-point favorite. So what gives?

My first thought, of course, was that I'd screwed up somewhere. Maybe I'd called Florida the home team, or had reversed home-court advantage in my calculations somewhere. I went back and looked at the code, and everything was right. So I double-checked by running a projection of the teams around Florida in the rankings to see if they'd also be favored at the Carrier Dome. But #8 Marquette would be a 5.5-point underdog according to the model and the Cuse would be favored by 10.5 over #10 North Carolina. So there must be something unique about the Florida matchup that will give Syracuse problems.

The future games report, for the next two days at least, is still based on a simpler method for predicting games that uses only the teams' adjusted offensive and defensive efficiencies, adjusted pace of play, and a home-court advantage constant as its inputs. This is because the preseason TAPE ratings only included this information, and the current TAPE ratings still include at least part of those preseason ratings through this weekend. This much less sophisticated projection model allowed for season projections from the preseason all the way up to now to be at least reasonably valid. It's done pretty well, too. Through yesterday's games, it's been correct on 712 of the 919 games projected based on that method, and that 77.5% success rate is slightly better than the 76.4% predicted success rate.
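The post doesn't spell out the formula behind this simpler method, so the following is only a sketch of one common way to combine those four inputs. It assumes efficiencies are expressed in points per 100 possessions, that each team's offense is scaled by the opponent's defense relative to a league average, and that the home-court constant is split evenly between the two teams. The function name, the league-average defaults, and every number in the usage lines are hypothetical.

```python
def project_simple(off_a, def_a, pace_a, off_b, def_b, pace_b,
                   league_eff=100.0, league_pace=67.0, home_edge=0.0):
    """Project (score_a, score_b) from adjusted efficiencies and pace.

    off_*/def_* are points scored/allowed per 100 possessions.
    home_edge is the home-court constant in points, with team A at home.
    All defaults are illustrative assumptions, not values from the post.
    """
    # Expected pace: each team's tempo relative to the league average.
    pace = pace_a * pace_b / league_pace
    # Expected efficiency: offense scaled by the opposing defense.
    eff_a = off_a * def_b / league_eff
    eff_b = off_b * def_a / league_eff
    # Convert per-100-possession efficiency to points, then apply home court.
    score_a = eff_a * pace / 100 + home_edge / 2
    score_b = eff_b * pace / 100 - home_edge / 2
    return score_a, score_b


# Hypothetical inputs: home team A hosts team B.
home, away = project_simple(112.0, 95.0, 68.0, 115.0, 98.0, 70.0, home_edge=4.0)
print(f"projected margin for the home team: {home - away:+.1f}")
```

A method like this sees only overall efficiency, so it cannot notice that one team's particular strength lines up with the other's particular weakness.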

As more games have been played, though, the current season's data is coming into focus. For any game in which both teams have played at least five games against D-I competition, the projection in the Today's Games post has been based solely on the current season's data. (You can tell which games are which by hovering over the score predictions for each team; if the popup text has projected Four Factors numbers, it's a current-season-only prediction.) Those projections are based on a true projection model, which includes much more granular data than simply offensive and defensive efficiencies. It takes each facet of the game--shooting, rebounding, turnovers, etc.--to model the expected outcomes of each possession, which then get aggregated into a game projection that includes a full line score. The projected line score for Florida-Syracuse looks like this:

70.9 possessions

Team           Win%   PTS   FGM   FGA   3GM   3GA   FTM   FTA    OR   REB    PF   AST    TO   STL   BLK
(9) Florida   67.1%  86.4  29.8  65.7  16.1  34.9  10.7  17.5  18.0  40.2  16.9  19.0  14.3   9.8   1.6
(7) Syracuse  32.9%  80.2  31.1  63.5   7.7  20.2  10.3  16.4  13.5  35.0  17.5  16.1  13.0  10.6   7.3

There are a couple of important things to watch for in this game that a simple possession analysis would not have picked up on, and they make this a favorable matchup for Florida. First, check out the three-point shooting. Florida has been the fifth-best team in the nation shooting threes so far this season, while Syracuse's matchup zone defense concedes more three-point attempts than most defenses do (while allowing a success rate right in line with the national average). That's where the almost absurd prediction of over 16 made threes comes from. The second area of concern for Syracuse will be their defensive glass. Florida is a very good offensive rebounding team, and the model predicts that they should be able to board 45.5% of their misses against Syracuse's zone.
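As a quick sanity check, the projected line score hangs together arithmetically. The sketch below recomputes each team's points from its makes and Florida's offensive rebounding percentage from the rebound columns. It assumes REB is total rebounds (so opponent defensive rebounds are REB minus OR) and that offensive rebounding percentage is OR divided by OR plus the opponent's defensive rebounds; the post doesn't state either definition explicitly.

```python
# Values copied from the projected line score above.
florida = {"FGM": 29.8, "3GM": 16.1, "FTM": 10.7, "OR": 18.0, "REB": 40.2}
syracuse = {"FGM": 31.1, "3GM": 7.7, "FTM": 10.3, "OR": 13.5, "REB": 35.0}


def points(line):
    # Every made field goal is worth two, each three adds one more,
    # and each made free throw is worth one.
    return 2 * line["FGM"] + line["3GM"] + line["FTM"]


def off_reb_pct(team, opponent):
    # Assumed definition: OR / (OR + opponent defensive rebounds).
    opp_def_reb = opponent["REB"] - opponent["OR"]
    return team["OR"] / (team["OR"] + opp_def_reb)


print(round(points(florida), 1))   # 86.4
print(round(points(syracuse), 1))  # 80.2
print(f"{off_reb_pct(florida, syracuse):.1%}")
```

Under those assumptions Florida's rate comes out to about 45.6%, in line with the 45.5% quoted above; the small gap is presumably rounding in the published line score.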

That level of granularity is what makes the TAPE system different from any others that I'm aware of. Starting on Monday, the Today's Games report will include projected line scores like the one above for every game.

Tuesday, February 27, 2007

AN INTRODUCTION TO PAPER

On the right side of the page you'll find a series of links to various statistics. Some of them are intuitive, others not so much. The one I'm most proud of is PAPER, which stands for Player Adjusted Probabilistic Effectiveness Rating. It is my attempt to condense all of an individual's box score numbers into one unified number.

PAPER represents the number of points a player would contribute to a league-average team over the course of a 16-game conference season, relative to the expected contribution of a hypothetical league-average player. Only statistics from conference games are used. PAPER does not simply assign a set value to the various statistics that individuals accumulate. Instead, it uses a model of a typical league possession and introduces the player's net contribution to determine what the expected scoring output would be.

WHAT'S IN A POSSESSION?
When a team gains possession of the ball, one of three things is going to happen: they will turn the ball over, a player will be fouled and sent to the free throw line, or they will take a field goal attempt. (Actually, there is a fourth possible outcome--the end of a period or game--but we're not going to concern ourselves with that right now.) Within all but the first, there are additional possible outcomes. Free throws and field goals can be made, in which case points are scored, and the possession ends. They can also be missed, in which case either the defense grabs the rebound, and the possession ends, or the rebound goes to the offense, in which case the possession is renewed.
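The possession tree described above can be turned into a quick expected-points calculation. This is only a sketch under simplifying assumptions that are mine, not necessarily PAPER's: every trip to the line is two shots, only the second free throw can be rebounded, a single offensive rebound rate applies to both kinds of miss, and end-of-period possessions are ignored. The function name and all input values are hypothetical.

```python
def expected_points(p_to, p_ft, p_fg, ft_pct, fg_pct, make_value, orb_rate):
    """Expected points from one possession, including renewed possessions.

    p_to, p_ft, p_fg: frequencies of a turnover, a trip to the line,
    and a field goal attempt; they must sum to 1.
    """
    if abs(p_to + p_ft + p_fg - 1.0) > 1e-9:
        raise ValueError("outcome frequencies must sum to 1")
    # Points scored directly: made field goals plus two-shot foul trips.
    direct = p_fg * fg_pct * make_value + p_ft * 2 * ft_pct
    # Chance the possession is renewed: a missed shot (or missed second
    # free throw) followed by an offensive rebound.
    renew = orb_rate * (p_fg * (1 - fg_pct) + p_ft * (1 - ft_pct))
    # A renewed possession is worth the same as a fresh one, so
    # E = direct + renew * E, which solves to E = direct / (1 - renew).
    return direct / (1 - renew)


# Hypothetical league-average inputs.
ppp = expected_points(p_to=0.20, p_ft=0.10, p_fg=0.70,
                      ft_pct=0.70, fg_pct=0.45, make_value=2.2, orb_rate=0.33)
print(f"{ppp:.3f} points per possession")
```

Swapping one player's rates into a calculation like this and comparing the result with the league-average value is the general idea behind measuring a net contribution.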

The first step in calculating PAPER is to determine the frequency with which each of these events occurs. This is simply a matter of dividing the number of times an event took place in all conference games by the total number of possessions in all conference games.
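In code, that first step is a single division per event. The counts below are made up for illustration, and the event names are hypothetical labels rather than anything taken from the actual PAPER spreadsheets.

```python
def event_frequencies(event_counts, total_possessions):
    """Convert raw event counts into per-possession frequencies."""
    return {event: count / total_possessions
            for event, count in event_counts.items()}


# Hypothetical conference-wide totals. Field goal attempts can push the
# sum past the possession count because an offensive rebound renews a
# possession instead of starting a new one.
league_counts = {
    "turnover": 2100,
    "free_throw_trip": 1500,
    "field_goal_attempt": 8400,
}
league_freqs = event_frequencies(league_counts, total_possessions=10500)
for event, freq in league_freqs.items():
    print(f"{event}: {freq:.3f} per possession")
```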

DETERMINING PLAYER CONTRIBUTION
The next step is to similarly find the frequency, on a per-possession basis, with which a player causes an event to occur. To determine offensive PAPER we'll need to know each of the following:
  • Turnover rate: how often the player turns the ball over
  • Foul rate: how often the player gets sent to the free throw line
  • Free throw percentage: self-explanatory
  • Shot rate: the percentage of his team's shots a player takes
  • Field goal percentage: again, self-explanatory
  • Make value: the average value of a player's make (if a player makes 20 shots from the field and 8 of them are 3-pointers, his make value is 2.4 points)
  • Setup rate: how often the player passes the ball to a teammate in position to make the shot (much more on this will follow in a later post)
  • Offensive rebound rate: the percentage of his team's offensive misses that a player rebounds.
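Most of these inputs are simple ratios. Make value is the one that takes a little arithmetic, so here is the example from the list worked out; the function name is just illustrative.

```python
def make_value(field_goals_made, threes_made):
    """Average points per made field goal."""
    twos_made = field_goals_made - threes_made
    return (2 * twos_made + 3 * threes_made) / field_goals_made


# 20 made field goals, 8 of them three-pointers: (12*2 + 8*3) / 20
print(make_value(20, 8))  # 2.4
```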
Defensive PAPER makes use of the following individual statistics:
  • Steal rate: the percentage of defensive possessions on which a player records a steal
  • Team turnover share: all five players on the floor at the time of a non-steal turnover are assumed to deserve equal credit for its creation; this represents the frequency with which a turnover was forced while the player was in the game.
  • Foul rate: how often a player commits a defensive foul that results in free throw attempts
  • Player defensive unblocked FG%: the percentage of unblocked field goal attempts that opponents make while a player is on the court; much more on the philosophy behind this will follow in a later post. Unblocked FG% is used so as not to double-credit blocks.
  • Block rate: the percentage of opponent field goal attempts a player blocks
  • Defensive rebound rate: the percentage of his opponents' misses that a player rebounds

MAKING ADJUSTMENTS
There are two key adjustments to be made so that PAPER will accurately compare players to the league average.

Pace is built into the system, since all statistics are rate stats based on per-possession (or per-miss, for rebounding) rates. Possessions are estimated based on the formula Ken Pomeroy lays out at the bottom of this page. Hopefully in the near future play-by-play data will be posted by more schools, and I'll be able to use counted possessions rather than estimates. (I'm not counting on this, though; some schools still aren't posting the standard box score for their home games, and others will only post them in PDF format. This is annoying.)
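For reference, here is a sketch of a box-score possession estimate. The linked formula isn't reproduced in this post, so the version below, with its 0.475 weight on free throw attempts, is an assumption based on the commonly cited form of that estimate rather than a quote from the source. The box score totals are hypothetical.

```python
def estimate_possessions(fga, off_reb, turnovers, fta, ft_weight=0.475):
    """Estimate possessions from box score totals.

    ft_weight approximates the share of free throw attempts that end a
    possession; 0.475 is an assumed value, not one given in the post.
    """
    return fga - off_reb + turnovers + ft_weight * fta


# Hypothetical single-game team totals.
print(estimate_possessions(fga=60, off_reb=12, turnovers=14, fta=20))
```

Counted possessions from play-by-play data would make the free throw weight unnecessary, which is why they would be preferable to an estimate like this.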

Size is the second key adjustment. Size functions as a basic proxy for position in this analysis. For the purpose of PAPER, which attempts to place each player in the context of an average team, this is an important adjustment to make. A team on the floor is not made up simply of five 6'7", 205-pound players. A 6'2" point guard has different responsibilities than a 6'10" post player, and both should be judged by how they fill their roles rather than how they perform against a generic standard. More importantly, the other four players on the floor will have different profiles in each case.

PUTTING IT ALL TOGETHER
Once all the adjustments have been made, we're ready to construct the new model and find out how many points per possession the player would contribute to the average team. This can be found in the columns labeled RATE. Multiply RATE by the number of possessions a player would participate in, et voilà! You've got PAPER.
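That last multiplication might look like the sketch below. How the number of possessions a player participates in gets counted isn't specified here, so the sketch assumes it is the team's possessions per game times the share of them the player is on the floor for, summed over the 16-game conference season. The parameter names and sample values are hypothetical.

```python
def paper(rate, team_possessions_per_game, share_of_possessions, games=16):
    """Points contributed relative to a league-average player.

    rate: net points per possession (the RATE column).
    share_of_possessions: assumed fraction of team possessions the
    player is on the floor for.
    """
    possessions = team_possessions_per_game * share_of_possessions * games
    return rate * possessions


# Hypothetical player: +0.02 points per possession, on the floor for
# 75% of a team's 68 possessions per game.
print(round(paper(0.02, 68, 0.75), 1))
```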