Predicting College Basketball Games

As an avid reader of KenPom’s blog and a student of statistics, I decided to build a KenPom-style rating system to generate predictions for each college basketball game. The main purpose of the exercise was to practice web scraping and to be able to experiment with different ideas about the game (such as the effect of home-court advantage). The project was a relative success, and the predictions can be seen in the CBB section of this site (once I can automate rendering the pages for the site).

The box score data was taken from the NCAA Stats website, specifically the Game-by-Game pages of each team. Possessions are estimated from the box score with a formula, and per-game efficiencies are calculated from them. These per-game efficiencies are compared with the pre-game expected efficiencies, and the ratings adjust slightly for the differences. This way, deviations from the expected scores are reflected evenly between the two participants. Overall, the process is very similar to KenPom’s.
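
To make the possession and rating steps concrete, here is a minimal Python sketch. It is not the code behind this site. The possession estimate (FGA - OREB + TOV + 0.475 * FTA) is a commonly used box-score approximation and is an assumption here, since the exact formula is not given above. The additive expected-efficiency formula, the `step` size, and every name (`TeamRating`, `update_ratings`, and so on) are likewise hypothetical.

```python
from dataclasses import dataclass

# Assumed free-throw weight in the possession estimate (not taken from the post).
FTA_WEIGHT = 0.475


@dataclass
class TeamRating:
    offense: float  # points scored per 100 possessions
    defense: float  # points allowed per 100 possessions


def possessions(fga: float, oreb: float, tov: float, fta: float) -> float:
    """Estimate one team's possessions from its box score line."""
    return fga - oreb + tov + FTA_WEIGHT * fta


def efficiency(points: float, poss: float) -> float:
    """Points per 100 possessions."""
    return 100.0 * points / poss


def expected_efficiency(offense: float, opp_defense: float, league_avg: float) -> float:
    """Pre-game expectation (assumed additive form around the league average)."""
    return offense + opp_defense - league_avg


def update_ratings(off_team: TeamRating, def_team: TeamRating,
                   actual_eff: float, league_avg: float,
                   step: float = 0.1) -> float:
    """Nudge both teams toward the observed efficiency.

    The deviation from expectation is split evenly: half goes to the
    offense's rating and half to the opposing defense's rating.
    `step` is a hypothetical damping factor that keeps each change small.
    """
    expected = expected_efficiency(off_team.offense, def_team.defense, league_avg)
    delta = step * (actual_eff - expected) / 2.0
    off_team.offense += delta
    def_team.defense += delta
    return expected
```

Each game would call `update_ratings` twice, once per offense, so both teams' offensive and defensive ratings move a little after every result.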

My deviations from Ken Pomeroy’s methods come in a couple of areas, most notably the calculation of win probabilities and the size of the home-court advantage. Instead of using a 3.75-point home-court advantage, I ran a multiple logistic regression, and the result was closer to a 3-point advantage for home teams. Overall, however, this was only a small deviation from the KenPom method. Expected win probabilities were another matter. After searching the KenPom blog for hours for an explanation of his probability calculations (to no avail), I simply fit another multiple logistic regression model. This is the model used for the win probabilities seen in the CBB section. After using it for some time, I believe that some calibration will be necessary (especially in situations where the favored team has a 7.5-10 point expected advantage).
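
As a rough illustration of how a logistic model can produce both a win probability and a home-court value in points, here is a small Python sketch. The feature set (a neutral-court expected margin plus a site indicator), the absence of an intercept, and both coefficients are assumptions made for the example. The coefficients are placeholders, not the fitted values from my model.

```python
import math

# Placeholder coefficients for illustration only (not fitted values).
B_MARGIN = 0.15  # log-odds per point of expected neutral-court margin
B_HOME = 0.45    # log-odds added for playing at home


def win_probability(expected_margin: float, site: int) -> float:
    """Win probability for a team.

    expected_margin: the team's expected margin on a neutral court, in points.
    site: +1 if the team is at home, 0 on a neutral court, -1 on the road.
    """
    log_odds = B_MARGIN * expected_margin + B_HOME * site
    return 1.0 / (1.0 + math.exp(-log_odds))


def home_edge_in_points() -> float:
    """Points of margin that shift the log-odds as much as playing at home."""
    return B_HOME / B_MARGIN
```

With a model of this shape, dividing the home coefficient by the margin coefficient converts the home effect into points, which is one way a regression can be compared against a fixed value such as 3.75.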

In the future, I would like to investigate different home-court advantage values for each team (or groups of teams), since using a single constant seems inaccurate. At the extremes of the spectrum (say Allen Fieldhouse vs. a WCC team), there are two completely different levels of crowd effect. However, it will be tough to separate the home-court effect from the actual performance of the teams (for our example, Kansas vs. a team like USD). Also, a focus of this next offseason will be finding an ideal starting point (essentially creating preseason ratings).