Idea

The idea behind this short project came from a graphic displayed during ESPN’s bowl game broadcasts. The graphic showed each FBS conference’s record in out-of-conference games during the season, as well as each conference’s bowl record. This was interesting at the time because the American Athletic Conference had started its “Power 6” slogan, asserting that it belonged alongside the top five conferences rather than among the lower-tier “Group of 5” conferences. Indeed, in that particular year, the American had outperformed the Big 12 in out-of-conference win percentage.

Methodology

Data

The data for this project came from the collegefootballdata.com API. To take a wider look at the question, I used the regular-season games from the 2014-2018 seasons. I chose 2014 as the starting point because it was the first season under the current conference alignment.

I used the following Ruby script to pull the games from the API, changing “year” and re-running it for each season.

require 'httparty'
require 'csv'

url = "https://api.collegefootballdata.com/games"
year = "2018"  # change and re-run for each season from 2014 to 2018
## without seasonType the API returns regular-season games;
## use the following query to get bowl games/postseason games instead
# query = {"year" => year, "seasonType" => "postseason"}
query = {"year" => year}

## newer versions of the API may require a free key; if so, also pass
## :headers => {"Authorization" => "Bearer #{ENV['CFBD_API_KEY']}"}
## (the environment variable name is only an example)
page_data = HTTParty.get(url, :query => query)
raise "Request failed with HTTP status #{page_data.code}" unless page_data.code == 200
responses = page_data.parsed_response

keys = ["id", "season", "week", "season_type", "start_date", "neutral_site",
  "conference_game", "attendance", "venue_id", "venue", "home_team",
  "home_conference", "home_points", "home_q1", "home_q2", "home_q3", "home_q4",
  "away_team", "away_conference", "away_points", "away_q1", "away_q2", "away_q3",
  "away_q4"]

games_filename = "#{year}_games.csv"
## "w" rather than "ab": appending would add a second header row and
## duplicate games if the script is run twice for the same season
CSV.open(games_filename, "w") do |csv|
  csv << keys
  responses.each do |response|
    ## some games may have no line scores; Array(nil) is [], so the quarter
    ## columns are left blank instead of raising NoMethodError on nil
    home_q = Array(response["home_line_scores"])
    away_q = Array(response["away_line_scores"])
    csv << [response["id"], response["season"], response["week"],
            response["season_type"], response["start_date"],
            response["neutral_site"], response["conference_game"],
            response["attendance"], response["venue_id"], response["venue"],
            response["home_team"], response["home_conference"],
            response["home_points"], home_q[0], home_q[1], home_q[2], home_q[3],
            response["away_team"], response["away_conference"],
            response["away_points"], away_q[0], away_q[1], away_q[2], away_q[3]]
  end
end

Procedure

The methodology was fairly simple: take all of the out-of-conference games and compute each conference’s win percentage. Later on, I decided to also build a matrix of head-to-head win percentages so that each conference’s performance against every other conference could be seen at once.
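As a minimal sketch of this procedure, here it is in Ruby, the language of the data-pull script. It assumes each game is a hash keyed like the CSV columns (“home_conference”, “home_points”, “away_conference”, “away_points”) with integer scores, and treats a missing or empty conference as an FCS team. The function name “ooc_win_pct” is hypothetical, and all but the first two games are made up for illustration.

```ruby
# Sketch: out-of-conference win percentage for each FBS conference.
# Assumes each game is a Hash keyed like the CSV columns, with integer
# scores; a nil or empty conference marks an FCS team.
def ooc_win_pct(games)
  wins = Hash.new(0)
  played = Hash.new(0)
  games.each do |g|
    home = g["home_conference"].to_s
    away = g["away_conference"].to_s
    # keep only FBS vs. FBS games between different conferences
    next if home.empty? || away.empty? || home == away

    # ties are assumed not to occur (overtime decides every game)
    winner = g["home_points"] > g["away_points"] ? home : away
    wins[winner] += 1
    played[home] += 1
    played[away] += 1
  end
  played.to_h { |conf, n| [conf, (wins[conf].to_f / n).round(3)] }
end

games = [
  # these two results appear in the head() output further down
  { "home_conference" => "American Athletic", "home_points" => 17,
    "away_conference" => "ACC", "away_points" => 23 },
  { "home_conference" => "Big Ten", "home_points" => 38,
    "away_conference" => "Mountain West", "away_points" => 31 },
  # the rest are made up for illustration
  { "home_conference" => "American Athletic", "home_points" => 30,
    "away_conference" => "Mountain West", "away_points" => 20 },
  { "home_conference" => "ACC", "home_points" => 40,
    "away_conference" => "ACC", "away_points" => 10 },  # conference game: skipped
  { "home_conference" => "SEC", "home_points" => 45,
    "away_conference" => nil, "away_points" => 7 }      # FCS opponent: skipped
]

ooc_win_pct(games).each { |conf, pct| puts "#{conf}: #{pct}" }
```

With these inputs the American is 1-1 (0.5) and the Mountain West 0-2 (0.0), while the conference game and the FCS game are ignored entirely.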

Analysis

setwd("./Materials/Power_6/")
## load data: one CSV per season, stacked newest first
games<- do.call(rbind, lapply(2018:2014, function(yr)
  read.csv(paste0(yr, "_games.csv"), stringsAsFactors = FALSE)))
## clean data
#### remove FCS teams
games<- subset(games, home_conference != "" & away_conference != "")
#### select columns of interest
games<- games[,c("id", "season", "week", "home_team", "home_conference", 
                 "home_points", "away_team", "away_conference", "away_points")]
#### selecting just interconference games
interconference<- subset(games, home_conference != away_conference)
head(interconference)
##           id season week        home_team   home_conference home_points        away_team  away_conference away_points
## 13 401013086   2018    1           Tulane American Athletic          17      Wake Forest              ACC          23
## 17 401020673   2018    1 Western Michigan      Mid-American          42         Syracuse              ACC          55
## 19 401013089   2018    1             Duke               ACC          34             Army FBS Independents          14
## 20 401012880   2018    1   Michigan State           Big Ten          38       Utah State    Mountain West          31
## 21 401022512   2018    1         Stanford            Pac-12          31  San Diego State    Mountain West          10
## 22 401013354   2018    1        Wisconsin           Big Ten          34 Western Kentucky   Conference USA           3

The output above shows what the data looked like at this point. The columns of interest are the two conferences and the two scores; to go further, I needed to add columns for the winning conference, the losing conference, and the margin of victory.

## convert points to numeric
interconference$home_points<- as.numeric(interconference$home_points)
interconference$away_points<- as.numeric(interconference$away_points)
## verify there are no NA values in the scores (both counts should be 0)
sum(is.na(interconference$home_points))
## [1] 0
sum(is.na(interconference$away_points))
## [1] 0
## add columns for analysis
interconference$winner<- character(nrow(interconference))
interconference$loser<- character(nrow(interconference))
interconference$mov<- numeric(nrow(interconference))

Besides adding the columns, I verified that there were no NA values in the team scores, since a missing score would make it impossible to decide the winner or compute the margin of victory (the “mov” column above). Next, each of the new columns had to be filled in, one row at a time.

for (i in seq(nrow(interconference))){
  if (interconference$home_points[i] > interconference$away_points[i]){
    interconference$winner[i]<- interconference$home_conference[i]
    interconference$loser[i]<- interconference$away_conference[i]
    interconference$mov[i]<- interconference$home_points[i] - interconference$away_points[i]
  } else {
    interconference$winner[i]<- interconference$away_conference[i]
    interconference$loser[i]<- interconference$home_conference[i]
    interconference$mov[i]<- interconference$away_points[i] - interconference$home_points[i]
  }
}
head(interconference)
##           id season week        home_team   home_conference home_points        away_team  away_conference away_points  winner             loser mov
## 13 401013086   2018    1           Tulane American Athletic          17      Wake Forest              ACC          23     ACC American Athletic   6
## 17 401020673   2018    1 Western Michigan      Mid-American          42         Syracuse              ACC          55     ACC      Mid-American  13
## 19 401013089   2018    1             Duke               ACC          34             Army FBS Independents          14     ACC  FBS Independents  20
## 20 401012880   2018    1   Michigan State           Big Ten          38       Utah State    Mountain West          31 Big Ten     Mountain West   7
## 21 401022512   2018    1         Stanford            Pac-12          31  San Diego State    Mountain West          10  Pac-12     Mountain West  21
## 22 401013354   2018    1        Wisconsin           Big Ten          34 Western Kentucky   Conference USA           3 Big Ten    Conference USA  31
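The loop fills in the three new columns one game at a time. The same per-game logic is sketched below in Ruby; “GameResult” and “game_result” are hypothetical names, and, like the R loop, the sketch assumes no game ends in a tie.

```ruby
# Sketch: winning conference, losing conference and margin of victory
# for one game, mirroring the R loop. Assumes integer scores and no ties.
GameResult = Struct.new(:winner, :loser, :mov)

def game_result(game)
  home_won = game["home_points"] > game["away_points"]
  win_side, lose_side = home_won ? %w[home away] : %w[away home]
  GameResult.new(
    game["#{win_side}_conference"],
    game["#{lose_side}_conference"],
    (game["home_points"] - game["away_points"]).abs  # margin is always positive
  )
end

# Tulane (American Athletic) 17, Wake Forest (ACC) 23, from the output above
tulane_wake = { "home_conference" => "American Athletic", "home_points" => 17,
                "away_conference" => "ACC", "away_points" => 23 }
r = game_result(tulane_wake)
puts "#{r.winner} beat #{r.loser} by #{r.mov}"  # prints "ACC beat American Athletic by 6"
```

In R itself, the loop could also be replaced by vectorized calls over whole columns, such as ifelse(home_points > away_points, home_conference, away_conference) for the winner and abs(home_points - away_points) for the margin.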

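Now that each game has a winning and a losing conference, the next step is to compute each conference’s win percentage and build the matrix. Each cell of the matrix holds the row conference’s win percentage against the column conference, so two cells mirrored across the diagonal sum to 1 (up to rounding).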

## one row and one column per conference, named by the full conference names
confs<- unique(interconference$home_conference)
matrix<- matrix(nrow=length(confs), ncol=length(confs), dimnames=list(confs, confs))
diag(matrix)<- 0

for (conf in confs){
  for (o_conf in setdiff(confs, conf)){
    ## games between the two conferences, counted from conf's point of view
    wins<- sum(interconference$winner == conf & interconference$loser == o_conf)
    losses<- sum(interconference$winner == o_conf & interconference$loser == conf)
    ## row = conf, column = opponent
    matrix[conf, o_conf]<- round(wins/(wins + losses), 3)
  }
  no_games<- sum(interconference$winner == conf | interconference$loser == conf)
  ooc_winpct<- round(sum(interconference$winner == conf)/no_games, 3)
  cat("Out of conference win percentage for", conf, ":", ooc_winpct, paste0("(", no_games, " games)"), "\n")
}
## Out of conference win percentage for American Athletic : 0.45 (191 games) 
## Out of conference win percentage for Mid-American : 0.303 (195 games) 
## Out of conference win percentage for ACC : 0.618 (207 games) 
## Out of conference win percentage for Big Ten : 0.712 (205 games) 
## Out of conference win percentage for Pac-12 : 0.696 (138 games) 
## Out of conference win percentage for Mountain West : 0.346 (185 games) 
## Out of conference win percentage for SEC : 0.785 (205 games) 
## Out of conference win percentage for Big 12 : 0.652 (112 games) 
## Out of conference win percentage for Conference USA : 0.29 (210 games) 
## Out of conference win percentage for Sun Belt : 0.287 (167 games) 
## Out of conference win percentage for FBS Independents : 0.444 (261 games)
## making the matrix more readable with conference abbreviations
names<- c("AAC", "MAC", "ACC", "B1G", "P12", "MWC", "SEC", "B12", "CUSA", "SB", "Indep.")
colnames(matrix)<- names
row.names(matrix)<- names
matrix
##          AAC   MAC   ACC   B1G   P12   MWC   SEC   B12  CUSA    SB Indep.
## AAC    0.000 0.833 0.406 0.273 0.667 0.500 0.200 0.158 0.609 0.857  0.406
## MAC    0.167 0.000 0.050 0.245 0.000 0.533 0.067 0.385 0.333 0.500  0.500
## ACC    0.594 0.950 0.000 0.435 0.167 0.500 0.514 0.714 0.905 0.941  0.447
## B1G    0.727 0.755 0.565 0.000 0.450 1.000 0.571 0.562 0.944 0.909  0.500
## P12    0.333 1.000 0.833 0.550 0.000 0.750 0.333 0.667 1.000 1.000  0.680
## MWC    0.500 0.467 0.500 0.000 0.250 0.000 0.077 0.000 0.667 0.667  0.485
## SEC    0.800 0.933 0.486 0.429 0.667 0.923 0.000 0.538 0.911 0.943  0.895
## B12    0.842 0.615 0.286 0.438 0.333 1.000 0.462 0.000 0.952 1.000  0.571
## CUSA   0.391 0.667 0.095 0.056 0.000 0.333 0.089 0.048 0.000 0.632  0.565
## SB     0.143 0.500 0.059 0.091 0.000 0.333 0.057 0.000 0.368 0.000  0.667
## Indep. 0.594 0.500 0.553 0.500 0.320 0.515 0.105 0.429 0.435 0.333  0.000
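For readers who would rather reproduce the matrix outside R, here is a Ruby sketch of the same computation. It takes the (winning conference, losing conference) pairs, i.e. the “winner” and “loser” columns built earlier, and returns a nested hash. The function name is hypothetical, and storing nil for conferences that never played is a choice of this sketch; in the R code such a cell would come out as NaN from 0/0, although every pairing in the 2014-2018 data has at least one game.

```ruby
# Sketch: head-to-head win percentage matrix as a nested Hash.
# results is a list of [winning_conference, losing_conference] pairs.
# m[a][b] is a's win share against b; nil means the two never met.
def head_to_head(results)
  confs = results.flatten.uniq
  counts = Hash.new(0)
  results.each { |winner, loser| counts[[winner, loser]] += 1 }

  confs.to_h do |a|
    row = confs.to_h do |b|
      next [b, 0.0] if a == b  # diagonal set to 0, as in the R matrix
      games = counts[[a, b]] + counts[[b, a]]
      [b, games.zero? ? nil : (counts[[a, b]].to_f / games).round(3)]
    end
    [a, row]
  end
end

# made-up results: ACC beats AAC twice, AAC beats ACC once, SEC beats MWC once
results = [%w[ACC AAC], %w[ACC AAC], %w[AAC ACC], %w[SEC MWC]]
m = head_to_head(results)
puts m["ACC"]["AAC"]          # prints 0.667
puts m["AAC"]["ACC"]          # prints 0.333
puts m["ACC"]["SEC"].inspect  # prints nil (never met)
```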

The results are as expected: the only conferences with out-of-conference win percentages above 50% are the Power 5. That said, there appears to be a clear gap between the American (0.450) and the other “Group of 5” conferences, none of which won more than 35% of its out-of-conference games. This lends some credence to the “Power 6” mantra adopted by the AAC. A closer look, however, shows that the Pac-12 was the only Power 5 conference against which the American had a win percentage above 50%.

for (conf in c("ACC", "Big Ten", "Pac-12", "SEC", "Big 12")){
  of_interest<- subset(interconference, (winner == conf | winner == "American Athletic") 
                       & (loser == conf | loser == "American Athletic"))
  winpct<- round(nrow(subset(of_interest, winner == "American Athletic"))/nrow(of_interest), 3)
  cat("Win percentage for AAC against", paste0(conf, ":"), winpct, 
      paste0("(", nrow(of_interest), " games)"), "\n")
}
## Win percentage for AAC against ACC: 0.406 (32 games) 
## Win percentage for AAC against Big Ten: 0.273 (22 games) 
## Win percentage for AAC against Pac-12: 0.667 (6 games) 
## Win percentage for AAC against SEC: 0.2 (15 games) 
## Win percentage for AAC against Big 12: 0.158 (19 games)

Only six of these games were against the Pac-12, so the sample is too small to conclude that the AAC is stronger than the Pac-12. While the American is a cut above the other “Group of 5” conferences, then, it is not yet on par with the “Power 5.”
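One way to put a number on the sample-size concern is a confidence interval for the AAC’s win proportion. The sketch below uses the Wilson score interval, a common choice for small binomial samples that is picked here for illustration and is not part of the original analysis. The records are back-computed from the printed output: 0.667 of 6 games is a 4-2 record against the Pac-12, and summing across the five Power 5 conferences (13 of 32, 6 of 22, 4 of 6, 3 of 15, 3 of 19) gives 29 wins in 94 games.

```ruby
# Sketch: Wilson score interval for a win proportion (z = 1.96, about 95%).
def wilson_interval(wins, games, z = 1.96)
  p_hat = wins.to_f / games
  denom = 1 + z**2 / games
  center = (p_hat + z**2 / (2.0 * games)) / denom
  half = z * Math.sqrt(p_hat * (1 - p_hat) / games + z**2 / (4.0 * games**2)) / denom
  [center - half, center + half]
end

# Records back-computed from the AAC vs. Power 5 output above
{ "vs. Pac-12" => [4, 6], "vs. Power 5" => [29, 94] }.each do |label, (wins, games)|
  lo, hi = wilson_interval(wins, games)
  puts format("%-12s %d-%d  %.3f to %.3f", label, wins, games - wins, lo, hi)
end
```

The 4-2 record against the Pac-12 gives an interval of roughly 0.30 to 0.90, which comfortably includes .500, while the combined 29-65 record against the Power 5 gives roughly 0.22 to 0.41, entirely below .500.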