GridironHistory

SECRETS REVEALED - The process used when researching CFB History.

I've had several people ask me about the process I use for auditing each school's and conference's college football history, so I decided to write a basic step-by-step guide to the entire process. As detailed as this may appear, note that I'm NOT tipping my hand to everything, nor am I explaining each step in depth.

As an example, I take a school's media guide (I'll use Eastern Michigan's; the linked PDF media guide shows year-by-year results) all the way from PDF through to Eastern Michigan's GridironHistory.com profile (linked profile here on GH).

It's important to note that I use what I call "Short Names". This gives an efficient naming convention when the URL is called up in a web browser: something like "Saint Mary's of California" could break the URL in some browsers, so I use "Saint Marys" as the Short Name. On the actual school profile page, the full name of the university is used whenever possible.
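The post doesn't say how Short Names are produced, and the "Saint Marys" example (which also drops "of California") suggests some are picked by hand. Purely as an illustration, here is a minimal Python sketch assuming a hypothetical override table (`SHORT_NAME_OVERRIDES`) backed by a mechanical fallback that folds accents and strips punctuation; `short_name` is a hypothetical helper, not GH's actual code.

```python
import re
import unicodedata

# Hypothetical hand-picked Short Names for cases a mechanical rule wouldn't produce.
SHORT_NAME_OVERRIDES = {
    "Saint Mary's of California": "Saint Marys",
}

def short_name(full_name: str) -> str:
    """Return a URL-friendly Short Name for a school's full name."""
    if full_name in SHORT_NAME_OVERRIDES:
        return SHORT_NAME_OVERRIDES[full_name]
    # Fold accented letters to plain ASCII (e.g. "é" -> "e") and drop any other non-ASCII.
    name = unicodedata.normalize("NFKD", full_name).encode("ascii", "ignore").decode("ascii")
    # Delete apostrophes and periods outright, so "John's" becomes "Johns".
    name = re.sub(r"['.]", "", name)
    # Collapse every other run of non-alphanumeric characters into a single space.
    name = re.sub(r"[^A-Za-z0-9]+", " ", name)
    return name.strip()
```

Spaces are deliberately kept here; a browser (or `urllib.parse.quote`) would still percent-encode them when the name ends up in a URL.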

You should also be aware that the school names in our database are the CURRENT names. At some point, I will begin researching the history of each school so that people can track a school's name changes over the years. See the second post after this one for a few examples.
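Name-change tracking isn't built yet, so this is only a sketch of one possible data structure, assuming each school gets a list of hypothetical `NameSpan` records (a name plus its first and last season). `name_in_season` returns the name in effect for a given year, and `current_name` returns the open-ended span, which corresponds to the CURRENT name the database uses today.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NameSpan:
    name: str
    first_season: int
    last_season: Optional[int] = None  # None means the name is still in use

def name_in_season(history: list[NameSpan], season: int) -> Optional[str]:
    """Return the school's name in effect for the given season, or None if unknown."""
    for span in history:
        still_valid = span.last_season is None or season <= span.last_season
        if span.first_season <= season and still_valid:
            return span.name
    return None

def current_name(history: list[NameSpan]) -> Optional[str]:
    """Return the name whose span has no end season (the school's CURRENT name)."""
    for span in history:
        if span.last_season is None:
            return span.name
    return None
```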

Here's the process I use:

Audit Stage 1 - Individual member schools:
  • Extract the data from the PDF into a Notepad file, tabbing out each line to organize the data into columns.
  • Fill in missing game dates and fix incorrect opponents. Mark postseason (bowl and playoff) games, as well as Classics, properly.
  • For some schools, the media guide is missing games that are documented in several other places.
  • Bring everything into an Excel spreadsheet and convert all opponent names to the standard "Short Name" convention (see above).
  • Add the game location (city) to games that don't have it, along with a game location designator: Home, Away, Neutral, Classic, or Bowl (for my purposes, both bowls and playoffs are considered "Bowls").
  • Verify wins, ties and losses with mathematical calculations to catch incorrectly flagged outcomes, and investigate any inconsistencies.
  • Mark forfeits and vacated games properly, flag overtime games, and add any special notes (such as disputes over certain games).
  • In a separate table, record bowl wins, conference championships, national titles, etc.
  • Add any opponents that don't yet exist to the Schools database table (as of this post, I have 2,398 schools recorded in this DB).
  • Generate a master record of game counts on a year-by-year basis (this helps a LOT in Stages 2 and 3).

- Depending on the amount of work in the first two steps, I can complete Audit Stage 1 in anywhere from 2 to 8 hours per school.
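The outcome check and the master game count from Stage 1 lend themselves to a short script. The post doesn't show the author's actual PHP tools, so this Python sketch is only an illustration under assumptions: the tab-delimited column order (season, date, opponent, location, result, points for, points against, optional note), the note values "Forfeit" and "Vacated", and every function name are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class Game:
    season: int           # season the game counts toward
    date: str             # e.g. "1990-09-01"
    opponent: str         # opponent's Short Name
    location: str         # Home, Away, Neutral, Classic or Bowl
    result: str           # "W", "L" or "T" as recorded in the source
    points_for: int
    points_against: int
    note: str = ""        # hypothetical flag, e.g. "Forfeit", "Vacated", "OT"

# Hypothetical notes whose official result may legitimately differ from the score.
SCORE_EXEMPT_NOTES = {"Forfeit", "Vacated"}

def parse_line(line: str) -> Game:
    """Parse one tab-delimited line: season, date, opponent, location, result, PF, PA[, note]."""
    fields = line.rstrip("\n").split("\t")
    season, date, opponent, location, result, pf, pa = fields[:7]
    note = fields[7] if len(fields) > 7 else ""
    return Game(int(season), date, opponent, location, result.strip().upper(),
                int(pf), int(pa), note.strip())

def result_from_score(game: Game) -> str:
    """Derive W/L/T from the score alone."""
    if game.points_for > game.points_against:
        return "W"
    if game.points_for < game.points_against:
        return "L"
    return "T"

def flag_outcome_mismatches(games: list[Game]) -> list[Game]:
    """Games whose recorded W/L/T disagrees with the score and carry no exempting note."""
    return [g for g in games
            if g.note not in SCORE_EXEMPT_NOTES and g.result != result_from_score(g)]

def record_by_season(games: list[Game]) -> dict[int, Counter]:
    """Recorded W/L/T totals per season, to compare with a media guide's yearly records."""
    records: dict[int, Counter] = defaultdict(Counter)
    for g in games:
        records[g.season][g.result] += 1
    return dict(records)

def game_counts_by_season(games: list[Game]) -> Counter:
    """Master game counter for one school: number of games per season."""
    return Counter(g.season for g in games)
```

An explicit season column is assumed because a bowl played in January counts toward the previous fall's season, so the season can't always be read off the game date.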

Audit Stage 2 - Playing with the data at the conference level:
  • Merge the data for all schools in a single conference. Run a duplicate checker and remove all current duplicates.
  • Import the merged data into a Pre-Audit database.
  • Verify school names against the Master Schools Database table and correct any typos.
  • Generate a report for each school where the number of games per year doesn't match the Master Game Counter table (from Stage 1).
  • Work through those reports and correct the duplicate games. (This is where I catch game date issues, conflicting scores, conflicting wins/losses/ties, conflicting game locations, etc.)
  • Repeat the two steps above until there are NO more discrepancies between the Pre-Audit database table and the Master Game Counter table.
  • Flag conference games accordingly.

- Depending on the number of corrections, this can take anywhere from 4 to 16 hours to complete; on average, it takes me about 6 hours.
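The Stage 2 duplicate check and game-count report can be sketched the same way. In a merged conference file, each conference game shows up twice (once from each school's media guide), so this hypothetical Python sketch treats two rows as the same game when they share a date and the same pair of schools. The `Row` layout and the `(school, season)` keys of the master counter are assumptions, not the author's schema.

```python
from collections import Counter
from typing import NamedTuple

class Row(NamedTuple):
    school: str          # school whose media guide the row came from (Short Name)
    season: int
    date: str
    opponent: str        # opponent's Short Name
    points_for: int
    points_against: int

def game_key(row: Row) -> tuple:
    """Identify a game regardless of which school's record it came from."""
    return (row.date, frozenset((row.school, row.opponent)))

def remove_duplicates(rows: list[Row]) -> list[Row]:
    """Keep the first copy of each game (same date, same pair of schools)."""
    unique: dict[tuple, Row] = {}
    for row in rows:
        unique.setdefault(game_key(row), row)
    return list(unique.values())

def count_games(rows: list[Row]) -> Counter:
    """Games per (school, season), counting each game once for each participant."""
    counts: Counter = Counter()
    for row in rows:
        counts[(row.school, row.season)] += 1
        counts[(row.opponent, row.season)] += 1
    return counts

def count_discrepancies(rows: list[Row], master_counts: dict) -> dict:
    """Return {(school, season): (expected, actual)} wherever the merged data
    disagrees with the Stage 1 master game counter."""
    actual = count_games(rows)
    return {key: (expected, actual[key])
            for key, expected in master_counts.items()
            if actual[key] != expected}
```

A date typo in one school's copy keeps the two rows from collapsing into one game, which pushes both schools above their master count; that is the kind of discrepancy the report is meant to surface.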

Audit Stage 3 - Bring the Stage 2 data into the previously audited records from other conferences:
  • Repeat all but the very last step of Stage 2, this time against every previously audited conference (in this case, all DI-A conferences and the 4 Independents, plus a few schools from DI-AA, DII, DIII, NAIA and NCCAA).
  • Once the conference currently going through Stage 2 shows no conflicting data, re-run the game counter verification against all previously completed conferences and resolve those issues.
  • Once everything is correct, import the updated conference championships, bowl/playoff wins and national championships into the Championship table (this will eventually be merged back into the Scores table to reduce the number of database queries).
  • Launch the updated data into live production on the site itself.

- Much like Audit Stage 2, Stage 3 can take anywhere from 4 to 16 hours to verify.
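For Stage 3, one way to look for conflicting data between a newly audited conference and the previously audited records is to orient every row the same way (alphabetically first school on the left) and compare scores for games that appear in both sets. This is a hypothetical Python sketch with an assumed `Row` layout; it compares only scores, whereas the audit described above also checks dates, locations and outcomes.

```python
from typing import NamedTuple

class Row(NamedTuple):
    school: str          # school whose record the row came from (Short Name)
    season: int
    date: str
    opponent: str        # opponent's Short Name
    points_for: int
    points_against: int

def oriented(row: Row) -> tuple[tuple, tuple]:
    """Return (game key, score) with the alphabetically first school always on the left."""
    if row.school <= row.opponent:
        return (row.date, row.school, row.opponent), (row.points_for, row.points_against)
    return (row.date, row.opponent, row.school), (row.points_against, row.points_for)

def find_conflicts(new_rows: list[Row], audited_rows: list[Row]) -> list[tuple]:
    """List (game key, audited score, new score) for games present in both sets
    whose scores disagree."""
    audited = dict(oriented(row) for row in audited_rows)
    conflicts = []
    for row in new_rows:
        key, score = oriented(row)
        if key in audited and audited[key] != score:
            conflicts.append((key, audited[key], score))
    return conflicts
```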


I've put nearly 19 months and over 3,500 hours of work into researching nearly 135 schools. When I started with the SEC, it took me nearly 3 months just to complete all of the steps above, because it was all done manually, without any scripts to help automate the audit process. As you can imagine, it was quite a headache to do all of the verifications.

People frequently ask where I get this information, since I'm not physically traveling to each school's library to look at their microfiche. 95% of the information comes directly from each school's media guide (I have nearly 180 media guides and archived HTML pages that came directly from the schools themselves). I also use Wikipedia as a reference point, especially when researching a school's name changes over the years, and up to 6 other CFB history websites to cross-reference conflicting information. The key is to know WHAT you're searching for when using Bing or Google. I even have complete histories at my disposal that aren't on any of the other CFB history websites.

Does "Florida Southern University" sound familiar in terms of Collegiate Football? No? Not surprising. They only fielded a team from 1923 to 1933. :)

As I've worked through this process, and since I fancy myself a PHP/MySQL developer, I started writing several scripts to help automate the steps above. The admin tools on GridironHistory.com are powerful and complex under the hood, yet intuitive enough to make it *very* simple for even a basic user to work through them.

I do have to be careful, though. There's a LOT of reading, and there are things you absolutely have to pay attention to, or you can pretty much obliterate an entire conference's data, or quite possibly the entire database itself.

With the tools that are now in place (and that are constantly being adjusted to introduce new algorithms), I can now audit a 12-team conference in as little as 10 days, IF I'm left to work uninterrupted (HAHAHA!). I tend to average approximately 15-20 days per conference.

Most of you won't even care about what it takes to be able to research the history of your favorite school. All you see, at the end of the day, is some pretty damn amazing stats.

If you managed to read through ALL of this... thank you! I hope this gives you some insight into how detailed the process is and how much work it actually takes to deliver the information you see on GridironHistory.com.

I hope you enjoyed learning about our audit process!
--Douglas Hazard
Lead Historian & Researcher for GH