Archive for September, 2009

Probability NFL Favorite Wins, Given Point Spread

I’m continuing my investigation of what we can infer from NFL point spreads.  I wrote a little code to compute the conditional probability of the favorite team winning, given the size of the point spread.  Here are the results.

[Figure: probwin_exact_spread (probability that the favorite wins, plotted against point spread)]

As you can see, the game is essentially a tossup until the point spread is at least a field goal.  Once the spread exceeds a touchdown, the favorite wins nearly four times out of five.  In the rare instance that the point spread exceeds two touchdowns, the favorite is nearly a lock to win, although only a handful of games fall in each of those buckets.

Here’s the raw data used to generate this plot.

    spread       wins      games  probwin %
============================================
      0.00         21         47      44.68
      1.00         70        138      50.72
      1.50         60        123      48.78
      2.00         87        159      54.72
      2.50        126        249      50.60
      3.00        351        587      59.80
      3.50        225        360      62.50
      4.00        101        155      65.16
      4.50         71        120      59.17
      5.00         97        133      72.93
      5.50         99        142      69.72
      6.00        108        168      64.29
      6.50        154        226      68.14
      7.00        177        238      74.37
      7.50         85        123      69.11
      8.00         75         95      78.95
      8.50         70         85      82.35
      9.00         71         91      78.02
      9.50         79         98      80.61
     10.00         66         95      69.47
     10.50         55         67      82.09
     11.00         40         48      83.33
     11.50         23         27      85.19
     12.00         20         25      80.00
     12.50         28         34      82.35
     13.00         20         24      83.33
     13.50         34         44      77.27
     14.00         26         33      78.79
     14.50         14         16      87.50
     15.00          9          9     100.00
     15.50          9          9     100.00
     16.00         14         14     100.00
     16.50          3          3     100.00
     17.00          5          5     100.00
     17.50          6          7      85.71
     18.00          2          2     100.00
     18.50          1          1     100.00
     19.00          1          1     100.00
     19.50          2          2     100.00
     20.00          1          1     100.00
     21.00          1          1     100.00
     22.00          1          1     100.00
     24.00          2          2     100.00
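
A minimal sketch of how a table like this could be computed.  It assumes each game has already been reduced to a (spread, favorite_won) pair; the function name and the input layout are illustrative assumptions, not the script that produced the numbers above.

```python
from collections import defaultdict


def win_probability_by_spread(games):
    """Group games by point spread and compute how often the favorite won.

    games: iterable of (spread, favorite_won) pairs (assumed layout).
    Returns {spread: (wins, games, win_percentage)}, ordered by spread.
    """
    counts = defaultdict(lambda: [0, 0])  # spread -> [wins, games]
    for spread, favorite_won in games:
        counts[spread][1] += 1
        if favorite_won:
            counts[spread][0] += 1
    return {
        spread: (wins, total, 100.0 * wins / total)
        for spread, (wins, total) in sorted(counts.items())
    }
```

Each row of the result corresponds to one line of the table: the spread, the favorite's wins, the number of games, and the win percentage.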


Simple Backtesting of Football Pickem Strategy

I was curious to see how my simple strategy for the weighted fantasy football pickem would have done in previous years.  I found an archive of NFL point spreads, over/under lines, and game results on goldsheet.com.  I don’t know how accurate the data are, but I wrote a little Python script to parse it.  The formatting of the data is pretty rough, so I’ve put a CSV version up on Google Docs.

Here’s a tabular summary of the results.  The key column is Percent, which shows the percentage of available points (MaxPosib) that my strategy got in each year.

    Year      Min      Max      Avg    Total   MaxPosib    Percent
==================================================================
    1993       27      103    60.94     1097       1466      74.83
    1994       46       95    64.82     1102       1500      73.47
    1995       50      114    81.18     1380       1747      78.99
    1996       51      119    82.82     1408       1748      80.55
    1997       52      114    82.24     1398       1745      80.11
    1998       56      120    86.94     1478       1734      85.24
    1999       65      115    91.06     1548       1959      79.02
    2000       62      113    90.53     1539       1959      78.56
    2001       43      117    90.29     1535       1960      78.32
    2002       62      136    95.59     1625       2088      77.83
    2003       72      134    98.53     1675       2088      80.22
    2004       67      129    95.47     1623       2088      77.73
    2005       77      127   101.47     1725       2088      82.61
    2006       65      134    89.59     1523       2091      72.84
    2007       56      136   103.41     1758       2091      84.07
------------------------------------------------------------------
 average    56.73   120.40    87.66  1494.27    1890.13      78.96
std.dev.    12.20    11.84    11.67   189.42     211.33       3.39
     min    27.00    95.00    60.94  1097.00    1466.00      72.84
     max    77.00   136.00   103.41  1758.00    2091.00      85.24

So I expect to score about 79% of the total points available this year (the yearly figures range from 73% to 85%) if I use my strategy of assigning weights to the favored team, ordered by point spread.
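
Here is a rough sketch of the scoring behind the table, assuming each week is a list of (spread, favorite_won) pairs and that ties on spread are broken arbitrarily.  The function names and the input layout are assumptions for illustration, not the original backtesting script.

```python
def score_week(games):
    """Score one week of always picking the favorite, weighted by spread.

    games: list of (spread, favorite_won) pairs (assumed layout).
    The smallest spread gets weight 1 and the largest gets weight len(games).
    Returns (points_earned, max_possible_points).
    """
    ranked = sorted(games, key=lambda game: game[0])
    points = sum(
        weight
        for weight, (_, favorite_won) in enumerate(ranked, start=1)
        if favorite_won
    )
    max_points = len(ranked) * (len(ranked) + 1) // 2
    return points, max_points


def season_percent(weeks):
    """Percentage of available points earned over a list of weeks."""
    totals = [score_week(week) for week in weeks]
    earned = sum(points for points, _ in totals)
    possible = sum(max_points for _, max_points in totals)
    return 100.0 * earned / possible
```

A perfect 16-game week is worth 1 + 2 + ... + 16 = 136 points, which matches the largest value in the Max column.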

The next step is obviously to try some other strategies and see what the results are.  First I want to see how poorly a completely random strategy does.


Automated Fantasy Pro Football Pickem Weights

I’m participating in the Yahoo! Sports Fantasy Pro Football Pickem with some family and friends.   I don’t regularly play fantasy sports, but I was coaxed into donating money to the pool during a family trip this summer.  I blame the beer and port.

For the uninitiated, each week you must pick a winner in all 16 games, assigning a confidence weight from 1 to 16 to each game.  If your pick wins, then you are awarded points equal to the confidence weight that you assigned to that game.

Now that my money’s in, I want to make a solid showing.  My strategy is to leverage the gambling lines, mapping the point spreads and over/under lines to my picks and weights.

I threw together a little Python script to parse the Yahoo odds web page.  For each game, the script averages the point spreads and over/under lines.  The favored team is selected, and the picks are ranked according to point spread, with wider spreads getting a larger weight.  Ties on point spread are broken by the over/under line, with a larger over/under line mapping to a larger weight.
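
The ranking step can be sketched as follows.  This is only an illustration: it assumes the odds have already been parsed and averaged into (favorite, underdog, spread, over_under) tuples, and the function name is made up.

```python
def assign_weights(games):
    """Rank picks by spread size, breaking ties with the over/under line.

    games: list of (favorite, underdog, spread, over_under) tuples (assumed
    layout).  The spread may be given as a negative number for the favorite,
    so only its absolute value is used.
    Returns (weight, favorite, underdog) tuples, largest weight first.
    """
    ranked = sorted(games, key=lambda game: (abs(game[2]), game[3]))
    weighted = [
        (weight, favorite, underdog)
        for weight, (favorite, underdog, _, _) in enumerate(ranked, start=1)
    ]
    return weighted[::-1]
```

Sorting on the pair (spread size, over/under) gives the widest spread the largest weight, and among equal spreads the larger over/under line wins.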

Now I’m wondering how to improve this simple algorithm.  I might break ties by giving a smaller over/under a higher weight.  The reasoning would be that games with smaller over/under lines will have less variance.  Surely there are other ways to improve the rankings.

Here’s the table that my little script prints out.

weight | winner                         loser                     spread    o/u
================================================================================
    16 | Minnesota Vikings         over Detroit Lions              -9.83  45.17
    15 | Washington Redskins       over St. Louis Rams             -9.58  37.00
    14 | Green Bay Packers         over Cincinnati Bengals         -9.08  42.00
    13 | Tennessee Titans          over Houston Texans             -6.50  40.92
    12 | Atlanta Falcons           over Carolina Panthers          -6.08  42.50
    11 | Buffalo Bills             over Tampa Bay Buccaneers       -4.42  42.00
    10 | New England Patriots      over New York Jets              -3.42  46.17
     9 | Jacksonville Jaguars      over Arizona Cardinals          -3.08  42.50
     8 | Indianapolis Colts        over Miami Dolphins             -3.08  42.17
     7 | San Diego Chargers        over Baltimore Ravens           -3.00  40.33
     6 | Denver Broncos            over Cleveland Browns           -3.00  38.83
     5 | Kansas City Chiefs        over Oakland Raiders            -3.00  38.50
     4 | Pittsburgh Steelers       over Chicago Bears              -2.92  37.50
     3 | Dallas Cowboys            over New York Giants            -2.75  45.00
     2 | San Francisco 49ers       over Seattle Seahawks           -1.10  39.58
     1 | New Orleans Saints        over Philadelphia Eagles        -1.00  46.08


Google Code Jam: I advanced to Round 2!

Last night I participated in Round 1 of the 2009 Google Code Jam.  I thought all of the questions were difficult, but I squeaked by and advanced to Round 2.

I first worked on the Multi-base happiness problem.  I had never heard of happy numbers before, and wasn’t sure how to determine that a number was unhappy.  How many times should I iterate summing the squares before stopping?  Fortunately Wikipedia told me the remarkable property that the iterations either reduce to 1 or enter a cycle.  The other key to this problem is memoization.
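
A minimal sketch of the happiness test with cycle detection and memoization.  The function names and the module-level cache are illustrative choices, not the code submitted in the round.

```python
_happy_cache = {}  # (number, base) -> True/False


def digit_square_sum(n, base):
    """Sum of the squares of the digits of n written in the given base."""
    total = 0
    while n:
        n, digit = divmod(n, base)
        total += digit * digit
    return total


def is_happy(n, base):
    """True if repeatedly summing squared digits of n in `base` reaches 1."""
    path = []
    seen = set()
    while n != 1 and n not in seen and (n, base) not in _happy_cache:
        seen.add(n)
        path.append(n)
        n = digit_square_sum(n, base)
    if n == 1:
        result = True
    elif (n, base) in _happy_cache:
        result = _happy_cache[(n, base)]
    else:
        result = False  # n repeated on this path, so we are in a cycle
    for visited in path:
        _happy_cache[(visited, base)] = result
    return result


def smallest_multibase_happy(bases):
    """Smallest integer greater than 1 that is happy in every given base."""
    n = 2
    while not all(is_happy(n, base) for base in bases):
        n += 1
    return n
```

Every number visited on the way to 1 (or into a cycle) shares the same answer, so caching the whole path is what keeps repeated queries cheap.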

I skipped Crossing the Road and tackled the Collecting Cards problem.  I first coded up a simulation.  This failed to generate results with sufficient precision.  I actually succeeded in coding up an alternate solution, but I didn’t submit it in time.  My solution walks down the expectation calculation, passing along the conditional probability of reaching each point.  This works fine, but I feel there must exist a super clean solution for this problem.
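
For what it’s worth, here is one possible exact approach, sketched under my reading of the problem: there are C distinct cards, every pack holds N distinct cards drawn uniformly at random, and the goal is the expected number of packs needed to own all C.  It is a dynamic program over the number of distinct cards already owned, not the solution described above, and the names are made up.

```python
from math import comb


def expected_packs(num_cards, pack_size):
    """Expected number of packs to collect all cards (assumed problem setup).

    expected[have] is the expected number of further packs needed when
    `have` distinct cards are already owned.
    """
    total = comb(num_cards, pack_size)  # number of equally likely packs
    expected = [0.0] * (num_cards + 1)
    for have in range(num_cards - 1, -1, -1):
        missing = num_cards - have
        # Probability that a pack contains no new card at all.
        p_none = comb(have, pack_size) / total
        acc = 1.0  # the pack we are about to open
        for new in range(1, min(pack_size, missing) + 1):
            p_new = comb(missing, new) * comb(have, pack_size - new) / total
            acc += p_new * expected[have + new]
        # Solve E = acc + p_none * E for E.
        expected[have] = acc / (1.0 - p_none)
    return expected[0]
```

With one card per pack this reduces to the classic coupon collector problem, for example 3 packs on average when there are 2 cards.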

In any case, I was fortunate enough to pass on to the next round, where I’ll surely be destroyed.


Climbing Two Fourteeners in Colorado

Yesterday I hiked to the 14,271 foot peak of Mt Quandary.  Quandary Peak was my second fourteener.  A couple of months ago, on July 4th, I summited Grays Peak.

Hiking reference guides declare that both hikes are easy by fourteener standards.  The key phrase there is “fourteener standards,” because both hikes are still difficult.  Granted, I did both hikes just one day after flying in from New York, so I wasn’t acclimated to the altitude.  Even so, with a pack on your back and loose rocks under your feet, it’s always tough going past 13,000 feet.

Catching my breath on Mt Quandary's peak


The hike provides a lot of metaphors for business and software development.

  1. Ultimately, you only reach the top by repeatedly putting one foot in front of the other.  To ensure success, break down the journey into a series of short-term, easy-to-attain goals.  That’s how you get things done.
  2. Experience helps.  On my first climb, even though I had read some trip reports, I was surprised at how difficult it was to keep moving at high altitude.  I had to take breaks every 10 feet.  I wondered if the summit was worth the effort.  On the second hike, I expected to go slowly, and I knew the summit was an ample reward.
  3. Although the ascent and summit contain all the glory, the descent is both more difficult and equally important.  It’s exhilarating to release a new product or major feature, but you have to be prepared to maintain it and complete the life cycle.
  4. If you need to remember part of your journey, then you have to document it.  On both descents there were several points where my recall of key landmarks was egregiously wrong.  For example, I’d think “wow, I really thought the tree line was much closer to this big white rock.  I can’t even see the tree line from here!”  Fortunately the trails were well marked, so this wasn’t a danger.  In business, you must document key problems and decisions.  Otherwise your mind will surely blur them out over time.  If the trail isn’t well marked, you’ll get lost.
  5. You can find excitement in every stage of the journey, from shopping for supplies, to standing at the peak, to tasting a celebratory beer.


Analysis of Google Code Jam Qualifier Results

I wrote a quick Python script to pull down all of the results from the qualifying rounds of Google Code Jam from 2008 and 2009.  I have put the rank and time of each submitted solution for all participants in two Google Spreadsheets.

I also put a collection of summary statistics in this Google Spreadsheet.

Here are some conclusions drawn from these data:

  1. The number of participants increased by 1,449 people in 2009, a 20% increase.
  2. There were 2,572 people who participated both years.  This means 36% of the 2008 participants came back for more.
  3. The 2008 Qualifier had more difficult problems.  After normalizing the point totals (2008 had a max total of 75 points but 2009 had a max of 99 points), the average score was 15% higher in 2009.
  4. Problem C in 2008 was extremely hard.  Only 14% of the participants solved Problem C with the small data set, and only 9% solved Problem C with the large data set.
  5. The large data set for Problem C in 2009 was hard for many people.  Only 36% of people solved it.  Furthermore, of the people who solved the small data set, only 57% went on to solve the large data set.
  6. People worked roughly the same amount of time both years.  The 2009 times are all slightly larger, but the round was also extended by 2 hours because of technical problems early in the round.
  7. On average, for each participant, there is about a 3.5 hour gap between the submission time of the first solution and the submission time of the last solution.  Of course some people (like me) probably chose to tackle the problems whenever they found free moments during their day.
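
Two of these statistics can be sketched in a few lines.  The input layouts (lists of participant ids, and (participant_id, seconds) submission records) and the function names are assumptions for illustration; this is not the script used to build the spreadsheets.

```python
from collections import defaultdict


def returning_share(ids_first_year, ids_second_year):
    """Percentage of first-year participants who also appear in year two."""
    first = set(ids_first_year)
    return 100.0 * len(first & set(ids_second_year)) / len(first)


def mean_submission_gap(submissions):
    """Average gap between each participant's first and last submission.

    submissions: iterable of (participant_id, seconds_since_round_start).
    A participant with a single submission contributes a gap of zero.
    """
    times = defaultdict(list)
    for participant_id, seconds in submissions:
        times[participant_id].append(seconds)
    gaps = [max(values) - min(values) for values in times.values()]
    return sum(gaps) / len(gaps)
```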

If people are interested, I’ll post my code and all the data somewhere.


Google Code Jam 2009 Qualifier

I finished all 3 problems in this year’s Google Code Jam qualifier.  I found them to be much easier than last year’s qualifier.  (Although a friend told me about last year’s qualifier less than two hours before it ended, so I was rushed.)

Last year I wrote my solutions in Haskell because I was learning the language.  This year I wasn’t so adventurous; I wrote my solutions using Python.

For the Alien Language challenge, I encoded the lexicon using a tree built with Python dictionaries.  I imagine this could be done using a regex, but I haven’t thought that through.
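
A minimal sketch of the dictionary-tree idea, assuming (as I recall the problem) that all words are distinct and the same length, and that a pattern is a sequence of single letters or parenthesized groups such as (ab).  The function names are made up.

```python
def build_trie(words):
    """Build a tree of nested dictionaries, one level per letter."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
    return root


def parse_pattern(pattern):
    """Turn 'a(bc)d' into [{'a'}, {'b', 'c'}, {'d'}]."""
    tokens = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "(":
            close = pattern.index(")", i)
            tokens.append(set(pattern[i + 1:close]))
            i = close + 1
        else:
            tokens.append({pattern[i]})
            i += 1
    return tokens


def count_matches(trie, pattern):
    """Count the words in the trie that are consistent with the pattern."""
    nodes = [trie]
    for allowed in parse_pattern(pattern):
        # Keep only the branches whose next letter is allowed here.
        nodes = [node[letter] for node in nodes
                 for letter in allowed if letter in node]
    return len(nodes)
```

Because every word has the same length as the pattern, each node that survives to the last level corresponds to exactly one matching word.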

For the Watersheds challenge, I simply built two 2D arrays: one to hold the topography and the other to hold the labels.  From each coordinate on the map, I followed the flow downhill until I reached a sink or reached a labeled coordinate.  This memoization ensures that you only visit each node once.
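
The flow-following idea can be sketched like this.  Two details are assumptions based on my memory of the problem statement rather than anything stated above: water flows to the strictly lowest neighbor with ties broken in the order North, West, East, South, and basins are labeled a, b, c, ... in the order they are first reached when scanning row by row.

```python
from string import ascii_lowercase


def label_basins(heights):
    """Label each cell with the letter of the sink its water drains to."""
    rows, cols = len(heights), len(heights[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = iter(ascii_lowercase)  # assumes at most 26 basins

    def downhill(r, c):
        """Return the neighbor the water flows to, or None at a sink."""
        best, best_height = None, heights[r][c]
        # North, West, East, South: strict '<' keeps the earliest on ties.
        for dr, dc in ((-1, 0), (0, -1), (0, 1), (1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and heights[nr][nc] < best_height:
                best, best_height = (nr, nc), heights[nr][nc]
        return best

    for r in range(rows):
        for c in range(cols):
            path = []
            cur = (r, c)
            while labels[cur[0]][cur[1]] is None:
                path.append(cur)
                nxt = downhill(*cur)
                if nxt is None:  # cur is a sink: start a new basin
                    labels[cur[0]][cur[1]] = next(next_label)
                    break
                cur = nxt
            label = labels[cur[0]][cur[1]]
            for pr, pc in path:  # label everything walked through
                labels[pr][pc] = label
    return labels
```

Labeling the whole path after each walk is what stops later walks early, so no cell is followed downhill more than once.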

For the Welcome to Code Jam challenge, I memoized a mapping from (string_index, substring) to the number of times that substring can be completed starting at the index.  Again, this was straightforward.
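
A minimal sketch of the memoized count.  It keys the cache on a pair of indexes (position in the text, position in the phrase) rather than on the substring itself, and it keeps only the last four digits, which is what I remember the problem asking for; the function name and defaults are assumptions.

```python
from functools import lru_cache

PHRASE = "welcome to code jam"


def count_subsequences(text, phrase=PHRASE, modulus=10000):
    """Count the ways `phrase` appears as a subsequence of `text`."""

    @lru_cache(maxsize=None)
    def ways(i, j):
        """Ways to complete phrase[j:] using only characters of text[i:]."""
        if j == len(phrase):
            return 1
        if i == len(text):
            return 0
        total = ways(i + 1, j)  # skip text[i]
        if text[i] == phrase[j]:
            total += ways(i + 1, j + 1)  # use text[i] for phrase[j]
        return total % modulus

    return ways(0, 0) % modulus
```

The recursion goes one level deeper per character of text, so for very long inputs a bottom-up table would be the safer choice.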

I’m looking forward to seeing how other people solved each challenge.  I wonder how 16-year-old Neal Wu made such quick work of the problems, solving all three in under 26 minutes.  That’s amazing.

