With the Oscars less than 36 hours away, it seems that practically every fan of cinema is making predictions for the winners of each category. No longer an undergraduate, I was feeling a little Oscar withdrawal last night after not being able to take part in the annual Oscar debate with The Tiger.
The one thing I never understood when reading Oscarwatch, TheFilmexperience.net, and other sites is how much weight people place on pre-Oscar awards (I'm talking SAG, Directors Guild, Golden Globes, etc.), when no one has ever taken the time to do some basic data analysis to see which awards are actually the best predictors of Oscar glory.
So, I decided to take a little time and see if I might uncover anything interesting. You'll be surprised, to say the least.
Using the past five years of Oscar data, I used regression analysis to see if I could uncover a model to predict which movie might take home the Best Picture Oscar this year. (Please, please let The Departed win!)
There are a number of predictors I thought would be especially useful for predicting Oscar success.
Looking at every Oscar Best Picture nominee of the past five years, I collected data on the following categories:
The movie's pre-nomination box-office gross
Is the film's director also nominated for Best Director?
How many Oscar nominations does the film have among the best acting and screenplay categories?
Did the film win a Golden Globe for Best Comedy or Drama?
Did the film win the Screen Actors Guild, Directors Guild, Producers Guild, or American Cinema Editors award?
I collected this data for the Oscars from 2002 to 2006. I then ran a regression analysis to see how each variable predicts Oscar success and whether each variable is even a good predictor.
Linear regression Number of obs = 25
F( 9, 15) = 50.97
Prob > F = 0.0000
R-squared = 0.8259
Root MSE = .21547
------------------------------------------------------------------------------
| Robust
Won Best Picture | Coef. Std. Err. t P>|t|
-------------+----------------------------------------------------------------
boxoffice | .0000422 .0004562 0.09 0.927
bestdirector | -.0955407 .0630659 -1.51 0.151
otherOscarnoms | .0055256 .0269864 0.20 0.841
wonGGdrama | .0379726 .0940788 0.40 0.692
wonGGcomedy | .0054738 .1385837 0.04 0.969
wonSAG | .2675211 .1854943 1.44 0.170
wonDG | .8988662 .1319837 6.81 0.000
wonPGA | -.5649189 .2538095 -2.23 0.042
wonACE | .4189429 .2361956 1.77 0.096
_cons | .0135011 .1108538 0.12 0.905
------------------------------------------------------------------------------
So what does this tell us? First off, the model fits the data well: the R-squared of 0.83 means it explains about 83 percent of the variation in which nominees win Best Picture. Secondly, and surprisingly, many of the predictors that we all analyze so relentlessly mean very little for the outcome of the Best Picture Oscar. (P>|t| tells us how strong a predictor the variable is. Because I'm only using five years of data, I'm treating anything greater than .2 as not a strong predictor. A P>|t| value of .1, for instance, means there's roughly a 10 percent chance we'd see a coefficient that large even if the variable actually had no effect.)
Box office success, other Oscar nominations, and the Golden Globes mean practically nothing when we control for other pre-Oscar awards.
Also, you should know that the Robust Coef. column indicates whether the variable is positively or negatively correlated with winning a Best Picture Oscar, and how strong that relationship is. Coefficients further from zero mean the variable matters more. (If all the Robust Coefficients for a film add up to 1.525, it essentially means a 100% chance of winning the Oscar.)
Knowing this, notice anything else unusual? Notice that negative sign next to the variable representing a PGA win? That's right: this analysis says that winning the Producers Guild award is NEGATIVELY correlated with winning the Best Picture Oscar. But how can that be right?
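(Before digging into that puzzle: for anyone who wants to tinker with this kind of model themselves, here's a minimal sketch of how the regression could be set up in Python with statsmodels. The column names and data below are made-up placeholders, not my actual spreadsheet or results; swap in the real nominee data to get output like the table above.)

# A minimal sketch of this kind of regression in Python with statsmodels.
# All data here are random placeholders standing in for the nominee spreadsheet.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 25  # five Best Picture nominees per year, 2002-2006

# One row per nominee; every column except boxoffice is a 0/1 indicator.
nominees = pd.DataFrame({
    "boxoffice":     rng.uniform(10, 300, n),   # pre-nomination gross, $ millions
    "bestdirector":  rng.integers(0, 2, n),     # director also nominated?
    "other_noms":    rng.integers(0, 5, n),     # acting + screenplay nominations
    "won_gg_drama":  rng.integers(0, 2, n),
    "won_gg_comedy": rng.integers(0, 2, n),
    "won_sag":       rng.integers(0, 2, n),
    "won_dg":        rng.integers(0, 2, n),
    "won_pga":       rng.integers(0, 2, n),
    "won_ace":       rng.integers(0, 2, n),
    "won_bestpic":   rng.integers(0, 2, n),     # 1 if it won Best Picture
})

X = sm.add_constant(nominees.drop(columns="won_bestpic"))
y = nominees["won_bestpic"]

# OLS with heteroscedasticity-robust standard errors -- the "Robust Std. Err."
# column in the output above.
model = sm.OLS(y, X).fit(cov_type="HC1")
print(model.summary())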
Just to take another look at this data, let's drop these variables and only use the SAG, DG, PGA and ACE to predict a Best Picture Oscar.
Linear regression Number of obs = 25
F( 4, 20) = 46.88
Prob > F = 0.0000
R-squared = 0.8186
Root MSE = .19048
------------------------------------------------------------------------------
| Robust
Won Best Picture | Coef. Std. Err. t P>|t|
-------------+----------------------------------------------------------------
wonSAG | .2477876 .1471156 1.68 0.108
wonDG | .8849558 .1086208 8.15 0.000
wonPGA | -.5132743 .2193453 -2.34 0.030
wonACE | .3982301 .2138609 1.86 0.077
_cons | -.0353982 .0241972 -1.46 0.159
------------------------------------------------------------------------------
Even when dropping these variables, the model loses almost no predictive power. This regression also confirms the predictive power of the SAG, DG, PGA, and ACE.
That's right: the PGA continues to look like a huge kiss of death for any hopeful Oscar winner. But wait, there's more. The DG variable has a positive sign and a coefficient of .88, which is practically a guarantee that the DGA winner will also take home the Best Picture Oscar.
So, our model to predict Best Picture success is now:
Best Picture points = (.247 × SAG) + (.88 × DG) + (-.51 × PGA) + (.398 × ACE) - .035
When a film wins the SAG, DG, PGA or ACE, substitute 1 for that value. If the film does not win that award, substitute 0.
For example, applying this model to Crash, a film that won the SAG and the ACE, looks like this:
Best Picture points = (1 × .247) + (0 × .88) + (0 × -.51) + (1 × .398) - .035
That works out to .61 points, the highest total of any film nominated for Best Picture that year, so the model predicted Crash as the favorite in 2006.
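(Here's the same arithmetic as a tiny Python helper, using the rounded coefficients from the formula above, so you can score any nominee yourself.)

# Score a nominee with the rounded coefficients from the formula above.
# For each award, pass 1 if the film won it and 0 if it did not.
def best_picture_points(won_sag, won_dg, won_pga, won_ace):
    return (0.247 * won_sag
            + 0.88 * won_dg
            - 0.51 * won_pga
            + 0.398 * won_ace
            - 0.035)

# Crash (2006): won the SAG ensemble award and the ACE, lost the DG and PGA.
print(round(best_picture_points(won_sag=1, won_dg=0, won_pga=0, won_ace=1), 3))  # 0.61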
So how does this model stack up to results from the past five years? Let's take a look. (Again, a 1 represents a win, and a 0 represents not winning that award.)
Movie | Won SAG | Won DG | Won PGA | Won ACE | Points |
Crash | 1 | 0 | 0 | 1 | 0.61 |
Brokeback Mountain | 0 | 1 | 1 | 0 | 0.3369 |
Good Night and Good Luck | 0 | 0 | 0 | 0 | -0.035 |
Capote | 0 | 0 | 0 | 0 | -0.035 |
Munich | 0 | 0 | 0 | 0 | -0.035 |
Million Dollar Baby | 0 | 1 | 0 | 0 | 0.8499 |
The Aviator | 0 | 0 | 1 | 1 | -0.15 |
Finding Neverland | 0 | 0 | 0 | 0 | -0.035 |
Ray | 0 | 0 | 0 | 1 | 0.363 |
Sideways | 1 | 0 | 0 | 0 | 0.212 |
Return of the King | 1 | 1 | 1 | 1 | 0.9819 |
Lost in Translation | 0 | 0 | 0 | 0 | -0.035 |
Master and Commander | 0 | 0 | 0 | 0 | -0.035 |
Mystic River | 0 | 0 | 0 | 0 | -0.035 |
Seabiscuit | 0 | 0 | 0 | 0 | -0.035 |
Chicago | 1 | 1 | 1 | 1 | 0.9819 |
Gangs of New York | 0 | 0 | 0 | 1 | 0.363 |
The Hours | 0 | 0 | 0 | 0 | -0.035 |
The Two Towers | 0 | 0 | 0 | 0 | -0.035 |
The Pianist | 0 | 0 | 0 | 0 | -0.035 |
A Beautiful Mind | 0 | 1 | 0 | 0 | 0.8499 |
Gosford Park | 1 | 0 | 0 | 0 | 0.212 |
In the Bedroom | 0 | 0 | 0 | 0 | -0.035 |
Fellowship of the Ring | 0 | 0 | 0 | 0 | -0.035 |
Moulin Rouge | 0 | 0 | 1 | 1 | -0.15 |
Looking at the predictions for each film, the model correctly predicts the winner of the Best Picture Oscar for every year, 2002-2006. It even predicted Crash over Brokeback Mountain, and it had Million Dollar Baby as the easy winner over The Aviator. This certainly provides more evidence that the PGA is a negative indicator of Oscar success, and the DG is a strong indicator.
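(The backtest itself is easy to script. Here's a sketch that scores the nominees and calls the highest-scoring film in each year the predicted winner; only the 2005 and 2006 races are filled in as an illustration, and since it uses the rounded coefficients, the decimals differ slightly from the table above.)

from collections import defaultdict

# Rounded-coefficient scoring, same as the helper above.
def best_picture_points(sag, dg, pga, ace):
    return 0.247 * sag + 0.88 * dg - 0.51 * pga + 0.398 * ace - 0.035

# (year, film, won SAG, won DG, won PGA, won ACE) -- two sample years only.
nominees = [
    (2006, "Crash",               1, 0, 0, 1),
    (2006, "Brokeback Mountain",  0, 1, 1, 0),
    (2005, "Million Dollar Baby", 0, 1, 0, 0),
    (2005, "The Aviator",         0, 0, 1, 1),
]

by_year = defaultdict(list)
for year, film, sag, dg, pga, ace in nominees:
    by_year[year].append((best_picture_points(sag, dg, pga, ace), film))

# The predicted winner each year is simply the nominee with the most points.
for year in sorted(by_year):
    points, film = max(by_year[year])
    print(year, film, round(points, 3))
# 2005 Million Dollar Baby 0.845
# 2006 Crash 0.61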
Now, let's apply this data to Sunday's nominees.
Movie | Won SAG | Won DG | Won PGA | Won ACE | Points |
Babel | 0 | 0 | 0 | 1 | 0.363 |
Little Miss Sunshine | 1 | 0 | 1 | 0 | -0.301 |
Letters from Iwo Jima | 0 | 0 | 0 | 0 | -0.035 |
The Departed | 0 | 1 | 0 | 1 | 1.2479 |
The Queen | 0 | 0 | 0 | 0 | -0.035 |
Little Miss Sunshine takes a huge hit from winning the PGA, and The Departed gets a huge boost from winning the DG. Babel is still in the mix, although trailing The Departed. The Queen and Letters from Iwo Jima have no chance. (The Departed's score is greater than 1 because of its ACE tie with Babel; it's an unusual situation.)
Does Little Miss Sunshine really have the least chance of winning out of all the nominees? There are certainly some problems with this model. For one, there's only about a 12 percent chance that the errors are homoscedastic (constant variance), so heteroscedasticity may be skewing the estimates. There's also the problem of only using data from the last five years, but I'm not terribly concerned about that, since I was looking for the latest Oscar trends. Based on the past five years of Oscars, though, it certainly looks like a two-horse race between The Departed and Babel.
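(For the statistically inclined: one common way to check that concern is a Breusch-Pagan test. I'm not claiming this is the exact test behind the 12 percent figure; the sketch below just shows how such a test could be run in statsmodels, reusing the fitted model object from the regression sketch earlier.)

# Breusch-Pagan test on the fitted OLS results ("model") from the earlier sketch.
# The null hypothesis is homoscedastic (constant-variance) errors, so a small
# p-value is evidence of heteroscedasticity.
from statsmodels.stats.diagnostic import het_breuschpagan

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)
print("Breusch-Pagan LM p-value:", round(lm_pvalue, 3))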
Hope you've enjoyed this alternate analysis of the Oscar race. If others ask for it, I might perform a similar analysis for the directing and acting categories before Sunday night. Please leave your comments and suggestions.
Hope everyone enjoys the awards, and here's to The Departed taking home the gold!
Addendum: changed the terminology from "percentage" to "points." Sorry for the confusion, and thanks to Tim for the pointer.
Addendum 2: I went back to the 1998 Oscars (Titanic) and ran regressions on that data. Things became a lot more convoluted, and the model's predictive power dropped to about 50%. Still pretty good, but not great. With the new data, the PGA award basically became useless; it doesn't explain Oscar success at all when using data from 1998 to 2006. Box office revenue did become a significant predictor with positive explanatory power, although only barely: every $15 million a movie makes gives it a small, small boost in its chances. Nominations in the other acting and screenwriting categories also became significant and useful; for every screenwriting or acting nomination a film gets, it's about 7 percent more likely to win. I can post the full results tomorrow, if anyone desires.
What I take away from these results is that, in the long run, it's much more difficult to predict Oscar success. However, I still think we can spot trends in human behavior, and the original model spots the trends of the last five years pretty well, so I'm going to stick with it. We'll see what happens tomorrow night.
Labels: Academy Awards, Oscars, predictions