Monday, February 26, 2007

4 Different Blog Posts on "your data"

Here are some interesting stories in the news today.


http://www.bloghash.com/2007/02/smooth-upgrade-to-wordpress-211-in-8-steps/ upgrading your blog version without losing old content

http://amix.dk/blog/viewEntry/19092 using Gmail for some backups

http://blog.totalkiss.com/2007/02/youtube_crashes.html YouTube & Google "crash"

http://thecontentwrangler.com/article/web_based_content_management_is_your_data_saasafe/ SaaS content safety

Saturday, February 24, 2007

Predicting The Best Picture Winner



With the Oscars being less than 36 hours away, it seems that practically every fan of cinema is making their predictions for the winners of each category. No longer being an undergraduate, I was experiencing a little Oscar withdrawal last night after not being able to take part in the annual Oscar debate with The Tiger.

The one thing I never understood when reading Oscarwatch, TheFilmexperience.net, and other sites is how people place so much weight on pre-Oscar awards (I'm talking SAG, Directors Guild, Golden Globes, etc.), yet no one has taken the time to do some basic data analysis to see which awards are the best predictors of Oscar glory.

So, I decided to take a little time and see if I might uncover anything interesting. You'll be surprised, to say the least.

Using the past five years of Oscar data, I ran a regression analysis to see if I could find a model that predicts which movie will take home the Best Picture Oscar this year. (Please, please let The Departed win!)

There are a number of predictors I thought would be especially valid for predicting Oscar success.

Looking at every Oscar Best Picture nominee of the past five years, I collected data on the following categories:

The movie's pre-nomination box-office gross
Is the film's director also nominated for Best Director?
How many Oscar nominations does the film have among the best acting and screenplay categories?
Did the film win a Golden Globe for Best Comedy or Drama?
Did the film win the Screen Actors Guild (SAG), Directors Guild (DG), Producers Guild (PGA) or American Cinema Editors (ACE) award?

I collected this data for Oscars from 2002 to 2006. I then ran a regression analysis to see how each variable predicts Oscar success and if the variable is even a good predictor.

Linear regression              Number of obs =      25
                               F(  9,    15) =   50.97
                               Prob > F      =  0.0000
                               R-squared     =  0.8259
                               Root MSE      =  .21547

-------------------------------------------------------
                 |             Robust
    wonbestpic~e |     Coef.  Std. Err.       t   P>|t|
-----------------+-------------------------------------
       boxoffice |  .0000422   .0004562    0.09   0.927
    bestdirector | -.0955407   .0630659   -1.51   0.151
 otherOscar noms |  .0055256   .0269864    0.20   0.841
      wonGGdrama |  .0379726   .0940788    0.40   0.692
     wonGGcomedy |  .0054738   .1385837    0.04   0.969
          wonSAG |  .2675211   .1854943    1.44   0.170
           wonDG |  .8988662   .1319837    6.81   0.000
          wonPGA | -.5649189   .2538095   -2.23   0.042
         won ACE |  .4189429   .2361956    1.77   0.096
           _cons |  .0135011   .1108538    0.12   0.905
-------------------------------------------------------


So what does this tell us? First off, the R-squared of 0.8259 means this model explains about 83 percent of the variation in which nominees won Best Picture. Secondly and surprisingly, many of the predictors that we all analyze so relentlessly mean very little for the outcome of the Best Picture Oscar. (P>|t| tells us how much evidence there is that a variable matters. Because I'm only using five years of data, I'm ruling that anything greater than .2 is not a strong predictor. A P>|t| value of .1, for instance, means that if the variable truly had no effect, we would still see a coefficient at least this large about 10 percent of the time.)

Box office success, other Oscar nominations, and the Golden Globes mean practically nothing when we control for other pre-Oscar awards.

Also, you should know that Coef. shows whether the variable is positively or negatively associated with winning a Best Picture Oscar, and how large that association is. ("Robust" describes the standard errors, not the coefficients.) Because every award variable is either 0 or 1, a larger coefficient means that winning the award moves a film's score more. (If all the coefficients for a film add to 1.525, it essentially means a 100% chance of winning the Oscar.)

Knowing this, notice anything else unusual? Notice that negative sign next to the variable representing a PGA win? That's right, this analysis says that winning the Producers Guild award is NEGATIVELY correlated with winning the Best Picture Oscar. But how can that be right?

Just to take another look at this data, let's drop the other variables and only use the SAG, DG, PGA and ACE to predict a Best Picture Oscar.

Linear regression              Number of obs =      25
                               F(  4,    20) =   46.88
                               Prob > F      =  0.0000
                               R-squared     =  0.8186
                               Root MSE      =  .19048

-------------------------------------------------------
                 |             Robust
Won Best Picture |     Coef.  Std. Err.       t   P>|t|
-----------------+-------------------------------------
          wonSAG |  .2477876   .1471156    1.68   0.108
           wonDG |  .8849558   .1086208    8.15   0.000
          wonPGA | -.5132743   .2193453   -2.34   0.030
         won ACE |  .3982301   .2138609    1.86   0.077
           _cons | -.0353982   .0241972   -1.46   0.159
-------------------------------------------------------


Even after dropping these variables, the model loses almost no explanatory power: R-squared only slips from 0.8259 to 0.8186. This regression also confirms the predictive power of the SAG, DG, PGA, and ACE.
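For anyone who wants to check the numbers, here is a minimal sketch that refits the four-award model with ordinary least squares in plain Python. It assumes the award flags and winners are exactly as listed in the results table further down this post. The helper names (`solve`, `fit_ols`, `r_squared`) are my own, and it only reproduces the coefficients and R-squared, not the robust standard errors.

```python
from fractions import Fraction

# (SAG, DG, PGA, ACE, won Best Picture), transcribed from the results table in this post.
AWARD_ROWS = [
    (1, 0, 0, 1, 1),  # Crash
    (0, 1, 1, 0, 0),  # Brokeback Mountain
    (0, 1, 0, 0, 1),  # Million Dollar Baby
    (0, 0, 1, 1, 0),  # The Aviator
    (0, 0, 0, 1, 0),  # Ray
    (1, 0, 0, 0, 0),  # Sideways
    (1, 1, 1, 1, 1),  # Return of the King
    (1, 1, 1, 1, 1),  # Chicago
    (0, 0, 0, 1, 0),  # Gangs of New York
    (0, 1, 0, 0, 1),  # A Beautiful Mind
    (1, 0, 0, 0, 0),  # Gosford Park
    (0, 0, 1, 1, 0),  # Moulin Rouge
]
NO_AWARD_ROWS = [(0, 0, 0, 0, 0)] * 13  # the 13 nominees that won none of the four awards
ROWS = AWARD_ROWS + NO_AWARD_ROWS


def solve(a, b):
    """Solve the linear system a x = b exactly with Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def fit_ols(rows):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    xs = [(1,) + row[:4] for row in rows]  # the leading 1 is the intercept
    ys = [row[4] for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, coefs):
    """Share of the variation in the 0/1 outcome explained by the fitted values."""
    ys = [row[4] for row in rows]
    mean = Fraction(sum(ys), len(ys))
    fitted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row[:4])) for row in rows]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    tss = sum((y - mean) ** 2 for y in ys)
    return 1 - rss / tss


coefs = fit_ols(ROWS)
for name, c in zip(["_cons", "wonSAG", "wonDG", "wonPGA", "wonACE"], coefs):
    print(f"{name:8s} {float(c): .7f}")
print(f"R-squared {float(r_squared(ROWS, coefs)):.4f}")
```

Using exact fractions instead of floats is just a convenience for a data set this small; a statistics package would be the sensible choice for anything bigger.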

That's right, the PGA continues to look like a huge kiss of death for any hopeful Oscar winner. But wait, there's more. The DG variable has a positive sign and a coefficient of .88 - that's practically a guarantee that the winning film of the DG will win the Best Picture Oscar.

So, our model to predict best picture success is now:

Best Picture Winner Odds = (SAG x .247) + (DG x .88) + (PGA x -.51) + (ACE x .398) - .035
When a film wins the SAG, DG, PGA or ACE, substitute 1 for that value. If the film does not win that award, substitute 0.

For example, when applying this model to Crash, a film that won the SAG and ACE, it looks like this:

Best Picture Winner Odds = (1 x .247) + (0 x .88) + (0 x -.51) + (1 x .398) - .035. This leads to:

Best Picture Winner Odds = .61. This is the highest point total achieved by a film nominated for Best Picture that year, so the model predicted Crash as the favorite in 2006.
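The same arithmetic can be written as a tiny function. This is only a sketch using the rounded coefficients from the formula above; the function name `best_picture_points` and the dictionary layout are my own choices for illustration.

```python
# Rounded coefficients from the formula in this post.
COEFS = {"SAG": 0.247, "DG": 0.88, "PGA": -0.51, "ACE": 0.398}
INTERCEPT = -0.035


def best_picture_points(wins):
    """wins maps an award name to 1 (won) or 0 (did not win)."""
    return INTERCEPT + sum(COEFS[award] * wins.get(award, 0) for award in COEFS)


crash = {"SAG": 1, "DG": 0, "PGA": 0, "ACE": 1}
print(round(best_picture_points(crash), 3))  # 0.61
```

A film with no wins at all simply gets the intercept, -0.035.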

So how does this model stack up to results from the past five years? Let's take a look. (Again, a 1 represents a win, and a 0 represents not winning that award.)

Movie                     Won SAG  Won DG  Won PGA  Ace  Points
Crash                     1        0       0        1     0.61
Brokeback Mountain        0        1       1        0     0.3369
Good Night and Good Luck  0        0       0        0    -0.035
Capote                    0        0       0        0    -0.035
Munich                    0        0       0        0    -0.035
Million Dollar Baby       0        1       0        0     0.8499
The Aviator               0        0       1        1    -0.15
Finding Neverland         0        0       0        0    -0.035
Ray                       0        0       0        1     0.363
Sideways                  1        0       0        0     0.212
Return of the King        1        1       1        1     0.9819
Lost in Translation       0        0       0        0    -0.035
Master and Commander      0        0       0        0    -0.035
Mystic River              0        0       0        0    -0.035
Seabiscuit                0        0       0        0    -0.035
Chicago                   1        1       1        1     0.9819
Gangs of New York         0        0       0        1     0.363
The Hours                 0        0       0        0    -0.035
The Two Towers            0        0       0        0    -0.035
The Pianist               0        0       0        0    -0.035
A Beautiful Mind          0        1       0        0     0.8499
Gosford Park              1        0       0        0     0.212
In the Bedroom            0        0       0        0    -0.035
Fellowship of the Ring    0        0       0        0    -0.035
Moulin Rouge              0        0       1        1    -0.15




Looking at the predictions for each film, the model correctly predicts the winner of the Best Picture Oscar for every year, 2002-2006. It even predicted Crash over Brokeback Mountain, and it had Million Dollar Baby as the easy winner over The Aviator. This certainly provides more evidence that the PGA is a negative indicator of Oscar success, and the DG is a strong indicator.
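Here is a rough sketch of that check in Python. It uses the full-precision coefficients from the second regression and picks the highest-scoring nominee in each year. Splitting the 25 films into blocks of five per ceremony year is my assumption (the table has no year column), and the names `NOMINEES`, `ACTUAL_WINNERS`, `points` and `predicted_winner` are made up for the example.

```python
# Full-precision coefficients from the second regression table.
COEF = {"SAG": 0.2477876, "DG": 0.8849558, "PGA": -0.5132743, "ACE": 0.3982301}
CONS = -0.0353982

# Award flags in (SAG, DG, PGA, ACE) order, grouped by ceremony year.
# The grouping into years is an assumption; the flags come from the table above.
NOMINEES = {
    2006: {"Crash": (1, 0, 0, 1), "Brokeback Mountain": (0, 1, 1, 0),
           "Good Night and Good Luck": (0, 0, 0, 0), "Capote": (0, 0, 0, 0),
           "Munich": (0, 0, 0, 0)},
    2005: {"Million Dollar Baby": (0, 1, 0, 0), "The Aviator": (0, 0, 1, 1),
           "Finding Neverland": (0, 0, 0, 0), "Ray": (0, 0, 0, 1),
           "Sideways": (1, 0, 0, 0)},
    2004: {"Return of the King": (1, 1, 1, 1), "Lost in Translation": (0, 0, 0, 0),
           "Master and Commander": (0, 0, 0, 0), "Mystic River": (0, 0, 0, 0),
           "Seabiscuit": (0, 0, 0, 0)},
    2003: {"Chicago": (1, 1, 1, 1), "Gangs of New York": (0, 0, 0, 1),
           "The Hours": (0, 0, 0, 0), "The Two Towers": (0, 0, 0, 0),
           "The Pianist": (0, 0, 0, 0)},
    2002: {"A Beautiful Mind": (0, 1, 0, 0), "Gosford Park": (1, 0, 0, 0),
           "In the Bedroom": (0, 0, 0, 0), "Fellowship of the Ring": (0, 0, 0, 0),
           "Moulin Rouge": (0, 0, 1, 1)},
}
ACTUAL_WINNERS = {
    2006: "Crash",
    2005: "Million Dollar Baby",
    2004: "Return of the King",
    2003: "Chicago",
    2002: "A Beautiful Mind",
}


def points(flags):
    """Model score for one film, given its four 0/1 award flags."""
    return CONS + sum(c * f for c, f in zip(COEF.values(), flags))


def predicted_winner(films):
    """Title with the highest score among one year's nominees."""
    return max(films, key=lambda title: points(films[title]))


for year, films in NOMINEES.items():
    pick = predicted_winner(films)
    print(year, pick, round(points(films[pick]), 4), pick == ACTUAL_WINNERS[year])
```

Keep in mind this is an in-sample check: these are the same 25 films the model was fitted on, so a perfect record here says less than it would on years the model has never seen. The scores also land within about 0.001 of the Points column rather than matching it digit for digit.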


Now, let's apply this data to Sunday's nominees.

Movie                     Won SAG  Won DG  Won PGA  Ace  Points
Babel                     0        0       0        1     0.363
Little Miss Sunshine      1        0       1        0    -0.301
Letters from Iwo Jima     0        0       0        0    -0.035
The Departed              0        1       0        1     1.2479
The Queen                 0        0       0        0    -0.035

Little Miss Sunshine takes a huge hit from winning the PGA, and The Departed gets a huge boost from winning the DG. Babel is still in the mix, although trailing The Departed. The Queen and Letters from Iwo Jima have no chance. (The Departed's score is greater than 1 due to its ACE tie with Babel. It's an unusual situation.)
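To make the ranking explicit, here is a short sketch that scores this year's five nominees and sorts them. It assumes the award flags shown in the table above and the full-precision coefficients from the second regression; the names `NOMINEES_2007`, `points` and `ranking` are just for illustration.

```python
# Coefficients in (SAG, DG, PGA, ACE) order, from the second regression table.
COEF = (0.2477876, 0.8849558, -0.5132743, 0.3982301)
CONS = -0.0353982

# Award flags in (SAG, DG, PGA, ACE) order, as listed in the table above.
NOMINEES_2007 = {
    "Babel": (0, 0, 0, 1),
    "Little Miss Sunshine": (1, 0, 1, 0),
    "Letters from Iwo Jima": (0, 0, 0, 0),
    "The Departed": (0, 1, 0, 1),
    "The Queen": (0, 0, 0, 0),
}


def points(flags):
    """Model score for one film, given its four 0/1 award flags."""
    return CONS + sum(c * f for c, f in zip(COEF, flags))


# Highest score first; films with equal scores keep their table order.
ranking = sorted(NOMINEES_2007, key=lambda t: points(NOMINEES_2007[t]), reverse=True)
for title in ranking:
    print(f"{title:22s} {points(NOMINEES_2007[title]): .3f}")
```

The Departed comes out on top, Babel second, and Little Miss Sunshine last.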

Does Little Miss Sunshine really have the least chance of winning out of all of the nominees? There are certainly some problems with this model. There's a 12 percent chance of heteroscedasticity, which can lead to problems. There's also the problem of only using data from the last five years, but I'm not terribly concerned about this as I was looking for the latest Oscar trends. But based on the past five years of Oscars, it certainly looks like it's a two-horse race between The Departed and Babel.

Hope you've enjoyed this alternate analysis of the Oscar race. If others ask for it, I might perform a similar analysis for the directing and acting categories before Sunday night. Please leave your comments and suggestions.

Hope everyone enjoys the awards, and here's to The Departed taking home the gold!


Addendum: changed the terminology from "percentage" to "points." Sorry for the confusion, and thanks to Tim for the pointer.


Addendum 2: I went back to the 1998 Oscars (Titanic) and ran regressions on that data. Things became a lot more convoluted, and the model's predictive power dropped down to about 50%. Still pretty good, but not great. With the new data, the PGA award basically became useless, as it doesn't explain Oscar success at all when using data from 1998 to 2006. Box office revenues did become significant, with positive explanatory power, although not that significant. Every $15 million a movie makes gives it a small, small boost in its chances. Nominations in other acting and screenwriting categories did become significant and useful. For every screenwriting or acting nomination a film gets, it's about 7 percent more likely to win. I can post the full results tomorrow, if anyone desires.

What I take away from these results is that in the long run, it's much more difficult to predict Oscar success. However, I still think we can spot trends in human behavior, and the original model spots the trends over the last five years pretty well, so I'm going to stick with it. We'll see what happens tomorrow night.

Wednesday, February 21, 2007

bandwagon and bloggingstocks

Some interesting posts for the day:

From bloggingstocks.com, here is an article on backup. It talks about how we should perform backups, different ways to do backups, and some options to pick from. Specifically they address the issue of "essentially trusting your data to someone else".

http://www.bloggingstocks.com/2007/02/19/technology-for-the-rest-of-us-back-it-up-baby/

Here is an interesting new service - bandwagon (http://ridethebandwagon.com/) - which provides online iTunes backup. They are supposed to launch tomorrow.

Friday, February 16, 2007

Look-no hands

I think it is great that people are beginning to understand the importance of backing up their content. There are new posts almost daily with the "work-arounds" users have figured out.

Here are 2 more entries in the "thanks, but do I have to?" category of blog backup methods:
http://www.ashbaughonline.com/2007/02/15/blog-backup-strategy/

http://www.doughellmann.com/projects/BlogBackup/

The point is that these are work-arounds developed by people who understand how all the pieces fit together.
But blogging has gotten simple enough for everyday users like me to join in, and I don't understand most of what is happening in the background. I don't want to. That's what this Web-as-platform, social-networking, join-the-community explosion is all about.

Monday, February 12, 2007

Hackers go Global

Check out this latest news story about hackers attempting to get into computers and your stuff EVERY 39 SECONDS.

http://www.newswise.com/articles/view/527132/

Hackers managed to briefly overwhelm at least three of the 13 computers that help manage global computer traffic on Tuesday, February 6, 2007.

Whose Content is it?

The number of stories about the need to back up your blogs is almost overwhelming right now.
The link below gets into interesting territory. The issue of who controls your postings and content is really highlighted. http://battellemedia.com/archives/003355.php

Friday, February 09, 2007

Control Your Content

Hi, my name is Alison. I joined the Techrigy team this week. BlogBackupOnline.com will be in private beta by the end of next week and the caffeine is flowing. If you want to add your name to our list of testers, send an email to beta@blogbackuponline.com.

Blogs are not immune to the evils that come at us over the web. We download security updates, scan our emails, protect our PCs and back up our data. Our web-based content is more vulnerable, making stories like the one below inevitable.

Bloggers get Hacked is a catchy headline. It’s a short piece about being “under attack”, and how personal it feels. Starting up a blog is essentially joining a community, and we want to feel secure within that community. Feeling secure requires that we have control of ourselves and our “stuff” and this little piece highlights the importance of that control.

BlogBackupOnline allows you to take control of your blogs. Back up your remote data to our secure data center. Export it, restore it, receive it on DVD. It's under your control.
