
IMDB Top 100K Movies Analysis in Depth Part 1


By Bugra February 15, 2014

Data

The data comes from IMDB and covers the 100,042 most-voted movies released between 1950 and 2013. (I know where the 100,000 comes from, but I have no idea how the extra 42 movies squeezed in. Instead of blaming my web scraping skills, I blame the universe.)

I ordered the movies by number of votes because the basic information about a movie (title, certificate, outline, director and so on) is more likely to be complete when the movie has a high number of votes. Moreover, IMDB uses the number of votes as a metric when determining its rankings, so the number of votes also correlates with the rating. Further, almost everybody has at least some idea of the IMDB Top 250 or IMDB Top 1000, which are ordered by the ratings computed by IMDB.

Although the data is quite rich in basic information, only year, rating and votes are complete for all of the movies. Only ~80% of the movies have runtime information (in minutes). The categories are about 90% complete, which could be considered good, but the certificate information is the sparsest (only ~25% of the movies have it).
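To make the completeness figures above concrete, here is a minimal sketch of how such percentages could be computed. It assumes each movie is a plain dict and that a missing value is stored as None or an empty string; the field names and the sample records are hypothetical, not taken from the original scrape.

```python
def completeness(movies, fields):
    """Return, for each field, the fraction of movies that have a value for it."""
    total = len(movies)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        # Assumption: a missing value is stored as None or an empty string.
        field: sum(1 for m in movies if m.get(field) not in (None, "")) / total
        for field in fields
    }


# Hypothetical sample records, only to show the shape of the input.
sample_movies = [
    {"title": "Movie A", "year": 1994, "runtime": 142, "certificate": "R"},
    {"title": "Movie B", "year": 1972, "runtime": 175, "certificate": None},
    {"title": "Movie C", "year": 2008, "runtime": 152, "certificate": ""},
    {"title": "Movie D", "year": 1957, "runtime": None, "certificate": None},
]

coverage = completeness(sample_movies, ["year", "runtime", "certificate"])
for field, fraction in coverage.items():
    print(f"{field}: {fraction:.0%} complete")
```

Run against the full data set, a function like this would be one way to arrive at the rough percentages quoted above.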

This post explores different aspects of the data (categories and ratings) and also surfaces some useful information (the best movie by rating or by votes for each year).

I will let the data speak for itself through scatter plots, the occasional histogram, and tables of movies. (I know tables are the worst, but in some cases they are still useful.)

As for categories, I considered 23 of the 25 categories that appear in the top 100K movies. I removed Game-Show and TV-Reality, as each has fewer than 5 movies in the top 100K.
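The category pruning described above could be sketched as follows. This is only an illustration: it assumes each movie carries a list of category names under a hypothetical "categories" key, and the sample data is made up.

```python
from collections import Counter


def frequent_categories(movies, min_movies=5):
    """Return the set of categories that appear in at least `min_movies` movies."""
    counts = Counter(
        category
        for m in movies
        # set() so a category listed twice on one movie is counted once.
        for category in set(m.get("categories", []))
    )
    return {category for category, n in counts.items() if n >= min_movies}


# Hypothetical sample: "Drama" is common, "Game-Show" appears only twice.
sample_movies = (
    [{"title": f"Drama {i}", "categories": ["Drama"]} for i in range(5)]
    + [{"title": f"Show {i}", "categories": ["Game-Show", "Drama"]} for i in range(2)]
)

kept = frequent_categories(sample_movies, min_movies=5)
print(sorted(kept))
```

The threshold of 5 mirrors the cutoff mentioned in the text; the movies themselves stay in the data set, only the rare category labels are dropped from the analysis.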

Let's get started.

Best Rated Movies for Each Year

I set the rating threshold to 8.5 and also filtered out movies with fewer than 25,000 votes, since a number of movies get quite high ratings from only a small number of votes.
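A filter like the one just described might look like the sketch below. It assumes 8.5 is a minimum rating (the text does not say so explicitly), that movies are dicts with hypothetical "year", "rating" and "votes" keys, and that the qualifying movies are grouped by year with the best rated first. The sample titles and numbers are invented for illustration.

```python
from collections import defaultdict


def best_rated_by_year(movies, min_rating=8.5, min_votes=25000):
    """Group movies that pass both thresholds by year, best rated first."""
    by_year = defaultdict(list)
    for m in movies:
        # Assumption: both thresholds are inclusive lower bounds.
        if m["rating"] >= min_rating and m["votes"] >= min_votes:
            by_year[m["year"]].append(m)
    return {
        # Sort by rating, breaking ties by vote count, highest first.
        year: sorted(group, key=lambda m: (m["rating"], m["votes"]), reverse=True)
        for year, group in sorted(by_year.items())
    }


# Invented sample data, not real IMDB figures.
sample_movies = [
    {"title": "Movie A", "year": 1994, "rating": 9.2, "votes": 500000},
    {"title": "Movie B", "year": 1994, "rating": 8.9, "votes": 400000},
    {"title": "Movie C", "year": 1994, "rating": 9.5, "votes": 1200},    # too few votes
    {"title": "Movie D", "year": 2008, "rating": 8.4, "votes": 900000},  # rating too low
    {"title": "Movie E", "year": 2008, "rating": 8.9, "votes": 800000},
]

best = best_rated_by_year(sample_movies)
for year, group in best.items():
    print(year, [m["title"] for m in group])
```

Movie C shows why the vote filter matters: it has the highest rating in the sample but far too few votes to be trusted.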
.
.
.
.
What is next?

I was planning to include the best directors, actors and actresses by rating. Which director, actor or actress is the best in terms of rating? What about votes? Can we trace the activity of directors over time by looking at the movies they directed?

Further, I got more ambitious and wondered whether we could cluster the movies using their outlines. Being able to determine the movie type from its outline would be nice, wouldn't it? Stay tuned for next week.

Full Story: Source
