The Aptly Named, Sam's Blog


Playing with the IMDb dataset to find top movies that have multiple directors

I was playing with the pandas library in Python and picked the IMDb dataset to explore.

To give myself a learning goal, I asked the following question:

Which movies with multiple directors are generally regarded as the best?

After some finagling with the dataset (multiple large CSV files), I arrived at the following list of twenty, in descending order of average rating:

  1. The Matrix (1999) • 8.7
    Lana Wachowski, Lilly Wachowski

  2. City of God (2002) • 8.6
    Fernando Meirelles, Kátia Lund

  3. The Intouchables (2011) • 8.5
    Olivier Nakache, Éric Toledano

  4. Avengers: Endgame (2019) • 8.4
    Anthony Russo, Joe Russo

  5. Avengers: Infinity War (2018) • 8.4
    Anthony Russo, Joe Russo

  6. No Country for Old Men (2007) • 8.2
    Ethan Coen, Joel Coen

  7. Monty Python and the Holy Grail (1975) • 8.2
    Terry Gilliam, Terry Jones

  8. Gone with the Wind (1939) • 8.2
    Victor Fleming, George Cukor, Sam Wood

  9. Everything Everywhere All at Once (2022) • 8.1
    Dan Kwan, Daniel Scheinert

  10. The Big Lebowski (1998) • 8.1
    Joel Coen, Ethan Coen

  11. Fargo (1996) • 8.1
    Joel Coen, Ethan Coen

  12. The Wizard of Oz (1939) • 8.1
    Victor Fleming, King Vidor, Richard Thorpe, Norman Taurog, Mervyn LeRoy, George Cukor

  13. Slumdog Millionaire (2008) • 8.0
    Danny Boyle, Loveleen Tandan

  14. Sin City (2005) • 8.0
    Frank Miller, Quentin Tarantino, Robert Rodriguez

  15. Captain America: Civil War (2016) • 7.8
    Joe Russo, Anthony Russo

  16. Captain America: The Winter Soldier (2014) • 7.8
    Anthony Russo, Joe Russo

  17. Little Miss Sunshine (2006) • 7.8
    Jonathan Dayton, Valerie Faris

  18. O Brother, Where Art Thou? (2000) • 7.7
    Joel Coen, Ethan Coen

  19. True Grit (2010) • 7.6
    Ethan Coen, Joel Coen

  20. The Butterfly Effect (2004) • 7.6
    Eric Bress, J. Mackye Gruber

Some thoughts on these results:

Notes on the data filtering and sorting process to get to the final list:

The script can be found here.
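
For anyone who wants to try something similar, here is a minimal sketch of the general approach. It is not the script linked above. It assumes the column layout of IMDb's public `title.basics`, `title.ratings`, `title.crew`, and `name.basics` files (`tconst`, `titleType`, `averageRating`, `numVotes`, a comma-separated `directors` column, and so on). The function name, the `min_votes` threshold, and the toy rows are made up for illustration.

```python
import pandas as pd


def top_multi_director_movies(basics, ratings, crew, names, min_votes=100_000, n=20):
    """Return the n best-rated movies that credit more than one director.

    Hypothetical helper: the min_votes cut-off is an assumed way to skip
    obscure titles, not a value taken from the original script.
    """
    # Keep feature films only (drops TV series, shorts, etc.).
    movies = basics[basics["titleType"] == "movie"]

    # IMDb marks missing values as the literal string "\N"; drop those rows,
    # then split the comma-separated director ids into Python lists.
    crew = crew[crew["directors"].notna() & (crew["directors"] != r"\N")].copy()
    crew["director_ids"] = crew["directors"].str.split(",")
    multi = crew[crew["director_ids"].str.len() > 1]

    # Inner joins keep only movies that have both a rating and 2+ directors.
    merged = movies.merge(ratings, on="tconst").merge(
        multi[["tconst", "director_ids"]], on="tconst"
    )
    merged = merged[merged["numVotes"] >= min_votes]

    # Sort by rating, breaking ties by vote count, and keep the top n.
    top = merged.sort_values(["averageRating", "numVotes"], ascending=False).head(n)

    # Translate director ids to names; unknown ids are left as-is.
    id_to_name = names.set_index("nconst")["primaryName"]
    top = top.assign(
        director_names=top["director_ids"].apply(
            lambda ids: ", ".join(id_to_name.get(i, i) for i in ids)
        )
    )
    columns = ["primaryTitle", "startYear", "averageRating", "director_names"]
    return top[columns].reset_index(drop=True)


# Toy data (invented) standing in for the real, much larger files.
ids = ["tt1", "tt2", "tt3", "tt4"]
basics = pd.DataFrame({
    "tconst": ids,
    "titleType": ["movie", "movie", "movie", "tvSeries"],
    "primaryTitle": ["Solo Film", "Duo Film", "Trio Film", "Duo Show"],
    "startYear": [2001, 2002, 2003, 2004],
})
ratings = pd.DataFrame({
    "tconst": ids,
    "averageRating": [9.0, 8.0, 8.5, 9.5],
    "numVotes": [500, 500, 500, 500],
})
crew = pd.DataFrame({
    "tconst": ids,
    "directors": ["nm1", "nm1,nm2", "nm1,nm2,nm3", "nm2,nm3"],
})
names = pd.DataFrame({
    "nconst": ["nm1", "nm2", "nm3"],
    "primaryName": ["A", "B", "C"],
})

result = top_multi_director_movies(basics, ratings, crew, names, min_votes=100)
print(result)
```

With the real files, each table would be loaded with `pd.read_csv` first. The vote threshold matters a lot: without one, the top of the list tends to fill up with titles that have only a handful of ratings.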

P.S. The dataset obviously has biases and those impact the results.