The demographics of dialogue

Last week, Polygraph published an interesting study by Hanah Anderson and Matt Daniels.

The pair combed through over 2,000 feature screenplays, using the written dialogue to break down the character demographics of these mainstream movies, specifically disparities in gender and age.

The study wasn’t completely scientific, however, as the authors themselves pointed out:

We don’t need to follow a perfectly structured academic study because…
1) This is the Internet. Not academia.
2) We’re publishing on a .cool domain, not an MIT Journal

Their methodology centered on extracting dialogue from particular script drafts and extrapolating that data into gender and age categories.

It quickly becomes obvious that this selective dataset comes with several limitations:

For each screenplay, we mapped characters with at least 100 words of dialogue to a person’s IMDB page (which identifies people as an actor or actress). We did this because minor characters are poorly labeled on IMDB pages.
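The word-count approach quoted above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors’ actual code: the function name `dialogue_split`, the `MIN_WORDS` constant, and the sample characters and word counts are all made up, and the gender labels simply stand in for the IMDB actor/actress lookup the study relied on.

```python
from collections import defaultdict

# Hypothetical threshold mirroring the quoted method: only characters with
# at least 100 words of dialogue are counted.
MIN_WORDS = 100


def dialogue_split(characters, min_words=MIN_WORDS):
    """Return each gender's share of dialogue words in one screenplay.

    `characters` is a list of (name, gender, word_count) tuples. The gender
    label is an assumption standing in for the IMDB actor/actress mapping.
    Characters under `min_words` are treated as minor and ignored.
    """
    totals = defaultdict(int)
    for _name, gender, words in characters:
        if words >= min_words:
            totals[gender] += words
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {gender: count / grand_total for gender, count in totals.items()}


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        ("Lead", "female", 3000),
        ("Sidekick", "male", 4000),
        ("Villain", "male", 1500),
        ("Guard", "male", 60),  # under the threshold, so excluded
    ]
    for gender, share in sorted(dialogue_split(sample).items()):
        print(f"{gender}: {share:.0%}")
```

Note how the threshold shapes the result: a chatty minor character just under 100 words vanishes from the tally entirely, which is one reason word counts alone give only a partial picture.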

One of the most interesting aspects of looking at “how” they reached their conclusions is discovering which scripts (and which versions) the study worked from.

Take a look at their Google Docs spreadsheet listing the 2,000 scripts and their sources.
You’ll notice a hefty number of Academy Award drafts and multiple older versions of scripts, undoubtedly linked to the (lack of) “public availability” of shooting drafts.

The draft used for Pixels turns out to be the 2013 version leaked during the Sony hack.
The script version used for The Big Short does not include, among other things, the Margot Robbie bathtub scene. (Admittedly not the best representation of a woman character in a feature.)

This isn’t to undermine the study. Their FAQ already tackles a lot of similar objections to their findings.

Given the sheer volume of data extracted, and regardless of how up to date those drafts were, I would still consider this a fair bird’s-eye view (and indictment) of representation in mainstream film dialogue.

Just look at this gradient breakdown of words given to men and women from the 2,000 screenplays:

You can search for individual films in the website’s dataset.
It’s quite interesting (read: damning) to see where some cinematic classics fall.

One is pleased to learn that Pokémon: The First Movie: Mewtwo Strikes Back is at a 54/46 split, thanks to Ash Ketchum and Meowth being voiced by ladies.

Frozen, however, with its two female leads, still ends up with a 57/43 split in favor of men. Place your bets on the nonstop-talking sidekick voiced by Josh Gad.
The same goes for Mulan, which gets a 75/25 split. (Damn it, Mushu!)

No word on any of the Mad Max movies.

The stats are equally sobering when it comes to age:
– 39% of male dialogue is written for men aged 42 to 65.
– 38% of female dialogue is written for women aged 22 to 31.
The male and female bell curves are essentially mirror images. There are more male roles available the older actors get, while roles for women over 40 decrease dramatically.

So, where does this leave us?

Well, there is a critical limitation to this study that we need to address:
What we are talking about here is dialogue in specific drafts of specific screenplays.
And given the topic at hand, the natural follow-up question could be:
Is it fair to reduce the entire issue of representation to the number of words a character speaks?

I would argue this approach limits the discourse (no pun intended).
Which is why we should be asking a different question to begin with:

What is an accurate gauge of representation?

Screen time? Number of characters? Nuanced portrayals?
There is probably no single correct answer among those. Nor should there be.

Fair representation is an ongoing dialogue with which our industry is still struggling.
The longer this discussion continues, with more findings and more light shed on specific issues, the closer we will get to addressing the problems at hand.

Incidentally, another conversation is currently under way about television representation and writers’ relationships with their fandoms. (The 100, Sleepy Hollow, casting/staffing diversity…)
I won’t address much (or any) of it here, since it deserves a post (or many) of its own.
For now, I’ll just direct you to some of the tweets from the past two days by Terminator: The Sarah Connor Chronicles’ Josh Friedman and Agent Carter’s Jose Molina.

In fact, I don’t have a groundbreaking revelation to add right now, other than to remind people that “representation” is an amalgam of factors.
It isn’t just how much you say. It’s also what you say, how you say it, and why you say it.
Quantifying any of these values is pretty much impossible since they are mostly a matter of perspective, not objective data.
The one thing we can all do is be mindful of the current landscape, and continue to improve on it.

Write on.
