How diverse are people’s TV tastes?

When people watch TV, do they tend to stick to watching shows belonging to one kind of genre, a few different genres, or many different genres?

To try to get at this question, I looked at reddit’s various television communities (i.e., subreddits) and the commenting behavior of their members. If members were generally diverse in their TV taste, then we might find that they comment on shows across different genres. On the other hand, if their TV taste is a little bit narrower, they might tend to comment on shows that fall within the same genre.

Warning! This post is a bit longer than my previous ones.

The data I used here is a subset of the dataset of reddit comments collected (through a Herculean effort) by /u/Stuck_In_the_Matrix. The dataset itself is being hosted on Google’s BigQuery by /u/fhoffa. Check out this reddit thread for more information. For my dataset, I pulled all comments made on various television subreddits between Jan 2015 and Feb 2017; the list of relevant television subreddits was gathered from r/television’s wiki page. As you can see, the different shows have already been divided into four genres: Comedy, Animated, Drama, and Sci-Fi. While I don’t think shows always fall into one category, I’ll use r/television’s categorization for now.

For more information on how I pulled the dataset from BigQuery and the subsequent cleaning I did (and the logic behind the criteria I chose), please check out the scripts on my github repo. Essentially, I selected subreddits that had at least 5200 comments and selected authors/users that had commented on at least 2 subreddits and made at least 20 comments in a given subreddit. From there, we can calculate, for an individual user, the total number of comments they made in each of the four genres.
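The actual scripts live in the repo mentioned above; what follows is only a rough sketch of the filtering logic as I read it from the description, not the code that was run. It assumes the comments are available as (author, subreddit) pairs, that the "at least 20 comments" rule applies to each author/subreddit pair, and that an author needs at least two such subreddits to be kept. The function name, argument names, and the `genre_of` lookup are all made up for illustration.

```python
from collections import Counter, defaultdict

def genre_counts(comments, genre_of, min_sub_comments=5200,
                 min_pair_comments=20, min_subs_per_author=2):
    """Return {author: {genre: comment_count}} after filtering.

    comments: iterable of (author, subreddit) pairs, one per comment.
    genre_of: dict mapping subreddit name -> genre (hypothetical lookup).
    """
    comments = list(comments)

    # Keep only subreddits with enough total comments.
    sub_totals = Counter(sub for _, sub in comments)
    pair_totals = Counter(
        (author, sub) for author, sub in comments
        if sub_totals[sub] >= min_sub_comments
    )

    # Keep only author/subreddit pairs with enough comments.
    kept = defaultdict(dict)
    for (author, sub), n in pair_totals.items():
        if n >= min_pair_comments:
            kept[author][sub] = n

    # Keep only authors active in enough subreddits, then sum by genre.
    result = {}
    for author, subs in kept.items():
        if len(subs) >= min_subs_per_author:
            per_genre = Counter()
            for sub, n in subs.items():
                per_genre[genre_of[sub]] += n
            result[author] = dict(per_genre)
    return result
```

The thresholds are parameters so the function can be tried on a tiny made-up sample before pointing it at the full dataset.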

To get a sense of how diverse subreddit users are in terms of their TV taste, I’ll be using a measure called entropy. Entropy can be generally thought of as a measure of disorder or uncertainty. The linked wiki page has a pretty good example, but in the current context we can think of the entropy measure in the following way: let’s imagine we are trying to predict, for a particular user, what genre of TV show they will comment on next. If they have mostly commented on one genre, we can say with fairly strong certainty that the next comment they make will be in that genre. In that sense, the entropy is low, because the amount of uncertainty is low. However, if the user has previously commented across different genres, it becomes a little harder to predict which genre the next comment will fall in. In that sense, the entropy is high, because we have greater uncertainty in our prediction.

Thus, for our entropy measure, we can interpret a low entropy to mean that the individual is not very diverse in their TV taste, as they tend to comment on one or a few genres. In particular, an entropy of 0 would mean that the individual has commented on only one genre and there is, statistically, no uncertainty in our ability to predict what genre their next comment will be on. A high entropy would mean that the individual is more diverse in their TV taste, as they have commented on more genres, making us a little bit more uncertain regarding what genre their next comment will belong in. For our current analysis, the maximum entropy is about 1.4 (ln(4) ≈ 1.386: 4 because there are four genres to comment on, and ln because the entropy calculations use log base e, i.e., the natural log), which would mean that the individual makes comments across all four genres at the same frequency.
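To make the measure concrete, here is a minimal sketch of Shannon entropy with the natural log, computed from one user's comment counts per genre. The function name and the example counts are made up for illustration; this is not the script used for the analysis.

```python
import math

def shannon_entropy(counts):
    """Entropy (natural log) of a list of non-negative counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one comment")
    # Categories with zero comments contribute nothing to the sum.
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)

# Made-up users: [Animated, Comedy, Drama, Sci-Fi] comment counts.
one_genre_user = [40, 0, 0, 0]     # all comments in a single genre
even_user = [10, 10, 10, 10]       # comments spread evenly
lopsided_user = [30, 5, 3, 2]      # mostly one genre, a few elsewhere

for counts in (one_genre_user, even_user, lopsided_user):
    print(counts, round(shannon_entropy(counts), 3))
```

The single-genre user scores 0, the evenly spread user hits the maximum of ln(4), and the lopsided user lands somewhere in between.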

Here is a histogram that details the number of individuals that fall into different entropy score bins.

[Figure: histogram of entropy scores]

As one can see, while there is a good number of people who show some diversity in their TV preference, there are very few who are sampling across all four TV genres at similar rates (i.e., few have entropy scores close to the max of 1.4). Moreover, there is also a substantial number of people who comment in only one genre (i.e., those who have an entropy score of 0), suggesting that these individuals are a little bit narrow in their TV taste.

Let’s further look into the individuals that I call “one-genrers.”

Individuals who comment on only one genre

We can first ask whether people are also selective even within their preferred genre. That is, do they tend to comment on a select few shows, or do they comment on many different shows within the same genre? Again, we can calculate an individual’s entropy, this time over the shows within the single genre that they comment on. Because the four genres differ in the number of shows that fall within them, each will have a different maximum entropy score. They are as follows: (a) Animated: 2.89, (b) Comedy: 3.40, (c) Drama: 3.9, (d) Sci-Fi: 3.30.
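The reason the maxima differ is that entropy tops out at ln(n) when comments are spread evenly over n categories. The short sketch below shows that relationship and its inverse. The function names are made up, and any show count obtained by back-calculating from the rounded maxima above is only an approximation, not a figure taken from the data.

```python
import math

def max_entropy(n_categories):
    """Largest possible entropy (natural log) for n equally likely categories."""
    return math.log(n_categories)

def implied_categories(max_h):
    """Number of equally likely categories that would give entropy max_h."""
    return math.exp(max_h)

# Four genres give the across-genre maximum used earlier.
print(round(max_entropy(4), 3))

# Back-calculating from the rounded per-genre maxima (approximate only).
for genre, h in [("Animated", 2.89), ("Comedy", 3.40),
                 ("Drama", 3.9), ("Sci-Fi", 3.30)]:
    print(genre, round(implied_categories(h), 1))
```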

Here is a histogram of entropy scores for each genre. You’ll notice that this plots the proportion of individuals that fall within the particular entropy bins. This is to account for the fact that the four genres differ in the number of individuals who are “one-genrers.”

[Figure: normalized histograms of entropy scores, one per genre]

Examining these histograms, it appears that while a large number of individuals are commenting on just a couple of shows in their preferred genre, there are nonetheless a substantial number of individuals who comment across multiple shows. However, I should note that even for these individuals, their taste is a little bit narrow, as no individual in any genre has an entropy score close to the maximum. So, like the overall pattern we found across genres, within a particular genre we see individuals who stick to a couple of shows as well as a group of individuals showing more (although not overwhelming) diversity in their TV taste.

I want to talk about one more analysis that I did on these “one-genrers.” Given that there are individuals who do appear to stick to commenting on just a couple of shows, a follow-up question is whether there are shows that users commonly enjoy together, such that a user who comments on one show would also be likely to comment on another. For example, one could imagine that there might be some overlap in those who comment on, say, the 30 Rock subreddit and the Unbreakable Kimmy Schmidt subreddit (since they share creators and the same sense of humor). To investigate this question, I performed a Latent Semantic Analysis on the individual shows that fall into each of the four genres. This analysis was inspired by the FiveThirtyEight article that used this technique to gain insight into r/The_Donald. Essentially, the analysis is used to quantify the amount of overlap in users/commenters across different subreddits.
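To give a feel for what "overlap in commenters" means numerically, here is a deliberately simplified stand-in. It is not Latent Semantic Analysis and not the code behind the results below: it skips the dimensionality-reduction step and just compares shows by the cosine similarity of their per-user comment counts. The function names, user names, show names, and counts are all invented for illustration.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def show_overlap(counts_by_user):
    """Return {(show_a, show_b): similarity} for every pair of shows.

    counts_by_user: {user: {show: comment_count}}.
    """
    users = sorted(counts_by_user)
    shows = sorted({s for c in counts_by_user.values() for s in c})
    # One vector per show: how much each user commented on it.
    vectors = {s: [counts_by_user[u].get(s, 0) for u in users] for s in shows}
    return {(a, b): cosine_similarity(vectors[a], vectors[b])
            for a, b in combinations(shows, 2)}

# Invented toy data: two users share show_a and show_b, one only has show_c.
toy = {
    "user_1": {"show_a": 10, "show_b": 8},
    "user_2": {"show_a": 5, "show_b": 7},
    "user_3": {"show_c": 9},
}
overlap = show_overlap(toy)
for pair, sim in sorted(overlap.items(), key=lambda kv: -kv[1]):
    print(pair, round(sim, 3))
```

Pairs of shows with many commenters in common score close to 1, while pairs with no shared commenters score 0.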

For the Animated genre, we see the following shows sharing similar redditors:

  • American Dad and Futurama
  • Gravity Falls and Steven Universe

For Comedies: 

  • 30 Rock and Unbreakable Kimmy Schmidt
  • The Big Bang Theory and Seinfeld
  • Brooklyn Nine-Nine and New Girl
  • The Office and Parks and Rec
  • SNL and Veep

For Dramas:

  • Daredevil and Jessica Jones
  • Fear The Walking Dead and The Walking Dead
  • How to Get Away With Murder and Scandal
  • Orange is the New Black and Peaky Blinders
  • Outlander and Vikings 

And finally, for Sci-Fi shows:

  • The Flash and Legends of Tomorrow

Overall, I think the pattern of results is largely as expected. That is, shows that overlap are ones which one might guess share a fandom. However, there was one surprise: Seinfeld and The Big Bang Theory! This surprised me mainly because a) I intuitively think they are very different shows (although admittedly, I don’t really watch The Big Bang Theory) and b) (to me) they come from different eras of TV. I’ll be very interested to hear what fans of either (or both) shows think about this overlap!

Individuals who comment on more than one genre

Lastly, we can re-focus our analysis on those individuals who commented on more than just one genre (i.e., those that had an entropy greater than 0). Here, I wanted to investigate to what extent preferences for different genres are related/correlated. That is, if an individual prefers to comment on animated shows, will they be more likely to comment on, say, comedy shows as well? For this, I just ran a simple correlation on users’ comment frequencies across the four genres.

          Animated    Comedy     Drama    Sci-Fi
Animated     1.000     0.058    -0.232    -0.322
Comedy                 1.000    -0.174    -0.340
Drama                            1.000     0.235
Sci-Fi                                     1.000

What we find is that when one comments on animated or comedy shows, they tend to be less likely to comment on dramas or sci-fi shows. Additionally, if one tends to comment on dramas, they are also more likely to comment on sci-fi shows (and vice versa).
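For readers who want to try this kind of correlation on their own data, here is a small sketch of a pairwise Pearson correlation across genres. It assumes "comment frequencies" means each user's share of comments per genre, which is my guess and may not match what was actually computed; raw counts could be passed through `pearson` directly instead. The function names and the example users are made up.

```python
import math

GENRES = ["Animated", "Comedy", "Drama", "Sci-Fi"]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def genre_correlations(users):
    """users: list of {genre: comment_count}. Returns {(g1, g2): r}."""
    shares = []
    for counts in users:
        total = sum(counts.get(g, 0) for g in GENRES)
        shares.append([counts.get(g, 0) / total for g in GENRES])
    columns = list(zip(*shares))  # one tuple of shares per genre
    return {(GENRES[i], GENRES[j]): pearson(columns[i], columns[j])
            for i in range(len(GENRES)) for j in range(i + 1, len(GENRES))}

# Invented users, only to show the call.
example_users = [
    {"Comedy": 2, "Drama": 9, "Sci-Fi": 9},
    {"Comedy": 18, "Drama": 1, "Sci-Fi": 1},
    {"Comedy": 10, "Drama": 5, "Sci-Fi": 5},
]
correlations = genre_correlations(example_users)
```

One thing to keep in mind if shares are used: they sum to one for each user, which by itself pushes correlations between genres in the negative direction.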

Going back to the original question: How diverse are our TV tastes?

Here, I’ve measured TV taste by how often one comments on particular TV subreddits. What we find is that while there is a large group of individuals who typically stick to one genre of television, there is also a group of individuals who show some diversity in their TV taste. For this latter group, we nonetheless still see some disparity in their TV taste. Rather than commenting on shows that spread across the four genres, we find that individuals who tend to watch/comment on dramas also tend to watch/comment on sci-fi shows. Conversely, fans of animated and comedy shows don’t seem to be as big fans of dramas and sci-fi shows. Lastly, for those who comment on shows that all fall into the same genre, we again see some diversity within a genre, although given the small to medium entropy scores and the distinct pairs of shows that overlap in their users/commenters, that diversity might be somewhat constrained. In sum, the analyses together suggest that while people do show diversity in their TV taste, it’s not a lot: individuals tend to stick to a few genres and shows.

As an end note, I should revisit the fact that I’m using people’s commenting history as a proxy for their TV taste. It might be the case that people watch shows but never comment on them. This would mean that I’m actually underestimating the diversity of some people’s TV taste. Similarly, the amount one comments on a particular show may not necessarily reflect how big of a fan one is of the show. I acknowledge that these are indeed limitations of the current project. However, given that the analyses here suggest some diversity (even if the degree is maybe underestimated here), I think the results of this project nonetheless provide some interesting insights, e.g., what genres do people typically enjoy together, what kind of shows have overlapping fandom, etc. In the future, it might be interesting to see if performing similar analyses on other communities will replicate the patterns found here.