# Who was the lead character in Friends? The Data Science Answer

It has been more than 13 years since the last episode of Friends aired. But we never stop talking about it. Do we? I do not remember the last time I had a pizza without watching a random episode of Friends.

While watching Ross’ hilarious performance in one of my favorite episodes, “The One With Ross’ Tan”, I started thinking, who actually was the lead character in Friends? Was it one of Ross and Rachel with their everlasting love angle? Was it Chandler with his sarcastic comedy? Was it the cleanliness freak Monica? Was it the ladies’ man Joey? Or was it our favorite singer Phoebe?

Ask around, and you will receive different answers. But what does the data say? Let’s find out the data science answer to who was the lead character in Friends.

To determine an answer to this question, I downloaded transcripts of the ten seasons from this amazing fan site. I have used different parameters to find out who stood out among our six friends.

The entire analysis has been done in R. I converted the raw transcript files to a structured tabular form followed by an exploratory data analysis.
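To give a rough idea of what "raw transcript to structured table" can look like, here is a minimal Python sketch. It is not the R code used for this analysis. The `[Scene: ...]` headers, the `Speaker: text` layout, and the sample dialogue are all assumptions made for illustration; the real transcript files may be formatted differently.

```python
import re

# Toy transcript in an ASSUMED layout: bracketed scene headers and
# "Speaker: text" lines. The dialogue is made up for illustration.
RAW = """\
[Scene: Central Perk]
Ross: Hi.
Rachel: Hey, how was your day?
[Scene: Monica's apartment]
Monica: Who moved the cushions?
Chandler: Not me.
"""

SCENE_RE = re.compile(r"^\[Scene:\s*(.+?)\]$")
LINE_RE = re.compile(r"^([A-Z][A-Za-z]+):\s*(.+)$")


def parse_transcript(raw):
    """Return one dict per spoken line: scene id, location, speaker, text."""
    rows = []
    scene_id, location = 0, None
    for line in raw.splitlines():
        line = line.strip()
        scene = SCENE_RE.match(line)
        if scene:
            # A new scene header starts a new scene id.
            scene_id += 1
            location = scene.group(1)
            continue
        spoken = LINE_RE.match(line)
        if spoken:
            rows.append({
                "scene": scene_id,
                "location": location,
                "speaker": spoken.group(1),
                "text": spoken.group(2),
            })
    return rows


rows = parse_transcript(RAW)
for row in rows:
    print(row)
```

Once every spoken line is a row with a scene, a location, and a speaker, each of the counts below becomes a simple group-and-count.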

You can find the R code in my GitHub repository.

Let’s start by looking at how many lines each of them had. I could not get the actual screen time of the characters but I think the number of lines would give us a reasonable estimate.
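Counting lines is just a tally of rows per speaker, overall or per season. A small Python sketch of that idea follows (again, not the original R code; the `(season, speaker)` rows are hypothetical sample data, not the real counts):

```python
from collections import Counter

# Hypothetical rows, one per spoken line: (season, speaker).
rows = [
    (1, "Ross"), (1, "Rachel"), (1, "Ross"),
    (2, "Phoebe"), (2, "Rachel"), (2, "Rachel"),
]

# Total lines per character across all seasons.
lines_total = Counter(speaker for _, speaker in rows)

# Lines per character per season; keys are (season, speaker) tuples.
lines_by_season = Counter(rows)

print(lines_total.most_common())
print(lines_by_season[(2, "Rachel")])
```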

Rachel and Ross both had 9k+ lines, so this was really close. Phoebe had the fewest lines at 7.4k. Chandler, Monica and Joey had almost the same number. Let’s also look at the number of lines per season.

It seems that the writers did a really good job of distributing the lines among the six friends. Ross had the most lines in the first three seasons and the last one. Well, these were the times when the Ross-Rachel angle was in its prime. Could this be a reason?

Chandler had the most lines in seasons 4 and 6, and Joey beat him by a very small margin in season 5. Rachel dominated seasons 7 to 9. Monica stayed in the top half in almost all the seasons.

Phoebe gets the short end of the stick with the fewest lines in most of the seasons. But she had her own ways to make us fall in love with her. Didn’t she?

Okay, now that we have looked at the lines, I wondered whether the number of words had a similar distribution.
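A word count per character can be sketched the same way. In this Python sketch (not the original R code), a "word" is assumed to be any whitespace-separated token, and the sample lines are invented:

```python
from collections import defaultdict

# Hypothetical (speaker, text) rows; the dialogue is made up.
lines = [
    ("Ross", "I brought the sandwiches"),
    ("Rachel", "Thanks, that is kind"),
    ("Ross", "Anytime"),
]

words_per_character = defaultdict(int)
for speaker, text in lines:
    # ASSUMPTION: a word is any whitespace-separated token,
    # so punctuation attached to a word is not counted separately.
    words_per_character[speaker] += len(text.split())

print(dict(words_per_character))
```

A different tokenization rule (for example, stripping stage directions first) would shift the totals, so the exact numbers depend on that choice.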

Well, the number of words more or less follows the same distribution. So who is leading so far? I would say there is tough competition between Ross and Rachel.

Now, let’s have a look at the number of screen appearances. I have considered a character to be present in a scene only if they had a line to speak.
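Under that rule, a screen appearance is one distinct (scene, speaker) combination, no matter how many lines the character has in the scene. A minimal Python sketch with hypothetical rows (not the original R code):

```python
from collections import Counter

# Hypothetical rows, one per spoken line: (scene_id, speaker).
rows = [
    (1, "Chandler"), (1, "Joey"), (1, "Chandler"),
    (2, "Chandler"), (2, "Ross"),
    (3, "Ross"),
]

# De-duplicate so each character counts at most once per scene,
# then tally scenes per character.
appearances = Counter(speaker for _, speaker in set(rows))

print(appearances.most_common())
```

Note that a character who is on screen but silent in a scene is not counted by this definition.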

Okay. This is interesting. Chandler, with 1400+ scenes, had the most screen appearances. Ross and Rachel were not far behind, though, with around 1330 and 1370 appearances respectively. Phoebe had the fewest appearances.

The number of individual scenes can also be a good parameter to answer our question. For this part, I consider scenes in which only one character among the six was present. This includes scenes where supporting characters were present, as long as exactly one of these six was in the scene.
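The individual-scene rule can be sketched as: for each scene, keep only the main characters who speak, and count the scene if exactly one remains. This is a Python illustration with hypothetical scenes, not the original R code:

```python
from collections import Counter

MAIN = {"Ross", "Rachel", "Monica", "Chandler", "Joey", "Phoebe"}

# Hypothetical scenes: scene_id -> set of characters with at least one line.
scenes = {
    1: {"Ross", "Carol"},      # one main character plus a supporting one
    2: {"Ross", "Rachel"},     # two main characters, not an individual scene
    3: {"Rachel"},
    4: {"Ross"},
    5: {"Gunther"},            # no main character at all
}

solo_scenes = Counter()
for speakers in scenes.values():
    main_present = speakers & MAIN
    if len(main_present) == 1:
        # Exactly one of the six speaks here; supporting characters are ignored.
        solo_scenes[next(iter(main_present))] += 1

print(solo_scenes.most_common())
```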

Well, Ross is a clear winner in this category. Rachel is not even close this time.

Let’s move on to one last parameter. I also want to find out how many times they were mentioned in the episode title. Let’s have a look.
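Title mentions come down to a whole-word search for each name in each episode title. A short Python sketch follows (not the original R code; the four titles are only a small sample, and counting each title at most once per character is an assumption):

```python
import re

MAIN = ["Ross", "Rachel", "Monica", "Chandler", "Joey", "Phoebe"]

# A small sample of episode titles, not the full list.
titles = [
    "The One With Ross' Tan",
    "The One with the Blackout",
    "The One With Rachel's Sister",
    "The One Where Ross and Rachel... You Know",
]

# ASSUMPTION: a title counts once per character, matched as a whole word,
# so possessives such as "Ross'" and "Rachel's" still match.
mentions = {
    name: sum(bool(re.search(rf"\b{name}\b", title)) for title in titles)
    for name in MAIN
}

print(mentions)
```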

So Rachel takes the lead this time. She was mentioned 27 times in the episode titles, followed by Ross with 24 mentions. Monica was last with only 8 mentions.

Let’s do a quick recap.

• Rachel and Ross had the most lines, with a negligible difference between them. The number of words showed the same pattern.
• Chandler had the most screen appearances, but again Ross and Rachel were not far behind.
• Ross was a clear winner with the most individual screen appearances. Rachel was not even close.
• Rachel had the most mentions in the episode titles, but Ross was just 3 episodes behind.

It is really close between Ross and Rachel. There was very little difference in most of the parameters. Ross, however, beats Rachel in the individual scene appearances by a wide margin. Hence, I will have to give it to Ross.

It was really fun doing this. I came across some other facts that you might find interesting.

• There was only one scene in the entire series where Ross, Monica, Joey, and Phoebe were present without Chandler, Rachel or anyone else. The episode was “The One with the Blackout” where Chandler gets stuck in the ATM vestibule.
• Chandler and Joey as a pair had more screen appearances than any other pair.
• Other than the 6 main characters, Gunther had the most screen appearances.
• Of all the scenes, around 16% took place in Central Perk and 26% in Monica’s apartment.
• Phoebe had more appearances in Central Perk than anyone else. She was seen in around two-thirds of all Central Perk scenes.
• Of course, no surprises here. Monica had the most appearances in scenes shot in her apartment, followed by Chandler and then Rachel.

Thanks a lot. I hope you found this interesting. Stay tuned for more. 🙂

## 27 thoughts on “Who was the lead character in Friends? The Data Science Answer”

1. […] Who was the lead character in Friends? The Data Science Answer […]


2. I kinda think Monica is the main character


3. […] year, a dude named Yashu Seth used the scripts from all 10 seasons of the show to determine who the main character […]


4. Sadly your research parameters are flawed by assuming that the most lines by a character means they are the main character. If you followed the show, you would see of course Ross has the most lines cause his character talks a lot. The parameter to judge a main character is to see how many storylines in the series centered on a certain character. In that respect, you see who the show’s main focus is. Rachel clearly has the most storylines and if you remove her from the show, it is not the same. In fact the first episode centers on her running out from her wedding.


5. […] Blogger Yashu Seth found via examining the scripts of all 10 seasons that Ross and Rachel had the most lines – around 9,000 each – well ahead of the others. […]


6. […] In his blog, Yashu explains how he went about it. He downloaded all the transcripts of the ten seasons and used different measurement techniques to find out which of the six friends stood out most across all the episodes. Although he could not find out exactly how long each character was on screen, he was able to count the number of lines. Ross and Rachel had the most lines (both more than 9,000 in total) and Phoebe the fewest (around 7,000). […]


7. […] while they were watching their favourite episode of the iconic series that they started to wonder who would take the credit as lead character. After being inspired by watching “the one with Ross’ tan” – the episode […]


8. I just think contrary to the statistics. Joey is the only one who got a spin-off show, it was even titled “Joey”.


1. Sam from iCarly and Cat from Victorious got a spin-off series but they’re not the main characters. Spin-offs are usually (from what I have seen) for the supporting characters.


9. […] hands down when comparing the individual appearances of the two protagonists," concludes the researcher. "Also, there is very little difference between them when it comes to the other parameters. […]


10. Hi I was wondering where you got your dataset from?


1. whoops ignore me thanks!!


11. Nice job! I did a study using centrality measures commonly used to analyse social networks (network theory) and found that there are differences in betweenness (a measure of how well a character connects others in the show) but not in degree (number of interactions); degrees are nearly the same. This means that different measures lead to different conclusions.
My study also agrees with yours in that Monica seems really to be the queen of her apartment but not a good connector.
For those interested in this, the article is available here:
https://arxiv.org/abs/1804.04408
It is long but written (hopefully) in a light way. It also includes investigations about the structure of the groups (community detection).
Enjoy
Ana Bazzan


12. Thanks a lot for sharing 🙂


13. […] it turns out that data analyst Yashu Seth considers Ross to be the actual lead. This comes after he analyzed the script and […]


14. Hi Yashu, nice job! Thank you for the wonderful analysis! I have one question: how did you count the number of individual screen appearances? Thank you for helping me out!


1. The dataset I created had one row for each scene. So, for each character, I counted the number of rows in which the other five characters had zero lines.


15. […] recently, Friends fanatic and data scientist Yashu Seth’s research has been making its rounds on the internet as he tried to figure out who among the six […]


16. […] recently, Friends fanatic and data scientist Yashu Seth’s research has been making its rounds on the internet as he tried to figure out who among the six […]


17. where did you get the original data set?


18. I’m guessing you had a bit of free time…


19. […] Blogger Yashu Seth found via examining the scripts of all 10 seasons that Ross and Rachel had the most lines – around 9,000 each – well ahead of the others. […]


20. […] can read the entire blog here by Yashu […]


21. […] posted his findings on his website and through analysing hours of data, Yashu discovered a surprising result: the winner was […]


22. Thanks a lot!
I love statistics and I learned so much from them…
So, keep doing such good work with other series and in general:)


23. Hey Yashu

How did you calculate the total word count of each character?
If you still have the data, can you tell me the exact word count of each character? I am currently working on a research project, so it would be quite helpful for calculating the normalized frequency.

Thank you
