Friends AI — A Simulated Sitcom with GPT4

For our latest research project, we put the cast from Friends into the Simulation to generate infinite episodes.

Our goal was to compare the quality of AI-generated scripts with those of real episodes to see where state-of-the-art large language models fall short and where the simulation could step in. We deliberately kept the visual fidelity low so that the system can easily be adapted to other types of shows.

It’s important to use well-known and recognizable characters to judge the results. Is this really Chandler making a joke? Is this something Phoebe would say? Is Joey’s reaction true to his character?
Watch the generated 15-minute episode of Friends AI below.

Also, if you missed the media buzz around AI Seinfeld, you should definitely check it out. It brilliantly shows the creative potential and pitfalls of the technology with no human in the loop. And just recently, the cast of M*A*S*H took ChatGPT for a spin to author new scenes, as reported by the NY Times. Yet another project is the anime series Always Break Time on Twitch: https://www.twitch.tv/alwaysbreaktime

To make Friends AI, we generated basic avatars that somewhat resemble the original cast, so that they can interact with their virtual world. Next, we paired these AI actors with fine-tuned GPT3 models and GPT4, which landed just in time. The simulation then generates episodes, including the title, synopsis, scenes, beats and dialogue. Lastly, the AI Director and Show Runner systems we developed internally make sure everything is presented to us in a similar fashion to the original TV show we all love.

Generative AI

Sitcoms can show off a longevity rarely seen in other formats. Episodes are often self-contained, character-centric and well written. A sense of progression is then injected through changing relationships and the new challenges each character is confronted with in their life.
AI sitcoms have yet to prove that they can build a devoted fanbase. They have a hard time because they are either derivatives of something that already exists, overabundant 24/7 streams, or simply lacking in character.

Right now, we are witnessing a radical change in the creative process. We believe AI will blur the line between authors and fans and change how IP owners are organized. It allows for massively co-created and adaptive experiences.

Jennifer Aniston AI Side by Side

Side by side of Rachel in Unity and AI Rendering of Rachel with Stable Diffusion

However, large language models (including the latest GPT4) have some known drawbacks in the context of dialogue generation and storytelling.

  • Coherence and Memory: The system forgets the overall plot and character intentions over time

  • Hallucination and inappropriate content: The system comes up with random topics or characters that don’t exist, and it goes off the rails

  • Non sequiturs and timing: The system does not know how a scene should advance or when it should end with a punchline.

Let’s look at two approaches to mitigate some of these problems: Beats and Fine-Tuning.

Units of Entertainment

You’ve probably heard of acts, scenes and maybe even sequences, but as any stand-up comedian will tell you, a beat is the thing that keeps you glued to their lips.

A beat sheet breaks a scene (or act or sequence) down into smaller moments long before the author focuses on specific dialogue or the actor gets to improvise. Through beats, authors have the power to turn the world of their characters upside-down within seconds.

It’s important to choose the right scale for working with LLMs.

In the context of script generation, we chose beats because they are the most impactful story structure for instantly judging coherence and entertainment value. In contrast, the outline for a whole season, or the dialogue for a single scene in isolation, is a lot harder to judge. Beats allow us to progressively refine context at a scale tuned to our default attention span.

Fine-tuning GPT3

We’ve used the following Friends database to create a fine-tuned model that generates beats per scene.

https://github.com/emorynlp/character-mining

What’s important here is that the dialogue is already broken down into scenes because we first had to augment the dataset. We asked GPT to generate beats for every real episode, so that we end up with new synthesized data that nicely aligns with the episode’s synopsis and dialogue. 
Now we are able to train a new model that mimics the plot points, narrative threads and arcs within a typical Friends scene.

Here is an abridged example of the training data (1 of 326)

{"prompt":"SEASON 01\nEPISODE 01\nMonica and the gang introduce Rachel to the 'real world' after she leaves her fiance at the altar. Ross struggles with his rediscovered feelings for Rachel.", DIALOGUE\nMonica: There's nothing to tell! He's just some guy I work with![...]

"completion":"\nSCENE 01\n(Monica introduces Rachel to the group)\n(Joey suggests Ross go to a strip joint)\n(Rachel explains why she left her wedding at the last minute)\n(Ross reveals he told his parents about Carol)\n(Rachel explains why she came to Monica's apartment)\n[...]
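As a rough illustration of the record layout shown above, the following sketch assembles prompt/completion pairs and serializes them as JSON Lines. It is not the code used in the project: the function names `build_record` and `to_jsonl` are hypothetical, and the `\nDIALOGUE\n` separator between synopsis and dialogue is an assumption inferred from the abridged example.

```python
import json


def build_record(season, episode, synopsis, dialogue, scene_number, beats):
    """Build one prompt/completion pair in the layout of the abridged example.

    Assumption: synopsis and dialogue are joined by a "DIALOGUE" marker line.
    """
    prompt = (
        f"SEASON {season:02d}\nEPISODE {episode:02d}\n"
        f"{synopsis}\nDIALOGUE\n{dialogue}"
    )
    # Each beat is wrapped in parentheses on its own line, after a scene header.
    completion = f"\nSCENE {scene_number:02d}\n" + "".join(
        f"({beat})\n" for beat in beats
    )
    return {"prompt": prompt, "completion": completion}


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


record = build_record(
    season=1,
    episode=1,
    synopsis="Monica and the gang introduce Rachel to the 'real world'.",
    dialogue="Monica: There's nothing to tell! He's just some guy I work with!",
    scene_number=1,
    beats=["Monica introduces Rachel to the group"],
)
print(to_jsonl([record]))
```

Keeping the scene header and one beat per line in the completion gives the model a regular pattern to imitate, which also makes the generated output easier to parse later.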

And this is an example of generated beats for a new episode using the fine-tuned model.

(Joey tells Chandler that he has to audition for a part that requires him to play a man with three breasts.) (Joey explains that he will audition by reciting a scene from Twelfth Night.) (Ross interrupts, asking if he can borrow a book about dating.) (Chandler and Joey tell Ross to wait until they are done and then he can have the book.) (Joey stands in the living room, holding a script while Chandler lounges on the couch.)
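Before beats like these can drive a scene, they have to be split into individual items. A minimal sketch, assuming every beat is wrapped in a single pair of parentheses with no nesting (the helper name `parse_beats` is hypothetical):

```python
import re

# Matches the text inside one pair of parentheses; nested parentheses are
# not supported (assumption: the model emits flat "(...)" beats).
BEAT_PATTERN = re.compile(r"\(([^()]+)\)")


def parse_beats(completion):
    """Return the beats of a generated completion as a list of strings."""
    return [beat.strip() for beat in BEAT_PATTERN.findall(completion)]


generated = (
    "(Joey explains that he will audition by reciting a scene from Twelfth Night.) "
    "(Ross interrupts, asking if he can borrow a book about dating.)"
)
for number, beat in enumerate(parse_beats(generated), start=1):
    print(number, beat)
```

Text outside the parentheses, such as a scene header, is simply ignored by this approach.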

Python Code for Fine-Tuning

Show Runner

The show runner system defines the fundamental show structure:

  • Type of Show

  • Number of seasons

  • Number of episodes per season

  • Number of scenes per episode

  • Cast, Location (Sets) etc. per scene

  • Number of beats per scene
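The structure listed above maps naturally onto a small nested data model. The following Python sketch is only an illustration of that hierarchy; the actual system lives in Unity, and all class and field names here are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scene:
    cast: List[str]   # characters present in the scene
    location: str     # the set the scene plays on
    beat_count: int   # number of beats to generate for this scene


@dataclass
class Episode:
    scenes: List[Scene] = field(default_factory=list)


@dataclass
class Season:
    episodes: List[Episode] = field(default_factory=list)


@dataclass
class Show:
    show_type: str
    seasons: List[Season] = field(default_factory=list)

    def total_beats(self):
        """Sum the beat counts of every scene in every episode and season."""
        return sum(
            scene.beat_count
            for season in self.seasons
            for episode in season.episodes
            for scene in episode.scenes
        )


# Illustrative values only.
scene = Scene(cast=["Joey", "Chandler", "Ross"], location="Apartment", beat_count=5)
show = Show(show_type="sitcom", seasons=[Season(episodes=[Episode(scenes=[scene, scene])])])
print(show.total_beats())
```

Counts such as the number of seasons or episodes fall out of the list lengths, so they do not need to be stored separately.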

Show Data in Unity


Scene Data and Scene State

The AI director and staging system we developed in Unity makes sure all actors and cameras are at their mark for the scene. It also injects emotional tags and actions into the prompt to further enhance the dialogue generation.

This allows for highly variable and potentially interactive episodes with just the beat structure as the backbone.
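To make the idea of injecting simulation state into the prompt more concrete, here is a hedged sketch in Python. The prompt layout, the tag syntax and the function name `build_dialogue_prompt` are all assumptions for illustration; the project's real prompt format is not shown in this post.

```python
def build_dialogue_prompt(beat, location, emotions, actions):
    """Combine a beat with simulation state into a single text prompt.

    emotions maps a character name to an emotional tag, and actions is a
    list of stage actions reported by the staging system (both hypothetical).
    """
    lines = [f"LOCATION: {location}", f"BEAT: ({beat})"]
    # Sort by name so the same state always produces the same prompt.
    for name in sorted(emotions):
        lines.append(f"{name} [{emotions[name]}]")
    for action in actions:
        lines.append(f"ACTION: {action}")
    lines.append("DIALOGUE:")
    return "\n".join(lines)


print(
    build_dialogue_prompt(
        beat="Ross interrupts, asking if he can borrow a book about dating.",
        location="Apartment",
        emotions={"Ross": "nervous", "Joey": "excited"},
        actions=["Joey stands in the living room, holding a script"],
    )
)
```

Because only the beat is fixed, changing an emotional tag or an action between runs yields a different take on the same scene.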

Scene Data and Floor Plan in Unity


Dialogue generation with GPT4

Our attempts to use a dialogue model fine-tuned on the show’s cast were promising when compared to GPT3, but in the end they did not yield better results than the latest GPT4 model. The new model mitigates most of the coherence and hallucination issues mentioned above. It also has a good understanding of the characters in Friends because, like its predecessors, it was already trained on the same data.

So all that was needed was the beat structure we generate during scene changes and the simulation data to guide it through the episode. 

The dialogue is then generated on the fly. For audio, we used the ElevenLabs API because of its good quality, easy voice cloning and overall speed.
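A beat-by-beat generation loop could look roughly like the sketch below. It makes no claim about the project's implementation: `generate_lines` is a stand-in for whatever model call is used, the rolling history window is an assumed way to carry context between beats, and dropping lines from speakers outside the cast is an assumed guard against invented characters.

```python
def generate_scene(beats, cast, generate_lines, history_size=6):
    """Generate dialogue one beat at a time.

    generate_lines(beat, context) is a placeholder callable that returns
    "Speaker: text" strings; context holds the most recent accepted lines.
    """
    transcript = []
    for beat in beats:
        context = transcript[-history_size:]
        for line in generate_lines(beat, context):
            speaker, _, text = line.partition(":")
            speaker, text = speaker.strip(), text.strip()
            # Keep only non-empty lines spoken by known cast members.
            if speaker in cast and text:
                transcript.append(f"{speaker}: {text}")
    return transcript


def fake_model(beat, context):
    """Stub used instead of a real model call for this example."""
    return [f"Joey: Okay, so about this: {beat}", "Zorblax: I am not in this show."]


for line in generate_scene(["Joey rehearses"], {"Joey", "Chandler", "Ross"}, fake_model):
    print(line)
```

Passing the model in as a callable keeps the loop independent of any particular API, so the same structure works with a fine-tuned model or a general one.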

AI Table Read with Ross

Perspective on the future

Even though this experimental project shows that the issues we saw years ago still persist, large language models have improved so significantly that new generative seasons of our favorite TV shows seem very doable in the near future.

GPT4 specifically was able to interpret the guardrails the simulation provided and write dialogue that feels appropriate for each character without losing coherence and with a good sense of humour.

We would love to push this experiment even further. There is no shortage of ideas in this new world of generative AI. What if Friends were set in the future with a very old cast, or in a Western town? What if Chandler and Monica never got married?

Here are a few general directions we are currently thinking about:

  • Make even more use of simulation data to create new episodes in playful and interactive ways. What would user interaction look like? It could happen episode to episode, scene to scene, or moment to moment: changing the cast’s emotions, needs and personalities, prompting the system with new obstacles, or doing a cameo.

  • How would we keep viewers engaged with infinite 24/7 episodes? How much would it change the format, and how much the characters’ lives? How can we make it more reliable and safe for public display?

  • Can we generalize the Show Runner concept and adapt it to more dramatic series like True Detective or NYPD Blue?

  • Which series should we tackle next? An animated show like South Park, with timely subjects and on the edge of inappropriate AI-generated content?


We are super excited for whatever comes next!

AI-generated Rachel with Stable Diffusion, D-ID and ElevenLabs.

 


The "Friends AI" research project is an experimental, non-commercial endeavor aimed at exploring the potential of artificial intelligence, voice synthesis, and deep learning technologies to recreate the images and voices of the original cast members from the television series "Friends".
The Project is not affiliated with, endorsed by, or connected in any way to the creators, producers, or copyright holders of "Friends," the Cast, or any related parties. All intellectual property rights, trademarks, and copyrights associated with "Friends" remain the exclusive property of their respective owners.


Sets by Koen J. https://3dwarehouse.sketchup.com/user/0998643389915521058863890/Koen-J
