How about a date with data? We’re joined by a Plexian (a Plex Software Engineer, no less) who has shared with us the inside workings of the Plex Metadata team and what they do to enhance the Plex experience. Rated A for an awesome Q&A.
Adriana has worked behind the foundation of our cloud services and ensures the best metadata representation, across all Plex services.
Q&A on the A-Z of Metadata
Get a behind-the-scenes look on how our team works on constantly improving the Plex experience through Metadata.
Hi Adriana, thanks for joining us for Plex Pro Week, and giving us behind-the-scenes details on how you and your team work to continually improve and expand the Plex experience through metadata. Before we dive in, tell us a little bit about how you became a Plexian.
I was a back-end developer for Watchup, a news aggregation and streaming platform startup, when we got acquired by Plex. This was almost six years ago. There’s a funny story actually, I was planning on resigning from my former company exactly on the day when I found out about the acquisition, so after hearing the news I decided to stay, and I’m really glad I did.
We can’t help but ask—as both a user and a Plexian, what do you like most about Plex?
As a user it’s definitely the ability to stream my movies & TV collection on my TV. And maybe I’m biased here, but I love the magic that happens when you create a new library and you suddenly get to see your movies organized, complete with beautiful posters and data.
What I love most as a Plexian is the group of people that I work with, they’re all incredibly talented and kind to each other. We are spread across more than 20 different countries so there’s a lot of cultural diversity that is fun to discover. Everyone here has a voice and is encouraged to come up with new features or improvement ideas.
You’re currently a software engineer on the metadata team—can you tell us a little bit about your team?
Right now we’re a small team of three people. Tim has a long history working on the media server and more recently he’s especially focused on metadata integrations with the media server. Michal works on the cloud services back-end side, he’s the brain behind file matching and he’s also our go-to for ML (machine learning) solutions. My focus is mainly on the metadata, on the integrations with our data sources and on the cloud services back end.
You’re now in your sixth year with Plex (and we hope there’s many more where that came from)! What have you been able to accomplish so far with your team?
I joined Plex right as we were building the first cloud services for serving content so I got to be part of a small team that built our news, podcasts, and web shows services and the TIDAL integration from scratch. After that, I worked on our initial version for a cloud-based EPG data (TV guide data) provider and for a short while on our Movies & Shows service. In the past two years I’ve been working almost exclusively on metadata. I’ve helped build the supporting services for the new metadata agents and I work closely with other teams to support their metadata needs for Plex Media Server, Movies & Shows, Discover, or any other feature at Plex that uses data about movies and TV.
Ok, we’re ready to really nerd out. Can you tell us a little bit about how Plex uses/imports metadata to improve the app?
We use data from a lot of different sources like IMDb, TVDB, TMDB, fanart.tv, IVA, and others and aggregate them in order to generate the best representation for every piece of data that makes up a movie or a show entry. Some sources are refreshed once a day while others are fetched continuously throughout the day. Because each source is unique, some give us access to full exports, others use rate limited APIs so we have built a component that is internally called the metadata cache. Here we store the latest available representation for each movie/show from each source. This helps us to iterate more quickly over our aggregation code and, for instance, if today we change the way posters are selected the cache helps us recompute the poster for all movies much faster than if we had to use all these external APIs in real time.
And does Plex use metadata differently than other apps?
Most apps use metadata from a single source and for most apps that’s more than enough. We combine metadata from a multitude of sources. This does add a lot of complexity in our implementation but it also gives us much better coverage and better data in general as every source is better with some fields and we get to pick which fields we want to use from which source. Sometimes the selection process is as simple as a prioritized list of sources and other times it’s more complex. The simplest example I can give are summaries: we prefer some sources over others, but the length of the summary also adds a score and we penalize summaries that end mid-sentence.
We’re curious to know some of the most interesting/complicated metadata challenges that you’ve encountered so far. Got any you can share?
File matching is an interesting challenge and one that we constantly try to improve. While we stick to our guidebook on how files should be named, we do try to make an effort to support as many variations as we can. Changes done in that area of the codebase can be very subtle yet make a strong impact so we have a test set of about 60k filenames for movies and episodes that we run matching against every day and get alerted quickly if our accuracy level were to decrease.
Matching across sources is another one. Because we use aggregated data among multiple sources to generate our best representation of a movie we need to be able to match IDs across those sources. Some of the sources have external IDs to other sources, for instance TVDB and TMDB entries often have IMDb IDs. It gets tricky when these sources contradict themselves, when for instance a TVDB show and a TMDB show link to each other, but their IMDb IDs differ. It’s a brainteaser to figure out whom you believe in every edge case and sometimes we use our own matching logic to solve that problem.
Episode orders were the next most difficult detail to implement on the new agents because our metadata database schema assumed that an episode can have a single parent, so a single season, but with multiple episode orders, an episode can show up in multiple seasons. It also made cross source matching more difficult especially when it comes to seasons, because if the seasons contain a different set of episodes even though they are both Season 2, are they still the same? We believe that they are not so we had to refactor a lot of internals accordingly.
We’ve gotta know—what’s your favorite Plex Pass feature?
For me, it’s the additional little details that you get like trailers, lyrics, and intro detection.
Ok, final question. Can you share any fun hints about things in the works?
There are always tons of improvements happening on the data side and on PMS but there’s one upcoming feature that we’ve decided to share with you and that’s end credit detection. We’re adding support for credit markers and soon you’ll be able to skip watching the credits that you’ve already seen multiple times before moving on to your next episode.
Get a piece of the perks.
The clock is ticking on our latest partner discounts and deals, exclusively for Plex Pass members. Get in on these Pro Week ‘22 specials before they’re gone.
Want to know more about Metadata? We’ve got you covered:
More Plex Pro Week 2022 Sessions:
Monday, September 19th
Let’s Talk Transcoding & Play in the Pro League (Linus Tech Tips)
Tuesday, September 20th
Your Home Is Now a Cinema (Kevin The Tech Ninja)
Thursday, September 22nd
(How To) Let the Cord Go, for Good (Eric Powelson)
Friday, September 23rd
Say ‘Sup to Supersonic (Elan Feingold)