Plex Blog

September 20, 2010

Metadata Update

So what exactly is metadata? Defined on Wikipedia as “data about data”, it turns our collection of media from a drab list of files into an interlinked web of facts and pictures, and imbues each item with a rich set of properties. Your files might have structure, but painting them with metadata converts that simple collection into a multi-dimensional universe of relationships. You can now set about answering complex questions like “Do I have any romantic comedies from the 1990s that I haven’t watched in at least a year, starring Julia Roberts, and not featuring Whoopi Goldberg?”

Everyone loves metadata, but we didn’t anticipate the extreme load the Plex/Nine release would put on a number of sites when we launched. Tens of thousands of early downloaders, eagerly rescanning their huge personal media collections, generated massive amounts of traffic to multiple sites.

As a result, we’ve had to spend quite a bit of time since the release focusing on stabilizing our sources of metadata, optimizing the metadata agents (the bits of code that go out and get your metadata), and adding infrastructure to support all of our new users. Here’s a summary of what we’ve done:

  • We’ve brought up a massively powerful machine to serve as our TheTVDB proxy cache. All requests to TheTVDB go through this machine, and it serves over 99% of all bytes out of its cache, which means we’ve reduced the data load on the parent site by a factor of about 100. At peak, we were serving over 500 requests/second and sending out 320 Mbps. Darrin, one of our super-talented Plex engineers, worked literally day and night to get this running after the release, and we also appreciate the help and support from the TVDB guys!
  • For movies, we’ve moved to using data that is accessible with an API or through structured data dumps. Specifically, we’re using metadata from Freebase, Wikipedia, and TheMovieDB (as well as a few others for extra artwork, such as MoviePosterDB). This ensures the best availability and stability of the data.
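
The factor-of-100 figure for the TheTVDB cache follows directly from the hit rate: if 99% of bytes come out of the cache, only 1% still has to come from the parent site. Here is a quick sketch of that arithmetic. The function name and the byte counts are made up for illustration and are not measurements from our proxy.

```python
def origin_reduction_factor(total_bytes, cached_bytes):
    """How many times less data the origin site serves thanks to the cache."""
    origin_bytes = total_bytes - cached_bytes
    if origin_bytes <= 0:
        raise ValueError("at least some bytes must come from the origin")
    return total_bytes / origin_bytes


# Illustrative numbers only: 990 of every 1000 bytes served from cache (99%).
factor = origin_reduction_factor(total_bytes=1000, cached_bytes=990)
print(factor)
```

Since the cache serves "over 99%", a factor of 100 is the lower bound; every extra fraction of a percent in hit rate pushes it higher.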

If you’re not familiar with Freebase, you should check it out. It’s one of the few sites in recent memory that’s totally blown me away. The people who designed it are very, very smart people and the amount of data available is unbelievable. If you check out the page for the movie 300, you’ll see it links to 33 reviews of the movie, 6 other sites (such as Rotten Tomatoes), and then has a veritable cornucopia of data including cast, genres, subjects, filming locations, award lists, and more. All of that data is available via a sophisticated API, or via weekly database dumps.

We’ve processed the most recent Freebase data dump into a form that’s most suitable for our agent to consume. Additionally, we’ve enhanced the Wikipedia agent to support summaries in multiple languages. Finally, the TheMovieDB agent now pulls in much more data.

In summary, massive amounts of data, all structured (no more “scraping” sites that can change at a moment’s notice), and all completely up to you as to how you use them. Like TheMovieDB summaries? Drag it to the top of the list of agents. Prefer your summaries in Swedish? Make sure Wikipedia is above TheMovieDB, so its internationalized summaries will take precedence. Have two French movies for your mother-in-law? You can manually set the language preference to French for just those two movies, and she’ll offer to babysit her grandkids while happily reading the summaries in French.

These agent changes have been pushed, and you will have them within the hour, unless you check sooner with Plex Online > More > Check for Updates.

Get your settings exactly how you like them, shift-click the refresh button to get new metadata for all your movies, and then sit back and watch the metadata flow in. (N.B. At this point in time, poster/art selections are “sticky”, so once set, they won’t change unless you rescan a section from scratch.)

Here’s a summary of what the different movie agents now provide, so as to allow you to prioritize them accordingly, through the settings option shown below:

[Screenshot: Fullscreen.jpg]

  • Freebase: Genres, content ratings, studio, directors, writers, actors, tag-lines.
  • Wikipedia: Multi-language summaries, directors, writers, actors, studio.
  • TheMovieDB: Summaries (more plot oriented), content ratings, directors, writers, actors, studio, tag-lines.
  • MoviePosterDB: Lots of movie posters, at lower resolution than TheMovieDB.

So as an example, if you hate the Wikipedia summaries, and prefer English plot summaries, drag TheMovieDB above Wikipedia. If you leave Wikipedia enabled, summaries that aren’t found from TheMovieDB will be filled in by Wikipedia.

If you want your summaries in Swedish, you’ll need to enable Wikipedia and have it higher in the list than TheMovieDB. Note that currently, in order to change languages, you’ll need to create a new section with the new language setting. Alternatively you can “fix match” on an individual item and manually set the language.
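
The ordering rule in the two examples above boils down to this: for each field, the highest agent in the list that actually has a value wins, and lower agents fill in whatever is still missing. Here is a minimal sketch of that idea. It is not the actual Plex code; the function name, the field names, and the sample data are all assumptions made up for illustration.

```python
def merge_metadata(agent_results, priority):
    """Combine per-agent metadata; for each field, the highest-priority
    agent that supplies a non-empty value wins."""
    merged = {}
    for agent in priority:
        for field, value in agent_results.get(agent, {}).items():
            if value and field not in merged:
                merged[field] = value
    return merged


# Made-up results for a single movie (illustrative data only).
results = {
    "TheMovieDB": {"summary": "", "tagline": "An example tagline"},
    "Wikipedia": {"summary": "An example summary", "studio": "Example Studio"},
}

# TheMovieDB is dragged above Wikipedia, but it has no summary for this
# movie, so the summary is filled in from Wikipedia.
merged = merge_metadata(results, priority=["TheMovieDB", "Wikipedia"])
print(merged["summary"])
```

Swapping the two names in `priority` is the code equivalent of dragging Wikipedia above TheMovieDB in the settings.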

Lots of you have asked: How can we help? Luckily this is quite easy; let’s say you have a movie that’s missing data, or has incorrect data. You can head to one of those sites above and add the missing data, and then everyone in the community will benefit, including users of other apps that access those sites. This really is a case where each one of you has the power to help hundreds of thousands of other people!

The quickest “turnaround” for new data is through TheMovieDB, which we access through a well-designed API. We cache requests for 4 hours, so if you add data, it may take up to 4 hours before you see it. (Note that we are also working to improve TheTVDB refresh times, which are currently between 24 and 48 hours.)
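
To see why a 4-hour cache means up to a 4-hour wait, here is a small sketch of a time-based cache. It is an illustration of the general technique only, not how our servers are actually built; the `TTLCache` class, its parameters, and the fake "site" in the demo are all hypothetical.

```python
import time

FOUR_HOURS = 4 * 60 * 60  # cache lifetime in seconds


class TTLCache:
    """Keeps each fetched value for ttl seconds before fetching again."""

    def __init__(self, fetch, ttl=FOUR_HOURS, clock=time.time):
        self._fetch = fetch    # callable that performs the real request
        self._ttl = ttl
        self._clock = clock    # injectable so the demo can fake the time
        self._entries = {}     # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]    # still fresh: serve the cached copy
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


# Demo with a fake clock and a fake upstream "site".
site = {"300": "old summary"}
now = [0]
cache = TTLCache(fetch=lambda key: site[key], clock=lambda: now[0])

cache.get("300")               # first request goes upstream
site["300"] = "new summary"    # someone edits the entry upstream
now[0] = 3 * 60 * 60
print(cache.get("300"))        # 3 hours later: still the cached copy
now[0] = 4 * 60 * 60
print(cache.get("300"))        # 4 hours later: fetched again
```

The wait is "at most" 4 hours because it depends on when the entry was last fetched: an edit made just before the cached copy expires shows up almost immediately.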

Also, if you’re a developer, please check out our repository for agents. They are easy and fun to write, and we’re really looking forward to seeing the creative things you come up with. Oncleben31 has already written an agent for Allociné for French users, and the ever talented Sander wrote one for MovieMeter, for our Dutch users.

In the near future, we’ll allow you to fully customize any of the data for your media and lock it in place, so that it won’t be overwritten by new data from the Internet. So, for example, you can lock all your titles and summaries, but let the ratings and genres continue to expand and improve over time.

Your media has a bright future inside Plex, and metadata is the key.
