How Deluxe Uses the Complete HashiStack for Video Production
Dec 14, 2018
Deluxe is using every tool in the HashiCorp stack to build its video production tools. See how they use each.
Learn how Deluxe is using the HashiStack—Terraform, Nomad, Consul, and Vault—to build an open and modular ecosystem for their next generation of visual effects, post-production, and content distribution services.
- Konstantin Wilms, SVP Cloud, Deluxe Entertainment
- Abhishake Pathak, Chief Systems Architect, Deluxe Entertainment
Konstantin Wilms: My name’s Kon, and I’m here with Abhis. We both work at Deluxe. I guess the first question would be: How many people here think that we’ll be talking about the CDN today? Probably some of you, right?
We’re going to be talking about what it takes to move things like episodic and feature films from source all the way to the distribution sink. This is a lot of the work that Deluxe does. It’s pretty hard to formalize and describe the whole process end to end, so we thought we’d split it in two. You might find the first half interesting, because you have to understand the patterns we use and the way the business drives the architecture. At face value, a lot of these things look wrong. “We’re in a single region in AWS,” for example, sounds like an anti-pattern, right? For our use case, it’s the only way to make it work cost-effectively.
There are those kinds of design decisions that we’ve had to make for this. The whole group of us here works on a platform called Deluxe One. Deluxe has been in business for about 103 years. It started as a film-processing lab in L.A. connected to Fox and then spun off from there. Everything that Deluxe does as a company spans creation all the way through to delivery of content.
We’re on the technical team, if you will. There’s a creative division, there’s a technologies division, and there’s a distribution division as well.
We work on a platform called One. If you think about things like film processing or film distribution to theaters and so on, traditionally that has all involved things like hand-carrying drives, which have to be encrypted to protect the movies going out for distribution. There are a lot of LTO tape libraries, etc., so it’s a very physical process.
Automating the digital supply chain
That whole supply chain we work on (the digital supply chain for aggregating episodic and feature films and then distributing them) has been very hands-on, with lots of CSRs and lots of people involved. So our goal is to automate everything we can and make the whole thing fully automated end to end, which is not an easy task.
If you look at the basic parts of the business for Deluxe, there’s mastering, which is where you finish a film before you store the master and then create what we call a heavy asset, or mezzanine file. As the name suggests, a mezzanine file sits between the master film asset that the studio holds and the distribution assets that are pushed out to Netflix, iTunes, Hulu, theaters, or wherever.
These assets are pretty big. Probably our biggest problem is dealing with the weight of these assets, which range from multiple gigabytes up to close to a terabyte.
So where do we start? The asset drives the supply chain that we have to build infrastructure around. Take a single file for a feature film, for example: one heavy asset can be 500 GB. At 4K and 8K, that goes up to 800 GB and beyond. That’s for one asset.
Then we also have other things that go with it. Imagine this content has to be distributed in, say, the Czech Republic. We have to localize the language. We need subtitles. We need what we call materials analysis: we have to analyze the file and figure out the codecs in it, the video properties, etc., plus the metadata that describes it, like the movie genre. Those things are important for iTunes, Hulu, or whoever to drive their search engines.
It all comes from that source for distribution. That’s why the number of assets per title explodes. These could be smaller files, but there are a lot of them. If you optimize infrastructure to process very large files, you then get hit by the other extreme: hundreds of thousands of very small files. That affects what you select for storage. Everything we do right now has to be object-store-native, because it’s just too time-consuming to do things like hydrate content to EBS volumes or move it around.
Struggling with output
The outputs are where it gets even worse. If you look at output to, say, a PlayStation 3 or PlayStation 4, you probably need tens of thousands of small chunk files that get distributed out. And the asset we start with is a challenge in and of itself.
Our platform is absorbing a lot of these things by automating these workflows, but we have to deal with these existing use cases.
The existing business delivers about 4.5 to 5 petabytes of content per month for distribution, at a constant velocity. This affects everything from “Can we use Direct Connect?” to “How many Direct Connect links do we need?” to “Can we use UDP acceleration?” All of these things key into the input and output sides, which is why the platform we build matters so much.
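To see why a constant 4.5 to 5 PB per month shapes network choices, here is a quick back-of-the-envelope sketch. It assumes decimal petabytes, a 30-day month, and perfectly even traffic with no peaks or retransmissions, so real links need headroom above these averages.

```python
# Assumptions: decimal petabytes (1 PB = 10**15 bytes), a 30-day month,
# and perfectly even traffic with no peaks or retransmissions.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def sustained_gbps(petabytes_per_month: float) -> float:
    """Average line rate in Gbit/s needed to move a monthly volume at constant velocity."""
    bits = petabytes_per_month * 10**15 * 8
    return bits / SECONDS_PER_MONTH / 10**9


for volume in (4.5, 5.0):
    print(f"{volume} PB/month ~ {sustained_gbps(volume):.1f} Gbit/s sustained")
```

That works out to roughly 13.9 to 15.4 Gbit/s on average, already more than a single 10 Gbit/s link before any bursts are considered.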
You can see the volume and the scale: 4,500 digital assets delivered monthly. These could be large single files, or they could be groupings of around 10,000 files. That covers the distribution side and the source side, but there are actually a whole bunch of metrics that drive the platform. I won’t get into all of them, but think about just one figure: 7 million minutes processed per month. That means ingesting the content, transcoding it, conforming it, checksumming it, maybe running machine-learning analysis against it, archiving it, encrypting it, and then distributing it.
It’s a lot of processes, and you can probably see that with a workload this fungible, something like a monolithic application, or even auto-scaling instances with code deployed on them, doesn’t work. The only approach that works for this is a truly microservice-based platform, scaled horizontally. Looking at some of these figures specifically, our challenge is that in the current business, 30,000 assets get ingested a month. We have to scale that and get it into object store. The total volume is 600 petabytes, which could be on-prem but could also be in clouds. So there’s the problem of “Where do you attach the compute? How do you distribute your processes?”
Then we have to meet an SLA of 99.99%. But bear in mind that a recipient of content doesn’t expect it delivered in a few minutes; these SLA time windows are typically 8 hours. The cool thing is that this lets us orchestrate restores from Glacier in batch at the cheapest tier, because we can still meet the SLA and we’re still beating on-prem turnaround times.
Most importantly, what really drives the platform is that the compute we consume on AWS or anywhere else is a negligible cost, because nearly all of our cost goes into storage and egress, the data costs of moving content around.
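The idea of restoring at “the cheapest tier” that still meets an 8-hour window can be sketched as picking the cheapest retrieval option whose worst-case time fits the remaining budget. The tier timings below are assumed figures for S3 Glacier Flexible Retrieval (Expedited about 1 to 5 minutes, Standard about 3 to 5 hours, Bulk about 5 to 12 hours) and should be checked against current AWS documentation; the `downstream_hours` parameter is a hypothetical allowance for the processing that follows the restore.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RestoreTier:
    name: str
    worst_case_hours: float
    cost_rank: int  # 1 = cheapest per GB retrieved


# Assumed worst-case retrieval times for S3 Glacier Flexible Retrieval;
# verify against current AWS documentation before relying on them.
TIERS = (
    RestoreTier("Bulk", 12.0, 1),
    RestoreTier("Standard", 5.0, 2),
    RestoreTier("Expedited", 5 / 60, 3),
)


def cheapest_tier(sla_hours: float, downstream_hours: float) -> Optional[RestoreTier]:
    """Cheapest tier whose worst-case restore still leaves time for downstream work."""
    budget = sla_hours - downstream_hours
    fitting = [tier for tier in TIERS if tier.worst_case_hours <= budget]
    return min(fitting, key=lambda tier: tier.cost_rank, default=None)


print(cheapest_tier(sla_hours=8, downstream_hours=2).name)   # Standard
print(cheapest_tier(sla_hours=24, downstream_hours=2).name)  # Bulk
```

Under these assumed figures, an 8-hour window with 2 hours of downstream work rules out Bulk and lands on Standard; only a looser window lets the scheduler batch everything at the Bulk tier.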
On the left you see all these processes laid out. Processing a piece of content is a DAG, from top to bottom: ingest, transform, localize, and so on are all separate microservices orchestrated together. They pass the content, or references to the content, from one container to another through the whole process, then distribute it and close it out, with the SLA met or not met, whatever the case may be.
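The DAG described above can be sketched with Python’s standard `graphlib` module. The stage graph is hypothetical (stage names borrowed from the talk), and the orchestrator here just records each step; the point is that stages run only after their dependencies finish, and what travels between them is a reference to the asset, not the asset itself.

```python
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the stages it depends on.
PIPELINE = {
    "ingest": set(),
    "transform": {"ingest"},
    "localize": {"ingest"},
    "distribute": {"transform", "localize"},
    "close_out": {"distribute"},
}


def run_pipeline(graph: dict[str, set[str]], asset_ref: str) -> list[str]:
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    executed = []
    while sorter.is_active():
        # Every stage returned here has its dependencies satisfied, so a real
        # orchestrator could dispatch them concurrently; this sketch runs them in order.
        for stage in sorted(sorter.get_ready()):
            executed.append(f"{stage} <- {asset_ref}")  # pass a reference, never the bytes
            sorter.done(stage)
    return executed


for step in run_pipeline(PIPELINE, "s3://example-content/title-123/mezzanine.mxf"):
    print(step)
```

Here `transform` and `localize` become ready together after `ingest`, which is where horizontal scale pays off in a real deployment.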
I didn’t get into some other examples, like The Handmaid’s Tale, which is an interesting one. If we have time for Q&A afterward, we can describe how that works from the film perspective, how those files come together, and what drives it from the creation side as opposed to the technical side.
These are the key things:
Security: We don’t have to meet PCI compliance, but we do have to meet MPAA compliance. CSA: We have to meet things like the TPN (Trusted Partner Network), so there’s a whole set of requirements we have to align to. It’s not really a certification that you pass; it’s a way that we handle content. The interesting thing is that for many of these requirements, like encryption or high-value assets, we can’t store the assets anywhere. We can operate on them only as long as they stay in memory. So that’s another interesting pattern we have to look at.
Then there’s the notion the cloud industry has of content gravity: because it’s so expensive to move everything around, we have to put the compute where the storage is. And for processing affinity, we have to put clusters of processing close to each other to optimize for the SLA.
Similarly for horizontal scale: when you’re delivering 4.5 petabytes a month, there’s no steady state. At any point in time we can get an order for distribution to any country in the world for x amount of titles, assets, branding materials, etc. And it all has to be automated.
And finally, this all plays into having to do highly distributed systems and services.
So with that, I’ll hand it over to Abhis.
Abhishake Pathak: Hi, guys.
So how does all this work? I’ll get into all the technical details.
Scheduling with Nomad
The key part is scheduling. When people think of scheduling, they usually think of resource scheduling. But in our context it goes well beyond that. As I show you here, there’s CPU, GPU, and whether I’m using Spark, ML, Docker, or Raw Exec. Obviously, these are constructs available in Nomad, and that’s what we use at the core for most of the scheduling. But those are just the high-level constraints.
To take it one step further, we also look at content isolation, process and content gravity, and then the access to and from the content itself.
So what does content isolation mean?
When we talk about content isolation, we mean your content is housed in one place, and access to and from it is tightly controlled. In the place where your content lives, you have no processes. If you have a process, you have a vulnerability: the potential for someone to manipulate the content directly.
So when we say “content isolation,” our content lives in a completely separate isolation zone. And you can think of that as a data center, a cloud provider, an AWS account, a GCP account, however you want to look at it.
But the idea is that it’s separate. Process and content gravity then play into that isolation: anything that wants to process content lives in a different isolation zone. So content lives here, but you process it over here. The reason is that we have controlled access between the two. If you need to work on content, whether it’s machine learning, transcoding, “I want to generate a subtitle,” or whatever it is you’re doing, you must perform that work in the processing isolation zone.
So that way we have control and we know at all times who is touching what content, and what they’re allowed to do, and what they’re not allowed to do.
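The isolation rules above (no processes in the content zone, one controlled path in from the processing zone, and a record of who touched what) can be sketched as a small gatekeeper. Everything here is hypothetical: the zone names, the `ContentGateway` class, and the grant model are illustrations of the described rules, not Deluxe’s implementation. In Nomad itself, zone placement would more likely be expressed with job `constraint` blocks on node attributes or metadata.

```python
from dataclasses import dataclass, field

CONTENT_ZONE = "content"
PROCESSING_ZONE = "processing"


def place_job(job_name: str, zone: str) -> str:
    """Refuse to schedule anything into the content zone: no processes live there."""
    if zone == CONTENT_ZONE:
        raise ValueError(f"{job_name}: the content zone runs no processes")
    return zone


@dataclass
class ContentGateway:
    """Hypothetical single controlled path from the processing zone into the content zone."""
    grants: dict[str, set[str]]  # principal -> actions it may perform on content
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def request(self, principal: str, caller_zone: str, action: str, asset: str) -> None:
        if caller_zone != PROCESSING_ZONE:
            raise PermissionError("content may only be worked on from the processing zone")
        if action not in self.grants.get(principal, set()):
            raise PermissionError(f"{principal} is not allowed to {action} content")
        # Record every granted access so we always know who touched what.
        self.audit_log.append((principal, action, asset))


gateway = ContentGateway(grants={"transcoder": {"read"}})
place_job("transcode-title-123", PROCESSING_ZONE)
gateway.request("transcoder", PROCESSING_ZONE, "read", "title-123")
print(gateway.audit_log)  # [('transcoder', 'read', 'title-123')]
```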
Processing the content where it lives
To that end, depending on the process being performed, we may move the process to the content, meaning we’ll schedule a Nomad job where the content lives. Or we may decide that it’s better to just pull the content down straight from S3. In that case we’ll do 10 parallel pulls, run 30 or 40 jobs, and work through them in parallel.
Depending on the workflow, we balance these options in the scheduling based on size, transit time, and where the data itself lives.
And finally there’s access. We have services that customers need to interface with, so they’re going to be available on the internet. We draw a clean line between the two: “These are customer-facing services. They interface with backend services that may talk to content, but they don’t have direct access to it.” Once again, that’s another layer of security on top.
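The “10 parallel pulls” pattern can be sketched as splitting an object into byte ranges and fetching them concurrently. The `fetch_range` callable is a stand-in assumption; against S3 it might wrap boto3’s `get_object(Bucket=..., Key=..., Range=f"bytes={start}-{end}")`, but this sketch keeps it abstract so it runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def byte_ranges(size: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, size) into at most `parts` inclusive (start, end) ranges, HTTP Range style."""
    if size <= 0:
        return []
    chunk = -(-size // parts)  # ceiling division so `parts` ranges cover every byte
    return [(start, min(start + chunk, size) - 1) for start in range(0, size, chunk)]


def parallel_pull(fetch_range: Callable[[int, int], bytes], size: int, parts: int = 10) -> bytes:
    """Fetch an object as `parts` concurrent ranged reads and reassemble it in order."""
    ranges = byte_ranges(size, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # Executor.map yields results in input order, so the chunks join back correctly.
        return b"".join(pool.map(lambda r: fetch_range(*r), ranges))


source = bytes(range(256)) * 40  # 10,240-byte stand-in for an object in S3
data = parallel_pull(lambda start, end: source[start:end + 1], len(source))
print(data == source)  # True
```

Each of the 30 or 40 jobs mentioned in the talk could run a pull like this independently; the ranged reads are what object storage makes cheap to parallelize.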
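One way to picture this balancing is a rule that prefers moving the process to the content (content gravity, no data movement) whenever that still meets the SLA, and falls back to pulling the content otherwise. All fields and the decision rule are hypothetical illustrations; a production scheduler would also weigh egress cost, which the talk identifies as the dominant expense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    asset_gb: float           # size of the content the job needs
    link_gbps: float          # assumed effective throughput into the processing zone
    job_hours: float          # estimated processing time once the data is local
    remote_wait_hours: float  # assumed wait for capacity next to the content


def transfer_hours(size_gb: float, link_gbps: float) -> float:
    return size_gb * 8 / link_gbps / 3600


def placement(c: Candidate, sla_hours: float) -> str:
    move_hours = c.remote_wait_hours + c.job_hours
    pull_hours = transfer_hours(c.asset_gb, c.link_gbps) + c.job_hours
    # Moving the process to the content avoids moving any data, so prefer it when it fits.
    if move_hours <= sla_hours:
        return "move_process"
    if pull_hours <= sla_hours:
        return "pull_content"
    return "no_option_meets_sla"


print(placement(Candidate(500, 10, 2, 1), sla_hours=8))  # move_process
print(placement(Candidate(500, 10, 2, 7), sla_hours=8))  # pull_content
```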
Authentication with Fabio
Ingress and egress: Here’s where we did a little bit of our own development and coding.
How many of you have used Fabio?
So we took Fabio (this was before Envoy became popular, and before Nomad 0.9) and extended it to another level: “We are going to add authentication, authorization, and exposure of your service on the internet as well.”
We overloaded it so that in a single Nomad job we can say, “This service needs this load balancing. It requires authentication. Here’s the URL we’re going to expose on the internet.” To the point where we can say, “If you’re on the internet, you’re forced to go through authentication.”
So we took Fabio and called it Fabio. Coincidentally, it means the same thing. One’s in Latin, and one’s a different language, but it still means “bean.” Fun fact.
So that’s how we handle authentication and authorization between services. You’ll often see the sidecar pattern, where each service has its own Envoy or Fabio attached. We chose to do it at the node level instead, so there’s only one proxy per node. The reason is that it gives us a single point of control where we can push a policy down and everybody has to adhere to it, rather than implicitly trusting each developer to say, “Yes, everything is in place.”
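A node-level proxy with a pushed-down policy can be sketched as two steps: derive routes from service tags, then apply one node-wide rule that every internet-exposed route requires authentication, regardless of what the developer declared. The `urlprefix-` tag prefix is Fabio’s routing convention; the `public=` and `auth=` options here are hypothetical stand-ins for Deluxe’s extension, whose actual syntax the talk does not show.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    service: str
    prefix: str
    public: bool
    require_auth: bool


def parse_routes(service: str, tags: list[str]) -> list[Route]:
    """Turn Fabio-style `urlprefix-` tags into routes; `public=` and `auth=` are hypothetical options."""
    routes = []
    for tag in tags:
        if not tag.startswith("urlprefix-"):
            continue  # not a routing tag
        fields = tag[len("urlprefix-"):].split()
        if not fields:
            continue  # malformed tag with no prefix
        prefix, *opts = fields
        options = dict(opt.split("=", 1) for opt in opts if "=" in opt)
        routes.append(Route(service, prefix,
                            public=options.get("public") == "true",
                            require_auth=options.get("auth") == "required"))
    return routes


def enforce_node_policy(routes: list[Route]) -> list[Route]:
    """Node-wide policy pushed to the single proxy: anything internet-facing must authenticate."""
    return [replace(r, require_auth=r.require_auth or r.public) for r in routes]


routes = enforce_node_policy(parse_routes("orders", ["urlprefix-/orders public=true", "env-prod"]))
print(routes)  # [Route(service='orders', prefix='/orders', public=True, require_auth=True)]
```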
Everything comes from Nomad
And then the final piece is automatic configuration. As an extension of the ingress and egress portion, we automatically take care of logging, metrics, monitoring, and secrets at the platform level.
By default, when we onboard a project, we do some bootstrapping: a bit of code here, a bit of code there, and your project is bootstrapped. Then for SSL, logging, and things like that, we let the developer declare everything in a single Nomad job. That way, even for us in platform operations, there’s a single place to go; we’re not running around asking, “Where do I get this config? Where do I get that config?” So all of this comes from the Nomad job. Nothing is external. The CI portion is obviously external, but when it comes to deployment and the actions taken during deployment, they all come from the Nomad job.
This is a sample. We use Replicator for some of the auto-scaling and Fabio as the proxy. We also have some internal tools that we built around both of them to make this work.
A look at the infrastructure
Now I’ll get into what the infrastructure looks like. Where do we use scaling? Where do services live? Where do they not live? Why does Nomad work for us so well?
This shows you the breadth of a single environment. Nomad sits in the middle, where the masters are, where everything’s controlled from. And then we have all of these isolation zones that I mentioned earlier. And I’ll get into the details of what each isolation zone does shortly.
But each one has an intent and a purpose, and that determines how work is executed and scheduled.
So let’s start with the core one: a shared environment where the Nomad masters and Vault live, and where operations does all of its management (Git hooks, CI/CD, whatever the case may be).
That lives in one account. It could be an AWS account or an on-prem facility; in our case it’s an AWS account. But it’s one isolated account, and that’s all it does. It’s just meant for operations.
Any other operational service, for example if we wanted to add more monitoring in the future, would probably end up in the shared account.
Then we get to the internet-facing accounts. The second part of the scheduling is services that need access to the internet, or client-facing services; they automatically get scheduled to this environment. You’ll see that each of them has multiple circles. The hard circles in the middle are our core Nomad workers, and each service is surrounded by a proxy, Fabio, which handles authentication, load balancing via Consul, authorization, etc.
Next, we move on to the input and output network. We have dedicated, unilateral rules for content distributors and content providers. Let’s say we get content from Fox. We have explicit rules that say “they can push, or we can pull,” and vice versa. So we have one isolation zone that’s dedicated explicitly to that, and it’s only for that.
Going back to content security, we want to isolate our core catalog from the content we receive and deliver. That isolation happens here: content lands here, and after certain events take place, it gets moved into the real account, which is the content account.
After the content gets validated and all the metadata is available, we register that piece of content into the content account, where that’s the only thing happening. That account only manages things like, “OK, I want to move from bucket to bucket.” Or, “I want to change my chunk size because my process requires it; with this chunk size, I’ll get better performance.” Or, “I want to tier it and move it to Glacier automatically: if it doesn’t get touched within a month, put it in Glacier with lifecycle policies. If we need it, we can automatically pull it back.”
That single place is solely responsible for managing content. That’s it.
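The “move it to Glacier after a month” tiering can be expressed as an S3 lifecycle configuration. This sketch only builds the configuration dictionary (so it runs offline); with boto3 it would be passed as `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`. The `titles/` prefix is hypothetical. One caveat: lifecycle `Days` counts from object creation, not last access, so “if it doesn’t get touched” is only approximated; access-based archiving is what S3 Intelligent-Tiering’s optional archive tiers provide.

```python
import json


def archive_rule(prefix: str, days: int = 30) -> dict:
    """Lifecycle configuration moving objects under `prefix` to S3 Glacier after `days`."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Days are counted from object creation, not from last access.
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


print(json.dumps(archive_rule("titles/"), indent=2))
```

Pulling an archived object back later is a separate restore request, which is where the tier choice sketched earlier comes in.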
The core working environment
And finally, the true working environment. This is where all the core work happens. Anything that touches content lives in this isolation zone: transcoding, machine learning, localization, and fingerprinting. Transcoding and localization are fairly straightforward. Fingerprinting and machine learning are about “How do we identify content?”
A good example: you get Avengers, and then you get Avengers Director’s Cut. How do you know it’s Avengers versus Avengers Director’s Cut, short of watching the whole movie? It might be one scene in the middle that’s different, while everything else looks the same.
Fingerprinting plays a very important role here. Otherwise, say I’ve got a 1 TB file: I’m going to ingest it all over again because I wasn’t able to identify that I already had it. We fingerprint the content so we can confidently say, “Yes, this is Avengers Director’s Cut,” or, “This is the cable version of Avengers and not the theatrical version.”
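To illustrate the idea of spotting the one scene that differs, here is a deliberately simplified sketch: hash fixed-size segments of two versions and report which segments of the new version never appear in the known one. Exact SHA-256 hashes only match bit-identical, segment-aligned data; real video fingerprinting uses perceptual features of decoded frames that survive re-encoding. The byte strings are stand-ins for media.

```python
import hashlib


def segment_hashes(data: bytes, segment_size: int) -> list[str]:
    """Fingerprint fixed-size segments of a stream with SHA-256."""
    return [hashlib.sha256(data[i:i + segment_size]).hexdigest()
            for i in range(0, len(data), segment_size)]


def novel_segments(known: bytes, candidate: bytes, segment_size: int) -> list[int]:
    """Indexes of candidate segments whose fingerprint never appears in the known version."""
    seen = set(segment_hashes(known, segment_size))
    return [i for i, h in enumerate(segment_hashes(candidate, segment_size)) if h not in seen]


theatrical = b"A" * 100 + b"B" * 100 + b"C" * 100
directors_cut = b"A" * 100 + b"X" * 100 + b"B" * 100 + b"C" * 100
print(novel_segments(theatrical, directors_cut, 100))  # [1]
```

Using set membership rather than position-by-position comparison means an inserted scene does not make every later segment look different.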
Taking that one step further, we have machine learning for things like conforming data. You’ll get a piece of audio and a piece of video, and they’re useless until you can line them up, right? Otherwise, like you sometimes see on Netflix or YouTube, the video runs ahead and the audio lags behind. Dragon Ball Z is a perfect example. It kind of looks like that, right?
Without conformance, you get that Dragon Ball Z effect all the time, and the data is useless. So that part plays a very important role: “Oh, there’s a gap here, so I’m going to have to realign the data.” That’s where we use a lot of machine-learning and AI algorithms to get the final product out.
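The talk says conformance relies on machine-learning algorithms; the sketch below swaps in plain cross-correlation, a classic way to estimate a constant offset between two tracks (for example, a delivered audio track against the reference audio in the video master). The tracks are tiny hypothetical sample lists; production tools work on decoded audio and also handle drift, not just a fixed offset.

```python
def alignment_score(reference: list[float], signal: list[float], lag: int) -> float:
    """Unnormalized cross-correlation of reference[i] with signal[i + lag] over the overlap."""
    return sum(r * signal[i + lag]
               for i, r in enumerate(reference)
               if 0 <= i + lag < len(signal))


def best_lag(reference: list[float], signal: list[float], max_lag: int) -> int:
    """Lag in samples that best aligns the tracks; positive means `signal` arrives late."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: alignment_score(reference, signal, lag))


reference = [0, 1, 3, 2, 0, 0, 0, 0, 0, 0]
late_track = [0, 0, 0, 0, 1, 3, 2, 0, 0, 0]  # same pattern, 3 samples late
print(best_lag(reference, late_track, max_lag=4))  # 3
```

Once the lag is known, realigning is a matter of shifting one track by that many samples.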
In a nutshell, we have services, we have batch jobs, we have ML processes and workflows. We have a really interesting flow and mix of services across the board. The challenges are very interesting.
Feel free to ask questions if you have any.
Questioner 1: I have one. That’s an awful lot of data to put over the wire.
Questioner 1: How do you do that? Do you reserve that amount? Do you own the wires that go through your location? What do you do?
Abhis: Good question. It’s a mix of both. In our traditional model, we owned the lines; we had private networks going across the world. Now, with cloud, it’s a lot easier. We can convince a studio or a client: “Why don’t you put it in S3? It’ll be cheaper for you, and it’ll be better for us, too.”
Questioner 1: Then you’re vulnerable to denial of service, somebody picking up on your ...
Abhis: On our private lines, or do you mean on the public internet?
Questioner 1: The public internet.
Abhis: At that point, we have the same constraints as any other application running on the internet. We do have dedicated lines for certain high-volume customers. It really depends on the volume. Traditionally, we’ve had private lines, like I showed you earlier; that’s what that particular isolation zone is dedicated to, handling that level of throughput. We have dark fiber, Aspera, HTS, Resilio, different protocols, and obviously the hardware to back it up as well. Like Kon said, object store, whether it’s Amazon or GCP or whatever, really makes a difference because it lets us parallelize a lot of it.
Konstantin: We use things like Aspera. We use UDP acceleration quite a lot, too. Depending on where we’re deployed, whatever cloud provider that is, we’ll use their tools, like WAF, GuardDuty, etc. For a lot of things, if it’s Aspera or Signiant or other products, we’ll consume their SaaS service. We give them time-limited STS credentials to access our content, and it becomes a contract with that vendor.
We do have an internal ring network within L.A. Most of the studios in L.A. funnel content back and forth among each other, and we have dark fiber that runs there as well. That’s our BDN, our Deluxe Broadcast Delivery Network. So we have these different avenues available to us. If you’re thinking about high-value content, most of that will never go over the wire anyway. If it’s mastering, it’ll be someone coming into a secure facility with biometrics and literally hand-carrying the drive with the master for Avengers to, say, have color conformance done on it.
Abhis: The key here is we want to minimize the movement of the data, just on sheer size. If we can land it where it’s supposed to be right from the get-go, it makes life a lot easier.
Konstantin: If we can do an S3 bucket-to-bucket copy within region, with KMS, we’re done. It’s a backend operation, and we don’t have to touch it, and it doesn’t egress over the public internet. That factors into cost when you’re moving that volume of content as well.
A mezzanine file is by definition compressed, but at around 30 to 50 Mbps rather than 200 to 300 Mbps. All of the derivatives, like a 4K file on Netflix, are probably well under 20 Mbps, depending on what you’re playing them back on. Some platforms, like PS4, have higher quality, so it varies, but it’s always a lower-quality derivative. We’re never moving those 800 Mbps files around.
The one thing we didn’t get into is dailies and pre-production. That’s another problem we have. When a movie gets shot and the dailies start coming in, the director and editors want to do what’s called “rough cuts” against them; in other words, start to conform the material as it comes off set. That’s usually a petabyte per production. About 400 TB moves back and forth between houses that basically say, “I want to work on this VFX shot. I’m going to composite Groot into this scene,” or whatever it might be, work on the content, and then ship it back. That’s such a huge amount of content that it’s still on-prem. It’s not even feasible to move it to cloud infrastructure right now.
Abhis: There’s a big difference between what comes off the camera and what we see at home. Let’s put it that way. Next time you go on Netflix, hit that star button. It’ll tell you the megabits per second. What is it, 7, 8, 9, 10 at the most? Compared to the numbers he just mentioned, it’s a drop in the bucket. To your point, if it’s that important, we’ll have someone drive it in, and they’re tracked all the way, end to end.
For the stuff that normal consumers see, it usually does go over the wire, but it’s cut down to about a tenth or a fifteenth of the original size.
Any other questions?
Questioner 2: What does Ja Rule think?
Abhis: What does Ja Rule think? Where’s Ja when you need him? Ja Rule would love this platform. Those in the know, know it: Dave Chappelle.
Konstantin: Thank you, sir.
Abhis: Thank you.