Building the Open Metaverse

Innovations in Visual Graphics

Anton Kaplanyan, VP of Graphics Research at Intel, joins Patrick Cozzi (Cesium) and Marc Petit (Epic Games) to discuss the future of visual graphics in the metaverse.

Guests

Anton Kaplanyan
VP, Graphics Research, Intel AXG

Announcer:

Today on Building the Open Metaverse.

Anton Kaplanyan:

We are visual creatures. 80% of traffic nowadays is videos. It speaks for itself, right? We are consuming visuals. I think AI and machine learning opens up an interesting opportunity for us to pretty much speak the same visual language.

Announcer:

Welcome to Building the Open Metaverse, where technology experts discuss how the community is building the open metaverse together, hosted by Patrick Cozzi from Cesium and Marc Petit from Epic Games.

Marc Petit:

Hello, I'm Marc Petit from Epic Games, and my co-host is Patrick Cozzi from Cesium. Patrick, how are you today?

Patrick Cozzi:

Hey Marc. I'm doing well.

One of the benefits of us co-hosting these podcasts is that you and I get to learn a lot along with the audience, and I'm really excited for today's episode because anything that's tech, that's geek, that's research, that's right up my alley.

Marc Petit:

I'm going to have fun today, Patrick, because we're super happy to welcome to the show Anton Kaplanyan. He is the Vice President of Graphics Research at Intel. Anton is one of the leading researchers focused on real-time rendering, and his research is used all over NVIDIA's middleware and RTX hardware, in game engines like Unreal Engine, Unity, and CryEngine, in games, and also in Pixar's RenderMan.

Anton, it's super great to have you with us on the show. Welcome.

Anton Kaplanyan:

Thanks. Thanks, Marc and Patrick. Thanks a lot for inviting me. I'm happy to be here to chat about my favorite topics.

Patrick Cozzi:

Excellent, Anton; thank you so much for being here.

As you know, we like to kick off the show and ask our guests about their journey to the Metaverse; how did they get started? Let's dive in because there's so much to talk about. I mean, you've been involved in graphics innovation for 20 years across Crytek, KIT, NVIDIA, Facebook, and now Intel. Tell us your journey.

Anton Kaplanyan:

Oh yeah, absolutely.

I think I wrote my first path tracer in 2002, so quite a while ago, before all the current path tracing and ray tracing trends. I'm a game developer at heart, so I've been in game development starting in 2004. Had quite some fun there at Crytek. We worked on Crysis and CryEngine, and I think Crysis drove quite a few sales for NVIDIA back then. Fun times.

By training, I also have a Ph.D. in light transport, ray tracing, path tracing, what's now called physically based rendering. That was, I think, my next journey after game development. I decided to go back to the drawing board a bit and make sure that I learned the new essentials of physically based rendering.

Back then, it was quite timely. It's maybe not that widely known, but I had a startup as a spin-off of one of my Ph.D. papers, and we integrated that work into RenderMan.

They gave a short presentation at SIGGRAPH on how they use it; they said that the artists started to use this technology basically everywhere. It's like an editing tool inside physically based rendering.

What they did, in Cars 3, the third film in the series, is edit the reflections of the eyes, because the eyes of the cars get reflected in the hood. If you look at the car, you basically see two pairs of eyes, so they just dimmed down the reflections of the eyes so that you see only one pair, and it's not that confusing.

Those are the kinds of interesting things I was doing back then.

When I started working at NVIDIA, something big was brewing; that's what's called RTX hardware now. That was a lot of fun working on the simulator, as well as the things around it. Because if you think about the first generation of ray tracing, it's maybe not super powerful; you need to augment it with denoising and a few other things just to get a picture out of it. That was a fun technical journey as well.

I think I'm personally enjoying moving the needle in graphics.

I'm a technologist at heart, and when I started working on AR, VR, even back at NVIDIA, that was interesting. I think we had one of the first eye-tracking headsets.

There was a very simple test. You just put a white rectangle where your gaze is; it was the first eye-opening experience of perceptual graphics for me. Because when you wear this and start looking around, you have this interesting perception that the white rectangle showing where you're looking jumps to the place you want to look at before your eyes even arrive there. It was a bit of a creepy experience because, at first, you have this uncomfortable feeling of “does it read my mind?” Then you realize it’s just fast enough. But if you dig deeper into that, there's a whole world of perceptual graphics; we just live 100 milliseconds in the past.

This is what excited me quite a bit. This is where I switched to working on perceptual graphics, ultra-low-power graphics, inverse graphics, and democratized content creation back at Meta, which back then was Oculus Research.

Now I'm thrilled to be back in the big GPU industry, after Raja said, “we're launching a new one.” It's not often you have this opportunity. The last new ones were launched, what was it, over 20 years ago? Now it’s Intel coming to this market as well.

These are certainly super exciting times, and we'll probably touch on this, but the demand for graphics, the demand for visuals is nowadays growing across the board.

Marc Petit:

Let's talk about innovation and ray tracing.

In the last five years, we've seen explosive innovation in computer graphics, real-time ray tracing, neural graphics. Let's talk a little bit about that.

Let's start with ray tracing first. The industry has gone through a revolution, and now ray tracing is in almost every home. Can you walk us through the new advancements and where the focus goes next?

Where do we go from there?

Anton Kaplanyan:

We've gone through multiple generations of ray-tracing hardware by now. Even mobile vendors are starting to bring up ray tracing hardware and acceleration on mobile phones. That's an interesting time. I don't think we are fully there yet. There's still a long way to go. Adoption still depends on the performance and availability of the hardware. This is certainly our focus at Intel as well.

What I can say is ray tracing is certainly a better, more consistent way of doing graphics, especially compared to rasterization or the REYES algorithm. It can be more costly for just camera rays, but as you go further, ray tracing simplifies a lot of things. In a nutshell, ray tracing is just pointer chasing. It's a very divergent workload.

What's interesting is that, with our latest GPU architecture, our cache line size is not that big, and it's beneficial for divergent workloads.

We have some very interesting ray tracing hardware that can push through interesting performance barriers and so on.

But it's not just ray tracing. Ray tracing just gives you a hit point, and from there you need to do a lot of other things. You need to fetch your material textures, do shading there, do sampling for the next ray, yada, yada.

For example, if you're doing path tracing for a full path, there's a lot of divergence, not just in the ray tracing itself, but in this whole system around ray tracing. I think this is an interesting architecture challenge for GPUs because GPUs were initially designed for very coherent, almost lockstep workloads.

Now, with ray tracing, you need to think much more about re-converging this divergent execution across the whole machine. This is where we introduced, for example, the thread sorting unit in DG2. NVIDIA has their own hardware for that.

This is, I think, just the first step towards it; towards basically teaching the GPUs to execute more and more divergent and incoherent workloads.
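
To make the divergence point concrete, here is a small illustrative sketch (a toy model with made-up survival probabilities, not how any real GPU or scheduler is implemented): each SIMD lane traces one path, paths terminate at different bounce depths, and a lockstep group wastes more and more lanes unless the hardware re-sorts and re-converges the work.

```python
# Toy sketch (not Intel's implementation): why path tracing is a divergent
# workload on SIMD hardware. Each "lane" in a group traces one path; paths
# terminate at different bounce depths (hypothetical survival model), so the
# fraction of useful lanes per iteration drops unless the scheduler re-sorts
# and re-converges the work.
import numpy as np

rng = np.random.default_rng(0)

SIMD_WIDTH = 16        # lanes executing in lockstep
NUM_GROUPS = 1000      # number of SIMD groups simulated
MAX_BOUNCES = 8
SURVIVE_PROB = 0.6     # chance a path continues after each bounce (assumed)

# For every lane, draw how many bounces its path survives.
bounces = rng.geometric(1.0 - SURVIVE_PROB, size=(NUM_GROUPS, SIMD_WIDTH))
bounces = np.minimum(bounces, MAX_BOUNCES)

# A lockstep group must keep iterating until its longest path finishes,
# while shorter paths idle. Utilization = useful lane-iterations / total.
useful = bounces.sum(axis=1)
total = bounces.max(axis=1) * SIMD_WIDTH
lockstep_utilization = useful.sum() / total.sum()

print(f"average bounces per path:       {bounces.mean():.2f}")
print(f"lockstep SIMD lane utilization: {lockstep_utilization:.1%}")
# With perfect re-sorting of active paths (what thread sorting / scheduling
# hardware approximates), utilization would approach 100%.
```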

And then, of course, another challenge that we see with ray tracing now is that there is still not enough dynamism, in terms of dynamic geometry, dynamic worlds, animation, and so on. Ray tracing comes with the cost of building an acceleration structure around your scene graph to get to logarithmic complexity when you do the actual ray tracing. Building this acceleration structure is a non-zero cost, and I would say a pretty high cost.
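
The build-versus-traversal trade-off can be sketched with a toy acceleration structure. The following is a hypothetical, simplified median-split BVH in Python (not production ray tracing code): the whole tree has to be rebuilt whenever the geometry changes, which is the cost described above, while a single ray then only visits a handful of nodes instead of testing every primitive.

```python
# Minimal illustration (hypothetical, not production code) of the trade-off:
# building an acceleration structure (here a median-split BVH over axis-aligned
# boxes) costs roughly O(N log N) every time the scene changes, but each ray
# then only touches O(log N) nodes instead of testing all N primitives.
import numpy as np

rng = np.random.default_rng(1)

def make_scene(n):
    centers = rng.uniform(0.0, 100.0, size=(n, 3))
    half = rng.uniform(0.1, 0.5, size=(n, 3))
    return centers - half, centers + half        # per-primitive AABBs

def build_bvh(lo, hi, idx, stats, leaf_size=4):
    """Recursively split primitives at the median of the widest axis."""
    stats["nodes_built"] += 1
    node = {"lo": lo[idx].min(axis=0), "hi": hi[idx].max(axis=0)}
    if len(idx) <= leaf_size:
        node["prims"] = idx
        return node
    centers = (lo[idx] + hi[idx]) * 0.5
    axis = int(np.argmax(centers.max(axis=0) - centers.min(axis=0)))
    order = idx[np.argsort(centers[:, axis])]
    mid = len(order) // 2
    node["left"] = build_bvh(lo, hi, order[:mid], stats, leaf_size)
    node["right"] = build_bvh(lo, hi, order[mid:], stats, leaf_size)
    return node

def ray_hits_box(origin, inv_dir, box_lo, box_hi):
    t0 = (box_lo - origin) * inv_dir
    t1 = (box_hi - origin) * inv_dir
    tmin = np.minimum(t0, t1).max()
    tmax = np.maximum(t0, t1).min()
    return tmax >= max(tmin, 0.0)

def traverse(node, origin, inv_dir, stats):
    stats["nodes_visited"] += 1
    if not ray_hits_box(origin, inv_dir, node["lo"], node["hi"]):
        return
    if "prims" in node:
        stats["prims_tested"] += len(node["prims"])
        return
    traverse(node["left"], origin, inv_dir, stats)
    traverse(node["right"], origin, inv_dir, stats)

N = 20_000
lo, hi = make_scene(N)
build_stats = {"nodes_built": 0}
bvh = build_bvh(lo, hi, np.arange(N), build_stats)

ray_stats = {"nodes_visited": 0, "prims_tested": 0}
origin = np.array([0.0, 0.0, 0.0])
direction = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
traverse(bvh, origin, 1.0 / direction, ray_stats)

print(f"build touched {build_stats['nodes_built']} nodes (paid on every rebuild)")
print(f"one ray visited {ray_stats['nodes_visited']} nodes and "
      f"tested {ray_stats['prims_tested']} of {N} primitive boxes")
```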

This is where I think we still have a lot of work to bring more dynamic interactions into ray tracing across the board. Of course, as you go further down, ray tracing is just, as I said, it's an algorithm to get to the next closest hit point.

But then what do you do with it? I think as we are slowly getting towards real-time path tracing, there are a lot of challenges yet to solve there. I think if you think about why we want to go towards real-time path tracing, people could say they use it in movies, and things look much nicer because it's physically based and so on.

I think one thing we need to emphasize here is that look-dev consistency is part of it. Path tracing has much more predictable results compared to, let's say, some of the modern real-time graphics algorithms. When you create the content, when you create an experience, a scene, an environment, you don't have to think about all of the constraints of the system. You have much more predictable results, and you can iterate much faster; therefore, we get higher visual fidelity.

Because with the same amount of production time, you get much more pleasing results due to this predictability. I think this is one very important part about the future of ray tracing and path tracing that I want to specifically emphasize.

Patrick Cozzi:

Anton, a phrase I remember is primary rays cache, secondary rays thrash. Is that still relevant in driving a lot of the architecture decisions?

Anton Kaplanyan:

A lot can be done with smarter algorithms, smarter compression.

You can look at Nanite, for example, in terms of smart decisions, as well as the general way of how you execute the whole pipeline at a high level.

There's a lot of work that needs to be done there. It's why we have scheduling units and all the specialized hardware to also help with this.

Patrick Cozzi:

Another area of innovation we want to talk to you about is neural graphics, which is a relatively new field intertwining AI and graphics.

We're wondering if you could share a bit about how it's evolved and some of the innovations you're seeing there.

Anton Kaplanyan:

I started playing around with neural graphics back in 2016, with denoising, considering two parallel tracks: one a conventional denoising algorithm, and one more experimental: how much can we push neural networks to do denoising for ray tracing? Of course, as any technically interested person, I went through what's called the five stages of ML grief, like probably many others.

First, you start with full denial of even the machinery of it. Because it's a black box, you cannot analyze it; you cannot even gain anything out of it. You would expect, okay, we'll train a big network. If it's a simple task, we'll be able to see the structure in this network, and we'll be able to distill some interesting algorithm out of it.

That doesn't work. Even if you ask a gigantic network to overfit a simple task, like addition of numbers or something like that, you'll have the entropy spread all over the network. No PCA, no other analysis will give you a meaningful structure of what the algorithm is. That's how much of a black box it is.

Then you go through “oh, it's just numerics,” and “oh, it's so data dependent,” and so on.

Once, or if, you get towards acceptance, you can basically say, “okay, fine, we cannot analyze these billions of dimensions and billion-dimensional functions yet.”

It's still the best-known numerical tool we have for solving high-level, very complex, high-dimensional problems for which we haven't had any other tool before.

In terms of application to graphics, I think it's still very early days. There is tremendous potential in neural graphics because, at the end of the day, we are visual creatures, and 80% of traffic nowadays is videos. It's been like that, even before COVID. But it speaks for itself. We are consuming visuals, like an image is worth a thousand words.

This is where I think AI and machine learning open up an interesting opportunity for computers to pretty much speak the same visual language. Or not the same, but very similar visual language, because you can actually task the machine learning algorithms, like neural networks, with much higher level tasks.

For example, if you want to task them with some human perception task.

This is how our visual perception system works.

These are the frequencies and contrasts that we care about, the temporal sensitivity function, and so on. The task becomes “just go and fit your imagery into these constraints of human perception.” This is something that was very hard to explain to classical graphics algorithms before; how can you explain to ray tracing or shading that it needs to care about these particular spatial and temporal effects and these particular details?

When you work with machine learning, this kind of high-level task definition becomes possible.
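
As a rough illustration of such a high-level task definition, the sketch below writes a perception-inspired objective directly as a differentiable loss: errors are weighted per frequency band of a simple image pyramid, with hypothetical weights standing in for a real contrast sensitivity function. Any network or optimizer can then be asked to satisfy it, which is exactly the kind of constraint that is hard to hand to a classical ray tracer or shader.

```python
# A minimal sketch (hypothetical weights, not a real contrast sensitivity
# function) of how a perceptual constraint can be written as a loss and handed
# to any network or optimizer: errors are weighted per frequency band of a
# simple Gaussian pyramid, so the bands the "eye" is most sensitive to
# dominate the objective.
import torch
import torch.nn.functional as F

def gaussian_pyramid(img, levels=4):
    """img: (N, C, H, W). Returns progressively blurred/downsampled images."""
    pyr = [img]
    for _ in range(levels - 1):
        img = F.avg_pool2d(img, kernel_size=2)   # crude low-pass + downsample
        pyr.append(img)
    return pyr

def perceptual_band_loss(pred, target, band_weights=(0.5, 1.0, 2.0, 1.0)):
    """Weight per-band L1 errors; higher weight = band we care more about."""
    loss = pred.new_zeros(())
    for w, p, t in zip(band_weights, gaussian_pyramid(pred), gaussian_pyramid(target)):
        loss = loss + w * (p - t).abs().mean()
    return loss

# Toy usage: any differentiable image generator could be trained against this.
pred = torch.rand(1, 3, 64, 64, requires_grad=True)
target = torch.rand(1, 3, 64, 64)
loss = perceptual_band_loss(pred, target)
loss.backward()          # gradients flow back to whatever produced `pred`
print(float(loss), pred.grad.abs().mean().item())
```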

This is one very, very important part of neural graphics that gets closer to human; it gets closer to human understanding. And that, of course, comes in addition to things like hardware efficiency because machine learning hardware can be streamlined very well.

You know exactly when you can prefetch what. You can resort to very low-precision arithmetic, for example, like what we use in XeSS.

There are a lot of benefits, even at the low level.

Things like, for example, predictable performance. I can set the network size in advance, and I can say, okay, I'm rendering on this low-end machine, or I'm rendering on a very high-end machine, and I need to render it, let's say, 60 frames a second. This is the size of the network I can afford, and it's going to be guaranteed to be executed at 60 frames a second on this particular machine. Now I fix this network in advance, and I pass it on to the training part, where the optimizer tries to squeeze the best quality out of this fixed performance.

This is a completely different way of thinking about performance compared to, let's say, conventional rendering pipelines where you're trying to reduce your polygon count, your textures, and whatnot, to get to the frame rate.

Here you could set the frame rate in advance if you know the platform, if you know your compute budget, your bandwidth budget, and so on. Those kinds of opportunities are very interesting future directions for neural graphics that we can exploit.
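
A minimal sketch of that workflow might look like the following, assuming a made-up FLOP budget per platform and a toy per-pixel MLP cost model (the numbers and names are illustrative, not XeSS internals): the architecture is chosen to fit the frame budget first, and only then handed to training.

```python
# Sketch of the "fix the budget first, then let training squeeze out quality"
# workflow. All budget numbers and the cost model are made up for illustration;
# a real system would measure the target GPU.
import torch
import torch.nn as nn

def mlp_flops_per_pixel(width, depth, in_ch=3, out_ch=3):
    """Rough multiply-add count for a per-pixel MLP (assumed cost model)."""
    dims = [in_ch] + [width] * depth + [out_ch]
    return sum(2 * a * b for a, b in zip(dims[:-1], dims[1:]))

def pick_width(budget_flops_per_pixel, depth=3, candidates=(16, 32, 64, 128, 256)):
    """Largest width whose estimated per-pixel cost still fits the frame budget."""
    fitting = [w for w in candidates
               if mlp_flops_per_pixel(w, depth) <= budget_flops_per_pixel]
    return max(fitting) if fitting else min(candidates)

# Hypothetical budgets: a low-end part gets fewer FLOPs per pixel at 60 fps
# than a high-end part. Choose the architecture up front, per platform.
budgets = {"low_end_igpu": 5_000, "high_end_dgpu": 200_000}
for platform, budget in budgets.items():
    w = pick_width(budget)
    # The architecture is now frozen; only the weights are trained.
    net = nn.Sequential(
        nn.Linear(3, w), nn.ReLU(),
        nn.Linear(w, w), nn.ReLU(),
        nn.Linear(w, w), nn.ReLU(),
        nn.Linear(w, 3),
    )
    params = sum(p.numel() for p in net.parameters())
    print(f"{platform}: width={w}, ~{mlp_flops_per_pixel(w, 3)} FLOPs/pixel, "
          f"{params} weights to optimize for best quality at this fixed cost")
```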

Of course, it's still early days. The hardware is not fully widespread yet. I'm looking at consoles, and performance is still not there for everything; and it's a bit of a chicken-and-egg problem because you need to invest more in hardware to get more performance. But we're getting there in pretty large steps, I'd say.

Marc Petit:

How did you get the idea? Who would think that machine learning could be used to guess pixels and would be faster than computing them?

What was the original impetus going down this path?

Anton Kaplanyan:

Originally, as I said, I was not a strong believer either. So that's why, for me, the first project was a dual-track project.

Okay, we're going to hedge our bets; we're going to do the classical one and do the neural one just to have some comparisons. How is the performance? How much compute does it need? How would the quality look?

To our surprise, the quality looked comparable. It's different in terms of artifacts and so on. For example, for a denoiser, I think the hardest picture is not crazily complicated lighting. It's the Cornell box, because you can see all the imperfections, and you know the straight lines, and so on.

This was the hardest picture for the network to learn. It's completely different machinery.

But at the end of the day, in terms of performance and quality, it was within the same ballpark.

This is what fascinated me, as well as the ability to set the problem for the network at much higher levels. You can basically do some high-dimensional to high-dimensional mappings.

You can set the loss functions, your quality error functions for the image, that you wouldn't be able to explain or map to classical algorithms. It opens up a whole new field.

Marc Petit:

Interesting.

Well, let's talk about another piece of wizardry, from my perspective. We got a lot of questions about NeRFs, neural radiance fields, because they look like this magical way of doing 3D.

Since we have you here, can you give us a bit of a primer on NeRFs and how they compare with photogrammetry, which I think is a process that's well understood?

Why is this technology so interesting?

Anton Kaplanyan:

First of all, NeRF is not just photogrammetry. NeRF started with light fields: basically, let's bake a five-dimensional light field, which again was something that was barely possible, if at all, before.

But it also lowers the bar for scanning because it's machine learning; you don't have to have precisely calibrated cameras and a precise radiance setup. This is another thing.

Now you have a bunch of startups that just capture something from your phone using NeRF. But I think internally, what's really powerful about it is the combination of two pretty young technologies.

First and foremost, the original paper showed that we can train an MLP, a multilayer perceptron, to fit a very high-frequency signal, which I think was not possible before. Once you have this tool, you can represent very rich signals, for example, a full image or even a full five-dimensional light field, because the curse of dimensionality does not apply that much to networks.

This small network is just a few tens of megabytes of data at the end of the day. I think that's the breakthrough of the whole NeRF technology: we can have a small, compact function approximator for the very high-dimensional, very complicated signals that we want to approximate.
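
A toy version of that first ingredient, assuming a 1D signal as a stand-in for a radiance field, is sketched below: the same small MLP fits a high-frequency target far better once the input coordinate is expanded with a NeRF-style sinusoidal positional encoding.

```python
# Toy sketch of the ingredient described above: a small MLP with a Fourier
# positional encoding can fit a very high-frequency signal, whereas the raw
# coordinate alone fits poorly. This is a 1D stand-in for a radiance field,
# not a full NeRF.
import torch
import torch.nn as nn

torch.manual_seed(0)

def positional_encoding(x, num_freqs=8):
    """Map scalar coords to [x, sin(2^k pi x), cos(2^k pi x)] features (NeRF-style)."""
    feats = [x]
    for k in range(num_freqs):
        feats.append(torch.sin((2.0 ** k) * torch.pi * x))
        feats.append(torch.cos((2.0 ** k) * torch.pi * x))
    return torch.cat(feats, dim=-1)

def make_mlp(in_dim, width=64):
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, 1),
    )

# High-frequency target signal on [0, 1].
x = torch.linspace(0.0, 1.0, 2048).unsqueeze(-1)
target = torch.sin(40.0 * torch.pi * x) * torch.exp(-2.0 * x)

for name, encode in [("raw coordinate", lambda t: t),
                     ("positional encoding", positional_encoding)]:
    inputs = encode(x)
    net = make_mlp(inputs.shape[-1])
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(2000):
        opt.zero_grad()
        loss = ((net(inputs) - target) ** 2).mean()
        loss.backward()
        opt.step()
    print(f"{name:>20}: final MSE = {loss.item():.5f}")
# Expected: the encoded MLP reaches a far lower error, illustrating how a
# compact network can represent a detailed signal.
```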

If you think about it, it's just like, yes, they showed it on the light fields, and you could use it for photogrammetry, but as a mathematical tool, it has way more implications across the board, even in a graphics pipeline, than just light field capturing.

Then the second technology that they also smartly applied is another young one: differentiable rendering. And Ravi is on the paper for a reason.

The idea of differentiable rendering is that you run your rendering process as you normally would, but then you also do what's mathematically called back-propagation through it. You basically differentiate back through it and get gradients in the sense of: if I change this geometry, or if I change this texture, how is it going to affect the image?

These gradients allow you to optimize the whole pipeline, to land on the optimal solution in the sense of what your camera sees.
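
Here is a deliberately tiny, hypothetical stand-in for that idea (a soft analytic disk rather than a real renderer): because the image is produced by differentiable operations, autograd can push an image-space loss back into the scene parameters, and gradient descent recovers the disk's position, size, and color from a target image.

```python
# A toy, hypothetical stand-in for differentiable rendering: the "renderer"
# draws a soft-edged disk with purely differentiable ops, so autograd can
# back-propagate an image loss into scene parameters (center, radius, color)
# and gradient descent recovers them from a target image.
import torch

torch.manual_seed(0)

H = W = 64
ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")

def render(center, radius, color, sharpness=25.0):
    """Differentiable soft disk: coverage in [0, 1] times an RGB color."""
    dist = torch.sqrt((xs - center[0]) ** 2 + (ys - center[1]) ** 2 + 1e-8)
    coverage = torch.sigmoid(sharpness * (radius - dist))        # soft edge
    return coverage.unsqueeze(-1) * color                        # (H, W, 3)

# Ground-truth scene produces the "photo" we observe.
target = render(torch.tensor([0.65, 0.40]), torch.tensor(0.22),
                torch.tensor([0.9, 0.3, 0.1]))

# Unknown scene parameters we want to recover by differentiating the renderer.
center = torch.tensor([0.50, 0.50], requires_grad=True)
radius = torch.tensor(0.15, requires_grad=True)
color = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)
opt = torch.optim.Adam([center, radius, color], lr=0.02)

for step in range(600):
    opt.zero_grad()
    loss = ((render(center, radius, color) - target) ** 2).mean()
    loss.backward()        # gradients of the image loss w.r.t. scene parameters
    opt.step()

print("recovered center:", center.detach().numpy().round(3))
print("recovered radius:", round(radius.item(), 3))
print("recovered color: ", color.detach().numpy().round(3))
```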

I think both of these technologies are very powerful in their own ways.

The MLP part is just a function approximation, just a representation that can be powerfully applied to a lot of things. It's not just light fields; it's not just density fields. It could be a lot of things. It could be an immersive video, for example.

Then, differentiable rendering itself also gives us a much stronger prior, a much stronger and more prescriptive understanding of what the camera actually sees. This is something that opens up a lot of doors in content capturing and optimization for scanning, for example, and even just for geometry simplification and quite a few other directions.

I think this is what makes the technology itself fascinating. Internally it's a combination of two very, very powerful constructs that were first demonstrated on light fields, and now you see them popping up like mushrooms after the rain. A lot of technologies are applying these neural representations here and there.

A lot of technologies are applying differentiable rendering as well.

There are papers that say, “You don't need an MLP. You can just do proper differentiable rendering and maybe some low linear functions.”

At the end of the day, under the hood, these technologies have a lot of promise for the future, both in content creation and in actual neural graphics. Across the whole pipeline, from sensing to pixels, you can easily add dimensions; you can easily add different semantics. It's a richer representation; it's a richer understanding beyond just geometry and textures. You can add time as a dimension. You could add other representations and semantics.

For example, I want to do physics. In addition to my graphics and visuals, I want to do some rigid body simulation and keep it in the same representation.

Of course, as you go higher and higher up the stack, it could also incorporate things like, let's say, the scene graph or not just the low-level graphics representation of the scene, but also the higher-level, functional, and even maybe behavioral, representation of the scene.

That a door opens, that a traffic light actually changes lights, and stuff like that. This is all possible to do within a single unified representation. This is where I think most of the power comes from. You don't have to have all these specialized algorithms if you know the higher-level problem at hand.

In addition to that, as first steps, we also see a lot of advances in just democratization of content creation. Just because of the initial NeRF work, it becomes hopefully way easier to capture content.

There's still a long way to get it to good quality, to practicality, and into the existing pipelines and systems that we have. But I believe it's getting there.

Marc Petit:

I feel so outdated. I thought I knew a thing or two about graphics, but I realized that I feel completely obsolete now. This is fascinating.

Patrick Cozzi:

No, it's a big paradigm shift.

Marc Petit:

Yeah.

Patrick Cozzi:

Anton, thanks for the tutorials here. We're talking about intertwining AI and graphics with neural graphics and NeRFs. But I wanted to take a step back and just get your perspective on AI in general. I mean, during our prep call for this episode, you made some really interesting observations on where AI could take us.

Anton Kaplanyan:

AI allows us to get to higher-level reasoning, higher-level programming.

Right now, it's like talking to a growing child. You start with very simple concepts when you have a baby; you start with very low-level tasks. Go here, do that, and so on. This is what classical algorithmic computer science looks like, to me at least. Then your child grows, and you start giving them higher and higher-level tasks. Do some chores at the house, or go do some groceries, and so on.

This is where you start tasking a child with much higher-level problems and assume they'll know all the steps behind it. It's the same feeling for me with AI, with machine learning: you can treat it as a child that is growing up. You can set problems that were just unapproachable or intractable before because we didn't even know how to solve them.

There were no algorithms; the problems were either too high-dimensional, or nothing existed for them at all. Now you can just say, do your best there, and given a good data pipeline and a good objective for the algorithm, it'll try to do its best.

One important reason we have this deep learning revolution in the first place is, of course, Moore’s law. The amount of compute we have now has made it tractable to actually train these networks.

Back in my university days, we could train a small three-layer MLP overnight or something like that. Now, thanks to hardware acceleration, thanks to GPUs, we can do it reasonably fast, and we can grow the network capacity and the dataset size.

At the end of the day, we are getting to higher and higher-level reasoning, and higher-level programming, if you will, which ultimately enables a better human-computer interface.

Marc Petit:

That's fantastic. This accessibility is going to become pervasive, but every coin has two sides.

What are the challenges that we can expect from those technologies?

Anton Kaplanyan:

There are two schools, even in graphics and computer vision. Machine learning methods in general are highly data-driven methods, which means it's garbage in, garbage out. If your dataset is not faithful and reliable enough, then you're not going to get any meaningful method out of it.

That's why it's also very frustrating for some people who just started machine learning. You throw a task at it, but with just small biases in your dataset, it doesn't perform well. The data preparation is like what you teach your growing child; basically, it's about that.

How do you prepare your data? For example, for synthesizing humans, that's a very, very sensitive topic, and there are a lot of problems. How do you get to a balanced dataset?

This is the problem we had to solve for XeSS as well. How do you get to a balanced, reliable, and reasonably generalizable dataset?

Where as many corner cases as possible are well and equally represented, and the network has good guardrails on the problems it has at hand. Then, speaking of guardrails, this is where I think people often think about machine learning as just crowdsourced, huge, captured datasets. You capture a lot of data and then just fit it into the network; it'll do its clustering, its magic inside. But I think, with graphics, we have a very interesting opportunity to actually provide these guardrails to the networks.
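
One common way to approximate such balancing, sketched below with hypothetical scene categories rather than the actual XeSS dataset, is to reweight sampling so rare corner cases are drawn about as often as the dominant content during training.

```python
# A small sketch (hypothetical categories, not the actual XeSS dataset) of one
# common way to balance a training set: weight samples inversely to how often
# their category occurs, so rare corner cases are seen about as often as the
# common ones during training.
from collections import Counter
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Imagine frames labeled by content type; thin geometry and particles are rare
# corner cases that a naive sampler would almost never show the network.
labels = (["diffuse_scene"] * 9000 + ["specular_scene"] * 800 +
          ["thin_geometry"] * 150 + ["particles"] * 50)
label_ids = {name: i for i, name in enumerate(sorted(set(labels)))}
y = torch.tensor([label_ids[l] for l in labels])
x = torch.randn(len(y), 8)                    # stand-in features per frame

counts = Counter(y.tolist())
weights = torch.tensor([1.0 / counts[int(c)] for c in y])   # rare class, big weight
sampler = WeightedRandomSampler(weights, num_samples=len(y), replacement=True)

loader = DataLoader(TensorDataset(x, y), batch_size=256, sampler=sampler)

seen = Counter()
for _, batch_y in loader:
    seen.update(batch_y.tolist())
name_of = {i: n for n, i in label_ids.items()}
for idx, n_seen in sorted(seen.items()):
    print(f"{name_of[idx]:>15}: {n_seen} samples drawn this epoch")
# Each category now contributes roughly a quarter of the epoch, instead of the
# particles class appearing in well under 1% of batches.
```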

If you think about it, everything becomes smarter nowadays. Your dishwasher most likely has a small neural network in it already. Who’s going to train all these neural networks?

Who's going to train the neural networks that are riding on our roads, like self-driving cars, and any embodied AI? Your vacuum cleaner, robots in the warehouse, on an assembly line; basically, any AI that we want to act intelligently, we need to teach it. We need to give it good guardrails about how the world looks, how the world behaves, and so on.

I think this is a tremendous opportunity for graphics. It's the metaverse for AI, if you will, which goes in the direction of digital twins, synthetic data generation for AI, for any kind of robotics, any kind of simulation that relies on visual cues. It's an early and very important field for us to cover as graphics people.

It's not just machine learning for graphics; it's also graphics for machine learning. That becomes very, very important nowadays.

In the context of future immersive experiences, there is also a premise that anyone will be able to create content. Anyone will be able to populate some reasonably meaningful environments, for example. Nowadays, people populate news feeds on social networks, people populate marketplaces, and insert your metaverse name here.

Basically, we should make sure that we enable people to meaningfully create this content, these very immersive experiences, environments, and worlds.

I think there is still a lot of challenge at the higher level: a higher-level understanding of humans. This is where ChatGPT and diffusion models, for example, are scratching the surface. At the end of the day, we want a level of understanding where a five-year-old kid could create their own environment. Anyone could just post something immersive, something they meant to create, the way they meant to create it.

Getting to this high level of understanding, I think, is still a very challenging task that we need to work on with machine learning methods, with AI, and with datasets.

Marc Petit:

It's an interesting flywheel that we have going on, because these AI techniques really are going to help commoditize 3D content creation.

Five-year-olds, as you said, can do environments. Then we can create so much content that we can teach those machines even more, using the synthetic data to train even more models. That's going to be a true explosion in the amount of content and in machines' intelligence and understanding of that content.

Patrick Cozzi:

Marc, on that note, I mean, if you think about 3D content becoming as democratized as maybe text content today, that's going to put a lot of demands on visual computing.

So Anton, do you have any thoughts on visual computing architectures in terms of CPU versus GPU and edge versus cloud, and how this may play out?

Anton Kaplanyan:

If you look at the very high-level picture, people's lifestyles are getting freer and leaner, and things should just be more convenient. Gadgets become smaller, lifestyles become more mobile, especially after COVID. People start using Starlink and so on.

If you think about this part, people would like to have the freedom of taking something with them and using it only when they need it. That's one, I would say, high-level direction.

But on the other hand, the expectations around visual quality and the intelligence of your gadgets are growing pretty fast. At some point, you'll probably get into the situation where your lean clients would rather focus on just content delivery both ways, while leaving the heavy lifting of quality and intelligence to some computer somewhere else.

It doesn't have to be cloud or even edge; it could be nearby. This kind of decoupling, if you will, of displays and the experience around the display is something that we are unavoidably getting into.

I think, to a degree, we're already seeing it with things like diffusion models, ChatGPT.

They're designed to have high-level conversations with humans, yet pretty much most of the compute is somewhere in the backend; it's something that you don't necessarily have in your pocket. I hope not, at least. So I'm glad that, at Intel, we are gearing up for this wide gamut of different products, from lower-power integrated GPUs in your laptop, which, by the way, have ray tracing in them (lower power, but not that low power; they're still powerful), all the way to high-end data center GPUs like Ponte Vecchio, the Max data center GPU, which have tremendous compute power, both for AI and for ray tracing acceleration.

Speaking of which, by the way, we even managed to run Disney's film production scene, the Moana scene, on the data center GPU at a pretty interactive frame rate. You can run film scenes on these big machines now if you need to. If you need to deliver film quality to your lean clients, to your gadgets, to your AR, VR glasses, whatever it is, this could be one way to do it.

Marc Petit:

You were at Crytek in the early days of the CryEngine, so I'm curious to hear your opinion on the evolution of the game engine market.

We've seen some consolidation going on. Any thoughts about the future of the industry? And a bonus question, because the Open 3D Engine is actually based on the Crytek engine that you worked on: can that be a successful open-source project?

That's the bonus question. We'll come to it after.

Anton Kaplanyan:

Ecosystems like game engines are still heavily driven, in general, by the brilliant, talented teams that stand behind them. These are very focused teams driving these systems in particular directions. If you think about Unreal Engine or Unity, there are many different important directions those engines are going in.

For the Open 3D Engine, I think that's going to be interesting to see because it's crowdsourced. The same goes for the DreamWorks MoonRay renderer. It's also open source, so the development of these systems will likely be crowdsourced across many smaller developers, and we'll see where they move the needle in these engines.

Then, coming back to commoditization, I think things like the low-level graphics APIs, this long tail of platforms that we have, and the more and more complicated content creation pipelines are what made in-house engines much more expensive, which led to the consolidation of game engines that we see now.

The gamut of these platforms becomes wider and wider. Thinking about the separation of graphics from displays might be an interesting way to see how you can deliver the consistency of visual experience across this crazy number of different platforms.

Because if you want to ship your game on a very low-end Android mobile phone versus a very high-end PC, this could be a tremendous gamut that you need to cover. How do you develop your experiences for this?

This is, I think, where it will be interesting to see this separation. I think we see some of that to a degree with Lumen, Nanite, and so on, where there's a lot going on behind the scenes. Maybe even as a pre-process before these experiences, to deliver the best experiences for a particular platform, for a particular display at the end of the day.

I think that's an interesting direction. But at the higher level, I think game engines will probably, or I guess most likely, grow into much bigger platforms, much bigger ecosystems. Because even now, big game engines already have a lot of different sub-components. Here's how you create humans; here's how you create environments, this, that.

At the end of the day, you can just populate them, starting with marketplaces, starting with massively multiplayer games, and grow them into a bigger ecosystem. With a good feedback loop, it can become more and more powerful and intelligent in this sense.

It's not going to be just a game engine; it’s going to be a full platform with intelligent content in it.

Marc Petit:

I think it's interesting. Over the past few years, we saw NVIDIA come up with a completely new simulation engine at data center scale, based on the USD scene graph.

Do you think we could see more real-time computing platforms emerge?

Anton Kaplanyan:

Yeah. In fact, we do have our own platforms. We have the CARLA simulator for self-driving cars; that's been around for quite a while. And we have our rendering stack, the Rendering Toolkit, with Embree, Open Image Denoise, and OSPRay, a full-fledged renderer that can again deliver a visual simulation compute platform.

I think in this regard, it's coming back to training AI, coming back to training all of our intelligent machines that are going to be out there. That's certainly a good and important demand.

For example, there is already a city, I think somewhere in India, where they don't give you a driver's license unless you pass a driver's test in the digital twin of their city. This is how close it gets in terms of what we call metaverse, what we call immersive experiences.

In terms of simulation, game engines and simulation platforms are certainly going to be an important market. I can't tell, or I don't want to predict, whether it's going to be the major market, the main market, or just one of the markets. But, at the end of the day, someone needs to train our AI to be more intelligent, to be more visually intelligent.

Marc Petit:

What would it take to make the web browser a viable platform for graphics?

Are you confident that WebAssembly and WebGPU will deliver truly capable graphics in web browsers?

Anton Kaplanyan:

The ecosystem. The web browser could be a thin layer at the end of the day.

What you need to do is have a lot inside this web browser. Whether a web browser will be the right platform for it, it could be, if you get to the right level of efficiency with your hardware abstraction, your APIs, and so on; then it could deliver the promise of having a cross-platform abstraction for compute.

But then, of course, it doesn't remove the problem of consistency of experiences across different platforms, because you can run the web browser on a small phone or on a large, high-end PC.

This is where you would still need to scale appropriately inside the web browser. Or you would need to decouple it and use the web browser just as a content delivery vehicle, which is another possible direction.

It could be an interesting platform, an interesting platform to start with. Because, even on mobile phones, a lot of apps are nowadays just web apps at the end of the day. There is a reason for that: it simplifies development, delivery, consistency, and so on.

Patrick Cozzi:

So, Anton, it's been just a ton of fun geeking out with you here. We covered everything from GPU ray tracing to neural graphics, to NeRFs, to game engine ecosystems.

One of my favorite episodes when I look at the topics we covered. To wrap it up, we'd love for you to give a shout-out to any person or organization, if you want.

Anton Kaplanyan:

Thanks a lot for inviting me here, Patrick and Marc. It was a pleasure to talk to you. Very good questions. A lot of fun during the conversation.

Of course, I'd like to thank Intel for giving me this opportunity. And I'd personally like to thank Raja, Raja Koduri, for bringing me over for this interesting journey of launching a new GPU. It's not that often you can participate in this event.

Of course, thanks to my wife and family and all the people who have supported me throughout my graphics journey.

Marc Petit:

Thank you, Anton. What an impressive journey. Good luck with launching the new GPUs. Getting Intel into the world of GPUs is a big endeavor, and things seem to be going well. We wish you even more luck and more success in the future.

Anton, it was deep. I'm going to have to re-listen to this podcast myself to make sure I got all of that, but it was a fantastic deep dive into all those topics. They are complex, but they are very important because they are, as Patrick said, a paradigm shift in our industries.

Thank you very much for enlightening us today.

Anton Kaplanyan:

Looking forward to listening to it as well.

Marc Petit:

And thank you, Patrick, and thank you to everybody who is listening to us.

Please, as usual, hit us on social. Let us know what you think and what you want to hear about. Thank you very much, everybody. Till the next episode.