Nvidia revealed its latest GPUs on September 1, celebrating the 21st anniversary of its first GPU, the GeForce 256. The features and specs certainly look impressive, as you can read in our separate breakdowns. However, we ended up with quite a few questions, and Nvidia provided a lot of additional information that we are summarizing here. We’ll be adding much of this to our main Ampere architecture hub, so this piece covers just the new details.
First, let’s talk about the Ampere streaming multiprocessor (SM). The biggest change for gaming is probably the doubling of FP32 performance. Each SM now has two FP32 clusters, providing up to 128 FMA (fused multiply-add) operations per cycle. Half of those are full FP32 + INT cores, while the other half are FP32 only. That may sound like a potential drawback, but generally speaking (especially for gaming workloads) FP32 is the most important, INT less so. It’s a balanced way to boost overall performance without bloating the core too much.
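To put the 128 FMA per cycle figure in context, here is a rough sketch of the arithmetic. The SM count and clock speed in the usage note are hypothetical placeholders, not figures from Nvidia, and the mixed-workload model is our own simplification: it assumes 64 FP32-only lanes and 64 shared FP32 + INT lanes, with perfect scheduling.

```python
def peak_fp32_tflops(sm_count, clock_ghz, fma_per_sm_per_cycle=128):
    """Theoretical peak FP32 throughput in TFLOPS.

    Each FMA counts as two floating-point operations (a multiply and an add).
    sm_count and clock_ghz are whatever values you want to plug in.
    """
    flops_per_cycle = sm_count * fma_per_sm_per_cycle * 2
    return flops_per_cycle * clock_ghz * 1e9 / 1e12


def issue_rate_per_sm(int_fraction, fp32_only_lanes=64, shared_lanes=64):
    """Idealized instructions per cycle for a mix of FP32 and INT work.

    Assumption: INT instructions can only use the shared FP32 + INT lanes,
    so a high INT fraction leaves the FP32-only lanes idle.
    """
    total_lanes = fp32_only_lanes + shared_lanes
    if int_fraction <= 0:
        return float(total_lanes)
    # INT work is limited to the shared lanes; the total is capped by all lanes.
    return min(float(total_lanes), shared_lanes / int_fraction)
```

For example, `peak_fp32_tflops(68, 1.7)` evaluates a hypothetical 68-SM part at 1.7 GHz. In this simplified model, `issue_rate_per_sm` stays at the full 128 per cycle until more than half of the instructions are INT, which is why the FP32-heavy mix typical of games benefits the most.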
To help feed the beast (TM!), the data path was doubled, along with L1 bandwidth. L1 capacity is also 33% larger, with twice the partition size.
One of the other changes is that Ampere can run work concurrently across the CUDA cores, RT cores, and Tensor cores. This allows a game to run DLSS to upscale one frame while at the same time doing the CUDA and RT calculations for the next frame, cutting down on rendering time and improving overall performance.
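A simple way to see why that overlap helps is to treat rendering (CUDA + RT work) and DLSS upscaling (Tensor work) as two pipeline stages. The sketch below is an idealized model with made-up stage times; it ignores contention for shared resources such as memory bandwidth, so real gains will be smaller.

```python
def total_time_ms(frames, render_ms, upscale_ms, concurrent):
    """Total time to finish `frames` frames in a two-stage model.

    Sequential: every frame is rendered, then upscaled, before the next starts.
    Concurrent: upscaling of frame N overlaps rendering of frame N + 1, so
    after the first frame the slower stage sets the pace.
    """
    if frames <= 0:
        return 0.0
    if concurrent:
        slower_stage = max(render_ms, upscale_ms)
        return render_ms + upscale_ms + (frames - 1) * slower_stage
    return frames * (render_ms + upscale_ms)
```

With a hypothetical 10 ms of rendering and 2 ms of upscaling, three frames take 36 ms back to back but 32 ms when the stages overlap, and the gap grows with every additional frame.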
For the RT cores, Ampere also added the ability to interpolate triangle position. This is especially important for things like motion blur, where not every triangle used to render a scene is at the same position or time. I’m still not a huge fan of motion blur in games, even though it might be more realistic looking, but whatever. This change potentially speeds up ray traversal by 8X, so it’s an important addition.
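Nvidia didn’t spell out exactly how the hardware does this, so the following is only a sketch of the general idea: each ray carries a timestamp within the frame, and the triangle it tests against is placed between its start-of-frame and end-of-frame positions. Linear interpolation and the function names here are our assumptions for illustration.

```python
def lerp(a, b, t):
    """Linearly interpolate between two (x, y, z) points."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def triangle_at_time(tri_start, tri_end, t):
    """Return the triangle's three vertices at time t, where t is in [0, 1].

    tri_start and tri_end each hold three (x, y, z) vertices: the triangle's
    position at the start and at the end of the frame interval.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    return tuple(lerp(v0, v1, t) for v0, v1 in zip(tri_start, tri_end))
```

A ray with timestamp 0.5 would be tested against the triangle halfway along its motion, and averaging many rays with different timestamps is what produces the blur.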
That’s it for the truly new information. Much of the rest is previously known, but we’ve provided the full slide deck below for those who want to see more. There are additional details looking into the performance of Wolfenstein: Youngblood, as well as RTX IO (which we’ve covered elsewhere).