Xilinx Unveils 7nm Versal Premium: 123TB/s Bandwidth, PCIe 5.0, CXL, 112G Transceivers

Xilinx unveiled its Versal Premium lineup of Adaptive Compute Acceleration Platforms (ACAPs), which take a new approach to FPGA design and build on its Versal AI Core and Prime series. Last year, Xilinx began sampling the Prime and AI Core series, two of the six Versal product lines it has planned. Now it is revealing more details about the higher-end Premium series, even though that series won’t start sampling until 2021.

Xilinx designed the Versal Premium series for high-bandwidth networks in space- and thermally-constrained environments. The company claims the Premium ACAPs offer three times the throughput and twice the compute density of competing solutions. Xilinx says the Versal Premium delivers the same networking logic density as 22 16nm FPGAs.

(Gallery of 5 images; image credit: Xilinx)

We covered Xilinx’s ACAP concept in more detail last year. As a refresher, Xilinx introduced this term because a modern FPGA contains much more than just the programmable FPGA fabric. Xilinx divides its Versal compute engines into three categories: scalar engines, adaptable engines and intelligent engines. There is, of course, also a lot of I/O hardware and interfaces (such as DDR4 and PCIe 4.0), all connected via a programmable network-on-chip (NoC). This makes the ACAPs adaptable to a diverse range of workloads.

Density comes courtesy of TSMC’s 7nm process node, paired with a modular design. That design includes a dual-core ARM Cortex-A72 application processor and a dual-core ARM Cortex-R5F real-time processor for scalar operations. It also includes a PCIe Gen5 controller that supports both the CCIX and CXL protocols. The chip also carries a DDR4 controller, 112G PAM4 transceivers, 600G Ethernet cores (up to 5Tb/s in aggregate), and 400G crypto engines (up to 1.6Tb/s in aggregate). Xilinx says the crypto engines make the Premium lineup the only platform with hardened 400G crypto support for AES-GCM-256/128, MACsec, and IPsec.
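
The “112G PAM4” label is easier to read with the arithmetic spelled out. PAM4 signals with four amplitude levels, so each symbol carries two bits. A 112Gb/s lane therefore needs only about 56 gigasymbols per second. The sketch below shows that calculation and the Gray-coded bit-to-level mapping commonly used for PAM4 lanes. The normalized levels (-3, -1, +1, +3) and the helper names are illustrative assumptions, not Xilinx specifics. The nominal 112G figure also ignores FEC and line-coding overhead, so real lane symbol rates differ somewhat.

```python
import math

# PAM4 signals with four amplitude levels, so each symbol carries log2(4) = 2 bits.
PAM4_LEVELS = 4
BITS_PER_SYMBOL = int(math.log2(PAM4_LEVELS))

# Gray-coded mapping of bit pairs to normalized levels -3, -1, +1, +3.
# Neighboring levels differ in exactly one bit, so a one-level slicing
# error at the receiver corrupts only one bit.
GRAY_MAP = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def symbol_rate_gbaud(line_rate_gbps: float) -> float:
    """Symbol rate needed to carry line_rate_gbps with PAM4 (no FEC/coding overhead)."""
    return line_rate_gbps / BITS_PER_SYMBOL


def encode_pam4(bits: list[int]) -> list[int]:
    """Map an even-length sequence of 0/1 bits to PAM4 levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_MAP[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


print(symbol_rate_gbaud(112))                 # 56.0
print(encode_pam4([0, 0, 0, 1, 1, 1, 1, 0]))  # [-3, -1, 1, 3]
```

Halving the symbol rate relative to two-level (NRZ) signaling is the reason PAM4 is used at these speeds. The cost is smaller spacing between levels, and therefore less noise margin per symbol.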

The Premium series also comes with Interlaken connectivity, an industry-standard chip-to-chip interface used in switches and routers. A lightweight protocol runs across the connection and supports different transfer rates and widths.

Xilinx ties these components to up to 14,000 DSP slices and 3.4 million LUTs. Overall, the chips come in six variants and contain up to 7.4 million logic cells. Xilinx also uses the PCIe, Ethernet, Interlaken, and crypto engine options to segment the product stack.
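
The 7.4 million logic cells and 3.4 million LUTs are probably not independent numbers. In its UltraScale-generation product tables, Xilinx derived “logic cells” by scaling the LUT count by a fixed factor of 2.1875. Assume Versal follows the same convention (an assumption; Xilinx does not state it here). Under that assumption, the two figures line up, as the sketch below shows.

```python
# Assumption: Xilinx's logic-cell figures are LUT counts scaled by 2.1875,
# the factor used in its UltraScale-generation product tables.
LOGIC_CELLS_PER_LUT = 2.1875


def logic_cells_from_luts(luts: int) -> float:
    """Estimate Xilinx-style 'logic cells' from a LUT count."""
    return luts * LOGIC_CELLS_PER_LUT


versal_premium_luts = 3_400_000  # "up to 3.4 million LUTs", as quoted above
estimate = logic_cells_from_luts(versal_premium_luts)
print(f"{estimate:,.0f} logic cells")  # 7,437,500 logic cells
```

Because “logic cells” is a derived metric, it should be compared with care against other vendors’ capacity figures, which are defined differently.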

(Gallery of 14 images; image credit: Xilinx)

Tying these features together requires fast network-on-chip (NoC) performance, and Xilinx’s 2.2 TB/s interface fits the bill. This programmable NoC supports various link widths and speeds, QoS levels, and multiple arbitration schemes. The interface is said to operate at 7 pJ/bit.
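
An energy-per-bit figure converts directly into power once you know how many bits are moving. The back-of-the-envelope sketch below rests on two hypothetical assumptions: the full 2.2 TB/s is in use, and the 7 pJ/bit applies to every bit. Under those assumptions, the interconnect would dissipate roughly 123 W. Real traffic rarely saturates the NoC, so treat this as an upper bound under those assumptions, not a measured figure.

```python
def interconnect_power_watts(bandwidth_bytes_per_s: float, energy_pj_per_bit: float) -> float:
    """Power (W) = bits moved per second x energy per bit (J)."""
    bits_per_s = bandwidth_bytes_per_s * 8
    joules_per_bit = energy_pj_per_bit * 1e-12
    return bits_per_s * joules_per_bit


# Assumptions: 2.2 TB/s is fully utilized and 7 pJ/bit applies to every bit moved.
NOC_BANDWIDTH_BYTES_PER_S = 2.2e12
NOC_ENERGY_PJ_PER_BIT = 7.0

watts = interconnect_power_watts(NOC_BANDWIDTH_BYTES_PER_S, NOC_ENERGY_PJ_PER_BIT)
print(f"{watts:.1f} W")  # 123.2 W
```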

In total, Xilinx says the networking logic density of the Ethernet, Interlaken, and crypto cores is equivalent to that of 22 16nm FPGAs. There is also an “integrated shell” that lets the ACAP handle networking infrastructure without using any programmable logic elements.

Software is also a key component of the ACAPs. On top of the low-level Vivado tools for hardware developers, Xilinx offers the higher-level Vitis development kit. Vitis includes accelerator libraries, can be programmed in C, C++ and Python, and caters to software developers. Finally, for data scientists, Xilinx supports major AI frameworks such as TensorFlow.

Performance

Xilinx makes some major performance claims for its Premium series. The ACAP has up to 1Gb of tightly coupled memory and an on-chip bandwidth of 123TB/s. According to Xilinx, that is nearly 10x higher than the 14TB/s of Nvidia’s Tesla V100.
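
Taking the two quoted bandwidth figures at face value, the ratio is easy to check. Straight division gives about 8.8x, which Xilinx’s “nearly 10x” rounds up from.

```python
# Figures as quoted by Xilinx; both are in terabytes per second.
VERSAL_PREMIUM_ON_CHIP_TBPS = 123.0
TESLA_V100_TBPS = 14.0

ratio = VERSAL_PREMIUM_ON_CHIP_TBPS / TESLA_V100_TBPS
print(f"{ratio:.1f}x")  # 8.8x
```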

Combined with the heterogeneous engines, Xilinx says this delivers breakthrough performance in multiple workloads. It cites 1.6x the inference throughput of the V100 and 4.6x higher object detection performance. It also claims a 65x lead over Cascade Lake in anomaly detection.

Thoughts

The Versal Premium series seems a step above any of Xilinx’s 16nm FPGAs, with 3x higher bandwidth and 2x higher compute density. It moves up to 112G PAM4 transceivers for 9Tbps of bandwidth, 5Tbps of Ethernet throughput, and 1.6Tbps of line-rate encryption via hardened crypto engines. It also supports PCIe 5.0, CCIX and CXL, along with multi-hundred-gigabit Ethernet and Interlaken connectivity.

This makes it a competitor to Intel’s Agilex I-series FPGAs, which will also bring support for PCIe 5.0, CXL and 112G transceivers in 2021. In terms of compute and programmable logic density, however, the Agilex I-series may be a step below the Premium series; for example, it has only 2.7 million logic elements.

This is because Xilinx and Intel have taken different approaches. Xilinx is building out its Versal portfolio with many different dies. Intel, by contrast, has opted for one base die and proliferates the series with a diverse chiplet ecosystem. Intel has hinted at a “phase 2” of Agilex that uses Foveros to stack multiple dies, but it hasn’t made any announcements on that front yet.

Xilinx says the Versal Premium series will sample to early customers in the first half of 2021.