# White Paper: High-Performance SMPTE ST 2110 System-on-Chip on Intel® Stratix® 10 Devices

## **Executive Summary**

The rise in popularity of UHD and the increasing number of video channels required for live production are driving broadcast equipment requirements higher. One example is the transition from 10/40 Gbps Ethernet (GbE) to 25/100 GbE for the transport and processing of UHD/8K video signals.

This white paper presents how the Intel® Stratix® 10 device family can enable Ross Video to meet the increasing performance demands of the media and entertainment market. In particular, it covers the scaling of our SMPTE ST 2110 system-on-chip solution beyond 100G interfaces for UHD workflows.

This paper presents evaluation results for a tested design in:

- Fabric utilization and capacity, including the percentage of the FPGA that is available for other processing functions
- Throughput capacity of the device family to scale beyond 100G

The Intel Stratix 10 device family is attractive as the next-generation broadcast processing platform because of its native 25G/100G support, faster fabric, and improved memory interfaces. We evaluated the performance of Ross's SMPTE ST 2110 SWIFT\* platform, which was initially architected for an Intel Arria® 10 device, on an Intel Stratix 10 device using the Intel Hyperflex™ FPGA Architecture. The results show a 195% improvement in fabric speed, 10X the Ethernet bandwidth, and 8X the number of supported UHD streams, while consuming approximately 2.5X the logic (ALM) resources.<sup>†</sup>

# **Performance Comparison**

#### Ross Video SWIFT\* ST2110 Platform

Figure 1 shows Ross Video's existing SWIFT\* system-on-chip, which implements a full SMPTE ST 2110 stack on an Intel Arria 10 SoC. It is a flexible design that provides a scalable number of video channels and Ethernet ports.

Figure 1.
Ross Video's Existing SWIFT\* System-on-Chip

#### **SWIFT Configuration**

Table 1 shows the supported capabilities of the designs targeting the Intel Arria 10 and Intel Stratix 10 devices, along with the relative increase in performance.

Table 1. SWIFT Configuration

| | Intel Arria 10 Device<br>Design:<br>Intel Arria 10 660 SoC | Intel Stratix 10 Device<br>Design:<br>Intel Stratix 10 2800 GX<br>SoC | Relative<br>Performance /<br>Datapath Increase |
|--------------------------------|------------------------------------------------------------------|-----------------------------------------------------------------------|------------------------------------------------|
| Ethernet | 2X 10 GbE w/1588 | 2X 100 GbE w/1588 | 10X |
| Serial digital interface (SDI) | 2X 3G-SDI IN + OUT | 8X 12G-SDI IN + OUT | 16X |
| DDR SDRAM | 1X DDR4 SDRAM 64 bit<br>at 933 MHz | 2X DDR4 SDRAM 72 bit<br>at 933 MHz | 2.25X |
| Datapath | Up to 25 Gbps receiver (RX) and transmitter (TX) with redundancy | 200 Gbps RX and TX with redundancy | 8X |

#### Fabric Utilization

Table 2 compares the fabric utilization of the Intel Arria 10 SX 660 device design and the Intel Stratix 10 GX 2800 device design. In the Intel Stratix 10 GX 2800 device, about 75% of the ALMs, 89% of the M20K blocks, and 97% of the DSP blocks remain available for implementing other processing and value-added functions.

Table 2. Fabric Utilization

| | Intel Arria 10 Device Design<br>(% used) | Intel Stratix 10 Device Design<br>(% used) | Increase in Fabric<br>Utilization |
|----------------------------------|---------------------------------|-----------------------------------|-----------------------------------|
| Adaptive logic modules (ALMs) | 90,000 (36%) | 228,000 (25%) | 2.5X<sup>†</sup> |
| M20K memory blocks | 730 (34%) | 1,300 (11%) | 1.8X<sup>†</sup> |
| Digital signal processing (DSP) blocks | 30 (2%) | 160 (3%) | 5.3X<sup>†</sup> |

#### External Memory Throughput

Table 3 compares the DDR memory throughput of the Intel Arria 10 and Intel Stratix 10 device designs.

Table 3. External Memory Throughput
| | Intel Arria 10 FPGA Design | Intel Stratix 10 FPGA Design |
|----------------------------|----------------------------------------------------------------------|------------------------------------------------------------------------|
| Memory data width (bits) | 64 | 72<br>(uses the error correction code<br>(ECC) bits for extra bandwidth) |
| DDR4 SDRAM instances | 1 | 2 |
| Total DDR4 SDRAM bandwidth | 119.466 Gbps | 268.800 Gbps |
| Required design bandwidth | 12 Gbps for writes<br>6 Gbps for reads<br>18 Gbps total <sup>1</sup> | 96 Gbps for writes<br>96 Gbps for reads<br>192 Gbps total <sup>2</sup> |

<sup>1</sup> In the Intel Arria 10 FPGA design, the primary and redundant streams are both written to the DDR4 SDRAM. The design supports two redundant 3G streams, so writes total 2 x 2 x 3 Gbps = 12 Gbps. Each stream is read out of memory once, so reads total 2 x 3 Gbps = 6 Gbps.

#### Conclusion

The analysis shows that the improved architecture delivers 10X the Ethernet bandwidth and 16X the video performance while consuming 2.5X the ALMs.<sup>†</sup> The Intel Hyperflex FPGA Architecture allows the datapath to run ~2X faster.<sup>†</sup> This should also allow other processing functions to run at much higher clock rates, resulting in better utilization and thus a lower total cost of ownership (TCO) compared to other FPGA architecture technologies.

While DDR SDRAM bandwidth may ultimately limit how much video processing is possible in Intel Stratix 10 GX devices, Intel Stratix 10 MX devices, with their tightly coupled in-package HBM2 memory delivering up to 512 GBps of bandwidth, promise to eliminate this bottleneck.

The Intel Stratix 10 device provides a high-performance platform for next-generation broadcast applications requiring 100 GbE and UHD video stream support, with abundant resources left for additional video processing functions.

© Intel Corporation.
Intel, the Intel logo, the Intel Inside mark and logo, Altera, Arria, Cyclone, Enpirion, Experience What's Inside, Intel Atom, Intel Core, Intel Xeon, MAX, Nios, Quartus and Stratix words and logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. See Trademarks on intel.com for full list of Intel trademarks. \*Other marks and brands may be claimed as the property of others.

<sup>2</sup> The new architecture will track redundant streams more intelligently, so that packets received on both the primary and redundant links are written to the DDR4 SDRAM only once.

<sup>†</sup> Tests measure performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit <a href="https://www.intel.com/benchmarks">www.intel.com/benchmarks</a>.
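The DDR4 bandwidth figures in Table 3 and its footnotes follow from simple arithmetic. The following Python snippet is a minimal sketch that reproduces them under stated assumptions. It is not part of the SWIFT design. The function names are hypothetical. The "933 MHz" clock is assumed to be the rounded 933.33 MHz clock of DDR4-1866, with two transfers per clock. The 96 Gbps write figure for the Intel Stratix 10 design is assumed to correspond to eight 12 Gbps streams, each written once. That assumption is inferred from the 8X 12G-SDI entry in Table 1 and is not stated in Table 3.

```python
from typing import Tuple

# Assumption: "933 MHz" in Table 3 is the rounded DDR4-1866 clock (933.33 MHz);
# DDR moves two data words per clock cycle.
DDR4_CLOCK_MHZ = 2800 / 3


def ddr_bandwidth_gbps(width_bits: int, instances: int,
                       clock_mhz: float = DDR4_CLOCK_MHZ) -> float:
    """Peak interface bandwidth: width x 2 transfers/clock x clock x instances."""
    return width_bits * 2 * clock_mhz * instances / 1000.0


def required_bandwidth_gbps(streams: int, rate_gbps: float,
                            writes_per_stream: int) -> Tuple[float, float, float]:
    """Return (writes, reads, total) memory traffic in Gbps.

    writes_per_stream is 2 when both the primary and redundant copies are
    written, and 1 when duplicate packets are written only once.
    Each stream is read out of memory once.
    """
    writes = streams * writes_per_stream * rate_gbps
    reads = streams * rate_gbps
    return writes, reads, writes + reads


if __name__ == "__main__":
    designs = {
        # 1x 64-bit DDR4; two redundant 3G streams, both copies written
        "Intel Arria 10": (ddr_bandwidth_gbps(64, 1),
                           required_bandwidth_gbps(2, 3.0, 2)),
        # 2x 72-bit DDR4; assumed eight 12G streams, each written once
        "Intel Stratix 10": (ddr_bandwidth_gbps(72, 2),
                             required_bandwidth_gbps(8, 12.0, 1)),
    }
    for name, (available, (writes, reads, total)) in designs.items():
        print(f"{name}: {available:.3f} Gbps peak, "
              f"{writes:g} W + {reads:g} R = {total:g} Gbps required "
              f"({total / available:.1%} of peak)")
```

The Intel Arria 10 design needs about 15% of its peak DDR4 bandwidth, while the Intel Stratix 10 design needs about 71%. These are peak interface figures. A real DDR4 controller sustains less than peak because of refresh, bank conflicts, and read/write turnaround. The practical headroom is therefore smaller than the sketch suggests, which is consistent with the conclusion that DDR SDRAM bandwidth may be the ultimate limit.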