CONGESTION CONTROL IN AI DATACENTERS: LEVERAGING ECN AND PFC FOR IMPROVED NETWORK EFFICIENCY

Authors

Mohitkumar Savaliya
USA

Keywords:

Congestion Control, Explicit Congestion Notification (ECN), Priority Flow Control (PFC), Artificial Intelligence (AI) Datacenters, RDMA over Converged Ethernet (RoCE), Network Performance, Latency, Throughput, Graphics Processing Unit (GPU)

Synopsis

The surge in AI applications has sparked intense demand for high-performance computing in data centers. As these workloads expand, network congestion has become a central bottleneck, limiting performance, efficiency, and reliability. This paper examines Explicit Congestion Notification (ECN) and Priority Flow Control (PFC) as prominent mechanisms for congestion management in AI data centers. We focus on how ECN and PFC behave in AI data center networks and how they can be configured to complement one another. Our discussion reviews existing deployments and research, identifies the challenges that operators face in practice, and outlines design choices that balance latency, throughput, and fairness for GPU-driven applications. While ECN and PFC each have specific advantages, we argue that a combined approach is vital for meeting the complex requirements of AI workloads.

 

References

[1] K. Ramakrishnan, S. Floyd, and D. Black. The addition of explicit congestion notification (ECN) to IP. RFC 3168, 2001.

[2] IEEE. 802.1Qbb. Priority-based flow control, 2011.

[3] InfiniBand Trade Association. Supplement to InfiniBand architecture specification volume 1 release 1.2.2 annex A16: RDMA over converged Ethernet (RoCE), 2010.

[4] InfiniBand Trade Association. Supplement to InfiniBand architecture specification volume 1 release 1.2.2 annex A17: RoCEv2 (IP routable RoCE), 2014.

[5] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking, 1:397–413, 1993.

[6] M. Alizadeh, A. Greenberg, D. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data Center TCP (DCTCP). In SIGCOMM, 2010.

[7] InfiniBand Trade Association. InfiniBand architecture volume 1, general specifications, release 1.2.1, 2008.

IJCET

Published

September 23, 2025