Wednesday 10 May 2017

What is the best bonding mode for TCP traffic such as NFS, ISCSI, CIFS, etc? (c) RedHat

Environment

  • Red Hat Enterprise Linux (all versions)
  • Bonding or Teaming
  • Large streaming TCP traffic such as NFS, Samba/CIFS, iSCSI, rsync over SSH/SCP, and backups

Issue

  • What is the best bonding mode for TCP traffic such as NFS and Samba/CIFS?
  • NFS repeatedly logs "nfs: server not responding, still trying" when no network issue is present
  • A packet capture shows many TCP Retransmission, TCP Out-of-Order, and RPC retransmission events when there should be no reason for them

Resolution

Use a bonding mode which guarantees in-order delivery of TCP traffic, such as:
  • Bonding Mode 1 (active-backup)
  • Bonding Mode 2 (balance-xor)
  • Bonding Mode 4 (802.3ad aka LACP)
Note that Bonding Mode 2 (balance-xor) requires an EtherChannel or similar configured on the switch, and Mode 4 (802.3ad) requires an EtherChannel with LACP on the switch. Bonding Mode 1 (active-backup) requires no switch configuration.
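As an illustrative sketch (the device and connection names here are placeholders, not from the original article), a Mode 4 bond can be created with nmcli on a NetworkManager-based system like this:

```shell
# Hedged example: bond0, enp1s0f0, and enp1s0f1 are placeholder names;
# adjust to the local hardware. Requires NetworkManager.

# Create an 802.3ad (LACP) bond; the switch ports must be members of
# an LACP-enabled port-channel.
nmcli connection add type bond ifname bond0 con-name bond0 \
    bond.options "mode=802.3ad,miimon=100,xmit_hash_policy=layer3+4"

# Enslave two physical interfaces to the bond.
nmcli connection add type ethernet ifname enp1s0f0 master bond0
nmcli connection add type ethernet ifname enp1s0f1 master bond0

# For Mode 1 (active-backup) instead, which needs no switch
# configuration, use:
#   bond.options "mode=active-backup,miimon=100"
```

On older releases the same options can be set via BONDING_OPTS in the ifcfg files instead.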

Root Cause

The following bonding modes:
  • Bonding Mode 0 (round-robin)
  • Bonding Mode 3 (broadcast)
  • Bonding Mode 5 (balance-tlb)
  • Bonding Mode 6 (balance-alb)
do not guarantee in-order delivery of TCP streams, as each packet of a stream may be transmitted down a different slave, and no switch guarantees that packets received on different switch ports will be forwarded in order.
Given the following example configuration:
.---------------------------.
| bond0: mode 0 round-robin |
'---------------------------'
| eth0 | eth1 | eth2 | eth3 |
'--=---'--=---'---=--'---=--'
   |      |       |      |
   |      |       |      |
.--=------=-------=------=--.
|          switch           |
'---------------------------'
The bonding system may transmit traffic out each slave in the correct order, like ABCD ABCD ABCD, but the switch may forward it in any order, like CADB BDCA DACB.
Because the receiving TCP stack expects to be presented the stream in order, the receiver believes packets have been lost and requests retransmissions, then spends a great deal of time reassembling the out-of-order traffic into the correct order, while the sender wastes bandwidth sending retransmissions which are not actually required.
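The round-robin striping described above can be sketched as follows (a toy illustration, not kernel code):

```shell
#!/bin/sh
# Sketch: Mode 0 assigns segment N of a stream to slave N modulo the
# number of slaves, so consecutive TCP segments of one stream leave on
# different ports and may be reordered in transit.
slaves=4
for seg in 0 1 2 3 4 5 6 7; do
    echo "segment $seg -> eth$(( seg % slaves ))"
done
```

Every slave carries part of every stream, which is exactly what allows the switch to reorder it.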
The following bonding modes:
  • Bonding Mode 1 (active-backup)
  • Bonding Mode 2 (balance-xor)
  • Bonding Mode 4 (802.3ad aka LACP)
avoid this issue by transmitting all traffic for a given destination down a single slave. The balancing algorithm of Modes 2 and 4 can be altered with the xmit_hash_policy bonding option, but these modes never balance a single TCP stream across different ports, and so avoid the problematic behaviour discussed above.
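The per-flow behaviour can be illustrated with a toy hash (this is a simplified sketch, not the kernel's actual xmit_hash_policy algorithm, and pick_slave is a hypothetical helper):

```shell
#!/bin/sh
# Sketch: a flow-based hash policy derives the slave index from flow
# fields (here, just the TCP ports), so every packet of one stream
# always maps to the same slave and order is preserved.
slaves=4

# Hypothetical helper: pick a slave index from source and dest ports.
pick_slave() {
    sport=$1; dport=$2
    echo $(( (sport ^ dport) % slaves ))
}

pick_slave 50312 2049   # an NFS flow: always hashes to the same slave
pick_slave 50312 2049
pick_slave 44120 445    # a CIFS flow: constant for that flow too
pick_slave 44120 445
```

Different flows may land on different slaves, which is how these modes still spread aggregate load.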
It is not possible to effectively balance a single TCP stream across multiple bonding or teaming devices. If higher speed is required for a single stream, then faster interfaces (and possibly faster network infrastructure) must be used.
This applies to all TCP streams. The issue is most commonly seen on high-speed, long-lived TCP streams such as NFS, Samba/CIFS, iSCSI, rsync over SSH/SCP, and so on.

Diagnostic Steps

Inspect syslog for "nfs: server X not responding, still trying" and "nfs: server X OK" messages when there are no other network issues.
Inspect a packet capture for many occurrences of TCP Retransmission, TCP Out-of-Order, RPC retransmission, or similar messages.
Inspect the bonding mode in /proc/net/bonding/bondX.
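The checks above might look like the following on a typical system (the log path, capture file, and bond name are example placeholders, and tshark availability is an assumption):

```shell
# Example diagnostic commands; paths and names may differ per system.

# 1. Look for NFS "not responding" / "OK" message pairs in syslog:
grep -E "nfs: server .* (not responding|OK)" /var/log/messages

# 2. Count retransmitted and out-of-order segments in a capture:
tshark -r capture.pcap -Y "tcp.analysis.retransmission" | wc -l
tshark -r capture.pcap -Y "tcp.analysis.out_of_order" | wc -l

# 3. Check the active bonding mode (and, for Modes 2/4, hash policy):
grep -E "Bonding Mode|Transmit Hash Policy" /proc/net/bonding/bond0
```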
