Challenges in load balancing

In the past few years, I have worked a lot with IWAN, Cisco’s SD-WAN implementation (prior to the Viptela acquisition). One of the advantages of SD-WAN is the ability to load balance traffic across several links. Load balancing is not a trivial matter, though. Why not?

Equal packet distribution – ECMP

If you have an office with two MPLS circuits or two Internet circuits, and both have the same bandwidth, load balancing is simple. Or at least relatively simple. Say you have a site with two 100 Mbit/s Internet circuits.

This means that we can use equal-cost multipathing (ECMP). Whether a flow ends up on link A or link B does not matter. The flow has an equal chance of getting the bandwidth it needs on either link.

There are still some factors to consider, even with ECMP.

Flow size
Some flows will be much larger than others. Transferring files via CIFS or other protocols, or downloading something from the Internet, uses far more bandwidth than, for example, Citrix traffic, which generally has smaller packets and does not require much bandwidth.

Number of flows
When load balancing is performed by ECMP, some form of hashing algorithm is used to decide which link to place the flow on. This is usually done by looking at the source and destination IP addresses and, in some cases, more entropy can be added by also looking at the port numbers. If we only have a few flows, they may not be distributed evenly across the links.

For ECMP to properly share the load, meaning that the links have the same utilisation, we need enough flows to make it likely that the large flows end up spread across both links. ECMP is fairly straightforward, but we may need to monitor link utilisation and factor it in when deciding which link a flow is assigned to.
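To make the hashing idea concrete, here is a minimal sketch of per-flow link selection. It is not how any particular router or SD-WAN product does it: the function names, the use of SHA-256, and the synthetic addresses and ports are all assumptions chosen for illustration.

```python
import hashlib


def pick_link(src_ip, dst_ip, src_port, dst_port, links):
    """Hash the flow's addresses and ports and map the result onto one link.

    Hypothetical helper: real devices use their own hash functions and inputs.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(links)
    return links[index]


def spread(num_flows, links=("A", "B")):
    """Count how many synthetic flows (differing only in source port) land on each link."""
    counts = {link: 0 for link in links}
    for i in range(num_flows):
        link = pick_link("10.0.0.1", "192.0.2.10", 40000 + i, 443, links)
        counts[link] += 1
    return counts


if __name__ == "__main__":
    # A handful of flows can land unevenly; thousands tend towards an even split.
    print(spread(4))
    print(spread(10000))
```

Note that the hash only balances the number of flows, not their size. Two large file transfers can still hash to the same link, which is why the number of flows matters.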

Unequal packet distribution – UCMP

Now think of a different situation where link A is 100 Mbit/s and link B is 10 Mbit/s. This means that we must perform unequal-cost multipathing (UCMP). Traditionally, this was not really supported in routing protocols other than EIGRP, although there was advanced configuration in BGP to do something similar. Using UCMP in SD-WAN has not been the norm, so I have not personally encountered the situation I describe below.

When we do not have the same bandwidth on all the links, the challenge is significantly bigger. A flow is assigned to a link at the beginning of the flow, for example when a TCP session is about to start. At that point, we do not know how large the flow will end up being. It could be a small flow like SSH or a large file download. If the flow is placed on the 10 Mbit/s link, it can never grow beyond 10 Mbit/s.

If the users, prior to moving to SD-WAN, had an active/standby configuration, they could always use up to 100 Mbit/s. With UCMP, there is a total of 110 Mbit/s available, but a flow that ends up on the “wrong” link is “penalised” because it cannot grow sufficiently.
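The effect can be sketched with a weighted version of flow hashing, where each link receives a share of flows proportional to its bandwidth. This is only an illustration under stated assumptions: the function names, the hash, and the 100/10 weights are hypothetical and not taken from any SD-WAN implementation.

```python
import hashlib


def pick_weighted_link(flow_key, link_bandwidths):
    """Map a flow hash onto links in proportion to their bandwidth (integer Mbit/s)."""
    total = sum(link_bandwidths.values())
    digest = hashlib.sha256(flow_key.encode()).digest()
    point = int.from_bytes(digest[:4], "big") % total
    # Walk the links in order; each link owns a slice of [0, total) as wide as its bandwidth.
    for link, bandwidth in link_bandwidths.items():
        if point < bandwidth:
            return link
        point -= bandwidth
    raise AssertionError("unreachable: point is always below the total bandwidth")


def achievable_rate(demand_mbps, link_capacity_mbps):
    """A single flow can never go faster than the link it was placed on."""
    return min(demand_mbps, link_capacity_mbps)


if __name__ == "__main__":
    links = {"A": 100, "B": 10}
    link = pick_weighted_link("10.0.0.1:40001->192.0.2.10:443", links)
    # A flow that wants 50 Mbit/s only gets it if it happened to land on link A.
    print(link, achievable_rate(50, links[link]))
```

With these weights, roughly one flow in eleven lands on link B. The placement is decided before the flow's size is known, so a large download has the same chance of landing there as an SSH session.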

Alternative solutions for load balancing

Even if we monitor link utilisation, we can never achieve perfect balance because we cannot predict the size of a flow. One alternative for solving this could be to move the flow when it grows beyond a certain size. The challenge then is how to move the flows without moving them too often, as that could cause churn and possibly packet loss. Do you move the flow back again when it reduces in size? UCMP is significantly more complex than ECMP.
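One way to limit churn when moving flows is to combine two separate thresholds (hysteresis) with a hold-down timer. The sketch below is an assumption about how this could be done, not a description of any product. The names and the threshold values (8 Mbit/s, 2 Mbit/s, 30 seconds) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    link: str
    last_move: float = float("-inf")  # time of the most recent move, in seconds


def maybe_move(flow, rate_mbps, now, *, small_link="B", large_link="A",
               promote_above=8.0, demote_below=2.0, hold_down=30.0):
    """Move a flow between links only when it clearly belongs on the other one.

    Returns True if the flow was moved. All thresholds are illustrative values.
    """
    # Hold-down timer: never move the same flow twice in quick succession.
    if now - flow.last_move < hold_down:
        return False
    if flow.link == small_link and rate_mbps > promote_above:
        target = large_link
    elif flow.link == large_link and rate_mbps < demote_below:
        target = small_link
    else:
        # Rates between the two thresholds leave the flow where it is.
        return False
    flow.link = target
    flow.last_move = now
    return True
```

The gap between the two thresholds means a flow hovering around a single cut-off value does not bounce between the links. Whether a flow should move back at all when it shrinks remains a design decision.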

Another alternative would be to perform load balancing per packet. However, this is far more complex and can result in packets arriving out of order. To get this to work, added intelligence is needed so that when packets come in to a router, the router knows the order in which they should be sent out. There are some SD-WAN implementations that can do this today, but be aware that this is a highly complex function.
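The kind of added intelligence needed can be illustrated with a small resequencing buffer. It assumes the sending router tags each packet with a sequence number, which is an assumption for this sketch and not a statement about how any SD-WAN product works. The class and method names are hypothetical.

```python
class Resequencer:
    """Hold out-of-order packets and release them in sequence-number order."""

    def __init__(self):
        self.next_seq = 0   # the sequence number we expect to release next
        self.pending = {}   # packets that arrived early, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one packet and return the payloads that can now be released in order."""
        self.pending[seq] = payload
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


if __name__ == "__main__":
    buffer = Resequencer()
    # Packet 1 took the faster link and arrives before packet 0.
    print(buffer.receive(1, "second"))
    print(buffer.receive(0, "first"))
```

The sketch leaves out what makes this hard in practice: a lost packet would block the buffer forever, so a real implementation also needs timeouts, buffer limits and handling of duplicates.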

When considering SD-WAN solutions from different providers, you should think about how all these things work. Don’t be afraid to ask the tough questions. The above situation can be eased somewhat by choosing suitable links for the different types of applications. The point of this blog post is that, even with intelligent solutions, you still have to make design choices and consider what your most important traffic is.

My colleagues and I are happy to answer any questions you may have about load balancing for your business. Contact us.

By Daniel Dib, Senior Network Architect, Conscia, CCIE #37149, CCDE #20160011