At NZNOG 2012, I presented our work on applying point-to-point semantics to Ethernet-like interfaces, described in my earlier post, Broadcast Interface Addressing Considered Harmful.
The slides are available here.
We’ve done a bit more work on this since the original article. One thing that occurred to us is that if you are prepared to keep making ARP requests for a client, you know whether the link to it is alive. In fact, you can ARP for a host even when you are not otherwise exchanging traffic with it.
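The liveness idea can be sketched as a small per-client tracker. This is a minimal Python sketch, not our implementation: the class name `ArpLiveness` and its `probe_interval` and `max_missed` parameters are hypothetical, and it assumes the link is treated as down once several probe intervals pass without an ARP reply. Timestamps are supplied by the caller (for example from `time.monotonic()`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArpLiveness:
    """Tracks whether one client answers our periodic ARP requests.

    Hypothetical parameters: probe_interval is the number of seconds between
    ARP requests; max_missed is how many intervals may pass without a reply
    before the link to the client is treated as down.
    """
    probe_interval: float = 1.0
    max_missed: int = 3
    last_reply: Optional[float] = None

    def record_reply(self, now: float) -> None:
        # Call whenever an ARP reply from the client arrives.
        self.last_reply = now

    def is_alive(self, now: float) -> bool:
        # No reply seen yet: we cannot claim the link is up.
        if self.last_reply is None:
            return False
        # Alive if the last reply is no older than max_missed probe intervals.
        return now - self.last_reply <= self.probe_interval * self.max_missed


tracker = ArpLiveness(probe_interval=1.0, max_missed=3)
tracker.record_reply(now=100.0)
print(tracker.is_alive(now=102.5))  # True
print(tracker.is_alive(now=104.0))  # False
```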
Consider: we have an IP host, say 10.99.1.11/24. We tell it that its default gateway is 10.99.1.1. We answer all ARP requests the host makes, except those for its own address (see the earlier post).
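The answering rule can be sketched in a few lines, assuming a router interface that serves exactly one client. The function name `proxy_arp_reply` and the MAC address below are hypothetical, and sending and receiving the actual ARP frames is left out.

```python
import ipaddress
from typing import Optional


def proxy_arp_reply(client_ip: str, sender_ip: str, target_ip: str,
                    router_mac: str) -> Optional[str]:
    """Return the MAC address to answer with, or None to stay silent.

    Sketch of the rule: every ARP request made by the client is answered
    with the router's own MAC, except a request for the client's own
    address (for example a duplicate-address check).
    """
    client = ipaddress.ip_address(client_ip)
    if ipaddress.ip_address(sender_ip) != client:
        return None  # not a request from this interface's client
    if ipaddress.ip_address(target_ip) == client:
        return None  # never claim the client's own address
    return router_mac


ROUTER_MAC = "02:00:00:00:00:02"  # hypothetical locally administered MAC
print(proxy_arp_reply("10.99.1.11", "10.99.1.11", "10.99.1.1", ROUTER_MAC))   # 02:00:00:00:00:02
print(proxy_arp_reply("10.99.1.11", "10.99.1.11", "10.99.1.57", ROUTER_MAC))  # 02:00:00:00:00:02
print(proxy_arp_reply("10.99.1.11", "10.99.1.11", "10.99.1.11", ROUTER_MAC))  # None
```

Because the router also answers for other addresses in the /24, the client sends even nominally “local” traffic to the router.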
But now, instead of one upstream router, we have two. Each router has its own local IP address, 10.99.1.2 and 10.99.1.3 respectively; these are the addresses the routers put in their ARP packets (and in any ICMP packets generated from those interfaces). We still tell the client that its default gateway is 10.99.1.1.
Both routers ARP for the host, so each knows whether it can reach the host. Over a “back channel” (i.e. a protocol running over the backbone), they agree which of them should be the “active” router for that host.
The active router behaves exactly like the single upstream router described previously. The inactive router does nothing more than make ARP requests for the host and report whether the host is reachable. This way, if the active router stops participating in the back-channel protocol (i.e. dies), or loses contact with the host while the inactive router can still reach it, the inactive router can take over the active role.
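The back-channel protocol and its tie-breaking are not specified here, so the following is only one possible per-host decision rule, sketched in Python. It assumes each router’s reachability report and its liveness on the back channel are already known; `choose_active` and its tie-break (lowest router name wins) are hypothetical.

```python
from typing import Dict, Optional


def choose_active(current: Optional[str],
                  reachable: Dict[str, bool],
                  peer_alive: Dict[str, bool]) -> Optional[str]:
    """Pick the active router for one client host.

    reachable[r]: router r is currently getting ARP replies from the host.
    peer_alive[r]: router r is still participating in the back-channel protocol.
    Hypothetical rule: keep the current active router while it is alive and
    can reach the host (avoiding needless flapping); otherwise fail over to
    the lowest-named live router that can.
    """
    def usable(r: str) -> bool:
        return peer_alive.get(r, False) and reachable.get(r, False)

    if current is not None and usable(current):
        return current
    candidates = sorted(r for r in reachable if usable(r))
    return candidates[0] if candidates else None


both_up = {"A": True, "B": True}
print(choose_active("A", both_up, both_up))                    # A
print(choose_active("A", {"A": False, "B": True}, both_up))    # B (A lost the host)
print(choose_active("A", both_up, {"A": False, "B": True}))    # B (A died)
```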
As it does this, it can send an unsolicited unicast ARP reply to the host, informing it that the “default” IP address (10.99.1.1 in our example) is now at the new router’s MAC address. Other addresses will sort themselves out according to the host’s ARP caching strategy; ideally, the client host will have a fairly short ARP timeout and will repeat its broadcast ARP for any such addresses.
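The takeover message is an ordinary ARP reply in the RFC 826 format, addressed to the client’s MAC address rather than broadcast. The sketch below builds such a frame in Python, assuming Ethernet and IPv4; the function names and MAC addresses are hypothetical.

```python
import socket
import struct


def mac_bytes(mac: str) -> bytes:
    # "02:00:00:00:00:03" -> 6 raw bytes
    return bytes.fromhex(mac.replace(":", ""))


def unsolicited_arp_reply(router_mac: str, gateway_ip: str,
                          host_mac: str, host_ip: str) -> bytes:
    """Ethernet frame carrying an ARP reply that tells host_ip that
    gateway_ip is now at router_mac. Unicast: addressed to host_mac only."""
    ethernet = (mac_bytes(host_mac)             # destination: the client, not broadcast
                + mac_bytes(router_mac)         # source: the newly active router
                + struct.pack("!H", 0x0806))    # EtherType: ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                      # hardware type: Ethernet
        0x0800,                                 # protocol type: IPv4
        6, 4,                                   # hardware / protocol address lengths
        2,                                      # operation: reply
        mac_bytes(router_mac), socket.inet_aton(gateway_ip),  # sender: gateway IP at new MAC
        mac_bytes(host_mac), socket.inet_aton(host_ip),       # target: the client host
    )
    return ethernet + arp


frame = unsolicited_arp_reply("02:00:00:00:00:03", "10.99.1.1",
                              "02:00:00:00:00:11", "10.99.1.11")
print(len(frame))  # 42 (14-byte Ethernet header + 28-byte ARP payload)
```

On Linux, a frame like this can be sent through a raw `AF_PACKET` socket bound to the client-facing interface, which requires root privileges.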
This approach has advantages over protocols like VRRP. VRRP works by changing the interface MAC address to a “shared” address, so that IP clients don’t know that there has been a change when the active router swaps over. While that makes for a potentially more rapid fail-over, it comes with a number of disadvantages:
- moving the shared MAC address between routers requires layer 2 switches to update their MAC tables;
- there is some risk of MAC address collisions, especially in Q-in-Q (stacked VLAN) configurations;
- the VRRP protocol is visible (multicast) on the client VLAN.
But the major advantage of this approach is that there is a handshake with the end client. VRRP and similar protocols have no such handshake; they’re fine for detecting and replacing a failed router, but where the failed component is intervening layer 2 infrastructure, VRRP has no way of knowing that the host is unreachable from the active router but reachable from the inactive one. For example:
- Switch X connects to Y, and Y to Z
- Client C connects to switch Y
- Client D connects to switch X
- Router A connects to switch X, and is active for clients C & D
- Router B connects to switch Z, and is inactive for clients C & D
If the link between switches X and Y fails, Router A loses connectivity to Client C. With ARP handshaking, this loss of connectivity is detected and handled by failing over the advertisement of Client C’s address to Router B. Client D, however, remains reachable from Router A (and is in fact now unreachable from Router B); since each client IP address is handled independently, the active router for Client D does not change.
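The example can be checked with a toy model. This Python sketch stands in for ARP replies with simple graph reachability over the working switch links and applies a per-client rule of staying on the current router while it still reaches the client. The names `LINKS`, `ATTACH` and `pick_active` are illustrative only.

```python
from collections import deque
from typing import FrozenSet, Optional, Set

# The example topology: switch-to-switch links and where each device attaches.
LINKS: Set[FrozenSet[str]] = {frozenset({"X", "Y"}), frozenset({"Y", "Z"})}
ATTACH = {"C": "Y", "D": "X", "A": "X", "B": "Z"}
ROUTERS = ("A", "B")


def reachable_switches(start: str, links: Set[FrozenSet[str]]) -> Set[str]:
    """Breadth-first search over the switch links that are still working."""
    seen = {start}
    queue = deque([start])
    while queue:
        switch = queue.popleft()
        for link in links:
            if switch in link:
                (other,) = link - {switch}
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


def arp_ok(router: str, client: str, links: Set[FrozenSet[str]]) -> bool:
    # Stands in for "the router's ARP requests to the client get replies".
    return ATTACH[client] in reachable_switches(ATTACH[router], links)


def pick_active(current: str, client: str,
                links: Set[FrozenSet[str]]) -> Optional[str]:
    # Decided per client: keep the current router if it still reaches the client.
    if arp_ok(current, client, links):
        return current
    for router in ROUTERS:
        if arp_ok(router, client, links):
            return router
    return None


failed = LINKS - {frozenset({"X", "Y"})}  # the X-Y link goes down
for client in ("C", "D"):
    print(client, pick_active("A", client, failed))
# C B
# D A
```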
We believe this is applicable to a number of situations, especially Internet access networks, whether in a data centre or in layer-2 metropolitan access networks.