Posts tagged ‘broadband’

I hate IPv4 addressing semantics on broadcast interfaces (e.g. Ethernet).  To recap, if I have a box on each end of a point-to-point link (say between a gateway and an end host), we address it as follows (a /30, for example):

  • 10.1.1.0: Network address (reserved)
  • 10.1.1.1: Host 1 (gateway)
  • 10.1.1.2: Host 2 (end host)
  • 10.1.1.3: Broadcast address (reserved)

That’s four IP addresses, for a link to a single host.  Hello?  Haven’t you heard the news?  IP addresses are running out!

Some folks manage to get away with using /31 masks, e.g.

  • 10.1.1.4: Host 1 (gateway)
  • 10.1.1.5: Host 2 (end host)

which is just wrong.  Better in terms of address usage (two addresses instead of four), but still just plain wrong. And you’re still wasting an address per link.
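
To make the arithmetic concrete, here’s a quick Python sketch (using the standard ipaddress module; the addresses are just the examples above):

    import ipaddress

    # The conventional /30: four addresses per link, two of them reserved.
    p2p_30 = ipaddress.ip_network("10.1.1.0/30")
    print(p2p_30.network_address, p2p_30.broadcast_address)  # 10.1.1.0 10.1.1.3 (both reserved)
    print([str(h) for h in p2p_30.hosts()])                  # ['10.1.1.1', '10.1.1.2']

    # The /31 trick (RFC 3021): no reserved addresses, but still two per link.
    p2p_31 = ipaddress.ip_network("10.1.1.4/31")
    print([str(a) for a in p2p_31])                          # ['10.1.1.4', '10.1.1.5']

    # What I actually want to burn per end host: exactly one.
    print(ipaddress.ip_network("10.1.2.5/32").num_addresses)  # 1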

The PPP folks figured out a long time ago that a session, particularly in client-to-concentrator configurations, only needs one IP address. A “point to point” interface has a local address and a remote address, of which only the remote address needs to be stuffed into the routing table.  The local address can be the address of the concentrator, and doesn’t even need to be in the same subnet.
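
Here’s a minimal sketch of that model in Python (the class and field names are my own invention, not any particular stack’s): each session has a local and a remote address, and only the remote /32 ever needs a route.

    from dataclasses import dataclass
    from ipaddress import IPv4Address

    @dataclass
    class PointToPoint:
        name: str
        local: IPv4Address   # can be the concentrator's own address, in any subnet
        remote: IPv4Address  # the only address the session really consumes

    def host_routes(sessions):
        """The only routing state needed: one /32 host route per remote end."""
        return {f"{s.remote}/32": s.name for s in sessions}

    sessions = [
        PointToPoint("ppp0", IPv4Address("192.0.2.1"), IPv4Address("10.1.2.5")),
        PointToPoint("ppp1", IPv4Address("192.0.2.1"), IPv4Address("10.9.7.23")),
    ]
    print(host_routes(sessions))
    # {'10.1.2.5/32': 'ppp0', '10.9.7.23/32': 'ppp1'}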

So why can’t my Ethernet interfaces work the same way?

A point to point link really doesn’t have broadcast semantics.  Apart from stuff like DHCP, you never really need to broadcast — after all, our PPP friends don’t see a need for a “broadcast” address.

Well, we decided we had to do something about this.  The weapon of choice is NetGraph on FreeBSD.  NetGraph basically provides a bunch of kernel modules that can be linked together.  It’s been described as “network Lego”.  I like it because it’s easy to slip new kernel modules into the network stack in a surprising number of places. This isn’t a NetGraph post, so I won’t spend more verbiage on it, but it’s way cool. Google it.

In a real point-to-point interface, both ends of the link know the semantics of the link.  For Ethernet point-to-point addressing, we can still do this (and my code happily supports this configuration), but obviously both ends have to agree to do so. “Normal” clients won’t know what we’re up to, so we have to do this in such a way that we don’t upset their assumptions.

So we cheat. And we lie. And worst of all, we do proxy ARP!

What we do is tell our clients that they are on a /24 network. Their IP address is, for example, 10.1.2.5/24, and the gateway is 10.1.2.1. Any time we get a packet for 10.1.2.5, we’ll send it out that interface, doing ARP as normal to resolve the remote host’s MAC address.

Going the other way, we answer ARP requests for any IP address in 10.1.2.0/24, except 10.1.2.5, with our own MAC address.  That means that if they ARP for 10.1.2.6, we’ll answer the ARP request, which directs that packet to us, where we can use our interior routes to route it correctly.  In our world, two “adjacent” IP addresses could be on opposite sides of the network, or on different VLANs on the same interface.
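
Roughly, the ARP-answering rule comes down to this (a Python sketch of the logic only; the real thing lives in the NetGraph module):

    import ipaddress

    SUBNET = ipaddress.ip_network("10.1.2.0/24")  # what the client thinks it's on

    def should_proxy_arp(target, client_addr):
        """Should we answer this ARP request from the client with our own MAC?

        We claim every address in the /24 except the client's own, so all of
        the client's traffic lands on us and gets routed properly.
        """
        target = ipaddress.ip_address(target)
        if target == ipaddress.ip_address(client_addr):
            return False         # never claim the client's own address
        return target in SUBNET  # claim everything else in the /24

    print(should_proxy_arp("10.1.2.6", "10.1.2.5"))  # True  -> we answer, the packet comes to us
    print(should_proxy_arp("10.1.2.5", "10.1.2.5"))  # False -> that one is the client itself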

The result is one IP address per customer.  We “waste” three addresses per 256: the network (.0), the gateway (.1) and the broadcast (.255), and we have to be a bit careful about what we do with the .1 address — it could appear on every router that is playing with that /24.  But we can give a user a single IP address, and put it anywhere in the network.

We can actually have multiple IP addresses on the same interface; we do this by giving the NetGraph module a single Ethernet interface but multiple virtual point-to-point interfaces.  So if we want to give someone two IP addresses, we can do that as two, not necessarily adjacent, /32 addresses.  We don’t answer ARPs for any of the assigned addresses, but we do answer everything else. The module maintains a mapping from each point-to-point interface to its associated MAC address.
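
In the same hypothetical sketch, the module’s state is basically a table of virtual point-to-point interfaces, each holding the customer’s assigned /32s (not necessarily adjacent) and the MAC they resolve to; any address in that table is one we must not proxy-ARP for:

    # Hypothetical shape of the mapping: one Ethernet interface underneath,
    # several virtual point-to-point interfaces on top.
    ptp_table = {
        "ngp0": {"addrs": {"10.1.2.5"},               "mac": "00:11:22:33:44:55"},
        "ngp1": {"addrs": {"10.1.2.9", "10.1.2.200"}, "mac": "66:77:88:99:aa:bb"},
    }

    # Addresses assigned to customers: never answer ARP for these.
    assigned = {a for entry in ptp_table.values() for a in entry["addrs"]}
    print(sorted(assigned))  # ['10.1.2.200', '10.1.2.5', '10.1.2.9']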

I found myself explaining this one at Curry tonight, in the context of discussing fast broadband.

Basically, if you have a reliable stream protocol like, to take a random example, TCP, and you’re not doing anything imaginative with it, you run into the following problem:

Every byte you send might need to be resent if it gets lost along the way.  So you buffer whatever you send until you get an acknowledgement from the other end.  Let’s say, for argument’s sake, you use a 64k buffer. We call this buffer the window, and its size is the window size.

Now, let’s say you have a looooonnnngggg path between you and your remote endpoint. Let’s say it’s 200 milliseconds, or 1/5th of a second. This is pretty reasonable for an NZ-US connection — the speed of light is not our friend.

And finally, for simplicity’s sake, let’s say that the actual bandwidth over that path is Very High, so serialisation delays (the time taken to put one bit after the next) are negligible.

So, if I send 64k bytes (or 512k bits) worth of data, it takes 200 ms before I get an acknowledgement. It doesn’t matter how fast I send my 64k; I still have to stop when I’ve sent it.  200 ms later, I get a bunch of acknowledgements back, for the whole 64k (assuming nothing got dropped), and I can now send my next 64k.

So the actual throughput, through my SuperDuperBroadband connection, is 64k bytes per 200 ms, or 2.5 Mbps.

To turn this around, if I want 2.5 Mbps at 200 ms, I need a 64k byte window; if I want 5 Mbps on a 200 ms path, I’m going to need to up the window size to 5 Mbps times 200 ms = 128 k bytes.

That window size calculation is the bandwidth delay product.
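
The arithmetic above, in a couple of Python helpers (using the same 1k = 1024 convention as the numbers in this post):

    def throughput_mbps(window_bytes, rtt_seconds):
        """Best case for one TCP stream: one window's worth of data per round trip."""
        return window_bytes * 8 / rtt_seconds / (1024 * 1024)

    def window_needed_bytes(target_mbps, rtt_seconds):
        """Bandwidth-delay product: the window needed to sustain a target rate."""
        return target_mbps * 1024 * 1024 * rtt_seconds / 8

    print(throughput_mbps(64 * 1024, 0.2))      # 2.5   -> 64k window on a 200 ms path
    print(window_needed_bytes(5, 0.2) / 1024)   # 128.0 -> 128k bytes for 5 Mbps at 200 ms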

There’s the theory. Pick a big window size and go fast. Except:

  1. You don’t get to pick. Even if you control your application, you can ask for a bigger window size for downloads, but you don’t necessarily get it; you’ll probably end up with the smaller of what the applications at either end asked for (there’s a sketch of what “asking” looks like after this list).
  2. Standard, 1981-edition TCP caps the window (buffer) size the endpoints can advertise at 64k. This isn’t the end of the world; in 1992 Van Jacobson and friends rode to the rescue with RFC 1323, which allows the window size to be scaled to pretty much anything you like. But most TCP stacks come with a default window size in the 64k-ish range, and many applications don’t change it.
  3. Even if both ends of a TCP session ask for and get a large maximum window size, they don’t start with it. TCP congestion control requires that everyone start slowly (it’s called slow start), and this is done by starting with a small window and increasing it as the acknowledgements flow in and the sending end gets an idea of how much bandwidth there is.  So if your application uses lots of short TCP sessions rather than one long one, you’ll never reach your maximum window size and therefore never saturate your connection.
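
Here’s what “asking” looks like in practice, sketched with Python’s socket module: you request bigger buffers and then read back what the OS actually granted (it’s free to clamp or round the request):

    import socket

    # Ask for roomier buffers than the ~64k-ish defaults; whatever the OS
    # reports back is the ceiling on your usable window.
    REQUESTED = 4 * 1024 * 1024  # 4 MB, plenty for a long fat pipe

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED)

    print("receive buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    print("send buffer:   ", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    # RFC 1323 window scaling itself is negotiated by the TCP stack at
    # connection setup; the application just provides the buffer space.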

What to do? It depends what you’re trying to achieve.  For file transfers, run lots of TCP sessions side by side – can anyone say BitTorrent? Local caching helps for web traffic: move the content closer, and the bandwidth delay product is smaller. Or use a different protocol, though I have to say I’ve seen quite a few UDP-based file transfer protocols come and go, because tweaking TCP parameters at both ends is usually a darn sight easier than getting a new protocol right (see don’s law).
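
To put a number on the caching point: the same 64k window over a (say) 20 ms domestic path supports 64k bytes per 20 ms, or about 25 Mbps, ten times what the 200 ms trans-Pacific path can deliver.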

What it comes down to is that if all you’re going to use your UltraSuperDuperFast broadband connection for is downloading videos from US servers, you’re going to be disappointed. The real key to making this useful is local, or at least locally hosted, content, preferably located right by the fibre head-ends. It’s a parallel stream to the effort to get the fibre in the ground and get it lit, and it needs to be attended to PDQ.