Author Archive

Trades Hall

Trades Hall, seen from near the corner of Vivian St and Taranaki St.

This is Trades Hall, in Vivian St, symbolic home of the union movement in Wellington since the 1920s. On 27 March 1984, during the final months of the Muldoon Government, caretaker Ernie Abbot went to move a suitcase, carelessly left in the entrance, into an office for safe keeping. The ensuing blast blew debris, including Abbot’s dog, Patches, out into the street, and killed Abbot instantly. The case remains unsolved today.

But this isn’t about an act of presumed political violence in the ’80s.

In 1996, a fledgling web development company associated with the union movement (complete with posters of Marx & Lenin on the office walls) needed better Internet access than their dial-up could provide, yet within their meagre budget. And at NetLink, we had just built a wireless service.

We didn’t call it “wifi” then, and the Proxim gear we were using was expensive, and ran at about half a megabit or so, much less than the 802.11 standards that came out over the next few years. 2.4 GHz wasn’t as crowded as it is now. With a clear line of sight and outdoor, directional antennas, we’d had it providing access at well over a kilometre, and the Cotton Building at Victoria University has a fantastic view over the entire CBD from its high Kelburn campus.

Coverage map

Original NetLink coverage map, circa 1997

From the roof of Trades Hall, we couldn’t see the Cotton Building antennas. The building is three storeys high – generous, 1920s storeys, admittedly – but an adjacent building was twice the height. The rear corner of the building could see past that, but blocking that view was the Marion St complex: retail, plus three levels of car parking and another three levels of apartments. It looked hopeless.

But I figured it was just a question of how tall the pole needed to be, and I suspected that it wouldn’t be unreasonable. By walking down the street a bit, we could get a line of sight to the Cotton Building, over the Marion St apartments, and, crucially, over the corner of Trades Hall.  We estimated that a pole of about five metres or so would intercept that line.

NetLink’s wireless service, and indeed NetLink itself (bought out by Telstra in 1999) are long gone. But that pole is still there, with its “patch” antenna still attached. You can see it on the far right of the photo above, and magnified below.

5m pole on corner of Trades Hall

Reaching for the sky

 

“ShellShock” … a truly stunning example of an ill-considered feature.

For those who live under a rock, or weren’t paying attention, the so-called ShellShock bug, as usually stated, is that if you create an environment variable of the form name='() { :; } ; command' and start Bash, command will be executed unconditionally when Bash starts. Which isn’t normally a problem, but if Bash is the default shell, and (say) a web script executes a system() call to run a system command, it’s going to run Bash. And since CGI scripts (and things that behave like CGI) put things they got from the original web client’s HTTP headers into environment variables, that basically provides a means of running whatever you want in the context of the web application. Ugly.
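The widely circulated one-liner test shows the effect (the variable x is never referenced by the command; the first line of output appears only on an unpatched Bash):

$ env x='() { :; } ; echo vulnerable' bash -c 'echo this is a test'
vulnerable
this is a test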

Of course there are now patches, now that the white hats know about the problem, although how long the black hats have known and were exploiting it, no-one can say.

So let’s look at the problem in detail. (If you aren’t familiar with Unix-like OSes and shell programming, you can stop reading now.)

Bash has a feature that allows a function to be exported in the environment and imported from the environment. For example,

$ foo() { echo i am foo ; }        # Define a function foo
$ foo                              # Execute it
i am foo
$ bash                             # Start a subshell
$ foo                              # foo is not defined in the subshell
bash: foo: command not found
$ exit                             # Return to the outer level
$ export -f foo                    # export foo to the environment
$ bash                             # Start another subshell
$ foo                              # foo is now available to the subshell
i am foo

Now the mechanism that Bash uses to implement this feature is simple. Too simple. Internally, Bash maintains separate tables of variables and functions. On starting, it imports the environment into the list of variables. This is true of all Bourne-compatible shells like Bash. But Bash has a couple of special cases, one of which is that it can place functions into the environment too. (It doesn’t by default; you have to use export -f function-name to do this.)

The environment is pretty straightforward; it is simply a list of strings in the form name=value. So how does Bash store and retrieve a function?

It’s simple. Too simple. It looks for the string “() {” (that is, open paren, close paren, space, open curly). In our example, foo() is exported as “foo=() { echo i am foo ; }”. When Bash starts, it recognises the “() {”, rewrites the line as “foo() { echo i am foo ; }”, and hands it straight to its command interpreter for execution, just as if it had been entered like the first line of the example.
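You can see the raw environment string for yourself by continuing the earlier session in the outer shell. On a pre-patch Bash it looks something like this (the exact spacing and line breaks vary by version; env, not Bash, is doing the printing, so nothing is executed):

$ export -f foo                    # as before
$ env | grep -A1 '^foo='           # show the raw environment entry, plus the line after it
foo=() {  echo i am foo
}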

Prior to the patches coming out, that’s all it did. It didn’t check to see if the definition had anything after the closing curly bracket. So if you put anything in the environment that looked like “function-name='() { function-definition ; } ; other-commands'”, other-commands would be unconditionally run. The patches attempt to stop other-commands from being executed.

As I write this, most patches out there are flawed, because there are other things that can go badly awry with this. And that’s not a surprise, because the basic action is still, fundamentally, hand a piece of arbitrary text, of unknown source, to the command interpreter. How could that possibly go wrong?

Let’s step back a bit here. The environment is a place for programs to put bits of data for programs running in sub-processes to pick up. Usually, this is benign; the sub-processes generally only look for variables they want, and can take or leave the data. There are of course many examples of shell scripts executing environment variables as shell code, because they haven’t quoted the expansions properly, but generally, you can write secure shell script.

But Bash’s function export/import feature fundamentally changes that model. It allows the code that the script is executing to be changed by data inherited from outside its control, and before the script takes control.

For example, let’s just assume that all the patches to Bash work, and the functionality is reduced to only ever allowing a function to be imported, and never having any other nasty side effect. I can still do this:

$ cd() { echo I am a bad man ; }  # Redefine the cd shell builtin
$ export -f cd                    # Export it
$ cat x.sh                        # x.sh is just a script that does a cd
#!/bin/bash
cd /home
$ ./x.sh                          # And run the script
I am a bad man

The implications? If I can control the environment, I can control the operation of the commands executed by any Bash script run from my session, including, for example, any script launched by a privileged program. And if /bin/sh is linked to Bash, any shell command launched via a system() call is also a “bash script”, since system(“command”) simply spawns a sub-process and, in it, executes /bin/sh -c command.
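Continuing the session above on a pre-patch system where /bin/sh is linked to Bash, you can see this without writing a line of C; the sh -c invocation below is exactly what system("cd /home") boils down to:

$ sh -c 'cd /home'                 # what system("cd /home") runs, when /bin/sh is an unpatched Bash
I am a bad man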

When I look at the function import feature of Bash, my reaction is, why the hell did anyone think this was a good idea?

I’m usually not keen on removing features. In my experience, if you think nobody would do it that way, you’re probably wrong (see don’s law). But this one is just bad for so many reasons it’s ridiculous. It’s not needed for /bin/sh compatibility. As far as I can make out, it’s rarely if ever used at all. So if there’s a candidate for a featurectomy, this is it. (If you want to do this, the offending code is in the function initialize_shell_variables() in the file variables.c of the Bash source code at ftp.gnu.org/gnu/bash/.)

Or perhaps we should all just do what FreeBSD and Debian Linux have already done, and use a smaller, lighter shell (such as Dash) for shell scripts (installed/linked as /bin/sh), and relegate Bash to interactive command interpreter duties only.

Band-aid patching around this bug without removing the underlying issue – that Bash imports code from an untrusted source – is only addressing part of the problem.


Edit: There are of course now patches in play which do a few things: the band-aids referred to above, and a new one to move the exported functions into environment variables named BASH_FUNC_functionname. I’m not sure that the latter significantly improves the security of the “feature”.

However, there is one way to deal with commands being passed to /bin/sh. Bash recognises when it is executed as “sh”, and makes some assumptions. This patch (to Bash 4.3 patch 27) makes Bash refuse to import functions when executed as “sh”. The advantage of this is that commands invoked from system(), and scripts that specify their interpreter as “#!/bin/sh” (and therefore should not expect Bash-isms to be present) will not be vulnerable to any abuse of the function export/import feature.

Don’t get me wrong, I am still advocating a complete featurectomy. But this might be more acceptable to those who think importing random functions from who knows where is somehow a good idea…

Back in the Dark Ages, when dinosaurs ruled the earth … yeah, say the mid 1990s, early ISPs tended to offer “free” email service as part of their connection plans. It was cheap to do; the email usually just took the form of a POP email box, via which you downloaded your email with a client such as Eudora, or POPmail for those reprobate MS-DOS users who loved their text-mode clients.

Your email address was usually something like your-dialup-username@the-isp’s-domain-name, e.g. don@netlink.co.nz. There were a bunch of reasons for this.

  1. Email was an early “killer app” for the Internet. Giving the customers an email address got them on and able to do something useful with the ‘net, back when the Web was still in its infancy.
  2. Domain name services were offered as a premium service, and were often expensive in terms of the effort and domain name fees required to provide; corporate customers with permanent connections would usually provide their own email service.
  3. Having the ISP’s domain name in all its customers’ email addresses provided brand recognition.
  4. The requirement to change email address created a disincentive for customers to change provider.

The world has changed a bit since then.

There are a bunch of email providers, like Hotmail, Yahoo! and GMail, which will happily give you an email address, and a very nice web interface, which you can use to get at your email from anywhere. There’s absolutely no need to get yourself tied to a specific provider. (There is one caveat though, and that’s that if you’re not paying for the service, you are not a customer.)

Point 1 no longer applies. You don’t need the ISP’s email service.

There are now many commercial domain hosting services available. Granted, they are not free, although many are cost effective, but these provide good email services, including hosted IMAP service (far superior to the old POP service, which assumed you’d only ever get your email on one computer), server-side filtering, spam removal and so forth, as well as web hosting options. The days of the ISP manually configuring a “virtual domain” onto its web and email servers, and charging a premium price for it, are long gone.

The game of providing email has changed. The service isn’t a case of holding mail in a temporary spool for later download by a single desktop computer. A decent email service stores, and backs up, all email, so that it can be retrieved from multiple desktop, portable and mobile clients. Spam processing is a major drain on resources; many folk don’t understand that it’s war out there – spam is driven by large commercial interests who pay highly organised criminals to spam, and to attack computers to create the means to spam. So not only do you not want to be the target for these gangs, but not being that target is actually cost effective. Automation makes configuring domain names, email and web hosting easy and cheap for suitably organised providers, and domain name registration fees are down to very low prices. For prices in the low hundreds per year, or less, you can have your own domain name, as many email addresses as you need within it, and a smart web host running easy-to-operate software (such as WordPress, which I’m using to write this).

The last two reasons for ISPs providing email are for their benefit, not yours. They get the brand recognition. They get to keep you as a customer, or at least on their customer list, long after their use-by date has passed.

Email has never, ever, been a “free” service; somewhere, somehow, the providers of the service have been making a buck out of it. Maybe it’s in customer retention, maybe it’s in the brand recognition. (It was Telecom Xtra’s explicitly stated goal in its early days to make “xtra.co.nz” a recognised brand.) Maybe it’s in advertising. When you buy that fancy domain / web hosting package with email? Well, the provider has probably spent as much if not more on the email part than on the web hosting part. Which brings me to a simple question. If domain hosting is so cheap, why do I still see @xtra.co.nz email addresses painted on the sides of vans, on billboards and on business cards? The money you spent on that isn’t promoting your business, it’s promoting Telecom’s. Why would you do that?

What is your email worth to you?

What would you do if xtra.co.nz was no longer available? If you’re no longer a Telecom customer, you’re likely to see your @xtra.co.nz email address axed in the near future, unless you pay them to keep it. If changing your address means reprinting your stationery and repainting signs, and losing email from customers who haven’t noticed that your email address has changed, that’s a high price to pay for a “free” service.

So, c’mon. In NZ, we have a domain registration system that’s the envy of the world (and I’m proud to say I’ve had a bit to do with that). Hosting your email has never been so easy or so cheap, at a time when trying to do it yourself has never been so difficult. How you present yourself to an increasingly digital world is important to how others see you, and whether they want to do business with you.

So once again, what is your email worth to you?

A recent posting on an InternetNZ mailing list reminded me of just how far we have come. In March 1995, I took the minutes of the New Zealand Internet Society steering group meeting.

Just so we’re clear on what the Internet was back then, the Web was only just beginning to get traction; typical data rates were 48 kbps; establishments such as universities had rates of up to 256 kbps, the total amount of Internet bandwidth out of the country was (I think) 384 kbps. That’s the same amount as six phone calls. Most traffic was email and file downloads using FTP. Interactive services usually required a terminal session using Telnet. Dial-up Internet was only just becoming available; most services that you could use from home required you to dial into an Internet-connected computer service using a terminal emulator, and running your mail program and FTP downloads from there; if you wanted to download a file to your computer, you used a file transfer program like Zmodem to suck it down from the service provider’s computer after the FTP download had finished.

So the Internet was still a new thing. We were still trying to get to grips with how things should be done. So far, all the officialness required was being done through the Tuia Society, which was simply not equipped to address interests outside the immediate research and education community. It did, to its credit, recognise that this baton needed to be passed on to a more broad-based organisation. The March meeting was to explore the possibility of creating a New Zealand Internet Society, possibly as a chapter of the international Internet Society.

The technology to record this ground breaking meeting? Pen and paper.

Here, then, are the minutes of that meeting: Continue reading ‘Old school’ »

At NZNOG 2012, I presented our work on applying point-to-point semantics to Ethernet-like interfaces, described in my earlier post, Broadcast Interface Addressing Considered Harmful.

The slides are available here.

We’ve done a bit more work on this since the original article. One thing that occurred to us was that if you are prepared to keep making ARP requests for a client, you know whether the link is alive or not. In fact, you can do ARP to a host even if you’re not really talking to it.

Consider: We have an IP host, say 10.99.1.11/24. We tell it that its default gateway is 10.99.1.1. We answer for all ARP requests the host makes, except for itself (see the earlier paper).

But now instead of one upstream router, we have two. Furthermore, the two routers use 10.99.11.2 and 10.99.11.3 respectively as their local IP addresses, i.e. the addresses they will put in their ARP packets (and in any ICMP packets generated from the interfaces). We still tell the client its default gateway is 10.99.1.1.

The two routers both ARP for the host, so both routers know whether they can reach it. Between them, via a “back channel” (i.e. a protocol running over the backbone), they agree which of them should be the “active” router for that host.

The active router simply behaves as the upstream router as previously described. The inactive router does nothing more than make ARP requests for the host, and report its availability. This way, if the active router stops participating in the back-channel protocol (i.e. dies), or loses contact with the host while the inactive router can still contact it, the inactive router can take over the active role.

As it does this, it can generate an unsolicited unicast ARP reply to the host, to inform it that the “default” IP address (10.99.1.1 in our example) has changed. Other addresses will sort themselves out depending on the host’s ARP caching strategy. Ideally, the client host will have a fairly rapid ARP time-out and will retry its broadcast ARP for any such addresses.
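As a very rough sketch (and emphatically not our actual implementation), the standby router’s monitoring side could be little more than a loop around an ARP-probe tool such as arping. The client address, interface name and reporting stub below are invented for illustration, and note that the interface flag differs between arping variants (-I for the iputils version, -i for Thomas Habets’ version):

#!/bin/sh
# Sketch only: probe one client with ARP and report what we see.
# CLIENT and IFACE are placeholders; arping typically needs root.
CLIENT=10.99.1.11
IFACE=em0

report_to_peer() {                     # stand-in for the real back-channel protocol
    logger -t arp-failover "client $1 is $2"
}

while sleep 5; do
    if arping -c 3 -i "$IFACE" "$CLIENT" >/dev/null 2>&1; then
        report_to_peer "$CLIENT" up    # got at least one ARP reply
    else
        report_to_peer "$CLIENT" down  # no replies; we cannot reach the client
    fi
done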

This approach has advantages over protocols like VRRP. VRRP works by changing the interface MAC address to a “shared” address, so that IP clients don’t know that there has been a change when the active router swaps over. While that makes for a potentially more rapid fail-over, it comes with a number of disadvantages:

  • When the shared MAC address moves to the newly active router, layer 2 switches have to update their MAC tables;
  • There is some risk of MAC address collisions, especially in Q-in-Q (stacked VLAN) configurations;
  • The VRRP protocol is visible (multicasted) on the client VLAN.

But the major advantage of this approach is that there is a handshake with the end client. VRRP and similar protocols have no such handshake; they’re fine for detecting and replacing a failed router, but where the failed component is intervening layer 2 infrastructure, VRRP has no way of knowing that the host is not reachable from the active router, but is reachable from the inactive one. For example:

  • Switch X connects to Y, and Y to Z
  • Client C connects to switch Y
  • Client D connects to switch X
  • Router A connects to switch X, and is active for clients C & D
  • Router B connects to switch Z, and is inactive for clients C & D

If the link between switches X and Y fails, Router A loses connectivity to Client C. With ARP handshaking, this loss of connectivity is detected and handled by failing over advertisement of Client C’s address to Router B. Furthermore, Client D remains reachable from Router A (and indeed connectivity is lost from Router B), but since each client IP address is processed independently, the active router for that host does not change.

We believe this is applicable to a number of situations, especially Internet access networks, be they in a data centre or in layer-2 metropolitan access networks.

Juha Saarinen dropped me a note a week or two back, asking for an update to my last post for inclusion in NZCS Newsline, in the wake of the IANA IP address pool finally running out and the recently announced successful bid by Microsoft for Nortel Networks’ IP address space.

The published article can be found here, and is different enough from the previous version to warrant re-posting.

Continue reading ‘IPocalypse Now’ »

The IPocalypse is upon us. There are seven /8 IPv4 address blocks left! Soon there will be six. Then five.

On that fateful day, when the sixth to last /8 block is assigned, the five Regional Internet Registries (RIRs) will receive one each of the remaining five /8s for final allocation. This will probably happen in the next month or two.

Then there will be no more! Oh woe is us!

Or not. There are a bunch of ways that we can measure IP address space usage. They include:

  1. The number of addresses available. Formally, this is 2^32 minus the 588,514,560 addresses (or just over 35 /8 blocks) that are assigned for special uses (multicast, reserved, private addressing etc), leaving 3,706,452,736 addresses (or the equivalent of just over 220.9 /8 blocks) available for present or future end-user assignment. (The arithmetic is checked in the short shell session after this list.)
  2. The amount of address space assigned by IANA to RIRs for allocation. Currently, this stands at pretty much all of the above space, less the aforementioned seven /8s (or 117,440,512 addresses).
  3. The amount of address space allocated by RIRs. According to Geoff Huston, this is likely, at current rates of assignment, to run out in mid-late 2011.
  4. The amount of address space that is actually advertised. Right now, a little under 2/3rds of the allocatable address space (that is, excluding private, multicast and reserved address space) is actually advertised to the global routing table. That’s right, 1/3rd of the IP address space is unequivocally dark.
  5. The amount of address space actually allocated to infrastructure. Now things get murky. Is a /8 advertisement actually representing a /8 worth of allocation? Or is the holder of that /8 advertising it simply because they can?
  6. The amount of address space actually in use. This too is largely unmeasurable. Many advertisements, especially smaller ones, are to achieve multihoming, in which case a /24 may have very small numbers of hosts actually assigned to it. The nature of IP address assignment is that you always have to allocate a larger subnet than you plan to use, unless you can do single-IP-address-per-client allocations, e.g. using PPP & friends, my ARP hack or layer-3 VLAN schemes.
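For what it’s worth, measurement 1 is easy to check from a shell (bc truncates rather than rounds):

$ echo "2^32 - 588514560" | bc             # total IPv4 space minus special-use addresses
3706452736
$ echo "scale=2; 588514560 / 2^24" | bc    # special-use space, in /8 blocks of 16,777,216 addresses
35.07
$ echo "scale=2; 3706452736 / 2^24" | bc   # what remains, in /8 blocks
220.92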

Measurements 1 through 4 are easy. 5 & 6 are hard. All we can say for sure is that each measurement will give a smaller number of addresses in use than the one above. If an address appears on the global routing table, we can follow it to its associated autonomous system, but beyond that, we have to look at individual addresses, and even then an assigned and in-use address may be behind a firewall or something and effectively invisible but none the less actively in play.

It did occur to me to look at reverse map entries, but experience suggests that these are unhelpful, being fairly universally badly managed.

So, the question of when IP address space will run out remains difficult to answer. Geoff’s IPv4 Address Report shows a curve in address advertisements (fig. 11c) which, although initially exponential, seems to have settled to a linear growth of about 176,000,000 addresses per year in actual advertisements since 2006. If that rate is maintained, the 1.3 billion or so unadvertised addresses should run out in about seven years.
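That “about seven years” is just this back-of-the-envelope division (again, bc truncates):

$ echo "scale=1; 1300000000 / 176000000" | bc    # unadvertised addresses / growth per year
7.3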

But I suspect that as RIR space becomes unavailable, we’ll start to see address space that is currently advertised but not actually in use being re-allocated (read: sold). For starters, there are about 200 million addresses tied up in non-carrier addresses that are currently advertised as /8s. Admittedly, a goodly chunk of that space may actually be in use, but one suspects that a significant proportion isn’t. There are a lot of equally historical /16 assignments and smaller blocks assigned under multihoming policies that are similarly underutilised, and could shed a large proportion of their advertised allocation as their holders discover it’s worth more to them in someone else’s hands than in their own.

So I’m going to lick my finger and stick it in the wind. I think we have ten years or so before we really, genuinely run out of IPv4 addresses, and that ignores the transition to IPv6 completely. In reality, as IPv4 addresses become scarce (read: expensive), we’ll see folks making do with less and looking harder at IPv6 transition, so I doubt we’ll ever actually run out. Sure, there’s a whole bunch of stuff you can’t do without lots of addresses, but those applications will simply have to go to IPv6.

Don’t get me wrong; I’m not suggesting for a moment that we don’t have to worry. The single thing that will prevent exhaustion is money. Scarce resources have value; the more scarcity, the more value. RIRs have some really hard choices ahead of them; they’re going to be in the firing line to manage the emerging market in IPv4 address space. Pretending that organisations don’t “own” their address space will stop being an option; the court cases haven’t started in earnest yet, but unless the RIRs urgently awake from the fantasy that IP address space is not a tradable asset, they will.

Either they will rise to the challenge, or they’ll be swept into irrelevancy. I rather hope the latter doesn’t happen, because the alternative is anarchy. The best we can hope for is that enough wiser heads prevail to ensure that the emerging IP address bourses have sufficient support to ensure that the fabric of the Internet isn’t torn apart by the conflict between those who long for a non-commercial Internet where everyone plays nice, and the immediate needs of a market where folks need to get stuff done.

This is a picture of my keyboard:

Yes, it’s grubby. And yes, this keyboard really is old enough to not have Windows keys. Actually, it’s about twice that old. Twenty years ago I needed a new keyboard, so I bought a cheap one. (Back then, $200 or more was cheap for a keyboard.) I’m not really sure what I’m going to do when it expires because I’ve never used a keyboard since that I liked. They don’t make keyboards like this any more, with discrete key switches and a distinct tactile click when the key goes down. (Well, they do, but they’re big heavy IBM keyboards that are so noisy they can be heard three blocks away.)

And yes, that key between the Ctrl and Alt keys is labelled “Any”.

The true irony of this is that this key doesn’t actually do anything. No key-code is generated when you press it, so pressing the “Any” key in response to “Press any key to continue” will result in a distinct lack of continuation.

We all have our favourite tech support stories, the “my cup holder is broken” cases, the “it works better if you plug it in” cases. So I wonder how many of us have actually had someone ask where the Any key was?

Once I got called out to look at a printer that apparently wasn’t working. The data plug was upside down. It was a D-shell plug, and they only go in one way, but there it was.

I turned it over and it worked fine.

I know you don’t believe me. I wouldn’t believe me. But it did happen – the male plug was a wee bit bigger than it should have been, and only had a few pins installed which in turn were a bit loose, and the combination of these faults meant it actually went together and seated tightly.

Many, many moons ago, back in the days of serial terminals and multiplexors, the boss came by, saying, “I just had a call from the Auckland office. They say all their terminals are down.” I muttered something unprintable, and wandered into the comms room.

Looking at the multiplexor, I noted the “RA” light flashing. Remote Alarm, meaning the mux couldn’t see the mux at the other end. Probably a comms fault, hardly the first time. Moving up the rack, the NTU on the data circuit to Auckland indicated that it couldn’t see its partner at the other end.

That could just about explain it.

So I ambled off in the direction of the technicians’ office. Back then the telco stuff was handled by the people who looked after the phones, and that meant the techs. So I told Evans, the head tech, of my findings, and he picked up the phone to put through a fault call.

Later that day, I ran into Evans in the corridor. “What’s up with that Auckland circuit?” I asked.

“Oh the fault man went out there. There’s no power.”

“What, to the NTU?”

“Nah, to the building.”

I hate IPv4 broadcast link interface (e.g. Ethernet) addressing semantics.  To recap, if I have a box at each end of a point-to-point link (say between a gateway and an end host), we address as follows (for example):

  • 10.1.1.0: Network address (reserved)
  • 10.1.1.1: Host 1 (gateway)
  • 10.1.1.2: Host 2 (end host)
  • 10.1.1.3: Broadcast address.

That’s four IP addresses, for a link to a single host.  Hello?  Haven’t you heard the news?  IP addresses are running out!

Some folks manage to get away with using /31 masks, e.g.

  • 10.1.1.4: Host 1 (gateway)
  • 10.1.1.5: Host 2 (end host)

which is just wrong.  Better in terms of address usage (two addresses instead of four), but still just plain wrong. And you’re still wasting addresses.

The PPP folks a long time ago figured that a session, particularly in client to concentrator type configurations, only needs one IP address. A “point to point” interface has a local address, and a remote address, of which only the remote address needs to be stuffed in the routing table.  The local address can be the address of the concentrator, and doesn’t even need to be in the same subnet.
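As an aside (my illustration, not part of the original post), that model is easy to see on a conventional point-to-point interface. On FreeBSD, ifconfig takes a local address and a destination address, and only the destination ends up in the routing table as a host route; the tun0 device and the addresses below are invented for illustration, and assume a tun device has already been created:

$ ifconfig tun0 inet 172.16.0.1 10.1.2.5 netmask 255.255.255.255   # local address, then the peer
$ netstat -rn | grep 10.1.2.5                                      # expect a single host route via tun0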

So why can’t my Ethernet interfaces work the same way?

A point to point link really doesn’t have broadcast semantics.  Apart from stuff like DHCP, you never really need to broadcast — after all, our PPP friends don’t see a need for a “broadcast” address.

Well, we decided we had to do something about this.  The weapon of choice is NetGraph on FreeBSD.  NetGraph basically provides a bunch of kernel modules that can be linked together.  It’s been described as “network Lego”.  I like it because it’s easy to slip new kernel modules into the network stack in a surprising number of places. This isn’t a NetGraph post, so I won’t spend more verbiage on it, but it’s way cool. Google it.

In a real point-to-point interface, both ends of the link know the semantics of the link.  For Ethernet point-to-point addressing, we can still do this (and my code happily supports this configuration), but obviously both ends have to agree to do so. “Normal” clients won’t know what we’re up to, so we have to do this in such a way that we don’t upset their assumptions.

So we cheat. And we lie. And worst of all, we do proxy ARP!

What we do is tell our clients that they are on a /24 network. Their IP address is, for example, 10.1.2.5/24, and the gateway is 10.1.2.1. Any time we get a packet for 10.1.2.5, we’ll send it out that interface, doing ARP as normal to resolve the remote host’s MAC address.

Going the other way, we answer ARP requests for any IP address in 10.1.2.0/24, except 10.1.2.5, with our own MAC address.  That means that if they ARP for 10.1.2.6, we’ll answer the ARP request, which directs that packet to us, where we can use our interior routes to route it correctly.  In our world, two “adjacent” IP addresses could be on opposite sides of the network, or on different VLANs on the same interface.
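For a single address, the hand-rolled FreeBSD equivalent of what the module does is just a published (proxy) ARP entry; our NetGraph module effectively does this for the whole /24, minus the client’s own address, without filling the ARP table. The MAC address below is invented:

$ arp -s 10.1.2.6 00:11:22:33:44:55 pub    # answer ARP requests for 10.1.2.6 with this MAC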

The result is one IP address per customer.  We “waste” three addresses per 256, the network (.0), gateway (.1) and broadcast (.255), and we have to be a bit careful about what we do with the .1 address — it could appear on every router that is playing with that /24.  But we can give a user a single IP address, and put it anywhere in the network.

We can actually have multiple IP addresses on the same interface; we do this by having the NetGraph module have a single Ethernet interface but multiple virtual point-to-point interfaces.  So if we want to give someone two IP addresses, we can do that as two, not necessarily adjacent, /32 addresses.  We don’t answer ARPs for any of the assigned addresses, but do answer everything else. The module maintains a mapping of point-to-point interface to associated MAC address.

Don’t shout at your disk drives.  Seriously.  They don’t like it.  They sulk.

Brendan Gregg, of the Sun Microsystems Fishworks engineering team, has written up this effect, with video, at http://blogs.sun.com/brendan/entry/unusual_disk_latency

Moreover, don’t vibrate your drives.  Why am I saying this?

Because, three months ago we took delivery of three 1U pizza boxes. They’re small Supermicro boxes, with room for a normal ATX motherboard and a hard drive.  We equipped these with terabyte drives, fairly normal Supermicro motherboards, 3 GHz Core2 Duo CPUs and 8GB memory each.

They just didn’t run right.  Occasionally, one wouldn’t even make it through an OS install, and the ones that did wouldn’t put through as much work as a much lower spec machine.

We suspected the drives; we suspected the power supply.  Actually, we really thought it was the power supply, but even though the PSUs on these chassis were small, and the 12V rails seemed to be running slightly low, at 11.85V, no amount of bashing the numbers suggested that the systems were actually underpowered.

The first breakthrough was running “hdparm -t --direct /dev/sda” on the drive, which showed wildly fluctuating numbers, consistent with the behaviour we were seeing.  So it was something to do with the disk subsystem.
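If you want to reproduce that check, a few repeated runs make any fluctuation obvious; substitute your own drive for /dev/sda (hdparm needs root):

$ for i in 1 2 3 4 5; do hdparm -t --direct /dev/sda | grep Timing; done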

The next breakthrough was when we discovered that if we unplugged the chassis fan (an ugly centrifugal thing) from the motherboard, the problem went away.  The hdparm numbers stabilised at 100MB/s or more.

We saw small changes in power supply volts when we did this, so we were still suspecting the power supply.  I put an ammeter on the fan power line, to see how much power the fan was pulling.  1.2A at full speed.

We played with the fan speed in the BIOS; at its lowest speed, it would pull 0.25A, and the drive would perform well; at the “server” setting, with the server otherwise unloaded, it would pull about 0.6A.  At that rate, it was starting to have an effect on performance.

This was a PSU that was supposed to be able to deliver 18A on the 12V rail, and 260W total.  I really couldn’t see how the 12V would be at the edge when the PSU was pulling less than 100W (measured at the AC feed) and was running three fans and a hard drive and a few minor bits and pieces like the serial port and network interface, all of which should have summed to maybe 5A.  The numbers didn’t add up.

Finally, I had a brainwave.  I removed the fan from the chassis, still running.  The problem went away.  I touched the fan to the drive.  The drive throughput dropped through the floor.

After a few more experiments, the conclusion is that with the fan mounted close to the drive, the vibrations were enough to upset the performance of the drive, consistently.  Two different terabyte drives (one Seagate, one Western Digital) exhibited the same problem.

I duplicated this by applying abnormal vibration to the case of my desktop PC (half-terabyte Seagate), and even the grotty little thing I have at home (a Seagate 160GB PATA drive).

Conclusion: all modern drives are subject to potentially serious performance issues when faced with abnormal vibration.  The Supermicro chassis exacerbated the problem because of the placement of the fan with respect to the drive, and the fact that the drive is mounted directly to the chassis.  Also, the placement of cables up against the fan meant that vibrations were being transferred directly through the connectors from the fan; something that could be partially alleviated by re-routing the power cable under the fan.

The fact that right angle SATA power connectors are so darned hard to get made this more of an issue than it should have been.

I think a bit of judicious use of closed-cell foam packing, turning the fan speed down, and re-routing cables away from the fan will finally solve the problem.

Hopefully.