
This is a picture of my keyboard:

Yes, it’s grubby. And yes, this keyboard really is old enough to not have Windows keys. Actually, it’s about twice that old. Twenty years ago I needed a new keyboard, so I bought a cheap one. (Back then, $200 or more was cheap for a keyboard.) I’m not really sure what I’m going to do when it expires, because I’ve never since used a keyboard that I liked. They don’t make keyboards like this any more, with discrete key switches and a distinct tactile click when the key goes down. (Well, they do, but they’re big, heavy IBM keyboards that are so noisy they can be heard three blocks away.)

And yes, that key between the Ctrl and Alt keys is labelled “Any”.

The true irony of this is that this key doesn’t actually do anything. No key-code is generated when you press it, so pressing the “Any” key in response to “Press any key to continue” will result in a distinct lack of continuation.

We all have our favourite tech support stories: the “my cup holder is broken” cases, the “it works better if you plug it in” cases. So I wonder: how many of us have actually had someone ask where the Any key was?

Once I got called out to look at a printer that apparently wasn’t working. The data plug was upside down. It was a D-shell plug, and they only go in one way, but there it was.

I turned it over and it worked fine.

I know you don’t believe me. I wouldn’t believe me. But it did happen – the male plug was a wee bit bigger than it should have been, and only had a few pins installed which in turn were a bit loose, and the combination of these faults meant it actually went together and seated tightly.

Many, many moons ago, back in the days of serial terminals and multiplexors, the boss came by, saying, “I just had a call from the Auckland office. They say all their terminals are down.” I muttered something unprintable, and wandered into the comms room.

Looking at the multiplexor, I noted the “RA” light flashing. Remote Alarm, meaning our mux couldn’t see the one at the other end. Probably a comms fault; hardly the first time. Moving up the rack, the NTU on the data circuit to Auckland indicated that it couldn’t see its partner at the other end.

That could just about explain it.

So I ambled off in the direction of the technicians’ office. Back then the telco stuff was handled by the people who looked after the phones, and that meant the techs. So I told Evans, the head tech, of my findings, and he picked up the phone to put through a fault call.

Later that day, I ran into Evans in the corridor. “What’s up with that Auckland circuit?” I asked.

“Oh the fault man went out there. There’s no power.”

“What, to the NTU?”

“Nah, to the building.”

Seriously.  They don’t like it.  They sulk.

Brendan Gregg, of the Sun Microsystems Fishworks engineering team, has written up this effect, with video, at http://blogs.sun.com/brendan/entry/unusual_disk_latency

Moreover, don’t vibrate your drives.  Why am I saying this?

Because three months ago we took delivery of three 1U pizza boxes.  They’re small Supermicro boxes, with room for a normal ATX motherboard and a hard drive.  We equipped these with terabyte drives, fairly normal Supermicro motherboards, 3 GHz Core 2 Duo CPUs and 8GB of memory each.

They just didn’t run right.  Occasionally one wouldn’t even make it through an OS install, and the ones that did wouldn’t put through as much work as a much lower-spec machine.

We suspected the drives; we suspected the power supply.  Actually, we really thought it was the power supply, but even though the PSUs in these chassis were small and the 12V rails seemed to be running slightly low, at 11.85V, no amount of bashing the numbers suggested that the systems were actually underpowered.

The first breakthrough was running “hdparm -t --direct /dev/sda” on the drive, which showed wildly fluctuating numbers, consistent with the behaviour we were seeing.  So it was something to do with the disk subsystem.
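“Wildly fluctuating” is easier to compare across fan settings if it’s reduced to a number.  A minimal sketch, assuming Python 3.9+, root access for hdparm, and that each run’s summary line ends in something like “= 100.33 MB/sec” (the regex is an assumption about hdparm’s output format, not a documented interface).  The helpers parse_rate, sample and summarise are hypothetical names for illustration.

```python
import re
import statistics
import subprocess

# Assumed shape of hdparm -t's summary line, e.g.
#  " Timing O_DIRECT disk reads: 302 MB in  3.01 seconds = 100.33 MB/sec"
_RATE = re.compile(r"=\s*([\d.]+)\s*([kMG])B/sec")
_SCALE = {"k": 1 / 1024, "M": 1.0, "G": 1024.0}


def parse_rate(output: str) -> float:
    """Return the throughput in MB/s from the text of one hdparm -t run."""
    match = _RATE.search(output)
    if match is None:
        raise ValueError("no throughput figure found in hdparm output")
    value, unit = match.groups()
    return float(value) * _SCALE[unit]


def sample(device: str = "/dev/sda", runs: int = 5) -> list[float]:
    """Run 'hdparm -t --direct DEVICE' several times (needs root); collect MB/s."""
    rates = []
    for _ in range(runs):
        out = subprocess.run(
            ["hdparm", "-t", "--direct", device],
            capture_output=True, text=True, check=True,
        ).stdout
        rates.append(parse_rate(out))
    return rates


def summarise(rates: list[float]) -> dict[str, float]:
    """Min, max, mean and coefficient of variation (stdev / mean) of the samples."""
    mean = statistics.mean(rates)
    return {
        "min": min(rates),
        "max": max(rates),
        "mean": mean,
        "cv": statistics.pstdev(rates) / mean,
    }
```

Calling summarise(sample("/dev/sda", runs=10)) once with the fan connected and once with it unplugged should make the difference obvious: a healthy drive gives a small coefficient of variation, while one being shaken about gives a large one.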

The next breakthrough was when we discovered that if we unplugged the chassis fan (an ugly centrifugal thing) from the motherboard, the problem went away.  The hdparm numbers stabilised at 100MB/s or more.

We saw small changes in the power supply voltages when we did this, so we were still suspecting the power supply.  I put an ammeter on the fan power line to see how much current the fan was pulling: 1.2A at full speed.

We played with the fan speed in the BIOS; at its lowest speed, it would pull 0.25A, and the drive would perform well; at the “server” setting, with the server otherwise unloaded, it would pull about 0.6A.  At that rate, it was starting to have an effect on performance.

This was a PSU that was supposed to be able to deliver 18A on the 12V rail, and 260W total.  I really couldn’t see how the 12V rail could be at its limit when the PSU was drawing less than 100W (measured at the AC feed) while running three fans, a hard drive and a few minor bits and pieces like the serial port and network interface, all of which should have summed to maybe 5A.  The numbers didn’t add up.
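Here is that back-of-envelope power budget written out.  It’s only a sketch: the 18A rating, the measured 11.85V, the roughly 5A estimate and the fan currents are the figures from this post, while the Rail class and its method names are hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rail:
    """One PSU output rail: its rated current and the voltage actually measured."""
    rated_amps: float
    measured_volts: float

    def load_watts(self, amps: float) -> float:
        """Power delivered on this rail for a given current draw."""
        return amps * self.measured_volts

    def headroom(self, amps: float) -> float:
        """Fraction of the rated current still unused (negative means overloaded)."""
        return 1.0 - amps / self.rated_amps


# Figures from the post: 18A rating, 11.85V measured, roughly 5A estimated load.
rail = Rail(rated_amps=18.0, measured_volts=11.85)
estimated_load_a = 5.0

# Measured chassis-fan draw at three BIOS fan settings.
fan_amps = {"lowest": 0.25, "server": 0.6, "full": 1.2}

print(f"12V load: {rail.load_watts(estimated_load_a):.1f} W, "
      f"headroom {rail.headroom(estimated_load_a):.0%}")
for setting, amps in fan_amps.items():
    print(f"fan at {setting}: {rail.load_watts(amps):.1f} W")
```

With these inputs the 12V rail carries roughly 59W with about 72% of its rated current unused, and even the fan at full speed is only around 14W, so on paper the rail was nowhere near its limit.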

Finally, I had a brainwave.  I removed the fan from the chassis, still running.  The problem went away.  I touched the fan to the drive.  The drive throughput dropped through the floor.

After a few more experiments, we concluded that with the fan mounted close to the drive, the vibrations were enough to upset the drive’s performance, consistently.  Two different terabyte drives (one Seagate, one Western Digital) exhibited the same problem.

I duplicated this by applying abnormal vibration to the case of my desktop PC (half terabyte Seagate), and even the grotty little thing I have at home (a Seagate 160GB PATA drive).

Conclusion: all modern drives are subject to potentially serious performance issues when faced with abnormal vibration.  The Supermicro chassis exacerbated the problem because of the placement of the fan with respect to the drive, and the fact that the drive is mounted directly to the chassis.  Also, the placement of cables up against the fan meant that vibrations were being transferred directly through the connectors from the fan; something that could be partially alleviated by re-routing the power cable under the fan.

The fact that right-angle SATA power connectors are so darned hard to get made this more of an issue than it should have been.

I think a bit of judicious use of closed-cell foam packing, turning the fan speed down, and re-routing cables away from the fan will finally solve the problem.

Hopefully.