Some information about SSDs that a lot of people were apparently unaware of:

  1. High temperatures can make SSDs run slower.
  2. The log-structured I/O SSDs use is really poor for databases, which add their own layer of log-structured I/O.
  3. If you put multiple VMs on a single SSD, you can get unpredictable read latencies.
  4. The more you write to an SSD, the quicker it will fail.

Item 1 was actually news to me. Item 2 is something that ought to occur to you if you know how databases handle ACID updates and how SSDs handle writes, but obviously there are an awful lot of people who use both technologies with no idea of the underlying details.

Item 3 is an interesting one. The fundamental problem is related to item 2: the operating system basically has no way to know what the SSD is doing internally, so it can’t make smart decisions to keep latency low and predictable, even for operations that are just reads.

Something similar happened with hard disks a few decades ago. In the 60s, 70s and 80s, the operating system generally handled the translation of a flat logical block address into individual values for cylinder, head and sector (CHS). In addition, the OS generally knew the cluster size of the underlying hardware — that is, the minimum number of sectors the hardware would read in a single operation.

So a request to read sector 48321 of the disk might translate into a request to read cylinder 3, head 2, sector 1 — and might actually result in sectors 0 through 3 being read, and sector 1 passed back.
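
To make the arithmetic concrete, here’s a minimal sketch of the usual LBA-to-CHS translation. The geometry is invented for the example (16 heads, 63 sectors per track), so the numbers won’t match the ones in the paragraph above:

```python
# Minimal sketch of the usual LBA -> CHS translation.
# The geometry below is made up for the example (16 heads, 63 sectors/track).
HEADS_PER_CYLINDER = 16
SECTORS_PER_TRACK = 63

def lba_to_chs(lba):
    """Map a flat logical block address to (cylinder, head, sector).
    Cylinders and heads are 0-based; CHS sector numbers are conventionally 1-based."""
    cylinder = lba // (HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head = (lba // SECTORS_PER_TRACK) % HEADS_PER_CYLINDER
    sector = (lba % SECTORS_PER_TRACK) + 1
    return cylinder, head, sector

print(lba_to_chs(48321))  # -> (47, 15, 1) with this made-up geometry
```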

Because the OS knew about the disk geometry, it could arrange files and data structures in order to speed up access. Defragmenting, which most Windows users are somewhat familiar with, was only part of this.

Suppose you’re reading a large database file which is unfragmented. You can slurp up the first (say) 63 sectors by moving the head over a single circular track and reading all the way around it. Now you need the next sector, so you instruct the head to move in to the next track. But while the head moves sideways, the disk is still spinning, so by the time the head is in place to read the next track it might be over (say) sector 11. So if you map sectors in the naïve way, you’ll have to wait for almost an entire revolution of the disk before sector 0 is under the head again. So ideally, you’ll skew adjacent tracks by enough sectors to give the head time to move to the next track.
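
If you want to see the arithmetic, here’s a rough sketch of how you might pick a skew value. The rotation speed, sector count and track-to-track seek time are all made-up example numbers:

```python
import math

# Rough sketch of choosing a track skew, using invented numbers:
# a 3600 RPM drive with 63 sectors per track and a 4 ms track-to-track seek.
RPM = 3600
SECTORS_PER_TRACK = 63
TRACK_TO_TRACK_SEEK_MS = 4.0

ms_per_revolution = 60_000 / RPM                        # ~16.67 ms per turn
ms_per_sector = ms_per_revolution / SECTORS_PER_TRACK   # time for one sector to pass the head
skew = math.ceil(TRACK_TO_TRACK_SEEK_MS / ms_per_sector)

# Rotate the next track's sector numbering by this many positions, so sector 0
# arrives under the head just after the seek finishes.
print(f"skew next track by {skew} sectors")             # -> 16 with these numbers
```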

Oh, but often hard disks would spin so fast that it wouldn’t be possible to shunt a sector of data across the bus to the memory quickly enough before the next sector on the track turned up under the head. So when I said you could slurp 63 sectors in a single operation — well, maybe you could, but not necessarily in that order. Often sectors would be interleaved on the track, 0 32 1 33 2 34 3 35 and so on. You’d read sector 0, wait for it to clear the bus, read sector 1, wait for it to clear the bus, and so on, reading the entire track in a single operation — but needing two disk revolutions.
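
Here’s a toy sketch of laying out a simple N:1 interleave; with a 64-sector track and a 2:1 interleave it reproduces the 0 32 1 33… pattern above. The sector count and interleave factor are just example values:

```python
def interleave_order(sectors_per_track, interleave):
    """Return the logical sector stored at each physical slot around the track,
    for a simple N:1 interleave. Assumes sectors_per_track divides evenly."""
    group = sectors_per_track // interleave
    layout = [0] * sectors_per_track
    for logical in range(sectors_per_track):
        physical = (logical * interleave) % sectors_per_track + logical // group
        layout[physical] = logical
    return layout

print(interleave_order(64, 2)[:8])  # -> [0, 32, 1, 33, 2, 34, 3, 35]
```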

There are other complexities. Maybe the drive has 4 heads. In that case, you might want to put the first track of data on head 0, the second on head 1, and so on, and then arrange it so that by the time you’ve read a track from head 3, head 0 will have moved across to the next track, and that track will be skewed enough that the next sector will be coming up. To do this, you need to know how long it takes to switch heads, because that takes time too.
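
The head-switch case is the same calculation as the track skew above, just with a smaller delay, since switching heads is electronic rather than mechanical. The switch time here is invented:

```python
import math

# Same idea as the track-skew sketch, but for switching heads within a cylinder.
RPM = 3600
SECTORS_PER_TRACK = 63
HEAD_SWITCH_MS = 1.0   # made-up head-switch time

ms_per_sector = (60_000 / RPM) / SECTORS_PER_TRACK
head_skew = math.ceil(HEAD_SWITCH_MS / ms_per_sector)
print(f"skew between heads on the same cylinder: {head_skew} sectors")  # -> 4
```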

Basically, every change of raw CHS address had a particular set of latency characteristics, whether you were switching head, switching sector, or switching cylinder. They weren’t simple fixed latencies, either. Cluster sizes could be different, and even sector sizes could be different. Optimizing it all was a dark art.

An aside: In the early days of MS-DOS hard disk support, there was a fixed limit on the number of sectors DOS could handle. But, people wanted bigger hard disks! Some manufacturers came up with a brilliant idea: They’d make their sectors twice as big, 1024 bytes rather than 512. They’d ship a special patched version of MS-DOS that knew about big sectors, and they’d be able to sell hard drives twice as big as their competitors could! Everyone would be happy, right up until Microsoft released a new version of MS-DOS, and the customer installed normal Microsoft MS-DOS on his computer, and MS-DOS treated every sector on his hard disk as half its actual size and scrambled all his files. That’s how I learned the intricacies of this stuff — working in data recovery…

Eventually people observed that tracks at the outside of a disk could easily squeeze in more sectors than the shorter tracks near the center. Drives started to be built with a variable number of sectors per track. This was a bit of a problem for CHS-based addressing, so often the drive would pretend to have a fixed number of sectors per track for simplicity.

Then in the mid 80s, SCSI started to become popular. With a SCSI hard disk, your computer simply asked for logical sector 48321, sending a request down the SCSI bus. The hard drive grabbed the data and sent it back. All of the messy details of cylinders, heads and sectors were hidden from the operating system. That meant the OS couldn’t use that information to optimize its requests. However, it also meant that you didn’t need to spend hours optimizing your hard disk’s addressing algorithm in order to get good performance. SCSI controllers started to include big chunks of RAM cache, and to do things like reading ahead a few sectors and caching the results on the off chance that you might ask for one of those sectors next. In addition, SCSI had a killer advantage: it needed hardly any CPU time, because the controller did the work itself. But your disk performance was heavily dependent on the quality of the disk’s SCSI interface and its firmware. My first SCSI hard drive was generally great, but it was a cheap one (I was a student) and its latency was all over the place.

So that’s the point we’re at with SSDs. Some of them have a RAM cache, some don’t. Some have really good firmware, others not so much. If you want predictable performance and reliability, you need to be very careful who you buy SSDs from, and bear their long-term characteristics in mind. You can also mitigate the problems listed above: don’t skimp on RAM for OS-level caches, don’t swap to SSD, and make sure the machine is properly cooled.
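
One practical way to keep an eye on temperature and wear over time is smartmontools. Here’s a tiny sketch that just dumps the raw SMART attribute table for a device; the attribute names vary by vendor, and the device path is an assumption:

```python
import subprocess

def smart_attributes(device="/dev/sda"):
    """Return the raw SMART attribute table for a device (requires smartmontools)."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(smart_attributes())  # look for temperature and wear/endurance attributes
```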