SI unit prefixes: a plea for sanity

Computer programmers can skip to the next sub-heading.

Computers work in binary. They measure things in binary numbers internally, at the hardware level. While you are seeing letters and decimal numbers on this page, the computer just sees streams of 1s and 0s. The successive digits in binary (moving left) count 1s, 2s, 4s, 8s, and so on, just like the digits in a regular base 10 number are worth 1, 10, 100, 1000. Powers of 10, powers of 2.

Way back in the mists of primordial computer history—or the 60s, as we also know it—someone decided it would be a good idea to define a kilobyte as 1024 bytes, rather than 1000, because 1024 is the closest power of 2 to 1000. That is, a 10 bit binary number can count from 0 to 1023, just like a 3 digit decimal number counts from 0 to 999.

The problem is, this unusual definition of “kilo” wasn’t given its own symbol at the time; they just used the same “kilo” prefix used for the metric system. Nor was the unusual definition of “kilo” universally used, even in the computer industry. For instance, while the memory of the computer was measured in binary kilobytes, the speed was always measured in decimal kilohertz.

As the years went on, computers got more memory, and got faster, and their network connections got faster. We ended up with more and more ways to store data. And people kept on randomly choosing whether to use base 2 ‘computer’ units, or normal base 10 units, when measuring computer stuff.

Welcome back, programmers!

Right now, CDs are measured in base 2, as per the official CD standard—your 650MB CD contains 650×1024×1024 bytes. On the other hand, DVDs are always measured in base 10 units—your 4.7GB writeable DVD has the normal 4,700,000,000 bytes.
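To put numbers on that, here is a quick sketch of the arithmetic in Python. The constant names are mine, and the capacities are simply the figures quoted above, taken at face value.

```python
# Capacities as quoted above; the names are illustrative only.
CD_BYTES = 650 * 1024 * 1024   # "650MB" counted in base 2 units
DVD_BYTES = 4_700_000_000      # "4.7GB" counted in base 10 units

print(CD_BYTES)                      # 681574400
print(f"{CD_BYTES / 1000**2:.1f}")   # 681.6 (the CD in base 10 megabytes)
print(f"{DVD_BYTES / 1024**3:.2f}")  # 4.38  (the DVD in base 2 gigabytes)
```

So the same "650MB" CD is about 681.6 megabytes by the DVD’s convention, and the "4.7GB" DVD is about 4.38 gigabytes by the CD’s.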

The size of the memory in your computer is always measured in base 2 units (megabytes). However, the bus speed of the memory in your computer is always measured in base 10 units (megahertz).

The file you download has a size, almost certainly reported in base 2 units by your web browser. But, you’ve guessed it, the speed of your modem connection is always measured in base 10 units. Your 1Mbps cable modem gives you 1,000,000 bits per second, not 1,048,576.
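Here is a rough sketch of what that mismatch does to a download estimate. It assumes a file your browser reports as "1MB" in base 2 units and an idealised 1Mbps link with no protocol overhead; the function name is made up for illustration.

```python
FILE_BYTES = 1 * 1024**2          # "1MB" as a base 2 browser would report it
LINK_BITS_PER_SECOND = 1_000_000  # 1Mbps, always base 10

def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Idealised transfer time: 8 bits per byte, no overhead."""
    return size_bytes * 8 / bits_per_second

print(transfer_seconds(FILE_BYTES, LINK_BITS_PER_SECOND))  # a bit over 8 seconds
```

If both numbers used the same "mega", the answer would be exactly 8 seconds. Mixing the two conventions adds nearly 5%.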

The hard disk in a Macintosh? Always specified in base 10 units. If you get a 40GB disk, you get 40,000,000,000 bytes. The disk space reported by Mac OS X? Always binary units. Even Apple aren’t consistent.

Let me be blunt: This is a mess.

There is no logic to it. There is no consistency to it. You can’t work out whether a given measurement is base 10 or base 2; you just have to magically know, or guess and hope that if you’re wrong the difference isn’t too important.

The solution

There is a solution to this problem. The IEC has a set of official binary prefixes. When you want to refer to something in base 2 units, you can use the appropriate binary prefix instead of using the closest base 10 metric prefix, and your meaning will be crystal clear. (That still leaves the problem of what to do if you’re measuring one of the many computer-related things that are measured in base 10, but if we get everyone using binary prefixes it won’t be a problem any more, will it?)
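For reference, the IEC binary prefixes are Ki, Mi, Gi, Ti, Pi and Ei, standing for successive powers of 1024. The sketch below lists each one next to its base 10 cousin and shows how far apart they are; the helper name `gap_percent` is my own invention.

```python
IEC_PREFIXES = ["Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]  # powers of 1024
SI_PREFIXES = ["k", "M", "G", "T", "P", "E"]         # powers of 1000

def gap_percent(power: int) -> float:
    """How much larger 1024**power is than 1000**power, in percent."""
    return (1024**power / 1000**power - 1) * 100

for power, (iec, si) in enumerate(zip(IEC_PREFIXES, SI_PREFIXES), start=1):
    print(f"1 {iec}B = 2^{10 * power} bytes, "
          f"{gap_percent(power):.1f}% more than 1 {si}B")
```

The gap starts at 2.4% for kilo and grows with every step up, which is why guessing wrong matters more as the numbers get bigger.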

And that brings me to the thing I actually want to write about: knee-jerk conservatism.

It turns out that there are a lot of computer programmers who really get pissed off by the idea of having to write MiB for base-2 megabytes. “Megabytes have always been base 2, and always been written as MB”, they sneer. “Everyone knows that 1MB is 1024KB, unless you’re talking about DVDs, or reading manufacturer specs for a hard drive, and that’s just the hard drive manufacturers being stupid. Everyone knows that ‘K’ on a computer means 1024; except for speeds, where it means 1000, except for file download speeds where it means 1024, except when it’s the speed of your modem, when it’s 1000. Everyone knows that. What, are you stupid?”

I find it quite fascinating, really. Engineers generally pride themselves on consistency and clarity, yet when it comes to being consistent and clear in their use of measurements, well, you’d think you were asking them to drink decaf or something.

Change that makes things easier, more consistent, and less ambiguous is good change. It should be embraced. Clinging to confusing and inconsistent ways of working, just because it’s what you’re used to, doesn’t make you look superior; it makes you look like an ass. You’re not clinging to consistency with the past, because the past usage was not consistent. The computer industry has never been consistent in its use of units, and it’s not being consistent now, but it’s time for it to start. And there’s only one way to do that.

If you measure in base 2 units, report in base 2 units using the base 2 prefixes.

If you measure in base 10 units, report in base 10 units using the base 10 prefixes.
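In software, following those two rules takes very little code. Here is a minimal sketch, assuming a hypothetical `format_bytes` helper of my own devising (it is not taken from any particular library) that picks the prefix family to match the base it divides by.

```python
def format_bytes(n: int, binary: bool) -> str:
    """Format a byte count, using IEC prefixes for base 2 and SI for base 10."""
    base = 1024 if binary else 1000
    prefixes = ["", "Ki", "Mi", "Gi", "Ti"] if binary else ["", "k", "M", "G", "T"]
    value = float(n)
    index = 0
    # Step up one prefix at a time until the value fits or we run out of prefixes.
    while value >= base and index < len(prefixes) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {prefixes[index]}B"

print(format_bytes(40_000_000_000, binary=False))  # 40.00 GB
print(format_bytes(40_000_000_000, binary=True))   # 37.25 GiB
```

Both lines describe the same 40,000,000,000-byte disk, and neither one leaves the reader guessing.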

This is not a big plot to make you report disk sizes in base 10 if you don’t want to. Go on measuring your hard disk in GiB and whining about the hard drive manufacturer conspiracy to defraud you, if you want; I don’t care. I just want you and your software to be clear, correct and unambiguous when you provide information to me. Leaving me to guess the meaning of ‘K’ and ‘G’ based on context is not good enough. It is not unambiguous; see above.

Now, get with the program already. All of you. Tell your friends. If anyone whines, point them at this article. And someone get Steve Jobs to lay down the law at Apple; their continuing inconsistency is really annoying me…

6 thoughts on “SI unit prefixes: a plea for sanity”

  1. Let’s just admit it. The SI names are just too embarrassing to use. Let’s start a campaign to abandon the current standard. Change to something obvious, simple and less embarrassing like:

    1bMB, 12bPb
    pronounced
    one binary megabyte, twelve binary petabits
