You’re trying to access AWS S3 using the CLI utility, via (say) aws s3 ls. You’re supplying credentials via AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. You get an error:
An error occurred (ExpiredToken) when calling the ListBuckets operation:
The provided token has expired.
The solution:
You need to set AWS_SESSION_TOKEN to an empty string. Otherwise the AWS CLI grabs a session token from who knows where and tries to use that.
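For example, in a shell you might do something like this before retrying the command (a minimal sketch):

export AWS_SESSION_TOKEN=""
aws s3 ls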
This problem also seems to affect s3cmd, presumably because it uses Amazon’s AWS libraries.
However, if I’m going to learn a new language at this point, it has to be a better fit for me than the ones I’m already using and the ones I’ve already tried and abandoned. No language is perfect, and I could criticize even my favorites, but some are a better match to my needs and desires than others.
So to save time, here’s what I’m looking for in any new programming language, with examples to try to clarify what I mean.
Unix family, specifically including macOS and Linux.
AMD64, ARM and RISC-V.
I don’t care about Windows.
Open source with a free software license, so it can’t suddenly be destroyed by a malicious corporation.
I have a hard time remembering syntax, so I want something that’s simple. Ideally it should also be clean and easy to read, with a minimum of sigils and operators.
Good: Scheme, Go, Ruby
Adequate: Java, Swift
Horrible: C++, Perl, Rust
I want the standard library to have good coverage of POSIX APIs. Again, I don’t care about Windows. The standard library should have support for TCP/IP connections, UDP, Unix domain sockets, Unix permissions, standard cryptographic algorithms, JSON and XML and base64, Unicode, and so on, or else it’s just a toy language.
I also want a standard package manager. Just one, maybe two tops.
Good: Ruby, Go
Adequate: Java
Horrible: Scheme, JavaScript
Since I’m not going to be able to find everything for this new language, I need to have the option of occasionally using code written in something else. That means I want some reasonable FFI option.
Good: C, Zig, Kotlin.
Adequate: Go.
Horrible: Java.
These are in approximate descending importance:
Basic types: unsigned bytes, 32 and 64 bit unsigned integers, signed integers, strings, variable-length arrays, hashes/maps.
Decimal floating point numbers (decimal64 and decimal128).
UTF-8 support throughout the entire language.
Static typing with compile-time type checking.
Lexical scoping.
Object-orientation through composition.
Automatic memory management, either through a fast garbage collector or through some sort of unintrusive and robust reference-counting system. (Examples: Go, Rust.)
First-class functions.
Functions that can return tuples. (Examples: Ruby, JavaScript, Go.)
Fast compile times. (Example: Go. Horrible: C++.)
Trivially easy cross-compilation. And by trivially easy, I mean “set one option on the compile command and you’re done” easy. (Example: Go.)
More advanced types: arbitrary precision integers, arbitrary precision decimal floating point numbers, rationals.
Immutability by default. (Examples: Scala, Haskell.)
Tail-call elimination.
Function overloading.
Parametric polymorphism.
Facilities to make safe concurrency easier, such as CSP structures. (Examples: Go, Kotlin.)
Small binaries and a lightweight runtime. (Examples: Zig, C.)
A good native GUI option for macOS and Linux.
One defined standard set of formatting conventions, and a formatter to implement them. (Example: Go.)
Things I don’t particularly want or care about but I’m OK with them:
OO class inheritance.
Ubiquitous type inference. (I prefer to state my types, thanks.)
Multiple implementations.
Option types.
Binary floating point numbers.
Things I don’t want, but I’m willing to put up with them:
Pointers.
Null (used to represent an absent value, like in SQL).
Nil (a zero value, like in Go).
There are some things I’m just not willing to put up with:
Semantic whitespace. (Examples: Python, YAML.)
A de facto requirement for containers or virtual environments. (Examples: Python, JavaScript.)
A required special IDE. (Examples: Smalltalk.)
To handle this, the binary needs to look at its runtime metadata for information about what we expect it to do. While this is really easy on Cloud Foundry platforms — just look at environment variables — it turns out to be significantly more involved on Amazon Web Services (AWS) when running in Elastic Compute Cloud (EC2).
The first problem is picking an official SDK. There’s the official SDK for Go, which at the time of writing is at v1.35.26; but there’s also the developer preview of the v2 SDK, which is confusingly tagged as v0.29.0. The v0.29.0 SDK “v2” is more recent than the v1.35.26 SDK, and at first glance has more functionality. Unfortunately, although it was released back in 2017, they’re still making breaking changes in 2020. Documentation is missing or incorrect, examples don’t even compile. With that in mind, I decided to stick with the old SDK until things settle down. So, here’s how to get the tags for the running EC2 instance, using only the original AWS Go SDK.
I’ll start off by listing the libraries I used:
import (
    "errors"
    "fmt"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/ec2metadata"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/ec2"
)
The first step is to find out the region and instance ID of the running instance. It’s obvious enough why the instance ID is needed, but you might be wondering why the region. It turns out that the EC2 APIs don’t default to the current region; instead, they demand that you specify the current region explicitly for any operation, or else you get MissingRegion: could not find region configuration returned as an error.
The code actually works by connecting to the instance metadata service, which is available at http://169.254.169.254/ inside the instance. In the shell, you might query it via curl:
curl http://169.254.169.254/latest/meta-data/instance-id
So I could have used Go’s HTTP client, but I wanted to do things the official way. The official Go equivalent of the above curl command looks like this:
sess := session.New()
svc := ec2metadata.New(sess)
if !svc.Available() {
    panic("ec2metadata service not available")
}
instance, err := svc.GetMetadata("instance-id")
if err != nil {
    panic(fmt.Errorf("can't get instance ID: %w", err))
}
The SDK uses an aws/client to make an HTTP call to get the info. The ec2metadata.GetMetadata call takes a string argument. It’s documented to be a path, but as you can see it’s not simply the entire path you would use with a curl command. Fortunately the SDK is open source, and delving into the code reveals that it’s the part of the service path that comes after /latest/meta-data/.
Until quite recently, many people looked up the region by looking up the Availability Zone (AZ) and mapping that back to the region. That usually meant removing the last character — for example, AZ us-east-1c is region us-east-1 — but of course, there was no guarantee that rule would continue to hold true, so I ruled that approach out.
Fortunately, a few months ago someone at Amazon got the developers to add an API call to the ec2metadata service to fetch the region. It doesn’t look like there’s any documentation of it yet, but you can fetch the region as http://169.254.169.254/latest/meta-data/placement/region, or by passing the path placement/region to ec2metadata.GetMetadata exactly as for instance-id above. So now we can write a utility function to get the region and instance ID:
func getRegionAndInstance() (string, string, error) {
    sess := session.New()
    svc := ec2metadata.New(sess)
    if !svc.Available() {
        return "", "", errors.New("ec2metadata service not available")
    }
    instance, err := svc.GetMetadata("instance-id")
    if err != nil {
        return "", "", fmt.Errorf("can't get instance ID: %w", err)
    }
    region, err := svc.GetMetadata("placement/region")
    if err != nil {
        return "", "", fmt.Errorf("can't get region: %w", err)
    }
    return region, instance, nil
}
But there’s another way to do it. If you fetch the entire Instance Identity Document using ec2metadata.GetInstanceIdentityDocument, it contains both the instance ID and the region:
func getRegionAndInstance2() (string, string, error) {
    sess := session.New()
    svc := ec2metadata.New(sess)
    if !svc.Available() {
        return "", "", errors.New("ec2metadata service not available")
    }
    idd, err := svc.GetInstanceIdentityDocument()
    if err != nil {
        return "", "", fmt.Errorf("can't get instance identity document: %w", err)
    }
    return idd.Region, idd.InstanceID, nil
}
Is that better? On the plus side, the SDK only has to make a single HTTP request; on the minus side, it has to fetch and deserialize an entire JSON object into a struct, most of which we don’t need. For my usage, running it once when our web application starts up, it really doesn’t matter which we use; both are fast.
Whichever you choose, you should now have the region and instance ID, so it’s time to move on to the difficult part, fetching the tags as a map of strings.
The first trick is that although tags are metadata, you can’t get them using the ec2metadata service; you need to go in through the main ec2 API instead, setting up the region first as mentioned above:
sess, err := session.NewSession(&aws.Config{
    Region: aws.String(region),
})
svc := ec2.New(sess)
(Thanks to Claes Mogren for pointing me in the right direction.)
The next problem is that there isn’t a GetInstanceTags method. Instead, there’s a general purpose DescribeTags method to fetch information about any resource of any type, and you have to set up filters to restrict its output to the actual object you want. Those filters are specified as a field of a DescribeTagsInput object. After examining some examples, I arrived at the following code:
input := &ec2.DescribeTagsInput{
    Filters: []*ec2.Filter{
        {
            Name:   aws.String("resource-type"),
            Values: []*string{aws.String("instance")},
        },
        {
            Name:   aws.String("resource-id"),
            Values: []*string{aws.String(instance)},
        },
    },
}
All this complexity is so that you can have filters on any property of any object, using single or multiple values, or presumably even null (nil) values. Not really the Go way; it reminds me of LDAP or JNDI. But, that’s the way the v1 Go SDK needs it done.
Given that the input was a DescribeTagsInput object, you might have already guessed that what you get back from DescribeTags is a DescribeTagsOutput object, which also implements an overly general system of structs and pointers so that keys and values, and even the entire list of tags returned, can be allowed to be nil. So the final task is to run through that, and turn it into an idiomatic Go map[string]string of the non-nil values, so we can easily query or range over our tags:
tags := make(map[string]string)
result, err := svc.DescribeTags(input)
if err != nil {
    return tags, fmt.Errorf("ec2.DescribeTags failed: %w", err)
}
if result.Tags == nil {
    return tags, nil
}
for _, t := range result.Tags {
    if t != nil {
        k := t.Key
        v := t.Value
        if k != nil && v != nil {
            tags[*k] = *v
        }
    }
}
Of course, if you need to support nil tag names and/or nil tag values, this won’t work for you, and you’ll have to deal with the []*TagDescription in all its glory.
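To tie the pieces together, here’s a rough sketch of how the above could be wrapped into a single helper; the function name getInstanceTags is mine, not something from the SDK:

func getInstanceTags(region, instance string) (map[string]string, error) {
    tags := make(map[string]string)
    sess, err := session.NewSession(&aws.Config{
        Region: aws.String(region),
    })
    if err != nil {
        return tags, fmt.Errorf("can't create session: %w", err)
    }
    svc := ec2.New(sess)
    input := &ec2.DescribeTagsInput{
        Filters: []*ec2.Filter{
            {Name: aws.String("resource-type"), Values: []*string{aws.String("instance")}},
            {Name: aws.String("resource-id"), Values: []*string{aws.String(instance)}},
        },
    }
    result, err := svc.DescribeTags(input)
    if err != nil {
        return tags, fmt.Errorf("ec2.DescribeTags failed: %w", err)
    }
    // Skip any nil entries so callers get a plain map of non-nil keys and values.
    for _, t := range result.Tags {
        if t != nil && t.Key != nil && t.Value != nil {
            tags[*t.Key] = *t.Value
        }
    }
    return tags, nil
}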
The final code, with both options for fetching the region and instance ID, is in a gist. I hope it saves you from the frustration I went through trying to find this stuff out.
scp has been deprecated. If you’ve never used scp, it basically tries to be a version of cp that works between computers, via an SSH connection. I haven’t used it in a long, long time, because I use rsync instead. While there will eventually be something that works like scp but using the SFTP protocol, I’d like to suggest that the far superior option is to learn rsync and switch to that, as I did a long time ago.
rsync has a number of key benefits over scp. However, the rsync manual page is extremely long and has a ridiculous number of options listed, and I think a lot of people look at it and decide they don’t have time for all that. So, here’s my summary of everything you need to know to handle 99% of day-to-day file transfer tasks using rsync.
Here’s an example incantation:
rsync -a path/to/file example.com:remote/path/to/file
The -a option is a shortcut for a bunch of individual options that you almost certainly want, ensuring that the remote file ends up the same as the local file in terms of permissions, timestamps and ownership.
You might guess that you can add a -v flag to get verbose output, and you’d be right. Almost all my usage of rsync starts with rsync -av. (You can also do rsync -a -v of course.)
Either path can start with / if you want to specify an absolute path from the root directory. Otherwise, the local path starts from the current directory, and the remote path starts from your SSH login directory, as you’d expect:
rsync -a /home/steve/log.txt example.com:/tmp/steve-log.txt
Exactly the same, but with the source and destination in the opposite order:
rsync -av example.com:Music/startup.mp3 ~/Music/start-me-up.mp3
This general principle works for all these examples, you can flip source and destination to move data the other way.
Now comes the part that trips people up. The rsync program will also work on directories, but to get the results you probably expect, you’ll want to remember to put a trailing / on any path that points at a directory. For example:
rsync -av *.jpeg somemachine.local:/tmp/
Going the other way, you’ll need to use some quoting so that your shell doesn’t try to expand the wildcard:
rsync -avs "remote.example.com:Pictures/*.jpg" Pictures/
This is why I love rsync. I can do this:
rsync -av Software/ newbox.local:Software/
Everything from the source directory will be transferred to the destination directory. Note the trailing slashes on both directory paths. (The fish shell autocompletes the directories with the trailing slashes, by the way.)
If there are already files in the destination which aren’t in the source, they’ll be left alone. If you don’t want this, and want the destination to end up exactly like the source, the --delete flag will do the trick:
rsync -av --delete Software/ newbox.local:Software/
After this, the destination will hold exactly the same contents as the source.
You might wonder how to handle it if you need to specify a username for the SSH connection. The answer is you can specify the SSH command to use via -e. For example:
rsync -av -e 'ssh -l myusername' local/*.jpg remote.example.com:Photos/
If you use that remote system regularly, though, it’s probably easier to create an entry in your ~/.ssh/config file so you don’t need to do that. You can also override the port and IP address the same way:
Host remote.example.com
    HostName 10.34.1.40
    User myusername
    Port 663
If you know that rsync is going to need to transfer entire files because they’re totally different from what’s in the destination, and if the computers and disks involved in the transfer are slow, you can save time by telling it that. The -W option (or --whole-file) says to copy whole files rather than using binary diffing to find the changes.
For the opposite situation, where the network is slow and the computers are fast, the -z option will compress all the data in transit and decompress it again at the far end.
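For example (the hostname here is made up):

rsync -avz Photos/ slowlink.example.com:Photos/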
By default, rsync uses a temporary file on the destination system, and then renames it into place when it’s fully transferred. This is to try to prevent other programs from seeing incomplete files, but if you don’t need that feature you can tell rsync to just go ahead and use the final filename rather than a temporary file, using the --inplace flag.
If the destination path has spaces in, the shell on the destination system can mess up the transfer. To avoid this, use the -s option to bypass the shell on the destination system. (This feature is missing from the default version of rsync supplied with macOS.)
If you add the --dry-run flag (-n), rsync won’t actually transfer or delete anything, it’ll just tell you what it would do.
If you don’t need all the details from -v but do want to know what it did, the -i or --itemize-changes option will make it print a concise list of updates.
If you omit any hostname from the rsync command, no SSH connection is used. Instead, rsync works as a local copy command. This means you can use it for directory synchronization between local disks, for example making backups:
rsync -av ~/Photos/ /mnt/backup/Photos/
You can also use it with --dry-run to compare two directories and display which files are different.
As I said at the start, rsync has a large number of options. Once you’ve mastered the basics, you’ll probably find that for any weird file transfer problem you have, there’s an rsync option that can help.
For example, one time I had a bunch of files on an MS-DOS FAT filesystem on a very slow USB device. I wanted to copy on some more and merge them into place, but I really wanted to avoid unnecessary data transfer. The solution involved telling rsync to trust that files were identical if their timestamps and sizes were identical, and to compare timestamps with 2 second resolution to allow for lack of accuracy on FAT filesystems.
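I don’t remember the exact invocation, but it was something along these lines; the --modify-window option is the part that allows for FAT’s coarse timestamps, and the paths here are made up:

rsync -av --modify-window=1 new-files/ /mnt/usbstick/files/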
Mention Notes online and you’ll usually encounter derision, particularly from people who think it’s an e-mail program. It was all too often sold as one, but that was really something it could incidentally handle, not what it was designed for.
To understand what Notes is, and why it is the way it is, you have to know that it was inspired by PLATO Notes in 1973, and development started in 1984. It predates widespread e-mail, the web, home Internet service providers, and every application people disparagingly compare it to.
At the bottom, it’s a database engine which supports any number of distributed document-based key-value databases. Nowadays we’d call it NoSQL, and compare it to something like CouchDB or RocksDB — but Notes shipped in 1988! Each base level set of keys and values in the database is called a note, hence the name of the product.
On top of the database engine is a graphical form designer. It works like a word processor mail merge, giving you a rich text document that you can drop fields into. Each field can have one of a range of basic data types, including string, number, date, and user ID. Fields can also allow multiple values to be entered. When you save using a form, the data is stored in a note, along with information saying which form should be used to display or edit it, when it was saved, who saved it, and so on.
Once you have some documents, you can create one or more views. A view is basically a programmatic “smart group” that fishes a defined set of documents out of the database, and displays values from their key/value fields in table format.
The original Notes offered a spreadsheet-like formula language for scripting. Later on, when Visual BASIC was the new hotness, a VB-like language was added that was more capable. Later still, partial JavaScript support was added.
As well as the data, all of the application design — the forms, views, scripts, and everything else — was itself stored in the database as notes. The upshot was that Notes handled all kinds of things for you with very little effort.
Combine this with the ease of creating forms and views, and with a bit of practice you could put together applications for secure distributed multi-user collaboration in an afternoon, and then iterate on it rapidly. The only thing I’ve seen that approaches the development velocity is Ruby on Rails. (And Rails also gets its fair share of robust criticism.)
Of course, Notes applications would rarely have a polished, professional user interface, because they were usually developed in house by large corporations, without much attention paid to design. So like Visual BASIC before it, Notes as a whole got a bad reputation.
There were also some product decisions that were just mistakes. To pick just one example, “folders” in Notes don’t work like real physical folders. When they were implemented, the developers apparently thought it would be more flexible if a single object could be part of any number of folders, rather than limiting it to one and having to copy things to put them in more than one folder. That is indeed great if you’re an information management nerd or a computer scientist, and can understand why deleting something from one folder would make it disappear from every other folder it’s in. Unfortunately ordinary people don’t expect desktop folders or e-mail folders to work like that.
In the mid 90s when the web started to become important, Lotus added a component called Domino that would take your Notes applications and (mostly) manage to serve them up as web sites, as long as they didn’t try to do anything too complicated. Over time this became a core feature of the Notes server, and the entire server became known as Domino.
Under IBM’s ownership, major changes continued. By 2007, basically the entire product had been rebuilt from the bottom up using Java and the Eclipse framework. With that done, it was possible to use Java code in your Domino web applications. In 2008, a framework called XPages was built on top of JSF, and that became the easiest way to produce cutting edge web applications using Domino.
You could build servlets and run them in Domino. You could use JDBC. You could call Java libraries bundled in your application database as jar files. You could script dynamic web form behavior in JavaScript.
At some point, though, IBM’s interest in the product seemed to wane. Notes and Domino 9 were released in 2013, but in 2018 the latest release was still 9.0.1 (fix/feature pack 10). The Domino engine wasn’t improved much, and Notes applications served up as web pages were a mess of nested tables. XPages also languished, as the rest of the Java world moved to JSF 2. The JavaScript engine remained basically the same as at the turn of the century. Overall, it became more and more painful to develop modern web sites with Domino as a foundation. So on the whole, I’m glad to be saying goodbye to Domino.
Now that HCL are developing the product, major updates are happening again. For example, there’s a new client to make it fast and easy to deploy a Notes application to iPads. There have been some tough decisions, though; XPages is probably a dead end, as the work to update it to JSF 2 would be prohibitive, so instead the new thinking is to develop your web application with something else (NodeJS is often suggested) and use Domino as a simple NoSQL database. That’s all very well, but there are a lot of other NoSQL database options, not to mention SQL databases, and if you’re not going to use the Notes and Domino features, why invest in the platform? It’s probably a good strategy for hanging on to existing customers, but it doesn’t seem like a good way to grow the customer base.
Will Notes and Domino remain viable platforms for groupware development? I don’t know. What I do know is that it’s no longer something I personally have to worry about.
The http.HandlerFunc type is set by the Go standard library:
type HandlerFunc func(ResponseWriter, *Request)
There’s no obvious way to add more arguments, so what can we do?
I’m going to run through three different ways to solve the problem: the object-oriented approach, a more functional approach, and a third variant using an interface and an adapter function.
First, though, let’s run through a simple example. Check out global.go, the code for a simple POST/GET/PUT Web Service for short messages.
Note that since this is an example, it’s written to be as short and straightforward as possible, rather than robust or elegant. The error handling leaves a lot to be desired, there’s no MIME type checking, and in a real web service you’d probably abstract out things like decoding the request.
The main() function is easy enough:
func main() {
    var err error
    db, err = sql.Open("pgx",
        "postgres://localhost:5432/example?sslmode=disable")
    if err != nil {
        panic(err)
    }
    router := makeRouter()
    err = http.ListenAndServe(":80", router)
    log.Fatal(err)
}
In this example I’m using the excellent Chi router to make decoding easier, and PostgreSQL as the database, but the same principles apply for any router or database. I’m also using the standard library’s SQL API to avoid any confusing Postgres-specific features.
The makeRouter function is also easy enough:
func makeRouter() http.Handler {
    r := chi.NewRouter()
    r.Get("/msg/{id}", getMessage)
    r.Put("/msg/{id}", putMessage)
    r.Post("/msg", postMessage)
    return r
}
The individual handler functions then all follow the same pattern:
func getMessage(w http.ResponseWriter, r *http.Request) {
    id, err := strconv.Atoi(chi.URLParam(r, "id"))
    // Do stuff with database and return HTTP response
    // ...
}
There’s just one problem with this code, and it’s right at the top of the file:
var db *sql.DB
It’s a global variable! Eww!
So, how do we get rid of the global variable, but still keep the code working?
I’m going to start off with the object-oriented answer to the problem, because I think it’s the simplest to understand.
First of all, we create an object (struct) type which we will use to hold our application-wide state:
type Application struct {
    db *sql.DB
}
We then change our handlers to be methods on the Application object:
func (app *Application) getMessage(w http.ResponseWriter, r *http.Request) {
    id, err := strconv.Atoi(chi.URLParam(r, "id"))
    // ...
    row := app.db.QueryRowContext(r.Context(), ... )
    // ...
}
Because the handler is a method, it can now access the sql.DB as app.db.
The final changes are to remove the global, make main pass an Application to makeRouter, and have makeRouter call the methods on the Application it is passed:
func main() {
    db, err := sql.Open("pgx", "postgres://localhost:5432/meta?sslmode=disable")
    if err != nil {
        panic(err)
    }
    app := &Application{
        db: db,
    }
    router := makeRouter(app)
    err = http.ListenAndServe(":80", router)
    log.Fatal(err)
}

func makeRouter(app *Application) http.Handler {
    r := chi.NewRouter()
    r.Get("/msg/{id}", app.getMessage)
    r.Put("/msg/{id}", app.putMessage)
    r.Post("/msg", app.postMessage)
    return r
}
You can see the full code in object.go.
The advantage of this approach is that it makes for clean and uncomplicated code. Your application’s handler functions might need a database pool, a logger, some templates, configuration information read from a file, and all kinds of other runtime state, but the router code can remain simple as it’s all encapsulated in the method receiver.
The downside, of course, is that the Application object can become a dumping ground, and there’s no way to tell from the method signature which fields of the object any given handler might need. That, in turn, can make it more problematic to write unit tests, and couple them to the implementation — do you set up a complete Application with all its fields for every test, or just set up the fields you’ve worked out that the handler needs from reading through all of its code?
Another pitfall is that you might also be tempted to code the database-opening logic into the object, perhaps via a NewApplication constructor. I recommend not doing that – it’s better to do dependency injection, so the Application doesn’t have to know or care what kind of database it’s passing around; a principle known as separation of concerns. That way you can easily pass in a mock sql.DB for your unit tests, or have the application support multiple types of database chosen at run time.
I also suggest that you don’t try to make the object a singleton. There’s a good chance you’ll want to be able to construct multiple instances of it when writing unit tests; maybe not for database access, but for other pieces of application-level state.
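As a rough illustration, a unit test for this approach might look something like the following sketch; openTestDB is a hypothetical helper that returns a *sql.DB pointed at test data:

import (
    "net/http"
    "net/http/httptest"
    "testing"
)

func TestGetMessage(t *testing.T) {
    // Build an Application with just the state this test needs.
    app := &Application{db: openTestDB(t)} // openTestDB is hypothetical
    req := httptest.NewRequest(http.MethodGet, "/msg/1", nil)
    rec := httptest.NewRecorder()
    // Route through makeRouter so chi URL parameters are populated.
    makeRouter(app).ServeHTTP(rec, req)
    if rec.Code != http.StatusOK {
        t.Errorf("expected status 200, got %d", rec.Code)
    }
}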
Now let’s look at a completely different way to tackle the problem.
This time we’re going to use partial application and a closure to store the application-level state. Or in less fancy terms, we’re going to write our handlers by writing functions which take some state and return handlers which operate on that state.
So for each handler, we’ll transform from something like this:
func getMessage(w http.ResponseWriter, r *http.Request) {
    // Have to use a global if you want db access here
}
to something like this:
func getMessage(db *sql.DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // Inside here we can still access the variable db from the outer function!
    }
}
What we’re doing above is taking a function that really needs three arguments, and manually turning it into a function which applies the first argument and returns a function which will apply the remaining arguments.
Once we’ve done that, in the makeRouter function, we’ll swap this:
r.Get("/msg/{id}", getMessage)
for this:
r.Get("/msg/{id}", getMessage(db))
The db variable can just be passed in from main as a regular argument to makeRouter.
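With that change, makeRouter ends up looking something like this (a sketch mirroring the earlier version):

func makeRouter(db *sql.DB) http.Handler {
    r := chi.NewRouter()
    r.Get("/msg/{id}", getMessage(db))
    r.Put("/msg/{id}", putMessage(db))
    r.Post("/msg", postMessage(db))
    return r
}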
The end result of the refactoring is in function.go.
The advantage of this approach is that everything is explicit. If your wrapped handler only needs a few pieces of application state, the wrapper function can skip the rest. This is good for unit testing, and also for refactoring — if I refactor a wrapped handler and it no longer uses one of the state variables, my IDE will tell me I have an unused function argument in the outer function and I can remove it.
The downside of this approach is the same: that everything is explicit. If you have eight pieces of application-level runtime state and your handler needs all of them, you’ll have to pass them all as arguments.
You might also dislike the fact that every handler is now wrapped in an extra function, and be wondering if there isn’t a way to avoid that extra code. Which brings me to the third approach…
Imagine for a moment how easy this would all be if the http.HandlerFunc type included a sql.DB parameter. Well, we can make that happen, or at least something like it. We start by defining our own extended version of the HandlerFunc type, and giving it any extra arguments we want, in this case a sql.DB:
type ExHandlerFunc func(db *sql.DB, w http.ResponseWriter, r *http.Request)
Now we can write each handler as an ExHandlerFunc instead of a HandlerFunc:
func getMessage(db *sql.DB, w http.ResponseWriter, r *http.Request) {
    // Code as before
}
The only problem is that our router wants a HandlerFunc for each route. We solve that problem by writing an adapter:
func withDB(db *sql.DB, xhandler ExHandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        xhandler(db, w, r)
    }
}
The adapter takes a database pool and an extended handler, and returns a regular handler which calls the extended handler with the database pool. The reason for the name is more obvious when we update the makeRouter code:
func makeRouter(db *sql.DB) http.Handler {
    r := chi.NewRouter()
    r.Get("/msg/{id}", withDB(db, getMessage))
    r.Put("/msg/{id}", withDB(db, putMessage))
    r.Post("/msg", withDB(db, postMessage))
    return r
}
The finished code is in interface.go.
This is really a variation on the function wrapper technique; we’re just generalizing the wrapper so that it isn’t visible in the handler implementation.
The advantage of this technique, as with the previous one, is that in the (extended) handler code, all the state is explicitly supplied as arguments. When it comes to unit testing, there’s no Application object to create. There’s a little more complexity in the router building code, but not much.
The downside is that this technique won’t work so well if your handlers need widely varying selections of application state. You’ll either end up with lots of arguments you don’t always use, or you’ll have to define multiple extended handler func types with different combinations of state variables as arguments. Still, it’s a useful technique to be aware of for some situations.
So, three different ways to solve the same problem. Which is best? As is usual in software engineering, it depends — there are advantages and disadvantages to each approach, and you’ll have to consider the tradeoffs and decide which is best for your specific application.
I’m not a COBOL programmer and I don’t work in the mainframe division, but I’d like to point out a few things about this story.
First of all, assuming they really are using a 40 year old mainframe, they could start by upgrading the hardware. In 2002 an entry level z800 ran two cores, each at 625MHz and started out at 80 MIPS (according to Technical Introduction: IBM eserver zSeries 800.) That’s only 20 years ago, and according to the spec sheet it was generally faster than the OS/390 systems that preceded it, so let’s assume their 40 year old mainframe can’t be any faster than that.
In comparison, a modern z14 mainframe (circa 2018) can run up to 170 processor cores, at 5.2GHz, and exceed 146,000 MIPS (again according to the technical guide).
Looking at I/O, the z800 had six 1GB/s STI links for all I/O. If I’m reading things correctly, the z14 can run 160 FICON Express16s cards each at 16GB/s. So that’s a leap from 6GB/s to 2560GB/s.
So sure, claims are up by 1,600%. But upgrading hardware could give them a machine that’s 182,500% faster, if they really need it.
That’s not all, though. Compiler technology has also improved a lot in the last 40 years; or more specifically, optimizer technology has. You can get a standalone optimizer for your existing legacy COBOL binaries that will reoptimize the code, giving you an average 47% CPU load reduction.
I think the real story is revealed in CNBC’s coverage:
New Jersey plans to ask for volunteers with a variety of skills, including technologists
Key word: volunteers. Looks to me like it’s not really a technical problem, more of a money problem, with a side order of bad timing.
If you’re wondering why COBOL is still used for things like unemployment claim processing (and air flight booking and tax return processing and invoicing and and and…) the answer is that COBOL uses accurate decimal arithmetic for computation by default. Sure, Java has BigDecimal, but it’s ugly to use, whereas COBOL gets the math right with no coding overhead. That’s important to big companies, tax offices, and the Federal Reserve.
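To illustrate the kind of ceremony involved on the Java side, here’s a rough sketch (the figures are made up):

import java.math.BigDecimal;
import java.math.RoundingMode;

public class Payroll {
    public static void main(String[] args) {
        // Construct from strings so the decimal values stay exact.
        BigDecimal rate = new BigDecimal("0.0765");
        BigDecimal wages = new BigDecimal("1234.56");
        BigDecimal tax = wages.multiply(rate).setScale(2, RoundingMode.HALF_UP);
        System.out.println(tax); // prints 94.44
    }
}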
main() binary will be in Kotlin. First of all, don’t set up a new Kotlin project using IntelliJ IDEA. If you do that, the project will be set up to have IDEA build the code, which will make your life harder. In fact, if you want to (say) use kotlinx.serialization it’ll make your life impossible. Yes, there’s a plugin to build Kotlin code with IDEA, but think of that as a set of training wheels to get newbies writing Kotlin without needing to know about build tasks.
I say this because I tried to use Kotlin without Gradle being involved. I’d had a few bad experiences involving Android development, and really didn’t want to repeat them. However, eventually I just had to give in and start using Gradle, and when I did… it actually wasn’t that bad. Gradle’s pretty speedy these days.
But to get to that point, you need to tell IDEA to set up a Gradle project, but targeting Kotlin/JVM. At that point I was faced with a small but significant checkbox: “Kotlin DSL build script”. I checked the box.
Gradle is traditionally scripted with Groovy, a scripting language modeled on Ruby. Groovy has a bit of a troubled history – it started off as a project of G2One, who were bought by SpringSource, who were bought by VMware. VMware were bought by EMC, Pivotal was spun off and given Groovy to look after, EMC were bought by Dell, and Pivotal gave Groovy to the Apache Software Foundation, which tends to be a bit of a software hospice ward. The JSR document for standardizing Groovy was set to dormant in 2012 after 8 years of inactivity. Not auspicious, and I don’t feel like investing time in learning Groovy.
The downside of using Kotlin for scripting Gradle is that almost all the how-to information out on the Internet assumes you’re using Groovy – even for building Kotlin applications, because Kotlin scripting of Gradle only hit 1.0 in 2018.
So, I told IDEA to set me up a Gradle project targeting Kotlin/JVM with Kotlin DSL build script. I gave it a group ID and project name, it churned away for a while and set me up a new project:
.gradle/
.idea/
gradle/
src/
    main/
        java/
        kotlin/
        resources/
    test/
        java/
        kotlin/
        resources/
build.gradle.kts
gradle.properties
gradlew
gradlew.bat
settings.gradle
It also enabled the Gradle integration in IDEA, which added an editor pane listing the available Gradle tasks so I could just double-click to (say) build a jar.
I dropped a couple of Kotlin files into src/main/kotlin and was able to build a jar. It didn’t run, though – no main class in manifest.
Gradle with the Kotlin DSL has an application plugin. I thought that might do the trick, so I added it to the plugins stanza in the build.gradle.kts file:
plugins {
    kotlin("jvm") version "1.3.40"
    application
}
…and added a stanza specifying options, including the main class name:
application {
    mainClassName = "myapp.MainKt"
}
No luck. It turns out the application plugin is to make it easy to run the application from inside your IDE. However, it’s not too hard to change how the jar file is generated to add the manifest info, and take it from the application options so you don’t have to specify the class name twice:
tasks.withType<Jar> {
    manifest {
        attributes["Main-Class"] = application.mainClassName
    }
}
I now had a Hello World application that would build, run, build into a jar, and run as a jar.
I’d already decided I wanted to use tinylog with its Kotlin API. That turns out to be simple enough. I opened up the build.gradle.kts file again and found the dependencies section and added to it:
dependencies {
    // Improved Kotlin stdlib using JDK 8 features
    implementation(kotlin("stdlib-jdk8"))
    implementation("org.tinylog:tinylog-api-kotlin:2.0.0-M4.3")
    implementation("org.tinylog:tinylog-impl:2.0.0-M4.3")
}
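For reference, the kind of logging call I mean looks roughly like this (a minimal sketch using the tinylog Kotlin API; the message is made up):

import org.tinylog.kotlin.Logger

fun main() {
    Logger.info("Application starting up")
}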
I added some logging calls and ran a new build. As expected, I got a jar with my code in, but it would no longer run because the library classes weren’t found.
I found a bunch of contradictory suggestions for how to build a fat JAR file. The method that works, which is the one JetBrains use themselves, is to use the Gradle Shadow Jar plugin.
The documentation for Gradle Shadow tells you how to use it if you’re scripting in Groovy, but not if you’re scripting in Kotlin. Turns out the incantation is as shown here:
plugins {
    kotlin("jvm") version "1.3.40"
    application
    id("com.github.johnrengelman.shadow") version "5.0.0"
}
There’s a catch, though. The Shadow plugin only works with Gradle 5 and up, whereas IDEA bundles an older version of Gradle. I downloaded and unpacked a binary distribution of Gradle 5 point something, and navigated through IDEA preferences, Build Execution Deployment, Build Tools, Gradle. There I chose “use local Gradle distribution” and pointed it at the directory I’d just unpacked.
The machine did mysterious things for a worrying amount of time, then rebuilt the project using Gradle 5.
I had one more problem, which was that my dependencies soon grew to request two different versions of the same package — kotlin-reflect. The solution was to add an explicit dependency to force a particular version:
dependencies {
    implementation(kotlin("stdlib-jdk8"))
    implementation("org.jetbrains.kotlin:kotlin-reflect:1.3.40")
    implementation("org.tinylog:tinylog-api-kotlin:2.0.0-M4.3")
    implementation("org.tinylog:tinylog-impl:2.0.0-M4.3")
}
With that, I had a new set of tasks under shadow. I double-clicked shadowJar and IDEA started Gradle, which built me a fat jar that worked.
When you run go test -v ./... or try to run continuous integration, the tests fall over from lack of memory.
There are lots of things that could be causing the problem, but one you should definitely check for is unclosed SQL database statements and connections.
You’re probably aware that the general pattern for SQL queries in Go, assuming you have a sql.DB connection pool in the variable db, is something like this:
rows, err := db.Query("select * from tablename where id = ?", id)
if err != nil {
    return fmt.Errorf("can't query tablename for %d: %v", id, err)
}
defer func() {
    err := rows.Close()
    if err != nil {
        fmt.Fprintf(os.Stderr, "warning: failed to close SQL statement\n")
    }
}()
for rows.Next() {
    // process returned data
}
The problem is what happens if you forget that deferred rows.Close() (or it fails for some reason). Some or all of the retrieved data sits around in memory, and the SQL statement stays open. Depending on the database, that may even prevent Go from being able to close the database, keeping all the database connection structures stuck in memory too. (I believe MariaDB will let you close a database connection even if there are unclosed statements, but DB2 most definitely will not.)
The end result? Database structures and data collect in memory every time that piece of code is called.
For this reason, it’s a good idea to make your unit tests which involve database operations check for unclosed DB connections and statements.
The main hurdle is that you need to write your own TestMain method in any unit test file that performs database operations. Here’s what it will look like:
var db *sql.DB

func TestMain(m *testing.M) {
    // Get test database
    db = testutils.GetTestDatabase()
    // Tell Go to return closed DB connections to the pool
    // rather than leaving them idle
    db.SetMaxIdleConns(0)
    testutils.ResetTestData(db)
    // This actually runs your unit tests
    code := m.Run()
    // Now we check to see if any connections were left open
    c := db.Stats().OpenConnections
    if c != 0 {
        panic("connection left open")
    }
    os.Exit(code)
}
Of course once you have identified that DB connections are being left open, you need to identify which ones. In general, of course, it’ll be whichever one you wrote most recently. If you have a codebase which has never been tested like this before, though, you might have a lot of database leaks. In that case, the workaround is to check OpenConnections in each individual test.
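One way to do that is with a small helper which snapshots the open connection count when the test starts and checks it again when the test finishes. Here’s a sketch, to be adapted to your own test setup:

func checkConnections(t *testing.T, db *sql.DB) func() {
    before := db.Stats().OpenConnections
    return func() {
        after := db.Stats().OpenConnections
        if after > before {
            t.Errorf("test leaked %d database connection(s)", after-before)
        }
    }
}

func TestSomething(t *testing.T) {
    defer checkConnections(t, db)()
    // ... test body performing database operations ...
}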
Nevertheless, Notes predated Java, JavaScript, Unicode, TCP/IP support in Windows, and many other things we now take for granted. Sometimes the age of Notes shows through in how the product works internally.
An example of this is the date and time handling. Originally Notes used basically the same time zone lists as Windows 1.0, which in turn was based on how MS-DOS did things. Dates and times were “automatically” converted to your time zone, and that was it. Back in the late 1990s it became clear that this was insufficient, so Notes was given a new type of field which could be used for selecting and storing time zone information. Internally, the time zone was stored as a set of undocumented keys and values packed into a string, for example:
Z=-1$DO=1$DL=3 -1 1 10 -1 1$ZX=130$ZN=W. Europe
Clearly Z is the Zulu offset, the number of hours from UTC. DO is Daylight Observed. DL is the rules for DST observance, ZX is likely the internal code number of the time zone in the old list, and ZN is the name of the time zone.
Unfortunately, time zones change quite regularly, because governments like to mess with them. It wasn’t long until changes were needed to the Notes and Domino time zone list. Unfortunately, the values in the string were stored in millions of user documents on customer servers around the world, so there was no way to easily rename (say) Tonga to Tongatapu. Nor was there any easy way to remove obsolete values. So instead, an extra layer of translation was added. The document might say ZN=Tonga in the time zone field, but that would be displayed as Nuku’alofa in the front end, that being the largest city on Tongatapu.
In the 1990s, Java was the new hotness. Early versions of Java had a terrible API for date and time handling; simple problems like producing a timestamp in Internet format could take ten lines of code. Nevertheless, IBM decided to bet big on Java for enterprise software development. Notes and Domino were rebuilt, practically from the ground up, using Java EE and Eclipse technologies. Amazingly, full backward compatibility was kept with the old Windows version of Notes. This meant that when you started being able to develop Domino applications in Java, you found that as well as the Java Calendar and Date data types, there was also a DateTime class representing dates and times as handled by Notes since the 1980s.
Over the years, enough people complained about Java’s date and time APIs that it became clear that they needed replacing. Finally, Java 8 introduced a whole new date and time API.
If you haven’t learned the Java 8 java.time APIs, well, they are fantastic compared to the old ones. Many problems which used to require error-prone hacks, such as how to represent a date with no time, now have straightforward and robust solutions.
Unfortunately, because it took until Java 8, there’s a huge amount of code out there which uses the old APIs. Oracle published some notes on how to convert between the old and new classes, but basically left the problem as an exercise for the programmer.
So it was that I decided to sit down and work out a utility class with all the conversions, and a set of unit tests to verify them. I called the utility HodgePodge, partly after the state of most Java date/time code, and partly after the hodge/podge transformer of Principia Discordia. I hoped that my HodgePodge class would help me reduce the general amount of chaos in my code.
Next, I turned my eye towards doing the same thing for IBM Notes and Domino DateTime values and time zone fields – that is, building utility functions to convert them to and from the new Java time API, since modern Domino development is done in Java and JavaScript.
This proved to be a tougher task. The first step was to build a list of Notes/Domino time zones, and work out how to convert them to Java 8 time zones, which are based on the Olson time zone database. To do this, I had to consult a number of sources.
First of all, I assembled a test database with every possible Notes/Domino time zone value represented, so I could examine both the front-end and back-end representations of them. I could then cross-reference that with Microsoft’s time zone list for Windows, and a somewhat out-of-date IBM list of Notes 4 time zones that someone helpfully included in an article about decoding hexadecimal universally unique IDs.
With all that information, I could often find an obvious semantic match in the Java 8 timezone list. If I couldn’t, my next step was to consult the site zeitverschiebung.net, which seems to catalog every single time zone for every major city with typical German thoroughness, in each case providing the Olson time zone name – which is usually (but not always) found in the Java 8 list.
For an example of an easy one, consider ZN=Eastern Standard Time (Mexico). There’s no Java/Olson entry under Mexico for any kind of Mexican Eastern Standard Time, but the Notes UI displays “Chetumal” in the time zone selector. Chetumal isn’t in Olson either, but if I look up Chetumal on the handy German web site, it tells me that the appropriate time zone is America/Cancun. I know Cancun is a resort on the east coast of Mexico, which explains why it was called “Eastern Standard Time (Mexico)”, so that’s that one sorted.
It wasn’t long before I found some traps. How about ZN=US Mountain? That’ll be US Mountain Time, such as is observed in Colorado, so America/Denver, right? Wrong! The user interface control labels it Arizona. Arizona doesn’t observe DST, unlike Colorado, so the correct Java zone name is America/Phoenix. There’s a separate ZN=Mountain for US Mountain Time.
Then there was ZN=Central America. That appears as Central America in the UI as well. It’s a Microsoft time zone, their page tells us that it represents “Central America Standard Time”, whatever that is, but no examples are given of anywhere which uses that time zone. However, looking at the underlying data structure it says DO=0, which means it doesn’t have DST. I went on vacation in Costa Rica and remembered that they don’t observe DST, because they’re a very sensible country. The offset matches, so America/Costa_Rica will do for that.
Then there’s TZ=US Eastern. You might think that that’s US Eastern Time (EST/EDT), but you’d be wrong because the value displayed to the end-user is Indiana (East). Indiana has been observing DST since 2006, so right now eastern Indiana is on EST/EDT – but there’s no guarantee that that will continue, so the best coding is probably America/Indiana/Indianapolis.
This sounds tedious, right? So you might be wondering if I could just use the Z value (offset from UTC) and DO value (DST observed) to sort things out. The answer, sadly, is no. It turns out that there’s both a Central Europe Daylight Time and a Central European Daylight Time, both abbreviated to CEDT, and they’re semantically different, but they currently both have the same UTC offsets and both observe DST. So if you want to convert times and dates with best fidelity, you can’t just use the UTC offset (with or without DST), and you can’t rely on the abbreviation for the time zone.
There’s a bigger problem, though – when the Domino DateTime object was created, Windows only supported integer time zone offsets – that is, whole numbers of hours. So the call to get the offset from UTC still only returns an integer. That’s a problem if you live in South Australia, where the time zone is +10:30. (And yes, quite a lot of people do live in Adelaide.)
So, what to do if the code needs a zoned time and date, but there’s no date/time field with a semantic value, just the offset found in the DateTime object? I could build an OffsetDateTime in Java, but there’s no way to get the necessary information. Or is there?
After thinking about the problem for a bit, I came up with a solution. The DateTime class has a method to return the date and time as a string, in its original internal time zone. It also has a method to return the date and time as a string in GMT. I could grab both, subtract one from the other, and that would tell me what the internal time zone offset was!
Unfortunately, the string format used by the DateTime object depends on the local i18n settings on the system where the code is running. So first, I had to write code which took the Notes/Domino i18n settings – provided as an International object – interrogated them, and built a format string to parse the appropriate date and time format using Java’s new DateTimeFormatter. I could then parse both my timestamps, calculate the time zone offset including the minutes, append that to the original timestamp, and parse that as an OffsetDateTime!
Oh, but it’s not that easy either. There’s a bug in Java 8 where it can’t parse timestamps with time zone offsets for which the number of hours is a single digit. It’s fixed in Java 9, but there’s no Java 8 backport, and Java 9 isn’t LTS so I’d have to wait for Java 11. So instead, I had to assemble my own ZoneOffset object, and combine it with a LocalDateTime to produce an OffsetDateTime.
Anyway, back to time zones. Australia was easy compared to some of the other values. Consider the time zone abbreviated as YW2.
The YW2 time zone is stored in Notes as ZN=Mid-Atlantic. The front end display is “Mid-Atlantic - Old”, which gives a clue as to the problem: no country actually uses that time zone any more. It used to be the time zone of Fernando de Noronha, an archipelago of 21 islands off the coast of Brazil with a population of around 2,800 inhabitants. However, Brazil’s eastern islands no longer observe daylight saving time (or horário de verão as they call it); only the west side of Brazil bothers with that.
So the Olson and Java time zone databases have no entry for “GMT-2:00 with DST”, hence nothing which corresponds to YW2. Which is a shame, as someone has managed to enter data in YW2 into one of the databases I manage, as I discovered while testing my code.
For that case, I decided to code YW2 as Etc/GMT+2 and accept that the data may be off by an hour for half the year, since I think the data is likely bogus to start with given the time zone attached. (Yes, GMT+2, because the Etc zones have + and - reversed to match POSIX and Microsoft conventions where positive time zones are west of Greenwich.)
Then there are a few values which are simply wrong in the current IBM Domino time zone list. The list has Samoa as UTC+13, but Samoa changed to UTC-11 (same as Niue) in 2011, so for a while it wasn’t in the Samoa Time Zone (defined by the US). Checking online suggests that Pago Pago (American Samoa) also moved time zone later on, so both Western Samoa and American Samoa are now UTC-11. Still, should I translate the time zone to American Samoa, Samoa (the nation), or even Pacific/Pago_Pago? I decided that since Samoa is an actual nation, references to Samoa without additional qualification should be taken to mean the nation, and I put Pacific/Samoa in the conversion function.
Finally, I went back and tested that all my test data produced valid semantic time zones when passed through the conversion functions, and results that were what I expected plus or minus an hour (because of DST being observed or not).
Overall, it was a gruelling and incredibly annoying experience, but I now have my web service providing events feeds in JSON with useful time zone information. To save any IBM customer from having to go through the same pain, I’m putting the code up on Github. Please let me know if you find any bugs!
Suppose you use a TIMESTAMP WITHOUT TIME ZONE data type, because you know all your values are going to be UTC. Unfortunately, when you read the values back via your Go code and convert them to strings, you get something like this:
2012-11-01 22:08:41 +0000 +0000
Why is Go giving you “+0000” twice? The answer is explained in Go issue #11712. If the time.Time value only has a numeric time zone offset, and not a semantic timezone (such as ‘Central/CDT’ or ‘UTC’), the numeric offset is also used as the time zone’s name.
The value you wrote into the database was in UTC. However, PostgreSQL threw that away when the value was stored in a column of type TIMESTAMP WITHOUT TIME ZONE. When PostgreSQL fishes the value out of a column of that type, it returns a string with no timezone information to the SQL driver. The Go pq driver’s behavior is to assume that a timestamp with no time zone information has offset 0, rather than to assume that it’s UTC.
This is correct behavior on the driver’s part. There are multiple time zones with offset 0, and they aren’t identical to UTC. You could be in British Standard Time, or in Western European Time, in which case daylight saving time changes will need to be accounted for.
So, what if you really want your timestamps to be returned with a value in UTC? I did, so I started considering my options.
The first option I considered was to write code to mess with every timestamp I got back from my database, check if its time zone offset was 0, and if so change its timezone to UTC.
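Something like this little helper is what I had in mind (a sketch, not the code I ended up using):

// forceUTC converts a time whose zone is just a zero offset into the
// same instant with the semantic UTC zone attached.
func forceUTC(t time.Time) time.Time {
    if _, offset := t.Zone(); offset == 0 {
        return t.In(time.UTC)
    }
    return t
}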
The disadvantages of this approach are pretty obvious: I have to write the code, debug it, and remember to insert it everywhere I query a timestamp column.
In fact, the Go pq driver seems to do something a bit like this – looking at the code for ParseTimestamp, it adjusts the time zone to be the current location’s time zone if the offset matches what PostgreSQL returns.

I didn’t want to have to remember to call a function to correct all my timestamp values after every query; I wanted a solution that would give me the right value when I called sql.Scan().
The second option I considered was to change the timestamp column to be a TIMESTAMP WITH TIME ZONE, so that my time.Time value’s time zone would be recorded in the database and hopefully returned later on.
However, when I investigated further, it turned out that that’s not what actually happens. In spite of the name, TIMESTAMP WITH TIME ZONE columns do not have a stored time zone. If you read the documentation carefully, you’ll find this note in section 8.4:
For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone.
The original time zone is thrown away; there are no functions to return what it was. So TIMESTAMP WITH TIMEZONE really means ‘timestamp to which time zone calculations should be applied’, or ‘timestamp which should have a time zone associated with it when read from the column’.
To test this, I wrote two timestamps into a TIMESTAMP WITH TIME ZONE column and then read the values back, and this happened:
Value written Value read back by Go
2012-11-01T22:08:41Z 2012-11-01 17:08:41 -0500 CDT
2012-11-30T19:08:41Z 2012-11-30 13:08:41 -0600 CST
Both values were written while my current time zone was CST. So it seems PostgreSQL adjusted the values into what it thinks the local time zone would have been at the moment stored (as UTC) inside the timestamp. It certainly didn’t store the time zone of the value written into the column, or my local time zone when I wrote the values.
To make things worse, the adjustment was being made to the server time zone. That can be altered by the database server owner in postgresql.conf
, and good luck getting it changed if you’re using someone else’s server, like a cloud service provider’s server.
For maximum code robustness, I clearly couldn’t insist that the server be configured to return UTC timestamps from TIMESTAMP WITH TIME ZONE
columns.
What’s more, my entire goal was to avoid having PostgreSQL adjust my timestamps. TIMESTAMP WITH TIME ZONE
seemed fundamentally the wrong thing to do, as it has the opposite effect, telling PostgreSQL to go ahead and try to perform time zone adjustments.
This led me back to my original choice of TIMESTAMP WITHOUT TIME ZONE
for all my timestamp columns. I wondered if I could get PostgreSQL to return a value which Go would convert to a time.Time
in UTC.
Since my experiment had shown that Go would give me a named time zone if the value it retrieved was a TIMESTAMP WITH TIME ZONE
, I decided to try using the PostgreSQL AT TIME ZONE
construct, also available as the function timezone(zone, timestamp)
:
SELECT ID, Timestamp AT TIME ZONE 'UTC' FROM UpdateLog
Or equivalently:
SELECT ID, timezone('UTC', Timestamp) FROM UpdateLog
Sure enough, Go now gave me values with a semantic time zone. Unfortunately, it was still the server’s local time zone, which the UTC values were adjusted to.
I did some more digging, and discovered that there’s a timezone
parameter you can set during a PostgreSQL session. If you do SET TIMEZONE TO 'UTC'
before executing your queries, you’ll find that all your timezone('UTC', value)
results come back as UTC. Hurrah!
There’s just one more problem: it’s a pain having to SET TIMEZONE
for every database connection, not least because Go’s sql.DB
is a connection pool. How to make sure every connection gets the appropriate value?
The answer to that conundrum is that you can add parameter values to the connection string. So after I have fetched the URL of the cloud PostgreSQL server from the environment, I add the parameter:
u, uerr := url.Parse(dburi)
if uerr != nil {
return dburi, uerr
}
v := u.Query()
v.Set("timezone", "UTC")
u.RawQuery = v.Encode()
dburi = u.String()
This adds the parameter timezone=UTC
to the URL if it isn’t there, and overrides any timezone parameter already in the URL.
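For completeness, the adjusted URI then goes to sql.Open exactly as before, so every connection the pool creates picks up the setting. Roughly:
db, err := sql.Open("postgres", dburi)
if err != nil {
	return nil, err
}
// Each new pooled connection now begins its session with timezone=UTC.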
Having worked out that I could set the session time zone, wrap values in timezone('UTC', value)
, and get the results I wanted, I suddenly realized I was being dumb: timezone('UTC', value)
around a timestamp-without-timezone value, with the session timezone set to UTC, gives exactly the same results as reading a timestamp-with-timezone column with the session timezone set to UTC. No timezone()
adjustment call required.
So while it might seem that making the columns TIMESTAMP WITH TIME ZONE
is storing unnecessary information or causing unnecessary time zone adjustments, that’s not true – the stored information is just a UTC timestamp whether your timestamps are WITH TIME ZONE or not.
The difference between TIMESTAMP WITH TIME ZONE
and TIMESTAMP WITHOUT TIME ZONE
is whether your values get time zone adjusted when you read or write them. And if you set timezone=UTC
for the session, they don’t.
So in conclusion:
Make your columns TIMESTAMP WITH TIME ZONE, so that you get a time zone in your retrieved values, and don’t worry because one isn’t being stored.
Add timezone=UTC in the connection URL, so that the time zone you get in your retrieved values is the one you want, UTC.
Convert your time.Time values to whatever time zone you want for display purposes.
When writing data, ideally convert to UTC first, though PostgreSQL will probably do the right thing as long as Go knows the right numeric offset for the time zone associated with the time.Time value.
And remember, elect me to be Supreme Galactic Ruler. I will decree that all times and dates are to be in UTC and formatted as per ISO 8601, and I will send any violators to the Imperial Martian Salt Mines.
]]>The first is that when I SSHed from a Linux machine to a Mac, something was getting messed up or not initialized correctly in my terminal session. There were control codes which messed up the display, and no matter what I set the prompt environment variables or TERM to, they remained. The problem didn’t occur in the opposite direction.
The more recent issue was that Zsh’s automatic “here’s what I think you might have meant” code broke, and commands which were unrecognized were completely ignored with no error message. This was enough of a potential issue to make me look again at other shells.
Over the years I’ve gone from csh
to tcsh
to bash
to zsh
, so the prospect of another change of shell wasn’t that daunting. (For the curious, I started with csh
because it was the best shell available on Xenix in the mid 1980s.)
I don’t write code in shell script unless I absolutely have to, so scripting strength is relatively unimportant to me. All I really care about are ease of use and ease of configuration. Based on those criteria, my early impressions of fish
are positive.
It sets up my terminal correctly. SSH from Linux to Mac works fine now that it’s my login shell.
Terminal emulator tabs in Linux actually show what’s running in them without my having to do any special configuration.
The vi
command line editing works well, again without any messing with configuration.
The prompt is sensible by default – it shows vi mode, user name, hostname, and abbreviated pwd. It’s the first shell where I haven’t felt the need to mess with the prompt.
Config is handled cleanly. Unlike the mess of startup scripts used by bash
, which installers often try to patch for you, there’s a single config.fish
for user preferences and a conf.d
directory for package-provided scripts.
Autocomplete works all over the place, thanks to fish
parsing man pages to work out the available options. It also autosuggests completions in gray without my having to hit tab, so I end up using the feature more.
Variables are all lists, making it easier to script operations on multiple files.
There’s a really handy ‘universal’ global variable type which applies to all fish
sessions, including ones already running.
Lacking a library of zsh or bash scripts, my first task was to convert my .zshenv
file to fish
syntax. The biggest part of that was rewriting the chunk of script which works out my PATH
. For the sake of cross-platform compatibility and consistency of behavior, I make my shell run through a list of possible path directories found on all the Unix machines I use, in the order I would want them, and construct a path from the ones which exist. It didn’t take long to rewrite, though I admit I found myself wondering if the syntax changes between bash/zsh and fish were really worth the pain caused.
Then I tackled .zshrc
, and the problem which inspired this article: ssh-agent
. If SSH password prompting has always worked for you, well, consider yourself lucky, because I now know what a ghastly pile of hacks it is behind the scenes.
For starters, GNOME’s graphical SSH agent is about as defective and lacking in basic functionality as I’ve come to expect from GNOME, in that it doesn’t understand elliptic curve keys or bcrypt-protected keys. For that reason I’d been using ssh-agent
, but it had never really remembered that I’d unlocked my SSH keys already, and I’d never bothered to dig in and find out why. I’d just been typing my key passphrase every time, or accepting that ssh-add
would only apply to the current shell.
Here’s the problem: When you run ssh-agent
it sets up a daemon with a socket for communication. It then outputs the location of the socket and the pid of the daemon. The shell is supposed to pick up those values, and then when you spawn a new shell that new shell can continue to use the existing daemon.
The problem is how the new shell is supposed to do so. The agent spits out the information in the form of a piece of shell script which the shell is supposed to execute, which you can see by using the -D
option to prevent the agent from daemonizing:
% ssh-agent -D
SSH_AUTH_SOCK=/tmp/ssh-Xxp2vbEK9x4k/agent.16030; export SSH_AUTH_SOCK;
echo Agent pid 16030;
^C
There’s also an option to kill the current agent daemon and clean up the socket.
Clearly there’s an issue around the fact that ssh-agent
outputs shell script. It knows how to output sh
and csh
, and that’s it – no fish
, no rc
, no emacs
… But that’s the most superficial issue.
The deeper problem is that the operation each shell actually wants to perform is: “If the information in the environment points at a live daemon and a socket file, use them. Otherwise, kill any daemon with the pid that’s in the environment, start a new daemon with a new pid and socket, and use those.”
If I were writing ssh-agent
I’d make that the default behavior, or at the very least a behavior available using command line options. No such luck with ssh-agent
though. You can either kill an existing daemon, or you can start a new one. You can’t do both, presumably because they’re logically different operations. You also can’t ask the agent to check which is necessary.
No problem, you think, and you write a script which checks the socket and PID and calls ssh-agent -k
to clean up if either of them are missing. And then you discover that if the environment points at a PID which isn’t a running daemon – say, if it crashed – the agent throws an error.
My summary would be that ssh-agent
is pedantically correct rather than useful, with functionality implemented without regard to how anyone would want to use it in real life.
As evidence for this, my .zshrc
(customized from a default provided by some Linux distro) simply called ssh-agent
and hoped for the best, and I’m not the only one to have done so. There are long-winded attempts to work around the SSH agent’s behavior, but I’d never installed any of them. I’d just lived with typing my passphrase a lot. I imagine there are lots of Linux users who don’t even know that it can be any other way.
There are various solutions out there to make the SSH agent work with fish
, of course. They mostly seem to be distributed as plugins and are fairly long and involve using grep
, sed
, and even /bin/sh
. I felt like I could do better.
After quite a lot of experimentation, I came up with a short self-contained fish script for ssh-agent. It uses ps
to test for a running agent, and kill
to kill one, but everything else is done inside fish. It will either display “Found ssh-agent” or “Started ssh-agent” depending on which operation it had to perform, so you can check its behavior when you open new tabs or start subshells. (Oh, and on macOS you don’t need it at all.)
Once I’d gotten ssh-agent
working, it turned out that none of the rest of my .zshrc
was necessary in fish
. It was 200+ lines of setup to make things work which simply worked by default in the new shell. So, if you don’t use your interactive shell as your main scripting language, give fish
a try.
Since I’m on vacation and I was drinking my morning coffee and feeling generally benevolent to the world, I wrote up a reply. This is that reply, recycled as a post and expanded.
Given the following information:
var add = function(a, b) { return a+b; };
var add2 = mybind(add, 2);
add2(5); // outputs 7
Write the mybind
function.
First of all, to keep things as clear as possible I’m going to follow Douglas Crockford’s suggestion in “JavaScript: The Good Parts” — a book which I strongly recommend — and put a space before the parentheses ( )
when they aren’t being used to invoke a function. So if you see a(b)
in my code without a space it means a
is a function and b
is an argument.
So, let’s run through the code from the problem.
var add = function (a, b) {
return a+b;
};
This one’s easy enough — we’re defining a function which adds two values together and returns the result.
var add2 = mybind(add, 2);
add2(5); // outputs 7
Now it gets tricky. Looking at the last line first, I’m going to assume that “outputs 7” means that we want it to return 7, so in my final code — which I’m testing with NodeJS — I’m going to do:
console.log( add2(5) );
I’ll consider the other possibility later.
Now add2(5)
returns 7
, so add2
must be a function which takes a single argument and returns a single value.
But add2
is created by the mysterious mybind
function in the line before. So now we know that the mybind
function itself must return a function. We also know what kind of function it returns — one which takes a single argument, and returns a single value. Finally, we know that mybind takes two arguments.
So let’s write a skeleton version of “mybind” which does all that, but leave all the insides empty:
var mybind = function (a, b) {
return function (x) {
// ??? compute r for result
return r;
};
};
Clear so far? That’s most of the battle, all we need to do now is work out what goes in the // ???
line, given a
, b
and x
.
At this point I was still having trouble seeing the right answer before finishing my morning coffee, so I decided to give myself some help: I renamed the function arguments to remind me of what the example values are in the problem! So instead of calling the first parameter to mybind a
, I called it add
; I called the variable which is going to hold the value 2 in the problem two
, and so on. Once I did that, I had this:
var mybind = function (add, two) {
return function (five) {
// ??? compute seven for result
return seven;
};
};
At this point, the right answer will hopefully leap out at you the way it leapt out at me: remembering that add
is a function which adds, seven = add(five, two)
.
So, the final code is:
var mybind = function (add, two) {
return function (five) {
var seven = add(five, two);
return seven;
};
};
Of course, we can now rename the function arguments back to something general, and eliminate the unnecessary temporary variable. Here’s the final result, which I fed to NodeJS to check that it worked:
var add = function (a, b) {
return a + b;
};
var mybind = function (fn, a) {
return function (b) {
return fn(b, a);
};
};
var add2 = mybind(add, 2);
console.log(add2(5));
The only remaining issue is whether add2(5)
is really supposed to return 7 for the assumed REPL loop to display, or whether it was actually supposed to output 7 directly. That’s simple enough, though — it’s just a matter of changing mybind to (say) call console.log
instead of returning the value.
The functional programmers of the JavaScript community will find the above solution rather unsatisfactory. It’s far too verbose! If functional programming is about anything, it’s about making your code look like scary opaque math in order to save a few keystrokes.
Fortunately, ECMAScript now has a wonderful tool for doing that: arrow functions. Instead of the boring old:
function (a, b) { return a + b; };
we can type:
(a, b) => a + b;
Half as much typing! So let’s use this new syntax to rewrite our mybind
function:
let mybind = (fn, a) => (b) => fn(b, a);
Isn’t that so much better for confusing inexperienced programmers?
As you can probably tell, I’m not a fan of this shorthand syntax. I feel that it obscures function bodies, making them implicit, and at a glance it looks too similar to a simple expression. I’m sure mathematicians love it, but I came to computer science via science and it makes my brain hurt.
Functional programmers are also likely unsatisfied with my mentioning currying at the start of the article, because technically the problem is about partial application via function binding. So let me show why I thought “currying” when I saw the problem.
Here’s our problem and solution in shiny new ECMA syntax:
let add = (a, b) => a + b;
let mybind = (fn, a) => (b) => fn(b, a);
let add2 = mybind(add, 2);
console.log(add2(5));
Currying is taking a function with multiple arguments, and reducing it to a sequence of applications of functions with a single argument. So f(x,y,z) becomes f’(x)(y)(z), where f’ is the curried version of f. f’(x) returns an unnamed function, and when that function is called with the value y
that returns a function, and when that function is called with the value z
it returns the answer.
We can then go further and say that f(x,y,z) is curry(f)(x)(y)(z) where curry is the currying function which takes f and returns f’.
When I saw the problem, I imagined a curried add function:
let cadd = (a) => (b) => add(b, a);
The imaginary curried add cadd(a)
would return a function which, when applied to b
, would call add(b, a)
and return the result.
The cadd
function is similar to mybind, except that mybind
takes an additional argument for the function to apply instead of add
:
let mybind = (fn, a) => (b) => fn(b, a);
So let’s finish the currying process. Given that we know what curried add would look like, we can write the currying function which, given add
, returns cadd
:
let curry = (f) => (a) => (b) => f(b, a);
And sure enough:
let add = (a, b) => a + b;
let curry = (f) => (a) => (b) => f(b, a);
console.log( curry(add)(5)(2) ); // outputs 7
If you’re totally lost, don’t worry too much — this is pretty much useless, except for passing a computer science degree or terrifying people in technical interviews.
]]>I started out with little code organization in terms of files and directories. I eventually migrated to model, view and controller subdirectories, plus an internal/test directory for test utilities.
I’m not a big believer in ORMs. I’ve tried them in the Java world, and found that the benefits aren’t enough to make up for the downsides, at least not given my reasonable SQL-query-writing abilities. I eventually settled on a DAO model, with lightweight core object classes and FooMapper DAOs.
I think DAO verbs should match REST verbs. It’s tempting to implement UPSERT and use it for both PUT and POST, but then your REST API ends up overly forgiving.
The downside of using REST verbs is that it pushes some of the problem onto your service consumers – they need to know whether they’re writing an object which already exists, or a new one. On the plus side, it makes the DAO code pretty trivial.
I’ve found that a good naming pattern is to include the class name in the DAO method name if the method takes an ID as argument, but to exclude it as redundant if the method takes an object argument. This also means you can have both Delete(Foo) and DeleteFoo(id).
So I ended up with a set of DAO methods following exactly that pattern.
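As a rough sketch (illustrative names following the naming rule above, not my exact method set), a FooMapper ends up looking like this:
// Foo stands in for one of the lightweight core object classes.
type Foo struct {
	ID int64
}

// FooMapper's verbs line up with the REST verbs handled by the controller.
type FooMapper interface {
	GetFoo(id int64) (*Foo, error) // GET: takes an ID, so the class name is included
	Insert(f *Foo) error           // POST: the object must not already exist
	Update(f *Foo) error           // PUT: the object must already exist
	Delete(f *Foo) error           // takes an object, so no class name needed
	DeleteFoo(id int64) error      // takes an ID, so the class name is included
}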
Avoiding cyclomatic complexity seems to be much easier in Go than in Java. In Java, I regularly get PMD complaining about method complexity > 25; in Go, my highest (excluding test code) was 9.
I spent more time worrying about how to structure my code, and less time actually coding or working out how to do things. Eventually I’ll learn the Go patterns, but for now it’s all different. So, don’t expect to use the same patterns you would in Java or Ruby. For example, in Java it’s considered a mortal sin to expose fields directly. In Go, it’s standard practice, if you look at the standard library.
If something ends up being complicated, you’re probably doing it wrong. This is a very generally applicable rule, but particularly relevant when moving between programming languages with very different paradigms. For instance, I kept forgetting that in Go, you don’t declare that you’re going to implement an interface – you just do it.
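For instance, satisfying an interface needs nothing more than having the right method. A toy sketch:
// Notifier is satisfied by anything with a Notify method; there is no
// "implements" declaration anywhere.
type Notifier interface {
	Notify(msg string) error
}

type EmailSender struct {
	Addr string
}

// Having this method is the only thing that makes EmailSender a Notifier.
func (e EmailSender) Notify(msg string) error {
	fmt.Printf("mail to %s: %s\n", e.Addr, msg)
	return nil
}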
Similarly, when I started writing my web service handlers, I found myself frustrated by the lack of generics. I soon had a ton of code working around the lack of generics using a shared interface, factories to instantiate objects, and long switch statements. After that, I used currying to build a function which would build a function to perform operations on an object of a given type, using reflection where necessary. After that, I ripped out all the clever code and just wrote plain old classes which implement the same interface. It would be nice if Go had generics to enforce that the classes have the same interface, but it’s not a huge deal.
Follow a learner’s path:
Think interfaces first, and make your interfaces small.
And of course, every time you find a bug, try to add a unit test that would have detected it.
If you return x, error
, error should always be non-nil if x is invalid.
Otherwise, you’re just asking for nil pointer runtime crashes.
For a field to be serialized as JSON or XML, it needs to be exported – but that means it has to be visible to users. If you have a Java background it’s tempting to have a hidden field and getter/setter pair, but then you need to write your own custom serializer. The Go way is to simply make the field visible and assume users will behave responsibly.
If you are building a web service, it might be tempting to try to mock the database and mock up HTTP requests for your unit tests. Then you realize you need to bypass the router, and pretty soon your test code is longer than the handler method you’re actually testing. Better to do a full functional unit test including router and database components. After all, a bad route change can kill your application as dead as a handler error.
Don’t try to build class relationships using struct embedding. Trust me, that way lies madness.
Don’t try to work around the lack of generics. If you can’t come up with a single interface, just go ahead and build independent classes.
There’s no Date type, so if you want one for your database records you’ll have to build it yourself.
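One reasonable shape for such a type (a sketch, not necessarily what you’d want in production) is a thin wrapper around time.Time pinned to midnight UTC:
// Date carries only a calendar day; the wrapped time.Time is always midnight UTC.
type Date struct {
	t time.Time
}

func NewDate(year int, month time.Month, day int) Date {
	return Date{time.Date(year, month, day, 0, 0, 0, 0, time.UTC)}
}

func (d Date) String() string {
	return d.t.Format("2006-01-02")
}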
After a long time in the Ruby/Rails and Java/JSF world, Go feels extremely “bare metal”. No login system provided, no flash messages, no sessions even! It can certainly trigger some impostor syndrome feelings.
In the end, though, there’s the benefit that there’s no “magic” being applied; I’ve found that magic do-it-for-you code often misbehaves, and when it does you can be powerless to fix it.
Ultimately, I still like Go, and want to carry on working with it. The radical simplicity sometimes makes for some annoyances, but it has upsides.
]]>Today the tech arrived and was unable to reproduce the problem. Even with port 4 connected to the switch — the highest traffic network segment — everything was normal.
Finally, he checked the network maintenance logs. It turned out that I just happened to run a speed test during the exact time period when Google was pushing out a firmware update to my network box. It probably dropped the link to 100Mbps while it was flashing the firmware and rebooting. The fiber UI is very barebones and doesn’t show that information at all, so I had no way to know that.
Making details invisible to users is great, until the moment when things don’t work as expected.
]]>It’s a lot simpler than anything involving SAML. Validating SAML requires a full implementation of XML Signature, which requires an implementation of XML Canonicalization, which requires a full XPath implementation. I’m not anti-XML in general, but I don’t think authenticating a user should require parsing, traversing and rearranging a DOM tree multiple times.
It’s more secure than OAuth2, as we’ll see below.
It can be implemented in situations where your web application can only reach the authentication provider via HTTPS — you don’t need to be able to make LDAP or Active Directory connections.
It’s an open standard.
The downsides of OpenID Connect?
It’s not very popular. Most Internet authentication providers seem to have rolled their own systems based on OAuth2.
It’s kind of a pain in the ass to implement.
The documentation is long and somewhat unapproachable.
This is a summary of the key information I wish someone had given me before I tried to make sense of OpenID Connect.
An OpenID Connect authentication provider has a set of four key endpoints:
An authentication endpoint. You direct the user’s browser to this via an HTTP 301 redirect in order to start the authentication process. The user is bounced back to your web application after logging in, with parameters added to that HTTP request. The parameters may be a code, an ID token, an access token, or some combination of the three.
A token endpoint. This is a REST API your application can use to obtain an id token or access token, given a code.
A user info endpoint. This is a REST API your application can use to obtain information about a user, given an access token. Typically it’s used to get information that doesn’t fit inside the ID token, such as the user’s avatar image. It also may not be supported, as it’s totally optional.
An introspection endpoint. This is a REST API you can pass an access token to. It will return information about the token, such as whether it’s valid, when it will expire, and so on.
There are other optional endpoints, but those are the most important.
As mentioned in the above descriptions, OpenID Connect involves passing around three key pieces of information:
An id_token
is what you typically want — key information about the logged-in user’s ID and their name.
The access_token
(or just token
in some places) is a token you can use to connect to the token endpoint, userinfo endpoint or introspection endpoint. It therefore allows you to obtain an ID token, get more detailed information about the user, or find out when the granted authentication will expire.
A code
is a one-time code you can use, accompanied by a secret ID and password you’ve pre-arranged with the authentication provider, to connect to the token endpoint and get a token.
Given the above, there are several ways to do OpenID Connect.
With Implicit Flow you bounce the user to the authentication endpoint, and when they return you’re passed an ID token (and optionally, an access token so you can look up information that doesn’t fit in the ID token).
The advantage of implicit flow is that it’s really easy to implement. The disadvantage is that it’s not very secure.
For example, browser malware such as a malicious extension can sniff the tokens from the response. Because the ID token is a standard format, it can collect the user’s account information. The stolen access token can be used with the authentication provider’s APIs as well.
There are some ways to mitigate the risk somewhat, but a much better option is…
With Authorization Flow you bounce the user to the authentication endpoint; but when they return, all your app gets is an opaque code. You then need to make an authenticated REST call to the token endpoint, using an ID and password (secret) you obtained by registering with the authentication provider.
Because the tokens are not obtained via the browser, you can avoid being open to token stealing via browser malware. In addition, the application’s ID and secret are never given to the browser, and without those the browser malware can’t obtain tokens from the token endpoint.
The obvious initial downside is that your web application has to make a connection via TLS to the authentication provider.
Hybrid flow is when you make a call which requests a code as well as tokens. It has all the security disadvantages of implicit flow, and I can’t see why you’d want a code if you’re going to be given tokens anyway.
It’s pretty clear that authorization flow is the one to choose for security reasons. That’s probably why the authentication providers I need to use at work don’t support implicit flow. So, let me go through the authorization flow in detail, from the point of view of a web application. The OpenID Core document has examples of HTTP requests and responses, so I’ll skip those and just try to briefly and clearly summarize the process.
The first step is to bounce the user via HTTP 301 to the authentication endpoint with an OAuth2 request. The request has a scope of openid
and a response type of code
, and a redirection URI pointing back to your web application.
The ID provider then asks the user to log in. This may involve pretty much anything you can imagine. In my case, there’s a company login and password, and then I get asked for a TOTP one-time code from my phone.
Once the user has logged in, the authentication provider bounces them back to the URL that was specified in the request. The code
value is added to the query part of the URL.
Next, your web application takes the code value and uses it to make a call to the token endpoint. The connection is made via HTTPS (TLS), and authenticated using an ID and secret (password) you were given when you registered your application with the authentication provider.
(The ID and secret may be passed as JSON values, or using regular HTTP Basic authentication. The latter is preferred, according to the standards.)
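Sketched in Go, with a placeholder endpoint URL, client ID and secret standing in for whatever your provider gives you, the token request looks roughly like this (using net/http, net/url, strings and io/ioutil from the standard library):
// exchangeCode swaps the one-time code for the JSON blob of tokens.
func exchangeCode(code, clientID, clientSecret string) ([]byte, error) {
	form := url.Values{}
	form.Set("grant_type", "authorization_code")
	form.Set("code", code)
	form.Set("redirect_uri", "https://example.com/oidc/callback") // must match the original request

	req, err := http.NewRequest("POST", "https://provider.example.com/token",
		strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.SetBasicAuth(clientID, clientSecret) // HTTP Basic, as the standards prefer

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return ioutil.ReadAll(resp.Body) // JSON containing id_token, access_token and friends
}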
If all goes well and you weren’t given a fake code, you get back a hunk of JSON containing the tokens you requested. The one you’re probably most interested in is the id_token
, and this is where things get a bit tricky.
The id_token
is encoded and signed. You have to decode it, check the signature using the public key of the ID provider, check the token hasn’t expired, and only then can you rely on the information in it.
The standard to follow for the decoding and signing is JSON Web Tokens (JWT). If you get this far and paste the base64-encoded id_token
string that you got from the token endpoint into the jwt.io web site, you should see the info you want appear as the payload.
So your code needs to load a PEM public key, decode the JSON Web Token, and check its contents using the public key. If the signature is invalid, someone’s trying to impersonate the authentication provider, and you don’t trust the info in the ID token.
If the ID token validation succeeded and you also requested an access token, you might then use the access token to make a call to the profile endpoint, to get additional user information. (Home address, photo, you name it — it’s extensible, like LDAP.)
I’d love to say “Go here and use this library”, but at least for Go I haven’t found anything that will do the whole job.
There are many OAuth2 libraries, but none of them seem to support RSA signed JWT tokens.
There’s an openid2go project which looks as if it might work, but it relies on OpenID Connect Discovery being supported. Unfortunately, the authentication providers I need to use don’t have discovery enabled. Furthermore, it doesn’t look to me as if openid2go supports authorization flow.
What about at least handling the JWT part? Well, I took a look at the three Go packages linked from the JWT web site.
SermoDigital’s jose
package describes itself as “A comprehensive set of JWT, JWS and JWE libraries”. Unfortunately the documentation is lacking, and I can’t work out how to perform signature validation against a public key, assuming that’s actually implemented.
Next, jose2go. That one’s nice and simple. Given the id_token
text you got from the token endpoint as a string in the variable idtoken
, the process looks like this:
// Assuming imports along these lines (paths as per the jose2go documentation):
// import (
//     "io/ioutil"
//     jose "github.com/dvsekhvalnov/jose2go"
//     Rsa "github.com/dvsekhvalnov/jose2go/keys/rsa"
// )
pubkeydata, err := ioutil.ReadFile("pubkey.crt")
// Check error and deal with it here
pubkey, err := Rsa.ReadPublic(pubkeydata)
// Check error and deal with it here
payload, headers, err := jose.Decode(idtoken, pubkey)
// Check error and deal with it here
The ID token claims you want are then in the payload variable, and headers can be examined to make sure the expected signature algorithm was used — because you don’t want to accept unsigned tokens. I checked that a tiny change to the token correctly caused a signature error at the Decode stage.
Finally, there’s jwt-go. This one works too, but the decoding and verifying process is slightly more involved because you need to supply a callback function to look up the appropriate key given the signing method. You get a Claims
map with the values you want, and a Valid
field indicating whether the validation succeeded. Again, a tiny mutation to the data was successfully detected.
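For reference, the shape of the jwt-go version is roughly this (a sketch against the jwt-go API of the time, not my exact code, where pubkey is the *rsa.PublicKey loaded from the provider’s PEM file):
// The callback both rejects unexpected signing methods and supplies the key
// used to check the signature.
token, err := jwt.Parse(idtoken, func(t *jwt.Token) (interface{}, error) {
	if _, ok := t.Method.(*jwt.SigningMethodRSA); !ok {
		return nil, fmt.Errorf("unexpected signing method %v", t.Header["alg"])
	}
	return pubkey, nil
})
if err == nil && token.Valid {
	fmt.Println(token.Claims) // the ID token claims
}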
After some trial and error I came up with some robust code to demonstrate examining a struct and altering its fields. I decided to write it up in detail here.
Let’s start with the struct:
type Name string
type Person struct {
FirstName Name
LastName Name
Age int
}
First of all we define a custom type, so that our fields aren’t all primitive types and our example is a bit more general. Then we assemble a simple struct.
Next, we instantiate an example struct of that type, and pass it to a function.
t := Person{"John", "Smith", 23}
reflectOn(t)
The signature for the function will be as follows:
func reflectOn(s interface{}) {
// Code here
}
Why the function call for a simple example?
Well, in real code you aren’t going to be performing reflection in the same function where you create an object. By passing an interface{}
argument, we manage to completely bypass all type safety, meaning our example will be forced to rely on reflection. In real code, of course, you’d ideally narrow the scope a bit with a more specific interface.
In addition, by putting the reflection code in a function we can call it twice:
reflectOn(t)
reflectOn(&t)
Now we can make sure our code deals with structs passed by reference as well as by value, and we can demonstrate the important difference that makes, concerning whether you can change the fields of the struct.
Recall that unlike (say) Java, Go is all about interfaces, not classes. A given object (struct) can implement any number of interfaces, just by providing the right methods. Object oriented programming is done via composition, rather than a class hierarchy.
So when you define a function in Go which operates on objects, you specify the interface which the objects must implement in order to be acceptable to the function. Our function takes an argument of type interface{}
. That’s the empty interface, one with no methods specified, so as per the spec absolutely anything implements it — even a primitive type such as an int
.
So our function accepts objects with any interface type. What it receives is an interface value in an interface variable.
If you’ve read about the implementation of the Go type system or at least tried to digest The Laws of Reflection, you’ll know that an interface value in Go is a tuple consisting of a type descriptor (“this is a string”) and the type-dependent data representing the value.
So, the first step for our reflection function is to extract those two things:
ptyp := reflect.TypeOf(s) // a reflect.Type
pval := reflect.ValueOf(s) // a reflect.Value
Next we want to look at the type of thing we were passed, to make sure it’s what we expect. However, the Type
we just extracted is the specific type of the value — a Person
in this case. What we really want to know is whether it’s a struct, before we try to go looking at its fields. So we look at the Kind
of the type, which we obtain from the Type
by calling Kind()
.
You might want to try printing out the value of ptyp.Kind()
. If you try it with the two function calls:
reflectOn(t)
reflectOn(&t)
…you will quickly discover that in the second case, the interface type’s Kind is Ptr
. So although in Go you can often ignore the distinction between a struct and a pointer to a struct, when it comes to reflection the difference is exposed.
So our function needs to know how to deal with pointers and get at the thing pointed to. The reflect package provides a method Elem()
which operates on a Value
and dereferences to get the Value
pointed at. A similar method does the same thing for the Type
. So:
var typ reflect.Type
var val reflect.Value
if ptyp.Kind() == reflect.Ptr {
fmt.Printf("Argument is a pointer, dereferencing.\n")
typ = ptyp.Elem()
val = pval.Elem()
} else {
fmt.Printf("Argument is %s.%s, a %s.\n", ptyp.PkgPath(), ptyp.Name(),
ptyp.Kind())
typ = ptyp
val = pval
}
At this point, our two new variables typ
and val
contain the Type
and Value
of the actual struct, whether we were given it as an actual value or via a pointer. Now we can make sure that it really is a struct:
if typ.Kind() != reflect.Struct {
fmt.Printf("Not a struct.\n")
return
}
If this seems like a lot of work, remember that in real code you would know whether you were going to call your function with a struct or a pointer to a struct and would just call Elem()
or not (as appropriate) in the first line or two of code.
Next, let’s examine the key difference between passing a struct by value and passing it by reference:
if val.CanSet() {
fmt.Printf("We can set values.\n")
} else {
fmt.Printf("We cannot set values.\n")
}
If you try the code so far, you’ll discover that if your function call is reflectOn(t)
then CanSet()
will report that you can’t set values. If it’s reflectOn(&t)
, you can set values. If you learned programming by learning Java, this probably makes no sense to you at all, but it goes back to the invention of function calls in programming languages. A brief digression is in order. If you’re a C or C++ programmer, you can skip to the next section.
Back in the 1950s, the programming language Algol 60 was being designed. One of its design goals was to support recursion. This meant allowing an unrestricted number of function calls — functions calling functions calling functions. To do this, Dijkstra invented the stack. (See: A Brief History Of The Stack.)
Each time a function was called:
In the 1960s, memory was scarce and computers were slow. A 1964 IBM 360 mainframe started out with 8KiB of memory and executed 34,500 instructions per second — so not even equivalent to a 1MHz clock. If you wanted to pass a string to a function, the idea of copying the entire string onto the stack and off again would have crippled performance. So instead, any argument whose data was larger than a few bytes would be replaced with a pointer to the argument.
The same methods were used for function calls in CPL, which was modeled on Algol. CPL gave way to the BCPL programming language, and its successor C.
Nowadays compilers use various tricks to speed up function calls. For example, if all the arguments will fit into processor registers, they get passed that way instead of via the stack. However, conceptually Go still uses the same stack-based argument passing as its programming language ancestors. One difference, however, is that Go will actually shove an entire string or other large data object onto the stack if you ask it to — conceptually, at least.
When we call our function via reflectOn(t)
, Go pushes an entire copy of the struct t
onto the stack. The function retrieves it as s
. The function doesn’t have any way to know where the copy came from. Whatever it does with the copy, the original will remain unchanged.
When we call our function via reflectOn(&t)
, Go pushes a pointer to the struct onto the stack. The function retrieves the pointer. At that moment, it can access the original structure — so any changes it makes will be visible when the function returns and the original structure is examined.
So although our code makes sure that typ
and val
are the Type
and Value
of the struct, in one case they are the type and value of a copy of the struct, and any changes we try to make will be ignored — so Go warns us of this by returning false
from val.CanSet()
. Notice that whether the value is settable is a property of the value and how we obtained it, not a property of the type of the structure; the struct’s type is identical in both cases.
We’ll get back to this in a few more lines of code. First, let’s see how we look at the fields of the struct. Logically, the fields of the struct and their types are defined in the type definition of the struct, so we would expect to use the typ
variable to access the individual fields. And so we do:
for i := 0; i < typ.NumField(); i++ {
sfld := typ.Field(i)
At this point we have a value representing a field. If you’re used to how Java reflection works you might expect it to be some sort of field class of a particular type that you can use to access the data, but in Go there’s another step to go through.
In Go, the .Field(int) method, when called on a struct’s Type, always returns a special StructField object. To get the actual type of the field, we look at the Type field of the StructField. Just as we examined the underlying type or ‘kind’ of our function argument, so we can do the same for the field:
tfld := sfld.Type // The Type of the StructField of the struct
kind := tfld.Kind() // The Kind of the Type of the StructField
OK, now how about the value? Java gives you a single Field object which you can interrogate for both type and value information. Go has two separate sets of objects to handle that. So just as we called .Field()
on the struct’s Type to get at the field’s type (via an intermediate StructField
), so we need to call .Field()
on the struct’s Value
to get the field’s value. This time, however, there’s no intermediate StructValue
:
vfld := val.Field(i)
fmt.Printf("struct field %d: name %s type %s kind %s value %v\n",
i, sfld.Name, tfld, kind, vfld)
Running the code at this stage will produce output like this:
struct field 0: name FirstName type main.Name kind string value John
struct field 1: name LastName type main.Name kind string value Smith
struct field 2: name Age type int kind int value 23
So, we’ve decoded our struct completely, down to field level. We’ve extracted both the specific types (including custom types we defined), and the underlying primitive types. We’ve even read out the data.
Now that we can read from the struct, let’s work out how to change it.
You might wonder whether setting a value is an operation you perform on the type, or an operation you perform on the value. In a dynamic language like Ruby, you’d expect to call a type-dependent method to set the value. But Go is statically typed, so you can’t change the type of a field at runtime — only its value. So to change a value you use a Set
method on the value of the individual field, as returned by the Field()
method of the struct’s Value
. And if you try to tell the Value
to take on a value of a different incompatible type, Go will panic.
Also, you need to use the Set
methods on the Value
of the individual field you want to change — not the interim StructField
. So let’s try it:
if kind == reflect.String && vfld.CanSet() {
fmt.Printf("Overwriting field %s\n", sfld.Name)
vfld.SetString("Anonymous")
}
Notice that the field’s Value
has its own CanSet()
method, just like the overall struct’s Value
does.
So now I can restate the part that confused the heck out of me: You can’t modify the value of a struct in Go using a Type
or StructField
. To perform a reflection operation in Go you need to go through two separate interrogation processes: first you start with the struct and retrieve all the type information you want and check it, then you start again at the struct and work down the value chain to the field value and change it.
You can interleave the operations, as I’ve done in this example code, but fundamentally you’re dealing with two different trees of information.
You can get the complete code on GitHub with added comments. If you run it, you’ll see quite clearly the behavior difference between calling with a struct, versus calling with a pointer to a struct:
First, passing the actual structure:
Argument is main.Person, a struct.
We cannot set values.
struct field 0: name FirstName type main.Name kind string value John
struct field 1: name LastName type main.Name kind string value Smith
struct field 2: name Age type int kind int value 23
After reflection:
John Smith, 23 years old
Now, passing a pointer to the structure:
Argument is a pointer, dereferencing.
We can set values.
struct field 0: name FirstName type main.Name kind string value John
Overwriting field FirstName
struct field 1: name LastName type main.Name kind string value Smith
Overwriting field LastName
struct field 2: name Age type int kind int value 23
After reflection:
Anonymous Anonymous, 23 years old
Hopefully that covers everything you need to know about reflecting on structs.
]]>One problem is that most people don’t understand boolean algebra, to the point that they don’t understand the difference between “and” and “or” in a set of search clauses. Unless your audience is highly technical, giving them a choice between “and” and “or” will only confuse them.
If you look at sites which successfully allow complex queries, they do it by using the type of UI component to express whether the choice is “and” or “or”. For example, NewEgg:
Clauses are implicitly “and”, except for sets of checkboxes. So the final search here is “Newegg Premier” AND (price $200-300 OR price $300-400).
This “implicit boolean operators” search pattern has become almost ubiquitous, though you do find the occasional site running older search software that still has drop-downs to choose “and” or “or” for each clause.
Another subtlety of search is the underlying conceptual model of the operation. To a programmer, search is straightforward: you start with an empty result set, you assemble a list of criteria, you go to the database and pick out items that match the criteria, and you add them to the result set.
That’s not the user’s conceptual model, though. To the user, you start with the set of every possible item, and you then eliminate items which don’t match the criteria. The conceptual model mirrors how you’d perform a search manually.
This becomes relevant when the user is given an advanced search interface, selects no criteria at all, and asks the system to search. Programmatically, the set of items which don’t match any criteria is the empty set; but what the user actually expects based on their mental model is all items.
Google catch this scenario by disabling the search buttons on the front page until the user types some text. They also ignore empty search string submissions using the Enter key. Interestingly, they leave the Search button visibly enabled, even though it doesn’t work. It would certainly be possible to have the button visibly disabled until the search text field contained a non-empty string; presumably they’ve done A/B testing that demonstrated that buttons initially being disabled led to user confusion.
I initially considered catching the “no criteria” case and redirecting the user to the browse UI instead. On reflection, though, I realized that selecting no criteria and hitting “Search” was a legitimate thing to do. The use case is “find out what the search results page will look like, then decide how many criteria I need to specify”.
Not a hard problem from a technical point of view, but I thought it showed an interesting subtlety involved in getting search right.
]]>Item 1 was actually news to me. Item 2 is something that ought to occur to you if you know how databases handle ACID updates and how SSDs handle writes, but obviously there are an awful lot of people who use both technologies with no idea of the underlying details.
Item 3 is an interesting one. The fundamental problem is related to item 2: the operating system basically has no way to know how the SSD is operating internally, and can’t make smart decisions to optimize performance to ensure low latency operations, even if they’re just reads.
Something similar happened with hard disks a few decades ago. In the 60s, 70s and 80s, the operating system generally handled the translation of a flat logical block address into individual values for cylinder, head and sector (CHS). In addition, the OS generally knew the cluster size of the underlying hardware — that is, the minimum number of sectors the hardware would read in a single operation.
So a request to read sector 48321 of the disk might translate into a request to read cylinder 3, head 2, sector 1 — and might actually result in sectors 0 thru 3 being read, and sector 1 passed back.
Because the OS knew about the disk geometry, it could arrange files and data structures in order to speed up access. Defragmenting, which most Windows users are somewhat familiar with, was only part of this.
Suppose you’re reading a large database file which is unfragmented. You can slurp up the first (say) 63 sectors by lowering the head to the disk and reading all the way around a single circular track. Now you need the next sector, so you instruct the head to move in to the next track. But while the head moves sideways, the disk is still spinning, so by the time the head is in place to read the next track it might be over (say) sector 11. So if you map sectors in the naïve way, you’ll have to wait for almost an entire revolution of the disk before sector 0 is under the head again. So ideally, you’ll skew adjacent tracks by enough sectors to give the head time to move to the next track.
Oh, but often hard disks would spin so fast that it wouldn’t be possible to shunt a sector of data across the bus to the memory quickly enough before the next sector on the track turned up under the head. So when I said you could slurp 63 sectors in a single operation — well, maybe you could, but not necessarily in that order. Often sectors would be interleaved on the track, 0 32 1 33 2 34 3 35 and so on. You’d read sector 0, wait for it to clear the bus, read sector 1, wait for it to clear the bus, and so on, reading the entire track in a single operation — but needing two disk revolutions.
There are other complexities. Maybe the drive has 4 heads. In that case, you might want to put the first track of data on head 0, the second on head 1, and so on, and then arrange it so that by the time you’ve read a track from head 3, head 0 will have moved across to the next track, and that track will be skewed enough that the next sector will be coming up. To do this, you need to know how long it takes to switch heads, because that takes time too.
Basically, every change of raw CHS address had a particular set of latency characteristics, whether you were switching head, switching sector, or switching cylinder. They weren’t simple fixed latencies, either. Cluster sizes could be different, and even sector sizes could be different. Optimizing it all was a dark art.
An aside: In the early days of MS-DOS hard disk support, there was a fixed limit on the number of sectors DOS could handle. But, people wanted bigger hard disks! Some manufacturers came up with a brilliant idea: They’d make their sectors twice as big, 1024 bytes rather than 512. They’d ship a special patched version of MS-DOS that knew about big sectors, and they’d be able to sell hard drives twice as big as their competitors could! Everyone would be happy, right up until Microsoft released a new version of MS-DOS, and the customer installed normal Microsoft MS-DOS on his computer, and MS-DOS treated every sector on his hard disk as half its actual size and scrambled all his files. That’s how I learned the intricacies of this stuff — working in data recovery…
Eventually people observed that tracks at the outside of a disk could easily squeeze in more sectors than the smaller tracks in the middle. Drives started to be built which had a variable number of sectors per track. This was a bit of a problem for CHS-based addressing, and often the drive would pretend to have a fixed number of sectors per track for simplicity.
Then in the mid 80s, SCSI started to become popular. With a SCSI hard disk, your computer simply asked for logical sector 48321, sending a request down the SCSI bus. The hard drive grabbed the data and sent it back. All of the messy details of cylinders, heads and sectors were hidden from the operating system. That meant the OS couldn’t use that information to optimize its requests. However, it also meant that you didn’t need to spend hours optimizing your hard disk’s addressing algorithm in order to get good performance. SCSI controllers started to include big chunks of RAM cache, and to do things like reading ahead a few sectors and caching the results on the off chance that you might ask for one of those sectors next. In addition, SCSI had a killer advantage: it didn’t require CPU time. But your disk performance was greatly dependent on the quality of the disk’s SCSI interface and its firmware. My first SCSI hard drive was generally great, but it was cheap (I was a student) and its latency was all over the place.
So that’s the point we’re at with SSDs. Some of them have a RAM cache, some don’t. Some have really good firmware, others not so much. If you want predictable performance and reliability, you need to be very careful who you buy SSDs from, and bear their long term characteristics in mind. For example, don’t skimp on RAM for OS-level caches, don’t swap to SSD, and make sure the machine is properly cooled.
]]>After solving the Day 5 problems pretty quickly, I went to the discussions and was a bit surprised to find that everyone was talking about regular expressions. I hadn’t used them. It had seemed pretty obvious to me that the state machines and string searches required could be handled explicitly, could do the job in linear time, and would likely be faster than a regexp engine.
But… what if I was wrong? What if I’d written the extra code for nothing?
I decided to implement a second pair of solutions using precompiled regexps and the PCRE library, make sure they passed the same tests, and benchmark all four. I had to install a PCRE library, because Go’s built-in regular expression library doesn’t support all the Perl-like features needed to solve the problem concisely.
Here’s what I got when I benchmarked:
BenchmarkNice1-4 2000 738282 ns/op
BenchmarkNice1RegExp-4 1000 2269376 ns/op
BenchmarkNice2-4 2000 901082 ns/op
BenchmarkNice2RegExp-4 300 4140755 ns/op
The second number in each row (2000, 1000 and so on) is the number of runs Go decided it needed to average to get a good measure of the speed; the final figure is the average time per run in nanoseconds.
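For anyone who hasn’t used Go’s benchmark support: each of those rows comes from a function like the sketch below, where isNice1 is a stand-in name for the real part 1 checker.
// The testing framework keeps increasing b.N until it gets a stable
// measurement; ns/op is the average time per loop iteration.
// isNice1 is a hypothetical stand-in for the actual solution function.
func BenchmarkNice1(b *testing.B) {
	input := "ugknbfddgicrmopn"
	for i := 0; i < b.N; i++ {
		isNice1(input)
	}
}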
My intuition was right: using a compiled language and a straightforward non-regexp state machine implementation beat a PCRE-based one, at least for this fairly simple problem.
Pondering performance some more, I realized that Unicode processing was likely impacting the performance of my code, what with Go handling strings as UTF-8 by default.
I decided that for fairness, I should make sure both the regexp and non-regexp implementations dealt properly with Unicode.
The problem specifically says that the only characters deemed to be vowels are aeiou
. However, part 2 talks about letters, which would logically include things like é
and ü
.
Also, nowhere does it state that the algorithm is allowed to assume that the input strings are only letters, so I decided to make sure my code could process things like a☃️ee⛄️ioo🌨ee🎅u
. This is complicated in some programming languages, because Santa does not reside in the the [Basic Multilingual Plane](https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane). Not a big deal in Go, though.
Once I’d done that and made sure all the tests still ran, I reran the benchmarks:
BenchmarkNice1-4 2000 674877 ns/op
BenchmarkNice1RegExp-4 500 2412329 ns/op
BenchmarkNice2-4 1000 2160843 ns/op
BenchmarkNice2RegExp-4 300 4806563 ns/op
I’d managed to speed up my part 1 code while reworking it, but part 2 was slower. Turning on UTF-8 processing for PCRE and fixing the regexps made all of them a bit slower too. Overall, the non-regexp solutions still win.
The moral of the story: Yes, regular expressions are really useful, but they’re not necessarily the first thing you should reach for.
Of course, the usual caveats about premature optimization and the benefits of writing as little code as possible still apply.
]]>On the whole, I like it. Some high points:
It’s reasonably terse, and there’s not much syntax to remember. Like Ruby, you can do a lot in not much code, and unlike (say) Java there isn’t really a big need for an IDE.
It inherits from Modula-2 a focus on compile-time efficiency. No header files, a compact grammar, and rigid dependency declaration make for rapid compilation times.
Performance is good, often comparable to C++.
Cross-compilation is easy. On my Mac I can GOOS=linux GOARCH=386 go build hello.go
and get a binary that I can just rsync to my Linux box and run there. And…
Deployment is easy: The compiler spits out a single binary, and that’s all you need. You don’t need any sort of Go runtime installed on the destination system.
Like Ruby, you don’t have to rigidly specify what interfaces your objects support; it’s all duck typed.
The speed of the tools and ease of deployment make programming in Go a curiously retro experience. It feels a bit like I’m back sitting in front of a copy of Turbo Pascal.
What don’t I like? Oh, a few things:
There are two different syntaxes for variable declaration (var and :=), and you have to know and use both of them. (Well, it might be possible to do without :=, but not without painful workarounds that make the code ugly.)
No tail-call elimination (yet). Sometimes recursion is the best way to do something.
The standard logging library is poor. No levels, no level guards, and a gratuitously non-standard date/time format.
Variables aren’t immutable by default.
Pointers are visible.
I don’t view any of these as deal-killers, though. In fact, I’d like to go through a few things other people often cite as criticisms of Go, which I think are features rather than bugs…
Speaking as someone who writes Java for a living most weeks, I view Java’s approach to error handling (checked exceptions) as a well-intentioned experiment that has failed. Forcing bad programmers to pay attention to exceptions doesn’t work. In practice, they write code which catches and eats them, or fails to process them properly. (Try finding a single correct piece of example code to perform a JDBC query, dealing cleanly with all possible exceptions.)
Young programmers think they can handle memory manually, and do it right. About 40 years of industry security incidents suggests otherwise. A modern memory manager is almost as fast as well-written hand-optimized code. For almost all of the programmers almost all of the time, doing memory management manually makes as much sense as writing in assembler.
Sure, generics are useful in Java. They are also a major pain in Java. I’d rather wait and have them done properly in Go than have a hurried, ugly solution.
As long as you’re not a language snob, I recommend taking a look at Go. It doesn’t do everything right, but it does a lot of things right.
There’s a video of a talk by Rob Pike where he talks about Go’s focus on simplicity above almost everything else.
I like simple languages, and I think that’s why Go appeals to me. I like Scheme, and I like Ruby. (Ruby’s grammar is complicated, but to the human it’s very simple and consistent.) The simplicity is why I ditched Perl for Ruby. It’s why I prefer Objective-C to C++.
Java is also mostly pretty simple; it’s just annoyingly verbose. But once generics are involved it can become really complicated, and now that it’s getting type inference, lambdas and type annotations, it’s starting to look rather larger than I prefer. To see where that direction leads, look at JavaScript: generator functions, for example, seem to me to be a wonderful example of adding complexity to a language without actually solving the concurrency problems they’re supposed to help with.
Overall, I think Go is one of the best new languages. The only other ones that interest me right now are Rust and Swift. As far as Rust goes, I’m not completely convinced that squeezing that last bit of performance out of the system is worth the hassle of manual memory management. With regard to Swift, it seems similar in many ways to Go, but with the benefit of excellent Mac OS X support.
]]>Addison-Wesley
Like many other programmers, I learned to write C by reading “The C Programming Language” by Kernighan and Ritchie, the book known to hackers everywhere as “K&R”. I confess that I was initially drawn to “The Go Programming Language” out of sheer Kernighan brand awareness. Would it be the K&R of the Go world? I certainly hoped so, as I have strong preferences when it comes to programming language textbooks.
First of all, I like my textbooks to be concise. “The Go Programming Language” is almost exactly the same width and height as K&R, but about twice as thick. Some of that is down to paper thickness, though — it weighs in at 380 pages vs (my copy of) K&R’s 272 pages. It’s no programming doorstopper.
“The Go Programming Language” is aimed at people who are already programmers, another thing I look for. I’ve written code in well over a dozen programming languages at this point, so the last thing I need is a chapter on what variables are or what a pointer is.
My third major preference is for examples over theory. Knuth’s “Art of Computer Programming” books, for example, are far too mathematical for my taste. I find math fascinating, but when it comes down to it I’d prove De Morgan’s laws by writing a program to dump out truth tables, not by doing the math. I also like the examples in a book to be things I might want (or at least need) to do, and this is an area where this particular book shines. Examples include building an Internet chat server, building a parallel download utility, calling a JSON API, converting dates and times, and so on. Chapter 9 even goes through the construction of a useful concurrent non-blocking cache implementation.
The overall structure of the book resembles K&R, too. It opens by diving straight into some examples, including a simple web server and code to draw some animated GIFs. By chapter 3 you’re drawing 3D surfaces and Mandelbrot sets. The general approach seems to be to take the reader on a whirlwind tour for a few chapters, convince them that it’s worth continuing to read the whole book, and then go back and cover topics in more detail. Sometimes a topic gets visited three or four times before the book gets to the bottom of it, so I recommend taking notes using an outliner or some other tool that lets you insert extra material later.
Mostly the book’s structure works, but occasionally details were omitted that I couldn’t help wondering about. For example, page 12 puts a * in front of the os.File type without any explanation of what it means. Even knowing that it indicates a pointer, I couldn’t work out why that type in particular needed to be a pointer in the example, when other structures didn’t. (That particular mystery wasn’t really resolved until chapter 6.) On several occasions I ended up searching the Internet for answers. I still don’t know what //!+ and //!- comments mean in the example code, if anything.
Another thing that struck me as odd was that after covering structs, the book dived into JSON serialization and deserialization and templating, before it had even discussed how objects work. I’m being picky, though — the book did eventually cover the material, I just needed to relax and be patient and save my questions for the relevant chapter.
While the book is largely about the nuts and bolts of programming in Go, it does also cover some of what you might call the philosophy of the language — why it’s the way it is, and not like (say) C++, Rust or Scala. There’s also discussion of the benefits of encapsulation and its downsides, how to do unit testing, what recursion is, and so on.
There are two chapters on the problems of concurrency and shared data. This is necessary, given Go’s focus on concurrency — yet it’s also not enough. For example, the book talks quite extensively about race conditions and how to avoid them, but not so much about deadlocks (except as a side effect of bad mutex use). A separate book on concurrent programming would be advisable for those new to it; perhaps “Advanced Concurrent Programming With Go” is already in the works as the obvious sequel.
One thing K&R had which is missing in “The Go Programming Language” is a reference section. This book has no list of the standard libraries or APIs in Go 1.x, so you’ll need to refer to the web site at golang.org for that information. To give some specifics, the book doesn’t cover linked list data structures, logging, database access, or calling other processes. This is a bit of a shame, as at the time of writing the API documentation on the web is often minimal at best. On the other hand, I can understand that the APIs might still be too much of a moving target to be worth committing to paper, and including them all would have turned the book into a doorstop. I think the right decision was made, for now; just be aware that this is a book to teach you the core language, not a reference manual.
And with that said, my conclusion is that it’s an excellent book. By the time I finished reading it, things which had confused me during my early attempts at writing Go code were making perfect sense. If you’re a programmer and want to learn Go, this would be my recommendation for the first book to get.
]]>I had started programming in BASIC, messed with assembly language, then discovered Pascal. Both BASIC and Pascal were interpreted, at least in the implementations I had, so C was my first compiled language. Being able to produce machine code without writing assembler was a revelation. Sure, C had rough edges here and there — the type declarations often needed careful thought to decode — but it did the job like nothing else.
A year or so later I discovered Modula-2, again compiled. I wasn’t a big fan of the Pascal-style BEGIN…END blocks, having become used to the brevity of C’s curly brackets, but Modula-2’s import mechanism and limited multithreading made development less painful.
Still, I couldn’t help thinking that C and Modula-2 had some obvious things missing. The one that I found strangest was the inability to return multiple values from a function. If the compiler could push multiple arguments onto the stack before calling the function code, why couldn’t it pop multiple results off the stack afterwards? Another thing I found annoying was manual memory management, particularly manually sizing and resizing arrays.
Later at university, I became interested in dataflow programming languages, experimental graphical programming environments where you wired together data channels between blocks representing computation units. The idea seemed sound, particularly with talk of massively parallel computers on the horizon, but I couldn’t help thinking that textual code was still the way to go.
I found myself imagining my ideal programming language. It would be superficially like C, but with the ability to return multiple values from functions. It would handle imports and libraries like Modula-2. It would allow parallelism, with communication by message passing, defining channels in and out of functions and wiring them together. And of course, it would support arbitrary size arrays.
Then I fell in love with Lisp, learned object oriented programming, graduated, got a job, and forgot the whole thing.
Fast forward to 2012. Google launched version 1.0 of a new programming language called Go. I ignored it for a while — you can’t pay attention to every new programming language — but after it failed to go away for a couple of years I decided to take a look at it. I soon noticed how much it seemed to resemble the imagined ideal programming language of my teenage years.
Programming language snobs have been scathing about Go, calling it a joke of a language. The funny thing about jokes, though, is that they often seem to win. Unix started out as a joke, and was mocked well into the 1990s — and now everyone carries a Unix computer in their pockets everywhere. Linux was a joke too, a crudely hacked-together clone of Unix with fundamentally the wrong architecture. The web was a joke — links that broke all the time, incompatibilities everywhere, and don’t even mention JavaScript. Plenty of people tried to build a non-Unix OS in C++, Ada or some other “real” programming language, and they all failed, often before getting to market.
I messed with Go, hacked together some simple programs, and decided it was something worth investigating further. The documentation on the web was definitely lacking, though — it was time to work my way through a good book.
Next: The book.
]]>Xenix was the first Unix I ever used. It’s how I ended up using the C Shell — I remember the Xenix box as only having csh, though it’s possible that it also had a really feature-poor version of sh. Either way, csh was the only sane option for interactive use.
When I got MINIX running at home, I stuck with csh. On real Unix, I switched to tcsh. Eventually Bash got support for Unicode, which was a compelling enough reason to switch to the Bourne family of shells. Then once Zsh got Unicode, I finally completed my destined path.
The less said about SCO’s business ventures after the Caldera acquisition, the better.
Not mentioned in the article is that Palm OS was modeled on classic Mac OS, and the hardware used low power variants of the same 680×0 series processors that 1980s Macs were built on. PalmOS had resource forks, four character file type and creator IDs — and no memory protection or virtual memory. This meant that Palm OS started to hit a technological wall a few years after the first PalmPilot; a single bad application could crash the entire device. As with classic Mac OS, there was no plausible technical way forward without basically starting from scratch, and by the end of the 90s it was clear to me that PalmOS was a dead end.
I think WebOS could have succeeded, if HP had stuck with it. The problem was, like the Apple Newton MessagePad, the software wasn’t quite ready by the time the hardware launched, and the hardware wasn’t quite the form factor people wanted. Better market research and a couple more iterations and I think they could have made it a going concern. I went with a BlackBerry instead, and watched the train wreck from a safe distance. (And then watched BlackBerry fail to deal with their own technological wall, and jumped again to Android.)
As an aside, I always expected Apple to sue Palm for patent or copyright violation.
I remember when Sun was looking for suitors, seeing employees saying “Anyone but IBM”. I suspect that hindsight may have caused them to re-evaluate that sentiment. Obviously I’m biased, but I think that IBM buying Sun would have been much better for almost all of Sun’s products.
Clearly OpenOffice would have lived for longer. In fact, IBM did its best to keep OpenOffice a going concern even while Oracle was busy killing its community. MySQL, too, would have been OK — IBM has long experience at maintaining multiple competing databases and supporting them all, and I think MySQL would have just been slotted into the portfolio alongside Informix, DB2, DB2 on i, Derby and IMS. Netbeans and Eclipse probably would have had a cage match, and Eclipse probably would have won out of inertia, but Netbeans features would have been rolled into Eclipse — and Java would have stayed an open technology.
The one Sun product I think would have had trouble at IBM is Solaris. AIX vs Solaris was still serious warfare at that point. Now, Linux is where it’s all happening (at least from my viewpoint), and I think Solaris would have ended up sidelined even if it had beaten out AIX for its new master’s affections. Still, Linux would have gotten ZFS and dtrace and the other things from Solaris that people actually like, and maybe SMF instead of systemd.
HP’s real problem was partnering with Intel to produce the replacement for its PA-RISC processors. Given Intel’s history, they really should have known better. They dumped the working Alpha CPU they had acquired by buying DEC, decided to wait for Intel’s promised Itanic, and let Intel handle all the design itself.
I guess that’s pretty much standard Carly Fiorina era HP, though — treat engineering as an expense you should avoid as much as possible, and make business decisions without their input.
The Danger Hiptop was a great product from a hardware and user experience standpoint, but when I discovered that it was entirely dependent on back-end cloud servers I steered clear and went with a BlackBerry instead. I don’t mind cloud services, but the product has to be able to work standalone. When Microsoft bought Danger I knew they were doomed. You could put together an entire article of disasters just by cataloging companies that have developed an intimate relationship with Microsoft: Microsoft and Vivo Software, Microsoft and RealNetworks, Microsoft and Kodak, Microsoft and Nokia, Microsoft and Groove Networks, Microsoft and Oddworld Inhabitants, Microsoft and Bungie, Microsoft and FASA Interactive, Microsoft and Connectix, Microsoft and Rare, Microsoft and WebTV…
Borland had two big problems which hurt sales of their compiler products: their languages were non-standard, and they were single platform.
Turbo Pascal was great, but it wasn’t standard ISO Pascal. Instead of migrating to (say) Modula-3, they added proprietary extensions to Pascal for OOP. They had C++, but they used their own layer of windowing APIs on top of Microsoft’s, which led to the horror of Borland Buttons.
So if you wrote your code for Borland C++ or Pascal, you were forever tied to Windows. That meant that once Microsoft had a workable set of development tools, Borland’s target market was people who loved Microsoft enough to want to use only Microsoft Windows as their OS, but didn’t like Microsoft enough to use their development tools. That market was approximately 0 people.
Novell’s problem was wasting its time trying to keep IPX alive and trying to be a Unix vendor. Once dial-up Internet was a thing and TCP/IP was usable even in small offices of Windows machines, IPX was dead. Windows for Workgroups in 1993 showed the future. If Novell had made a user-friendly fileserver OS by adding full TCP/IP services to NetWare, they could have had a sellable product; but practically nobody who used NetWare wanted to switch to Unix, and nobody who wanted to use Unix had any compelling reason to get it from Novell.
It was patently apparent, if you’ll pardon the pun, that Google was only interested in Motorola in order to use its ‘intellectual property’ as a defensive resource in the litigation wars started by Apple. I don’t call this one a failure. Motorola even produced some great handsets during the Google period — look at the Moto X and Moto G.
What has been a disaster, is Motorola since Lenovo took over.
Autonomy had some really neat software when I saw demos around 2000. Sure, it was all based on Bayesian inference, but it was useful stuff. Other than HP paying too much, the real problem with this merger is that HP has never been any good at selling software.
Nokia made a lot of mistakes in the 1990s. Like Palm and BlackBerry, they ignored their OS platform’s technical shortcomings. Like HP with WebOS, they made poor hardware design decisions.
However, by the early 2000s Nokia was heading in the direction of a Linux-based handset OS developed in an open source fashion, and they were about to start moving to Qt for their UI to make development and porting from other platforms easier. Maemo was almost at the point where it would have been workable for phones; it just needed a better touch UI, which Qt would likely have given it.
It’s pretty clear to me that Microsoft wanted to kill Maemo before it was too late, and obtain Nokia’s patents so they could use them against Android. Elop may deny it, but I think this is another outwardly disastrous merger that went exactly the way it was intended to.
I mean, Microsoft had been trying to sell Windows-based mobile phones since 2000 without major success (vertical markets excepted). It makes no sense that Elop would kill Maemo and Symbian and bet the company on Windows Phone — until you realize that he joined Nokia from Microsoft.
Not convinced it went according to plan? Consider that it was Elop who made the decision to sell Nokia to Microsoft once its value had dropped by 85%, and that he got a $25m reward when that transaction was completed — with most of the cash coming from Microsoft.
Microsoft gave Elop the Xbox division to play with for a year, but he was invited to accept a horizontal promotion as soon as the media had forgotten about him. They quietly killed Windows Phone, and have settled into their primary mobile strategy of collecting patent licensing fees on Android devices and forcing Android device manufacturers to bundle Office.
]]>makeitso
New in the POWER8 processor, this instruction allows data to push out to the coherence point as quickly as possible. An attempt to run the makeitso instruction provides a hint that preceding stores are made visible with higher priority.
— Performance Optimization and Tuning Techniques for IBM Power Systems Processors Including IBM POWER8
]]>While Apple have promoted the feature as something aimed at improving the mobile user experience, others have pointed out that it’s transparently an attempt to harm Google. In truth, it’s both of those things.
The mobile web experience without a content blocker is wretched. Just yesterday I attempted to read an article on the New York Times, only to have it forcibly scroll my tablet to the top of the page in an endless loop to try to force me to read the ads. This new practice is probably a result of user studies I saw recently (no handy link, sorry) revealing that half of users now start scrolling down to avoid the ads before the page has even finished rendering.
Another good example of just how bad things can be is The Verge, who wrap every few kilobytes of page content in 2.6MB or more of ads and scripts. Then there are the sites still using Flash, who pop up large gray rectangles over their content.
On desktop, I don’t block ads. Instead, I block third party scripts, cookies and plugins. Given that major ad networks’ scripts have been used to spread malware, I view the script blocking as an essential security practice. The cookie blocking is simply to reduce the amount I’m tracked across the web. Flash is a security and privacy disaster that should have died ages ago, so I don’t have it installed.
But here’s the thing: blocking invasive and dangerous ads is, in practice, much like blocking all ads. Major sites like CNN and the New York Times show up ad-free, because all their ads attempt to track you and hijack control of your browser. This shows that there’s a big problem with the online advertising industry.
Newspaper ads and billboards don’t interrupt people’s navigating the city or track them as they travel, yet they’re still effective. But somehow because it was possible to do those things on the web, advertisers managed to persuade site owners that it was necessary. Well, it isn’t, and now we’re seeing the backlash.
The sites now screaming about the evils of ad blockers need to adjust their approach. If you give people a choice of invasive ads or no ads, they’ll take no ads every time. So instead, you need to rein in your demands and ask people to limit their blocking.
So, encourage users to install Privacy Badger in place of a full-on ad blocker, or to turn on the option to allow non-invasive ads in AdBlock Plus. Go back to safe scriptless ads that don’t try to install cross-site tracking cookies. People will see your ads and still keep their privacy and security, and everyone will be happy.
]]>My browser used a two-pane interface resembling a frameset, with links in a list on the right of the document being viewed. It wasn’t terribly elaborate — after all, it was only an undergraduate project — but it’s worth remembering that this was before Tim Berners-Lee developed his first web browser, and several years before Netscape introduced framesets.
I’ve scanned the document to PDF and filed it away, mostly because there’s a chance it might one day be useful as a demonstration of prior art to invalidate a software patent. I don’t plan on publishing it here, because it really isn’t anything special; I’m just writing this note for the benefit of anyone who suddenly needs prior art around framesets, hyperlinks and extrinsic links, who might want to contact me.
]]>For each of the two groups, Google assembled a list of the top five security practices followed. You probably won’t be surprised to learn that the two groups had almost completely different lists.
I thought I’d write and explain why the experts make the choices they do, and what it is about the non-experts’ choices that makes them relatively poor ones.
First on the experts’ list is install software updates.
I do this. It’s part of my job, as a server sysadmin, but I do it at home as well. As soon as a software update is available which includes any bug fixes, I install it. This morning I upgraded WordPress, for example. In fact, I don’t just wait to be told about bug fixes — I look out for announcements of security issues on prominent tech news sites, and go find the fixes.
Most other people aren’t as diligent. My spouse has a bad habit of leaving software updates for weeks. Yes, it often requires rebooting, but really, it’s important! 99% of attacks happen as a result of known security flaws that people have been too lazy to install the fix for. That’s why Microsoft is going to be requiring that users accept automatic updates for Windows 10.
Second on the experts’ security practices list is use unique passwords. This is absolutely crucial. Every few months a web site gets hacked and the passwords are stolen and leaked. If it happens to you and you’ve used the same password everywhere, you have to hurry and change your password before some bad guy cross-references your accounts. Worst case for me, they get the password to the one site that was hacked.
I go a step further. For accounts where security is particularly important — like bank accounts — I use a unique user ID different from the one I use anywhere else. A criminal won’t be able to find out my bank login just because some forum got hacked.
Of course, using unique passwords means having too many passwords to memorize. That’s why an important step towards better online security is using some sort of password manager — whether it’s a paper notebook you keep in your office, an application, or a fancy online service like LastPass. That’s item 5 on the experts’ list.
Item 3 on the experts’ list is use two-factor authentication. I do this too, using the free open source FreeOTP. It implements TOTP, which is the Internet standard for time-dependent two-factor authentication. TOTP is the basis of Google Authenticator, so anywhere Google Authenticator works, FreeOTP works.
Two-factor authentication is pretty simple once you set it up. When you log in, you’re asked for the secret code. You run the app on your phone, and it shows you a 6-digit number, which you enter into the login form. That’s it, you’re done. The clever parts that make this good for security are:
This immediately helps with all kinds of security problems:
Two-factor authentication doesn’t prevent all attacks, of course, but it’s a massive improvement over just a login and password.
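If you’re curious what the app is actually doing, here’s a minimal sketch of the standard TOTP calculation (RFC 6238). The class and method names are mine, and a real authenticator also deals with base32-encoded secrets, clock skew and so on:

```java
import java.nio.ByteBuffer;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class TotpSketch {
    // Derive the current 6-digit code from the shared secret and the clock.
    // Only the short-lived code ever crosses the network; the secret doesn't.
    static String currentCode(byte[] sharedSecret) throws Exception {
        long timeStep = (System.currentTimeMillis() / 1000L) / 30;  // 30-second window
        byte[] message = ByteBuffer.allocate(8).putLong(timeStep).array();
        Mac hmac = Mac.getInstance("HmacSHA1");
        hmac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
        byte[] hash = hmac.doFinal(message);
        int offset = hash[hash.length - 1] & 0x0f;                  // dynamic truncation (RFC 4226)
        int binary = ((hash[offset] & 0x7f) << 24)
                   | ((hash[offset + 1] & 0xff) << 16)
                   | ((hash[offset + 2] & 0xff) << 8)
                   |  (hash[offset + 3] & 0xff);
        return String.format("%06d", binary % 1000000);
    }
}
```

Because the code is derived from the secret plus the current 30-second window, a stolen code is useless almost immediately, and the secret itself is never typed into a web form.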
Unfortunately, there are a few sites that still don’t support TOTP — including Apple, Twitter, eBay and PayPal. I don’t use SMS-based two factor authentication because SMS isn’t reliable and doesn’t necessarily work if you’re overseas, meaning it’s all too easy to get locked out or have to wait half an hour to log in.
Item four on the experts’ list is use strong passwords. You might wonder why it’s so far down the list. Well, using a strong password is less important than using a unique one. If you use unique weak passwords, well, it probably won’t be that big of a deal — most web sites limit the number of attempts to log in remotely or how quickly you can try again if you get the password wrong. The real danger of a weak password is that any hacker getting the encrypted password file will be able to crack the password immediately. As long as I use unique passwords, that only lets them get into my account on the site that was hacked.
Contrast that with the situation if you use the same strong password everywhere. Eventually a web site will be hacked that doesn’t properly encrypt its passwords, and the hackers will now have your password to every site you use. Much, much worse. So unique passwords are much more important than strong ones — not that I’m encouraging you to use weak passwords, by any means.
As an aside, you might be surprised by just how long and complicated a password has to be in order to count as strong. The bare minimum password length you should be using is 12 characters, and that’s assuming you use random combinations of upper- and lower-case letters, numbers and symbols.
Now let’s go through the non-experts’ list of security practices, and consider why they aren’t very useful.
First on the non-expert list is to use antivirus software. I’m honestly not sure how important that is on Windows these days, but I’ll confess that I haven’t used Mac or Linux antivirus software in years. I haven’t heard of anyone I know getting a Mac or Linux virus in years either. Trojans, yes, but not viruses.
Second on the non-experts’ list is strong passwords, and I’ve already discussed why that’s less important than you think — but I’ll point out that it’s also trivial to use strong passwords anyway if you do what the experts do, and use a password manager.
Third on the non-experts’ list is to change passwords frequently. A few web sites seem to think this is good practice, but it usually isn’t.
First of all, if you have a strong password, there is nothing to be gained by changing it. If it’ll take a hundred years to crack it, changing it to another strong password gains you nothing.
Even if you don’t have a strong password, there’s the downside that frequently changed passwords are harder to remember. That means people are more likely to write them down or leave them on a Post-It note on the monitor.
But the big reason why I don’t change passwords often is that the real reason for changing them isn’t to make them harder to guess. Rather, it’s to limit the amount of time for which a stolen password can be used. If someone steals my password and I don’t change it for a year, they can use that password for a year to snoop on my account.
Except… I use two factor authentication. But even ignoring that, most hackers today aren’t going to quietly snoop. If you read stories about victims of hacks, you’ll learn that today’s crooks generally steal a password and then immediately change it, to lock the rightful owner out of the account while they steal whatever they’re after.
So password changing is really only protecting you against a particular class of attacker — say, the jealous ex-boyfriend who wants to read your e-mail, or the corporate competitor who wants to spy on your sales prospects. For things like your Amazon account or your bank account, that’s not the kind of attack you need to worry about.
Number 4 on the non-experts’ lists is to only visit web sites you know.
Well, that’s an easy guideline to evaluate — it’s completely useless. Consider all the well-known companies whose systems have been hacked: Home Depot, Target, Sony, AT&T, Nieman Marcus, Michaels, Yahoo!, eBay, Evernote, Apple, JPMorganChase, Snapsave… and that’s just a partial list of the hacks in 2014 alone!
Now consider how many of the rest of the sites you know and visit run ad banners, and ponder the fact that in the same year, Google’s Doubleclick ad network exposed millions of computers to malware.
No, restricting yourself only to familiar big name web sites isn’t going to do anything for your security. In fact, since the big names are big targets and seem to be doing a terrible job of protecting you, you’re probably safer visiting some obscure web forum nobody has heard of. In a 2012 study by Symantec, the most dangerous sites to visit were religious sites — you were safer visiting sleazy porn sites than the site of your well-known local church.
The final non-expert security practice is don’t share personal information. That’s not a terrible principle to follow, but it’s way down the list of effective security measures. The problem is that you need to share personal information with many companies in order to do business with them, and criminals don’t get your personal information by asking you to share it — they get it by hacking the database where it has been collected.
The most recent example is the data breach at the Office of Personnel Management. That’s the agency that performs 90% of US government background checks. So sorry, but you don’t get to decide whether to share your personal information with the OPM — at least, not if you want a government job. And that database got stolen because as a cost-saving measure, the OPM had decided to use offshore contractors to manage their systems — so as part of that deal, they handed root access to their systems to a Chinese national living in China and another person living in Argentina. Remember, this is the database that contains all the background check information for US spies.
Similarly, the Hollywood stars who had their personal lives exposed by the Sony hack didn’t really have any choice about whether to share information with Sony, if they wanted a career in Hollywood. And while you might have read that the “hack” was the work of North Korea, the evidence suggests that it was the work of Russian hackers tipped off by a disgruntled insider.
Finally, let me throw in a vital security practice which was missing from the experts’ list: backing up.
As more and more of our data gets stored in the cloud, or on Internet-connected systems, it becomes more and more likely that a security incident will result in data being wiped. Last year, a source code hosting system called Code Spaces was hacked, and the hackers wiped all their data. Plenty of small web sites get hacked too, and forums get wiped.
WordPress is a notorious target; you don’t have to go far to find stories of sites being wiped by hackers. That’s why I keep backups of everything I post. Not just copies, either — rolling incremental backups which will let me roll back some number of days, in case it takes a while for me to notice that something is wrong. The backups are stored on a machine in my office, which you can’t get to from the web server.
So, back up your data. Use Apple Time Machine or something like it. Keep backups of everything you store in the cloud.
]]>I hate browsing the web on my phone.
I do it all the time, of course — we all do. Just looking at the stats for The Verge, our mobile traffic is up 70 percent from last year, while desktop traffic is up only 11 percent. That trend isn’t going back; phones are just too convenient, beckoning us to waste more and more of our time gazing at their ever-larger screens.
But man, the web browsers on phones are terrible. They are an abomination of bad user experience, poor performance, and overall disdain for the open web that kicked off the modern tech revolution. […]
Now, I happen to work at a media company, and I happen to run a website that can be bloated and slow. Some of this is our fault: The Verge is ultra-complicated, we have huge images, and we serve ads from our own direct sales and a variety of programmatic networks. Our video player is annoying. (I swear a better one is coming, for real this time.) We could do a lot of things to make our site load faster, and we’re doing them.
I couldn’t resist doing a quick analysis of page speed for that very article: on the desktop, it pulled in 2.6MB of data and 81 separate JavaScripts.
And yes, if I emulate a mobile device, it’s the same 2.6MB of data and 81 JavaScripts.
Gosh, I wonder why The Verge is miserable to read on mobile, eh?
Expecting mobile browser makers to magically solve your site’s performance problems is unrealistic. Mobile network data has high latency and (comparatively) low speed, and that’s inherent to the technology. You simply must engineer your web site with mobile browsers in mind.
For those who have ignored web performance, the current trend of more and more traffic coming from mobile browsers is going to lead to a painful reckoning.
]]>You’re trying to query an LDAP server over TLS using ldapsearch. You get:
ldap_start_tls: Can't contact LDAP server (-1)
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
Possible solution:
You’re using the -Z option (along with -h and -p) to specify the host and port and request TLS. That option doesn’t work.
Instead, just use the -H option and specify the parameters as an LDAPS URL.
For example:
ldapsearch -x -Z -h localhost -p 636 -b 'dc=example' '(objectClass=*)'
becomes:
ldapsearch -x -H ldaps://localhost:636/ -b 'dc=example' '(objectClass=*)'
This is actually mentioned on the OpenLDAP FAQ if you look hard enough:
Most clients now have a -Z flag which enables sending the StartTLS extended operation to the server. This extended operation initiates TLS negotiation. To use ldaps://, one must use -H ldaps://.
(Emphasis mine.) I’ve no idea what -Z is for, given that it isn’t needed if you use -H with an LDAPS URL, and doesn’t work if you don’t.
You’ve worked out how to request a TLS connection, but it still doesn’t work, so you try adding the -d 1 argument to turn on some debugging. In the client’s debug logs you see either:
TLS: error: tlsm_PR_Recv returned 0 - error 21:Is a directory
or
TLS: error: tlsm_PR_Recv returned 0 - error 13:Permission denied
Solution:
You probably have SELinux enabled. OpenLDAP doesn’t work with SELinux. Disable it and reboot the server, or work out how to persuade SELinux to let openldap open its own config files.
]]>Component | Status |
---|---|
systemd | Replaces init, cron, inetd, udev, locale, acpid, atd, su. |
systemd-journald | Replaces syslog, klog. |
systemd-logind | Replaces getty, login, xdm. (*) |
systemd-networkd | Replaces ifup, ifdown, tcpwrapper, hostname, dhcpd. |
systemd-journal-gatewayd | Provides HTTP server for systemd-journal. |
systemd-timesync | Replaces NTP. |
systemd-resolved | Replaces resolvconf, bind, powerdns-recursor, dnsmasq. |
systemd-automount | Replaces autofs. |
systemd-readahead | No longer supported because everyone who matters has SSDs. |
systemd-machined | Replaces VirtualBox, VMware. (Coming soon.) |
systemctl-cat | Replaces cat. Really. |
systemd-udevd | Replaces udev. |
systemd-firewall | Replaces iptables. |
systemd-journal-remote | Replaces logstash/elasticsearch with a new logging protocol over HTTP. |
gummiboot | To be incorporated, replacing grub. |
Also, here come the security vulnerabilities:
…systemd-resolved does not implement any of the hardening recommendations of rfc5452
(*) OK, so it’s a bit more complicated than that. Eventually the plan is to get rid of X11 for Wayland, but currently X11 is still allowed. However, systemd is part of the PAM configuration now, and systemd starts your X session with display manager and window manager. So far login and getty still service legacy non-console ttys.
]]>One misconception I’ve seen posted across the web is that you’re not in trouble if you don’t use bash as your shell, or that you’re safe if you have dash as /bin/sh. Sadly, that’s not true. GNU tools like env will use bash to execute other programs no matter what your personal shell is. Other programs you use may invoke bash too — for example, on Ubuntu the zcat utility is a bash script. So are Firefox and Chrome.
As for exploits, basically any program which sets environment variables based on user input and uses the shell to execute something can become a vector. There’s a working exploit for Linux DHCP, so you can get pwned simply by connecting to a WiFi network. There are at least two worms spreading via the web, where you can pwn servers just by setting your web browser’s user agent to something cunning. There’s an IRC worm active, and an exploit via email has been demonstrated for some server-side software. SSH restricted mode is affected, becoming SSH totally unrestricted mode. And that’s just the stuff I know about.
At work, a colleague just asked what needs to be upgraded. My answer: every Linux system on the planet. Prioritize ones which accept data from outside your organization (no matter how they do it). You’ll eventually want to patch everything running a Linux distribution, because the bug is also perfect for crafting local root exploits.
Yes, it really is that bad. Even Windows systems with Cygwin are affected. Many embedded systems use busybox rather than a full shell, so (for example) your router might be safe, but I advise that you check: see if /bin/bash exists and whether /bin/bash --version reports that it really is bash. One small silver lining is that Android phones aren’t affected unless the user has chosen to install a bash shell, so we aren’t looking at a mobile phone worm apocalypse yet.
Oh, and if you patched your Linux system yesterday, patch again, because the first patch didn’t completely solve the problem, and there was a second patch released this morning.
Apple, meanwhile, still doesn’t have a patch for their old forked version of bash. Instead, they have a statement that the issue is not a problem for Mac users, so I’m sure hackers are developing OS X exploits right now.
Some people have been warning about the dangers of bash scripting for years. I’ve been bash-averse for decades; in fact, I never write shell scripts unless I absolutely have to because shell in general has so many pitfalls compared to (say) Ruby.
Of course, I’ve still had to deal with the fallout from shellshock, but I must admit I’m chuckling about the fact that purely by chance, I decided last week was the week I would finally get to grips with using Ansible to push software updates to all the servers I look after.
Another bash basher is David Jones, who has now written about why bash is bad, and provides some tips on turning your bash scripts into standard shell scripts.
If you really must continue to use bash, use unofficial bash strict mode.
]]>With systemd, every daemon has to hand over socket handling to systemd. That means portable daemons will have two different code paths, depending on whether they’re running on Linux or not. That’s a testing and reliability headache for developers.
Because systemd controls the sockets, it will end up being a dependency of every daemon, unless distributions ship both systemd and non-systemd packages for all their daemons. No doubt that’s good for forcing systemd adoption, but it’s going to be a pain for everyone else.
Got a daemon you want to run that hasn’t been systemd-ized? Good luck with that.
Seriously, it’s 2012 and you’re trying to introduce a new configuration file syntax modeled on Windows .INI files?
Systemd has a deliberately undocumented binary log file format, in an attempt to replace syslog. And no, you can’t turn off systemd’s syslog replacement and use a standard syslog.
With systemd written in C and controlling all the TCP/IP ports, it will become a primary attack vector for crackers and malware. Because it’s the init daemon, every time a systemd security update is pushed, you’ll need to reboot. But don’t worry, I’m sure the author of PulseAudio can write the kind of bug-free code necessary for good security, right?
Regarding item 5, the document arguing in favor of systemd having its own logging system gives a number of bogus reasons, so let’s go through them:
The message data is generally not authenticated
It can be if you configure syslog that way.
The data logged is very free-form. Automated log-analyzers need to parse human language strings to a) identify message types, and b) parse parameters from them.
The timestamps generally do not carry timezone information, even though some newer specifications define support for it.
I wrote about that in 2005 and a fix has been available since 2010. Here’s a log line from my server:
<22>1 2014-05-18T14:22:56.950703-05:00 castor postfix 3139 - - D4F38C1317: removed
Syslog is only one of many log systems on local machines. Separate logs are kept for utmp/wtmp, lastlog, audit, kernel logs, firmware logs, and a multitude of application-specific log formats.
Adding systemd is not going to magically stop bad software from writing its own logs in its own format, any more than the existence of syslog did. And the fact that syslog writes to multiple separate log files rather than one huge database is a feature, not a bug.
Reading log files is simple but very inefficient. Many key log operations have a complexity of O(n). Indexing is generally not available.
If only you could syslog to a database… Oh, wait, you can.
The syslog network protocol is very simple, but also very limited. Since it generally supports only a push transfer model, and does not employ store-and-forward, problems such as Thundering Herd or packet loss severely hamper its use.
Whereas systemd sends network log events over HTTP, a protocol known for its reliability, efficiency and store-and-forward support?
Log files are easily manipulable by attackers, providing easy ways to hide attack information from the administrator.
Also fixable with properly configured standard syslog.
Unless manually scripted by the administrator a user either gets full access to the log files, or no access at all.
…which is why syslog writes to multiple log files, so that access can be handled on a granular basis.
The meta data stored for log entries is limited, and lacking key bits of information, such as service name, audit session or monotonic timestamps.
Repetition, see above.
Automatic rotation of log files is available, but less than ideal in most implementations: instead of watching disk usage continuously to enforce disk usage limits rotation is only attempted in fixed time intervals, thus leaving the door open to many DoS attacks.
It’s not hard to imagine how this could be fixed by integrating rotation into syslog. In fact, there are already implementations via logging to channels.
Compression in the log structure on disk is generally available but usually only as effect of rotation and has a negative effect on the already bad complexity behaviour of many key log operations.
You can log to zip files if you care mostly about disk space, or log to a database if you care mostly about speed.
Classic Syslog traditionally is not useful to handle early boot or late shutdown logging, even though recent improvements (for example in systemd) made this work.
That’s why we have the kernel ring buffer and klogd, and then transfer those initial log records to syslog later on.
Binary data cannot be logged, which in some cases is essential (Examples: ATA SMART blobs or SCSI sense data, firmware dumps)
I see that as a feature. You log the binary blob as-is in a directory, then log its name and the other metadata in the syslog.
Basically, I don’t see any functionality in systemd’s journal that necessitates replacing syslog.
]]>
As a child growing up with a home computer, I was fascinated by Teletext. Unfortunately, we didn’t have a TV with a Teletext decoder, as those were pretty pricey. My only chance to enjoy real Teletext was when visiting relatives. However, I had a BBC Computer, which had a Teletext mode known as Mode 7.
In Mode 7, you could put a control character at a particular point on the screen, and all character cells to the right of that cell would display block graphics in a particular color when filled with a character code 160 or higher. Each character cell was split into 6 pixels, for 64 possible combinations represented by characters 160 to 255. Something similar exists today in the form of the Unicode block-drawing characters.
The Mode 7 screen was 40×25, so your maximum graphical resolution was 78×75, with the left column of cells being used up switching on graphics mode. Even for the 1980s that was pretty awful, but it had the benefit that an entire screen of text was only 1000 bytes. On a system with just 32K of memory in total, where video RAM was taken out of total system RAM, that was a major benefit. Plus, it meant you could fit 100 pages on a floppy disk!
As well as the poor resolution, there were obviously limits on which colors could be next to other colors, and how complicated your shapes could be. But the biggest problem was that working out which character codes you needed to (say) draw a box on the screen was a major pain. So in 1984 I wrote a teletext graphics editor which would allow me to create my own Teletext pages, with an easier way to draw lines and shapes and select colors. It also eventually supported automatic generation of banner text, text written in pixels. (Which, because the pixels were so big, was huge.)
I shared the program with friends at school, where we had a lab of BBC Micros. I also sent a copy to Personal Computer World magazine (PCW), who (to my surprise) published it. (This was back in the days when computer magazines printed listings in BASIC which readers would type in for themselves, and also back in the days when PCW wasn’t all IBM PCs all the time.)
Encouraged by my unexpected success, I went on to develop my own network-aware Teletext information system for the school. Users could create as many pages as they wanted, stored in their network filespace. (An Amcom E-net system with a hard disk.) Each page could link to other pages by username and page number. You could select page links using cursor keys then hit a key to go to that page. It was basically a primitive multi-user hypertext system, like the web — but in 1985, 5 years before the first web browser. I had been inspired by reading about Ted Nelson’s IBM and Brown University hypertext project of the 1960s, so I don’t claim any genius level originality; hypertext was something that was going to happen, we were just waiting for networks to get good enough.
I didn’t stop with text and graphics, though. I extended the system to allow downloadable executable code, so you could use it to build a hypertextual menu of educational software which you could launch right from the menu screens.
For some reason, nobody else at school was as excited by the possibilities as I was. Eventually I got an Atari ST, went to university, and started building hypertext systems on Unix. The code for my network teletext system is long lost at this point. I don’t even have the floppy disks. I wish I had kept them, if for no other reason than to provide examples of prior art to fight patent trolls.
I finally got my own Teletext TV in 1990, and enjoyed it for a few years before emigrating to a country that had apparently never heard of anything that advanced.
For more examples of Teletext pages, see the Teletext museum. It’s now being rediscovered as an artistic medium.
]]>I know I’m going to get a ton of hate mail, but there is a hierarchy of usefulness of technology roles in an organization. Developer is at the top, followed by sysadmin and DBA. QA teams, “operations” people, release coordinators and the like are at the bottom of the totem pole. Why is it arranged like this?
Because each role can do the job of all roles below it if necessary.
This is, sadly, not the case.
I’ve known plenty of people who could crank out code, but who couldn’t maintain a stable and secure production environment to save their lives. In fact, earlier this year I got pulled in to fix up the security issues on a production system that had been installed and maintained by developers until flags were raised about its dozens of gaping security holes. Anyone who can write Java code in Eclipse on Windows can obviously set up a secure production Linux server with a J2EE stack, right? Wrong.
Similarly, managing a release requires project management skills which many developers simply do not have, not to mention some everyday human interaction skills which many of them have somehow managed not to develop. Even making sure the technical issues get dealt with can be a problem for developers. At one company I worked for, one of the development teams had been doing their own release management. It was eventually discovered that there was only one desktop PC that could actually assemble a release build of the company’s main product. Perhaps the developers theoretically could have implemented a system that allowed repeatable builds to be triggered on any machine, or at least documented the required build environment, but it didn’t look that way. Whether they lacked the skills or simply didn’t value the task — or quite possibly both — the outcome was the same, a disaster waiting to happen.
Then there’s QA. Developers love to view them as basically trained apes banging randomly on keyboards until something breaks, but testing software well requires a kind of malicious naïvety and inventiveness. You have to learn to think like an end user who knows nothing about the internals of the system. That’s a hard trick to pull off if you wrote it, which is why no matter how much you test your code yourself, someone will always find an ‘obvious’ and embarrassing bug.
More generally, the ‘totem pole’ idea reflects a common geek mindset: “Anything I haven’t studied and don’t want to do, must obviously be trivial.” Ask mathematicians what they think of physicists, ask physicists what they think of chemists, and then ask chemists what they think of mathematicians.
I’ve found that most things get more and more complicated and interesting the more you examine them. I know next to nothing about sports, for example, but I know that there are people who spend hours on statistical analyses, tactical studies, research into biochemistry and sports medicine, the engineering of sporting equipment, and so on.
So if you’re inclined to think of documentation as pointless time-wasting anyone could do, or you think that product management is trivial, try studying them a bit. You might be surprised.
Yes, even marketing.
]]>Before you start dealing with file streams, see if Java NIO will do all the work for you. If you just need to read the lines of text from the file into memory, the Files class has a convenience method readAllLines() that will do the job as a one-liner.
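For example, assuming the file is UTF-8 and small enough to hold in memory (the file name is just a placeholder):

```java
// Needs java.nio.file.{Files, Paths}, java.nio.charset.StandardCharsets and java.util.List.
List<String> lines = Files.readAllLines(Paths.get("file.txt"), StandardCharsets.UTF_8);
```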
If you need to do something more complicated, like obtain a file data stream you can pass to a parser, or process files that might be bigger than available memory, then continue on…
Decide whether you want file input, file output, or files you can both read and write.
If you want input or output, you also have to decide whether you want the files to be binary data or text.
Direction | Type | What to obtain | How to obtain it |
---|---|---|---|
Input | Binary | InputStream | Files.newInputStream(Path) |
Input | Text | Reader | Files.newBufferedReader(Path, Charset) |
Output | Binary | OutputStream | Files.newOutputStream(Path) |
Output | Text | Writer | Files.newBufferedWriter(Path, Charset) |
Both | Binary | SeekableByteChannel | Files.newByteChannel(Path, ...) |
These are the NIO (new IO) methods for Java 1.7, and they all take Path objects as arguments. To get a Path object for a file:
Path path = FileSystems.getDefault().getPath("file.txt");
So your code would look like:
Path path = FileSystems.getDefault().getPath("file.txt");
Reader rdr = Files.newBufferedReader(path, StandardCharsets.UTF_8);
Note the use of the abstract Reader class. The object will actually be a BufferedReader wrapping an InputStreamReader wrapping an InputStream, but by declaring just the abstract type the code can be reused with any kind of reader.
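Putting the pieces together, here’s a minimal sketch of reading a text file line by line with NIO; the file name is a placeholder, and I’ve kept the concrete BufferedReader type here because I want its readLine() method:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadLines {
    public static void main(String[] args) throws IOException {
        Path path = FileSystems.getDefault().getPath("file.txt");
        // try-with-resources closes the reader and its underlying stream for us.
        try (BufferedReader rdr = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = rdr.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```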
You can also use the older Java I/O classes, which are your only choice for Java older than 1.7:
Direction | Type | What to obtain | How to obtain it |
---|---|---|---|
Input | Binary | InputStream | new FileInputStream(File) |
Input | Text | Reader | new BufferedReader(new InputStreamReader(new FileInputStream(File))) |
Output | Binary | OutputStream | new FileOutputStream(File) |
Output | Text | Writer | new BufferedWriter(new OutputStreamWriter(new FileOutputStream(File))) |
Both | Binary | RandomAccessFile | new RandomAccessFile(File, mode) |
For the older I/O classes, you need a File object. If you’re using a current version of Java, you can turn a Path object into a File object by calling its toFile() method. Or for any version of Java, you can construct a File object from a path string via its normal constructor.
As you might have noticed, getting Reader and Writer objects in old I/O is a pain. You might be tempted to use the FileReader and FileWriter classes, which make it convenient to go straight from a File to a Reader or Writer. Unfortunately, FileReader and FileWriter use the default character encoding of whatever Java platform they happen to run on, which varies from system to system and is almost certainly not what you want. Yes, you can change the default encoding, but that requires a command-line argument to the Java VM at run time, which is ugly.
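If you’re stuck with the old I/O classes, the safe alternative is to assemble the Reader yourself and state the encoding explicitly; something like this sketch, which assumes UTF-8 and a placeholder file name:

```java
// Needs java.io.{BufferedReader, File, FileInputStream, InputStreamReader, Reader}.
File file = new File("file.txt");
// The explicit "UTF-8" means the result doesn't depend on the platform default.
Reader rdr = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
```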
If you have a SeekableByteChannel or RandomAccessFile, it’s going to be binary and unbuffered and rather inconvenient, so you can stop reading at this point.
If you have a binary InputStream or OutputStream at this point, decide if you want buffering for your binary data.
If you do, wrap your stream in a BufferedOutputStream or BufferedInputStream as appropriate.
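For example, reusing the path variable from step 1 (a sketch only):

```java
// Needs java.io.{BufferedInputStream, InputStream} and java.nio.file.Files.
// The wrapper batches small reads so each one doesn't turn into a system call.
InputStream in = new BufferedInputStream(Files.newInputStream(path));
```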
If you have a text Reader or Writer, Java NIO will have automatically given you a buffered one. That’s the right decision, so I included the buffering in the old I/O table for step 1. You really don’t want to do unbuffered text I/O, particularly not with UTF-8 text.
If you have a Writer, you’ll probably want to wrap it in a PrintWriter to make life easier.
If you have a Reader, you’ll probably want to parse the input in some way. I recommend investigating the Scanner class.
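As a rough sketch of both, with placeholder file contents and the path variable from step 1:

```java
// Needs java.io.PrintWriter, java.util.Scanner, java.nio.file.Files
// and java.nio.charset.StandardCharsets.

// Writing: PrintWriter adds println() and printf() on top of any Writer.
try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(path, StandardCharsets.UTF_8))) {
    out.printf("%s: %d%n", "total", 42);
}

// Reading: Scanner takes care of tokenizing and parsing the input.
try (Scanner in = new Scanner(Files.newBufferedReader(path, StandardCharsets.UTF_8))) {
    while (in.hasNextLine()) {
        System.out.println(in.nextLine());
    }
}
```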
Here’s a list of some I/O classes to avoid using, why you should avoid them, and what to use instead:
Class | Why you shouldn’t use it | What to use instead |
---|---|---|
StringBufferInputStream | It doesn’t convert characters properly. | StringReader |
LineNumberInputStream | It doesn’t convert characters properly. | LineNumberReader |
FileReader | It uses the platform default encoding, not UTF-8. | InputStreamReader |
FileWriter | It uses the platform default encoding, not UTF-8. | OutputStreamWriter |
I’m not sure I’d go that far, but I will confess that I don’t write unit tests for all, or even most of my code. My view is that there are certain problem domains that are complicated, or where programmers are highly prone to error. Off the top of my head, some examples:
That’s definitely not an exhaustive list. But if you’re writing code that does something on that list, then I think you absolutely should have a bunch of unit tests. In fact, just this week I wrote a couple of methods for performing some date calculations in Java, and thoroughly unit-tested them with as many pathological examples as I could find on the calendar.
But in big companies, a lot of code is fairly mundane manipulation of data, and much of it is written to tight commercial constraints—meaning the company needs it done quickly, ideally yesterday. I tend to find that in those circumstances, the extra time spent to write code cleanly is a clear win — especially once the throwaway code becomes a permanent mission-critical system. The extra time to unit test everything, on the other hand, doesn’t really stand up to cost/benefit analysis.
We needed to be able to send e-mail from the private network, via the corporate e-mail servers. I could set up an SMTP server on the 10 network easily enough, but there was no way it could initiate a connection to any other SMTP server. We couldn’t even initiate an SMTP connection inbound to it.
Colleagues suggested schemes involving POP3 or IMAP to pull down mail and re-send it. There was discussion whether we would be allowed to use a scripted SSH connection from a system running the VPN software, to trigger a temporary port tunnel and then prod the MTA to attempt delivery. However, I had a better idea, because I remembered how e-mail had been delivered in the 1970s through to the early 1990s, when all we had was dial-up connections.
The software in question is called UUCP, the Unix to Unix Copy Protocol. The great thing about UUCP is that all it needs is a way to send and receive bytes via a terminal-type connection. In fact, it doesn’t even need that; back when UUCP was in common use, there were quite a few networks or modems that only supported 7-bit data, so UUCP can escape data to make it work over 7-bit links too. UUCP also doesn’t need the connection to be reliable, or even there all the time—it can be triggered to perform data transfers in batch mode, then shut down.
I vaguely remembered that the free software Taylor UUCP had been given some sort of SSH support. Investigation revealed that its protocol ‘t’ was designed for reliable, 8-bit-clean connections such as TCP/IP (or a pipe through SSH), where you can assume that the bytes you send will be received accurately at the other end. (Unless, of course, the entire connection fails for some reason, in which case UUCP will detect the problem and retry later.)
Taylor UUCP was still available prepackaged for RHEL, and Postfix still had all the hooks for UUCP support available. I set up the two servers, one on the corporate network and one on the 10 network. The corporate machine had been set up to allow SSH (and only SSH) connections to the 10 network. I set things up so that UUCP ran chrooted under a restricted ID, and I was also able to set the 10 network server’s UUCP user to have uucico as login shell—so even if the ID is compromised, it can’t be used to attack either server.
So now Postfix accepts mail from machines on the isolated network, and passes it to UUCP. UUCP turns the mail into UUCP jobs, and queues them. A few times an hour, a cron job on the internal network machine starts up uucico, which calls SSH to open a connection to the uucico on the isolated server. They then exchange all the pending UUCP jobs—bidirectionally—and disconnect. Finally, UUCP on both servers processes the batches it just received, and passes the e-mail on to Postfix. And all without even any port forwarding.
It’s worth mentioning that you could go a step further, and not even allow the SSH connection, if you ran a serial cable between the two servers. IBM Notes still supports serial connections too, allowing secure e-mail between machines that can’t route TCP/IP packets.
So if you ever find yourself in a similar bind because of firewall issues or corporate security policies, remember that we used to transfer files and send e-mail before there were even TCP/IP networks to firewall. There are tutorials for how to set up UUCP over SSH, and unlike any port forwarding kludge you’re likely to come up with, UUCP is time-tested software. It may be a little arcane, but it works.
But first, a disclaimer: As with everything else on my web site, this represents my opinion, not IBM’s. I’ve done my best to be accurate, but inevitably I’ve had to skate over details here and there. You’re encouraged to follow the links and read more!
So, once upon a time, over 40 years ago, computers filled rooms, and everyone used dumb terminals for their computer access. These terminals were expensive to manufacture, being built from discrete components. They had complicated decoding logic which had gradually grown over the years, as the computers they were connected to had gained more features.
In those days, the basic unit of computation wasn’t necessarily an 8 bit byte. The popular PDP-8 used 12-bit words, with 7-bit ASCII characters that always had the 8th bit set to 1. The IBM System/360 used 8 bit bytes, but in the EBCDIC character set. Older IBM systems used 6-bit BCD coding, based on punched cards. The CDC Cyber range of mainframes used 12-bit bytes and 60-bit words, but 6-bit characters in CDC’s own display code. And each manufacturer had its own set of control codes for moving the cursor around the screen.
Because of all this variation, the Computer Terminal Corporation (CTC) decided to build a universal terminal which would have its own internal instruction set. You would then be able to load in a tape which would map from your choice of proprietary terminal control codes to the terminal’s internal set. In other words, it was going to be a hardware computer designed for running terminal emulators. It was to be called the Datapoint 2200.
This programmable terminal was designed to have a processor built from individual TTL components — small ICs with typically a dozen or so transistors on them, at most a few hundred. The most popular TTL chips were (and still are) the Texas Instruments (TI) 7400 series. But because of the number of components needed, the Datapoint 2200 was going to be expensive to build, heavy, and would need fans to keep all the circuit boards cool. Previous CTC terminals had suffered from major overheating problems.
In the early 1970s integrated circuits based on Large Scale Integration (LSI) were becoming commercially available. These new chips had thousands of transistors on them, allowing an entire circuit board of components to be condensed into a single chip. So CTC approached a startup company that was designing LSI microchips, and asked them if they could replace the two circuit boards of the Datapoint 2200’s processor design with a single chip.
That startup company was called Intel. They initially mostly made memory chips, but in 1971 they launched the first CPU on a chip, the 4-bit Intel 4004, which they had built for a Japanese company that used it to build desktop calculators.
A year later, Intel finished taking the 100 or so chips for the 2200 terminal emulator’s processor and condensing them down to a single IC design known as the 1201. TI manufactured some samples. The new chip wasn’t any good; the instruction set was buggy. Intel were convinced they could fix the bugs, but CTC weren’t prepared to wait, and launched the 2200 using the full-size CPU circuit boards instead.
Nevertheless, Intel had the 4004’s designer work on the 1201. The bugs were fixed, and the result was delivered to CTC in 1972. But by that time, CTC had upgraded their terminal design to a new model with a built-in hard disk drive, and the Intel chip wasn’t powerful enough to handle that. CTC decided to cancel the contract with Intel and walk away from the deal.
Intel decided to take the fixed 1201, rename it the 8008, and see if they could get anyone to buy it. A team at California State University Sacramento did, and built a small microcomputer around it, complete with BASIC language interpreter. Before long, a couple of commercial 8-bit personal computers were on sale with the Intel 8008 inside.
Based on the feedback from 8008 users, Intel added some new instructions which provided a way to use two 8-bit registers together as if they were a 16-bit register. They also expanded the stack pointer and program counter to 16 bits, allowing up to 64K of RAM. Though it seems comical now, it was a huge amount at the time; in those days, an IBM mainframe only had 8K or 16K of high-speed storage, and the PDP-11 that UNIX was developed on only had 24K of RAM.
The new expanded 8008 was launched as the Intel 8080. It was used for a number of early personal computers, particularly the Altair 8800 which was sold as a kit through ads in electronics hobbyist magazines.
However, by 1976, Intel had competition. Federico Faggin, the technical genius who had led the project to fix the Intel 1201, had left Intel to form his own company, Zilog. They launched a CPU called the Z-80, cunningly designed to be backward compatible with Intel 8080 code. It had more instructions and registers than the 8080, to make programming easier; it even had two separate sets of registers you could swap between. It required fewer support components than Intel’s chip, and became wildly popular. The Z-80 was used in the Tandy TRS-80, launched in 1977, the first popular pre-built personal computer to be available in malls across America. (The Z-80 also later became part of the Sinclair ZX80, ZX81 and Spectrum computers.)
Meanwhile, Motorola had designed a minimalist 8-bit processor called the 6800, targeted at computer peripherals, test equipment, and terminals. A Motorola engineer named Chuck Peddle left the company and helped found MOS Technology; there, he designed an enhanced chip compatible with the 6800 instruction set. It was a radical, minimalist design for a general purpose computer, having a tiny instruction set and a mere 3 registers (compared to the 7 available on the 8080 and the two sets of 8 general purpose registers found in the Z-80). The new MOS CPU was known as the 6502. Its lean design meant that it could be manufactured and sold for a fraction of the price of any Intel processor; it launched for just $25, while the 8080 had launched at $360. The 6502 became the basis of the Apple II, BBC Micro, Atari 2600, Nintendo NES, Commodore VIC-20, Commodore 64, and many other home computers of the late 70s and early 80s. It has since come to be viewed as perhaps the first RISC design.
Intel were forced to fight back. They started designing the Intel iAPX 432, which would be their first 32 bit computer, and would ensure their technological dominance during the 1980s. It would be programmed in the US DoD-approved language of the future — Ada. It would even have built-in support for object-oriented programming.
In the mean time, Intel needed a stopgap solution to claw back some market share. They launched the 8085, which was an 8080 that needed fewer support components — it could talk directly to RAM chips and only needed a single voltage power supply. It wasn’t a big hit, though; it was pricey, and was only ever used in a few models of computer. However, one of those computers was the IBM System/23 Datamaster.
IBM had noticed businesses starting to use Apple II and TRS-80 computers for word processing, spreadsheets, and other business tasks. The microcomputers were starting to make it harder to sell IBM mainframes. In response, the company developed a series of standalone desktop business computers. The IBM 5100 for scientists, the IBM 5110 for handling accounting and ledgers, and the IBM Displaywriter for word processing.
This strategy of single-purpose computers was of limited success. Small businesses would have a hard time justifying purchase of a $10,000 IBM 5110 and a $7,800 Displaywriter when they could buy a TRS-80 for $2,500 and use it for both word processing and accounting. IBM realized it needed something a little more general purpose, but at the same time it didn’t want to launch anything that might destroy sales of high-end workstations or mainframes. The Intel 8085 seemed like a good choice of CPU; it was basically the guts of a souped-up dumb terminal, and was already inferior to the Z-80. It became the guts of the System/23 Datamaster, which offered both word processing and accounting at a new low price of $9,830 (including printer).
The Datamaster was IBM’s cheapest business computer ever, but it still failed to compete with the early 1980s home computers which were being repurposed as business machines. IBM decided it needed to compete directly with the likes of the TRS-80 Model III and Apple II Plus. A team was assembled to design and build an IBM PC.
Meanwhile, Intel had been having trouble with the iAPX 432. As a temporary stopgap measure, they put together another 8000 series CPU aimed at stealing back business from Zilog. While it was a ground-up redesign, Intel made it back-compatible with source code for the 8008, 8080 or 8085, so if you needed to you could reassemble your assembly language source code for your 8008 and run it on the new chip, called the 8086. Unlike the 8085, though, it had a proper 16 bit mode, and new instructions designed to help with implementation of modern (for the time) programming languages like Pascal. The basic design of the 8086 was thrown together in about 3 months, and in production just over 2 years later.
By this time, Z-80 based CP/M had become the OS of choice for serious business computing. It was the first OS to provide a certain amount of cross-platform compatibility; you could run CP/M applications like WordStar or dBase on a TRS-80, a MITS Altair, a DEC, an Apple II with a Z-80 card added to it, or even the revolutionary Osborne 1 portable computer.
At IBM, the PC team considered using IBM’s new RISC CPU, the 801. However, it was considered a mainframe-class processor, and using it in what was intended to be a cheap personal computer would have raised political and technical hurdles. Some of the team had worked on the Datamaster, so the Intel 8000-series CPUs were familiar to them. The 8086 was squarely aimed at the Z-80, which meant it ought to be easy to get CP/M ported to it. It was so slow that it couldn’t possibly compete with IBM’s more expensive business systems. Best of all, IBM had already licensed the option of making 8086 chips itself.
The 8086-based IBM PC was to launch in 1981. Negotiations to have CP/M ready at launch hadn’t gone well. IBM’s CEO at the time was John Opel, who was on the board of United Way of America with Mary Gates. Mary apparently mentioned that her son Bill wrote microcomputer software; at the time, he was selling Microsoft BASIC to hobbyists. IBM asked Microsoft if they could supply an operating system similar to CP/M but for an 8086-based computer.
Microsoft didn’t have anything like that. So Bill Gates said yes, and then he and Paul Allen shopped around and eventually bought all rights to an unauthorized CP/M clone called 86-DOS from Seattle Computer Products, for $50,000. 86-DOS was quickly renamed MS-DOS. It was sold on a per-copy basis to IBM and shipped with the IBM PC as PC-DOS, but Microsoft made sure the contract was non-exclusive so they could sell it to other companies as well, at $50 per copy.
In the meantime, Intel had continued with its stopgap attempts to compete with the Z-80, and had produced a cut-down version of the 8086, called the 8088. It had the same registers and address space, but moved all the data across an 8-bit bus like the Z-80, instead of a 16-bit bus. This meant performance was worse, but you needed even less in the way of support circuits. The IBM PC eventually shipped with the 8088 inside.
A couple of years later, Intel’s iAPX was finally ready. It was so complicated that it wouldn’t fit on a single chip using the technology available at the time, so it was shipped on two separate chips — one to fetch and decode the instructions, and a second one to execute them. It was substantially slower than the Motorola 68000, launched four years earlier, and PCs started to appear based on the 68000.
Intel threw out the iAPX design, and went back to the 8086. They added a new way of handling memory addresses, so that it could deal with up to 16MB of RAM; however, it also had an 8086-compatible mode so you could keep running your DOS programs. The new 80286 also had memory protection options, and doubled the number of instructions executed per clock cycle. It used twice as many transistors as the 8086, and could barely be squeezed on a single IC, but it was faster than the iAPX, and was used in the IBM Personal Computer/AT — the AT standing for Advanced Technology.
It was only advanced for a few months, though. Later in 1984 the Macintosh launched, closely followed in 1985 by the Amiga and Atari ST. All three used the Motorola 68000, and all three made the text-based IBM PC look like a dumb terminal. Throwing high resolution graphics around would require a lot more RAM, so Intel expanded the 8086 design again, to a full 32 bit design that could compete with the 68000 series. This was called the 80386.
Of course, Intel knew it couldn’t keep bolting more features on the 8086 forever, so in 1984 it started designing two new RISC CPUs, called the i860 and i960. This time, the i960 would be the do-everything high end design aimed at government and military applications — and still to be programmed in the language of the future, Ada — while the i860 would be a more manageable general purpose RISC CPU for desktop computers.
In the mean time, Motorola had launched the 68030, a full 32 bit CPU with all the memory protection and other features of the 80386. It started to appear in UNIX workstations like the NeXT cube, as well as the Macintosh and Amiga. Intel needed to compete, so in 1989 they launched both the new i860, and the 80486.
The 80486 added high speed on-chip cache like the 68030. Also like the 68030, it had a dedicated on-chip floating point processor. It was quickly incorporated into PC designs, while the i860 was quietly ignored.
At this point, Intel realized that the DOS and Windows based PC industry wasn’t going to be willing to migrate to a whole new architecture. So they set out to build a completely new CPU from the ground up — but one that would still be able to run legacy 8086 software.
Meanwhile, AMD had produced a chip that was code-compatible with the 80386. Unfortunately for Intel, the AMD Am386 was also faster than the 80386, and almost as fast as the 80486 because of its use of instruction pipelining and internal cache. It ran hot compared to Intel’s processor, but customers just put more fans in their computers.
For the next decade or so, Intel and AMD engaged in a technological battle. The Pentium, Pentium Pro, Pentium II and Pentium III were released, all based on adding incremental improvements to the old 80486 design and building from there. So the 80486 design evolved into the P5 microarchitecture of the Pentium, which in turn became the P6 microarchitecture of the Pentium Pro.
But Intel still wanted a new, clean chip design. So in the late 90s, they started working on a new ground-up chip design called NetBurst. They went crazy with the pipelining and cache. Also, because people tended to evaluate competing PCs based on their CPU speed in MHz or GHz, Intel designed the new Pentium 4 so that they would eventually be able to up the clock speed to 10GHz.
At the same time, IBM’s POWER architecture and Sun’s SPARC were carving up the high-end server and workstation market. Intel again felt it needed something to compete. i860 had failed, so another new RISC project was started. The result was called Itanium, a true 64-bit CPU. It would even support the language of the future — Ada. This time, Microsoft was on board, and would port Windows to run on Itanium. Everyone was agreed that it would utterly destroy POWER and SPARC.
Pentium 4 launched in 2000 with saturation advertising. It soon seemed as though the “bing bong” Pentium jingle was a mandatory part of every TV ad break. Of course, the chips were much more expensive than AMD’s offerings — after all, someone has to pay for all that advertising — but the strategy seemed to work. People didn’t even notice that the Pentium 4 needed to run at a 15% higher clock speed just to match the performance of the old P6 Pentium chips.
The Pentium 4 wasn’t really suitable for laptops, though, because of its high power requirements. So as a stopgap, Intel went back to the old P6 Pentium Pro era design, made a few improvements, and started selling it as the Pentium M.
Itanium launched in 2001. Hardly anyone noticed, and it quickly got dubbed ‘Itanic’. Over the next four years it quietly sank.
By 2003, AMD was making significant gains in the high performance x86 server market. Intel tried launching a Pentium 4 Extreme Edition at $999 per chip, but the next week AMD launched Athlon 64. It was a true 64 bit x86, but it could still run 32 bit x86 code. So it was more compatible than Itanium, and faster and more expandable than Pentium 4.
In 2004, Intel released a Pentium 4 with an implementation of AMD’s 64 bit x86 instructions. It was another humiliating stopgap measure, but worse was to come. The chip’s architecture wasn’t working out the way it had been planned — instead of the planned 10GHz, the fastest it was ever pushed was 3.8GHz. Beyond that, it had major thermal problems, chewing and dissipating so much power that it was tough to keep the chip from melting.
The Pentium 4 wasn’t going to cut it against the competition. So Intel threw away both their new architectures, Pentium 4/NetBurst and Itanium, and went back to the old P6 Pentium Pro design yet again. This time they took the old Pentium M core, stuck two of them on a single chip, added more cache, and called the result Core Duo. The ’new’ architecture was called Core, and it was a success — the personal computer industry was moving towards laptops, and at the same time people were starting to demand smooth multitasking and multimedia, listening to MP3s or watching videos while editing documents. Core Duo’s comparatively low power requirements and dual cores made for a great laptop experience. By 2006 they had extended the Core architecture to add AMD’s 64 bit instructions, and Core 2 was born. Core i5 and Core i7 followed, and here we are.
So in summary: The Intel 1201 was designed to operate a dumb terminal, but didn’t work. It was fixed up and condensed onto a single chip as the 8008. The 8008 was hacked to support 16 bits, and became the 8080. The 8080 was reworked to need fewer support chips, as the 8085. The 8086 was a quick hack designed to be a new 16 bit chip, but compatible with the 8085. The 8086 was temporarily hacked to support more memory as the 80286, given bolted-on 32 bit support as the 80386, then upgraded to become the 80486 and Pentium series. That lineage became a stopgap CPU for laptops, the Pentium M. That then became the CPU core at the center of the Core series, which had a clone of AMD’s 64 bit support bolted on as the Core 2.
So to summarize the summary, we’re still running CPUs based on the design of a 1971 dumb terminal, with more and more hacks bolted on. Intel have desperately tried to replace the mess multiple times, but every fundamentally new CPU architecture they’ve come up with has been sub-par in some major respect, and failed.
How long can this last? Well, there are signs that the era of the x86 might finally be ending, as the personal computer era ends. More and more people use mobile phones and tablets as their primary computing devices, and Intel has failed to get any traction in that market space. Instead, all the popular devices are based on ARM, a pure and clean RISC design by Acorn, inspired by the classic 6502. Recently, the most popular laptop on Amazon has been the Chromebook — running on ARM. Devices are starting to appear powered by the 64 bit ARM design released in late 2011. At the high end, IBM’s POWER architecture has reached clock speeds of up to 5GHz, and massive supercomputing projects like Watson are built on it. Intel is in the middle, and being squeezed from both sides. RISC may yet have the last laugh.
You might think I’d use Rails. Once upon a time I thought I’d write applications in Rails too. I bought the book, wrote some toy apps, and set it aside until I next needed it.
In the meantime, Rails got bigger and bigger. It adopted controversial trendy libraries. Its limitations started to become apparent, and even the hipsters started abandoning it. When I looked at Rails 3, I decided it was no longer heading in a direction I wanted to go.
I’m not a big fan of frameworks anyway. “Why I Hate Frameworks” discusses part of the reason—they tend to result in too much abstraction. They also tend to result in lots of indirect external dependencies; your platform choices, for example, can end up being made for you. And then there’s the opinionated way they tend to dictate your other choices. To me, a framework is a library with delusions of grandeur; it’s like hiring a plumber because you need the sink fixed, and having him lecture you about how you should landscape your front yard.
That’s not to say that there isn’t a place for frameworks. Sometimes you are doing something that’s very complicated but standardized, like building a GUI application. Sometimes you’re a beginner, and you want someone to tell you what to do and how to do it. But mostly, frameworks annoy me.
There are a few exceptions, though. One is the framework I’ve been using for my trivial web application. It’s called Sinatra, and is not much more than a thin layer of syntactic sugar sprinkled on top of HTTP.
I don’t need widgets or a sophisticated templating system, as the look and feel has been supplied to me by the corporate web design team. I don’t even need a database, because my application just needs to pull data from one system and push/pull it with another, via REST APIs. The Ruby rest-client Gem makes that easy; it basically does for REST calls what Sinatra does for HTTP servers. So Sinatra gets the job done. Rails is a faster way to get CRUD apps started, but if you have a problem that isn’t CRUD, I’d suggest checking out Sinatra.
Ruby tends to lend itself to these kinds of domain-specific languages. For instance, here’s a function which implements locking using a temporary lock file and the POSIX flock call, to ensure exclusive access between multiple processes:
LOCK_FILE='/tmp/lockfile.tmp'
def exclusively
File.open(LOCK_FILE, File::CREAT|File::RDWR, 0644) do |file|
file.flock(File::LOCK_EX)
yield
file.flock(File::LOCK_UN)
end
end
You can then put a lock around some access to a resource like this:
puts "Entering exclusivity zone..."
exclusively do
puts "I'm in!"
sleep 20 # simulate access to slow resource
puts "Exiting exclusivity zone..."
end
For more elaborate lock file requirements, there’s the lockfile gem.
Performance seemed really slow, so I decided to measure just how slow. I grabbed my stopwatch class, hacked it in, replaced the output database with a file, and looked at the results in seconds taken per thousand records read:
JPA 2:
81.54 s/krec
77.82 s/krec
81.89 s/krec
89.60 s/krec
Taking 81 seconds per thousand records might not sound too bad, but it meant my data pump was going to take over 5 hours each night. And that assumed multiple threads so that the writing part of the problem didn’t slow it down any further.
At this point, I could have done the Rails hipster thing, and said “Oh dear, relational databases are slow and useless, let’s use NoSQL”, but instead I decided to investigate further. To find out how much of the overhead was down to JPA 2, I hacked together some raw JDBC code to compare:
JDBC:
0.42 s/krec
0.42 s/krec
0.42 s/krec
0.42 s/krec
So, about 200x faster. Not looking good for JPA 2 and EclipseLink. I turned up the logging level, and took a look at the actual SQL being executed.
The actual column retrieval part of the query looked fine; exactly how I’d written it in the JDBC code, in fact. The problem is that it was wrapped in this:
SELECT * FROM (
SELECT * FROM (
SELECT EL_TEMP.*, ROWNUMBER() OVER() AS EL_ROWNM
FROM ( sensible query goes here ) AS EL_TEMP)
AS EL_TEMP2 WHERE EL_ROWNM <= ?)
AS EL_TEMP3 WHERE EL_ROWNM > ?
This was being done to chop the result table into reasonable slices. I was using a wrapper class which called JPA’s setFirstResult()/setMaxResults() to page through the results, cache them, and hand the objects back to my code. JPA was then generating the nested queries to implement this pagination. Timing the paginated query revealed that it was responsible for the slowdown.
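For illustration, here’s a stripped-down sketch of that kind of pagination loop; the CustomerRecord entity and the page size are hypothetical, standing in for the real mapped class and the wrapper’s configuration:
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.TypedQuery;

@Entity
class CustomerRecord {            // hypothetical entity standing in for the real table
    @Id Long id;
    String payload;
}

public class PagedReader {
    private static final int PAGE_SIZE = 1000;

    // Walks the whole table one page at a time via setFirstResult()/setMaxResults().
    static void readAll(EntityManager em) {
        for (int first = 0; ; first += PAGE_SIZE) {
            TypedQuery<CustomerRecord> q = em.createQuery(
                    "SELECT c FROM CustomerRecord c ORDER BY c.id", CustomerRecord.class);
            q.setFirstResult(first);   // EclipseLink implements this with the nested ROWNUMBER() wrapper
            q.setMaxResults(PAGE_SIZE);
            List<CustomerRecord> page = q.getResultList();
            if (page.isEmpty()) {
                break;
            }
            // ... hand each record to the data pump here ...
        }
    }
}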
I couldn’t eliminate paging; the dataset was just too big. So I tried altering the code to make the cache and page size much larger. That helped a bit, and allowed me to get the delay down to 4.21s/krec. Still 10x slower than raw JDBC, but 20x faster than it had been.
I looked around for some information about how to best do result pagination with DB2, and found a developerWorks article. I used the techniques from that article to put together a paged query equivalent to the one EclipseLink was generating, ran them both through DB2’s command-line performance measurement tool, and verified that both gave the same output.
The EclipseLink query was 3x faster than the new one.
Next, I decided to see how fast the DB2 command line client could dump the data. It was only slightly faster than the raw JDBC; 0.399 s/krec rather than 0.42.
I reached several conclusions.
Firstly, EclipseLink is pretty good from a performance point of view. Given the problem of issuing paginated queries against a database, chances are EclipseLink will do better than a random non-expert programmer writing raw SQL.
However, that might not be good enough. Pagination can be expensive, so if you’re batch-processing data you might be much better off avoiding ORM entirely. So like most things in software development, ORM is not always the right answer.
I’m also disappointed that there’s still no good way to use JPA to get your query results in Iterator form, so that pagination becomes unnecessary.
This is handy, as it means you can distribute both the utility and the documentation in a single file, and symlink the script into the appropriate man page directory. You can also, of course, look for the standard --help command line option, and make your utility call man with its own source file as argument to provide the help.
Note that if your utility makes extensive use of command line options, you’ll probably want to use GetoptLong.
]]>Uncaught TypeError: Object [object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object], has no method ‘delete’
Thanks, Chrome. The variable name would have sufficed.
y = x % 1;
is equivalent, for non-negative x, to
y = x - Math.floor(x);
because ECMA-262 says:
… the floating-point remainder r from a dividend n and a divisor d is defined by the mathematical relation r = n − (d * q) where q is an integer that is negative only if n/d is negative and positive only if n/d is positive, and whose magnitude is as large as possible without exceeding the magnitude of the true mathematical quotient of n and d.
So if d is 1 and n is non-negative, q is the largest integer not exceeding n, i.e. Math.floor(n), so r = n - Math.floor(n). (For a negative n the quotient truncates toward zero instead, so the result keeps the sign of the dividend.)
I will not be using this feature in my code.
Here are some reasons.
It sounds trite, but TeX has a robustness and reliability that other software lacks. Recently, there was a discussion of a bug in LuaTeX triggered when a document hits 3,987 pages. I can’t imagine creating a document with over 1,000 complex pages using something like Microsoft Word or OpenOffice Writer.
If you’ve ever wondered why your word processor is formatting something a particular way, well, sometimes there’s really no way to answer the question. Microsoft’s OOXML spec resorted to defining many aspects of formatting by saying that they had to be the same as particular versions of Word; even Microsoft couldn’t specify what that behavior actually was. Pro typesetting software like Adobe InDesign isn’t much better when it comes to maintaining compatibility between versions.
With TeX, there are fewer ghosts in the machine. Even if the details underneath are pretty scary, at least you can always see exactly what macros you are applying to a piece of text and what parameters you have tweaked. Nothing is hidden, and there’s more of a feeling of being in control.
I’ve used a lot of word processors over the years. Wordwise, View, LocoScript, 1st Word Plus, MacWrite, WordStar, WordPerfect, AppleWorks, DisplayWrite, Word Pro, ClarisWorks… They’re all dead and gone now. Good luck reading any files you still have stored in their special file formats. In fact, there have been discussions on Macintouch recently about the problems of converting a large number of legacy ClarisWorks documents into something readable, and that product was only declared end-of-life in 2007. Yet if I still had a copy of my dissertation, I could typeset and print it today, because I wrote it in TeX.
ConTeXt is the newest and most rapidly changing TeX macro package. Even so, switching from a 2 year old version to today’s bleeding-edge beta is mostly a non-event.
Using TeX, I can produce finished documents on Mac, Windows, Linux, or any Unix-like OS. I don’t need a high-end CPU, I don’t even need a high resolution display.
TeX files are just UTF-8 text files. Edit them with TextWrangler, vim, Emacs, jEdit, pretty much any text editor you want. It helps to have syntax highlighting, but it’s not essential.
Edit on your laptop, on someone else’s laptop, on a tablet, on your phone. Anywhere there’s a text editor.
Put your documents in version control. They’re just text, so you can diff them. They’re small and compress really well, so you can keep every version forever and not run out of space. Store them on a remote server and edit via SSH. Transfer them in a fraction of a second across a modern network connection.
I use the word ‘cheap’ rather than ‘free’ because really, I’m not at all averse to paying for software. However, my budget is limited.
There are some solid proprietary page layout systems that can deal robustly with large documents. Adobe FrameMaker, for example. MSRP for that is $999. If you want something a bit cheaper, there’s InDesign, for $699. Both are a bit pricey for my liking.
Or there’s TeX, which is where FrameMaker got its typesetting algorithms from anyway. Cost: $0. I think it’s something of a bargain.
Recently, a lot of writers have been discovering the joy of ultra-minimalistic tools which get out of the way and let you focus on the actual writing. There are programs like WriteRoom, DarkRoom, jDarkRoom, Byword, OmmWriter, and so on, which attempt to remove distractions and give you just text on an otherwise empty screen. Then there are tools like Markdown, reStructuredText and wiki syntax, markup languages which allow you to edit plain text with minimal annotations and convert it into something with pretty formatting later on.
All of these tools are rediscovering something that was lost when we moved from word processors like WordPerfect, which used plain text with embedded formatting commands, and entered the WYSIWYG world of the 90s with its endless buttons, sliders, ribbons and rulers.
Being a markup language, TeX has that minimalist essence. It’s just text.
Sometimes I’m writing some documentation, and I suddenly realize I’ll have to repeatedly refer to some kind of thing — a set of menu entries, say — and format them in a special way throughout the text.
In a normal word processor, I stop writing. I decide whether I need a paragraph style or a text style. I decide what existing style it needs to be based on. I click around to create my new style and give it a name. I adjust its formatting to be distinctive so I can see what I’m editing. I map it to a key. And then I try to remember what I was about to write before my multi-minute jaunt into wordprocessorland.
In TeX, I make up an imaginary macro off the top of my head — \Menu{File — Quit}
— and keep on writing. After I’m done writing whatever was in my head, then I worry about extraneous details such as how it’ll look on the page and whether it needs to be based on the look of anything else. Even if I decide I need a more elaborate calling pattern — say, if the macro needs some arguments — it doesn’t matter, because the plain text nature of my made-up-on-the-spot macro makes it easy to find and replace.
Obviously, actually applying the formatting is harder than with a word processor. But once I’ve done it once, I can re-use that magic incantation endlessly.
One place a surprising number of word processors fall down is incorporating images into the text. Somehow it seems to be acceptable to offer fonts that are infinitely scalable vectors, and then fail to offer any reliable way to insert and adjust a vector diagram. Want to bring your process diagram into OpenOffice? Sorry, can’t read the SVG, why don’t you just draw it again with the horrible OpenOffice Draw?
In business, it’s not at all unusual to receive documents that have diagrams as horrible resized bitmaps, or tables as embedded objects that look like crap, for pretty much this reason.
With TeX, I can use any vector graphics tool I like, and there’s almost always a way to get PDF out — either directly, or via a print-to-PDF driver. Once I have PDF, it’s painless to place it onto the page as vectors. So like everything else, it scales correctly for the display. The only things in my documents that end up as ugly bitmaps are the screen captures, and that’s the way it should be.
There are more reasons if you’re a mathematician or scientist. I’m just listing reasons why a person who wants to write ordinary everyday business and personal documents might choose to use TeX to do so.
Nevertheless, I would absolutely use something more modern that met the same basic requirements. If you know of a word processor that runs on any modern computer, can deal with 5,000 page documents, imports vector art in standard formats and handles it in vector form, writes to tagged PDF with working hyperlinks, is guaranteed to be around and supported for the next 40 years, will let me edit without having to learn a new user interface or have anything but my text on the screen, has output the quality of FrameMaker or InDesign, is arbitrarily extensible, and costs under $100, do please let me know.
TeX, on the other hand, had METAFONT — a system for defining vector fonts, which could then be rasterized at any size, for any printer that could handle a bitmap image. So for years, TeX was the only way a lot of people could create documents using a wide range of typeface sizes and styles.
Then around 1990, Adobe Type Manager (ATM) went on the market. If you installed it on your Macintosh or Microsoft Windows system, you could use PostScript fonts on your desktop at any size, and ATM would handle the rasterization for your screen — and for your printer, if it wasn’t a Postscript printer. Suddenly you could have any font size you wanted, anywhere.
Fast forward 20 years, and vector fonts are ubiquitous; even my phone has vector fonts. Type 1, TrueType, OpenType… but not METAFONT. Like Plain TeX, METAFONT was way too complicated for most graphic designers. So nowadays, like everyone else, I want to be able to use the dozens of attractive vector typefaces installed on my computer, rather than the rather anemic selection of METAFONT fonts.
Then there was TeX’s way of getting output to the printer. First you ran your source TeX file through TeX itself; that spat out a DVI file with just the typeset text. Then you took the DVI file, and ran it through a converter which would insert the diagrams from their files, embed a set of fonts generated by METAFONT, and output a PostScript file. Then you either sent your PostScript file to your PostScript printer, if you were lucky enough to have one; or else you ran the PostScript through another converter (such as Ghostscript) to turn it into your printer’s proprietary printer dump format (such as ESC/P or PCL), and copied that to your printer. It was all tedious but necessary in 1990. Today, it looks insane.
Then there’s the whole character encoding issue. Back in 1990, the only thing that could reliably be translated between different computer systems was plain ASCII text, so TeX had macros for curly quotes, accented characters, typographic dashes, dingbats, non-Roman alphabets, and so on. Nowadays, every major OS supports Unicode, so you can put Cyrillic or Greek text in your document on the Mac and be reasonably confident it’ll still be there if you open the file in Linux.
So TeX has been adapted to the modern world. pdfTeX got rid of the DVI stage, and went straight from TeX source files to PDF files; it later added support for microtypography. XeTeX got rid of METAFONT fonts, and allowed direct access to OS-installed vector fonts; it also added Unicode support. LuaTeX extended pdfTeX by adding Lua as an extra scripting language, allowing more complicated functionality to be supported than was feasible with TeX macros, and also added OS font support. LuaTeX has now been adopted as the successor to pdfTeX.
Since I wanted the most up-to-date Unicode and font support, I set about trying ConTeXt on Linux and Mac, using both LuaTeX and XeTeX. I created a document using a custom set of fonts — one OpenType, one Type 1, and one TrueType.
XeTeX was far easier to set up. Whereas LuaTeX requires that you set up environment variables and run a script to scan your fonts, XeTeX just calls the OS font routines. However, LuaTeX gets points because it’s the only supported engine for the latest ConTeXt release, Mark IV.
On the minus side, XeTeX on Linux is a bit buggier than on the Mac, as it’s a recent port; I had trouble with some Type 1 fonts. LuaTeX is not without bugs either: the ConTeXt mailing list recently discussed a bug triggered by a 5,000 page document, which caused LuaTeX to crash on page 3,987. That bug is fixed in a recent beta. Meanwhile, if anyone wants to try assembling a 4,000 page document in Microsoft Office, I’d be interested to know if it’s possible.
Since my documents are well under 3,987 pages, I’ve been happy with LuaTeX so far. So I had picked my TeX platform: ConTeXt running on LuaTeX. Now I had to sort out my macros and other necessary tools.
TeX is two layers of software. Underneath is the core of TeX, written in a variant of Pascal. These days it gets translated to C before being compiled to a binary. On top of the TeX core, you have a set of macros which provide all the handy commands you use to typeset documents.
There are a number of different macro packages. Knuth’s own is known as “Plain TeX”; it’s what I used back when I last wrote TeX documents. It’s extremely flexible, and I managed to make it format my dissertation in a way that was so un-TeX-like that the examiners asked what I used. Unfortunately, Plain TeX is rather ugly to use. For example, here’s the code of a Plain TeX macro for placing two pieces of text side by side:
\def\xsplit#1#2#3#4#5{{
\setbox1=\vbox{\hsize= #1 #4} % First column
\setbox2=\vbox{\hsize= #3 #5} % Second column
% If the boxes are output side by side at this point, they
% will be aligned at the bottom instead of the top.
\ifdim\ht2>\ht1 % Column two is longer than column one
% Fill bottom of column one with glue
\setbox1=\vbox to \ht2{\hsize= #1 #4 \vfill}
\else
\ifdim\ht1>\ht2 % Column one is longer than column two
% Fill bottom of column two with glue
\setbox2=\vbox to \ht1{\hsize= #3 #5 \vfill}
\fi
\fi
\hbox{\box1\hskip#2\box2}}}
I remember spending most of a summer vacation poring over a copy of Knuth’s The TeXbook, slowly assembling my own set of macros. I based my page layout on that of Apple’s Macintosh Human Interface Guidelines. I had found the book particularly readable, and reasoned that Apple probably knew what it was doing as far as designing page layouts for technical documentation. Bending TeX to my will wasn’t much fun.
It’s a problem lots of people had. So to get away from all that, Leslie Lamport wrote an alternate set of macros called LaTeX. He provided standard templates for letters, technical articles, reports, books, and overhead projector slides. (Kids: Ask an old person what an overhead projector was.) LaTeX also had macros for bibliographies, tabular data, simple diagrams, indexes, and pretty much everything else needed for academic documents. Its book — LaTeX: A Document Preparation System — was about half the thickness of Knuth’s book, more focused on end users, and came with a handy quick reference card.
LaTeX spread through academia faster than Far Side cartoons. It was particularly popular with mathematicians, physicists, chemists, computer scientists, and anyone else who needed to be able to typeset mathematical equations. (It was also about the only way to typeset Tibetan in 1990, which led to my helping out some humanities students.) LaTeX is probably the most popular TeX macro package. There’s even a full graphical editor for LaTeX, so you can avoid the markup language entirely. If you have a mental image of what a TeX document looks like, chances are it’s the look of a standard LaTeX template.
I had toyed with returning to TeX a year or so ago, and picked up a copy of Lamport’s book in order to try out LaTeX. There’s a problem with LaTeX, though. If you don’t care about page design, it makes it really easy to put together a document that looks exactly as specified by its templates; but if you want to design your own page layout from scratch, you quickly enter a world of pain. That’s probably why almost all LaTeX documents look alike.
So I did some more research, and decided to try a newcomer to the TeX package market: ConTeXt. Its development started in 1990; it attempts to make TeX behave in ways familiar to people used to modern DTP packages. It’s more flexible than LaTeX, yet easier to use than Plain TeX. Here’s how you put two paragraphs side by side in ConTeXt:
\defineparagraphs[sidebyside][n=2]
\setupparagraphs[sidebyside][1][width=.45\textwidth]
\setupparagraphs[sidebyside][2][width=.45\textwidth]
\startsidebyside
First paragraph goes here.
\sidebyside
Second paragraph goes here.
\stopsidebyside
It’s still a bit long-winded, but that’s because it’s completely general. The first chunk is the setup; you can adjust the number of columns, define each column’s width differently, give different columns different text styles, and so on, and give each setup its own name and pair of macros to apply to your paragraphs, as in the second chunk of text.
So, I had chosen a macro package; but there were more decisions to make…
It started with a simple enough task: I had some existing documentation, and I needed to extend the “How to perform common tasks” section. There were two sub-headings to add, each of which needed a few bulleted paragraphs of instructions.
I fired up LibreOffice, opened the document, and started typing — but something was wrong. When I clicked to turn my instructions into a bulleted list, the indentation was wrong. It didn’t match the similar bulleted lists above or below in the document.
I assumed I had mixed up the styles somehow, perhaps applying bullets to the wrong paragraph styles, so I checked the style of the correct paragraphs, then applied that style to the new ones. Still wrong.
Perhaps my correct paragraphs had somehow been modified? I selected both the correct and incorrect ones, and applied the style they were supposed to be in. They stayed determinedly different.
I tried fiddling with the rulers manually, then re-applying the styles. The rulers reset to the tabbing defined in the styles, but the text continued to be indented incorrectly. I even tried creating a whole new style, in case the style definition had somehow become corrupt. Still the newly-typed text would not match indents with the existing paragraphs.
Finally, I pondered the possibility that LibreOffice had a pretty severe formatting bug. I opened the document in IBM Lotus Symphony, which was forked from OpenOffice 1.x and (of course) reads and writes standard ODF documents. Hurrah! My paragraphs were all indented correctly!
I thought I was done. I made a few last edits, went to the top of the document, and refreshed the table of contents.
Oh dear.
Now my table of contents had garbage in, entries with a page number and no text, pointing at pages that had no headings on (including the front cover).
I cussed, saved the document, and quit Symphony. I opened the document in LibreOffice again, and refreshed the table of contents — now it looked right once more. I nervously scrolled down, and was somewhat surprised to discover that my paragraphs were all indented correctly. I hurriedly saved out a PDF before anything else went wrong.
So it was that I was forcibly reminded of how much I hate office suites, and WYSIWYG word processing in general. That well-known commercial office suite from Redmond is no better than the ODF gang; a few quick searches will unearth countless tales of horror involving corrupt files, misnumbered pages, font problems, and so on.
Meanwhile, I had another piece of documentation to start writing, and the thought of doing so in LibreOffice now filled me with a mixture of rage and dread.
I’m not a luddite, I quite like GUI software when it works. On the Mac, Apple’s iWork suite does a good job, at least with the kind of short document I find myself writing. However, I don’t have a Mac at work, so that wasn’t an option. So I returned to a piece of software I hadn’t used in 20 years: TeX.
Madness? Perhaps. I appreciate that TeX is perhaps not a tool for everyone; though it’s really no worse than word processors of the 80s, which ordinary people nevertheless learned to use. I suspect that most people just don’t have the patience any more. They don’t want to sit down and learn something, which is why we have Microsoft Word, and why hardly any office documents even use the style system properly. Which, I suppose, is how a major formatting bug gets into released code without being noticed.
It turns out that quite a lot has happened in the TeX world in the last couple of decades. Obviously computers have gotten a lot faster; I remember watching my 8MHz Atari ST grinding away at my dissertation. As I recall, it took about a second per page to typeset my text to DVI; I would then wait about 2 seconds per page to flip through and inspect the pages in the DVI viewer.
These days the console output scrolls past too quickly to read. TeX dumps its output directly to PDF, which loads instantly into an integrated PDF viewer, which of course supports antialiased text. Plus, my screen has four times the resolution. So even before considering software features, it’s a very different experience these days.
So what of the software? Modern TeX supports OpenType fonts, and accepts diagrams in PDF or bitmap format. It handles Unicode, so you don’t need to use escapes for accented letters and other special characters. It can place hypertext links in your PDF documents, and use color and other effects. And yet, TeX documents written in 1980 are not merely readable today; they can still be typeset into beautiful documents.
So from my point of view TeX has a lot of advantages:
If necessary, I can always turn my TeX into HTML, RTF, ODF, or whatever. But for now, I’m going to try writing new documents in TeX.
Want to see what TeX can do these days? Try some pages from a German chess book typeset with it, or take a look at sample spreads from humanities books typeset with TeX.
The vexed question: Is Java pass by value, or pass by reference?
My short answer, carefully phrased: Java passes both object references and primitive types by value.
I’ll now unpack that answer and explain it, and talk about why there’s so much confusion. To do so, I need to go back to the very early days of programming…
Once general purpose computers were built, it was quickly discovered that it would be really useful if you could call a chunk of code multiple times, from multiple places.
For instance, most early CPUs (including popular 8 bit CPUs like the Z80 and 6502) didn’t have a general purpose multiply instruction. It was very useful to be able to write a piece of code that would multiply two numbers together and return the result. Because you’d want to multiply numbers a lot from different parts of your program, there needed to be a standard way for the code to know where to jump to when it was done computing the answer for you.
In the early days, this was done in one of two ways. One was to have the caller put a return address into a register. When the multiply routine was done, it would jump to whatever address was in that register, resuming the main program, which could then use the result of the multiplication.
The PDP-8 used a slightly different approach: the instruction to call a subroutine would store the return address in a reserved piece of memory immediately before the subroutine. The subroutine code would be called, would find the return address immediately before it in memory, and could therefore jump back there.
The problem, of course, is that soon programmers wanted their subroutines to be able to call other subroutines, or even call themselves. Both of these methods of passing return addresses failed to allow that. The PDP-8 approach was also ugly because it meant you couldn’t store subroutines in ROM.
So by the time of the PDP-11 and 6800, the call stack had been invented. With a call stack, when you called your multiply subroutine the address of the next instruction after the call was placed on the call stack automatically by the CALL instruction. The processor then jumped to the subroutine code. The code did its multiplication and put the result in a register as before, then pulled the address off the top of the stack and called it — hence jumping the CPU back to the point immediately after the CALL instruction. Popular 8-bit home computer CPUs including the 6502 and Z80 had hardware call stack support.
This was a big improvement, but still not ideal. The problem was, the subroutines and the code that called them had to agree on which registers to use for which values before the call, and which registers to find the results in after the call.
So later CPUs, like the 68000, added instructions to make it easy to push additional values onto the stack — not just return addresses. Now to call your multiply subroutine you’d push the two values onto the stack and CALL the subroutine. The subroutine would pull two values from the stack, multiply them, and push the result onto the stack before returning. Your code would then pull the result from the stack and use it. The combination of return address and subroutine parameters was called a stack frame. This call model was the one used on the PDP-11, and hence became the call model for the C programming language.
Now, stack-based parameters work well, but there’s a potential problem once you start dealing with values that can occupy a large amount of memory — such as strings. How do you pass a string to a subroutine?
This problem had been tackled before the invention of call stacks, of course. The general answer was that rather than copying the data around everywhere, you would just pass around a pointer to the start of the string. Not only is it a lot more efficient, you also don’t use up as much stack space — and in early computers, stack space was a significant concern.
So in most languages, when you passed an integer and a string to a function, what actually happened was that the value of the integer was placed on the stack; but a pointer to the string was placed on the stack, rather than the value of all the characters in the string itself. These two approaches were named ‘call by value’ and ‘call by reference’, respectively. (For example, see “Semantic Models of Parameter Passing”, Richard E. Fairley, 1973.)
When object-oriented programming was invented, the exact same issue applied to objects. The value of an entire object, with all its methods and fields, could be very large. Rather than placing the actual value of the object on the stack, instead a reference to the object was placed on the stack.
For now, though, let’s stick with strings. When you pass a string to a C function, the function may be able to change the text of the string, but it can’t change the fact that your string variable will still be looking for the string at the same memory address once the subroutine returns.
So at the language level, function parameters are conceptually passed by value, even though those values may actually be the addresses of an object or string. Hence K&R states:
One aspect of C functions may be unfamiliar to programmers who are used to some other languages, particularly Fortran. In C, all function arguments are passed “by value”. This means that the called function is given the values of its arguments in temporary variables rather than the originals.
This description is clear for C, because C always makes pointers visible. Your string is manipulated using a char * pointer variable. It’s handed to the subroutine as a char *. It’s clear that the subroutine gets a copy of that pointer, not access to your pointer. Whatever the subroutine does, the caller’s pointer variable will still point to the same address when the subroutine is done.
Java handles parameter passing the same way in the JVM. Primitive values such as ints have their values placed on the stack; objects have a copy of the address of the object placed on the stack. The called code can mess with the contents of the object, but when it returns, your variable will still point at the same object. That is, strictly speaking in Java the value of a variable is which object it refers to, not the value of the object.
This is, of course, a highly unusual definition of “value”.
James Gosling goes to the bank and finds all the money has been emptied from his account. When he complains, the teller says “But your account has the same value it had yesterday — your account number still references the exact same account!”
Another reason why people get confused about Java’s call semantics is that you never actually see the addresses of objects. Unlike in C, where you can mess with pointers (and indeed have to), Java hides that detail of the machine-level implementation. You’re simply not allowed to access the JVM’s pointers in the high level Java language. Variable names which are bound to the address of an object are automatically and invisibly dereferenced to the referenced object when necessary.
As I put it in my earlier article about final parameters, “The Java Thing a is really like C++ Thing *a, and Java’s a.method() is really C++’s a->method(), not a.method().”
So in Java, you don’t need to explicitly dereference your object variable in order to call one of its methods; the compiler does it for you.
So when you pass a bunch of variables to a method, Java passes both the variables which were assigned object values and the variables which were assigned primitive values by value; it makes copies of all of them to give to the method code. It’s just that the variables which were assigned object values actually contain references to the object data, which will be automatically dereferenced when necessary. Most of the time this is very convenient, and it makes the language a lot safer by prohibiting various kinds of misbehavior with pointers.
So in Java,
int x = 3;
String y = "three";
are two totally different operations. They only look like they’re the same concept because of some syntactic sleight-of-hand performed by the compiler.
If someone asks you what the value of y is after the above code runs, you might say that it’s the string “three”. In fact, though, the true value of y is a reference to the string “three”. You just never see that true value.
In C, on the other hand, the difference is explicit:
int x = 3;
char *y = "three";
The asterisk is the clue that two different sorts of things are being done here.
This distinction also comes up when considering the == operator. Java’s == compares the true underlying value of variables; x == y is true only if x and y are both references to the exact same object. To see if x and y have the same object value when dereferenced, you need to use a method such as equals() or compareTo(). Worse still, the default implementation of equals() is ==, unless the class overrides it.
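A quick sketch of the difference:
String a = new String("three");
String b = new String("three");
System.out.println(a == b);      // false: two distinct objects
System.out.println(a.equals(b)); // true: the same character data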
So what would call-by-reference look like? Well, in C we can do this:
int x = 3;
f(&x);
This calls the function f, passing it a reference to (the address of) the variable x. The function is thus able to change the value stored in x, and the calling code will see that change when f returns.
This brings up a second reason why the discussion on Reddit is a trainwreck. People say that a language is call-by-value when a subroutine cannot change the value of a variable passed as a parameter. Unfortunately, because Java invisibly dereferences your object pointer variables, that’s not really true:
static void f(StringBuffer o) {
    o.delete(0, o.length());
    o.append("This is impossible.");
}
public static void main(String[] args) {
    StringBuffer x = new StringBuffer("Java is call by value.");
    f(x);
    System.out.println(x);
}
In the above code, f() clearly changes what we’d colloquially call the “value of x”, as we see it at Java language level. However, it does so without changing which object x points to. To rephrase in Java language specification terms, before the method call x is a reference to a StringBuffer object which has the value “Java is call by value.” After the method call, x is still the same reference to the same object, which now has the value “This is impossible.” But again, you never see the references that are being passed around, so the situation is confusing.
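The complementary case is just as easy to demonstrate. In this sketch the method reassigns its parameter to a brand new object, and the caller’s variable is completely unaffected, which is exactly what you’d expect when the reference itself is passed by value:
static void g(StringBuffer o) {
    o = new StringBuffer("A different object entirely."); // only the local copy of the reference changes
}
public static void main(String[] args) {
    StringBuffer x = new StringBuffer("Java is call by value.");
    g(x);
    System.out.println(x); // still prints "Java is call by value."
}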
Or as I wrote in Java annoyance: final parameters:
It seems that the designers of Java wanted to make sure nobody confused their object pointers with the evil manipulable pointers of C and C++, so they decided to call them references instead. So Java books usually say that objects are stored as references, leading people to think that objects are passed by reference. They’re not, they’re passed by value, it’s just that the value [on the stack at JVM level] is a pointer.
If you’re still confused at this point… well, I can’t say I blame you. We should probably both blame James Gosling.
Superficially, multi-line comments seem like a handy thing to have, especially if you often comment out blocks of code temporarily during development. However, as soon as you allow multi-line comments, you need to decide what to do about nested multi-line comments — that is, what should happen when the ‘start comment’ sequence occurs inside a comment.
Option 1 is to allow it and make it not special. This is bad because it makes it very easy to end up with a run-away comment:
/* set variables
param = 0
/* set maximum size */
maxs = 100
Here ‘param’ ends up unset. This kind of bug can be really hard to locate.
Option 2 is to allow the start-comment sequence within a comment, and make comment pairs stack, so you need as many end-comment sequences as start-comment sequences. This at least makes errors like the above cause a compile-time failure, but it still makes them a pain to locate; you end up having to use something like vim’s ‘%’ command to play match-the-comment-delimiters until you spot which pair is mismatched. That’s no fun, especially once your comments contain pages of documentation. Ask a Java programmer…
Option 3 is to disallow nested comments, in which case multi-line comments can’t be used to comment out blocks of code which contain multi-line comments. That makes them pretty useless for temporary disabling of code; it would also cripple their use in rdoc.
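Java is a case in point: the start-comment sequence inside a comment isn’t special, so you can’t comment out a block that already contains a comment. A sketch of what goes wrong; this fragment deliberately fails to compile:
/* Temporarily disable this block:
maxs = 100;   /* set maximum size */
param = maxs; // the comment already ended on the previous line, so this is live code
*/            // ...and this stray end-comment sequence is a syntax error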
Single-line comments have none of the above problems. You always know the comment will end when the next line starts. You can always comment out a block of code, no matter how many comments or comments-within-comments it contains. You never get runaway comments, and you never have to match up comment delimiters. The only downside to single-line comments is that it’s harder to comment out and un-comment-out multiple lines. That’s easily solved by using any good text editor.
All of which is why C++ introduced single-line comments, and C99 added them to regular C.
So in short, multi-line comments have no significant advantages, make parsing harder, and make errors more likely. I think Matz was entirely right not to put them in Ruby. As a rule, the only place I use multi-line comments is for JavaDoc, and that’s only because I have to.
About a year later I upgraded to an Atari ST, with an 8MHz 68000 and the ability to handle 320x200 resolution in 16 colors. I had Computer Concepts’ FaST BASIC, a version of BBC BASIC on a cartridge for the ST, and soon had fractals rendering on the Atari. I built a fairly elaborate menu and dialog system for adjusting parameters, and added mouse-based selection and zooming. Rendering speed was improved, but it still took a few minutes to put together a complete image.
The trend continued over the next decade: every time I got access to a newer and faster computer, one of the first things I would do would be to see how quickly it could draw fractals. This continued until the mid 90s, at which point I could move my mouse around the Mandelbrot set and see the corresponding Julia sets drawn in real time.
By then, fractal imagery had become commonplace on album covers and posters. My fascination waned. Yet Benoît Mandelbrot’s influence stayed with me; his fractal geometry had changed my view of the world. One single simple equation, simple enough for a teenager to understand — and yet, hidden inside it was infinite complexity. Simple iteration could generate images that looked like objects from the natural world. Entire universes could be encoded in a tiny algorithm, if you were clever enough.
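The equation is short enough to show in full. Here’s a minimal, throwaway Java sketch of the escape-time iteration — nothing like the menu-driven Atari version, but the same mathematics, printed as crude ASCII art:
public class Mandel {
    public static void main(String[] args) {
        for (double im = 1.2; im >= -1.2; im -= 0.1) {
            StringBuilder row = new StringBuilder();
            for (double re = -2.0; re <= 0.6; re += 0.04) {
                double zr = 0, zi = 0;
                int n = 0;
                while (zr * zr + zi * zi <= 4 && n < 100) {   // iterate z = z*z + c
                    double t = zr * zr - zi * zi + re;
                    zi = 2 * zr * zi + im;
                    zr = t;
                    n++;
                }
                row.append(n == 100 ? '*' : ' ');             // '*' = point stayed bounded
            }
            System.out.println(row);
        }
    }
}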
Good product design starts with a single problem. The product becomes a way to solve that problem. Technologies are developed and added strictly as needed to solve the core problem. Keeping that focus is perhaps the hardest part of product design; it’s always tempting to add one more feature or option, often simply because it’s technically easy to do so.
Another way of looking at it is that there’s a circular flow:
I’m simplifying; in reality there’s continuous information flow between the stages. They’re all important, though — for instance, if you marginalize design and let marketing dictate directly to product management and engineering, you can get a disaster like Buzz.
With engineering-driven development, the flow is linear, and rather different:
A special case of the above is when engineers decide that an existing product needs to be reimplemented using a new and more fashionable programming language or software technology.
I’ve seen quite a lot of engineering-driven development over the years. Without naming names, there’s a division at IBM that has a long history of it; you might be able to think of a few of their dead technology-driven products.
It’s not that engineering-driven development results in products that are useless. They usually do perform useful functions, it’s just that they don’t do any of them particularly well — or at least, not as well as products that were actually designed to solve specific problems.
So it was with Google Wave. It was a pretty good way to collaborate on documents, but it was much more complicated and harder to learn than Google Docs. It was a neat tool for D&D games, but not as good as (say) Fantasy Grounds. It was a replacement for e-mail, except nobody really wanted their e-mail replaced. It could help you plan when to have a meeting, except it was much more complicated than (say) Presdo. It was a good way to chat with friends, except it didn’t connect to XMPP and you had to log in to it. It was a useful tool for project planning, but lacked the focus of (say) BaseCamp. The live simultaneous updates feature was cool for working on a document, but the UI was much more cluttered and intrusive than Google’s own EtherPad.
The last of these comparisons indicates another sad feature of engineering-driven products: they often result in better products being killed, because the older narrowly-focused product with last year’s technology is perceived as competing with the big new project with all the cool stuff in it.
If you ask an engineer why an engineering-driven product failed, they’ll usually say that it’s all because marketing failed to communicate the product’s awesomeness to the world. They understand the product, and they’re sure that if only everyone else could understand it, they’d love it too. They vow to make the next product even more awesome, by sticking even more disparate technologies into it. And so the cycle continues for as long as engineers are allowed to design products, until the company goes bust.
There are rumors that Google are working on a competitor to Facebook, perhaps focused around social gaming. I do hope that this time, they’re not letting engineers design the product, because Facebook could really use some open competition.
However, energy usage isn’t the whole story. According to studies, the DST shift will cause around 17% more car accidents on Monday, with a smaller increase in the fall because it’s easier to sleep in than to get up early.
DST doesn’t benefit farmers either, in spite of what you might have heard. They lobbied against its introduction. Cattle don’t care what the clocks say, so DST just means the farmer has to get up and milk them at a different clock time.
For IT people, of course, DST is more annoying. It means checking every system that keeps time, to verify that it made the change on the correct date, and adjusting systems that don’t adjust themselves. That’s assuming your government doesn’t decide to change the date of the DST switchover, like the Bush administration decided to, causing hundreds of millions of dollars to be spent patching embedded systems, servers and desktops. (Since Java, Linux and IBM Lotus Domino all use their own independent time zone rule databases, I had to apply three separate fixes to some servers.)
Even non-IT people are starting to catch on to the fact that DST is an annoying waste of time and energy. Can we get rid of this stupid practice, please?
Apparently, a firmware bug in the real time clock in some models of PS3 led the RTC to think that 2010 is a leap year. This confused the PS3’s OS, which was then unable to connect to PlayStation Network. This, in turn, caused havoc with all kinds of games, even single player games with no network play, because PS3 trophy support requires network connectivity. It also disabled all DRM-protected content. Sony advised people not to use their PS3 until engineers could find a fix, as content might get corrupted. In the event, the ‘fix’ was to wait until the next day, when the RTC moved to a valid (but incorrect) date, and the OS would once again recognize it.
I happened to be using my PS3 when the date hit March 1st 2010 UTC (or February 29th 2010 PS3 time, you might say). I was playing “inFAMOUS”, a game by Sucker Punch. I got an error message saying I’d been logged out of PSN; a little while later I got a trophy sync error. I finished the game that evening, so I qualified for more trophies, but of course I won’t have gotten them. I didn’t particularly care about the trophies, and PSN failures aren’t exactly unheard of, so I thought no more of the issue.
Not all games are as well-written as Sucker Punch’s, however. Many people reported that “Heavy Rain” crashed on startup; apparently it assumes that if the network is enabled, the PSN trophy server must be reachable. “Flower” also died.
There are a few observations I’d make about this whole situation:
Firstly, yes, it’s a stupid error to find in code. However, if you read RISKS Digest you’ll see the same error happening in a big system every few years.
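For reference, the full Gregorian rule is only one line, and the trap is stopping at the first clause. (The actual PS3 fault was in the RTC firmware’s date handling, so this is purely illustrative.)
// Correct rule: divisible by 4, except centuries, except centuries divisible by 400.
static boolean isLeapYear(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}
// The naive "year % 4 == 0" shortcut wrongly makes 2100 a leap year;
// a clock that thinks 2010 qualifies invents a February 29th that never happened.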
Secondly, the scale of the failure was undoubtedly due to the system brittleness required for DRM. The existence of time-limited rentals requires that the DRM system not trust the system date, and fail to authenticate if the date appears incorrect. This means that even indefinite rentals (which are dishonestly called “purchases”) ended up with copy protection errors when you tried to run them; people reported that none of their downloaded games worked.
Thirdly, this kind of slapdash QA is all too typical of the video game industry nowadays. My impression is that the quality of game software is getting worse, now that consoles are assumed to have an Internet connection. It used to be expensive to recall and replace faulty discs, so games spent a long time in QA; now, the developers will just throw a faulty game out so long as it’s more or less playable, and plan to fix it with a point release. Games even get released to consoles with sound that doesn’t work properly (“The Orange Box” for PS3, for example). Sometimes a patch eventually fixes the problems, sometimes it doesn’t.
LotusScript is a bit different. It’s modeled on Microsoft’s Visual BASIC, a decision that probably seemed sensible in the pre-JavaScript early 90s when Lotus picked a scripting language for Notes. As a result, LotusScript has three different null values. This can lead to some unexpected run-time crashes.
The first null value is Empty. This is the value of a variable of type Variant that has not yet had a value assigned. You can’t directly assign something to be Empty, but you can do something like this:
Dim empty As Variant
x = empty ' x is now empty too
You can’t make non-Variant variables Empty; you’ll get a run-time error if you try.
The second null value is Null. It’s also used with Variants. Whereas Empty is the default null value of a Variant, Null is the usual null value resulting from an assignment expression. That is, it’s the null value you would typically return from a function that returns a Variant, in order to indicate an error, exception or missing value to the caller. (Though as noted above, you could return Empty if you really wanted to.)
Then there’s the third null value: Nothing. Nothing is primarily used with non-Variant variables. For example, NotesView.GetDocumentByKey will return Nothing if the document isn’t found. Also, if you declare an object variable but don’t initialize it, it is set to Nothing by default. Unlike Empty, however, you can directly assign variables to have a value of Nothing. So basically, it plays the role of both Null and Empty, but for object variables.
But while Nothing is primarily used with non-Variant types, you can also assign Notes objects to Variant variables, which means that Variants can end up being Nothing, depending on your code. Which led to my discovering a couple of interesting pitfalls.
To make life exciting, each kind of null in LotusScript has a different way of testing for it. Isnull and Isempty test for Null and Empty values respectively; but for Nothing, the syntax to check is expr Is Nothing.
(Why? My guess is that when object orientation was bolted onto BASIC, the developers at Microsoft decided to provide for null object references by creating a special predefined object called Nothing, because that was the easiest thing to code. Probably nobody cared that the language already had Null and Empty. Similarly, adding a new “Is” operator was probably easier than making the existing comparison operator work on objects.)
Not only is the syntax different for testing Nothing values, the behavior is different too. While Isnull and Isempty are safe when passed Nothing, expr Is Nothing will crash with a run-time error if the expression is Null or Empty. So if you have a Variant which may contain Nothing as well as either Empty or Null, you need to test it with Isnull and/or Isempty before testing for Nothing.
A second pitfall is that you can print Null and Empty values — they display as “Null” and "" (empty string) respectively — but a Nothing value will cause a run-time error if you try to print it.
The two pitfalls can combine, of course. If you want to print or log your Variant’s value for debugging purposes, you need to check for Nothing in order to avoid a crash when printing; however, that means you need to check for Null and Empty first, to avoid having the check for Nothing crash. So you end up with:
If Isnull(v) Or Isempty(v) Then
    Print v
Elseif v Is Nothing Then
    Print "v is Nothing"
Else
    Print v
End If
Here’s a summary:
Null value | Applies to | Purpose | Test for it using | Outputs as |
---|---|---|---|---|
Empty | Variants | Value of uninitialized Variant | IsEmpty(expr) | "" |
Null | Variants | Indicates an error, exception or missing value | Isnull(expr) or expr = Null | “Null” |
Nothing | Variants or object variables | Uninitialized object, error, exception or missing value | expr Is Nothing | Run-time error |
On the bright side, it could be worse, as Visual BASIC has four more null values to worry about: Missing, vbNullString, vbNull, and vbNullChar.
]]>While yum eventually takes away the pain of applying software updates — so long as you’re not in a hurry — there are all kinds of other tasks it doesn’t handle. I therefore rely on an RPM cheat sheet I’ve assembled over the years. Here’s a copy for anyone else who might find it useful.
Command | Purpose |
---|---|
rpm -e package-1.2.3-gm | Erase named package from system |
rpm -i package-1.2.3-gm.rpm | Install package file into system |
rpm -U package-1.2.3-gm | Upgrade package |
rpm -F package1 package2 | Upgrade packages to latest versions in an appropriate order |
rpm -Fvh * | Upgrade a bunch of packages from the matching RPM files |
rpm -q package | Returns full name and version of installed package |
rpm -qp package.rpm | Same for a non-installed package |
rpm -qi package | Returns misc details about package |
rpm -qpi package.rpm | Same for a non-installed package |
rpm -qa | List all installed packages |
rpm -qf file | Which package put this file here? |
rpm -ql package | List of files in an installed package |
rpm -qlp package.rpm | Same for files in a non-installed package |
rpm -qRl package | List dependencies of an installed package |
rpm -qR package.rpm | List dependencies of a non-installed package |
rpm -ivh package.src.rpm | Install a source RPM; spec goes into /usr/src/RedHat/SPECS |
rpmbuild -ba package.spec | Build from installed source |
Also, to list installed software packages in order of size, with descriptions:
rpm -qa --queryformat="%10{size} %{NAME} - %{SUMMARY}\n" | sort -k1,1nr
Add --test to see what a command would do without actually doing it. Add -v for verbose output.
Here’s an example using a random database template:
bytes | file |
---|---|
9437184 | dct.ntf |
2618157 | dct.ntf.gz |
2256744 | dct.ntf.bz2 |
1806715 | dct.ntf.lz |
1784916 | dct.ntf.7z |
1775175 | dct.ntf.lzma |
Switching from bzip2 to LZMA is a bigger improvement than the switch from gzip to bzip2 was. However, notice that there are three LZMA-compressed files. The first was created using lzip; the second, using 7-zip; the third, using lzma-utils, now renamed xz or xz-utils.
I don’t like that there are competing LZMA archive formats, but since that’s the world I’m in, I decided I had to make a choice which one to use. The obvious answer is to go with the one with the best compression, but in my view 0.3% isn’t enough of a difference to make that the sole criterion. So I’ve tried my usual approach to evaluating open source software: I’ve looked at the user interface and documentation.
7-zip fails immediately because it doesn’t behave like a Unix program. It produces 10 lines of output when successfully compressing a single file, and there doesn’t seem to be any way to get it to shut up. So, it’s a two horse race.
The xz-utils site has no documentation I can find. Searching the Ubuntu documentation site locates the xz man page, however.
In contrast, lzip’s web site has a link to go straight to the user manual and tutorial. (Yes, there’s also a standard Unix man page.)
Comparing the two, I see that xz has many more options. It has all kinds of tweaks to specify how much memory it uses, tweak various internal details of the LZMA algorithm, and filter the data. None of these options are adequately explained. To quote Ted Nelson quoting Roger Gregory, “An option means the programmer didn’t have a clear idea of what the module was supposed to do.” Or as Steve Krug puts it, “Don’t make me think.”
In contrast, lzip’s user interface is much simpler, and closer to the Unix philosophy of “do one thing, and do it well”. The only two tweaks to the LZMA algorithm lzip provides are adequately explained if you know the basics of how compression algorithms tend to work, and there’s a table showing how they correspond to the compression levels -0 to -9. The only borderline gratuitous option is to split the compressed file into chunks, and that’s at least a useful one. It also gets the SI units right.
So, lzip wins by a landslide on UI and documentation.
You might be thinking I’m being superficial here; surely documentation alone isn’t a good way to evaluate software? So this time, I took a look at the source code.
Lzip’s source is around 5,680 lines of code (excluding comments), supplied with an autoconfigure script and test suite. It compiles to an 85,753-byte binary.
XZ Utils’ source is around 31,183 lines of code (excluding comments). It compiles to a 516,779-byte binary.
In XZ’s favor, its source code is much better commented (30% vs 10% comment-to-code ratio). Then again, at 6× the size, it had better be. So on balance, I think lzip still wins.
So once again, my “look at the documentation” heuristic worked. The question is, are there any good exceptions to prove the rule? That is, examples of excellent code that has terrible documentation?
Update: The GNU project decided to go with xz rather than lzip, of course, and implemented xz support rather than lzip support in GNU tar.
Back in the 1990s everyone was excited about “cyberspace”. We were going to build a whole new world in virtual reality, with virtual banks, virtual shopping malls, and virtual libraries. We would drive around in virtual cars and be represented by 3D avatars that looked just like us. Even the web would be replaced with cyberspace — remember VRML?
I always thought that was a stupid idea. We already have a world that’s far higher resolution and more interactive than can possibly be experienced via a screen, or even via special goggles and exotic input devices. Rather than have massively powerful computers try to simulate a virtual world, what made more sense was for lots of small and cheap computers to become ubiquitous in the real world we already have.
Rather than a virtual library, I want a real library where the books all have RFID, and an augmented reality application can just guide me to the book I want. Rather than a virtual mall, let me search the real mall from my phone.
There’s certainly a place for virtual worlds and cutting-edge hardware; broadly speaking, that place is the video game industry, at least as far as the average person is concerned.
Ask a photographer what the best camera is, and he’ll probably tell you: it’s the camera you actually have with you. A cheap camera in your pocket is better than a $3000 SLR at home. Similarly, you’ll get more done at the coffee shop with the $300 laptop that you carry in your shoulder bag than the sleek 17″ behemoth you leave at home.
Of course, that’s not why 90% of Fortune 500 companies use it. It’s one of those rather dull products that runs invisibly, year after year, processing millions of transactions reliably and actually getting the answers right. Key parts of its behavior are formally specified in Z notation. It’s the back-end software that processes your ATM transactions and airline reservations. Yet in spite of that, it seems to have a fan following — there’s an I ♥ CICS group on Facebook. If you’re wondering what you’re missing, the University of Maryland has some screenshots of their CICS system. Of course, these days you can do your CICS development using Eclipse and give your applications a web interface.
Another really old IBM product that’s still in service is IMS, a database so old that it predates the relational model, let alone SQL. Built to track all the parts required to build the Saturn V rocket for the Apollo moon missions, it’s still being used today, 41 years later. It’s the product you use if your database is 60 terabytes in size. After CICS has processed your ATM transactions, chances are at some point the movement of the money will be noted in the Federal Reserve’s massive IMS database.
Like CICS, IMS has kept up with the times, with support added for web technologies. Ironically, the fact that XML is hierarchical makes IMS a better match for XML processing than the more modern relational databases — you can translate your entire document directly into IMS fields, preserving the hierarchical element structure directly, and then perform XPath queries via IMS’s JDBC interface.
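I haven’t written any IMS code myself, but the JDBC side is just the standard java.sql API; a rough sketch, with the connection URL, credentials and segment names as placeholders you’d replace from your own IMS setup:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImsQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder URL, credentials and segment name -- not a real configuration.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:ims://mainframe.example.com:5555/MYPSB", "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM PCB01.PARTS")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}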
I don’t use CICS or IMS myself. I only occasionally use a 5250 emulator, mostly to perform System i administration tasks; I haven’t logged in to z/OS in over a year. The mainframe I do use runs Linux, and I like it that way — Unix is quite old enough for me. Yet there’s something oddly fascinating about these mainframe products and their continued existence; it’s almost like discovering there are trilobites still living in the back of your filing cabinet.
I’ve spent a lot of my career in the computer industry working on computer-based calendar systems and organizers of various kinds. Here are a few pitfalls to be aware of.
All of our local time zones are defined as offsets from UTC. For precision, UTC is defined by atomic clocks, whereas older time standards such as GMT were based on astronomical observations. However, UTC is kept in sync with older time standards by having leap seconds added or subtracted as necessary. Whenever UTC drifts far enough from UT1 or astronomical time, a leap second is scheduled.
Lots of systems store the date and time as the number of seconds since (say) January 1st 1970. The problem is, the exact dates on which the leap seconds in our calendar will occur cannot be determined in advance, because they depend upon the subtly varying rotational speed of the earth. The earth’s rotation changed measurably after the 2004 tsunami in Asia, for example. Hence given a date and time in the future, there is no way to calculate how to represent it exactly as a number of seconds from an epoch moment.
Another way of looking at it is that if you have any epoch dates in 2006 stored in your computer, they’re now going to be off by a second because of the leap second announced for 2005, until your date and time library is updated.
Of course, you can ignore leap seconds. But if you do, any calculation that asks for the number of seconds between two moments in time may give the wrong answer. It won’t be very wrong, just a few seconds off, but that might be wrong enough to cause a very expensive error. (See The Fractious Leap Second Debate for more on this subject.)
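To make that concrete, here’s a small sketch using Java’s java.time API (which, like epoch seconds, pretends leap seconds don’t exist): the computed gap across the leap second inserted at the end of 2005 comes out one second shorter than what an atomic clock actually measured.
import java.time.Duration;
import java.time.Instant;

public class LeapSecondGap {
    public static void main(String[] args) {
        Instant before = Instant.parse("2005-12-31T23:59:00Z");
        Instant after  = Instant.parse("2006-01-01T00:01:00Z");
        // Prints 120, but 121 SI seconds elapsed: 2005-12-31T23:59:60Z
        // was inserted between these two instants.
        System.out.println(Duration.between(before, after).getSeconds());
    }
}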
It’s also really difficult to get the right values of epoch time internationally, as different countries have different numbers of legally counted leap seconds.
Continuing the list:
Consider the following sequence:
Oct 31 01:55 Log in from 192.168.0.1
Oct 31 02:05 Log in from 192.168.0.2
Are these two events 10 minutes apart, or 70 minutes apart? Without the time zone information, you have no way to tell.
Of course, the right thing to do is timestamp in UTC.
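In Java, for example, the easy way to get that right is to log Instants, which always render as UTC ISO-8601; a minimal sketch:
import java.time.Instant;

public class UtcStamp {
    public static void main(String[] args) {
        // Instant.toString() is always UTC, e.g. 2010-10-31T09:55:00Z,
        // so nobody has to guess which DST rule was in effect.
        System.out.println(Instant.now() + " Log in from 192.168.0.1");
    }
}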
On a related note:
The reason for this is simple: June 5th in Australia is not the same day as June 5th in San Jose. Specifically, June 5th in Australia is ending and everyone’s going to bed before June 5th in San Jose gets started. An event that happens on June 5th in Australia may happen on June 4th in San Jose, so you need to know the time zone of the event in order to display the date correctly to the user.
If you don’t store a time zone, then basically there’s an implicit time zone, most likely the time zone that will be in effect on the date in question, at the location where the date is being measured. For an event, that will be the location where the event is happening. Obviously, implicit time zones aren’t very helpful for end users distant from the event in question, because they have to (a) realize without prompting that the correction has to be made and the date has to be adjusted, and (b) guess what the time zone is and how it will affect the date as seen from their time zone.
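A sketch of the June 5th example in Java, with Sydney standing in for “Australia”:
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class SameInstant {
    public static void main(String[] args) {
        ZonedDateTime sydney = ZonedDateTime.of(2010, 6, 5, 8, 0, 0, 0,
                ZoneId.of("Australia/Sydney"));
        // The same instant viewed from San Jose is still the previous day:
        // 2010-06-04T15:00-07:00[America/Los_Angeles]
        System.out.println(sydney.withZoneSameInstant(ZoneId.of("America/Los_Angeles")));
    }
}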
In Lotus Notes, you’ll need to use a formula like this:
ost := @If(StartTime = ""; @Time(9; 0; 0); @Time(StartTime));
tm := @TimeMerge(StartDate; ost; TimeZone);
@If(StartTime = ""; @Date(tm); tm);
This treats a date with no time as being basically the same as a date starting at 9am in the appropriate time zone. The choice of 9am is arbitrary and based on conventional business hours; there’s no answer which is always precisely correct.
In other words…
Consider the following schedule:
2004-10-10 10:00 Team conference call.
2004-10-10 12:00 Lunch with Mike.
2004-10-10 13:00 Conference session 2: Designing Groupware
Suppose Mike and I fly 8 hours west. What happens? The first item on my schedule is a phone call. It happens at the same instant in universal time, no matter where I happen to be. Therefore to show me the right time, the software has to shift the displayed time by 8 hours. It might have to shift the date too, as a result.
The second item on my calendar is going to happen at noon in whatever time zone Mike and I happen to be in as we travel. We may not even know in advance what that time zone is going to be. So the right thing for the software to do is probably not to shift the time at all.
The third item is a physical event, but in this case the time and date are always going to be relative to the time zone in effect at the location where the event is taking place. So, the correct thing to do is almost certainly to leave the date and time alone.
So should the software adjust the display of the dates and times in this example? The “correct” answer is “definitely yes, maybe, and definitely not”.
One approach might be to have a flag in each event specifying whether the time and date should be corrected when the user changes time zone, or whether they should be allowed to float and effectively change. The flag could be part of the time zone field — if you specify a time zone field the date and time are adjusted, otherwise they float. I’ve never seen an electronic organizer that actually worked that way, though.
Contrary to a common misconception, Lotus Notes does not change the time and date of an event when you change time zone. Notes actually leaves the date and time exactly the same; it is not modified in any way. An event stored and displayed as starting at instant T, is still stored and displayed as starting at instant T. However, the way instant T is indicated depends on what time zone you are in, so it looks like a different time or date, but it actually isn’t.
In contrast, the Palm OS date book does alter all the times and dates when you change time zones. If you change the time zone from San Jose to Australia, the system date and time will change, but all your events will be moved. If you set an alarm for a net radio show or a conference call, the alarm will probably happen at the wrong time. See immediately above for more explanation.
Now suppose that several pages of code later, perhaps in a different piece of code entirely, you decide to use the value of the variable. At that point, you’ve forgotten that you didn’t use a StringBuffer, so you do the usual thing and call the toString() method.
Here’s some code to simulate what happens:
char[] s = "Results of some computation".toCharArray();
// Lots of code later
System.out.println(s.toString());
When I compile and run the above, it outputs [C@1bc4459, which isn’t at all what you might hope.
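For the record, the text is still there; you just have to ask for it explicitly, for example:
System.out.println(new String(s));     // Results of some computation
System.out.println(String.valueOf(s)); // same thing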
There are two obvious questions which occur to me. The first is to wonder if this behavior of Java is deliberate. Is it considered useful in some way, or just an accident of implementation?
The second and more important question is: How can I detect such errors before code goes into production? Eclipse doesn’t catch it, and nor does PMD.
When you type in a web URL, the address gets looked up in DNS, and turned into a raw IP address. It’s a bit like looking up a telephone number. The number is what the network actually understands.
The early 1970s Internet had 256 host addresses, and an additional 16 network addresses. Those ran out pretty quickly, so in 1981 the Internet was redesigned to have the familiar 4 byte addresses in common use today; for example, 208.77.188.166.
Unfortunately, the people who redesigned the Internet in 1981 still didn’t really expect everyone to use it. They certainly didn’t expect that people would be connecting coffee machines, telephones and toasters to it. Our house has 10 devices that connect to the Internet, and we’re not particularly gadget-obsessed.
That’s why your home Internet access is probably via a wireless router. The router uses NAT to make everything in your home appear as a single device with a single address, at least as far as the rest of the Internet is concerned. The downside is, you end up with all kinds of problems with incoming connections, because the router has to know where to send each packet of incoming data. The shortage of IP addresses is also why most ISPs make you pay extra if you want a static IP.
So if you’ve ever had trouble chatting via IM, if you’ve ever had trouble with video chat or voice over IP, if you’ve ever had difficulties sharing files, if you’ve ever been unable to set up a web site or FTP server on your home computer — it’s all indirectly because of the IPv4 address shortage.
There’s a solution: IPv6. It expands the address space from 4×8 = 32 bits to 128 bits. That’s over 4 quadrillion IP addresses for each star system in the observable universe. If we assume they’re all like the solar system, and that every planet has an earth-like population, that’s 67,000 IP addresses per person. Not that the address space is being carved out that way, but it illustrates that there’s plenty of room for growth.
As well as solving the address shortage problem and getting rid of NAT issues, there are a lot of other technical reasons why IPv6 is better than IPv4, and it was standardized 10 years ago. So why aren’t we using it?
One problem is that because America built the Internet, America also got in early and hogged all the IPv4 addresses. So while the addresses might be running out, most US corporations aren’t really feeling the pain yet. This is why most of the ISPs offering IPv6 are in Asia.
A bigger issue is that the IPv6 and IPv4 Internets were initially disconnected. That is, if you had an IPv6 connection, you couldn’t reach IPv4 hosts, and vice versa. That problem has only recently been solved in a painless manner.
The first part of the solution is 6to4. It provides a standard way to take an IPv4 address, like the address issued by your ISP to your cable modem, and convert it to IPv6. Your router can take IPv6 traffic from your network, wrap it up inside IPv4 packets — setting the IPv6 address it calculated as the reply-to address — and send the data to a 6to4 gateway. The gateway is a system which has both an IPv4 and an IPv6 connection. It unpacks your data and sends it over the real IPv6 Internet. The IPv6 Internet can then reply by stuffing IPv6 data inside IPv4 packets and sending them to your router. Your router unpacks the data and sends it across your home network as real IPv6.
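The address derivation itself is trivial: the 32 bits of your IPv4 address are dropped straight into the second and third groups of a 2002::/16 address. A quick sketch in Java, using the example address from above:
import java.net.InetAddress;

public class SixToFour {
    public static void main(String[] args) throws Exception {
        byte[] v4 = InetAddress.getByName("208.77.188.166").getAddress();
        // RFC 3056: "2002:" followed by the IPv4 address in hex gives your /48 prefix.
        System.out.printf("2002:%02x%02x:%02x%02x::/48%n",
                v4[0] & 0xff, v4[1] & 0xff, v4[2] & 0xff, v4[3] & 0xff);
        // Prints 2002:d04d:bca6::/48
    }
}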
The net effect of all this is that with the right router, it can look as if every machine on your home network is connected directly to the IPv6 Internet, even if your ISP only offers an IPv4 connection.
RFC 3068 is the second piece of the solution. It defines a magic IP address, 192.88.99.1, which is automatically routed to your closest 6to4 gateway.
So, if you can ping 192.88.99.1, you can have IPv6 right now — as long as you have a router that understands IPv6 and 6to4.
Once again, Apple are leading the industry. While big names like Linksys haven’t bothered to implement 6to4 in home routers yet, Apple’s Airport Extreme (and Time Capsule) make it simple.
You might need to take your computer off the network then reconnect in order to force it to notice, but then you should find that your computer suddenly has an IPv6 address that starts with “2002:”, the standard prefix for 6to4. If that’s the case, you should then be able to browse IPv6-only web sites like ipv6.google.com. You can also go to www.ipv6.org to verify your IPv6 address; and if you go there from different machines on your home network, you should see that each of your machines now has a distinct address.
Q: What does it mean when a method I’m calling declares that its parameters are final?
A: To you, nothing. It’s a safety feature for the author of the method.
Q: OK, so when should I declare method parameters as final?
A: Absolutely always.
Q: Isn’t that just annoying busywork?
A: Pretty much so, yes. You must be new to Java.
And now, the lengthy discussion.
Way back in the days when the closest thing to a decent programming language available to me was C or Pascal, I disliked functions with side effects. At the time they mostly seemed to me to be a nasty hack to get around the fact that there was no easy syntax for returning multiple values.
Today, I’m even less of a functional programmer than I was in college; but I still see functions with side effects on their arguments as evil. Sometimes a necessary evil for performance reasons, but evil nevertheless. (Obviously side effects on an object by its own methods aren’t anywhere near as big of a problem, though I like that Ruby distinguishes them from side-effect-free methods by a special naming convention.)
So my deeply ingrained habit is to treat function/method parameters as unmodifiable, whatever the programming language. I had actually forgotten that this was a habit until I started running PMD on my Java code, and promptly had it warn me that every single method parameter ought to be declared final. This in turn made me think about Java’s method call semantics, and how badly chosen and useless the word “final” is on method parameters.
In C and C++, the const keyword is used all the time when library functions accept string parameters, and provides the caller with a guarantee that the function will not modify the string. For example:
int strcmp(const char *s1, const char *s2);
You might naively expect that in Java, final performs a similar function — but it doesn’t really.
First off, in Java all Strings are immutable. So whether arguments are final or not, your strings will never be modified by being passed to a method.
Secondly, in Java all objects are passed as a value consisting of a pointer to the object. The Java Thing a is really like C++ Thing *a, and Java’s a.method() is really C++’s a->method(), not a.method().
To look at it the other way, if you could have an actual object in a variable x, Java’s foo(x) would really be doing a C++ foo(&x) — but in Java, you never get to touch the actual object x.
It seems that the designers of Java wanted to make sure nobody confused their object pointers with the evil manipulable pointers of C and C++, so they decided to call them references instead. So Java books usually say that objects are stored as references, leading people to think that objects are passed by reference. They’re not, they’re passed by value, it’s just that the value is a pointer. If you’re confused again, try going back a couple of paragraphs.
Anyhow, the upshot of pass-by-value with objects represented by pointer values is that as a caller of a Java method, the presence or absence of final on parameters tells you absolutely nothing. Even if the parameters are all final, the method can still modify the value of any objects you pass in, unless the objects themselves are special immutable ones like the String class. Contrariwise, even if the parameters aren’t final, the method can’t make your variables end up pointing at a different object.
This is undoubtedly why JavaDoc ignores final on parameters when generating documentation.
What about the value to the author of the method? Well, reassigning parameters is pretty much universally viewed as a bad thing to do, and Eclipse will warn you about it.
If you’re not convinced it’s unwise, consider the case where you have a constructor with a parameter ’name’ that gets set as the initial value of a field, also called ’name’. If parameters are final, there’s no way to accidentally modify the parameter when you think you’re modifying the field. Yes, Eclipse warns about parameters masking fields too, but still… more safety is good if the cost is low.
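A small sketch of that situation:
class Account {
    private String name;

    // With a final parameter, the classic slip of writing "name = name;"
    // (forgetting "this.") is a compile-time error instead of a silent
    // no-op that leaves the field null.
    Account(final String name) {
        this.name = name;
    }
}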
So let’s consider what would happen if we made all parameters final by default.
There may be a rare occasion where you genuinely need to change values passed in to a method before performing some computation. Since every object in Java is manipulated as a pointer, it’s computationally trivial to introduce a temporary variable that’s a copy of the parameter, and then either change it to point to a different object or not. So instead of
void foo(Object x) {
    if (x == null) { x = SOME_DEFAULT; }
    ...
}
you’d have to do
void foo(Object x) {
    Object c = x;
    if (c == null) { c = SOME_DEFAULT; }
    ...
}
No big problem.
How about performance? Well, it turns out that the final-ness of parameters isn’t part of the method signature. I also checked, and the compiled class files are byte-identical whether your parameters are final or not. So, no performance benefit either way.
Finally, no pun intended, the use of final on parameters is required when using inner (nested) classes for listeners or iteration. This crops up all the time in Swing programming, and a Google search shows that it’s a common cause of intense confusion.
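A minimal Swing-flavored sketch of that requirement; the anonymous listener captures both parameters, which is why they have to be declared final:
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import javax.swing.JLabel;

class ListenerExample {
    static void updateOnClick(final JLabel label, final String text) {
        label.addMouseListener(new MouseAdapter() {
            @Override public void mouseClicked(MouseEvent e) {
                // The anonymous inner class captures label and text,
                // so both parameters must be declared final.
                label.setText(text);
            }
        });
    }
}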
So I don’t see any value to making the final-ness of parameters a matter of choice for method authors either. They should all be final, always. You won’t see any significant performance change, but you’ll get a bit more safety, and you’ll be a bit less likely to encounter a weird error message when attempting to use listeners and iterators.
So the actual word final on method parameters is just noise, providing no benefit to API users, programmers with good coding habits, or anyone using a tool like Eclipse that warns about reassigning parameter variables. The language should have made final parameters at least the default, and ideally the only option. It annoys me that I end up with my method declarations cluttered with final, final, final.
Computers work in binary. They measure things in binary numbers internally, at the hardware level. While you are seeing letters and decimal numbers on this page, the computer just sees streams of 1s and 0s. The successive digits in binary (moving left) count 1s, 2s, 4s, 8s, and so on, just like the digits in a regular base 10 number are worth 1, 10, 100, 1000. Powers of 10, powers of 2.
Way back in the mists of primordial computer history—or the 60s, as we also know it—someone decided it would be a good idea to define a kilobyte as 1024 bytes, rather than 1000, because 1024 is the closest power of 2 to 1000. That is, a 10 bit binary number can count from 0 to 1023, just like a 3 digit decimal number counts from 0 to 999.
The problem is, this unusual definition of “kilo” wasn’t given its own symbol at the time; they just used the same “kilo” prefix used for the metric system. Nor was the unusual definition of “kilo” universally used, even in the computer industry. For instance, while the memory of the computer was measured in binary kilobytes, the speed was always measured in decimal kilohertz.
As the years went on, computers got more memory, and got faster, and their network connections got faster. We ended up with more and more ways to store data. And people kept on randomly choosing whether to use base 2 ‘computer’ units, or normal base 10 units, when measuring computer stuff.
Right now, CDs are measured in base 2, as per the official CD standard—your 650MB CD contains 650×1024×1024 bytes. On the other hand, DVDs are always measured in base 10 units—your 4.7GB writeable DVD has the normal 4,700,000,000 bytes.
The size of the memory in your computer is measured in base 2 units (megabytes) if it’s being used as RAM, but base 10 units if it’s being used as long term storage (SSD). The bus speed of the memory is always measured in base 10 units (megahertz), even if the quantity is base 2 units.
The file you download has a size, almost certainly reported in base 2 units by your web browser. But, you’ve guessed it, the speed of your network connection is always measured in base 10 units. A 1Mbps cable modem gives you 1,000,000 bits per second, not 1,048,576.
The hard disk in a Macintosh? Always specified in base 10 units. If you get a 40GB disk, you get 40,000,000,000 bytes. The disk space reported by Mac OS X? Always binary units. Even Apple aren’t consistent.
Let me be blunt: This is a mess.
There is no logic to it. There is no consistency to it. You can’t work out whether a given measurement is base 10 or base 2, you just have to magically know—or guess, and hope that if you’re wrong the difference isn’t too important.
There is a solution to this problem. The IEC has a set of official binary prefixes. When you want to refer to something in base 2 units, you can use the appropriate binary prefix instead of using the closest base 10 metric prefix, and your meaning will be crystal clear. (That still leaves the problem of what to do if you’re measuring one of the many computer-related things that are measured in base 10, but if we get everyone using binary prefixes it won’t be a problem any more, will it?)
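If the difference sounds academic, a few lines of arithmetic show how big it gets:
public class Prefixes {
    public static void main(String[] args) {
        long MiB = 1024L * 1024;            // binary mebibyte
        long GB  = 1000L * 1000 * 1000;     // decimal gigabyte
        long GiB = 1024L * 1024 * 1024;     // binary gibibyte
        System.out.println(650 * MiB);      // a "650MB" CD holds 681,574,400 bytes
        System.out.println(40 * GB / GiB);  // a "40GB" disk is only 37 GiB
    }
}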
And that brings me to the thing I actually want to write about: knee-jerk conservatism.
It turns out that there are a lot of computer programmers who really get pissed off by the idea of having to write MiB for base-2 megabytes. “Megabytes have always been base 2, and always been written as MB”, they sneer. “Everyone knows that 1MB is 1024KB, unless you’re talking about DVDs, or reading manufacturer specs for a hard drive, or buying an SSD, and that’s just the manufacturers being stupid. Everyone knows that ‘K’ on a computer means 1024; except for speeds, where it means 1000, except for file download speeds where it means 1024, except when it’s the speed of your modem, when it’s 1000. Everyone knows that. What, are you stupid?”
I find it quite fascinating, really. Engineers generally pride themselves on consistency and clarity, yet when it comes to being consistent and clear in their use of measurements, well, you’d think you were asking them to drink decaf or something.
Change which makes things easier, more consistent, and less ambiguous is good change. It should be embraced. Clinging to confusing and inconsistent ways of working, just because it’s what you’re used to, doesn’t make you look superior—it makes you look like an ass. You’re not clinging to consistency with the past, because the past usage was not consistent. The computer industry has never been consistent in its use of units, it’s not being consistent now—but it’s time for it to start. And there’s only one way to do that.
If you measure in base 2 units, report in base 2 units using the base 2 prefixes.
If you measure in base 10 units, report in base 10 units using the base 10 prefixes.
This is not a big plot to make you report disk sizes in base 10 if you don’t want to. Go on measuring your hard disk in GiB and whining about the hard drive manufacturer conspiracy to defraud you, if you want; I don’t care. I just want you and your software to be clear, correct and unambiguous when you provide information to me. Leaving me to guess the meaning of ‘K’ and ‘G’ based on context is not good enough. It is not unambiguous; see above.
Now, get with the program already. All of you. Tell your friends. If anyone whines, point them at this article. And someone get Steve Jobs to lay down the law at Apple, their continuing inconsistency is really annoying me…