
Thoughts on version control

September 18, 2008

I’ve used a number of version control systems, including CVS, PVCS, Subversion, and Arch. While people are scathing about CVS, I have to say that to my mind, the worst of the four is Arch. I tried using it because it had the right back-end storage, and discovered belatedly that its user interface was ghastly, with commands whose usage I found impossible to remember. Judging from its mailing list archives, Arch is dead, so I guess everyone else had the same experience I did.

Of those four, SVN was probably the most comfortable for me, and I still use it with projects like Ruby and its libraries. But for my own use, I’ve moved to something newer that I like better: Bazaar.

Bazaar was an attempt to do Arch again, but with a usable command set and the horrible file naming conventions removed. The developers clearly thought about the average user’s key requirements more carefully than the SVN developers.

For example, the first major release of SVN was designed around versioned WebDAV, served through Apache, as its network protocol. Hands up everyone who has a versioning WebDAV server set up? Yeah, me neither. They also gave it a dedicated server, svnserve, but either way the repository stored all your data in a virtual filesystem layered on top of Berkeley DB files. Those tended to become corrupt and need recovery. But hey, once they made SVN work on ordinary filesystems, it was pretty good.

No such stupidity with bzr. All you need is an FTP or SFTP server. If you want to publish changes but not allow people to commit, you can just point an ordinary HTTP server at your bzr repository and people can use the http URL to check out the code. This means if you have cheap $5 web hosting, you can run a bzr repository.

The commands are simple too. Suppose you’ve started a Java project in Eclipse and want to add it to bzr and publish it on a server somewhere.

$ cd ~/eclipse/workspace/MyProject
$ bzr init
$ bzr add src/**
added src
added src/com
added src/com/example
added src/com/example/myproject
added src/com/example/myproject/MyProject.java
added src/com/example/myproject/MyProjectTest.java
$ bzr commit -m "Initial import"
$ bzr push --create-prefix sftp://server.example.com/srv/bzr/projects/myproject/trunk
Created new branch.

Done. Now you can work on the code for a while, bzr commit each time you have it in a good state, and when you’re ready to publish the new revision just bzr push and it’ll use the same URL as last time.

If you later decide you’re too lazy to remember to bzr push and prefer the SVN/CVS way of working where there’s a central repository, then do:

$ bzr bind sftp://server.example.com/srv/bzr/projects/myproject/trunk

Now whenever you commit, your local copy will automatically be pushed to the server. Starting to work with someone else’s existing repository is easy too:

$ bzr get http://repo.example.com/projects/something/trunk

When you’ve made a bunch of changes you want to send to the owner for consideration, bzr send -o patches will bundle up all the necessary info into a file you can just e-mail–you don’t need a place to publish your branch. Or if you prefer the ‘auto-push’ model and are given commit access to the remote repository, you can do your initial checkout with

$ bzr checkout sftp://repo.example.com/projects/something/trunk

and then just bzr commit; each commit goes straight to the server, so there’s no separate bzr push step.

Of course, you can change your mind later and bind and unbind as you wish, or as your permissions change.

It’s also worth noting that by default, a bound branch (checkout) has all the necessary info to let you keep working if you find yourself unexpectedly without a network connection. Again, the designers of bzr obviously thought a lot about the way people work in the real world.

Meanwhile, it seems like the new hotness in version control is Git. Several open source projects I use have switched to it. I have to say, I don’t understand why.

There’s some discussion of git limbo from a Gnome developer that I think is worth reading before you’re tempted to use git. The idea of being able to do a partial commit easily is obviously very powerful, but it seems to me like leaving a loaded gun lying around.

I’ve worked with people who were in the habit of doing partial commits, and they were also in the habit of making the main trunk unbuildable. They’d fairly regularly commit a set of files that didn’t add up to a consistent, buildable state of the code. This is particularly prone to happen during refactoring; it’s easy to change a library API, and forget to include one of the files that contains a call to the library when you’re checking in.

Besides, why do a partial commit anyway, when you could just turn what you think would be a good partial commit into a new branch, check that it builds and passes the unit tests, and then merge it? Isn’t that the whole point of having a VCS with fast lightweight branching?

Sure, git is fast. But bzr is faster than git 1.0, which was deemed fast enough. Meanwhile, "someone will write a GUI" is a lousy excuse for git’s horrible command line UI. The merge command should be ‘merge’, for example, not a variant of the ‘pull’ command. To me, git’s commands have Arch smell, and that’s not a good smell.

Filed under: Programming | Comments (1)

The fractious leap second debate

September 16, 2008

You might not have heard about it, but there’s a debate going on which threatens to redefine time as we measure it. I’m something of a time nerd; all the computers in our house are synchronized to atomic clocks, as are several of our regular clocks, my wristwatch, and my phone. The debate going on concerns leap seconds. To understand the importance of it, it’s necessary to understand what a leap second is.

Recording time is made difficult for us by the fact that we live on a large rotating object with high mass, in orbit around a star. We like our time measurements to correspond to the apparent observed motion of the star in our sky; in short, we like day to be light, and night to be dark. We also like to set our calendar based on the earth’s orbit around the sun, so that winter is always cold and summer is always hot.

Inconveniently, the earth does not make an exact number of rotations per year. Hence roughly every four years it’s necessary to have a leap year, inserting an extra day into the calendar to bring it back into sync with the earth’s orbit, so that the months don’t gradually drift against the cycle of hot and cold weather.
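For the curious, the Gregorian leap year rule fits in a few lines. This is a minimal Java sketch; the class and method names are made up for illustration. Averaged over a full 400-year cycle, which contains 97 leap years, the calendar year comes out at 365.2425 days, close to the roughly 365.2422-day solar year the calendar is trying to track.

```java
class LeapYears {
    /** Gregorian rule: divisible by 4, except century years that are not divisible by 400. */
    static boolean isLeapYear(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    /** Average calendar year length, in days, over one complete 400-year Gregorian cycle. */
    static double averageYearLength() {
        int days = 0;
        for (int year = 1; year <= 400; year++) {
            days += isLeapYear(year) ? 366 : 365;
        }
        return days / 400.0;  // 146,097 days / 400 years
    }
}
```

So 2008 and 2000 are leap years, while 1900 is not.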

The problem of wanting day to be light is solved by having time zones, with different parts of the world choosing a different offset in hours so that noon is roughly when the sun is overhead.
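Time zone offsets can be illustrated with the java.time API, which arrived in Java 8 and so is newer than this post. The minimal sketch below, with hypothetical class and method names, asks the JDK's bundled time zone database which offset a zone uses at a given instant. The offsets include daylight saving time, which is why London is at +01:00 rather than +00:00 in September.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

class TimeZoneDemo {
    /** UTC offset in force in the given IANA zone at the given instant, DST included. */
    static ZoneOffset offsetAt(String zoneId, Instant instant) {
        return ZoneId.of(zoneId).getRules().getOffset(instant);
    }

    /** The same instant expressed as local wall-clock time in the given zone. */
    static ZonedDateTime localTime(String zoneId, Instant instant) {
        return instant.atZone(ZoneId.of(zoneId));
    }
}
```

For example, 17:00 UTC on September 16, 2008 was 12:00 in America/Chicago (then on daylight time, UTC-05:00) and 18:00 in Europe/London.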

Historically, the offsets were measured from GMT, the mean solar time at the Royal Observatory in Greenwich, England. However, the development of atomic clocks of increasing accuracy, and telescopes of increasing power, made scientists aware of problems with this simple scheme.

(more…)

Filed under: Java | Comments (1)

Pride comes before a fall

September 8, 2008

2006-10-27:

As part of its strategy to win more trading business and new customers, the London Stock Exchange needed a scalable, reliable, high-performance stock exchange ticker plant to replace its earlier system. [...] Using the Microsoft® .NET Framework in Windows Server® 2003 and the Microsoft SQL Server™ 2000 database, the new Infolect® system has been built to achieve unprecedented levels of performance, availability, and business agility.

Benefit: One-hundred-percent reliable on high-volume trading days

Or as Microsoft headlined it:

London Stock Exchange: Achieving Record Reliability Using Windows over Linux

Contrast with 2008-09-08:

The London Stock Exchange suffered its worst systems failure in eight years on Monday, forcing the world’s third largest share market to suspend trading for about seven hours and infuriating its users. [...]

The Johannesburg Stock Exchange, which uses the LSE’s trading platform TradElect, also suspended trading.

Meanwhile, the New York Stock Exchange uses AIX and Linux.

I wonder how long it will take Microsoft to take down the banner ad.

Filed under: Microsoft | Comments (0)

Java, JDBC and “memory leaks”

September 5, 2008

Every time Java is discussed on Slashdot, someone says that the overheads of automatic memory management aren’t worth it because Java still has memory leaks.

After further discussion, it generally turns out that they’re not talking about memory leaks; rather, they are talking about failure to free up resources in a timely manner–resource hogging. It’s a subtle distinction. In a memory leak, the system loses track of the memory, so it never gets freed during the life of the program. In the case of Java resource hogging, the Java system is still keeping track of the resources, and will eventually free them–it just doesn’t do it soon enough.

A common situation where resource hogging occurs is JDBC code that queries a SQL database from a Java application or servlet environment. The problem is, JDBC query code is surprisingly tricky to get completely correct. It’s easy to write code where an exception leaves active JDBC objects unclosed, making the application unreliable, overloading the database server, or wasting memory.

MySQL and PostgreSQL are extremely liberal in what they are prepared to accept. For example, you can generally close a connection and rely on the database to implicitly close everything else, including abandoning any uncommitted transactions. This is not the case with IBM DB2, which will actually refuse to let you close a connection unless you have cleared out everything properly. So it’s not just a resource usage issue–you can also suddenly find yourself having to do a ton of debugging when your data load increases and you need to swap out your development database engine for something more scalable.

So, it pays to get your JDBC code right the first time. To illustrate the painful construction of some hopefully correct JDBC query code, I’m going to discuss the process of writing a simple example program in Eclipse.
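As a rough illustration, and not the program discussed in the full post, here is a minimal sketch of one common way to guarantee that JDBC objects get closed even when an exception is thrown. It uses try-with-resources, which only arrived in Java 7, after this post was written; on Java 5 or 6 the equivalent is nested try/finally blocks. The SQL, table name, column name and method name are all hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JdbcQueryExample {
    // Hypothetical table and column, for illustration only.
    private static final String SQL = "SELECT name FROM customer WHERE id > ?";

    /**
     * Runs one query and returns the names found. The caller owns the Connection;
     * this method guarantees the ResultSet and PreparedStatement are closed, in
     * reverse order of opening, whether the query succeeds or throws.
     */
    static List<String> namesAbove(Connection conn, int minId) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setInt(1, minId);  // inside the try, so ps is closed even if this fails
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }  // rs closed here
        }  // ps closed here
        return names;
    }
}
```

A caller would typically open the Connection in its own try-with-resources statement (for example around DriverManager.getConnection(url, user, password)) and, if auto-commit is switched off, commit or roll back before that block ends, so that a strict engine like DB2 is never asked to close a connection with a transaction still open.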

(more…)

Filed under: Java | Comments (0)

A wakeup call for commercial Linux

September 1, 2008

PC World is reporting on a survey showing that community-supported Linux is growing in business at the expense of RedHat and SuSE.

This doesn’t surprise me at all. I run systems using both free and commercial Linux distributions, and the free ones are consistently less of a pain to maintain.

One of the biggest factors is that RedHat and SuSE’s business model involves selling you a new set of CDs for each new release. The assumption is that you will take your servers down and do a clean reinstall from CD. Actually upgrading via the package management system is an afterthought, if it’s supported at all. Even OpenSuSE doesn’t support upgrades properly: the 10.3 upgrade hosed /boot/grub/menu.lst on every system I performed it on, which is the kind of bug that should be caught by even a small amount of testing.

I see this commercial Linux requirement for CD-based reinstalls as frankly stupid. If you’re willing to do a full reinstall every year, why not stick with Windows? In fact, it was the annoyance of having to keep reinstalling and reconfiguring that drove me away from IBM’s internal Linux desktop image, which is based on one of the commercial distributions.

Furthermore, there’s an issue of practicality. Even if I were willing to reinstall and reconfigure everything for the upgrade, the fact is that I’m in Austin, and the servers are in Dallas. I don’t want to have to travel for hours and spend time in a cold, dry machine room.

Reinstalls aside, there’s also a quality issue. My experience with RedHat has been consistently negative, with problems ranging from broken Unicode terminal support to broken executables hidden in library packages. Plus, of course, the running sore that is RPM, and the incredible slowness of the Band-Aid that is Yum.

A lot of the time, people run commercial Linux because they have to, because the commercial software they want to use is only supported on commercial Linux. IBM Lotus Domino Server is a classic example–it’s only supported on RedHat Enterprise Linux or SuSE Linux Enterprise Server. The same is true of Oracle. Oh, sure, you can get them to run on community-supported Linux distributions, but if you hit technical problems, there’s no support.

So right now, the commercial Linux business model is to get people to put up with a sub-standard product, in return for having it be someone else’s fault when something breaks. I don’t think that’s a viable long-term strategy. I see signs that the application vendors are getting tired of the commercial Linux value-subtract; they want customers to have a good experience in migrating to Linux, and OS maintenance issues detract strongly from that experience. Hence Oracle’s decision to maintain their own Linux, and IBM’s decision to support Ubuntu as a desktop Linux platform.

It’s time for RedHat and SuSE to put some serious effort into quality control, and ensure that their Linux offerings exceed the reliability and maintainability of the free Linux distributions. The "Oh, you need to reinstall from CD" excuse is wearing thin.

[As always, opinions are mine, not IBM's.]

Filed under: Linux | Comments (0)

Java annoyance: final parameters

August 26, 2008

First, the summary:

Q: What does it mean when a method I’m calling declares that its parameters are final?
A: To you, nothing. It’s a safety feature for the author of the method.

Q: OK, so when should I declare method parameters as final?
A: Absolutely always.

Q: Isn’t that just annoying busywork?
A: Pretty much so, yes. You must be new to Java.
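Here is a minimal sketch of what the summary means in practice; the class and method names are made up. Declaring a parameter final stops the method body from reassigning it. The modifier isn’t part of the method’s signature, so callers can’t tell and aren’t affected, and it doesn’t make the object the parameter refers to read-only.

```java
import java.util.List;

class FinalParams {
    // 'final' only protects the method author from reassigning 'target' or 'name'
    // by mistake. The caller's List can still be modified through the parameter.
    static void appendGreeting(final List<String> target, final String name) {
        target.add("hello, " + name);  // mutating the referenced object is allowed
        // target = new java.util.ArrayList<>();  // would not compile: final parameter
        // name = name.trim();                    // would not compile: final parameter
    }
}
```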

And now, the lengthy discussion.

(more…)

Filed under: Annoyances | Comments (2)

Java: not all that bad these days

August 26, 2008

It took Sun a long time to get there, but as of Java 5 I began to quite like Java.

  • Proper collection classes with generics were added, with a non-painful syntax for iterating over collections. No more for (Iterator it = container.iterator (); it.hasNext (); ) { ... }; you can just write (say) for (String x : container) { ... } and get proper type safety into the bargain.
  • The startup time for well-written small to medium size programs is tolerable.
  • Java Web Start actually works, as does double-clicking .jar executables in Windows.
  • JAXP provides a standard API for XML processing, allowing me to remove dependencies on third party libraries.
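To make the first point concrete, here are both loop styles side by side in a small, hypothetical class. The raw Iterator version needs a cast on every element and only fails at run time if the list holds something other than a String; the generic for-each version is checked by the compiler.

```java
import java.util.Iterator;
import java.util.List;

class ForEachDemo {
    // Pre-Java 5 style: raw types, explicit Iterator, and a cast on every element.
    @SuppressWarnings("rawtypes")
    static int totalLengthOldStyle(List container) {
        int total = 0;
        for (Iterator it = container.iterator(); it.hasNext(); ) {
            String x = (String) it.next();  // ClassCastException at run time if wrong
            total += x.length();
        }
        return total;
    }

    // Java 5 style: generics plus the enhanced for loop; no cast needed.
    static int totalLength(List<String> container) {
        int total = 0;
        for (String x : container) {
            total += x.length();
        }
        return total;
    }
}
```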

However, there are still some things I don’t like about the language.

Many of the more annoying aspects of Java are reduced in intensity by using Eclipse. Although I’m now comfortable with that, I can’t help feeling that a programming language shouldn’t need a fancy IDE in order to be painless to write. Am I turning into the old guy waving a stack of punched cards and telling the young programmers to get off his lawn?

Filed under: Java | Comments (0)

The dark romance of computer science

August 22, 2008

From Amazon.com’s page for Discrete Mathematical Structures for Computer Science by Kolman and Busby:

Suggested Tags from Similar Products

harry potter (1303)
vampire (839)
stephenie meyer (812)
breaking dawn (748)
jk rowling (708)
romance (661)
fantasy (619)
edward cullen (567)
vampire romance (535)
mathematics (7)

And the review… Yeah, it’s not a book for beginning programmers.

Filed under: Uncategorized | Comments (0)

Looking for debug information in all the wrong places

August 11, 2008

Today I took some Java code which I had been running on my laptop, and tried to deploy it to the server where it belongs.

The code in question is a command-line utility designed to be run from cron. It connects to a Domino server via Web Services, connects to a DB2 server via JDBC, and then pumps data between the two via a protocol designed to try to minimize the number of updates that need to be transferred.

I deployed using Fat Jar, which is one of the most awesome Eclipse plugins around. It makes it trivially easy to take an entire working project, including all the non-standard frameworks and libraries it depends on, and bundle it into one big .jar file. This allows me to deploy things as a single jar which can be double-clicked or run using java -jar whatever.jar. Since I use the JDBC type IV driver, I don’t even need to have DB2 installed on the deployment host; just a JRE.

To make life easier, I obtain all the runtime configuration using the Java Preferences API. I store all the invocation-to-invocation state information that way too, using a Plain Old Java Object (POJO) to wrap the Preferences API and supply all the application’s preferences and persistence needs.

In the event that the preferences aren’t there, the object creates a suitable set of preferences with default values, and tells the user where to find the XML file Java stores them in before shutting down cleanly.
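The wrapper object described above might look something like this minimal sketch, built on java.util.prefs.Preferences. The class name, keys and default values are purely illustrative. On Linux the default user backing store is typically a prefs.xml file under ~/.java/.userPrefs, and flush() forces pending changes out to it.

```java
import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

class AppSettings {
    // Hypothetical keys and defaults, for illustration only.
    static final String KEY_SERVER = "server.url";
    static final String DEFAULT_SERVER = "https://domino.example.com/";
    static final String KEY_LAST_SYNC = "state.lastSync";

    private final Preferences node;

    AppSettings(Preferences node) {
        this.node = node;
    }

    /** Settings stored in the current user's preferences tree, keyed by this class's package. */
    static AppSettings forCurrentUser() {
        return new AppSettings(Preferences.userNodeForPackage(AppSettings.class));
    }

    /** Writes a default for every missing key; returns true if anything was missing. */
    boolean ensureDefaults() {
        boolean missing = false;
        if (node.get(KEY_SERVER, null) == null) {
            node.put(KEY_SERVER, DEFAULT_SERVER);
            missing = true;
        }
        if (node.get(KEY_LAST_SYNC, null) == null) {
            node.putLong(KEY_LAST_SYNC, 0L);
            missing = true;
        }
        return missing;
    }

    String serverUrl() {
        return node.get(KEY_SERVER, DEFAULT_SERVER);
    }

    // Invocation-to-invocation state lives in the same node as the configuration.
    long lastSync() {
        return node.getLong(KEY_LAST_SYNC, 0L);
    }

    void recordSync(long timestamp) {
        node.putLong(KEY_LAST_SYNC, timestamp);
    }

    /** Forces pending changes out to the backing store. */
    void save() throws BackingStoreException {
        node.flush();
    }
}
```

On the first run, a main method could call ensureDefaults(), and if it returns true, call save(), tell the user where the preferences file lives, and exit so the defaults can be edited.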

So, I wasn’t expecting any big problems. Copy over the .jar and .xml files, put them in the appropriate places, and run the .jar.

Unfortunately, something unexpected happened: an SSL handshake exception. Since I’ve had SSL certificate problems with Java’s JSSE before, I immediately assumed certificates were once more to blame. I copied over a trivial Java HTTPS client, and tried connecting to the top level https URL of the server–same problem.

I then spent quite a lot of time fiddling around with certificates–connecting to the web site with Firefox, exporting the chain of certs in PEM format, converting them to DER format with OpenSSL, locating the Java CA certs on the server, feeding the DER format certs to Java’s keytool to add to the default CA certs… Still no luck. I even tried dumping the list of CAs on the server and comparing it with the list on my laptop.
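A trivial Java HTTPS client of the kind mentioned above might look like this minimal sketch; the class and method names are hypothetical. When a handshake fails, JSSE’s own trace is often more informative than the exception: running the JVM with -Djavax.net.debug=ssl:handshake prints the client side of the handshake as it happens.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

class TinyHttpsClient {
    /** Prepares, but does not yet connect, an HTTPS connection with finite timeouts. */
    static HttpsURLConnection open(String url) throws IOException {
        URL u = URI.create(url).toURL();
        if (!"https".equalsIgnoreCase(u.getProtocol())) {
            throw new IllegalArgumentException("not an https URL: " + url);
        }
        HttpsURLConnection conn = (HttpsURLConnection) u.openConnection();
        conn.setConnectTimeout(10_000);  // milliseconds
        conn.setReadTimeout(10_000);
        return conn;
    }

    /** Connects, which performs the TLS handshake, and reports status and cipher suite. */
    static String probe(String url) throws IOException {
        HttpsURLConnection conn = open(url);
        try {
            int status = conn.getResponseCode();  // the handshake happens here
            return "HTTP " + status + " via " + conn.getCipherSuite();
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling probe("https://myserver.example.com/") exercises the whole handshake; open() on its own only builds the connection object, so it can be checked without a network.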

Next, I belatedly wondered if there was something odd about the server’s HTTPS handshaking. It was working fine with Firefox, but what about other programs? I tried connecting with curl.

By sheer luck, I tried connecting with curl from inside an SSH session to the server. It failed, with curl: (35) Unknown SSL protocol error. Trying curl -v for more debugging info didn’t help much:

* successfully set certificate verify locations:
*   CAfile: /usr/share/curl/curl-ca-bundle.crt
  CApath: none
* SSLv2, Client hello (1):
* Unknown SSL protocol error in connection to myserver.example.com

(Hostname changed for obvious reasons.)

It was because of this lack of useful trace output that I finally, much too late, resorted to checking the server log:

SSL handshake failure, no website found for IP address [127.0.0.2]

This kind of error typically occurs when you’ve got an extra host name or IP address pointing at your server, say www.store.example.com, but the HTTP server only has a website configured for www.example.com. So when you request www.store.example.com, the server complains that there’s no website found for that address.

I stared at the error log for a while. Was it really possible that curl was issuing a Host: 127.0.0.2 header as part of the HTTP request? No, that couldn’t be it because it wasn’t getting as far as the HTTP request.

Finally, I decided to look at /etc/hosts on the server, and found:

127.0.0.1 localhost
127.0.0.2 myserver.example.com myserver

Well, that certainly explained why the HTTP server was confused. I’m still not sure I understand why the server only gets confused on loopback connections, but since fixing /etc/hosts made the problem go away completely, I’m happy. Perhaps I’ll wake up tomorrow and it’ll all be obvious.

Right now, my hypothesis is that the entry for 127.0.0.2 is something YaST thought was a good idea, as I can’t imagine any human setting up the configuration that way.
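One plausible explanation for the loopback-only failure, not confirmed here: the HTTP server picks which SSL website to serve based on the local IP address a connection arrives on. Clients elsewhere reach the server’s real address, but on the server itself /etc/hosts sent myserver.example.com to 127.0.0.2, which had no website configured. If that’s right, a quick check for a host name that resolves to a loopback address (anything in 127.0.0.0/8, or ::1) could save time. This minimal sketch uses only java.net.InetAddress; the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostCheck {
    /** True if any address the name resolves to is a loopback address (127.0.0.0/8 or ::1). */
    static boolean resolvesToLoopback(String host) throws UnknownHostException {
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            if (addr.isLoopbackAddress()) {
                return true;
            }
        }
        return false;
    }

    /** Convenience check for this machine's own host name. */
    static boolean localNameIsLoopback() throws UnknownHostException {
        return resolvesToLoopback(InetAddress.getLocalHost().getHostName());
    }
}
```

On a server with the /etc/hosts file shown above, localNameIsLoopback() would be expected to return true, because 127.0.0.2 falls inside the loopback range.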

Filed under: Domino, Java | Comments (0)

JavaScript: The Good Parts

August 9, 2008

I remember back in the 1980s paying something like £40 for a copy of Kernighan and Ritchie’s original book, The C Programming Language. An outrageous amount for a book which was a little over 200 pages, I thought.

However, during the 1990s, something happened to books about computer programming. Like the fast food junkies of our nations, they started to get bigger and bigger, with the essential meat surrounded by more and more extraneous fat. These days, most computer books are Doorstop Books. They’re at least 4cm thick, and make a great tool for propping open a door or killing a cockroach. As a tool for learning a programming language, however, I find them not so good. I’ve come to value conciseness in technical literature. I have things to do, and a bloated book too often stands between me and my task.

I’m sure there are people who work better with something like The Complete Book of C Programming, a mammoth tome with more than 1000 pages. But K&R is still out there getting rave reviews, gently updated for ANSI C, and a quarter of the length–because when you get right down to it, ANSI C just isn’t that complicated of a language. (And for the bits that are that complicated, you need the actual standard.)

Bloat is even more apparent if you look at books on UML. When you come right down to it, UML is just a bunch of conventions for how to draw diagrams that represent how pieces of software behave (or should behave)–glorified flowcharts. UML does not require 984 pages to explain. If you’re a programmer, you can get at least 90% of what you need to know about UML from Fowler’s UML Distilled, a mere 208 pages.

O’Reilly’s publishing genius has been that they generally find a happy medium between conciseness and bloat. They publish tech books which are concise enough to be useful, but large enough to look reassuringly definitive. Some of their books are excellent, and very few of them are truly bad. However, their …in a Nutshell series has started to show worrying signs of drift, with Java in a Nutshell becoming bloated with discussions of frameworks and reproductions of the API documentation, rather than sticking to its subject matter.

So I was a little skeptical of their new book, JavaScript: The Good Parts. I gave it some very careful perusal in the bookstore before buying. But I’m happy to say that it’s as good as the Amazon reviews suggest. My one-line blurb would be: Finally, a book on JavaScript for computer scientists.

JavaScript is a much misunderstood language, deliberately so. In case you’re one of the few people who hasn’t gotten the message yet, it has absolutely nothing to do with Java. The language was called LiveScript until Netscape renamed it as a cynical marketing ploy. It may seem hard to believe now, but at the time Java was seen as really cool, the language to be learning and using. Now, ironically, the association with Java probably hurts JavaScript more than anything else.

Did you know that JavaScript is a proper object-oriented programming language with exception handling? That it supports functional programming, including anonymous functions, functions as first-class objects, closures, and (with a small helper built on closures) currying? I didn’t for the longest time, because all I ever saw in actual use was a clunky scripting language that looked like C.

What Douglas Crockford has done in his book is set out a clean subset of JavaScript, and show how to construct essential features of a good programming language (such as modules and limited scope variables) from the features supplied by the language. Alongside this is discussion of the bad (or in some cases, awful) bits of JavaScript, the parts to avoid if you want to write clean, maintainable code.

Make no mistake, though, this is a book for software engineers. The chapter on functions was heavy going, doubtless because I haven’t done any real functional programming in years. I found myself re-reading sections multiple times, as the squirrel of recollection scurried around in the dusty attic of my memory.

Overall, though, I’d say that this book provides pretty much everything the experienced software engineer needs to learn JavaScript, the language, and start coding in it. Understanding other people’s JavaScript might be more of a challenge, and the DOM is a whole separate minefield.

In fact, I’d love to see the obvious next step: DOM: The Good Parts, a book which describes only the standard parts of the DOM, and then flags the bits that don’t work on common browsers.

And then perhaps someone who likes a real challenge can attempt to write C++: The Good Parts–with Appendices A thru E being the bad parts, the awful parts, the abysmal parts, the parts even worse than that, and the parts that make you wonder what Stroustrup was smoking.

Filed under: JavaScript | Comments (2)