May 18, 2012
Earlier this week, I wrote some data pump code to move information between DB2 and Domino, filtering and cleaning it on the way. I wrote the code using JPA 2 so that I’d have less code to write, and so that I could more easily turn it over to someone else some day and not have a lot of questions to answer.
Performance seemed really slow, so I decided to measure just how slow. I grabbed my stopwatch class, hacked it in, replaced the output database with a file, and looked at the results in seconds taken per thousand records read:
JPA 2:
81.54 s/krec
77.82 s/krec
81.89 s/krec
89.60 s/krec
Taking 81 seconds per thousand records might seem pretty good, but it meant my data pump was going to take over 5 hours each night. And that assumed multiple threads so that the writing part of the problem didn’t slow it down any further.
At this point, I could have done the Rails hipster thing, and said “Oh dear, relational databases are slow and useless, let’s use NoSQL”, but instead I decided to investigate further. To find out how much of the overhead was down to JPA 2, I hacked together some raw JDBC code to compare:
JDBC:
0.42 s/krec
0.42 s/krec
0.42 s/krec
0.42 s/krec
So, about 200x faster. Not looking good for JPA 2 and EclipseLink. I turned up the logging level, and took a look at the actual SQL being executed.
The actual column retrieval part of the query looked fine; exactly how I’d written it in the JDBC code, in fact. The problem is that it was wrapped in this:
SELECT * FROM (
SELECT * FROM (
SELECT EL_TEMP.*, ROWNUMBER() OVER() AS EL_ROWNM
FROM ( sensible query goes here ) AS EL_TEMP)
AS EL_TEMP2 WHERE EL_ROWNM <= ?)
AS EL_TEMP3 WHERE EL_ROWNM > ?
This was being done to chop the result table into reasonable slices. I was using a wrapper class which called JPA’s setFirstResult()/setMaxResults() to page through the results, cache them, and hand the objects back to my code. EclipseLink was then generating the nested queries to implement this pagination. Timing the paginated query revealed that it was responsible for the slowdown.
I couldn’t eliminate paging; the dataset was just too big. So I tried altering the code to make the cache and page size much larger. That helped a bit, and allowed me to get the delay down to 4.21 s/krec. Still 10x slower than raw JDBC, but 20x faster than it had been.
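To make the paging concrete, here is a rough JavaScript sketch of a wrapper like the one described above, written as a generator so the caller simply iterates. fetchPage is a hypothetical stand-in for running the JPA query with setFirstResult(first) and setMaxResults(max). Assuming EclipseLink binds those values into the nested query shown earlier, each call corresponds to one execution of it, selecting rows with first < EL_ROWNM <= first + max, so the page size directly controls how many times that query runs.

```javascript
// Sketch of a paging wrapper as a generator. fetchPage(first, max) is a
// hypothetical callback standing in for a query run with
// setFirstResult(first) and setMaxResults(max); it returns an array of
// at most `max` rows.
function* pagedResults(fetchPage, pageSize) {
  let first = 0;
  while (true) {
    const page = fetchPage(first, pageSize); // one paginated query
    yield* page;                             // hand rows back one at a time
    if (page.length < pageSize) {
      return;                                // short page: nothing left
    }
    first += pageSize;
  }
}

// Usage (myFetchPage and processRow are hypothetical):
// for (const row of pagedResults(myFetchPage, 1000)) { processRow(row); }
```

With 2,500 rows, a page size of 1,000 means 3 calls to fetchPage, while a page size of 100 means 26 (the last call returns an empty page, which is how the loop knows to stop).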
I looked around for some information about how to best do result pagination with DB2, and found a developerWorks article. I used the techniques from that article to put together a paged query equivalent to the one EclipseLink was generating, ran them both through DB2’s command-line performance measurement tool, and verified that both gave the same output.
The EclipseLink query was 3x faster than the new one.
Next, I decided to see how fast the DB2 command line client could dump the data. It was only slightly faster than the raw JDBC code: 0.399 s/krec rather than 0.42.
I reached several conclusions.
Firstly, EclipseLink is pretty good from a performance point of view. Given the problem of issuing paginated queries against a database, chances are EclipseLink will do better than a random non-expert programmer writing raw SQL.
However, that might not be good enough. Pagination can be expensive, so if you’re batch-processing data you might be much better off avoiding ORM entirely. So like most things in software development, ORM is not always the right answer.
I’m also disappointed that there’s still no good way to use JPA to get your query results in Iterator form, so that pagination becomes unnecessary.
Filed under: Java
May 10, 2012
A common style guideline in JavaScript is to use a single var declaration at the top of your functions. This is because even if you use multiple var declarations scattered throughout the code, JavaScript’s scoping rules mean that the variables are actually declared at the top of the function that contains them (only the assignments stay where you wrote them), so why not make your code reflect reality?
So you end up with functions that look like this:
var foo = function (x) {
    var y = document.getElementById(x),
        z = y.checked,
        s = y.style;
    ...
}
Lispers will recognize that a single JavaScript var statement works much like Scheme’s let*: each initializer can use the variables bound before it.
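A short sketch of both points, with made-up function names: reading a variable above its var line gives undefined rather than a ReferenceError, because only the declaration is hoisted, and within one var statement each initializer can use the names declared before it, as in let*.

```javascript
// The declaration of y is hoisted to the top of the function,
// but the assignment `y = 42` still happens where it is written.
function hoisted() {
  var before = y; // no ReferenceError: y exists, but is still undefined
  var y = 42;
  return [before, y];
}

// Like let*, each initializer in one var statement can use the
// variables declared earlier in the same statement.
function sequential(x) {
  var a = x + 1,
    b = a * 2,
    c = a + b;
  return c;
}
```

Here hoisted() returns [undefined, 42] and sequential(2) returns 9.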
Recently, discussion around the net has argued that this is bad practice, because reordering the variables in a single var declaration takes too much editing grunt-work. So people have proposed that it’s better to do this:
var foo = function (x) {
    var y = document.getElementById(x);
    var z = y.checked;
    var s = y.style;
    ...
}
I write in single-var style all the time, and frankly I’ve never found myself cursing at the difficulty of reordering the variables. However, for fun I decided to put together a Vim macro to handle it. Here’s my attempt:
map <C-a> :.s#^\(\s*var\s\+\)\(.*\)\([,;]\)\s*\n\(\s*\)\(.*\)\([,;]\)\s*$#\1\5\3\r\4\2\6<C-m><C-m>
To use it, you put the cursor anywhere on the first of the two lines you want to swap, and hit the keystroke to fire it off; Ctrl-A in the example above. It attempts to preserve whatever indentation you’re using.
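To make the capture groups easier to follow, here is the same substitution written as a JavaScript function over two lines of text. swapVarLines is a hypothetical helper for illustration, not part of Vim. Like the macro, it expects the first line to start with var and both lines to end in a comma or semicolon; each line keeps its own prefix and trailing punctuation while the declarations themselves trade places.

```javascript
// Mirrors the macro's pattern, split into one regex per line:
//   first line:  (indent + "var ")(declaration)(, or ;)
//   second line: (indent)(declaration)(, or ;)
function swapVarLines(first, second) {
  var head = /^(\s*var\s+)(.*)([,;])\s*$/.exec(first),
    tail = /^(\s*)(.*)([,;])\s*$/.exec(second);
  if (!head || !tail) {
    return null; // not a pair of lines the macro would match
  }
  // Swap the declarations (groups \2 and \5 in the Vim version) while
  // each line keeps its own prefix and its own trailing punctuation.
  return [head[1] + tail[2] + head[3], tail[1] + head[2] + tail[3]];
}
```

For example, swapVarLines('    var y = document.getElementById(x),', '        z = y.checked,') returns ['    var z = y.checked,', '        y = document.getElementById(x),'].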
So if you find yourself considering having a var explosion in your code just because you find editing too difficult, maybe a macro is in order. I assume it would be just as easy to implement something like this in Emacs.
Filed under: JavaScript
March 22, 2012
A few days ago I did some code refactoring:
| File | Bytes before | Bytes after |
| requestform.js | 41491 | 45884 |
| requestform-min.js | 23565 | 21058 |
What’s going on here? Well, the code is quite old, and was written using the direct assignment method of global abatement. Now that I’m more comfortable with JavaScript, I rewrote the code to use the revealing module pattern.
The direct assignment module pattern is simple to understand, but the code ended up with ‘this.’ and the name of the object everywhere. While the object name was short, repeating it many times bloated the code size.
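For readers who haven’t met the two patterns, here is a rough sketch of each, using made-up names (RF, RequestForm, trim, isValid) rather than the real requestform.js code. In the first, every reference goes through the global object’s name; in the second, members are locals inside a closure and only the names listed in the returned object are public.

```javascript
// Direct assignment: members hang off a global object, so the code
// repeats its name (or `this`) at every reference.
var RF = {};
RF.minLength = 3;
RF.trim = function (s) {
  return s.replace(/^\s+|\s+$/g, "");
};
RF.isValid = function (s) {
  return RF.trim(s).length >= RF.minLength;
};

// Revealing module pattern: members are private locals in a closure,
// referred to by bare name; the returned object lists the public ones.
var RequestForm = (function () {
  var minLength = 3,
    trim = function (s) {
      return s.replace(/^\s+|\s+$/g, "");
    },
    isValid = function (s) {
      return trim(s).length >= minLength;
    };
  return {
    trim: trim,
    isValid: isValid
  };
}());
```

Both versions behave the same: RF.isValid(' ab ') and RequestForm.isValid(' ab ') both return false. The difference is that minLength is public as RF.minLength, while in the module it is only a local variable, so a minifier is free to rename it to something short.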
So why did the refactored source file end up bigger? Well, I use JSLint, so I end up with a single huge var declaration for all of the object’s members. All the methods end up indented one more level when using the new module pattern. And of course, the indentation is done using spaces. For a less misleading impression, I compared the code size minus all whitespace. Even with the new pattern’s requirement to re-list all public methods, the non-whitespace parts of the code ended up around 2KB smaller, thanks to removing all the ‘this’s and object names.
Obviously all the indentation whitespace is removed during minimization, so the ‘-min’ version of the refactored code should end up no larger after refactoring. In fact, it ends up significantly smaller. The gain is better than the gain from removing all those object names. This is because I’m minifying using Google’s Closure compiler. Using the revealing module pattern, the internal names of public methods are never visible, and don’t have to match the public names. This allows the compiler to rename all the functions within the module to impenetrable single character names. In addition if you use unobtrusive JavaScript, as mentioned in the previous posting, then your callbacks don’t have to be visible. That means the compiler can rename all of those to shorter names too.
This is the additional benefit to unobtrusive JavaScript that I alluded to: while the ‘source’ JavaScript code is more verbose, the compiled/minified version will generally end up smaller.
Filed under: JavaScript, Programming
March 19, 2012
The HTML standards provide attributes on many elements, enabling you to
set them up to call JavaScript when an event occurs. For example, a text
field might use the onchange attribute to call some JavaScript to trim
whitespace from the beginning and end of the value:
<input type="text" id="field1"
name="username" onchange="trim('field1');">
However, this way of working has gradually become unpopular. Instead,
most developers now seem to recommend setting up all the events on a page
from a JavaScript routine, a practice called unobtrusive JavaScript.
The thing is, none of the rationales provided for unobtrusive JavaScript
ever made any sense to me, and it wasn’t until recently that I worked
out some valid reasons for the practice.
Bogus reasons for unobtrusive JavaScript
Let me run through a few of the bogus reasons first:
“The old-fashioned method only lets you run one event.”
<a onclick="event1(); event2();">
“The old-fashioned method makes event callbacks visible in the HTML.”
You say that like it’s a bad thing. If I’m editing the HTML, I need to
know if an element has associated JavaScript functionality. Otherwise, I
might break the JavaScript by changing the element.
Hiding the link between the script and the HTML means a person looking
at the HTML has no idea it’s invoking some code somewhere else.
Congratulations, you’ve just reinvented the
COMEFROM statement.
“But we should keep behavior separate from layout.”
In that case, we should stop putting A elements in HTML, and instead
parse the DOM and insert all the links later using JavaScript. And we
should stop using features like :hover in CSS, and don’t even think
about using the new CSS 3 behavior property. Right?
“The old-fashioned method is ugly.”
Old method:
<a id="myid" onclick="handleClick('myid');">
New method:
<a id="myid">
and elsewhere
document.getElementById('myid').addEventListener('click',
    function () { handleClick('myid'); });
Or the even newer way:
<a id="myid">
and elsewhere
var handleClick = function (e) {
    var el;
    if (window.event && window.event.srcElement)
        el = window.event.srcElement;
    if (e && e.target)
        el = e.target;
    ...
};
document.getElementById('myid').addEventListener('click',
    handleClick);
You’re telling me the first one is the ugly one? Really?
Face it, you’re not removing the ugliness. You’re making it worse; you’re
just moving it out of the HTML and into the JavaScript.
“OK, but if you’ve got a dozen related elements to add behavior to, the
onclick attributes get ugly.”
Sure, if you (say) want to add behavior to every image on the page,
that’s a good use of the new technique. Scan through the DOM and add
behavior to all the appropriate nodes, that’s the smart thing to do in
that case. But that doesn’t justify a blanket statement that it’s always
better to do it that way.
“But unobtrusive scripting is better for accessibility and graceful
degradation.”
How, exactly? Browsers with JavaScript turned off will ignore onclick
attributes, just like they’ll ignore your JavaScript-created event
callbacks. In both cases the page needs to be developed in such a way
that it still works without the script active. In both cases
accessibility requires using appropriate markup to associate labels with
actions. Where the callbacks get created makes no difference at all.
Good reasons for unobtrusive JavaScript
With the spurious arguments dealt with, what do I think are the good
reasons for unobtrusive JavaScript?
The first good reason is scope.
When you use the HTML method, the function you call has to be globally
visible to the page. You can place all your callbacks in a single global
object to reduce the problem (MyApp.nameClick(), MyApp.validate() and so
on), but the name of the global object is then fixed by how the HTML
refers to it. If you need to use some code which uses the same global,
you’re stuck, unless you rewrite all the HTML to use a different global.
With the unobtrusive method, your callbacks don’t need a globally
visible name. In fact, they don’t need a name at all; they can be
anonymous functions.
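As a minimal sketch, here is the trimming behaviour from the onchange example at the top of this post wired up unobtrusively. setUpUsernameField is a hypothetical name; the callback itself is anonymous, and it uses the built-in String.prototype.trim rather than a global trim() function.

```javascript
// Wires up the username field with no onchange attribute in the HTML.
// The callback is an anonymous function: it has no global name (no
// name at all), and is reachable only through the listener.
function setUpUsernameField(doc) {
  var field = doc.getElementById('field1');
  field.addEventListener('change', function () {
    field.value = field.value.trim(); // strip leading/trailing whitespace
  });
}
```

In a page you would run it once as setUpUsernameField(document); wrapping that call and the function in an immediately invoked function would leave nothing global at all.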
The second good reason is performance. Page speed is critical to user
experience, and a difference of just 1 second in the time taken to make the page
readable can significantly increase the number of readers who bail on
your site. For this reason, modern practice is to try to defer as much
browser work as possible until after the page has been displayed.
With the old-fashioned method, the browser attempts to set up all the
JavaScript callbacks while it is parsing the page. With the new method,
you can load your JavaScript file at the bottom of the page and have it
run after the initial render is complete. As an added bonus, you can
write a single script that sets up callbacks for multiple pages, and it
will be cached browser-side; meanwhile, all those onclick="…"
occurrences can be removed from every HTML page load.
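A sketch of such a shared script, assuming hypothetical element ids username and postcode: each entry maps an id to a change handler, and ids that aren’t present on the current page are skipped, so one cached file can serve every page.

```javascript
// One shared script for several pages. Each entry maps an element id
// to a change handler; ids missing from the current page are skipped.
function setUpPage(doc) {
  var handlers = {
      username: function (e) {
        e.target.value = e.target.value.trim();
      },
      postcode: function (e) {
        e.target.value = e.target.value.toUpperCase();
      }
    },
    wired = 0;
  Object.keys(handlers).forEach(function (id) {
    var el = doc.getElementById(id);
    if (el) {
      el.addEventListener('change', handlers[id]);
      wired += 1;
    }
  });
  return wired; // number of elements that received a handler
}
```

Loaded from a script element at the end of the body, it can simply call setUpPage(document), since the elements above it have already been parsed by then.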
There are also some other performance benefits, which I’ll consider in
my next posting.
Filed under: JavaScript, Programming, Uncategorized
January 10, 2012
Problem:
When you use PuTTY to SSH to a RHEL machine, you get an “Access denied” error displayed before the prompt for your password.
Solution:
Edit /etc/ssh/sshd_config and set:
GSSAPIAuthentication no
Then restart the SSH daemon:
/etc/init.d/sshd restart
Filed under: Linux
December 14, 2011
Sometimes the hardest part of improving your productivity is being able to notice that you’re doing something sub-optimal multiple times each day.
I was sitting hacking on some documents this morning when I realized that I frequently follow this usage pattern:
- Locate a file, using the locate command—often a Linux configuration file of some sort.
- Change directory to the directory containing that file, which is often a long way from the root directory or my current directory.
- Do some stuff using the file.
- Go back to what I was doing before.
A typical interaction:
w510:~/WIP 740$ locate s-pre-01.tex
/usr/local/context/tex/texmf-context/tex/context/base/s-pre-01.tex
w510:~/WIP 741$ cd /usr/local/context/tex/texmf-context/tex/context/base
w510:/usr/local/context/tex/texmf-context/tex/context/base 742$ cp s-pre-01.tex ~/WIP/
w510:/usr/local/context/tex/texmf-context/tex/context/base 743$ cd ~/WIP/
w510:~/WIP 744$
The problem here is that long path. I either have to type it with assistance from tab completion, or copy and paste it using the mouse.
I realized it would be really handy if I could do something like cd `locate s-pre-01.tex`
Of course, that doesn’t work for a couple of reasons, most notably that locate outputs a path to a file, not to a directory.
I checked to see if locate had an option to output only the directory name of the match, or if cd had an option to accept a filename and move to the same directory as the file. No on both counts.
Next, I checked to see if something like cdargs would solve the problem, but it seemed not.
I had the feeling a lot of other people had probably wanted to do what I wanted to do, so my next stop was Google. That turned up unhelpful monstrosities like cd "$(dirname "$(find / -type f -name ls | head -1)")"
Due diligence done, it looked like I wasn’t about to reinvent the wheel. I hacked together a couple of bash shell functions. My new improved interaction:
w510:~/WIP 757$ cdlocate s-pre-01.tex
w510:/usr/local/context/tex/texmf-context/tex/context/base 758$ cp s-pre-01.tex ~/WIP/
w510:/usr/local/context/tex/texmf-context/tex/context/base 759$ cd -
/home/meta/WIP
w510:~/WIP 760$
Note the use of cd -, which returns to the previous working directory and prints its name (hence the /home/meta/WIP line). It’s documented in the bash manual page, but not in help cd, at least in the version I’m using, so it’s easy to miss.
Filed under: Linux, System administration
October 27, 2011
Via careful use of nroff directives and comments, it’s possible to make a Ruby script that works both as a command line utility you can run, and as a man page. It’s a floor wax and a dessert topping.
This is handy, as it means you can distribute both the utility and the documentation in a single file, and create a symlink to the script in the appropriate man page directory.
You can also, of course, look for the standard --help command line option, and make your utility call man with its own source file as argument to provide the help.
Note that if your utility makes extensive use of command line options, you’ll probably want to use GetoptLong.
Filed under: Ruby
October 25, 2011
Things that make me happy about Notes 8.5.3:
- I can export my database designs to plain files, and put them in Git for version control.
- I can grep all my source code with ack.
- The built-in editor will let me see all the code for a design object in one window.
- Hover-over display of documentation comments for functions.
- Eclipse-style error summary.
- Real HTML and XML generation capabilities.
- HTML, ECMAScript and CSS in the Notes client.
- Server and client side JavaScript. I look forward to the day when I never have to write any LotusScript ever again.
Things that make me unhappy about Notes 8.5.3:
- Still no Linux design client.
- Still no Mac design client.
- No way to display bookmarks as icons in a grid. (And no, the old legacy Workspace is not an adequate substitute.)
- I can’t believe they rewrote it from the ground up and didn’t make it fully multithreaded.
- Errors are often reported on the wrong line number. That is, server-side error messages give a line number that may be 3 or 4 lines off from the actual place where the error occurred.
- LotusScript code editor no longer autocompletes the end of a block when you open a block, so now I need to remember which keywords are used to end each different kind of block.
- LotusScript code editor doesn’t clean up indentation automatically.
Filed under: Domino
October 21, 2011
One of the big new features in IBM Lotus Notes 8.5.3—specifically, in the Domino Designer—is support for external version control tools. To me, this is the single best thing ever to happen to Notes and Domino. Here’s how to use it.
Start off by opening your database in Domino Designer. Right-click the icon, choose Team Development from the popup menu, then choose Set Up Source Control for this Application.
Go through the wizard which appears. Give your project a name, and set the location on disk where files should be placed. (If you override the default, the location you choose should be a directory where you want the files to go, so you’ll probably want to create one.)
Now the magic happens. Notes writes out the entire design of the database, as a set of files neatly arranged in the following subdirectories under the project directory:
WebContent
Forms
Pages
Resources
Resources/Files
Resources/StyleSheets
Resources/Images
SharedElements
SharedElements/Columns
SharedElements/Subforms
SharedElements/Fields
SharedElements/Outlines
Views
Code
Code/ScriptLibraries
Code/Actions
Code/Agents
Code/WebServices
AppProperties
Framesets
Now you can open up the project directory that contains those folders, and stick the whole thing in version control. For example:
$ cd ~/workspace/MyProject
$ git init
$ git add .
$ git commit -m 'Initial import'
Congratulations, your Domino database is now under version control. You can tag releases, examine changes, fork and merge, and do all the other great stuff version control allows. I run Redmine at work, and push each bug fix as a commit to a repository on the machine. I put the number of the ticket in the commit message, and Redmine automatically cross-references everything and makes the changes browsable.
Browsable? Yup, because most of the files Domino generates are XML, in Domino XML (DXL) format. There are still a few binary blobs, but I hear that these will gradually become DXL too. So changes to views, form fields, agents and the like can sensibly be examined in version control. And of course, all your LotusScript, ECMAScript, CSS and HTML resources are there in plain text.
This also means you can grep all your source code. Or even better, use ack, which by default will skip the GIFs, JPEGs and other binaries.
Behind the scenes, the Domino Designer watches the directories and the NSF file, and synchronizes changes—bidirectionally. Change something in the Design client, and it gets quietly updated on disk. You can edit a style sheet file in $PROJECT/Resources/StyleSheets using Vim; when you save it, Notes notices and updates the NSF. There’s a manual sync option for when you want to be absolutely certain everything’s in sync (say, before a commit), but for general development everything seems to work automagically.
This is the best thing to happen in the world of Notes development since… since ever. All I want now is a non-Windows Domino Designer and I’ll be happy.
Filed under: Domino
October 21, 2011
Recent versions of IBM Lotus Domino have included updates to the On-Disk Structure (ODS), the file format of Notes and Domino databases. However, if you have local replicas, they will not automatically be upgraded to the newer formats. In fact, even newly-created local replicas aren’t created in the new formats by default.
To fix this, add the following two lines to NOTES.INI while Notes isn’t running:
NSF_UpgradeODS=1
CREATE_R85_DATABASES=1
Now you can use the Admin client to compact your local replicas using copy mode, and it will upgrade them to 8.5 on-disk format, which is more space efficient and provides better performance.
I’ve no idea why this isn’t the default behavior. It made a huge difference to how snappy e-mail was for me.
Filed under: Domino