Tuesday, June 16, 2009

Shrinking rt.jar

I read somewhere that one of the reasons Java applications are slow to load is that the large, 50 MB rt.jar file (94 MB extracted) must be read from disk. A jar file is basically a zip archive, so I wondered whether I could make it smaller by recompressing it.

First I used advzip, from the AdvanceCOMP set of utilities, which uses the 7-Zip compression algorithms. I copied rt.jar to my home directory and recompressed it with -z4, which brought it down to a quite remarkable 24 MB, a 52% size reduction. Surely this would take less time to load? I'm not sure, though, whether it would take longer to decompress, or whether the bottleneck is reading from disk or decompressing. Still, for a device with limited hard disk space, those 26 MB could make a difference.
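For anyone who wants to try the same experiment without advzip, here is a minimal sketch in Python that rewrites every entry of a zip archive at the highest DEFLATE level. It uses zlib through the standard zipfile module, not the 7-Zip encoder that advzip uses, so the numbers will not match the ones above. The sample archive, its entry names and the helper names `recompress` and `make_sample` are all made up for illustration, and the sample stores its entries uncompressed so the effect is easy to see.

```python
import io
import zipfile


def recompress(archive: bytes, level: int = 9) -> bytes:
    """Rewrite every entry of a zip/jar archive with DEFLATE at the given level."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED, compresslevel=level) as dst:
        for info in src.infolist():
            # src.read() decompresses the entry; writestr() compresses it again.
            dst.writestr(info.filename, src.read(info))
    return out.getvalue()


def make_sample() -> bytes:
    """Build a hypothetical stand-in for rt.jar with uncompressed entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for i in range(50):
            body = b"".join(b"method%d_%d();" % (i, j) for j in range(2000))
            zf.writestr("pkg/Class%d.class" % i, body)
    return buf.getvalue()


original = make_sample()
smaller = recompress(original)
reduction = 100 * (1 - len(smaller) / len(original))
print(f"{len(original)} bytes -> {len(smaller)} bytes ({reduction:.0f}% smaller)")
```

The reduction is computed the same way as the 52% figure above: one minus the new size divided by the old size.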

For a second comparison, I extracted the jar file and made a tar.lzma version, which came in at a lean 10 MB. It took about 2.6 s to decompress on my computer, though, so this is unlikely to be a viable option. The tar.gz version was still 17 MB (as was the tar.zip version), and took only 1 s to decompress.
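A rough way to reproduce this kind of size versus decompression time comparison is sketched below in Python, using the standard gzip and lzma modules. The payload is a made-up byte string rather than the real extracted jar, and `time_decompress` is a hypothetical helper, so the output only shows the method and says nothing about the figures above.

```python
import gzip
import lzma
import time


def time_decompress(decompress, blob, repeats=3):
    """Return (decompressed bytes, best wall-clock seconds over a few runs)."""
    best = float("inf")
    out = b""
    for _ in range(repeats):
        start = time.perf_counter()
        out = decompress(blob)
        best = min(best, time.perf_counter() - start)
    return out, best


# Hypothetical payload; substitute the bytes of a tar of the extracted jar.
payload = b"".join(b"java/lang/Class%d;" % i for i in range(50_000))

for name, codec in (("gzip", gzip), ("lzma", lzma)):
    blob = codec.compress(payload)
    out, seconds = time_decompress(codec.decompress, blob)
    assert out == payload  # round trip must be lossless
    print(f"{name}: {len(blob)} bytes, {seconds:.4f} s to decompress")
```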

For a third comparison, I remade the jar file using the command-line zip utility, which produced a 25 MB file, so I'm starting to wonder why the version that comes with Java is so big.

From what I understand of the zip format, each file is compressed separately within the archive. That makes it a good fit for this kind of thing: only the classes that are actually needed have to be decompressed, not the entire archive. It also rules out the tar.* files, which compress better precisely because the whole archive is compressed as one stream.
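The trade-off between per-file and whole-archive compression can be seen in a small Python sketch. It packs the same set of made-up "class files" once as a zip (each entry deflated on its own) and once as a tar.gz (one gzip stream over everything), then reads a single entry back out of the zip. The file names and contents are invented, and `sample_files`, `as_zip` and `as_tar_gz` are hypothetical helpers.

```python
import io
import tarfile
import zipfile


def sample_files(count=200):
    """Hypothetical stand-ins for class files: small entries with shared content."""
    shared = b"shared constant pool data " * 20
    return {f"pkg/Class{i}.class": shared + str(i).encode() for i in range(count)}


def as_zip(files):
    """Pack the files into a zip; every entry is compressed independently."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def as_tar_gz(files):
    """Pack the files into a tar, then gzip the whole tar as one stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


files = sample_files()
zip_bytes = as_zip(files)
tgz_bytes = as_tar_gz(files)
print(f"zip: {len(zip_bytes)} bytes, tar.gz: {len(tgz_bytes)} bytes")

# Random access: only this one entry is decompressed, the others are untouched.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    one = zf.read("pkg/Class7.class")
print(len(one), "bytes read from a single zip entry")
```

With lots of small, similar files the tar.gz comes out smaller, because the compressor can reuse what it saw in earlier files, but getting at one file means decompressing the stream up to that point.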

What I don't get, though, is why this file is so big. I tried replacing rt.jar in my Java folder with the recompressed one, and everything seemed to work fine, but I did not notice any performance improvement (probably in part because the files were already in the disk cache). I didn't want to break anything, so I put the original back.
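One way to start answering the size question would be to look at how each entry in the archive is actually stored. The Python sketch below counts the entries that are stored uncompressed versus deflated and totals their compressed and uncompressed sizes. It runs on a tiny made-up archive; `summarize` is a hypothetical helper, and I have not run it against a real rt.jar, so it makes no claim about what that file contains.

```python
import io
import zipfile


def summarize(archive: bytes) -> dict:
    """Count entries by compression method and total their sizes."""
    totals = {"stored": 0, "deflated": 0, "other": 0,
              "compressed": 0, "uncompressed": 0}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.compress_type == zipfile.ZIP_STORED:
                totals["stored"] += 1
            elif info.compress_type == zipfile.ZIP_DEFLATED:
                totals["deflated"] += 1
            else:
                totals["other"] += 1
            totals["compressed"] += info.compress_size
            totals["uncompressed"] += info.file_size
    return totals


# Hypothetical two-entry archive: one stored as-is, one deflated.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/Stored.class", b"A" * 1000, compress_type=zipfile.ZIP_STORED)
    zf.writestr("pkg/Deflated.class", b"A" * 1000, compress_type=zipfile.ZIP_DEFLATED)

stats = summarize(buf.getvalue())
print(stats)
```

To point it at a real file, read the jar's bytes from disk and pass them to `summarize`.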

The last thing I'm wondering about: presumably these jar files are made by a Java program, using the Java libraries. Does that mean the Java libraries are really bad at making jar (zip) files? If so, surely the code could be replaced with some from, say, the 7-Zip project, at least in OpenJDK, both being FOSS projects. Maybe it already has been, and we'll see when Java 7 comes out. Who knows?
